Friday, March 14, 2014

Douglas-Peucker simplification... in PHP

Ramer-Douglas-Peuker. These immortal names send a warm thrill into the heart of everybody who works in GIS in any capacity. Simplifying geometries by removing "unused" points can reduce download times substantially, and since the vogue these days is to go heavy on the GeoJSON it helps to keep your GeoJSON as light as it can be (well, until you cut too many corners and have to back it off).

Go Get Some!

I know you're impatient. Head on over and just grab the code if you're in a hurry:
https://github.com/gregallensworth/PHP-Geometry/blob/master/Line_DouglasPeucker.php


The Need

So, there's an ArcGIS service and we want GeoJSON so the web site can load it from static files. This is for 3 reasons: a) the browser-side code is a lot simpler to load GeoJSON from an URL, than to load from ArcGIS services; b) hitting ArcGIS for updated data is silly since it won't be updated often, and loading from a static file means less server overhead; c) there's some post-processing needed on the data, e.g. standardizing "no" and "n" fields as false, converting the Length field from meters to miles and clipping to 2 decimals. One could argue that this is the client's job to massage their data, but I prefer not to argue and to just massage the data.


But there's a problem: The resulting set of trails totals 2.5 MB of GeoJSON. On my crummy household DSL that takes some 10 to 13 seconds to download, more if the PNGs and CSS files and JS files aren't already cached and are being downloaded. And during this time the would-be user is watching a loading spinner, because we can't render the trails to the map until we have them in the browser.

A colleague at GreenInfo Network ran the data through o' Dougie P with 3 meter tolerance, and the results were impressive. No visible change in the trails even when zoomed in pretty well -- the rounding errors we could see were already there due to the client's own digitization, and were no worse. So let's have Los Tres Amigos massage our data as we load it out of ArcGIS and into our GeoJSON file.

Problem: I found two RDP implementations in JavaScript, one of those being JSTS. The JavaScript ones don't do much good since getting the data into the browser is the problem we're trying to solve. I found only one for PHP, part of GeoPHP which presumed upon us using their object model (or in fact running Drupal). ArcGIS supplies the multilines as lists of lists, but nothing could just handle that unless I went all the way with GeoPHP which for this application was major overkill.

So, I wrote my own, and there is is on Github. Scroll to the bottom of this post if you just want the code.

The Results

The results are spectacular.

We fetch the data in WGS84 lat/lon and RDP it a tolerance of 0.00002 degrees (about 3 meters at this latitude). The file size dropped from 2.5 MB to 1.5 MB and now load in about 6  seconds. Since the map is loading tiles and PNGs are still loading, you barely notice.

And quality is as great as ever: again, the cut corners are the client's own data, and not introduced by the simplification.

The Code

So, now some whining about how tough it was. :)

The basic algo is fairly easy to comprehend (the best things often are):
  • Draw a straight line from the start vertex to the end vertex.
  • For all points aside from the start and end, find the distance (in whatever map units, even degrees) from the line. This was the hard part. As you do this, track which vertex is furthest from the line.
  • If none are outside the tolerance, then your segment is simply the first and last vertices and all other vertices in between are dropped.
  • If any are outside the tolerance, split the line into two pieces at that furthest-out vertex. Process each of those halves via the process above (recursively: if a half has a vertex outside the tolerance, split the half in half, ...).
The hardest part for me, was with finding the distance between a point and a line. The GeoPHP implementation uses a unit magnitude technique which I just couldn't follow, and which was giving me completely wrong distances when I tried to do the same. Eventually I settled on this technique using good ol' slope-intercept algebra. That sure blew some of the cobwebs off my high-scool algebra I don't mind saying!

The one lame hack I had to make, is that the slope-intercept technique doesn't work for a segment which runs completely vertical (the start and end vertices have the same longitude) since that's a "divide by zero" slope. So my hack: fudge one of the points' longitude by 0.00001 units, so there's a defined slope and the distance can be calculated.

I figure that this won't be a problem: 0.00001 units is quite small. Even in degrees that comes up to about 4 feet at the equator, and the GPS you used to measure the trail isn't that accurate anyway.

Friday, March 7, 2014

Leaflet Polylines: Find the minimum distance to another LatLng

So, the user is at a location (a L.LatLng instance, perhaps the map center or perhaps the latlng from a locationfound event). And out on the map there are trails, in the form of L.MultiPolyline instances. Question: What trails have any presence within a 5-mile circle of my location, and for bonus points how far away is the closest part of that trail?

In our case, we're writing the application 100% client-side, so I can't cop out and bounce it off PostGIS. And we don't need to go all the way with JSTS so we can buffer and find intersection. What we really need is to find the closest approach of the line to this point, and see whether that is within the given circle.

There are two ways to do this: one is correct and slow, the other is quick and dirty. The correct way involves complicated math and would be comparatively slow, and when we're going over several hundred MultiPolyline instances the speed boost is worth the sloppiness. The fast way is to find the closest vertex of the MultiPolyline, and its distance to our LatLng. This is a lot simpler and faster than the mathematical way, and in our case the sloppiness is not relevant: we're looking at a scale of miles and our vertices aren't so far apart that it'll matter.

For more info and downloads:
https://github.com/gregallensworth/Leaflet/blob/master/MultiPolyLine_Distance.js

Quick and Dirty

The brute force way: Loop over all vertices in the given Polyline or MultiPolyline, and track the one with the minimum distance as determined by LatLng.distanceTo() Super simple:
// search this Polyline for the closest vertex to the given LatLng
// return an object of both that vertex (LatLng instance) and the distance (meters)
L.MultiPolyline.prototype.calculateClosestVertexAndDistance = function (targetlatlng) {
    var closest_latlng = null;
    var closest_meters = 1000000000;

    var paths = this.getLatLngs();
    for (var pi=0, pl=paths.length; pi<pl; pi++) {
        var path = paths[pi];

        for (var vi=0, vl=path.length; vi<vl; vi++) {
            var d = path[vi].distanceTo(targetlatlng);
            if (d >= closest_meters) continue;
            closest_latlng = path[vi];
            closest_meters = d;
        }
    }

    return { latlng:closest_latlng, meters:closest_meters };
}
L.Polyline.prototype.calculateClosestVertexAndDistance = function (targetlatlng) {
    var closest_latlng = null;
    var closest_meters = 1000000000;

    var verts = this.getLatLngs();
    for (var vi=0, vl=verts.length; vi<vl; vi++) {
        var d = verts[vi].distanceTo(targetlatlng);
        if (d >= closest_meters) continue;
        closest_latlng = path[vi];
        closest_meters = d;
    }

    return { latlng:closest_latlng, meters:closest_meters };
}

Usage is about as simple as it looks:
var trail = L.polyline([vertices...]);
var here = map.getCenter();

var closest = trail.calculateClosestVertexAndDistance(here);
var title = "This spot is " + (closest.meters / 1609.34).toFixed(2) + " miles away";
L.marker(closest.latlng, { title:title}).addTo(MAP);


Quicker, but not Dirtier


Here's an idea. That "vertex finder" gives the closest vertex. But what if the entire trail or the majority of it were within a given circle? If your specific use case is simply to know whether the trail has any vertex within your search radius, you could stop at the first vertex whose distanceTo() is under the given radius.

That's what this does:
L.MultiPolyline.prototype.hasVertexWithinRadius = function (targetlatlng,meters) {
    var paths = this.getLatLngs();
    for (var pi=0, pl=paths.length; pi<pl; pi++) {
        var path = paths[pi];

        for (var vi=0, vl=path.length; vi<vl; vi++) {
            var d = path[vi].distanceTo(targetlatlng);
            if (d <= meters) return true;
        }
    }

    // if we got here, we never did find a vertex within range
    return false;
}
L.Polyline.prototype.hasVertexWithinRadius = function (targetlatlng,meters) {
    var verts = this.getLatLngs();
    for (var vi=0, vl=verts.length; vi<vl; vi++) {
        var d = verts[vi].distanceTo(targetlatlng);
        if (d <= meters) return true;
    }

    // if we got here, we never did find a vertex within range
    return false;
}
Since this one won't test every single vertex even after it has found an answer, it should run slightly faster if "is in range" is your only need. If you have several hundred lines, especially if you allow for a search radius which may match most vertices within any matching trails, this could shave off the milliseconds.

Usage:
var trail = L.polyline([vertices...]);
var here = map.getCenter();

var within = trail.hasVertexWithinRadius(here);
alert( within ? 'Close' : 'Too far' );

Enjoy!


Wait, it doesn't work (under a microscope)!

This isn't a true "minimum distance between point and line" algorithm, and there is a case where it will fall down. If the vertices are far apart compared to the radius of your circle, you may get a false negative: the vertices may be further away than the line between them. This illustration should... illustrate.


In this case, the line would not be matched since no vertices are within the given radius. But again, this is fairly contrived: your vertices must be very far apart (long straight segments) relative to the radius. But for a realistic use case of trails (vertices a few dozen meters apart) and large circles (5 miles) it does work well.

When it comes to finding actual minimum distance, that takes some significant algebra. See my next posting, where this is exactly what I do (though not in Leaflet).