Gregor The Map Guy: July 2012

Saturday, July 28, 2012

Routing is quick, finding the node is slow!

The last few days have been intense work, trying to get the routing stuff to work quickly now that we have a whole lot of nodes in there. The tricks that worked last week, are now hopelessly slow, so it's been a lot of scrabbling to catch up. And in the process I've come up with a few "outside the box" tricks for PostgreSQL's indexes, and some "obvious but it missed the first draft" methods for speeding up node selection.

THE BASIC NEED

Someone wants to route from a LatLng to another LatLng. Step 1 is to find the closest node to the starting point and the closest node to the ending point. This was done pretty simply:

$point = sprintf("ST_Transform(ST_GeometryFromText('POINT(%f %f)',4326),3734)", $_GET['lng'], $_GET['lat'] );

SELECT *, ST_Distance(the_geom, $point) AS distance FROM route_trails WHERE type!='Road' ORDER BY distance LIMIT 1

Now that we've loaded up the full dataset, this takes way too long, often a few minutes. No surprise: there are some 381,000 segments in all and 22,000 of them are not Road, so that's 22,000 distances to calculate.

So, what can we do to speed it up?

Speed up the filter for non-road nodes, so we get our subset more quickly.
Add a bounding box filter (the && operator) so we don't calculate distances for all 22,000 nodes, but only those within some acceptable distance. This does fit a real-world need too: if there's no routing node within a few miles of me, I shouldn't expect to be able to route here.

FINDING NON-ROADS: INDEXES ON TEXT FIELDS

The "type" field is indexed but is still quite slow. This means that even the first filter will take some time, before we even get to distances. Look at these costs: the indexed varchar field is still very expensive, and if we create a boolean field the index scan still isn't significantly faster.

explain SELECT * FROM routing_trails WHERE type='Road';
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on routing_trails (cost=8687.13..364331.17 rows=359073 width=1376)
   Recheck Cond: ((type)::text = 'Road'::text)
   -> Bitmap Index Scan on routing_trails_idx_type (cost=0.00..8597.36 rows=359073 width=0)
         Index Cond: ((type)::text = 'Road'::text)

explain SELECT * FROM routing_trails WHERE type!='Road';
--------------------------------------------------------------------------
Seq Scan on routing_trails (cost=0.00..553578.76 rows=21708 width=1376)
   Filter: ((type)::text <> 'Road'::text)

explain SELECT * FROM routing_trails WHERE is_road=true;
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on routing_trails (cost=9519.48..363155.52 rows=357052 width=1374)
   Filter: is_road
   -> Bitmap Index Scan on routing_trails_idx_is_road (cost=0.00..9430.21 rows=357052 width=0)
         Index Cond: (is_road = true)

But, I tinkered around a bit and found that the IS NULL operator is very inexpensive, cheaper than the index scan matching values. Check this out:

-- make a boolean field, but don't set true/false, set instead true/NULL
ALTER TABLE routing_trails ADD COLUMN is_road boolean;
UPDATE routing_trails set is_road=true WHERE type='Road';
CREATE INDEX routing_trails_idx_is_road ON routing_trails (is_road);

explain SELECT * FROM routing_trails WHERE is_road=true;
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on routing_trails (cost=9519.48..363155.52 rows=357052 width=1374)
   Filter: is_road
   -> Bitmap Index Scan on routing_trails_idx_is_road (cost=0.00..9430.21 rows=357052 width=0)
         Index Cond: (is_road = true)

explain SELECT * FROM routing_trails WHERE is_road IS NULL;
------------------------------------------------------------------------------------------------------------
Index Scan using routing_trails_idx_is_road on routing_trails (cost=0.00..35491.92 rows=24034 width=1374)
   Index Cond: (is_road IS NULL)

The first rewrite, now looks like this. The distance stuff is still ahead of me, but at least the "not a road" stuff now executes in a second instead of a minute.

SELECT *, ST_Distance(the_geom, $point) AS distance FROM route_trails WHERE is_road IS NULL ORDER BY distance LIMIT 1

BOUNDING BOX: MAKE A POLYGON FROM THE POINT

Narrowing down to 22,000 non-road nodes was a great start, how can we narrow it down further? A simple bounding box filter. The vast majority of nodes are on the opposite end of the city and thus uirrelevant to me. And besides, for this use case it's acceptable to say "Sorry, no route" if someone's over a mile from the nearest node.

So, simple stuff: define a "radius" around the point, build a polygon, do a && test.

    $point = sprintf("ST_Transform(ST_GeometryFromText('POINT(%f %f)',4326),3734)", $_GET['lng'], $_GET['lat'] );
    $boxbuffer = 0.05; // 3 miles or so at this latitude?
    $box = sprintf("ST_Transform(ST_GeometryFromText('POLYGON((%f %f, %f %f, %f %f, %f %f, %f %f))',4326),3734)",
        $_GET['lng']-$boxbuffer, $_GET['lat']-$boxbuffer,
        $_GET['lng']-$boxbuffer, $_GET['lat']+$boxbuffer,
        $_GET['lng']+$boxbuffer, $_GET['lat']+$boxbuffer,
        $_GET['lng']+$boxbuffer, $_GET['lat']-$boxbuffer,
        $_GET['lng']-$boxbuffer, $_GET['lat']-$boxbuffer
    );

SELECT *, ST_Distance(the_geom, ST_Transform($point) AS distance FROM routing_trails WHERE is_road IS NULL AND the_geom && $box ORDER BY distance LIMIT 1

Voila, the planner now says that the cost of this function is 14,000 instead of 550,000 and I can verify that this reduces 22,000 nodes to about 500. This 40-to-1 speedup isn't purely hypothetical either: it's now finding my closest node in a few seconds rather than several minutes.

Friday, July 27, 2012

I am one happy mapper.

We got the news today, that the Trust For Public Land's ParkScore Project (tm) website won a Making a Difference Award at the ESRI User Conference.

Here's the video, from the plenary session. Our part was 14:30 - 17:00, where Jack calls out the Trust For Public Land http://www.esri.com/events/user-conference/index.html On the wall as Breece and Will are receiving the award, you can see the screenshots of the website which GreenInfo Network did.

I'm pretty happy and proud to be working with folks (and with budgets) to really do impressive work. I did all of the map programming (ArcGIS API for JavaScript), but my co-workers did the management and the conference calls, the design and then hooking up with the CSS company for implementation of the design. (Hey, Jennifer, remember those long nights at FOSS4G cranking out the prelim version?)

Wednesday, July 25, 2012

Different strategies for breaking up the trails data for pgRouting

So, the trails data as given to us as multilinestring. We need it as plain ol' linestring to use ST_StartPoint() and use it in pgRouting. So, how best to separate the trails data? As long as we're pulling and loading, should we slice up the miles-long trails into individual segments?

ROUND ONE: EXTRACT ALL THE THINGS!

My very first experiment actually worked quite well. This split the multi into linestrings, but also into individual two-point segments. It was an "INSERT INTO SELECT" based on this query, thank you MerseyViking

SELECT ST_AsText( ST_MakeLine(sp,ep) )
FROM
-- extract the endpoints for every 2-point line segment for each linestring
(SELECT
  ST_PointN(geom, generate_series(1, ST_NPoints(geom)-1)) as sp,
  ST_PointN(geom, generate_series(2, ST_NPoints(geom)  )) as epFROM
   -- extract the individual linestrings
  (SELECT (ST_Dump(ST_Boundary(geom))).geom
   FROM mypolygontable
   ) AS linestrings) AS segments;

http://gis.stackexchange.com/questions/21648/explode-multilinestring-into-individual-segments-in-postgis-1-5

This method turned my 4000 trails into 100,000 segments each one some 10 feet in length. This turned out to work great: routing was quite fine-grained, and the nearest routing vertex would have been practically in arms' reach when mapped onto the real world.

But it did have issues:

We want to send the segments to the browser as WKT, so they can draw the route onto the map. With a vertex every 10 feet, this would overload a browser for non-trivial routes. We're targeting mobile platforms too, so a 3 MB download of WKT just isn't acceptable. (Simplify messes up the geometry so the path fails to lay over the trails. Leaving out odd-numbered vertices was just plain wonky.)
The later revision of the data was some 30X larger. We now had over 3,000,000 segments, and routing was unusably slow (300-900 seconds).

ROUND TWO: LEAVE ALONE ALL THE THINGS!

The later dataset was still multilinestring, but turns out that each multi had exactly 1 linestring within it. Very weird, but that does make it easy. All we do is load the routing table with the 1st geometry within each row of the trails table:

INSERT INTO route_segments (name, difficulty, the_geom)
SELECT name, skill AS difficulty, ST_GeometryN(geom,1) AS the_geom
FROM cm_trails;

Well that was simple! Routing is nice and quick, too: 120,000 trails means 120,000 segments so we're not much worse than in my original tests.

But no: The issue here is that routing would "overshoot" the source and destination locations, since it only knows intersections. In a case where we had miles-long trails, my route would start or end a mile away. Imagine that: I ask for directions from the Picnic Shelter to the Butterfly Pavilion, the directions and the blue highlight line start 1/4 mile away from this shelter, then go right past the pavilion for another 1/2 mile.

ROUND THREE: SPLIT ALL THE THINGS INTO REASONABLE LENGTHY THINGS!

(yeah, okay, that was a stretch)

So I had the right idea with loading up the trails as-is, but I want the trails to be smaller segments, say every 100 feet, so there would be an intersection somewhere near me.

Thank you, Dane Springmeyer and Regina, for your posting http://www.mail-archive.com/postgis-users@postgis.refractions.net/msg04582.html

INSERT INTO route_segments (name, difficulty, the_geom)
SELECT name, difficulty,
ST_Line_Substring(t.geometry, 100.0*n/length,
CASE
WHEN 100.0*(n+1) < length THEN 100.0*(n+1)/length
ELSE 1
END) AS geometry
FROM
(SELECT name, skill AS difficulty,
length,
ST_GeometryN(geom,1) AS geometry
FROM raw_trails
) t
CROSS JOIN generate_series(0,300) n
WHERE n*100.00/length < 1;

The results are perfect! The blue line and the directions now fit my intended start and destination acceptably, but the number of segments in the routing network is still reasonable.

pgRouting's hardcoded behavior ruined my day

Or: How I worked 11 hours yesterday debugging one line of code.

WEIGHTING PATHS TO PREFER TRAILS

We have been using dijkstra_sp() for routing over these trails I've been talking about lately. Works great.

So we made an adjustment: for all of the trails which are actually Road, multiply its cost by 3.0 so the trails are more preferable. This was pretty simple:

-- the second line is new: the pathfinder will walk 3 blocks of trail vs 1 block of street
UPDATE trails SET cost=length;
UPDATE trails SET cost=length * 3.0 WHERE trail_type='Road' OR trail_type='Street';

Also works great: dijkstra_sp() now routes people out of their way to get them onto a trail, and routes them over streets only when necessary, and 3.0 seemed to be the magic number here. Yay!

ADD A BBOX FILTER AND IT WON'T WEIGHT ANYMORE

But dijkstra_sp_delta() was ignoring this weighting and routing over streets anyway. That's too bad: now that we have quintupled the volume of data, we could really use the performance boost. I cranked up the multiplier to 10.0, checked the node connectivity, and was definitely in a location where a road was straighter but trails were only slightly longer. But no, streets are straighter so that's what goes.

This went on for over two hours. I re-analyzed the trail network with tighter tolerance and with looser tolerance. I tried dijkstra_sp() and dijkstra_sp_delta() I tried astar_sp_delta() too and same thing. I even wrote test cases in Shooting Star and was very unimpressed with the results.

Yes, Dijkstra can account for weights but Dikstra-with-box cannot and neither can A* That can't possibly be true.

USE THE SOURCE!

The key was to mis-type a vertex ID. I got the "query argument to EXECUTE is null" message, meaning that something had failed to look up the vertex. I popped open routing_core_wrappers.sql so I could find the line and figure out what wasn't being found.

And I was horrified at what I saw:

query := 'SELECT gid,the_geom FROM ' ||
'shortest_path_astar(''SELECT gid as id, source::integer, ' ||
'target::integer, length::double precision as cost, ' ||
'x1::double precision, y1::double precision, x2::double ' ||
'precision, y2::double precision ';

Hold on a second! "length::double precision as cost" Really? Oh geez! The length field was being used as the cost, and the cost field was being ignored completely!

I looked over the rest of routing_core_wrappers.sql and did a search-n-replace of "length::double precision as cost" to "cost", then reloaded routing_core_wrappers.sql to replace the functions.

Success!

HARDCODED BEHAVIORS

At one point in the past, I found in my notes later, I had updated routing_core_wrappers.sql to correct the "length as cost" issue in dijkstra_sp() But I had not done it globally (that was back on Day One, I didn't even know if it was a good idea back then) including the A* and Dijkstra-with-bbox wrappers.

The pgRouting functions have other hardcoded field names too:

the primary key must be "gid"
the geometry must be "the_geom"
the cost field must be "length"

Fortunately, it's pretty easy to pop open the wrappers and correct these to suit your situation.

Tuesday, July 24, 2012

pgRouting: A Technique for Generating Route Loops

Part 2: Cleaning up the loops

This supplements the posting a few days back about routing loops. Go back and read it if you like. Phase 2 was the fetching of routes between each WP pair, and the collection of $route_segments as a list of pgRouting line segments, complete with associated information about the segment such as its name. Phase 3 is up to you, iterating over the segments to generate text directions, WKT for sending to OpenLayers, etc.

But, let's add Phase 2B: cleanup of the route to remove meaningless spurs.

A nearly constant issue with the loops we've been generating, is that they don't "look like loops" The given path could have you walk a quarter-mile up a trail, then pull a U-turn to get back onto the path to the next one. All of these little spurs sticking off of the trail, made it look like one of those models of benzene. Ideally we wouldn't take such nonsensical detours (if the detour were far enough that it actually altered the routing, it wouldn't be nonsensical).

We gave thought to some ideas. One was to close the linestring into a polygon by adding a point, then use ST_MakeValid() to clean up the polygon. MakeValid would (presumably) remove the self-intersecting regions of the polygon, which means the spurs. But that won't work: once we've merged into a polygon, we lost the identity of each segment; though we may have a nice oval, we would have no feasible way to re-fetch the segment names, and we'd basically be restricted to a nameless polygon any any info we could derive about its segments.

So, what occurred to me: Knowing that it's sorted by TSP, and that we're routing for shortest distance between each TSP, any segment which appears more than 1 time (identified by its GID) must be part of a spur. I can't provide a mathematical proof about some outlandish topology, but it really looks true here. So let's go with it.

It worked great! The only catch was that the apex of a spur is only visited once: while we removed the spur, we do have extraneous points. Fortunately, these "spur apexes" are surrounded by two-visit segments so they should be easy to detect too.

PHASE 2B: REMOVE DUPLICATE SEGMENTS

// go over the segments and catalog them by ID#
// keeping a count of any segments which are touched twice
$already_seen = array();
foreach ($route_segments as $segment) {
if (! @$already_seen[ $segment->gid ] ) $already_seen[ $segment->gid ] = 0;
$already_seen[ $segment->gid ]++;
}

// go over them again and collect only segments which appear once AND which are not surrounded by two-fers
$route_segments_unique = array();
$howmany = sizeof($route_segments);
for ($i=0; $i<$howmany; $i++) {
// this node is a backtrack; omit it
if ($already_seen[$route_segments[$i]->gid ] != 1) continue;

// if this is not the first or last point, check the surrounding points

// if they are both repeats, then this one will be an island
if ($i != 0 and $i != $howmany-1) {
if ($already_seen[ $route_segments[$i-1]->gid ] != 1) continue;
if ($already_seen[ $route_segments[$i+1]->gid ] != 1) continue;
}

// fine, fine, then it's a valid node on a nice non-crossing path
$route_segments_unique[] = $route_segments[$i];
}
$route_segments = $route_segments_unique;
unset($route_segments_unique);
unset($howmany);

CONCLUSION

This works great! We are now getting nice, clean loops without weird spurs.

This does still suffer some of what I call "the trailhead effect" The very end of a loop may be trimmed off, e.g. the part where we walk 200 feet to the first fork, which we would walk back again. Technically it's a spur, and I've not figured out logic for knowing that this is "a good spur" Fortunately, this isn't a problem in the real world: the hiker is standing at the trailhead and can see on the map that "turn onto Butterfly Creek Trail" is right over there.

pgRouting: A Technique for Generating Route Loops

pgRouting (http://www.pgrouting.org/) is a set of stored procedures and C libraries for performing routing calculations over a network of intersecting lines. In this case, it's a multilinestring shapefile full of of trail segments, loaded into PostGIS.

My first day already had me doing Dijkstra routing, that was comparatively easy thanks to the FOSS4G presentations, which now form the how-to documentation for pgRouting.

But this latest task is, as far as I can tell, something that nobody has ever done before. The client wants to generate loops, where one starts at a location, walks approximately M miles, and ends up back at their starting location. I see one patent for generating route loops, but what I eventually came up with is completely dissimilar.

Note that this is a heuristic, not an algorithm. If your network has "islands" of connectivity, then it may pick waypoints on two islands and not be able to find a route. Still, for our needs (a dense network of trails, and a desire to take the scenic route) it's working quite nicely.

So here goes. Examples below are in CodeIgniter, but should be readily adaptable to your DB library of choice.

PHASE 1: WAYPOINTS FOR THE LOOP

* Start with an approximate desired total distance for the loop, call it M for miles. Multiply by 5280 to get the distance in feet, and divide by 2 on the assumption that our return trip will be about the same distance as our trip out. (M = 5 miles = 16400 feet / 2 = 8200 feet)

$totallength = 5280.0 * $_GET['miles'] / 2.0;

* Divide the total distance M by a fixed number of waypoints, to get the per-hop distance or P. We settled on 4 because experimentally it seemed to be the most fluid, not over-managing the route and forcing it into weird convolutions. (P = 8200 / 4 = 2050 feet)

$howmanywaypoints = 4;
$segmentlength = $totallength / $howmanywaypoints;

* Select your first waypoint (WP0), that being the routing node closest to your chosen Lat Lon location.

$saved_waypoints[] = array();

$point = sprintf("ST_Transform(ST_GeometryFromText('POINT(%f %f)',4326),3734)", $_GET['lng'], $_GET['lat'] );

$node = $this->db->query("SELECT *, ST_Distance(the_geom, $point) AS distance FROM $route_table ORDER BY distance LIMIT 1")->row();
$start_gid = $node->gid;

$saved_waypoints[$node->gid] = $node;

* Select a set of random waypoints (WPs), each one a random selection from the closest to that distance from the previous waypoint. Use something like the query below, inserting the geometry of the previous waypoint and the value of P.

for ($wp=1; $wp<=$howmanywaypoints; $wp++) {
$point = sprintf("ST_GeometryFromText('

POINT(%f %f)',3734)", $node->x, $node->y );
$node = $this->db->query("SELECT * FROM (SELECT * FROM $route_table ORDER BY ABS($segmentlength - ST_DISTANCE(the_geom,$point)) limit 25) AS foo ORDER BY random() LIMIT 1")->row();
$saved_waypoints[$node->gid] = $node;
}

* At this point, WP is a set of waypoints. WP0 is your starting node, WPs 1 2 3 and 4 are each a random direction from each other, and a straight line between them may form a random zig-zag and may crossing itself. Point is, one WP to another at approximately P straight-line distance per leg.

* Run a TSP on the set of points. Traveling Salesperson Problem (TSP) means "the straightest path to visit all of these points and return home" In the case of pgRouting, TSP is very minimal and merely takes a list of GIDs and returns a list of GIDs. That's why we kept the list of GIDs as $saved_waypoints

$waypoints = array();

$node_gids_string = implode(",",array_keys($saved_

waypoints));
$nodes = $this->db->query("SELECT * FROM tsp('SELECT gid AS source_id, x, y FROM $route_table WHERE gid IN ($node_gids_string)', '$node_gids_string', $start_gid)");
foreach ($nodes->result() as $r) {
$waypoints[] = $saved_waypoints[$r->vertex_id];
}

PHASE 2: ROUTING

Now we have a list of waypoints, sorted into what we hope is a sensical sequence. If we do Dijkstra or A* between them, we effectively have a meandering route via some random waypoints to a final location, and then routing from that final location back to our start should take us through a completely different route than we came.

// $waypoints is a list of routing nodes

// the A* is done as a subquery, and associates back to the route_segments table so we have info about

// each segment: name, length in miles, etc. Swap in your own.

$route_segments = array();

for ($i=0; $i<sizeof($waypoints)-1; $i++) {

$onode = $waypoints[$i];
$dnode = $waypoints[$i+1];

// run the query, to route between this WP and the next

$routesqlquery = "SELECT ST_AsText(route.the_geom) AS wkt, route_segments.gid, route_segments.length, route_segments.duration, route_segments.name FROM route_segments, (SELECT gid, the_geom FROM astar_sp_delta('$route_table',?,?,1000,'cost',true,true)) as route WHERE route_segments.gid=route.gid";

$routesqlparam = array($onode->source, $dnode->target);
$route = $this->db->query($routesqlquery,$routesqlparam);
if ($route->num_rows() == 0) die("Cannot find a route the entire way.");

foreach ($route->result() as $segment) $route_segments[] = $segment;

From here, you now have $route_segments which has everything you need to know. From here, you iterate over the segments and calculate total distance, total time, textual directions, what have you. Point is, if you plot those lines and directions onto a map, chances are good that you have a nice route There And Back Again.

Saturday, July 21, 2012

MapFish Print and Leaflet, Day 2

Today I expand on yesterday's post about MapFish Print and Leaflet, to include vector layers. MFP's documentation doesn't even mention that this exists; there is a brief mention in a MFP commit note, but zero documentation. Fortunately, the MapFish Print demo app uses OpenLayers, and I have Firebug, so analyzing the output was possible. Took me all day to get it right, but possible.

To add a vector feature, you expand on the "layers" parameter in the client spec. You saw yesterday that I constructed "layers" as a list, adding items one by one depending on whether the layer is enabled. Same idea here, except that instead of sending a WMS layer spec we're sending a spec for a Vector layer and then for a GeoJSON feature within that Vector layer.

The demo generates output like this:

    {"type":"Vector",
        "styles":{
            "1":{"externalGraphic":"http://openlayers.org/dev/img/marker-blue.png","strokeColor":"red","fillColor":"red","fillOpacity":0.7,"strokeWidth":2,"pointRadius":12}
        },
        "styleProperty":"_gx_style",
        "geoJson":{"type":"FeatureCollection",
        "features":[
            {"type":"Feature","id":"OpenLayers.Feature.Vector_52","properties":{"_gx_style":1},"geometry":{"type":"Polygon","coordinates":[[[15,47],[16,48],[14,49],[15,47]]]}},
            {"type":"Feature","id":"OpenLayers.Feature.Vector_61","properties":{"_gx_style":1},"geometry":{"type":"LineString","coordinates":[[15,48],[16,47],[17,46]]}},
            {"type":"Feature","id":"OpenLayers.Feature.Vector_64","properties":{"_gx_style":1},"geometry":{"type":"Point","coordinates":[16,46]}}]}
        ],
        "name":"vector","opacity":1}
    }

I'm going to go slightly differently, though, and have each vector feature in its own Layer,since we're not sure that each feature exists. Trust me, it'll be fine.

POINT MARKER

In our case, it's a single marker that may exist. For your map, you may have multiple markers so you may need to write this as a loop rather than a single "if" But still, this is the real meat of how to get a marker into your map.

    // MARKER_TARGET is a L.Marker, ICON_TARGET is a L.Icon subclass used by the marker 
    // wgsToGoogle() was posted yesterday; it converts a LatLng in WGS84 to a [x,y] in Web Mercator

    if (MARKER_TARGET && MAP.hasLayer(MARKER_TARGET) ) {
        var iconurl  = ICON_TARGET.prototype.iconUrl;
        var projdot  = wgsToGoogle(MARKER_TARGET.getLatLng());
        var iconxoff = -10; // offset to place the marker; MFP drifts it for some reason
        var iconyoff = 0; // offset to place the marker; MFP drifts it for some reason
        var iconsize = 15; // the scaling factor for the icon; determined to taste through trial and error

        // all of this is required: styleProperty and properties form the link to a style index, fillOpacity really works
        layers[layers.length] = {
            type:"Vector", name:"Target Marker", opacity:1.0,
            styleProperty: "style_index",
            styles:{
                "default":{ externalGraphic:iconurl, fillOpacity:1.0, pointRadius:iconsize, graphicXOffset: iconxoff, graphicYOffset: iconyoff }
            },
            geoJson:{
                type:"FeatureCollection",
                features:[
                    { type:"Feature", properties:{ style_index:"default" }, geometry:{ type:"Point", coordinates:projdot } }
                ]
            }
        };
    }

Some of this stuff is familiar to OpenLayers folks.

You'll see that the "styles" supplies a single style ("default"), that this is referenced by the "properties.style_index", and that "style_index" is made special by it being the styleProperty. Note that the iconurl must be a full path including http://server.com/ since MFP won't know the relative path to the icon.

The Point geometry is defined within the Feature; the format is a single coordinate pair [x,y] and recall that we're using Web Mercator. As such, the returned array from wgsToGoogle() is perfect for this situation.

LINESTRING AND MULTILINESTRING

    // DIRECTIONS_LINE is either a L.Polyline or a L.MultiPolyline, so we support both
    // DIRECTIONS_LINE_STYLE is an {object} of L.Path options, e.g. color and opacity
    // we use wgsToGoogle() on each vertex so the resulting geometry is in the right SRS
    if (DIRECTIONS_LINE && MAP.hasLayer(DIRECTIONS_LINE) ) {
        // the directions line, using either a single Polyline (not Google driving directions) or a MultiPolyline (Google)
        // Construct a list-of-lists multilinestring. Remember that OpenLayers and MFP do lng,lat instead of lat,lng
        var vertices = [];
        if (DIRECTIONS_LINE.getLatLngs) {
            // a single Polyline
            // collect the coordinates into a list, then make that list the only list within "vertices" (a multilinestring with 1 linestring component)
            var vx = DIRECTIONS_LINE.getLatLngs();
            for (var i=0, l=vx.length; i<l; i++) {
                vertices[vertices.length] = wgsToGoogle([ vx[i].lng, vx[i].lat ]);
            }
            vertices = [ vertices ];
        } else {
            // a MultiPolyline
            // use non-API methods to iterate over the line components, collecting them into "vertices" to form a list of lists
            for (var li in DIRECTIONS_LINE._layers) {
                var subline = DIRECTIONS_LINE._layers[li];
                var subverts = [];
                for (var i=0, l=subline._latlngs.length; i<l; i++) {
                    subverts[subverts.length] = wgsToGoogle([ subline._latlngs[i].lng, subline._latlngs[i].lat ]);
                }
                vertices[vertices.length] = subverts;
            }
        }

        // the styling is simply pulled from the styling constant
        var opacity = DIRECTIONS_LINE_STYLE.opacity;
        var color   = DIRECTIONS_LINE_STYLE.color;
        var weight  = 3;

        layers[layers.length] = {
            type:"Vector", name:"Directions Line", opacity:opacity,
            styles:{
                "default":{ strokeColor:color, strokeWidth:weight, strokeLinecap:"round" }
            },
            styleProperty:"style_index",
            geoJson:{
                type: "FeatureCollection",
                features:[
                    { type:"Feature", properties:{"style_index":"default"}, geometry:{ type:"MultiLineString", coordinates:vertices } }
                ]
            }
        };
    }

Not a lot new to say about this one that wasn't said about the point. Again, similar to OpenLayers styles. The vertices are in [x,y] per OGC spec, and are in Web Mercator.

POLYGONS

This project doesn't have any vector polygons to print, so it didn't come up.

CONCLUSION

All told, this finally got printing working! We have base maps, overlays, markers, and lines all drawing into the PDF, and the map isn't unduly distorted. Success!

Friday, July 20, 2012

MapFish Print and Leaflet, Day 1

Today's rant (sorry, I mean constructive how-to) isn't about Leaflet at all, but about MapFish Print and getting it to work with Leaflet (well okay, it's kinda about Leaflet).

MapFish Print is a server component of the MapFish system, which creates PDFs containing maps. You create a YAML configuration file defining "layouts" such as paper size, an icon here, a border here, user-supplied text here, a map there, make a call to it via the "client spec" and get the URL of a PDF.

INSTALLATION

The installation docs didn't work out. The Java app compiled, and would run the help message. But when I tried to run it using the provided sample config, it failed. I posted to the MapFish users list to ask for help, but the only reply I got was from my own client, saying that I should use the GeoServer version of MapFish and that he had already installed it.

Hilarious, right? Fortunately I get really great clients, who don't think less of me for asking the list for help, who are in fact on that list, and who do things like install MapFish Print for me and run its example app to make sure it works. Thanks a million, man; you'll come across this post some day and know who you are and how much I appreciate clients like you. :)

CLIENT SPEC

To request a print, you send a "client spec" to MFP's URL. The client spec is a JSON document sent as a raw POST body, specifying the map layers, overlaid vector features, choice of layout, current bounding box (wrong! see below), et cetera. MapFish Print has a Control built into its bundled OpenLayers which can do this, but again this is Leaflet so I had to invent a method for adapting it.

The client spec has these components.
- Map scale. Leaflet doesn't have a scale calculation, I had to invent it.
- The SRS and map center. Catch: Using the native SRS (lon-lat) makes for very badly stretched-out maps in the PDF, so we really want to use Web Mercator.
- Choice of layout as defined in the config.yaml
- List of image layers
- List of vector features

Unfortunately, the documentation doesn't even mention the vector features; the only mention anywhere is a commit note saying "added markers and vectors" Fortunately, the demo app has a point, a line, and a polygon, and I know a thing or two about OGC specs, OpenLayers, and GeoJSON and figured it out.

Also, the client spec can take a "bbox" param rather than a center and scale. Unfortunately, this worked very badly: the map in the PDF isn't even close to the size of the average monitor, so using the same bounding box meant that the print version was 1 or 2 zoom levels off of what was displayed on the monitor. It's a common issue even when staying in the web browser: you just can't cram the same spatial rectangle onto two differently-sized browser window and expect the exact same thing, so you go with center-and-zoom so the difference is merely trimmed edges.

So, here are the relevant snips of code for doing each part.

1. Load Proj4JS so you can do reprojection from LatLng to Web Mercator, and load a JSON encoder since most browsers still don't have one built in.

    <script type="text/javascript" src="json.js"></script>
    <script type="text/javascript" src="proj4js/lib/proj4js-compressed.js"></script>
    <script type="text/javascript" src="proj4js/lib/projCode/utm.js"></script>
    <script type="text/javascript" src="proj4js/lib/projCode/merc.js"></script>
    <script type="text/javascript" src="proj4js/lib/defs/EPSG4326.js"></script>
    <script type="text/javascript" src="proj4js/lib/defs/EPSG900913.js"></script>

2. This little function reprojects a [x,y] array or a L.LatLng to Web Mercator, and returns a [x,y] array.

// reproject from WGS84 (Leaflet coordinates) to Web Mercator (primarily for printing)
// accepts a L.LatLng or a two-item array [lng,lat]    (note that array is X,Y)
// returns a two-item list:  [ x,y ]  in Web mercator coordinates
function wgsToGoogle(dot) {
    var srsin    = new Proj4js.Proj('EPSG:4326');
    var srsout   = new Proj4js.Proj('EPSG:900913');
    var newdot   = dot.lat ? new Proj4js.Point(dot.lng,dot.lat) : new Proj4js.Point(dot[0],dot[1]);
    Proj4js.transform(srsin,srsout,newdot);
    return  [ newdot.x, newdot.y ];
}

3. Calculate the map scale. Note that we round to the nearest 250, since MFP requires that we define a finite set of scales. Note too that within a metropolitan area, there can be significant
drift in the scale, e.g. 10500 at the south end of town and 10250 at the north end of town. Watch your "Internal Server Error" text as you test.

    var center  = MAP.getCenter();
    var DOTS_PER_INCH    = 72;
    var INCHES_PER_METER = 1.0 / 0.02540005080010160020;
    var INCHES_PER_KM    = INCHES_PER_METER * 1000.0;
    var sw       = MAP.getBounds().getSouthWest();
    var ne       = MAP.getBounds().getNorthEast();
    var halflat  = ( sw.lat + ne.lat ) / 2.0;
    var midLeft  = new L.LatLng(halflat,sw.lng);
    var midRight = new L.LatLng(halflat,ne.lng);
    var mwidth   = midLeft.distanceTo(midRight);
    var pxwidth  = MAP.getSize().x;
    var kmperpx  = mwidth / pxwidth / 1000.0;
    var scale    = (kmperpx || 0.000001) * INCHES_PER_KM * DOTS_PER_INCH;
    scale *= 2.0; // no idea why but it's doubled
    scale = 250 * Math.round(scale / 250.0); // round to the nearest 1,000 so we can fit MapFish print's finite set of scales

4. Collect the photo tile layers. Possible improvements here would be to read the opacity instead of forcing 1.0, and to read the TileLayer's URL instead of hardcoding it here.

    var layers = [];
    if ( MAP.hasLayer(PHOTOBASE) ) {
        layers[layers.length] = { baseURL:"http://server.com/wms", opacity:1, singleTile:true, type:"WMS", layers:["photo"], format:"image/jpeg", styles:[""], customParams:{} };
    }
    if ( MAP.hasLayer(MAPBASE) ) {
        layers[layers.length] = { baseURL:"http://server.com/wms", opacity:1, singleTile:true, type:"WMS", layers:["topo"], format:"image/jpeg", styles:[""], customParams:{} };
    }
    for (var i=0, l=OVERLAYS.length; i<l; i++) {
        var layer = OVERLAYS[i];
        if (! MAP.hasLayer(layer) ) continue;
        var layernames = layer.options.layers.split(",");
        var opacity = 1.0;
        layers[layers.length] = { baseURL:"http://server.com/wms", opacity:opacity, singleTile:true, type:"WMS", layers:layernames, format:"image/png", styles:[""], customParams:{} };
    }

 5. Reproject the map center to Web Mercator.

    var projcenter = wgsToGoogle(MAP.getCenter());

6. Ready for our first tests!

    // compose the client spec for MapFish Print
    var params = {
        "units":"meters",
        "srs":"EPSG:900913",
        "layout":"Landscape",
        "dpi":300,
        "layers":layers,
        "pages":[
            { center:projcenter, scale: scale, rotation:"0" }
        ],
        "layersMerging" : false
    };

    // send it out over jQuery. Use a raw POST here, since $.post() will mangle the body content

    $.ajax({
        url: 'http://server.com/create.json', type:'POST',
        data: JSON.stringify(params), processData:false, contentType: 'application/json',
        success: function (reply) {
            var url = reply.getURL;
            window.open(url);
        }
    });

CONFIG YAML

I won't go into detail here; it's documented by the MapFish project, and doesn't really pertain to integrating with Leaflet.

Fortunately, the client also provided a config.yaml from a previous project. This saved me weeks of learning it, especially given MFP's very poor documentation as to what options exist.

Monday, July 9, 2012

pgRouting and PostGIS 2.0 (and hardcoded field names)

We have had very good fortune with running pgRouting 1.0.5 on PostGIS 2.0 We asked Steve Woodbridge about it some time ago, he said he wasn't sure if it would work, but I'm glad to report that after 2 months of using it intensively, virtually all bugs we're finding would also be present in PostGIS 1.x

RENAMED POSTGIS FUNCTIONS

You need to make a few changes to routing_core_wrappers.sql before you load it, or remember to re-load it after you make changes. PostGIS 2.0 changed the names of a bunch of functions, enforcing the ST_* naming convention, so that AsText() is now ST_AsText()

Open up routing_core_wrappers.sql and do some search-n-replace:

X( with ST_ X(
Y( with ST_Y(
SRID( with ST_SRID(
SETSRID( with ST_SETSRID(
STARTPOINT( with ST_STARTPOINT(
ENDPOINT( with ST_ENDPOINT(

THE GEOMETRY FIELD

The geometry field in PostGIS 1.x is "the_geom" and pgRouting expects that too. PostGIS 2.0 names it "geom" Some possible fixes include:

When you load the data via shp2pgsql, supply "-g the_geom"
If you already loaded, you can simply rename the column: "ALTER TABLE trails RENAME COLUMN geom TO the_geom"
You can create a view. Performance hit, though, maybe not a good move.
You can edit routing_core_wrappers.sql and do a search-and-replace so pgRouting expects "geom"

For ourselves, we went with an entirely new table and named the column "the_geom" We had a need to separate out the trails data as displayed on the map, from the trails data used for routing. More on that in some other post if it's interesting enough.

THE LENGTH / COST FIELD

The "length" field is used as the cost of traversing a segment. You need this field and it needs to be named "length":

ALTER TABLE trails ADD COLUMN length double precision;
UPDATE trails SET length=ST_LENGTH(geom);

UPDATE: The "length" field is hardcoded into routing_core_wrappers.sql as the "cost" of a segment for routing purposes. I suggest this search-and-replace as well, to separate the length from the cost.

length::double precision as cost with cost

This does mean that you'll need a "cost" field in your data:

ALTER TABLE trails ADD COLUMN cost double precision;
UPDATE trails SET cost=length;

THE GID FIELD

The primary key field must be "gid" and not something else like "id" If you loaded up via shp2pgsql this probably is already the case. But just saying, if your table wasn't created by shp2pgsql you may need to do more search-n-replace in routing_core_wrappers.sql to rename "gid"

And if you use a mix of them, some tables with gid and some with a different primary key name, well, you'll have a tough time.