Wednesday, July 25, 2012

Different strategies for breaking up the trails data for pgRouting

So, the trails data as given to us as multilinestring. We need it as plain ol' linestring to use ST_StartPoint() and use it in pgRouting. So, how best to separate the trails data? As long as we're pulling and loading, should we slice up the miles-long trails into individual segments?


My very first experiment actually worked quite well. This split the multi into linestrings, but also into individual two-point segments. It was an "INSERT INTO SELECT" based on this query, thank you MerseyViking
SELECT ST_AsText( ST_MakeLine(sp,ep) )
-- extract the endpoints for every 2-point line segment for each linestring
  ST_PointN(geom, generate_series(1, ST_NPoints(geom)-1)) as sp,
  ST_PointN(geom, generate_series(2, ST_NPoints(geom)  )) as epFROM
   -- extract the individual linestrings
  (SELECT (ST_Dump(ST_Boundary(geom))).geom
   FROM mypolygontable
   ) AS linestrings) AS segments;

This method turned my 4000 trails into 100,000 segments each one some 10 feet in length. This turned out to work great: routing was quite fine-grained, and the nearest routing vertex would have been practically in arms' reach when mapped onto the real world.

But it did have issues:
  • We want to send the segments to the browser as WKT, so they can draw the route onto the map. With a vertex every 10 feet, this would overload a browser for non-trivial routes. We're targeting mobile platforms too, so a 3 MB download of WKT just isn't acceptable. (Simplify messes up the geometry so the path fails to lay over the trails. Leaving out odd-numbered vertices was just plain wonky.)
  • The later revision of the data was some 30X larger. We now had over 3,000,000 segments, and routing was unusably slow (300-900 seconds).


The later dataset was still multilinestring, but turns out that each multi had exactly 1 linestring within it. Very weird, but that does make it easy. All we do is load the routing table with the 1st geometry within each row of the trails table:
INSERT INTO route_segments (name, difficulty, the_geom)
    SELECT name, skill AS difficulty, ST_GeometryN(geom,1) AS the_geom
    FROM cm_trails;
Well that was simple! Routing is nice and quick, too: 120,000 trails means 120,000 segments so we're not much worse than in my original tests.

But no: The issue here is that routing would "overshoot" the source and destination locations, since it only knows intersections. In a case where we had miles-long trails, my route would start or end a mile away. Imagine that: I ask for directions from the Picnic Shelter to the Butterfly Pavilion, the directions and the blue highlight line start 1/4 mile away from this shelter, then go right past the pavilion for another 1/2 mile.


(yeah, okay, that was a stretch)

So I had the right idea with loading up the trails as-is, but I want the trails to be smaller segments, say every 100 feet, so there would be an intersection somewhere near me.

Thank you, Dane Springmeyer and Regina, for your posting
INSERT INTO route_segments (name, difficulty, the_geom)
    SELECT name, difficulty,
        ST_Line_Substring(t.geometry, 100.0*n/length,
        WHEN 100.0*(n+1) < length THEN 100.0*(n+1)/length
        ELSE 1
        END) AS geometry
   (SELECT name, skill AS difficulty,
    ST_GeometryN(geom,1) AS geometry
    FROM raw_trails
    ) t
CROSS JOIN generate_series(0,300) n
WHERE n*100.00/length < 1;
The results are perfect! The blue line and the directions now fit my intended start and destination acceptably, but the number of segments in the routing network is still reasonable.

No comments:

Post a Comment