Saturday, September 29, 2012

WHY WON'T IT FIND A ROUTE? BOUNDING BOX AND TOLERANCE

Today's post is fairly brief, but is the culmination of eight hours of tediously trying routes and adjustments. If it saves you one hour, won't you consider it worthwhile? :)

So, we have a routing network of trails for a Parks N Rec district. Directions via dijkstra_sp_delta() are working A-OK and have been for some months. But, the client wrote back with some news: he's having trouble finding a longer path across this one park.

I tried it too: about half of the routes would work, and half would not. I could route the 15 miles to a location, but move the marker a few hundred feet in any direction, and it may or may not be able to route there, even though my eyes can see the trail segments laid out.

It's consistent too: the same point would always work or always fail. And I followed it back into the SQL and verified that the given pair of nodes always would or would not have a route. So it's not some sysadmin issue such as out of memory or core dumps.


TOLERANCE


The directions handed back to the browser, include WKT of the segments so they can be plotted onto the map. I noticed that the route as given would skip from one trail to an adjacent one: walking a straight line, it would suddenly be 20 feet to the left on an adjacent trail, and sometimes back to the original trail. The visual effect was rather goofy and unattractive.

And, I suspect that this was routing us onto adjacent trails which would later diverge, and coming up with false negatives.

The cause here, is a too-high tolerance (25 feet) when generating the routing network via assign_vertex_id() The data had some known "dangles" due to human error in digitizing the trails, and a higher tolerance fixed that. We reanalyzed with a tolerance of 10 feet, and this stopped happening.

This solved an annoyance, but didn't get us any closer to finding the routes. Turns out that wasn't the problem at all...



BOUNDING BOX


Ultimately I discovered it:the directions stuff uses dijkstra_sp_delta() with a tolerance of 2640 feet. That is to say, the calculation excludes any segments that are not within a 2640-foot bounding box of the start and end vertices. That excludes the 900 square miles that can't possibly be relevant to this route, and gives us plenty of wiggle room to find routes.

Wrong.

In the case of this trails data, 5280 was the magic number. The trails meander enough, that a good path really can deviate by 1 mile away from the straight-line between the two points. And the performance loss isn't that bad, certainly not as bad as complete non-performance!

Anyway, no huge breakthrough here, but a word for someone else who just isn't finding the path: check your delta bounding box, and see if you're being too strict.

Tuesday, September 25, 2012

My goodness, it's been some 3 weeks since my last post. Been too busy DOING instead of WRITING.  :)

SOME STEPS IN OPTIMIZING A MAP'S LOADING SPEED


A few days ago, a client asked me to look over a map app and see what I could do to speed it up. The app is ExtJS, PHP, PostGIS, and MapServer. I made some substantial improvements, and here's what we did and how well it worked.

Nothing I'll mention today is  particularly cutting edge, and some of them date back to 2007 and MapServer 5.0 But, obviously here in 2011 and 2012, some folks didn't know about it, so here goes.


THE EXTJS LIBRARY / APACHE COMPRESSION WINS THE DAY

 The first roadblock of the app loading and starting, was  the download of ext-all.js, the core of the ExtJS JavaScript framework. They were loading it from Google's CDN, and it's 700 KB. On my 1 Mbps household DSL, with other PNGs and CSSs downloading too, this was taking about 10 seconds. The theory of using a CDN is that Google's network is faster than their server's pipe, but fact is that for DSL users it's still 10 seconds.

I instead had it load a copy of ExtJS from the web server, and then enabled compression (see here). This is an old trick from my sysadmin days, and BOY DID IT WORK here. The download was now 135 KB (use Firebug, watch the Content-length header) and took about 2 seconds.

Ta-da! Eight seconds shaved off the page startup, by not blindly assuming that Google's CDN is the panacea.


MULTIPLE HOSTNAMES

An old trick, but one that was not being implemented here. The second argument to the OpenLayers.Layer.WMS constructor is the URL of the WMS, or an array of URLs, in which case it will round-robin requests to these URLs. See here.

In the case of a tile-cached layer, this is a great boon. The browser has built-in limits about how many requests it will make to a webserver, and when your tiles are served up quickly enough, more time is spent on the browser's politeness, than is spent fetching and transferring.

So I swapped this:
var countylines = new OpenLayers.Layer.WMS("Counties", "/tilecache.cgi", { blah blah });
For this:
var TILECACHE = [
    "http://maps1.example.com//tilecache.cgi",
    "http://maps2.example.com//tilecache.cgi",
    "http://maps3.example.com//tilecache.cgi",
    "http://maps4.example.com//tilecache.cgi",
    "http://maps5.example.com//tilecache.cgi",
    "http://maps6.example.com//tilecache.cgi"
];
var countylines = new OpenLayers.Layer.WMS("Counties", TILECACHE, { blah blah });
Now the web server spends as much time download as it does blocking (Firebug is great), and the difference is noticeable.

You don't want to go too far with this trick. If the layer requires calculation, or is slow at all, or isn't tile cached, this may not be a good move. If your layer is slow, hitting it 6 times simultaneously won't be any faster. Even if the layer is plenty fast, once your browser is opening up 24 requests at a time to your web server instead of 4, your webserver may start blocking clients because of the flooding. Watch your MaxClients setting and your Apache error log, and consider what will happen if this app gets a sudden surge in popularity.

CENTROIDS FOR LABELS


The map features a grid of sampling quads, and each quad has an icon in the center indicating a status. The quad rectangles and the icons in the center, are separate layers. The icons use a URL param for filtering, so one can specify a parameter and show the relevant quad icons (in this case, the ID# of an invasive species, e.g. species_id=123). This layer is not served via TileCache, due to the dynamic nature of the queries, so its performance is naturally slow. But of course, we want it faster.

To place these icons, the DATA statement was as follows:

DATA "the_centroid from (select ST_centroid(merc_geom) AS the_centroid,* from quads where
%SPECIESFILTER%) as temp using unique quad_id using SRID=900913"
 A simple optimization here was to precalculate the centroid, as it's not going to change. Second, I rewrote the mapfile syntax for readability. Third, the species_id which would be used in the %SPECIESFILTER% wasn't even indexed. I suspect that the spatial index would supersede the species_id index, but this is also used in PHP-based reports so I know that indexing it can't hurt.
# some SQL to precalculate and index
SELECT ADDGEOMETRYCOLUMN('','quads','merc_center',900913,'POINT',2);
CREATE INDEX quads_merc_center ON quads USING GIST (merc_center);
UPDATE  quads SET merc_center=ST_CENTROID(merc_geom);
CREATE INDEX quads_species_id ON quads (species_id);

# revised mapfile
DATA "merc_center from quads using unique quad_id using SRID=900913"
FILTER "%SPECIESFILTER%" 
The net result is that the queries happen in 20 milliseconds instead of 250 milliseconds. Not bad, shaving about one-quarter second off the processing time for each tile.

MAPSERVER IMAGE QUALITY: AGG/PNG vs GD/GIF


One of the avenues I investigated, was the image tiles they're downloading. The app has multiple hostnames configured,, so the browser is downloading some 8 tiles at a time. And these are pre-rendered so they should be super quick.

But they're each 50 KB to 100 KB in size. Which means that a full page load of 20 tiles (4 wide, 5 high) is about 1.5 MB. The download itself takes 15 seconds, not counting wait times and server times. Again, this is where performance testing benefits from my having crummy household DSL.

The tiles use MapServer's wonderful AGG library (the same one that powers Mapnik) and output as gorgeous PNG-24 with porcelain-smooth line edges, which is good when we're drawing rivers and county lines. But man, 100 KB average tile size?

I switched over to GD/GIF This is the older renderer, and the line quality is poor. But the rendering time is about 1/3 that of AGG, and file size is magnificent: 5 KB - 10 KB now. I presented it to the client, and they loved it: the grainy lines were a small price to pay for having a map view load up in 3-8 seconds instead of 20-60.

WHAT'S WITH THOSE BLANK LAYERS?


And lastly, I noticed that one of the layers was always blank. The roads layer was turned on at all zoom levels, but the MapServer mapfile stated that roads don't come on until we're at 1:500,000 scale (MAXSCALEDENOM). That is to say, the browser was downloading 20 tiles with each page view, when we knew for a fact that those tiles are 100% transparent. Even without the transfer time, the browser was blocking while it loaded these tiles.


The solution here was simple: specify a maxResolution or maxScale setting for the OpenLayers.Layer.WMS constructor, so the layer doesn't even try to load tiles until we get into the usable zoom levels.


TILECACHE IN MOD_PYTHON (no)

I also investigated using TileCache via mod_python. The theory here is that the Python interpreter and various modules are cached into RAM, giving a performance boost over CGI.

In our tests, we didn't go with this. The difference in start-to-finish request time, as measured via Firebug, was negligible if it was real at all. But the memory consumption is remarkable: Apache bloated up to 1 GB size within a few hours of testing.

This particular client is hosted in a VPS, and they only have 2 GB of RAM and no swap. So having 1 GB locked in memory was not welcome, especially without corresponding speed gains.

THANKS FOR READING


Well, that's it. Nothing new or awesome, but a few tricks which weren't known to these developers, and therefore which may be new and useful to you. Happy mapping!