Thursday, March 7, 2013

Automating CartoDB and Google Fusion Spreadsheets on GoDaddy

A whole month without a posting. I've been busy on some fun stuff here. Here's a peek.

A client wanted something simple. They have a registration form which logs submissions regarding trees, to a Google Fusion Spreadsheet. They want a choropleth map of Census Block Groups (CBGs), indicating how many trees had been registered there. But it's not that simple: they don't want to pay for geospatial hosting, they want to stick with their $8 GoDaddy hosting.


So, our basic needs are:
  1. Update the Google spreadsheet of trees, populating the lat and lon fields for any which don't have them.
  2. Have a PostGIS table of CBGs someplace for free.
  3. Perform a spatial join between the CBG polygons and the  tree points.
  4. Render it as a Leaflet map, using only client-side tech, knowing that the CBGs are too much to be communicated as raw vectors and rendered as a L.Polygon layer.
Our resources are:
  1. The Google Spreadsheet.
  2. A website at GoDaddy which supports PHP, but not many modules.
  3. Plenty of disk space.
  4. A free account at CartoDB.
It all worked out beautifully, using client-side-only mapping, and PHP with no special configurations, suitable for hosting on GoDaddy's low-priced web hosting. No server-side mapping infrastructure!

An annotated version of the app's source is available here:
http:/tilelab.greeninfo.org/~gregor/cartodb_google/


Step 1: GoogleFusionTable.php


Step 1 is comparatively easy. Google Docs already has a web API for doing SQL-like queries to spreadsheets, and a PHP library already exists for this.

https://github.com/gregallensworth/CloudServices/blob/master/GoogleFusionTable.php

This was just simple: Google's geocoder service, the GFT PHP, and a few dozen lines of code. Now we just visit the URL and all non-geocoded registrations will have their Lat and Lon fields updated.



Steps 2: Uploading Google to CartoDB


We had been itching to try out CartoDB's offerings, this was a great opportunity.

CartoDB is a website, which offers an easy-to-use interface to PostGIS tables. You upload a spreadsheet or a shapefile, they grok it and load it, and you get a nice table UI and a panel for entering SQL. A step further, they have a web API for performing these SQL queries, so you can get back a JSON object full of rows, by submitting a query via GET or POST.

The heart of accessing it via PHP, boils down to this wrapper function I whipped up:
https://github.com/gregallensworth/CloudServices/blob/master/CartoDB.php

And the implementation is just a tie-together of Google's SQL API for Fusion Docs, and CartoDB's SQL API to your table:
https://github.com/gregallensworth/CloudServices/blob/master/google2cartodb.php

Remember: Each query really is a hit to CartoDB's web service. If you have 12,000 rows to insert, you really will make 12,000 hits to CartoDB. Be kind and thoughtful. Use a common key between your two tables, so you don't need to re-insert every single record if it's already there. And consider doing the initial load via CartoDB's dashboard.


Step 3: Spatial join via CartoDB's web SQL API


Pretty simple here, now that you've already seen the CartoDB web SQL API.
Just run these two queries:

DELETE FROM census_block_groups_treecounts;

INSERT INTO census_block_groups_treecounts (
    the_geom,
    treecount
)
SELECT
    blocks.the_geom,
    SUM(treeshere) AS treecount
FROM census_block_groups AS blocks
JOIN registrations AS trees
ON ST_Contains(blocks.the_geom, trees.the_geom)
GROUP BY blocks.cartodb_id;


Step 4: Client side map... CartoDB again


Further still, CartoDB created a JavaScript library which bundles Leaflet, their web SQL API, mouse behaviors, CartoCSS parsing, and a dozen other great things. So getting your data back out of CartoDB and into Leaflet requires very little code.

http://developers.cartodb.com/documentation/cartodb-js.html

http://tilelab.greeninfo.org/~gregor/cartodb_google/index.js

Not a lot to say here. The docs are less than rich in real-use-case examples, and CartoDB.js brings together a lot of moving parts. But once you figure it out, it's really something quite nice.

BONUS: CartoDB layers can have their CartoCSS changed at runtime, and have their SQL changed at runtime. This app doesn't make use of that, but in theory filtering features or changing color schemes, has never been easier.


Step 5: Nightly Automation


GoDaddy has a facility for setting up cronjobs, right there in the control panel. At 1am it runs the geocode PHP. At 2am it runs the google2cartodb PHP. At 3am it runs the spatial join. Pretty groovy.


Trouble and Surprises


Not a lot in the way of trouble and surprises.

GoDaddy's PHP has a execution time limit. This seems to be 5 minutes, and ini_set() and set_time_limit() don't seem to change it. As such, the steps really are handled as separate PHPs instead of one monolithic one to geocode, upload, and join. This is probably for the best anyway, as it's easier to debug individual parts this way.

CartoDB's free account limits you to 5 MB of storage. In the case of this client, their registrations came up to 1 MB even if we stripped down to only the Lat & Lon, and they are expecting plenty more registrations . The set of CBGs comes up to 2 MB for the SF Bay Area, and that's after I stripped out areas that were unlikely to get registrations, then simplified, and clipped to fit land. The joined layer displayed on the map, will vary in size based on the number of matching CBGs; there's a maximum of 2 MB here too, if some day every single CBG were to have at least one registered tree. As such, hitting the 5 MB limit seems very likely as they get more registrations and they may end up upgrading anyway.

CartoDB's web SQL API cannot create tables, only populate them. I had to upload a spreadsheet of the first few registrations, for example, and has to manually run the spatial join SQL in CartoDB's dashboard. After the table structure was there, a DELETE FROM followed by later INSERTs worked just perfectly.

CartoDB.js involves a lot of parts, and the docs are good but not always great. Though the final map involves fairly little code, figuring out exactly which code took some time. I had to read up on CartoCSS a bit, and had to read a few times to figure out that CartoDB's API won't return the_geom but that you must do query for "ST_ASGEOJSON(the_geom) as json_geom" to get it, e.g. if you want to highlight features.


Conclusion


A client wanted a geospatial app with some moderate back-end complexity, but without paying for a geo infrastructure. Using a Google-hosted contact "database", CartoDB's free tier plan, and a pinch of PHP to tie them together, it all came together beautifully.

In the meantime, I learned a lot about CartoDB and really admire what they've come up with. The $30 price tier is a bit steep for those intermediate use cases: this client may go up to 15 MB of usage, over the free tier but well under the need for the Magellan tier. If they had a smaller paid plan that would be really excellent.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.