Gregor The Map Guy: 2012

Monday, December 31, 2012

CodeIgniter + DataMapper

Improvements on what I posted last time

This posting, expands on my posting last week (DataMapper & CodeIgniter for writing access controls) to provide some enhancements and good practices in security. I left it out of that posting, as it was becoming overly lengthy.

The Login Process

The Site controller's login method, as really implemented, is as follows. It adds one additional line: purging the session token before attempting the login check.

public function login() {
    // clear any existing credentials
    $this->session->unset_userdata('userid');

    // if a user/pass were submitted, attempt login using the model
    $user = User::attemptLogin(@$_POST['username'],@$_POST['password']);
    if ($user) {
        $this->session->set_userdata('userid', $user->id);
        return redirect(site_url('management/'));
    }

    // got here: either they didn't submit, or it didn't work out. either way, bail to a form
    return $this->load->view('site/login.phtml');
}

I consider this a Good Practice during a login phase. By clearing the session token, we have effectively logged them out (recall that the Management controller checks for the session token, then uses it to fetch their User object) and prevented any possible, hypothetical shenanigans that could result from the rest of the login process screwing up.

Consider, if the programmer had left out that line. A failed login would spit out the login.phtml but their old login session would still be intact. The result would be a Log In page, possibly decorated with elements indicating that they are still logged in. It's a small bug, but would look goofy -- and it may result in some information leakage if you're not careful, since you probably weren't expecting that page to be accessed with an existing session. It's possible, too, that one could contrive less-trivial bug scenarios based on your old session information being present during an attempted login.

Point is, I consider it a good thing to empty out the login session before attempting to repopulate it. Whether this is done before the attemptLogin() call, or in an else statement immediately afterward, doesn't matter as much as ensuring that they are either "logged in as intended" or else "logged out" with no wiggle room for an unknown state.

Password Storage

In the previous article, the attemptLogin() method checked for a literal username and password match:

$u = new User();
$u->where('username',$username)->where('password',$password)->get();
return $u->id ? true : false;

In the real world, of course, you would never store raw passwords in the database, right? (hint: the correct answer is "Never! And I can't believe that you hinted at doing so last week!") The proper way to store passwords is with a non-reversible hash, using a salt.

Okay, lemme back up a little and explain.

The basic problem of storing passwords, is that someone can do a SELECT and get every password in the system. From there, they don't need additional security holes, hacks, and injected files to wreak havoc: they can login as authorized personnel. Or, your own messup during programming may display the whole list of passwords long enough for someone to snap a photo.
What we need is a zero-knowledge proof: The server doesn't have your password, but it does have a way of knowing that whatever you supplied as the password is correct.
A non-reversible hash, also called a message digest, is a technique for scrambling text. The two most common "hashing algorithms" are MD5 and SHA-1. A scramble (called a hash) would be unique based on the input: the SHA-1 hash of the word "apple" will always be d0be2dc421be4fcd0172e5afceea3970e2f3d940, but it will take a billion years to discover a second string with the same hash.
At its very simplest, the users table's password field could store the MD5 hash, and then compare the MD5 of whatever you typed to the stored MD5. Voila: no stored passwords, but a watertight way to know that you supplied the correct password.

But, hashing does have a drawback: its own consistency. The word "apple" will always hash to the same thing, and there are growing catalogs of known words and their hashes. This is a real-world problem: paste d0be2dc421be4fcd0172e5afceea3970e2f3d940 into Google and you get http://md5-database.org/sha1/apple

To "salt" a password means that we have an agreed-upon string that we mix into your password, to make the hash even more scrambled. The word "apple" has been cataloged, but the word "H87JKua8apple" has not, and it's gonna be a long time before one of those hash catalogs gets around to that one. Typically, the salt is of a known length and is stored along with the password:

Random 8 character salt: H87JKua8

SHA-1 hash of the word "apple": d0be2dc421be4fcd0172e5afceea3970e2f3d940
Password field: H87JKua8d0be2dc421be4fcd0172e5afceea3970e2f3d940

When encrypting a password, we generate 8 random characters and prepend them to the password, then take the hash of that resulting non-word.

When checking a password, we take the first 8 characters from the password field and prepend them to whatever password the user is trying. That is, they gave us "apple" and we know "H87JKua8" so we hash "H87JKua8apple" and it should match what's in the database.

The result is an extra million years of security on your password hashes. And if you think 8 characters aren't enough of a salt, feel free to crank it up: a 40-character salt with a 40-character hash may seem a bit extreme, but also gives a hacking time longer than the universe is expected to last no matter what mythology you prefer.

Implemented in code, it's not really as complicated as it sounds:

public static function attemptLogin($username,$password) {
    if (! $username) return false;
    if (! $password) return false;

    // fetch the user
    $u = new User();
    $u->where('username',$username)->get();
    if (! $u->password) return false; // user not found (or a blank password), automatic failure

    // check the password field
    // split off the 16-byte salt, prepend it to whatever password is being tried
    // and it should MD5 to the same hash as we have in the database
    // MD5 outputs are 32 characters in length, so 'password' is a varchar(48) field
    $salt = substr($u->password,0,16);
    $crypt = $salt . md5($salt . $password);
    if ($crypt == $u->password) return true;
    return false;
}

public static function encryptPassword($password) {
// salt is 16 random characters; there are any number of ways to make random strings
    // the password field is the MD5 hash of the salt + password
    // MD5 outputs are 32 characters in length, so 'password' is a varchar(48) field
    $salt = substr(md5(mt_rand()),0,16);
    $crypt = $salt . md5($salt . $password);
    return $crypt;
}

Some closing notes here:

There are various ways to generate 16 random characters. At this level it doesn't really matter which technique you choose: the randomness of distribution of the choice of salts, is less important than the fact that "abcccbaaccbabcabapple" still isn't in any hashing database.
The MD5 hash has some known circumstances in which a collision (that is, a second word with the same hash) can occur. But it's incredibly farfetched with a password field, more for multi-kilobyte files. At this time, PHP doesn't have a sha1() function but does have a md5() function. When PHP does bring up a simple sha1() function,I'd recommend using it over MD5.

Session Cookie Encryption

This isn't program code at all, but is another example of why CodeIgniter rocks.

A known weakness of PHP's sessions, is that they are carried in cookies. If someone intercepts a cookie (say, sniffing your wifi) they can hack their browser's cookie jar to continue using YOUR login session indefinitely. If your program has a logout function, it would invalidate the meaning of the session, but frankly you can't rely on your users to log out reliably.

CodeIgniter encrypts session cookies. This uses a simple single-key encryption, coded into config.php when you first set up CodeIgniter. This means that nobody, including the authorized user of the session, can crack open the cookie and tinker with its contents.

This single fact in itself, is a huge gain to security, and a great reason to use CodeIgniter's session helper instead of PHP's naked $_SESSION system.

Wednesday, December 26, 2012

CodeIgniter + DataMapper = Concise & Tight Access Controls

My goodness, it's been five weeks since my last posting. Too busy programming, to stop and write about programming. I'm sure you understand.

The topic today, is about CodeIgniter and DataMapper ORM, and how they can be leveraged to create sophisticated and secure authentication models. This will be a lengthy posting, as it covers a lot of ground. It will also be in multiple parts, as I post additional enhancements such as why you shouldn't store literal passwords as I do in my example below...

Introducing CodeIgniter and DataMapper

CodeIgniter is a MVC framework for PHP.

MVC is Model-View-Controller, an object-oriented concept of separating out complex in-object logic ("for a given Customer, generate a list of past Orders") from the front-end logic ("this is the My Orders page, fetch that list of past Orders from their Customer profile") and from the front-facing HTML presentation ("<ul> (foreach order in orders, print <li>) </ul>"). If you're not programming your non-trivial apps in MVC, you probably should be...

DataMapper ORM is an Object Relational Mapper. That is, it's a library which provides an object-oriented approach to your database tables using models. This bypasses the need for writing out SQL longhand, providing you with techniques for selecting, creating, updating, and deleting, as well as performing more complex relational queries. The resulting code is very readable and extensible, and much less of an eyesore than longhand SQL.

CodeIgniter is one of several popular MVC frameworks (CakePHP, Zend Framework). It is my favorite, due to its simplicity, its fast loading and execution times, and its willingness to let me break the MVC rules when I need to, e.g. to use PHP calculations in a view template, or to print from within a controller method.

And in both cases, the documentation is quite good, always a bonus...

Basic Example: A Login Function

Here is an excerpt of the Controller and Model logic which implement the login phase.

The /site/login URL (the Controller's name plus the name of the function) accepts POST data 'username' and 'password', uses the User model to check whether the user-and-pass matches, then sends them either to /management (the Management controller) or to /welcome (the Welcome controller) based on the success. As a side effect, it sets session information which is vital later.

The User model houses the static method attemptLogin() which accepts a username & password and returns true/false.

class Site extends CI_Controller {
    public function login() {
        // attempt login. The complexities are hidden within the model, where such logic belongs
// if login fails, we bail instantly
      $user = User::attemptLogin(@$_POST['username'],@$_POST['password']);
    if (! $user) return redirect(site_url("welcome/"));}

// guess it was good, save their session token and send them to the control panel
      $this->session->set_userdata('userid', $user->id);
        return redirect(site_url("management/"));
    }
}

class User extends DataMapper {
    var $table = 'users';
    var $default_order_by = array('firstname'');

    public static function attemptLogin($username,$password) {
        if (! $username) return null;
        if (! $password) return null;

        // fetch the user
        $u = new User();
        $u->where('username',$username)->where('password',$password)->get();
        return $u->id ? true : false;
    }
}

Some key points here:

The User model is a DataMapper subclass. Its one static method contains DataMapper logic: creating a new User handle, then querying "WHERE username=XXX and password=YYY" The determination of whether this login was in fact valid, is made by checking that the resulting recordset object has an "id" attribute in the current record. If there is no record with the given username and password, the resulting recordset would be empty and $u->id would be null.
Isn't that DataMapper code concise and readable? I am entirely comfortable with writing out SQL longhand, but if I can read out the logic almost like English, all the better!
The Site controller method shows how concise you can be with CodeIgniter. In the event of a login failure, the controller method (the "page" so to speak) issues a redirect to some other page (presumably the login or Forgot Your Password? page) and exits immediately.

So, our use case...

The use case here was a somewhat deep set of relations.

Organizations are a top-level entity. Organizations contain multiple Users, and a User belongs to exactly one Organization (one-to-many). Organizations have multiple Projects, and a Project belongs to exactly one Organization (one-to-many).Within an Organization, a User may only see details of a Project if they are on the Project's list (many-to-many), unless they are a Manager in which case they may see any Project within their same Organization. A Project has multiple Costs, multiple Actions, and multiple Comments. Actions have multiple Assessments and multiple Locations. A separate viewing mode on the front-facing site is available for each Project, with 3 access options: anybody in the same Organization as the Project, people in other Organizations, OR any non-logged-in user. The list goes on over an initial 15-page spec doc, a dozen wireframes and a few supplemental spec docs, and 8 hours of meetings, including lookup fields and various filtering needs...

And this is for a bunch organizations who aren't entirely comfortable with sharing their info with the central coordinating body in the first place, and where a data leak could be extremely bad (not "they know I live in Santa Cruz" bad, but "someone edited my report to indicate that I blew $20M on a project that went wrong, and I got fired" bad), so this stuff needs to be 100% secure. (which I always do anyway, of course)

Organization
        x Project
            x User
            x Cost
            x Action
                 x Assessment
                 x Location
x Comment
1 User
        x User            x Project

What followed was a set of DataMapper classes, with relations defined in them. Here are a few of the core classes, so you can see how one sets up the relations. It's so easy.

class User extends DataMapper {
    var $table = 'users';
    var $default_order_by = array('firstname');
    var $has_one = array('organization');
    var $has_many = array('project','comment');
}
class Project extends DataMapper {
    var $table = 'projects';
    var $default_order_by = array('title');
    var $has_one = array('organization');
    var $has_many = array('user','comment');
}
class Organization extends DataMapper {
    var $table = 'orgs';
    var $default_order_by = array('firstname');
    var $has_one = array();
    var $has_many = array('user','project');
}
class Comment extends DataMapper {
    var $table = 'replies';
    var $default_order_by = array('postedwhen');
    var $has_one = array('project','user');
    var $has_many = array();
}

There are back-end tables in the database, of course. There are the basic data tables: users, projects, orgs, replies. They all have a serial / primary id field. The comments table has a project_id field which is automagically detected by DataMapper and used to relate a Comment to a Project and vice versa. For the mapping of many Users to many Projects, and of Users to Orgs, there are standard join tables: users_organizations has two columns user_id and organization_id, and users_projects is similar.

An extra bonus: unlike Django, these models do not need to define the fields and their datatypes. DataMapper scans the table and uses whatever columns are present. We only need to define the metadata above, and the computer figures it out. (isn't that the computer's job?)

The only shortcoming here, is that you're on your own for creating the database tables. You can use whatever tool you like, though: command-line psql, MySQL Query Browser, phpXXAdmin. And in the case of the users-to-orgs and users-to-projects table, I had to run a failed query once so I could see the name of the table it couldn't find: organization_users, user_organizations, users_orgs, ... ? Still, that wasted about 2 minutes of my time, out of the 4 hours of reading and replicating the database as specified.

Tying it together: Controller Logic

Now that we have the relations defined, we can use DataMapper's genius to make some very readable queries, without tediously writing out subqueries.

// fetch all Users who have access to this Project (that is, who have a relation to it)
// Variant 1: build an array of User objects
$users = array();
$proj = new Project();
$proj->where('id',$PROJECT_ID)->get();
foreach ($proj->user as $u) $users[] = $u;

// fetch all Users who have access to this Project (that is, who have a relation to it)
// Variant 2: build a User recordset which could be passed through more filters by the caller
$proj = new Project();
$proj->where('id',$PROJECT_ID)->get();
$uids = array_map( create_function('$u','return $u->id;') , $proj->user );
$users = new User();
$users->where_in('id',$uids)->get();

// Fetch all Locations under my same Organization (remember: Org->Project->Action->Location)
// Note the use of project.organization_id instead of project.organization.id
// This is marginally faster as it doesn't access the organizations table at all
$locs = new Location();
$locs ->where_related('action/project','organization_id',$user->organization_id)->get();

// Fetch all Locations which are for a Project Action for which I am a listed User
$locs = new Location();
$locs ->where_related('action/project/user','id',$user->id)->get();

// fetch a Location by its ID#
// and make sure that it's for an Action for a Project for which I an a listed User
$locs = new Location();
$locs->where('id',$what->id)->where_related('action/project/user','id',$user->id)->get();

As you can see, a concept such as $costs->where_related('action/project/user','id',$user->id) is something you would use a lot when working with these sorts of relations. It's tedious but not terribly difficult to implement in SQL, ... but it's even less difficult when using DataMapper!

Tying it together: Controller Logic and Access Control

Okay, enough background. Finally time for what I promised: using DataMapper's supremely readable and flexible logic, to build super-high-security access controls.

This consists of 3 parts.

The User model has a wrapper function $user->hasEditingAccessTo($what) which returns true/false indicating whether they should be allowed to edit that object. The model is The Proper Place for such deeper logic, and this method accounts for the User's access level since Managers and non-Managers have different criteria.
The controller named Management (thus, all URLs under /management) includes as part of its constructor,the fetching of $this->user which would be the session's corresponding User.
The logic for any single controller method, then, can verify your access simply by checking $this->user->hasAccessTo() There's very little to copy-and-paste into every single method, and thus very little to mess up...

Snippet from the Management controller: the URL /project/123 would load up Project # 123 and print or process an editing form, but only if you're allowed.

// all functions in this controller, are only accessible if the session token is set and your access level is high enough
function __construct() {
parent::__construct();

    // load the user's User account; largely used for validating access to a Project
    $userid = $this->session->userdata('userid');
    $this->user = new User();
    $this->user->where('id',$userid)->get();
    if (! $this->user->id) return redirect(site_url('site/login')); // should be impossible, but it can't possibly be good if it happens

// check their access level, and make sure it's Manager or else bail
if ($this->session->userdata('access_level') != ACCESSLEVEL_MANAGER) return redirect(site_url('site/login'));

}

public function project($id) {
    $project = new Project();
    $project->where('id',$id)->get();
    $ok = $this->user->hasEditingAccessTo($project);
    if (! $ok) return redirect(site_url('management/dashboard'));

    if (! @$_POST['save']) return $this->load->view('project_editing.phtml');
    // if we got here, it's a POSTed form so process it...
}

The magic (rather, the brains) is in the User instance method hasEditingAccessTo() as follows:

class User extends DataMapper {
    var $table = 'users';
    var $default_order_by = array('firstname');
    var $has_one = array('organization');
    var $has_many = array('project','comment');

     // $this is a User object (this is the User model, right?)
    // Managers filter by $this->organization_id matching a $project->organization_id
    // Editors (non-managers) filter by $this->id being an associate of $project->user

    // Note a security practice here: We don't do "if manager, else" to determine whether
    // someone is a Manager, and presume that they're an Editor if not. Instead, use a switch
    // to account for Manager, Editor, and Other. This limits the possible impacts of any possible
    // upstream programming bug or failed security measure,
    // e.g. someone screwed up and used hasEditingAccessTo() without checking that the user is logged in.

    public function hasEditingAccessTo($what) {
        switch ($this->access_level) {
            case ACCESSLEVEL_MANAGER:
                return $this->_hasEditingAccessTo_manager($what);
                break;
            case ACCESSLEVEL_EDITOR:
                return $this->_hasEditingAccessTo_nonmanager($what);
                break;
            default:
                return false; // a level 0 (logged out) should never have even called this
        }
    }

    private function _hasEditingAccessTo_manager($what) {
        // a simple bail condition: the recordset doesn't have an ID#, the recordset is already blank,
        // probably because the ID# was tried at random, or maybe for a very-recently-deleted record
        if (! $what->id) return false;

        $ok = false;
        switch($what->model) {
            case 'user':
                $what->where('id',$what->id)->where('organization_id',$this->organization_id)->get();
                break;
            case 'project':
                $what->where('id',$what->id)->where('organization_id',$this->organization_id)->get();
                break;
            case 'action':
            case 'link':
                $what->where('id',$what->id)->where_related_project('organization_id',$this->organization_id)->get();
                break;
            case 'location':
                $what->where('id',$what->id)->where_related('action/project','organization_id',$this->organization_id)->get();
                break;
            case 'cost':
                $what->where('id',$what->id)->where_related('project','organization_id',$this->organization_id)->get();
                break;
            case 'assessment':
                $what->where('id',$what->id)->where_related('action/project','organization_id',$this->organization_id)->get();
                break;
        }

        // the yes/no status is simply whether there is still an ID# after all that filtering
        $ok = (boolean) $what->id;
        return $ok;
    }

    private function _hasEditingAccessTo_nonmanager($what) {
        // $what is a DataMapper recordset handle, e.g. a Project filtering by ID#
        // $this is a User object (this is the User model, right?) so $user->project is highly relevant here for finding associated Projects

        // a simple bail condition: the recordset doesn't have an ID#, the record doesn't even exist
        if (! $what->id) return false;

        $ok = false;
        switch($what->model) {
            case 'project':
                $what->where('id',$what->id)->where_related('user','id',$this->id)->get();
                break;
            case 'action':
            case 'link':
                $what->where('id',$what->id)->where_related('project/user','id',$this->id)->get();
                break;
            case 'location':
                $what->where('id',$what->id)->where_related('action/project/user','id',$this->id)->get();
                break;
            case 'cost':
                $what->where('id',$what->id)->where_related('project/user','id',$this->id)->get();
                break;
            case 'assessment':
                $what->where('id',$what->id)->where_related('action/project/user','id',$this->id)->get();
                break;
        }

        // the yes/no status is simply whether there is still an ID# after all that filtering
        $ok = (boolean) $what->id;
        return $ok;
    }

}

Conclusion

CodeIgniter is great for separating out a non-trivial app into its MVC components. There are other MVC frameworks out there, but I like CI best for its simplicity, its fairly low number of magic behaviors, and its willingness to let you break the MVC rules.

DataMapper is overkill for more trivial database-driven apps, and it's not the panacea for every single use case. But for a complex set of relations (there are 33 tables in all, more than described above) it really hits the spot, saving a lot of tedious SQL.

The combination of MVC and DataMapper, allows us to look up a session User and then do complex relational queries, most notably permission checks, with comparatively little code and with very few places for bugs to creep in.

Sunday, November 18, 2012

MOBILE DEVELOPMENT DAY TWO:
Hours' worth of hidden gotchas and forgotten details

My previous posts about making mobile apps with Phonegap/Cordova, ended with launching the simulators and running the sample app. Today's project was to get a have the page load to a simple Leaflet map centered on my location. This is about the simplest non-trivial app possible, but it took me about 4 hours because of the hidden gotchas.

So, here are a few of the snags I hit upon. If even one of them saves you even a half-hour of puzzling, you're welcome. :)

ANDROID: PHONEGAP FAILS TO CREATE ASSETS/WWW

After my first draft, I threw out my project folder and used ./create to make a new one as before.
This new one lacked the assets/www folder and its behavior was "Hello World, YourAppNameHere!" Creating a new www folder, and copying the content of the demo app out of Phonegap's folder, didn't change anything. I even went as far as destroying the project folder, using ./create again, and getting the same result: no www, and creating one doesn't help.

The solution was to move Phonegap out of C:\Program Files where I had it. It is now C:\Users\Gregor\phonegap-2.2.0 and it creates new projects perfectly, even with their www content.

I don't know what was going wrong, as I'm an admin account, and copying files out of C:\Program Files shouldn't have been an issue.. But short version: My previous advice about using C:\Program Files may not work as well as I had thought, after a few more creations.

ANDROID: PUTTING THE APP ONTO A REAL PHONE

To deploy an app onto your phone, you will need the Google USB Driver. This allows communication between the Android SDK and the phone's innards. This is not the same as using your phone as a Disk Drive or USB Storage device.

Go into the Start menu, type "android" You should see android.bat as an option. Click it.
This opens the Android SDK Manager. Scroll down to Extras and make sure that Google USB Driver is listed as Installed. If it is not, check it, click Install 1 Package, and then close the SDK Manager.

The SDK Manager doesn't install the driver at all, but merely downloads the driver INF files. You still need to install them, and that process varies by your operating system. Fortunately there is a decent writeup about this already:
http://developer.android.com/tools/extras/oem-usb.html#InstallingDriver

On my device (an HTC Incredible, which was hot stuff in 2010) the Google and OEM drivers do not work. For the HTC Incredible, I used HTC Sync from HTC's website. Despite its name, it also involves that special USB driver.

Once these two surprises were taken care of, I can now deploy my Leaflet app to my old phone, and it works A-OK.

IOS:APP SIGNING & PROVISIONING

When I tried to run my app on the borrowed iPad, it wouldn't build: The profile doesn't have a valid key for signing the app. Evidently deploying your own app onto your own iPad, you still need Apple's permission. And getting it was a half hour of looking up documentation about "provisioning profiles" so I could get permission to develop on my own iPad.

* Connect your iPad to your Mac. In XCode, look in the upper-right to find the Organizer, and look in the Devices section. You should see your iPad listed, including an Identifier, a Provisioning section, and a list of Applications which you have installed onto it.
* Do a Google search for "ios provisioning portal" and follow the link, logging in to the iOS Dev Center. Presumably you already have an account here if you signed up for a developer license.
* After you log in, follow the link for iOS Provisioning Portal, then use the Development Provisioning Assistant. This is a somewhat lengthy wizard, which will walk you through several steps.

Eventually, you will have imported your new identity into XCode and will have tagged your iPad with an identity as well, and Apple and XCode will allow you the privilege of installing programs onto your own hardware. (and folks gripe that Microsoft is invasive and controlling?)

ANDROID AND iOS CORDOVA.JS ARE DIFFERENT

This one caught me by surprise: I got my app loaded onto my iPad, but it wouldn't run. If I put alert() statements into index.js they would execute, but the deviceready event would never happen.

Turns out, cordova-2.2.0.js is not the same between platforms. The first 100 lines or so are the same (which is what threw me) but it's definitely the case that they are different later on.

In my case, I have the one codebase stored on a network drive, and I don't want to copy each file over before each editing session. My solution is as follows:

I copied in both versions of the cordova.js, naming them IOS-cordova-2.2.0.js and ANDROID-cordova-2.2.0.js
In index.html I have <script> tags for both versions, and one is commented out.
When I want to compile or test on a platform, I comment and uncomment the appropriate <script> tag.

IOS: NETWORK WHITELISTING

The app was now almost running: it starts, and up comes Leaflet, but no tiles! Fortunately, XCode finally came through and gave me something to go on: "whitelist failed for http://a.tiles.mapbox.com/..."

A straightforward problem: iOS requires a whitelist if they're to make network access (Android does not). This is in the project's Cordova.plist file. Look for the ExternalHosts setting, and simple add an entry for * (yes, a single asterisk as a host name)

This will enable all hosts, and now my Leaflet app works.

It's up to you if you would rather list individual hosts. Personally I find that an unnecessary nuisance, since I'm the author of the app and I doubt that it could be tricked or hijacked to access other sites.

SUCCESS!

After learning and solving these hurdles (getting Apple's permission, finding HTC's special USB drivers, reinstalling Phonegap, and more) everything is working: One tap and I have a Leaflet map centered on my location.

Next steps: make this app something worthwhile...

Monday, November 12, 2012

Mobile Development with Phonegap / Cordova

Part 1 (again): Getting set up for iOS development on OSX

Last week I got my development environment set up for Android: Windows 7, Eclipse, and Phonegap/Cordova. I also got my iBook and set up Xcode, and got the demo app running on iOS. And here's the writeup.

Some if it may be a bit brief, as I'm presuming you read my writeup about Android, where I describe some synonyms and terminology.

THE STACK

The way to use Cordova (aka Phonegap) to build iOS apps, is to use Xcode on a Mac. The basic stack is:

- A Mac. It must be running Mountain Lion (OSX 10.8) or later.
- An Apple ID so you can use the App Store
- Xcode 4.x
- Phonegap and its create script
- Your iOS device, or the built-in simulator

In fact, it is a violation of the iOS Terms Of Use, to develop an app for iOS not using Apple hardware and authorized tools. So this stack really is it. (comments about Microsoft and the anti-competition lawsuits of the late 1990s, tastefully deleted)

In my case, I hit up eBay and found a used iBook running Mountain Lion for $400.

GETTING AN APPLE ID AND IOS DEVELOPER LICENSE

If you want to deploy your mobile app via the App Store, you will need to extend your Apple ID with an iOS Developer License. This means visiting Apple's website (https://developer.apple.com/programs/ios/) and spending a half hour or so entering your information a few times. It also means spending $99 for a one-year registration as a Bona Fide Developer. (must fight... must not make... snarky comment)

This step is optional, required only once you're past the testing phase and want your app to be available to others.

INSTALLING XCODE

Fire up your Mac, open the App Store, so a search for Xcode. It comes right up, and a few clicks later it's downloading and installing. It's really that painless.

This is a 1.6 GB download, so on household DSL it took overnight. On a cable modem, you'll still have time to catch dinner and a couple episodes of Futurama...

After it's set up, start Xcode and then install the Xcode command-line tools: Preferences / Downloads / Components As long as I was there, I also had it grab a bunch more documentation and libraries. The whole shebang was another 1.1 GB, but again went painlessly.

INSTALLING PHONEGAP (CORDOVA)

Do a Google search for "phonegap download" and download the Cordova library. The version 2.2.0 is about 26 MB, and downloads in a few moments.

The Phonegap library doesn't really "install" but you extract it someplace. On my Mac (Mountain Lion, Safari, very default configuration) the Cordova library unzipped, leaving me a folder. I simply moved the folder into my Documents folder, and renamed it to phonegap

SET UP A PROJECT

Open up a Terminal (use the Spotlight, the magnifying glass, and type "terminal". The icon is a black square) and run these commands:

./Documents/phonegap/lib/ios/bin/create FOLDER_PATH PACKAGENAME PROJECTNAME
Close the terminal

FOLDER_PATH - This is the path to the intended folder for your new project, e.g. ./Documents/MobileApps/WalkTracker
PACKAGENAME - A dotted-format app name combined with domain name, e.g. org.greeninfo.walktracker
PROJECTNAME - The name of your app as you want it to appear, e.g. "Walk Tracker Calorie Counter"

Example (don't forget those quotes!):

./Documents/phonegap/lib/ios/bin/create./Documents/MobileApps/"Walk Tracker" org.greeninfo.walktracker "Walk Tracker Calorie Counter"

This will create a new folder, containing an Xcode project file.

RUN IT!

Open up Xcode and load that project file (or, simply open the project file).

In the upper-left is the familiar arrow button indicating Play. Pick your iOS simulator version, click the button, and there's your demo app.

CONCLUSION

Unlike Windows / Eclipse / Android setup, this one was a snap. Most of my time wasn't spent researching errors, but periodically checking a download bar. The create script ran without issues, and aside from download time I had the demo app running in 10 minutes.

The official version of this setup process is at Phonegap's own website. It's actually really accurate, and I admit that I had little to add except for narrative. (that's a compliment, Phonegap developers)

http://docs.phonegap.com/en/2.2.0/guide_getting-started_ios_index.md.html#Getting%20Started%20with%20iOS

NEXT STEPS

My goal for today (a break, on account of Veterans' Day) is to sit and play with both platforms and get a basic mobile app running. I have only vague ideas as to what it will do, but that's not the important part just yet. :)

Thursday, November 8, 2012

Mobile Development with Phonegap / Cordova

Part 1: Getting set up for Android development on Windows 7

I got a new computer, which freed up my old one for dedicated use as a mobile development platform. My intended approach is to use Phonegap, so I can continue writing in HTML/CSS/JavaScript as I've been doing for 16 years. (then determine whether this method has problems such as slow speed, large app size, etc)

I run Windows and have an Android phone, so this first writeup is about those environments. A future writeup will cover Mac OSX, iOS, and xcode.

THE BASIC IDEA

The basic idea of Phonegap (sorry, I mean Cordova; they changed their name in 2012), is that you create a project folder which contains your HTML, CSS, and JavaScript assets, as well as the Phonegap library. Phonegap effectively acts as a web browser (but it's not), executing the JavaScript and rendering the HTML/CSS, and it brings some non-standard JavaScript features such as cross-domain loading, access to local files, and APIs to the phone's hardware.

Long story short: You start with a project folder, punch in HTML/CSS/JS, and export the whole shebang as an APK.

There are five basic parts to getting started with Phonegap:
* Java Development Kit (JDK), since Phonegap is Java
* Android SDK, since you're writing for Android
* Apache Ant, a build tool for Java apps
* The Cordova library itself, of course
* Eclipse SDK, the all-purpose editor to tie it all together

INSTALLING PREREQUISITES

Do a Google search for "jdk download", find Oracle, download the JDK 7 package, run the installer and click through the menus. Make a note of the directory (something like C:\Program Files\Java\jdk1.7) because you will need it later.

Do a Google search for "android sdk download", download the SDK installer, run it. I prefer to have it directly in my folder C:\Users\Gregor\AndroidSDK so it's easier to find and remember. Note that the SDK will later want to download files and create subfolders, so it should be in your documents and not someplace system-wide such as C:\Program Files. Wherever you install it, keep a note of the folder location because you will need it later.

Do a Google Search for "apache ant binary" and download a binary package. There is no installer, simply unzip it someplace. I put it under C:\Program Files\Apache Ant but had to provide admin privilege. Make a note of this directory, because you will need it later.

All three of these packages will involve some command-line usage later, so we have to set some environment variables.

Go into your Start menu, type "environment" and select "Edit environment variables for your account"
You'll get a panel listing a few variables such as TEMP, and some self-explanatory buttons such as New and Edit.
Set JAVA_HOME to the installation directory for the JDK, e.g. C:\Program Files\Java\jdk1.7
Set the PATH to include the JDK's bin directory, e.g. C:\Program Files\Java\jdk1.7\bin
Set the PATH to include the Android SDK's tools subdirectory, e.g. C:\Users\Gregor\AndroidSDK\tools
Set the PATH to include Apache Ant's bin subdirectory, e.g. C:\Program Files\Apache Ant\bin

Note that the syntax of a PATH is to join together multiple directory names with a semicolon ; character, like this:
C:\Program Files\Java\jdk1.7\bin;C:\Users\Gregor\AndroidSDK\tools;C:\Program Files\Apache Ant\bin
If you already had a PATH set, simply add a ; to the end of it, then add these three new ones.

Now test it by opening a command prompt (if you had one open already, close it and open a new one). Run the following commands and check the output:

# This should display the path of the JDK
echo %JAVA_HOME%

# This should run the Java compiler, which will show some usage help
"%JAVA_HOME%"\bin\javac

# This should run Ant, which will complain that build.xml doesn't exist
ant

# This should launch the Android SDK manager. just close it
android.bat

SETTING UP ECLIPSE & ANDROID

We're not finished yet!

Now Google and download Eclipse SDK (also called Eclipse Classic) and unpack it someplace. There is no installer, just unpack it someplace convenient. I like to put it in C:\Program Files\Eclipse and then drag it into my Start menu.

Now start Eclipse and install the Android Development Tools (ADT) plugin, which allows Eclipse to access the Android SDK.

Start Eclipse
Go into Help / Add New Software
Add a new repository. name it "ADT Plugin" and use the URL https://dl-ssl.google.com/android/eclipse/
It will show you a list of default packages which it will download. Click OK and let it download for a little while...
When Eclipse restarts, you should be prompted by ADT asking where to find the Android SDK. Tell it to Use Existing SDKs, and then show it where you installed Android SDK.

I prefer to test my future apps on an Android emulator, so I can test multiple OS versions and not risk my phone while I tinker. As such, I will take an extra step here to install several more packages:

Open Windows / Android SDK Manager, Look over the list of packages. For various versions of Android OS, the SDK platform and also the Intel x86 Atom System Image. You will need these for the versions of Android OS you want to support.
Install the Extras / Intel x86 Emulator Accelerator (HAXM). This enables the Android Virtual Device (AVD) to run much faster than it would without.
Click OK, Accept, and wait a while...

If you installed the HAXM, you will need to run an installer too; the SDK Manager downloaded but didn't really install it. Go into your Android SDK folder (C:\Users\Gregor\AndroidSDK\tools), go into Extras / Intel and look for the IntelHaxm.exe setup file. It's a standard wizard: double click, Next, Finish.

Now you're finally done setting up! It's time to make a project!

FINALLY, START A PROJECT!

Download the Phonegap / Cordova package from http://phonegap.com/download It is not an installer, just a ZIP file. Unpack it someplace convenient like C:\Program Files\Phonegap\Cordova 2.2.0

The Cordova distribution supports several different target platforms, all under the lib folder. Naturally, we want to use "lib/android"

To start a new project, use the command-line "create" script. This will create a folder containing various bootstrap elements for your new app: a copy of the Phonegap library, a folder for WWW assets with JS, CSS, and HTML subfolders, and so on.

Open a command prompt
Type: C:\Program Files\Phonegap\Cordova 2.2.0\lib\Android\bin\create FOLDER_PATH PACKAGENAME PROJECTNAME
Exit the command prompt

FOLDER_PATH - This is the path to the intended folder for your new project, e.g. C:\Users\Gregor\Mobile\Demo Project
PACKAGENAME - A dotted-format app name combined with domain name, e.g. org.greeninfo.demoproject1
PROJECTNAME - The name of your app as you want it to appear, e.g. "Demonstration Project"

Example (don't forget those quotes!):

C:\Program Files\Phonegap\Cordova 2.2.0\lib\Android\bin\create C:\Users\Gregor\Mobile\"Demo Project" org.greeninfo.demoproject1 "Demonstration Project"

Start up Eclipse. Go into File / New / Project, and a wizard will allow you to pick from a set of project templates. Select Android Project From Existing Code, then lead it to the project folder you just created.

You now have an Android project underway! The left-hand panel shows the files in your project (Package Explorer). The right-hand panel will show your program code.

TEST IT RIGHT AWAY

If you're like me, you'll immediately want to make test their own demo. (don't you hate it when someone's own Hello World doesn't work?)

In the Package Explorer (the left-hand side), scroll to the top and right-click on the root object, which is your project. Select Run As / Android Application.

There are three possible things that may happen here:

If your phone is attached and USB Debugging is enabled, or if you have previously created an Android Virtual Device (AVD) , you will be presented with a list of target devices. Pick the one you want to use.
If there is only one Android device (phone or AVD), it will be chosen automatically without prompting you.
If there are no Android devices available (no phones, no AVDs) then you will need to create one or attach one. This is a few New buttons, and selecting your choice of Android OS version (aka SDK level) and amount of virtual storage. It then lets you choose which one to use, or uses the only one.

Now the Android emulator will fire up. Like your real phone, it will take a brief slice of eternity. Eventually you'll unlock the "phone" and browse through the apps. Hey, there it is! Tap it and it works!

Note: I did get an error INSTALL_FAILED_OLDER_SDK and needed to make an adjustment to the Manifest file. See below.

SUCCESS! AND NEXT STEPS...

So far, so good. It was a long and tortuous road, but our environment is set up. For future projects, we just need to run "create" to bootstrap a folder, then open up Eclipse and Run As to test it.

The immediate next step will be development of the app, and this largely means editing HTML, CSS, and JavaScript in the assets folder. Eclipse has the build tools, but for the editing I may fall back to my trusted Komodo Edit.

A followup article will cover the development of my own Hello World app, packaging and permissions via the Manifest file, and actual deployment of a distributable APK file.

FOLLOWUP NOTES: ANDROID OS VERSION, aka SDK LEVEL

A note on terminology: Each version of the Android OS is also referred to as a SDK Level. Android 4.1 Jelly Bean is SDK Level 16. 4.0 Ice Cream Sandwich is SDK Level 15.

The Phonegap starter app probably didn't run when the emulator came up, giving an error of This means that your Manifest file (metadata and configuration about the app, device suport, etc) indicates that the app is for a different version of Android OS.

Check the AndroidManifest.xml file, down at the bottom look for android:minSdkVersion and android:targetSdkVersion In my case, the targetSdkVersion was __VERSION__ instead of a proper number. Setting this to a real number (16 for Jelly Bean) fixed it.

FOLLOWUP NOTES: TIPS AND TROUBLESHOOTING

A note about the Android SDK installation directory. I lied: I really put mine into C:\Program Files\Android SDK. This did mean some additional fussing, though: I had to provide admin privilege to install it, then I had to set the ownership of C:\Program Files\Android SDK to myself, and to grant Full Control to myself. If you have multiple accounts on your computer, this solution may not work. Keep in mind that the SDK will try to download and create files and folders here; if it fails when trying to install packages, check the ownership and permissions.

I chose to install Apache Ant and Cordova to C:\Program Files since it's a convenient place to keep them from cluttering up my documents or my desktop. I did need to provide admin privilege to put the folder there, but it works just fine after that because (unlike Android SDK) these won't make changes later and require special permission.

When running Cordova's "create", it has one very general error message: "missing one of the following". This means that one of those JAVA_HOME or PATH entries is wrong back when you set up JDK, Ant, and Android SDK. Run through those steps again, including the tests.

The Android emulator should be slow but reasonably fast, even on my two-year-old laptop. If it takes more than a few seconds to load, or is really slow, make sure that you have really installed HAXM. Remember, you must download the package via the SDK Manager and then run an installer wizard.

HAXM won't work with some older CPUs, particularly laptop CPUs before 2010. If your CPU is missing the virtualization features, either buy a newer computer or just accept that the emulator will be slow.

If you want to create more Android Virtual Devices (AVDs), e.g. for a new version of Android OS, open Eclipse, go into Window, and select the AVD Manager.

Thursday, October 4, 2012

MAKING VERY LARGE REQUESTS TO WMS SERVERS

For purposes of printing, we need to fetch a giant image from a WMS. As in, 9 megapixels. Unfortunately, the WMS server isn't configured to allow requests of that size, and since we don't have an "in" with them we're not able to have them correct this.

http://gis4.oit.ohio.gov/arcgis/services/OSIP/MapServer/WMSServer?SERVICE=WMS&LAYERS=0&CRS=EPSG%3A4326&FORMAT=image%2Fpng&TRANSPARENT=FALSE&BGCOLOR=0xFFFFFF&REQUEST=GetMap&BBOX=41.383726033437%2C-81.70929151108008%2C41.39716625056923%2C-81.694329632648&STYLES=&VERSION=1.3.0&WIDTH=3236&HEIGHT=2906

Using tiles isn't an option, for reasons I won't get into here. Point is, we need a giant image from a WMS service that just won't give out images that large.

HORIZONTAL COMPOSITION PHP PROXY

My solution: I wrote a proxy server in PHP, which takes our large request, splits it into 4 logical requests, fetches the 4 images, and then composites them and returns that finished image. Ta-da, an interface that accepts any arbitrary parameters which the source WMS supports, but which also provides for very-large requests.

function ohioimagery() {
    $BASE_URL = "http://gis4.oit.ohio.gov/arcgis/services/OSIP/MapServer/WMSServer";

    // parse the bbox into 4 numbers: NOTE THIS IS NOT THE USUAL SEQUENCE
    // this is S W N E, and I have no idea how they do that. Maybe that's WMS 1.3.0 ?
    list($miny,$minx,$maxy,$maxx) = explode(",",$_GET['BBOX']);
    //list($minx,$miny,$maxx,$maxy) = explode(",",$_GET['BBOX']); // the normal way

    // calculate the W/S/E/N for the 4 quadrants, e.g. a logical "nw" box bounded by $nw_w $nw_s $nw_e $nw_n
    $halfx = ($maxx + $minx) / 2.0;
    $halfy = ($maxy + $miny) / 2.0;
    $nw_w = $minx; $nw_s = $halfy; $nw_e = $halfx; $nw_n = $maxy;
    $ne_w = $halfx; $ne_s = $halfy; $ne_e = $maxx; $ne_n = $maxy;
    $sw_w = $minx; $sw_s = $miny; $sw_e = $halfx; $sw_n = $halfy;
    $se_w = $halfx; $se_s = $miny; $se_e = $maxx; $se_n = $halfy;

    // split the width and height in half, but use subtraction so we don't fall victim to rounding
    $nw_width = $sw_width = round($_GET['WIDTH'] / 2.0);
    $ne_width = $se_width = $_GET['WIDTH'] - $nw_width;
    $nw_height = $ne_height = round($_GET['HEIGHT'] / 2.0);
    $sw_height = $se_height = $_GET['HEIGHT'] - $nw_height;

    // now generate a new set of WMS params and then URLs for each of the 4 boxes
    $nw = $_GET; $nw['WIDTH'] = $nw_width; $nw['HEIGHT'] = $nw_height; $nw['BBOX'] = "$nw_s,$nw_w,$nw_n,$nw_e";
    $ne = $_GET; $ne['WIDTH'] = $ne_width; $ne['HEIGHT'] = $ne_height; $ne['BBOX'] = "$ne_s,$ne_w,$ne_n,$ne_e";
    $sw = $_GET; $sw['WIDTH'] = $sw_width; $sw['HEIGHT'] = $sw_height; $sw['BBOX'] = "$sw_s,$sw_w,$sw_n,$sw_e";
    $se = $_GET; $se['WIDTH'] = $se_width; $se['HEIGHT'] = $se_height; $se['BBOX'] = "$se_s,$se_w,$se_n,$se_e";
    $nw_url = $BASE_URL . '?' . http_build_query($nw);
    $ne_url = $BASE_URL . '?' . http_build_query($ne);
    $sw_url = $BASE_URL . '?' . http_build_query($sw);
    $se_url = $BASE_URL . '?' . http_build_query($se);
    /*
    header("Content-type: text/plain");
    print "ORIGINAL\n"; print_r($_GET); print "\n\n";
    print "NW\n"; print_r($nw); print "\n\n";
    print "NE\n"; print_r($ne); print "\n\n";
    print "SW\n"; print_r($sw); print "\n\n";
    print "SE\n"; print_r($se); print "\n\n";
    print "NW\n$nw_url\n";
    print "NE\n$ne_url\n";
    print "SW\n$sw_url\n";
    print "SE\n$se_url\n";
    */

    // fetch the four images and composite them into a single canvas
    $canvas = imagecreatetruecolor($_GET['WIDTH'],$_GET['HEIGHT']);
    $nw_img = imagecreatefrompng($nw_url);
    $ne_img = imagecreatefrompng($ne_url);
    $sw_img = imagecreatefrompng($sw_url);
    $se_img = imagecreatefrompng($se_url);
    imagecopy($canvas, $nw_img, $sw_width-1, $ne_height-1, 0, 0, imagesx($nw_img), imagesy($nw_img) );
    imagecopy($canvas, $ne_img, $nw_width-1, 0, 0, 0, imagesx($ne_img), imagesy($ne_img) );
    imagecopy($canvas, $sw_img, 0, $nw_height-1, 0, 0, imagesx($sw_img), imagesy($sw_img) );
    imagecopy($canvas, $se_img, 0, 0, 0, 0, imagesx($se_img), imagesy($se_img) );

    // spit it back out
    header("Content-type: image/png");
    imagepng($canvas, null, 0);
}

The result, is that I can now make exactly the same WMS request, along with whatever changes in layers, styles, and SRS, and get back oversized images.

http://myserver/proxy.php?SERVICE=WMS&LAYERS=0&CRS=EPSG:4326&FORMAT=image%2Fpng&TRANSPARENT=FALSE&HEIGHT=2906&BGCOLOR=0xFFFFFF&REQUEST=GetMap&BBOX=41.370285816305135,-81.70929151108008,41.39716625056923,-81.67936775421546&WIDTH=3235&STYLES=&VERSION=1.3.0

WATCH THAT BBOX

For reasons I do not know, the remote WMS takes their BBOX parameter in an unusual order. Instead of the usual W,S,E,N they do S,W,N,E The code above has been modified to fit their bizarre case, and I have noted it.

Saturday, September 29, 2012

WHY WON'T IT FIND A ROUTE? BOUNDING BOX AND TOLERANCE

Today's post is fairly brief, but is the culmination of eight hours of tediously trying routes and adjustments. If it saves you one hour, won't you consider it worthwhile? :)

So, we have a routing network of trails for a Parks N Rec district. Directions via dijkstra_sp_delta() are working A-OK and have been for some months. But, the client wrote back with some news: he's having trouble finding a longer path across this one park.

I tried it too: about half of the routes would work, and half would not. I could route the 15 miles to a location, but move the marker a few hundred feet in any direction, and it may or may not be able to route there, even though my eyes can see the trail segments laid out.

It's consistent too: the same point would always work or always fail. And I followed it back into the SQL and verified that the given pair of nodes always would or would not have a route. So it's not some sysadmin issue such as out of memory or core dumps.

TOLERANCE

The directions handed back to the browser, include WKT of the segments so they can be plotted onto the map. I noticed that the route as given would skip from one trail to an adjacent one: walking a straight line, it would suddenly be 20 feet to the left on an adjacent trail, and sometimes back to the original trail. The visual effect was rather goofy and unattractive.

And, I suspect that this was routing us onto adjacent trails which would later diverge, and coming up with false negatives.

The cause here, is a too-high tolerance (25 feet) when generating the routing network via assign_vertex_id() The data had some known "dangles" due to human error in digitizing the trails, and a higher tolerance fixed that. We reanalyzed with a tolerance of 10 feet, and this stopped happening.

This solved an annoyance, but didn't get us any closer to finding the routes. Turns out that wasn't the problem at all...

BOUNDING BOX

Ultimately I discovered it:the directions stuff uses dijkstra_sp_delta() with a tolerance of 2640 feet. That is to say, the calculation excludes any segments that are not within a 2640-foot bounding box of the start and end vertices. That excludes the 900 square miles that can't possibly be relevant to this route, and gives us plenty of wiggle room to find routes.

Wrong.

In the case of this trails data, 5280 was the magic number. The trails meander enough, that a good path really can deviate by 1 mile away from the straight-line between the two points. And the performance loss isn't that bad, certainly not as bad as complete non-performance!

Anyway, no huge breakthrough here, but a word for someone else who just isn't finding the path: check your delta bounding box, and see if you're being too strict.

Tuesday, September 25, 2012

My goodness, it's been some 3 weeks since my last post. Been too busy DOING instead of WRITING. :)

SOME STEPS IN OPTIMIZING A MAP'S LOADING SPEED

A few days ago, a client asked me to look over a map app and see what I could do to speed it up. The app is ExtJS, PHP, PostGIS, and MapServer. I made some substantial improvements, and here's what we did and how well it worked.

Nothing I'll mention today is particularly cutting edge, and some of them date back to 2007 and MapServer 5.0 But, obviously here in 2011 and 2012, some folks didn't know about it, so here goes.

THE EXTJS LIBRARY / APACHE COMPRESSION WINS THE DAY

The first roadblock of the app loading and starting, was the download of ext-all.js, the core of the ExtJS JavaScript framework. They were loading it from Google's CDN, and it's 700 KB. On my 1 Mbps household DSL, with other PNGs and CSSs downloading too, this was taking about 10 seconds. The theory of using a CDN is that Google's network is faster than their server's pipe, but fact is that for DSL users it's still 10 seconds.

I instead had it load a copy of ExtJS from the web server, and then enabled compression (see here). This is an old trick from my sysadmin days, and BOY DID IT WORK here. The download was now 135 KB (use Firebug, watch the Content-length header) and took about 2 seconds.

Ta-da! Eight seconds shaved off the page startup, by not blindly assuming that Google's CDN is the panacea.

MULTIPLE HOSTNAMES

An old trick, but one that was not being implemented here. The second argument to the OpenLayers.Layer.WMS constructor is the URL of the WMS, or an array of URLs, in which case it will round-robin requests to these URLs. See here.

In the case of a tile-cached layer, this is a great boon. The browser has built-in limits about how many requests it will make to a webserver, and when your tiles are served up quickly enough, more time is spent on the browser's politeness, than is spent fetching and transferring.

So I swapped this:

var countylines = new OpenLayers.Layer.WMS("Counties", "/tilecache.cgi", { blah blah });

For this:

var TILECACHE = [
    "http://maps1.example.com//tilecache.cgi",
    "http://maps2.example.com//tilecache.cgi",
    "http://maps3.example.com//tilecache.cgi",
    "http://maps4.example.com//tilecache.cgi",
    "http://maps5.example.com//tilecache.cgi",
    "http://maps6.example.com//tilecache.cgi"
];
var countylines = new OpenLayers.Layer.WMS("Counties", TILECACHE, { blah blah });

Now the web server spends as much time download as it does blocking (Firebug is great), and the difference is noticeable.

You don't want to go too far with this trick. If the layer requires calculation, or is slow at all, or isn't tile cached, this may not be a good move. If your layer is slow, hitting it 6 times simultaneously won't be any faster. Even if the layer is plenty fast, once your browser is opening up 24 requests at a time to your web server instead of 4, your webserver may start blocking clients because of the flooding. Watch your MaxClients setting and your Apache error log, and consider what will happen if this app gets a sudden surge in popularity.

CENTROIDS FOR LABELS

The map features a grid of sampling quads, and each quad has an icon in the center indicating a status. The quad rectangles and the icons in the center, are separate layers. The icons use a URL param for filtering, so one can specify a parameter and show the relevant quad icons (in this case, the ID# of an invasive species, e.g. species_id=123). This layer is not served via TileCache, due to the dynamic nature of the queries, so its performance is naturally slow. But of course, we want it faster.

To place these icons, the DATA statement was as follows:

DATA "the_centroid from (select ST_centroid(merc_geom) AS the_centroid,* from quads where
%SPECIESFILTER%) as temp using unique quad_id using SRID=900913"

A simple optimization here was to precalculate the centroid, as it's not going to change. Second, I rewrote the mapfile syntax for readability. Third, the species_id which would be used in the %SPECIESFILTER% wasn't even indexed. I suspect that the spatial index would supersede the species_id index, but this is also used in PHP-based reports so I know that indexing it can't hurt.

# some SQL to precalculate and index
SELECT ADDGEOMETRYCOLUMN('','quads','merc_center',900913,'POINT',2);
CREATE INDEX quads_merc_center ON quads USING GIST (merc_center);
UPDATE quads SET merc_center=ST_CENTROID(merc_geom);
CREATE INDEX quads_species_id ON quads (species_id);

# revised mapfile
DATA "merc_center from quads using unique quad_id using SRID=900913"
FILTER "%SPECIESFILTER%"

The net result is that the queries happen in 20 milliseconds instead of 250 milliseconds. Not bad, shaving about one-quarter second off the processing time for each tile.

MAPSERVER IMAGE QUALITY: AGG/PNG vs GD/GIF

One of the avenues I investigated, was the image tiles they're downloading. The app has multiple hostnames configured,, so the browser is downloading some 8 tiles at a time. And these are pre-rendered so they should be super quick.

But they're each 50 KB to 100 KB in size. Which means that a full page load of 20 tiles (4 wide, 5 high) is about 1.5 MB. The download itself takes 15 seconds, not counting wait times and server times. Again, this is where performance testing benefits from my having crummy household DSL.

The tiles use MapServer's wonderful AGG library (the same one that powers Mapnik) and output as gorgeous PNG-24 with porcelain-smooth line edges, which is good when we're drawing rivers and county lines. But man, 100 KB average tile size?

I switched over to GD/GIF This is the older renderer, and the line quality is poor. But the rendering time is about 1/3 that of AGG, and file size is magnificent: 5 KB - 10 KB now. I presented it to the client, and they loved it: the grainy lines were a small price to pay for having a map view load up in 3-8 seconds instead of 20-60.

WHAT'S WITH THOSE BLANK LAYERS?

And lastly, I noticed that one of the layers was always blank. The roads layer was turned on at all zoom levels, but the MapServer mapfile stated that roads don't come on until we're at 1:500,000 scale (MAXSCALEDENOM). That is to say, the browser was downloading 20 tiles with each page view, when we knew for a fact that those tiles are 100% transparent. Even without the transfer time, the browser was blocking while it loaded these tiles.

The solution here was simple: specify a maxResolution or maxScale setting for the OpenLayers.Layer.WMS constructor, so the layer doesn't even try to load tiles until we get into the usable zoom levels.

TILECACHE IN MOD_PYTHON (no)

I also investigated using TileCache via mod_python. The theory here is that the Python interpreter and various modules are cached into RAM, giving a performance boost over CGI.

In our tests, we didn't go with this. The difference in start-to-finish request time, as measured via Firebug, was negligible if it was real at all. But the memory consumption is remarkable: Apache bloated up to 1 GB size within a few hours of testing.

This particular client is hosted in a VPS, and they only have 2 GB of RAM and no swap. So having 1 GB locked in memory was not welcome, especially without corresponding speed gains.

THANKS FOR READING

Well, that's it. Nothing new or awesome, but a few tricks which weren't known to these developers, and therefore which may be new and useful to you. Happy mapping!

Friday, August 31, 2012

ESTIMATING HIKING TIME ALONG A SEGMENT - TOBLER

The simplest way to calculate travel time over your route segments, is to make an assumption about walking speed, e.g. a constant 3 miles per hour. However, this client wanted to account for slope a la Tobler's hiking function, since this is substantially hilly terrain and a constant walking speed is not necessarily realistic.

This process requires 3D data, containing elevation. If your data is only 2D, get a DEM and load up ArcGIS and produce 3D data, aka LINESTRING Z. Don't worry, I'll wait...

And now it's fairly easy:

-- Supplement each trail segment with a length. The example here transforms the geometry to
-- a State Plane SRS with feet as units.
ALTER TABLE trails ADD COLUMN length_feet float;
UPDATE trails SET length_feet = ST_LENGTH(ST_TRANSFORM(the_geom,3734));

-- Now the Tobler calculation. We fetch the elevation of the first and last
-- vertices of the linestring, divide by length, and that's the slope.
-- Remember, we split our segments into 100-foot lengths so this is mostly accurate.
-- If you keep yours as mile-long trails your accuracy may vary.
-- The 0.911344 converts from Tobler's own units (kilometers per hour)
-- to our preferred units (feet per second, to match the length in feet)
ALTER TABLE routing_trails ADD COLUMN tobler_fps FLOAT;
UPDATE routing_trails SET tobler_fps =
0.911344 * 6 * EXP(
-3.5 * ABS(
    (
      ST_Z(ST_StartPoint(ST_TRANSFORM(the_geom,3734)))
      -
      ST_Z(ST_EndPoint(ST_TRANSFORM(the_geom,3734)))
    )/length )
);

-- Now that each segment has its speed in feet per second and its length in feet,
-- the duration in seconds is simple. In our case, we calculate for three different
-- travel modes and we assume that bicyclists are being safe and only doing 3X as fast as walking.
UPDATE routing_trails SET duration_hike = length_feet / tobler_fps;
UPDATE routing_trails SET duration_bike = duration_hike * 0.33;
UPDATE routing_trails SET duration_horse = duration_hike * 0.66;

Discussion

I don't really know how accurate Tobler's function is, nor the degree to which it makes a difference in the accuracy of route duration estimates. Still, our original approach is to presume 3 miles per hour (4.4 feet per second) on foot, 3X that for bicycles, and 1.5X that for equestrian. Taking the slope into account may miss a lot of factors, but it can't be any worse!

Monday, August 27, 2012

TURNING DIRECTIONS

For every segment, turn, turn, turn, turn, ...

So, it's been a few weeks of mildly complicated stuff but not a lot that really seemed blog worthy. But, these last three days have been spent on TURNING DIRECTIONS for the trails routing.

It wasn't good enough that our directions read: Start on Blackberry Loop Trail, 0.25 miles, Red Wing Hiking Path, 1.84 miles, Soda Springs Maintenance Road, 7.25 miles. Turning directions are really nice: TURN RIGHT onto Red Wing, SLIGHT LEFT onto Soda Springs.

Now, pgRouting's dijkstra_sp() et al return a bunch of line segments and the segments' GIDs. And you can do a simple join to fetch the segments' other info (its name, whether it allows bicycles) in one go. But, turning directions are something else entirely, and must be computed based on the relative azimuths of the two segments. (the WHAT? tell you later)

The basic query to fetch segments

SELECT
ST_AsText(ST_Transform(route.the_geom,4326)) AS wkt,
ST_Length(route.the_geom) AS length,
hiking_trails.name, hiking_trails.duration
FROM
hiking_trails,
(SELECT gid, the_geom FROM astar_sp_delta_directed('hiking_trails',?,?,2640,true,true)) AS route
WHERE
hiking_trails.gid=route.gid

This fetches the route, and does an implicit join to get the segments' name. Note that we also fetched the Well Known Text (WKT) for each segment -- that comes in handy later when we have to do geometry stuff in PHP.

Iterate over the segments and build directions

The strategy here is that we iterate over the segments in PHP, and keep on tallying up the distance and the time. When we detect that the new segment is no longer on the same road as the previous segment, we call this a transition and we add a new step to the directions. As part of this step, we calculate text directions including whether you turn right or left.

Obvious question: Why didn't we aggregate the trail segments by name within the database? Answer: the trail names are really a mess, and it's really only feasible using more complex expressions. In the code below, the Trailpiece::trailContainsSameName() function encapsulates this -- it returns true to indicate that the current trail segment should be considered a continuation of the previous step (e.g. we are still on Blackberry Loop, even though the text name may imply otherwise) and a false indicates that this segment is not a continuation but should be considered a new trail and a new step.

The turning directions are the largest part of the code here. The strategy is as follows: if we are at a transition, where this segment is a new step from the previous segment, calculate the azimuth of both this segment and of the prior segment. (azimuth: the compass heading, e.g. 45 is northeast, 210 is south-southwest) The difference between these azimuths is the direction you would turn: if the previous segment was on a heading 90 (east) and the current segment is on a heading 180 (south) then it's a +90 difference or a right turn. Easy, huh?

    $steps      = array();
    $stepnumber = 0;
    $current_step_distance = 0;
    $current_step_duration = 0;
    $current_step_name     = $segments[0]->name;
    for ($i=0; $i<$howmanysegments; $i++) {
        $segment       = $segments[$i];
        $is_transition = ! Trailpiece::trailContainsSameName($current_step_name,$segment->name);
        $is_end        = $i == sizeof($segments)-1;

        // if this is a transition from some other step, zero out the counters for this step and append the step
        //but half of the code is just to determine Right or Left
        if ($is_transition) {
            // phase 1: fetch the first and last vertices for the previous and current segment, using regular expressions

            $thisvert = preg_split('/,\s*/', preg_replace( array('/^[\w\s]+$+/', '/[$]+$/') , array('',''), $segment->wkt ) );
            $thisvert1 = explode(' ', $thisvert[0]);
            $thisvert2 = explode(' ', $thisvert[ sizeof($thisvert)-1]);
            $prevvert = preg_split('/,\s*/', preg_replace( array('/^[\w\s]+$+/', '/[$]+$/') , array('',''), $segments[$i-1]->wkt ) );
            $prevvert1 = explode(' ', $prevvert[0] );
            $prevvert2 = explode(' ', $prevvert[ sizeof($prevvert)-1 ] );
            $thislon1 = $thisvert1[0]; $thislat1 = $thisvert1[1];
            $thislon2 = $thisvert2[0]; $thislat2 = $thisvert2[1];
            $prevlon1 = $prevvert1[0]; $prevlat1 = $prevvert1[1];
            $prevlon2 = $prevvert2[0]; $prevlat2 = $prevvert2[1];

            // phase 2: either/both of the line segments may need to be flipped, depending on the distance, since the endpoints may be the two touching ends, the two far ends, or any combination
            // the vertices as listed above, may give the azimuth from the segment's end to its start, backwards!
            // strategy: find which combination of endpoints is closest together, and that would be the two touching endpoints
            // remember, "1" indicates the start of a segment and "2" indicates the end of a segment, so $dx12 means the distance from previous seg start to current seg end
            // two segments should meet with previous2 touching current1 ($dx21 is smallest), for the previous to END where the current one STARTS
            // if this is not the case, then one or both of the segments needs to have its vertices swapped
            $dx11 = ($thislon1 - $prevlon1) * ($thislon1 - $prevlon1) + ($thislat1 - $prevlat1) * ($thislat1 - $prevlat1); // distance (squared) between $thisvert1 and $prevvert1
            $dx22 = ($thislon2 - $prevlon2) * ($thislon2 - $prevlon2) + ($thislat2 - $prevlat2) * ($thislat2 - $prevlat2); // distance (squared) between $thisvert2 and $prevvert2
            $dx12 = ($thislon1 - $prevlon2) * ($thislon1 - $prevlon2) + ($thislat1 - $prevlat2) * ($thislat1 - $prevlat2); // distance (squared) between $thisvert1 and $prevvert2
            $dx21 = ($thislon2 - $prevlon1) * ($thislon2 - $prevlon1) + ($thislat2 - $prevlat1) * ($thislat2 - $prevlat1); // distance (squared) between $thisvert2 and $prevvert1
            $whichdx = min(array($dx11,$dx22,$dx12,$dx21));
            switch ($whichdx) {
                case $dx11:
                    // previous segment's start meets current segment start; flip the previous segment
                    list($prevvert1,$prevvert2) = array($prevvert2,$prevvert1);
                    $prevlon1 = $prevvert1[0]; $prevlat1 = $prevvert1[1];
                    $prevlon2 = $prevvert2[0]; $prevlat2 = $prevvert2[1];
                    break;
                case $dx12:
                    // segments are end-to-end and both need to be flipped
                    list($thisvert1,$thisvert2) = array($thisvert2,$thisvert1);
                    $thislon1 = $thisvert1[0]; $thislat1 = $thisvert1[1];
                    $thislon2 = $thisvert2[0]; $thislat2 = $thisvert2[1];
                    list($prevvert1,$prevvert2) = array($prevvert2,$prevvert1);
                    $prevlon1 = $prevvert1[0]; $prevlat1 = $prevvert1[1];
                    $prevlon2 = $prevvert2[0]; $prevlat2 = $prevvert2[1];
                    break;
                case $dx22:
                    // current segment end meets previous segment end, flip the current segment
                    list($thisvert1,$thisvert2) = array($thisvert2,$thisvert1);
                    $thislon1 = $thisvert1[0]; $thislat1 = $thisvert1[1];
                    $thislon2 = $thisvert2[0]; $thislat2 = $thisvert2[1];
                    break;
                case $dx21:
                    // current start is previous end, already fine
                    break;
            }

            // phase 3: find the azimuth of each, and thus the angle between them
            $thisaz = (180 + rad2deg(atan2(sin(deg2rad($thislon2) - deg2rad($thislon1)) * cos(deg2rad($thislat2)), cos(deg2rad($thislat1)) * sin(deg2rad($thislat2)) - sin(deg2rad($thislat1)) * cos(deg2rad($thislat2)) * cos(deg2rad($thislon2) - deg2rad($thislon1)))) ) % 360;
            $prevaz = (180 + rad2deg(atan2(sin(deg2rad($prevlon2) - deg2rad($prevlon1)) * cos(deg2rad($prevlat2)), cos(deg2rad($prevlat1)) * sin(deg2rad($prevlat2)) - sin(deg2rad($prevlat1)) * cos(deg2rad($prevlat2)) * cos(deg2rad($prevlon2) - deg2rad($prevlon1)))) ) % 360;
            $angle = round($thisaz - $prevaz);
            if ($angle > 180) $angle = $angle - 360;
            if ($angle < -180) $angle = $angle + 360;
            //printf("%s x %s = %d x %d = %d<br/>\n", $current_step_name, $segment->name, $prevaz, $thisaz, $angle );

            // phase 4: assign a direction word based on that angle
            $turnword = "Turn onto";
            if      ($angle >= -30 and $angle <= 30)   $turnword = "Continue on";
            else if ($angle >= 31 and $angle <= 60)   $turnword = "Take a slight right onto";
            else if ($angle >= 61 and $angle <= 100) $turnword = "Take a right onto";
            else if ($angle >= 101)                    $turnword = "Take a sharp right onto";
            else if ($angle <= -30 and $angle >= -60) $turnword = "Take a slight left onto";
            else if ($angle <= -61 and $angle >= -100) $turnword = "Take a left onto";
            else if ($angle <= -101)                   $turnword = "Take a sharp left onto";

            // add the step to the list
            $step = array(
                'stepnumber' => ++$stepnumber,
                'turnword' => $turnword, 'text' => $segment->name,
                'distance' => $current_step_distance, 'duration' => $current_step_duration
            );
            $steps[] = $step;

            // reset the counters for this next step
            $current_step_distance = 0;
            $current_step_duration = 0;
            $current_step_name     = $segment->name;
        }

        // increment the length & duration of the current step, even if that step was just now reset because of a transition
        $current_step_distance += $segment->length;
        $current_step_duration += $segment->seconds;

        // and lastly, if this is the end segment, add the Arrival step so we can indicate the length of travel on this last step
        if ($is_end) {
            $step = array(
                'stepnumber' => ++$stepnumber,
                'turnword' => "Arrive at", 'text' => "your destination",
                'distance' => $current_step_distance, 'duration' => $current_step_duration
            );
            $steps[] = $step;
        }
    }

    // prepend the Start At step, to indicate the name of the street where we start
    array_unshift($steps, array(
        'stepnumber' => null,
        'turnword' => "Start on", 'text' => $segments[0]->name,
        'distance' => null, 'duration' => null
    ));

And there you have it: iterating over segments and not only deciding when to start a new step, but also figuring out which direction you would be turning.

Discussion: Which vertices to compare

Phase 1 fetches the first and last vertex of the current linestring and of the previous linestring, and uses these to determine the heading of each path. Technically, this should be the last vertex and the next-to-last vertex, like this:

            $thisvert1 = explode(' ', $thisvert[ sizeof($thisvert)-2 ]);
            $thisvert2 = explode(' ', $thisvert[ sizeof($thisvert)-1 ]);
            $prevvert1 = explode(' ', $prevvert[ sizeof($prevvert)-2 ]);
            $prevvert2 = explode(' ', $prevvert[ sizeof($prevvert)-1 ]);

However, in this case the individual vertices are very high resolution, and the average length of such segments is only 1-3 feet. As such, a single-pixel hand-cramp while generating the data could show the trail as facing northeast when in fact only the last 4 feet are northeast and the rest is due east.

Our use of the first and last does give some surprises: a very, very convoluted segment can have its start and end vertices indicate a southwest azimuth when in fact the last 20 feet of trail leads north. But, using the last segment or two gave such wonky results regularly and I eventually resigned myself that "sometimes slightly off" is better than "usually wrong"

Discussion: segment flipping

When I first read other discussion of generating turning directions, I didn't get the part about flipping segments. But, it made sense later: pgRouting does not present the linestrings with their vertices sorted from the starting point to the ending point of the route. At a transition from one linestring to another, there are 4 combinations of vertex layout: the previous segment's first vertex may be the one closer to the starting point, or it may be the one closer to the end point, and the current segment's first vertex may be the one closer to the starting point or it may be the one closer to the end point.

If we do the azimuth calculation without considering this, we could get an azimuth the exact opposite of correct: the user is heading east on a street but we measured the azimuth from end to start, so got an azimuth of 270 instead of 90.

So, having fetched the vertices in Phase 1, Phase 2 checks the segments' alignment using distance. The proper case is when the previous segment's final vertex ($prevvert2) is closest to the current segment's starting vertex ($thisvert1). The code in Phase 2 does distance calculations, and then inverts the vertices appropriately, so we know that the route properly takes us through $prevvert1, $prevvert2, $thisvert1, and $thisvert2 in that order.