Monday, December 31, 2012

CodeIgniter + DataMapper

Improvements on what I posted last time



This posting, expands on my posting last week (DataMapper & CodeIgniter for writing access controls) to provide some enhancements and good practices in security. I left it out of that posting, as it was becoming overly lengthy.


The Login Process


The Site controller's login method, as really implemented, is as follows. It adds one additional line: purging the session token before attempting the login check.

public function login() {
    // clear any existing credentials
    $this->session->unset_userdata('userid');

    // if a user/pass were submitted, attempt login using the model
    $user = User::attemptLogin(@$_POST['username'],@$_POST['password']);
    if ($user) {
        $this->session->set_userdata('userid', $user->id);
        return redirect(site_url('management/'));
    }

    // got here: either they didn't submit, or it didn't work out. either way, bail to a form
    return $this->load->view('site/login.phtml');
}

I consider this a Good Practice during a login phase. By clearing the session token, we have effectively logged them out (recall that the Management controller checks for the session token, then uses it to fetch their User object) and prevented any possible, hypothetical shenanigans that could result from the rest of the login process screwing up.

Consider, if the programmer had left out that line. A failed login would spit out the login.phtml but their old login session would still be intact. The result would be a Log In page, possibly decorated with elements indicating that they are still logged in. It's a small bug, but would look goofy -- and it may result in some information leakage if you're not careful, since you probably weren't expecting that page to be accessed with an existing session. It's possible, too, that one could contrive less-trivial bug scenarios based on your old session information being present during an attempted login.

Point is, I consider it a good thing to empty out the login session before attempting to repopulate it. Whether this is done before the attemptLogin() call, or in an else statement immediately afterward, doesn't matter as much as ensuring that they are either "logged in as intended" or else "logged out" with no wiggle room for an unknown state.


Password Storage


In the previous article, the attemptLogin() method checked for a literal username and password match:

$u = new User();
$u->where('username',$username)->where('password',$password)->get();
return $u->id ? true : false;

In the real world, of course, you would never store raw passwords in the database, right? (hint: the correct answer is "Never! And I can't believe that you hinted at doing so last week!") The proper way to store passwords is with a non-reversible hash, using a salt.

Okay, lemme back up a little and explain.

  • The basic problem of storing passwords, is that someone can do a SELECT and get every password in the system. From there, they don't need additional security holes, hacks, and injected files to wreak havoc: they can login as authorized personnel. Or, your own messup during programming may display the whole list of passwords long enough for someone to snap a photo.
  • What we need is a zero-knowledge proof: The server doesn't have your password, but it does have a way of knowing that whatever you supplied as the password is correct.
  • A non-reversible hash, also called a message digest, is a technique for scrambling text. The two most common "hashing algorithms" are MD5 and SHA-1. A scramble (called a hash) would be unique based on the input: the SHA-1 hash of the word "apple" will always be d0be2dc421be4fcd0172e5afceea3970e2f3d940, but it will take a billion years to discover a second string with the same hash.
  • At its very simplest, the users table's password field could store the MD5 hash, and then compare the MD5 of whatever you typed to the stored MD5. Voila: no stored passwords, but a watertight way to know that you supplied the correct password.
 But, hashing does have a drawback: its own consistency. The word "apple" will always hash to the same thing, and there are growing catalogs of known words and their hashes. This is a real-world problem: paste d0be2dc421be4fcd0172e5afceea3970e2f3d940 into Google and you get http://md5-database.org/sha1/apple

To "salt" a password means that we have an agreed-upon string that we mix into your password, to make the hash even more scrambled. The word "apple" has been cataloged, but the word "H87JKua8apple" has not, and it's gonna be a long time before one of those hash catalogs gets around to that one. Typically, the salt is of a known length and is stored along with the password:

Random 8 character salt: H87JKua8

SHA-1 hash of the word "apple": d0be2dc421be4fcd0172e5afceea3970e2f3d940
Password field: H87JKua8d0be2dc421be4fcd0172e5afceea3970e2f3d940

When encrypting a password, we generate 8 random characters and prepend them to the password, then take the hash of that resulting non-word.

When checking a password, we take the first 8 characters from the password field and prepend them to whatever password the user is trying. That is, they gave us "apple" and we know "H87JKua8" so we hash "H87JKua8apple" and it should match what's in the database.

The result is an extra million years of security on your password hashes. And if you think 8 characters aren't enough of a salt, feel free to crank it up: a 40-character salt with a 40-character hash may seem a bit extreme, but also gives a hacking time longer than the universe is expected to last no matter what mythology you prefer.

Implemented in code, it's not really as complicated as it sounds:

public static function attemptLogin($username,$password) {
    if (! $username) return false;
    if (! $password) return false;

    // fetch the user
    $u = new User();
    $u->where('username',$username)->get();
    if (! $u->password) return false; // user not found (or a blank password), automatic failure

    // check the password field
    // split off the 16-byte salt, prepend it to whatever password is being tried
    // and it should MD5 to the same hash as we have in the database
    // MD5 outputs are 32 characters in length, so 'password' is a varchar(48) field
    $salt  = substr($u->password,0,16);
    $crypt = $salt . md5($salt . $password);
    if ($crypt == $u->password) return true;
    return false;
}

public static function encryptPassword($password) {
    // salt is 16 random characters; there are any number of ways to make random strings
    // the password field is the MD5 hash of the salt + password
    // MD5 outputs are 32 characters in length, so 'password' is a varchar(48) field
    $salt  = substr(md5(mt_rand()),0,16);
    $crypt = $salt . md5($salt . $password);
    return $crypt;
}


Some closing notes here:
  • There are various ways to generate 16 random characters. At this level it doesn't really matter which technique you choose: the randomness of distribution of the choice of salts, is less important than the fact that "abcccbaaccbabcabapple" still isn't in any hashing database.
  • The MD5 hash has some known circumstances in which a collision (that is, a second word with the same hash) can occur. But it's incredibly farfetched with a password field, more for multi-kilobyte files. At this time, PHP doesn't have a sha1() function but does have a md5() function. When PHP does bring up a simple sha1() function,I'd recommend using it over MD5.


Session Cookie Encryption


This isn't program code at all, but is another example of why CodeIgniter rocks.

A known weakness of PHP's sessions, is that they are carried in cookies. If someone intercepts a cookie (say, sniffing your wifi) they can hack their browser's cookie jar to continue using YOUR login session indefinitely. If your program has a logout function, it would invalidate the meaning of the session, but frankly you can't rely on your users to log out reliably.

CodeIgniter encrypts session cookies. This uses a simple single-key encryption, coded into config.php when you first set up CodeIgniter. This means that nobody, including the authorized user of the session, can crack open the cookie and tinker with its contents.

This single fact in itself, is a huge gain to security, and a great reason to use CodeIgniter's session helper instead of PHP's naked $_SESSION system.

Wednesday, December 26, 2012

CodeIgniter + DataMapper = Concise & Tight Access Controls

 
My goodness, it's been five weeks since my last posting. Too busy programming, to stop and write about programming. I'm sure you understand.

The topic today, is about CodeIgniter and DataMapper ORM, and how they can be leveraged to create sophisticated and secure authentication models. This will be a lengthy posting, as it covers a lot of ground. It will also be in multiple parts, as I post additional enhancements such as why you shouldn't store literal passwords as I do in my example below...


Introducing CodeIgniter and DataMapper


CodeIgniter is a MVC framework for PHP.

MVC is Model-View-Controller, an object-oriented concept of separating out complex in-object logic ("for a given Customer, generate a list of past Orders") from the front-end logic ("this is the My Orders page, fetch that list of past Orders from their Customer profile") and from the front-facing HTML presentation ("<ul> (foreach order in orders, print <li>) </ul>"). If you're not programming your non-trivial apps in MVC, you probably should be...

DataMapper ORM is an Object Relational Mapper. That is, it's a library which provides an object-oriented approach to your database tables using models. This bypasses the need for writing out SQL longhand, providing you with techniques for selecting, creating, updating, and deleting, as well as performing more complex relational queries. The resulting code is very readable and extensible, and much less of an eyesore than longhand SQL.

CodeIgniter is one of several popular MVC frameworks (CakePHP, Zend Framework). It is my favorite, due to its simplicity, its fast loading and execution times, and its willingness to let me break the MVC rules when I need to, e.g. to  use PHP calculations in a view template, or to print from within a controller method.

And in both cases, the documentation is quite good, always a bonus...


Basic Example: A Login Function


Here is an excerpt of the Controller and Model logic which implement the login phase.

The /site/login URL (the Controller's name plus the name of the function) accepts POST data 'username' and 'password', uses the User model to check whether the user-and-pass matches, then sends them either to /management (the Management controller) or to /welcome (the Welcome controller) based on the success. As a side effect, it sets session information which is vital later.

The User model houses the static method attemptLogin() which accepts a username & password and returns true/false.
class Site extends CI_Controller {
    public function login() {
        // attempt login. The complexities are hidden within the model, where such logic belongs
        // if login fails, we bail instantly
        $user = User::attemptLogin(@$_POST['username'],@$_POST['password']);
        if (! $user) return redirect(site_url("welcome/"));}

        // guess it was good, save their session token and send them to the control panel
        $this->session->set_userdata('userid', $user->id);
        return redirect(site_url("management/"));
    }
}

class User extends DataMapper {
    var $table = 'users';
    var $default_order_by = array('firstname'');

    public static function attemptLogin($username,$password) {
        if (! $username) return null;
        if (! $password) return null;

        // fetch the user
        $u = new User();
        $u->where('username',$username)->where('password',$password)->get();
        return $u->id ? true : false;
    }
}

Some key points here:
  • The User model is a DataMapper subclass. Its one static method contains DataMapper logic: creating a new User handle, then querying "WHERE username=XXX and password=YYY" The determination of whether this login was in fact valid, is made by checking that the resulting recordset object has an "id" attribute in the current record. If there is no record with the given username and password, the resulting recordset would be empty and $u->id would be null.
  • Isn't that DataMapper code concise and readable? I am entirely comfortable with writing out SQL longhand, but if I can read out the logic almost like English, all the better!
  • The Site controller method shows how concise you can be with CodeIgniter. In the event of a login failure, the controller method (the "page" so to speak) issues a redirect to some other page (presumably the login or Forgot Your Password? page) and exits immediately.


So, our use case...


The use case here was a somewhat deep set of relations.

Organizations are a top-level entity. Organizations contain multiple Users, and a User belongs to exactly one Organization (one-to-many). Organizations have multiple Projects, and a Project belongs to exactly one Organization (one-to-many).Within an Organization, a User may only see details of a Project if they are on the Project's list (many-to-many), unless they are a Manager in which case they may see any Project within their same Organization. A Project has multiple Costs, multiple Actions, and multiple Comments. Actions have multiple Assessments and multiple Locations. A separate viewing mode on the front-facing site is available for each Project, with 3 access options: anybody in the same Organization as the Project, people in other Organizations, OR any non-logged-in user. The list goes on over an initial 15-page spec doc, a dozen wireframes and a few supplemental spec docs, and 8 hours of meetings, including lookup fields and various filtering needs...

And this is for a bunch organizations who aren't entirely comfortable with sharing their info with the central coordinating body in the first place, and where a data leak could be extremely bad (not "they know I live in Santa Cruz" bad, but "someone edited my report to indicate that I blew $20M on a project that went wrong, and I got fired" bad), so this stuff needs to be 100% secure. (which I always do anyway, of course)
Organization
        x Project
            x User
            x Cost
            x Action
                 x Assessment
                 x Location
            x Comment
                1 User
        x User            x Project

What followed was a set of DataMapper classes, with relations defined in them. Here are a few of the core classes, so you can see how one sets up the relations. It's so easy.
class User extends DataMapper {
    var $table = 'users';
    var $default_order_by = array('firstname');
    var $has_one = array('organization');
    var $has_many = array('project','comment');
}
class Project extends DataMapper {
    var $table = 'projects';
    var $default_order_by = array('title');
    var $has_one = array('organization');
    var $has_many = array('user','comment');
}
class Organization extends DataMapper {
    var $table = 'orgs';
    var $default_order_by = array('firstname');
    var $has_one = array();
    var $has_many = array('user','project');
}
class Comment extends DataMapper {
    var $table = 'replies';
    var $default_order_by = array('postedwhen');
    var $has_one = array('project','user');
    var $has_many = array();
}

There are back-end tables in the database, of course. There are the basic data tables: users, projects, orgs, replies. They all have a serial / primary id field. The comments table has a project_id field which is automagically detected by DataMapper and used to relate a Comment to a Project and vice versa. For the mapping of many Users to many Projects, and of Users to Orgs, there are standard join tables: users_organizations has two columns user_id and organization_id, and users_projects is similar.

An extra bonus: unlike Django, these models do not need to define the fields and their datatypes. DataMapper scans the table and uses whatever columns are present. We only need to define the metadata above, and the computer figures it out. (isn't that the computer's job?)

The only shortcoming here, is that you're on your own for creating the database tables. You can use whatever tool you like, though: command-line psql, MySQL Query Browser, phpXXAdmin. And in the case of the users-to-orgs and users-to-projects table, I had to run a failed query once so I could see the name of the table it couldn't find: organization_users, user_organizations, users_orgs, ... ? Still, that wasted about 2 minutes of my time, out of the 4 hours of reading and replicating the database as specified.

Tying it together: Controller Logic


Now that we have the relations defined, we can use DataMapper's genius to make some very readable queries, without tediously writing out subqueries.

// fetch all Users who have access to this Project (that is, who have a relation to it)
// Variant 1: build an array of User objects
$users = array();
$proj = new Project();
$proj->where('id',$PROJECT_ID)->get();
foreach ($proj->user as $u) $users[] = $u;

// fetch all Users who have access to this Project (that is, who have a relation to it)
// Variant 2: build a User recordset which could be passed through more filters by the caller
$proj = new Project();
$proj->where('id',$PROJECT_ID)->get();
$uids = array_map( create_function('$u','return $u->id;') , $proj->user );
$users = new User();
$users->where_in('id',$uids)->get();

// Fetch all Locations under my same Organization (remember: Org->Project->Action->Location)
// Note the use of project.organization_id instead of project.organization.id
// This is marginally faster as it doesn't access the organizations table at all
$locs = new Location();
$locs ->where_related('action/project','organization_id',$user->organization_id)->get();

// Fetch all Locations which are for a Project Action for which I am a listed User
$locs = new Location();
$locs ->where_related('action/project/user','id',$user->id)->get();
 
// fetch a Location by its ID#
// and make sure that it's for an Action for a Project for which I an a listed User
$locs = new Location();
$locs->where('id',$what->id)->where_related('action/project/user','id',$user->id)->get();




As you can see, a concept such as $costs->where_related('action/project/user','id',$user->id) is something you would use a lot when working with these sorts of relations. It's tedious but not terribly difficult to implement in SQL, ... but it's even less difficult when using DataMapper!


Tying it together: Controller Logic and Access Control


Okay, enough background. Finally time for what I promised: using DataMapper's supremely readable and flexible logic, to build super-high-security access controls.

This consists of 3 parts.
  • The User model has a wrapper function $user->hasEditingAccessTo($what) which returns true/false indicating whether they should be allowed to edit that object. The model is The Proper Place for such deeper logic, and this method accounts for the User's access level since Managers and non-Managers have different criteria.
  • The controller named Management (thus, all URLs under /management) includes as part of its constructor,the fetching of $this->user which would be the session's corresponding User.
  • The logic for any single controller method, then, can verify your access simply by checking $this->user->hasAccessTo() There's very little to copy-and-paste into every single method, and thus very little to mess up...

Snippet from the Management controller: the URL /project/123 would load up Project # 123 and print or process an editing form, but only if you're allowed.
// all functions in this controller, are only accessible if the session token is set and your access level is high enough
function __construct() {
    parent::__construct();
    // load the user's User account; largely used for validating access to a Project
    $userid = $this->session->userdata('userid');
    $this->user = new User();
    $this->user->where('id',$userid)->get();
    if (! $this->user->id) return redirect(site_url('site/login')); // should be impossible, but it can't possibly be good if it happens
         // check their access level, and make sure it's Manager or else bail
        if ($this->session->userdata('access_level') != ACCESSLEVEL_MANAGER) return redirect(site_url('site/login'));
}

public function project($id) {
    $project = new Project();
    $project->where('id',$id)->get();
    $ok = $this->user->hasEditingAccessTo($project);
    if (! $ok) return redirect(site_url('management/dashboard'));

    if (! @$_POST['save']) return $this->load->view('project_editing.phtml');
    // if we got here, it's a POSTed form so process it...   
}
The magic (rather, the brains) is in the User instance method hasEditingAccessTo() as follows:

class User extends DataMapper {
    var $table = 'users';
    var $default_order_by = array('firstname');
    var $has_one = array('organization');
    var $has_many = array('project','comment');

     // $this is a User object (this is the User model, right?)
    // Managers filter by $this->organization_id matching a $project->organization_id
    // Editors (non-managers) filter by $this->id being an associate of $project->user

    // Note a security practice here: We don't do "if manager, else" to determine whether
    // someone is a Manager, and presume that they're an Editor if not. Instead, use a switch
    // to account for Manager, Editor, and Other. This limits the possible impacts of any possible
    // upstream programming bug or failed security measure,
    // e.g. someone screwed up and used hasEditingAccessTo() without checking that the user is logged in.

    public function hasEditingAccessTo($what) {
        switch ($this->access_level) {
            case ACCESSLEVEL_MANAGER:
                return $this->_hasEditingAccessTo_manager($what);
                break;
            case ACCESSLEVEL_EDITOR:
                return $this->_hasEditingAccessTo_nonmanager($what);
                break;
            default:
                return false; // a level 0 (logged out) should never have even called this
        }
    }

    private function _hasEditingAccessTo_manager($what) {
        // a simple bail condition: the recordset doesn't have an ID#, the recordset is already blank,
        // probably because the ID# was tried at random, or maybe for a very-recently-deleted record
        if (! $what->id) return false;

        $ok = false;
        switch($what->model) {
            case 'user':
                $what->where('id',$what->id)->where('organization_id',$this->organization_id)->get();
                break;
            case 'project':
                $what->where('id',$what->id)->where('organization_id',$this->organization_id)->get();
                break;
            case 'action':
            case 'link':
                $what->where('id',$what->id)->where_related_project('organization_id',$this->organization_id)->get();
                break;
            case 'location':
                $what->where('id',$what->id)->where_related('action/project','organization_id',$this->organization_id)->get();
                break;
            case 'cost':
                $what->where('id',$what->id)->where_related('project','organization_id',$this->organization_id)->get();
                break;
            case 'assessment':
                $what->where('id',$what->id)->where_related('action/project','organization_id',$this->organization_id)->get();
                break;
        }

        // the yes/no status is simply whether there is still an ID# after all that filtering
        $ok = (boolean) $what->id;
        return $ok;
    }

    private function _hasEditingAccessTo_nonmanager($what) {
        // $what is a DataMapper recordset handle, e.g. a Project filtering by ID#
        // $this is a User object (this is the User model, right?) so $user->project is highly relevant here for finding associated Projects

        // a simple bail condition: the recordset doesn't have an ID#, the record doesn't even exist
        if (! $what->id) return false;

        $ok = false;
        switch($what->model) {
            case 'project':
                $what->where('id',$what->id)->where_related('user','id',$this->id)->get();
                break;
            case 'action':
            case 'link':
                $what->where('id',$what->id)->where_related('project/user','id',$this->id)->get();
                break;
            case 'location':
                $what->where('id',$what->id)->where_related('action/project/user','id',$this->id)->get();
                break;
            case 'cost':
                $what->where('id',$what->id)->where_related('project/user','id',$this->id)->get();
                break;
            case 'assessment':
                $what->where('id',$what->id)->where_related('action/project/user','id',$this->id)->get();
                break;
        }

        // the yes/no status is simply whether there is still an ID# after all that filtering
        $ok = (boolean) $what->id;
        return $ok;
    }

}





Conclusion


CodeIgniter is great for separating out a non-trivial app into its MVC components. There are other MVC frameworks out there, but I like CI best for its simplicity, its fairly low number of magic behaviors, and its willingness to let you break the MVC rules.

DataMapper is overkill for more trivial database-driven apps, and it's not the panacea for every single use case. But for a complex set of relations (there are 33 tables in all, more than described above) it really hits the spot, saving a lot of tedious SQL.

The combination of MVC and DataMapper, allows us to look up a session User and then do complex relational queries, most notably permission checks, with comparatively little code and with very few places for bugs to creep in.