Thursday, January 24, 2013

SODA V2 - There's a new API

In October, Socrata released a new version of the API, SODA V2.  It's a major change and a great improvement over V1.  As a result, all of the examples and discussion of the API in the previous posts are now out of date.  They should continue to work for some time but eventually the old API will be discontinued and the examples will no longer work.

I don't know when, or if, I'll have time to create a new set of examples with the new API.

You can read about the API and how to get started with it at the Socrata's, Getting Started, page.

Monday, September 26, 2011

Other open data sources

In addition to the data portals for the city of Chicago, Cook County, and state of Illinois, metro Chicago has a couple other places with open data:

  • CMAP - Chicago Metropolitan Agency for Planning has an API to access thousands of data sets.  Start at their Data API page to read about it.
  • Chicago Tribune - has worked with others to make U.S. Census data available.  You can find datasets for Chicago census data at the Investigative Reporters and Editors site.

Code and reference card are on github

All the code found used in the examples is now on github under snewell4/Metro Chicago Open Data Examples. In addition, a PDF of the views API reference card from Example #10 is included.

The github README explains it:

9/26/11
These files accompany the Metro Chicago Open Data Examples blog.  They are
simple examples of using the Socrata Open Data API (SODA) with the Socrata data
portals in use at City of Chicago and Cook County.

The blog has a series of examples each intended to explain the API and
different types of uses.  The blog entries are titled "Example #n".  The
filenames use the same naming convention.  You can read about the purpose
and the explanation of the code in the blog.

There are two exceptions to the naming convention:
1)  The blog entry for Example #20 describes three command line utilities.
    Those utilities are named getcolumns.py, getquery.py, and getview.py.
2)  Example #10 includes a one-page overview of the view API.  A copy of that
    overview has the filename, view-ref-card.pdf

Sunday, September 25, 2011

Example #23 - Creating a simple web app

Let's take the code from Example 19 and turn it into a web app.  This will be a trivial example that can be the starting point for building a full web application.

Example 19 used the command line to run a Python program.  One of the program's arguments was a street address.  The program ran and displayed a map in your browser of nearby rat sightings.  In this example, you can enter the street address into a form on a web page and the same sort of map will be displayed.

Here's what it looks like.  You type in the street address and click "Search".



The map is then displayed with the address you entered shown as the blue 'C' marker that denotes the center of the map and the rat sightings (ie., the 311 requests for rodent baitings) shown as red 'R' markers.



Django
I created this using the Django framework.  Django is a platform that supports Python web applications.  You can read about it at the Django project site.

The example runs on my laptop using the free Windows DjangoStack for Windows from BitNami.  This provides the web server, database, and django environment to build apps and test them on your own system.

For production use, you could take the code after it has been tested out on your system and move it to a web hosting provider that runs Django.  I have not done that.  I just wanted to get something simple running on Django.

There is a lot of information on the web about using Django.  The two resources that were most helpful to me were:
  1. Django Project's, Tutorial for writing your first Django app
  2. Django Project's, The Django Book, esp. Chapter 7 on forms

Installation
Follow BitNami's installation instructions.

I installed MySQL but database choice really doesn't matter.  The application does not use any tables. 

I called my project, "Rats."  I then called the only application for the project, "rats."  I didn't understand the naming convention.  It might have made more sense to call the project something like, "opendata," and the application, "rats."

After installing, the directory for the Rats project was:
C:\Documents and Settings\Administrator\BitNami DjangoStack projects\Rats

The contents of the Rats project directory:
09/23/2011  04:08 PM               517 manage.py
09/25/2011  06:19 PM    <DIR>          rats
09/25/2011  04:51 PM             5,586 settings.py
09/25/2011  04:51 PM             3,086 settings.pyc
09/25/2011  06:18 PM               199 urls.py
09/25/2011  06:19 PM               479 urls.pyc
09/23/2011  04:08 PM                 0 __init__.py
09/23/2011  05:32 PM               172 __init__.pyc


The directory for the rats application:
C:\Documents and Settings\Administrator\BitNami DjangoStack projects\Rats\rats

The contents of the rats application directory:
09/23/2011  08:31 PM                60 models.py
09/23/2011  08:34 PM               231 models.pyc
09/25/2011  05:47 PM             7,066 ratslib.py
09/25/2011  05:48 PM             3,815 ratslib.pyc
09/25/2011  05:19 PM    <DIR>          templates
09/23/2011  08:31 PM               399 tests.py
09/25/2011  06:17 PM               565 views.py
09/25/2011  06:17 PM             1,143 views.pyc
09/23/2011  08:31 PM                 0 __init__.py
09/23/2011  08:34 PM               177 __init__.pyc

The template under the rats directory has one html file:
09/25/2011  05:19 PM               274 search_form.html


Files
Following are the contents of the files for this application.  Details about what they do and how they fit together can be found in the two references listed above.  I'll provide a brief overview after listing the files.

settings.py
I made a few changes to ../Rats/settings.py
  1. Set the location for my urlconf file:
    ROOT_URLCONF = 'Rats.urls' 
  2. Added my app to the list of installed apps:
    INSTALLED_APPS = (
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.sites',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'rats',                 
  3. Added my template directory for my html file(s):
    TEMPLATE_DIRS = (
    DjangoStack projects\Rats\rats\templates',
    )
urls.py
from django.conf.urls.defaults import patterns, include, url

urlpatterns = patterns('',
                (r'^search/$', 'rats.views.search'),
                (r'^map/$',    'rats.views.map')


views.py
from django.http import HttpResponse, HttpResponseRedirect
from django.shortcuts import render_to_response
from ratslib import geocode, getRatLocs, createMapUrl

def search(request):
    return render_to_response('search_form.html')

def map(request):
    if 'q' in request.GET:
        err1, lat, lng = geocode( request.GET['q'], 'Chicago', 'IL')
        err2, locs = getRatLocs(lat, lng, '100')
        url = createMapUrl(lat, lng, locs, 20)
        return HttpResponseRedirect(url)
    else:
        return HttpResponse('You submitted an empty form.'


search_form.html
<html>
        <head>
            <title>Search for Rats</title>
        </head>
    <body>
        <form action="/map/" method="get">
            <input type="text" name="q">
            <input type="submit" value="Search">
        </form>
    </body>
</html>


Other files
models.py - No changes.  There are no models (database tables) needed for this application.
tests.py - No changes
manage.py - No changes. 
ratslib.py - this contains functions used in views.py.  It is a module holding the geocode(), getRatLocs(), and createMapUrl() functions from Example 19.  See that example to understand the code.  Because it's lengthy and not really germane to how to use Django, you'll find the contents of this file at the bottom of this entry.



Overview
So, how's this all work?
  1. When you start up Django from the Rats directory with python manage.py runserver, it gets the config settings from settings.py. Now, it knows there's a rats app (in the subdirectory, rats), where the urlconf is located, and where to find the html template files.
  2. Go to a browser and use the URL localhost:8000/search. Django finds out from urls.py to call the function search() in views.py.
  3. The search(request) function in views.py runs. Django goes to the template directory and opens search_form.html and sends it to the browser. This file has the HTML for the simplest query form you could have:  one text entry box and a submit button.  When search() finishes, the browser looks like the first screenshot above.  The user will enter a street address and click the Submit button.  The HTML generates a URL: localhost:8000/map, as a GET request. The street address, entered by the user, will be tacked on to the end of the URL in a name/value pair with the name of q  and the value being the street address.
  4. The Django server now gets the URL, localhost:8000/map/q?<street address>. Django looks in urls.py and finds it should call map(request) in views.py.
  5. The map(request) function in views.py runs. It checks that there is a 'q' in the URL string. If not, the user clicked Submit without entering a street address so give an error message. Now, use the code in the ratslib to do the hard work. First, pass in the address to geocode().  The address is the value from the GET request associated with the name of q.  After geocoding the street address, use the lat/lng to query for the rat locations, and build the URL for the static map.  Finally, call HttpResponseRedirect(url). By redirecting to that URL, Google will generate the map and paint it in the browser.
Comments
This example is just the basics for web application doing a query using SODA.  You can take this in a lot of different directions:
  • By changing getRatLocs(), you could query other views that have locations.  For example, you could search for graffiti, crime, or fallen tree limbs.  You'll need to set the right values for hostName, originalViewId, columnId, colNumLat, and colNumLng.
  • Use a Google dynamic map instead of a static map.  The dynamic map will let the user zoom in and out.
  • Use paging to bring back, say 20, rats at a time and place on the map.
  • Add an entry field or radio buttons on the form for the radius of the circle.  The user could enter the street address and enter or select the size of the circle to search.
The example left out considerations for security, performance, user experience, and best django coding techniques.  There's work needed in all these areas before making it a public app.


ratslib.py

import httplib
import json
import urllib
from urllib2 import Request, urlopen, URLError, HTTPError

#
#   geocode(street, city, state) returns the latitude and longitude of a
#       street address using Google's map API.  Note, there is a limit on
#       the number of geocode requests you can make each day to Google.
#
#   Args:
#       street - string - street address
#       city - string - city name
#       state - string - state name
#
#   Returns:
#       err - string - error information.  If empty, geocode was successful
#       lat - string - latitude of the street address
#       lng - string - longitude of the street address
#
def geocode(street, city, state):
    err = ""
    lat = ""
    lng = ""
    url =  "http://maps.googleapis.com/maps/api/geocode/json?address=" + \
        urllib.quote_plus(street) + ',' + \
        urllib.quote_plus(city)   + ',' + \
        urllib.quote_plus(state)  + \
        "&sensor=false"
    req = Request(url)
    try:
    u = urlopen(req)
    except URLError, e:
        if hasattr(e, 'reason'):
            err = e.reason
    elif hasattr(e, 'code'):
            err = e.code
    else:
        response = json.load(u)
        # Check that Google sent back valid data.  If so, get lat and long.
        if response['status'] == 'OK':
            lat = response['results'][0]['geometry']['location']['lat']
            lng = response['results'][0]['geometry']['location']['lng']
        # otherwise, Google returned an error
        else:
            err = 'Google error code: %s\n' % response['status']
    return err, str(lat), str(lng)

#
#   getRatLocs(lat, lng, rad) returns a list of the locations of rats
#       from the Chicago 311 service requests for rodent baiting.  The
#       locations will be within the circle that has a center point at
#       lat/lon and a radius of rad meters.
#
#   Args:
#       lat - string - latitude of the center point of the circle
#       lng - string - longitude of the center point of the circle
#       rad - string - radius, in meters, of the circle
#
#   Returns:
#       err - string - error information.  If empty, geocode was successful
#       locs - list - each item being a sublist containing two elements:
#           latitude and longitude that represents the location of a rat
#           baiting request.
#
def getRatLocs(lat, lng, rad):
    # parameters used in the SODA POST request to do the search
    hostName   = "data.cityofchicago.org"
    service    = "/views/INLINE/rows"
    formatType = "json"
    parameters = "method=index"
    headers    = { "Content-type:" : "application/json" }
    # SODA inline query.  Lat, Lng, Radius are set to 0 here.
    query = {
        "originalViewId": "97t6-zrhs",
        "name": "Nearby rats",
        "query": {
            "filterCondition": {
                "type": "operator",
                "value": "within_circle",
                "children":
                    [
                        {
                            "type": "column",
                            "columnId": 2849547
                        },
                        {
                            "type": "literal",
                            "value": 0
                        },
                        {
                            "type": "literal",
                            "value":  0
                        },
                        {
                            "type": "literal",
                            "value": 0
                        }
                    ]
                }
            }
         }
    # constants used to index into inline filter children[]
    queryLat = 1
    queryLng = 2
    queryRad = 3
    # constants for table column numbers in query return data
    colNumLat = 22
    colNumLng = 23
    # constants used to index into locs
    locLat = 0
    locLng = 1

    # initialize return variables to be empty
    err = ''
    locs = list()

    # put lat, lng, radius into the inline query
    query["query"]["filterCondition"]["children"][queryLat]["value"] = lat
    query["query"]["filterCondition"]["children"][queryLng]["value"] = lng
    query["query"]["filterCondition"]["children"][queryRad]["value"] = rad
    # setup and send the inline query
    jsonQuery = json.dumps(query)
    request = service + '.' + formatType + '?' + parameters
    conn = httplib.HTTPConnection(hostName)
    conn.request("POST", request, jsonQuery, headers)
    response = conn.getresponse()
    # check for good response, pull data out of response and setup locs
    if response.reason != 'OK':
        err = "%s %s" % (response.status, response.reason)
    else:
        rawResponse = response.read()
        jsonResponse = json.loads(rawResponse)
        for rowData in jsonResponse['data']:
            locs.append([rowData[colNumLat], rowData[colNumLng]])
    return err, locs


#
#   createMapUrl(centerLat, centerLng, locs) returns the URL for a Google
#       static map that has a marker for the center of the map and markers
#       for the locations in locs.
#
#   Args:
#       centerLat - string - latitude of the center point of the map
#       centerLng - string - longitude of the center point of the map
#       locs - list - where each item is a location to be marked on the map.
#           Each item is a sublist with two strings: latitude and longitude
#       n - integer - the first n locations in locs will be marked on the
#           map.  There is an upper limit on the size of the URL sent to
#           the Google Map API.  A long list of locations in the URL will
#           not be accepted.  n keeps the URL within the limit.
#
#   Returns:
#       URL - string - URL to send to the Google Map API to create a static
#           map in the default browser.
#
def createMapUrl(centerLat, centerLng, locs, n):
    # define constants for indexing into the locs sublist
    locLat = 0
    locLng = 1
    # define parameters for the map
    urlMapAPI = "http://maps.googleapis.com/maps/api/staticmap"
    mapZoomLevel = "14"
    mapHeight = "512"
    mapWidth = "512"
    mapType = "roadmap"
    centerMarkerColor = "blue"
    centerMarkerLabel = "C"
    locMarkerColor = "red"
    locMarkerLabel = "R"

    url = \
        urlMapAPI + '?' + \
        'center=' + centerLat + ',' + centerLng + \
        '&' + \
        'size=' + mapHeight + 'x' + mapWidth + \
        '&' + \
        'maptype=' + mapType + \
        '&' + \
        'sensor=false' + \
        '&' + \
        'markers=color:' + centerMarkerColor + '%7C' + \
        'label:' + centerMarkerLabel + '%7C' + centerLat + ',' + centerLng

    i = 0
    for loc in locs:
        if i < n:
            url += '&' + \
                'markers=color:' + locMarkerColor + '%7C' + \
                'label:' + locMarkerLabel + '%7C' + \
                loc[locLat] + ',' + loc[locLng]
        i +=1
    return url


Wednesday, September 7, 2011

Example #22 - Receiving data via paging

For a couple of the API requests that return large amounts of data, you can receive the data in pages.  Instead of waiting for and then receiving one very long stream of output, you can ask for and receive consecutive chunks - pages.

Here are two examples:  receiving the metadata for all views at a site and receiving all rows in a view.

Paging the return of all views
This uses the /api/views API service with two parameters.  page is the number of the page to retrieve.   limit is the number of views to return; the size of the page. The code prints out the names of all the views currently available on the Cook County site.  Note the first page number is 1, not 0.

Code
from urllib2 import urlopen
from json import load

hostname = "datacatalog.cookcountyil.gov"

pageSize = '20'
pageNum = 1
while True:
    url = "http://%s/api/views.json?page=%s&limit=%s" % (hostname, pageNum, pageSize)
    response = urlopen(url)
    views = load(response)
    numViewsRcvd = len(views)
    if numViewsRcvd == 0:
        break
    print '\n======  PAGE %s - %s VIEWS DOWNLOADED:  ======' % (pageNum, numViewsRcvd)
    for view in views:
        print view['name']
    pageNum += 1

Output

Paging the return of all rows in a table
This example uses the Chicago Ward Office table.  The code prints the contents for the Ward Number column for each of the rows.  Note that the first row is 0.

Code
from urllib2 import urlopen
from json import load

hostname = "data.cityofchicago.org"
viewID = "htai-wnw4"
startRow = 0
numRows = 20

while True:
    url = "http://%s/api/views/%s/rows.json?method=getRows&start=%s&length=%s" % (hostname, viewID, str(startRow), str(numRows))
    response = urlopen(url)
    rows = load(response)
    numRowsRcvd = len(rows)
    if numRowsRcvd == 0:
        break
    print '\n====== %s ROWS DOWNLOADED:  ======' % (numRowsRcvd)
    for row in rows:
        print row['2609304']      # rowID for ward number
    startRow += numRow

Output

Summary
Receiving your data in pages might make it easier for your app.  You may want to show, say, 20 results at a time.  Or, you may want to show the first 20 while you are getting the rest of the results.

For getting all the views in a site, you need to use paging if you there are more than 200.  The default is to receive 50 views.    You can use a parameter, XXX, to get 200 in one call.  Beyond that, you have to page to get all the views.

Tuesday, September 6, 2011

Example #21 - Operators for inline filters

I couldn't find a complete list on the SODA site of the operators available for use in an inline filter.  Socrata kindly provide one.

All types:
EQUALS
NOT_EQUALS
IS_BLANK
IS_NOT_BLANK

Textual types:
STARTS_WITH
CONTAINS
NOT_CONTAINS

Numeric and date types:
LESS_THAN
LESS_THAN_OR_EQUALS
GREATER_THAN
GREATER_THAN_OR_EQUALS
BETWEEN

Location:
WITHIN_CIRCLE

All operators are binary except for is/isnot blank, which are unary, and within circle, which is tri-ary.

(Thanks Clint!)

Sunday, September 4, 2011

Example #20 - Three utilities for building inline filters

Along the way, I put together three short scripts that help in finding the details to build the inline filters.

getcolumns.py - prints the columnNames and columnIDs from a view
getquery.py - prints the query value for a filtered view
getview.py - prints the metadata for a view

They've been helpful for quickly getting the behind-the-scenes information from one of the portals.

They share the same command syntax:
python <program> [-cook | -chicago | - il | <hostname>] <viewID>

Where:
   -cook opens the Cook County Data Portal
   -chicago opens the City of Chicago Data Portal
   -ill opens the State of Illinois Data Portal
   <hostname> - allows the user to specify another Socrata site
   <viewID> - is the 9 character ID of the view to be used.  The format is cccc-cccc

Examples
Here is one example of each utility.

getcolumns
python getcolumns.py -cook e9qr-rmq4
Prints the IDs and names for all columns in the Cook County view for the DEC2010 Check Register.  This helps in getting the columnID to put in your own filter.


getquery

python getquery.py -cook 2wek-2jap
Prints the filter query Socrata generated for Example #15 where I used the web UI to search the DEC2010 Check Register for payments between $1m and $10m for construction services (which is a product code of 912).  You can create a filter using your browser, use this to see it, and use it to guide you in writing one in your code.


getview
python getview.py opendata.socrata.com n5m4-mism
Prints the metadata for the White House Nominations and Appointments dataset.  Direct the output to a file and use a text editor to browse through it to peruse a view.  The screenshot shows the head of the output.  There's much more that is printed.



Code
getcolumns
#
#  getcolumns - Prints the column IDs along with the column names for a view.
#
#  args:      <hostname> <viewID>
#             If <hostname> is "-cook", use the Cook County Data Portal host name
#             If <hostname> is "-chicago", use the City of Chicago Data Portal host name
#             If <hostname> is "-ill", use the State of Illinois Data Portal host name
#             Other <hosthame> is the host name for the Socrata site.  Do not include "http://"
#             <viewID> is the 9 character Socrata view (aka dataset) ID.  Format is:  "cccc-cccc"
#
#  output:    Lists the view's columnID and column name for all columns; one per line.
#             The column ID is the value for the 'id' key. 
#
import sys
from urllib2 import urlopen, URLError, HTTPError
from json import load
import pprint


hostNameChicago = "data.cityofchicago.org"
hostNameCook = "datacatalog.cookcountyil.gov"
hostNameIll = "data.illinois.gov"

if len(sys.argv) !=3:
    sys.stderr.write("Usage: python %s [-chicago | -cook | -ill | <hostname>] <viewID>\n" % sys.argv[0])
    raise SystemExit(1)

if sys.argv[1] == "-chicago":
    hostName = hostNameChicago
elif sys.argv[1] == "-cook":
    hostName = hostNameCook
elif sys.argv[1] == "-ill":
    hostName = hostNameIll
else:
    hostName = sys.argv[1]

viewID = sys.argv[2]
url = "http://%s/api/views/%s/columns.json" % (hostName, viewID)

try:
    u = urlopen(url)
except HTTPError, e:
   print "The server at %s could not handle the request." % url
   print "Error code: ", e.code
except URLError, e:
   print "We failed to reach the server at %s." % url
   print "Reason: ", e.reason
else:
    response = load(u)

    i = 0
    while i < len(response):
        print response[i]['id'], ' : ', response[i]['name']
        i += 1



getquery
#
#  getquery - Prints the query resource in a filtered view
#             This is used to quickly create the query document used in an INLINE filter API request.
#
#  args:      <hostname> <viewID>
#             If <hostname> is "-cook", use the Cook County Data Portal host name
#             If <hostname> is "-chicago", use the City of Chicago Data Portal host name
#             If <hostname> is "-ill", use the State of Illinois Data Portal host name
#             Other <hosthame> is the host name for the Socrata site.  Do not include "http://"
#             <viewID> is the 9 character Socrata view (aka dataset) ID.  Format is:  "cccc-cccc"
#
#  output:    The 'query' portion of the view metadata.  Unicode indicators are stripped off the strings.
#             The output is intended to be ready to be used in a program that is going to call the Socrata
#             views service API with an INLINE filter request.
#
import sys
from urllib2 import urlopen, URLError, HTTPError
from json import load
import pprint


hostNameChicago = "data.cityofchicago.org"
hostNameCook = "datacatalog.cookcountyil.gov"
hostNameIll = "data.illinois.gov"

if len(sys.argv) !=3:
    sys.stderr.write("Usage: python %s [-chicago | -cook | -ill | <hostname>] <viewID>\n" % sys.argv[0])
    raise SystemExit(1)

if sys.argv[1] == "-chicago":
    hostName = hostNameChicago
elif sys.argv[1] == "-cook":
    hostName = hostNameCook
elif sys.argv[1] == "-ill":
    hostName = hostNameIll
else:
    hostName = sys.argv[1]

viewID = sys.argv[2]
url = "http://%s/api/views/%s/rows.json" % (hostName, viewID)

try:
   u = urlopen(url)
except HTTPError, e:
   print "The server at %s could not handle the request." % url
   print "Error code: ", e.code
except URLError, e:
   print "We failed to reach the server at %s." % url
   print "Reason: ", e.reason
else:
    response = load(u)

    pp = pprint.PrettyPrinter(indent=3)
    pp.pprint(response['meta']['view']['query'])


getview
#
#  getview - Prints the metadata for a view
#
#  args:     <hostname> <viewID>
#            If <hostname> is "-cook", use the Cook County Data Portal host name
#            If <hostname> is "-chicago", use the City of Chicago Data Portal host name
#            If <hostname> is "-ill", use the State of Illinois Data Portal host name
#            Other <hosthame> is the host name for the Socrata site.  Do not include "http://"
#            <viewID> is the 9 character Socrata view (aka dataset) ID.  Format is:  "cccc-cccc"
#
#  output:   An indented print the response from /api/views/<viewID>.
#
import sys
from urllib2 import urlopen, URLError, HTTPError
from json import load
import pprint


hostNameChicago = "data.cityofchicago.org"
hostNameCook = "datacatalog.cookcountyil.gov"
hostNameIll = "data.illinois.gov"

if len(sys.argv) != 3:
    sys.stderr.write("Usage: python %s [-chicago | -cook | -ill | <hostname>] <viewID>\n" % sys.argv[0])
    raise SystemExit(1)

if sys.argv[1] == "-chicago":
    hostName = hostNameChicago
elif sys.argv[1] == "-cook":
    hostName = hostNameCook
elif sys.argv[1] == "-ill":
    hostName = hostNameIll
else:
    hostName = sys.argv[1]

viewID = sys.argv[2]
url = "http://%s/api/views/%s.json" % (hostName, viewID)

try:
   u = urlopen(url)
except HTTPError, e:
   print "The server at %s could not handle the request." % url
   print "Error code: ", e.code
except URLError, e:
   print "We failed to reach the server at %s." % url
   print "Reason: ", e.reason
else:
    response = load(u)
   
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(response)