Wednesday, August 31, 2011

Example #15 - Complex query using a filtered view

So far, I've shown examples of querying for a word or phrase in name field or the description field.  Most apps will need more complicated queries.

Socrata has a page dedicated to this topic.  It's the basis for this example and the next example.

There's a new term involved - filtered.  In the Socrata web browser interface where you are using a data portal you can create a filter by specifying which rows to include based on their values, summing values in rows, and sorting rows.  It's the way to do the operations to build or find the rows that meet your needs.

There's a simple way to do a complex query.  Socrata calls it piggybacking on an existing filtered view.  Use your web browser, go to the portal, build your query, save it as a view, and in your code you specify the viewID for that query.  The result: your code calls the stored filter.  That filter is just another view.  The advantage:  it is quick and you already know how to write the code to request a view.  There are a couple of disadvantages:  1) It's fixed.  Your app can't modify the query.  2) Your code is in two places.  Your query is stored at the portal and your app calls that query.  You might be maintaining things in two places instead of
one.

Create your filter
We'll use the Chkregist DEC2010 data set on the Cook County Data Portal.  Here's a screenshot of it.  You can find the viewID in the URL in the URL bar at the top of the web browser.  It is e9qr-rmq4.



Let's find the checks for construction services between $1m and $10m.  This screenshot shows the filter parameters on the right.  I'm asking for all rows that have a product description of 912 (this means construction services) and an amount between $1m and $10m.  After creating it, I saved the filtered view as Construction over 1m.  You can see the viewID Socrata assigned to my filtered view in the URL in the URL bar at the top of the web browser.  It is 2wek-2jap.


Use API to retrieve filtered results
Now that the filtered view is in place you can access it from a program.  Here's the code to get all the rows from the filtered view, load the JSON into an array, and print the array.  This should look familiar.

from urllib2 import urlopen
from json import load
import pprint

url = 'http://datacatalog.cookcountyil.gov/api/views/2wek-2jap/rows.json'
u = urlopen(url)
response = load(u)

pp = pprint.PrettyPrinter(indent=4)
pp.pprint(response)


You get back the data in the rows along with the metadata for columns.   I ran it by typing python example15a.py > out.txt and then looked at out.txt in a text editor.


The value for 'data' is a list with two elements, one for each row in our filtered view.  Each of those elements, in turn, has a list for the values of the columns in the row.  As you'll see next, the first eight are metadata values.  There are eight hidden columns each having metadata for a row.

After the 'data' list, is 'meta' which contains metadata for the view.  'view' is a dictionary and one of its keys is 'columns' with a value of a list with elements for each column.  Each of those list elements has the metadata for the column.  We care about a couple of those metadata elements:  1) 'id' - if it is -1 then this is a metadata column; and 2) 'name' - the name of the column.  Translation: we now know the names of each column in the view.  We also know which of these columns are hidden from the web browser UI.  Turns out the first eight columns are hidden.

Program to get filtered view column names and data
We can use our understanding of the API response  to create a small program, example15b.py, to load up the JSON data for the filtered view and print out the column name along with its value for each row.  Here's the code:


from urllib2 import urlopen
from json import load


url = 'http://datacatalog.cookcountyil.gov/api/views/2wek-2jap/rows.json'
u = urlopen(url)
response = load(u)


rowNum = 1
for rowData in response['data']:
   print "\n===== Row %d ====" % (rowNum)
   colNum = 0
   for columnMeta in response['meta']['view']['columns']:
      if columnMeta['id'] > 0:
         print columnMeta['name'], ' = ', rowData[colNum]
      colNum += 1
   rowNum += 1

The outer loop iterates through the two rows returned in 'data'.  For each of those rows, the inner loop iterates through each column's metadata.  If a column has a negative value for 'id', then it's metadata and we won't print it.  (Note, if you want to see the hidden metadata and its value for each column, just comment out that if statement.)  Otherwise, print the name of the column along with it's corresponding value from the 'data' array.  Here's the output.


This program lays the groundwork for you to create a program that could explore the column names and, on the fly, give the user column names and values.  You have both the column metadata and the row data.

Program to get just the filtered data
If the above is too much bother and you just want to get your hands on the data values.  You can ask the API to send back just the data.  Your code would then do things like hard code that the 11th element of the 'data'list is the Company Name.  By specifying meta=false parameter on your API request, the views service will return just the data. Here's the code:

from urllib2 import urlopen
from json import load
import pprint


url = 'http://datacatalog.cookcountyil.gov/api/views/2wek-2jap/rows.json?meta=false'
u = urlopen(url)
response = load(u)


pp = pprint.PrettyPrinter(indent=4)
pp.pprint(response)

Here's the output of just the data.



Summary
This is one approach to running a complex query.  You can use the web browser UI to create your filtered view and save it.  You can write a simple program to get back the data.  You can write a more complicated program to plow through the metadata and data to respond to the user's needs.

With a filtered view, you can't change the specifics of the query.  If I wanted to find construction services over $10,000 or if I wanted to find consulting services over $1m, I'd have to create a new filter, get it's viewID, and put it in the code.

The next example shows how to do a query without a filtered view.  It gives you the ability to create a query or change the parameters of a query in your code.

Example #14 - Error handling

For a real app, you'll need to do some error checking and handling on your API requests.  This example shows one approach.

You need to check that the urlopen() call worked.  If it didn't, you either couldn't reach the web site or you had a bad service request.  Reasons you might not reach the web site include:  you misspelled the data portal URL, the portal is down, or you aren't connected to the network.  A bad service request would typically be you misspelled it or (rare but I suppose it could happen) that service is down.

We'll use Python's exception for the urllib2 urlopen() function.  urlopen() sends back errors by raising exceptions and providing us some details to start the debugging.  You'll see we will try the urlopen() and check for two different exceptions:  HTTPError and URLError.

HTTPError tells us something is wrong with what we sent the server.  The server got the request but had problems with it.  You can find details on the various error codes about halfway down on this Python reference page.  There's a table of the values and a link to more info.

URLError tells us something is wrong with the URL we are using.  The server didn't get the request.  It didn't get that far.

Example Code
Four snippets are provided:  the first without error and the next three have different sorts of errors in the API request.

from urllib2 import urlopen, URLError, HTTPError


# 1 - Good URL and service name
print '\nHere\'s what happens with a good URL.'
url = 'http://data.cityofchicago.org/api/views.json'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK'


# 2 - Bad URL
print '\nHere\'s what happens with a bad URL.'
url = 'http://XYZ.cityofchicago.org/api/views.json'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK'


# 3 - Bad service name
print '\nHere\'s what happens with a good URL but bad API service name.'
url = 'http://data.cityofchicago.org/api/XYZ.json'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK'


# 4 - Bad parameter
print '\nHere\'s what happens with a good URL and good API service name but bad parameter.'
url = 'http://data.cityofchicago.org/api/views.json?XYZ=2'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK'

Explanations of each snippet
For each of the four snippets, here's their few lines of code, commentary, and output.

Good URL
# 1 - Good URL and service name
print '\nHere\'s what happens with a good URL.'
url = 'http://data.cityofchicago.org/api/views.json'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK'
This is the normal case.  You've seen this URL several times before.  We're requesting all the views from the city of Chicago data portal.  We try to open the URL and since the output shows, OK, we skipped the exceptions - so, no errors.


Bad URL
# 2 - Bad URL
print '\nHere\'s what happens with a bad URL.'
url = 'http://XYZ.cityofchicago.org/api/views.json'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK'
Here, the portal URL is wrong.  There is no web site at XYZ.cityofchicago.org.  Since this is a garbage URL, we end up with a URLError exception and we print out our own error message followed by the reason code from the Python urllib2 module.


Bad service name
# 3 - Bad service name
print '\nHere\'s what happens with a good URL but bad API service name.'
url = 'http://data.cityofchicago.org/api/XYZ.json'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK
The URL is correct but the API service name is wrong.  There is no service called /api/XYZ.  When this runs, the Chicago portal gets the request.  However, it balks because it doesn't know what to do for /api/XYZ.  We have an HTTPError.  The server doesn't know what to do with the HTTP we sent.  We print out our error message followed by the error code from the Python urllib2 module.


Bad parameter
# 4 - Bad parameter
print '\nHere\'s what happens with a good URL and good API service name but bad parameter.'
url = 'http://data.cityofchicago.org/api/views.json?XYZ=2'
try:
   u = urlopen(url)
except HTTPError, e:
   print 'The server could not handle the request.'
   print 'Error code: ', e.code
except URLError, e:
   print 'We failed to reach a server.'
   print 'Reason: ', e.reason
else:
   print 'OK'
Last thing to check - what happens when you send a bad parameter in an API request?  Here, we send /api/views a name/value parameter of XYZ =2. XYZ is bogus.  As the output shows, this does not cause any exception.  The URL is good.  The service name is good.  As far as urllib2.urlopen() is concerned everything worked properly.  The service would need to send back an error message to us complaining about an unknown name.  The API just ignores the bad parameter.  (Want to see?  Add one more line to the code, print u.read(), and you will see in the output there's no indication of an error.)

Summary
You've seen how to write some error checking code and how the different sorts of errors are reported.  For a production app, this error checking is just a start.  You'd need to be more thorough and helpful to your users.   In addition, note that nothing will tell you about a bad parameter being sent to /api/views.

Example #13 - Simple queries using GET

Going a bit deeper into the API.  Let's look at code a few simple queries.  Some of this will look familiar.  We'll combine some of the queries we did in Example #10 with the code from Example #12 to create code snippets that could be used by a simple app.

The SODA API is a RESTful API.  You can read more about that on Wikipedia or one of many simple tutorials.

The SODA reference page for the Views API shows that all uses of this service are done with a GET.  When we open a URL without any parameters, that is a GET.  This is what we've done in all examples so far and will do in this example.

This talk about REST and GET is just background.  You may want to dig deeper and if you do, you'll start to care about these terms.

So, on with the example.  We're going to progressively add parameters to do different queries and show how the returned information differs.  The output from each request shows the number of views and the name of the view.

Example Code
A few overall comments:
  • I'm using a different style of importing.  When you import from a module, you don't have to put in the name of the module when you use a function.  In past examples, I would import urllib2 and then call urllib2.urlopen().  This example uses from urllib2 import urlopen and then calls urlopen().  It makes the code look a little cleaner.
  • If you are wondering about the actions of the code, look back at Example 12 and your favorite Python book or website.

from urllib2 import urlopen
from json import load


# 1 - Ask for all views
url = 'http://data.cityofchicago.org/api/views.json'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for all views.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']


# 2 - Ask for views with 'police' in the name
url = 'http://data.cityofchicago.org/api/views.json?name=police'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for \"police\" in name.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']


# 3 - Ask for views with 'police station' in the name
url = 'http://data.cityofchicago.org/api/views.json?name=police+station'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for \"police station\" in name.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']


# 4 - Ask for views with 'police' in the name but set a limit of just returning two views
url = 'http://data.cityofchicago.org/api/views.json?name=police&limit=2'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for \"police\" in name and limit to two views.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']

Explanations of each snippet
For each of the four snippets, here's the few lines of code, an explanation, and the portion of the output generated by that snippet.

API request with no parameters
# 1 - Ask for all views
url = 'http://data.cityofchicago.org/api/views.json'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for all views.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']
This is a straightforward request from the views service to return all the views (aka datasets) in the city of Chicago data portal.  The output shows we get back the default of 50 views and lists their names.


API request with one parameter
# 2 - Ask for views with 'police' in the name
url = 'http://data.cityofchicago.org/api/views.json?name=police'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for \"police\" in name.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']
This shows how to add one parameter by adding on the ? along with a name/value pair.  This time we get back the eight views that have the word police in their name.


API request with a multi-word parameter
# 3 - Ask for views with 'police station' in the name
url = 'http://data.cityofchicago.org/api/views.json?name=police+station'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for \"police station\" in name.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']
The + character is treated as a space.  This is a request to search for police station in the name.  We get back three datasets.


API request with multiple parameters
# 4 - Ask for views with 'police' in the name but set a limit of just returning two views
url = 'http://data.cityofchicago.org/api/views.json?name=police&limit=2'
u = urlopen(url)
views = load(u)
numViews = len(views)
print '\nAsk for \"police\" in name and limit to two views.  Number of views returned = %s' % numViews
for view in views:
   print '   ', view['name']
Using the & character between name/value pairs, you can string together multiple parameters for your service request.  Here we ask for the views with police in the name but with a limit of returning just two views.  Instead of getting the eight views that have police in the name, we get just the first two.


Generalizing
You can use this type of code to make all the requests detailed on the reference card in Example #10.  Just plug in the URL you wish to use.  You can use the ? to string together name/value parameters shown on the reference card.  If any of your parameters have multiple words in the value, use the +.

Friday, August 26, 2011

Example #12 - Pull data out of an API response

Let's take a look at how you might access specific values from the JSON-formatted response from an API request.

To get some data to play around with, we'll send one API request for metadata for the views in the Chicago data portal.  The response, in JSON, is a set of metadata structured as a list.  This is a SODA View[].  Each element of the list contains the metadata for one view.  That metadata (a SODA View), in turn, is represented as a number of key-value pairs.  Some of the values are themselves a set of nested key-value pairs.  You can look at the output from the snippet #4 to see the metadata for one view.

The example code does eight actions to illustrate some of the ways to access and use the data returned from an API request.  You can generalize these to accessing data from other API requests.

Example Code

#
# Example12.py - Shows code to access data in an API response
#
import json
import urllib2
import pprint
import time

# Get metadata for views in the Chicago data portal
fileHandle = urllib2.urlopen("http://data.cityofchicago.org/api/views.json")
views = json.load(fileHandle)

# 1 - Get the number of views that were returned
numViews = len(views)
print '\nNUMBER OF VIEWS RETURNED = %s\n' % numViews

# 2 - Access one of the metadata values.  Get the name of the first view.
viewName = views[0]['name']
print 'NAME OF FIRST VIEW = %s\n' % viewName

# 3 - Iterate through the views and print the name of each view
print 'NAMES OF ALL %s VIEWS DOWNLOADED:' % numViews
for view in views:
print view['name']

# 4 - Print all metadata for first view, both the keys and values, including all nested pairs 
print '\nFORMATTED DUMP OF THE FIRST VIEW SHOWING ALL KEYS AND VALUES:'
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(views[0])

# 5 - Iterate thru first view and print the keys.  Does not print nested keys
print '\nHIGH-LEVEL FIELDS IN A VIEW:'
for key in views[0].iterkeys():
print key

# 6 - Access a nested key-value pair.  Print the name of the view owner.
ownerName = views[0]['owner']['displayName']
print '\nNAME OF THE OWNER OF THE FIRST VIEW %s\n' % ownerName

# 7 - Get and print a time value.  It's in seconds since the epoch.  
secsOwnerProfileModified =  views[0]['owner']['profileLastModified']
print 'OWNER OF FIRST VIEW LAST MODIFIED HIS/HER USER PROFILE %s SECONDS SINCE JANUARY 1, 1970\n' % secsOwnerProfileModified

# 8 - Now print the time value in local timezone.
localtimeOwnerProfileModified = time.ctime(secsOwnerProfileModified)
print 'OWNER LAST MODIFIED HIS/HER USER PROFILE ON %s\n' %localtimeOwnerProfileModified

Explanations of each snippet
By the numbers, you'll see the few lines of code, an explanation, and the portion of the output generated by this section of the code.

Convert JSON and load an array
# Get metadata for views in the Chicago data portal
fileHandle = urllib2.urlopen("http://data.cityofchicago.org/api/views.json")
views = json.load(fileHandle)
To  start, get that array of metadata of views.  First, make the API request to the city of Chicago data portal and ask for the metadata for all views.  Second, use json.load() to take data coming back from the API, parse the JSON, and place the elements the views list. The remaining eight numbered snippets work with that list.

Get the number of views returned
# 1 - Get the number of views that were returned
numViews = len(views)
print '\nNUMBER OF VIEWS RETURNED = %s\n' % numViews
How many individual views were sent back?  We can get the length of the list to find out.  The result shows there are 50 elements.  Hey, first discovery - with no parameters, the API defaults to return 50 of the 603 currently available views



Get a value from one view
# 2 - Access one of the metadata values.  Get the name of the first view.
viewName = views[0]['name']
print 'NAME OF FIRST VIEW = %s\n' % viewName
This may be something you'll be doing for an app.  You can pull out the individual strings that are the values for different pieces of the metadata.  Here, we are asking for the value associated with the key called name for the zeroeth element of the views list.  Translation:  get the name of the first view.


Iterate through all of the views
# 3 - Iterate through the views and print the name of each view
print 'NAMES OF ALL %s VIEWS DOWNLOADED:' % numViews
for view in views:
    print view['name']
Here's how you can step through the list of views and get a value for the name key in each view.  The output is a nice summary of the views from the portal.


Print all metadata for a view
# 4 - Print all metadata for first view, both the keys and values, including all nested pairs
print '\nFORMATTED DUMP OF THE FIRST VIEW SHOWING ALL KEYS AND VALUES:'
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(views[0])
Want the details of the metadata for a view?  Here's everything we have for the first view.  pprint outputs each key-value pair on its own line.  The nested pairs are indented four spaces.  You can use this for other API responses to see everything that you receive in a layout that is somewhat easy to read.  The u'....' means ... is a unicode string.


Iterate through one view
# 5 - Iterate thru first view and print the keys.  Does not print nested keys
print '\nHIGH-LEVEL FIELDS IN A VIEW:'
for key in views[0].iterkeys():
    print key
We iterated through all views and printed one value from each view.  Now, let's go inside one view and iterate through it.  We'll step through all the key-value pairs and just print the key.  We aren't going deeper into the nested key-value pairs.  For example, the key owner has as it's value a whole set of key-value pairs.  We'll just print out owner and move on.  The result is a nice list of all the data elements in the View data type.  Note that it differs from the reference material on the SODA site.  That documentation does not match the API's operation.


Get a nested key-value pair
# 6 - Access a nested key-value pair.  Print the name of the view owner.
ownerName = views[0]['owner']['displayName']
print '\nNAME OF THE OWNER OF THE FIRST VIEW %s\n' % ownerName
Now, let's dig into one of those nested dictionaries.  The owner key has a value that consists of nine key-value pairs.  Here's how you can get one of those.  The code goes to the first view and then to the owner key and then the displayName key of within the owner key.  Translation:  what's the name of the owner of the first view?


Get a time value
# 7 - Get and print a time value.  It's in seconds since the epoch. 
secsOwnerProfileModified =  views[0]['owner']['profileLastModified']
print 'OWNER OF FIRST VIEW LAST MODIFIED HIS/HER USER PROFILE %s SECONDS SINCE JANUARY 1, 1970\n' % secsOwnerProfileModified
Some of the data elements are timestamps.  This example and the next one look at those.  We'll work with the element that represents the time when the owner of the first view last modified his or her user profile.  These values are represented as the number of seconds since January 1, 1970, which is also called the epoch.  This snippet prints out that value.


Convert time in seconds to local time
# 8 - Now print the time value in local timezone.
localtimeOwnerProfileModified = time.ctime(secsOwnerProfileModified)
print 'OWNER LAST MODIFIED HIS/HER USER PROFILE ON %s\n' %localtimeOwnerProfileModified
Seeing the total number of seconds isn't very useful to us (it is, however, very useful for comparing times or computing the difference between times).  Here we'll use a method to create a string that has day, month, time of day, and year in the local timezone.  Now, it's easier understand when the view's owner's profile was modified.


Final comments
Remember, if you run this code, your output may differ from what's shown.  When you run it, the portal may have a different sets of views with different values.

Wednesday, August 24, 2011

Example #11 - Create the simplest program

Example #10 showed many uses of the API.  Let's code up one.  This will be the Hello World! example of using the API.

The code examples will be written in Python.  I'm brand new to Python and using this work as a great excuse to learn it.  If you look at any of the code and know there's a better way, use it and ignore my novice approach.

Let's pick the API request from Example #10 with the smallest amount of response data.  It'll just keep things simpler.  The winner is the API request to count the number of views in the Chicago portal since it just returns a number.

Code
Here's the code for my program called, example11.py:
import urllib2
f = urllib2.urlopen("http://data.cityofchicago.org/api/views.json?count=true")
response = f.read()
print response

Explanation
What's going on in this code?  Here's a line by line description:
  1. Import the module that has the functions to open and get data from a URL.
  2. The URL is the API request to return the number of views in the Chicago portal.  (See Example #10 for more about how the URL is determined.)  f is the filehandle to read the response from the API request.  The API is invoked during this statement.
  3. Using file I/O method, read(), to read what the URL returned, ie., the API response, into the string response.
  4. Finally, print response.
If you are new to Python as well, you can try this yourself.  First, go to the Python site and download Python.  I'm using Python 2.7. Then, setup Python on your system.  You can cut-and-paste the code into your editor, save it as example11.py, and run it with the command, python example11.py.

Here's a screenshot when I ran the program and its output:

Tuesday, August 23, 2011

Example #10 - Explore the Views Service API

It's time to get a better understanding of the API you can use for an app that searches for and retrieves data from the portal.

The SODA API consists of four services.  These are the types of interactions you can request from the portal.  They are:
  1. authenticate - this invokes a cookie-based authentication method for subsequent requests.
  2. docs - retrieves the reference documentation for the API services
  3. users - retrieves details of user profiles as well as let you manage your own profile.
  4. views - access views to retrieve metadata plus query, create, read, update, and delete rows.
We're going to ignore the first three services.  They aren't needed for our simple apps.

We'll study a portion of the views service.  Our apps will just consume data.  We won't publish any data on the portal.  So, we'll look at retrieving metadata, query, and read operations.  We'll ignore the create, update, and delete operations.

If you want to study the other services, head over to the SODA Developers site.

If you want to study the details of the views service, check out the SODA Developers reference page.

Beware that this API is still a beta.  A number of the services are noted with the disclaimer that they are not finalized and subject to change. 

Terms
"When I use a word," HumptyDumpty said, in a rather scornful tone,
"it means just what I choose it to mean - neither more nor less."
 
"The question is," said Alice,
"whether you can make words mean so many different things."
-- Alice in Wonderland

In the land of Socrata, the terms view, table, and data set (or dataset) all mean the same thing.  Yeah, I know, if you are a database purist, that's unsettling.  Nevertheless, it seems as though the three terms are interchangeable.  Plus, the terms view and views might get confusing.  The views service does its work against a view in the portal.  At times, you will specify a particular view for the views API to go after.

When discussing the API, you make a request to a service.  The service sends back a response.  I tend to lapse into old school lingo of making a call to an API and getting back a return.  There are even a few mentions on the SODA site of methods which is from the object-oriented culture and lines-up with request or call.  For our purposes, different words - same concepts. 

Reference card for views
Here's a one page reference sheet to summarize the views service.  Think of it as the magic decoder ring to create an API request.  You can click on it to enlarge it.



The bold http... line at the top is the API request that you send to a Socrata server.  You have to fill in four blanks:
  1. The portal site you are going to use - either the city, county, or state.  Or, the URL of some other Socrata site.
  2. The views service you need.  Your choices are in the Service column.
  3. Tthe formatting of the response data.  You can have the API send you JSON, XML, etc.
  4. The parameters for the service from the Parameters column.  These further define your request.

Data types
The Response column lists the data type you will receive from the service.  There are six data types.  When the '[ ]' appear it means you will receive an array of them - you'll get back one or more in an array.  Here are the types with links that take you to the SODA reference document, if available, to show you the details:
  • View - has all the information about a view such as description of the view, the columns in the view, etc.
  • ViewColumn - has all the information about a column.  (The SODA reference does not have a section describing this data type.)
  • File - contents of a file.  You can request a file that is attached to a view.  For example, you can request a file from a dataset that has a GIS file in .KML format.  (The SODA reference does not have a section describing this data type.)
  • Row - contents of a row in a view.  This is the data you see when you are looking at a dataset in your web browser.  (The SODA reference does not have a section describing this data type.)
  • String - a string of characters.
  • UserTag - what the user entered when they added a tag to a dataset.  (The SODA reference does not have a section describing this data type.)

Annotated examples
We'll use the Chicago Ward Offices dataset.  To start, any API request we make for this dataset will use the Chicago portal address (If you are stickler for details, it's the hostname) and most requests will use the viewID of the dataset.

Go to the Chicago Data Portal, search for "ward office", look through the results list for the "Ward Offices" entry, click on it, and you'll be looking at the table of data.  (If you'd rather take a shortcut, here's the link to the ward office dataset.)  Now, take a look in your web browser URL bar.  You'll see the URL for the dataset:  http://data.cityofchicago.org/dataset/Ward-Offices/htai-wnw4.   The portal site is data.cityofchicago.org.  The viewID is htai-wnw4.

Now, let's build some API requests.  Try these out.  You can cut-and-paste these into your own command prompt window and put curl in front of them or edit them slightly and use the Chicago console.  See Example #9 for the how-to on both of those approaches.

You'll see I'm favoring the JSON format for the response.  Try replacing that string in one or two samples with XML to see the difference between a JSON-formatted response and an XML-formatted response.  It seems that JSON is the default.  If you omit the format, you'll get back JSON.  I like showing it so you'll know where to slip in another format keyword if you want something besides JSON.

Heads up!

If you are using curl, you might need to quote your http string.  I just found one example that needed quotes.  All other examples work without quotes.  My lazy way is to let you know you probably need quotes if you string parameters together.  I might dig into this in the future and revise all the examples as needed.

This doesn't work:
curl http://data.cityofchicago.org/api/views.json?name=ward&limit=2
I get the error message, 'limit' is not recognized as an operable program or batch file.

This works:
curl "http://data.cityofchicago.org/api/views.json?name=ward&limit=2"

Get metadata for all views
http://data.cityofchicago.org/api/views.json
This will return a huge amount of detail on all the datasets in the portal.  You probably won't use this for a simple app.  You might use this in an app that allows the user to navigate to any dataset in a portal.  For that, your first step might be to grab all this metadata and show the user the datasets in the portal to let the user pick what looks interesting.

Get number of views
http://data.cityofchicago.org/api/views.json?count=true
This parameter will cause the API to return the number of views in the portal. You could use this to tell the user how many datasets there are in the portal.  When I run this today, I get back a count of 603.  This matches what I see in my web browser when I look at the Chicago portal main page.   At the bottom of the page under the page numbers it has, "Showing 25 of 603". 

Get metadata for a view
http://data.cityofchicago.org/api/views/htai-wnw4.json
This returns all the metadata about the Ward Offices dataset.  You might want to use this.  You can pull out the description of the dataset, when it was last updated by the city, how many times it has been viewed and other tidbits that might help your user.  It also has the information on the columns of data in the table you can access.  This might be really useful.  Your app could parse through this to discover the names of columns and other helpful data about each column.

Get metadata for views with specific word in name
http://data.cityofchicago.org/api/views.json?name=ward
You can add a parameter to return just the views that have the term, 'ward', in their view name. When I run this today, I get back four views.  You could use this service to show the user all the views that have to do with wards.  Note that you could use ...name=ward or ...name='ward'.  They are equivalent. 

Combine parameters to limit the views returned
http://data.cityofchicago.org/api/views.json?name=ward&limit=2
Here we'll run the query above but only ask for a max of two views to be returned.  Because of the limit only two views are returned.  This is an example of combining two parameters with &.  For this and other services with multiple parameters, you can combine several parameters in this fashion.  See the Heads Up! found above.  You may need to quote this string with curl.

Get metadata for views with specific words in name
http://data.cityofchicago.org/api/views.json?name=ward+office
Like above but this time you get back the views that have both words, ward and office, in the name.  When I run this, I get back two views.  Note, the search just looks for both words.  It does not look for them being adjacent like ward office.  (I haven't figured out how to search for names with the phrase, ward office, yet.)

Get metadata for views with specific word in description
http://data.cityofchicago.org/api/views.json?description=ward
This parameter will search the view descriptions and return the views with ward in the description.  When I run this today, I get back seven views.

Get metadata for views with specific words in description
http://data.cityofchicago.org/api/views.json?description=ward+office
This parameter will search the view descriptions and return the views with both ward and office in the description.  When I run this today, I get back three views.

Get metadata for all columns
http://data.cityofchicago.org/api/views/htai-wnw4/columns.json
Returns the metadata about all columns in a view.  This is a subset of the previous API request.  In the one just above we get the metadata about the columns plus a whole lot more.  If you just want to know about the columns in the view, use this one.

Get one column
http://data.cityofchicago.org/api/views/htai-wnw4/columns/2607673.json
Returns the data and the metadata for a specific column.  I used the response from the previous API and found the columnID, 2607673, for the ALDERMAN column.  This sample will return the ALDERMAN column.

Get all rows
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json
Returns the data in the table.  Now we're getting something really useful for a simple app.  This pulls back all the data along with the row metadata for the table.  If you study the output, you'll see that the metadata for the view is returned and after that, at the end of flood of info is the "data" field.  There you'll find the metadata plus contents of each row.

Get the IDs of all rows
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json?row_ids_only=true
You can request the set of all rowIDs.  This will have one ID for each row.  Note that you'll get view metadata first and at the end of the response, in the "data" field, you will get the rowIDs in an array.  I'm trying to think of a good use for this array.  There are other API services that will probably give you the information you really need.

Get one row (alternative #1)
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json?ids=2
Using the list of rowIDs from the previous response, you can use them with the ids parameter to get a specific row.  You could use the previous sample to get all rowIDs and then use this service to pull them back one at a time.  This sample retrieves the row with rowID = 2.  The return from this call is very succinct.  You will get back just the info for that row and no other metadata.  (The SODA site says you can retrieve multiple rows with ...?ids=<rowID>&ids=<rowID> but I haven't been able to do that.)

Get one row (alternative #2)
http://data.cityofchicago.org/api/views/htai-wnw4/rows/2.json
Pulls back one row.  Here's another request you can use to pull back one row.  This approach has slightly simpler syntax.

Search and get rows
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json?search="Burke"
This one is really important for a simple app.  It lets you search for rows for a text string.  This example searches for Alderman Burke's ward office by searching the rows for Burke.  You'll get back all the metadata, which is probably superfluous to you, and then at the end is the "data" for the one row about Alderman Burke's office.  Seems like you can use ...search=Burke or ...search="Burke".  If you use ...search='Burke', it doesn't find the row.

More parameters
The views/<viewID>/rows service has many parameters.  Take a look at these sometime.  You can limit the number of rows that are returned.  You can just get a subset of rows by giving a starting row plus a number of follow-on rows to return; you can use this to page through the dataset.  There are other parameters but they probably won't be of interest for a simple app.

Other views services
I'm going to skip the requests that work with sub_columns, files, tags, and user tags.  The reference card shows them and there's more info at the SODA site.  They don't seem too pertinent for a simple app.

The INLINE filtered search will be useful to us.  You can do very complex searches with one request.  Even better, you can do a geographic search to find rows that fall into a circle.  You can specify a latitude/longitude and a radius.  The service will return the rows with locations in that circle.  It warrants it's own entry.  I'll get to that one in a future example.

Saturday, August 20, 2011

Results #8 - Publish a map of foreclosures and income

Below is a map of foreclosures in the first half of 2011 by zip code in Cook County along with the per capita income from 2009 in those zip codes.


Example #8 - Publish a map of foreclosures and per capita income by zip code

We can use Tableau Public to create more complex visualizations.  Let's look at foreclosures along with per capita income in Cook County by zip code. As expected, you'll see that the lower income zip codes have a high number of foreclosures.

There's more work involved in this visualization.  Tableau Public (TP) runs on a Windows computer so you'll have to download a free copy and install it.  Getting the data ready takes some effort.  Start with the Cook County Data Portal foreclosure data set, download it,and edit it to clean it up.  I used the text editor, gVim (a vi variant), to do the clean up.  Four changes were needed:
  1. Change the Location 1 field into a five-digit zip code.  In the portal the field has zip+4 appended to a string of lat/lon.
  2. Remove the rows that represent amended foreclosures since they duplicate the original foreclosures.
  3. Remove rows that had null or zeroes for zip code.
  4. Remove the Considerations column.
We're not quite done.  TP wants an XLS file.  So, load the cleaned up CSV file into the spreadsheet or your choice (I used Lotus Symphony) and then save it as .XLS.  Crank up TP and connect it to that XLS file. Finally, we're set to make visualizations.

Here's a nice sample of what TP can create.  This map shows a blue circle for the foreclosures in each zip code in the county - the larger the circle, the more foreclosures.  TP provides built-in layers for the map.  Select the Per Capita Income layer.  Tinker with the colors and circles scale a bit to get the map below.


You can quickly generate other visualizations.  Here's a chart showing foreclosures by month.


Note:  The drop in July is not due to economic conditions.  The data set ends on 7/15/11.  We only have half of the month's data.

You can see an interact with both of these charts in the Results #8 blog entry.

Ready to watch?  Here's the link to the YouTube video.  If you have trouble reading the small fonts used, try changing the YouTube resolution to 720p and expand the video to full screen.  You'll find the controls to make those changes near the bottom of the YouTube video window when you hover your mouse over the video.

Results #7 - Publish a ManyEyes bubble chart

Here's a bubble chart showing the number of employees in each job title.  Only job titles with over 100 people in them are displayed.

Example #7 - Publish a ManyEyes bubble chart

Let's take a look at another web site that offers visualizations.  We'll use IBM's ManyEyes to create a bubble chart showing the number of jobs in each job title in the city government.

To build this, we'll use a roll-up and a sort to get the count of each job title (ie., the number of police officers, number of firefighters, etc.), download the results to a spreadsheet, copy the ones with more than 100 people in them, paste them into ManyEyes, and have ManyEyes create the bubble chart.  Here's that visualization:


You can publish this visualization on your own web page, too.  See the Results #7 entry for an interactive example.

ManyEyes has more styles of visualizations.  Once your data set is loaded, it's easy to try others.

Ready to watch?  Here's the YouTube video