Tuesday, August 23, 2011

Example #10 - Explore the Views Service API

It's time to get a better understanding of the API you can use for an app that searches for and retrieves data from the portal.

The SODA API consists of four services.  These are the types of interactions you can request from the portal.  They are:
  1. authenticate - invokes a cookie-based authentication method for subsequent requests.
  2. docs - retrieves the reference documentation for the API services.
  3. users - retrieves details of user profiles and lets you manage your own profile.
  4. views - accesses views to retrieve metadata plus query, create, read, update, and delete rows.
We're going to ignore the first three services.  They aren't needed for our simple apps.

We'll study a portion of the views service.  Our apps will just consume data.  We won't publish any data on the portal.  So, we'll look at retrieving metadata, query, and read operations.  We'll ignore the create, update, and delete operations.

If you want to study the other services, head over to the SODA Developers site.

If you want to study the details of the views service, check out the SODA Developers reference page.

Beware that this API is still in beta.  A number of the services carry a disclaimer that they are not finalized and are subject to change.

Terms
"When I use a word," Humpty Dumpty said, in a rather scornful tone,
"it means just what I choose it to mean - neither more nor less."
 
"The question is," said Alice,
"whether you can make words mean so many different things."
-- Through the Looking-Glass

In the land of Socrata, the terms view, table, and data set (or dataset) all mean the same thing.  Yeah, I know, if you are a database purist, that's unsettling.  Nevertheless, it seems as though the three terms are interchangeable.  Plus, the terms view and views might get confusing.  The views service does its work against a view in the portal.  At times, you will specify a particular view for the views API to go after.

When discussing the API, you make a request to a service.  The service sends back a response.  I tend to lapse into the old-school lingo of making a call to an API and getting back a return.  The SODA site even mentions methods a few times, a term from object-oriented culture that lines up with request or call.  For our purposes: different words, same concepts.

Reference card for views
Here's a one-page reference sheet to summarize the views service.  Think of it as the magic decoder ring to create an API request.



The bold http... line at the top is the API request that you send to a Socrata server.  You have to fill in four blanks:
  1. The portal site you are going to use - either the city, county, or state.  Or, the URL of some other Socrata site.
  2. The views service you need.  Your choices are in the Service column.
  3. The formatting of the response data.  You can have the API send you JSON, XML, etc.
  4. The parameters for the service from the Parameters column.  These further define your request.
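
As a rough sketch of how the four blanks fit together, here is a small Python helper.  The URL layout (http://<portal>/api/<service>.<format>?<parameters>) is inferred from the sample requests later in this post, and the function name build_request is made up for illustration.

```python
from urllib.parse import urlencode

def build_request(portal, service, fmt="json", params=None):
    """Assemble a views API request URL from the four blanks.

    Assumed layout, inferred from the samples in this post:
    http://<portal>/api/<service>.<fmt>?<params>
    """
    url = "http://{}/api/{}.{}".format(portal, service, fmt)
    if params:
        # urlencode joins the parameters with '&' and escapes the values
        url += "?" + urlencode(params)
    return url

print(build_request("data.cityofchicago.org", "views", "json", {"count": "true"}))
```

Swapping "json" for "xml" in the third argument is all it takes to ask for a different response format.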

Data types
The Response column lists the data type you will receive from the service.  There are six data types.  When '[ ]' appears after a type, you will receive an array of one or more items of that type.  Here are the types, with links to the SODA reference document, where available, for the details:
  • View - has all the information about a view such as description of the view, the columns in the view, etc.
  • ViewColumn - has all the information about a column.  (The SODA reference does not have a section describing this data type.)
  • File - contents of a file.  You can request a file that is attached to a view.  For example, you can request a file from a dataset that has a GIS file in .KML format.  (The SODA reference does not have a section describing this data type.)
  • Row - contents of a row in a view.  This is the data you see when you are looking at a dataset in your web browser.  (The SODA reference does not have a section describing this data type.)
  • String - a string of characters.
  • UserTag - what the user entered when they added a tag to a dataset.  (The SODA reference does not have a section describing this data type.)

Annotated examples
We'll use the Chicago Ward Offices dataset.  To start, any API request we make for this dataset will use the Chicago portal address (if you are a stickler for details, it's the hostname), and most requests will use the viewID of the dataset.

Go to the Chicago Data Portal, search for "ward office", look through the results list for the "Ward Offices" entry, click on it, and you'll be looking at the table of data.  (If you'd rather take a shortcut, here's the link to the ward office dataset.)  Now, take a look in your web browser URL bar.  You'll see the URL for the dataset:  http://data.cityofchicago.org/dataset/Ward-Offices/htai-wnw4.   The portal site is data.cityofchicago.org.  The viewID is htai-wnw4.
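
If you'd rather have a program pick the two pieces out of the URL, here's a minimal sketch in Python.  It assumes the viewID is always the last segment of the dataset URL, as it is for Ward Offices; the function name split_dataset_url is my own.

```python
from urllib.parse import urlparse

def split_dataset_url(dataset_url):
    """Return (portal host, viewID) from a dataset page URL.

    Assumes the viewID is the last path segment, as in the Ward Offices URL.
    """
    parts = urlparse(dataset_url)
    view_id = parts.path.rstrip("/").split("/")[-1]
    return parts.netloc, view_id

host, view_id = split_dataset_url(
    "http://data.cityofchicago.org/dataset/Ward-Offices/htai-wnw4")
print(host)     # data.cityofchicago.org
print(view_id)  # htai-wnw4
```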

Now, let's build some API requests.  Try these out.  You can cut-and-paste these into your own command prompt window and put curl in front of them or edit them slightly and use the Chicago console.  See Example #9 for the how-to on both of those approaches.

You'll see I'm favoring the JSON format for the response.  Try replacing that string in one or two samples with XML to see the difference between a JSON-formatted response and an XML-formatted response.  It seems that JSON is the default.  If you omit the format, you'll get back JSON.  I like showing it so you'll know where to slip in another format keyword if you want something besides JSON.

Heads up!

If you are using curl, you might need to quote your http string.  I just found one example that needed quotes.  All other examples work without quotes.  My lazy way is to let you know you probably need quotes if you string parameters together.  I might dig into this in the future and revise all the examples as needed.

This doesn't work:
curl http://data.cityofchicago.org/api/views.json?name=ward&limit=2
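I get the error message, 'limit' is not recognized as an operable program or batch file.  The Windows command prompt treats an unquoted & as a separator between commands, so it runs curl with the URL cut off at the & and then tries to run limit=2 as a second command.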

This works:
curl "http://data.cityofchicago.org/api/views.json?name=ward&limit=2"
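
If you call the API from a program instead of the command prompt, the quoting problem goes away because no shell ever sees the &.  Here's a minimal sketch in Python using only the standard library.  The fetch function is my own naming, and the line that actually sends the request is commented out so you can run the sketch offline.

```python
import json
from urllib.request import Request, urlopen

URL = "http://data.cityofchicago.org/api/views.json?name=ward&limit=2"

def fetch(url):
    """Send a GET request and decode the JSON response body."""
    with urlopen(Request(url)) as response:
        return json.loads(response.read().decode("utf-8"))

# The URL is an ordinary Python string, so the & is passed through untouched.
request = Request(URL)
print(request.full_url)

# views = fetch(URL)  # uncomment to send the request
```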

Get metadata for all views
http://data.cityofchicago.org/api/views.json
This will return a huge amount of detail on all the datasets in the portal.  You probably won't use this for a simple app.  You might use this in an app that allows the user to navigate to any dataset in a portal.  For that, your first step might be to grab all this metadata and show the user the datasets in the portal to let the user pick what looks interesting.

Get number of views
http://data.cityofchicago.org/api/views.json?count=true
This parameter will cause the API to return the number of views in the portal. You could use this to tell the user how many datasets there are in the portal.  When I run this today, I get back a count of 603.  This matches what I see in my web browser when I look at the Chicago portal main page.   At the bottom of the page under the page numbers it has, "Showing 25 of 603". 

Get metadata for a view
http://data.cityofchicago.org/api/views/htai-wnw4.json
This returns all the metadata about the Ward Offices dataset.  You might want to use this.  You can pull out the description of the dataset, when it was last updated by the city, how many times it has been viewed and other tidbits that might help your user.  It also has the information on the columns of data in the table you can access.  This might be really useful.  Your app could parse through this to discover the names of columns and other helpful data about each column.
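
Here's a sketch of the kind of parsing an app might do on this response.  The sample below is trimmed and hand-written for illustration.  The field names (name, description, viewCount, columns) are my assumption of what the response looks like, and the values other than the viewID and the ALDERMAN columnID are made up, so check them against a real response.

```python
import json

# Hand-written stand-in for a View response; field names are assumptions.
SAMPLE_VIEW = """
{
  "id": "htai-wnw4",
  "name": "Ward Offices",
  "description": "Sample description text.",
  "viewCount": 123,
  "columns": [
    {"id": 2607673, "name": "ALDERMAN"},
    {"id": 1, "name": "WARD"}
  ]
}
"""

def summarize_view(view):
    """Pick out a few fields an app might show to the user."""
    return {
        "name": view.get("name"),
        "description": view.get("description"),
        "views": view.get("viewCount"),
        "column_names": [col["name"] for col in view.get("columns", [])],
    }

summary = summarize_view(json.loads(SAMPLE_VIEW))
print(summary["name"], summary["column_names"])
```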

Get metadata for views with specific word in name
http://data.cityofchicago.org/api/views.json?name=ward
You can add a parameter to return just the views that have the term, 'ward', in their view name. When I run this today, I get back four views.  You could use this service to show the user all the views that have to do with wards.  Note that you could use ...name=ward or ...name='ward'.  They are equivalent. 

Combine parameters to limit the views returned
http://data.cityofchicago.org/api/views.json?name=ward&limit=2
Here we run the query above but ask for a maximum of two views, so only two views are returned.  This is an example of combining two parameters with &; other services with multiple parameters work the same way.  See the Heads up! note above: you may need to quote this string with curl.

Get metadata for views with specific words in name
http://data.cityofchicago.org/api/views.json?name=ward+office
Like above but this time you get back the views that have both words, ward and office, in the name.  When I run this, I get back two views.  Note, the search just looks for both words.  It does not look for them being adjacent like ward office.  (I haven't figured out how to search for names with the phrase, ward office, yet.)
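
If you build these name searches in code, Python's urlencode produces the same ward+office and name=ward&limit=2 forms used in the samples.  This is only a sketch; the helper name name_search_url is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://data.cityofchicago.org/api/views.json"

def name_search_url(words, limit=None):
    """Build a views request that searches view names for all the given words."""
    params = {"name": " ".join(words)}
    if limit is not None:
        params["limit"] = limit
    # urlencode turns the space into '+' and joins parameters with '&'.
    return BASE + "?" + urlencode(params)

print(name_search_url(["ward", "office"]))
print(name_search_url(["ward"], limit=2))
```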

Get metadata for views with specific word in description
http://data.cityofchicago.org/api/views.json?description=ward
This parameter will search the view descriptions and return the views with ward in the description.  When I run this today, I get back seven views.

Get metadata for views with specific words in description
http://data.cityofchicago.org/api/views.json?description=ward+office
This parameter will search the view descriptions and return the views with both ward and office in the description.  When I run this today, I get back three views.

Get metadata for all columns
http://data.cityofchicago.org/api/views/htai-wnw4/columns.json
Returns the metadata about all columns in a view.  This is a subset of what the "Get metadata for a view" request returns; that request gives us the metadata about the columns plus a whole lot more.  If you just want to know about the columns in the view, use this one.

Get one column
http://data.cityofchicago.org/api/views/htai-wnw4/columns/2607673.json
Returns the data and the metadata for a specific column.  I used the response from the previous API and found the columnID, 2607673, for the ALDERMAN column.  This sample will return the ALDERMAN column.
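
Looking up a columnID by eye gets old fast.  Here's a sketch of doing the lookup in Python.  The sample list is hand-written: only the ALDERMAN columnID comes from the real response, the other entry is made up, and the id and name field names are assumptions.

```python
import json

# Hand-written stand-in for the columns.json response. Only the ALDERMAN
# columnID comes from this post; the other entry is made up.
SAMPLE_COLUMNS = json.loads("""
[
  {"id": 2607672, "name": "WARD"},
  {"id": 2607673, "name": "ALDERMAN"}
]
""")

def column_url(columns, column_name, view_id="htai-wnw4",
               portal="data.cityofchicago.org"):
    """Look up a column's ID by name and build the request for that column."""
    for col in columns:
        if col["name"] == column_name:
            return "http://{}/api/views/{}/columns/{}.json".format(
                portal, view_id, col["id"])
    raise KeyError(column_name)

print(column_url(SAMPLE_COLUMNS, "ALDERMAN"))
```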

Get all rows
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json
Returns the data in the table.  Now we're getting something really useful for a simple app.  This pulls back all the data along with the row metadata for the table.  If you study the output, you'll see that the metadata for the view is returned and after that, at the end of the flood of info, is the "data" field.  There you'll find the metadata plus contents of each row.
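
To make the rows easier to work with, an app can pair each value with its column name.  Here's a sketch under some assumptions: the "data" field is what I see in the response, but the "meta"/"view"/"columns" layout, the idea that each row array lines up with the column list, and all the sample values are assumptions for illustration.

```python
import json

# Hand-written stand-in for a rows.json response. The "data" field is
# described in this post; the "meta"/"view"/"columns" layout and the
# sample values are assumptions.
SAMPLE_ROWS = json.loads("""
{
  "meta": {"view": {"columns": [
    {"name": "id"}, {"name": "WARD"}, {"name": "ALDERMAN"}
  ]}},
  "data": [
    [1, "1", "Alderman A"],
    [2, "2", "Alderman B"]
  ]
}
""")

def rows_as_dicts(response):
    """Pair each value in a row with its column name."""
    names = [col["name"] for col in response["meta"]["view"]["columns"]]
    return [dict(zip(names, row)) for row in response["data"]]

for row in rows_as_dicts(SAMPLE_ROWS):
    print(row["WARD"], row["ALDERMAN"])
```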

Get the IDs of all rows
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json?row_ids_only=true
You can request the set of all rowIDs.  This will have one ID for each row.  Note that you'll get view metadata first and at the end of the response, in the "data" field, you will get the rowIDs in an array.  I'm trying to think of a good use for this array.  There are other API services that will probably give you the information you really need.

Get one row (alternative #1)
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json?ids=2
You can take a rowID from the previous response and pass it in the ids parameter to get a specific row.  You could use the previous sample to get all rowIDs and then use this service to pull the rows back one at a time.  This sample retrieves the row with rowID = 2.  The return from this call is very succinct.  You will get back just the info for that row and no other metadata.  (The SODA site says you can retrieve multiple rows with ...?ids=<rowID>&ids=<rowID> but I haven't been able to do that.)

Get one row (alternative #2)
http://data.cityofchicago.org/api/views/htai-wnw4/rows/2.json
Here's another request you can use to pull back one row.  This approach has slightly simpler syntax.
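
One way to pull back several rows is to build one of these requests per rowID and send them one at a time.  A minimal sketch, with a made-up helper name and sample rowIDs:

```python
def row_urls(row_ids, view_id="htai-wnw4", portal="data.cityofchicago.org"):
    """Build one rows/<rowID>.json request per row ID."""
    template = "http://{}/api/views/{}/rows/{}.json"
    return [template.format(portal, view_id, row_id) for row_id in row_ids]

for url in row_urls([1, 2, 3]):
    print(url)
```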

Search and get rows
http://data.cityofchicago.org/api/views/htai-wnw4/rows.json?search="Burke"
This one is really important for a simple app.  It lets you search the rows for a text string.  This example searches for Alderman Burke's ward office by searching the rows for Burke.  You'll get back all the metadata, which is probably superfluous to you, and then at the end is the "data" for the one row about Alderman Burke's office.  It seems like you can use ...search=Burke or ...search="Burke".  If you use ...search='Burke', it doesn't find the row.
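
In an app, the search text will usually come from the user, so it's worth letting a library escape it for the URL.  Here's a sketch in Python; the helper name search_url is hypothetical, and I'm using the unquoted ...search=Burke form.

```python
from urllib.parse import urlencode

def search_url(text, view_id="htai-wnw4", portal="data.cityofchicago.org"):
    """Build a rows request that searches the view for a text string."""
    base = "http://{}/api/views/{}/rows.json".format(portal, view_id)
    # urlencode escapes spaces and punctuation so the text is safe in a URL.
    return base + "?" + urlencode({"search": text})

print(search_url("Burke"))
```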

More parameters
The views/<viewID>/rows service has many parameters.  Take a look at these sometime.  You can limit the number of rows that are returned.  You can just get a subset of rows by giving a starting row plus a number of follow-on rows to return; you can use this to page through the dataset.  There are other parameters but they probably won't be of interest for a simple app.

Other views services
I'm going to skip the requests that work with sub_columns, files, tags, and user tags.  The reference card shows them and there's more info at the SODA site.  They don't seem too pertinent for a simple app.

The INLINE filtered search will be useful to us.  You can do very complex searches with one request.  Even better, you can do a geographic search to find rows that fall into a circle.  You can specify a latitude/longitude and a radius.  The service will return the rows with locations in that circle.  It warrants its own entry.  I'll get to that one in a future example.
