Socrata has a page dedicated to this topic. It's the basis for this example and the next example.
There's a new term here: a filtered view. In the Socrata web interface for a data portal, you can create a filter that specifies which rows to include based on their values, sums values across rows, and sorts rows. It's how you build or find the set of rows that meets your needs.
There's a simple way to run a complex query. Socrata calls it piggybacking on an existing filtered view. In your web browser, go to the portal, build your query, and save it as a view; then, in your code, request that view by its viewID. The result: your code calls the stored filter, which is just another view. The advantage: it's quick, and you already know how to write the code to request a view. There are a couple of disadvantages: 1) It's fixed. Your app can't modify the query. 2) Your code is in two places. The query is stored at the portal and your app calls it, so you might be maintaining things in two places instead of one.
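Requesting a filtered view is the same as requesting any view: only the viewID changes. Here's a small sketch of that idea; the helper name rows_url is my own, but the portal hostname, viewID, and URL pattern are the ones used throughout this example.

```python
# Sketch: the rows.json URL for any view (filtered or not) follows the
# same /api/views/<viewID>/rows.json pattern; only the viewID varies.
def rows_url(portal, view_id):
    """Return the views-service URL that fetches a view's rows as JSON."""
    return 'http://%s/api/views/%s/rows.json' % (portal, view_id)

print(rows_url('datacatalog.cookcountyil.gov', '2wek-2jap'))
```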
Create your filter
We'll use the Chkregist DEC2010 data set on the Cook County Data Portal. Here's a screenshot of it. You can find the viewID in the URL bar at the top of the web browser: it is e9qr-rmq4.
Let's find the checks for construction services between $1m and $10m. This screenshot shows the filter parameters on the right. I'm asking for all rows with a product description of 912 (construction services) and an amount between $1m and $10m. After creating it, I saved the filtered view as Construction over 1m. You can see the viewID Socrata assigned to my filtered view in the URL bar at the top of the web browser: it is 2wek-2jap.
Now that the filtered view is in place you can access it from a program. Here's the code to get all the rows from the filtered view, load the JSON into an array, and print the array. This should look familiar.
from urllib2 import urlopen
from json import load
import pprint
url = 'http://datacatalog.cookcountyil.gov/api/views/2wek-2jap/rows.json'
u = urlopen(url)
response = load(u)
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(response)
You get back the data in the rows along with the metadata for columns. I ran it by typing python example15a.py > out.txt and then looked at out.txt in a text editor.
The value for 'data' is a list with two elements, one for each row in our filtered view. Each of those elements is, in turn, a list of the values of the row's columns. As you'll see next, the first eight of those values belong to hidden columns, each carrying metadata for the row.
After the 'data' list comes 'meta', which contains metadata for the view. 'view' is a dictionary, and one of its keys is 'columns', whose value is a list with an element for each column. Each of those elements holds the metadata for one column. We care about two of those metadata fields: 1) 'id' - if it is -1, this is a metadata column; and 2) 'name' - the name of the column. Translation: we now know the names of every column in the view, and we also know which columns are hidden from the web browser UI. It turns out the first eight columns are hidden.
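The structure described above can be sketched with a made-up miniature of the response (the real one has eight hidden columns, not two, and different names and ids). The 'id' test picks out the visible columns:

```python
# Made-up miniature of the views-service response: hidden metadata
# columns have id == -1; real data columns have a positive id.
response = {
    'meta': {'view': {'columns': [
        {'id': -1, 'name': 'sid'},            # hidden metadata column
        {'id': -1, 'name': 'created_at'},     # hidden metadata column
        {'id': 101, 'name': 'Company Name'},  # visible data column
        {'id': 102, 'name': 'Amount'},        # visible data column
    ]}},
    'data': [['row-abc', 1292889600, 'Acme Builders', 2500000.0]],
}

# Names of the columns you'd actually see in the web browser UI.
visible = [c['name'] for c in response['meta']['view']['columns']
           if c['id'] > 0]
print(visible)
```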
Program to get filtered view column names and data
We can use our understanding of the API response to create a small program, example15b.py, to load up the JSON data for the filtered view and print out the column name along with its value for each row. Here's the code:
from urllib2 import urlopen
from json import load
url = 'http://datacatalog.cookcountyil.gov/api/views/2wek-2jap/rows.json'
u = urlopen(url)
response = load(u)
rowNum = 1
for rowData in response['data']:
    print "\n===== Row %d ====" % (rowNum)
    colNum = 0
    for columnMeta in response['meta']['view']['columns']:
        if columnMeta['id'] > 0:
            print columnMeta['name'], ' = ', rowData[colNum]
        colNum += 1
    rowNum += 1
The outer loop iterates through the two rows returned in 'data'. For each row, the inner loop iterates through each column's metadata. If a column has a negative value for 'id', it's a metadata column and we don't print it. (If you want to see the hidden metadata and its value for each column, just comment out that if statement.) Otherwise, we print the name of the column along with its corresponding value from the 'data' list. Here's the output.
This program lays the groundwork for you to create a program that could explore the column names and, on the fly, give the user column names and values. You have both the column metadata and the row data.
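One natural next step along those lines is to pair names and values up front, turning each raw row into a dictionary keyed by column name. This sketch uses the same id > 0 test as example15b.py, but runs against a made-up miniature response (the real one has eight hidden columns, not two):

```python
# Made-up miniature response, same shape as the real views-service reply.
response = {
    'meta': {'view': {'columns': [
        {'id': -1, 'name': 'sid'},            # hidden metadata column
        {'id': -1, 'name': 'created_at'},     # hidden metadata column
        {'id': 101, 'name': 'Company Name'},  # visible data column
        {'id': 102, 'name': 'Amount'},        # visible data column
    ]}},
    'data': [['row-abc', 1292889600, 'Acme Builders', 2500000.0]],
}

# Build one dict per row, keyed by column name, skipping hidden columns.
records = []
for rowData in response['data']:
    record = {}
    for colNum, columnMeta in enumerate(response['meta']['view']['columns']):
        if columnMeta['id'] > 0:
            record[columnMeta['name']] = rowData[colNum]
    records.append(record)
print(records)
```

With the rows in this form, the rest of your program can look up values by name instead of by position.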
Program to get just the filtered data
If the above is too much bother and you just want to get your hands on the data values, you can ask the API to send back just the data. Your code would then hard-code facts like: the 11th element of each 'data' list is the Company Name. By specifying the meta=false parameter on your API request, the views service will return just the data. Here's the code:
from urllib2 import urlopen
from json import load
import pprint
url = 'http://datacatalog.cookcountyil.gov/api/views/2wek-2jap/rows.json?meta=false'
u = urlopen(url)
response = load(u)
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(response)
Here's the output of just the data.
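Hard-coding a position is easier to read if you give the index a name. This is a sketch with made-up sample rows; the COMPANY_NAME constant and the value 10 (the 11th element, as mentioned above) are assumptions you'd verify by printing one row from your own view.

```python
# Hypothetical position of Company Name within each row (11th element).
COMPANY_NAME = 10

# Made-up sample rows standing in for the meta=false response data:
# eight hidden metadata values, then the visible columns.
rows = [
    ['row-abc', 1, 2, 3, 4, 5, 6, 7, 8, 9, 'Acme Builders', 2500000.0],
]
for row in rows:
    print(row[COMPANY_NAME])
```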
Summary
This is one approach to running a complex query. You can use the web browser UI to create your filtered view and save it. You can write a simple program to get back the data. You can write a more complicated program to plow through the metadata and data to respond to the user's needs.
With a filtered view, you can't change the specifics of the query. If I wanted to find construction services over $10,000, or consulting services over $1m, I'd have to create a new filter, get its viewID, and put it in the code.
The next example shows how to do a query without a filtered view. It gives you the ability to create a query or change the parameters of a query in your code.