Sunday, September 4, 2011

Example #20 - Three utilities for building inline filters

Along the way, I put together three short scripts that help in finding the details to build the inline filters.

getcolumns.py - prints the columnNames and columnIDs from a view
getquery.py - prints the query value for a filtered view
getview.py - prints the metadata for a view

They've been helpful for quickly getting the behind-the-scenes information from one of the portals.

They share the same command syntax:
python <program> [-cook | -chicago | -ill | <hostname>] <viewID>

Where:
   -cook opens the Cook County Data Portal
   -chicago opens the City of Chicago Data Portal
   -ill opens the State of Illinois Data Portal
   <hostname> - allows the user to specify another Socrata site
   <viewID> - is the 9 character ID of the view to be used.  The format is cccc-cccc
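The argument handling described above can be sketched as a small lookup. This is a minimal Python 3 sketch rather than the scripts' actual code (they are written for Python 2). The names `resolve_host` and `is_view_id` are hypothetical, and the lowercase-alphanumeric pattern for view IDs is an assumption based only on the IDs used in this post.

```python
import re

# Shortcut flags and the portal host names they stand for (taken from the scripts).
PORTALS = {
    "-chicago": "data.cityofchicago.org",
    "-cook": "datacatalog.cookcountyil.gov",
    "-ill": "data.illinois.gov",
}

# Assumption: a view ID is two groups of four lowercase letters or digits.
VIEW_ID_PATTERN = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def resolve_host(arg):
    """Return the portal host for a shortcut flag, or the argument itself."""
    return PORTALS.get(arg, arg)


def is_view_id(text):
    """Check that text looks like a 9-character cccc-cccc view ID."""
    return VIEW_ID_PATTERN.fullmatch(text) is not None
```

Keeping the flags in a dictionary means a new portal shortcut is a one-line addition instead of another elif branch.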

Examples
Here is one example of each utility.

getcolumns
python getcolumns.py -cook e9qr-rmq4
Prints the IDs and names for all columns in the Cook County view for the DEC2010 Check Register.  This helps in getting the columnID to put in your own filter.
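As a rough illustration of what getcolumns prints, the sketch below formats a list shaped like the columns.json response. It is written for Python 3 (the scripts in this post target Python 2), `format_columns` is a hypothetical helper, and the sample IDs and names are made up rather than taken from the Check Register.

```python
def format_columns(columns):
    """Return one 'id : name' line per column dictionary."""
    return ["%s : %s" % (column["id"], column["name"]) for column in columns]


# Made-up sample data; a real response carries many more keys per column.
sample = [
    {"id": 1001, "name": "Vendor Name"},
    {"id": 1002, "name": "Amount"},
]

for line in format_columns(sample):
    print(line)
```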


getquery

python getquery.py -cook 2wek-2jap
Prints the filter query Socrata generated for Example #15, where I used the web UI to search the DEC2010 Check Register for payments between $1m and $10m for construction services (product code 912).  You can build a filter in your browser, run getquery to see the query Socrata generated for it, and use that as a guide when writing one in your own code.
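getquery works by pulling the 'query' entry out of the view metadata that comes back with rows.json. A minimal Python 3 sketch of that lookup follows. `extract_query` is a hypothetical name, and the sample dictionary is only a placeholder with the same nesting, not real Socrata output.

```python
import pprint


def extract_query(rows_response):
    """Return the 'query' part of the view metadata in a rows.json response."""
    return rows_response["meta"]["view"]["query"]


# Placeholder structure only; the real query document is generated by Socrata.
sample_response = {
    "meta": {
        "view": {
            "id": "xxxx-xxxx",
            "query": {"placeholder": "filter goes here"},
        }
    }
}

pprint.pprint(extract_query(sample_response), indent=3)
```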


getview
python getview.py opendata.socrata.com n5m4-mism
Prints the metadata for the White House Nominations and Appointments dataset.  Redirect the output to a file and browse it in a text editor to examine the view.  The screenshot shows only the beginning of the output; much more is printed.
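As an alternative to shell redirection, the pretty-printed metadata could be written to a file from Python. This is a small Python 3 sketch under assumptions: `format_metadata` and `save_pretty` are hypothetical helpers, and the one-key dictionary in the usage line stands in for real view metadata.

```python
import pprint


def format_metadata(obj):
    """Return obj as indented text, the same layout getview prints."""
    return pprint.pformat(obj, indent=4)


def save_pretty(obj, path):
    """Write the indented text to a file so it can be browsed in an editor."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(format_metadata(obj) + "\n")


# Stand-in for real metadata, which has many more keys.
print(format_metadata({"id": "n5m4-mism"}))
```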



Code
getcolumns
#
#  getcolumns - Prints the column IDs along with the column names for a view.
#
#  args:      <hostname> <viewID>
#             If <hostname> is "-cook", use the Cook County Data Portal host name
#             If <hostname> is "-chicago", use the City of Chicago Data Portal host name
#             If <hostname> is "-ill", use the State of Illinois Data Portal host name
#             Otherwise, <hostname> is the host name of the Socrata site.  Do not include "http://"
#             <viewID> is the 9 character Socrata view (aka dataset) ID.  Format is:  "cccc-cccc"
#
#  output:    Lists the view's columnID and column name for all columns; one per line.
#             The column ID is the value for the 'id' key. 
#
import sys
from urllib2 import urlopen, URLError, HTTPError
from json import load


hostNameChicago = "data.cityofchicago.org"
hostNameCook = "datacatalog.cookcountyil.gov"
hostNameIll = "data.illinois.gov"

if len(sys.argv) !=3:
    sys.stderr.write("Usage: python %s [-chicago | -cook | -ill | <hostname>] <viewID>\n" % sys.argv[0])
    raise SystemExit(1)

if sys.argv[1] == "-chicago":
    hostName = hostNameChicago
elif sys.argv[1] == "-cook":
    hostName = hostNameCook
elif sys.argv[1] == "-ill":
    hostName = hostNameIll
else:
    hostName = sys.argv[1]

viewID = sys.argv[2]
url = "http://%s/api/views/%s/columns.json" % (hostName, viewID)

try:
    u = urlopen(url)
except HTTPError as e:
    print "The server at %s could not handle the request." % url
    print "Error code: ", e.code
except URLError as e:
    print "We failed to reach the server at %s." % url
    print "Reason: ", e.reason
else:
    response = load(u)

    # The response is a list with one dictionary per column.
    for column in response:
        print column['id'], ' : ', column['name']



getquery
#
#  getquery - Prints the query resource in a filtered view
#             This is used to quickly create the query document used in an INLINE filter API request.
#
#  args:      <hostname> <viewID>
#             If <hostname> is "-cook", use the Cook County Data Portal host name
#             If <hostname> is "-chicago", use the City of Chicago Data Portal host name
#             If <hostname> is "-ill", use the State of Illinois Data Portal host name
#             Otherwise, <hostname> is the host name of the Socrata site.  Do not include "http://"
#             <viewID> is the 9 character Socrata view (aka dataset) ID.  Format is:  "cccc-cccc"
#
#  output:    The 'query' portion of the view metadata.  Unicode indicators are stripped off the strings.
#             The output is intended to be ready to be used in a program that is going to call the Socrata
#             views service API with an INLINE filter request.
#
import sys
from urllib2 import urlopen, URLError, HTTPError
from json import load
import pprint


hostNameChicago = "data.cityofchicago.org"
hostNameCook = "datacatalog.cookcountyil.gov"
hostNameIll = "data.illinois.gov"

if len(sys.argv) !=3:
    sys.stderr.write("Usage: python %s [-chicago | -cook | -ill | <hostname>] <viewID>\n" % sys.argv[0])
    raise SystemExit(1)

if sys.argv[1] == "-chicago":
    hostName = hostNameChicago
elif sys.argv[1] == "-cook":
    hostName = hostNameCook
elif sys.argv[1] == "-ill":
    hostName = hostNameIll
else:
    hostName = sys.argv[1]

viewID = sys.argv[2]
url = "http://%s/api/views/%s/rows.json" % (hostName, viewID)

try:
    u = urlopen(url)
except HTTPError as e:
    print "The server at %s could not handle the request." % url
    print "Error code: ", e.code
except URLError as e:
    print "We failed to reach the server at %s." % url
    print "Reason: ", e.reason
else:
    response = load(u)

    # The query document sits inside the view metadata returned with the rows.
    pp = pprint.PrettyPrinter(indent=3)
    pp.pprint(response['meta']['view']['query'])


getview
#
#  getview - Prints the metadata for a view
#
#  args:     <hostname> <viewID>
#            If <hostname> is "-cook", use the Cook County Data Portal host name
#            If <hostname> is "-chicago", use the City of Chicago Data Portal host name
#            If <hostname> is "-ill", use the State of Illinois Data Portal host name
#            Otherwise, <hostname> is the host name of the Socrata site.  Do not include "http://"
#            <viewID> is the 9 character Socrata view (aka dataset) ID.  Format is:  "cccc-cccc"
#
#  output:   An indented print of the response from /api/views/<viewID>.json.
#
import sys
from urllib2 import urlopen, URLError, HTTPError
from json import load
import pprint


hostNameChicago = "data.cityofchicago.org"
hostNameCook = "datacatalog.cookcountyil.gov"
hostNameIll = "data.illinois.gov"

if len(sys.argv) != 3:
    sys.stderr.write("Usage: python %s [-chicago | -cook | -ill | <hostname>] <viewID>\n" % sys.argv[0])
    raise SystemExit(1)

if sys.argv[1] == "-chicago":
    hostName = hostNameChicago
elif sys.argv[1] == "-cook":
    hostName = hostNameCook
elif sys.argv[1] == "-ill":
    hostName = hostNameIll
else:
    hostName = sys.argv[1]

viewID = sys.argv[2]
url = "http://%s/api/views/%s.json" % (hostName, viewID)

try:
    u = urlopen(url)
except HTTPError as e:
    print "The server at %s could not handle the request." % url
    print "Error code: ", e.code
except URLError as e:
    print "We failed to reach the server at %s." % url
    print "Reason: ", e.reason
else:
    response = load(u)

    # Print the whole metadata document with 4-space indentation.
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(response)
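
The three scripts differ only in the URL suffix and in what they print, and they rely on Python 2 features (urllib2 and the print statement). The following is a hedged Python 3 sketch of the shared part, not a drop-in replacement. `SUFFIXES`, `build_url`, and `fetch_json` are hypothetical names, and the endpoints are simply the ones used in the scripts above, which these portals may no longer serve.

```python
import json
import sys
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# URL suffix used by each utility, as in the scripts above.
SUFFIXES = {
    "columns": "/columns.json",  # getcolumns
    "rows": "/rows.json",        # getquery
    "view": ".json",             # getview
}


def build_url(host_name, view_id, kind):
    """Build the views API URL for one of the three utilities."""
    return "http://%s/api/views/%s%s" % (host_name, view_id, SUFFIXES[kind])


def fetch_json(url):
    """Return the decoded JSON at url, or None after reporting an error."""
    try:
        with urlopen(url) as response:
            return json.load(response)
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("The server at %s could not handle the request." % url, file=sys.stderr)
        print("Error code:", e.code, file=sys.stderr)
    except URLError as e:
        print("We failed to reach the server at %s." % url, file=sys.stderr)
        print("Reason:", e.reason, file=sys.stderr)
    return None
```

With these two helpers, each utility reduces to a call such as `fetch_json(build_url(hostName, viewID, "columns"))` followed by its own printing step.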
