Wednesday, September 7, 2011

Example #22 - Receiving data via paging

For a couple of the API requests that return large amounts of data, you can receive the data in pages.  Instead of waiting for and then receiving one very long stream of output, you can ask for and receive consecutive chunks - pages.

Here are two examples:  receiving the metadata for all views at a site and receiving all rows in a view.

Paging the return of all views
This uses the /api/views API service with two parameters.  page is the number of the page to retrieve.   limit is the number of views to return; the size of the page. The code prints out the names of all the views currently available on the Cook County site.  Note the first page number is 1, not 0.

Code
from urllib2 import urlopen
from json import load

hostname = "datacatalog.cookcountyil.gov"

pageSize = '20'
pageNum = 1
while True:
    url = "http://%s/api/views.json?page=%s&limit=%s" % (hostname, pageNum, pageSize)
    response = urlopen(url)
    views = load(response)
    numViewsRcvd = len(views)
    if numViewsRcvd == 0:
        break
    print '\n======  PAGE %s - %s VIEWS DOWNLOADED:  ======' % (pageNum, numViewsRcvd)
    for view in views:
        print view['name']
    pageNum += 1

Output

Paging the return of all rows in a table
This example uses the Chicago Ward Office table.  The code prints the contents for the Ward Number column for each of the rows.  Note that the first row is 0.

Code
from urllib2 import urlopen
from json import load

hostname = "data.cityofchicago.org"
viewID = "htai-wnw4"
startRow = 0
numRows = 20

while True:
    url = "http://%s/api/views/%s/rows.json?method=getRows&start=%s&length=%s" % (hostname, viewID, str(startRow), str(numRows))
    response = urlopen(url)
    rows = load(response)
    numRowsRcvd = len(rows)
    if numRowsRcvd == 0:
        break
    print '\n====== %s ROWS DOWNLOADED:  ======' % (numRowsRcvd)
    for row in rows:
        print row['2609304']      # rowID for ward number
    startRow += numRow

Output

Summary
Receiving your data in pages might make it easier for your app.  You may want to show, say, 20 results at a time.  Or, you may want to show the first 20 while you are getting the rest of the results.

For getting all the views in a site, you need to use paging if you there are more than 200.  The default is to receive 50 views.    You can use a parameter, XXX, to get 200 in one call.  Beyond that, you have to page to get all the views.

3 comments:

  1. Hi,

    Thanks for the example code.

    https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

    As shown in the above link, how can i filter out only a particular type of crime and the x/y co-ordinates?
    Thanks.

    ReplyDelete
  2. There are two approaches.

    1) The easy way.

    You can build a filtered view that has just three columns: Primary Type, X Coordinate, Y Coordinate for the type of crime you are interested in. You save that view and a viewID is created. You put that viewID in the above code and you can page back your results.

    Example #15 explains how to build a filtered view. You start with the Crimes Database view and click on the Filter icon. Socrata offers a link to a tutorial video at this point if you'd like to watch for more info. (Example #6 has a YouTube link for my video which shows using a filter as well.) You'll use the "Show & Hide Columns" option to show just your 3 columns. You'll use the "Filter" option to set the Primary Type you want.

    NOTE: There's sad news. I just tried it on the Crimes Database and the "Show & Hide" option is no longer given. I sent an email asking about it. Will update when I hear back. If you click on the second screen shot in Example #6, it's there. Not sure where it went.

    2) The hard way.

    It's a combo of Example #16 plus this example plus a little more code.

    Example #16 shows how to write a query in your code. You'd want to have a query that does an EQUALS of the columnId for Primary Type and the literal of the crime you care about. This will return all rows with your crime.

    You'll have to add some code to pull out the x and y coordinates. Using #16 as a guide, you'd do that in that last "for" loop to get the values for the three columns as you go through each row.

    To have this code page the results back, modify the request used in #16 and add the "... method=getRows&start=%s&length=%s ..." that is used above. Plus put it in a loop to pull each page back.

    ReplyDelete
  3. Good news! I found out that the "Show & Hide" option was moved in the last update of Socrata. (Thanks, Danielle!) It works. It's in a different place.

    You can use approach 1. Click on the "Manage" icon. It will bring up the "Show & Hide Columns" option. Click on it. You can now select the three columns you want in your view.

    Proceed to click on the "Filter" icon and you can add a filter condition of "Primary Type" is .

    Play around with this until you get the view you want. "Save As" to save it. Use the viewID in the code above.

    ReplyDelete