In addition to providing a simple-to-use directory of public data sets, we plan to offer a broad set of tools and APIs for developers. Our goal is to make finding public data and integrating it into your application or website as simple and pain-free as possible. Data Remix is currently in an early revision, and our tooling and API efforts are just getting started. If you want to keep up to date with what we're up to, follow us on Twitter or join our developer group.
Want to help? Opportunities to get involved in writing the integration for your preferred development platform are coming soon. Drop us a line and we'll get back to you when we're ready to get rolling, or join our developer group, where we will discuss new toolkit and API features and provide help to toolkit and API users.
Currently, Data Remix has a basic read-only API for searching and viewing our catalog of data sources. Detailed API documentation is coming soon, as are additional API capabilities, but in the meantime here are the basics.
Our API lives at /api/ and is intended to parallel the URL structure of the website. Viewing the details for a dataset at /datasources/1 on the web translates to an API call by simply changing the URL to /api/datasources/1. If you clicked the link you can see that you've received a JSON formatted response. JSON is the default, but you can request a different format using a simple format= query string.
Currently Available Formats:
- JSON - Example: /api/datasources/1?format=json
- YAML - Example: /api/datasources/1?format=yaml
- XML - Example: /api/datasources/1?format=xml
- Python Pickle - Example: /api/datasources/1?format=pickle
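The URL mapping and format selection described above can be sketched in Python. This is a minimal sketch and not part of any official Data Remix client: the helper names `api_url` and `fetch_json` and the placeholder host `https://dataremix.example` are assumptions made for illustration. Only the /api/ prefix, the /datasources/1 path, and the format= query string come from the description above. The sketch targets Python 3.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Hypothetical host; substitute the real Data Remix site address.
BASE_URL = "https://dataremix.example"

# The formats listed above.
FORMATS = ("json", "yaml", "xml", "pickle")


def api_url(site_path, fmt=None, base_url=BASE_URL):
    """Turn a website path such as /datasources/1 into its API URL."""
    if fmt is not None and fmt not in FORMATS:
        raise ValueError("unsupported format: %r" % fmt)
    # The API mirrors the website, so only the /api/ prefix is added.
    url = urljoin(base_url, "/api/" + site_path.lstrip("/"))
    if fmt is not None:
        url += "?" + urlencode({"format": fmt})
    return url


def fetch_json(site_path, opener=urlopen):
    """Request a page as JSON and decode the response body."""
    with opener(api_url(site_path, fmt="json")) as response:
        return json.load(response)


print(api_url("/datasources/1"))
print(api_url("/datasources/1", fmt="yaml"))
```

Calling `fetch_json("/datasources/1")` against the real host would return the decoded response as ordinary Python dictionaries and lists. The `opener` argument exists only so the function can be exercised without a network connection.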
Example JSON Response for a data source
Take a look at the example response below. Most of the properties are fairly easy to understand and map directly to items that you can see on the Data Remix website. A single data source object can contain any number of files. Each file can contain a set of segment values, which help us understand which files contain which slices of data. Segments are things like year, fiscal year, state, city, zip code, etc.: any slice of data that can be used to separate a single data set into multiple files.
Searching
To access search, simply add /api to the front of your search URL, like so: /api/datasources/search?query=environment. All parameters are optional, including query, though at least one parameter must be specified. All parameters other than query support multiple options separated by commas, so &file_type=csv,xls will return data sources with file types of either csv or xls.
- query - A general text query that will search across name, description, tags, provider name, etc.
- provider_name - A provider name to filter by
- segment - A segment name and segment value, like year:2008
- file_type - A file type to filter for files of a particular download type: zip, gz, kmz, ...
- data_type - A data file type to filter for data in a particular format: kml, csv, shp, ...
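The search parameters above can be assembled into a URL as in the following sketch. The function name `search_url` and the host `https://dataremix.example` are hypothetical. The parameter names, the comma-separated form for multiple options, and the rule that at least one parameter is required are taken from the description above. Keeping commas and colons unescaped is a choice made here to match the examples in this section, not documented server behavior.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the real Data Remix site address.
BASE_URL = "https://dataremix.example"


def search_url(query=None, provider_name=None, segment=None,
               file_type=None, data_type=None):
    """Build a search URL; list-valued filters are joined with commas."""
    params = {}
    if query:
        params["query"] = query
    filters = (("provider_name", provider_name), ("segment", segment),
               ("file_type", file_type), ("data_type", data_type))
    for name, value in filters:
        if not value:
            continue
        # Every parameter except query accepts several comma-separated options.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        params[name] = value
    if not params:
        raise ValueError("specify at least one search parameter")
    # Leave "," and ":" literal so the result reads like file_type=csv,xls.
    return BASE_URL + "/api/datasources/search?" + urlencode(params, safe=",:")


print(search_url(query="environment"))
print(search_url(file_type=["csv", "xls"]))
```

Passing a list for a filter produces the comma-separated form, so `file_type=["csv", "xls"]` becomes `file_type=csv,xls` in the query string.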
Search Results Example - JSON
Going forward we intend to provide a set of easily installable modules for popular development languages and application frameworks. Three main toolkit components are planned. Naturally all of this is subject to change at any time. At the end of the day it's you, the developers using Data Remix, that will drive the ultimate direction of the toolkit.
- Data Remix Command Line - Access all Data Remix API functions from a simple command line.
- API Client Modules - Encapsulates and simplifies access to all DataRemix APIs for various programming languages and platforms (Python, Ruby, Java, .NET, etc.). The first version will be available for Python. We warmly welcome your contribution if you want to write one for your language of choice.
- Database Model and ETL Generation - Our long-term goal is to provide a set of simple-to-use modules for various platforms that will enable a developer to select one or more datasets from the DataRemix catalog and automatically construct and populate a local data store with a joined set of public data. We believe this would drastically lower the barrier to entry for writing websites and applications that use public data.
DataRemix.py - Our Command Line Tool
Our first beta tool, a Python command line script to search for and download public data sets, is now available! Get it here.
System Requirements
- Python 2.3 or greater
- urlgrabber
Search Examples
Search for datasets by the keyword 'wetlands' provided by the Environmental
Protection Agency
dataremix.py -s wetlands -p Environmental Protection Agency
Search for datasets provided by Homeland Security for California in 2008
dataremix.py -s -p Homeland Security -m year:2008,state:CA
Download Data Examples
Download all XML format data from dataset 182 for year: 2008 and
state: california
dataremix.py -g 182 -t XML -s year:2008, state:CA
Return a list of public data sets that relate to the environment
and store their data in KML or KMZ format
dataremix.py -f environment -t KML,KMZ
Did we mention that Data Remix itself is open source? You can access the code here. The source is licenced under the New BSD Licence.

