Developers

In addition to providing a simple-to-use directory of public data sets, we plan to offer a broad set of tools and APIs for developers. Our goal is to make finding public data and integrating it into your application or website as simple and painless as possible. Data Remix is currently in an early revision, and our tooling and API efforts are just getting started. If you want to keep up to date with what we're up to, follow us on Twitter or join our developer group.

Want to help? Opportunities to get involved in writing the integration for your preferred development platform are coming soon. Drop us a line and we'll get back to you when we're ready to get rolling, or join our developer group, where we will discuss new toolkit and API features and provide help to toolkit and API users.

We at Data Remix consider any contribution you wish to make, whether it's code, bug reports, suggestions, or contributions to our data set documentation, to be a valuable gift. We strongly encourage participation regardless of race, gender, sexuality, gender identity, experience level, political affiliation, strong bias for or against serif fonts on the web, whatever! If you care about our core goal, lowering the barrier to entry for public data, we want your help and value your contribution. Period. If you want to help out, let us know and we will find something valuable for you to do. That's a promise.

Currently Data Remix has a basic read-only API for searching and viewing our catalog of data sources. Detailed API documentation is coming soon, as are additional API capabilities, but in the meantime here are the basics.

Our API lives at /api/ and is intended to parallel the URL structure of the website. So viewing the details for a dataset at /datasources/1 on the web translates to an API call by simply changing the URL to /api/datasources/1. If you clicked the link, you can see that you've received a JSON-formatted response. JSON is the default, but you can request a different format using a simple format= query string parameter.
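The mapping from a website path to an API path can be sketched in a few lines of Python. This is only an illustration, not an official client: the function name `to_api_url` and the host `http://example.org` are placeholders, and passing `json` as an explicit `format=` value is an assumption (the text above only says JSON is the default). The `/api` prefix and the `format=` parameter are the only parts taken from the description.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Placeholder host (assumption): substitute the real Data Remix base URL.
BASE_URL = "http://example.org"


def to_api_url(web_path, fmt=None, base_url=BASE_URL):
    """Turn a website path such as /datasources/1 into its API URL."""
    parts = urlsplit(base_url)
    # The API mirrors the site structure under the /api prefix.
    api_path = "/api/" + web_path.lstrip("/")
    # Only add format= when the caller asks for an explicit format.
    query = urlencode({"format": fmt}) if fmt else ""
    return urlunsplit((parts.scheme, parts.netloc, api_path, query, ""))


print(to_api_url("/datasources/1"))
print(to_api_url("/datasources/1", fmt="json"))
```

Keeping the prefixing in one helper means the rest of a program can keep using the same paths it sees on the website.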

Currently Available Formats:

Example JSON Response for a data source

Take a look at the example response below. Most of the properties are fairly easy to understand and map directly to items that you can see on the Data Remix website. A single data source object can contain any number of files. Each file can contain a set of segment values, which help us understand which files contain which slices of data. Segments are things like year, fiscal year, state, city, zip code, etc.: any slice of data that can be used to separate a single data set into multiple files.
{
  "name": "Geospatial display of current weather radar images (RIDGE Weather Radar)",
  "tags": "radar integrated display with geospatial elements, ridge radar, doppler radar, geographic overlay, weather radar, warning, enhanced radar images",
  "date_updated": null,
  "provided_by": {
    "information_url": null,
    "code": null,
    "name": "National Weather Service"
  },
  "catalog_dataset_info_uri": "http://www.data.gov/details/61",
  "data_files": [
    {
      "file_size_kb": null,
      "segments": [],
      "uri": "http://radar.weather.gov/ridge/kmzgenerator.php",
      "download_file_type": { "id": "unk", "name": "unk" },
      "catalog_dataset_info_uri": null,
      "data_file_type": { "id": "kml", "name": "kml" }
    }
  ],
  "date_released": null,
  "id": 1,
  "description": "Provides GIS overlays for current weather radar results"
}
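As a rough illustration of how a client might walk this structure, the sketch below parses a trimmed copy of the example response and lists each file with its data type and segments. The helper name `list_files` is hypothetical. Because the example shows only an empty `segments` list, the shape of a segment entry is not known here, so segments are passed through untouched rather than interpreted.

```python
import json

# Trimmed copy of the example response above (some fields omitted).
RESPONSE_TEXT = """
{
  "name": "Geospatial display of current weather radar images (RIDGE Weather Radar)",
  "id": 1,
  "provided_by": {"information_url": null, "code": null,
                  "name": "National Weather Service"},
  "data_files": [
    {
      "file_size_kb": null,
      "segments": [],
      "uri": "http://radar.weather.gov/ridge/kmzgenerator.php",
      "download_file_type": {"id": "unk", "name": "unk"},
      "data_file_type": {"id": "kml", "name": "kml"}
    }
  ]
}
"""


def list_files(source):
    """Return (uri, data type id, segments) for each file in a data source."""
    rows = []
    for data_file in source.get("data_files", []):
        rows.append((
            data_file["uri"],
            data_file["data_file_type"]["id"],
            data_file.get("segments", []),  # shape of entries is not assumed
        ))
    return rows


source = json.loads(RESPONSE_TEXT)
files = list_files(source)
for uri, data_type, segments in files:
    print(source["name"], "|", data_type, "|", uri, "|", segments)
```

Note that JSON `null` values such as `file_size_kb` arrive in Python as `None`, so code that does arithmetic on them should check first.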

Searching

To access search, simply add /api to the front of your search URL, like so: /api/datasources/search?query=environment. All parameters are optional, including query, though at least one parameter must be specified. All parameters other than query accept multiple values separated by commas, so &file_type=csv,xls will return data sources with file types of either csv or xls.
  • query - A general text query that will search across name, description, tags, provider name, etc
  • provider_name - Filter by provider name
  • segment - Filter by a segment name and segment value, like year:2008
  • file_type - Specify a file type to filter for files of a particular download type. zip, gz, kmz, ...
  • data_type - Specify a data file type to filter for data in a particular format. kml, csv, shp, ...
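The parameter rules above (everything optional, at least one required, comma-separated multiple values) could be captured in a small helper like the following. This is a sketch under assumptions: the function name `build_search_path` is made up for illustration, and it only builds the path and query string; it does not contact the service.

```python
from urllib.parse import urlencode

SEARCH_PATH = "/api/datasources/search"


def build_search_path(query=None, **filters):
    """Build a search path; list-valued filters are joined with commas."""
    params = {}
    if query:
        params["query"] = query
    for name, value in filters.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            # Multiple options for one parameter are comma separated.
            value = ",".join(value)
        params[name] = value
    if not params:
        # The API requires at least one parameter.
        raise ValueError("at least one search parameter is required")
    # safe=",:" leaves commas and colons unescaped (csv,xls and year:2008).
    return SEARCH_PATH + "?" + urlencode(params, safe=",:")


print(build_search_path("environment", file_type=["csv", "xls"]))
print(build_search_path(segment="year:2008", data_type="kml"))
```

Spaces in values such as provider names are encoded by `urlencode` as `+`, which is the standard form encoding for query strings.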

Search Results Example - JSON

[
  {
    "name": "2005 Toxics Release Inventory data for the state of Iowa",
    "date_updated": "2008-11-05",
    "provided_by": { "code": null, "name": "Environmental Protection Agency" },
    "catalog_dataset_info_uri": "http://www.data.gov/details/149",
    "date_released": "2007-03-22",
    "id": 173,
    "description": "The Toxics Release Inventory (TRI) is a publicly available EPA database that contains information on toxic chemical releases and waste management activities reported annually by certain industries as well as federal facilities."
  },
  {
    "name": "2005 Toxics Release Inventory data for the state of Kansas",
    "date_updated": "2008-11-05",
    "provided_by": { "code": null, "name": "Environmental Protection Agency" },
    "catalog_dataset_info_uri": "http://www.data.gov/details/153",
    "date_released": "2007-03-22",
    "id": 177,
    "description": "The Toxics Release Inventory (TRI) is a publicly available EPA database that contains information on toxic chemical releases and waste management activities reported annually by certain industries as well as federal facilities."
  },
  {
    "name": "2005 Toxics Release Inventory data for the state of Kentucky",
    "date_updated": "2008-11-05",
    "provided_by": { "code": null, "name": "Environmental Protection Agency" },
    "catalog_dataset_info_uri": "http://www.data.gov/details/154",
    "date_released": "2007-03-22",
    "id": 178,
    "description": "The Toxics Release Inventory (TRI) is a publicly available EPA database that contains information on toxic chemical releases and waste management activities reported annually by certain industries as well as federal facilities."
  }
]
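The example results carry no `data_files` entry, so a client that wants file details would presumably follow each result's `id` to its detail record. The sketch below shows one way to do that mapping, assuming the `/api/datasources/<id>` pattern described earlier applies to every result; the helper name `detail_paths` is hypothetical and the input is a trimmed copy of the example.

```python
import json

# Two entries trimmed from the example results above (some fields omitted).
RESULTS_TEXT = """
[
  {"name": "2005 Toxics Release Inventory data for the state of Iowa",
   "date_updated": "2008-11-05", "date_released": "2007-03-22", "id": 173,
   "provided_by": {"code": null, "name": "Environmental Protection Agency"}},
  {"name": "2005 Toxics Release Inventory data for the state of Kansas",
   "date_updated": "2008-11-05", "date_released": "2007-03-22", "id": 177,
   "provided_by": {"code": null, "name": "Environmental Protection Agency"}}
]
"""


def detail_paths(results):
    """Map each search result to (name, API path of its detail record)."""
    return [(r["name"], "/api/datasources/%d" % r["id"]) for r in results]


results = json.loads(RESULTS_TEXT)
paths = detail_paths(results)
for name, path in paths:
    print(path, "-", name)
```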

Going forward we intend to provide a set of easily installable modules for popular development languages and application frameworks. Three main toolkit components are planned. Naturally all of this is subject to change at any time. At the end of the day it's you, the developers using Data Remix, that will drive the ultimate direction of the toolkit.

  • Data Remix Command Line - Access all Data Remix API functions from a simple command line.
  • API Client Modules - Encapsulate and simplify access to all Data Remix APIs for various programming languages and platforms (Python, Ruby, Java, .NET, etc.). The first version will be available for Python. We warmly welcome your contribution if you want to write one for your language of choice.
  • Database Model and ETL Generation - Our long-term goal is to provide a set of simple-to-use modules for various platforms that will enable a developer to select one or more datasets from the Data Remix catalog and automatically construct and populate a local data store with a joined set of public data. We believe this would drastically lower the barrier to entry for writing websites and applications that use public data.

DataRemix.py - Our Command Line Tool

Our first beta tool, a Python command line script to search for and download public data sets, is now available! Get it here.

System Requirements

Search Examples

Search for datasets by the keyword 'wetlands' provided by the Environmental Protection Agency
dataremix.py -s wetlands -p "Environmental Protection Agency"

Search for datasets provided by Homeland Security for California in 2008
dataremix.py -s -p "Homeland Security" -m year:2008,state:CA

Download Data Examples

Download all XML format data from dataset 182 for year: 2008 and state: California
dataremix.py -g 182 -t XML -m year:2008,state:CA

Return a list of public data sets that relate to the environment and store their data in KML or KMZ format
dataremix.py -f environment -t KML,KMZ

Did we mention that Data Remix itself is open source? You can access the code here. The source is licensed under the New BSD License.