Welcome to dataCommons

    Publicly available data from open sources (e.g. census.gov, NOAA, data.gov) are a vital resource for students and researchers in a variety of disciplines. Unfortunately, processing these datasets is often tedious and cumbersome. Each organization follows its own practices for codifying datasets, so combining data from different sources requires mapping common entities (cities, counties, etc.) and reconciling different types of keys and identifiers. This process is time-consuming and increases the likelihood of methodological errors.

    dataCommons attempts to synthesize a single graph from these different data sources. It links references to the same entities (such as cities, counties, and organizations) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources. Like the Web, the dataCommons graph is open: any user can contribute datasets or build applications powered by the graph. In the long term, we hope the data contained within the dataCommons graph will be useful to students and researchers across different disciplines. Though we’ve already “jump-started” the graph with data from publicly available sources (Wikipedia, US Census, FBI, state election boards, etc.), we encourage you to join and contribute.
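
    As a rough illustration of the linking idea described above, the following Python sketch resolves records from two differently keyed sources to shared nodes. It is not the dataCommons implementation or API: the source tables, the resolver table, the node IDs, and every value in it are hypothetical placeholders chosen for the example.

```python
from collections import defaultdict

# Hypothetical source A: records keyed by a numeric code (placeholder values).
SOURCE_A = [
    {"code": "001", "population": 1000},
    {"code": "002", "population": 2500},
]

# Hypothetical source B: records keyed by a free-text name (placeholder values).
SOURCE_B = [
    {"name": "Alpha County", "median_age": 35.5},
    {"name": "beta county ", "median_age": 41.0},
]

# Hypothetical resolution table: (key type, normalized key) -> graph node ID.
RESOLVER = {
    ("code", "001"): "node/alpha",
    ("code", "002"): "node/beta",
    ("name", "alpha county"): "node/alpha",
    ("name", "beta county"): "node/beta",
}


def normalize(value):
    """Trim whitespace and lowercase so spelling variants compare equal."""
    return value.strip().lower()


def build_graph(sources):
    """Attach each record's properties to the node its key resolves to."""
    graph = defaultdict(dict)
    for key_type, records in sources:
        for record in records:
            node_id = RESOLVER.get((key_type, normalize(record[key_type])))
            if node_id is None:
                continue  # unresolved keys are simply skipped in this sketch
            for prop, value in record.items():
                if prop != key_type:
                    graph[node_id][prop] = value
    return dict(graph)


graph = build_graph([("code", SOURCE_A), ("name", SOURCE_B)])
print(graph["node/alpha"])
```

    Once both sources resolve to the same node ID, a single lookup on that node returns properties that originally lived in separate datasets. A real system would need a far more careful resolution step than an exact-match table.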

    dataCommons is currently available to the academic community via Python notebooks. You can use the dataCommons Graph Browser to browse through the graph, or access the data programmatically via APIs. Also, check out the tutorials and examples.

    dataCommons is a project started by Google, and is intended to be a community effort. Get involved!