Data Commons aggregates data from a wide range of sources into a unified database to make it more accessible and useful. More on why we are building Data Commons.

Explore the data

Explore the data using one of our visualization tools (Scatter, Map, Timeline) or the Place Explorer, or use the APIs via Python notebooks or to build your own REST-based application. Explore thousands of variables in the Statistical Variable Explorer. Contribute to the project on GitHub.

New update

We have launched a new Data Download Tool that allows you to easily download statistical data for a large number of places with just a few button clicks.

Read our blog post for more details.

By the numbers

We have continuously added data to the Data Commons graph over the last 4 years. As of April 2022, the graph contains:

  • 1.4 trillion triples
  • 3 billion time series
  • 2.9 million places
  • 100,000 variables

Separately, the Biomedical Data Commons includes 200,000 variables and 850 billion triples.

Building connections

Data Commons covers many topics, from Demographics and Economics to Emissions and the Climate. Because data from many datasets is aggregated in one place, it becomes much easier to build connections across them. Here are some of our favorite data excursions: some alarming, some sad, but always illuminating.
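To make "building connections across datasets" concrete, here is a minimal sketch that assumes two datasets have already been keyed by a shared place identifier. The place IDs, variable names and values are hypothetical placeholders, not real Data Commons data or API calls. Once both sources use the same ID, combining them is a simple inner join.

```python
# Hypothetical toy data: two independently sourced datasets, each keyed by the
# same place identifier. IDs and values are illustrative placeholders only.
heart_disease_pct = {  # % of adults with coronary heart disease (hypothetical)
    "county_A": 9.1,
    "county_B": 6.4,
    "county_C": 7.8,
}
max_summer_temp_c = {  # projected max summer temperature in °C (hypothetical)
    "county_A": 38.5,
    "county_B": 33.0,
    "county_D": 35.2,
}


def join_by_place(
    left: dict[str, float], right: dict[str, float]
) -> dict[str, tuple[float, float]]:
    """Inner join on place ID: keep only places present in both datasets."""
    shared = sorted(left.keys() & right.keys())
    return {place: (left[place], right[place]) for place in shared}


joined = join_by_place(heart_disease_pct, max_summer_temp_c)
for place, (heart, temp) in joined.items():
    print(f"{place}: {heart}% coronary heart disease, {temp} °C max summer temp")
```

Here county_C and county_D drop out because each appears in only one dataset.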

Climate change

Max projected summer temperatures for US counties (RCP 4.5)¹ (source: NASA)

Climate change is not just about reducing carbon emissions; it is also about adapting to the change that is already happening. The change in temperature is not as simple as 1.5°C vs. 2°C vs. 2.5°C. These are global averages, aggregated over a period of time. At each of those levels, some places will become much hotter and others will become colder. The timing of peak temperatures will also change.

Explore projected temperatures according to the CCSM4 model.

Heart condition vs. max projected summer temperature for US counties (RCP 4.5) (source: CDC, NASA)

This scatter plot compares the peak summer temperature expected in each county in 30 years with the fraction of people suffering from coronary heart disease. Note the outliers in the upper-right quadrant: counties such as Todd County, SD and Oglala Lakota County, SD, which have a high incidence of coronary disease, can also expect some of the highest temperature rises². This is something we need to prepare for.
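One simple way to pick out upper-right-quadrant points in a scatter plot like this is to split each axis at its median and keep the points that lie above both. The sketch below assumes that quadrant definition (a stricter cutoff, such as the 90th percentile, would be equally reasonable) and uses hypothetical county IDs and values rather than the data behind the chart.

```python
from statistics import median


def upper_right_quadrant(points: dict[str, tuple[float, float]]) -> list[str]:
    """Return IDs of points strictly above the median on both axes."""
    x_mid = median(x for x, _ in points.values())
    y_mid = median(y for _, y in points.values())
    return sorted(
        pid for pid, (x, y) in points.items() if x > x_mid and y > y_mid
    )


# Hypothetical toy data:
# (max projected summer temperature in °C, % with coronary heart disease)
counties = {
    "county_A": (38.5, 9.1),
    "county_B": (33.0, 6.4),
    "county_C": (36.0, 5.9),
    "county_D": (31.2, 7.0),
    "county_E": (39.1, 10.2),
}
print(upper_right_quadrant(counties))  # ['county_A', 'county_E']
```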

Emissions

Annual non-biogenic CO2 emissions from large facilities across US states (source: EPA)

Emissions are responsible for the climate crisis. If we look at large industrial emissions, we see that Texas is the highest emitter, at almost four times the emissions of the next-highest state. In fact, Texas emits as much as both Turkey and the UK. Within Texas, Harris County has one of the highest concentrations of petrochemical refineries.

Emissions are not just responsible for climate change; they are also correlated³ with a range of health conditions, from cardiac conditions and hypertension to poor mental health. If we look at lifetime cancer risk across counties, we find that St John Parish, LA is one of the most dangerous counties from a cancer perspective.

Annual biogenic CO2 emissions from large facilities across US states (source: EPA)

Biogenic emissions, i.e., emissions from biological sources, are also a big contributor to climate change. Interestingly, Florida and Georgia are the biggest emitters of biogenic greenhouse gases, thanks to their sugar and paper mills. Florida is also the biggest producer of industrial nitrous oxide (N2O), due to the production of industrial chemicals in Escambia County, which is followed closely by Ascension Parish, Louisiana. These two counties produce more N2O than most states!

Water

Water withdrawal trends in California (source: USGS)

Water scarcity, affecting crops, animals and humans, could well be one of the most serious risks from climate change. California and the Southwest are some of the biggest consumers of water. However, we can see that water use is becoming more efficient. California in particular has seen irrigation water consumption go down even as agricultural yields have increased. Household water consumption has stayed flat over the last 30+ years, while the population has grown substantially.

However, digging deeper, we find that in Imperial County (the county with the third-highest water consumption), groundwater use has risen sharply even though overall water consumption, including surface water use, has gone down.

Water withdrawal for irrigation vs. projected temperature rise across US counties (source: USGS, NASA)

Which places might be most affected by rising temperatures? A temperature rise alone does not imply a greater impact, but this bivariate map shows how the places that use the most water for irrigation correlate³ with the places that might see the highest temperature increases.
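Bivariate maps are commonly built by binning each variable into a few classes and coloring each place by the pair of classes it falls into. The sketch below assumes a 3-by-3 scheme based on rank terciles; the county IDs, irrigation withdrawals and temperature increases are hypothetical, and this is not necessarily how the map above was drawn.

```python
def tercile_classes(values: dict[str, float]) -> dict[str, int]:
    """Class 0 (low), 1 (mid) or 2 (high) for each place, by rank within the data."""
    ranked = sorted(values, key=lambda place: values[place])
    n = len(ranked)
    return {place: rank * 3 // n for rank, place in enumerate(ranked)}


def bivariate_classes(a: dict[str, float], b: dict[str, float]) -> dict[str, str]:
    """Combine two tercile classes into one of 9 cells, e.g. '2-2' = high on both."""
    shared = sorted(a.keys() & b.keys())
    # Rank only over places present in both datasets, so the bins are comparable.
    class_a = tercile_classes({p: a[p] for p in shared})
    class_b = tercile_classes({p: b[p] for p in shared})
    return {p: f"{class_a[p]}-{class_b[p]}" for p in shared}


# Hypothetical toy data: irrigation withdrawals (million gallons/day), temperature rise (°C)
irrigation_mgal_per_day = {"c1": 120, "c2": 15, "c3": 300, "c4": 60, "c5": 5, "c6": 210}
temp_rise_c = {"c1": 2.1, "c2": 3.0, "c3": 2.8, "c4": 1.5, "c5": 1.9, "c6": 3.2}

cells = bivariate_classes(irrigation_mgal_per_day, temp_rise_c)
print(cells)
print([p for p, cell in cells.items() if cell == "2-2"])  # ['c6']
```

Places in the "2-2" cell are the ones that are high on both variables; rank terciles keep the bins evenly populated, at the cost of ignoring how far apart the raw values are.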

Covid-19

Fraction of positive Covid-19 cases vs. fraction of uninsured across US counties (source: US Census, New York Times)

As many insightful articles from the New York Times and others have pointed out, Covid-19 disproportionately affected African American communities. Unfortunately, Covid-19 prevalence is correlated³ with many other indicators. For example, we see that Covid-19 infection rates are highly correlated with the fraction of the population that is uninsured, the fraction in poverty, the fraction on food stamps, and so on.
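A common way to quantify such a relationship is the Pearson correlation coefficient r, which ranges from -1 (perfect negative linear relationship) to 1 (perfect positive). The sketch below computes it from scratch on hypothetical county values; we do not know which measure was used for the chart above, and Python 3.10+ also ships `statistics.correlation` for the same calculation.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two spreads (sums of squared deviations).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical county values: % uninsured and fraction of positive Covid-19 cases
uninsured_pct = [5.2, 8.1, 12.4, 15.0, 20.3]
covid_case_fraction = [0.06, 0.08, 0.11, 0.12, 0.17]
r = pearson(uninsured_pct, covid_case_fraction)
print(f"r = {r:.2f}")
```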

Of course, these are just correlations. This Colab notebook digs deeper, performing a causal analysis to discover the variables most causally predictive of Covid-19 occurrence and morbidity.

Income and other inequalities

Prevalence of obesity vs. fraction of population in poverty (source: US Census, CDC)

Studies from the CDC and others have shown a correlation³ between obesity and poverty. In this Colab notebook, we explore the relationship between poverty, unemployment and obesity. Unfortunately, many other medical conditions are also inversely correlated with economic well-being.

Explore the relationship between these variables for counties across the US.


1. RCP 2.6 (optimistic) represents a stringent mitigation scenario, while RCP 8.5 (pessimistic) represents a scenario with very high greenhouse gas emissions. Source: IPCC

2. Ponjoan, Anna et al. “Effects of extreme temperatures on cardiovascular emergency hospitalizations in a Mediterranean region: a self-controlled case series study.” Environmental health : a global access science source vol. 16,1 32. 4 Apr. 2017, doi:10.1186/s12940-017-0238-0

3. Correlation does not imply causation. See this guide for more on correlation.