Frequently Asked Questions

Q: What is Data Commons?
Data Commons, an open source initiative from Google, organizes the world’s publicly available information and makes it more accessible and useful. Learn more on About Data Commons.
Q: Who can use Data Commons?
Data Commons is available for anyone to use. Our goal is to make the world’s publicly available data more helpful to people and organizations working on the big societal challenges like climate change, food security, or economic inequity.
Q: Is Data Commons free or is there a cost to use it?
There is no cost for the publicly available data, which is hosted on Google Cloud by Data Commons. For individuals or organizations who exceed the free usage limits, pricing will be in line with the BigQuery public dataset program.
Q: What is the difference between Data Commons and other public dataset projects?
Many public dataset projects provide a great service by aggregating topical open data sets. However, using those data sets to answer specific questions often involves 'foraging': finding the data, cleaning the data, reconciling different formats and schemas, figuring out how to merge data about the same entity from different sources, and so on. This error-prone and tedious process is repeated once (or more) by each organization working on an issue. This is a challenge in almost every area of study involving data, from the social sciences and physical sciences to public policy. Data Commons does this work once, on a large scale, and provides cloud-accessible APIs to the cleaned, normalized and joined data. While there are millions of datasets in every domain, some collections of data get used more frequently than others.
Q: What is the difference between Data Commons and Wikidata?
The focus in Data Commons is on aggregating external, already available data (with an emphasis on statistical data) from government agencies and other authoritative sources.
Q: What is the relation between DataCommons.org and Schema.org?
DataCommons.org builds upon the vocabularies defined by Schema.org, with additional terms defined to cover concepts (e.g. "citizenship") that are important to the data in Data Commons but which have not been a priority for Schema.org-based Web markup. The Data Commons schemas constitute an "external extension" to Schema.org, similar to that provided by GS1. Some schemas could migrate into Schema.org if the community finds value in them.
Q: What is the new Explore interface to Data Commons?
Data Commons has a new Explore interface that uses large language models (LLMs) to map your natural-language question to public data sets and select the right visualizations for it. We do not use LLMs to generate any data or visualizations; all responses are based on real data with sourced provenance from Data Commons.
Q: How do you choose which dataset to show in the Explore interface?
The LLMs powering Data Commons’ Explore interface use generative AI to identify the most likely response to your query. As we continue to improve the interface, we will look to provide more options that allow users to select sources themselves.
Q: Is my data suitable for adding to Data Commons?
Data Commons is intended for public, statistical, macro data that benefits from being joined with other macro data to derive new insights. Your data is a good fit for Data Commons if it meets the following criteria:
  • It can be licensed under the Creative Commons BY (CC BY) agreement.
  • It is pre-aggregated up to a minimal level that is common to other datasets; for example, an administrative area (place).
You can run your own Data Commons instance for data that is private and not appropriate for CC BY licensing. However, micro data, i.e. individual-level data, cannot currently be aggregated by Data Commons. In general, if there is no way to join your data with existing Data Commons datasets, on a common entity such as an administrative area or institution, there isn't much benefit to using Data Commons. To determine whether your data is best served by the base Data Commons (Google-run datacommons.org) or by a custom instance that you run yourself, see the Custom Data Commons FAQ.
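As an illustration of what "pre-aggregated" means here, the following minimal sketch rolls individual-level rows up to one row per administrative area before the data would be offered to Data Commons. The record fields (person_id, place, employed) and the sample rows are hypothetical, invented for this example; they are not a Data Commons import format.

```python
from collections import Counter

# Hypothetical individual-level (micro) records: one row per person, tagged
# with the administrative area it belongs to. All values are illustrative.
records = [
    {"person_id": 1, "place": "geoId/06", "employed": True},
    {"person_id": 2, "place": "geoId/06", "employed": False},
    {"person_id": 3, "place": "geoId/36", "employed": True},
    {"person_id": 4, "place": "geoId/06", "employed": True},
]


def aggregate_by_place(rows):
    """Roll individual rows up to one summary row per place."""
    totals = Counter()
    employed = Counter()
    for row in rows:
        totals[row["place"]] += 1
        if row["employed"]:
            employed[row["place"]] += 1
    # One output row per place, sorted for stable output.
    return [
        {"place": place, "population": totals[place], "employed": employed[place]}
        for place in sorted(totals)
    ]


aggregated = aggregate_by_place(records)
for row in aggregated:
    print(row)
```

Only the aggregated rows, keyed by a place that other datasets also use, would be candidates for joining with existing data; the individual-level rows stay outside Data Commons.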
Q: How can I suggest adding my data to Data Commons?
Data Commons is meant to be for the community, by the community, and we welcome new submissions or suggestions. If you are interested in importing your data to Data Commons, please file a data request in our issue tracker.
Q: Where can I download all the data?
Given the size and evolving nature of the Data Commons knowledge graph, you cannot download all the data. You can download all the data pertaining to a specific subset of metrics (variables). You can use the Data Download Tool to download CSV files, or you can use the APIs to programmatically retrieve data in JSON format.
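For the programmatic route, a rough sketch of building a request for a few variables and flattening a JSON reply into rows might look like the following. The request fields and the nested response shape (byVariable, byEntity, orderedFacets, observations) are assumptions modeled on the REST API's observation endpoint and should be checked against the current API documentation; the sample response and its value are placeholders, not real data, and no network call is made.

```python
def build_observation_request(variables, entities, date="LATEST"):
    """Build a request body for a subset of variables and places.

    The field names are an assumption about the observation endpoint;
    verify them against the API documentation before use.
    """
    return {
        "date": date,
        "variable": {"dcids": list(variables)},
        "entity": {"dcids": list(entities)},
        "select": ["entity", "variable", "value", "date"],
    }


def flatten_observations(response):
    """Turn the (assumed) nested JSON response into flat tuples:
    (variable, entity, date, value)."""
    rows = []
    for variable, by_variable in response.get("byVariable", {}).items():
        for entity, by_entity in by_variable.get("byEntity", {}).items():
            for facet in by_entity.get("orderedFacets", []):
                for obs in facet.get("observations", []):
                    rows.append((variable, entity, obs["date"], obs["value"]))
    return rows


# Placeholder response in the assumed shape; the value 1000 is not real data.
sample_response = {
    "byVariable": {
        "Count_Person": {
            "byEntity": {
                "country/USA": {
                    "orderedFacets": [
                        {"observations": [{"date": "2020", "value": 1000}]}
                    ]
                }
            }
        }
    }
}

request_body = build_observation_request(["Count_Person"], ["country/USA"])
rows = flatten_observations(sample_response)
print(request_body)
print(rows)
```

Flat rows like these are easy to write out as CSV, which gives roughly the same result as the Data Download Tool for a small set of variables.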
Q: How do we know if the data is accurate?
Data Commons provides an access mechanism to data, but cannot ensure accuracy. To provide as much context as possible, answers to queries will include the provenance (source of the data). The choice of which data to use is up to individuals. If you find something you think is in error, please file a bug in our issue tracker.
Q: How often is the data refreshed?
Different data sources refresh at different frequencies. We try to keep the data updated as the sources publish new versions of their data. If you see something out of date, please file a bug in our issue tracker.
Q: What are the SLAs / performance levels we can expect?
The service is provided on an as-is basis with no SLA or commitments on availability or uptime.
Q: How do I cite datacommons.org?
To cite charts and tools on this site, please use the following format:
Data Commons 2024, Data Commons, viewed 22 Dec 2024, <https://datacommons.org>.

If citing data from a particular dataset, e.g. CDC Places, then use:

Data Commons 2024, CDC Places, electronic dataset, Data Commons, viewed 22 Dec 2024, <https://datacommons.org>.

In both cases, please use the date you viewed the site (in the examples above, we used 22 Dec 2024).
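
If you generate citations in bulk, the two formats above can be produced with a small helper. This is a sketch only: the function name format_citation is hypothetical, and it assumes the year after "Data Commons" is the year of the viewing date, which matches the examples but is not stated explicitly.

```python
from datetime import date

# Month abbreviations are spelled out so the result does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_citation(viewed, dataset=None, url="https://datacommons.org"):
    """Format a citation for the site, or for a named dataset if given.

    Assumption: the leading year is taken from the viewing date.
    """
    viewed_text = f"{viewed.day} {MONTHS[viewed.month - 1]} {viewed.year}"
    if dataset is None:
        return (f"Data Commons {viewed.year}, Data Commons, "
                f"viewed {viewed_text}, <{url}>.")
    return (f"Data Commons {viewed.year}, {dataset}, electronic dataset, "
            f"Data Commons, viewed {viewed_text}, <{url}>.")


site_citation = format_citation(date(2024, 12, 22))
dataset_citation = format_citation(date(2024, 12, 22), dataset="CDC Places")
print(site_citation)
print(dataset_citation)
```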

Q: What are the usage rights of the data in Data Commons?
The Data Commons knowledge graph and the compilation of the datasets are licensed under CC BY. The Data Commons REST API and the R and Python libraries are released under the Apache License 2.0. The data included in Data Commons come from different sources. Provenance is provided for all data, including a link to the source. While we make every effort to obtain data from sources that allow unrestricted use of the underlying data, individual datasets may be subject to different licenses and terms of use, as specified in the linked source of the data.
Q: Can my educational institution use Data Commons while complying with the Family Educational Rights and Privacy Act (FERPA) and/or similar state privacy requirements?
Data Commons collects no personally identifiable information (PII), records, or private information from users and can be used in compliance with FERPA. For specific questions about FERPA compliance, please contact your organization’s legal counsel for advice.
Q: What data do you collect about me?
Data Commons uses Google Analytics to collect non-identifiable usage data to improve the product. We log all queries asked in the Search tool, but do not associate IP addresses or any other identifiers with the queries. We use in-session cookies to manage state.