Frequently Asked Questions

Q: What is Data Commons?
Data Commons is a collaborative initiative, started at Google and intended to be a community effort, with a mission to maintain and provide access to useful structured data on the Internet in an easy-to-use manner. Our goal is to create a knowledge graph that contains a significant fraction of the world's structured data. The Data Commons graph will be open, like the web: anyone should be able to build applications on top of it, and anyone should be able to add to it.

Q: What is the difference between Data Commons and public dataset projects like Dataverse, Kaggle datasets, Google BigQuery Public Datasets, etc.?
Data Commons provides Data-as-a-Service directly, rather than the "downloadable datasets" offered by public dataset projects like Kaggle or Google BigQuery Public Datasets. The underlying data does come from different datasets and sources, but entities across those datasets are resolved so that users can query for the data directly, without specifying the source. In other words, users of Data Commons do not need to search for datasets, reconcile structure or schema across them, or create common identifiers before joining data from various sources.

Q: What is the relation between Schema.org and Data Commons?
Data Commons builds upon the vocabularies defined by Schema.org, with additional terms defined to cover concepts (e.g. "citizenship") that are important to the data in Data Commons but which have not been a priority for Web markup. The Data Commons schemas constitute an "external extension" to Schema.org, similar to that provided by GS1. Some schemas could migrate into Schema.org if the community finds value in them.

Q: What are the kinds of entities you are resolving now? Which ones are you not resolving?
At this time we are resolving entities as they relate to a geographical location or place, for example "population", "weather", and "crime statistics" in a particular "city". We will continue to expand the set of resolved entities to include organizations, people, events, products, etc. The long-term goal is to be able to resolve any arbitrary entity using Reference by Description.

Q: What are the usage rights of the data in Data Commons?
The Data Commons knowledge graph, along with the compilation of the datasets, is licensed under CC BY. The Data Commons REST API and the R and Python libraries are released under the Apache License 2.0. The data included in the Data Commons Graph comes from different sources, and the source (provenance) is provided for all of it. Provenance includes the URL of the source of the data. While effort is made to obtain data from sources that offer unrestricted usage of the underlying data, this data may be subject to different licenses and terms of use, as specified at the provenance URL.

Q: How can we access data in Data Commons?
The data in the knowledge graph can be accessed through the Data Commons Graph Browser and through APIs for Python, R, REST, and Google Sheets.
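As a rough sketch of what REST access might look like, the snippet below only builds a request URL and does not send it. The base URL, the `v2/node` path, the parameter names, and the omission of any API key are assumptions made for illustration and should be checked against the current API documentation; `build_request_url` is a hypothetical helper, not part of any Data Commons library.

```python
from urllib.parse import urlencode

# Assumption: base URL of the REST service; confirm in the API documentation.
BASE_URL = "https://api.datacommons.org"


def build_request_url(path, params):
    """Hypothetical helper: join the base URL, a path, and URL-encoded query parameters."""
    query = urlencode(params, doseq=True)  # doseq expands list values into repeated keys
    return f"{BASE_URL}/{path.lstrip('/')}?{query}"


# Assumed endpoint and parameters: ask for the name of one place node.
url = build_request_url("v2/node", {"nodes": ["geoId/06"], "property": "->name"})
print(url)
```

The resulting URL could then be passed to any HTTP client. The Python, R, and Google Sheets APIs wrap this kind of request so that callers do not need to build URLs by hand.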

Q: How can we add our own data to the knowledge graph?
Data Commons is intended to be a community project and seeks your involvement. To learn more about publishing data that can be included in Data Commons, check out the Get Involved page. You can also contact us if you have an interesting dataset that you think should be included in Data Commons and would like to help. In the future we plan to allow users to ingest data into the Data Commons Graph using an upload tool. We will update the community when this functionality is released.

Q: How long will you store the data for?
Data Commons is not an archival service. We collect the data, build the knowledge graph, and provide access to the graph. As with any website, long-term storage and safekeeping of the data is the responsibility of the primary publisher.

Q: Where can I download all the data?
Given the size and evolving nature of the Data Commons Graph, we prefer that you access it via the APIs. If your project needs local access to a large fraction of the Data Commons Graph, please contact us.

Q: How much does this service cost to use?
The public data in the Data Commons Graph is hosted on Google Cloud Platform by Data Commons and is made available to users. There is no cost for the data itself when it is publicly available for free. Usage limits for the service beyond the free tier quota will be in line with the pricing of the BigQuery Public Datasets program. In the future, when more data is added to the knowledge graph by users, we expect that, just like on the Web, some data will be free, some will be private, and some may have an associated cost to access.

Q: What are the SLAs / Performance levels we can expect?
The service is provided on an as-is basis with no SLA or commitments on availability or uptime.

Q: How do we know if the data is accurate?
Data Commons provides an access mechanism to data and makes no commitment on the accuracy of that data. Answers to queries will include the provenance (the source of the data). The choice of which data to use, based on its source, is in the developer's control.
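As a minimal sketch of choosing data by source, the snippet below keeps only the observations whose provenance URL belongs to a domain the developer trusts. The `Observation` record, its field names, and the sample values are hypothetical placeholders invented for illustration; they are not the Data Commons response format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: one value plus the URL of its source."""
    variable: str
    value: float
    provenance_url: str


def filter_by_source(observations, trusted_domains):
    """Keep only observations whose provenance host is in trusted_domains."""
    trusted = {domain.lower() for domain in trusted_domains}
    return [
        obs for obs in observations
        if (urlparse(obs.provenance_url).hostname or "").lower() in trusted
    ]


# Placeholder values, invented purely for illustration.
observations = [
    Observation("population", 100.0, "https://source-a.example/data"),
    Observation("population", 105.0, "https://source-b.example/data"),
]

preferred = filter_by_source(observations, {"source-a.example"})
print(preferred)
```

Matching on the host name rather than the full URL is a design choice made here so that every page from the same publisher is treated alike; a stricter policy could compare full provenance URLs instead.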

Q: How often is the data refreshed?
We intend to refresh the dataset to reflect the evolving nature of the underlying data and as more data sources are included. However, we don't have a set schedule for periodic updates.

Q: I have a question / feedback. Whom do I contact?
You can post your question on the GitHub forum or contact us.