    Frequently Asked Questions

    Q: What is dataCommons?
    dataCommons is a project started by Google and intended to be a community effort. It is a collaborative initiative whose mission is to maintain and provide access to useful structured data on the Internet in an easy-to-use manner. Our goal is to create a knowledge graph that contains a significant fraction of the world’s structured data. The dataCommons graph will be open, like the web: anyone should be able to build applications on top of it, and anyone should be able to add to it.

    Q: What is the difference between dataCommons and public dataset projects like Dataverse, Kaggle datasets, Google BigQuery Public Datasets, etc.?
    dataCommons provides data as a service directly, rather than the ‘downloadable datasets’ offered by public dataset projects like Kaggle or Google BigQuery Public Datasets. The underlying data does come from different datasets and sources, but entities in different datasets are resolved so that users can query for the data directly, without specifying the source. In other words, users of dataCommons do not need to search for datasets, reconcile structure or schema across them, or create common identifiers before joining data from various sources.
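
    As a rough illustration of what resolving entities across datasets means, the minimal sketch below merges two toy sources under a shared entity ID. Everything in it is an assumption made for illustration: the source names, the local keys, the `place/exampleville` identifier, the values, and the `build_graph` helper are hypothetical and are not part of dataCommons or its API.

```python
# Illustrative sketch only: the sources, identifiers, and values below are
# made-up placeholders, not real dataCommons data or APIs.
from collections import defaultdict

# Two toy sources that describe the same place under different local keys.
SOURCES = {
    "census_example": {
        "key_field": "area_code",
        "rows": [{"area_code": "A-001", "population": 1000}],
    },
    "weather_example": {
        "key_field": "station_city",
        "rows": [{"station_city": "Exampleville", "avg_temp_c": 15.5}],
    },
}

# Hypothetical resolution table: (source, local key) -> shared entity ID.
RESOLVED_IDS = {
    ("census_example", "A-001"): "place/exampleville",
    ("weather_example", "Exampleville"): "place/exampleville",
}


def build_graph(sources, resolved_ids):
    """Merge rows from every source under their shared entity ID."""
    graph = defaultdict(dict)
    for source_name, source in sources.items():
        key_field = source["key_field"]
        for row in source["rows"]:
            entity_id = resolved_ids[(source_name, row[key_field])]
            for prop, value in row.items():
                if prop != key_field:
                    # Keep the source next to each value so provenance is not lost.
                    graph[entity_id][prop] = {"value": value, "provenance": source_name}
    return dict(graph)


graph = build_graph(SOURCES, RESOLVED_IDS)
# One lookup by entity ID returns facts from both sources, with no manual join.
place = graph["place/exampleville"]
print(place["population"]["value"], place["avg_temp_c"]["value"])
```

    The point of the sketch is that once the resolution table exists, a caller only needs the shared ID; the per-source keys and schemas stay hidden behind it.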

    Q: What is the relation between dataCommons.org and Schema.org?
    dataCommons.org builds upon the vocabularies defined by Schema.org, with additional terms defined to cover concepts (e.g. 'citizenship') that are important to the data in dataCommons but have not been a priority for Schema.org-based Web markup. The dataCommons schemas constitute an 'external extension' to Schema.org, similar to the one provided by GS1. Some schemas could migrate into Schema.org if the community finds value in them.

    Q: What are the kinds of entities you are resolving now? Which ones are you not resolving?
    At this time we are resolving entities that relate to a geographical location or place, for example ‘population’, ‘weather’ and ‘crime statistics’ in a particular ‘city’. We will continue to expand the set of resolved entities to include organizations, people, events, products, etc. The long-term goal is to be able to resolve any arbitrary entity using Reference by Description.

    Q: What are the usage rights of the data in dataCommons?
    The dataCommons knowledge graph and the compilation of the datasets are licensed under CC BY. The dataCommons API and the Python libraries are released under the Apache License 2.0. The data included in the dataCommons knowledge graph comes from different sources. The source of the data (provenance) is provided for all the data and includes the URL of the source. Although we make an effort to obtain data from sources that allow unrestricted use, the underlying data may be subject to different licenses and terms of use, as specified at the provenance URL.

    Q: How can we access data in dataCommons?
    The data in the knowledge graph can be accessed through the dataCommons browser and the Python Query API.

    Q: How can we add our own data to the knowledge graph?
    dataCommons is intended to be a community project and seeks your involvement. To learn more about publishing data that can be included in dataCommons, see the Get involved section. You can also contact support@dataCommons.org if you have an interesting dataset that you think should be included in dataCommons and would like to help. In the future we plan to allow users to ingest data into the dataCommons knowledge graph using an upload tool. We will update the community when this functionality is released.

    Q: How long will you store the data for?
    dataCommons is not an archival service. We collect the data, build the knowledge graph, and provide access to it. As with any website, long-term storage and safekeeping of the data is the responsibility of the primary publisher.

    Q: Where can I download all the data?
    Given the size and evolving nature of the dataCommons Knowledge Graph, we prefer that you access it via the APIs. If your project needs local access to a large fraction of the Knowledge Graph, please contact support@dataCommons.org.

    Q: How much does this service cost to use?
    The public data in the dataCommons knowledge graph is hosted on Google Cloud Platform by dataCommons.org and is made available to users. There is no cost for the data itself when it is publicly available for free. Usage limits for the service beyond the free-tier quota will be in line with the pricing of the BigQuery Public Datasets program. In the future, when more data is added to the knowledge graph by users, we expect that, just like the Web, some data will be free, some will be private, and some may have an associated cost to access.

    Q: What are the SLAs / Performance levels we can expect?
    The service is provided on an as-is basis with no SLA or commitments on availability or uptime.

    Q: How do we know if the data is accurate?
    dataCommons provides an access mechanism to data and makes no commitment on the accuracy of the data. Answers to queries will include the provenance (the source of the data). The choice of which data to use, based on its source, is in the developer's control.
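
    As a hedged sketch of how a developer might act on provenance, the example below keeps only the observations whose provenance URL points at a host the developer trusts. The record layout, the `filter_by_provenance` helper, and the example.org and example.net URLs are assumptions made for illustration; they do not describe the actual dataCommons response format.

```python
# Illustrative sketch only: the record layout and the example URLs are
# assumptions, not the actual dataCommons response format.
from urllib.parse import urlparse


def filter_by_provenance(observations, trusted_hosts):
    """Keep only observations whose provenance URL is on a trusted host."""
    return [
        obs for obs in observations
        if urlparse(obs["provenance"]).hostname in trusted_hosts
    ]


# Placeholder observations for the same property from two different sources.
observations = [
    {"property": "population", "value": 1000,
     "provenance": "https://stats.example.org/census"},
    {"property": "population", "value": 1200,
     "provenance": "https://other.example.net/estimates"},
]

# The developer decides which sources to rely on.
trusted = filter_by_provenance(observations, {"stats.example.org"})
print(trusted)
```

    Filtering on the host name rather than the full URL is a design choice for this sketch; a real application might instead match on a specific dataset or publisher.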

    Q: How often is the data refreshed?
    We intend to refresh the dataset to reflect the evolving nature of the underlying data and as more data sources are included. However, we don't have a set schedule for periodic updates.

    Q: I have a question / feedback. Whom do I contact?
    You can post your question on the GitHub forum or contact support@dataCommons.org.