Scaling Synthesis


A Short Introduction to the Underlay

Last updated March 17, 2023


The Opportunity

Human knowledge is an ever-expanding resource. Unfortunately, more knowledge does not necessarily lead to more understanding. Our methods for sharing knowledge are collapsing under the weight of distrust of science and journalism, false beliefs, misinformation campaigns, and deliberate fraud, not to mention the sheer quantity of information. Think of how overwhelming it would be to read everything that has been said on a given topic, much less decide which of it is true. Fortunately, the information technologies that helped create these problems may also help us solve them. Eventually, intelligent software will enable us to filter, judge, and connect all public information. Here, we describe how we are making this possible by sharing public assertions of knowledge in a form that can be more readily processed by such software.

The Underlay Project

==The creation of richer collections of machine-readable knowledge is inevitable. What is not inevitable is that such knowledge will be connected in a meaningful way and be freely available as a public resource.== We are at a fork in the road. The transition to machine-mediated access could consolidate our dependence on a few large commercial intermediaries, or alternatively, it could be built as open infrastructure, as a public good.

Just as it took decades of work to weave the World Wide Web, building the Underlay will be a massive worldwide effort. Like the web, it will be built primarily by those who connect their information to the shared system for their own purposes. Our initial effort is focused on building what is required to enable that. 

We are just beginning to build the software tools that implement these ideas. We plan to test and refine these tools by applying them to use cases in a few specific areas of knowledge, such as scientific information about COVID-19, that are of interest to our early sponsors. Fields of coverage will expand over time, eventually linking many topics and formats of published knowledge. Our goal is to build a sound framework that can grow to a distributed, industrial-scale effort.

How the Underlay works

Any assertion can be updated or contradicted, perhaps even by the same source, but the updates will have different timestamps. Some registries may choose to stop storing an assertion while others retain it. ==Some will make an assertion about a statement’s veracity or value. Others will take these judgments into account in deciding what to store. This leads to many independent editorial judgments of what to believe and what to store. Consensus is not required, and not even expected.== 
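The behavior described above can be illustrated with a minimal sketch. This is not the Underlay's actual data model or API; the `Assertion` fields, the `Registry` class, and both policy functions are hypothetical names chosen for illustration, and the sample values are made up. It shows a source updating its own assertion at a later timestamp, a second source asserting a judgment about the first assertion, and two registries that reach different storage decisions without any consensus step.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class Assertion:
    id: str          # hypothetical identifier so other assertions can refer to this one
    source: str      # who made the assertion
    subject: str     # what it is about (an entity or another assertion's id)
    predicate: str
    value: str
    timestamp: datetime


# A policy is one registry's editorial judgment: keep this assertion or not,
# given everything the registry has seen.
Policy = Callable[[Assertion, List[Assertion]], bool]


def keep_everything(a: Assertion, known: List[Assertion]) -> bool:
    return True


def drop_disputed(a: Assertion, known: List[Assertion]) -> bool:
    # Refuse to store an assertion that some other assertion judges to be false.
    return not any(
        j.subject == a.id and j.predicate == "veracity" and j.value == "false"
        for j in known
    )


class Registry:
    """Illustrative registry that stores whatever its own policy accepts."""

    def __init__(self, name: str, policy: Policy) -> None:
        self.name = name
        self.policy = policy
        self.stored: List[Assertion] = []

    def sync(self, known: List[Assertion]) -> None:
        self.stored = [a for a in known if self.policy(a, known)]


def utc(day: int) -> datetime:
    return datetime(2023, 1, day, tzinfo=timezone.utc)


# Made-up sample data: lab-a revises its own claim; reviewer-b disputes the first one.
a1 = Assertion("a1", "lab-a", "compound-x", "boiling_point_c", "78", utc(1))
a2 = Assertion("a2", "lab-a", "compound-x", "boiling_point_c", "79", utc(2))
j1 = Assertion("j1", "reviewer-b", "a1", "veracity", "false", utc(3))

archive = Registry("archive", keep_everything)
curated = Registry("curated", drop_disputed)
for registry in (archive, curated):
    registry.sync([a1, a2, j1])

archive_ids = [a.id for a in archive.stored]   # ['a1', 'a2', 'j1']
curated_ids = [a.id for a in curated.stored]   # ['a2', 'j1']
```

In this sketch the two registries end up with different contents from the same input, and neither is treated as authoritative. A real system would also need to decide which sources' judgments a registry trusts, which is left out here.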

How the Underlay is different from other public knowledge bases

One way to understand the Underlay is to compare it to other open collections of knowledge. 

Wikipedia is a set of public knowledge about notable entities presented in an illustrated natural language format. Distinct versions of Wikipedia exist in different languages. It may be the most widely used reference publication ever created. Much of its knowledge is not yet easily interpretable by machines. Articles contain structured elements, including categories and often an “information box” with roughly standardized fields. As in the Underlay, additions can be asserted by anyone. A prefilter for “notable” topics limits what is included in most language editions. Knowledge is attributed, usually to pseudonymous editors, and cited to a source roughly once a paragraph. ==What makes Wikipedia different from the Underlay is that its assertions are not machine-readable, and it covers only a few million topics, making it tiny by comparison with commercial knowledge graphs.==

==Public databases such as the U.S. patent database, SEC filings, citation indexes, catalogs, the human genome, star atlases, linguistic lexicons, and zoological taxonomies are usually machine-readable, but not stored in any consistent format. Some, like the Allen Brain Atlas, are actively maintained, expanded, and made available for the public good.== The Underlay would both take advantage of these open efforts and expand their usefulness by making the knowledge within them available in a common format and connecting them with other types of knowledge.
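As a rough illustration of what "a common format" could mean in practice, the sketch below converts two differently shaped records into the same tuple shape while keeping track of where each came from. The tuple layout, the function names, the source labels, and all sample values are assumptions made up for this example; they do not reflect the real schemas of any database named above or of the Underlay.

```python
from typing import Dict, List, Tuple

# Hypothetical common shape: (source, subject, predicate, value)
CommonAssertion = Tuple[str, str, str, str]


def from_patent_record(rec: Dict[str, str]) -> List[CommonAssertion]:
    # Assumes a dict-shaped record with 'number', 'title', and 'inventor' keys.
    subject = f"patent:{rec['number']}"
    return [
        ("patent-db", subject, "title", rec["title"]),
        ("patent-db", subject, "inventor", rec["inventor"]),
    ]


def from_taxonomy_row(row: Tuple[str, str]) -> List[CommonAssertion]:
    # Assumes a two-column row of (species, genus).
    species, genus = row
    return [("taxonomy-db", f"species:{species}", "genus", genus)]


# Placeholder inputs, not real data.
patent = {"number": "0000000", "title": "Example widget", "inventor": "A. Person"}
taxon = ("Examplegenus examplespecies", "Examplegenus")

combined = from_patent_record(patent) + from_taxonomy_row(taxon)
```

Once both sources share one shape, the same tools can store, query, and link them, and the source label travels with every statement.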

==Blockchains are distributed public ledgers of assertions. Like the Underlay, they are distributed, with no storage node having special status. Their primary purpose is to ensure agreement among users about what transactions took place between them, which is not a requirement of the Underlay. Blockchains may have a role in implementing the Underlay, but because they have no standard representation of knowledge and are difficult to scale, they are not in themselves a solution to the problem of sharing public knowledge.==

Several projects, such as DBpedia and OpenCyc, have tried to represent vetted general knowledge in an open machine-readable format. These databases differ from the Underlay in attempting a consistent version of the truth, rather than all (possibly contradictory) assertions and their provenance. ==These are an excellent source of assertions for the Underlay, and the Underlay may be a source of provenance and reliability data for these vetted collections, as it stores attributed assertions about the validity of other assertions. Such curated databases could be represented as sources in the Underlay.==

==The public databases most similar to the Underlay are Freebase and Wikidata, open entity-relationship databases with limited provenance. Like the Underlay, each preserves the history of assertions; sourced assertions can come from anyone; they can contradict other assertions; and relationships are language-independent.== (The recent launch of Abstract Wikipedia may improve further on the latter.) Both have assertions added by curators around the world. Like the Underlay, they include assertions automatically extracted from other published sources. They have some representation of authorship, although not as rich a representation of provenance as that envisioned for the Underlay. 

==Freebase and its toolchain were acquired by Google and used to build what is now the Google Knowledge Graph. At the time Freebase was acquired, it had about 100 million assertions, many of which have now become part of Wikidata. The Google Knowledge Graph has grown steadily since then, adding other public and private data, now fully intermixed so that none of these updates are available to the public.== Today it has hundreds of billions of assertions, and is used widely for search, advertisement placement, generating dynamic presentations, and much more. It is the knowledge base that is most similar to the envisioned Underlay, except for the important difference that it is managed as a proprietary resource, rather than as a public good. The Underlay also supports richer presentations of provenance and improved methods of connecting independent collections of knowledge.


The Underlay is a long-term global project. It has the potential to become part of humanity’s basic infrastructure, enabling wider and more useful access to public knowledge. Eventually, if we are successful, it may change how we discover and understand what is known. 

For the original whitepaper, see: The Future of Knowledge