Scaling Synthesis

How do you scale and distribute synthesis?

Last updated June 7, 2023

# Scaling synthesis by focusing on individual synthesis and allowing distributed synthesis out of band

A common assumption among builders is that the answer is to focus on individual synthesis, with the expectation that distributed synthesis will happen out of band. How people come to agree on queryable schemas is a notoriously hard problem, especially when balanced against the fact that many experts prefer to structure later or not at all, and that the limited structure people put into their own notes will likely be specialized for their own use case. We were surprised to find in our research that only a couple of our participants said they synthesized with others at all, for reasons discussed in Synthesis is hard to do with people who don’t share context with you.

# Scaling synthesis through software that empowers users with advanced data modeling and exploration capabilities

We claim that a discourse graph structure, covering at minimum typed nodes with degrees of certainty, could be enough to bridge this tension when combined with affordances for structure in hindsight; structuring by individuals who were not the original author (see It will be important to capture the potential energy of information consumption); automatic logical structuring similar to Jump; an interface designed for Search Behavior, such that it is simple to bring new people up to speed in a domain; and a DSL for a discourse graph with information entry, visualization, and retrieval, to facilitate composability and explicit articulation of relationships.

Hypertext enables communication with high information density, and as such is a core piece of the puzzle. People naturally try to enact typed distinctions in their notes, and nodes such as claims, questions, evidence, people, and keywords are a solid enough starting point for modeling the discourse. Simply being able to see the relationships between nodes of different types, such as “what questions are attracting the most attention,” “what evidence supports this claim,” and “which questions lead into other questions,” will cut through a large amount of noise. Most scholarly communication infrastructure operates on the document as the base unit. Those documents have non-descriptive titles and often make so many points that it is difficult to tell what in particular people are citing when they cite a paper. Graph views of nodes composed of simple keywords are often difficult to interpret, whereas full sentences of claims and questions present the appropriate level of abstraction to understand the relationships between points.
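The three example queries above can be sketched against a small in-memory graph. This is only an illustrative sketch, not the data model of any particular tool: the class names, the relation labels (`supports`, `addresses`, `leads_to`), and the 0-to-1 `certainty` field are all assumptions made for the example, and "attention" is approximated by simply counting the edges that touch a question.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    CLAIM = "claim"
    QUESTION = "question"
    EVIDENCE = "evidence"
    PERSON = "person"
    KEYWORD = "keyword"


@dataclass(frozen=True)
class Node:
    id: str
    type: NodeType
    text: str                # a full sentence, not just a keyword
    certainty: float = 1.0   # hypothetical 0..1 "degree of certainty"


@dataclass
class DiscourseGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add(self, node):
        self.nodes[node.id] = node
        return node

    def link(self, source_id, relation, target_id):
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.append((source_id, relation, target_id))

    def evidence_for(self, claim_id):
        """What evidence supports this claim?"""
        return [self.nodes[s] for s, rel, t in self.edges
                if rel == "supports" and t == claim_id
                and self.nodes[s].type is NodeType.EVIDENCE]

    def questions_by_attention(self):
        """Which questions attract the most attention (edges touching them)?"""
        counts = Counter()
        for s, _rel, t in self.edges:
            for node_id in (s, t):
                if self.nodes[node_id].type is NodeType.QUESTION:
                    counts[node_id] += 1
        return counts.most_common()

    def follow_up_questions(self, question_id):
        """Which questions does this question lead into?"""
        return [self.nodes[t] for s, rel, t in self.edges
                if rel == "leads_to" and s == question_id
                and self.nodes[t].type is NodeType.QUESTION]


# A tiny example graph built from statements in this note.
g = DiscourseGraph()
g.add(Node("q1", NodeType.QUESTION, "How do you scale and distribute synthesis?"))
g.add(Node("q2", NodeType.QUESTION, "How do people come to agree on queryable schemas?"))
g.add(Node("c1", NodeType.CLAIM, "Few people synthesize with others.", certainty=0.7))
g.add(Node("e1", NodeType.EVIDENCE,
           "Only a couple of participants said they synthesized with others."))
g.link("e1", "supports", "c1")
g.link("c1", "addresses", "q1")
g.link("q1", "leads_to", "q2")
```

Because every node carries a full sentence rather than a keyword, the results of `g.evidence_for("c1")` or `g.follow_up_questions("q1")` are readable on their own, which is the level of abstraction argued for above.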

We should note as well that, if we expect people apart from the author to add metadata and context to the original author’s work, they need to be able to do it nondestructively. This is a core thesis for Codex OS and its use of Standoff Annotation as the basis for its graph database rather than inline markup. The friction of editing prior work, whether your own or someone else’s, is discussed somewhat in Thought processors struggle with multiplicity. Undo history is a related concern.
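To make the contrast with inline markup concrete, here is a minimal sketch of standoff annotation in general. It is not how Codex OS implements it; the `StandoffDocument` and `Annotation` names and the character-offset scheme are assumptions chosen for illustration. The source text is never modified, and annotations live in a separate list that points into it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int      # inclusive character offset into the source text
    end: int        # exclusive character offset
    label: str      # e.g. "claim", "question", "evidence"
    annotator: str  # who added it; need not be the original author


class StandoffDocument:
    """Source text plus annotations stored alongside it, never inside it."""

    def __init__(self, text):
        self._text = text
        self._annotations = []

    @property
    def text(self):
        return self._text  # read-only: annotating cannot change the source

    def annotate(self, start, end, label, annotator):
        if not 0 <= start < end <= len(self._text):
            raise ValueError("annotation span is outside the source text")
        annotation = Annotation(start, end, label, annotator)
        self._annotations.append(annotation)
        return annotation

    def retract(self, annotation):
        # Removing an annotation leaves the source text untouched.
        self._annotations.remove(annotation)

    def spans(self, label=None):
        """Return (annotated text, annotation) pairs, optionally filtered by label."""
        return [(self._text[a.start:a.end], a) for a in self._annotations
                if label is None or a.label == label]


source = "Synthesis is hard. Few people synthesize with others."
doc = StandoffDocument(source)
claim = doc.annotate(source.index("Few"), len(source), "claim", annotator="reader-1")
```

Since annotations only reference offsets, several readers can layer overlapping or conflicting structure over the same passage, and any one layer can be retracted without touching the others or the original.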

# Scaling synthesis by routing people to people with knowledge

Another perspective on the titular question is that networks of people are a more effective information transmission mechanism than networks of knowledge; see What are the most efficient routes to useful cross-boundary knowledge from Joel Chan. The argument goes that knowledge is fundamentally contextual, that it is difficult to transmit that context, that a key requirement to participating in a discourse graph for a specific domain is knowing the vocabulary used in that graph, and that people are a better access point for that vocabulary than anything else. Learning through discussion with people in your personal networks is the natural path that many people take, especially considering that substantial portions of knowledge critical for innovation are not written down. This is an approach to finding or creating shared context that de-emphasizes exploratory information search. Perhaps we need a synthesis of the two: an ideal decentralized knowledge graph would map a social graph and a knowledge graph.
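One way to picture a mapping between a social graph and a knowledge graph is a lookup from a vocabulary term to the people whose nodes use it. The sketch below is purely illustrative: the `people_for_term` helper, the two dictionaries, and the sample names are hypothetical, and counting matching nodes is a stand-in for whatever ranking a real system would use.

```python
from collections import defaultdict


def people_for_term(authorship, node_terms, term):
    """Rank people by how many of their nodes use the given vocabulary term.

    authorship: person -> list of node ids they wrote (social side)
    node_terms: node id -> set of vocabulary terms in that node (knowledge side)
    """
    scores = defaultdict(int)
    for person, node_ids in authorship.items():
        for node_id in node_ids:
            if term in node_terms.get(node_id, set()):
                scores[person] += 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data.
authorship = {"alice": ["n1", "n2"], "bob": ["n3"], "carol": ["n4"]}
node_terms = {
    "n1": {"standoff annotation", "schema"},
    "n2": {"schema"},
    "n3": {"schema", "incentives"},
    "n4": {"incentives"},
}
ranked = people_for_term(authorship, node_terms, "schema")
```

Here the knowledge graph is used to find a person rather than an answer, which matches the argument that people are the better access point for a domain's vocabulary.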

# Scaling synthesis with incentives to encourage desirable social sensemaking strategies

Others claim that incentives are a core part of the puzzle for facilitating a semantic web, and a large reason why previous efforts have failed. As discussed in Where the semantic publishing rubber meets the scholarly practice road, academic publishing is an entrenched system and it’s difficult for academics to work outside of those incentives. They need to publish papers as PDFs to keep their jobs. While we agree that incentives are a core part of the puzzle for problems such as encouraging people to share and curate knowledge for other people, they are outside the scope of our own research, which focuses more on the technical details of valuable tooling.