Guide 7 — What Changed
Architecture After Round 19
Principles instead of summaries, algorithms on top, one contradiction engine
This documentation describes Theseus Codex's infrastructure and methodology. It does not expose private firm materials, uploaded source documents, or unreleased internal records.
For: Readers who want to know what the recent rebuild changed and why.
Summary: The system was rebuilt around a different idea about what it should store. Earlier, the extractor produced first-person summaries of what an author seemed to be saying. The firm cannot act on that. The pipeline was reshaped so the stored object is a third-person, generalizable principle, and a new layer of algorithms was added on top to turn principles into structured reasoning when their conditions match an observed input.
The core change
The extractor no longer emits first-person summaries. If a source span is purely autobiographical and no underlying principle can be extracted from it, the extractor logs that fact and emits nothing. If a principle is present, it is stored with structure: principleKind, domainOfApplicability, quantifiableProxies, decisionExamples, and a verbatim source anchor.
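As a rough illustration of the stored object, the sketch below models a principle record and the "log and emit nothing" rule. It is not the system's actual schema: the field types, the `statement` field, the `emit` helper, and the substring check on the anchor are all assumptions made for the example. The remaining field names are Python-style renderings of the fields named above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principle:
    statement: str                          # third-person, generalizable claim (assumed field)
    principle_kind: str                     # principleKind
    domain_of_applicability: str            # domainOfApplicability
    quantifiable_proxies: tuple[str, ...]   # quantifiableProxies
    decision_examples: tuple[str, ...]      # decisionExamples
    source_anchor: str                      # verbatim quote from the source span


def emit(span: str, candidate: Optional[Principle], log: list[str]) -> Optional[Principle]:
    """Return the principle to store, or None after logging why nothing was emitted."""
    if candidate is None:
        # Purely autobiographical span: record the fact and store nothing.
        log.append("no extractable principle: span is purely autobiographical")
        return None
    if candidate.source_anchor not in span:
        # Assumed check: a "verbatim" anchor must appear word for word in the span.
        log.append("rejected: source anchor is not verbatim")
        return None
    return candidate
```

The point of the sketch is that there is no fallback path that stores a summary: a span either yields a structured principle or yields a log line.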
This is the load-bearing change. Everything else in the rebuild either follows from it or cleans up something that was getting in the way of it.
Algorithms as a new layer
A principle says what is true. An algorithm says when, given what can be observed, the principle predicts something specific. The algorithm is the bridge between abstract principle and concrete prediction.
Each algorithm names the observables it watches, the condition that has to be true for it to fire, the principles it is reasoning from, and the structured output it produces. The reasoning chain inside an algorithm cites the principles it depends on, so a prediction is always traceable.
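The four parts of an algorithm can be sketched as a small record with a `run` method. This is a minimal sketch under stated assumptions, not the system's implementation: the class name, the field names, the use of numeric observables, and the `cites` key in the output are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

Observations = Mapping[str, float]  # assumed: observables are named numeric readings


@dataclass(frozen=True)
class Algorithm:
    name: str
    observables: tuple[str, ...]                  # what the algorithm watches
    condition: Callable[[Observations], bool]     # must be true for it to fire
    principle_ids: tuple[str, ...]                # principles it reasons from
    build_output: Callable[[Observations], dict]  # structured output when it fires

    def run(self, observed: Observations) -> Optional[dict]:
        # Do not fire if any watched observable is missing from the input.
        if any(key not in observed for key in self.observables):
            return None
        watched = {key: observed[key] for key in self.observables}
        if not self.condition(watched):
            return None
        output = dict(self.build_output(watched))
        # Attach the cited principles so the prediction stays traceable.
        output["cites"] = list(self.principle_ids)
        return output
```

Returning `None` when an observable is missing or the condition is false keeps "did not fire" distinct from "fired and predicted nothing".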
One contradiction engine
An earlier family of six contradiction heuristics — different rules that disagreed with each other and were hard to calibrate — was replaced with a single detector that returns a calibrated score, a confidence band, and a human-readable explanation. The legacy heuristics are deprecated and no longer write new rows.
A related change: contradictions no longer have a manual resolve button. Operator clicks were being treated as authoritative, which was wrong — the record should reflect what the sources jointly imply, not which side an operator preferred on a given afternoon. Contradictions now resolve when new source material weights one side decisively over the other.
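One way to picture the detector's return value and the evidence-driven resolution rule is sketched below. The source does not say how "decisively" is measured, so the per-side evidence weights, the `decisive_ratio` threshold, and all names here are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContradictionResult:
    score: float                # calibrated score
    band: tuple[float, float]   # confidence band as (low, high)
    explanation: str            # human-readable explanation


def resolved_side(weight_a: float, weight_b: float,
                  decisive_ratio: float = 3.0) -> Optional[str]:
    """Return "a" or "b" once source weight decisively favors one side, else None.

    There is deliberately no operator override parameter: only the
    accumulated source weights can resolve a contradiction.
    """
    if weight_a > 0 and weight_a >= decisive_ratio * weight_b:
        return "a"
    if weight_b > 0 and weight_b >= decisive_ratio * weight_a:
        return "b"
    return None  # still open; wait for more source material
```

Under these assumptions, a contradiction with weights 4.0 and 2.0 stays open, while one with weights 6.0 and 2.0 resolves toward the first side.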
Provenance at upload time
Every piece of source material now carries one of four labels, chosen when it is uploaded: proprietary (the firm wrote it), endorsed external (someone else wrote it, but the firm explicitly endorses it as representative), studied external (reference material, read but not endorsed), and opposing external (material the firm disagrees with, kept for the value of testing positions against it).
Why this matters: the system used to flag contradictions between the firm's principles and opposing material it was reading — which is noise, because the firm expects to disagree with that material. Provenance demarcation lets the contradiction engine skip those cross-provenance pairs and surface only the contradictions that are actually news.
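The four labels and the pair filter can be sketched as an enum plus a predicate. The exact set of skipped pairs is not spelled out above, so this sketch assumes the engine skips a pair when one side is opposing external and the other is firm-aligned (proprietary or endorsed external). The names `Provenance` and `should_compare` are hypothetical.

```python
from enum import Enum


class Provenance(Enum):
    PROPRIETARY = "proprietary"              # the firm wrote it
    ENDORSED_EXTERNAL = "endorsed external"  # external, explicitly endorsed
    STUDIED_EXTERNAL = "studied external"    # reference material, not endorsed
    OPPOSING_EXTERNAL = "opposing external"  # kept to test positions against


# Assumed: these two labels count as "the firm's position".
FIRM_ALIGNED = frozenset({Provenance.PROPRIETARY, Provenance.ENDORSED_EXTERNAL})


def should_compare(a: Provenance, b: Provenance) -> bool:
    """Return False for pairs where disagreement is expected and therefore not news."""
    expected_disagreement = (
        (a is Provenance.OPPOSING_EXTERNAL and b in FIRM_ALIGNED)
        or (b is Provenance.OPPOSING_EXTERNAL and a in FIRM_ALIGNED)
    )
    return not expected_disagreement
```

With this filter, two proprietary sources that disagree are still surfaced, which is the case the firm most needs to hear about.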
Memos as the canonical output
When the system answers a question, the answer takes a fixed memo shape: a TL;DR, the question, the governing principles, the observed inputs, the reasoning chain, the implied position, what would change the firm's mind, and any caveats or abstentions. The memo is the canonical output.
The system prefers to abstain rather than produce a chain of reasoning it cannot ground. If there are no governing principles, the principles directly contradict each other, the confidence band is too wide, or the question itself is unformed, the memo says so, by name.
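The memo sections and the named abstention reasons can be sketched as follows. This is an illustration under assumptions, not the system's code: the section keys, the order in which reasons are checked, the `max_band_width` threshold, and the function signature are all hypothetical.

```python
from enum import Enum
from typing import Optional, Sequence

# The fixed memo shape, in the order listed above (key names are assumed).
MEMO_SECTIONS = (
    "tldr", "question", "governing_principles", "observed_inputs",
    "reasoning_chain", "implied_position", "what_would_change_our_mind",
    "caveats_or_abstentions",
)


class Abstention(Enum):
    QUESTION_UNFORMED = "question is unformed"
    NO_GOVERNING_PRINCIPLES = "no governing principles"
    PRINCIPLES_CONTRADICT = "governing principles directly contradict each other"
    BAND_TOO_WIDE = "confidence band too wide"


def abstention_reason(question_is_formed: bool,
                      principle_ids: Sequence[str],
                      principles_contradict: bool,
                      band: tuple[float, float],
                      max_band_width: float = 0.4) -> Optional[Abstention]:
    """Return the named reason to abstain, or None if the memo may take a position."""
    if not question_is_formed:
        return Abstention.QUESTION_UNFORMED
    if not principle_ids:
        return Abstention.NO_GOVERNING_PRINCIPLES
    if principles_contradict:
        return Abstention.PRINCIPLES_CONTRADICT
    low, high = band
    if high - low > max_band_width:
        return Abstention.BAND_TOO_WIDE
    return None
```

Modeling the reasons as an enum rather than free text is what lets a memo abstain "by name".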
"Bet" was generalized
A bet used to mean a financial position on a prediction market or an equity. It now means any falsifiable commitment of firm resources: a financial position, a public statement of position with no money behind it, an internal allocation of operator time or hiring direction, or a scientific prediction that resolves against external data. All four are tracked in one place. The firm's edge is the principle layer, not any specific way of expressing a view.
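Tracking all four kinds in one place suggests a single record type with a kind tag. The sketch below is one possible shape, with assumed names throughout. The `falsified_if` field and the link back to `principle_ids` are illustrative choices, not fields confirmed by the source.

```python
from dataclasses import dataclass
from enum import Enum


class BetKind(Enum):
    FINANCIAL_POSITION = "financial position"
    PUBLIC_STATEMENT = "public statement of position"
    INTERNAL_ALLOCATION = "internal allocation"
    SCIENTIFIC_PREDICTION = "scientific prediction"


@dataclass(frozen=True)
class Bet:
    kind: BetKind
    commitment: str                 # what firm resources are committed
    falsified_if: str               # the observable outcome that would falsify it
    principle_ids: tuple[str, ...]  # assumed link back to the principle layer

    def __post_init__(self) -> None:
        # A bet is defined as falsifiable, so an empty criterion is rejected.
        if not self.falsified_if.strip():
            raise ValueError("a bet must state what would falsify it")
```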
A knowledge graph view
The corpus is no longer a flat list of claims. There is a cross-source graph view where principles, sources, algorithms, memos, and concepts are nodes, and the relationships between them are edges (derived from, contradicts, supports, applies to, cites). The accompanying reasoner produces a grounded explanation of why an edge exists, and refuses to fabricate connections the data does not support.
This is the reading view for "what does the corpus jointly imply about X?" — a question that was not directly answerable before the underlying structure existed.
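A minimal sketch of the graph and the "refuse to fabricate" behavior follows. It assumes each stored edge carries the evidence that grounds it, and that the reasoner explains only edges that exist. The class names, the `evidence` field, and the string node identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The five relationship types named above.
EDGE_KINDS = frozenset({"derived from", "contradicts", "supports", "applies to", "cites"})


@dataclass(frozen=True)
class Edge:
    source: str    # node id, e.g. a principle or memo
    kind: str      # one of EDGE_KINDS
    target: str    # node id
    evidence: str  # assumed: the grounding text recorded with the edge


class Graph:
    def __init__(self) -> None:
        self._edges: dict[tuple[str, str, str], Edge] = {}

    def add_edge(self, edge: Edge) -> None:
        if edge.kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {edge.kind!r}")
        self._edges[(edge.source, edge.kind, edge.target)] = edge

    def explain(self, source: str, kind: str, target: str) -> Optional[str]:
        """Explain a stored edge from its evidence; return None rather than invent one."""
        edge = self._edges.get((source, kind, target))
        if edge is None:
            return None
        return f"{source} {kind} {target} because: {edge.evidence}"
```

Because `explain` only reads stored edges, any connection it reports traces back to recorded evidence.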