Guide 5 — Forecasts and Portfolio
Prediction-market opinions, the decision trace, and the eight gates
This documentation describes Theseus Codex's infrastructure and methodology. It does not expose private firm materials, uploaded source documents, or unreleased internal records.
- For: Readers who want to understand how a forecast is generated, scored, and (when authorized) translated into a real-money bet.
- Summary: The Forecasts surface tracks markets on Polymarket and Kalshi, retrieves relevant firm conclusions, and builds a deterministic decision trace for each market. Live trading requires eight successive human gates; paper mode is the default. After resolution, every prediction is Brier-scored, log-loss-scored, and added to the public calibration manifest.
How a forecast is generated
For each market the firm decides to score, the workshop runs a conservative sequence.
- Retrieve the firm's most relevant conclusions and claims by embedding search. The bundle must contain at least three distinct conclusions; otherwise the workshop abstains.
- Check the near-duplicate window. If a forecast on a similar market was published in the last 24 hours, the workshop abstains.
- Check the market-close buffer. If the market closes within an hour, the workshop refuses to predict. Stale-but-open markets account for the firm's most expensive losses.
- Call a language model with a strict JSON contract: probability, confidence band, headline, reasoning body, uncertainty notes, citations.
- Validate every quoted span against the cited source verbatim, character by character. If any quote is fabricated, the workshop abstains.
- Build the Market Decision Trace.
- Persist the prediction with status PUBLISHED, with the trace and one citation row per quoted span.
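The abstention checks and the quote validation in the steps above can be sketched as follows. This is a minimal illustration rather than the workshop's actual code: the function names, the reason strings, and the exact boundary handling (for example, treating a market that closes in exactly one hour as too close) are assumptions. Only the three thresholds come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds stated in the steps above; the constant names are hypothetical.
MIN_DISTINCT_CONCLUSIONS = 3
DUPLICATE_WINDOW = timedelta(hours=24)
CLOSE_BUFFER = timedelta(hours=1)


@dataclass
class Citation:
    quoted_span: str  # text the model claims to quote
    source_text: str  # full text of the cited source


def abstain_reason(
    conclusion_ids: list[str],
    last_similar_forecast_at: Optional[datetime],
    market_closes_at: datetime,
    now: datetime,
) -> Optional[str]:
    """Return why the workshop should abstain before calling the model, or None to proceed."""
    if len(set(conclusion_ids)) < MIN_DISTINCT_CONCLUSIONS:
        return "too_few_conclusions"
    if (
        last_similar_forecast_at is not None
        and now - last_similar_forecast_at < DUPLICATE_WINDOW
    ):
        return "near_duplicate"
    if market_closes_at - now <= CLOSE_BUFFER:
        return "closing_soon"
    return None


def quotes_are_verbatim(citations: list[Citation]) -> bool:
    """True only if every quoted span appears character for character in its source."""
    return all(
        bool(c.quoted_span) and c.quoted_span in c.source_text for c in citations
    )
```

Both functions are pure: given the same inputs they return the same answer, so an abstention can be replayed and audited later. A stricter variant might also reject an empty citation list; this sketch leaves that policy open.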
The Market Decision Trace
After the language model has produced a probability and a written rationale, the workshop builds a separate, fully deterministic object. It computes a small set of metrics from the inputs, runs a fixed rule graph over them, and produces an action (HOLD, WATCH, PAPER, or LIVE) and a stake recommendation. No randomness, no further model calls.
The deterministic trace is the artifact the firm is willing to defend. The model's prose explains; it never overrides. If the prose says "buy" but the trace says WATCH, the trace wins.
The trace records:
- Edge estimate: firm probability versus market price.
- Confidence and locality: how on-domain the retrieval was.
- Liquidity, contradiction load, decay status.
- Each rule's firing: which threshold was crossed, which veto triggered, which combination escalated to the next tier.
- The final action and stake.
- A version string, so a later refactor that changes the trace format is detectable rather than silently mixed with old data.
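A deterministic trace of this shape might look like the sketch below. Every threshold, field name, rule label, and the stake formula here is a hypothetical placeholder chosen for illustration; the guide does not publish the firm's actual rule graph. What the sketch is meant to show is the structure: metrics computed from the inputs, vetoes evaluated first, tiers escalating only when a combination of conditions holds, and a version string on every trace.

```python
from dataclasses import dataclass, field

TRACE_VERSION = "trace-sketch-1"  # hypothetical version string

# Illustrative thresholds only; none of these values come from the guide.
WATCH_EDGE = 0.03
PAPER_EDGE = 0.08
MIN_LOCALITY = 0.6
MIN_LIQUIDITY_USD = 1000.0
MAX_CONTRADICTIONS = 3
STAKE_FRACTION = 0.01
MAX_STAKE_USD = 25.0


@dataclass(frozen=True)
class TraceInputs:
    firm_probability: float  # model's probability of YES
    market_price: float      # current YES price, between 0 and 1
    locality: float          # 0..1, how on-domain the retrieval was
    liquidity_usd: float
    contradiction_load: int  # number of conflicting retrieved claims
    decayed: bool            # True if the supporting conclusions are stale
    live_enabled: bool       # whether LIVE may be recommended at all


@dataclass
class Trace:
    version: str
    edge: float
    fired: list[str] = field(default_factory=list)
    action: str = "HOLD"
    stake_usd: float = 0.0


def build_trace(x: TraceInputs) -> Trace:
    """Pure function of its inputs: no randomness, no model calls."""
    edge = round(x.firm_probability - x.market_price, 4)
    trace = Trace(version=TRACE_VERSION, edge=edge)

    # Vetoes run first; any veto pins the action at HOLD.
    if x.decayed:
        trace.fired.append("veto:decayed")
    if x.contradiction_load >= MAX_CONTRADICTIONS:
        trace.fired.append("veto:contradiction_load")
    if trace.fired:
        return trace

    if abs(edge) < WATCH_EDGE:
        trace.fired.append("hold:edge_below_watch")
        return trace

    trace.action = "WATCH"
    trace.fired.append("watch:edge")
    if (
        abs(edge) >= PAPER_EDGE
        and x.locality >= MIN_LOCALITY
        and x.liquidity_usd >= MIN_LIQUIDITY_USD
    ):
        trace.action = "PAPER"
        trace.fired.append("paper:edge+locality+liquidity")
        trace.stake_usd = min(MAX_STAKE_USD, round(STAKE_FRACTION * x.liquidity_usd, 2))
        if x.live_enabled:
            trace.action = "LIVE"
            trace.fired.append("live:enabled")
    return trace
```

Because `build_trace` depends only on its arguments, rebuilding a trace from stored inputs reproduces the stored trace exactly, which is what makes the version string useful: a mismatch signals a format or rule change rather than noise.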
The eight-gate safety architecture
Each gate is a separate human action. No gate auto-promotes the next; if any gate fails, every later gate refuses.
1. Exchange credentials configured.
2. Scheduler ingesting and monitoring. Absence of fresh rows blocks every downstream step.
3. Paper mode validated. Paper bets have been written, scored, and reconciled for long enough that the firm trusts the decision pipeline.
4. Risk caps configured. Maximum per-bet stake and maximum daily loss are set to numbers the firm is willing to lose. The submitter checks both on every bet.
5. Master live flag on. Without this, no other authorization counts.
6. Per-prediction live authorization. The operator flags a specific forecast row as live-eligible.
7. Per-bet live confirmation. For each individual bet, the operator clicks confirm. The system refuses to submit without it.
8. Kill switch clear at submit time. The submitter re-evaluates the kill switch immediately before placing the order. If the kill switch flipped between confirmation and submission, the bet is dropped.
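One way to picture the gate chain is as an ordered check that stops at the first failure and reads the kill switch last. The sketch below is an illustration under assumptions: the class, field, and function names are hypothetical, and treating the full stake as the worst-case loss in the daily cap check is a simplification the guide does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Gates:
    credentials_configured: bool  # gate 1
    scheduler_fresh: bool         # gate 2
    paper_validated: bool         # gate 3
    risk_caps_configured: bool    # gate 4
    master_live_flag: bool        # gate 5
    prediction_authorized: bool   # gate 6
    bet_confirmed: bool           # gate 7


def first_blocking_gate(
    gates: Gates, kill_switch_clear: Callable[[], bool]
) -> Optional[int]:
    """Return the number of the first gate that blocks submission, or None if all pass."""
    ordered = [
        gates.credentials_configured,
        gates.scheduler_fresh,
        gates.paper_validated,
        gates.risk_caps_configured,
        gates.master_live_flag,
        gates.prediction_authorized,
        gates.bet_confirmed,
    ]
    for number, passed in enumerate(ordered, start=1):
        if not passed:
            return number  # later gates are never consulted
    # Gate 8 is read at submit time, so a switch flipped after confirmation still blocks.
    if not kill_switch_clear():
        return 8
    return None


def within_risk_caps(
    stake: float, loss_today: float, max_stake: float, max_daily_loss: float
) -> bool:
    """Per-bet check of both caps; assumes the whole stake could be lost."""
    return stake <= max_stake and loss_today + stake <= max_daily_loss
```

Passing the kill switch as a callable rather than a stored boolean is the design point: the value is fetched at the moment of submission instead of being captured when the operator clicked confirm.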
Resolution and the public calibration manifest
When a market closes, the workshop polls the venue and writes a resolution carrying the outcome (YES, NO, CANCELLED, or AMBIGUOUS), the Brier score, the log-loss, the calibration bucket, and the raw settlement payload.
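For a single binary prediction with probability p and outcome y (1 for YES, 0 for NO), the Brier score is (p - y)^2 and the log-loss is -(y ln p + (1 - y) ln(1 - p)). The sketch below computes both. The function names, the ten equal-width calibration buckets, the clipping constant, and the choice to leave CANCELLED and AMBIGUOUS outcomes unscored are assumptions made for illustration, not details stated in this guide.

```python
import math
from typing import Optional


def brier_score(p: float, outcome: int) -> float:
    """Squared error between the forecast and the 0/1 outcome; lower is better."""
    return (p - outcome) ** 2


def log_loss(p: float, outcome: int, eps: float = 1e-12) -> float:
    """Negative log-likelihood of the outcome; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1.0 - p))


def calibration_bucket(p: float, n_buckets: int = 10) -> str:
    """Label of the equal-width bucket containing p; p == 1.0 goes in the top bucket."""
    index = min(int(p * n_buckets), n_buckets - 1)
    return f"{index / n_buckets:.1f}-{(index + 1) / n_buckets:.1f}"


def score_resolution(p: float, outcome: str) -> Optional[dict]:
    """Score a YES/NO resolution; return None for outcomes with no binary result."""
    if outcome not in ("YES", "NO"):
        return None  # assumption: CANCELLED and AMBIGUOUS are not scored
    y = 1 if outcome == "YES" else 0
    return {
        "brier": brier_score(p, y),
        "log_loss": log_loss(p, y),
        "bucket": calibration_bucket(p),
    }
```

A forecast of 0.7 on a market that resolves YES scores a Brier of about 0.09 and a log-loss of about 0.357; the same forecast on a NO resolution scores about 0.49 and 1.204, which shows how log-loss punishes confident misses more sharply.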
The firm publishes its own track record. A public calibration manifest shows, per probability bucket, the number of predictions and the realized YES rate. The horizon view shows the same data broken down by time-to-resolution. The manifest is regenerated on a schedule.
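The per-bucket aggregation behind a manifest like this can be sketched in a few lines. The function name, the row fields, and the ten equal-width buckets are assumptions for illustration; the guide does not specify the manifest's actual schema or bucket boundaries.

```python
from collections import defaultdict


def build_manifest(
    resolved: list[tuple[float, bool]], n_buckets: int = 10
) -> list[dict]:
    """Aggregate (probability, resolved_yes) pairs into per-bucket counts and YES rates."""
    counts: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # index -> [n, yes]
    for p, resolved_yes in resolved:
        index = min(int(p * n_buckets), n_buckets - 1)
        counts[index][0] += 1
        counts[index][1] += int(resolved_yes)
    return [
        {
            "bucket": f"{index / n_buckets:.1f}-{(index + 1) / n_buckets:.1f}",
            "n": n,
            "yes_rate": yes / n,
        }
        for index, (n, yes) in sorted(counts.items())
    ]
```

A well-calibrated forecaster's `yes_rate` in the 0.7-0.8 bucket should sit near 0.75 once `n` is large enough; a rate well below that is the overconfidence the public page would expose. A horizon view could reuse the same function after grouping the pairs by time-to-resolution.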
This is one of the most uncomfortable surfaces the firm offers: the firm grades itself in public on every prediction it has published. If the firm has been consistently overconfident in a given bucket, the public page shows it.
Resolution overrides
If a founder believes the venue resolved a market incorrectly, the override chain handles it: a resolution override records an alternative resolution with a citation and a reason, a mismatch row logs any later disagreement with the venue, and an append-only revision history preserves the prior settlement for audit.
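The three pieces of that chain can be modeled as an append-only list of revisions plus a separate list of mismatch rows. The sketch below is a guess at the shape, not the firm's schema: all class, field, and method names are hypothetical, and the rule that a later conflicting venue report is logged as a mismatch without displacing the override is one plausible reading of the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Revision:
    outcome: str                    # YES, NO, CANCELLED, or AMBIGUOUS
    source: str                     # "venue" or "override"
    reason: Optional[str] = None
    citation: Optional[str] = None


@dataclass
class ResolutionRecord:
    revisions: list[Revision] = field(default_factory=list)        # append-only
    mismatches: list[tuple[str, str]] = field(default_factory=list)  # (override, venue)

    def current(self) -> Optional[Revision]:
        """The resolution in force is the most recent revision."""
        return self.revisions[-1] if self.revisions else None

    def record_venue(self, outcome: str) -> None:
        """Store a venue settlement, or log a mismatch if an override is in force."""
        cur = self.current()
        if cur is not None and cur.source == "override":
            if cur.outcome != outcome:
                self.mismatches.append((cur.outcome, outcome))
            return
        self.revisions.append(Revision(outcome, "venue"))

    def override(self, outcome: str, reason: str, citation: str) -> None:
        """Append an override; earlier revisions are kept, never edited or removed."""
        if not reason or not citation:
            raise ValueError("an override needs both a reason and a citation")
        self.revisions.append(Revision(outcome, "override", reason, citation))
```

Nothing in the record is ever overwritten: the original venue settlement stays at the front of `revisions`, so an auditor can see both what the venue said and what the firm decided, in order.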