{"ok":true,"data":{"v":1,"schema":"theseus.methodology.manifest","generatedAt":"2026-06-10T10:53:18.212Z","methods":[{"name":"classify_claim_type","version":"1.0.0","description":"Classifies a claim into discourse categories (METHODOLOGICAL, SUBSTANTIVE, etc.).","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"contradiction_geometry","version":"1.0.0","description":"Detects contradiction via Hoyer sparsity of embedding difference vectors.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":2,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"contradiction_probe","version":"1.0.0","description":"Predicts the embedding-space neighborhood where a new proposition's logical contradiction should lie, then surfaces nearby existing propositions as unconfirmed candidates.","status":"active","depth":1,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"decompose_voice","version":"1.0.0","description":"Decomposes a founder's voice into an intellectual profile with orientation scores.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"external_claim_match","version":"1.0.0","description":"Ingests external literature and matches claims against internal positions.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"extract_claims","version":"1.0.0","description":"Extracts atomic 
truth-apt claims from a text chunk using an LLM.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"extract_methodology","version":"1.0.0","description":"Extracts portable methodology profiles from a source text.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":3,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"extract_prediction","version":"1.0.0","description":"Extracts falsifiable world predictions from a single claim using an LLM.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"method_candidate_extractor","version":"1.0.0","description":"Scans ingested artifacts for passages describing a methodology and extracts structured method candidates via regex + LLM.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":null},{"name":"nli_scorer","version":"1.0.0","description":"NLI cross-encoder scorer for claim pair coherence using DeBERTa.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"six_layer_coherence","version":"1.0.0","description":"Six-layer coherence aggregation with 4/6 majority voting.","status":"active","depth":1,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":2,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"suggest_research","version":"1.0.0","description":"Generates 
research topics, empirical anchors, and reading lists after a discussion.","status":"active","depth":0,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":0,"lastReviewDate":"2018-10-20T01:46:40.000Z"},{"name":"synthesize_conclusion","version":"1.0.0","description":"Registers a substantive conclusion and returns method calibration feedback.","status":"active","depth":2,"domain":null,"conclusionsProduced":0,"calibration":null,"drift":{"state":"ok","lastActiveAt":null},"publicFailureModeCount":2,"lastReviewDate":"2018-10-20T01:46:40.000Z"}],"edges":[{"dst":"contradiction_geometry","src":"contradiction_probe"},{"dst":"nli_scorer","src":"six_layer_coherence"},{"dst":"extract_claims","src":"synthesize_conclusion"},{"dst":"nli_scorer","src":"synthesize_conclusion"},{"dst":"six_layer_coherence","src":"synthesize_conclusion"}],"publicFailureModes":[{"method":"contradiction_geometry","name":"short_text_collapses_sparsity","severity":"high","description":"Hoyer sparsity is unstable for very short texts (under roughly\nfive words). The difference vector inherits too little geometric\nstructure, so the sparsity score swings around the threshold and\ncontradiction calls become essentially random.\n","trigger":"Either input span is fewer than ~5 tokens, or the texts are\nslogans, headlines, or single-clause assertions. 
Watch for\nverdicts that flip when whitespace or punctuation is normalised.\n","mitigation":"Require both texts to clear a minimum token length (default ≥ 8\ntokens) before trusting the sparsity verdict; otherwise route\nto an explicit NLI model.\n"},{"method":"contradiction_geometry","name":"implicit_pragmatic_contradiction_missed","severity":"medium","description":"Contradictions that hinge on world knowledge or implicit\nreasoning (\"All swans are white\" vs \"I saw a black bird in the\nswan pond\") often fail to produce a sparse difference vector.\nThe model encodes the surface concepts but not the reasoning\nstep that makes them contradict.\n","trigger":"The conclusion relies on enthymematic reasoning, common-sense\ninference, or domain knowledge not present in the embedding\ntraining data.\n","mitigation":"Pair the geometry call with an LLM-judge layer and treat the\ngeometry-only verdict as advisory whenever the conclusion text\nuses domain terms of art.\n"},{"method":"extract_methodology","name":"profile_inflates_method_from_thin_text","severity":"high","description":"The deterministic profile pipeline is structured to always\noutput reasoning_moves, assumptions, and transfer_targets. On\nthin source text it confabulates plausible-looking entries\nrather than abstaining. The profile then survives downstream\nreview because its shape looks complete.\n","trigger":"The source upload is short (under ~500 words), is largely\nnarrative, or lacks an explicit reasoning step. 
The output\nprofile has full coverage of every field despite the input\noffering little to abstract.\n","mitigation":"Gate the profile on a per-field source-anchor count; require at\nleast one anchor per non-empty field, and emit an explicit\n\"no-profile\" sentinel when the source cannot support one.\n"},{"method":"extract_methodology","name":"transfer_targets_smuggle_source_topic","severity":"medium","description":"Transfer targets are supposed to describe portable arenas for\nthe method, not the original conclusion's topic. When the\nsource is heavily on-topic, the extractor frequently lists the\noriginal domain itself as a transfer target.\n","trigger":"The source upload's topic_hint is one of the listed transfer\ntargets, or the transfer targets are minor variations on the\nsource title.\n","mitigation":"Reject transfer targets that share more than 60% token overlap\nwith the source's topic_hint or title; require at least one\ntarget from a distinct domain category.\n"},{"method":"extract_methodology","name":"failure_modes_field_left_empty_treated_as_safe","severity":"high","description":"The methodology profile contract has a failure_modes field, but\nthe deterministic extractor often returns it empty. Downstream\nconsumers read \"no listed failure modes\" as \"this method is\nsafe to apply broadly\", inverting the intended meaning of the\nblank field.\n","trigger":"The profile's failure_modes list is empty AND the profile is\nbeing used to justify cross-domain transfer.\n","mitigation":"Treat blank failure_modes as \"not assessed\" rather than\n\"assessed and clean\"; refuse transfer-target use of a profile\nuntil at least one failure mode is recorded or a deliberate\nopt-out is filed.\n"},{"method":"six_layer_coherence","name":"judge_layer_skipped_biases_unresolved","severity":"high","description":"When the LLM-judge layer (S6) is disabled via skip_llm_judge=True,\nthe sixth vote defaults to UNRESOLVED. 
The 4/6 supermajority\nbecomes harder to reach, so the aggregate verdict drifts toward\nUNRESOLVED even on pairs where the mechanical layers agree.\n","trigger":"The conclusion was produced in offline batch mode, on a tight\ncost budget, or with skip_llm_judge=True. Watch for clusters of\nUNRESOLVED verdicts on pairs the firm previously considered\ndecidable.\n","mitigation":"Re-run the contested pair with the judge enabled, or treat\nUNRESOLVED outputs from skip_llm_judge runs as \"needs human\"\nrather than \"indeterminate\".\n"},{"method":"six_layer_coherence","name":"argumentation_layer_starves_without_neighbors","severity":"medium","description":"The argumentation layer (S2) needs neighbour claims and\nprecomputed pairwise contradiction scores to construct an\nacceptable extension. When those are absent — early in a project,\nafter a corpus reset, or for an isolated claim — S2 emits a weak\nor null signal and effectively abstains.\n","trigger":"The conclusion is the first or one of very few claims in its\ncluster, the contradiction score table has not been backfilled,\nor the pair sits far from any neighbour in embedding space.\n","mitigation":"Check the argumentation layer's neighbour count before trusting\nits abstention; backfill contradiction scores for the cluster\nbefore treating S2 outputs as evidence.\n"},{"method":"synthesize_conclusion","name":"single_method_attribution_loses_interaction","severity":"medium","description":"Each conclusion is attributed to a single reasoning method, but\nmost conclusions arise from a combination (empirical observation\nfiltered through first-principles reasoning, etc.). 
The\nsimplification keeps calibration tractable but the per-method\ntrack record then misses interaction effects entirely.\n","trigger":"The conclusion's reasoning text mentions more than one method by\nname, or its source profiles span multiple pattern types.\n","mitigation":"Allow multi-method attribution with weights; until then, surface\nthe secondary method in the audit trail and exclude such\nconclusions from per-method calibration aggregates.\n"},{"method":"synthesize_conclusion","name":"cold_start_calibration_silence_misleads","severity":"medium","description":"Calibration feedback is suppressed below 3 resolved conclusions\nper method. The dashboard shows no signal during early use,\nwhich the firm tends to read as \"the method is working\" rather\nthan \"we have not measured it yet\".\n","trigger":"The method has fewer than 3 resolved conclusions in its track\nrecord AND a founder is using its absence-of-warnings to\njustify a new application.\n","mitigation":"Render the cold-start state as an explicit \"untested\" badge in\nreviewer-facing UI; require an alternative justification when a\nmethod with fewer resolved conclusions than the suppression\nthreshold is invoked.\n"}],"publicTrackRecords":[]},"meta":{"schemaVersion":1,"generatedAt":"2026-06-10T10:53:18.212Z"}}