Platform Overview · Biomedical Evidence Review
MedC82 Platform Overview
An open biomedical research council for evidence review, hypothesis stress-testing, and traceable scientific triage.
MedC82 combines public literature, structured biomedical databases, disease-specific evidence packs, and a six-model adversarial AI council. The platform is designed to help researchers separate validated evidence from speculative leads, identify missing evidence layers, compare cross-disease transfer plausibility, and generate falsifiable next-step experiments.
01 · Platform Summary
MedC82 is a disease-agnostic, study-scoped biomedical research platform. It builds structured evidence packs from literature, genetics, omics, single-cell, spatial, drug/pathway, clinical, and study-specific sources, then routes that pack through a six-model adversarial AI council that reviews the same evidence across four rounds. The final report is a canonical research triage surface — not raw model output.
The current strength is evidence review, hypothesis stress-testing, cross-disease caution, and falsifiable next-step framing. The platform may surface:
- validated causal anchors
- validate-first candidates
- exploratory hypotheses
- useful negative / triage results
- STOP decisions
- non-promoted follow-up ideas
| What MedC82 can support | Evidence needed | What MedC82 cannot claim yet |
|---|---|---|
| Exploratory disease-state hypothesis | Human disease-state data, outcome or cohort context where available, clear caveats, and a falsifiable next experiment | Causality, treatment response, or clinical action |
| Validated causal anchor | Mechanism-appropriate causal lane — for example, coding-variant MR, validated pQTL colocalization, or validated tissue eQTL colocalization | General causal claims outside the validated mechanism lane |
| Patient-stratification or prognostic lead | Disease-matched cohort, spatial, molecular, or outcome-linked evidence with confounder-aware validation needs | Therapeutic mechanism or treatment recommendation by itself |
| Useful triage / no-primary result | Adversarial review showing candidates are weak, off-contract, or under-validated | That the biology is false |
| Follow-up idea | Specific in-run grounding plus a concrete next experiment | That the idea is canonical or validated |
| Treatment recommendation | Not supported by current MedC82 export surfaces | MedC82 reports should not recommend treatment combinations |
02 · Why This Is Not Just Search
The platform is built for the gap between search and expert review: it does not simply retrieve papers, and it does not claim to prove discoveries. It organizes heterogeneous biomedical evidence into a structured pack, forces multiple AI models to argue over it, and then applies report-safety rules so the final output separates validation, caution, failed transfer, and exploratory ideas.
| Layer | What it does | What it does not do |
|---|---|---|
| Search | Retrieves documents matching keywords / semantic similarity | Does not separate validated evidence from speculative leads; does not test cross-disease transfer; does not produce a verdict |
| MedC82 evidence pack | Bounded, study-scoped object with source provenance, evidence type, disease context, causal-lane tagging, and explicit missing-layer caveats | Does not eliminate the need for expert review; does not turn association into causality |
| MedC82 AI Council | Six adversarial roles debate the same bounded evidence across four rounds and produce a structured synthesis | Does not prove truth by consensus; does not override deterministic safety gates |
| MedC82 report layer | Separates validated anchors, validate-first candidates, deprioritized / demolished candidates, and exploratory hypotheses with traceability back to source | Does not replace expert review; raw transcript proposals are not the final answer |
| Expert review | Confirms interpretation, source attribution, causal-lane framing, and translational plausibility | MedC82 is designed to support this layer, not replace it |
The value is not "AI answer generation." The value is structured evidence triage and adversarial stress-testing of research leads, with explicit no-primary and validate-first surfaces when evidence is not yet sufficient.
03 · Evidence and Data Layers
Evidence packs are study-scoped and connector-scoped. A connector can contribute causal evidence, association evidence, context evidence, reference evidence, safety evidence, or missing-layer caveats. The table below names the common sources and how each should be interpreted. "Where configured" indicates a connector that is integrated for some studies but not all.
| Layer | Source / connector | Contributes | Evidence class | Cannot prove | Example platform use |
|---|---|---|---|---|---|
| Literature / paper corpus | PubMed, Europe PMC, PMC Open Access, OpenAlex, Semantic Scholar, DOI metadata, retrieved corpus sources | Published evidence, biological plausibility, prior findings, outcome context, mechanistic background, caveats, and source-linked support for council claims | Literature / context | Causality by itself unless the study design and validation lane support it | Traceable support, prior art, caveats, and outcome context |
| Clinical trials | ClinicalTrials.gov, WHO ICTRP, trial-outcome connectors | Trial history, phase, outcomes, completion status | Clinical context | Mechanism or efficacy by trial existence alone | Translational caveats, failed-translation evidence |
| GEO dataset discovery | geo_dataset_discovery | Public dataset availability, accession, assay type, matrix availability, sample metadata, linked paper metadata where an exact match exists | Discovery metadata | Expression effect, causality, or evidence quality by dataset existence alone | Identifies candidate datasets for future ingestion or validation |
| Single-cell / spatial datasets | CELLxGENE, study-specific single-cell connectors, curated GEO/index records (where configured) | Disease-state, tissue, cell-state, or compartment context | Context | Causality or treatment response by itself | Cell-state, spatial, or tissue-compartment hypothesis support |
| Disease-specific outcome literature | Outcome paper connectors or literature corpus | Article-level outcome-linked context | Outcome context | Raw matrix causal proof or treatment recommendation | Cohort/outcome plausibility and validation needs |
| Cancer genomics / clinical context | TCGA, GDC, cBioPortal where configured, study-specific connectors | Molecular subtype, mutation, copy-number, expression, clinical metadata | Molecular context | Causality without a matching validation lane | Cohort stratification or disease-state context |
| Proteomics / phosphoproteomics | CPTAC where configured, study-specific proteomics connectors | Protein and phosphoprotein context | Molecular context | Causality without matching causal layer | Pathway / protein-state context |
| GWAS Catalog | GWAS connector | Disease associations | Association | Causal mechanism by itself | Association priors and candidate context |
| Open Targets / L2G | Open Targets | Genetic association and likely-gene priors | Association | Final causality alone | Candidate ranking / context |
| MR / pQTL MR / colocalization | OpenGWAS / MR-style evidence, MR-coloc connectors | Protein-abundance genetic association and coloc validation attempt | Causal lane if coloc | Positive support when colocalization fails or is not computable | Mechanism-specific causal validation or caveat |
| MR-KG literature | MR literature connector | Published MR study references | Association | Replacement for in-pack coloc | Published MR context |
| eQTL Catalogue | eQTL connector | Expression / regulatory context | Molecular association | Causality without coloc | Input to tissue eQTL coloc |
| Tissue / brain eQTL coloc | Tissue eQTL coloc connector | Shared-variant expression mechanism test | Causal lane if PP4 | Coding-mechanism validation | Tissue or cell-type expression mechanism |
| Coding-variant MR | Curated coding-variant MR lane | Protein-altering variant causal lane | Causal | Protein-abundance or tissue-expression mechanism | Coding-variant mechanism validation |
| HPA | Human Protein Atlas | Tissue / atlas reference | Reference | Disease causality | Tissue expression / context |
| ChEMBL / DGIdb / OpenFDA | Drug, target, and adverse-event databases | Binding, tractability, drug-gene interaction, and adverse-event context | Druggability | Disease causal support | Target tractability and safety signal context |
| Reactome / SIGNOR / STRING | Pathway and interaction connectors | Pathway membership and interaction network context | Reference | Causality | Network / pathway context |
| Perturbation / dependency | LINCS, PRISM, DepMap, scPerturb where available | Cell-line drug / target / dependency context | Low-tier context | Human disease validation | Experimental feasibility and biological perturbation context |
| Safety / adverse events | FAERS | Post-marketing adverse-event signals | Safety context | Causal mechanism alone | Translational safety caveats |
| Imaging / neuroimaging | OpenNeuro, XNAT, NITRC where available | Disease-relevant imaging metadata and context | Context | Disease causality | Neuroimaging-specific stratification context |
| Missing or skipped layers | Connector absence or study-inapplicability | Missing-layer caveats | Missing / irrelevant | Should not be inferred as evidence | Explicit not-available caveats |
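The row-level evidence classes in the table above can be sketched as a small tagged record. This is an illustrative sketch only: the EvidenceRow type, its field names, and the EVIDENCE_CLASSES vocabulary are assumptions for illustration, not the platform's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical evidence classes condensed from the table above; the real
# platform vocabulary may be finer-grained.
EVIDENCE_CLASSES = {
    "causal", "association", "context", "reference", "safety", "missing",
}

@dataclass
class EvidenceRow:
    source: str            # connector name, e.g. "GWAS Catalog"
    evidence_class: str    # one of EVIDENCE_CLASSES
    text: str              # human-readable evidence statement
    study_id: str          # study scope the row belongs to
    tags: list = field(default_factory=list)  # e.g. ["[EV:assoc]", "[DB:gwas]"]

    def __post_init__(self):
        # Reject rows with an unknown class so mis-tagged evidence fails early.
        if self.evidence_class not in EVIDENCE_CLASSES:
            raise ValueError(f"unknown evidence class: {self.evidence_class}")

row = EvidenceRow("GWAS Catalog", "association",
                  "SNP rs123 associated with disease", "study_001")
```

Keeping the class vocabulary closed is the point of the sketch: a connector cannot silently invent a new evidence class that downstream promotion logic does not know how to gate.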
04 · How the Evidence Pack Is Built
A · Study setup
Every study is scoped by study_id. Cross-study runs can include a main study plus explicitly requested adjacent studies. Cross-study evidence is combined at retrieval / council time with source_study_id provenance, not merged as if all evidence came from one disease.
B · Ingestion and normalization
MedC82 ingests literature, structured databases, single-cell and spatial datasets, omics connectors, genetics connectors, and drug / pathway connectors. Entities and source metadata are normalized and carried through the evidence pack and export system. Evidence is tagged at the row level as direct, contextual, caveat, missing-layer, causal-lane, or support.
C · Datapack construction
The datapack is the compact research object shown to the council. It includes:
- evidence_text: human-readable structured evidence and retrieved literature
- full_context_text: broader context supplied to the council
- source citations and source maps
- structured evidence rows with tags such as [EV:...] and [DB:...]
- per-gene evidence profiles
- explicit missing-layer caveats — for example, a missing pQTL instrument or failed colocalization
The pack is assembled before council review. The raw council transcript is preserved as audit material, but it is never the canonical final answer.
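The datapack components listed above can be sketched as a plain dictionary. Only evidence_text and full_context_text are field names from the document; build_datapack and the remaining key names are hypothetical stand-ins.

```python
def build_datapack(evidence_rows, citations, gene_profiles, caveats):
    """Assemble a compact research object of the shape described above.

    Illustrative sketch: key names other than evidence_text and
    full_context_text are assumptions, not the platform schema.
    """
    return {
        "evidence_text": "\n".join(r["text"] for r in evidence_rows),
        "full_context_text": "",             # broader context, supplied elsewhere
        "source_citations": citations,       # source maps back to connectors
        "evidence_rows": evidence_rows,      # rows tagged [EV:...] / [DB:...]
        "gene_profiles": gene_profiles,      # per-gene evidence profiles
        "missing_layer_caveats": caveats,    # e.g. "missing pQTL instrument"
    }

pack = build_datapack(
    [{"text": "[EV:assoc][DB:gwas] SNP rs123 associated with disease"}],
    citations=["PMID:12345"],
    gene_profiles={"IL6R": {"coloc": "validated"}},
    caveats=["missing pQTL instrument"],
)
```

The caveats list is carried explicitly rather than inferred, matching the rule later in this document that a missing layer must never be read as evidence.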
D · Canonical export
The canonical export surface is built by backend/app/services/validated_export.py::build_validated_export. It creates validated_export, top_answer, finding_semantics, canonical_report, export_gate, and insight_assessment. The HTML export is built from frontend rendering code that checks export_gate.hard_block, validates the canonical_report, and uses backend canonical display fields for the report top answer.
05 · The AI Council
The AI Council is the adversarial reasoning layer that sits between the structured evidence pack and the canonical report. It is not a vote-counting system and it does not prove truth by consensus. It asks multiple differently biased research roles to read the same bounded evidence, challenge one another, and produce a final synthesis that can still be demoted by deterministic export checks.
A · Model roles
The council roles are deliberately different. The names below describe the prompt roles in the live code; they should be read as research functions, not as guarantees that any one model is correct.
| Council role | Main job | Useful for | Cannot do alone |
|---|---|---|---|
| Maverick Theorist | Propose non-obvious, testable hypotheses | Surface new framing from the evidence pack | Validate causality or treatment response |
| Evidence Auditor | Challenge evidence quality, replication, sample size, and citation support | Keep claims source-grounded and appropriately caveated | Discover every possible mechanism |
| Cross-Domain Connector | Look for transferable biology across diseases and fields | Generate cross-study hypotheses | Turn analogy into proof |
| Translational Specialist | Ask what would be needed for a real translational path | Identify safety, biomarker, and validation gaps | Recommend clinical treatment from exploratory evidence |
| Assumption Destroyer | Attack hidden assumptions and propose falsifiers | Prevent attractive but weak claims from surviving unchallenged | Prove a claim false without adequate evidence |
| Literature Specialist | Mine the provided corpus for buried signals and convergence | Find overlooked corpus patterns | Use unsupported memory or invented citations as evidence |
B · Four-round workflow
| Round | What happens | What the round is designed to catch |
|---|---|---|
| Round 1 — blind independent analysis | Each role reads the same research context and user query without seeing the other roles' answers. | Independent hypotheses, missed signals, and initial evidence interpretations |
| Round 2 — audit / discussion | Each role reads compressed Round 1 outputs and states agreement, disagreement, missed findings, and revisions. | Early convergence, contradictions, weak evidence, and claims that need refinement |
| Round 3 — adversarial attack | The Assumption Destroyer attacks converged claims; other roles must defend, concede, or revise. | Overclaiming, hidden assumptions, methodological flaws, unverified claims, and missing falsifiers |
| Round 4 — neutral synthesis | A neutral synthesizer reads the context and compressed Rounds 1-3. It is instructed not to add new claims, not to introduce uncited evidence, not to resurrect demolished claims, and to preserve dissent. | Final adjudication, demotion of weak ideas, unresolved splits, and safer framing |
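The four-round flow above can be sketched as a simple orchestration loop. run_council and the ask callable are hypothetical stand-ins for the real council engine, and Round 3 is simplified to the attacker's turn only.

```python
# Role names follow the document; the runner itself is an illustrative sketch.
ROLES = [
    "Maverick Theorist", "Evidence Auditor", "Cross-Domain Connector",
    "Translational Specialist", "Assumption Destroyer", "Literature Specialist",
]

def run_council(ask, context):
    """ask(role, prompt, prior) -> str is a stand-in for one model call."""
    # Round 1: blind independent analysis, no visibility into other roles.
    r1 = {role: ask(role, context, prior=None) for role in ROLES}
    # Round 2: audit/discussion over compressed Round 1 outputs.
    r2 = {role: ask(role, context, prior=r1) for role in ROLES}
    # Round 3: adversarial attack (simplified here to the attacker alone).
    r3 = {"Assumption Destroyer": ask("Assumption Destroyer", context, prior=r2)}
    # Round 4: neutral synthesis over compressed Rounds 1-3.
    synthesis = ask("Neutral Synthesizer", context, prior={**r2, **r3})
    return {"rounds": [r1, r2, r3], "synthesis": synthesis}

calls = []
def fake_ask(role, prompt, prior):
    calls.append(role)
    return f"{role} analysis"

result = run_council(fake_ask, context="bounded evidence pack")
```

Note that the synthesis output is still not the canonical answer; as described elsewhere in this document, deterministic export gates can demote it afterwards.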
C · Generic claim lifecycle
| Stage | Generic lifecycle |
|---|---|
| Raw proposal | A model raises a hypothesis from the evidence pack. |
| Evidence support | Other roles identify which sources or structured rows support it. |
| Adversarial caveat | Round 3 tests whether the claim is causal, overbroad, off-contract, or missing decisive evidence. |
| Canonical classification | Round 4 plus export logic classifies it as validated, exploratory, no-primary / triage, STOP, unresolved, or follow-up only. |
| HTML display | validated_export and canonical_report supply the top report surface, while the raw transcript remains in the appendix. |
06 · MR, eQTL, Colocalization & Causal Lanes
Plain-language summary
Mendelian randomization (MR) asks whether genetic variation associated with an exposure is also associated with disease. pQTL MR uses genetic instruments for protein abundance. Colocalization asks whether the exposure association and the disease association likely share the same causal variant.
MR without colocalization can be misleading because linkage disequilibrium or horizontal pleiotropy can make two nearby signals look connected. In MedC82, failed or unavailable pQTL MR colocalization is a caveat only. It is never positive support.
A · What colocalization actually measures
Colocalization estimates the posterior probability that both traits share one causal variant. In the ABF model, PP4 is the posterior probability for a shared causal variant.
| PP4 range | MedC82 interpretation | Promotion consequence |
|---|---|---|
| PP4 > 0.8 | Validated colocalization support | Can support a causal lane if mechanism and disease context match |
| 0.5 < PP4 ≤ 0.8 | Suggestive / unresolved | Usually validate-first or exploratory, not definitive |
| PP4 < 0.5 (ABF colocalization actually ran) | Computed negative colocalization for that pQTL-MR lane | Caveat only; do not use as positive support. Not the same as not-computable cases. |
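A minimal sketch of the PP4 thresholds above. The threshold values come from the table; classify_pp4 and its return labels are a hypothetical helper, not a platform function.

```python
def classify_pp4(pp4, computed):
    """Map an ABF colocalization PP4 value to the interpretation table above.

    computed=False covers every not-computable case (no instrument, fetch
    failure, low SNP overlap, missing GWAS), which is absence of validation,
    never a negative result.
    """
    if not computed:
        return "not_computable"       # absence of validation, not refutation
    if pp4 > 0.8:
        return "validated"            # can support a causal lane
    if pp4 > 0.5:
        return "suggestive"           # validate-first / exploratory
    return "computed_negative"        # caveat only, never positive support
```

The separate computed flag is the important design point: a PP4 of 0.000 from a run that never happened must not be classified the same way as a true computed negative.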
B · How to interpret zero pQTL-coloc validations
Historical summaries may collapse several distinct outcomes into a single "failed" or "PP4 = 0.000" label: true computed-negative ABF colocalization, no suitable pQTL top hit, regional summary-stat fetch failure, too few overlapping SNPs, missing or insufficient disease GWAS / locus data, or QC / input failure.
| Status | Meaning | How MedC82 should interpret it |
|---|---|---|
| validated_coloc | ABF colocalization ran and PP4 passed threshold | Possible causal support if mechanism, disease context, and source traceability match |
| computed_negative_coloc | ABF colocalization ran and PP4 stayed below threshold | This pQTL-MR causal lane did not support the mechanism |
| not_computable_no_pqtl_tophit | No suitable pQTL instrument / top hit | Absence of validation, not biological refutation |
| not_computable_low_snp_overlap | Insufficient overlapping SNPs | Absence of validation, not biological refutation |
| not_computable_regional_fetch_failed | Required regional summary statistics unavailable or fetch failed | Absence of validation |
| not_computable_missing_gwas | Disease / locus GWAS unavailable or insufficient | Absence of validation |
| qc_error | Input or QC problem | Cannot interpret as negative evidence |
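The status vocabulary above maps naturally onto a small interpretation helper. The status strings are taken from the table; interpret_coloc_status itself is an illustrative sketch, not platform code.

```python
# Statuses that mean the lane could not be computed at all.
NOT_COMPUTABLE = {
    "not_computable_no_pqtl_tophit",
    "not_computable_low_snp_overlap",
    "not_computable_regional_fetch_failed",
    "not_computable_missing_gwas",
}

def interpret_coloc_status(status):
    """Return the interpretation the table above assigns to each status."""
    if status == "validated_coloc":
        return "possible causal support"      # still requires mechanism match
    if status == "computed_negative_coloc":
        return "lane did not support mechanism"
    if status in NOT_COMPUTABLE:
        return "absence of validation"        # never biological refutation
    if status == "qc_error":
        return "uninterpretable"              # cannot be negative evidence
    raise ValueError(f"unknown coloc status: {status}")
```

Raising on unknown statuses, rather than defaulting to "failed", is what prevents the collapse into a single failed / PP4 = 0.000 bucket described above.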
C · Mechanism-specific causal lanes
| Mechanism type | Correct causal test | Example | What failure means |
|---|---|---|---|
| Protein abundance mechanism | pQTL MR plus colocalization | Plasma / protein abundance associated with disease | No pQTL-coloc validation means the protein-abundance lane did not validate; not-computable cases are absence of validation, not biological refutation |
| Expression / regulatory mechanism | Tissue / cell eQTL colocalization | Disease-relevant tissue expression coloc | Failed eQTL coloc does not refute coding mechanism |
| Coding / protein-altering mechanism | Coding-variant MR | Protein-altering variant instrument | Failed pQTL / eQTL should not penalize the coding lane if the coding variant itself is the instrument |
| Disease-state cell context | Single-cell / spatial evidence | Disease-relevant cell state or compartment | Context, not causal validation |
| Treatment-response context | Outcome cohort / prospective validation | Outcome-linked cohort evidence | Outcome-linked plausibility, not causality unless controlled |
Positive colocalization or coding-variant MR can support a validated causal anchor when mechanism, disease context, and source traceability match. Failed pQTL MR becomes a caveat. MR-only without colocalization does not promote a causal claim.
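The lane-matching rule can be sketched as a lookup plus a promotion predicate. The mechanism keys and lane names paraphrase the table above; the function itself is a hypothetical sketch.

```python
# Mechanism type -> the causal test the table above names as correct.
CAUSAL_LANE_FOR_MECHANISM = {
    "protein_abundance": "pqtl_mr_plus_coloc",
    "expression_regulatory": "tissue_eqtl_coloc",
    "coding_variant": "coding_variant_mr",
    # cell-context and treatment-response mechanisms have no causal lane:
    # their evidence is context, not causal validation.
}

def promotes_causal_claim(mechanism, lane_results):
    """A causal anchor is supported only when the mechanism-matched lane
    validated. Failure or absence in a mismatched lane never refutes:
    e.g. failed eQTL coloc must not penalize a coding-variant mechanism.
    """
    required = CAUSAL_LANE_FOR_MECHANISM.get(mechanism)
    if required is None:
        return False  # context-only mechanisms never promote causality
    return lane_results.get(required) == "validated"
```

The asymmetry is deliberate: only the matched lane can promote, and only the matched lane's computed negative counts as a caveat against that mechanism.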
07 · Signal Triage and Report Outputs
After the council completes its four rounds, the report layer classifies each candidate using a labeled action state. These labels are deterministic and gate what the final report surface can say.
A · Report state and action labels
| Label | Meaning | Required to apply |
|---|---|---|
| VALIDATED_TARGET (positive report state) | Top-level report state when at least one finding qualifies as a validated anchor for the target disease. | Canonical biology, target-disease support, at least one positive validation lane, no disqualifying action_label. |
| VALIDATED_ANCHOR (positive finding state) | Per-finding label for a known validated anchor where mechanism-appropriate evidence supports it. | Same predicate as VALIDATED_TARGET but applied at the finding level; never overrides an explicit blocked action_label. |
| VALIDATE_FIRST (caution) | Potentially interesting, but decisive validation is missing in the current pack. | Evidence is suggestive; council does not endorse promotion until specific validation lanes are met. |
| DEPRIORITIZED (caution) | Useful enough to record, but not strong enough to lead. | Adversarial review reduces the candidate's standing without rejecting it outright. |
| DEMOLISHED (negative) | Adversarial review rejected it in the current evidence context. | The Round 3 attack succeeded; the export layer must not present it positively. |
| BLOCKED (negative) | Compatibility validator blocked the candidate from primary promotion. | Used for wrong-disease evidence, contextual-only support, or off-contract candidates. |
| NO_PRIMARY_PROMOTED (triage) | Broad derived state — every primary candidate is effectively blocked. | Phase 5d derived gate; a valid result, not a failure. |
| NO_PRIMARY_SURVIVED (triage) | Strict compatibility-validator escalation; no primary candidate survived gate enforcement. | Equivalent to a deliberate "we cannot promote anything from this pack" verdict. |
| CONTEXTUAL_ONLY (context) | Evidence is present but cannot be treated as direct target-disease validation. | Common for cross-disease anchors lifted into a different target disease. |
| EXPLORATORY (hypothesis) | Hypothesis-grade research lead. | Useful for research design; not validation; not actionable. |
| CALIBRATION_ANCHOR (reference) | Pre-curated canonical anchor used for calibration. | Not promoted as a novel discovery; serves as a known reference. |
| HYPOTHESIS (hypothesis) | Generic hypothesis label for non-anchor proposals. | Requires next-step framing; not validation. |
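The label taxonomy above can be expressed as a severity map, which makes the promotion rule easy to state. The label names come from the table; LABEL_SEVERITY and can_render_positive are illustrative sketches, not platform code.

```python
# Severity class for each action label, following the table's groupings.
LABEL_SEVERITY = {
    "VALIDATED_TARGET": "positive",
    "VALIDATED_ANCHOR": "positive",
    "VALIDATE_FIRST": "caution",
    "DEPRIORITIZED": "caution",
    "DEMOLISHED": "negative",
    "BLOCKED": "negative",
    "NO_PRIMARY_PROMOTED": "triage",
    "NO_PRIMARY_SURVIVED": "triage",
    "CONTEXTUAL_ONLY": "context",
    "EXPLORATORY": "hypothesis",
    "CALIBRATION_ANCHOR": "reference",
    "HYPOTHESIS": "hypothesis",
}

def can_render_positive(label):
    """Only the two positive labels may lead a report surface; everything
    else renders as caution, triage, context, or reference material."""
    return LABEL_SEVERITY.get(label) == "positive"
```

A plain dict is preferred over a Python Enum here because several labels share a severity, and Enum members with duplicate values silently become aliases.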
B · Final report surfaces
| Surface | How to read it |
|---|---|
| Top answer / final adjudication | The canonical researcher-facing interpretation. It should match validated_export and canonical_report. |
| Evidence ledger / decision surface | Candidate-level buckets, caveats, evidence chains, and validation status. A mixed section is not automatically a validated section. |
| Evidence chains | Source-linked reasoning steps. Links should resolve by exact source identity or fall back to dataset / source-label-only. |
| Direct vs contextual evidence | Direct evidence supports the target-disease claim; contextual evidence supports plausibility or stratification but cannot validate causality alone. |
| Numeric evidence | When the council names MR / coloc / eQTL / GWAS / L2G / pQTL / causal evidence, numeric support (p, PP4, OR, β, CI, L2G, n, scientific notation) is expected. The R13 warning fires when such language appears without numeric backing. |
| Council transcript | Non-canonical audit trail. It can contain rejected, overstrong, or early-round proposals. |
| Dissent | Preserved for auditability so reviewers can see where models disagreed. |
| Source links | Source-linked citations for each claim; reviewers should verify exact source identity before treating as proof. |
C · Render guards
The export layer applies these deterministic guards before HTML display:
- Failed coloc cannot render as validated support.
- Failed pQTL MR cannot render as validation.
- MR without coloc cannot render as definitive causal proof.
- Single-cell / spatial descriptive evidence cannot render as causal validation.
- Literature-only outcome context cannot render as causal validation.
- Adjacent-study evidence cannot render as primary-study proof without explicit provenance and framing.
- Druggability / pathway membership cannot render as validation.
- Model consensus cannot render as evidence.
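The guards above can be sketched as one deterministic check over a candidate record. The candidate field names here are assumptions for illustration; the real guards live in the export layer.

```python
def render_guard_violations(candidate):
    """Return the guard rules a candidate would violate if rendered as
    validated causal support. Empty list means no guard fires."""
    violations = []
    if candidate.get("coloc_status") == "computed_negative_coloc":
        violations.append("failed coloc cannot render as validated support")
    if candidate.get("mr_only") and not candidate.get("coloc_validated"):
        violations.append("MR without coloc is not definitive causal proof")
    if candidate.get("evidence_class") in {"context", "literature_outcome"}:
        violations.append("descriptive or outcome-context evidence is not causal validation")
    if (candidate.get("source_study") != candidate.get("target_study")
            and not candidate.get("explicit_transfer_framing")):
        violations.append("adjacent-study evidence needs explicit provenance and framing")
    if candidate.get("support") == "consensus_only":
        violations.append("model consensus is not evidence")
    return violations
```

Returning the full list of violations, rather than a single boolean, matches the reporting style elsewhere in this document: reviewers see every reason a candidate was held back, not just the first.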
08 · Safety, Guardrails, and Traceability
MedC82 enforces a small set of report contracts that protect against the most common ways AI-assisted research outputs can mislead. These are deterministic safety gates applied after the council finishes its rounds but before HTML export.
A · Recent trust-fix history
The platform has gone through several rounds of targeted trust fixes. The current state reflects the cumulative effect of those fixes; the suite was last run at 257 / 257 passing.
| Fix | What it adds | Why it matters |
|---|---|---|
| 7 regression gaps | 15 backend regression tests added for historical failure modes (Round 1 ideas shown as final findings; anchor count inconsistency; stale hotsheet vs canonical_report; calibration anchors promoted as primary; front_summary promoting a demolished gene; wrong-disease evidence used as target-disease validation; Round 1 → Round 4 demotion not reflected in final output). | Existing safety gates already caught these cases; we now have automated proof and a tripwire for future regressions. |
| Phase 5f — evidence citation telemetry | New evidence_citation_telemetry module counts [EV:] tags, numeric mentions, and structured source-type mentions per round and per model role. | Turns "the council under-quotes numerics" from a suspicion into a measurable per-session signal. |
| R13 — numeric-evidence warning | R13_NUMERIC_TIER1_EVIDENCE_REQUIRED emits a warning when MR / coloc / eQTL / GWAS / L2G / pQTL / causal-evidence language is used without numeric support in the same field or linked evidence detail. | Surfaces narrative-only causal claims for researcher review without hard-blocking legitimate exports. Warning-only severity. |
| Phase 5g — positive validated-target state | Added VALIDATED_TARGET / VALIDATED_ANCHOR labels gated by a conservative predicate (canonical biology + target-disease support + at least one positive validation lane + no disqualifying action_label). | Lets the platform say YES cleanly when evidence genuinely supports it, without weakening no-primary, validate-first, or demolished protections. |
| Phase 5g — R4 JSON robustness | When the R4 structured-verdict JSON is missing but canonical_report carries findings + lead_summary + verdict, degrade hard_block → review_required / warning. | Stops legitimate audit-mode sessions from being killed by an R4 JSON parse hiccup when the verdict is otherwise present. |
| Phase 5f-mini — bare-space numeric formats | R13 numeric detector recognizes bare-space evidence formats (L2G 0.92, PP4 0.989, p 5e-8, OR 1.25, CI 1.1–1.4, n 1200) without weakening the rule. | Eliminates noisy false positives that were undermining R13's signal-to-noise ratio. |
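The bare-space formats listed in the Phase 5f-mini row suggest a detector along these lines. This regex is an illustrative reconstruction, not the actual R13 implementation in report_contract_validator.py.

```python
import re

# Matches "metric <space> value" evidence mentions such as
# "PP4 0.989", "p 5e-8", "OR 1.25", "CI 1.1-1.4", "L2G 0.92", "n 1200".
NUMERIC_EVIDENCE = re.compile(
    r"\b(?:PP4|L2G|OR|CI|beta|β|p|n)\s+"       # metric name, bare space
    r"\d+(?:\.\d+)?(?:e-?\d+)?"                # plain or scientific value
    r"(?:\s*[–-]\s*\d+(?:\.\d+)?)?",           # optional range, e.g. a CI
    re.IGNORECASE,
)

def has_numeric_backing(text):
    """True when the text contains at least one numeric evidence mention."""
    return bool(NUMERIC_EVIDENCE.search(text))
```

In this sketch a field that names MR / coloc / causal evidence but fails has_numeric_backing would trigger the R13-style warning; the real rule also checks linked evidence detail before warning.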
B · Why this matters for trust
- Source traceability — citation paths must resolve to exact source identity or fall back to dataset-only.
- No-primary protections — the broad derived gate prevents primary promotion when every candidate is effectively blocked.
- Wrong-disease evidence protections — cross-disease anchors require explicit transfer qualifiers and cannot render as target-disease validation.
- Calibration-anchor protections — pre-curated canonical anchors do not silently surface as novel discoveries.
- Numeric-evidence warnings — R13 surfaces narrative-only causal claims so reviewers can check whether numeric support is present.
- Report state consistency — top-level report_state and per-finding action_label must agree across hotsheet, front_summary, and canonical_report panels.
- Positive validated-target state — added without weakening no-primary, validate-first, or demolished protections; the predicate is conservative by design and never overrides an explicit blocked action_label.
C · Code paths inspected (appendix)
For technical reviewers who want to verify the safety gates against the codebase. The main flow of this overview is understandable without reading these files.
| Area | File / function | What it owns |
|---|---|---|
| pQTL MR coloc | backend/app/connectors/mr_coloc_validator.py | OpenGWAS-style source, pQTL / GWAS regional stats, ABF coloc, PP4 thresholds, mr_coloc storage |
| Coloc lifecycle | backend/app/services/coloc_lifecycle.py | Coloc write / version / staleness contract |
| Tissue eQTL coloc | backend/app/connectors/tissue_eqtl_coloc_connector.py | Tissue / cell eQTL coloc lane and PP4 thresholds |
| Coding-variant MR | backend/app/services/coding_variant_mr.py | Curated coding-variant lane |
| Evidence tiers | backend/app/services/evidence_tiers.py | Source-tier metadata and failed-coloc not-positive-signal logic |
| Council engine | backend/app/services/council_engine.py | Four-round council flow and post-synthesis surface handling |
| Council prompts | backend/app/services/council_prompts.py | Adversarial and synthesis instructions |
| Validated export | backend/app/services/validated_export.py::build_validated_export | Canonical export generation and strict gate |
| Export gate | backend/app/services/export_gate.py | Hard-block gate semantics |
| Report contract | backend/app/services/report_contract_validator.py | R1–R13 contract rules including the numeric-evidence warning |
| Telemetry | backend/app/services/evidence_citation_telemetry.py | Per-round, per-model evidence-citation counts |
| HTML export | frontend/src/components/council/CouncilResponse.tsx::buildFinalHtmlPure | Canonical report and transcript appendix rendering |
09 · Internal Benchmark Checks
Three internal benchmark sessions were run to test whether the platform produces the correct verdict shape on known-answer cases. These are internal behavior checks, not external validation.
| Benchmark | Expected behavior | Result after current fixes | Status | Caveat |
|---|---|---|---|---|
| IL6R / rheumatoid arthritis (YES case) | Recognize as a clinically validated target with approved drug-class context. | After Phase 5g rebuild: report_state = VALIDATED_TARGET, IL6R action_label = VALIDATED_ANCHOR, export unblocked, lead summary correctly references tocilizumab / sarilumab clinical context. Caveat about unresolved genetic causality preserved. | PASS | Internal benchmark only. First run exposed a packaging issue (no positive verdict label); later fixed in Phase 5g. |
| IL12B / rheumatoid arthritis (CAUTION case, substituted for CETP / CAD) | Refuse to promote despite a strong germline genetic signal, because Phase 3 trials of the IL-12 / 23 pathway failed in RA even though they succeeded in adjacent autoimmune diseases. | IL12B → action_label = DEPRIORITIZED. Lead summary cites PP4 = 0.000 for failed MR colocalization and the failed RA Phase 3 history. R13 warnings caught narrative-only causal mentions on nested top-answer titles — the rule working as intended. | PASS | Structural substitute for the CETP / coronary artery disease canonical failed-translation case; no cardiovascular study is on the platform. The substitution is annotated, not hidden. |
| PTPN22 RA → DLBCL (NO case) | Refuse to promote a strong RA / autoimmune-disease causal anchor into a different target disease (DLBCL); recognize it as contextual evidence only. | final_adjudication.report_state = NO_PRIMARY_PROMOTED, no_primary_promoted = True, PTPN22 action_label = DEMOLISHED. Lead summary correctly states RA evidence does not transfer to DLBCL. Calibration anchors (CTLA4, HLA-DRB1, IL6R, CD40) correctly deprioritized. | PASS | Strong internal evidence of cross-disease guardrail behavior. Not external validation. |
These benchmarks demonstrate that the report layer produces the correct verdict shape (YES / CAUTION / NO) on cases where the answer is known in advance. They do not constitute independent validation, and a larger known-answer set is still needed.
10 · Current Readiness and Limitations
A · Readiness matrix
| Use case | Current status | Allowed framing | Not allowed |
|---|---|---|---|
| Internal testing | Ready | Solo internal use; running benchmarks; iterating on the contract layer. | Sharing raw outputs externally without manual review. |
| Curated expert feedback · friendly demo | Ready with caveats | Selected, rebuilt, manually-reviewed artifacts shown to a known expert with a written caveat sheet. | Sending arbitrary fresh outputs blind. |
| University / data-partner feedback outreach | Ready with caveats | Framed as a request for methodology feedback and validation, not as a discovery claim. | Pitching as a finished discovery engine, or implying external validation that does not exist. |
| External researcher beta | Not ready | — | Any unsupervised external researcher use, including self-serve account creation. |
| Public unsupervised use | Not ready | — | Public access, marketing as a discovery platform, anything that bypasses expert review. |
| Clinical · patient-facing use | No | — | Any clinical decision support, patient-facing recommendation, or treatment claim. |
B · Standing limitations
- MedC82 is not a validated discovery engine — internal benchmarks are necessary but not sufficient.
- MedC82 is not for clinical use.
- R4 synthesis still tends to under-quote numeric evidence; telemetry measures this but does not fully solve it.
- R13 is warning-only, not hard-blocking. A behavioral baseline (≥30 sessions) is needed before elevation to blocker is considered.
- The benchmark set is small (3 cases). A larger known-answer set, ideally 10–20 cases drawn from independent sources, is still needed.
- External expert review is still needed, especially for MR / coloc / eQTL interpretation and source attribution.
- Dataset and source-link traceability should still be checked on any artifact before sending externally.
- Older HTML reports may pre-date current fixes and should be rebuilt before sharing.
- Stronger novelty pressure is still needed — better candidate generation, explicit negative-evidence weighting, deeper consumption of disease-state omics and perturbation atlases, and an explicit failed-translation scoring lane.
- No CAD / cardiovascular study exists on the platform yet; the canonical CETP / CAD failed-translation benchmark was substituted with IL12B / RA.
- Partner-facing artifacts must be selected and manually reviewed before external use.
C · Not-claimed list
The following are explicitly outside what MedC82 currently claims:
- discovery-engine claim
- clinical-readiness claim
- public-beta claim
- no-review-needed claim
- external-validation claim
- treatment recommendation
11 · What Expert Review Should Focus On
This overview is intended to support a methodology-feedback conversation. The most valuable things a researcher, lab, or data / API team can help evaluate are:
- Whether MedC82 interprets their evidence layer responsibly (especially the layer they know best).
- Whether source attribution is correct on any artifact provided.
- Whether direct evidence and contextual evidence are clearly separated in the report.
- Whether MR, pQTL MR, eQTL, and colocalization claims are caveated correctly when the lane fails or is not computable.
- Whether cross-disease transfer is handled conservatively (no silent promotion of source-disease evidence into a target-disease verdict).
- Whether report traceability is sufficient — every claim should resolve to a source.
- Whether the internal benchmark design is fair, and what additional known-answer cases should be added.
- What evidence — connector, dataset, or method — would make the platform genuinely useful to researchers in their field.
A · Reproducibility checklist
The following can accompany any artifact shared for expert review:
- session ID
- main study ID and adjacent study IDs, if any
- datapack path
- raw transcript preserved
- source map present
- validated_export present
- canonical_report present
- export_blocked = false
- export_gate.hard_block = false
- raw checksum unchanged on rebuild
- evidence-chain source links verified
- source appendix verified
- primary result matches validated_export
- action label matches validated_export
- validation and causal flags match validated_export
- transcript appendix present
- failed MR / coloc not used as support
- single-cell / spatial evidence not labeled causal unless independently validated
- no treatment recommendation
- causal lane clearly stated
- PP4 / MR result clearly interpreted when present
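The flag items in the checklist above lend themselves to an automated pre-share check. The artifact key names export_blocked, export_gate.hard_block, validated_export, canonical_report, and source map come from the checklist; preshare_flag_checks itself and the dict shape are illustrative assumptions.

```python
def preshare_flag_checks(artifact):
    """Check the machine-verifiable subset of the reproducibility checklist.

    Returns a list of problems; an empty list means the flag items pass.
    Human items (source-link verification, causal-lane framing) still need
    manual review and are deliberately not covered here.
    """
    problems = []
    # Missing flags default to the unsafe value, so absence is a problem.
    if artifact.get("export_blocked", True):
        problems.append("export_blocked must be false")
    if artifact.get("export_gate", {}).get("hard_block", True):
        problems.append("export_gate.hard_block must be false")
    for key in ("validated_export", "canonical_report", "source_map"):
        if key not in artifact:
            problems.append(f"missing {key}")
    return problems

good = {
    "export_blocked": False,
    "export_gate": {"hard_block": False},
    "validated_export": {}, "canonical_report": {}, "source_map": {},
}
```

Defaulting absent flags to the blocking value is the conservative choice: an artifact that omits export_gate entirely should fail the pre-share check, not pass it silently.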