Platform Overview · Biomedical Evidence Review

MedC82 Platform Overview

An open biomedical research council for evidence review, hypothesis stress-testing, and traceable scientific triage.

MedC82 combines public literature, structured biomedical databases, disease-specific evidence packs, and a six-model adversarial AI council. The platform is designed to help researchers separate validated evidence from speculative leads, identify missing evidence layers, compare cross-disease transfer plausibility, and generate falsifiable next-step experiments.

Subject: MedC82 platform · evidence · council · report
Audience: Researchers, labs, data/API partners
Document type: Platform overview · expert-facing
Status: Open · non-commercial
This overview is intended for expert feedback and methodology review. It is not medical advice or a treatment recommendation.

01 · Platform Summary

MedC82 is a disease-agnostic, study-scoped biomedical research platform. It builds structured evidence packs from literature, genetics, omics, single-cell, spatial, drug/pathway, clinical, and study-specific sources, then routes that pack through a six-model adversarial AI council that reviews the same evidence across four rounds. The final report is a canonical research triage surface — not raw model output.

The platform's current strengths are evidence review, hypothesis stress-testing, cross-disease caution, and falsifiable next-step framing. It may surface:

  • validated causal anchors
  • validate-first candidates
  • exploratory hypotheses
  • useful negative / triage results
  • STOP decisions
  • non-promoted follow-up ideas
The council does not discover truth. It tests research leads against available evidence, missing layers, adversarial objections, and validation gates. MedC82 outputs should be read as research evidence synthesis and triage, not clinical or treatment recommendations.

What MedC82 can support | Evidence needed | What MedC82 cannot claim yet
Exploratory disease-state hypothesis | Human disease-state data, outcome or cohort context where available, clear caveats, and a falsifiable next experiment | Causality, treatment response, or clinical action
Validated causal anchor | Mechanism-appropriate causal lane (for example, coding-variant MR, validated pQTL colocalization, or validated tissue eQTL colocalization) | General causal claims outside the validated mechanism lane
Patient-stratification or prognostic lead | Disease-matched cohort, spatial, molecular, or outcome-linked evidence with confounder-aware validation needs | Therapeutic mechanism or treatment recommendation by itself
Useful triage / no-primary result | Adversarial review showing candidates are weak, off-contract, or under-validated | That the biology is false
Follow-up idea | Specific in-run grounding plus a concrete next experiment | That the idea is canonical or validated
Treatment recommendation | Not supported by current MedC82 export surfaces | MedC82 reports should not recommend treatment combinations

02 · Why This Is Not Just Search

The platform is built for the gap between search and expert review: it does not simply retrieve papers, and it does not claim to prove discoveries. It organizes heterogeneous biomedical evidence into a structured pack, forces multiple AI models to argue over it, and then applies report-safety rules so the final output separates validation, caution, failed transfer, and exploratory ideas.

Layer | What it does | What it does not do
Search | Retrieves documents matching keywords / semantic similarity | Does not separate validated evidence from speculative leads; does not test cross-disease transfer; does not produce a verdict
MedC82 evidence pack | Bounded, study-scoped object with source provenance, evidence type, disease context, causal-lane tagging, and explicit missing-layer caveats | Does not eliminate the need for expert review; does not turn association into causality
MedC82 AI Council | Six adversarial roles debate the same bounded evidence across four rounds and produce a structured synthesis | Does not prove truth by consensus; does not override deterministic safety gates
MedC82 report layer | Separates validated anchors, validate-first candidates, deprioritized / demolished candidates, and exploratory hypotheses with traceability back to source | Does not replace expert review; raw transcript proposals are not the final answer
Expert review | Confirms interpretation, source attribution, causal-lane framing, and translational plausibility | MedC82 is designed to support this layer, not replace it

The value is not "AI answer generation." The value is structured evidence triage and adversarial stress-testing of research leads, with explicit no-primary and validate-first surfaces when evidence is not yet sufficient.

03 · Evidence and Data Layers

Evidence packs are study-scoped and connector-scoped. A connector can contribute causal evidence, association evidence, context evidence, reference evidence, safety evidence, or missing-layer caveats. The table below names the common sources and how each should be interpreted. "Where configured" indicates a connector that is integrated for some studies but not all.

Layer | Source / connector | Contributes | Evidence class | Cannot prove | Example platform use
Literature / paper corpus | PubMed, Europe PMC, PMC Open Access, OpenAlex, Semantic Scholar, DOI metadata, retrieved corpus sources | Published evidence, biological plausibility, prior findings, outcome context, mechanistic background, caveats, and source-linked support for council claims | Literature / context | Causality by itself unless the study design and validation lane support it | Traceable support, prior art, caveats, and outcome context
Clinical trials | ClinicalTrials.gov, WHO ICTRP, trial-outcome connectors | Trial history, phase, outcomes, completion status | Clinical context | Mechanism or efficacy by trial existence alone | Translational caveats, failed-translation evidence
GEO dataset discovery | geo_dataset_discovery | Public dataset availability, accession, assay type, matrix availability, sample metadata, linked paper metadata where exact | Discovery metadata | Expression effect, causality, or evidence quality by dataset existence alone | Identifies candidate datasets for future ingestion or validation
Single-cell / spatial datasets | CELLxGENE, study-specific single-cell connectors, curated GEO/index records (where configured) | Disease-state, tissue, cell-state, or compartment context | Context | Causality or treatment response by itself | Cell-state, spatial, or tissue-compartment hypothesis support
Disease-specific outcome literature | Outcome paper connectors or literature corpus | Article-level outcome-linked context | Outcome context | Raw matrix causal proof or treatment recommendation | Cohort/outcome plausibility and validation needs
Cancer genomics / clinical context | TCGA, GDC, cBioPortal where configured, study-specific connectors | Molecular subtype, mutation, copy-number, expression, clinical metadata | Molecular context | Causality without a matching validation lane | Cohort stratification or disease-state context
Proteomics / phosphoproteomics | CPTAC where configured, study-specific proteomics connectors | Protein and phosphoprotein context | Molecular context | Causality without matching causal layer | Pathway / protein-state context
GWAS Catalog | GWAS connector | Disease associations | Association | Causal mechanism by itself | Association priors and candidate context
Open Targets / L2G | Open Targets | Genetic association and likely-gene priors | Association | Final causality alone | Candidate ranking / context
MR / pQTL MR / colocalization | OpenGWAS / MR-style evidence, MR-coloc connectors | Protein-abundance genetic association and coloc validation attempt | Causal lane if coloc | Positive support when colocalization fails or is not computable | Mechanism-specific causal validation or caveat
MR-KG literature | MR literature connector | Published MR study references | Association | Replacement for in-pack coloc | Published MR context
eQTL Catalogue | eQTL connector | Expression / regulatory context | Molecular association | Causality without coloc | Input to tissue eQTL coloc
Tissue / brain eQTL coloc | Tissue eQTL coloc connector | Shared-variant expression mechanism test | Causal lane if PP4 passes | Coding-mechanism validation | Tissue or cell-type expression mechanism
Coding-variant MR | Curated coding-variant MR lane | Protein-altering variant causal lane | Causal | Protein-abundance or tissue-expression mechanism | Coding-variant mechanism validation
HPA | Human Protein Atlas | Tissue / atlas reference | Reference | Disease causality | Tissue expression / context
ChEMBL / DGIdb / OpenFDA | Drug, target, and adverse-event databases | Binding, tractability, drug-gene interaction, and adverse-event context | Druggability | Disease causal support | Target tractability and safety signal context
Reactome / SIGNOR / STRING | Pathway and interaction connectors | Pathway membership and interaction network context | Reference | Causality | Network / pathway context
Perturbation / dependency | LINCS, PRISM, DepMap, scPerturb where available | Cell-line drug / target / dependency context | Low-tier context | Human disease validation | Experimental feasibility and biological perturbation context
Safety / adverse events | FAERS | Post-marketing adverse-event signals | Safety context | Causal mechanism alone | Translational safety caveats
Imaging / neuroimaging | OpenNeuro, XNAT, NITRC where available | Disease-relevant imaging metadata and context | Context | Disease causality | Neuroimaging-specific stratification context
Missing or skipped layers | Connector absence or study-inapplicability | Missing-layer caveats | Missing / irrelevant | Should not be inferred as evidence | Explicit not-available caveats

04 · How the Evidence Pack Is Built

A · Study setup

Every study is scoped by study_id. Cross-study runs can include a main study plus explicitly requested adjacent studies. Cross-study evidence is combined at retrieval / council time with source_study_id provenance, not merged as if all evidence came from one disease.
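The cross-study merge rule above can be sketched as a small Python helper. Everything here is illustrative: `EvidenceRow` and `combine_cross_study` are hypothetical names, not the platform's real code; only the `study_id` / `source_study_id` field names come from the overview.

```python
from dataclasses import dataclass

@dataclass
class EvidenceRow:
    source_study_id: str  # per-row provenance: which study contributed this row
    text: str

def combine_cross_study(packs):
    """Merge evidence from a main study plus explicitly requested adjacent
    studies at retrieval time, keeping source_study_id on every row instead
    of flattening everything into one disease. `packs` maps study_id to a
    list of evidence strings."""
    return [
        EvidenceRow(source_study_id=sid, text=t)
        for sid, texts in packs.items()
        for t in texts
    ]
```

The point of the sketch is the invariant, not the data structure: evidence never loses its originating study, so adjacent-study rows can later be framed as transfer context rather than primary-study proof.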

B · Ingestion and normalization

MedC82 ingests literature, structured databases, single-cell and spatial datasets, omics connectors, genetics connectors, and drug / pathway connectors. Entities and source metadata are normalized and carried through the evidence pack and export system. Evidence is tagged at the row level as direct, contextual, caveat, missing-layer, causal-lane, or support.

C · Datapack construction

The datapack is the compact research object shown to the council. It includes:

  • evidence_text: human-readable structured evidence and retrieved literature
  • full_context_text: broader context supplied to the council
  • source citations and source maps
  • structured evidence rows with tags such as [EV:...] and [DB:...]
  • per-gene evidence profiles
  • explicit missing-layer caveats — for example, missing pQTL instrument or failed colocalization

The pack is assembled before council review. The raw council transcript is preserved as audit material, but it is never the canonical final answer.
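The datapack fields listed above can be pictured as a simple container. This is a minimal sketch, not the real class: the field names (`evidence_text`, `full_context_text`, missing-layer caveats, `[EV:...]` / `[DB:...]` row tags) come from the overview, while the `Datapack` class itself and the example values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Datapack:
    """Illustrative shape of the compact research object shown to the council."""
    evidence_text: str                      # human-readable structured evidence
    full_context_text: str                  # broader context supplied to the council
    source_citations: list = field(default_factory=list)
    evidence_rows: list = field(default_factory=list)       # e.g. "[EV:mr_coloc] ..." / "[DB:OpenTargets] ..."
    gene_profiles: dict = field(default_factory=dict)       # per-gene evidence profiles
    missing_layer_caveats: list = field(default_factory=list)

pack = Datapack(
    evidence_text="[EV:mr_coloc] PP4=0.12 for GENE_X plasma pQTL",
    full_context_text="(retrieved literature and structured context)",
    missing_layer_caveats=["missing pQTL instrument for GENE_Y"],
)
```

Note that missing-layer caveats travel inside the pack itself, so the council sees what is absent, not only what is present.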

D · Canonical export

The canonical export surface is built by backend/app/services/validated_export.py::build_validated_export. It creates validated_export, top_answer, finding_semantics, canonical_report, export_gate, and insight_assessment. The HTML export is built from frontend rendering code that checks export_gate.hard_block, validates the canonical_report, and uses backend canonical display fields for the report top answer.
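The display-side ordering described above (gate first, canonical report second, top answer last) can be mirrored in a few lines. This is a hedged sketch: the real logic lives in the frontend rendering code, and `render_top_answer` is a hypothetical function; only the key names (`export_gate.hard_block`, `canonical_report`, `top_answer`) come from the overview.

```python
def render_top_answer(export):
    """Sketch of the check order before HTML display, assuming `export`
    is the dict produced by build_validated_export."""
    # 1. The hard-block gate wins over everything else.
    if export.get("export_gate", {}).get("hard_block"):
        return "<p>Blocked: export_gate.hard_block is set.</p>"
    # 2. The canonical report and canonical display fields must exist.
    if not export.get("canonical_report") or not export.get("top_answer"):
        raise ValueError("canonical_report or top_answer missing")
    # 3. Only then does the canonical top answer reach the report surface.
    return f"<h1>{export['top_answer']}</h1>"
```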

Pipeline · ingestion to report · 4-round adversarial council

A. Data connectors: literature, genetics, omics, single-cell, drug/pathway
B. Evidence scores · graph enrichment: structured rows, tags, per-gene profiles
C. Evidence pack · datapack: compact research object, missing-layer caveats
D. Retrieval context: source map, citations, full context text

Council · 4 rounds
R1. Blind analysis: independent reads, no cross-talk
R2. Audit · discussion: cross-examination of claims
R3. Adversarial attack: demolish weak / overstrong claims
R4. Synthesis: no resurrection of demolished claims

Outputs
Canonical: validated_export · canonical_report (strict gate · HTML report)
Audit: raw transcript appendix (non-canonical · auditability only)

05 · The AI Council

The AI Council is the adversarial reasoning layer that sits between the structured evidence pack and the canonical report. It is not a vote-counting system and it does not prove truth by consensus. It asks multiple differently biased research roles to read the same bounded evidence, challenge one another, and produce a final synthesis that can still be demoted by deterministic export checks.

A · Model roles

The council roles are deliberately different. The names below describe the prompt roles in the live code; they should be read as research functions, not as guarantees that any one model is correct.

Council role | Main job | Useful for | Cannot do alone
Maverick Theorist | Propose non-obvious, testable hypotheses | Surface new framing from the evidence pack | Validate causality or treatment response
Evidence Auditor | Challenge evidence quality, replication, sample size, and citation support | Keep claims source-grounded and appropriately caveated | Discover every possible mechanism
Cross-Domain Connector | Look for transferable biology across diseases and fields | Generate cross-study hypotheses | Turn analogy into proof
Translational Specialist | Ask what would be needed for a real translational path | Identify safety, biomarker, and validation gaps | Recommend clinical treatment from exploratory evidence
Assumption Destroyer | Attack hidden assumptions and propose falsifiers | Prevent attractive but weak claims from surviving unchallenged | Prove a claim false without adequate evidence
Literature Specialist | Mine the provided corpus for buried signals and convergence | Find overlooked corpus patterns | Use unsupported memory or invented citations as evidence

B · Four-round workflow

Round | What happens | What the round is designed to catch
Round 1 · blind independent analysis | Each role reads the same research context and user query without seeing the other roles' answers. | Independent hypotheses, missed signals, and initial evidence interpretations
Round 2 · audit / discussion | Each role reads compressed Round 1 outputs and states agreement, disagreement, missed findings, and revisions. | Early convergence, contradictions, weak evidence, and claims that need refinement
Round 3 · adversarial attack | The Assumption Destroyer attacks converged claims; other roles must defend, concede, or revise. | Overclaiming, hidden assumptions, methodological flaws, unverified claims, and missing falsifiers
Round 4 · neutral synthesis | A neutral synthesizer reads the context and compressed Rounds 1-3. It is instructed not to add new claims, not to introduce uncited evidence, not to resurrect demolished claims, and to preserve dissent. | Final adjudication, demotion of weak ideas, unresolved splits, and safer framing
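The round structure above can be sketched as a loop over role prompts. This is a deliberately simplified sketch, not the council engine: `run_council` and the prompt strings are hypothetical, `ask(role, prompt)` stands in for a model call, and compression, prompt templates, and post-synthesis safety gates are all omitted.

```python
def run_council(context, roles, ask):
    """Four-round flow sketch. `ask(role, prompt)` invokes one model in
    that role and returns its text."""
    # Round 1: blind, no cross-talk between roles.
    r1 = {role: ask(role, context) for role in roles}
    digest = " | ".join(r1.values())
    # Round 2: each role sees a compressed digest of Round 1.
    r2 = {role: ask(role, context + "\nRound 1 digest: " + digest) for role in roles}
    # Round 3: the Assumption Destroyer attacks converged claims.
    r3 = ask("Assumption Destroyer", "Attack converged claims: " + " | ".join(r2.values()))
    # Round 4: neutral synthesis, no new claims, no resurrection.
    r4 = ask("Synthesizer", "Synthesize; no new claims, no resurrection: " + r3)
    return {"round1": r1, "round2": r2, "round3": r3, "synthesis": r4}
```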

C · Generic claim lifecycle

Stage | Generic lifecycle
Raw proposal | A model raises a hypothesis from the evidence pack.
Evidence support | Other roles identify which sources or structured rows support it.
Adversarial caveat | Round 3 tests whether the claim is causal, overbroad, off-contract, or missing decisive evidence.
Canonical classification | Round 4 plus export logic classifies it as validated, exploratory, no-primary / triage, STOP, unresolved, or follow-up only.
HTML display | validated_export and canonical_report supply the top report surface, while the raw transcript remains in the appendix.
Why the transcript can look stronger than the final answer. Raw model outputs are preserved so researchers can audit the debate. That means the appendix may contain early or overstrong claims that were later challenged. Round 3 and Round 4 can force a claim to be narrowed, demoted, or rejected, and the canonical export can further block unsafe wording before HTML display. The practical rule is: the top report is the canonical answer; the appendix is the non-canonical audit trail.

06 · MR, eQTL, Colocalization & Causal Lanes

Plain-language summary

Mendelian randomization (MR) asks whether genetic variation associated with an exposure is also associated with disease. pQTL MR uses genetic instruments for protein abundance. Colocalization asks whether the exposure association and the disease association likely share the same causal variant.

MR without colocalization can be misleading because linkage disequilibrium or horizontal pleiotropy can make two nearby signals look connected. In MedC82, failed or unavailable pQTL MR colocalization is a caveat only. It is never positive support.

A · What colocalization actually measures

Colocalization estimates the posterior probability that both traits share one causal variant. In the ABF model, PP4 is the posterior probability for a shared causal variant.

PP4 range | MedC82 interpretation | Promotion consequence
PP4 > 0.8 (validated) | Validated colocalization support | Can support a causal lane if mechanism and disease context match
0.5 < PP4 ≤ 0.8 (suggestive) | Suggestive / unresolved | Usually validate-first or exploratory, not definitive
PP4 < 0.5, when ABF colocalization actually ran (computed negative) | Computed negative colocalization for that pQTL-MR lane | Caveat only; do not use as positive support. Not the same as not-computable cases.
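The thresholds above reduce to a small classifier. This is a sketch that mirrors the table, not the platform's coloc code: `classify_pp4` is a hypothetical name, and the `computed` flag stands in for "ABF colocalization actually ran".

```python
def classify_pp4(pp4, computed):
    """Map a PP4 posterior to the interpretation buckets in the table above."""
    if not computed or pp4 is None:
        return "not_computable"           # absence of validation, never negative evidence
    if pp4 > 0.8:
        return "validated_coloc"          # can support a causal lane if context matches
    if pp4 > 0.5:
        return "suggestive"               # validate-first / exploratory, not definitive
    return "computed_negative_coloc"      # caveat only, never positive support
```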

B · How to interpret zero pQTL-coloc validations

A MedC82 run may find MR or pQTL association context but zero pQTL-colocalization validations. This means the pQTL-MR causal lane did not validate a protein-abundance mechanism in that run. It is not proof that every biological hypothesis is false.

Historical summaries may collapse multiple outcomes into failed or PP4=0.000, including true computed-negative ABF colocalization, no suitable pQTL top hit, regional summary-stat fetch failure, too few overlapping SNPs, missing or insufficient disease GWAS / locus data, or QC / input failure.

Status | Meaning | How MedC82 should interpret it
validated_coloc | ABF colocalization ran and PP4 passed threshold | Possible causal support if mechanism, disease context, and source traceability match
computed_negative_coloc | ABF colocalization ran and PP4 stayed below threshold | This pQTL-MR causal lane did not support the mechanism
not_computable_no_pqtl_tophit | No suitable pQTL instrument / top hit | Absence of validation, not biological refutation
not_computable_low_snp_overlap | Insufficient overlapping SNPs | Absence of validation, not biological refutation
not_computable_regional_fetch_failed | Required regional summary statistics unavailable or fetch failed | Absence of validation
not_computable_missing_gwas | Disease / locus GWAS unavailable or insufficient | Absence of validation
qc_error | Input or QC problem | Cannot interpret as negative evidence
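The status table above is essentially a lookup that separates "computed negative" from "not computable". A minimal sketch (the function name and return strings are illustrative; the status codes are the ones listed above):

```python
NOT_COMPUTABLE = {
    "not_computable_no_pqtl_tophit",
    "not_computable_low_snp_overlap",
    "not_computable_regional_fetch_failed",
    "not_computable_missing_gwas",
}

def interpret_coloc_status(status):
    """Collapse the coloc status codes to the only readings the report
    layer should allow. Note that only computed_negative_coloc counts
    against the mechanism; not-computable cases never do."""
    if status == "validated_coloc":
        return "possible causal support"
    if status == "computed_negative_coloc":
        return "lane did not support the mechanism"
    if status in NOT_COMPUTABLE:
        return "absence of validation, not refutation"
    return "not interpretable as evidence"  # e.g. qc_error
```

This is why a historical summary that collapses everything into "failed or PP4=0.000" loses information the report layer needs.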

C · Mechanism-specific causal lanes

Mechanism type | Correct causal test | Example | What failure means
Protein abundance mechanism | pQTL MR plus colocalization | Plasma / protein abundance associated with disease | No pQTL-coloc validation means the protein-abundance lane did not validate; not-computable cases are absence of validation, not biological refutation
Expression / regulatory mechanism | Tissue / cell eQTL colocalization | Disease-relevant tissue expression coloc | Failed eQTL coloc does not refute coding mechanism
Coding / protein-altering mechanism | Coding-variant MR | Protein-altering variant instrument | Failed pQTL / eQTL should not penalize the coding lane if the coding variant itself is the instrument
Disease-state cell context | Single-cell / spatial evidence | Disease-relevant cell state or compartment | Context, not causal validation
Treatment-response context | Outcome cohort / prospective validation | Outcome-linked cohort evidence | Outcome-linked plausibility, not causality unless controlled

Positive colocalization or coding-variant MR can support a validated causal anchor when mechanism, disease context, and source traceability match. Failed pQTL MR becomes a caveat. MR-only without colocalization does not promote a causal claim.

07 · Signal Triage and Report Outputs

After the council completes its four rounds, the report layer classifies each candidate using a labeled action state. These labels are deterministic and gate what the final report surface can say.

A · Report state and action labels

Label | Meaning | Required to apply
VALIDATED_TARGET (positive report state) | Top-level report state when at least one finding qualifies as a validated anchor for the target disease. | Canonical biology, target-disease support, at least one positive validation lane, no disqualifying action_label.
VALIDATED_ANCHOR (positive finding state) | Per-finding label for a known validated anchor where mechanism-appropriate evidence supports it. | Same predicate as VALIDATED_TARGET but applied at the finding level; never overrides an explicit blocked action_label.
VALIDATE_FIRST (caution) | Potentially interesting, but decisive validation is missing in the current pack. | Evidence is suggestive; council does not endorse promotion until specific validation lanes are met.
DEPRIORITIZED (caution) | Useful enough to record, but not strong enough to lead. | Adversarial review reduces the candidate's standing without rejecting it outright.
DEMOLISHED (negative) | Adversarial review rejected it in the current evidence context. | Round 3 attack survived; export layer must not present it positively.
BLOCKED (negative) | Compatibility validator blocked the candidate from primary promotion. | Used for wrong-disease evidence, contextual-only support, or off-contract candidates.
NO_PRIMARY_PROMOTED (triage) | Broad derived state: every primary candidate is effectively blocked. | Phase 5d derived gate; valid result, not a failure.
NO_PRIMARY_SURVIVED (triage) | Strict compatibility-validator escalation; no primary candidate survived gate enforcement. | Equivalent to a deliberate "we cannot promote anything from this pack" verdict.
CONTEXTUAL_ONLY (context) | Evidence is present but cannot be treated as direct target-disease validation. | Common for cross-disease anchors lifted into a different target disease.
EXPLORATORY (hypothesis) | Hypothesis-grade research lead. | Useful for research design; not validation; not actionable.
CALIBRATION_ANCHOR (reference) | Pre-curated canonical anchor used for calibration. | Not promoted as a novel discovery; serves as a known reference.
HYPOTHESIS (hypothesis) | Generic hypothesis label for non-anchor proposals. | Requires next-step framing; not validation.
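The conservative predicate behind the positive labels can be sketched directly from the table. Everything below is illustrative, not the real schema: `can_label_validated_anchor` and the field names (`canonical_biology`, `target_disease_support`, `positive_validation_lanes`) are hypothetical; only the label names and the "never overrides a blocking action_label" rule come from the overview.

```python
BLOCKING_LABELS = {"BLOCKED", "DEMOLISHED"}

def can_label_validated_anchor(finding):
    """All four conditions must hold: canonical biology, target-disease
    support, at least one positive validation lane, and no disqualifying
    action_label already set on the finding."""
    return (
        bool(finding.get("canonical_biology"))
        and bool(finding.get("target_disease_support"))
        and len(finding.get("positive_validation_lanes", [])) >= 1
        and finding.get("action_label") not in BLOCKING_LABELS
    )
```

The asymmetry is the point: the predicate can only add a positive label when nothing negative is present; it can never remove a BLOCKED or DEMOLISHED verdict.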

B · Final report surfaces

Surface | How to read it
Top answer / final adjudication | The canonical researcher-facing interpretation. It should match validated_export and canonical_report.
Evidence ledger / decision surface | Candidate-level buckets, caveats, evidence chains, and validation status. A mixed section is not automatically a validated section.
Evidence chains | Source-linked reasoning steps. Links should resolve by exact source identity or fall back to dataset / source-label-only.
Direct vs contextual evidence | Direct evidence supports the target-disease claim; contextual evidence supports plausibility or stratification but cannot validate causality alone.
Numeric evidence | When the council names MR / coloc / eQTL / GWAS / L2G / pQTL / causal evidence, numeric support (p, PP4, OR, β, CI, L2G, n, scientific notation) is expected. The R13 warning fires when such language appears without numeric backing.
Council transcript | Non-canonical audit trail. It can contain rejected, overstrong, or early-round proposals.
Dissent | Preserved for auditability so reviewers can see where models disagreed.
Source links | Source-linked citations for each claim; reviewers should verify exact source identity before treating as proof.

C · Render guards

The export layer applies these deterministic guards before HTML display:

  • Failed coloc cannot render as validated support.
  • Failed pQTL MR cannot render as validation.
  • MR without coloc cannot render as definitive causal proof.
  • Single-cell / spatial descriptive evidence cannot render as causal validation.
  • Literature-only outcome context cannot render as causal validation.
  • Adjacent-study evidence cannot render as primary-study proof without explicit provenance and framing.
  • Druggability / pathway membership cannot render as validation.
  • Model consensus cannot render as evidence.
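Because the guards are deterministic, they can be modeled as a denylist of (evidence kind, rendering) pairs. This is a sketch under stated assumptions: the pair names and `passes_render_guards` are illustrative shorthand for a few of the rules above, not the export layer's real representation.

```python
FORBIDDEN_RENDERINGS = {
    ("failed_coloc", "validated_support"),
    ("failed_pqtl_mr", "validation"),
    ("mr_without_coloc", "definitive_causal_proof"),
    ("single_cell_descriptive", "causal_validation"),
    ("literature_only_outcome", "causal_validation"),
    ("model_consensus", "evidence"),
}

def passes_render_guards(evidence_kind, rendered_as):
    """A (kind, rendering) pair on the denylist can never reach HTML."""
    return (evidence_kind, rendered_as) not in FORBIDDEN_RENDERINGS
```

A denylist keeps the guards auditable: adding a new rule is a one-line diff, and no model output can argue its way around set membership.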

08 · Safety, Guardrails, and Traceability

MedC82 enforces a small set of report contracts that protect against the most common ways AI-assisted research outputs can mislead. These are deterministic safety gates applied after the council finishes its rounds but before HTML export.

A · Recent trust-fix history

The platform has gone through several rounds of targeted trust fixes. The current state reflects the cumulative effect of those fixes; the suite was last run at 257 / 257 passing.

Fix | What it adds | Why it matters
7 regression gaps | 15 backend regression tests added for historical failure modes (Round 1 ideas shown as final findings; anchor count inconsistency; stale hotsheet vs canonical_report; calibration anchors promoted as primary; front_summary promoting a demolished gene; wrong-disease evidence used as target-disease validation; Round 1 → Round 4 demotion not reflected in final output). | Existing safety gates already caught these cases; we now have automated proof and a tripwire for future regressions.
Phase 5f · evidence citation telemetry | New evidence_citation_telemetry module counts [EV:] tags, numeric mentions, and structured source-type mentions per round and per model role. | Turns "the council under-quotes numerics" from a suspicion into a measurable per-session signal.
R13 · numeric-evidence warning | R13_NUMERIC_TIER1_EVIDENCE_REQUIRED emits a warning when MR / coloc / eQTL / GWAS / L2G / pQTL / causal-evidence language is used without numeric support in the same field or linked evidence detail. | Surfaces narrative-only causal claims for researcher review without hard-blocking legitimate exports. Warning-only severity.
Phase 5g · positive validated-target state | Added VALIDATED_TARGET / VALIDATED_ANCHOR labels gated by a conservative predicate (canonical biology + target-disease support + at least one positive validation lane + no disqualifying action_label). | Lets the platform say YES cleanly when evidence genuinely supports it, without weakening no-primary, validate-first, or demolished protections.
Phase 5g · R4 JSON robustness | When the R4 structured-verdict JSON is missing but canonical_report carries findings + lead_summary + verdict, degrade hard_block → review_required / warning. | Stops legitimate audit-mode sessions from being killed by an R4 JSON parse hiccup when the verdict is otherwise present.
Phase 5f-mini · bare-space numeric formats | R13 numeric detector recognizes bare-space evidence formats (L2G 0.92, PP4 0.989, p 5e-8, OR 1.25, CI 1.1–1.4, n 1200) without weakening the rule. | Eliminates noisy false positives that were undermining R13's signal-to-noise ratio.
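The R13 idea (causal-evidence language without numeric backing) can be sketched with two regexes, including the bare-space formats from the table. This is a toy approximation, not the real detector in report_contract_validator.py: the patterns, `r13_warning`, and the token lists are illustrative only.

```python
import re

# Causal-evidence language that should come with numbers attached.
CAUSAL_LANGUAGE = re.compile(r"\b(MR|coloc\w*|eQTL|GWAS|L2G|pQTL|causal)\b", re.IGNORECASE)

# Numeric backing, including bare-space formats such as "PP4 0.989",
# "p 5e-8", "OR 1.25", "L2G 0.92", "n 1200" as well as "PP4=0.989".
NUMERIC_EVIDENCE = re.compile(r"\b(PP4|p|OR|CI|L2G|n)\s*[=:]?\s*\d[\d.,eE+-]*")

def r13_warning(text):
    """True when causal-evidence language appears with no numeric support
    anywhere in the text. Warning-only: callers surface it, never block."""
    return bool(CAUSAL_LANGUAGE.search(text)) and not NUMERIC_EVIDENCE.search(text)
```

Recognizing bare-space formats in the numeric pattern is what keeps legitimate text like "PP4 0.989" from triggering a false warning.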

B · Why this matters for trust

  • Source traceability — citation paths must resolve to exact source identity or fall back to dataset-only.
  • No-primary protections — the broad derived gate prevents primary promotion when every candidate is effectively blocked.
  • Wrong-disease evidence protections — cross-disease anchors require explicit transfer qualifiers and cannot render as target-disease validation.
  • Calibration-anchor protections — pre-curated canonical anchors do not silently surface as novel discoveries.
  • Numeric-evidence warnings — R13 surfaces narrative-only causal claims so reviewers can check whether numeric support is present.
  • Report state consistency — top-level report_state and per-finding action_label must agree across hotsheet, front_summary, and canonical_report panels.
  • Positive validated-target state — added without weakening no-primary, validate-first, or demolished protections; the predicate is conservative by design and never overrides an explicit blocked action_label.

C · Code paths inspected (appendix)

For technical reviewers who want to verify the safety gates against the codebase. The main flow of this overview is understandable without reading these files.

Area | File / function | What it owns
pQTL MR coloc | backend/app/connectors/mr_coloc_validator.py | OpenGWAS-style source, pQTL / GWAS regional stats, ABF coloc, PP4 thresholds, mr_coloc storage
Coloc lifecycle | backend/app/services/coloc_lifecycle.py | Coloc write / version / staleness contract
Tissue eQTL coloc | backend/app/connectors/tissue_eqtl_coloc_connector.py | Tissue / cell eQTL coloc lane and PP4 thresholds
Coding-variant MR | backend/app/services/coding_variant_mr.py | Curated coding-variant lane
Evidence tiers | backend/app/services/evidence_tiers.py | Source-tier metadata and failed-coloc not-positive-signal logic
Council engine | backend/app/services/council_engine.py | Four-round council flow and post-synthesis surface handling
Council prompts | backend/app/services/council_prompts.py | Adversarial and synthesis instructions
Validated export | backend/app/services/validated_export.py::build_validated_export | Canonical export generation and strict gate
Export gate | backend/app/services/export_gate.py | Hard-block gate semantics
Report contract | backend/app/services/report_contract_validator.py | R1–R13 contract rules including the numeric-evidence warning
Telemetry | backend/app/services/evidence_citation_telemetry.py | Per-round, per-model evidence-citation counts
HTML export | frontend/src/components/council/CouncilResponse.tsx::buildFinalHtmlPure | Canonical report and transcript appendix rendering

09 · Internal Benchmark Checks

Three internal benchmark sessions were run to test whether the platform produces the correct verdict shape on known-answer cases. These are internal behavior checks, not external validation.

IL6R / rheumatoid arthritis · YES case
Expected: Recognize as a clinically-validated target with approved drug-class context.
Result: After Phase 5g rebuild: report_state = VALIDATED_TARGET, IL6R action_label = VALIDATED_ANCHOR, export unblocked, lead summary correctly references tocilizumab / sarilumab clinical context. Caveat about unresolved genetic causality preserved.
Status: PASS
Caveat: Internal benchmark only. First run exposed a packaging issue (no positive verdict label); later fixed in Phase 5g.

IL12B / rheumatoid arthritis · CAUTION case (substituted for CETP / CAD)
Expected: Refuse to promote despite a strong germline genetic signal, because Phase 3 trials of the IL-12 / 23 pathway failed in RA even though they succeeded in adjacent autoimmune diseases.
Result: IL12B action_label = DEPRIORITIZED. Lead summary cites PP4 = 0.000 for failed MR colocalization and the failed RA Phase 3 history. R13 warnings caught narrative-only causal mentions on nested top-answer titles (the rule working as intended).
Status: PASS
Caveat: Structural substitute for the CETP / coronary artery disease canonical failed-translation case; no cardiovascular study is on the platform. The substitution is annotated, not hidden.

PTPN22 RA → DLBCL · NO case
Expected: Refuse to promote a strong RA / autoimmune-disease causal anchor into a different target disease (DLBCL); recognize it as contextual evidence only.
Result: final_adjudication.report_state = NO_PRIMARY_PROMOTED, no_primary_promoted = True, PTPN22 action_label = DEMOLISHED. Lead summary correctly states RA evidence does not transfer to DLBCL. Calibration anchors (CTLA4, HLA-DRB1, IL6R, CD40) correctly deprioritized.
Status: PASS
Caveat: Strong internal evidence of cross-disease guardrail behavior. Not external validation.

These benchmarks demonstrate that the report layer produces the correct verdict shape (YES / CAUTION / NO) on cases where the answer is known in advance. They do not constitute independent validation, and a larger known-answer set is still needed.

10 · Current Readiness and Limitations

A · Readiness matrix

Use case | Current status | Allowed framing | Not allowed
Internal testing | Ready | Solo internal use; running benchmarks; iterating on the contract layer. | Sharing raw outputs externally without manual review.
Curated expert feedback · friendly demo | Ready with caveats | Selected, rebuilt, manually-reviewed artifacts shown to a known expert with a written caveat sheet. | Sending arbitrary fresh outputs blind.
University / data-partner feedback outreach | Ready with caveats | Framed as a request for methodology feedback and validation, not as a discovery claim. | Pitching as a finished discovery engine, or implying external validation that does not exist.
External researcher beta | Not ready | none | Any unsupervised external researcher use, including self-serve account creation.
Public unsupervised use | Not ready | none | Public access, marketing as a discovery platform, anything that bypasses expert review.
Clinical · patient-facing use | No | none | Any clinical decision support, patient-facing recommendation, or treatment claim.

BStanding limitations

  • MedC82 is not a validated discovery engine — internal benchmarks are necessary but not sufficient.
  • MedC82 is not for clinical use.
  • R4 synthesis still tends to under-quote numeric evidence; telemetry measures this but does not fully solve it.
  • R13 is warning-only, not hard-blocking. A behavioral baseline (≥30 sessions) is needed before elevation to blocker is considered.
  • The benchmark set is small (3 cases). A larger known-answer set, ideally 10–20 cases drawn from independent sources, is still needed.
  • External expert review is still needed, especially for MR / coloc / eQTL interpretation and source attribution.
  • Dataset and source-link traceability should still be checked on any artifact before sending externally.
  • Older HTML reports may pre-date current fixes and should be rebuilt before sharing.
  • Stronger novelty pressure is still needed — better candidate generation, explicit negative-evidence weighting, deeper consumption of disease-state omics and perturbation atlases, and an explicit failed-translation scoring lane.
  • No CAD / cardiovascular study exists on the platform yet; the canonical CETP / CAD failed-translation benchmark was substituted with IL12B / RA.
  • Partner-facing artifacts must be selected and manually reviewed before external use.

CNot-claimed list

The following are explicitly outside what MedC82 currently claims:

  • discovery-engine claim
  • clinical-readiness claim
  • public-beta claim
  • no-review-needed claim
  • external-validation claim
  • treatment recommendation

11What Expert Review Should Focus On

This overview is intended to support a methodology-feedback conversation. The most valuable things a researcher, lab, or data / API team can help evaluate are:

  • Whether MedC82 interprets their evidence layer responsibly (especially the layer they know best).
  • Whether source attribution is correct on any artifact provided.
  • Whether direct evidence and contextual evidence are clearly separated in the report.
  • Whether MR, pQTL MR, eQTL, and colocalization claims are caveated correctly when the lane fails or is not computable.
  • Whether cross-disease transfer is handled conservatively (no silent promotion of source-disease evidence into a target-disease verdict).
  • Whether report traceability is sufficient — every claim should resolve to a source.
  • Whether the internal benchmark design is fair, and what additional known-answer cases should be added.
  • What evidence — connector, dataset, or method — would make the platform genuinely useful to researchers in their field.

AReproducibility checklist

The following can accompany any artifact shared for expert review:

  • session ID
  • main study ID and adjacent study IDs, if any
  • datapack path
  • raw transcript preserved
  • source map present
  • validated_export present
  • canonical_report present
  • export_blocked = false
  • export_gate.hard_block = false
  • raw checksum unchanged on rebuild
  • evidence-chain source links verified
  • source appendix verified
  • primary result matches validated_export
  • action label matches validated_export
  • validation and causal flags match validated_export
  • transcript appendix present
  • failed MR / coloc not used as support
  • single-cell / spatial evidence not labeled causal unless independently validated
  • no treatment recommendation
  • causal lane clearly stated
  • PP4 / MR result clearly interpreted when present
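The machine-checkable portion of this checklist can be sketched as a small pre-sharing gate. The key names follow the checklist items above (`export_blocked`, `export_gate.hard_block`, checksum comparison), but the flat manifest format is a hypothetical convenience, not a platform schema:

```python
# Illustrative pre-sharing gate over the automatable checklist items.
# Key names follow the checklist; the manifest shape is hypothetical.

REQUIRED_PRESENT = [
    "session_id", "datapack_path", "raw_transcript",
    "source_map", "validated_export", "canonical_report",
]

def artifact_failures(manifest: dict) -> list:
    """Return human-readable failures; an empty list means the artifact
    passes these automated checks (manual review is still required)."""
    failures = [f"missing: {key}" for key in REQUIRED_PRESENT
                if not manifest.get(key)]
    # Export gates must be explicitly clear, not merely absent.
    if manifest.get("export_blocked", True):
        failures.append("export_blocked must be false")
    if manifest.get("export_gate", {}).get("hard_block", True):
        failures.append("export_gate.hard_block must be false")
    # Raw checksum must be unchanged on rebuild.
    if manifest.get("raw_checksum") != manifest.get("rebuild_checksum"):
        failures.append("raw checksum changed on rebuild")
    return failures
```

Note that only the presence- and flag-style items lend themselves to automation; the content checks (source attribution, causal-lane framing, PP4 / MR interpretation, absence of treatment recommendations) remain manual review steps.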

BSharing verdict

Suitable for expert-feedback conversations when paired with selected, rebuilt, manually-reviewed example reports. Not suitable as a claim of independent validation, broad public availability, or clinical use.