Platform Overview · Biomedical Evidence Review

MedC82 Platform Overview

An open biomedical research council for evidence review, hypothesis stress-testing, and traceable scientific triage.

MedC82 combines public literature, structured biomedical databases, disease-specific evidence packs, and a six-model adversarial AI council. The platform is designed to help researchers separate validated evidence from speculative leads, identify missing evidence layers, compare cross-disease transfer plausibility, and generate falsifiable next-step experiments.

Subject: MedC82 platform · evidence · council · report
Audience: Researchers, labs, data/API partners
Document type: Platform overview · expert-facing
Status: Open · non-commercial
This overview is intended for expert feedback and methodology review. It is not medical advice or a treatment recommendation.

01 · Platform Summary

MedC82 is a disease-agnostic, study-scoped biomedical research platform. It builds structured evidence packs from literature, genetics, omics, single-cell, spatial, drug/pathway, clinical, and study-specific sources, then routes that pack through a six-model adversarial AI council that reviews the same evidence across four rounds. The final report is a canonical research triage surface — not raw model output.

The platform's current strengths are evidence review, hypothesis stress-testing, cross-disease caution, and falsifiable next-step framing. It may surface:

  • validated causal anchors
  • validate-first candidates
  • exploratory hypotheses
  • useful negative / triage results
  • STOP decisions
  • non-promoted follow-up ideas
The council does not discover truth. It tests research leads against available evidence, missing layers, adversarial objections, and validation gates. MedC82 outputs should be read as research evidence synthesis and triage, not clinical or treatment recommendations.

What MedC82 can support | Evidence needed | What MedC82 cannot claim yet
Exploratory disease-state hypothesis | Human disease-state data, outcome or cohort context where available, clear caveats, and a falsifiable next experiment | Causality, treatment response, or clinical action
Validated causal anchor | Mechanism-appropriate causal lane (for example, coding-variant MR, validated pQTL colocalization, or validated tissue eQTL colocalization) | General causal claims outside the validated mechanism lane
Patient-stratification or prognostic lead | Disease-matched cohort, spatial, molecular, or outcome-linked evidence with confounder-aware validation needs | Therapeutic mechanism or treatment recommendation by itself
Useful triage / no-primary result | Adversarial review showing candidates are weak, off-contract, or under-validated | That the biology is false
Follow-up idea | Specific in-run grounding plus a concrete next experiment | That the idea is canonical or validated
Treatment recommendation | Not supported by current MedC82 export surfaces | MedC82 reports should not recommend treatment combinations

02 · Why This Is Not Just Search

The platform is built for the gap between search and expert review: it does not simply retrieve papers, and it does not claim to prove discoveries. It organizes heterogeneous biomedical evidence into a structured pack, forces multiple AI models to argue over it, and then applies report-safety rules so the final output separates validation, caution, failed transfer, and exploratory ideas.

Layer | What it does | What it does not do
Search | Retrieves documents matching keywords / semantic similarity | Does not separate validated evidence from speculative leads; does not test cross-disease transfer; does not produce a verdict
MedC82 evidence pack | Bounded, study-scoped object with source provenance, evidence type, disease context, causal-lane tagging, and explicit missing-layer caveats | Does not eliminate the need for expert review; does not turn association into causality
MedC82 AI Council | Six adversarial roles debate the same bounded evidence across four rounds and produce a structured synthesis | Does not prove truth by consensus; does not override deterministic safety gates
MedC82 report layer | Separates validated anchors, validate-first candidates, deprioritized / demolished candidates, and exploratory hypotheses with traceability back to source | Does not replace expert review; raw transcript proposals are not the final answer
Expert review | Confirms interpretation, source attribution, causal-lane framing, and translational plausibility | MedC82 is designed to support this layer, not replace it

The value is not "AI answer generation." The value is structured evidence triage and adversarial stress-testing of research leads, with explicit no-primary and validate-first surfaces when evidence is not yet sufficient.

03 · Evidence and Data Layers

Evidence packs are study-scoped and connector-scoped. A connector can contribute causal evidence, association evidence, context evidence, reference evidence, safety evidence, or missing-layer caveats. The table below names the common sources and how each should be interpreted. "Where configured" indicates a connector that is integrated for some studies but not all.

Layer | Source / connector | Contributes | Evidence class | Cannot prove | Example platform use
Literature / paper corpus | PubMed, Europe PMC, PMC Open Access, OpenAlex, Semantic Scholar, DOI metadata, retrieved corpus sources | Published evidence, biological plausibility, prior findings, outcome context, mechanistic background, caveats, and source-linked support for council claims | Literature / context | Causality by itself unless the study design and validation lane support it | Traceable support, prior art, caveats, and outcome context
Clinical trials | ClinicalTrials.gov, WHO ICTRP, trial-outcome connectors | Trial history, phase, outcomes, completion status | Clinical context | Mechanism or efficacy by trial existence alone | Translational caveats, failed-translation evidence
GEO dataset discovery | geo_dataset_discovery | Public dataset availability, accession, assay type, matrix availability, sample metadata, linked paper metadata where exact | Discovery metadata | Expression effect, causality, or evidence quality by dataset existence alone | Identifies candidate datasets for future ingestion or validation
Single-cell / spatial datasets | CELLxGENE, study-specific single-cell connectors, curated GEO/index records (where configured) | Disease-state, tissue, cell-state, or compartment context | Context | Causality or treatment response by itself | Cell-state, spatial, or tissue-compartment hypothesis support
Disease-specific outcome literature | Outcome paper connectors or literature corpus | Article-level outcome-linked context | Outcome context | Raw matrix causal proof or treatment recommendation | Cohort/outcome plausibility and validation needs
Cancer genomics / clinical context | TCGA, GDC, cBioPortal where configured, study-specific connectors | Molecular subtype, mutation, copy-number, expression, clinical metadata | Molecular context | Causality without a matching validation lane | Cohort stratification or disease-state context
Proteomics / phosphoproteomics | CPTAC where configured, study-specific proteomics connectors | Protein and phosphoprotein context | Molecular context | Causality without matching causal layer | Pathway / protein-state context
GWAS Catalog | GWAS connector | Disease associations | Association | Causal mechanism by itself | Association priors and candidate context
Open Targets / L2G | Open Targets | Genetic association and likely-gene priors | Association | Final causality alone | Candidate ranking / context
MR / pQTL MR / colocalization | OpenGWAS / MR-style evidence, MR-coloc connectors | Protein-abundance genetic association and coloc validation attempt | Causal lane if coloc | Positive support when colocalization fails or is not computable | Mechanism-specific causal validation or caveat
MR-KG literature | MR literature connector | Published MR study references | Association | Replacement for in-pack coloc | Published MR context
eQTL Catalogue | eQTL connector | Expression / regulatory context | Molecular association | Causality without coloc | Input to tissue eQTL coloc
Tissue / brain eQTL coloc | Tissue eQTL coloc connector | Shared-variant expression mechanism test | Causal lane if PP4 passes | Coding-mechanism validation | Tissue or cell-type expression mechanism
Coding-variant MR | Curated coding-variant MR lane | Protein-altering variant causal lane | Causal | Protein-abundance or tissue-expression mechanism | Coding-variant mechanism validation
HPA | Human Protein Atlas | Tissue / atlas reference | Reference | Disease causality | Tissue expression / context
ChEMBL / DGIdb / OpenFDA | Drug, target, and adverse-event databases | Binding, tractability, drug-gene interaction, and adverse-event context | Druggability | Disease causal support | Target tractability and safety signal context
Reactome / SIGNOR / STRING | Pathway and interaction connectors | Pathway membership and interaction network context | Reference | Causality | Network / pathway context
Perturbation / dependency | LINCS, PRISM, DepMap, scPerturb where available | Cell-line drug / target / dependency context | Low-tier context | Human disease validation | Experimental feasibility and biological perturbation context
Safety / adverse events | FAERS | Post-marketing adverse-event signals | Safety context | Causal mechanism alone | Translational safety caveats
Imaging / neuroimaging | OpenNeuro, XNAT, NITRC where available | Disease-relevant imaging metadata and context | Context | Disease causality | Neuroimaging-specific stratification context
Missing or skipped layers | Connector absence or study-inapplicability | Missing-layer caveats | Missing / irrelevant | Should not be inferred as evidence | Explicit not-available caveats

04 · How the Evidence Pack Is Built

A · Study setup

Every study is scoped by study_id. Cross-study runs can include a main study plus explicitly requested adjacent studies. Cross-study evidence is combined at retrieval / council time with source_study_id provenance, not merged as if all evidence came from one disease.
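The cross-study merge rule above can be sketched as a small Python helper. Everything here is illustrative: `EvidenceRow` and `combine_cross_study` are hypothetical names, not the platform's real code; only the `study_id` / `source_study_id` field names come from the overview.

```python
from dataclasses import dataclass

@dataclass
class EvidenceRow:
    source_study_id: str  # per-row provenance: which study contributed this row
    text: str

def combine_cross_study(packs):
    """Merge evidence from a main study plus explicitly requested adjacent
    studies at retrieval time, keeping source_study_id on every row instead
    of flattening everything into one disease. `packs` maps study_id to a
    list of evidence strings."""
    return [
        EvidenceRow(source_study_id=sid, text=t)
        for sid, texts in packs.items()
        for t in texts
    ]
```

The point of the sketch is the invariant, not the data structure: evidence never loses its originating study, so adjacent-study rows can later be framed as transfer context rather than primary-study proof.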

B · Ingestion and normalization

MedC82 ingests literature, structured databases, single-cell and spatial datasets, omics connectors, genetics connectors, and drug / pathway connectors. Entities and source metadata are normalized and carried through the evidence pack and export system. Evidence is tagged at the row level as direct, contextual, caveat, missing-layer, causal-lane, or support.

C · Datapack construction

The datapack is the compact research object shown to the council. It includes:

  • evidence_text: human-readable structured evidence and retrieved literature
  • full_context_text: broader context supplied to the council
  • source citations and source maps
  • structured evidence rows with tags such as [EV:...] and [DB:...]
  • per-gene evidence profiles
  • explicit missing-layer caveats — for example, missing pQTL instrument or failed colocalization

The pack is assembled before council review. The raw council transcript is preserved as audit material, but it is never the canonical final answer.
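The datapack fields listed above can be pictured as a simple container. This is a minimal sketch, not the real class: the field names (`evidence_text`, `full_context_text`, missing-layer caveats, `[EV:...]` / `[DB:...]` row tags) come from the overview, while the `Datapack` class itself and the example values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Datapack:
    """Illustrative shape of the compact research object shown to the council."""
    evidence_text: str                      # human-readable structured evidence
    full_context_text: str                  # broader context supplied to the council
    source_citations: list = field(default_factory=list)
    evidence_rows: list = field(default_factory=list)       # e.g. "[EV:mr_coloc] ..." / "[DB:OpenTargets] ..."
    gene_profiles: dict = field(default_factory=dict)       # per-gene evidence profiles
    missing_layer_caveats: list = field(default_factory=list)

pack = Datapack(
    evidence_text="[EV:mr_coloc] PP4=0.12 for GENE_X plasma pQTL",
    full_context_text="(retrieved literature and structured context)",
    missing_layer_caveats=["missing pQTL instrument for GENE_Y"],
)
```

Note that missing-layer caveats travel inside the pack itself, so the council sees what is absent, not only what is present.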

D · Canonical export

The canonical export surface is built by backend/app/services/validated_export.py::build_validated_export. It creates validated_export, top_answer, finding_semantics, canonical_report, export_gate, and insight_assessment. The HTML export is built from frontend rendering code that checks export_gate.hard_block, validates the canonical_report, and uses backend canonical display fields for the report top answer.
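The display-side ordering described above (gate first, canonical report second, top answer last) can be mirrored in a few lines. This is a hedged sketch: the real logic lives in the frontend rendering code, and `render_top_answer` is a hypothetical function; only the key names (`export_gate.hard_block`, `canonical_report`, `top_answer`) come from the overview.

```python
def render_top_answer(export):
    """Sketch of the check order before HTML display, assuming `export`
    is the dict produced by build_validated_export."""
    # 1. The hard-block gate wins over everything else.
    if export.get("export_gate", {}).get("hard_block"):
        return "<p>Blocked: export_gate.hard_block is set.</p>"
    # 2. The canonical report and canonical display fields must exist.
    if not export.get("canonical_report") or not export.get("top_answer"):
        raise ValueError("canonical_report or top_answer missing")
    # 3. Only then does the canonical top answer reach the report surface.
    return f"<h1>{export['top_answer']}</h1>"
```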

Pipeline · ingestion to report · 4-round adversarial council

A. Data connectors: literature, genetics, omics, single-cell, drug/pathway
B. Evidence scores · graph enrichment: structured rows, tags, per-gene profiles
C. Evidence pack · datapack: compact research object, missing-layer caveats
D. Retrieval context: source map, citations, full context text

Council · 4 rounds
R1. Blind analysis: independent reads, no cross-talk
R2. Audit · discussion: cross-examination of claims
R3. Adversarial attack: demolish weak / overstrong claims
R4. Synthesis: no resurrection of demolished claims

Outputs
Canonical: validated_export · canonical_report (strict gate · HTML report)
Audit: raw transcript appendix (non-canonical · auditability only)

05 · The AI Council

The AI Council is the adversarial reasoning layer that sits between the structured evidence pack and the canonical report. It is not a vote-counting system and it does not prove truth by consensus. It asks multiple differently biased research roles to read the same bounded evidence, challenge one another, and produce a final synthesis that can still be demoted by deterministic export checks.

A · Model roles

The council roles are deliberately different. The names below describe the prompt roles in the live code; they should be read as research functions, not as guarantees that any one model is correct.

Council role | Main job | Useful for | Cannot do alone
Maverick Theorist | Propose non-obvious, testable hypotheses | Surface new framing from the evidence pack | Validate causality or treatment response
Evidence Auditor | Challenge evidence quality, replication, sample size, and citation support | Keep claims source-grounded and appropriately caveated | Discover every possible mechanism
Cross-Domain Connector | Look for transferable biology across diseases and fields | Generate cross-study hypotheses | Turn analogy into proof
Translational Specialist | Ask what would be needed for a real translational path | Identify safety, biomarker, and validation gaps | Recommend clinical treatment from exploratory evidence
Assumption Destroyer | Attack hidden assumptions and propose falsifiers | Prevent attractive but weak claims from surviving unchallenged | Prove a claim false without adequate evidence
Literature Specialist | Mine the provided corpus for buried signals and convergence | Find overlooked corpus patterns | Use unsupported memory or invented citations as evidence

B · Four-round workflow

Round | What happens | What the round is designed to catch
Round 1 · blind independent analysis | Each role reads the same research context and user query without seeing the other roles' answers. | Independent hypotheses, missed signals, and initial evidence interpretations
Round 2 · audit / discussion | Each role reads compressed Round 1 outputs and states agreement, disagreement, missed findings, and revisions. | Early convergence, contradictions, weak evidence, and claims that need refinement
Round 3 · adversarial attack | The Assumption Destroyer attacks converged claims; other roles must defend, concede, or revise. | Overclaiming, hidden assumptions, methodological flaws, unverified claims, and missing falsifiers
Round 4 · neutral synthesis | A neutral synthesizer reads the context and compressed Rounds 1-3. It is instructed not to add new claims, not to introduce uncited evidence, not to resurrect demolished claims, and to preserve dissent. | Final adjudication, demotion of weak ideas, unresolved splits, and safer framing
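The round structure above can be sketched as a loop over role prompts. This is a deliberately simplified sketch, not the council engine: `run_council` and the prompt strings are hypothetical, `ask(role, prompt)` stands in for a model call, and compression, prompt templates, and post-synthesis safety gates are all omitted.

```python
def run_council(context, roles, ask):
    """Four-round flow sketch. `ask(role, prompt)` invokes one model in
    that role and returns its text."""
    # Round 1: blind, no cross-talk between roles.
    r1 = {role: ask(role, context) for role in roles}
    digest = " | ".join(r1.values())
    # Round 2: each role sees a compressed digest of Round 1.
    r2 = {role: ask(role, context + "\nRound 1 digest: " + digest) for role in roles}
    # Round 3: the Assumption Destroyer attacks converged claims.
    r3 = ask("Assumption Destroyer", "Attack converged claims: " + " | ".join(r2.values()))
    # Round 4: neutral synthesis, no new claims, no resurrection.
    r4 = ask("Synthesizer", "Synthesize; no new claims, no resurrection: " + r3)
    return {"round1": r1, "round2": r2, "round3": r3, "synthesis": r4}
```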

C · Generic claim lifecycle

Stage | Generic lifecycle
Raw proposal | A model raises a hypothesis from the evidence pack.
Evidence support | Other roles identify which sources or structured rows support it.
Adversarial caveat | Round 3 tests whether the claim is causal, overbroad, off-contract, or missing decisive evidence.
Canonical classification | Round 4 plus export logic classifies it as validated, exploratory, no-primary / triage, STOP, unresolved, or follow-up only.
HTML display | validated_export and canonical_report supply the top report surface, while the raw transcript remains in the appendix.
Why the transcript can look stronger than the final answer. Raw model outputs are preserved so researchers can audit the debate. That means the appendix may contain early or overstrong claims that were later challenged. Round 3 and Round 4 can force a claim to be narrowed, demoted, or rejected, and the canonical export can further block unsafe wording before HTML display. The practical rule is: the top report is the canonical answer; the appendix is the non-canonical audit trail.

06 · MR, eQTL, Colocalization & Causal Lanes

Plain-language summary

Mendelian randomization (MR) asks whether genetic variation associated with an exposure is also associated with disease. pQTL MR uses genetic instruments for protein abundance. Colocalization asks whether the exposure association and the disease association likely share the same causal variant.

MR without colocalization can be misleading because linkage disequilibrium or horizontal pleiotropy can make two nearby signals look connected. In MedC82, failed or unavailable pQTL MR colocalization is a caveat only. It is never positive support.

A · What colocalization actually measures

Colocalization estimates the posterior probability that both traits share one causal variant. In the ABF model, PP4 is the posterior probability for a shared causal variant.

PP4 range | MedC82 interpretation | Promotion consequence
PP4 > 0.8 (validated) | Validated colocalization support | Can support a causal lane if mechanism and disease context match
0.5 < PP4 ≤ 0.8 (suggestive) | Suggestive / unresolved | Usually validate-first or exploratory, not definitive
PP4 < 0.5, when ABF colocalization actually ran (computed negative) | Computed negative colocalization for that pQTL-MR lane | Caveat only; do not use as positive support. Not the same as not-computable cases.
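The thresholds above reduce to a small classifier. This is a sketch that mirrors the table, not the platform's coloc code: `classify_pp4` is a hypothetical name, and the `computed` flag stands in for "ABF colocalization actually ran".

```python
def classify_pp4(pp4, computed):
    """Map a PP4 posterior to the interpretation buckets in the table above."""
    if not computed or pp4 is None:
        return "not_computable"           # absence of validation, never negative evidence
    if pp4 > 0.8:
        return "validated_coloc"          # can support a causal lane if context matches
    if pp4 > 0.5:
        return "suggestive"               # validate-first / exploratory, not definitive
    return "computed_negative_coloc"      # caveat only, never positive support
```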

B · How to interpret zero pQTL-coloc validations

A MedC82 run may find MR or pQTL association context but zero pQTL-colocalization validations. This means the pQTL-MR causal lane did not validate a protein-abundance mechanism in that run. It is not proof that every biological hypothesis is false.

Historical summaries may collapse multiple outcomes into failed or PP4=0.000, including true computed-negative ABF colocalization, no suitable pQTL top hit, regional summary-stat fetch failure, too few overlapping SNPs, missing or insufficient disease GWAS / locus data, or QC / input failure.

Status | Meaning | How MedC82 should interpret it
validated_coloc | ABF colocalization ran and PP4 passed threshold | Possible causal support if mechanism, disease context, and source traceability match
computed_negative_coloc | ABF colocalization ran and PP4 stayed below threshold | This pQTL-MR causal lane did not support the mechanism
not_computable_no_pqtl_tophit | No suitable pQTL instrument / top hit | Absence of validation, not biological refutation
not_computable_low_snp_overlap | Insufficient overlapping SNPs | Absence of validation, not biological refutation
not_computable_regional_fetch_failed | Required regional summary statistics unavailable or fetch failed | Absence of validation
not_computable_missing_gwas | Disease / locus GWAS unavailable or insufficient | Absence of validation
qc_error | Input or QC problem | Cannot interpret as negative evidence
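The status table above is essentially a lookup that separates "computed negative" from "not computable". A minimal sketch (the function name and return strings are illustrative; the status codes are the ones listed above):

```python
NOT_COMPUTABLE = {
    "not_computable_no_pqtl_tophit",
    "not_computable_low_snp_overlap",
    "not_computable_regional_fetch_failed",
    "not_computable_missing_gwas",
}

def interpret_coloc_status(status):
    """Collapse the coloc status codes to the only readings the report
    layer should allow. Note that only computed_negative_coloc counts
    against the mechanism; not-computable cases never do."""
    if status == "validated_coloc":
        return "possible causal support"
    if status == "computed_negative_coloc":
        return "lane did not support the mechanism"
    if status in NOT_COMPUTABLE:
        return "absence of validation, not refutation"
    return "not interpretable as evidence"  # e.g. qc_error
```

This is why a historical summary that collapses everything into "failed or PP4=0.000" loses information the report layer needs.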

C · Mechanism-specific causal lanes

Mechanism type | Correct causal test | Example | What failure means
Protein abundance mechanism | pQTL MR plus colocalization | Plasma / protein abundance associated with disease | No pQTL-coloc validation means the protein-abundance lane did not validate; not-computable cases are absence of validation, not biological refutation
Expression / regulatory mechanism | Tissue / cell eQTL colocalization | Disease-relevant tissue expression coloc | Failed eQTL coloc does not refute coding mechanism
Coding / protein-altering mechanism | Coding-variant MR | Protein-altering variant instrument | Failed pQTL / eQTL should not penalize the coding lane if the coding variant itself is the instrument
Disease-state cell context | Single-cell / spatial evidence | Disease-relevant cell state or compartment | Context, not causal validation
Treatment-response context | Outcome cohort / prospective validation | Outcome-linked cohort evidence | Outcome-linked plausibility, not causality unless controlled

Positive colocalization or coding-variant MR can support a validated causal anchor when mechanism, disease context, and source traceability match. Failed pQTL MR becomes a caveat. MR-only without colocalization does not promote a causal claim.

07 · Signal Triage and Report Outputs

After the council completes its four rounds, the report layer classifies each candidate using a labeled action state. These labels are deterministic and gate what the final report surface can say.

A · Report state and action labels

Label | Meaning | Required to apply
VALIDATED_TARGET (positive report state) | Top-level report state when at least one finding qualifies as a validated anchor for the target disease. | Canonical biology, target-disease support, at least one positive validation lane, no disqualifying action_label.
VALIDATED_ANCHOR (positive finding state) | Per-finding label for a known validated anchor where mechanism-appropriate evidence supports it. | Same predicate as VALIDATED_TARGET but applied at the finding level; never overrides an explicit blocked action_label.
VALIDATE_FIRST (caution) | Potentially interesting, but decisive validation is missing in the current pack. | Evidence is suggestive; council does not endorse promotion until specific validation lanes are met.
DEPRIORITIZED (caution) | Useful enough to record, but not strong enough to lead. | Adversarial review reduces the candidate's standing without rejecting it outright.
DEMOLISHED (negative) | Adversarial review rejected it in the current evidence context. | Round 3 attack survived; export layer must not present it positively.
BLOCKED (negative) | Compatibility validator blocked the candidate from primary promotion. | Used for wrong-disease evidence, contextual-only support, or off-contract candidates.
NO_PRIMARY_PROMOTED (triage) | Broad derived state: every primary candidate is effectively blocked. | Phase 5d derived gate; valid result, not a failure.
NO_PRIMARY_SURVIVED (triage) | Strict compatibility-validator escalation; no primary candidate survived gate enforcement. | Equivalent to a deliberate "we cannot promote anything from this pack" verdict.
CONTEXTUAL_ONLY (context) | Evidence is present but cannot be treated as direct target-disease validation. | Common for cross-disease anchors lifted into a different target disease.
EXPLORATORY (hypothesis) | Hypothesis-grade research lead. | Useful for research design; not validation; not actionable.
CALIBRATION_ANCHOR (reference) | Pre-curated canonical anchor used for calibration. | Not promoted as a novel discovery; serves as a known reference.
HYPOTHESIS (hypothesis) | Generic hypothesis label for non-anchor proposals. | Requires next-step framing; not validation.
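The conservative predicate behind the positive labels can be sketched directly from the table. Everything below is illustrative, not the real schema: `can_label_validated_anchor` and the field names (`canonical_biology`, `target_disease_support`, `positive_validation_lanes`) are hypothetical; only the label names and the "never overrides a blocking action_label" rule come from the overview.

```python
BLOCKING_LABELS = {"BLOCKED", "DEMOLISHED"}

def can_label_validated_anchor(finding):
    """All four conditions must hold: canonical biology, target-disease
    support, at least one positive validation lane, and no disqualifying
    action_label already set on the finding."""
    return (
        bool(finding.get("canonical_biology"))
        and bool(finding.get("target_disease_support"))
        and len(finding.get("positive_validation_lanes", [])) >= 1
        and finding.get("action_label") not in BLOCKING_LABELS
    )
```

The asymmetry is the point: the predicate can only add a positive label when nothing negative is present; it can never remove a BLOCKED or DEMOLISHED verdict.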

B · Final report surfaces

Surface | How to read it
Top answer / final adjudication | The canonical researcher-facing interpretation. It should match validated_export and canonical_report.
Evidence ledger / decision surface | Candidate-level buckets, caveats, evidence chains, and validation status. A mixed section is not automatically a validated section.
Evidence chains | Source-linked reasoning steps. Links should resolve by exact source identity or fall back to dataset / source-label-only.
Direct vs contextual evidence | Direct evidence supports the target-disease claim; contextual evidence supports plausibility or stratification but cannot validate causality alone.
Numeric evidence | When the council names MR / coloc / eQTL / GWAS / L2G / pQTL / causal evidence, numeric support (p, PP4, OR, β, CI, L2G, n, scientific notation) is expected. The R13 warning fires when such language appears without numeric backing.
Council transcript | Non-canonical audit trail. It can contain rejected, overstrong, or early-round proposals.
Dissent | Preserved for auditability so reviewers can see where models disagreed.
Source links | Source-linked citations for each claim; reviewers should verify exact source identity before treating as proof.

C · Render guards

The export layer applies these deterministic guards before HTML display:

  • Failed coloc cannot render as validated support.
  • Failed pQTL MR cannot render as validation.
  • MR without coloc cannot render as definitive causal proof.
  • Single-cell / spatial descriptive evidence cannot render as causal validation.
  • Literature-only outcome context cannot render as causal validation.
  • Adjacent-study evidence cannot render as primary-study proof without explicit provenance and framing.
  • Druggability / pathway membership cannot render as validation.
  • Model consensus cannot render as evidence.
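Because the guards are deterministic, they can be modeled as a denylist of (evidence kind, rendering) pairs. This is a sketch under stated assumptions: the pair names and `passes_render_guards` are illustrative shorthand for a few of the rules above, not the export layer's real representation.

```python
FORBIDDEN_RENDERINGS = {
    ("failed_coloc", "validated_support"),
    ("failed_pqtl_mr", "validation"),
    ("mr_without_coloc", "definitive_causal_proof"),
    ("single_cell_descriptive", "causal_validation"),
    ("literature_only_outcome", "causal_validation"),
    ("model_consensus", "evidence"),
}

def passes_render_guards(evidence_kind, rendered_as):
    """A (kind, rendering) pair on the denylist can never reach HTML."""
    return (evidence_kind, rendered_as) not in FORBIDDEN_RENDERINGS
```

A denylist keeps the guards auditable: adding a new rule is a one-line diff, and no model output can argue its way around set membership.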

08 · Safety, Guardrails, and Traceability

MedC82 enforces a small set of report contracts that protect against the most common ways AI-assisted research outputs can mislead. These are deterministic safety gates applied after the council finishes its rounds but before HTML export.

A · Recent trust-fix history

The platform has gone through several rounds of targeted trust fixes. The current state reflects the cumulative effect of those fixes; the suite was last run at 257 / 257 passing.

Fix | What it adds | Why it matters
7 regression gaps | 15 backend regression tests added for historical failure modes (Round 1 ideas shown as final findings; anchor count inconsistency; stale hotsheet vs canonical_report; calibration anchors promoted as primary; front_summary promoting a demolished gene; wrong-disease evidence used as target-disease validation; Round 1 → Round 4 demotion not reflected in final output). | Existing safety gates already caught these cases; we now have automated proof and a tripwire for future regressions.
Phase 5f · evidence citation telemetry | New evidence_citation_telemetry module counts [EV:] tags, numeric mentions, and structured source-type mentions per round and per model role. | Turns "the council under-quotes numerics" from a suspicion into a measurable per-session signal.
R13 · numeric-evidence warning | R13_NUMERIC_TIER1_EVIDENCE_REQUIRED emits a warning when MR / coloc / eQTL / GWAS / L2G / pQTL / causal-evidence language is used without numeric support in the same field or linked evidence detail. | Surfaces narrative-only causal claims for researcher review without hard-blocking legitimate exports. Warning-only severity.
Phase 5g · positive validated-target state | Added VALIDATED_TARGET / VALIDATED_ANCHOR labels gated by a conservative predicate (canonical biology + target-disease support + at least one positive validation lane + no disqualifying action_label). | Lets the platform say YES cleanly when evidence genuinely supports it, without weakening no-primary, validate-first, or demolished protections.
Phase 5g · R4 JSON robustness | When the R4 structured-verdict JSON is missing but canonical_report carries findings + lead_summary + verdict, degrade hard_block → review_required / warning. | Stops legitimate audit-mode sessions from being killed by an R4 JSON parse hiccup when the verdict is otherwise present.
Phase 5f-mini · bare-space numeric formats | R13 numeric detector recognizes bare-space evidence formats (L2G 0.92, PP4 0.989, p 5e-8, OR 1.25, CI 1.1–1.4, n 1200) without weakening the rule. | Eliminates noisy false positives that were undermining R13's signal-to-noise ratio.
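The R13 idea (causal-evidence language without numeric backing) can be sketched with two regexes, including the bare-space formats from the table. This is a toy approximation, not the real detector in report_contract_validator.py: the patterns, `r13_warning`, and the token lists are illustrative only.

```python
import re

# Causal-evidence language that should come with numbers attached.
CAUSAL_LANGUAGE = re.compile(r"\b(MR|coloc\w*|eQTL|GWAS|L2G|pQTL|causal)\b", re.IGNORECASE)

# Numeric backing, including bare-space formats such as "PP4 0.989",
# "p 5e-8", "OR 1.25", "L2G 0.92", "n 1200" as well as "PP4=0.989".
NUMERIC_EVIDENCE = re.compile(r"\b(PP4|p|OR|CI|L2G|n)\s*[=:]?\s*\d[\d.,eE+-]*")

def r13_warning(text):
    """True when causal-evidence language appears with no numeric support
    anywhere in the text. Warning-only: callers surface it, never block."""
    return bool(CAUSAL_LANGUAGE.search(text)) and not NUMERIC_EVIDENCE.search(text)
```

Recognizing bare-space formats in the numeric pattern is what keeps legitimate text like "PP4 0.989" from triggering a false warning.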

B · Why this matters for trust

  • Source traceability — citation paths must resolve to exact source identity or fall back to dataset-only.
  • No-primary protections — the broad derived gate prevents primary promotion when every candidate is effectively blocked.
  • Wrong-disease evidence protections — cross-disease anchors require explicit transfer qualifiers and cannot render as target-disease validation.
  • Calibration-anchor protections — pre-curated canonical anchors do not silently surface as novel discoveries.
  • Numeric-evidence warnings — R13 surfaces narrative-only causal claims so reviewers can check whether numeric support is present.
  • Report state consistency — top-level report_state and per-finding action_label must agree across hotsheet, front_summary, and canonical_report panels.
  • Positive validated-target state — added without weakening no-primary, validate-first, or demolished protections; the predicate is conservative by design and never overrides an explicit blocked action_label.

C · Code paths inspected (appendix)

For technical reviewers who want to verify the safety gates against the codebase. The main flow of this overview is understandable without reading these files.

Area | File / function | What it owns
pQTL MR coloc | backend/app/connectors/mr_coloc_validator.py | OpenGWAS-style source, pQTL / GWAS regional stats, ABF coloc, PP4 thresholds, mr_coloc storage
Coloc lifecycle | backend/app/services/coloc_lifecycle.py | Coloc write / version / staleness contract
Tissue eQTL coloc | backend/app/connectors/tissue_eqtl_coloc_connector.py | Tissue / cell eQTL coloc lane and PP4 thresholds
Coding-variant MR | backend/app/services/coding_variant_mr.py | Curated coding-variant lane
Evidence tiers | backend/app/services/evidence_tiers.py | Source-tier metadata and failed-coloc not-positive-signal logic
Council engine | backend/app/services/council_engine.py | Four-round council flow and post-synthesis surface handling
Council prompts | backend/app/services/council_prompts.py | Adversarial and synthesis instructions
Validated export | backend/app/services/validated_export.py::build_validated_export | Canonical export generation and strict gate
Export gate | backend/app/services/export_gate.py | Hard-block gate semantics
Report contract | backend/app/services/report_contract_validator.py | R1–R13 contract rules including the numeric-evidence warning
Telemetry | backend/app/services/evidence_citation_telemetry.py | Per-round, per-model evidence-citation counts
HTML export | frontend/src/components/council/CouncilResponse.tsx::buildFinalHtmlPure | Canonical report and transcript appendix rendering

09 · Internal Benchmark Checks

Three internal benchmark sessions were run to test whether the platform produces the correct verdict shape on known-answer cases. These are internal behavior checks, not external validation.

IL6R / rheumatoid arthritis · YES case
Expected: Recognize as a clinically-validated target with approved drug-class context.
Result: After Phase 5g rebuild: report_state = VALIDATED_TARGET, IL6R action_label = VALIDATED_ANCHOR, export unblocked, lead summary correctly references tocilizumab / sarilumab clinical context. Caveat about unresolved genetic causality preserved.
Status: PASS
Caveat: Internal benchmark only. First run exposed a packaging issue (no positive verdict label); later fixed in Phase 5g.

IL12B / rheumatoid arthritis · CAUTION case (substituted for CETP / CAD)
Expected: Refuse to promote despite a strong germline genetic signal, because Phase 3 trials of the IL-12 / 23 pathway failed in RA even though they succeeded in adjacent autoimmune diseases.
Result: IL12B action_label = DEPRIORITIZED. Lead summary cites PP4 = 0.000 for failed MR colocalization and the failed RA Phase 3 history. R13 warnings caught narrative-only causal mentions on nested top-answer titles (the rule working as intended).
Status: PASS
Caveat: Structural substitute for the CETP / coronary artery disease canonical failed-translation case; no cardiovascular study is on the platform. The substitution is annotated, not hidden.

PTPN22 RA → DLBCL · NO case
Expected: Refuse to promote a strong RA / autoimmune-disease causal anchor into a different target disease (DLBCL); recognize it as contextual evidence only.
Result: final_adjudication.report_state = NO_PRIMARY_PROMOTED, no_primary_promoted = True, PTPN22 action_label = DEMOLISHED. Lead summary correctly states RA evidence does not transfer to DLBCL. Calibration anchors (CTLA4, HLA-DRB1, IL6R, CD40) correctly deprioritized.
Status: PASS
Caveat: Strong internal evidence of cross-disease guardrail behavior. Not external validation.

These benchmarks demonstrate that the report layer produces the correct verdict shape (YES / CAUTION / NO) on cases where the answer is known in advance. They do not constitute independent validation, and a larger known-answer set is still needed.

10 · Current Readiness and Limitations

A · Readiness matrix

Use case | Current status | Allowed framing | Not allowed
Internal testing | Ready | Solo internal use; running benchmarks; iterating on the contract layer. | Sharing raw outputs externally without manual review.
Curated expert feedback · friendly demo | Ready with caveats | Selected, rebuilt, manually-reviewed artifacts shown to a known expert with a written caveat sheet. | Sending arbitrary fresh outputs blind.
University / data-partner feedback outreach | Ready with caveats | Framed as a request for methodology feedback and validation, not as a discovery claim. | Pitching as a finished discovery engine, or implying external validation that does not exist.
External researcher beta | Not ready | none | Any unsupervised external researcher use, including self-serve account creation.
Public unsupervised use | Not ready | none | Public access, marketing as a discovery platform, anything that bypasses expert review.
Clinical · patient-facing use | No | none | Any clinical decision support, patient-facing recommendation, or treatment claim.

BStanding limitations

  • MedC82 is not a validated discovery engine — internal benchmarks are necessary but not sufficient.
  • MedC82 is not for clinical use.
  • R4 synthesis still tends to under-quote numeric evidence; telemetry measures this but does not fully solve it.
  • R13 is warning-only, not hard-blocking. A behavioral baseline (≥30 sessions) is needed before elevation to blocker is considered.
  • The benchmark set is small (3 cases). A larger known-answer set, ideally 10–20 cases drawn from independent sources, is still needed.
  • External expert review is still needed, especially for MR / coloc / eQTL interpretation and source attribution.
  • Dataset and source-link traceability should still be checked on any artifact before sending externally.
  • Older HTML reports may pre-date current fixes and should be rebuilt before sharing.
  • Stronger novelty pressure is still needed — better candidate generation, explicit negative-evidence weighting, deeper consumption of disease-state omics and perturbation atlases, and an explicit failed-translation scoring lane.
  • No CAD / cardiovascular study exists on the platform yet; the canonical CETP / CAD failed-translation benchmark was substituted with IL12B / RA.
  • Partner-facing artifacts must be selected and manually reviewed before external use.

CNot-claimed list

The following are explicitly outside what MedC82 currently claims:

  • discovery-engine claim
  • clinical-readiness claim
  • public-beta claim
  • no-review-needed claim
  • external-validation claim
  • treatment recommendation

11What Expert Review Should Focus On

This overview is intended to support a methodology-feedback conversation. The most valuable things a researcher, lab, or data / API team can help evaluate are:

  • Whether MedC82 interprets their evidence layer responsibly (especially the layer they know best).
  • Whether source attribution is correct on any artifact provided.
  • Whether direct evidence and contextual evidence are clearly separated in the report.
  • Whether MR, pQTL MR, eQTL, and colocalization claims are caveated correctly when the lane fails or is not computable.
  • Whether cross-disease transfer is handled conservatively (no silent promotion of source-disease evidence into a target-disease verdict).
  • Whether report traceability is sufficient — every claim should resolve to a source.
  • Whether the internal benchmark design is fair, and what additional known-answer cases should be added.
  • What evidence — connector, dataset, or method — would make the platform genuinely useful to researchers in their field.

AReproducibility checklist

The following can accompany any artifact shared for expert review:

  • session ID
  • main study ID and adjacent study IDs, if any
  • datapack path
  • raw transcript preserved
  • source map present
  • validated_export present
  • canonical_report present
  • export_blocked = false
  • export_gate.hard_block = false
  • raw checksum unchanged on rebuild
  • evidence-chain source links verified
  • source appendix verified
  • primary result matches validated_export
  • action label matches validated_export
  • validation and causal flags match validated_export
  • transcript appendix present
  • failed MR / coloc not used as support
  • single-cell / spatial evidence not labeled causal unless independently validated
  • no treatment recommendation
  • causal lane clearly stated
  • PP4 / MR result clearly interpreted when present
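The machine-checkable portion of this checklist can be sketched as a small pre-sharing gate. The key names follow the checklist items above (`export_blocked`, `export_gate.hard_block`, checksum comparison), but the flat manifest format is a hypothetical convenience, not a platform schema:

```python
# Illustrative pre-sharing gate over the automatable checklist items.
# Key names follow the checklist; the manifest shape is hypothetical.

REQUIRED_PRESENT = [
    "session_id", "datapack_path", "raw_transcript",
    "source_map", "validated_export", "canonical_report",
]

def artifact_failures(manifest: dict) -> list:
    """Return human-readable failures; an empty list means the artifact
    passes these automated checks (manual review is still required)."""
    failures = [f"missing: {key}" for key in REQUIRED_PRESENT
                if not manifest.get(key)]
    # Export gates must be explicitly clear, not merely absent.
    if manifest.get("export_blocked", True):
        failures.append("export_blocked must be false")
    if manifest.get("export_gate", {}).get("hard_block", True):
        failures.append("export_gate.hard_block must be false")
    # Raw checksum must be unchanged on rebuild.
    if manifest.get("raw_checksum") != manifest.get("rebuild_checksum"):
        failures.append("raw checksum changed on rebuild")
    return failures
```

Note that only the presence- and flag-style items lend themselves to automation; the content checks (source attribution, causal-lane framing, PP4 / MR interpretation, absence of treatment recommendations) remain manual review steps.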

BSharing verdict

Suitable for expert-feedback conversations when paired with selected, rebuilt, manually-reviewed example reports. Not suitable as a claim of independent validation, broad public availability, or clinical use.