Merged
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,22 @@

All notable changes to this project are documented here. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.4.0] — 2026-04-24

### Changed
- **`reactome_cypher_schema` and `reactome://graph/schema` now return rich APOC-level data.** The previous implementation used sparse built-in procedures (`db.labels()`, `db.relationshipTypes()`, `db.schema.nodeTypeProperties()`) and returned only labels, relationship types, and per-label property keys with their types. This release calls `apoc.meta.schema()`, `apoc.meta.stats()`, `apoc.meta.{node,rel}TypeProperties()`, `db.indexes()`, `db.constraints()`, and `dbms.components()`, so clients see **per-label node counts**, **relationship cardinalities**, **property types with mandatory flags**, indexes, and constraints. The markdown digest grows from ~40 KB (sparse) to ~80 KB (rich).
- The schema is fetched lazily and cached in memory for the rest of the session. Concurrent first callers share a single round-trip via promise deduplication.

### Added
- **Startup schema prefetch.** `main()` fires `fetchGraphSchema()` in the background once the MCP is listening, so the first `reactome_cypher_schema` call doesn't wait 15–30 s on `apoc.meta.schema()` (that procedure samples 3M nodes on Reactome). Failures are logged; the cache stays empty and the next tool call retries on demand.
- 7 new tests: markdown format coverage (4) + cache behavior (caching, concurrent dedup, optional-call fallback).

### Removed
- The sparse built-in schema path (`db.labels()`, `db.relationshipTypes()`, `db.schema.nodeTypeProperties()`). There is no fallback: APOC is now required for `reactome_cypher_schema` and the `reactome://graph/schema` resource. This is fine for the `reactome_neo4j_env` Docker image (APOC is always present); other deployments must load APOC for schema tooling to work.

### Notes
- **No vendored schema artifact.** The MCP server fetches the schema live from the connected database, so no coordination with the `reactome_neo4j_env` release cadence is required.

## [1.3.1] — 2026-04-21

### Added
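The lazy, deduplicated fetch described in this changelog follows a common single-flight memoization pattern. Below is a minimal, generic sketch of it; `singleFlight` is a hypothetical helper, not code from this PR. It shows the three behaviors the entries name: a successful result is cached, concurrent first callers share one in-flight promise, and a failed load is not cached, so the next call (for example, the first tool call after a failed startup prefetch) retries.

```typescript
// Hypothetical helper (not from this PR): memoize an async loader so that
// concurrent callers share one in-flight promise, a success is cached, and
// a failure is not cached (the next call retries).
function singleFlight<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: { value: T } | null = null; // set only after a successful load
  let pending: Promise<T> | null = null; // shared by concurrent first callers

  return () => {
    if (cached) return Promise.resolve(cached.value);
    if (pending) return pending;
    const p = load().then(
      (value) => {
        cached = { value };
        pending = null;
        return value;
      },
      (err: unknown) => {
        pending = null; // forget the failed attempt so a later call retries
        throw err;
      },
    );
    pending = p;
    return p;
  };
}
```

Wrapping the value in `{ value }` lets falsy results be cached too; a loader that always returns an object, like a schema fetch, can get away with a plain `null` check instead.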
2 changes: 1 addition & 1 deletion README.md
@@ -198,7 +198,7 @@ Only registered when `NEO4J_URI` is set. Designed for curators running the [`rea
| Tool | Description |
|------|-------------|
| `reactome_cypher_query` | Run a Cypher query with optional parameters; row count, per-row size, and total response size are all capped; a server-side timeout terminates runaway queries |
-| `reactome_cypher_schema` | Introspect labels, relationship types, and per-label property keys |
+| `reactome_cypher_schema` | Live APOC introspection: labels with node counts, relationship cardinalities, per-label and per-rel property types (with mandatory flags), indexes, constraints. Cached for the session after first call; pre-warmed at MCP startup. |
| `reactome_cypher_sample` | Return a small sample of nodes for a given label |

**Read-only posture — what it is and isn't.** Sessions run in Neo4j READ mode, which rejects native write clauses (`CREATE`, `MERGE`, `DELETE`, `SET`, `REMOVE`). On top of that, `reactome_cypher_query` rejects APOC procedures that can write or reach outside the graph through back-channels (`apoc.cypher.runWrite` / `apoc.cypher.doIt`, `apoc.periodic.*`, `apoc.create/merge/refactor.*`, `apoc.load/import/export.*`, `apoc.trigger.*`, `apoc.nodes.delete`). Treat this as a guardrail against accidental mutation, not a security boundary — a real trust boundary should live at the Neo4j RBAC / plugin configuration layer, or by pointing at a read-only replica.
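The README lists the blocked APOC procedure families but does not show how the check is done. Below is a minimal sketch under the assumption that the guard is a regex match over the raw query text; `BLOCKED_APOC` and `findBlockedProcedure` are hypothetical names, not the project's code.

```typescript
// Hypothetical denylist mirroring the README's list; not the project's implementation.
// Case-insensitive so that it errs on the side of blocking.
const BLOCKED_APOC: RegExp[] = [
  /\bapoc\.cypher\.(runWrite|doIt)\b/i,
  /\bapoc\.(periodic|create|merge|refactor|load|import|export|trigger)\.\w+/i,
  /\bapoc\.nodes\.delete\b/i,
];

// Returns the first blocked procedure name found in the query, or null if none.
function findBlockedProcedure(cypher: string): string | null {
  for (const re of BLOCKED_APOC) {
    const m = re.exec(cypher);
    if (m) return m[0];
  }
  return null;
}
```

Plain text matching is easy to reason about but can miss unusual spellings of a procedure name, which fits the README's framing of this check as a guardrail rather than a security boundary.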
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
  "name": "reactome-mcp",
-  "version": "1.3.1",
+  "version": "1.4.0",
  "description": "MCP server for Reactome pathway database - analysis, search, and exploration tools",
  "type": "module",
  "main": "dist/index.js",
32 changes: 2 additions & 30 deletions src/clients/neo4j.ts
@@ -107,33 +107,5 @@ export async function runRead<T = Record<string, unknown>>(
}
}

-export interface GraphSchema {
-  labels: string[];
-  relationshipTypes: string[];
-  propertiesByLabel: Record<string, { name: string; types: string[] }[]>;
-}
-
-export async function fetchGraphSchema(): Promise<GraphSchema> {
-  interface LabelRow { label: string }
-  interface RelRow { relationshipType: string }
-  interface PropRow { nodeType: string; propertyName: string; propertyTypes: string[] | null }
-
-  const [labelRows, relRows, propRows] = await Promise.all([
-    runRead<LabelRow>("CALL db.labels() YIELD label RETURN label ORDER BY label"),
-    runRead<RelRow>("CALL db.relationshipTypes() YIELD relationshipType RETURN relationshipType ORDER BY relationshipType"),
-    runRead<PropRow>("CALL db.schema.nodeTypeProperties() YIELD nodeType, propertyName, propertyTypes RETURN nodeType, propertyName, propertyTypes"),
-  ]);
-
-  const propertiesByLabel: Record<string, { name: string; types: string[] }[]> = {};
-  for (const p of propRows) {
-    const entry = propertiesByLabel[p.nodeType] ?? [];
-    entry.push({ name: p.propertyName, types: p.propertyTypes ?? [] });
-    propertiesByLabel[p.nodeType] = entry;
-  }
-
-  return {
-    labels: labelRows.map((l) => l.label),
-    relationshipTypes: relRows.map((r) => r.relationshipType),
-    propertiesByLabel,
-  };
-}
+// Graph-schema access lives in src/graph/schema.ts — split out so tests
+// can mock runRead across the module boundary.
109 changes: 109 additions & 0 deletions src/graph/format-schema.ts
@@ -0,0 +1,109 @@
import type { GraphSchema } from "./schema.js";

/**
 * Render a GraphSchema (as produced by fetchGraphSchema) as a compact
 * markdown summary suitable for direct LLM consumption. The raw APOC
 * payload is ~500 KB — much too large to return whole. This digest keeps
 * the signal (labels with counts, relationship cardinalities, property
 * types with mandatory flags, indexes, constraints) and drops the
 * verbose apoc.meta.schema() object. Clients that need the full
 * structure can read the `reactome://graph/schema` resource.
 */
export function formatGraphSchemaMarkdown(schema: GraphSchema): string {
  const { stats, nodeTypeProperties, relTypeProperties, indexes, constraints } = schema;

  const labelEntries = Object.entries(stats.labels ?? {}).sort(([, a], [, b]) => b - a);
  const relEntries = Object.entries(stats.relTypesCount ?? {}).sort(([, a], [, b]) => b - a);

  const propsByLabel = new Map<string, Array<{ name: string; types: string[]; mandatory: boolean }>>();
  for (const p of nodeTypeProperties) {
    const key = (p.nodeLabels?.join(":") || p.nodeType) ?? p.nodeType;
    const entry = propsByLabel.get(key) ?? [];
    entry.push({ name: p.propertyName, types: p.propertyTypes ?? [], mandatory: p.mandatory });
    propsByLabel.set(key, entry);
  }

  const propsByRel = new Map<string, Array<{ name: string; types: string[]; mandatory: boolean }>>();
  for (const p of relTypeProperties) {
    const entry = propsByRel.get(p.relType) ?? [];
    entry.push({ name: p.propertyName, types: p.propertyTypes ?? [], mandatory: p.mandatory });
    propsByRel.set(p.relType, entry);
  }

  const lines: string[] = [];
  lines.push(`## Reactome Graph Schema`);
  const dbComp = schema.dbComponents[0];
  lines.push(
    `**Neo4j:** ${dbComp?.versions?.[0] ?? "?"} ${dbComp?.edition ?? ""} · **Fetched:** ${schema.fetchedAt}`
  );
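  // Count relationship types from relTypesCount (keyed by type name). The keys of
  // stats.relTypes are (:Label)-[:TYPE]->() patterns, so counting them overstates the total.
  lines.push(
    `**Totals:** ${stats.nodeCount.toLocaleString()} nodes · ${stats.relCount.toLocaleString()} relationships · ${labelEntries.length} labels · ${relEntries.length} relationship types`
  );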
  lines.push("");

  lines.push(`### Labels (${labelEntries.length}, by node count)`);
  for (const [label, count] of labelEntries) {
    lines.push(`- \`${label}\` — ${count.toLocaleString()}`);
  }
  lines.push("");

  lines.push(`### Relationship types (${relEntries.length}, by relationship count)`);
  for (const [relType, count] of relEntries) {
    lines.push(`- \`${relType}\` — ${count.toLocaleString()}`);
  }
  lines.push("");

  lines.push(`### Node properties (by label)`);
  const sortedLabels = Array.from(propsByLabel.keys()).sort();
  for (const label of sortedLabels) {
    lines.push(`- **${label}**`);
    for (const p of propsByLabel.get(label)!) {
      const t = p.types.length ? ` _(${p.types.join("|")})_` : "";
      const m = p.mandatory ? " **required**" : "";
      lines.push(`  - \`${p.name}\`${t}${m}`);
    }
  }
  lines.push("");

  if (propsByRel.size > 0) {
    lines.push(`### Relationship properties (by type)`);
    const sortedRels = Array.from(propsByRel.keys()).sort();
    for (const rel of sortedRels) {
      const props = propsByRel.get(rel)!;
      if (props.length === 0) continue;
      lines.push(`- **${rel}**`);
      for (const p of props) {
        const t = p.types.length ? ` _(${p.types.join("|")})_` : "";
        const m = p.mandatory ? " **required**" : "";
        lines.push(`  - \`${p.name}\`${t}${m}`);
      }
    }
    lines.push("");
  }

  if (indexes.length > 0) {
    lines.push(`### Indexes (${indexes.length})`);
    for (const ix of indexes) {
      const row = ix as { name?: string; labelsOrTypes?: string[]; properties?: string[]; type?: string; state?: string };
      const labels = row.labelsOrTypes?.join(",") ?? "?";
      const props = row.properties?.join(",") ?? "?";
      lines.push(`- \`${row.name ?? "?"}\` — ${labels}(${props}) [${row.type ?? "?"}, ${row.state ?? "?"}]`);
    }
    lines.push("");
  }

  if (constraints.length > 0) {
    lines.push(`### Constraints (${constraints.length})`);
    for (const c of constraints) {
      const row = c as { name?: string; description?: string };
      lines.push(`- \`${row.name ?? "?"}\` — ${row.description ?? ""}`);
    }
    lines.push("");
  }

  lines.push(
    "_For programmatic access to the full schema (including the raw `apoc.meta.schema()` output with per-relationship cardinalities and full property type inventories), read the `reactome://graph/schema` resource._"
  );

  return lines.join("\n");
}
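As a concrete illustration of the digest's "Node properties" section, here is a reduced, self-contained sketch of its grouping and bullet rules: rows are grouped by their joined `nodeLabels` (falling back to `nodeType`), labels are sorted, and each property lists its `|`-joined types plus a `**required**` marker when `mandatory` is true. `renderNodeProperties` and the sample rows are hypothetical; the nested bullets assume a two-space indent so they render as a sub-list in CommonMark.

```typescript
// Row shape mirrors the nodeTypeProperties entries of GraphSchema.
interface NodePropRow {
  nodeType: string;
  nodeLabels: string[];
  propertyName: string;
  propertyTypes: string[];
  mandatory: boolean;
}

// Hypothetical helper: group rows by label key, then render nested markdown bullets.
function renderNodeProperties(rows: NodePropRow[]): string[] {
  const byLabel = new Map<string, NodePropRow[]>();
  for (const r of rows) {
    const key = r.nodeLabels.join(":") || r.nodeType; // fall back to nodeType when no labels
    byLabel.set(key, [...(byLabel.get(key) ?? []), r]);
  }
  const out: string[] = [];
  for (const label of [...byLabel.keys()].sort()) {
    out.push(`- **${label}**`);
    for (const p of byLabel.get(label)!) {
      const t = p.propertyTypes.length ? ` _(${p.propertyTypes.join("|")})_` : "";
      const m = p.mandatory ? " **required**" : "";
      out.push(`  - \`${p.propertyName}\`${t}${m}`);
    }
  }
  return out;
}
```

Sorting label keys alphabetically (rather than by node count, as the "Labels" section does) keeps the property listing stable between fetches even when counts change.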
137 changes: 137 additions & 0 deletions src/graph/schema.ts
@@ -0,0 +1,137 @@
import { runRead } from "../clients/neo4j.js";
import { logger } from "../logger.js";

export interface GraphSchema {
  fetchedAt: string;
  dbComponents: Array<{ name: string; versions: string[]; edition: string }>;
  stats: {
    nodeCount: number;
    relCount: number;
    labels: Record<string, number>;
    relTypes: Record<string, number>;
    relTypesCount: Record<string, number>;
  };
  schema: Record<string, unknown>;
  nodeTypeProperties: Array<{
    nodeType: string;
    nodeLabels: string[];
    propertyName: string;
    propertyTypes: string[];
    mandatory: boolean;
  }>;
  relTypeProperties: Array<{
    relType: string;
    sourceNodeLabels: string[];
    targetNodeLabels: string[];
    propertyName: string;
    propertyTypes: string[];
    mandatory: boolean;
  }>;
  indexes: unknown[];
  constraints: unknown[];
}

// apoc.meta.schema() can scan many nodes; give the schema queries a longer
// budget than the default Cypher-query timeout.
const SCHEMA_FETCH_TIMEOUT_MS = 60_000;

let schemaCache: GraphSchema | null = null;
let schemaPending: Promise<GraphSchema> | null = null;

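/**
 * Fetch the live graph schema via APOC, plus `dbms.components()`,
 * `db.indexes()`, and `db.constraints()`. The relationship-property,
 * index, and constraint queries are optional: if one fails, that field
 * falls back to an empty array instead of failing the whole fetch.
 * Cached in-memory after the first successful call so subsequent tool
 * invocations are free. Concurrent first-callers share one round-trip
 * via the `schemaPending` promise; a failed fetch is not cached, so the
 * next call retries.
 */
export async function fetchGraphSchema(): Promise<GraphSchema> {
  if (schemaCache) return schemaCache;
  if (schemaPending) return schemaPending;

  const opts = { timeoutMs: SCHEMA_FETCH_TIMEOUT_MS };
  const start = Date.now();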

  schemaPending = (async () => {
    try {
      type Comp = { name: string; versions: string[]; edition: string };
      type Stats = GraphSchema["stats"];
      type NodeProp = GraphSchema["nodeTypeProperties"][number];
      type RelProp = GraphSchema["relTypeProperties"][number];

      const [components, stats, schemaRow, nodeProps, relProps, indexes, constraints] = await Promise.all([
        runRead<Comp>(
          "CALL dbms.components() YIELD name, versions, edition RETURN name, versions, edition",
          {},
          opts
        ),
        runRead<Stats>(
          "CALL apoc.meta.stats() YIELD labels, relTypes, relTypesCount, nodeCount, relCount RETURN labels, relTypes, relTypesCount, nodeCount, relCount",
          {},
          opts
        ),
        runRead<{ value: Record<string, unknown> }>(
          "CALL apoc.meta.schema() YIELD value RETURN value",
          {},
          opts
        ),
        runRead<NodeProp>(
          "CALL apoc.meta.nodeTypeProperties() YIELD nodeType, nodeLabels, propertyName, propertyTypes, mandatory RETURN nodeType, nodeLabels, propertyName, propertyTypes, mandatory",
          {},
          opts
        ),
        runRead<RelProp>(
          "CALL apoc.meta.relTypeProperties() YIELD relType, sourceNodeLabels, targetNodeLabels, propertyName, propertyTypes, mandatory RETURN relType, sourceNodeLabels, targetNodeLabels, propertyName, propertyTypes, mandatory",
          {},
          opts
        ).catch(() => [] as RelProp[]),
        runRead<unknown>(
          "CALL db.indexes() YIELD name, state, type, entityType, labelsOrTypes, properties RETURN name, state, type, entityType, labelsOrTypes, properties",
          {},
          opts
        ).catch(() => [] as unknown[]),
        runRead<unknown>(
          "CALL db.constraints() YIELD name, description RETURN name, description",
          {},
          opts
        ).catch(() => [] as unknown[]),
      ]);

      const result: GraphSchema = {
        fetchedAt: new Date().toISOString(),
        dbComponents: components,
        stats: stats[0] ?? {
          nodeCount: 0,
          relCount: 0,
          labels: {},
          relTypes: {},
          relTypesCount: {},
        },
        schema: schemaRow[0]?.value ?? {},
        nodeTypeProperties: nodeProps,
        relTypeProperties: relProps,
        indexes,
        constraints,
      };

      logger.info("graph schema fetched", {
        durationMs: Date.now() - start,
        nodeCount: result.stats.nodeCount,
        relCount: result.stats.relCount,
        labels: Object.keys(result.stats.labels ?? {}).length,
      });

      schemaCache = result;
      return result;
    } finally {
      schemaPending = null;
    }
  })();

  return schemaPending;
}

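/**
 * For tests: clears the cached value and drops the reference to any
 * in-flight fetch. The in-flight fetch itself is not cancelled and will
 * still populate the cache if it resolves afterwards.
 */
export function _resetGraphSchemaCache(): void {
  schemaCache = null;
  schemaPending = null;
}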
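`fetchGraphSchema` treats some queries as required and others (`apoc.meta.relTypeProperties()`, `db.indexes()`, `db.constraints()`) as optional by attaching `.catch(() => [])` inside a single `Promise.all`. The sketch below isolates that pattern with a hypothetical `Runner` in place of `runRead`, so it runs without Neo4j. Optional index and constraint calls matter because those built-in procedures are not available on every server; newer Neo4j releases expose the same information through `SHOW INDEXES` and `SHOW CONSTRAINTS`.

```typescript
// Hypothetical stand-in for runRead: takes a query string, resolves to rows.
type Runner = (query: string) => Promise<unknown[]>;

// Run required and optional queries concurrently. A failing required query
// rejects the whole fetch; a failing optional query yields [] instead.
async function fetchWithOptional(
  run: Runner,
  required: string[],
  optional: string[],
): Promise<{ required: unknown[][]; optional: unknown[][] }> {
  const [req, opt] = await Promise.all([
    Promise.all(required.map((q) => run(q))),
    Promise.all(optional.map((q) => run(q).catch((): unknown[] => []))),
  ]);
  return { required: req, optional: opt };
}
```

`Promise.allSettled` would also work, but it waits for every query to settle before a failure can be reported; the per-call `.catch` keeps required failures fast while still typing optional results as plain arrays.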
15 changes: 14 additions & 1 deletion src/index.ts
@@ -7,9 +7,10 @@ import { registerAllResources } from "./resources/index.js";
import { logger } from "./logger.js";
import { CONTENT_SERVICE_URL, ANALYSIS_SERVICE_URL, NEO4J_URI } from "./config.js";
import { buildServerInstructions } from "./instructions.js";
+import { fetchGraphSchema } from "./graph/schema.js";

const server = new McpServer(
-  { name: "reactome", version: "1.3.1" },
+  { name: "reactome", version: "1.4.0" },
  { instructions: buildServerInstructions() }
);

@@ -24,6 +25,18 @@ async function main() {
    analysisService: ANALYSIS_SERVICE_URL,
    neo4jEnabled: Boolean(NEO4J_URI),
  });
+
+  // Warm the schema cache in the background so the first
+  // reactome_cypher_schema call (or reactome://graph/schema read) doesn't
+  // wait 15–30s on apoc.meta.schema(). Failures are logged; the cache
+  // stays empty and the tool call will retry on demand.
+  if (NEO4J_URI) {
+    fetchGraphSchema().catch((err) => {
+      logger.warn("graph schema prefetch failed; will retry on first use", {
+        error: err instanceof Error ? err.message : String(err),
+      });
+    });
+  }
}

main().catch((error) => {
2 changes: 1 addition & 1 deletion src/instructions.ts
@@ -40,7 +40,7 @@ A local Neo4j Reactome graph is available. Use it when the user wants a query th

**Workflow for Cypher:**

-1. Call \`reactome_cypher_schema\` (or read the \`reactome://graph/schema\` resource) **before writing any query** to learn the live labels, relationship types, and properties. Never guess the schema.
+1. Call \`reactome_cypher_schema\` (or read the \`reactome://graph/schema\` resource) **before writing any query**. The schema tool returns labels with node counts, relationship cardinalities, per-label and per-rel property types (with mandatory flags), indexes, and constraints. Pulled live via APOC on first use and cached in-memory for the session (warm after the MCP's startup prefetch). Never guess the schema.
2. Use \`reactome_cypher_sample\` on a label to see a representative node's shape.
3. Write a Cypher query with \`reactome_cypher_query\`. Rules:
- Sessions run in READ mode; write clauses will be rejected.
3 changes: 2 additions & 1 deletion src/resources/static.ts
@@ -1,7 +1,8 @@
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { contentClient } from "../clients/content.js";
import type { Species, Disease } from "../types/index.js";
-import { isNeo4jConfigured, fetchGraphSchema } from "../clients/neo4j.js";
+import { isNeo4jConfigured } from "../clients/neo4j.js";
+import { fetchGraphSchema } from "../graph/schema.js";

export function registerStaticResources(server: McpServer) {
// All species