
feat: BMDB integration, system prompt split, LLM speed-ups, dual-DB UI #66

Draft

jcschaff wants to merge 60 commits into main from development
Conversation


jcschaff commented May 6, 2026

Summary

Brings 76 commits from development into main. Major themes:

  • BMDB (BioModels.org) integration — new bmdb_router/bmdb_controller/bmdb_schema, new service functions (fetch_bmdb_models, get_xml_file, get_bmdb_model_info), new BMDB tools wired into the LLM, and a parallel BMDB search path on the frontend.
  • System prompt split — monolithic system_prompt.py carved into a base SYSTEM_PROMPT + per-DB BMDB_SYSTEM_PROMPT / VCDB_SYSTEM_PROMPT, composed at runtime based on the selected database.
  • LLM response speed-ups — should_use_tools() skips tool round-trips for chitchat; select_tools_for_prompt() filters tools into DB_TOOLS / KB_TOOLS / PUB_TOOLS subsets via regex on the user prompt; asyncio.gather runs tool calls concurrently; summarize_tool_result() truncates large tool outputs; default_rows lowered 1000 → 25. Per-stage timing surfaced to the UI as tool_summary.
  • Dual-DB UI — ChatBox gains useVCDB / useBMDB checkboxes, a Stop button (AbortController), conditional quick-action button groups, and BMDB-formatted result rendering.
  • Conversation history — localStorage-backed conversations with deep-linking via ?conversation=<uuid>, real entries in the sidebar.
  • Service rename — app.services.vcelldb_service → app.services.databases_service (all in-tree imports updated).
  • Misc — Suspense wrappers for Next 15 build, BMDB AI Analysis tab on /search/[bmid], settings page link updates, Pydantic v2 SettingsConfigDict(extra="ignore").
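The speed-up bullets above can be sketched roughly as follows. Function names (should_use_tools, select_tools_for_prompt) and the tool-subset names come from this PR; the regex patterns and tool lists here are illustrative stand-ins, not the shipped ones:

```python
import re

# Illustrative subsets -- the real DB_TOOLS / KB_TOOLS / PUB_TOOLS lists
# live in the backend and are larger.
DB_TOOLS = ["fetch_bmdb_models", "get_bmdb_model_info"]
KB_TOOLS = ["search_knowledge_base"]
PUB_TOOLS = ["search_publications"]

# Prompts that look like chitchat skip the tool round-trip entirely.
CHITCHAT_RE = re.compile(r"^\s*(hi|hello|thanks?|how are you)\b", re.IGNORECASE)


def should_use_tools(prompt: str) -> bool:
    """Return False for simple conversational prompts (no tool round-trip)."""
    return CHITCHAT_RE.match(prompt) is None


def select_tools_for_prompt(prompt: str) -> list[str]:
    """Send the LLM only the tool subsets the prompt plausibly needs."""
    selected: list[str] = []
    if re.search(r"\b(model|biomodel|simulation)\b", prompt, re.IGNORECASE):
        selected += DB_TOOLS
    if re.search(r"\b(paper|publication|article)\b", prompt, re.IGNORECASE):
        selected += PUB_TOOLS
    # Fall back to the full tool set when nothing matches.
    return selected or DB_TOOLS + KB_TOOLS + PUB_TOOLS
```

Since both functions are pure string classifiers, they are cheap to unit-test, which is relevant to the test-coverage gap noted below.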

⚠️ Known issues to address before merging

Opened as draft — the following surfaced during review:

Blocking

  • Debug print() / console.log left in llms_service.py (incl. one that dumps the full messages payload), databases_service.py (top-level print on import + several CHECK/DEBUG/RAW JSON prints), tools_utils.py, vcelldb_controller.py, llms_router.py, and ChatBox.tsx (PPPPPP, RRRRRR, AAAAAA, bmkeys x2).
  • bmkeys = [] reset inside the per-tool-call loop in llms_service.py — only the last tool call's keys survive. Move the initialization above the loop.
  • No-tools fast path returns the message object, not the content — direct_text = response.choices[0].message or "" should be .message.content or "" in llms_service.py.
  • Wrong-host connectivity probe — databases_service.get_xml_file calls check_vcell_connectivity() (which DNS-checks vcell.cam.uchc.edu) before hitting biomodels.org.
  • Inconsistent BMDB base URL — backend hardcodes https://biomodels.org/; frontend / docker-compose.yml use https://www.biomodels.org/. Pick one and centralize.
  • Stray imports — from multiprocessing import process in llms_router.py:1 (likely an IDE auto-import); import Suspense from "react" in analyze/[id]/page.tsx:4 is a default-import typo (should be the named import { Suspense }) and is unused anyway.
  • Stray ß character in a comment in llms_service.py ("simple, conversational promptsß").
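The bmkeys fix and the concurrent tool execution can be combined in one place. A schematic of the corrected shape, with the initialization hoisted above the loop (execute_tool and the tool-call objects are placeholders, not the service's real signatures):

```python
import asyncio


async def run_tool_calls(tool_calls, execute_tool):
    # Initialize ONCE, before iterating -- resetting bmkeys inside the
    # per-tool-call loop drops every call's keys except the last one's.
    bmkeys: list[str] = []

    # asyncio.gather executes the tool calls concurrently instead of
    # awaiting them one by one.
    results = await asyncio.gather(*(execute_tool(tc) for tc in tool_calls))

    for _result, keys in results:
        bmkeys.extend(keys)
    return results, bmkeys
```

With this shape, keys from all tool calls accumulate rather than only the final call's surviving.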

Should fix

  • Silent behavior change — LLM tool calls capped at default_rows=25 inside execute_tool regardless of what the model requests; the tool schema still advertises maximum: 50. Either raise the cap or update the schema.
  • Dead code — unused CategoryEnum / OrderByEnum in bmdb_schema.py; empty sanitize_xml_content stub in databases_service.py; commented payload/userMessage blocks duplicated across handleSendMessage and handleSendMessageBMDB in ChatBox.tsx.
  • Refactor near-duplicate send functions — handleSendMessage and handleSendMessageBMDB in ChatBox.tsx are ~140-line copies; parameterize over the database key.
  • Test coverage — test_vcelldb_service.py only got its import path updated for the rename. No tests cover the new fetch_bmdb_models / get_xml_file / get_bmdb_model_info, nor the load-bearing should_use_tools / select_tools_for_prompt regex routing on which the speed claims rest.
  • BMDB_SYSTEM_PROMPT is missing the publications guidance the old monolithic prompt had — BMDB-mode publication questions will degrade.
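On the default_rows cap mismatch: whichever direction is chosen (raise the cap or lower the schema), the clamp in execute_tool and the advertised schema maximum should derive from one constant so they cannot drift apart again. A sketch, with the constant name and schema fragment as assumptions:

```python
from typing import Optional

# Single source of truth for both the runtime clamp and the tool schema.
MAX_ROWS = 50
DEFAULT_ROWS = 25


def clamp_rows(requested: Optional[int], default: int = DEFAULT_ROWS) -> int:
    """Clamp a model-requested row count to the advertised maximum."""
    if requested is None:
        return default
    return max(1, min(requested, MAX_ROWS))


# The tool schema advertises the same bound it actually enforces.
TOOL_SCHEMA_FRAGMENT = {
    "max_rows": {"type": "integer", "maximum": MAX_ROWS, "default": DEFAULT_ROWS},
}
```

This removes the current situation where the schema promises `maximum: 50` but execute_tool silently caps at 25.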

Migration / deployer notes

  • app.services.vcelldb_service is renamed to app.services.databases_service. Any out-of-tree importer (e.g. populate_db.ipynb, CI scripts) must be updated.
  • New env var NEXT_PUBLIC_API_URL_BMDB is consumed in frontend/app/search/page.tsx and frontend/app/search/[bmid]/page.tsx. Confirm it lands in frontend/.env.example.
  • Settings switched to Pydantic v2 SettingsConfigDict(extra="ignore") — masks future env var typos silently.
  • get_llm_response now returns a 3-tuple (result, bmkeys, tool_summary); affected endpoints' JSON gains a tool_summary field.
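For the 3-tuple change, every caller of get_llm_response must unpack the extra value. A minimal sketch of the adjustment; the stub's return values and the endpoint shape are illustrative, only the 3-tuple contract comes from the PR:

```python
def get_llm_response(prompt, tools):
    # Stub standing in for the real service function, which now returns
    # (result, bmkeys, tool_summary) instead of the old 2-tuple.
    return ("answer text", ["BIOMD0000000012"], {"tool_time_ms": 120})


def chat_endpoint(prompt, tools=()):
    # Callers must unpack all three values; the endpoint's JSON response
    # gains a tool_summary field carrying the per-stage timings.
    result, bmkeys, tool_summary = get_llm_response(prompt, tools)
    return {"result": result, "bmkeys": bmkeys, "tool_summary": tool_summary}
```

Any out-of-tree caller still unpacking two values will raise a ValueError at runtime, so this belongs in the deployer notes alongside the module rename.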

Test plan

  • Backend tests pass: `cd backend/app && poetry run pytest tests/`
  • Frontend builds: `cd frontend && npm run build`
  • Frontend lints: `cd frontend && npm run lint`
  • Manual: `/chat` page — VCDB-only, BMDB-only, and both-DBs queries return formatted results
  • Manual: `/search/[bmid]` — both a VCDB id and a BMDB id (e.g. `BIOMD…` / `MODEL…`) render correctly, AI Analysis tab works for each
  • Manual: conversation history — start a chat, refresh, deep-link via `?conversation=`, verify restoration
  • Manual: Stop button aborts an in-flight request

🤖 Generated with Claude Code

reeshapatel12 and others added 30 commits March 24, 2026 15:03