Blog: Granite Switch in Mellea #59

Open
planetf1 wants to merge 22 commits into main from blog/granite-switch

Conversation

@planetf1

@planetf1 planetf1 commented Jun 1, 2026

Collaborator

Closes #58

Summary

Granite Switch bakes a curated set of Granite intrinsics into a single
vLLM-served checkpoint. This post leads with the end-user value
(adding validation, requirement checks, hallucination detection to a
Mellea program is a single function call), shows two intrinsics
running end-to-end against a live vLLM server, and explains the
architectural shift that makes it work — the routing between
behaviours is part of the model itself, not orchestration around it.

Content checklist

  • Hook — drop-in validation as the value proposition
  • How it works — routing baked into the model (LoRA-hot-swap and
    LLM-as-judge framed as the alternatives the reader knows)
  • Setup — granite-switch plugin install + vllm serve + mellea install
  • Demo — answerability with real output ("answerable"/"unanswerable")
  • Demo — hallucination detection with sentence-level output
  • When this fits — light closer; links to intrinsics overview docs
  • Try it — links to repo, examples, docs, model card
  • Header image (hub-and-spoke SVG)
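
The two demos listed above each return values from a small fixed label set. A minimal sketch of the kind of check a smoke test might apply follows. It is an illustration, not the project's e2e suite: `EXPECTED_LABELS` and `is_expected_label` are hypothetical names, and only the four label strings come from this PR's description.

```python
# Hypothetical smoke-test helper; only the label strings come from the PR.
EXPECTED_LABELS = {
    "answerability": {"answerable", "unanswerable"},
    "hallucination": {"faithful", "unfaithful"},
}


def is_expected_label(check: str, value: str) -> bool:
    """Return True if `value` is exactly one of the labels the post shows.

    The comparison is case-sensitive on purpose: the post uses lowercase
    labels, so a capitalised "Faithful" should be rejected.
    """
    return value in EXPECTED_LABELS.get(check, set())


if __name__ == "__main__":
    print(is_expected_label("answerability", "answerable"))
    print(is_expected_label("hallucination", "Faithful"))
```

Keeping the comparison strict means a casing drift between the model output and the post is caught rather than silently normalised.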

Technical validation

  • markdownlint: 0 errors
  • Live smoke test confirmed against
    ibm-granite/granite-switch-4.1-3b-preview: both code snippets run
    clean; output values ("answerable"/"unanswerable",
    "faithful"/"unfaithful") match the post exactly.

Verification status

| Platform | Runtime | Status |
| --- | --- | --- |
| Linux (IBM LSF, via bvllm) | vLLM | Verified: both snippets executed end-to-end, output matches blog |
| macOS | ? | Open: Switch doesn't support Ollama; investigating what the macOS path looks like |

To run vLLM for validation (internal tooling — bvllm):

bv run ibm-granite/granite-switch-4.1-3b-preview

Launches vLLM on an IBM LSF cluster and returns an OpenAI-compatible
endpoint.
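
For reviewers without bvllm, one way to confirm that an OpenAI-compatible endpoint is actually serving the checkpoint is to list its models. The sketch below assumes the server exposes the standard `/v1/models` route (as vLLM's OpenAI-compatible server does) and accepts the placeholder key `EMPTY`. The function names are hypothetical, and the base URL passed in is whatever your server reports.

```python
import json
import urllib.request

MODEL_ID = "ibm-granite/granite-switch-4.1-3b-preview"


def build_models_request(base_url: str, api_key: str = "EMPTY") -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible model listing."""
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def is_model_served(listing: dict, model_id: str = MODEL_ID) -> bool:
    """Check a parsed /v1/models response for the given model id."""
    return any(entry.get("id") == model_id for entry in listing.get("data", []))


def check_endpoint(base_url: str) -> bool:
    """Query the server (network call) and report whether the model is listed."""
    with urllib.request.urlopen(build_models_request(base_url), timeout=10) as resp:
        return is_model_served(json.load(resp))
```

Calling `check_endpoint(...)` with the endpoint URL returns `True` only when the Switch checkpoint appears in the listing; the request building and response parsing are split out so they can be exercised without a live server.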

Open questions for the reviewer

Each of these has a callout at the relevant point in the post; this
list is the consolidated todo. The post is honest about being preview
software, but several setup details are stricter than upstream docs
and need a re-test before publish.

  1. granite-switch[vllm20] vs granite-switch[vllm] — upstream
    granite-switch README leads with [vllm] (broad CUDA 12.x compat,
    vLLM 0.19.x); we've only validated [vllm20] (CUDA 13+, vLLM 0.20+)
    on LSF. Decide which to lead with, and consider showing only one to
    keep install simple.

  2. --enable-auto-tool-choice --tool-call-parser granite4 — these
    flags were added after bvllm testing showed intrinsics didn't
    dispatch without them. The upstream granite-switch README, the HF
    model card, and docs/docs/integrations/openai.md all omit them.
    Re-test against a vanilla vllm serve <model> invocation; if
    dispatch works without them, drop to match upstream.

  3. macOS path — Switch doesn't run under Ollama. A macOS option is
    under investigation; nothing confirmed. Expand the setup section if
    a macOS path lands before merge.
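
The first open question amounts to a version-to-extra mapping. A minimal sketch, assuming only the two pairings stated in this PR (`[vllm]` for vLLM 0.19.x, `[vllm20]` for vLLM 0.20.x): the function name is hypothetical, and the upstream granite-switch README remains the authority on supported ranges.

```python
# Assumed mapping, taken from the pairings described in this PR only.
EXTRA_BY_MINOR = {
    (0, 19): "granite-switch[vllm]",
    (0, 20): "granite-switch[vllm20]",
}


def granite_switch_extra(vllm_version: str) -> str:
    """Map a vLLM version string such as "0.20.2" to an install target.

    Versions outside the two validated minor releases are rejected
    rather than guessed at.
    """
    parts = vllm_version.split(".")
    key = (int(parts[0]), int(parts[1]))
    if key not in EXTRA_BY_MINOR:
        raise ValueError(f"no validated extra for vLLM {vllm_version}")
    return EXTRA_BY_MINOR[key]


if __name__ == "__main__":
    print(f'pip install "{granite_switch_extra("0.20.2")}"')
```

If the post ends up showing only one extra, the same table makes it explicit which vLLM versions that choice leaves uncovered.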

Claims avoided

  • Switch is not framed as a "better aLoRA" (IBM frames it as
    coarse-grained expert switching).
  • No claim of multi-backend support — OpenAIBackend only on main
    today.
  • Adapter selection is described as a chat-template control token, not
    a runtime API call.
  • No production recommendation — Switch model IDs are -preview.

Notes for reviewer

All code snippets match the logic in
docs/examples/granite-switch/ in the main Mellea repository — they're
tested by the e2e suite and confirmed against a live vLLM instance.

The three reviewer callouts in the post (one per open question above)
are tagged **Reviewer note —** so they're easy to grep before publish.
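
A pre-publish check for leftover callouts could look like the following. It is a sketch, not part of the repository: the function names are hypothetical, and it matches on the `**Reviewer note` prefix so the exact dash character after it does not matter.

```python
from pathlib import Path

MARKER = "**Reviewer note"


def find_reviewer_notes(text: str) -> list[int]:
    """Return 1-based line numbers of lines containing the reviewer marker."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if MARKER in line]


def check_file(path: str) -> list[int]:
    """Read a markdown file and list the lines that still carry a callout."""
    return find_reviewer_notes(Path(path).read_text(encoding="utf-8"))
```

Running `check_file("content/blogs/granite-switch.md")` from the repository root should return an empty list once every callout has been resolved.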

FYI @ajbozarth — this is a candidate to publish soon; flagging early in
case you want to weigh in on framing or timing.

planetf1 added 4 commits June 1, 2026 11:45
Introduces Granite Switch as a delivery mechanism for Mellea intrinsics
— single vLLM-served checkpoint, no adapter-weight lifecycle management.
Shows answerability checking and hallucination detection end-to-end.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Hub-and-spoke SVG showing one Granite Switch checkpoint serving five
intrinsic capabilities (answerability, hallucination, citations,
req. check, guardian), colour-coded by library family (RAG/Core/Safety).

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Flags vLLM/Linux as the validated path and notes omlx/vmlx as
unvalidated macOS alternatives pending end-to-end smoke-test.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

- Add granite-switch plugin package install (registers
  GraniteSwitchForCausalLM architecture with vLLM; without it vLLM
  crashes immediately on model load)
- Add required vLLM flags: --enable-auto-tool-choice and
  --tool-call-parser granite4 (without the latter, adapter dispatch
  silently fails)
- Replace "What this post does" callout with prerequisites callout
- Fix faithfulness output values: faithful/unfaithful (lowercase) —
  confirmed against live ibm-granite/granite-switch-4.1-3b-preview
- Fix pip install quoting: pip install 'mellea[switch]'
- Remove "(OpenAI-compatible)" qualifier from table Runtime cell
- Strengthen hook: matrix framing; "zero adapter files to manage"
- Strengthen trade-off section: eliminate lifecycle management entirely
- Strengthen operational cost section: N×M matrix argument; adding a
  new intrinsic is a code change, not an infrastructure change
- Strengthen vision section: "just a request parameter", clear path to
  broad runtime support, same code path in prod and on a laptop

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

@planetf1 planetf1 left a comment

Collaborator Author

Code and content review — all runnable snippets validated against a live ibm-granite/granite-switch-4.1-3b-preview vLLM instance.

What's landing well: the intrinsics-vs-Switch distinction in the second section is sharp and exactly right. The demo flows cleanly from setup → answerability → hallucination detection on the same backend object — that's the core "same call, different intrinsic" point in action.

planetf1 added 7 commits June 1, 2026 13:38
- Callout now flags the server-side plugin as a prerequisite
- Setup section distinguishes server environment (granite-switch plugin)
  from application environment (mellea[switch])
- Inline comment explains api_key="EMPTY" for readers unfamiliar with
  vLLM's API key handling

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

- "twelve-plus intrinsics" → "a dozen" (exact count from model_ids.py
  is 12; future-proofed phrasing as the curated set evolves)
- Adapter config download description now mentions JSON + YAML, matching
  the later paragraph that names adapter_index.json and per-adapter
  io.yaml
- Linked granitelib-{rag,core,guardian} HF repos at first mention
- Linked the granite-switch plugin package on PyPI at first mention
- Try-it section now points at the top-level Mellea repo and the
  intrinsics overview docs in addition to the OpenAI-backend page

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The earlier draft singled out two specific Apple Silicon MLX servers
(omlx, vmlx) as candidates. The investigation is broader than that —
the open question is "what's the macOS story for Switch given Ollama
doesn't support it", not a comparison of named MLX runtimes. Reword
the reviewer note to match that framing.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The earlier draft led with "managing adapter weights is painful" — but
that hook only resonates with readers already running intrinsics on
PEFT. Most readers come in cold and want to know what they get.

Reframed the post around the value users actually see: dropping
validation and requirement checks into a Mellea program is one
function call. The hook now shows the trio of calls
(check_answerability, flag_hallucinated_content, requirement_check)
and the rest of the post follows: what Switch is, how to run it, the
demo, and where it fits.

Cuts:
- "What it costs to ship" — its core point ("a code change, not an
  infrastructure change") moved into the hook; the matrix-of-binaries
  detail was mellea-internal noise
- "Where this is going" — issue/epic references and roadmap detail
  read as project status and made the post feel WIP-heavy
- "When to reach for Switch vs PEFT" comparison table — replaced with
  a short "When this fits" closer that links to the intrinsics
  overview docs for readers who want the full picture
- The intrinsics-vs-Switch terminology section is now a single tight
  "How it works" paragraph

The reader still gets enough to run it end-to-end; everything beyond
that lives behind the docs links in "Try it".

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Previous version said the validators are "baked into the model weights"
but stopped there. The genuinely distinctive thing about Switch — the
reason it can be drop-in where alternatives can't — is that the
*routing* between validators is part of the model itself, not an
orchestration layer wrapped around it. Spelled that out by contrasting
with the two mechanisms a reader is most likely to know (LoRA hot-swap
and LLM-as-judge), in one short paragraph each.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The previous single reviewer note covered macOS only. Cross-checking
against the granite-switch repo README, the HF model card, and the
mellea OpenAI backend doc surfaced two more environment questions
that need verification before merge:

1. Whether to lead with `granite-switch[vllm20]` (CUDA 13+) or
   `granite-switch[vllm]` (broader CUDA 12.x compat). Upstream prefers
   `[vllm]`; we've only validated `[vllm20]` on LSF.
2. Whether `--enable-auto-tool-choice --tool-call-parser granite4` is
   actually required. Upstream sources all omit it, but internal
   testing on bvllm needed it to make intrinsics dispatch.

Each callout now sits at the install/serve line it relates to, so a
reviewer reading the post can resolve them in place. The macOS note
keeps its existing position.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Retitle to "one checkpoint, every intrinsic" — drops "adapter
wrangling" framing the rewritten hook no longer leans on, and matches
the SVG tagline. Removes the trailing "the dispatch happens inside
OpenAIBackend" aside which is mellea-internal detail readers don't
need.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

@ajbozarth ajbozarth left a comment

Contributor

Overall looks good. Only one note outside the items you've already called out; also make sure to update the date to the actual expected publish date when you're ready to schedule.

Comment thread content/blogs/granite-switch.md Outdated
@planetf1

planetf1 commented Jun 3, 2026

Collaborator Author

Terminology / timing note

The post uses intrinsic and the current class/import names (Intrinsic, from mellea.stdlib.components.intrinsic import ...). generative-computing/mellea#929 (Epic: Fix Intrinsic Adapter Lifecycle & Consistency) will change these class names and import paths, but it's a multi-PR epic with no firm date.

Plan: build the example against the current code rather than waiting on the rename. For this pass we'll just get the positioning terminology right — lead with "adapter function" (the term already shipped in the public Granite Switch material), and define "intrinsic" as the earlier name. The example's class names stay as-is for now; we'll update the blog to the new symbols once the epic lands (the deprecation plan keeps the current code runnable in the meantime).

Separately, one dependency on the epic side: the epic's own naming work still needs to realign with the shipped Granite Switch vocabulary. Its current target (AdapterBasedComponent, treated as a placeholder pending an IBM decision) predates "adapter function" being public.

@planetf1

planetf1 commented Jun 3, 2026

Collaborator Author

Blog updated: terminology + background

Reader-facing prose now leads with "adapter function" to match the public Granite Switch vocabulary; "intrinsic" is defined once as the earlier name. Code symbols are kept as the current code (mellea.stdlib.components.intrinsic, the load_embedded_adapters flag, the function names) — see the timing note above; we'll update those once the rename lands.

Also added:

  • A definition bridge for adapter function (task-specific capability with a defined I/O contract) and the intrinsic→adapter-function naming note.
  • Background on activated LoRA (aLoRA) — shared KV cache, control-token activation — and the Granite Libraries (Core / RAG / Guardian) mapping to the core/rag/guardian modules.
  • Links to the granite-switch source repo (plugin source + Try it section).

Open items parked for the next pass (previously inline reviewer notes, removed from the draft so they don't ship):

  1. [vllm20] vs [vllm] install extra. Upstream granite-switch README recommends [vllm] as the broad-compat default and [vllm20] for newer-CUDA performance. [vllm20] is validated end-to-end in our internal environment; the [vllm] path hasn't been re-confirmed. Decide which to lead with before merge — and consider showing only one to keep the install simple.
  2. --enable-auto-tool-choice --tool-call-parser granite4 flags. Added after internal testing where the adapter functions didn't dispatch without them. The upstream granite-switch README, the HF model card, and the Mellea OpenAI integration docs all omit the flags. Re-test against a vanilla vllm serve <model> before publish; if dispatch works without them, drop them to match upstream and simplify the snippet.
  3. macOS path. vLLM on Linux is the validated path. Switch doesn't run under Ollama, so a macOS option is still being investigated — nothing confirmed yet. Expand the setup section if a macOS path lands before merge.

@planetf1

planetf1 commented Jun 8, 2026

Collaborator Author

FYI @ajbozarth — publish date updated to 2026-06-11 (this Wednesday) as you flagged.

@ajbozarth ajbozarth left a comment

Contributor

Re-reviewing as we move this out of draft with the latest updates. Three reviewer notes from the description are resolved cleanly, and the macOS path is in good shape end-to-end. Manual walkthrough surfaced two structural issues with the Linux-first ordering and a couple of cosmetic notes.

Comment thread content/blogs/granite-switch.md Outdated
Comment thread content/blogs/granite-switch.md Outdated
Comment thread content/blogs/granite-switch.md Outdated
Comment thread content/blogs/granite-switch.md Outdated
Comment thread content/blogs/granite-switch.md Outdated
@ajbozarth

Contributor

@planetf1 in addition to the review above don't forget to move this out of draft, also:

> publish date updated to 2026-06-11 (this Wednesday)

the 11th is fine, but that's Thursday not Wednesday

@planetf1 planetf1 marked this pull request as ready for review June 9, 2026 08:53
@planetf1 planetf1 requested a review from a team as a code owner June 9, 2026 08:53
@planetf1 planetf1 requested review from ajbozarth and serjikibm June 9, 2026 08:53
@ajbozarth

Contributor

FYI the recent commits from yesterday and today do not follow our contributing rules: they are not signed off and Claude is listed as a co-author instead of assisted-by.

This is causing DCO failures.
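
The two rules at issue can be checked mechanically from a commit message. The sketch below is an illustration under stated assumptions, not the project's DCO check: it looks for a `Signed-off-by:` trailer and flags any `Co-authored-by:` trailer that names Claude, which the contributing rules ask to be written as `Assisted-by:` instead. The function name and the returned strings are hypothetical.

```python
import re

# A sign-off trailer of the form "Signed-off-by: Name <email>".
SIGNOFF = re.compile(r"^Signed-off-by: .+ <.+@.+>$", re.MULTILINE)
# Any co-author trailer that mentions Claude, in any letter case.
CO_AUTHOR = re.compile(r"^Co-authored-by: .*claude.*$", re.MULTILINE | re.IGNORECASE)


def commit_message_problems(message: str) -> list[str]:
    """Return a list of rule violations found in one commit message."""
    problems = []
    if not SIGNOFF.search(message):
        problems.append("missing Signed-off-by trailer")
    if CO_AUTHOR.search(message):
        problems.append("Claude listed as co-author; use Assisted-by")
    return problems
```

An empty list means the message passes both checks. Feeding it the output of `git log --format=%B` for each commit on the branch would be one way to apply it before pushing.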

@ajbozarth ajbozarth left a comment

Contributor

If you fix the DCO issue I described above, then this LGTM. If you'd like to update the date and publish early, today (2026-06-09) or tomorrow (2026-06-10), feel free; otherwise just remember to merge this Thursday.

planetf1 and others added 4 commits June 9, 2026 18:33
…search blog

- Add LocalHFBackend macOS path (pip install mellea[hf], no server
  needed; MPS auto-selected on Apple Silicon)
- Resolve reviewer note 1: show both [vllm] and [vllm20] with their
  version ranges stated explicitly
- Resolve reviewer note 2: drop --enable-auto-tool-choice and
  --tool-call-parser granite4 — not needed, not in upstream docs,
  confirmed against live deployment
- Add aLoRA KV cache advantage paragraph (aligns with IBM Research blog)
- Add 51% → 84% IFEval accuracy number for requirement-check
- Bridge adapter functions / intrinsics terminology for readers coming
  from the IBM Research blog
- Add experimental qualifier in When this fits section
- Update Try it and opening callout for both paths
- Verified: macOS (LocalHFBackend + MPS) and vLLM (upstream 0.20.2
  on BlueVela) both produce correct intrinsic outputs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

- Remove --enable-auto-tool-choice --tool-call-parser granite4 from
  the Try it install line — contradicted the setup section which
  correctly omits them; caught in Opus review
- Add "16 GB unified memory recommended" to macOS setup note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

…tes and link

Per ajbozarth review:
- Reorder setup section: macOS first, vLLM as production/graduate-to path
- Make both backend code blocks self-contained with full imports — macOS
  readers no longer hit NameError from imports only present in the vLLM block
- Fix quote style: 'mellea[switch]' → "mellea[switch]" throughout
- Link 51%→84% IFEval accuracy number to IBM Research blog source
- Update opening callout to match new macOS-first order

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 force-pushed the blog/granite-switch branch from ae831dc to ff44c82 Compare June 9, 2026 17:45
@planetf1 planetf1 enabled auto-merge June 9, 2026 17:47
@planetf1 planetf1 disabled auto-merge June 9, 2026 17:48

@psschwei psschwei left a comment

Member

Would like to resolve the Switch-on-macOS question before publishing.

Comment thread content/blogs/granite-switch.md Outdated
Comment thread content/blogs/granite-switch.md Outdated
@planetf1 planetf1 requested a review from ajbozarth June 9, 2026 18:28
@planetf1 planetf1 marked this pull request as draft June 9, 2026 18:28
Comment thread content/blogs/granite-switch.md
Replace all prose uses of 'intrinsic'/'intrinsics' with 'adapter function'/
'adapter functions' in the Granite Switch blog post. Python import paths
(mellea.stdlib.components.intrinsic) and external URL slugs are unchanged.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
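
The distinction this commit draws (prose renamed, import paths and URL slugs left alone) can be sketched as a small filter. This is an illustration under stated assumptions, not the script used for the change: it treats backtick spans as code and skips any occurrence attached to a dot, slash, or hyphen. All names are hypothetical.

```python
import re

# Backtick-delimited code spans; the capturing group keeps them in the split.
_CODE_SPAN = re.compile(r"(`[^`]*`)")
# Whole-word "intrinsic"/"intrinsics" that is not part of a path, slug, or identifier.
_WORD = re.compile(r"(?<![./\w-])intrinsic(s?)(?![\w/-])", re.IGNORECASE)


def _swap(match: re.Match) -> str:
    """Replace one match, keeping an initial capital and the plural suffix."""
    head = "Adapter function" if match.group(0)[0].isupper() else "adapter function"
    return head + match.group(1)


def rename_prose(line: str) -> str:
    """Rename the term in prose while leaving code spans and slugs untouched."""
    parts = _CODE_SPAN.split(line)
    # With a capturing split, odd indices are the code spans themselves.
    return "".join(
        part if i % 2 == 1 else _WORD.sub(_swap, part)
        for i, part in enumerate(parts)
    )
```

The heuristics are deliberately conservative, so a manual read-through is still needed for cases such as fenced code blocks, which this line-based sketch does not track.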
@ajbozarth ajbozarth mentioned this pull request Jun 10, 2026
4 tasks
Granite Switch uses a custom GraniteSwitchForCausalLM architecture
only vLLM supports. The LocalHFBackend path was using the base model,
not the Switch checkpoint, so it did not belong in this blog.

- Remove macOS/HF setup section and macOS backend code block
- Collapse to a single vLLM setup path with a note that the client
  runs on macOS or Linux
- Fix pip install quote style to double quotes throughout

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1 planetf1 marked this pull request as ready for review June 11, 2026 07:16
@planetf1

Collaborator Author

@psschwei — macOS section removed, all threads addressed. Ready for a re-review when you get a chance.

planetf1 added 2 commits June 11, 2026 08:42
- Restore 'Want to try it first?' Colab callout lost during branch rebase
- Fix 'intrinsics overview' link text → 'adapter functions overview'
- Add one-line intro to 'Running answerability' section

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

- Clarify snippet repo context in callout ("the mellea repo at...")
- Replace {rag,core,guardian} brace shorthand with plain prose
- Replace "I/O configuration files" with "adapter metadata files"
- Rewrite "When this fits" to focus on when to reach for Switch,
  not hardware requirements (already covered in Setup)

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
Comment thread public/images/granite-switch/main.svg
@planetf1 planetf1 marked this pull request as draft June 11, 2026 17:20
@planetf1

Collaborator Author

Recommend holding this blog post. The tutorials we link to are not working correctly. Will review & come back when more is known.

@ajbozarth ajbozarth left a comment

Contributor

One comment; otherwise the blog LGTM.

Blocking for now until generative-computing/granite-switch#92 is addressed. Otherwise we could just remove the Colab callout and publish, but I'd recommend against that.

Comment thread content/blogs/granite-switch.md Outdated
@psschwei

Member

I don't know if the tutorials not working on colab needs to be a blocker here. The Research blog went out last week to a much larger audience, and presumably they were broken then too. Worst case scenario could just link to the tutorials on github rather than through colab.

@ajbozarth

Contributor

> I don't know if the tutorials not working on colab needs to be a blocker here. The Research blog went out last week to a much larger audience, and presumably they were broken then too. Worst case scenario could just link to the tutorials on github rather than through colab.

In this case I'd say ignore my point two above and just fix point one and we can publish

Remove the Colab callout (tutorials broken per granite-switch#92 — no
longer blocking per reviewer consensus). Removes the MD028 lint error
as a side effect. Update graphic to drop model size variants from the
centre bubble. Set publish date to 2026-06-12.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
@planetf1

Collaborator Author

After all the changes I'm going to do a final walkthrough and test that the code is still working. Once done (later) I'll move this out of draft and we will hopefully be good to go.

planetf1 added 2 commits June 12, 2026 13:35
…ples

Rename ctx → context in the answerability snippet to match the
hallucination detection block. Replace inline # → comments with a
proper Output: text block, consistent with the rest of the post.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

…phic

Move both nodes up and outward so they no longer overlap the central
hub bubble, making the connecting lines clearly visible.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

@ajbozarth ajbozarth left a comment

Contributor

As it stands now the blog LGTM to publish, but I have not tested the code blocks myself.

@planetf1 planetf1 marked this pull request as ready for review June 12, 2026 15:31
@planetf1 planetf1 requested a review from psschwei June 12, 2026 15:31
@planetf1

Collaborator Author

@psschwei requesting re-review

@psschwei psschwei left a comment

Member

A couple of nits, though neither is blocking.
It might also be worth running dellmify on this prior to publishing.

Comment on lines +73 to +76
Granite Switch requires vLLM for inference — it uses a custom
`GraniteSwitchForCausalLM` architecture that only vLLM supports, via the
[`granite-switch`](https://pypi.org/project/granite-switch/) plugin. Without it,
vLLM refuses to load the model with

Member

nit: there are plans to add support in other places, so maybe soften this a bit, something like

Suggested change (current text, then proposed text):
Granite Switch requires vLLM for inference — it uses a custom
`GraniteSwitchForCausalLM` architecture that only vLLM supports, via the
[`granite-switch`](https://pypi.org/project/granite-switch/) plugin. Without it,
vLLM refuses to load the model with
As of today, Granite Switch requires vLLM for inference — it uses a custom
`GraniteSwitchForCausalLM` architecture that currently only vLLM supports, via the
[`granite-switch`](https://pypi.org/project/granite-switch/) plugin (though support for additional runtimes is coming). Without it,
vLLM refuses to load the model with

Comment on lines +80 to +86
The plugin currently supports vLLM 0.19.x and 0.20.x. Install it in your
**vLLM server environment** using the extra that matches your vLLM version:

```bash
pip install "granite-switch[vllm20]" # vLLM 0.20.x
pip install "granite-switch[vllm]" # vLLM 0.19.x
```

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we just go happy path and only cover the v0.19 install?


Development

Successfully merging this pull request may close these issues.

Blog post: Granite Switch in Mellea

3 participants