Allow model_max_tokens to be set to whatever the LLM maximum is #1233

@iuliaturc

Description

I'm trying to use GenerateREADME with as much of the underlying LLM's context window as possible. Unfortunately, I can't easily work out which value to pass, because model_max_tokens isn't the length of the final input sent to the LLM.

For instance, I'm trying to consume the entire 128k context window, and I've run a series of trials:

  1. patchwork GenerateREADME ... model_max_tokens=128_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 255511 tokens
  2. patchwork GenerateREADME ... model_max_tokens=64_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 191511 tokens
  3. patchwork GenerateREADME ... model_max_tokens=30_000 ===> Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 157511 tokens

So I need to keep guessing.
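
For what it's worth, the three errors above look consistent with a fixed number of tokens being added on top of whatever I pass as model_max_tokens. Here is a quick check of that arithmetic, using only the numbers from the error messages. What the extra tokens actually consist of is an assumption on my part; the errors don't say.

```python
# Each pair is (model_max_tokens passed on the CLI, tokens requested per the 400 error).
trials = [(128_000, 255_511), (64_000, 191_511), (30_000, 157_511)]
CONTEXT_WINDOW = 128_000  # limit quoted in the error message

# Difference between what the API saw and what was passed in.
overheads = [requested - passed for passed, requested in trials]
print(overheads)  # [127511, 127511, 127511]

# If the overhead really is constant, this is the largest value that would have fit.
headroom = CONTEXT_WINDOW - overheads[0]
print(headroom)  # 489
```

If that reading is right, only a very small model_max_tokens would have been accepted for this particular run, which is hard to discover by trial and error.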

Proposed solution

Add an option, e.g. model_max_tokens=-1, meaning the maximum window allowed by the underlying LLM once all the other tokens sent under the hood are accounted for.
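
A minimal sketch of how the -1 sentinel could be resolved, to make the request concrete. The function name `resolve_max_tokens` and its parameters are hypothetical and not part of patchwork; it also assumes the tool already knows the model's context window and can count the tokens it sends under the hood before making the request.

```python
def resolve_max_tokens(model_max_tokens: int, context_window: int, overhead_tokens: int) -> int:
    """Hypothetical helper: turn the user-facing setting into a concrete token budget.

    model_max_tokens: value from the CLI; -1 means "use whatever is left".
    context_window:   the model's maximum context length (assumed known).
    overhead_tokens:  tokens the tool sends regardless of the setting (assumed countable).
    """
    remaining = context_window - overhead_tokens
    if model_max_tokens == -1:
        if remaining <= 0:
            raise ValueError("no room left in the context window")
        return remaining
    # Explicit values are passed through unchanged in this sketch.
    return model_max_tokens


# With the numbers from the trials in this issue:
print(resolve_max_tokens(-1, 128_000, 127_511))      # 489
print(resolve_max_tokens(30_000, 128_000, 127_511))  # 30000
```

Passing explicit values through untouched keeps today's behavior; clamping them to the remaining budget would be another reasonable choice.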

Alternatives considered

n/a
