streaming cost updates #77

Open
dixitaniket wants to merge 1 commit into main from ani/streaming-cost

Conversation

@dixitaniket
Collaborator

  • cost for streaming response

Contributor

Copilot AI left a comment

Pull request overview

Adds a streaming-only billing side-channel to /v1/ohttp chunked responses so the relay can learn settled cost after the stream completes.

Changes:

  • Introduces a private billing-frame marker (OHTTP_BILLING_FRAME_MAGIC) and _build_billing_frame() to serialize SessionCost into a plaintext frame.
  • Updates the streaming response generator to parse the final SSE JSON, set inner cost context, and (optionally) inject the billing frame before emitting the final encrypted chunk.
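
To make the billing-frame idea above concrete, here is a minimal sketch of how a plaintext frame could be built and recognised by the relay. The marker bytes, the length-prefixed layout, and the `SessionCost` fields are all assumptions for illustration; the PR's actual `OHTTP_BILLING_FRAME_MAGIC` value and `_build_billing_frame()` wire format are not shown in this conversation.

```python
import json
import struct
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical marker; the real OHTTP_BILLING_FRAME_MAGIC value is not shown in the PR text.
BILLING_FRAME_MAGIC = b"\x00OHTTP-BILLING\x00"


@dataclass
class SessionCost:
    # Hypothetical fields, chosen only for this example.
    input_tokens: int
    output_tokens: int
    total_cost_usd: str


def build_billing_frame(cost: Optional[SessionCost]) -> Optional[bytes]:
    """Serialize a cost record as: magic + 4-byte big-endian length + JSON."""
    if cost is None:
        return None  # nothing to report, so the caller emits no frame
    payload = json.dumps(asdict(cost), separators=(",", ":")).encode("utf-8")
    return BILLING_FRAME_MAGIC + struct.pack(">I", len(payload)) + payload


def parse_billing_frame(chunk: bytes) -> Optional[dict]:
    """Relay side: return the cost dict if `chunk` is a billing frame, else None."""
    if not chunk.startswith(BILLING_FRAME_MAGIC):
        return None  # an ordinary encrypted chunk; pass it through untouched
    offset = len(BILLING_FRAME_MAGIC)
    if len(chunk) < offset + 4:
        return None  # truncated header
    (length,) = struct.unpack_from(">I", chunk, offset)
    body = chunk[offset + 4 : offset + 4 + length]
    if len(body) != length:
        return None  # truncated payload
    return json.loads(body)
```

Under this assumed layout, a relay would call `parse_billing_frame()` on each chunk and strip any matching frame before forwarding the rest of the stream to the client. An explicit length prefix lets the relay reject truncated frames instead of guessing where the JSON ends.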
Comments suppressed due to low confidence (1)

tee_gateway/controllers/ohttp_controller.py:297

  • response_json is parsed from b"".join(plaintext_chunks) here, and then _set_inner_stream_cost_context() in the finally block joins and parses the same chunks again. For large streams this duplicates O(n) copying/parsing and retains two full concatenations in memory. Consider refactoring so the SSE parse happens once (e.g., compute response_json once and pass it into the final context setter, or skip the _set_inner_stream_cost_context call when response_json has already been derived).
            response_json = _parse_final_sse_json(b"".join(plaintext_chunks))
            _set_inner_cost_context(
                cost_context,
                response_json=response_json,
                status_code=status,
            )
            billing_frame = _build_billing_frame(response_json)
            if billing_frame:
                yield billing_frame

            # Always emit exactly one final chunk so the AAD=b"final"
            # marker is present — that's what protects clients from
            # undetected truncation.
            yield encrypter.encrypt_chunk(pending or b"", is_final=True)
        finally:
            _set_inner_stream_cost_context(
                cost_context, plaintext_chunks, status_code=status
            )
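
One way to apply the "parse once" suggestion is sketched below. It is a simplified stand-in, not the controller's code: `stream_with_single_parse`, `parse_final_sse_json`, and the `set_cost_context` callback are hypothetical names, and encryption is omitted. The `finally` block only re-derives the cost when the happy path did not already record it, for example when the client disconnects mid-stream.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def parse_final_sse_json(body: bytes) -> Optional[Dict[str, Any]]:
    """Return the last JSON `data:` payload in an SSE body, skipping `[DONE]`."""
    last = None
    for line in body.splitlines():
        if not line.startswith(b"data:"):
            continue
        data = line[len(b"data:"):].strip()
        if not data or data == b"[DONE]":
            continue
        try:
            last = json.loads(data)
        except ValueError:  # covers JSONDecodeError and bad UTF-8
            continue
    return last


def stream_with_single_parse(
    chunks: Iterable[bytes],
    set_cost_context: Callable[[Optional[Dict[str, Any]]], None],
    parse: Callable[[bytes], Optional[Dict[str, Any]]] = parse_final_sse_json,
) -> Iterator[bytes]:
    plaintext_chunks: List[bytes] = []
    cost_recorded = False
    try:
        for chunk in chunks:
            plaintext_chunks.append(chunk)
            yield chunk
        # Happy path: join and parse exactly once, then reuse the result.
        response_json = parse(b"".join(plaintext_chunks))
        set_cost_context(response_json)
        cost_recorded = True
    finally:
        # Fallback only for early exits (client disconnect, upstream error).
        if not cost_recorded:
            set_cost_context(parse(b"".join(plaintext_chunks)))
```

With this shape, a completed stream performs a single join and parse, while an aborted stream still records whatever cost can be derived from the chunks seen so far.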

Comment on lines +286 to +288
billing_frame = _build_billing_frame(response_json)
if billing_frame:
    yield billing_frame

2 participants