Summary
ADK has a first-class ArtifactService (InMemoryArtifactService, GcsArtifactService) that stores files keyed by filename, scoped to a session, with optional GCS persistence. This is conceptually equivalent to OpenAI's Files store.
However, neither AgentEngineCodeExecutor nor BuiltInCodeExecutor actually consume artifacts as inputs. To use an artifact's bytes inside the executor, the developer has to write a callback that calls tool_context.load_artifact(), extracts Part.inline_data, and re-injects the bytes into the executor's input on every fresh sandbox. This is the same network cost as not using the artifact service at all.
The result is that ADK has the right storage abstraction but it doesn't plumb through to where code actually runs. The existing bug adk-docs#368 describes exactly this gap: an uploaded CSV saved as an artifact is invisible to BuiltInCodeExecutor.
Comparison: OpenAI Responses API
OpenAI gets this right with a clean split between a persistent file store and ephemeral containers:
file = client.files.create(file=open("sales_data.csv","rb"), purpose="assistants")
response = client.responses.create(
model="gpt-4.1",
tools=[{
"type": "code_interpreter",
"container": {"type": "auto", "file_ids": [file.id]},
}],
input="Summarize sales by region",
)
The file lives in OpenAI's Files store independently of any container's lifetime and is auto-injected into the container's filesystem on every invocation. The client never re-uploads bytes. Across many container restarts, the upload cost is paid exactly once.
Proposed API
Let ADK code executors accept artifact filenames as a first-class input, and have the executor handle the load + inject step internally:
agent = LlmAgent(
name="data_analyst",
model="gemini-2.0-flash",
code_executor=AgentEngineCodeExecutor(
agent_engine_resource_name=AGENT_ENGINE,
# NEW: artifacts to mount into the sandbox on every execution
artifact_refs=["sales_data.csv", "user:reference_lookup.parquet"],
),
instruction="Use sales_data.csv to answer questions...",
)
Behavior:
- On the first
execute_code call for a given sandbox, the executor calls artifact_service.load_artifact() for each listed artifact, takes its bytes, and passes them in input_data["files"] to the sandbox.
- Because the sandbox is persistent for its TTL, subsequent calls in the same session find the files already present on disk — no re-load.
- When a new sandbox is created (TTL expiry, eviction, new session), the executor transparently re-injects from the artifact service. From the developer's perspective, the file "just exists" at a known path across the entire agent lifecycle.
This is internal-API-only — no new storage primitive, no new resource type. Just bridging two existing ADK components that should already be talking to each other.
Extensions worth considering
- Allow
artifact_refs to accept user: and app: prefixed names so cross-session reference data works the same way.
- Allow
gs:// URIs for cases where the data is in GCS but hasn't been registered as an artifact yet.
- Auto-detect: parse the agent's
instruction and any callback-injected context for artifact filename mentions and auto-include them, similar to how the model already cites artifacts by name in output.
- Mirror the OpenAI ergonomic of also accepting these in the raw
execute_code call, so non-ADK users of agent_engines.sandboxes.execute_code get the same benefit. This part would live in googleapis/python-genai.
Why this matters
- Closes the gap flagged in adk-docs#368 — uploaded data files becoming unusable by the executor.
- Reuses what ADK already has (
ArtifactService + AgentEngineCodeExecutor) instead of asking developers to glue them with callbacks.
- Achieves the OpenAI
file_ids ergonomic — upload once, reference by name forever — without introducing a new storage primitive.
- Removes per-sandbox re-upload cost for the common case of agents that operate on large user-uploaded files across many turns.
References
Summary
ADK has a first-class
ArtifactService(InMemoryArtifactService,GcsArtifactService) that stores files keyed by filename, scoped to a session, with optional GCS persistence. This is conceptually equivalent to OpenAI's Files store.However, neither
AgentEngineCodeExecutornorBuiltInCodeExecutoractually consume artifacts as inputs. To use an artifact's bytes inside the executor, the developer has to write a callback that callstool_context.load_artifact(), extractsPart.inline_data, and re-injects the bytes into the executor's input on every fresh sandbox. This is the same network cost as not using the artifact service at all.The result is that ADK has the right storage abstraction but it doesn't plumb through to where code actually runs. The existing bug adk-docs#368 describes exactly this gap: an uploaded CSV saved as an artifact is invisible to
BuiltInCodeExecutor.Comparison: OpenAI Responses API
OpenAI gets this right with a clean split between a persistent file store and ephemeral containers:
The file lives in OpenAI's Files store independently of any container's lifetime and is auto-injected into the container's filesystem on every invocation. The client never re-uploads bytes. Across many container restarts, the upload cost is paid exactly once.
Proposed API
Let ADK code executors accept artifact filenames as a first-class input, and have the executor handle the load + inject step internally:
Behavior:
execute_codecall for a given sandbox, the executor callsartifact_service.load_artifact()for each listed artifact, takes its bytes, and passes them ininput_data["files"]to the sandbox.This is internal-API-only — no new storage primitive, no new resource type. Just bridging two existing ADK components that should already be talking to each other.
Extensions worth considering
artifact_refsto acceptuser:andapp:prefixed names so cross-session reference data works the same way.gs://URIs for cases where the data is in GCS but hasn't been registered as an artifact yet.instructionand any callback-injected context for artifact filename mentions and auto-include them, similar to how the model already cites artifacts by name in output.execute_codecall, so non-ADK users ofagent_engines.sandboxes.execute_codeget the same benefit. This part would live ingoogleapis/python-genai.Why this matters
ArtifactService+AgentEngineCodeExecutor) instead of asking developers to glue them with callbacks.file_idsergonomic — upload once, reference by name forever — without introducing a new storage primitive.References