Portable S3Mem evidence-harness plugin for long-horizon agent trajectories.
This repository packages the reusable core of S3Mem as a trajectory-to-evidence harness rather than a benchmark-specific answerer. The goal is to make S3Mem usable from external agent systems such as OpenClaw or other custom loops that already have their own planner, policy, or LLM answerer.
S3Mem Harness converts a trajectory into structured scene-event memories and returns a compact evidence bundle for a downstream system.
Core stages:

1. Structured write
   - Convert each trajectory step into a structured episodic memory unit.
   - Parse visible objects, relations, events, actions, location, and inventory.
2. Anchor-sensitive retrieval
   - Retrieve candidate memories with lexical and dense-style signals.
   - Promote query-aligned anchor steps when structured question metadata is available.
3. Budget-aware evidence packing
   - Preserve decisive anchor steps and their local neighborhoods.
   - Return a compact evidence bundle under a fixed token budget.
4. Downstream consumption
   - The harness returns evidence; your planner, answerer, or agent decides how to use it.
This design keeps the plugin portable across agent systems.
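The budget-aware packing stage can be illustrated with a minimal sketch (this is not the harness's actual implementation; the function name and the whitespace token estimate are illustrative): anchor steps are packed first, then their immediate neighbors, until the token budget is exhausted.

```python
# Illustrative sketch of budget-aware evidence packing (not the actual
# S3Mem Harness implementation): anchor steps are packed first, then
# their +/-1 neighbors, until a fixed token budget is exhausted.

def pack_evidence(steps, anchor_ids, token_budget):
    """steps: list of (step_id, text) pairs; anchor_ids: decisive step ids."""
    by_id = dict(steps)

    # Anchors first, then immediate neighbors, preserving priority order.
    candidates = list(anchor_ids)
    for a in anchor_ids:
        for n in (a - 1, a + 1):
            if n in by_id and n not in candidates:
                candidates.append(n)

    selected, used = [], 0
    for sid in candidates:
        cost = len(by_id[sid].split())  # crude whitespace token estimate
        if used + cost > token_budget:
            continue
        selected.append(sid)
        used += cost
    return sorted(selected)
```

With a tight budget only the anchor survives; with more headroom the local neighborhood is kept as well.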
This repository is not the full research evaluation codebase. It intentionally excludes benchmark-specific heuristic answerers and training scripts that were tied to Crafter, Jericho, ATM-Bench, and other paper experiments.
Instead, this plugin exposes the reusable memory interface:
- trajectory ingestion
- structured memory serialization
- retrieval
- reranking
- compact evidence packing
Repository layout:

```
s3mem-harness/
├── .codex-plugin/plugin.json
├── INTEGRATION.md
├── examples/
│   ├── openclaw_trajectory.json
│   ├── openclaw_real_trace_excerpt.json
│   ├── openclaw_real_question.json
│   ├── openclaw_integration_demo.py
│   └── question.json
├── src/s3mem_harness/
│   ├── __init__.py
│   ├── adapters.py
│   ├── cli.py
│   ├── harness.py
│   ├── retrieval.py
│   └── types.py
└── tests/
    └── test_harness.py
```
Generic adapter: use this when your system can already emit normalized step dictionaries.
Expected step shape:
```json
{
  "step_id": 3,
  "observation": {
    "text": "Moved to the hallway.",
    "location": "hallway",
    "inventory": ["brass_key"],
    "visible_objects": [{"category": "door"}],
    "relations": [{"src": "agent", "dst": "door", "relation": "near"}],
    "action": "MOVE",
    "event": {"event_type": "move", "arguments": {"target": "hallway"}}
  }
}
```

OpenClaw adapter: use this when your logs look more like a typical agent runtime trace:
```json
{
  "step": 7,
  "observation": "You are in the lab.",
  "action": "LOOK",
  "info": {
    "location": "lab",
    "inventory": ["badge"],
    "objects": ["desk"]
  }
}
```

The adapter normalizes common OpenClaw-like fields:
- `step` / `step_id`
- `observation` / `obs` / `message`
- `action`
- `state` / `info`
- `inventory`
- `objects`
- `relations`
- `location`
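For illustration, the field mapping above could be sketched along these lines (a hypothetical helper, not the packaged `OpenClawTrajectoryAdapter`):

```python
# Hypothetical sketch of the field normalization described above; the
# packaged OpenClaw adapter is the authoritative implementation.

def normalize_step(raw):
    # observation may arrive under several names, as text or as a dict
    obs = raw.get("observation") or raw.get("obs") or raw.get("message") or ""
    state = raw.get("state") or raw.get("info") or {}
    return {
        "step_id": raw.get("step_id", raw.get("step")),
        "observation": {
            "text": obs if isinstance(obs, str) else obs.get("text", ""),
            "location": state.get("location"),
            "inventory": state.get("inventory", []),
            "visible_objects": [{"category": o} for o in state.get("objects", [])],
            "relations": state.get("relations", []),
            "action": raw.get("action"),
        },
    }
```

Applied to the runtime-trace example above, this yields the normalized step shape shown earlier.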
Install:

```bash
cd s3mem-harness
python -m pip install -e .[dev]
```

Quickstart:

```python
from s3mem_harness import S3MemHarness

harness = S3MemHarness()
harness.ingest_trajectory(
    steps=my_trajectory,
    episode_id="episode_001",
    adapter="openclaw",  # or "generic"
)

result = harness.query(
    {
        "question": "What happened one step after obtaining the brass key?",
        "metadata": {
            "answer_type": "action_after_gain_item",
            "item": "brass_key",
            "occurrence": "first",
            "delta": 1
        }
    },
    mode="s3mem",
    top_k=24,
    token_budget=768,
)

print(result.bundle.compressed_text)
print(result.bundle.selected_steps)
```

Persist and reload memories:

```python
from s3mem_harness import S3MemHarness

harness = S3MemHarness()
harness.ingest_trajectory(steps, episode_id="episode_001", adapter="generic")
harness.save_jsonl("memory.jsonl")

other = S3MemHarness()
other.load_jsonl("memory.jsonl")
result = other.query("Where did the agent go after taking the key?")
```

One-shot CLI:

```bash
s3mem-harness one-shot \
  --trajectory examples/openclaw_trajectory.json \
  --question examples/question.json \
  --adapter openclaw \
  --mode s3mem
```

The repository also includes a real OpenClaw-compatible excerpt derived from an actual ALFWorld handcoded-expert rollout.
Files:
- examples/openclaw_real_trace_excerpt.json
- examples/openclaw_real_question.json
Command:
```bash
s3mem-harness one-shot \
  --trajectory examples/openclaw_real_trace_excerpt.json \
  --question examples/openclaw_real_question.json \
  --adapter openclaw \
  --mode s3mem
```

The real excerpt is derived from:

- benchmark: ALFWorld
- rollout source: real_text_expert_rollout
- policy: handcoded_expert
- task family: pick_two_obj_and_place
This keeps the public sample realistic without depending on the full original evaluation repository.
To see a complete trajectory-ingest -> evidence-bundle -> downstream-prompt flow:
```bash
python examples/openclaw_integration_demo.py
```

This script demonstrates:

- loading a real OpenClaw-compatible trace excerpt
- ingesting it with OpenClawTrajectoryAdapter
- querying through S3MemHarness
- constructing a downstream prompt payload for an external LLM / planner
```bash
s3mem-harness index \
  --trajectory examples/openclaw_trajectory.json \
  --adapter openclaw \
  --output build/memory.jsonl

s3mem-harness query \
  --memory-jsonl build/memory.jsonl \
  --question examples/question.json \
  --mode s3mem
```

The harness supports three modes:

- s3mem: structured retrieval + reranking + budget-aware evidence packing
- graph_no_reader: structured memory text without the full S3Mem evidence harness behavior
- vanilla_rag: plain summary-based retrieval
These modes are useful for integration tests and apples-to-apples harness comparisons.
The harness returns a QueryResult with:
- bundle.compressed_text
- bundle.selected_steps
- bundle.retrieved_steps
- bundle.support_objects
- bundle.support_relations
- bundle.evidence_chain
This makes it easy to wire S3Mem into downstream LLM prompts, planners, tools, or custom answerers.
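One way to do that wiring is sketched below; `bundle` is a plain dict standing in for `result.bundle`, and `build_prompt` is a hypothetical helper, not part of the harness API. The field names mirror the QueryResult fields listed above.

```python
# Sketch of wiring an evidence bundle into a downstream LLM prompt.
# `bundle` is a plain dict standing in for result.bundle; `build_prompt`
# is an illustrative helper, not part of the s3mem_harness API.

def build_prompt(question, bundle):
    lines = [
        "Answer using only the evidence below.",
        "",
        "Evidence:",
        bundle["compressed_text"],
        "",
        "Selected steps: " + ", ".join(str(s) for s in bundle["selected_steps"]),
        "",
        "Question: " + question,
    ]
    return "\n".join(lines)
```

The resulting string can be handed to whatever answerer your system already uses; the harness deliberately stops at the evidence boundary.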
For a fuller integration walkthrough, see INTEGRATION.md.
S3Mem in the paper contains both:
- a reusable memory core
- benchmark-specific answer-time logic
For external systems, the reusable part is the memory core. That is what this plugin extracts and packages. The plugin is therefore a harness:
- it does not force a particular planner
- it does not require a specific benchmark
- it does not require the original paper’s heuristic answer layer
This is the correct compatibility boundary for systems such as OpenClaw.
Local tests included in this repository:
- anchor-aligned retrieval on a toy trajectory
- OpenClaw adapter normalization
- CLI one-shot smoke test
Run:
```bash
pytest -q
```

The real excerpt can also be used as a public integration sample for OpenClaw-style logs.
Best fit:
- OpenClaw-like action/observation logs
- custom game agents
- embodied or text agents with step-structured traces
- evaluation harnesses that want compact evidence instead of raw long context
Less ideal fit:
- archive-style document QA with no trajectory semantics
- applications that only need generic long-context summarization
MIT