SpectrAI-Initiative/S3Mem
S3Mem Harness

Portable S3Mem evidence-harness plugin for long-horizon agent trajectories.

This repository packages the reusable core of S3Mem as a trajectory-to-evidence harness rather than a benchmark-specific answerer. The goal is to make S3Mem usable from external agent systems such as OpenClaw or other custom loops that already have their own planner, policy, or LLM answerer.

What This Plugin Does

S3Mem Harness converts a trajectory into structured scene-event memories and returns a compact evidence bundle for a downstream system.

Core stages:

  1. Structured write

    • Convert each trajectory step into a structured episodic memory unit.
    • Parse visible objects, relations, events, actions, location, and inventory.
  2. Anchor-sensitive retrieval

    • Retrieve candidate memories with lexical and dense-style signals.
    • Promote query-aligned anchor steps when structured question metadata is available.
  3. Budget-aware evidence packing

    • Preserve decisive anchor steps and local neighborhoods.
    • Return a compact evidence bundle under a fixed token budget.
  4. Downstream consumption

    • The harness returns evidence.
    • Your planner / answerer / agent can decide how to use it.

This design keeps the plugin portable across agent systems.
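The budget-aware packing stage can be sketched as follows. This is an illustrative sketch, not the library's implementation: `estimate_tokens`, the 4-characters-per-token heuristic, and the neighborhood `radius` are all assumptions made for demonstration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: assume ~4 characters per token."""
    return max(1, len(text) // 4)


def pack_evidence(steps, anchor_ids, token_budget, radius=1):
    """Keep anchor steps plus their local neighborhoods, in trajectory
    order, until the token budget is exhausted.

    `steps` is a list of dicts like {"step_id": int, "text": str};
    `anchor_ids` are the step ids judged decisive for the query.
    """
    # Expand each anchor into a window of neighboring step ids.
    wanted = set()
    for a in anchor_ids:
        wanted.update(range(a - radius, a + radius + 1))

    selected, used = [], 0
    for step in steps:
        if step["step_id"] not in wanted:
            continue
        cost = estimate_tokens(step["text"])
        if used + cost > token_budget:
            break  # budget exhausted; stop packing
        selected.append(step)
        used += cost
    return selected
```

The real harness applies the same idea with its own tokenizer and ranking, but the invariant is identical: anchors and their neighbors survive; everything else is dropped first.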

What This Plugin Is Not

This repository is not the full research evaluation codebase. It intentionally excludes benchmark-specific heuristic answerers and training scripts that were tied to Crafter, Jericho, ATM-Bench, and other paper experiments.

Instead, this plugin exposes the reusable memory interface:

  • trajectory ingestion
  • structured memory serialization
  • retrieval
  • reranking
  • compact evidence packing

Repository Layout

s3mem-harness/
├── .codex-plugin/plugin.json
├── INTEGRATION.md
├── examples/
│   ├── openclaw_trajectory.json
│   ├── openclaw_real_trace_excerpt.json
│   ├── openclaw_real_question.json
│   ├── openclaw_integration_demo.py
│   └── question.json
├── src/s3mem_harness/
│   ├── __init__.py
│   ├── adapters.py
│   ├── cli.py
│   ├── harness.py
│   ├── retrieval.py
│   └── types.py
└── tests/
    └── test_harness.py

Supported Integration Modes

1. Generic trajectory adapter

Use this when your system can already emit normalized step dictionaries.

Expected step shape:

{
  "step_id": 3,
  "observation": {
    "text": "Moved to the hallway.",
    "location": "hallway",
    "inventory": ["brass_key"],
    "visible_objects": [{"category": "door"}],
    "relations": [{"src": "agent", "dst": "door", "relation": "near"}],
    "action": "MOVE",
    "event": {"event_type": "move", "arguments": {"target": "hallway"}}
  }
}
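A step of this shape maps directly onto a structured episodic memory unit. The sketch below shows one way to perform that conversion; `MemoryUnit` and its fields are hypothetical names for illustration, not the harness's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryUnit:
    # Hypothetical structured episodic unit; field names mirror the
    # generic step shape, not the library's internal types.
    step_id: int
    text: str
    location: str | None = None
    inventory: list = field(default_factory=list)
    visible_objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)
    action: str | None = None
    event: dict | None = None


def write_step(step: dict) -> MemoryUnit:
    """Convert one normalized trajectory step into a memory unit."""
    obs = step.get("observation", {})
    return MemoryUnit(
        step_id=step["step_id"],
        text=obs.get("text", ""),
        location=obs.get("location"),
        inventory=list(obs.get("inventory", [])),
        visible_objects=list(obs.get("visible_objects", [])),
        relations=list(obs.get("relations", [])),
        action=obs.get("action"),
        event=obs.get("event"),
    )
```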

2. OpenClaw-style adapter

Use this when your logs look more like a typical agent runtime trace:

{
  "step": 7,
  "observation": "You are in the lab.",
  "action": "LOOK",
  "info": {
    "location": "lab",
    "inventory": ["badge"],
    "objects": ["desk"]
  }
}

The adapter normalizes common OpenClaw-like fields:

  • step / step_id
  • observation / obs / message
  • action
  • state / info
  • inventory
  • objects
  • relations
  • location

Installation

cd s3mem-harness
python -m pip install -e ".[dev]"

Python API

One-shot usage

from s3mem_harness import S3MemHarness

harness = S3MemHarness()
harness.ingest_trajectory(
    steps=my_trajectory,
    episode_id="episode_001",
    adapter="openclaw",   # or "generic"
)

result = harness.query(
    {
        "question": "What happened one step after obtaining the brass key?",
        "metadata": {
            "answer_type": "action_after_gain_item",
            "item": "brass_key",
            "occurrence": "first",
            "delta": 1
        }
    },
    mode="s3mem",
    top_k=24,
    token_budget=768,
)

print(result.bundle.compressed_text)
print(result.bundle.selected_steps)

Persist and reload memory

from s3mem_harness import S3MemHarness

harness = S3MemHarness()
harness.ingest_trajectory(steps, episode_id="episode_001", adapter="generic")
harness.save_jsonl("memory.jsonl")

other = S3MemHarness()
other.load_jsonl("memory.jsonl")
result = other.query("Where did the agent go after taking the key?")

CLI

One-shot

s3mem-harness one-shot \
  --trajectory examples/openclaw_trajectory.json \
  --question examples/question.json \
  --adapter openclaw \
  --mode s3mem

One-shot on a real trace excerpt

The repository also includes a real OpenClaw-compatible excerpt derived from an actual ALFWorld handcoded-expert rollout.

Files:

  • examples/openclaw_real_trace_excerpt.json
  • examples/openclaw_real_question.json

Command:

s3mem-harness one-shot \
  --trajectory examples/openclaw_real_trace_excerpt.json \
  --question examples/openclaw_real_question.json \
  --adapter openclaw \
  --mode s3mem

The real excerpt is derived from:

  • benchmark: ALFWorld
  • rollout source: real_text_expert_rollout
  • policy: handcoded_expert
  • task family: pick_two_obj_and_place

This keeps the public sample realistic without depending on the full original evaluation repository.

Full integration demo

To see a complete trajectory-ingest -> evidence-bundle -> downstream-prompt flow:

python examples/openclaw_integration_demo.py

This script demonstrates:

  • loading a real OpenClaw-compatible trace excerpt
  • ingesting it with OpenClawTrajectoryAdapter
  • querying through S3MemHarness
  • constructing a downstream prompt payload for an external LLM / planner

Build an index

s3mem-harness index \
  --trajectory examples/openclaw_trajectory.json \
  --adapter openclaw \
  --output build/memory.jsonl

Query an existing memory index

s3mem-harness query \
  --memory-jsonl build/memory.jsonl \
  --question examples/question.json \
  --mode s3mem

Retrieval Modes

The harness supports three modes:

  • s3mem
    • structured retrieval + reranking + budget-aware evidence packing
  • graph_no_reader
    • structured memory text without the full S3Mem evidence harness behavior
  • vanilla_rag
    • plain summary-based retrieval

These modes are useful for integration tests and apples-to-apples harness comparisons.

Output Contract

The harness returns a QueryResult with:

  • bundle.compressed_text
  • bundle.selected_steps
  • bundle.retrieved_steps
  • bundle.support_objects
  • bundle.support_relations
  • bundle.evidence_chain

This makes it easy to wire S3Mem into downstream LLM prompts, planners, tools, or custom answerers.
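For example, a downstream prompt can be assembled from the bundle fields alone. Only the fields in the output contract above are assumed; the prompt template itself is illustrative, and the `SimpleNamespace` stand-in below merely mimics a `QueryResult` for demonstration.

```python
from types import SimpleNamespace


def build_prompt(question: str, result) -> str:
    """Assemble a downstream LLM prompt from a QueryResult-shaped object.

    Relies only on bundle.selected_steps and bundle.compressed_text
    from the output contract; the wording of the template is not part
    of the harness.
    """
    bundle = result.bundle
    lines = [
        "You are answering a question about an agent trajectory.",
        "Evidence (selected steps: %s):" % bundle.selected_steps,
        bundle.compressed_text,
        "Question: %s" % question,
    ]
    return "\n".join(lines)


# Stand-in bundle for illustration; a real QueryResult works the same way.
mock = SimpleNamespace(bundle=SimpleNamespace(
    selected_steps=[3, 4],
    compressed_text="t3: took brass_key | t4: moved to hallway",
))
prompt = build_prompt("Where did the agent go after taking the key?", mock)
```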

For a fuller integration walkthrough, see:

  • INTEGRATION.md

Why This Works As a Plugin

S3Mem in the paper contains both:

  • a reusable memory core
  • benchmark-specific answer-time logic

For external systems, the reusable part is the memory core. That is what this plugin extracts and packages. The plugin is therefore a harness:

  • it does not force a particular planner
  • it does not require a specific benchmark
  • it does not require the original paper’s heuristic answer layer

This is the correct compatibility boundary for systems such as OpenClaw.

Current Test Status

Local tests included in this repository:

  • anchor-aligned retrieval on a toy trajectory
  • OpenClaw adapter normalization
  • CLI one-shot smoke test

Run:

pytest -q

The real excerpt can also be used as a public integration sample for OpenClaw-style logs.

Notes On Compatibility

Best fit:

  • OpenClaw-like action/observation logs
  • custom game agents
  • embodied or text agents with step-structured traces
  • evaluation harnesses that want compact evidence instead of raw long context

Less ideal fit:

  • archive-style document QA with no trajectory semantics
  • applications that only need generic long-context summarization

License

MIT
