staff-engineering-skills

A collection of hard-won systems engineering knowledge -- the traps you walk into when you don't think about cardinality, idempotency, cache invalidation, and the other concepts that only matter at scale. Designed for Claude Code, Cursor, and any agent that writes infrastructure code on your behalf.

Install

npx skills add triggerdotdev/staff-engineering-skills

What This Is

AI coding agents write code that works on day one and breaks on day ninety. They generate solutions that pass tests with small datasets, handle the happy path gracefully, and completely fall apart under real production conditions.

These skills inject systems engineering knowledge directly into the agent's context at the moment it matters -- when it's making an architectural decision, writing a database query, designing a queue consumer, or choosing how to cache something.

Each skill activates contextually based on what the agent is doing. No slash commands, no manual invocation. The agent writes code that touches a cache, and the cache invalidation skill is there. The agent adds retry logic, and the retry storms skill is there.

Each skill covers:

  • Detection Heuristics -- patterns to watch for in code ("if you see X, stop and think")
  • Correct Patterns -- what to do instead, with code examples
  • Anti-Patterns -- what not to do, with explanations of why
  • Related Traps -- links to other skills that interact with this one
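To make the anti-pattern / correct-pattern pairing concrete, here is a minimal sketch in the spirit of the race conditions trap ("creates records that should be unique"). It is not taken from any skill file: the `users` table, the function names, and the use of SQLite are all assumptions chosen so the example runs standalone.

```python
import sqlite3


def make_db():
    # Hypothetical schema: the UNIQUE constraint is what makes the atomic version safe.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")
    return conn


def create_user_check_then_act(conn, email):
    # Anti-pattern: a second writer can insert the same email between
    # the SELECT and the INSERT, so the check proves nothing under concurrency.
    row = conn.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone()
    if row is not None:
        return False
    with conn:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return True


def create_user_atomic(conn, email):
    # Correct pattern: attempt the write and let the database enforce uniqueness.
    try:
        with conn:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False
```

The tradeoff: the atomic version pushes the decision into the database, so it depends on the constraint actually existing in every environment.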

Traps

| # | Trap | One-liner |
| --- | --- | --- |
| 01 | Cardinality | You stored one thing per user. Now you have a million users. |
| 02 | Denormalization | You optimized for reads. Now your writes are a nightmare. |
| 03 | Streams vs Batch | You processed everything in a loop. Now you need it in real-time. |
| 04 | Object Store as Database | S3 is a database now. But only if you use conditional writes. |
| 05 | Race Conditions | It works every time. Except when two things happen at once. |
| 06 | Idempotency | The request was sent twice. Now the customer was charged twice. |
| 07 | Sharding | One database was fine. Now you need twelve and a routing layer. |
| 08 | Consistency Models | You read your own write. Except you didn't. |
| 09 | Distributed System Fallacies | The network is reliable. Except it isn't. |
| 10 | Cache Invalidation | You cached it for performance. Now it's stale and no one knows. |
| 11 | Memory Leaks | It runs fine for a day. On day seven the OOM killer visits. |
| 12 | Backpressure | You can produce faster than you can consume. Now what? |
| 13 | Thundering Herd | The cache expired. Ten thousand requests hit the database at once. |
| 14 | Hot Partitions | You sharded perfectly. One shard is on fire. |
| 15 | Retry Storms | The service is slow. Your retries made it slower. |
| 16 | Clock Skew | You used timestamps for ordering. Time disagreed. |
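As one illustration of a one-liner above, the idempotency trap ("The request was sent twice. Now the customer was charged twice.") can be sketched with an idempotency key. This is a simplified, in-memory example, not the content of the skill itself; `PaymentService`, its fields, and the key format are hypothetical.

```python
class PaymentService:
    """Toy payment service that deduplicates requests by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first attempt
        self.charges = []   # side effects actually performed

    def charge(self, idempotency_key, customer_id, amount_cents):
        # A retried request with the same key gets the stored result
        # instead of triggering a second charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append((customer_id, amount_cents))
        result = {"charge_id": len(self.charges), "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result
```

In a real system the key lookup and the charge would need to be recorded atomically in a durable store; a plain dictionary only works in a single process, and a naive check-then-write against shared storage would reintroduce the race conditions trap.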

How It Works

Each skill is a SKILL.md file with YAML frontmatter containing a name and description. The description tells the agent runtime when to activate the skill -- no manual invocation needed.

For example, the race conditions skill activates when the agent writes code that "reads then writes shared state, checks a condition then acts on it, creates records that should be unique, updates counters or balances, transitions status fields, or handles webhook/queue retries."
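A rough sketch of what reading such a file might look like follows. The sample file content is an assumption: the `name` value and the exact wording of the `description` are made up for illustration (the description is adapted from the quote above), and the parser only handles flat `key: value` lines. A real runtime would more likely use a full YAML parser.

```python
# Hypothetical SKILL.md content; the real files may differ.
SAMPLE_SKILL = """\
---
name: race-conditions
description: Use when code reads then writes shared state, checks a condition then acts on it, or creates records that should be unique.
---
# Race Conditions

Detection heuristics go here.
"""


def parse_frontmatter(text):
    """Split a document into (metadata dict, body) using '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter after the opening one.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


meta, body = parse_frontmatter(SAMPLE_SKILL)
```

Here `meta["description"]` is the text an agent runtime would match against the current task to decide whether to load the skill body.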

Skills reference each other. Sharding links to hot partitions. Cache invalidation links to thundering herd. Retry storms links to idempotency. The skills form a graph, not a list.
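The graph framing can be sketched as an adjacency map plus a traversal. Only the three links named above are included; the identifiers and the `related_closure` helper are hypothetical, and the actual skills may link to each other in more ways than shown here.

```python
from collections import deque

# Edges taken from the three examples in the text; the real graph may be larger.
RELATED = {
    "sharding": ["hot-partitions"],
    "cache-invalidation": ["thundering-herd"],
    "retry-storms": ["idempotency"],
}


def related_closure(start, graph):
    """Return every skill reachable from `start`, in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order
```

A traversal like this is one way a tool could surface related traps once a single skill activates. The visited set keeps it safe if two skills reference each other.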

Suggested Additions

Candidates that didn't make the initial cut:

  • Connection Pool Exhaustion -- running out of database connections under load
  • Schema Migrations at Scale -- ALTER TABLE on a billion-row table
  • Observability Cardinality -- your metrics have more unique label combinations than data points
  • Poison Pill Messages -- one bad message in a queue blocks everything behind it
  • Lease/Lock Expiry -- your distributed lock expired while you were still holding it

Structure

staff-engineering-skills/
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── .github/                                       # Issue and PR templates
└── skills/                                        # Skill files (what gets installed)
    ├── staff-engineering-skills-cardinality/
    │   └── SKILL.md
    ├── staff-engineering-skills-denormalization/
    │   └── SKILL.md
    ├── staff-engineering-skills-streams-vs-batch/
    │   └── SKILL.md
    ├── staff-engineering-skills-object-store-as-database/
    │   └── SKILL.md
    ├── staff-engineering-skills-race-conditions/
    │   └── SKILL.md
    ├── staff-engineering-skills-idempotency/
    │   └── SKILL.md
    ├── staff-engineering-skills-sharding/
    │   └── SKILL.md
    ├── staff-engineering-skills-consistency-models/
    │   └── SKILL.md
    ├── staff-engineering-skills-distributed-system-fallacies/
    │   └── SKILL.md
    ├── staff-engineering-skills-cache-invalidation/
    │   └── SKILL.md
    ├── staff-engineering-skills-memory-leaks/
    │   └── SKILL.md
    ├── staff-engineering-skills-backpressure/
    │   └── SKILL.md
    ├── staff-engineering-skills-thundering-herd/
    │   └── SKILL.md
    ├── staff-engineering-skills-hot-partitions/
    │   └── SKILL.md
    ├── staff-engineering-skills-retry-storms/
    │   └── SKILL.md
    └── staff-engineering-skills-clock-skew/
        └── SKILL.md

Design Principles

  1. Detection over prevention. Skills teach agents to recognize when they're about to walk into a trap. "If you see X, stop and think about Y."
  2. Concrete over abstract. Real code patterns, real failure modes, real numbers. No hand-waving about "consider scalability."
  3. Composable. Traps reference each other. The skills form a dependency graph that mirrors how these problems interact in production.
  4. Agent-native. Written for LLM context windows. Concise, pattern-matchable, with clear "if you see X, do Y" heuristics.
  5. Tradeoffs, not rules. Every pattern has a tradeoff callout. There are no silver bullets, only informed decisions.

About

Skills that give AI coding agents staff-engineer instincts: recognizing and avoiding production failure modes like cardinality, idempotency, and race conditions.
