Tool poisoning
Hidden "ignore previous instructions," covert directives, and smuggled tags that hijack the agent from inside a skill's text.
Connect your open-source agent to one endpoint, and it gains a curated, cryptographically-signed, sandboxed set of skills — without poisoning it.
py -m warden serve · pure standard library · zero dependencies ·
nothing leaves your box
We don't own directory size. We own trust + curation. Directory figures (mcp.so ~20k, Glama 6k+) are cited from the project plan's sources, not independently verified.
Real CLI output, ~15 seconds — a curated skill passes, a poisoned one is rejected, a tampered one won't run. No mock-ups.
The one finding the whole project turns on: verification of identity is not verification of behavior. A "verified author" badge can still turn malicious on its next update.
Hidden "ignore previous instructions," covert directives, and smuggled tags that hijack the agent from inside a skill's text.
Ship a benign skill, earn trust, then quietly swap in malice on a later version. Identity stays "verified" the whole time.
Read your environment or credentials and ship them out — often in the same breath, behind an innocent-looking task.
A manifest that declares "no network" wrapped around a skill that actually phones home. Warden reconciles the claim against the content.
Built to the OWASP Agentic Skills Top 10. Not a silver bullet — defense in depth, so one failure is contained and visible rather than silent.
You connect to a hash, not a name. Ed25519-signed (real RFC 8032). Change one byte and verification fails — no rug-pull.
Tool-poisoning, unsafe-exec, SSRF, secret-exfil, obfuscation, and capability drift — caught at the door.
Each skill declares exactly what it may touch. "No network" means it cannot phone home. Anything undeclared is denied.
Skills run inside a declared profile, never the agent's process. A poisoned skill is contained.
Per-version and time-aware — re-publishing re-evaluates. A signed skill can lose trust. Not a static badge.
Append-only, hash-linked, Merkle-rooted. Every publish and yank is permanent and auditable. Nothing changes silently.
Python 3.8+ (on Windows use the py launcher). No pip install — pure standard library, zero dependencies.
git clone https://github.com/chadcorp/warden && cd warden
py -m warden keygen # your curator key (root of trust)
py -m warden sign-all # scan + sign + log every skill
py -m warden verify-all # cold-verify: hash, sig, scan, score, log
// claude_desktop_config.json — the one config line
{
"mcpServers": {
"warden": {
"command": "py",
"args": ["-m", "warden", "serve"],
"cwd": "/path/to/warden"
}
}
}
…or drive it yourself with py examples/mcp_client_smoke.py, and sanity-check the whole stack with py -m warden selftest → 75/75. Full quickstart on GitHub →
Real output from the reference scanner and verifier — not mock-ups. Toggle the scanner between a curated and a poisoned skill, then tamper the bundle and watch the rug-pull get caught.
research-brain/idea-scoutsha256:208b0208cd3c58a9…sha256:208b0208cd3c58a9…Five curated skills, three packs. No vanity scores — the badge tells the truth about
each one. secret-sentinel is a C on purpose.
research-brain/idea-scoutFind and score a net-new idea before building — evidence gates and a mandatory pre-mortem.
no network · no filesystem · no shell · no secretsresearch-brain/fact-gateVerify every factual, legal, or financial claim against a dated primary source before it ships.
no network · no filesystem · no shell · no secretsbuild-brain/build-productTurn a validated idea into a complete, verified, shippable product. Still accruing clean observation.
no network · no filesystem · no shell · no secretsbuild-brain/ship-gateAn independent GO/NO-GO release gate that trusts no build claim and blocks on unwaivable conditions.
no network · no filesystem · no shell · no secretscompliance-brain/secret-sentinelA security-review skill that must name attack indicators — so it ships a signed, logged scanner waiver, and the score reflects the honest cost.
no network · no filesystem · no shell · no secretsThe durable moat is curation + the behavioral trust score + local-first — not signing alone, which becomes table stakes if it lands in the official registry. Eyes open.
Trust is a signal, not a guarantee. We never claim "100% safe."
The core is free, forever — no install wall, no account. Join the list and you'll get the launch, the build log, and one short write-up per attack class — tool-poisoning, capability drift, rug-pulls — as the supply-chain hits keep landing. A handful of emails, no spam, unsubscribe in one click.
Prefer to watch the code? Star it on GitHub — stars are the other half of our launch gate. Want the deep dive today? Read the Field Guide ↗