Vision / Roadmap

🦊 YouSure

From pattern-based scanner to Credential Intelligence Platform

Date 2026-01-09
Origin Truth Manifesto + OOTW
Current Score 82.9 / 100
01 Detects secrets using embeddings: no regex maintenance
02 Verifies that credentials actually work: zero false positives
03 Correlates against breach databases: real-world context
04 Indexes all public code: instant breach cross-reference
05 Protects AI assistants: safety API for Copilot, Claude, etc.

The Problem We Solve 🦊

Secrets leak. We find them before attackers do.

💀 Developer Commits Secret: const API_KEY = "sk_live_abc123..."
→ 🌐 Pushed to GitHub: git push origin main
→ 👁️ Scraped by Bots: 15 seconds to detection
→ 🔥 Account Compromised: $47K AWS bill overnight

🦊 With YouSure
🔍 Detect: Embeddings catch secrets regex misses
→ ✅ Verify: Actually test if credentials work
→ 🗃️ Correlate: Check against 847M breach records
→ 🛡️ Protect: Alert in seconds, not hours

847M+ Breach Records • <100ms Detection Time • 0% False Positives • ∞ Pattern Coverage

The Five Layers 🦊

A comprehensive stack evolving from pattern detection to AI-powered credential intelligence

L5 AI Code Safety API

API for AI assistants to check code before suggesting. "Don't let Copilot suggest leaked secrets"

L4 Global Index (Shadow GitHub)

Pre-index all public repos with hashes/embeddings. When a breach happens → instant cross-reference

L3 Breach Database Integration

Check against HIBP, breach collections. "This password appeared in 847 breaches"

L2 Active Verification

Actually try credentials in a sandbox. Zero false positives: every alert is verified

L1 Embedding-Based Detection (OOTW)

Replace 58 regex patterns with learned similarity. Detects NEW formats without pattern updates

L0 Current: Pattern + Entropy + Intent

140 tests passing, 82.9/100 truth-manifesto. Production-ready foundation
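
To make the entropy part of the L0 layer concrete, here is a minimal sketch of a Shannon entropy check. It is an illustration under assumptions, not the scanner's actual code: the function names and both thresholds (16 characters, 3.5 bits per character) are hypothetical.

```typescript
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)).
function shannonEntropy(text: string): number {
  if (text.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  for (const ch in counts) {
    const p = counts[ch] / text.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

// Hypothetical thresholds, chosen for illustration only.
const MIN_LENGTH = 16;
const MIN_ENTROPY_BITS = 3.5;

// Long, random-looking tokens pass; short or repetitive strings do not.
function looksHighEntropy(token: string): boolean {
  return token.length >= MIN_LENGTH && shannonEntropy(token) >= MIN_ENTROPY_BITS;
}
```

Entropy alone flags hashes, UUIDs, and minified code as well as secrets, which is why it is paired with pattern and intent signals in this layer.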

L1 Embedding-Based Detection

Before (regex)
// Must add new pattern for sk_prod_
/sk_live_[a-zA-Z0-9]{24}/
/sk_test_[a-zA-Z0-9]{24}/
// Manual maintenance forever

After (embeddings)
train(["sk_live_abc123...", "sk_test_xyz..."]);
// 5 examples

detect("sk_prod_newformat...");
// Works! Never seen this format.

✓ No regex maintenance
✓ Detects novel formats
✓ Learns from examples
✓ Generalizes across services
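
The train/detect idea above can be sketched with a toy stand-in for a learned embedding. The assumption here is loud: a real system would use a trained model, whereas this sketch uses hand-built character trigram counts (raw plus a character-class "shape") and cosine similarity against the centroid of the examples. All names, the sample tokens, and any threshold you pick are hypothetical.

```typescript
type SparseVector = Record<string, number>;

// Collapse characters into classes so tokens with the same layout look alike.
function shapeOf(token: string): string {
  return token.replace(/[a-z]/g, "a").replace(/[A-Z]/g, "A").replace(/[0-9]/g, "9");
}

// Count every 3-character window of text under a namespaced key.
function addTrigrams(text: string, prefix: string, into: SparseVector): void {
  for (let i = 0; i + 3 <= text.length; i++) {
    const key = prefix + text.slice(i, i + 3);
    into[key] = (into[key] || 0) + 1;
  }
}

// Toy "embedding": raw trigrams plus shape trigrams (not a learned model).
function embed(token: string): SparseVector {
  const vec: SparseVector = {};
  addTrigrams(token, "raw:", vec);
  addTrigrams(shapeOf(token), "shape:", vec);
  return vec;
}

function dot(a: SparseVector, b: SparseVector): number {
  let sum = 0;
  for (const key in a) {
    if (key in b) sum += a[key] * b[key];
  }
  return sum;
}

function cosine(a: SparseVector, b: SparseVector): number {
  const denom = Math.sqrt(dot(a, a) * dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// "Training" here is just summing example vectors into a centroid.
function train(examples: string[]): SparseVector {
  const centroid: SparseVector = {};
  examples.forEach((example) => {
    const vec = embed(example);
    for (const key in vec) centroid[key] = (centroid[key] || 0) + vec[key];
  });
  return centroid;
}

// Similarity in [0, 1]; higher means "looks more like the examples".
function score(centroid: SparseVector, candidate: string): number {
  return cosine(embed(candidate), centroid);
}
```

Trained on the made-up tokens "sk_live_abc123" and "sk_test_xyz789", this scores the unseen "sk_prod_new456" well above ordinary text such as "hello world", because the layout matches even though the prefix is new. A toy like this will also over-match anything with a similar layout, which is the gap a learned model and the verification layer are meant to close.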

L2 Active Verification

Metric          | Without Verification | With Verification
Alerts          | 50 alerts            | 3 verified alerts
False Positives | 94% (47 of 50)       | 0%
Team Response   | Ignores alerts       | Every alert is actionable

When a secret is verified working, an AI agent explores blast radius: What permissions? What data is accessible? What's the worst case exploit?
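
A minimal sketch of the verification gate, assuming a per-service verifier function. The types, names, and the synchronous stub are hypothetical; a real verifier would make a read-only call to the service from an isolated sandbox and would be asynchronous.

```typescript
// Hypothetical shapes: none of these names come from YouSure itself.
interface Finding {
  service: string;
  secret: string;
  location: string;
}

// A verifier answers one question: does this credential still work?
type Verifier = (secret: string) => boolean;

// Keep only findings whose credential was confirmed live.
function verifiedOnly(findings: Finding[], verifiers: Record<string, Verifier>): Finding[] {
  return findings.filter((finding) => {
    if (!Object.prototype.hasOwnProperty.call(verifiers, finding.service)) {
      // Unknown service: the finding cannot be confirmed, so it never becomes an alert.
      return false;
    }
    return verifiers[finding.service](finding.secret);
  });
}
```

Dropping unverifiable findings is what keeps every alert actionable. The cost is that a secret for a service with no verifier is silently skipped, so a production system would likely route those to a separate, lower-priority queue.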

L3 Breach Database Integration

Cross-reference against HaveIBeenPwned, breach collections, and historical leak databases.

Password: Welcome123!
├── Found in: 847 breaches
├── First seen: LinkedIn 2012
├── Rank: #1,247 most common password
└── Risk: CRITICAL - rotate immediately
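
One way to do the password lookup without sending the password anywhere is the k-anonymity range endpoint of HIBP's Pwned Passwords service: send the first 5 hex characters of the SHA-1 hash, receive SUFFIX:COUNT lines, and match locally. The sketch below covers only the URL and the response parsing; computing the SHA-1 and making the HTTP request are left to the caller, and the function names are hypothetical. Note that the count is the number of times the password was seen in the corpus, which is not the same thing as a number of distinct breaches.

```typescript
const PWNED_RANGE_URL = "https://api.pwnedpasswords.com/range/";

// Only the first 5 hex characters of the SHA-1 ever leave the machine.
function rangeUrl(sha1Hex: string): string {
  return PWNED_RANGE_URL + sha1Hex.slice(0, 5).toUpperCase();
}

// rangeBody is the plain-text response: one "SUFFIX:COUNT" pair per line.
// Returns how many times the password was seen, or 0 if it is absent.
function countInRange(sha1Hex: string, rangeBody: string): number {
  const suffix = sha1Hex.slice(5).toUpperCase();
  const lines = rangeBody.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const parts = lines[i].trim().split(":");
    if (parts.length === 2 && parts[0].toUpperCase() === suffix) {
      return parseInt(parts[1], 10);
    }
  }
  return 0;
}
```

For example, the SHA-1 of "password" is 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8, so the request goes to the 5BAA6 range and the remaining 35 characters are matched against the response locally.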

L4 Global Code Index

Continuously clone all public GitHub repositories. Store hashes and embeddings (not actual secrets). When a breach happens:

-- Instant query, pre-computed
-- Assumes breach_data also stores an embedding column (pgvector)
SELECT g.repo, g.file, g.secret_hash
FROM global_index AS g
WHERE g.secret_hash IN (SELECT hash FROM breach_data)
   OR EXISTS (
        SELECT 1 FROM breach_data AS b
        WHERE g.embedding <-> b.embedding < 0.1  -- L2 distance
      );

-- Results in milliseconds, not hours
🔒 SHA-256 hashes (one-way)
🔒 Embeddings (not directly reversible)
🔒 Bloom filters (probabilistic)
🔒 NO actual secrets stored
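
The Bloom filter item can be sketched as follows. This is a minimal illustration, not the index's real design: the class, the FNV-1a-style hash over UTF-16 code units, and the sizes are assumptions. The keys would be hex digests of secrets, never the secrets themselves.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, with a seed mixed in.
function fnv1a(text: string, seed: number): number {
  let hash = (2166136261 ^ seed) >>> 0;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash;
}

class BloomFilter {
  private bits: boolean[];
  private size: number;
  private hashCount: number;

  constructor(size: number, hashCount: number) {
    this.size = size;
    this.hashCount = hashCount;
    this.bits = [];
    for (let i = 0; i < size; i++) this.bits.push(false);
  }

  // Double hashing: position_i = (h1 + i * h2) mod size.
  private positions(key: string): number[] {
    const h1 = fnv1a(key, 0);
    const h2 = (fnv1a(key, 0x9e3779b9) | 1) >>> 0;
    const result: number[] = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push((h1 + i * h2) % this.size);
    }
    return result;
  }

  add(key: string): void {
    this.positions(key).forEach((p) => {
      this.bits[p] = true;
    });
  }

  // false means "definitely not present"; true means "possibly present".
  mightContain(key: string): boolean {
    return this.positions(key).every((p) => this.bits[p]);
  }
}
```

A "possibly present" answer still needs confirmation against the exact hash table, since Bloom filters trade a small false-positive rate for a very compact pre-check and never produce false negatives.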

L5 AI Code Safety API

AI coding assistants are trained on GitHub code. GitHub contains leaked secrets. An assistant might therefore suggest code WITH real leaked secrets. The solution:

POST /v1/check
{
  "code": "const AWS_KEY = 'AKIAIOSFODNN7EXAMPLE';",
  "context": "code_generation"
}

RESPONSE:
{
  "safe": false,
  "reason": "Exact match to AWS key leaked in 2023 breach",
  "suggested_fix": "const AWS_KEY = process.env.AWS_ACCESS_KEY_ID;"
}
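
A minimal sketch of what the handler behind /v1/check might do, using the request and response fields shown above. Everything else is an assumption: the function name, the injected isKnownLeaked lookup (in practice this would likely be a hash lookup against the index, not a raw comparison), the string-literal regex, and the placeholder name in the suggested fix.

```typescript
interface CheckRequest {
  code: string;
  context: string;
}

interface CheckResponse {
  safe: boolean;
  reason?: string;
  suggested_fix?: string;
}

// isKnownLeaked is injected so the lookup strategy stays out of this sketch.
function checkCode(
  request: CheckRequest,
  isKnownLeaked: (token: string) => boolean
): CheckResponse {
  // Quoted literals of 16+ key-like characters (hypothetical heuristic).
  const literal = /(['"])([A-Za-z0-9_\/+=-]{16,})\1/g;
  let match = literal.exec(request.code);
  while (match !== null) {
    const quoted = match[0];
    const token = match[2];
    if (isKnownLeaked(token)) {
      return {
        safe: false,
        reason: "String literal matches a known leaked credential",
        // Placeholder variable name: the caller should pick the real one.
        suggested_fix: request.code.split(quoted).join("process.env.YOUR_SECRET_NAME"),
      };
    }
    match = literal.exec(request.code);
  }
  return { safe: true };
}
```

The sketch stops at the first leaked literal it finds; returning every match would be a small change if the calling assistant needs to rewrite several at once.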

Integration targets: GitHub Copilot • Claude (via MCP) • Cursor • CodeWhisperer • Replit • CodeSandbox • StackBlitz

Technical Architecture

🦊 YOUSURE
INGESTION: GitHub Firehose
INDEX: PostgreSQL + pgvector
API: REST / MCP
↓
DETECTION: Embeddings • Patterns • Entropy
VERIFICATION: Sandbox (E2B)
BREACH INTEL: HIBP API • Collections

What Makes Us Different 🦊

🧠 Embeddings, Not Regex
Others maintain hundreds of patterns. We learn from examples and detect secrets we've never seen before.
✅ We Actually Verify
Others flag maybes. We spin up sandboxes and test if credentials actually work. Zero false positives.
🤖 AI Safety API
No one else does this. We stop AI assistants from suggesting leaked secrets in real time via MCP.

Feature             | GitGuardian     | TruffleHog            | YouSure 🦊
Pattern detection   | ✓               | ✓                     | ✓
Entropy detection   | ✓               | ✓                     | ✓
Embedding-based     | ✗               | ✗                     | ✓
Active verification | Partial         | Built-in verification | Full sandbox
Breach correlation  | Enterprise only | ✗                     | ✓
Global index        | ✓               | ✗                     | ✓
AI safety API / MCP | ✗               | ✗                     | ✓

Business Model 🦊

🏢 Enterprise Contracts (~$100K/year)
Companies pay for continuous scanning, verified alerts, and breach correlation across their entire codebase. White-glove security.

🤖 AI Safety API / MCP (Per-call pricing)
AI coding assistants (Copilot, Claude, Cursor) call our API before suggesting code. We check for leaked secrets, breach patterns, and vulnerable code. MCP integration for Claude.

🔬 Sandbox Testing (Freemium → Paid)
We scan public GitHub repos in sandboxes, verify if secrets actually work, and notify repo owners. They sign up to see details and get future alerts.

🗄️ Continuous Monitoring (Subscription)
We store findings and continuously monitor. When new breaches drop, we cross-reference and notify affected repos instantly. Protection that gets smarter over time.

The Flywheel
Scan GitHub → Find secrets → Notify devs → They sign up → Store & monitor → Future alerts ↺

Implementation Roadmap

Phase 1: Foundation (Current)
Pattern + Entropy + Intent • 140 tests passing • 82.9/100 truth-manifesto • Correlation boost

Phase 2: Embedding Detection
Feature extraction • Train on secret corpus • Benchmark vs regex • Integrate as layer

Phase 3: Active Verification
E2B sandbox • Service verifiers • Blast radius agent • Rate limiting

Phase 4: Breach Integration
HIBP API • Hash database • Cross-reference • Real-time alerts

Phase 5: Global Index
GitHub ingestion • Hash/embedding storage • Query infrastructure • Incremental updates

Phase 6: AI Safety API
REST API design • MCP tool for Claude • Copilot/Cursor partnerships • Rate limiting & pricing

Success Metrics

False Positive Rate: ~10% → <1%
Detection Coverage: 58 patterns → Unlimited
Time to Breach Alert: Hours → Seconds
API Response Time: N/A → <100ms
Enterprise Customers: 0 → 10