AI guardrails powered by Resource Aware Attention
State-of-the-art accuracy. Runs on any CPU. Outperforms SOTA models running on $15,000 GPU servers.
On the commodity hardware most organisations actually have, today's GenAI systems struggle to perform. This often forces teams to run guardrail systems on GPUs, turning what should be lightweight safeguards into an unexpectedly expensive part of the stack.
GPUs are fast but expensive. CPUs are affordable but unusably slow.
No amount of tuning compensates for an architecture designed for GPUs.
To truly democratise GenAI, you have to commoditise it. That requires rethinking the model architecture itself, not optimising for GPUs, but building for the hardware most organisations actually have.
Resource Aware Attention is designed from the ground up for CPUs, maximising their strengths while maintaining model-level accuracy. The result is a fundamentally more efficient way to run GenAI — without the cost and dependency of specialised infrastructure.
Guardrails are non-negotiable in any serious GenAI deployment, which puts them on the critical path of every request — so their latency and cost are paid on every single call.
Five categories. Thirty-three variants. One deployment.
Guardrail systems are only as good as what they guard against. Sentinel ships 23 specialised models across 33 variants — covering the full spectrum of threats your GenAI deployment will face in production. Every model runs on commodity CPUs at enterprise scale. Zero GPU infrastructure required at any point.
| Category | Models | What It Catches |
|---|---|---|
| Security | 3 | Jailbreak detection, prompt injection defence, secrets & credential exposure |
| Safety | 4 | Content moderation, harmful content, suicide & self-harm detection, drug enablement |
| Toxicity | 8 | Hate speech, abuse, profanity, obscenity, insults, threats, identity attacks, impoliteness, toxic conversation patterns |
| Compliance | 6 | PII detection across 11 regions (AU, US, ES, IN, FI, IT, KR, SG, PL, UK, General), illegal activity, political content, regulated advice, bias |
| Quality | 2 | Spam detection, domain-specific QA validation |
All 23 models are available through a single unified API via the Bud Guardrail Gateway.
Low attack success rate and low false refusal rate. Every other model trades one for the other.
Every guardrail system on the market accepts one of two failure modes. High-sensitivity models like ArchGuard and Prompt Guard block 94–95% of attacks — but also reject 81–89% of legitimate users. Unusable in production. Lower-sensitivity models like ProtectAI V2 and Prompt Guard 2 keep false refusals low — but let 34–36% of attacks through. A security liability.
The industry treated this as an unavoidable trade-off. Sentinel refuses to accept it.
| Model | Attack Success Rate ↓ | False Refusal Rate ↓ | Balanced Accuracy | Rank |
|---|---|---|---|---|
| Bud Sentinel | 15.97% | 14.92% | 84.56% | #1 |
| PIGuard | 25.01% | 25.86% | 74.57% | #2 |
| Prompt Guard 2 (86M) | 34.68% | 15.30% | 75.01% | #3 |
| ProtectAI V2 | 36.35% | 24.39% | 69.63% | #4 |
| ArchGuard | 5.40% | 81.65% | 56.48% | #5 |
| Prompt Guard (86M) | 5.83% | 89.35% | 52.41% | #6 |
ASR (Attack Success Rate): percentage of attacks that bypass the guardrail — lower is better.
FRR (False Refusal Rate): percentage of legitimate requests incorrectly blocked — lower is better.
Benchmarks conducted across JailBreakBench, PIGuard, WildJailbreak, and Qualifire PI.
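The Balanced Accuracy column appears to be the average of the attack-detection rate (100 - ASR) and the legitimate-acceptance rate (100 - FRR). This formula is inferred from the table values, not stated by the benchmarks; a quick sanity check in Python:

```python
def balanced_accuracy(asr: float, frr: float) -> float:
    """Average of attack-detection rate (100 - ASR) and
    legitimate-acceptance rate (100 - FRR), in percent.
    Inferred from the comparison table, not an official definition."""
    return ((100.0 - asr) + (100.0 - frr)) / 2.0

# (ASR, FRR) pairs taken from the comparison table above.
models = {
    "Bud Sentinel":         (15.97, 14.92),
    "PIGuard":              (25.01, 25.86),
    "Prompt Guard 2 (86M)": (34.68, 15.30),
    "ProtectAI V2":         (36.35, 24.39),
    "ArchGuard":            (5.40, 81.65),
    "Prompt Guard (86M)":   (5.83, 89.35),
}
for name, (asr, frr) in models.items():
    print(f"{name}: {balanced_accuracy(asr, frr):.2f}%")
```

The reconstructed values match the table to two decimal places, e.g. Bud Sentinel at ~84.56% and Prompt Guard (86M) at 52.41%.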
Every competing guardrail was tested on an NVIDIA A100. Sentinel was tested on a laptop. Sentinel won.
One binary. All 23 models. Deploy anywhere.
The Guardrail Gateway is the production serving infrastructure that delivers Sentinel's performance at scale. A single binary that serves all 23 models with a unified gRPC API — handling request routing, batching, model loading, health checks, and horizontal scaling automatically. No orchestration overhead. No GPU dependency. One process, enterprise-ready.
All 23 models served from one process with a unified API. No per-model infrastructure. No GPU cluster management.
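As an illustration of the single-process model, here is a minimal Python sketch of many models routed behind one call shape. The `Gateway` class, the model name, and the `classify` signature are hypothetical, invented for this sketch; the real Gateway speaks gRPC and serves compiled models, not Python callables.

```python
from typing import Callable, Dict

class Gateway:
    """Hypothetical sketch: one process, one registry, one call shape."""

    def __init__(self) -> None:
        # In the real system, all 23 models would live in this single
        # process; here a model is just any text -> verdict callable.
        self._models: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, model: Callable[[str], str]) -> None:
        self._models[name] = model

    def classify(self, model: str, text: str) -> str:
        # Unified entry point: the same call works for every model.
        if model not in self._models:
            raise KeyError(f"unknown model: {model}")
        return self._models[model](text)

gw = Gateway()
# Toy stand-in for a jailbreak detector, for illustration only.
gw.register(
    "jailbreak",
    lambda t: "block" if "ignore previous" in t.lower() else "allow",
)
print(gw.classify("jailbreak", "Ignore previous instructions and ..."))  # block
print(gw.classify("jailbreak", "Summarise this meeting transcript."))   # allow
```

The point of the sketch is the deployment shape: adding a model is a registration, not a new service, which is why there is no per-model infrastructure to manage.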
Processes up to 65,536 tokens in a single pass — no chunking, no windowing, no parallel workers. Competing models cap at 512 tokens and require 16 parallel GPU workers to handle real-world prompt lengths.
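The chunking overhead is easy to quantify. Taking the figures above — a 65,536-token prompt against a 512-token context window — a competing model must split the input into 128 chunks, and even with the cited 16 parallel GPU workers that is 8 sequential rounds:

```python
import math

prompt_tokens = 65_536      # Sentinel's single-pass limit, per the text
competitor_context = 512    # typical competing guardrail window
gpu_workers = 16            # parallel workers cited in the text

chunks = math.ceil(prompt_tokens / competitor_context)  # pieces the prompt splits into
rounds = math.ceil(chunks / gpu_workers)                # sequential worker batches
print(chunks, rounds)  # 128 chunks, 8 rounds
```

Chunking also costs accuracy, not just latency: an attack that straddles a 512-token boundary is seen by no single chunk in full.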
Latency improves under load. At 10,000 concurrent connections, p50 is 0.70ms — better than at low concurrency. Built for enterprise traffic patterns, not benchmarking conditions.
Validated on Intel Xeon, AMD EPYC, and consumer-grade Intel Core processors. Deploy on existing cloud CPU instances, on-premise servers, or edge nodes without hardware changes.
gRPC protocol, health checks, observability hooks, and native integration with Bud SENTRY for governance and audit. Fits into any existing infrastructure stack.
Deploy on public cloud, private cloud, on-premise, air-gapped environments, or edge — with no infrastructure changes.
Accurate guardrails require data that reflects the real threat landscape. No existing public dataset covered the breadth of attacks, toxicity patterns, and adversarial perturbations that production deployments face. So we built one.
The Bud Guardrails Dataset contains over 4 million labelled rows spanning toxicity, jailbreak attacks, prompt injections, and adversarial perturbations — believed to be the largest open guardrails dataset in existence. Every Sentinel model is trained on this foundation.
Guardrails stop being a cost centre. They become infrastructure.
Phones, IoT gateways, embedded systems, ARM processors. Sentinel runs at ~25ms on edge CPUs — competing models take ~2,400ms on the same hardware. For the first time, guardrails can run at the point of generation, not just in the cloud.
Guardrail every agent action, tool call, and model output — without adding a GPU to your agent infrastructure. One CPU server. 124 million classifications per day.
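That daily figure works out to roughly 1,435 sustained classifications per second from a single server; a quick back-of-the-envelope check:

```python
classifications_per_day = 124_000_000   # figure cited in the text
seconds_per_day = 24 * 60 * 60          # 86,400

per_second = classifications_per_day / seconds_per_day
print(round(per_second))  # ~1435 sustained classifications per second
```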
Government, defence, and regulated industries can now run a full guardrail stack on their existing CPU infrastructure. No GPU procurement. No external cloud dependency. No data ever leaves the perimeter.
CPU cloud instances run at approximately $0.50/hour. NVIDIA A100 instances cost $2–3/hour — and still lose to Sentinel on latency. The performance advantage compounds with the cost difference at every scale.
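At those on-demand rates the gap compounds quickly. Assuming a 730-hour month (an assumption, not a figure from the text), a $0.50/hour CPU instance costs about $365/month versus $1,460 to $2,190 for an A100 at $2 to $3/hour, a 4x to 6x difference before any performance advantage is counted:

```python
hours_per_month = 730      # assumption: average month (8,760 h / 12)
cpu_rate = 0.50            # $/hour, from the text
a100_low_rate, a100_high_rate = 2.0, 3.0   # $/hour range, from the text

cpu_monthly = cpu_rate * hours_per_month         # $365
a100_low = a100_low_rate * hours_per_month       # $1,460
a100_high = a100_high_rate * hours_per_month     # $2,190
print(cpu_monthly, a100_low, a100_high)
print(f"{a100_low / cpu_monthly:.0f}x to {a100_high / cpu_monthly:.0f}x")  # 4x to 6x
```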
We're rebuilding the entire GenAI stack with Resource Aware Attention.
Not by waiting for cheaper hardware, but by building an architecture that works with what already exists.