Introducing

Bud Sentinel

AI guardrails powered by Resource Aware Attention

State-of-the-art accuracy. Runs on any CPU. Performance that surpasses SOTA models running on $15,000 GPU servers.

CPU-Native · SOTA Accuracy · Zero GPU Cost
Why Sentinel?

Today's GenAI is
built for GPUs.

On the commodity hardware most organizations actually have, today's GenAI systems struggle to perform. This often forces teams to run guardrail systems on GPUs, turning what should be lightweight safeguards into an unexpectedly expensive part of the stack.

Costly
Hard to Scale
Inaccessible
GPU

GPUs are fast but expensive. CPUs are affordable but unusably slow.

CPU · 16 Workers
~5s
Latency at 8,000 input tokens
Affordable hardware - unusable speed
GPU · A100 ($15K)
~300ms
Fast enough - but the guardrail costs
as much as the language model itself
The Optimization Wall

You can optimize the runtime.
But returns diminish.

No amount of tuning compensates for an architecture designed for GPUs.

Optimization Effort → Performance Diminishing Returns
The Solution

Resource Aware Attention

To truly democratise GenAI, you have to commoditise it. That requires rethinking the model architecture itself, not optimising for GPUs, but building for the hardware most organisations actually have.

Resource Aware Attention is designed from the ground up for CPUs, maximising their strengths while maintaining model-level accuracy. The result is a fundamentally more efficient way to run GenAI — without the cost and dependency of specialised infrastructure.

First Application

We built Sentinel with
Resource Aware Attention

Guardrails are non-negotiable in any serious GenAI deployment, and they sit on the critical path of every request: every input and every output must pass through them.

User Request
Authentication
⛨ Input Guardrail
Language Model
⛨ Output Guardrail
Response
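The critical-path flow above can be sketched in a few lines. Every name here is an illustrative stand-in, not the actual Sentinel API:

```python
# Illustrative sketch of the request path above; all functions are
# hypothetical stand-ins, not the Sentinel API.
def authenticate(_request: str) -> bool:
    return True  # stand-in auth check

def input_guardrail(text: str) -> bool:
    # Stand-in classifier: flag an obvious injection phrase.
    return "ignore previous instructions" not in text.lower()

def output_guardrail(_text: str) -> bool:
    return True  # stand-in: accept everything

def language_model(prompt: str) -> str:
    return f"echo: {prompt}"  # stand-in model

def handle_request(user_input: str) -> str:
    # Guardrails sit on the critical path: their latency is paid twice,
    # once before the model runs and once after.
    if not authenticate(user_input):
        return "unauthenticated"
    if not input_guardrail(user_input):
        return "blocked: input guardrail"
    draft = language_model(user_input)
    if not output_guardrail(draft):
        return "blocked: output guardrail"
    return draft
```

Because both checks run inline, any latency they add is multiplied across every request the deployment serves.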
Coverage

23 Models. Every Threat.

Five categories. Thirty-three variants. One deployment.

Guardrail systems are only as good as what they guard against. Sentinel ships 23 specialised models across 33 variants — covering the full spectrum of threats your GenAI deployment will face in production. Every model runs on commodity CPUs at enterprise scale. Zero GPU infrastructure required at any point.

Category · Models · What It Catches
Security · 3 · Jailbreak detection, prompt injection defence, secrets & credential exposure
Safety · 4 · Content moderation, harmful content, suicide & self-harm detection, drug enablement
Toxicity · 8 · Hate speech, abuse, profanity, obscenity, insults, threats, identity attacks, impoliteness, toxic conversation patterns
Compliance · 6 · PII detection across 11 regions (AU, US, ES, IN, FI, IT, KR, SG, PL, UK, General), illegal activity, political content, regulated advice, bias
Quality · 2 · Spam detection, domain-specific QA validation

All 23 models are available through a single unified API via the Bud Guardrail Gateway.

Accuracy

More Accurate Than Leading
Guardrail Systems

Low attack success rate and low false refusal rate. Every other model trades one for the other.

Every guardrail system on the market accepts one of two failure modes. High-sensitivity models like ArchGuard and Prompt Guard block 94–95% of attacks — but also reject 81–89% of legitimate users. Unusable in production. Lower-sensitivity models like ProtectAI V2 and Prompt Guard 2 keep false refusals low — but let 34–36% of attacks through. A security liability.

The industry treated this as an unavoidable trade-off. Sentinel refuses to accept it.

Benchmark Results: Overall Performance
Model · Attack Success Rate · False Refusal Rate · Balanced Accuracy · Rank
Bud Sentinel · 15.97% · 14.92% · 84.56% · #1
PIGuard · 25.01% · 25.86% · 74.57% · #2
Prompt Guard 2 (86M) · 34.68% · 15.30% · 75.01% · #3
ProtectAI V2 · 36.35% · 24.39% · 69.63% · #4
ArchGuard · 5.40% · 81.65% · 56.48% · #5
Prompt Guard (86M) · 5.83% · 89.35% · 52.41% · #6
Lower is better for ASR and FRR. Higher is better for Balanced Accuracy.

ASR (Attack Success Rate): percentage of attacks that bypass the guardrail — lower is better.

FRR (False Refusal Rate): percentage of legitimate requests incorrectly blocked — lower is better.

Benchmarks conducted across JailBreakBench, PIGuard, WildJailbreak, and Qualifire PI.
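The Balanced Accuracy column follows arithmetically from the other two, assuming it is defined as the mean of the attack-block rate (1 − ASR) and the legitimate-acceptance rate (1 − FRR). A quick check against the table:

```python
# Recompute the Balanced Accuracy column from ASR and FRR, assuming it
# is the mean of (1 - ASR) and (1 - FRR).
def balanced_accuracy(asr: float, frr: float) -> float:
    return ((1 - asr) + (1 - frr)) / 2

table = {  # model: (ASR, FRR) from the benchmark above
    "Bud Sentinel":         (0.1597, 0.1492),
    "PIGuard":              (0.2501, 0.2586),
    "Prompt Guard 2 (86M)": (0.3468, 0.1530),
    "ProtectAI V2":         (0.3635, 0.2439),
    "ArchGuard":            (0.0540, 0.8165),
    "Prompt Guard (86M)":   (0.0583, 0.8935),
}

for model, (asr, frr) in table.items():
    print(f"{model}: {100 * balanced_accuracy(asr, frr):.2f}%")
```

Under that definition, every row of the table reproduces to within rounding.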

Performance

Faster on any CPU
than competitors on a $15K GPU.

Every competing guardrail was tested on an NVIDIA A100. Sentinel was tested on a laptop. Sentinel won.

Sentinel on a laptop vs. everyone else on an A100
Per-classification latency · 512 tokens · Sentinel includes gRPC network overhead
Bud Sentinel
i7 Laptop · CPU
8.39ms
Prompt Guard 2
A100 · $15K GPU
18.52ms
ArchGuard
A100 · $15K GPU
19.07ms
PIGuard
A100 · $15K GPU
19.00ms
↑ 2.3x faster on a laptop CPU than competitors on a $15,000 GPU
Same CPU hardware. Completely different architecture.
Per-classification latency on Intel Xeon 8272CL · 512 tokens
Bud Sentinel
Xeon 8272CL
5.99ms
Prompt Guard 2
Xeon 8272CL
334ms
56x slower
ArchGuard
Xeon 8272CL
380ms
63x slower
Prompt Guard
Xeon 8272CL
402ms
67x slower
↑ 56-67x faster on the exact same hardware
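The "x slower" multipliers above are straight ratios of the quoted per-classification latencies on the same Xeon 8272CL:

```python
# Slowdown multipliers are ratios of the quoted latencies (ms) on the
# same Xeon 8272CL.
sentinel_ms = 5.99
competitors_ms = {"Prompt Guard 2": 334, "ArchGuard": 380, "Prompt Guard": 402}

for name, ms in competitors_ms.items():
    print(f"{name}: {ms / sentinel_ms:.0f}x slower")
```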
Production-grade throughput
Sentinel handles enterprise traffic on commodity CPUs alone - no GPUs in the loop.
Xeon 8592+
256 vCPU · 8K Tokens
1,432 req/s
p50: 0.70ms
16K tokens · 761 req/s
65K tokens · 89 req/s
Xeon 8272CL
16 vCPU · 512 Tokens
2,749 req/s
p50: 25.17ms
8K tokens · 508 req/s
65K tokens · 101 req/s
EPYC 9V74
16 vCPU · 8K Tokens
57 req/s
p50: 17.60ms
16K tokens · 29 req/s
65K tokens · 6 req/s
The bottom line
Sentinel redefines what hardware you need for guardrails.
Bud Sentinel
Any laptop CPU
CPU
8.39 ms
512 tokens · incl. network overhead · no GPU
Leading Guardrails
NVIDIA A100 · $15,000
GPU
~18.9 ms
max sequence 512 tokens · needs 16 parallel workers for 8K
Bud Sentinel
Xeon 8592+ · Server
Server
0.70 ms
8K tokens · 5,000 concurrent · 1,432 req/s
Leading Guardrails
Same CPU · No GPU
CPU
~845 ms
512 tokens only · 100x slower on identical hardware
Requires a $15K GPU to reach 18ms
Infrastructure

The Bud Guardrail Gateway

One binary. All 23 models. Deploy anywhere.

The Guardrail Gateway is the production serving infrastructure that delivers Sentinel's performance claims at scale. A single binary that serves all 23 models with a unified gRPC API — handling request routing, batching, model loading, health checks, and horizontal scaling automatically. No orchestration overhead. No GPU dependency. One process, enterprise-ready.

Single binary deployment

All 23 models served from one process with a unified API. No per-model infrastructure. No GPU cluster management.

Native long-context support

Processes up to 65,536 tokens in a single pass — no chunking, no windowing, no parallel workers. Competing models cap at 512 tokens and require 16 parallel GPU workers to handle real-world prompt lengths.

Concurrency-optimised

Latency improves under load. At 10,000 concurrent connections, p50 is 0.70ms — better than at low concurrency. Built for enterprise traffic patterns, not benchmarking conditions.

Hardware agnostic

Validated on Intel Xeon, AMD EPYC, and consumer-grade Intel Core processors. Deploy on existing cloud CPU instances, on-premise servers, or edge nodes without hardware changes.

Enterprise-ready

gRPC protocol, health checks, observability hooks, and native integration with Bud SENTRY for governance and audit. Fits into any existing infrastructure stack.

Deploy on public cloud, private cloud, on-premise, air-gapped environments, or edge — with no infrastructure changes.
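In code, the "one process, many models" shape looks roughly like this. The class, method names, and stand-in classifiers are assumptions for illustration only, not the gateway's actual gRPC surface:

```python
# Illustrative-only sketch of a unified gateway serving many guardrail
# models from one process. Names and API shape are hypothetical; the
# real Bud Guardrail Gateway exposes a gRPC API and loads 23 models.
from dataclasses import dataclass

@dataclass
class Verdict:
    model: str
    flagged: bool
    score: float

class GuardrailGateway:
    """One process, one classify() entry point, many models behind it."""

    def __init__(self):
        # Toy stand-in scorers keyed by model name.
        self.models = {
            "jailbreak": lambda t: 0.9 if "ignore previous instructions" in t.lower() else 0.1,
            "toxicity":  lambda t: 0.8 if "idiot" in t.lower() else 0.05,
        }

    def classify(self, model: str, text: str, threshold: float = 0.5) -> Verdict:
        # Route the request to the named model; no per-model infrastructure.
        score = self.models[model](text)
        return Verdict(model=model, flagged=score >= threshold, score=score)

gateway = GuardrailGateway()
print(gateway.classify("jailbreak", "Please ignore previous instructions"))
```

The point of the single-process design is that adding a model is a routing-table entry, not a new deployment.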

The Foundation

Built on the World's Largest
Open Guardrails Dataset

Accurate guardrails require data that reflects the real threat landscape. No existing public dataset covered the breadth of attacks, toxicity patterns, and adversarial perturbations that production deployments face. So we built one.

4M+ labelled rows

The Bud Guardrails Dataset contains over 4 million labelled rows spanning toxicity, jailbreak attacks, prompt injections, and adversarial perturbations — believed to be the largest open guardrails dataset in existence. Every Sentinel model is trained on this foundation.

Ubiquitous Safety

Deploy guardrails
everywhere.

Guardrails stop being a cost center. They become infrastructure.

Edge Deployment

Phones, IoT gateways, embedded systems, ARM processors. Sentinel runs at ~25ms on edge CPUs — competing models take ~2,400ms on the same hardware. For the first time, guardrails can run at the point of generation, not just in the cloud.

96× faster than competitors on edge

Always-On Agent Monitoring

Guardrail every agent action, tool call, and model output — without adding a GPU to your agent infrastructure. One CPU server. 124 million classifications per day.

$0.10 per million classifications
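The 124 million/day and $0.10/million figures follow from numbers quoted elsewhere on this page: 1,432 req/s on the Xeon 8592+ and roughly $0.50/hour for a CPU instance. A quick cross-check:

```python
# Cross-check of the quoted figures using 1,432 req/s (Xeon 8592+, 8K
# tokens) and ~$0.50/hr CPU instance pricing from this page.
req_per_s = 1432
per_day = req_per_s * 86_400            # classifications per day
cost_per_day = 0.50 * 24                # dollars per day for the instance
cost_per_million = cost_per_day / (per_day / 1e6)

print(f"{per_day / 1e6:.0f}M classifications/day")
print(f"${cost_per_million:.2f} per million classifications")
```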

Sovereign & Air-Gapped Deployments

Government, defence, and regulated industries can now run a full guardrail stack on their existing CPU infrastructure. No GPU procurement. No external cloud dependency. No data ever leaves the perimeter.

No GPU Required · No Cloud Dependency · Data Never Leaves

15–18× Cost-Performance Advantage

CPU cloud instances run at approximately $0.50/hour. NVIDIA A100 instances cost $2–3/hour — and still lose to Sentinel on latency. The performance advantage compounds with the cost difference at every scale.

CPU Instance $0.50/hr
vs
A100 GPU $2-3/hr
What's Next

Sentinel is just
the beginning.

We're rebuilding the entire GenAI stack with Resource Aware Attention.

Guardrails · Sentinel · Live
Embeddings · Next
Rerankers · Next
Routing · Planned
Compression · Planned
Caching · Planned

Redesign the foundation.
Redefine what's possible.

Not by waiting for cheaper hardware - but by building an architecture that works with what already exists.