FOR IMMEDIATE RELEASE

AT A GLANCE

First product, Bud Sentinel, ships 23 guardrail models that run 2.3X faster on a laptop CPU than competing systems on a $15,000 NVIDIA A100 GPU

8.39 ms: guardrail classification on a laptop CPU (incl. network overhead)
2.3× faster: than competitors running on a $15,000 NVIDIA A100 GPU
0.70 ms: p50 latency at 8K tokens with 10,000 concurrent requests (Xeon 8592+)
23 models, 33 variants: security, safety, toxicity, compliance, and quality guardrails
$0 GPU cost: production-grade guardrails on commodity hardware enterprises already own

Bud Ecosystem today revealed what the AI Safety industry has been getting wrong: you don’t need a $15,000 GPU to run production AI Safety. The company launched Resource Aware Attention, a new model architecture built from scratch for CPUs, alongside Bud Sentinel (23 guardrail models) and the Bud Guardrail Gateway, a production-grade gRPC serving infrastructure for deploying guardrails at scale — all running on commodity hardware.

The result: state-of-the-art AI safety at sub-millisecond latency and thousands of requests per second — on the CPUs organisations already own. No GPUs required.

The Problem: AI Safety Has a $15,000 Hardware Tax

Here’s what nobody talks about: every serious GenAI deployment needs guardrails — classifiers that catch toxicity, jailbreaks, and prompt injections on every request. The leading models (Meta’s Prompt Guard, ArchGuard, PIGuard) all assume you have a GPU. On an NVIDIA A100, they manage 18–19ms per classification. On the CPUs that 99% of organisations actually run? Latency explodes to 300–800ms. Unusable.

It gets worse. These models cap at 512 tokens. Real-world prompts routinely exceed 8,000. That means 16 parallel GPU workers per classification — turning a single guardrail check into a distributed computing problem that costs as much as the language model itself.

The Fix: Stop Porting GPU Architectures. Design for CPUs.

The conventional approach is to take GPU-native architectures and squeeze them onto CPUs. Diminishing returns every time. Bud Ecosystem’s research team took the opposite path: they built a new attention mechanism from scratch, designed around what CPUs actually excel at — cache locality, NUMA topology, and SIMD vector instructions.

Resource Aware Attention is none of the usual tricks — not compression, not quantisation, not a runtime hack. It computes attention natively within CPU memory hierarchies. GPU-class performance on commodity hardware, no accuracy loss. And it processes up to 65,536 tokens in a single pass — no chunking, no windowing, no parallel workers.

KEY INSIGHT : When you design attention for the hardware the world actually has — instead of porting architectures designed for hardware it doesn’t — you don’t just close the GPU gap. You eliminate it.

Bud Sentinel: 23 Models, Zero GPUs

Bud Sentinel is the first product built on Resource Aware Attention: 23 specialised models across 33 variants, covering five categories of content safety and compliance:

Category	Models	Capabilities
Security	3	Jailbreak detection, prompt injection defence, secrets/credential detection
Safety	4	Content moderation, content safety, suicide and self-harm detection, drug enablement detection
Toxicity	8	Social media toxicity, toxic conversation detection, hate speech, abuse, profanity, obscenity, insults, threats, identity attacks, impoliteness
Compliance	6	PII detection across 11 regions (AU, US, ES, IN, FI, IT, KR, SG, PL, UK, General), illegal activity, political content, regulated advice, bias detection
Quality	2	Spam detection, domain-specific QA validation

Every model runs on commodity CPUs at production scale. Zero infrastructure is required at any point in the pipeline — from inference to serving to scaling.

The Trade-Off Everyone Accepted Was a False Choice

The guardrail industry accepted a false trade-off: catch attacks and block real users, or let users through and miss attacks. Pick one. Bud Sentinel refuses to choose. It delivers the highest balanced accuracy of any guardrail system tested across four independent benchmarks: JailBreakBench, PIGuard, WildJailbreak, and Qualifire PI.

Overall Accuracy (Weighted Across All Benchmarks)

Model	ASR ↓	FRR ↓	Balanced	Rank
Bud Sentinel	15.97%	14.92%	84.56%	#1
PIGuard	25.01%	25.86%	74.57%	#2
ProtectAI V2	36.35%	24.39%	69.63%	#3
Prompt Guard 2 (86M)	34.68%	15.30%	75.01%	#4
ArchGuard	5.40%	81.65%	56.48%	#5
Prompt Guard (86M)	5.83%	89.35%	52.41%	#6

ASR (Attack Success Rate): percentage of attacks that bypass the guardrail — lower is better. FRR (False Refusal Rate): percentage of legitimate requests incorrectly blocked — lower is better.

Prompt Guard and ArchGuard block attacks well (5–6% ASR) but reject 81–89% of legitimate users. Unusable in production. ProtectAI and Prompt Guard 2 keep false refusals low but let 34–36% of attacks sail through. Sentinel hits 15.97% ASR and 14.92% FRR simultaneously — the only model that performs on both metrics.

Performance: Your Laptop Beats a $15,000 GPU

Laptop CPU vs. $15,000 GPU

Per-classification latency at 512 tokens. Sentinel includes full gRPC round-trip overhead. Competitors benchmarked with raw inference only (no network overhead) — and they still lose.

Model	Hardware	Latency	vs. Sentinel
Bud Sentinel	i7 Laptop CPU	8.39 ms	—
Prompt Guard 2	A100 GPU ($15K)	18.52 ms	2.2× slower
ArchGuard	A100 GPU ($15K)	19.07 ms	2.3× slower
PIGuard	A100 GPU ($15K)	19.00 ms	2.3× slower
ProtectAI V2	A100 GPU ($15K)	19.13 ms	2.3× slower

Same CPU, Different Architecture

Same chip. Same conditions. On an Intel Xeon 8272CL, competing models degrade from 18–19ms (GPU) to 334–402ms — a 56–67× performance collapse. Sentinel hits 5.99ms on the same hardware.

Model	Xeon 8272CL	A100 GPU	CPU/GPU Ratio
Bud Sentinel	5.99 ms	N/A (no GPU)	—
Prompt Guard 2	334 ms	18.52 ms	18× degradation
ArchGuard	380 ms	19.07 ms	20× degradation
Prompt Guard (86M)	402 ms	18.92 ms	21× degradation

Production Throughput

On server-grade CPUs, Sentinel matches throughput that normally demands dedicated GPU clusters. Benchmarks on Intel Xeon Platinum 8592+ (256 vCPU) with the Bud Guardrail Gateway:

Seq Length	Concurrency	p50 Latency	Throughput	Per 1K Tokens
8,192	10,000	0.70 ms	1,432 req/s	0.09 ms
16,384	10,000	1.31 ms	761 req/s	0.08 ms
65,536	10,000	11.29 ms	89 req/s	0.17 ms

At 8,192 tokens and 10,000 concurrent connections: 1,432 classifications per second, 0.70ms p50 latency. Entirely on CPU. One server. 124 million classifications per day. Estimated cost: $0.10 per million.

The Bud Guardrail Gateway

Bud Ecosystem also ships the Bud Guardrail Gateway — a single binary that serves all 23 Sentinel models with unified gRPC API, model loading, request routing, batching, health checks, and horizontal scaling. Deploy anywhere: public cloud, private cloud, on-premise, air-gapped, or edge.

Single binary deployment — all 23 models served from one process with unified API
Native long-context support — processes up to 65,536 tokens in a single pass without chunking or parallel workers
Concurrency-optimised — p50 latency improves under load (0.70ms at 10K concurrent vs. 1.63ms at 100)
Hardware agnostic — validated on Intel Xeon, AMD EPYC, and consumer-grade Intel Core processors
Enterprise-ready — gRPC protocol, health checks, observability hooks, and integration with Bud SENTRY governance framework

The Bud Guardrails Dataset

No existing dataset covered the threat landscape these models needed. So Bud Ecosystem built one: over 4 million labelled rows spanning toxicity, jailbreak attacks, prompt injections, and adversarial perturbations — believed to be the world’s largest open guardrails dataset. It underpins all 23 Sentinel models.

Beyond Guardrails: Every GPU Primitive, Rebuilt for CPUs

Sentinel is first, not last. Bud Ecosystem plans to apply Resource Aware Attention across the GenAI stack:

Embeddings — CPU-native embedding models for retrieval and similarity search
Re-rankers — CPU-native cross-encoder re-ranking for RAG pipelines
Routing — intelligent request routing and intent classification
Compression — context compression and summarisation
Caching — semantic caching for repeated query optimisation

The bet: every GenAI primitive that currently demands GPU acceleration can run on commodity hardware. Your infrastructure becomes your AI advantage.

What This Unlocks

Sub-millisecond guardrails on commodity CPUs don’t just cut costs. They make deployment patterns possible that were impossible before:

Edge deployment: Guardrails on phones, IoT gateways, and edge nodes — for the first time. ~25ms on ARM/edge CPUs versus ~2,400ms for competing models on the same hardware.
Always-on agent monitoring: Guardrails on every agent action, tool call, and output. 124 million classifications per day. One CPU server. $0.10 per million.
Sovereign and air-gapped deployments: Government, defence, and regulated industries deploy full guardrail stacks on existing infrastructure. No GPU procurement. No external cloud dependency.
15–18× cost-performance improvement: CPU cloud instances at ~$0.50/hour versus A100 instances at ~$2–3/hour, with Sentinel on CPU running 2.3× faster than competitors on A100. The cost advantage compounds at scale.

About Bud Ecosystem

Bud Ecosystem builds sovereign AI infrastructure that runs on any hardware. The platform spans the full Enterprise AI lifecycle: model training (Model Foundry), deployment (AI Foundry), GPU virtualisation (FCSP), developer tools (Pod), AI Consumption (Studio), enterprise agents (Bud Agent), tool integration (MCP Foundry), and governance (SENTRY). Hardware-agnostic across Intel, AMD, Qualcomm, and NVIDIA silicon. Production-grade AI on without lock in.

MEDIA CONTACT

Bud Ecosystem Communications
marketing@budecosystem.com

All performance benchmarks cited in this release were conducted by Bud Ecosystem’s research team. Competitor model results reflect publicly available architectures evaluated under controlled conditions. Resource Aware Attention, Bud Sentinel, Bud Guardrail Gateway, and Bud Ecosystem are trademarks of Bud Ecosystem. All other trademarks are the property of their respective owners.