From the Bud blog

Deep dives, engineering notes, and ideas on building efficient, sovereign AI — straight from the team.

Understanding Native Tools in Bud Agent Builder: Web Fetch

When teams evaluate AI agent platforms, “the agent can fetch a URL” sits near the top of every feature checklist — and…

Understanding Native Tools in Bud Agent Builder: Web Search

Every useful AI agent eventually hits the same wall: the model only knows what it knew at training time. Ask it about…

Understanding Native Tools in Bud Agent Builder: Code Interpreter

Every serious agent eventually needs to do something computational — parse a file, reconcile a ledger, transform a dataset, run a model,…

Bud Agent Runtime: The Execution Layer for Production-Grade Agentic Systems

Building an AI agent has never been easier. With today’s models, a developer can wire up a prompt, attach a tool or…

How We Designed User Access Controls in Bud Ecosystem, and Why

For any enterprise platform, access control is foundational. Admins, employees, and partners all operate with permissioned access across systems like CRM and…

Why Enterprise AI Doesn’t Need Another Tool — It Needs a Platform That Owns the Stack From Silicon to Consumption

In 2025, enterprises invested $684 billion in AI. More than $547 billion of that — over 80% — failed to deliver the…

When Generic AI Safety Isn’t Enough: Building Custom Guardrails That Fit Your Enterprise

Every enterprise deploying generative AI eventually arrives at the same uncomfortable realisation: the world’s best pre-built guardrails are still written by someone…

AI-Enabled vs. AI-Native: What’s the Actual Difference?

Here is a number worth sitting with. According to McKinsey’s 2025 State of AI survey across nearly 2,000 executives and 105 countries, 88…

Introducing SIMD-Bench: An Open-Source Framework for Cross-Architecture Benchmarking, Profiling, and Improving SIMD Kernels

We open-sourced SIMD-Bench, a framework that benchmarks and profiles SIMD kernels to evaluate and compare their performance across different instruction set…

Why Use FCSP If GPUs Already Support MIG?

If you’ve ever tried to share a GPU between multiple users or workloads in a Kubernetes cluster, you’ve probably heard of NVIDIA’s…

How to Build vLLM Plugins: A comprehensive Developer Guide with tips and best practices

Building plugins for vLLM allows you to tailor the system to your specific requirements and integrate custom functionality into your LLM workflows.…

Fixed Capacity Spatial Partition, FCSP : GPU Resource Isolation Framework for Multi-Tenant ML Workloads

GPU sharing in multi-tenant cloud environments requires efficient resource isolation without sacrificing performance. We present FCSP (Fixed Capacity Spatial Partition), a user-space…

Virtualised Hardware is The Missing Layer for Scalable AI-in-a-Box Systems

AI-in-a-Box appliances have become the preferred choice for enterprises that need GenAI to run on-premises, within air-gapped environments, or under strict physical…

Introducing GPU-Virt-Bench: An Open-Source Framework for Benchmarking GPU Virtualization

We just open-sourced GPU-Virt-Bench, a comprehensive benchmarking framework for evaluating software-based GPU virtualization systems such as HAMi-core and BUD-FCSP, and comparing them against ideal MIG…

Heterogenous GPU Virtualisation in Bud AI foundry

Most enterprises don’t have a GPU performance problem—they have a GPU wastage problem. Clusters packed with A100s and H100s routinely run GenAI…

Reinventing Guardrails – Part 1: Why Performance, Latency, and Safety Need a New Equation

As generative AI (GenAI) systems evolve from experimental tools to enterprise-grade applications, the balance between performance, cost, and safety has become a…

Beyond Hardware: How Bud AI Foundry Helps OEMs Move from Devices to AI-Native Systems

In the early days of computing, machines came without an operating system. Users had to install one themselves, often requiring technical know-how.…

Beyond Bare Metal: How Bud AI Foundry Helps Cloud Service Providers Move from Bare Metal to AI-First Services

The rapid rise of Generative AI (GenAI) is sparking a new wave of global change, a movement that can only be described…

NxtGen’s M for Coding, Powered by Bud— India’s Alternative to Claude Code

Together with NxtGen Cloud, we’re excited to introduce M for Coding — a coding assistant launched under NxtGen Cloud’s M GenAI platform…

A case against AI wrapper companies & proprietary API-based models for Enterprise AI

Over the past couple of years, we’ve seen a wave of “wrapper” AI companies pop up. These are the startups that don’t…

We Just Released the World’s Largest Open Dataset for AI Guardrails

Ensuring that language models behave safely, ethically, and within intended boundaries is one of the most pressing challenges in AI today. That’s…

From GenAI Pilot to Production: Best Practices and Evals That Matter

Many GenAI initiatives shine in the pilot phase but struggle when scaled to production. A common reason is that teams often focus…

From GenAI Pilot to Production: Why 95% of Projects Fail—and How to Beat the Odds

GenAI pilots are proliferating across industries, yet advancing these initiatives into full-scale production remains a major challenge. A recent MIT study revealed…

I Built BlazeText — It’s 10X Faster Than HuggingFace’s Tokenizer

A few weeks ago, while working on implementing a guardrail engine, I found myself staring at a performance graph that didn’t make…

Open Source Update : Bud Symbolic AI

This week we published a new open-source project — Bud Symbolic AI, an open-source framework designed to bridge traditional pattern matching (like…

What’s New in LLM Inference Optimization: Recent Advances and Techniques

Large Language Models (LLMs) are resource-intensive. Open-source models like LLaMA 2, Mistral 7B, Falcon 40B, and others offer flexibility for deployment on…

A Survey of parallelism strategies that can deliver better efficiency for your GenAI deployments.

Generative AI unlocks incredible capabilities, but it doesn’t come cheap. Training and deploying large models like LLMs or diffusion models demand massive…

Product Update: Bud’s LLM Evaluation Framework 2.0

We have a major upgrade to our LLM Evaluation Framework — making it even more powerful, transparent, and scalable for enterprise AI…

A Survey on LLM Guardrails: Part 2, Guardrail Testing, Validating, Tools and Frameworks

Part 1: Methods, Best Practices and Optimisations. Part 2: Guardrail Testing, Validating, Tools and Frameworks (this article). As large language models (LLMs)…

A Survey on LLM Guardrails: Part 1, Methods, Best Practices and Optimisations

Part 1: Methods, Best Practices and Optimisations (this article). Part 2: Guardrail Testing, Validating, Tools and Frameworks. As organizations embrace large language…

Sovereign AI Framework for Developing Nations

The global AI landscape shows a significant gap in infrastructure between developed and developing countries. For instance, the United States has about…

Automating License Analysis: A Small Feature That Solves a Big Problem

In the fast-moving world of Generative AI, where innovation often outpaces regulation, licensing has emerged as an increasingly critical—yet overlooked—challenge. Every AI…

Why Over-Engineering LLM Inference Is Costing You Big Money: SLO-Driven Optimization Explained

When deploying Generative AI models in production, achieving optimal performance isn’t just about raw speed—it’s about aligning compute with user experience while…

Introducing Bud Agent; An Agent to automate GenAI Systems Management

Beyond the high costs associated with adopting Generative AI (GenAI), one of the biggest challenges organizations face is the lack of know-how…

Why You Should Choose On-Prem Over Cloud for Your GenAI Deployments

Generative AI adoption is skyrocketing across industries, but organizations face a critical choice in how to deploy these models. Many use third-party…

Introducing Hex-1: A Fully Open-Source LLM for Indic Languages

India, being one of the most linguistically diverse nations in the world, faces a major roadblock in harnessing the full potential of…

Introducing Bud SENTRY – Secure Evaluation and Runtime Trust for Your Models

Open-source large language models (LLMs) have become foundational to modern enterprise AI strategies. Their accessibility, performance, and flexibility make them an attractive…

Optimising Cost Efficiency in LLM Serving Using Heterogeneous Hardware Inferencing

Summary: The current industry practice of deploying GenAI-based solutions relies solely on high-end GPU infrastructure. However, several analyses have uncovered that this…

Exploring Transformed Multi-Head Latent Attention for Cost-Effective Enterprise GenAI

Deepseek’s latest innovation, R1, marks a significant milestone in the GenAI market. The company has achieved performance comparable to OpenAI’s o1, yet…

SLMs fine-tuned like DeepSeek’s R1 + Bud Inference = Most Cost-effective Enterprise GenAI

The recent launch of DeepSeek’s R1 model has made waves in the AI industry—not just for its technological advancements but also for…

Introducing Maxwell TCS v0.2: A Lightweight SOTA Model for Prompt Complexity Scoring

We are excited to announce the open-source release of Maxwell Task Complexity Scorer v0.2, a breakthrough in efficient instruction complexity scoring. Maxwell…

Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments

As organizations experiment with proof-of-concept and pilot projects for enterprise-grade Generative AI applications, the primary focus often remains on developing functionality rather…

The Cost Conundrum Essays, Part 1 : The Goose Chase for Cost Effective LLMs

In recent years, Generative Large Language Models have become a centerpiece in the domain of NLP, catching the attention of researchers and…

How Enterprises That Are Serious About Their ESG Goals Should Approach GenAI Adoption

Environmental, Social, and Governance (ESG) goals have become a top priority for most large enterprises in recent years. Stakeholders, regulators, and consumers…

x86 is All you need for AI Democratisation

Market Landscape; Technology Landscape; Why x86/CPU/Non-Accelerators is preferred for Inferencing; Bud Ecosystem (Technology and Models). Bud Ecosystem develops a universal runtime, inference…

Should You Replace Third-party LLM Services with Open Source SLMs? A Cost-Benefit Analysis

As artificial intelligence (AI) becomes an integral part of business operations, companies are increasingly leveraging powerful language models to create innovative products.…

An Equitable Governance Framework For Balancing AI Innovation and Ethical Regulation

NOTE: This is ongoing research, and we invite fellow researchers to collaborate on this project. If you are currently working on…

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models

As LLMs continue to grow, boasting billions to trillions of parameters, they offer unprecedented capabilities in natural language understanding and generation. However,…

Fast yet Safe: Early-Exit Neural Networks with Risk Control for Optimal Performance

Large Language Models, with their increased parameter sizes, often achieve higher accuracy and better performance across a variety of tasks. However, this…

LiveMind: Low-latency Large Language Models with Simultaneous Inference

In the rapidly evolving world of artificial intelligence, large language models (LLMs) are making headlines for their remarkable ability to understand and…

Reducing LLM Ops Costs through Hybrid Inference with SLMs on Intel CPUs and Cloud LLMs

Despite the transformative potential of generative AI, its adoption in enterprises is lagging significantly. One major reason for this slow uptake is…

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

In the research paper “Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting,” the authors introduce a new framework called Kangaroo designed to…
