Why Use FCSP If GPUs Already Support MIG?

Dec 22, 2025 | By Bud Ecosystem

If you’ve ever tried to share a GPU between multiple users or workloads in a Kubernetes cluster, you’ve probably heard of NVIDIA’s Multi-Instance GPU (MIG) technology. It’s the official, hardware-backed solution for GPU partitioning. But what if I told you there’s a compelling software alternative that might actually be better for your use case?

Enter FCSP (Fixed Capacity Spatial Partition) – a sophisticated software-based GPU virtualization library that provides MIG-like multi-tenant isolation without the hardware constraints. In this article, we’ll explore when and why you might choose FCSP over native MIG, backed by real benchmarks and production considerations.


What is NVIDIA MIG?

Multi-Instance GPU (MIG) is a hardware feature introduced with NVIDIA Ampere architecture (A100) and continued in Hopper (H100) GPUs. It allows a single physical GPU to be partitioned into up to 7 isolated instances, each with dedicated:

  • Memory: Physically isolated memory regions
  • SMs (Streaming Multiprocessors): Dedicated compute units
  • Memory Controllers: Guaranteed memory bandwidth

MIG partitions are “hard” – once created, they’re fixed until explicitly reconfigured. Each MIG instance appears as a separate GPU to CUDA applications.

What is FCSP?

FCSP is a software-based GPU virtualization layer that achieves similar multi-tenant isolation through:

  • LD_PRELOAD interception: Transparent CUDA API interception
  • Token-bucket rate limiting: Fine-grained compute time allocation
  • Memory quota enforcement: Hard/soft memory limits per process
  • Work-conserving scheduling: Unused resources flow to active tenants
  • Intelligent prefetching: UVM-based memory oversubscription

The key difference: FCSP operates entirely in software, works on any CUDA-capable GPU, and provides dynamic, flexible resource allocation.
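
Because the interception happens through LD_PRELOAD, applications need no code changes. A minimal sketch of attaching FCSP to an unmodified CUDA program (the install path of libvgpu.so is an assumption and will vary by deployment):

$ export LD_PRELOAD=/usr/local/lib/libvgpu.so   # path is deployment-specific
$ export BUD_ISOLATION_MODE=balanced            # default fair-share mode
$ python train.py                               # CUDA calls are now intercepted and rate-limited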


The Problem with MIG

MIG is excellent technology, but it comes with significant limitations such as strict hardware requirements, static partitioning, limited partition profiles, and the potential for wasted resources.

1. Hardware Requirements

MIG (Multi-Instance GPU) support is limited to a very small set of NVIDIA’s data center–class GPUs. Specifically, MIG is available only on the A100 (both 40GB and 80GB variants), A30, H100, and H200. These GPUs are designed for large-scale enterprise and cloud environments, where hardware-level isolation and fine-grained GPU partitioning are critical for multi-tenant workloads.

Outside of this narrow group, MIG is not supported. Consumer GPUs do not offer MIG capabilities, and neither do older or lower-tier data center accelerators such as the V100 or T4. Likewise, no RTX-series GPUs—regardless of their performance—can use MIG. In practical terms, this means that MIG is exclusively a feature of NVIDIA’s most expensive, modern data center GPUs, and it is not an option for systems built on consumer or legacy hardware.

2. Static Partitioning

MIG partitions are fixed at creation time. Reconfiguring them requires:

  1. No running CUDA processes
  2. MIG mode enabled (requires GPU reset)
  3. Specific, predefined partition profiles

$ nvidia-smi mig -cgi 9,9,9 -C # Create 3 MIG instances
# GPU must be idle, process disruption required

Changing MIG partition sizes is not a simple or flexible operation. To make any adjustments, all running workloads must first be terminated. Existing MIG instances then need to be destroyed and recreated with the new configuration, after which workloads must be restarted. This process is inherently disruptive and introduces operational overhead, making it both time-consuming and costly in production environments.
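
For illustration, a typical resize on an A100-80GB looks roughly like the sequence below; the exact profile names depend on the GPU model, and every step assumes no CUDA processes are running:

$ sudo nvidia-smi mig -dci                              # destroy existing compute instances
$ sudo nvidia-smi mig -dgi                              # destroy existing GPU instances
$ sudo nvidia-smi mig -cgi 3g.40gb,2g.20gb,2g.20gb -C   # recreate with the new layout
# Workloads must then be restarted and re-pinned to the new instances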

3. Limited Partition Profiles

MIG doesn’t allow arbitrary resource splits. You’re limited to predefined profiles:

A100-80GB Profile | Memory | SMs | Instances
1g.10gb           | 10 GB  | 14  | Up to 7
2g.20gb           | 20 GB  | 28  | Up to 3
3g.40gb           | 40 GB  | 42  | Up to 2
7g.80gb           | 80 GB  | 98  | 1

MIG offers little flexibility in how resources are divided. You cannot create arbitrary splits such as 30%/70%, evenly divide the GPU into five equal instances, or dynamically rebalance resources as workload demands change. You are constrained to a small set of predefined profiles, which often fail to align with real-world usage patterns.

4. Wasted Resources

With MIG’s static allocation, idle instances can’t share resources. Consider a scenario with three MIG instances where only one tenant is active: Tenant A is limited to 33% of the GPU even though the other 67% of the hardware sits idle.


How FCSP Solves These Problems

1. Universal GPU Support

FCSP works on virtually any CUDA-capable GPU. For example, even an RTX 3080 can offer MIG-like isolation, and a V100 cluster gains true multi-tenancy, without requiring specialized or next-generation hardware. FCSP-supported GPUs include:

  • All Data Center: A100, H100, V100, T4, A10, L4, etc.
  • All Professional: RTX A6000, RTX 4000/5000/6000, etc.
  • All Consumer: RTX 3080/3090/4080/4090, etc.
  • Legacy: GTX 1080 Ti, Titan V, etc.
  • Minimum: CUDA Compute Capability 3.0+

2. Dynamic Resource Allocation

FCSP partitions can be adjusted dynamically at runtime without disrupting running workloads—no restarts, no service interruption, and instant rebalancing as demands change.

3. Arbitrary Resource Splits

You can define GPU partition sizes at any percentage you choose, with precise control over how resources are allocated. For example:

Tenant A: 25% compute, 4GB memory
Tenant B: 35% compute, 6GB memory
Tenant C: 40% compute, 10GB memory

# Or dynamic fair-share:
$ export BUD_ISOLATION_MODE=adaptive   # Resources automatically balance based on demand

4. Work-Conserving Scheduling

This is FCSP’s standout feature: idle GPU resources automatically flow to active tenants, maximizing utilization without any manual effort. For example, in a scenario with three tenants, if only Tenant A is active, it can use the full GPU. As other tenants become active, resources are dynamically reclaimed and redistributed. Our benchmarks demonstrate this can achieve up to 143.9% efficiency compared to static allocation—exceeding 100% because compute-heavy and memory-heavy workloads complement each other perfectly.


FCSP Architecture Deep Dive

The core components of FCSP form a layered architecture designed to deliver dynamic, fine-grained GPU partitioning and multi-tenant resource management without modifying existing applications. At the top level, CUDA applications run unmodified, with FCSP intercepting GPU calls via LD_PRELOAD to integrate seamlessly. The heart of the system is libvgpu.so, which orchestrates resource allocation through modules such as the Memory Manager, Compute Throttler, and Stream Classifier, while also handling NCCL hooks, UVM prefetching, and graph optimization to maximize GPU efficiency.

Resource usage is tracked in a shared memory region, maintaining per-process metrics, global utilization data, token bucket states, and burst pool management, enabling real-time visibility into GPU activity. The SM Observer Thread continuously monitors GPU status via NVML, calculates fair-share allocations, detects idle tenants, and tracks contention, ensuring resources are dynamically rebalanced. Together, these components allow FCSP to deliver high efficiency, flexibility, and multi-tenancy on any CUDA-enabled GPU.

Core Components

Isolation Modes

FCSP provides four isolation modes to match your requirements:

1. None (BUD_ISOLATION_MODE=none)

  • No isolation enforcement
  • Useful for single-tenant scenarios
  • Minimal overhead (~40ns per API call)

2. Balanced (BUD_ISOLATION_MODE=balanced)

  • Default mode
  • Fair sharing with burst capability
  • 20% floor + 40% shared pool + 40% burst pool
  • Best for mixed workloads

3. Strict (BUD_ISOLATION_MODE=strict)

  • Hard quotas, no bursting
  • MIG-like behavior
  • Maximum isolation, minimum efficiency

4. Adaptive (BUD_ISOLATION_MODE=adaptive)

  • Automatically switches based on contention
  • Low contention → relaxed limits (more throughput)
  • High contention → strict limits (better isolation)
  • Best of both worlds

Rate Limiting Implementation

FCSP throttles compute with a token bucket algorithm: each tenant accrues tokens at the rate of its compute allocation and spends them as it launches work, so sustained usage converges to the configured share.

The system also includes:

  • Batch token consumption: Thread-local caching reduces atomic operations by 8-16x
  • PID controller: Smooth rate limiting with proportional-integral-derivative control
  • Exponential backoff: Graceful throttling (50ns → 10µs)

Memory Management

FCSP provides flexible per-tenant memory limits, with both hard and soft quota enforcement (an illustrative configuration sketch follows the list below), plus advanced UVM (Unified Virtual Memory) features:

  • Intelligent prefetching: Predict and prefetch based on access patterns
  • Memory pressure monitoring: Early warning at 70%, 85%, 95% thresholds
  • Automatic eviction: LRU, access-aware, or FIFO policies
  • Oversubscription: Use more GPU memory than physically available
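
As referenced above, here is an illustrative sketch of how per-tenant limits might be expressed. The quota variable names below are assumptions for illustration only; BUD_ISOLATION_MODE is the only variable shown elsewhere in this article:

$ export BUD_MEM_LIMIT_HARD=6G       # hypothetical: allocations beyond this fail
$ export BUD_MEM_LIMIT_SOFT=4G       # hypothetical: beyond this, UVM eviction/prefetching engages
$ export BUD_ISOLATION_MODE=balanced
$ python serve.py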

Benchmark Comparison

Let’s look at real numbers comparing FCSP to native execution and MIG.

Test Environment

  • GPU: NVIDIA RTX 3080 (10GB, 68 SMs)
  • FCSP: Adaptive isolation mode
  • Iterations: 100 per test

Overhead Metrics

Metric                | Native  | FCSP   | MIG (A100)
Kernel Launch Latency | ~3 µs   | 4.9 µs | ~3.5 µs
Memory Alloc Latency  | ~100 µs | 704 µs | ~100 µs
API Interception      | 0 ns    | 40 ns  | 0 ns
Rate Limiter          | N/A     | 1.5 µs | N/A

Analysis: FCSP adds ~2µs overhead per kernel launch – negligible for typical GPU workloads where kernels run for milliseconds. Memory allocation is slower due to tracking, but this is amortized over the allocation lifetime.

Isolation Metrics

Metric                 | FCSP (Balanced) | FCSP (Adaptive) | MIG
Fairness Index         | 0.996           | 0.996           | 1.0
QoS Consistency (CV)   | 0.07            | 0.07            | <0.05
Noisy Neighbor Impact  | 11.5%           | 4.66%           | ~3%
Cross-Tenant Isolation | 81.4%           | 81.4%           | ~95%

Analysis: FCSP achieves near-perfect fairness (0.996 out of 1.0). With adaptive isolation, noisy neighbor impact drops to 4.66% – approaching MIG’s ~3%. The tradeoff is flexibility: FCSP allows work-conservation, MIG doesn’t.

Efficiency Metrics

Metric                                  | FCSP      | MIG
Affinity Complementary Efficiency       | 143.9%    | N/A
Work Conservation Benefit               | Up to 67% | 0%
Resource Utilization (3 idle, 1 active) | ~100%     | 33%

Analysis: FCSP’s work-conservation is a game-changer. When tenants are idle, active tenants can use the full GPU. MIG wastes 67% of resources in this scenario.


Feature Comparison Matrix

Feature                | FCSP              | MIG
Hardware Support
Consumer GPUs (RTX)    | Yes               | No
V100, T4, A10          | Yes               | No
A100, H100             | Yes               | Yes
Partitioning
Arbitrary splits       | Yes               | No
Dynamic resizing       | Yes               | No
No-disruption changes  | Yes               | No
Work-conservation      | Yes               | No
Isolation
Memory isolation       | Software          | Hardware
Compute isolation      | Software          | Hardware
Fault isolation        | Partial           | Complete
Error containment      | Per-process       | Per-instance
Performance
Zero overhead option   | No (~40 ns min.)  | Yes
Burst capability       | Yes               | No
QoS guarantees         | Soft              | Hard
Operations
Kubernetes integration | Via device plugin | Native
Monitoring             | NVML + custom     | NVML
Configuration          | Env vars          | nvidia-smi


Use Cases: When to Choose FCSP vs MIG

This section highlights the use cases for FCSP compared to MIG, helping you understand when each approach is most suitable. It provides guidance on choosing the right GPU partitioning solution based on workload needs, resource flexibility, and operational requirements.

When to choose FCSP?

Choose FCSP when you need flexible, dynamic GPU partitioning with multi-tenant support across any CUDA-enabled GPU. It can be the right choice in the following scenarios:

1. You Don’t Have MIG-Capable GPUs

The most straightforward scenario for choosing FCSP is when your environment includes consumer GPUs like the RTX series, older data center GPUs such as V100, T4, or P100, or mixed GPU clusters. In these cases, FCSP is the only solution that provides true multi-tenant isolation, enabling dynamic partitioning and efficient resource sharing across hardware that MIG does not support.

2. You Need Dynamic Resource Allocation

Choose FCSP when you require dynamic resource allocation that adapts to changing workload demands in real time.

3. Variable Workload Patterns

Choose FCSP when your workloads have variable or unpredictable patterns that require flexible GPU resource management. It’s ideal for development environments that burst during testing and remain idle while coding, batch processing jobs that start and finish at different times, and time-sharing scenarios where different users are active at different hours.

An example Workday Pattern (Work-Conservation):

  • 9:00 AM: All 4 tenants active → 25% each
  • 12:00 PM: Tenant B at lunch → A,C,D get 33% each
  • 2:00 PM: Tenant D in meeting → A,C get 50% each
  • 6:00 PM: Only Tenant A working → A gets 100%

In such a scenario, using MIG would waste 75% of resources at 6 PM. But FCSP uses everything.

4. Complementary Workload Scheduling

FCSP’s workload affinity feature can co-schedule compute-heavy and memory-heavy workloads for better-than-isolated performance:

Without Affinity:

  • Compute-heavy job: 50% SM utilization, 10% memory BW
  • Memory-heavy job: 10% SM utilization, 50% memory BW
  • Total GPU utilization: ~55% (many resources idle)

With FCSP Affinity:

  • Total GPU utilization: 60% SM + 60% memory BW
  • Both jobs run together
  • Efficiency: 143.9% vs isolated execution

5. LLM Inference at Scale

LLM inference presents unique challenges, including bursty memory allocation for KV caches, mixed compute patterns such as attention versus feed-forward networks, and variable batch sizes. FCSP addresses these demands with LLM-optimized profiles, ensuring efficient resource allocation and high performance for large language model workloads.

When to choose MIG?

Choose MIG when you need fixed, hardware-level GPU partitioning on supported NVIDIA data center GPUs.

1. Maximum Isolation is Critical

If tenant A crashing must never impact tenant B—such as in financial trading systems, medical imaging, or other safety-critical applications—MIG is the preferred choice for its hardware-level fault isolation, whereas FCSP offers process-level isolation.

2. Regulatory Compliance Requires Hardware Isolation

Certain compliance frameworks explicitly mandate hardware-level isolation, including HIPAA for healthcare, PCI-DSS for payment processing, and SOC 2 Type II. In these cases, MIG’s hardware partitioning can meet auditor requirements where software-based isolation like FCSP may not.

3. Guaranteed, Predictable Performance

MIG provides hard QoS guarantees:

MIG Instance: 1g.10gb

  • Guaranteed: 14 SMs (always)
  • Guaranteed: 10GB memory (always)
  • No variability, no neighbors

FCSP (even with strict mode):

  • Target: 14 SMs
  • Actual: 13-15 SMs (software scheduling variance)
  • Approx 0.07 coefficient of variation

If your SLA requires “exactly 14 SMs, never less”, use MIG.

4. You Have A100/H100 and Static Workloads

If you have:

  • MIG-capable hardware (A100, H100)
  • Predictable, always-on workloads
  • No need for resource flexibility

MIG is simpler to operate and has zero runtime overhead.


Configuration Guide

The Configuration Guide provides step-by-step instructions to set up and optimize FCSP for your GPU environment.

Basic FCSP Setup
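
A minimal two-tenant setup on a single GPU could look like this (library path assumed); both processes share the GPU under the balanced fair-share policy:

$ export LD_PRELOAD=/usr/local/lib/libvgpu.so
$ export BUD_ISOLATION_MODE=balanced
$ python tenant_a_inference.py &    # tenant A
$ python tenant_b_training.py &     # tenant B; unused capacity flows between them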

Multi-Tenant Kubernetes Deployment
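
As a rough sketch, a pod opts into FCSP by setting the same environment variables in its container spec. How libvgpu.so reaches the container (baked into the image, a hostPath mount, or a device plugin) is deployment-specific, and the image name below is an assumption:

$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a
spec:
  containers:
  - name: inference
    image: my-registry/llm-serve:latest      # assumed to contain libvgpu.so
    env:
    - name: LD_PRELOAD
      value: /usr/local/lib/libvgpu.so
    - name: BUD_ISOLATION_MODE
      value: balanced
    resources:
      limits:
        nvidia.com/gpu: 1
EOF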

Production Configuration Profiles

Profile 1: Development Cluster
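
For a development cluster that bursts during testing and sits idle while coding, adaptive mode is a plausible default (an assumption, not a prescription):

$ export BUD_ISOLATION_MODE=adaptive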

Profile 2: Production ML Serving
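
For production serving with mixed workloads, balanced mode's fair sharing plus burst capability is the natural fit:

$ export BUD_ISOLATION_MODE=balanced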

Profile 3: High-Isolation Multi-Tenant
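
A high-isolation tenant profile maps naturally onto strict mode, which enforces hard quotas with no bursting:

$ export BUD_ISOLATION_MODE=strict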

Performance Tuning

Performance Tuning offers strategies to maximize GPU efficiency and workload throughput with FCSP.

Reducing Overhead
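
One obvious lever is the isolation mode itself. On a node that temporarily hosts only one tenant, dropping enforcement keeps interception cost to roughly 40 ns per API call:

$ export BUD_ISOLATION_MODE=none   # single-tenant: no quota enforcement, minimal overhead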

Improving Isolation
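
If noisy-neighbor impact matters more than raw throughput, strict mode trades efficiency for the most MIG-like behavior, while adaptive mode is a middle ground that tightens limits only when contention rises:

$ export BUD_ISOLATION_MODE=strict     # maximum isolation
$ export BUD_ISOLATION_MODE=adaptive   # tighten limits only under contention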

Memory Oversubscription (UVM)
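
The UVM features listed earlier (oversubscription, prefetching, eviction policies) are enabled per deployment; the variable names below are assumptions for illustration only:

$ export BUD_UVM_OVERSUBSCRIBE=1        # hypothetical: allow allocations beyond physical VRAM
$ export BUD_UVM_EVICTION_POLICY=lru    # hypothetical: lru, access-aware, or fifo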


Limitations and Considerations

Both FCSP and MIG have trade-offs that should be carefully considered when choosing the right GPU partitioning solution.

FCSP Limitations

  1. Software Isolation Only: A malicious or buggy tenant could potentially bypass isolation (MIG has hardware enforcement)
  2. Overhead: ~2µs per kernel launch, ~600µs for memory allocation (negligible for most workloads, but measurable)
  3. No Hardware Error Isolation: GPU errors (ECC, timeouts) affect all tenants
  4. Requires LD_PRELOAD: Application must use dynamically-linked CUDA (see the check after this list)
  5. Approximated Metrics: SM utilization is polled (5ms interval), not instantaneous
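
On limitation 4, a quick way to check whether a binary links CUDA dynamically (and is therefore interceptable) is ldd:

$ ldd ./my_app | grep -i libcuda
# If nothing is listed, the CUDA runtime is statically linked and LD_PRELOAD interception will not see it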

MIG Limitations

  1. Limited Hardware: Only A100, A30, H100, H200
  2. Fixed Profiles: Cannot create arbitrary partition sizes
  3. Static Allocation: Requires workload termination to resize
  4. No Work Conservation: Idle resources are wasted
  5. Reduced Total Performance: Sum of MIG instances < full GPU performance
  6. Complexity: More infrastructure to manage (MIG instances, CUDA MIG handles)

Conclusion

The choice between FCSP and MIG isn’t about which is “better” – it’s about which fits your requirements:

Requirement             | Recommendation
No MIG hardware         | FCSP (only option)
Dynamic workloads       | FCSP (work-conservation)
Maximum isolation       | MIG (hardware guarantee)
Compliance requirements | MIG (auditable hardware)
Development clusters    | FCSP (flexibility)
Predictable production  | MIG (simplicity)
Mixed GPU types         | FCSP (universal support)
LLM inference           | FCSP (optimized profiles)

For most practical scenarios, FCSP provides 80-95% of MIG’s isolation benefits with significantly more flexibility and universal hardware support. The work-conservation feature alone can improve cluster utilization by 40-67% in realistic multi-tenant scenarios. Consider FCSP when you value flexibility, efficiency, and broad hardware support. Choose MIG when you need the absolute guarantee of hardware isolation and have compatible GPUs.

This article is based on FCSP v1.0 and NVIDIA MIG as of December 2025. Benchmarks were conducted on an NVIDIA RTX 3080 (10GB, 68 SMs) with CUDA 12.0.

Bud Ecosystem

Our vision is to simplify intelligence—starting with understanding and defining what intelligence is, and extending to simplifying complex models and their underlying infrastructure.
