Is this MIG or vGPU under the hood?

No. MIG and vGPU carve a GPU into fixed hardware partitions. We don't partition the silicon at all — workloads are placed dynamically by the scheduler, which understands compute, VRAM, bandwidth, and interconnect as separate dimensions.

How is isolation enforced without hardware partitions?

Three layers: VM/container isolation at the OS level (KVM + cgroups), scheduler-enforced quotas across compute and memory dimensions, and resource reservations that keep a workload from being starved when it hits its own bottleneck.

What's the performance hit from sharing?

Negligible in practice. We co-locate workloads that stress different bottlenecks. When real contention would arise, the scheduler moves or queues one of the workloads rather than letting them compete. Most customers see latency within 2-5% of dedicated.

Can I get a fully dedicated GPU if I really need it?

Yes. Dedicated tier guarantees exclusive single-tenant access. Use it for benchmarking runs, regulated workloads, or apps with extreme tail-latency needs.

How are spikes to 100% GPU usage handled?

The scheduler watches utilisation continuously. Sustained spikes get preferential allocation; co-located workloads are migrated or queued. Your job sees the full card during the spike.
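As an illustration of telling a sustained spike from a momentary blip, here is a minimal sketch. It assumes utilisation is sampled at a fixed interval and that a spike counts as sustained once every sample in a sliding window is above a threshold. The class name `SpikeDetector`, the 0.95 threshold, and the window size are hypothetical choices, not packet.ai's actual parameters.

```python
from collections import deque


class SpikeDetector:
    """Flags a spike as sustained once utilisation stays high for `window` samples."""

    def __init__(self, threshold: float = 0.95, window: int = 5):
        self.threshold = threshold          # hypothetical cut-off for "near 100%"
        self.recent = deque(maxlen=window)  # sliding window of latest samples

    def observe(self, utilisation: float) -> bool:
        """Record one sample; return True if the whole window is above threshold."""
        self.recent.append(utilisation)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(u >= self.threshold for u in self.recent)


detector = SpikeDetector(threshold=0.95, window=3)
readings = [0.40, 0.99, 0.50, 0.97, 0.98, 1.00]
flags = [detector.observe(r) for r in readings]
print(flags)  # only the final sample completes three high readings in a row
```

A real scheduler would act on the flag by giving the spiking job preferential allocation and migrating or queuing its neighbours; that step is left out here.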

Does this work for multi-node distributed training?

Yes — but for cluster-scale training we deploy on dedicated GPUs with InfiniBand. Dynamic placement is for single-GPU and small-fleet workloads where co-location actually improves economics.

Is there a noisy-neighbour risk?

Mathematically possible, operationally rare. The scheduler refuses placements that would conflict. We monitor for p95 / p99 deviation and rebalance when measured impact exceeds 5%.
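The monitoring rule can be sketched as follows, assuming "measured impact" means the relative increase in p95 or p99 latency over a baseline measured without neighbours. The function names, the nearest-rank percentile method, and the sample data are illustrative assumptions, not the production monitor.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def needs_rebalance(baseline_ms, observed_ms, threshold=0.05):
    """True if p95 or p99 latency grew by more than `threshold` over baseline."""
    for pct in (95, 99):
        base = percentile(baseline_ms, pct)
        now = percentile(observed_ms, pct)
        if (now - base) / base > threshold:
            return True
    return False


# Hypothetical latency samples in milliseconds.
baseline = [10.0] * 95 + [12.0] * 5
mild = [10.2] * 95 + [12.3] * 5      # tails shift by about 2-3%
noisy = [10.2] * 95 + [13.0] * 5     # p99 shifts by about 8%

print(needs_rebalance(baseline, mild))
print(needs_rebalance(baseline, noisy))
```

Checking both percentiles means a neighbour that only hurts the extreme tail still triggers a rebalance.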

How does pricing compare for the same B200?

Dedicated hyperscaler B200: $8-$11/hr typical. packet.ai dynamic-tier B200: $3.75/hr — 50-65% cheaper. Dedicated tier on packet.ai sits in between at ~$5.50/hr.
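The savings range follows directly from the prices quoted above. A short worked calculation, using only those figures (the helper name `saving_pct` is just for illustration):

```python
def saving_pct(reference: float, price: float) -> float:
    """Percentage saved when paying `price` instead of `reference`."""
    return (reference - price) / reference * 100


DYNAMIC_TIER = 3.75     # $/hr, packet.ai dynamic tier
DEDICATED_TIER = 5.50   # $/hr, packet.ai dedicated tier (approximate)

for hyperscaler in (8.00, 11.00):
    print(
        f"vs ${hyperscaler:.2f}/hr: "
        f"dynamic tier {saving_pct(hyperscaler, DYNAMIC_TIER):.0f}% cheaper, "
        f"dedicated tier {saving_pct(hyperscaler, DEDICATED_TIER):.0f}% cheaper"
    )
```

Against the $8-$11/hr range this gives roughly 53% to 66% for the dynamic tier, in line with the quoted 50-65%, and roughly 31% to 50% for the dedicated tier.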

Technology — Intelligent GPU Scheduling Explained

The premise.

GPU infrastructure today forces a trade-off. Either you buy or rent dedicated GPUs and accept that large parts of the hardware will sit idle most of the time, or you oversubscribe aggressively and accept unpredictable performance, noisy neighbours, and brittle workloads.

Our GPU utilisation technology is built on a simple observation: modern AI workloads rarely consume all aspects of a GPU at the same time. VRAM, compute, memory bandwidth, and interconnect are stressed differently depending on whether you're doing inference, fine-tuning, evaluation, or burst training.

Traditional infrastructure ignores this reality and prices GPUs as if they are a single, indivisible resource. They're not.

The core idea.

Instead of treating a GPU as "one job, one card", packet.ai allocates and schedules GPU resources based on what workloads actually consume in real time.

We track and manage GPU usage across multiple dimensions: not just whether a GPU is occupied, but how it is being used. Multiple compatible workloads can share the same physical hardware safely and predictably, without slicing the GPU into hard partitions or degrading performance.
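To make the multi-dimensional view concrete, here is a minimal sketch of a placement check. It assumes each workload is described by the fraction of compute, VRAM, memory bandwidth, and interconnect it currently consumes, and that a placement is accepted only if every dimension stays below a headroom limit. `Demand`, `fits_on_gpu`, the 10% headroom, and the example profiles are all hypothetical and are not packet.ai's scheduler API.

```python
from __future__ import annotations

from dataclasses import dataclass

DIMENSIONS = ("compute", "vram", "bandwidth", "interconnect")


@dataclass(frozen=True)
class Demand:
    """Fraction (0.0-1.0) of each GPU dimension a workload consumes (assumed model)."""
    compute: float
    vram: float
    bandwidth: float
    interconnect: float


def fits_on_gpu(resident: list[Demand], candidate: Demand, headroom: float = 0.1) -> bool:
    """True if adding `candidate` keeps every dimension at or below 1 - headroom."""
    limit = 1.0 - headroom
    for dim in DIMENSIONS:
        total = sum(getattr(d, dim) for d in resident) + getattr(candidate, dim)
        if total > limit:
            return False
    return True


# Illustrative profiles: the numbers are invented for the example.
inference = Demand(compute=0.20, vram=0.50, bandwidth=0.45, interconnect=0.05)
batch_eval = Demand(compute=0.55, vram=0.25, bandwidth=0.30, interconnect=0.05)
fine_tune = Demand(compute=0.60, vram=0.45, bandwidth=0.40, interconnect=0.10)

print(fits_on_gpu([inference], batch_eval))  # every dimension stays under the limit
print(fits_on_gpu([inference], fine_tune))   # VRAM would exceed the limit
```

A single "occupied or free" flag would reject both placements; checking each dimension separately is what lets the compatible pair share a card.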

The result: significantly higher utilisation per GPU, while performance feels close to dedicated infrastructure from the user's perspective.

Standard GPU utilisation

Average fleet utilisation: 58% (gaps billed to customer)

Predictable performance: 92% (within dedicated SLA)

Effective $/GPU-hour spent: 100% (100% rate, 58% useful work)

With packet.ai optimisation

Average fleet utilisation: 92% (charged for execution)

Predictable performance: 94% (within dedicated SLA)

Effective $/GPU-hour spent: 48% (~50% lower, same work)

Why performance stays high.

Performance degradation happens when workloads compete for the same bottleneck at the same time. Our scheduler is built to avoid exactly that.

By understanding how different workloads stress the GPU, we co-locate jobs that complement each other rather than collide. A memory-heavy inference workload can run alongside a compute-heavy task without either seeing meaningful slowdown.

When contention would impact performance, workloads are moved or queued automatically. The system prioritises predictable execution over raw density.
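The move-or-queue behaviour can be sketched as a simple decision: look for another GPU with enough free capacity in every dimension the workload needs, and queue the workload if none exists. The function name, the dictionary layout, and the GPU identifiers are assumptions made for this example.

```python
def resolve_contention(demand, free_by_gpu):
    """Return ("move", gpu_id) if some GPU has room in every dimension, else ("queue", None).

    `demand` maps dimension name -> fraction needed; `free_by_gpu` maps a GPU id
    to its free fraction per dimension. Both structures are hypothetical.
    """
    for gpu_id, free in free_by_gpu.items():
        if all(free.get(dim, 0.0) >= need for dim, need in demand.items()):
            return ("move", gpu_id)
    return ("queue", None)


demand = {"compute": 0.5, "vram": 0.3}

fleet = {
    "gpu-a": {"compute": 0.2, "vram": 0.6},  # not enough free compute
    "gpu-b": {"compute": 0.6, "vram": 0.4},  # room in both dimensions
}
print(resolve_contention(demand, fleet))

busy_fleet = {"gpu-a": {"compute": 0.2, "vram": 0.6}}
print(resolve_contention(demand, busy_fleet))
```

Queuing when nothing fits is what "predictable execution over raw density" amounts to in this sketch: the workload waits rather than being squeezed onto a contended card.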

Typical latency variance vs. dedicated GPU: ±2-5%

Scheduler reaction time on contention: < 100ms

Customer reboots required for re-placement

Why pricing becomes dramatically better.

Once you can safely drive higher utilisation, the economics change completely.

Traditional GPU pricing assumes low average utilisation, so prices have to cover idle time. packet.ai removes much of that waste. The same physical GPU does more useful work per hour, so the cost of that GPU can be spread across more customers without degrading their experience.

You're not paying for idle silicon.
You're paying for the resources your workload actually uses.
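The idle-time argument can be put into numbers with a deliberately simple model: if a GPU costs a fixed amount per hour and only a fraction of that hour is useful work, the cost per useful hour is the hourly cost divided by utilisation. The function name and the normalised cost of 1.0 are assumptions; the 58% and 92% utilisation figures are the ones quoted on this page.

```python
def cost_per_useful_hour(hourly_cost: float, utilisation: float) -> float:
    """Cost of one hour of useful work when only `utilisation` of each hour is used."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_cost / utilisation


HOURLY_COST = 1.0  # normalised cost of running one GPU for one hour (assumption)

standard = cost_per_useful_hour(HOURLY_COST, 0.58)
optimised = cost_per_useful_hour(HOURLY_COST, 0.92)
relative = optimised / standard

print(f"standard:  {standard:.2f} per useful hour")
print(f"optimised: {optimised:.2f} per useful hour")
print(f"optimised cost is {relative:.0%} of standard")
```

Under this model the utilisation gain alone cuts the cost of useful work by a little over a third. The model covers only the idle-time effect and is not how the pricing figures on this page were derived.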

How we deliver B200 performance at $3.75/hour.

How this differs from slicing & oversubscription.

Hard slicing

Oversubscription

Dynamic placement

The win for everyone.

Access at sensible pricing.

Better returns on expensive hardware.

Best of all worlds.

Same silicon. Half the cost. Try it now.

Questions about the tech.