Technology

How we deliver B200 performance at $3.75/hour.

GPU infrastructure has been sold as if performance and price are opposing forces. packet.ai removes that false trade-off by treating each GPU as a multi-dimensional resource, not a single indivisible card.

  • 92%: average GPU utilisation across our fleet
  • 2.5×: useful work per GPU-hour vs. traditional pricing
  • $3.75/hr: on-demand B200 (Blackwell, 180 GB HBM3e)
  • ±2-5%: latency variance vs. dedicated, in real workloads

The premise.

Traditionally you face two options. You buy or rent dedicated GPUs and accept that large parts of the hardware will sit idle most of the time, or you oversubscribe aggressively and accept unpredictable performance, noisy neighbours, and brittle workloads.

Our GPU utilisation technology is built on a simple observation: modern AI workloads rarely consume all aspects of a GPU at the same time. VRAM, compute, memory bandwidth, and interconnect are stressed differently depending on whether you're doing inference, fine-tuning, evaluation, or burst training.

Traditional infrastructure ignores this reality and prices GPUs as if they are a single, indivisible resource. They're not.

The core idea.

Instead of treating a GPU as "one job, one card", packet.ai allocates and schedules GPU resources based on what workloads actually consume in real time.

We track and manage GPU usage across multiple dimensions: not just whether a GPU is occupied, but how it is being used. Multiple compatible workloads can share the same physical hardware safely and predictably, without slicing the GPU into hard partitions or degrading performance.
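
To make the "multiple dimensions" idea concrete, here is a minimal sketch of a per-workload usage vector and a compatibility check. It is illustrative only and not packet.ai's scheduler: the `Usage` class, the `can_share` function, the 10% headroom, and all the numbers are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage vector; each field is a fraction of one GPU (0.0 to 1.0)."""
    compute: float
    vram: float
    bandwidth: float
    interconnect: float

    def __add__(self, other: "Usage") -> "Usage":
        # Combined demand is modelled as a simple per-dimension sum.
        return Usage(
            self.compute + other.compute,
            self.vram + other.vram,
            self.bandwidth + other.bandwidth,
            self.interconnect + other.interconnect,
        )

    def peak(self) -> float:
        # The most heavily loaded dimension is the bottleneck.
        return max(self.compute, self.vram, self.bandwidth, self.interconnect)


def can_share(a: Usage, b: Usage, headroom: float = 0.1) -> bool:
    """True if combined demand stays under capacity, minus headroom, in every dimension."""
    return (a + b).peak() <= 1.0 - headroom


# Made-up profiles for illustration.
inference = Usage(compute=0.20, vram=0.55, bandwidth=0.50, interconnect=0.05)
batch_eval = Usage(compute=0.60, vram=0.30, bandwidth=0.25, interconnect=0.05)
fine_tune = Usage(compute=0.70, vram=0.60, bandwidth=0.40, interconnect=0.10)

print(can_share(inference, batch_eval))  # True: no dimension exceeds 0.9
print(can_share(inference, fine_tune))   # False: combined VRAM demand is 1.15
```

A single "busy or idle" flag would treat all three workloads the same. The per-dimension view is what lets the first pair share a card while the second pair is kept apart.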

The result: significantly higher utilisation per GPU, while performance feels close to dedicated infrastructure from the user's perspective.

Standard GPU utilisation

  • Average fleet utilisation: 58% (gaps billed to customer)
  • Predictable performance: 92% (within dedicated SLA)
  • Effective $/GPU-hour spent: 100% (100% rate, 58% useful work)

With packet.ai optimisation

  • Average fleet utilisation: 92% (charged for execution)
  • Predictable performance: 94% (within dedicated SLA)
  • Effective $/GPU-hour spent: 48% (~50% lower, same work)

How this differs from slicing & oversubscription.

A lot of platforms claim "sharing", but what they usually mean is one of two things:

The slicers (MIG · vGPU)

Hard slicing

MIG and vGPU carve a GPU into fixed partitions. Simple, but inflexible. If your workload needs more VRAM but less compute, you're stuck paying for a shape that doesn't fit.

  • Fixed shapes ignore workload variance
  • Pay for headroom you don't use
  • Re-partitioning requires reboot

The hopefuls (Best-effort)

Oversubscription

Oversubscription works in theory. Without the right software layer to manage it, jobs compete for resources, performance becomes unpredictable and workloads break in ways that are hard to debug.

  • Unbounded contention
  • p99 latency in free-fall
  • No fairness guarantees

packet.ai (Scheduler-level)

Dynamic placement

Our software layer manages every workload in real time so that each job gets the resources it paid for. No throttling, no competition, no surprises: performance stays consistent and close to dedicated.

  • Adapts to real consumption
  • p99 within ±5% of dedicated
  • Hot-migration, no reboots

Why performance stays high.

Performance degradation happens when workloads compete for the same bottleneck at the same time. Our scheduler is built to avoid exactly that.

By understanding how different workloads stress the GPU, we co-locate jobs that complement each other rather than collide. A memory-heavy inference workload can run alongside a compute-heavy task without either seeing meaningful slowdown.

When contention would impact performance, workloads are moved or queued automatically. The system prioritises predictable execution over raw density.
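
The placement behaviour described above can be sketched as a small routine that co-locates a job only where no dimension would be overloaded, and queues it otherwise. This is a simplified illustration under stated assumptions, not the production scheduler: the `place` and `fits` functions, the 0.9 per-dimension limit, the GPU names, and the load figures are all hypothetical.

```python
from collections import deque

DIMENSIONS = ("compute", "vram", "bandwidth", "interconnect")
LIMIT = 0.9  # assumed ceiling per dimension, as a fraction of one GPU


def fits(load: dict, job: dict) -> bool:
    """True if adding the job keeps every dimension at or under LIMIT."""
    return all(load[d] + job[d] <= LIMIT for d in DIMENSIONS)


def place(job: dict, gpus: dict, queue: deque):
    """Place the job on the GPU with the lowest resulting bottleneck, or queue it."""
    candidates = [name for name, load in gpus.items() if fits(load, job)]
    if not candidates:
        queue.append(job)  # no safe placement: wait rather than collide
        return None
    best = min(
        candidates,
        key=lambda name: max(gpus[name][d] + job[d] for d in DIMENSIONS),
    )
    for d in DIMENSIONS:
        gpus[best][d] += job[d]  # record the new load on the chosen GPU
    return best


# Made-up fleet state: gpu-0 runs a compute-heavy job, gpu-1 a memory-heavy one.
gpus = {
    "gpu-0": {"compute": 0.70, "vram": 0.20, "bandwidth": 0.20, "interconnect": 0.0},
    "gpu-1": {"compute": 0.15, "vram": 0.60, "bandwidth": 0.60, "interconnect": 0.0},
}
queue = deque()

memory_heavy = {"compute": 0.15, "vram": 0.50, "bandwidth": 0.45, "interconnect": 0.0}
compute_heavy = {"compute": 0.60, "vram": 0.20, "bandwidth": 0.20, "interconnect": 0.0}
mixed = {"compute": 0.30, "vram": 0.30, "bandwidth": 0.30, "interconnect": 0.0}

first = place(memory_heavy, gpus, queue)    # lands next to the compute-heavy job
second = place(compute_heavy, gpus, queue)  # lands next to the memory-heavy job
third = place(mixed, gpus, queue)           # nothing fits, so it is queued
print(first, second, third, len(queue))     # gpu-0 gpu-1 None 1
```

Queuing the third job instead of squeezing it in mirrors the stated priority: predictable execution over raw density.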

  • ±2-5%: typical latency variance vs. dedicated GPU
  • < 100 ms: scheduler reaction time on contention
  • 0: customer reboots required for re-placement

Why pricing becomes dramatically better.

Once you can safely drive higher utilisation, the economics change completely.

Traditional GPU pricing assumes low average utilisation, so prices have to cover idle time. packet.ai removes much of that waste. The same physical GPU does more useful work per hour, so the cost of that GPU can be spread across more customers without degrading their experience.

You're not paying for idle silicon.
You're paying for the resources your workload actually uses.
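
The effect of utilisation on cost can be worked through with simple arithmetic. The sketch below uses the two fleet utilisation figures quoted on this page (58% and 92%) and a normalised hourly cost of 1.0. The `cost_per_useful_hour` function and the model itself (cost divided by utilisation) are simplifying assumptions, not packet.ai's pricing formula.

```python
def cost_per_useful_hour(hourly_cost: float, utilisation: float) -> float:
    """Cost of one hour of useful work when only a fraction of each hour is used."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return hourly_cost / utilisation


HOURLY_COST = 1.0  # normalised cost of running one GPU for one hour

standard = cost_per_useful_hour(HOURLY_COST, 0.58)
optimised = cost_per_useful_hour(HOURLY_COST, 0.92)
reduction = 1 - optimised / standard

print(f"{standard:.2f} {optimised:.2f} {reduction:.0%}")  # 1.72 1.09 37%
```

In this simplified model, higher utilisation alone lowers the cost of useful work by about 37%. It isolates only the utilisation effect, so it should not be expected to reproduce the headline price difference.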

The win for everyone.

For customers

Access at sensible pricing.

Predictable performance, fast startup times, and the ability to scale without long commitments or inflated hourly rates.

Workloads that would be uneconomical on dedicated GPUs suddenly become viable to run continuously, such as production inference, always-on agents, batch evaluation, and long-tail fine-tunes.

  • 50%+: below hyperscaler on-demand

For infrastructure providers

Better returns on expensive hardware.

Higher utilisation means GPUs that would normally sit partially idle can be monetised efficiently, without turning the platform into a support nightmare.

Providers offer competitive pricing while protecting margins, because the underlying economics finally work.

  • 87%: average provider fleet utilisation

The outcome

Best of all worlds.

Four properties that historically required four different trade-offs. We don't make you pick.

  • Dedicated-feel performance: Full isolation when your job needs the card. Latency within a few percent of single-tenant, measured, not promised.
  • Real-usage pricing: Charges reflect what you actually consume, not worst-case assumptions baked into a flat hourly rate.
  • Stability and isolation: Scheduler-enforced fairness without rigid hardware slicing. No noisy-neighbour roulette.
  • High utilisation: Multiple compatible workloads coexisting on the same silicon, without contention chaos.

Customers get more compute for their money. Providers get healthier economics. GPUs finally spend their time doing what they were bought for: useful work, not sitting idle.

Same silicon. Half the cost. Try it now.

Launch a B200 in minutes and see B200-class performance at $3.75/hour.

Frequently asked

Questions about the tech.

Things engineers ask in walkthroughs once they hear "shared GPU".

Is this MIG or vGPU under the hood?
No. MIG and vGPU carve a GPU into fixed hardware partitions. We don't partition the silicon at all. Workloads are placed dynamically by the scheduler, which understands compute, VRAM, bandwidth, and interconnect as separate dimensions.
How is isolation enforced without hardware partitions?
Three layers: VM/container isolation at the OS level (KVM + cgroups), scheduler-enforced quotas across compute and memory dimensions, and resource reservations that prevent starvation when a workload's bottleneck shows up.
What's the performance hit from sharing?
Negligible in practice. We co-locate workloads that stress different bottlenecks. When real contention would arise, the scheduler moves or queues the affected workloads. Most customers see latency within 2-5% of dedicated.
Can I get a fully dedicated GPU if I really need it?
Yes. The dedicated tier guarantees exclusive single-tenant access. Use it for benchmarking runs, regulated workloads, or apps with extreme tail-latency needs.
How are spikes to 100% GPU usage handled?
The scheduler watches utilisation continuously. Sustained spikes get preferential allocation; co-located workloads are migrated or queued. Your job sees the full card during the spike.
Does this work for multi-node distributed training?
Yes, but for cluster-scale training we deploy on dedicated GPUs with InfiniBand. Dynamic placement is for single-GPU and small-fleet workloads where co-location actually improves economics.
Is there a noisy-neighbour risk?
Mathematically possible, operationally rare. The scheduler refuses placements that would conflict. We monitor for p95 / p99 deviation and rebalance when measured impact exceeds 5%.
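
As a rough illustration of the check described in this answer, the sketch below compares p95 and p99 latency on a shared GPU against a dedicated baseline and flags a rebalance when either deviates by more than 5%. The `percentile` and `needs_rebalance` helpers, the nearest-rank method, and the sample data are assumptions for the example, not packet.ai's monitoring code.

```python
def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


def needs_rebalance(shared_ms: list, dedicated_ms: list, threshold: float = 0.05) -> bool:
    """True if shared p95 or p99 latency exceeds the dedicated baseline by more than threshold."""
    for pct in (95, 99):
        baseline = percentile(dedicated_ms, pct)
        deviation = (percentile(shared_ms, pct) - baseline) / baseline
        if deviation > threshold:
            return True
    return False


# Synthetic latencies in milliseconds, for illustration only.
dedicated = [20.0 + 0.1 * i for i in range(100)]
shared_ok = [x * 1.03 for x in dedicated]  # uniformly 3% slower
# Same as shared_ok, but the slowest 3 requests are 20% slower than dedicated.
shared_noisy = shared_ok[:97] + [x * 1.20 for x in dedicated[97:]]

print(needs_rebalance(shared_ok, dedicated))     # False
print(needs_rebalance(shared_noisy, dedicated))  # True
```

The second case shows why tail percentiles matter: most requests are only 3% slower, yet the p99 deviation is 20%, so the placement is flagged.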
How does pricing compare for the same B200?
Hyperscaler dedicated B200 pricing is typically $8-$11/hr. The packet.ai dynamic-tier B200 is $3.75/hr, roughly 50-65% cheaper. The dedicated tier on packet.ai sits in between at $5.25/hr.
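
The savings can be checked directly from the prices quoted in this answer. The short calculation below uses only those figures. The `saving` helper is a name introduced for the example, and the hyperscaler range is the page's "typical" estimate rather than a quoted list price.

```python
def saving(our_price: float, their_price: float) -> float:
    """Fractional saving of our_price relative to their_price."""
    return 1 - our_price / their_price


DYNAMIC_TIER = 3.75       # $/hr, packet.ai dynamic tier
DEDICATED_TIER = 5.25     # $/hr, packet.ai dedicated tier
HYPERSCALER_LOW = 8.00    # $/hr, low end of the typical range
HYPERSCALER_HIGH = 11.00  # $/hr, high end of the typical range

dynamic_low = saving(DYNAMIC_TIER, HYPERSCALER_LOW)
dynamic_high = saving(DYNAMIC_TIER, HYPERSCALER_HIGH)
dedicated_low = saving(DEDICATED_TIER, HYPERSCALER_LOW)
dedicated_high = saving(DEDICATED_TIER, HYPERSCALER_HIGH)

print(f"dynamic tier:   {dynamic_low:.0%} to {dynamic_high:.0%}")      # 53% to 66%
print(f"dedicated tier: {dedicated_low:.0%} to {dedicated_high:.0%}")  # 34% to 52%
```

The computed range for the dynamic tier, about 53% to 66%, is in line with the rounded 50-65% figure.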