
GPU Pricing Models Compared 2026: On-Demand vs Reserved vs Spot

GPU pricing models compared: on-demand, reserved, and spot explained with real break-even math. Know which model fits your workload before committing to hardware.

packet.ai Team
January 12, 2025

GPU pricing models fall into three categories — on-demand, reserved, and spot — and choosing the wrong one for your workload routinely adds 40–60% to a monthly GPU bill.

Key takeaways

  • On-demand GPU pricing on packet.ai starts from $0.66/hr (RTX 6000 Pro), $2.25/hr (H200 SXM), and $3.75/hr (B200 SXM) — no minimum commitment
  • Reserved contracts deliver 20–40% discounts but require 1–3 month commitments; break-even sits at roughly 67–71% utilisation for B200 and H200 at the estimated reserved rates
  • Spot instances (50–70% below on-demand) suit batch training with checkpointing — not production inference
  • B200 SXM on packet.ai at $3.75/hr is the lowest confirmed on-demand rate tracked across 26 providers as of June 2026
  • AWS hyperscale rates run $8.00–$14.24/hr for H200 and B200; packet.ai on-demand is 50%+ below those rates

Most teams default to on-demand GPUs because it feels lower-risk. It isn't: at steady, high utilisation it simply costs more per GPU-hour than a reserved contract. Understanding the three core GPU pricing models, and the math behind each, is the difference between accurate infrastructure budgets and chronic overspend.

Definitions: what on-demand, reserved, and spot actually mean

On-demand GPU pricing charges an hourly rate with no minimum commitment. You provision a GPU, pay per hour, and release it when done. This is the default model for most cloud providers and the right choice for unpredictable or bursty workloads.

Reserved GPU pricing offers a discounted hourly rate in exchange for a term commitment — typically 1, 3, 6, or 12 months. The GPU capacity is pre-allocated to you. You pay whether or not you use it, which is why utilisation is the critical variable.

Spot GPU instances offer the lowest per-hour rate — often 50–70% below on-demand — but the provider can reclaim the hardware with short notice, sometimes as little as 30 seconds.

Note

Not all providers offer all three models. packet.ai offers on-demand and reserved. Lambda Labs is on-demand only. Spot is available on AWS, GCP, and a small number of neo-cloud providers.

Pricing data: what each model costs for H100, H200, and B200 in 2026

  • $0.65/hr: H100 SXM on packet.ai
  • $2.25/hr: H200 SXM on packet.ai
  • $3.75/hr: B200 SXM on packet.ai
  • 50%+ below AWS hyperscale

Current on-demand rates across packet.ai and competing providers (June 2026):

| GPU | packet.ai | Market range | AWS / hyperscale | Saving vs AWS |
| --- | --- | --- | --- | --- |
| H100 SXM 80GB | from $0.65/hr | $0.81–$2.49/hr | $4.59–$8.90/hr | ~85% |
| H200 SXM 141GB | from $2.25/hr | $3.50–$4.54/hr | $8.00–$13.78/hr | ~72% |
| B200 SXM 192GB | from $3.75/hr | $4.99–$6.28/hr | $10.00–$14.24/hr | ~60% |
| L40S | from $0.66/hr | $0.66–$1.20/hr | $3.50/hr+ | ~81% |
| RTX 6000 Pro | from $0.66/hr | | | |
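
The "Saving vs AWS" column appears to be measured against the low end of each AWS range. The sketch below reproduces it under that assumption; the `RATES` dictionary and the function name are illustrative, and the rates are copied from the table.

```python
# Rates from the table above, in USD per GPU-hour: (packet.ai, AWS low end).
# Assumption: the saving is measured against the LOW end of the AWS range.
RATES = {
    "H100 SXM 80GB": (0.65, 4.59),
    "H200 SXM 141GB": (2.25, 8.00),
    "B200 SXM 192GB": (3.75, 10.00),
    "L40S": (0.66, 3.50),
}


def saving_vs_reference(rate: float, reference: float) -> float:
    """Return the fractional saving of `rate` relative to `reference`."""
    if reference <= 0:
        raise ValueError("reference rate must be positive")
    return 1.0 - rate / reference


for gpu, (packet_rate, aws_low) in RATES.items():
    pct = saving_vs_reference(packet_rate, aws_low) * 100
    print(f"{gpu}: {pct:.1f}% below the AWS low-end rate")
```

The table's figures are rounded: B200, for instance, works out to 62.5% on these inputs. Measured against the high end of the AWS range, every saving would be larger.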

packet.ai B200 SXM on-demand at $3.75/hr is the lowest confirmed on-demand B200 rate across 26 tracked cloud providers as of June 2026 — the market average sits at $4.96/hr per getdeploying.com’s live pricing index.

Rule of thumb

If your GPU will run at more than 65% utilisation over the contract term, reserved almost always wins on total cost. Below 65% — on-demand wins.

Break-even math: when reserved beats on-demand for H200 and B200

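The break-even formula is straightforward. Reserved capacity is billed for every hour of the term at the reserved hourly rate C_r, while on-demand capacity is billed only for the hours actually used, at the on-demand hourly rate C_od. At utilisation u, reserved therefore wins when C_r < u × C_od, that is, when u exceeds C_r ÷ C_od.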

| GPU | On-demand | Reserved (est.) | Break-even utilisation |
| --- | --- | --- | --- |
| H100 SXM 80GB | $0.65/hr | ~$0.42/hr | ~65% |
| H200 SXM 141GB | $2.25/hr | ~$1.60/hr | ~71% |
| B200 SXM 192GB | $3.75/hr | ~$2.50/hr | ~67% |
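
The break-even column follows directly from dividing the reserved rate by the on-demand rate. The sketch below uses the estimated reserved rates from the table; the function names are illustrative.

```python
def break_even_utilisation(reserved_rate: float, on_demand_rate: float) -> float:
    """Utilisation above which a reserved contract costs less than on-demand."""
    if on_demand_rate <= 0:
        raise ValueError("on-demand rate must be positive")
    return reserved_rate / on_demand_rate


def cheaper_model(utilisation: float, reserved_rate: float, on_demand_rate: float) -> str:
    """Pick the cheaper model for a given average utilisation (0.0 to 1.0)."""
    threshold = break_even_utilisation(reserved_rate, on_demand_rate)
    return "reserved" if utilisation > threshold else "on-demand"


# (reserved estimate, on-demand) in USD per GPU-hour, copied from the table.
TABLE = {
    "H100 SXM 80GB": (0.42, 0.65),
    "H200 SXM 141GB": (1.60, 2.25),
    "B200 SXM 192GB": (2.50, 3.75),
}

for gpu, (reserved, on_demand) in TABLE.items():
    threshold = break_even_utilisation(reserved, on_demand)
    print(f"{gpu}: break-even at {threshold:.0%} utilisation")
```

The reserved rates are estimates, so the thresholds move with whatever rate is actually quoted.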

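For 8×H200 GPUs running at 85% utilisation for 3 months (a typical production inference cluster), reserved pricing at $1.60/hr versus on-demand at $2.25/hr saves approximately $5,400 over a 90-day term: reserved costs 8 × $1.60 × 2,160 hours = $27,648, while on-demand billed for 85% of those hours costs 8 × $2.25 × 0.85 × 2,160 = $33,048.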
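
A term-cost comparison like this depends heavily on how idle hours are billed. The sketch below states its assumptions explicitly: a 90-day (2,160-hour) term, on-demand billed only for utilised hours, and reserved billed for every hour, as in the break-even formula. The function name and parameters are illustrative.

```python
def term_costs(gpus: int, utilisation: float, on_demand_rate: float,
               reserved_rate: float, term_hours: float):
    """Return (on_demand_cost, reserved_cost, saving_from_reserved) in USD.

    Assumptions: on-demand GPUs are released when idle, so they are billed
    for utilisation * term_hours; reserved GPUs are billed for the full term.
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    on_demand_cost = gpus * on_demand_rate * utilisation * term_hours
    reserved_cost = gpus * reserved_rate * term_hours
    return on_demand_cost, reserved_cost, on_demand_cost - reserved_cost


# The 8×H200 scenario: 85% utilisation, $2.25/hr vs $1.60/hr, 90 days.
on_demand, reserved, saving = term_costs(
    gpus=8, utilisation=0.85, on_demand_rate=2.25,
    reserved_rate=1.60, term_hours=90 * 24,
)
print(f"on-demand: ${on_demand:,.0f}  reserved: ${reserved:,.0f}  saving: ${saving:,.0f}")
```

If the on-demand GPUs are instead held for the whole term regardless of load, set `utilisation=1.0`; the gap then widens to the full rate difference for every hour.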

Production LLM inference clusters serving real traffic typically run at 85–95% GPU utilisation. Development environments, fine-tuning experiments, and evaluation pipelines average 40–60%. The crossover is clear: production belongs on reserved, experiments belong on on-demand.

Spot instances: the right workloads and the wrong ones

Spot GPU instances on providers that offer them — AWS EC2 Spot, GCP Preemptible, some neo-clouds — can reach 50–70% below on-demand rates. An H100 spot instance has been tracked as low as $1.65/hr on Vast.ai versus $3.29/hr on-demand. The cost floor is real — but so is the risk.

✓ Spot is right for

  • Batch training with checkpoint/resume
  • Offline inference pipelines
  • Hyperparameter sweeps
  • Data preprocessing jobs

✗ Spot is wrong for

  • Production inference APIs with SLAs
  • Real-time serving with vLLM or TGI
  • Multi-node training without fault tolerance
  • Any job where eviction costs more than savings

⚠ Watch out

Spot GPU eviction notices can arrive with 30 seconds of warning on some platforms. Without a robust checkpoint strategy — saving model state every N steps — you may lose hours of training compute with zero recourse.
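
A minimal sketch of the "save every N steps, resume on restart" pattern, using only the Python standard library. The file path, the state layout, and the placeholder training step are all assumptions for illustration; a real job would persist model and optimizer state with its framework's own serialisation.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical location
CHECKPOINT_EVERY = 100               # the "N" in "every N steps"


def save_checkpoint(state: dict, path: str = CHECKPOINT_PATH) -> None:
    """Write to a temp file, then rename, so an eviction mid-write
    cannot corrupt the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic replacement of the old checkpoint


def load_checkpoint(path: str = CHECKPOINT_PATH) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(total_steps: int, path: str = CHECKPOINT_PATH,
          every: int = CHECKPOINT_EVERY) -> dict:
    state = load_checkpoint(path)  # resume from the last completed step
    for step in range(state["step"], total_steps):
        state["loss"] = 1.0 / (step + 1)  # placeholder for a real training step
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # final save at the end of the run
    return state
```

With this layout an eviction loses at most `every` steps of work. Choosing N is a trade-off between checkpoint write time and the compute you are willing to redo.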

packet.ai vs CoreWeave, Lambda, and AWS: pricing model availability

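| Provider | On-demand | Reserved | Spot | H200 on-demand | B200 on-demand |
| --- | --- | --- | --- | --- | --- |
| packet.ai | Yes | Yes | No | $2.25/hr | $3.75/hr |
| Lambda Labs | Yes | No | No | $3.29/hr | $4.99–$5.29/hr |
| CoreWeave | Yes | | | ~$3.50+/hr | ~$5.00+/hr |
| RunPod | Yes | | | $4.39/hr | $5.89/hr |
| AWS | Yes | | Yes | $8.00–$13.78/hr | $10.00–$14.24/hr |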

packet.ai offers the lowest publicly verified on-demand rate for both H200 SXM and B200 SXM across tracked providers as of June 2026, and is one of a small number of neo-cloud providers offering both on-demand and reserved without enterprise negotiation.

How to choose: a decision framework for AI/ML teams

1. Classify by interruption tolerance

Zero tolerance: on-demand or reserved only; spot is out. Can checkpoint and resume: spot is viable for cost-sensitive batch jobs.

2. Calculate projected utilisation

Below 65% on average: on-demand wins. Above 65% consistently: reserved probably wins. Highly variable: start on-demand, migrate to reserved once stable.

3. Match GPU to workload, then check pricing model math

Production inference 70B+ models → B200 or H200 on reserved. Fine-tuning 7B–30B → H100 on-demand at $0.65/hr. Cost-sensitive batch → H100 or L40S on-demand.

4. Account for the full cost, not just the hourly rate

Hyperscalers add egress fees, managed service costs, and reserved instance complexity. packet.ai charges for GPU capacity only (browse available clusters).
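
The first two steps of the framework can be sketched as a small function. This is a simplification: it treats any workload that can checkpoint and resume as a spot candidate, uses the 65% rule of thumb as the default threshold, and ignores GPU selection and non-GPU costs. The function and parameter names are illustrative.

```python
def recommend_pricing_model(can_checkpoint_and_resume: bool,
                            avg_utilisation: float,
                            utilisation_is_stable: bool = True,
                            break_even: float = 0.65) -> str:
    """Return "spot", "reserved", or "on-demand" for a workload.

    avg_utilisation is a fraction between 0.0 and 1.0.
    break_even defaults to the 65% rule of thumb; pass the reserved rate
    divided by the on-demand rate for a specific GPU instead.
    """
    if not 0.0 <= avg_utilisation <= 1.0:
        raise ValueError("avg_utilisation must be between 0 and 1")
    # Step 1: interruption tolerance decides whether spot is on the table.
    if can_checkpoint_and_resume:
        return "spot"
    # Step 2: variable load starts on-demand until utilisation settles.
    if not utilisation_is_stable:
        return "on-demand"
    return "reserved" if avg_utilisation > break_even else "on-demand"


print(recommend_pricing_model(False, 0.90))   # production inference
print(recommend_pricing_model(False, 0.50))   # development environment
print(recommend_pricing_model(True, 0.50))    # checkpointed batch training
```

A "spot" result only helps if the provider offers spot capacity, so it may mean running that workload on a different provider than the rest of the stack.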

Frequently asked questions

On-demand GPU pricing charges per hour with no minimum commitment: you pay only for what you use. Reserved pricing locks in a discounted rate (typically 20–40% lower) in exchange for a 1–3 month commitment. On packet.ai, on-demand H200 SXM starts at $2.25/hr. Reserved contracts reduce that for teams running steady workloads above the break-even utilisation.

Reserved pricing wins when projected GPU utilisation exceeds the break-even threshold, which is the reserved rate divided by the on-demand rate: roughly 67–71% for B200 and H200 at packet.ai's estimated reserved rates. Production inference clusters and steady training runs typically run at 85–95% utilisation, well past that threshold. Development workloads and variable traffic are better served by on-demand.

Spot GPU instances offer the lowest per-hour rate, sometimes 50–70% below on-demand, but can be interrupted with short notice (as little as 30 seconds). They are appropriate for batch training with checkpoint/resume, offline inference pipelines, and hyperparameter sweeps. They are the wrong call for real-time inference APIs or any job that cannot tolerate interruption.

packet.ai B200 SXM starts at $3.75/hr on-demand, the lowest confirmed B200 on-demand rate across 26 tracked providers as of June 2026 and more than 50% below AWS-equivalent capacity. Reserved contracts reduce the rate further for teams committing to 1–3 months of B200 capacity.

packet.ai is cheaper than Lambda, CoreWeave, and AWS for most configurations. H200 SXM on packet.ai starts at $2.25/hr on-demand versus $3.29–$3.50/hr on Lambda and CoreWeave, and $8.00–$13.78/hr on AWS. Prices vary by cluster size and region, so check available clusters for live availability.

For production LLM inference running at consistent load (85–95% utilisation), reserved pricing is almost always cheaper on a total-cost basis. On-demand makes sense during ramp-up or for unpredictable traffic. Spot is inappropriate for production inference because preemption breaks SLAs. On packet.ai, H200 SXM and B200 SXM reserved capacity are both available.

Whether the B200 is worth its premium over the H100 depends on the workload. The B200 SXM delivers roughly 4× the inference throughput of an H100 SXM in FP8 precision thanks to 9,000 TFLOPS FP4 compute and 8 TB/s HBM3e bandwidth. At packet.ai's rates of $3.75/hr (B200) versus $0.65/hr (H100), the price ratio is about 5.8×, so verify whether the throughput gain justifies the cost ratio for your specific model size and serving volume before committing.
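
One way to run that check is to compare cost per unit of work, not cost per hour. In the sketch below the throughput ratio is an input you would measure on your own model; the 4× figure is only the rough number quoted above, and the function name is illustrative.

```python
def relative_cost_per_unit_work(new_rate: float, old_rate: float,
                                throughput_ratio: float) -> float:
    """Cost per unit of work on the new GPU relative to the old one.

    throughput_ratio is new-GPU throughput divided by old-GPU throughput.
    A result below 1.0 means the new GPU is cheaper per unit of work.
    """
    if old_rate <= 0 or throughput_ratio <= 0:
        raise ValueError("old_rate and throughput_ratio must be positive")
    return (new_rate / old_rate) / throughput_ratio


# packet.ai on-demand rates from this article; the 4.0 is an assumption
# to be replaced with a measured figure for your model and batch size.
ratio = relative_cost_per_unit_work(new_rate=3.75, old_rate=0.65, throughput_ratio=4.0)
print(f"B200 cost per unit of work vs H100: {ratio:.2f}x")
```

On hourly price alone, the break-even speedup equals the price ratio. Memory capacity, latency targets, and the number of GPUs needed to hold the model can shift the decision either way.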

Last reviewed: 10 June 2026. Browse available GPU clusters on packet.ai →
