Guide
GPU Pricing Models Compared 2026: On-Demand vs Reserved vs Spot
GPU pricing models compared: on-demand, reserved, and spot explained with real break-even math. Know which model fits your workload before committing to hardware.
packet.ai Team
January 12, 2025
GPU pricing models fall into three categories — on-demand, reserved, and spot — and choosing the wrong one for your workload routinely adds 40–60% to a monthly GPU bill.
Key takeaways
On-demand GPU pricing on packet.ai starts from $0.66/hr (RTX 6000 Pro), $2.25/hr (H200 SXM), and $3.75/hr (B200 SXM) — no minimum commitment
Reserved contracts deliver 20–40% discounts in exchange for a term commitment (1–3 months on packet.ai); estimated break-even sits at roughly 65–71% utilisation, depending on the GPU
Spot instances (50–70% below on-demand) suit batch training with checkpointing — not production inference
B200 SXM on packet.ai at $3.75/hr is the lowest confirmed on-demand rate tracked across 26 providers as of June 2026
AWS hyperscale rates run $8.00–$14.24/hr for H200 and B200; packet.ai on-demand is 50%+ below that
Most teams default to on-demand GPU pricing because it feels lower-risk. It isn't: the flexibility is paid for with a higher rate on every hour used. Understanding the three core GPU pricing models, and the math behind each, is the difference between accurate infrastructure budgets and chronic overspend.
Definitions: what on-demand, reserved, and spot actually mean
On-demand GPU pricing charges an hourly rate with no minimum commitment. You provision a GPU, pay per hour, and release it when done. This is the default model for most cloud providers and the right choice for unpredictable or bursty workloads.
Reserved GPU pricing offers a discounted hourly rate in exchange for a term commitment — typically 1, 3, 6, or 12 months. The GPU capacity is pre-allocated to you. You pay whether or not you use it, which is why utilisation is the critical variable.
Spot GPU instances offer the lowest per-hour rate — often 50–70% below on-demand — but the provider can reclaim the hardware with short notice, sometimes as little as 30 seconds.
Note
Not all providers offer all three models. packet.ai offers on-demand and reserved. Lambda Labs is on-demand only. Spot is available on AWS, GCP, and a small number of neo-cloud providers.
Pricing data: what each model costs for H100, H200, and B200 in 2026
$0.65/hr: H100 SXM on packet.ai
$2.25/hr: H200 SXM on packet.ai
$3.75/hr: B200 SXM on packet.ai
50%+ below AWS hyperscale
Current on-demand rates across packet.ai and competing providers (June 2026):
| GPU | packet.ai | Market range | AWS / hyperscale | Saving vs AWS |
| --- | --- | --- | --- | --- |
| H100 SXM 80GB | from $0.65/hr | $0.81–$2.49/hr | $4.59–$8.90/hr | ~85% |
| H200 SXM 141GB | from $2.25/hr | $3.50–$4.54/hr | $8.00–$13.78/hr | ~72% |
| B200 SXM 192GB | from $3.75/hr | $4.99–$6.28/hr | $10.00–$14.24/hr | ~60% |
| L40S | from $0.66/hr | $0.66–$1.20/hr | $3.50/hr+ | ~81% |
| RTX 6000 Pro | from $0.66/hr | n/a | n/a | n/a |
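The "Saving vs AWS" column appears to be measured against the low end of each AWS range and then rounded. A minimal sketch of that arithmetic, assuming that reading of the table (the `saving_vs` helper and the `rates` dictionary are illustrative names, not part of any packet.ai tooling):

```python
def saving_vs(reference_rate: float, rate: float) -> float:
    """Percentage saving of `rate` relative to `reference_rate`."""
    return (1 - rate / reference_rate) * 100


# (packet.ai rate, low end of the AWS range) in $/hr, copied from the table.
# Using the low end of the AWS range is an assumption about how the
# "Saving vs AWS" column was derived.
rates = {
    "H100 SXM 80GB": (0.65, 4.59),
    "H200 SXM 141GB": (2.25, 8.00),
    "B200 SXM 192GB": (3.75, 10.00),
    "L40S": (0.66, 3.50),
}

savings = {gpu: saving_vs(aws, ours) for gpu, (ours, aws) in rates.items()}

for gpu, pct in savings.items():
    print(f"{gpu}: {pct:.1f}% below the AWS low end")
```

Measured against the high end of each AWS range instead, the percentages come out larger, so the table's figures read as conservative under this assumption.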
packet.ai B200 SXM on-demand at $3.75/hr is the lowest confirmed on-demand B200 rate across 26 tracked cloud providers as of June 2026 — the market average sits at $4.96/hr per getdeploying.com’s live pricing index.
Rule of thumb
If your GPU will run at more than 65% utilisation over the contract term, reserved almost always wins on total cost. Below 65% — on-demand wins.
Break-even math: when reserved beats on-demand for H200 and B200
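The break-even formula is straightforward: reserved wins when utilisation exceeds Cr ÷ Cod, where Cr is the reserved hourly rate and Cod is the on-demand hourly rate. The reasoning: a reserved GPU is billed for every hour of the term, while an on-demand GPU is billed only for the hours it is actually used, so reserved becomes cheaper once utilisation × Cod exceeds Cr.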
| GPU | On-demand | Reserved (est.) | Break-even utilisation |
| --- | --- | --- | --- |
| H100 SXM 80GB | $0.65/hr | ~$0.42/hr | ~65% |
| H200 SXM 141GB | $2.25/hr | ~$1.60/hr | ~71% |
| B200 SXM 192GB | $3.75/hr | ~$2.50/hr | ~67% |
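The break-even column can be reproduced directly from the Cr ÷ Cod rule. A minimal sketch, using the estimated reserved rates from the table (the function names are illustrative):

```python
def break_even_utilisation(reserved_rate: float, on_demand_rate: float) -> float:
    """Utilisation (0-1) above which reserved is cheaper than on-demand.

    Assumes reserved is billed for every hour of the term and on-demand
    is billed only for the hours actually used.
    """
    if reserved_rate <= 0 or on_demand_rate <= 0:
        raise ValueError("rates must be positive")
    return reserved_rate / on_demand_rate


def cheaper_model(utilisation: float, reserved_rate: float, on_demand_rate: float) -> str:
    """Return which pricing model has the lower total cost at this utilisation."""
    threshold = break_even_utilisation(reserved_rate, on_demand_rate)
    return "reserved" if utilisation > threshold else "on-demand"


# Estimated rates from the table: (reserved, on-demand) in $/hr.
for gpu, (cr, cod) in {
    "H100 SXM": (0.42, 0.65),
    "H200 SXM": (1.60, 2.25),
    "B200 SXM": (2.50, 3.75),
}.items():
    print(f"{gpu}: break-even at {break_even_utilisation(cr, cod):.0%}")
```

Because the reserved rates are estimates, the thresholds are too; it is worth recomputing them with the rate on an actual quote.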
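At 8×H200 GPUs running at 85% utilisation for 3 months (a typical production inference cluster), reserved pricing at $1.60/hr versus on-demand at $2.25/hr saves approximately $5,400 over the term. Taking the term as 90 days (2,160 hours), reserved costs 8 × $1.60 × 2,160 = $27,648, while on-demand billed only for utilised hours costs 8 × $2.25 × 0.85 × 2,160 = $33,048.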
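A term-level saving figure is sensitive to the term length and to whether idle on-demand hours are billed, so it helps to make those inputs explicit. A minimal sketch under the same model as the break-even rule (reserved billed for every hour, on-demand billed only for utilised hours); the 90-day term and the `term_costs` helper are assumptions for illustration:

```python
def term_costs(
    gpus: int,
    on_demand_rate: float,
    reserved_rate: float,
    utilisation: float,
    term_hours: float,
) -> tuple[float, float, float]:
    """Return (on_demand_cost, reserved_cost, saving_from_reserved) in dollars."""
    # Reserved capacity is paid for whether or not it is used.
    reserved_cost = gpus * reserved_rate * term_hours
    # On-demand is paid only for the fraction of hours actually used.
    on_demand_cost = gpus * on_demand_rate * utilisation * term_hours
    return on_demand_cost, reserved_cost, on_demand_cost - reserved_cost


# 8x H200 at 85% utilisation; 90 days is an assumed reading of "3 months".
on_demand, reserved, saving = term_costs(
    gpus=8,
    on_demand_rate=2.25,
    reserved_rate=1.60,
    utilisation=0.85,
    term_hours=90 * 24,
)
print(f"on-demand ${on_demand:,.0f}, reserved ${reserved:,.0f}, saving ${saving:,.0f}")
```

If on-demand instances are left running while idle, set `utilisation` to 1.0 to model the bill rather than the useful work.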
Production LLM inference clusters serving real traffic typically run at 85–95% GPU utilisation. Development environments, fine-tuning experiments, and evaluation pipelines average 40–60%. The crossover is clear: production belongs on reserved, experiments belong on on-demand.
Spot instances: the right workloads and the wrong ones
Spot GPU instances on providers that offer them — AWS EC2 Spot, GCP Preemptible, some neo-clouds — can reach 50–70% below on-demand rates. An H100 spot instance has been tracked as low as $1.65/hr on Vast.ai versus $3.29/hr on-demand. The cost floor is real — but so is the risk.
✓ Spot is right for
Batch training with checkpoint/resume
Offline inference pipelines
Hyperparameter sweeps
Data preprocessing jobs
✗ Spot is wrong for
Production inference APIs with SLAs
Real-time serving with vLLM or TGI
Multi-node training without fault tolerance
Any job where eviction costs more than savings
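The last item in that list can be turned into a rough check. The sketch below is a simplified model, not a provider formula: it assumes each eviction forces some hours of work to be redone at the spot rate, and every parameter name is hypothetical. The example rates are the $1.65/hr spot and $3.29/hr on-demand figures quoted above; the job length and eviction counts are made up for illustration.

```python
def spot_net_saving(
    on_demand_rate: float,
    spot_rate: float,
    job_hours: float,
    expected_evictions: float,
    hours_lost_per_eviction: float,
    restart_overhead_hours: float = 0.0,
) -> float:
    """Dollar saving from running a job on spot instead of on-demand.

    A negative result means evictions cost more than the discount saves.
    Ignores the value of wall-clock delay, which may matter more than
    the compute bill for some teams.
    """
    rerun_hours = expected_evictions * (hours_lost_per_eviction + restart_overhead_hours)
    spot_cost = spot_rate * (job_hours + rerun_hours)
    on_demand_cost = on_demand_rate * job_hours
    return on_demand_cost - spot_cost


# 100-hour job, 5 evictions, checkpoints every 30 minutes: little work is lost.
with_checkpoints = spot_net_saving(3.29, 1.65, 100, 5, 0.5)

# Same job without checkpoints: assume each eviction loses 20 hours of work.
without_checkpoints = spot_net_saving(3.29, 1.65, 100, 5, 20)

print(f"with checkpoints: ${with_checkpoints:.2f}")
print(f"without checkpoints: ${without_checkpoints:.2f}")
```

Under these assumed numbers the checkpointed job keeps most of the discount, while the unprotected job ends up slightly more expensive than on-demand.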
⚠ Watch out
Spot GPU eviction notices can arrive with 30 seconds of warning on some platforms. Without a robust checkpoint strategy — saving model state every N steps — you may lose hours of training compute with zero recourse.
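A checkpoint strategy of this kind can be sketched with the standard library alone. The example below is illustrative only: the training step is a placeholder, the `stop_at` argument simulates an eviction, and in a real job the saved state would be the model and optimizer state written with your framework's own save call.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so an eviction mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the new file into place in a single step.
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, checkpoint_every: int, stop_at=None) -> dict:
    """Run (or resume) training, saving state every `checkpoint_every` steps."""
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, total_steps + 1):
        if stop_at is not None and step > stop_at:
            return state  # simulated eviction: unsaved progress is lost
        state = {"step": step, "loss": 1.0 / step}  # placeholder training step
        if step % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

With `checkpoint_every=100`, an eviction at step 250 loses only the 50 steps since the last save; calling `train` again with the same path resumes from step 201. The interval N is a trade-off between checkpoint I/O overhead and the amount of work at risk.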
packet.ai vs CoreWeave, Lambda, and AWS: pricing model availability
| Provider | On-demand | Reserved | Spot | H200 on-demand | B200 on-demand |
| --- | --- | --- | --- | --- | --- |
| packet.ai | ✓ | ✓ | ✗ | $2.25/hr | $3.75/hr |
| Lambda Labs | ✓ | ✗ | ✗ | $3.29/hr | $4.99–$5.29/hr |
| CoreWeave | ✓ | ✓ | ✗ | ~$3.50+/hr | ~$5.00+/hr |
| RunPod | ✓ | ✗ | ✓ | $4.39/hr | $5.89/hr |
| AWS | ✓ | ✓ | ✓ | $8.00–$13.78/hr | $10.00–$14.24/hr |
packet.ai offers the lowest publicly verified on-demand rate for both H200 SXM and B200 SXM across tracked providers as of June 2026, and is one of a small number of neo-cloud providers offering both on-demand and reserved without enterprise negotiation.
How to choose: a decision framework for AI/ML teams
1. Classify by interruption tolerance
Zero tolerance: on-demand or reserved only; spot is out. Can checkpoint and resume: spot is viable for cost-sensitive batch jobs.
2. Calculate projected utilisation
Below 65% on average: on-demand wins. Above 65% consistently: reserved probably wins. Highly variable: start on-demand, migrate to reserved once stable.
3. Match GPU to workload, then check pricing model math
Production inference 70B+ models → B200 or H200 on reserved. Fine-tuning 7B–30B → H100 on-demand at $0.65/hr. Cost-sensitive batch → H100 or L40S on-demand.
4. Account for the full cost, not just the hourly rate
Hyperscalers add egress fees, managed service costs, and reserved instance complexity. packet.ai charges for GPU capacity only; browse available clusters.
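The first two steps of the framework can be sketched as a small function. This is a simplification for illustration: the function name, parameters, and the default 65% threshold are assumptions drawn from the rule of thumb, and it treats every interruption-tolerant job as a spot candidate, which only holds where spot capacity is offered and the job is cost-sensitive.

```python
from typing import Optional


def choose_pricing_model(
    can_checkpoint_and_resume: bool,
    avg_utilisation: Optional[float],
    utilisation_is_stable: bool = True,
    break_even: float = 0.65,
) -> str:
    """Suggest a pricing model from interruption tolerance and utilisation."""
    # Step 1: interruption tolerance decides whether spot is on the table.
    if can_checkpoint_and_resume:
        return "spot"
    # Step 2: unknown or highly variable utilisation starts on on-demand,
    # to be revisited once the workload stabilises.
    if avg_utilisation is None or not utilisation_is_stable:
        return "on-demand"
    # Step 2 (continued): compare steady utilisation with the break-even point.
    return "reserved" if avg_utilisation > break_even else "on-demand"


print(choose_pricing_model(False, 0.90))  # production inference at steady load
print(choose_pricing_model(False, 0.50))  # development and evaluation work
print(choose_pricing_model(True, 0.90))   # checkpointed batch training
```

Passing a GPU-specific `break_even` (for example, the reserved rate divided by the on-demand rate from a real quote) gives a sharper answer than the generic 65% default.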
Frequently asked questions
On-demand GPU pricing charges per hour with no minimum commitment — you pay only for what you use. Reserved pricing locks in a discounted rate (typically 20–40% lower) in exchange for a 1–3 month commitment. On packet.ai, on-demand H200 SXM starts at $2.25/hr. Reserved contracts reduce that for teams running steady workloads above ~65% utilisation.
Reserved pricing wins when projected GPU utilisation exceeds the break-even threshold: roughly 67–71% for H200 and B200 at the estimated reserved rates above, and about 65% for H100. Production inference clusters and steady training runs typically run at 85–95% utilisation, well past that threshold. Development workloads and variable traffic are better served by on-demand.
Spot GPU instances offer the lowest per-hour rate — sometimes 50–70% below on-demand — but can be interrupted with short notice (as little as 30 seconds). They are appropriate for batch training with checkpoint/resume, offline inference pipelines, and hyperparameter sweeps. They are the wrong call for real-time inference APIs or any job that cannot tolerate interruption.
packet.ai B200 SXM starts at $3.75/hr on-demand — the lowest confirmed B200 on-demand rate across 26 tracked providers as of June 2026, and more than 50% below AWS-equivalent capacity. Reserved contracts reduce the rate further for teams committing to 1–3 months of B200 capacity.
For most configurations, packet.ai is cheaper than Lambda, CoreWeave, and AWS for H200 capacity. H200 SXM on packet.ai starts at $2.25/hr on-demand versus $3.29–$3.50/hr on Lambda and CoreWeave, and $8.00–$13.78/hr on AWS. Prices vary by cluster size and region, so check available clusters for live availability.
For production LLM inference running at consistent load (85–95% utilisation), reserved pricing is almost always cheaper on a total-cost basis. On-demand makes sense during ramp-up or for unpredictable traffic. Spot is inappropriate for production inference because preemption breaks SLAs. On packet.ai, H200 SXM and B200 SXM reserved capacity are both available.
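The B200 SXM can be worth its premium over the H100 SXM, but check the numbers first. It delivers roughly 4× the inference throughput of an H100 SXM in FP8 precision, helped by 9,000 TFLOPS of FP4 compute and 8 TB/s of HBM3e bandwidth. At packet.ai’s rates of $3.75/hr (B200) versus $0.65/hr (H100), the price ratio is about 5.8×, so a 4× throughput gain on its own does not lower cost per token. Verify whether the throughput gain justifies the cost ratio for your specific model size and serving volume before committing.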
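One way to run that check is to compare cost per unit of work instead of cost per hour. A minimal sketch, assuming throughput scales by a single multiple measured on your own model (the function names are illustrative):

```python
def relative_cost_per_unit(rate: float, baseline_rate: float, throughput_multiple: float) -> float:
    """Cost per unit of work relative to the baseline GPU (1.0 means equal)."""
    if throughput_multiple <= 0:
        raise ValueError("throughput_multiple must be positive")
    return (rate / baseline_rate) / throughput_multiple


def required_speedup(rate: float, baseline_rate: float) -> float:
    """Throughput multiple needed for the pricier GPU to match the baseline's cost per unit."""
    return rate / baseline_rate


# B200 at $3.75/hr against H100 at $0.65/hr, with the roughly 4x throughput figure.
ratio = relative_cost_per_unit(3.75, 0.65, 4.0)
needed = required_speedup(3.75, 0.65)
print(f"B200 cost per unit of work vs H100: {ratio:.2f}x")
print(f"speedup needed to break even: {needed:.2f}x")
```

A ratio above 1.0 means the B200 costs more per unit of work at these rates; at a 4× speedup it comes out near 1.44×. Factors this sketch ignores, such as fitting a model on fewer GPUs or meeting a latency target, can still tip the decision.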