What is the difference between on-demand and reserved GPU pricing?

On-demand GPU pricing charges per hour with no minimum commitment. Reserved pricing locks in a discounted rate (typically 20-40% lower) in exchange for a 1-3 month commitment. On packet.ai, on-demand H200 SXM starts at $2.25/hr.

When does reserved GPU pricing beat on-demand?

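Reserved pricing wins when projected GPU utilisation exceeds the ratio of the reserved rate to the on-demand rate, which works out to roughly 65-71% for H100, H200, and B200 at packet.ai's estimated reserved rates. Production inference clusters typically run at 85-95% utilisation, well past that threshold.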

What are spot GPU instances and when should I use them?

Spot GPU instances offer the lowest per-hour rate - sometimes 50-70% below on-demand - but can be interrupted with short notice (as little as 30 seconds). They are appropriate for batch training with checkpoint/resume, offline inference pipelines, and hyperparameter sweeps.

How much does a B200 GPU cost per hour on packet.ai?

packet.ai B200 SXM starts at $3.75/hr on-demand, the lowest confirmed B200 on-demand rate across 26 tracked providers as of June 2026, more than 50% below AWS-equivalent capacity.

Is packet.ai cheaper than CoreWeave or Lambda for H200 GPUs?

Yes. H200 SXM on packet.ai starts at $2.25/hr versus $3.29-$3.50/hr on Lambda and CoreWeave, and $8.00-$13.78/hr on AWS.

What GPU pricing model is best for LLM inference in production?

For production LLM inference running at consistent load (85-95% utilisation), reserved pricing is almost always cheaper on a total-cost basis. Spot is inappropriate for production inference because preemption breaks SLAs. On packet.ai, H200 SXM and B200 SXM reserved capacity are both available.

Do GPU pricing models affect which GPU I should choose?

Yes. The B200 SXM delivers roughly 4x the inference throughput of an H100 SXM in FP8 precision, helped by 9,000 TFLOPS of FP4 compute and 8 TB/s of HBM3e bandwidth. At packet.ai's rates of $3.75/hr (B200) versus $0.65/hr (H100), the B200 costs about 5.8x as much per hour, so verify whether the throughput gain justifies the cost ratio for your specific model size and serving volume.
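One way to frame the B200 versus H100 question is cost per unit of throughput. The sketch below uses the hourly rates quoted above and takes the "roughly 4x" throughput figure at face value; the function name and the idea of normalising H100 throughput to 1.0 are illustrative assumptions, and real throughput should be measured on your own model.

```python
def cost_per_unit_throughput(hourly_rate: float, relative_throughput: float) -> float:
    """Hourly cost divided by throughput, with H100 throughput normalised to 1.0."""
    if relative_throughput <= 0:
        raise ValueError("relative_throughput must be positive")
    return hourly_rate / relative_throughput


H100_RATE = 0.65  # $/hr, on-demand rate quoted in the text
B200_RATE = 3.75  # $/hr, on-demand rate quoted in the text
B200_SPEEDUP = 4.0  # assumption: the "roughly 4x" figure, not a measured value

h100_cost = cost_per_unit_throughput(H100_RATE, 1.0)
b200_cost = cost_per_unit_throughput(B200_RATE, B200_SPEEDUP)
price_ratio = B200_RATE / H100_RATE

print(f"Price ratio B200/H100: {price_ratio:.2f}x")
print(f"H100 cost per unit throughput: ${h100_cost:.4f}")
print(f"B200 cost per unit throughput: ${b200_cost:.4f}")
```

Under these inputs the B200 only comes out ahead if its real speedup on your workload exceeds the price ratio, or if memory capacity and latency requirements rule out the H100 anyway.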

Packet.ai

GPU pricing models fall into three categories — on-demand, reserved, and spot — and choosing the wrong one for your workload routinely adds 40–60% to a monthly GPU bill.

Key takeaways

On-demand GPU pricing on packet.ai starts from $0.66/hr (RTX 6000 Pro), $2.25/hr (H200 SXM), and $3.75/hr (B200 SXM) — no minimum commitment
Reserved contracts deliver 20–40% discounts but require 1–3 month commitments; break-even sits at roughly 65–71% utilisation, depending on the GPU
Spot instances (50–70% below on-demand) suit batch training with checkpointing — not production inference
B200 SXM on packet.ai at $3.75/hr is the lowest confirmed on-demand rate tracked across 26 providers as of June 2026
AWS hyperscale rates run $8.00–$14.24/hr for H200 and B200; packet.ai on-demand is 50%+ below that

Most teams default to on-demand GPU because it feels lower-risk. It isn't cheaper, though: you pay a premium on every hour in exchange for flexibility. Understanding the three core GPU pricing models, and the math behind each, is the difference between accurate infrastructure budgets and chronic overspend.

Definitions: what on-demand, reserved, and spot actually mean

On-demand GPU pricing charges an hourly rate with no minimum commitment. You provision a GPU, pay per hour, and release it when done. This is the default model for most cloud providers and the right choice for unpredictable or bursty workloads.

Reserved GPU pricing offers a discounted hourly rate in exchange for a term commitment — typically 1, 3, 6, or 12 months. The GPU capacity is pre-allocated to you. You pay whether or not you use it, which is why utilisation is the critical variable.

Spot GPU instances offer the lowest per-hour rate — often 50–70% below on-demand — but the provider can reclaim the hardware with short notice, sometimes as little as 30 seconds.

Note

Not all providers offer all three models. packet.ai offers on-demand and reserved. Lambda Labs is on-demand only. Spot is available on AWS, GCP, and a small number of neo-cloud providers.

Pricing data: what each model costs for H100, H200, and B200 in 2026

H100 SXM on packet.ai: $0.65/hr
H200 SXM on packet.ai: $2.25/hr
B200 SXM on packet.ai: $3.75/hr
Below AWS hyperscale: 50%+

Current on-demand rates across packet.ai and competing providers (June 2026):

GPU	packet.ai	Market range	AWS / hyperscale	Saving vs AWS
H100 SXM 80GB	from $0.65/hr	$0.81–$2.49/hr	$4.59–$8.90/hr	~85%
H200 SXM 141GB	from $2.25/hr	$3.50–$4.54/hr	$8.00–$13.78/hr	~72%
B200 SXM 192GB	from $3.75/hr	$4.99–$6.28/hr	$10.00–$14.24/hr	~60%
L40S	from $0.66/hr	$0.66–$1.20/hr	$3.50/hr+	~81%
RTX 6000 Pro	from $0.66/hr	—	—	—
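The "Saving vs AWS" column appears to compare the packet.ai rate against the low end of the AWS range. That is an inference from the numbers, not something the table states. A short sketch under that assumption:

```python
# (packet.ai rate, low end of the AWS / hyperscale range), both in $/hr from the table.
# Assumption: savings are measured against the low end of the AWS range.
RATES = {
    "H100 SXM 80GB": (0.65, 4.59),
    "H200 SXM 141GB": (2.25, 8.00),
    "B200 SXM 192GB": (3.75, 10.00),
    "L40S": (0.66, 3.50),
}


def saving_vs_aws(packet_rate: float, aws_rate: float) -> float:
    """Fractional saving of packet_rate relative to aws_rate."""
    return 1.0 - packet_rate / aws_rate


savings = {gpu: saving_vs_aws(p, a) for gpu, (p, a) in RATES.items()}

for gpu, saving in savings.items():
    print(f"{gpu}: {saving:.1%} below AWS (low end)")
```

Measured against the high end of each AWS range, the savings would be larger than the figures in the table.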

packet.ai B200 SXM on-demand at $3.75/hr is the lowest confirmed on-demand B200 rate across 26 tracked cloud providers as of June 2026 — the market average sits at $4.96/hr per getdeploying.com’s live pricing index.

Rule of thumb

If your GPU will run above the break-even utilisation over the contract term (roughly 65–71% at the rates on this page), reserved almost always wins on total cost. Below it, on-demand wins.

Break-even math: when reserved beats on-demand for H200 and B200

The break-even formula is straightforward: reserved wins when utilisation exceeds Cr ÷ Cod — where Cr is the reserved hourly rate and Cod is the on-demand hourly rate.

GPU	On-demand	Reserved (est.)	Break-even utilisation
H100 SXM 80GB	$0.65/hr	~$0.42/hr	~65%
H200 SXM 141GB	$2.25/hr	~$1.60/hr	~71%
B200 SXM 192GB	$3.75/hr	~$2.50/hr	~67%
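The break-even column follows from equating the two costs over a term of H hours: reserved costs Cr × H regardless of use, while on-demand costs Cod × u × H at utilisation u, so the two are equal when u = Cr ÷ Cod. A minimal sketch that reproduces the table, using the estimated reserved rates shown above (the function and variable names are illustrative):

```python
def break_even_utilisation(reserved_rate: float, on_demand_rate: float) -> float:
    """Utilisation above which reserved is cheaper than on-demand (Cr / Cod)."""
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return reserved_rate / on_demand_rate


# (on-demand $/hr, estimated reserved $/hr) taken from the table above.
RATES = {
    "H100 SXM 80GB": (0.65, 0.42),
    "H200 SXM 141GB": (2.25, 1.60),
    "B200 SXM 192GB": (3.75, 2.50),
}

break_evens = {
    gpu: break_even_utilisation(reserved, on_demand)
    for gpu, (on_demand, reserved) in RATES.items()
}

for gpu, threshold in break_evens.items():
    print(f"{gpu}: reserved wins above {threshold:.0%} utilisation")
```

This assumes on-demand capacity is billed only for the hours it is actually provisioned; if instances are left running while idle, the effective break-even is lower.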

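At 8×H200 GPUs running at 85% utilisation for 3 months (a typical production inference cluster), reserved pricing at $1.60/hr versus on-demand at $2.25/hr saves approximately $5,400 over the term. This assumes a 90-day term (2,160 hours per GPU) with on-demand billed only for utilised hours: on-demand costs 8 × 2,160 × 0.85 × $2.25 = $33,048, while reserved costs 8 × 2,160 × $1.60 = $27,648.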
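A term-level comparison like this one can be reproduced with a few lines of arithmetic. The sketch below makes its assumptions explicit: a 90-day term, on-demand billed only for utilised hours, and reserved billed for every hour of the term. The result is sensitive to each of these, and the function name is illustrative.

```python
def term_costs(gpus: int, hours: float, utilisation: float,
               on_demand_rate: float, reserved_rate: float):
    """Return (on_demand_cost, reserved_cost, saving_from_reserved) for one term."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    on_demand_cost = gpus * hours * utilisation * on_demand_rate  # pay only for used hours
    reserved_cost = gpus * hours * reserved_rate  # pay for every hour of the term
    return on_demand_cost, reserved_cost, on_demand_cost - reserved_cost


HOURS_IN_TERM = 90 * 24  # assumption: "3 months" taken as 90 days

on_demand_cost, reserved_cost, saving = term_costs(
    gpus=8,
    hours=HOURS_IN_TERM,
    utilisation=0.85,
    on_demand_rate=2.25,
    reserved_rate=1.60,
)

print(f"On-demand: ${on_demand_cost:,.0f}")
print(f"Reserved:  ${reserved_cost:,.0f}")
print(f"Saving from reserved: ${saving:,.0f}")
```

A negative saving means on-demand is cheaper for that utilisation, which is what happens below the break-even threshold.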

Production LLM inference clusters serving real traffic typically run at 85–95% GPU utilisation. Development environments, fine-tuning experiments, and evaluation pipelines average 40–60%. The crossover is clear: production belongs on reserved, experiments belong on on-demand.

Spot instances: the right workloads and the wrong ones

Spot GPU instances on providers that offer them — AWS EC2 Spot, GCP Preemptible, some neo-clouds — can reach 50–70% below on-demand rates. An H100 spot instance has been tracked as low as $1.65/hr on Vast.ai versus $3.29/hr on-demand. The cost floor is real — but so is the risk.

✓ Spot is right for

Batch training with checkpoint/resume
Offline inference pipelines
Hyperparameter sweeps
Data preprocessing jobs

✗ Spot is wrong for

Production inference APIs with SLAs
Real-time serving with vLLM or TGI
Multi-node training without fault tolerance
Any job where eviction costs more than savings

⚠ Watch out

Spot GPU eviction notices can arrive with 30 seconds of warning on some platforms. Without a robust checkpoint strategy — saving model state every N steps — you may lose hours of training compute with zero recourse.
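A checkpoint strategy of this kind can be sketched with the standard library alone. The example below is a simplified illustration, not a drop-in for a real training framework: the training step is a placeholder, the state is a small JSON dictionary, and it assumes the platform signals eviction with SIGTERM, which varies by provider. All function names are hypothetical.

```python
import json
import os
import signal
import tempfile
import threading

CHECKPOINT_EVERY = 100  # hypothetical N: steps between periodic checkpoints

stop = threading.Event()  # set when an eviction notice arrives


def install_eviction_handler() -> None:
    """Assumption: the platform delivers SIGTERM on eviction. Call from the main thread."""
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a partial write never replaces a good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss_sum": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, every: int = CHECKPOINT_EVERY) -> dict:
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        # Placeholder for one real training step.
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]
        if state["step"] % every == 0:
            save_checkpoint(path, state)
        if stop.is_set():
            break  # eviction notice received: fall through to the final save
    save_checkpoint(path, state)  # must finish inside the eviction warning window
    return state


if __name__ == "__main__":
    install_eviction_handler()
    print(train("checkpoint.json", total_steps=1000))
```

With a real model the final save may take longer than a 30-second warning allows, so the periodic interval should be chosen on the assumption that the final save might not complete.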

packet.ai vs CoreWeave, Lambda, and AWS: pricing model availability

Provider	On-demand	Reserved	Spot	H200 on-demand	B200 on-demand
packet.ai	✓	✓	—	$2.25/hr	$3.75/hr
Lambda Labs	✓	—	—	$3.29/hr	$4.99–$5.29/hr
CoreWeave	✓	✓	—	~$3.50+/hr	~$5.00+/hr
RunPod	✓	—	✓	$4.39/hr	$5.89/hr
AWS	✓	✓	✓	$8.00–$13.78/hr	$10.00–$14.24/hr

packet.ai offers the lowest publicly verified on-demand rate for both H200 SXM and B200 SXM across tracked providers as of June 2026, and is one of a small number of neo-cloud providers offering both on-demand and reserved without enterprise negotiation.

How to choose: a decision framework for AI/ML teams

Classify by interruption tolerance

Zero tolerance: on-demand or reserved only — spot is out. Can checkpoint and resume: spot is viable for cost-sensitive batch jobs.

Calculate projected utilisation

Below 65% on average: on-demand wins. Above 65% consistently: reserved probably wins. Highly variable: start on-demand, migrate to reserved once stable.

Match GPU to workload, then check pricing model math

Production inference 70B+ models → B200 or H200 on reserved. Fine-tuning 7B–30B → H100 on-demand at $0.65/hr. Cost-sensitive batch → H100 or L40S on-demand.

Account for the full cost, not just the hourly rate

Hyperscalers add egress fees, managed service costs, and reserved instance complexity. packet.ai charges for GPU capacity only.
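The first two steps of the framework above can be expressed as a small function. This is a sketch of the decision logic only: the parameter names are illustrative, the default threshold is the article's 65% rule of thumb and should be replaced with the computed break-even for a specific GPU, and GPU selection and full-cost accounting are left out.

```python
def recommend_pricing_model(
    interruption_tolerant: bool,
    utilisation: float,
    break_even: float = 0.65,  # rule-of-thumb default; use reserved rate / on-demand rate if known
    stable: bool = True,
) -> str:
    """Return 'spot', 'reserved', or 'on-demand' for a workload."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    # Step 1: workloads that can checkpoint and resume may use spot.
    if interruption_tolerant:
        return "spot"
    # Step 2: highly variable load starts on-demand until it stabilises.
    if not stable:
        return "on-demand"
    return "reserved" if utilisation > break_even else "on-demand"


print(recommend_pricing_model(interruption_tolerant=False, utilisation=0.90))
print(recommend_pricing_model(interruption_tolerant=False, utilisation=0.50))
print(recommend_pricing_model(interruption_tolerant=True, utilisation=0.50))
```

Returning "spot" assumes the chosen provider actually offers spot capacity; where it does not, the same workload falls back to the utilisation test.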

Frequently asked questions

Last reviewed: 10 June 2026.

GPU Pricing Models Compared 2026: On-Demand vs Reserved vs Spot