Coming soon — join waitlist

NVIDIA Hopper, 141 GB HBM3e, SXM

NVIDIA H200
141 GB HBM3e. Inference at scale.

The NVIDIA H200 is the Hopper GPU with 141 GB HBM3e memory, 1.76× the capacity of H100. It runs 70B models at FP16 natively and delivers 4.8 TB/s of bandwidth for the fastest token generation at scale. Coming soon to packet.ai.

From $2.49/GPU-hr · Coming soon

Pricing to be confirmed at launch

141 GB HBM3e memory
4.8 TB/s memory bandwidth
1979 TFLOPS FP8 compute
900 GB/s NVLink bandwidth

Architecture

Hopper HBM3e — inference at scale.

The H200 is the H100 with 141 GB HBM3e memory — the same Hopper compute engine, 1.76× the capacity and 1.43× the bandwidth.
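
The two ratios above follow directly from the headline figures. A minimal sketch of the arithmetic, using the H100 SXM numbers quoted in the FAQ on this page (the dictionary and function names are illustrative only):

```python
# Figures quoted on this page; the H100 values are the ones cited in the FAQ.
H100 = {"memory_gb": 80, "bandwidth_tb_s": 3.35}
H200 = {"memory_gb": 141, "bandwidth_tb_s": 4.8}


def ratio(spec: str) -> float:
    """Return the H200-to-H100 ratio for one spec."""
    return H200[spec] / H100[spec]


for spec in H200:
    print(f"{spec}: {ratio(spec):.2f}x")
```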

GH100 Hopper die

Same 80-billion-transistor die as the H100, with 4th-gen Tensor Cores, MIG support, and NVLink 4.0. HBM3e delivers 1.43× the bandwidth.

141 GB HBM3e at 4.8 TB/s

76% more memory than H100. Long-context LLMs, multi-modal pipelines, and large-batch training fit without sharding.

Transformer Engine on 4th-gen Tensor Cores

FP8 training and inference with per-tensor scaling, for up to 2× the peak Tensor Core throughput of FP16 on Hopper.
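
To illustrate what per-tensor scaling means, here is a simplified sketch in plain Python. It is not the Transformer Engine implementation: it picks one scale per tensor so that the largest magnitude maps to 448 (the largest finite FP8 E4M3 value), rounds to 4 significant bits, and ignores subnormals and the exponent floor. All function names are hypothetical.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3_grid(x: float) -> float:
    """Round x to 4 significant bits (1 implicit + 3 mantissa bits).

    Simplified: subnormals and the minimum exponent are ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16      # keep 4 significant bits
    return max(-E4M3_MAX, min(E4M3_MAX, math.ldexp(m, e)))


def quantize_per_tensor(values):
    """Scale the whole tensor by one factor, then round each element."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3_grid(v * scale) for v in values], scale


def dequantize(quantized, scale):
    """Undo the per-tensor scale to recover approximate original values."""
    return [q / scale for q in quantized]


weights = [0.5, -2.0, 0.001, 1.0]
quantized, scale = quantize_per_tensor(weights)
print(quantized, scale)
print(dequantize(quantized, scale))
```

The single scale factor lets a tensor with small values use the full FP8 range, at the cost of about 4 bits of precision per element.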

NVLink 4.0 at 900 GB/s

Scale across NVSwitch-connected nodes for large training runs where gradient communication is the bottleneck.

Technical specs

NVIDIA H200 specifications.

| Specification | Value | Great for |
| --- | --- | --- |
| GPU architecture | NVIDIA Hopper | FP8 Tensor Cores, same compute as H100. |
| GPU memory | 141 GB HBM3e | 70B at FP16 native. 1.76× H100. |
| Memory bandwidth | 4.8 TB/s | Fastest token generation per GPU outside Blackwell. |
| FP8 compute | 1979 TFLOPS | Same as H100 SXM. All the speed, more memory. |
| NVLink | 4.0 · 900 GB/s | Multi-GPU scale when single-card is not enough. |
| MIG | Up to 7 instances | Multi-tenant serving with 141 GB headroom. |

Pricing

Ways to run H200.

Dedicated or monthly — plus multi-node clusters.

Dedicated
Monthly · Single-tenant
TBC/month

Reserved H200 at a flat monthly rate. Full single-tenant isolation and predictable cost, with capacity reserved exclusively for you. 99.99% SLA, zero noisy-neighbour risk.

Get a wholesale quote →
Multi-node Cluster
From 8 GPUs

Scale frontier inference across multiple H200 nodes with NVLink 4.0 and InfiniBand interconnect.

  • 8–512 GPUs per cluster
  • NVLink + InfiniBand
  • Provisioned in <1 hr
Get a wholesale quote →
Use cases

What the H200 is built for.

70B inference at FP16

141 GB lets you run 70B models at FP16 natively. No quantisation, no model sharding.

  • 70B at FP16 native
  • No quantisation needed
  • Single-GPU deployment
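
A rough sketch of the arithmetic behind the 70B claim. It counts weights only and assumes exactly 70 billion parameters and decimal gigabytes. KV cache, activations, and framework overhead are not included, so the headroom it reports is an upper bound.

```python
H200_MEMORY_GB = 141  # capacity quoted on this page


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weights-only footprint: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


for name, bytes_per_param in [("FP16", 2), ("FP8", 1)]:
    weights = weight_memory_gb(70, bytes_per_param)
    headroom = H200_MEMORY_GB - weights
    print(f"70B at {name}: {weights} GB weights, {headroom} GB left for KV cache")
```

At FP16 the weights take 140 GB, which leaves about 1 GB on a 141 GB card, so context length and batch size are tightly constrained in that configuration. FP8 halves the weight footprint and frees roughly 71 GB.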

High-throughput API serving

4.8 TB/s bandwidth delivers the fastest token generation per GPU outside Blackwell.

  • 4.8 TB/s HBM3e
  • Low p99 latency
  • MIG multi-tenant
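
Why bandwidth matters for token generation: during single-stream decoding, each new token requires reading roughly all model weights from memory once. A back-of-envelope upper bound, assuming decoding is purely memory-bound and ignoring KV cache reads, compute time, and batching, is bandwidth divided by weight size. The function name is illustrative.

```python
def decode_tokens_per_sec(bandwidth_tb_s: float, weight_gb: float) -> float:
    """Upper bound on single-stream tokens/s when decoding is memory-bound."""
    return bandwidth_tb_s * 1000 / weight_gb  # TB/s -> GB/s, then per-token reads


# 70B weights: 140 GB at FP16, 70 GB at FP8 (weights only, decimal GB).
print(f"H200, 70B FP16: {decode_tokens_per_sec(4.8, 140):.1f} tokens/s")
print(f"H200, 70B FP8:  {decode_tokens_per_sec(4.8, 70):.1f} tokens/s")
print(f"H100, 70B FP8:  {decode_tokens_per_sec(3.35, 70):.1f} tokens/s")
```

Under these assumptions the per-stream ceiling scales with the 1.43× bandwidth ratio. Real throughput depends on batching, kernel efficiency, and context length.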

Large-scale fine-tuning

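141 GB per GPU cuts the number of cards a 70B fine-tuning run needs. QLoRA fine-tuning of a 70B model fits on a single card; full-parameter runs shard weights, gradients, and optimizer state across NVLink-connected GPUs.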

  • Full-parameter 70B
  • LoRA / QLoRA
  • NVLink 4.0 for multi-GPU
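
As a rough guide to sizing a full-parameter run, the sketch below assumes mixed-precision training with Adam at about 16 bytes per parameter (FP16 weights and gradients plus FP32 master weights and two optimizer moments). Activations and communication buffers are excluded, so real requirements are higher. The constant and function names are illustrative.

```python
import math

# Assumption: 2 (FP16 weights) + 2 (FP16 grads) + 12 (FP32 master, momentum, variance).
BYTES_PER_PARAM_MIXED_ADAM = 16


def training_state_gb(params_billion: float,
                      bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> float:
    """Memory for weights, gradients, and optimizer state in decimal GB."""
    return params_billion * bytes_per_param


def min_gpus(state_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the training state."""
    return math.ceil(state_gb / gpu_memory_gb)


state = training_state_gb(70)
print(f"70B training state: {state} GB")
print(f"H200 (141 GB): at least {min_gpus(state, 141)} GPUs")
print(f"H100 (80 GB):  at least {min_gpus(state, 80)} GPUs")
```

Under this assumption a 70B full-parameter run needs the combined memory of at least 8 H200s, compared with 14 H100s, before activations are counted.
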
FAQ

NVIDIA H200, answered.

For anything else, reach help@packet.ai.

What is the NVIDIA H200?

Hopper GPU with 141 GB HBM3e. Same FP8 compute as H100, 1.76× the memory and 1.43× the bandwidth.

When will H200 be available?

Coming soon to packet.ai. Join the waitlist for early access.

Can H200 run 70B models natively?

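Yes, for the weights. A 70B-class model such as Llama 3.1 70B takes about 140 GB at FP16, so it fits in 141 GB HBM3e on a single card without quantisation. KV-cache headroom is tight, so long contexts or large batches may still call for FP8 or a second GPU.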

How does H200 compare to H100?

Same FP8 Tensor Cores, 1.76× the memory (141 vs 80 GB), and 1.43× the bandwidth (4.8 vs 3.35 TB/s). The gain in inference throughput is largest at large batch sizes and long contexts, where memory capacity and bandwidth are the limits.

Does H200 support MIG?

Yes. Up to 7 isolated MIG instances for multi-tenant inference.

H200 — coming soon.

141 GB HBM3e. Join the waitlist for early access on packet.ai.

On-demand · hourly billing · US & EU regions

NVIDIA H200 · from $2.49/GPU-hr · Coming soon
Join waitlist →