Coming soon — join waitlist

NVIDIA Hopper, 141 GB HBM3e, SXM

NVIDIA H200

141 GB HBM3e. Inference at scale.

The NVIDIA H200 is the Hopper GPU with 141 GB of HBM3e memory, 1.76× the capacity of the H100. It holds the FP16 weights of a 70B model (about 140 GB) on a single card and delivers 4.8 TB/s of memory bandwidth, 1.43× that of the H100, for faster token generation at scale. Coming soon to packet.ai.

From $2.49/GPU-hr · Coming soon

Pricing to be confirmed at launch

Join waitlist → · See pricing

141 GB HBM3e memory
4.8 TB/s memory bandwidth
1979 TFLOPS FP8 compute
900 GB/s NVLink bandwidth

Architecture

Hopper HBM3e — inference at scale.

The H200 pairs the H100's Hopper compute engine with 141 GB of HBM3e memory: the same compute, 1.76× the capacity (141 vs 80 GB), and 1.43× the bandwidth (4.8 vs 3.35 TB/s).
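The two ratios above follow directly from the published figures quoted on this page. A minimal sketch of the arithmetic; the constant and function names are illustrative only:

```python
# Figures as quoted on this page (H100 values from the FAQ comparison).
H100_MEMORY_GB = 80
H200_MEMORY_GB = 141
H100_BANDWIDTH_TB_S = 3.35
H200_BANDWIDTH_TB_S = 4.8


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


capacity_ratio = ratio(H200_MEMORY_GB, H100_MEMORY_GB)
bandwidth_ratio = ratio(H200_BANDWIDTH_TB_S, H100_BANDWIDTH_TB_S)

print(f"capacity:  {capacity_ratio:.2f}x")   # capacity:  1.76x
print(f"bandwidth: {bandwidth_ratio:.2f}x")  # bandwidth: 1.43x
```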

GH100 Hopper die

Same 80-billion-transistor die as the H100. 4th-gen Tensor Cores, MIG support, and NVLink 4.0, with HBM3e delivering 1.43× the memory bandwidth.

141 GB HBM3e at 4.8 TB/s

76% more memory than the H100's 80 GB. Long-context LLMs, multi-modal pipelines, and larger training batches fit on fewer GPUs, with less need for sharding.

Transformer Engine

FP8 training and inference with per-tensor scaling, running on 4th-gen Tensor Cores. Peak FP8 Tensor Core throughput on Hopper is 2× that of FP16.
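To illustrate what per-tensor scaling means, here is a minimal pure-Python sketch of the scaling step only. It assumes the E4M3 FP8 format, whose largest finite value is 448, and it does not model the actual rounding to 8 bits. The function names are hypothetical and are not the Transformer Engine API.

```python
E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def per_tensor_scale(values: list[float]) -> float:
    """One scale factor for the whole tensor, mapping its largest magnitude to E4M3_MAX."""
    amax = max(abs(v) for v in values)
    return E4M3_MAX / amax if amax > 0 else 1.0


def scale_for_fp8(values: list[float]) -> tuple[list[float], float]:
    """Scale values into the FP8 range and clip; rounding to 8 bits is NOT modelled."""
    scale = per_tensor_scale(values)
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]
    return scaled, scale


weights = [0.02, -0.5, 0.125, 0.9]       # hypothetical tensor values
scaled, scale = scale_for_fp8(weights)
restored = [s / scale for s in scaled]   # dividing by the scale undoes the mapping

print(f"largest scaled magnitude: {max(abs(s) for s in scaled):.1f}")
```

Using a single scale per tensor keeps the bookkeeping cheap; the trade-off is that one outlier value reduces the precision available to every other element of that tensor.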

NVLink 4.0 at 900 GB/s

Scale across the NVSwitch-connected GPUs in a node for large training runs where gradient communication is the bottleneck.
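For a rough sense of why interconnect bandwidth matters for gradient communication, the sketch below gives a bandwidth-only lower bound for a ring all-reduce. It assumes the 900 GB/s figure is the bidirectional total (450 GB/s per direction), ignores latency and overlap with compute, and uses a hypothetical 70B-parameter job; real collectives may use other algorithms.

```python
def ring_allreduce_seconds(payload_gb: float, num_gpus: int, link_gb_per_s: float) -> float:
    """Bandwidth-only lower bound for one ring all-reduce (latency ignored)."""
    if num_gpus < 2:
        return 0.0
    # In a ring all-reduce each GPU sends 2 * (N - 1) / N times the payload.
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


# Assumption: 900 GB/s is bidirectional, so 450 GB/s is usable in each direction.
NVLINK_PER_DIRECTION_GB_S = 900 / 2

# Hypothetical job: FP16 gradients of a 70B-parameter model (2 bytes each = 140 GB) on 8 GPUs.
seconds = ring_allreduce_seconds(payload_gb=140, num_gpus=8,
                                 link_gb_per_s=NVLINK_PER_DIRECTION_GB_S)
print(f"{seconds:.2f} s per full gradient all-reduce")
```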

Technical specs

NVIDIA H200 specifications.

| Specification | Value | Great for |
| --- | --- | --- |
| GPU architecture | NVIDIA Hopper | FP8 Tensor Cores, same compute as H100. |
| GPU memory | 141 GB HBM3e | FP16 weights of a 70B model on one card. 1.76× H100. |
| Memory bandwidth | 4.8 TB/s | Highest memory bandwidth of any Hopper GPU, for fast token generation. |
| FP8 compute | 1979 TFLOPS | Same as H100 SXM. All the speed, more memory. |
| NVLink | 4.0 · 900 GB/s | Multi-GPU scale when a single card is not enough. |
| MIG | Up to 7 instances | Multi-tenant serving, with 141 GB shared across the instances. |

Pricing

Ways to run H200.

Dedicated hourly or monthly, plus multi-node clusters.

Coming soon

Dedicated Hourly · Single-tenant

$2.49 /GPU-hr

Full H200 card reserved exclusively for you. 99.99% SLA, zero noisy-neighbour risk.

Join waitlist →

Dedicated Monthly · Single-tenant

TBC /month

Reserved H200 at a flat monthly rate. Full single-tenant isolation and predictable cost. 99.99% SLA, zero noisy-neighbour risk.

Get a wholesale quote →

Multi-node Cluster

From 8 GPUs

Scale frontier inference across multiple H200 nodes, with NVLink 4.0 between the GPUs in each node and InfiniBand between nodes.

8–512 GPUs per cluster
NVLink + InfiniBand
Provisioned in <1 hr

Get a wholesale quote →
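For budgeting, a back-of-the-envelope estimate can be sketched as below. It assumes the advertised "from" rate of $2.49 per GPU-hour applies uniformly; pricing is still to be confirmed at launch, and monthly or cluster quotes may differ. The function name is illustrative only.

```python
def estimate_cost_usd(num_gpus: int, hours: float, rate_per_gpu_hour: float = 2.49) -> float:
    """GPU-hours multiplied by the hourly rate, rounded to cents."""
    return round(num_gpus * hours * rate_per_gpu_hour, 2)


# Hypothetical workload: 8 GPUs for 24 hours at the advertised "from" rate.
cost = estimate_cost_usd(num_gpus=8, hours=24)
print(f"${cost:,.2f}")  # $478.08
```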

Use cases

What the H200 is built for.

70B inference at FP16

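141 GB holds the FP16 weights of a 70B model (about 140 GB) on one card, with no quantisation and no model sharding. KV-cache headroom is tight at FP16, so long contexts or large batches may still call for FP8 or a second GPU.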

70B at FP16 native
No quantisation needed
Single-GPU deployment
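The capacity claim comes down to parameters times bytes per parameter. A minimal sketch of that estimate, counting weights only (KV cache, activations, and framework overhead are excluded) and treating the model as exactly 70 billion parameters, which is an assumption:

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

H200_MEMORY_GB = 141


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp16", "fp8", "int4"):
    weights_gb = weight_footprint_gb(70e9, dtype)
    headroom_gb = H200_MEMORY_GB - weights_gb
    print(f"{dtype}: weights {weights_gb:.0f} GB, headroom {headroom_gb:.0f} GB")
```

At FP16 the weights leave roughly 1 GB free, which is why the headroom for KV cache is worth checking before planning a single-card deployment.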

High-throughput API serving

4.8 TB/s is the highest memory bandwidth of any Hopper GPU. Token generation is usually memory-bound, so that bandwidth translates directly into faster decoding per GPU.

4.8 TB/s HBM3e
Low p99 latency
MIG multi-tenant
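A common roofline-style estimate for memory-bound decoding is that every generated token requires reading all model weights once, so bandwidth divided by weight size gives an upper bound on single-sequence tokens per second. The sketch below applies that simplification; it ignores KV-cache reads, batching, and kernel efficiency, and the 70 GB weight size (70B parameters at FP8) is a hypothetical example.

```python
def decode_tokens_per_second(bandwidth_tb_s: float, weight_gb: float) -> float:
    """Upper bound on batch-1 decode speed when weight reads dominate."""
    bytes_per_second = bandwidth_tb_s * 1e12
    bytes_per_token = weight_gb * 1e9  # all weights are read once per token
    return bytes_per_second / bytes_per_token


WEIGHTS_GB = 70  # hypothetical: 70B parameters stored at FP8

h200 = decode_tokens_per_second(4.8, WEIGHTS_GB)
h100 = decode_tokens_per_second(3.35, WEIGHTS_GB)

print(f"H200 bound: {h200:.1f} tokens/s")
print(f"H100 bound: {h100:.1f} tokens/s")
print(f"ratio: {h200 / h100:.2f}x")
```

Under this model the speedup equals the bandwidth ratio regardless of model size, which is where the 1.43× figure shows up in practice for memory-bound workloads.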

Large-scale fine-tuning

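141 GB fits QLoRA fine-tuning of a 70B model on a single card. Full-parameter fine-tuning of a 70B model needs far more memory once gradients and optimizer state are counted, so it runs across multiple H200s over NVLink.

Full-parameter 70B on multi-GPU
LoRA / QLoRA
NVLink 4.0 for multi-GPU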
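A rough rule of thumb helps when checking whether a fine-tuning job fits. The sketch below assumes mixed-precision training with the Adam optimizer at about 16 bytes per parameter (FP16 weights and gradients plus FP32 master weights, momentum, and variance). It excludes activations, so real requirements are higher; the constants are assumptions, not measurements.

```python
import math

H200_MEMORY_GB = 141

# Assumed bytes per parameter for mixed-precision Adam training.
FP16_WEIGHTS = 2
FP16_GRADIENTS = 2
FP32_OPTIMIZER_STATE = 12  # master weights + momentum + variance, 4 bytes each
BYTES_PER_PARAM = FP16_WEIGHTS + FP16_GRADIENTS + FP32_OPTIMIZER_STATE


def full_finetune_memory_gb(num_params: float) -> float:
    """Weights, gradients, and optimizer state in decimal GB (activations excluded)."""
    return num_params * BYTES_PER_PARAM / 1e9


required_gb = full_finetune_memory_gb(70e9)
min_gpus = math.ceil(required_gb / H200_MEMORY_GB)

print(f"about {required_gb:.0f} GB of training state")
print(f"at least {min_gpus} H200s before counting activations")
```

Parameter-efficient methods such as LoRA and QLoRA train only small adapter matrices, which removes most of the gradient and optimizer-state cost in this estimate.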

FAQ

NVIDIA H200, answered.

For anything else, reach help@packet.ai.

What is the NVIDIA H200?

Hopper GPU with 141 GB HBM3e. Same FP8 compute as H100, 1.76× the memory and 1.43× the bandwidth.

When will H200 be available?

Coming soon to packet.ai. Join the waitlist for early access.

Can H200 run 70B models natively?

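The FP16 weights of a 70B-parameter model such as Llama 3.1 70B take about 140 GB, which nearly fills the 141 GB of HBM3e on a single card. No quantisation is needed for the weights themselves, but little memory is left for KV cache, so FP8 or a second GPU is the safer choice for long contexts and large batches.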

How does H200 compare to H100?

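Same FP8 Tensor Cores, 1.76× the memory (141 vs 80 GB), and 1.43× the bandwidth (4.8 vs 3.35 TB/s). Because token generation is usually memory-bound, the extra bandwidth and capacity raise inference throughput, most visibly at large batch sizes and long contexts.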

Does H200 support MIG?

Yes. Up to 7 isolated MIG instances for multi-tenant inference.

H200 — coming soon.

141 GB HBM3e. Join the waitlist for early access on packet.ai.

Join waitlist → · Talk to a human

On-demand · hourly billing · US & EU regions
