🚀 B200 bare metal now at $5.6/hr. The best price you'll find. DC in US West → (Access it from Bare metal button on top after login).

Get Your B200 →
Start Building
Platform features

Everything to ship AI products.

From signup to SSH in under 5 minutes. Latest-generation NVIDIA silicon, managed inference, image generation, real-time monitoring, and transparent per-hour billing, wired together in one developer-grade platform.

<5min
Signup to SSH on a live GPU
180GB
Max VRAM available per GPU (B200)
99.9%
Uptime SLA on dedicated GPU services
24/7
Engineer-staffed support · <15 min response
NVIDIA silicon

Latest-generation GPUs, on-demand.

The most powerful AI hardware NVIDIA ships, available by the hour, no waitlist, no contract.

NVIDIA B200
Blackwell
$3.75/hr
HBM3e
180GBVRAM
Flagship Blackwell · large-scale training
  • 2.5× faster than H100
  • Ideal for 70B+ models
  • Production-grade inference
Join waitlist
RTX 6000 Pro
Blackwell
$0.66/hr
GDDR7 ECC
96GBVRAM
Cost-effective Blackwell · dev & production
  • Great for 7B–70B models
  • Professional visualization
  • Dev-to-production workflow
Launch Pro
NVIDIA A100
Ampere
$0.69/hr
HBM2e
80GBVRAM
Large-scale training · distributed workloads
  • Multi-node cluster ready
  • High memory for LLM fine-tuning
  • Stable, dedicated performance
Launch A100
NVIDIA L40S
Ada Lovelace
$0.54/hr
GDDR6
48GBVRAM
Inference-optimized · production workloads
  • High throughput inference
  • Cost-efficient scaling
  • Dedicated server performance
Launch L40S
Deploy your way

SSH, web terminal, or one-click deploy.

Pick the workflow that fits. Every path lands you in the same place: a real GPU with CUDA, Python, and your stack ready to go.

# Connect to your GPU instance
$ssh root@gpu-b200-01.packet.ai
# CUDA, Python, and drivers are pre-installed
$nvidia-smi
NVIDIA B200 · 180GB HBM3e · CUDA 12.8
# Deploy a HuggingFace model in one command
$vllm serve meta-llama/Llama-3.1-70B-Instruct
Raw SSH
Full root access with your SSH key. Ubuntu, CUDA, your stack, exactly how you'd run it locally.
Web Terminal
Browser-based terminal. No client needed. Works from a Chromebook, an iPad, or in a meeting room.
HuggingFace one-click
Pick any model on HuggingFace. We auto-calculate memory, pick the right GPU shape, and serve via vLLM.
Token Factory
Managed inference API. Pay per token. OpenAI-compatible. Swap your base URL, done.
Learn more →
Monitoring

Real-time GPU metrics.

Live utilization, VRAM, temperature, and power draw for every instance. System stats, billing, and activity logs from one dashboard, without installing an agent.

  • GPU utilization
  • VRAM tracking
  • Temperature
  • Power draw
  • CPU & RAM
gpu-b200-01 · us-west-2LIVE
GPU utilization
87%
VRAM
142 / 180 GB
Power
420W / 700W
Temperature
68°C
uptime · 14d 6hlast refresh · just now
Built for developers

Persistent storage. Pre-installed toolchains. Everything you need to ship fast.

Reboot your instance, your files are still there. PyTorch, TensorFlow, vLLM, Jupyter, already installed and tuned for the GPU you launched on.

Persistent storage
Your data survives reboots. Stop pods, resume later with all files intact. Only pay storage cost while stopped, not GPU cost.
NVMe SSDs
High-speed local storage for fast model loading and checkpoint saves.
Shared volumes
Attach persistent volumes to any instance. Store models and datasets separately from compute.
Pre-installed CUDA
Latest drivers and CUDA toolkit ready to go.
Python & ML libraries
PyTorch, TensorFlow, and common ML tools pre-configured.
Docker support
Run containerized workloads with full GPU passthrough.
Jupyter ready
Start notebooks instantly for interactive development.
vLLM optimized
High-performance inference with OpenAI-compatible API.
SSH key management
Manage multiple keys. Auto-inject into new instances.
Starting at
$0.66/ hr
NVIDIA RTX 6000 Pro · 96GB VRAM · dynamic.
No contractsNo minimumsCancel anytimeNo hidden fees
Billing

Transparent, fair pricing.

Pay for what you use. Prepaid wallet with real-time tracking, auto-refill, and early-termination credits. No surprises on month-end invoices.

Hourly billing
No minimums, no daily-rate gimmicks. Pay-per-hour from the second your kernel starts.
Prepaid wallet
Add credit when you want. Spend down as you use. Refund the rest.
Auto-refill
Set a threshold. We top up automatically so a training run never interrupts overnight.
Real-time tracking
See spend update second-by-second. Per-GPU breakdowns, per-project tags, exportable CSVs.
Low-balance alerts
Email, SMS, or webhook. Triggered at the wallet thresholds you set, not generic 80%.
Invoice history
Download monthly statements for accounting. Tax-ready. SOC 2 reconcilable.
Security

Enterprise-grade infrastructure.

Isolated instances, encrypted storage, US and EU datacenters with SOC 2-aligned controls. Per-workload network isolation. AES-256 at rest, TLS 1.3 in transit.

Isolated containers
Per-customer VM/container isolation with dedicated GPU passthrough.
AES-256 encryption
Encryption at rest on all persistent volumes and snapshots.
TLS 1.3
End-to-end encryption in transit, including SSH and API traffic.
99.9% SLA
Backed by service credits on every dedicated tier, in writing.
US & EU datacenters
Tier-3 facilities with N+1 power and dedicated cooling.
Enterprise security
Role-based access, audit logs, SSO via OIDC/SAML.
24/7 monitoring
In-house SRE on call. Anomaly detection on every fleet.
SOC 2 aligned
SOC 2 Type II report available under NDA. ISO 27001 in flight.

Real humans. Fast response.

No chatbots, no ticket queues. Talk directly to infrastructure engineers. 24/7 support with typical response in minutes.

Contact support
Frequently asked

Questions about the platform.

The most common questions we hear on walkthroughs. Don't see yours?Ask us directly →

Which NVIDIA GPUs are available, and which ship the same day?
NVIDIA B200, H200, A100, RTX 6000 Pro, RTX 5090, L40S, and H100 SXM are available on-demand. Single-GPU and small fleets (under 8 GPUs) are typically provisioned in under 5 minutes. Larger clusters and InfiniBand-connected nodes take 1-2 weeks depending on region.
How fast can I actually get a GPU running?
From signup to SSH on a live B200 or H200 in under 5 minutes via the dashboard, CLI, or REST API. No manual approvals, no waitlists. You'll land in an Ubuntu 22.04 environment with CUDA, drivers, PyTorch, vLLM, and Jupyter pre-installed.
Do you support multi-node training with InfiniBand?
Yes. Multi-node clusters with NVIDIA Quantum-2 InfiniBand (400 Gb/s) or RoCEv2 fabric are available for distributed training of large models. NCCL, MPI, and Slurm support out of the box. Available in US East/West and EU regions; sizing from 4 to 128+ nodes.
What does “Token Factory” give me that vLLM-on-a-VM doesn't?
Token Factory is the managed inference layer. OpenAI-compatible endpoint, auto-scaling, model registry, per-token billing, KV-cache reuse, and request batching tuned for the GPU under the hood. Just change your base_url to api.packet.ai/v1. For maximum control, run vLLM yourself on a dedicated B200 instead. Both work.
Does my data persist between sessions?
Yes. Persistent storage survives reboots and instance stops. You only pay storage cost while a pod is stopped, not GPU cost. Local NVMe for hot data; shared volumes for datasets across instances; S3-compatible object storage for archives.
Is there a free tier or trial?
Yes. $10 in free credits on signup, no credit card required. Enough for about 5 hours on an H200 or 15 hours on RTX 6000 Pro. Add wallet funds when you want to keep going; refund the rest when you're done.
What support do you offer in production?
24/7 human-staffed support. P1 incidents get a response in under 15 minutes. Enterprise tier includes Slack Connect with our SRE team, dedicated account engineer, and on-call escalation. No tiered ticket queues; you talk to people who know CUDA.
Are you SOC 2 / GDPR / HIPAA aligned?
SOC 2 Type II report available under NDA from security@packet.ai. GDPR-compliant with a published DPA and Standard Contractual Clauses. HIPAA-eligible workloads via dedicated single-tenant clusters. Contact us to scope a BAA.

Start building. No card required.

Launch a GPU in minutes. No credit card required to explore.