Token Factory serves open models behind a drop-in, OpenAI-compatible API. Pay per token, scale to zero, and never manage a GPU. Point your existing SDK at packet.ai and ship.
Token Factory speaks the OpenAI API. Swap the base URL and key — your existing SDKs, prompts, and tooling just work.
# pip install openai, then point it at packet.ai
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.packet.ai/v1",
    api_key=os.environ["PACKET_API_KEY"],  # read the key from the environment
)
resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)
# any HTTP client works
curl https://api.packet.ai/v1/chat/completions \
  -H "Authorization: Bearer $PACKET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role":"user","content":"Hi!"}],
    "stream": true
  }'
// npm i openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.packet.ai/v1",
  apiKey: process.env.PACKET_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  stream: true,
  messages: [{ role: "user", content: "Hi!" }],
});
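Since any HTTP client works, the same request can also be put together with nothing but the Python standard library. The sketch below only builds the request and does not send it. The helper name `build_chat_request` and the dummy fallback key are illustrative assumptions, not part of any packet.ai SDK; the URL, headers, and model name are taken from the examples above.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.packet.ai/v1"  # base URL used in the examples on this page


def build_chat_request(api_key: str, prompt: str,
                       model: str = "llama-3.3-70b") -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions. Hypothetical helper."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Falls back to a dummy key so the sketch runs without credentials.
req = build_chat_request(os.environ.get("PACKET_API_KEY", "dummy-key"), "Hi!")
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` with a real key and decode the JSON response body.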
A curated catalog of the best open models, hosted and continuously updated. Indicative launch pricing per 1M tokens.
Full /v1/chat/completions and /v1/embeddings surface. Change base_url and key — keep every line of your app.
SSE token streaming, JSON mode, and function/tool calling, exactly as your client already uses them.
Capacity follows your traffic with no cold-start tax. Idle costs nothing. You pay per token, not per hour.
Input and output tokens metered separately. No minimums, no platform fee, no surprise egress charges.
Host your own LoRA or full checkpoint behind the same OpenAI-compatible endpoint.
We never train on your data. DPA, audit support, and EU data residency available.
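For readers not using an SDK, here is a rough sketch of what consuming the SSE token stream mentioned above involves. It assumes the endpoint emits the same wire format as the OpenAI API (`data: {json}` lines ending with a `data: [DONE]` sentinel), which this page implies but does not spell out. The function name `iter_stream_text` and the sample events are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines, comments, and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel in the OpenAI format
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events, not captured from a real endpoint.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

With the official SDKs this parsing is already done for you; iterating over the streamed response yields the decoded chunks directly.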
Low-latency streaming chat for customer-facing products.
Embeddings plus generation behind one billing relationship.
Tool-using agents and high-volume batch classification.
Create a key in the dashboard. No GPU quota requests, no infrastructure to provision.
Set base_url to https://api.packet.ai/v1 and reuse your existing OpenAI client and prompts.
Traffic autoscales behind the endpoint. You only pay for the tokens you actually use.
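Because input and output tokens are metered separately, the cost of a request is a two-term sum. The sketch below shows the arithmetic only. The prices are placeholders chosen for illustration and are NOT Token Factory's actual rates, and `request_cost` is a hypothetical helper.

```python
from decimal import Decimal


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: Decimal, output_price_per_m: Decimal) -> Decimal:
    """Cost of one request when prices are quoted per 1M tokens."""
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * input_price_per_m
            + Decimal(output_tokens) * output_price_per_m) / million


# Placeholder prices in USD per 1M tokens (assumed, not real pricing).
INPUT_PRICE = Decimal("0.50")
OUTPUT_PRICE = Decimal("1.50")

# 1,200 prompt tokens and 300 completion tokens.
cost = request_cost(1_200, 300, INPUT_PRICE, OUTPUT_PRICE)
print(f"${cost}")
```

OpenAI-style responses report `prompt_tokens` and `completion_tokens` in a `usage` object; assuming Token Factory does the same, those two numbers are the inputs to this calculation.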
For anything not covered here, email help@packet.ai.
Change base_url and key, and your existing client works unchanged.
Token Factory is launching soon. Join the waitlist for early access and launch-pricing credits.
Get early access to Token Factory and launch-pricing credits. We'll only email you about access.
We'll email your Token Factory API key as early access opens — along with launch-pricing credits.
