🚀 B200 bare metal is now $5.60/hr, the best price you'll find, hosted in a US West data center. After logging in, open it from the Bare metal button at the top of the page.

Launching soon
Token Factory

OpenAI-compatible inference.
Without the GPU ops.

Token Factory serves open models behind a drop-in, OpenAI-compatible API. Pay per token, scale to zero, and never manage a GPU. Point your existing SDK at packet.ai and ship.

OpenAI-compatible · per-token billing · streaming, JSON mode & function calling
POST api.packet.ai/v1/chat/completions · 200 OK
// drop-in OpenAI client
{ "model": "llama-3.3-70b", "stream": true }
streaming tokens · 38 ms TTFT
Drop-in · OpenAI-compatible API
Per-token · pay only for what you use
Scale to 0 · no idle GPU cost
US & EU · data residency
The API

Change one line. Keep your code.

Token Factory speaks the OpenAI API. Swap the base URL and key — your existing SDKs, prompts, and tooling just work.

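# pip install openai, then point the client at packet.ai
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.packet.ai/v1",
    api_key=os.environ["PACKET_API_KEY"],  # read the key from the environment, not a literal string
)

resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)

# With stream=True the call returns an iterator of chunks; print tokens as they arrive.
for chunk in resp:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)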
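Without an SDK, the same call is a plain HTTPS POST. The sketch below builds that request with only the Python standard library, assuming Token Factory accepts OpenAI's bearer-token `Authorization` header and JSON body as its compatibility claim implies. `build_chat_request` is a hypothetical helper, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.packet.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str,
                       stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Hypothetical helper: assumes OpenAI-style bearer auth and JSON body.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("sk-example", "llama-3.3-70b", "Hi!", stream=True)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Any HTTP client works the same way; the SDK simply builds this request for you.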
Model catalog

Open models, served and ready.

A curated catalog of the best open models, hosted and continuously updated. Indicative launch pricing per 1M tokens.

Model | Developer · variant | Context | Type | From / 1M tokens
--- | --- | --- | --- | ---
Llama 3.3 70B | Meta · instruct | 128K | Chat | $0.59 (Popular)
Llama 3.1 8B | Meta · instruct | 128K | Chat | $0.06
Qwen2.5 72B | Alibaba · instruct | 128K | Chat | $0.62
DeepSeek-V3 | DeepSeek · MoE | 64K | Chat | $0.85 (New)
Mistral Small 3 | Mistral · instruct | 32K | Chat | $0.18
BGE-M3 | BAAI · embeddings | 8K | Embed | $0.02
Prices indicative for launch · input/output metered separately · your own fine-tunes hostable on request.
Why Token Factory

Everything the SDK expects and more.

Drop-in compatible

Full /v1/chat/completions and /v1/embeddings surface. Change base_url and key — keep every line of your app.

Streaming & tools

SSE token streaming, JSON mode, and function/tool calling, exactly as your client already uses them.
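Under the hood, OpenAI-style streaming arrives as server-sent events. The sketch below parses that wire format, assuming Token Factory emits OpenAI's chunk shape (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`). `iter_stream_text` is a hypothetical helper; the OpenAI SDK does this parsing for you when you pass `stream=True`.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from an OpenAI-style SSE chat stream.

    Assumes each event is a line "data: {json chunk}" and the stream
    ends with "data: [DONE]".
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        # With n=1 there is a single choice; the first chunk often carries only the role.
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Hello!
```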

Autoscale & scale-to-zero

Capacity follows your traffic with no cold-start tax. Idle time costs nothing: you pay per token, not per hour.

Per-token billing

Input and output tokens are metered separately. No minimums, no platform fee, no surprise egress charges.
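Because input and output are metered separately, a workload's bill is the sum of two per-1M-token charges. A minimal sketch, assuming separate input and output rates: the page lists only a single "from" price per model, so the $0.59/$0.79 split and the `Rates`/`estimate_cost` names below are hypothetical.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # prices are quoted per 1M tokens


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD prices per 1M tokens; input and output billed separately."""
    input_per_m: float
    output_per_m: float


def estimate_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Estimated USD cost of a workload under per-token billing."""
    return (
        input_tokens / TOKENS_PER_UNIT * rates.input_per_m
        + output_tokens / TOKENS_PER_UNIT * rates.output_per_m
    )


# Illustrative rates only, not published Token Factory prices.
example = Rates(input_per_m=0.59, output_per_m=0.79)
print(f"${estimate_cost(2_000_000, 500_000, example):.4f}")  # $1.5750
```

With scale-to-zero, a period with no requests contributes zero tokens and therefore zero cost.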

Bring your fine-tune

Host your own LoRA or full checkpoint behind the same OpenAI-compatible endpoint.

Private & compliant

We never train on your data. DPA, audit support, and EU data residency available.

Built for

Built for production LLM apps.

Chatbots & assistants

Low-latency streaming chat for customer-facing products.

  • Token streaming
  • Function calling
  • JSON mode
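In OpenAI's API, JSON mode is requested with `response_format={"type": "json_object"}`. The sketch below assumes Token Factory honors that same parameter; `json_mode_body` and `parse_json_reply` are hypothetical helpers, and the reply string is a made-up example rather than real model output. Checking the keys your app relies on is still worthwhile, since OpenAI-style JSON mode guarantees valid JSON, not your schema.

```python
import json


def json_mode_body(model: str, question: str) -> dict:
    """Chat request body with OpenAI-style JSON mode enabled (assumed supported).

    OpenAI's JSON mode expects the word "JSON" to appear in the messages.
    """
    return {
        "model": model,
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system",
             "content": "Reply with a JSON object with keys 'answer' and 'confidence'."},
            {"role": "user", "content": question},
        ],
    }


def parse_json_reply(content: str, required: set[str]) -> dict:
    """Parse the assistant's JSON reply and check the keys the app depends on."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required.difference(data)  # iterating a dict yields its keys
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


body = json_mode_body("llama-3.3-70b", "Is the sky blue?")
reply = parse_json_reply('{"answer": "yes", "confidence": 0.9}', {"answer", "confidence"})
print(reply["answer"])  # yes
```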

RAG & search

Embeddings plus generation behind one billing relationship.

  • Embedding models
  • Long context
  • Cheap 8B tier
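For RAG, the /v1/embeddings response is a list of vectors that you rank against a query embedding. A minimal sketch, assuming OpenAI's response shape (`data[].embedding` with an `index` field); the toy 3-dimensional vectors stand in for real model output, which has far more dimensions, and all helper names are hypothetical.

```python
import math


def vectors_from_response(resp: dict) -> list[list[float]]:
    """Embedding vectors from an OpenAI-style /v1/embeddings response, in input order."""
    items = sorted(resp["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], docs: list[list[float]], k: int = 2) -> list[int]:
    """Indices of the k documents most similar to the query, best first."""
    scores = [cosine(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


# Toy response; real embedding vectors are much wider.
fake_response = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"index": 0, "embedding": [1.0, 0.0, 0.0]},
        {"index": 2, "embedding": [0.7, 0.7, 0.0]},
    ]
}
docs = vectors_from_response(fake_response)
print(top_k([1.0, 0.1, 0.0], docs))  # [0, 2]
```

The top-ranked passages would then be placed in the prompt of a /v1/chat/completions call.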

Agents & batch

Tool-using agents and high-volume batch classification.

  • Tool calls
  • Scale to zero
  • No rate-limit walls
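In a tool-using agent, the client runs the tools the model asks for and sends the results back as `tool` messages. A minimal sketch of that step, assuming Token Factory returns OpenAI-format `tool_calls` (with `arguments` encoded as a JSON string); `get_weather`, `TOOL_SPECS`, and the sample assistant message are hypothetical.

```python
import json
from typing import Callable

# Tool schema that would go in the request's "tools" field (OpenAI format).
TOOL_SPECS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real agent would call an actual service here."""
    return {"city": city, "forecast": "sunny"}


TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each tool call in an assistant message.

    Returns "tool" role messages to append to the conversation before
    the next /v1/chat/completions request.
    """
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results


message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }],
}
for m in run_tool_calls(message):
    print(m)
```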
How it works

Three lines to your first token.

01

Get an API key

Create a key in the dashboard. No GPU quota requests, no infrastructure to provision.

02

Point your SDK

Set base_url to https://api.packet.ai/v1 and reuse your existing OpenAI client and prompts.

03

Ship & scale

Traffic autoscales behind the endpoint. You only pay for the tokens you actually use.

FAQ

Token Factory, answered.

For anything not here, reach help@packet.ai.

Explore more: Dynamic GPU, Dedicated GPU, GPU Clusters, Pixel Factory, and Use Cases.

What is Token Factory?
Managed, OpenAI-compatible inference for open models. You call an endpoint; packet.ai runs and scales the GPUs behind it. No servers, no cold-start management.
Is it really OpenAI-compatible?
Yes — /v1/chat/completions and /v1/embeddings, with streaming, JSON mode, and function calling. Change your base_url and key and your existing client works unchanged.
Which models are available?
A curated catalog including Llama 3.3, Qwen2.5, DeepSeek-V3, and Mistral, plus embedding models. You can also host your own fine-tune behind the same API.
How is it priced?
Per token, with input and output metered separately. There are no minimums, and scale-to-zero means idle time costs nothing. Launch pricing starts at $0.06 per 1M tokens for chat models (Llama 3.1 8B) and $0.02 per 1M tokens for embeddings (BGE-M3).
When does it launch?
Token Factory is in private preview. Join the waitlist and we'll send your API key as access opens up.

Inference without
the infrastructure.

Token Factory is launching soon. Join the waitlist for early access and launch-pricing credits.

Launching soon · OpenAI-compatible · per-token billing