Launching soon

Token Factory

OpenAI-compatible inference.
Without the GPU ops.

Token Factory serves open models behind a drop-in, OpenAI-compatible API. Pay per token, scale to zero, and never manage a GPU. Point your existing SDK at packet.ai and ship.

OpenAI-compatible · per-token billing · streaming, JSON mode & function calling

POST api.packet.ai/v1/chat/completions · 200 OK

// drop-in OpenAI client
{ "model": "llama-3.3-70b", "stream": true }

streaming tokens · 38 ms TTFT

Drop-in: OpenAI-compatible API
Per-token: pay only for what you use
Scale to 0: no idle GPU cost
US & EU: data residency

The API

Change one line. Keep your code.

Token Factory speaks the OpenAI API. Swap the base URL and key — your existing SDKs, prompts, and tooling just work.

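# pip install openai, then point it at packet.ai
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.packet.ai/v1",
    api_key=os.environ["PACKET_API_KEY"],  # read the key from the environment
)

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hi!"}],
    stream=True,
)

# print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)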

# any HTTP client works
curl https://api.packet.ai/v1/chat/completions \
  -H "Authorization: Bearer $PACKET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role":"user","content":"Hi!"}],
    "stream": true
  }'

// npm i openai
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.packet.ai/v1",
  apiKey: process.env.PACKET_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "llama-3.3-70b",
  stream: true,
  messages: [{ role: "user", content: "Hi!" }],
});

// print tokens as they arrive
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Model catalog

Open models, served and ready.

A curated catalog of the best open models, hosted and continuously updated. Indicative launch pricing per 1M tokens.

Model                                 Context  Type   From / 1M tokens
Llama 3.3 70B (Meta · instruct)       128K     Chat   $0.59 (Popular)
Llama 3.1 8B (Meta · instruct)        128K     Chat   $0.06
Qwen2.5 72B (Alibaba · instruct)      128K     Chat   $0.62
DeepSeek-V3 (DeepSeek · MoE)          64K      Chat   $0.85 (New)
Mistral Small 3 (Mistral · instruct)  32K      Chat   $0.18
BGE-M3 (embeddings)                   -        Embed  $0.02

Prices indicative for launch · input/output metered separately · your own fine-tunes hostable on request.
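Since input and output tokens are metered separately, the cost of a request is each token count multiplied by its own per-1M rate. A minimal sketch of that arithmetic; the catalog above lists only a single "from" price per model, so the example assumes both rates equal the $0.59 shown for Llama 3.3 70B purely for illustration, and request_cost_usd is a hypothetical helper name.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one request, with input and output tokens billed at separate per-1M-token rates."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Assumed rates: both set to the $0.59 "from" price for Llama 3.3 70B, for illustration only.
cost = request_cost_usd(input_tokens=2_000, output_tokens=500,
                        input_price_per_m=0.59, output_price_per_m=0.59)
print(f"${cost:.6f}")  # prints $0.001475
```

The cost scales linearly, so a million requests of that shape would come to about $1,475 under these assumed rates.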

Why Token Factory

Everything the SDK expects and more.

Drop-in compatible

Full /v1/chat/completions and /v1/embeddings surface. Change base_url and key — keep every line of your app.
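Because only the base URL and key are specific to packet.ai, a plain HTTP client can reach either endpoint. A minimal standard-library sketch for /v1/embeddings, assuming the OpenAI response shape (a data list of objects carrying index and embedding) and a hypothetical model id bge-m3 for the BGE-M3 catalog entry; build_request, parse_embeddings and embed are illustrative names, not part of any packet.ai SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.packet.ai/v1"  # besides the key, the only packet.ai-specific value


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST to an OpenAI-compatible endpoint under BASE_URL."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(body: dict) -> list:
    """Return vectors from an OpenAI-style embeddings response, in input order."""
    return [item["embedding"] for item in sorted(body["data"], key=lambda d: d["index"])]


def embed(texts: list, api_key: str, model: str = "bge-m3") -> list:
    """Send one batch of texts to /v1/embeddings (network call; "bge-m3" is an assumed model id)."""
    req = build_request("/embeddings", {"model": model, "input": texts}, api_key)
    with urllib.request.urlopen(req) as resp:
        return parse_embeddings(json.load(resp))
```

The same build_request works for "/chat/completions" with a chat payload; sorting by index guards against a server returning vectors out of order.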

Streaming & tools

SSE token streaming, JSON mode, and function/tool calling, exactly as your client already uses them.
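With stream set to true, the OpenAI wire format is server-sent events: one "data: {json}" line per chunk, text in choices[].delta.content, and a final "data: [DONE]". A minimal parser sketch, assuming Token Factory emits that same format; it handles single-line data fields only, and iter_content is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines, stopping at "data: [DONE]"."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ": comment" lines, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints Hello!
```

With urllib, pass the decoded response lines, for example iter_content(line.decode("utf-8") for line in resp).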

Autoscale & scale-to-zero

Capacity follows your traffic with no cold-start tax. Idle costs nothing. You pay per token, not per hour.

Per-token billing

Input and output tokens metered separately. No minimums, no platform fee, no egress surprises.

Bring your fine-tune

Host your own LoRA or full checkpoint behind the same OpenAI-compatible endpoint.

Private & compliant

We never train on your data. DPA, audit support, and EU data residency available.

Built for

Built for production LLM apps.

Chatbots & assistants

Low-latency streaming chat for customer-facing products.

Token streaming
Function calling
JSON mode
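Function calling follows the OpenAI shape: you send a tools list with JSON Schema parameters, the model replies with tool_calls whose arguments arrive as a JSON string, and you answer each call with a role "tool" message. A minimal sketch of that loop, assuming the chosen model supports tool calling; get_order_status and the sample reply are hypothetical stand-ins, not real packet.ai data.

```python
import json


def get_order_status(order_id: str) -> dict:
    """Hypothetical tool: a stand-in for your own backend lookup."""
    return {"order_id": order_id, "status": "shipped"}


TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]
REGISTRY = {"get_order_status": get_order_status}


def chat_payload(messages: list) -> dict:
    """Body for POST /v1/chat/completions with tools enabled."""
    return {"model": "llama-3.3-70b", "messages": messages, "tools": TOOLS}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool call and return the role="tool" replies to send back."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments are a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies


# Sample assistant reply in the OpenAI format (illustrative, not real output).
sample_reply = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{"id": "call_1", "type": "function",
                    "function": {"name": "get_order_status",
                                 "arguments": "{\"order_id\": \"A123\"}"}}],
}
tool_messages = run_tool_calls(sample_reply)
```

Appending sample_reply and tool_messages to the conversation and calling chat_payload again lets the model produce its final answer.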

RAG & search

Embeddings plus generation behind one billing relationship.

Embedding models
Long context
Cheap 8B tier
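A typical RAG flow embeds documents once, embeds each query, ranks documents by cosine similarity, and passes the top hits to chat as context. A pure-Python sketch of the ranking and prompt-building steps; the three-dimensional vectors and document texts are toy stand-ins for real embedding output, and top_k and build_prompt are hypothetical helpers.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list, doc_vecs: list, docs: list, k: int = 2) -> list:
    """Rank documents by cosine similarity to the query embedding (linear scan)."""
    scored = sorted(zip(docs, doc_vecs), key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]


def build_prompt(question: str, context: list) -> list:
    """Chat messages that ground the answer in the retrieved passages."""
    joined = "\n\n".join(context)
    return [
        {"role": "system", "content": "Answer using only this context:\n\n" + joined},
        {"role": "user", "content": question},
    ]


# Toy data: in practice these vectors come from /v1/embeddings.
docs = ["passage A", "passage B", "passage C"]
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hits = top_k([0.9, 0.1, 0.0], doc_vecs, docs, k=2)
messages = build_prompt("What does passage A say?", hits)
```

A linear scan is fine for small corpora; larger ones usually call for a vector index.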

Agents & batch

Tool-using agents and high-volume batch classification.

Tool calls
Scale to zero
No rate-limit walls
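For batch classification, one approach is to fan out one JSON-mode request per item and parse a label from each reply. A sketch under stated assumptions: response_format {"type": "json_object"} is the OpenAI JSON-mode parameter, "llama-3.1-8b" is an assumed model id for the 8B tier, the label set is hypothetical, and send stands in for whatever HTTP call you use (fake_send lets the flow run offline).

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LABELS = ["billing", "bug", "other"]  # hypothetical label set


def classification_payload(text: str) -> dict:
    """JSON-mode request body for POST /v1/chat/completions."""
    return {
        "model": "llama-3.1-8b",  # assumed id for the cheap 8B tier
        "response_format": {"type": "json_object"},
        "messages": [
            {"role": "system",
             "content": f'Classify the support ticket. Reply with a JSON object {{"label": "<label>"}}, '
                        f'where <label> is one of: {", ".join(LABELS)}.'},
            {"role": "user", "content": text},
        ],
    }


def parse_label(completion: dict) -> str:
    """Read the label from the reply, falling back to "other" for unexpected values."""
    content = completion["choices"][0]["message"]["content"]
    label = json.loads(content).get("label")
    return label if label in LABELS else "other"


def classify_all(texts: list, send: Callable[[dict], dict], workers: int = 8) -> list:
    """Send requests concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        completions = pool.map(lambda t: send(classification_payload(t)), texts)
        return [parse_label(c) for c in completions]


def fake_send(payload: dict) -> dict:
    """Offline stand-in for the HTTP call: tickets mentioning "crash" are bugs."""
    text = payload["messages"][1]["content"]
    label = "bug" if "crash" in text.lower() else "billing"
    return {"choices": [{"message": {"content": json.dumps({"label": label})}}]}


labels = classify_all(["App crash on login", "Charged twice this month"], fake_send)
```

ThreadPoolExecutor.map preserves input order, so labels line up with texts without extra bookkeeping.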

How it works

Three lines to your first token.

Get an API key

Create a key in the dashboard. No GPU quota requests, no infrastructure to provision.

Point your SDK

Set base_url to https://api.packet.ai/v1 and reuse your existing OpenAI client and prompts.

Ship & scale

Traffic autoscales behind the endpoint. You only pay for the tokens you actually use.

FAQ

Token Factory, answered.

For anything not here, reach help@packet.ai.

What is Token Factory?

Managed, OpenAI-compatible inference for open models. You call an endpoint; packet.ai runs and scales the GPUs behind it. No servers, no cold-start management.

Is it really OpenAI-compatible?

Yes — /v1/chat/completions and /v1/embeddings, with streaming, JSON mode, and function calling. Change your base_url and key and your existing client works unchanged.

Which models are available?

A curated catalog including Llama 3.3, Qwen2.5, DeepSeek-V3, and Mistral, plus embedding models. You can also host your own fine-tune behind the same API.

How is it priced?

Per token, with input and output metered separately. No minimums and scale-to-zero, so idle time costs nothing. Launch pricing starts at $0.06 per 1M tokens.

When does it launch?

Token Factory is in private preview. Join the waitlist and we'll send your API key as access opens up.

Inference without
the infrastructure.

Token Factory is launching soon. Join the waitlist for early access and launch-pricing credits.

Launching soon · OpenAI-compatible · per-token billing
