In stock · Provisions in ~5 min

NVIDIA Ada Lovelace, 48 GB GDDR6, PCIe Gen4

NVIDIA L40S
Production-grade inference at scale.

The NVIDIA L40S is the Ada Lovelace data-centre GPU built for production AI inference. Its 48 GB of GDDR6 memory and 864 GB/s of bandwidth are enough to run 13B models at FP16 without quantisation. Available on packet.ai from $0.92/GPU-hour.

From $0.92/GPU-hr · ≈ 76% below H100 cost

Dedicated $0.92/hr · Monthly $604/mo

48 GB GDDR6 memory
864 GB/s memory bandwidth
91.6 TFLOPS FP32 compute
350 W TDP
Architecture

Ada Lovelace for inference at scale.

The L40S combines data-centre reliability with Ada-generation compute for production inference workloads.

Ada Lovelace Tensor Cores

4th-gen Tensor Cores with FP8 and structured-sparsity support. Efficient inference at data-centre scale.

48 GB GDDR6 at 864 GB/s

Enough memory for 13B models at FP16 and 34B models at 4-bit. Common models up to 13B run in production without quantisation.
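The sizing above follows from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below is a back-of-envelope check only. It assumes 1 GB = 1e9 bytes and ignores the KV cache, activations and framework overhead, all of which need extra headroom in practice. The function name `weight_memory_gb` is illustrative and not part of any packet.ai tooling.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores KV cache, activations and runtime overhead (assumption).
GPU_MEMORY_GB = 48  # L40S capacity quoted on this page


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


models = [
    ("13B at FP16", 13, 16),
    ("34B at 4-bit", 34, 4),
    ("70B at FP16", 70, 16),
]

for name, params, bits in models:
    size = weight_memory_gb(params, bits)
    verdict = "fits" if size < GPU_MEMORY_GB else "does not fit"
    print(f"{name}: ~{size:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions a 13B model at FP16 needs about 26 GB of weights and a 34B model at 4-bit about 17 GB, which leaves the rest of the 48 GB for the KV cache and batching.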

PCIe Gen4 data-centre form factor

Passive cooling, PCIe Gen4. Designed for dense data-centre racks, not workstations.

Hardware video encode

Three NVENC and three NVDEC engines for real-time AV1/H.265 video AI pipelines alongside LLM inference.

Technical specs

NVIDIA L40S specifications.

Specification | Value | Great for
GPU architecture | NVIDIA Ada Lovelace | Ada Tensor Cores with FP8 for efficient inference.
GPU memory | 48 GB GDDR6 | 13B at FP16 native, 34B at 4-bit.
Memory bandwidth | 864 GB/s | Fast enough for token generation without batching tricks.
FP32 compute | 91.6 TFLOPS | Strong inference throughput at FP32 and FP16.
Host interface | PCIe Gen4 x16 | Data-centre deployment without specialised SXM boards.
Power | 350W TDP | High throughput within a 350W envelope.
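
To put the memory-bandwidth figure in context, single-stream token generation is usually limited by how fast the weights can be read from memory, so an upper bound on decode speed is bandwidth divided by weight size. The sketch below applies that rule of thumb. It is an idealised ceiling, not a benchmark: it assumes every weight is read once per token and ignores KV-cache reads, kernel overhead and batching. The function name is illustrative.

```python
# Idealised, memory-bound ceiling on single-stream decode speed:
#   tokens/s <= memory bandwidth / bytes of weights read per token
# Real throughput will be lower (assumption: weights read once per token).
BANDWIDTH_GB_S = 864  # L40S memory bandwidth from the spec table


def max_decode_tokens_per_s(weights_gb: float, bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens per second for one sequence at batch size 1."""
    return bandwidth_gb_s / weights_gb


for name, weights_gb in [("13B at FP16 (~26 GB)", 26), ("34B at 4-bit (~17 GB)", 17)]:
    ceiling = max_decode_tokens_per_s(weights_gb)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s per stream")
```

For a 13B model at FP16 this gives a ceiling of roughly 33 tokens per second per stream. Serving several requests in a batch raises aggregate throughput because the same weight reads are shared across sequences.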
Pricing

Three ways to run L40S.

Dedicated hourly, dedicated monthly, or multi-node clusters.

Dedicated
Monthly · Single-tenant
$604 /month
35% off hourly rate

Reserved L40S at a flat monthly rate. Full single-tenant isolation, predictable cost.

Deploy Monthly →
Multi-node Cluster
From 8 GPUs

Scale inference across multiple L40S nodes with InfiniBand interconnect between nodes. The L40S has no NVLink, so GPUs within a node communicate over PCIe Gen4.

  • 8–512 GPUs per cluster
  • PCIe Gen4 + InfiniBand
  • Provisioned in <1 hr
Get a wholesale quote →
Use cases

What the L40S is built for.

Production LLM inference

48 GB handles 13B at FP16 natively. Low latency, predictable SLA.

  • 13B at FP16 native
  • 34B at 4-bit
  • 99.99% SLA

API serving & RAG

Dedicated card means no scheduler interference for real-time API workloads.

  • Dedicated single-tenant
  • Predictable p99 latency
  • Monthly flat billing

Video AI pipelines

Three NVENC and three NVDEC engines alongside LLM inference for video generation and processing.

  • AV1 hardware encode
  • Three NVDEC engines
  • Real-time throughput
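
As an illustration of the AV1 hardware-encode path, the sketch below builds an FFmpeg command that decodes on the GPU and encodes with the `av1_nvenc` encoder. It assumes an FFmpeg build compiled with NVENC support is installed on the instance. The file names and the 5M bitrate are placeholders, and the helper `build_av1_nvenc_command` is hypothetical rather than part of any packet.ai tooling. The script only prints the command; it does not run it.

```python
import shlex


def build_av1_nvenc_command(src: str, dst: str, bitrate: str = "5M") -> list[str]:
    """Build an FFmpeg argument list for a GPU-accelerated AV1 transcode."""
    return [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",    # decode on the GPU (NVDEC) where supported
        "-i", src,
        "-c:v", "av1_nvenc",   # hardware AV1 encoder (NVENC)
        "-b:v", bitrate,       # target video bitrate (placeholder value)
        "-c:a", "copy",        # pass audio through unchanged
        dst,
    ]


cmd = build_av1_nvenc_command("input.mp4", "output.mkv")
print(shlex.join(cmd))
# To run the transcode, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list avoids shell-quoting problems when file names contain spaces.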
FAQ

NVIDIA L40S, answered.

For anything else, reach help@packet.ai.

What is the NVIDIA L40S?

Ada Lovelace data-centre GPU: 48 GB GDDR6, 864 GB/s, built for production inference and video AI.

How much does it cost?

$0.92/GPU-hour dedicated, or $604/month flat rate.
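
To choose between the two rates, it helps to find the number of hours per month at which the flat rate becomes cheaper. The sketch below uses only the two prices listed above and assumes one GPU billed per hour with no minimums or other fees. The function names are illustrative.

```python
# Break-even between hourly and flat monthly pricing for one GPU.
HOURLY_RATE = 0.92    # $/GPU-hour, dedicated
MONTHLY_RATE = 604.0  # $/month, flat


def break_even_hours(hourly: float = HOURLY_RATE, monthly: float = MONTHLY_RATE) -> float:
    """Hours of use per month above which the flat monthly rate is cheaper."""
    return monthly / hourly


def cheaper_plan(hours_per_month: float) -> str:
    """Return which plan costs less for the expected monthly usage."""
    hourly_cost = hours_per_month * HOURLY_RATE
    return "monthly" if hourly_cost > MONTHLY_RATE else "hourly"


print(f"Break-even: ~{break_even_hours():.0f} hours/month")
for hours in (200, 500, 720):
    print(f"{hours} h/month: {cheaper_plan(hours)} is cheaper "
          f"(hourly cost ${hours * HOURLY_RATE:.2f} vs ${MONTHLY_RATE:.2f})")
```

With these figures the break-even point is about 657 hours per month, so the flat rate pays off mainly for GPUs that run close to continuously.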

What models fit in L40S?

13B at FP16 natively, 34B at 4-bit. For 70B+ models, use H200 or B200.

Does the L40S have NVLink?

No. PCIe Gen4 only. For NVLink multi-GPU, use H100 SXM or H200.

Is the L40S good for training?

It can fine-tune up to 13B at FP16. For larger training runs, H100 or B200 is more cost-effective.

How fast can I deploy L40S?

SSH-ready in under 5 minutes on Dedicated hourly. Multi-node clusters provision in under 1 hour.

Run the L40S. 48 GB Ada Lovelace from $0.92/hr.

Production-grade inference at $0.92/hr dedicated or $604/mo flat.

On-demand · hourly billing · US & EU regions
