In stock · Provisions in ~5 min

NVIDIA Ada Lovelace, 48 GB GDDR6, PCIe Gen4

NVIDIA L40S

Production-grade inference at scale.

The NVIDIA L40S is the Ada Lovelace data-centre GPU built for production AI inference. It pairs 48 GB of GDDR6 memory with 864 GB/s of bandwidth, enough to run 13B models at FP16 without quantisation. Available on packet.ai from $0.92/GPU-hour.

From $0.92/GPU-hr · ≈ 76% below H100 cost

Dedicated $0.92/hr · Monthly $604/mo

Deploy L40S → · See pricing

48 GB · GDDR6 memory

864 GB/s · Memory bandwidth

91.6 TFLOPS · FP32 compute

350 W · TDP

Architecture

Ada Lovelace for inference at scale.

The L40S combines data-centre reliability with Ada-generation compute for production inference workloads.

Ada Lovelace Tensor Cores

4th-gen Tensor Cores with FP8 and structured-sparsity support. Efficient inference at data-centre scale.

48 GB GDDR6 at 864 GB/s

Enough memory for 13B at FP16 and 34B at 4-bit. Production-class inference without quantisation on common models.
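The sizing claims above follow from simple arithmetic on parameter count and bits per parameter. The sketch below is a rough estimate only: it counts weights and ignores KV cache, activations, and framework overhead, which all need additional headroom. The function name and the 34B-at-FP16 comparison row are illustrative assumptions, not figures from packet.ai.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal GB."""
    bytes_per_param = bits_per_param / 8
    # 1e9 params * bytes per param / 1e9 bytes per GB
    return params_billion * bytes_per_param


L40S_MEMORY_GB = 48  # from the spec above

cases = [("13B @ FP16", 13, 16), ("34B @ 4-bit", 34, 4), ("34B @ FP16", 34, 16)]
for name, params, bits in cases:
    need = weight_memory_gb(params, bits)
    fits = need < L40S_MEMORY_GB
    print(f"{name}: ~{need:.0f} GB of weights, fits in 48 GB: {fits}")
```

Whatever is left after the weights (about 22 GB for a 13B model at FP16) is what remains for KV cache and batching.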

PCIe Gen4 data-centre form factor

Passive cooling, PCIe Gen4. Designed for dense data-centre racks, not workstations.

Hardware video encode

Three NVENC and three NVDEC engines for real-time AV1/H.265 video AI pipelines alongside LLM inference.

Technical specs

NVIDIA L40S specifications.

| Specification | Value | Great for |
| --- | --- | --- |
| GPU architecture | NVIDIA Ada Lovelace | Ada Tensor Cores with FP8 for efficient inference. |
| GPU memory | 48 GB GDDR6 | 13B at FP16 native, 34B at 4-bit. |
| Memory bandwidth | 864 GB/s | Fast enough for token generation without batching tricks. |
| FP32 compute | 91.6 TFLOPS | Strong inference throughput at FP32 and FP16. |
| Host interface | PCIe Gen4 x16 | Data-centre deployment without specialised SXM boards. |
| Power | 350W TDP | High throughput within a 350W envelope. |
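To put the memory-bandwidth figure in context: single-stream token generation is usually bandwidth-bound, because each new token reads every weight once. A back-of-envelope ceiling is therefore bandwidth divided by weight size. This is a minimal sketch under that assumption; real throughput will be lower once KV-cache reads, kernel overhead, and sampling are included, and the function name is illustrative.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on single-stream tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / weight_gb


L40S_BANDWIDTH_GB_S = 864  # from the specifications above

fp16_13b = max_decode_tokens_per_s(L40S_BANDWIDTH_GB_S, 26)  # 13B params * 2 bytes
int4_34b = max_decode_tokens_per_s(L40S_BANDWIDTH_GB_S, 17)  # 34B params * 0.5 bytes

print(f"13B at FP16:  at most ~{fp16_13b:.0f} tokens/s per stream")
print(f"34B at 4-bit: at most ~{int4_34b:.0f} tokens/s per stream")
```

Batching raises aggregate throughput beyond this per-stream ceiling, since one pass over the weights then serves several requests.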

Pricing

Three ways to run L40S.

Dedicated hourly, dedicated monthly, or multi-node clusters.

Best value

Dedicated · Hourly · Single-tenant

$0.92 /GPU-hr

Full L40S card reserved exclusively for you. Zero noisy-neighbour risk, 99.99% SLA.

Deploy Hourly →

Dedicated · Monthly · Single-tenant

$604 /month

About 10% below the hourly rate over a full 730-hour month

Reserved L40S at a flat monthly rate. Full single-tenant isolation, predictable cost.

Deploy Monthly →
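Which plan costs less depends on how many hours the card actually runs each month. The sketch below uses only the two prices listed on this page; the helper names are illustrative, and it assumes hourly usage is billed at the flat $0.92 rate with no minimums.

```python
HOURLY_RATE = 0.92   # $/GPU-hr, dedicated hourly price from this page
MONTHLY_RATE = 604   # $/month, dedicated monthly price from this page


def break_even_hours(hourly: float, monthly: float) -> float:
    """GPU-hours per month at which both plans cost the same."""
    return monthly / hourly


def cheaper_plan(hours_used: float) -> str:
    """Pick the lower-cost plan for a given number of GPU-hours in a month."""
    return "monthly" if hours_used * HOURLY_RATE > MONTHLY_RATE else "hourly"


hours = break_even_hours(HOURLY_RATE, MONTHLY_RATE)
print(f"Break-even: ~{hours:.0f} GPU-hours per month")
print("200 hours:", cheaper_plan(200))
print("720 hours:", cheaper_plan(720))
```

On these figures the monthly plan only pays off for a card that runs close to around the clock; bursty workloads are cheaper on hourly billing.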

Multi-node Cluster

From 8 GPUs

Scale inference across multiple L40S nodes with InfiniBand interconnect. GPUs within a node communicate over PCIe Gen4; the L40S has no NVLink.

8–512 GPUs per cluster
PCIe Gen4 + InfiniBand
Provisioned in <1 hr

Get a wholesale quote →

Use cases

What the L40S is built for.

Production LLM inference

48 GB handles 13B at FP16 natively. Low latency, predictable SLA.

13B at FP16 native
34B at 4-bit
99.99% SLA
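For capacity planning, an availability percentage is easier to reason about as a downtime budget. This sketch assumes the SLA is measured over a 30-day window, which is an assumption; the page does not state the measurement period.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in the window without breaching the SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


budget = allowed_downtime_minutes(99.99)
print(f"99.99% over 30 days allows ~{budget:.2f} minutes of downtime")
```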

API serving & RAG

Dedicated card means no scheduler interference for real-time API workloads.

Dedicated single-tenant
Predictable p99 latency
Monthly flat billing

Video AI pipelines

Three NVENC and three NVDEC engines alongside LLM inference for video generation and processing.

AV1 hardware encode
Three NVDEC engines
Real-time throughput
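As a rough illustration of a hardware transcode on a card like this, the sketch below builds an ffmpeg command that decodes on the GPU and encodes to AV1 with NVENC. It assumes an ffmpeg build with NVENC support (the `av1_nvenc` encoder) is installed on the instance; the file names, bitrate, and helper name are hypothetical placeholders.

```python
import shlex


def av1_nvenc_command(src: str, dst: str, bitrate: str = "4M") -> list[str]:
    """Build an ffmpeg argument list for GPU decode plus AV1 NVENC encode."""
    return [
        "ffmpeg",
        "-hwaccel", "cuda",   # decode on the GPU's NVDEC engines
        "-i", src,
        "-c:v", "av1_nvenc",  # encode AV1 on the GPU's NVENC engines
        "-b:v", bitrate,      # target video bitrate (placeholder value)
        "-c:a", "copy",       # pass audio through untouched
        dst,
    ]


cmd = av1_nvenc_command("input.mp4", "output.mkv")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg and the input file exist.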

FAQ

NVIDIA L40S, answered.

For anything else, reach help@packet.ai.

What is the NVIDIA L40S?

Ada Lovelace data-centre GPU: 48 GB GDDR6, 864 GB/s, built for production inference and video AI.

How much does it cost?

$0.92/GPU-hour dedicated, or $604/month flat rate.

What models fit in L40S?

13B at FP16 natively, 34B at 4-bit. For 70B+ models, use H200 or B200.

Does the L40S have NVLink?

No. PCIe Gen4 only. For NVLink multi-GPU, use H100 SXM or H200.

Is the L40S good for training?

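It can fine-tune models up to 13B at FP16 with parameter-efficient methods such as LoRA; full fine-tuning at that size needs more than 48 GB once gradients and optimiser state are counted. For larger training runs, H100 or B200 is more cost-effective.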

How fast can I deploy L40S?

SSH-ready in under 5 minutes on Dedicated hourly. Multi-node clusters provision in under 1 hour.

Run the L40S. 48 GB Ada Lovelace from $0.92/hr.

Production-grade inference at $0.92/hr dedicated or $604/mo flat.

Deploy L40S → · Talk to a human

On-demand · hourly billing · US & EU regions
