NVIDIA Ada Lovelace, 48 GB GDDR6, PCIe Gen4
The NVIDIA L40S is the Ada Lovelace data-centre GPU built for production AI inference. Its 48 GB of GDDR6 memory and 864 GB/s of bandwidth are enough to run 13B models at FP16 without quantisation. Available on packet.ai from $0.92/GPU-hour.
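Why bandwidth matters for inference: at batch size 1, generating each token typically streams every model weight from GPU memory, so memory bandwidth divided by weight size gives a rough ceiling on single-stream decode speed. A minimal sketch of that back-of-envelope estimate using the 864 GB/s figure; it assumes decoding is purely bandwidth bound and ignores KV-cache reads, kernel efficiency and compute limits, so it is an upper bound rather than a benchmark. The function name is illustrative.

```python
# Back-of-envelope ceiling on single-stream (batch size 1) decode speed.
# Assumption: decoding is memory-bandwidth bound, so each generated token
# streams all model weights from GPU memory once. KV-cache reads, kernel
# efficiency and compute limits are ignored; real throughput is lower.

L40S_BANDWIDTH_GB_S = 864.0  # GB/s, from the L40S spec


def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                            bandwidth_gb_s: float = L40S_BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s for one stream: bandwidth / bytes read per token."""
    weight_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    print(f"13B FP16:  <= {max_decode_tokens_per_s(13, 2.0):.0f} tokens/s")  # ~33
    print(f"34B 4-bit: <= {max_decode_tokens_per_s(34, 0.5):.0f} tokens/s")  # ~51
```

By this estimate a 13B FP16 model tops out around 33 tokens/s per stream. Batching several requests shares each weight read across them, which is why aggregate throughput can be much higher than the single-stream figure.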
Dedicated $0.92/hr · Monthly $604/mo
The L40S combines data-centre reliability with Ada-generation compute for production inference workloads.
4th-gen Tensor Cores with FP8 precision and structured-sparsity support, for efficient inference at data-centre scale.
Enough memory for 13B at FP16 and 34B at 4-bit. Production-class inference without quantisation on common models.
Passive cooling, PCIe Gen4. Designed for dense data-centre racks, not workstations.
Three NVENC and three NVDEC engines, with AV1 and H.265 support, for real-time video AI pipelines alongside LLM inference.
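The 13B-at-FP16 and 34B-at-4-bit figures follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch, assuming 1 GB = 10^9 bytes and ignoring quantisation scale overhead, KV cache, activations and runtime overhead (all of which need headroom on top). The helper names are illustrative.

```python
# Rough weight-memory estimate for a 48 GB L40S: parameters x bytes per parameter.
# Assumptions: 1 GB = 1e9 bytes; the 4-bit figure ignores quantisation
# scale/zero-point overhead; KV cache, activations and runtime overhead
# are NOT included, so real deployments need headroom beyond these numbers.

L40S_MEMORY_GB = 48.0

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight footprint in GB for a model of `params_billion` parameters."""
    return params_billion * BYTES_PER_PARAM[precision]


def headroom_gb(params_billion: float, precision: str,
                total_gb: float = L40S_MEMORY_GB) -> float:
    """Memory left on the card after loading weights (negative means it does not fit)."""
    return total_gb - weight_gb(params_billion, precision)


if __name__ == "__main__":
    for size, prec in [(13, "fp16"), (34, "4bit"), (34, "fp16")]:
        print(f"{size}B @ {prec}: ~{weight_gb(size, prec):.0f} GB weights, "
              f"{headroom_gb(size, prec):+.0f} GB headroom")
```

13B at FP16 needs about 26 GB and 34B at 4-bit about 17 GB, both well inside 48 GB; 34B at FP16 (about 68 GB) does not fit, which is why larger models need quantisation or a bigger GPU.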
Dedicated or monthly, plus multi-node clusters.
Dedicated hourly: the full L40S card is reserved exclusively for you. Zero noisy-neighbour risk, 99.99% SLA.
Monthly: a reserved L40S at a flat monthly rate, with full single-tenant isolation and predictable cost.
Multi-node clusters: scale inference across multiple L40S nodes over an InfiniBand interconnect.
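Choosing between hourly and monthly billing is a break-even calculation. A minimal sketch using the listed rates ($0.92/GPU-hour dedicated, $604/month flat); the 730-hour average month is an assumption for illustration, and the function names are hypothetical.

```python
# Sketch: hourly vs flat monthly billing for one dedicated L40S.
# Rates come from the price list; the 730-hour average month
# (365 * 24 / 12) is an illustrative assumption.

HOURLY_RATE_USD = 0.92     # dedicated, per GPU-hour
MONTHLY_RATE_USD = 604.0   # flat monthly reservation
HOURS_PER_MONTH = 365 * 24 / 12  # 730.0


def hourly_cost(hours: float) -> float:
    """Cost of `hours` GPU-hours on hourly billing."""
    return hours * HOURLY_RATE_USD


def break_even_hours() -> float:
    """Monthly usage above which the flat monthly rate is cheaper."""
    return MONTHLY_RATE_USD / HOURLY_RATE_USD


if __name__ == "__main__":
    be = break_even_hours()
    print(f"Always-on at hourly rates: ${hourly_cost(HOURS_PER_MONTH):.2f}/month")
    print(f"Break-even: {be:.0f} GPU-hours/month ({be / HOURS_PER_MONTH:.0%} utilisation)")
```

Running around the clock on hourly billing would cost about $671.60 a month, so the flat rate saves roughly $68; below about 657 GPU-hours a month (roughly 90% utilisation), hourly billing is cheaper.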
48 GB handles 13B at FP16 natively. Low latency, predictable SLA.
A dedicated card means no scheduler interference for real-time API workloads.
Hardware NVENC/NVDEC engines handle video encode and decode alongside LLM inference, for video generation and processing pipelines.
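Weights are only part of the inference memory budget: every token held in context also occupies key/value (KV) cache, and that cache bounds context length times concurrent requests. The sketch below estimates how many FP16 KV-cache tokens fit beside a 13B FP16 model on a 48 GB card. The architecture numbers are an assumption (a Llama-2-13B-style decoder: 40 layers, hidden size 5120, standard multi-head attention without grouped-query attention); `DecoderConfig` and the helpers are hypothetical, and `reserved_bytes` stands in for activation and CUDA-context overhead that real servers set aside.

```python
# Sketch: FP16 KV-cache capacity left beside a 13B FP16 model on a 48 GB card.
# Assumed architecture: Llama-2-13B-style decoder (40 layers, hidden size 5120,
# standard multi-head attention, no grouped-query attention). Hypothetical
# helper names; substitute your own model's config values.
from dataclasses import dataclass


@dataclass
class DecoderConfig:
    num_layers: int
    hidden_size: int
    params: float  # total parameter count


def kv_bytes_per_token(cfg: DecoderConfig, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * cfg.num_layers * cfg.hidden_size * dtype_bytes


def max_cached_tokens(cfg: DecoderConfig, gpu_bytes: float = 48e9,
                      weight_bytes_per_param: float = 2.0,
                      reserved_bytes: float = 0.0) -> int:
    """Tokens of KV cache that fit after weights and any reserved overhead."""
    free = gpu_bytes - cfg.params * weight_bytes_per_param - reserved_bytes
    return max(0, int(free // kv_bytes_per_token(cfg)))


llama13b_like = DecoderConfig(num_layers=40, hidden_size=5120, params=13e9)

if __name__ == "__main__":
    print(kv_bytes_per_token(llama13b_like))  # 819200 bytes per token
    print(max_cached_tokens(llama13b_like))   # 26855 tokens, before overhead
    print(max_cached_tokens(llama13b_like, reserved_bytes=2e9))  # 24414 tokens
```

Under these assumptions each token costs about 0.8 MB of cache, so the roughly 22 GB left after the weights holds on the order of 26,000 tokens shared across all in-flight requests (for example, 8 requests of about 3,000 tokens each). Models with grouped-query attention need proportionally less cache per token.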
Ada Lovelace data-centre GPU: 48 GB GDDR6, 864 GB/s, built for production inference and video AI.
$0.92/GPU-hour dedicated, or $604/month flat rate.
13B at FP16 natively, 34B at 4-bit. For 70B+ models, use H200 or B200.
The L40S has no NVLink; it connects over PCIe Gen4 only. For NVLink multi-GPU setups, use H100 SXM or H200.
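It handles parameter-efficient fine-tuning (for example LoRA) of models up to 13B on FP16 base weights; full fine-tuning of a 13B model needs far more than 48 GB. For larger training runs, H100 or B200 is more cost-effective.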
SSH-ready in under 5 minutes on Dedicated hourly. Multi-node clusters provision in under 1 hour.
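The fine-tuning guidance comes down to training-state memory. A rough sketch, assuming the common mixed-precision Adam accounting of 16 bytes per trainable parameter, a Llama-2-13B-style shape (40 layers, hidden size 5120), and LoRA rank 16 applied to the query and value projections. All helper names are hypothetical, and activation memory (which grows with batch size and sequence length) is excluded.

```python
# Rough training-state memory for fine-tuning (activations excluded).
# Assumption: mixed-precision Adam costs ~16 bytes per trainable parameter:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4) + two fp32
# Adam moments (8). Model shape assumed Llama-2-13B-like; helper names are
# hypothetical.

ADAM_BYTES_PER_TRAINABLE_PARAM = 16


def full_finetune_gb(params: float) -> float:
    """Every parameter is trainable, so every parameter carries full Adam state."""
    return params * ADAM_BYTES_PER_TRAINABLE_PARAM / 1e9


def lora_params(num_layers: int, hidden_size: int, rank: int,
                matrices_per_layer: int = 2) -> int:
    """Adapter parameters: each adapted hidden x hidden projection gets
    A (hidden x rank) and B (rank x hidden)."""
    return num_layers * matrices_per_layer * rank * (hidden_size + hidden_size)


def lora_finetune_gb(base_params: float, adapter_params: int,
                     base_bytes_per_param: float = 2.0) -> float:
    """Frozen fp16 base weights plus full Adam state for the adapters only."""
    return (base_params * base_bytes_per_param
            + adapter_params * ADAM_BYTES_PER_TRAINABLE_PARAM) / 1e9


# LoRA rank 16 on the query and value projections of a 40-layer, 5120-wide model.
adapters = lora_params(num_layers=40, hidden_size=5120, rank=16)

if __name__ == "__main__":
    print(f"Full fine-tune, 13B: ~{full_finetune_gb(13e9):.0f} GB")            # ~208 GB
    print(f"LoRA r=16, 13B:      ~{lora_finetune_gb(13e9, adapters):.1f} GB")  # ~26.2 GB
```

Full fine-tuning of a 13B model needs on the order of 208 GB before activations, far beyond one 48 GB card, whereas LoRA keeps the frozen FP16 weights (about 26 GB) and adds only about 0.2 GB of trainable state, leaving room for activations.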
Production-grade inference at $0.92/hr dedicated or $604/mo flat.
On-demand · hourly billing · US & EU regions