CVE-2026-53923
CWE-200 · Published: June 22, 2026 · Updated: June 24, 2026
Official Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
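The arithmetic behind the partial processing can be illustrated with a small sketch. It assumes the truncation is a C-style narrowing of a 64-bit row count to a 32-bit signed int; the description does not state the exact types or which dimension is affected, and the function names here are hypothetical, not taken from vLLM.

```python
import ctypes


def truncate_to_int32(value: int) -> int:
    """Mimic a C-style narrowing cast to a 32-bit signed int (assumed width)."""
    # ctypes integer types perform no overflow checking, so the value wraps.
    return ctypes.c_int32(value).value


def unfilled_elements(rows: int, cols: int) -> int:
    """Elements allocated at full size but never written by the kernel.

    Hypothetical model: the output holds rows * cols elements, while the
    kernel only processes truncate_to_int32(rows) rows.
    """
    allocated = rows * cols
    processed_rows = max(truncate_to_int32(rows), 0)  # negative count: nothing processed
    processed = min(processed_rows * cols, allocated)
    return allocated - processed


if __name__ == "__main__":
    rows, cols = 2**32 + 8, 4
    print("rows seen by kernel:", truncate_to_int32(rows))
    print("elements left uninitialized:", unfilled_elements(rows, cols))
```

Under this model, every element counted by `unfilled_elements` keeps whatever bytes were already in the allocation, which is the residual data the description refers to.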
Technical Analysis
CVE-2026-53923 is exploitable remotely over the network; an attacker needs neither physical nor adjacent-network access.
Exploitation requires no privileges and no user interaction, which lowers the bar for automated exploitation.
A successful exploit has a high confidentiality impact: residual GPU memory, which may contain tensor data from other users' inference requests, is exposed. The CVSS base score is 7.5.
The weakness is classified as CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor): sensitive data is leaked to a party that is not authorized to see it.
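The 7.5 base score can be reproduced from the CVSS v3.1 base-score formula. The sketch below takes the network attack vector, no privileges, no user interaction, and high confidentiality impact from the analysis above. It assumes low attack complexity, unchanged scope, and no integrity or availability impact; these are not stated in this text, but they are consistent with the published score.

```python
import math

# CVSS v3.1 metric weights (scope unchanged).
AV_NETWORK = 0.85
AC_LOW = 0.77    # assumption: attack complexity is not stated in the text
PR_NONE = 0.85
UI_NONE = 0.85
C_HIGH = 0.56
I_NONE = 0.0     # assumption: no integrity impact
A_NONE = 0.0     # assumption: no availability impact


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector whose scope is unchanged (assumed here)."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    score = base_score_scope_unchanged(
        AV_NETWORK, AC_LOW, PR_NONE, UI_NONE, C_HIGH, I_NONE, A_NONE
    )
    print(score)  # 7.5
```

With these inputs the impact sub-score is about 3.60 and the exploitability sub-score about 3.89; their sum, 7.48, rounds up to 7.5.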
Recommended Actions
- Upgrade vLLM to 0.23.1rc0 or later, the release that fixes this vulnerability
- Monitor CVE-2026-53923 in threat intel feeds
- Review IDS/IPS signatures for exploitation attempts
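To triage hosts, a version string can be checked against the affected range given in the description (0.5.5 up to, but not including, 0.23.1rc0). The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the parser deliberately accepts only simple `X.Y.Z`, `X.Y.ZrcN`-style, and `.postN` versions, raising an error for anything else (such as dev or local builds) so those can be reviewed by hand.

```python
import re

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?(?:\.post\d+)?")
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after its own pre-releases


def version_key(version: str) -> tuple:
    """Turn a simple version string into a sortable tuple (simplified ordering)."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, patch, pre_kind, pre_num = match.groups()
    rank = _PRE_RANK[pre_kind] if pre_kind else _FINAL_RANK
    return (int(major), int(minor), int(patch), rank, int(pre_num or 0))


def is_affected(version: str) -> bool:
    """True if version falls in the affected range from the description."""
    return version_key("0.5.5") <= version_key(version) < version_key("0.23.1rc0")


if __name__ == "__main__":
    for candidate in ("0.5.4", "0.5.5", "0.23.0", "0.23.1rc0", "0.23.1"):
        print(candidate, is_affected(candidate))
```

On a live host, the installed version could be read with `importlib.metadata.version("vllm")` and passed to `is_affected`.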