Memory Model

Valkey GLIDE clients have a memory profile shaped by the shared Rust core that sits underneath every language binding. This page describes where that memory lives, what is (and is not) configurable, and how to size your process and container accordingly.

A GLIDE process has three memory regions that contribute to its overall footprint:

  • Language runtime (owner: JVM heap / Python interpreter / V8 heap / Go runtime / etc.): request and response objects, your application code, client wrapper state.
  • Rust core (owner: the Rust allocator; reported as native/off-heap by the language runtime): Tokio runtime, connection buffers, cluster topology, in-flight command state.
  • OS / shared libraries (owner: kernel + loader): thread stacks, loaded native libraries (glide-core, OpenSSL, etc.), TCP kernel buffers.

All three show up in the operating system’s resident set size (RSS) for the process. In a container, RSS is what counts toward your memory limit — not just the language-runtime heap.

The Rust core is shared across every GLIDE client in the same process — it is initialised lazily on the first client creation and re-used for every subsequent client. Opening a second GlideClient in the same JVM/interpreter does not multiply the Rust-runtime cost.
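
A minimal sketch of this sharing, assuming the Java binding's GlideClient.createClient API and a Valkey server reachable on localhost:6379:

import glide.api.GlideClient;
import glide.api.models.configuration.GlideClientConfiguration;
import glide.api.models.configuration.NodeAddress;

public class SharedCoreDemo {
    public static void main(String[] args) throws Exception {
        GlideClientConfiguration config = GlideClientConfiguration.builder()
                .address(NodeAddress.builder().host("localhost").port(6379).build())
                .build();

        // The first createClient() initialises the shared Rust runtime;
        // the second reuses it: no second Tokio runtime, no extra thread stacks.
        try (GlideClient first = GlideClient.createClient(config).get();
             GlideClient second = GlideClient.createClient(config).get()) {
            System.out.println(first.ping().get());
            System.out.println(second.ping().get());
        }
    }
}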

What the core contains at idle:

  • A single Tokio async runtime with a small default worker pool (see below) and 2 MB thread stacks.
  • One TCP connection per Valkey node — cluster-mode-disabled setups use a single connection; cluster-mode-enabled scales with shard count.
  • A small native buffer registry that grows on demand when commands carry large payloads, and shrinks when those buffers are released.

Under steady load, the Rust core also holds:

  • Serialized requests in flight and deserialized responses on the return path.
  • Cluster topology and slot map (cluster mode only).
  • Pub/Sub messages queued in an unbounded push notification channel (tokio::sync::mpsc::unbounded_channel). These accumulate depending on publisher rate and the number of subscribed channels — if the application consumes messages slower than they arrive, this buffer grows without limit.
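
If consumers may fall behind, one way to bound memory is to have the callback hand messages to a bounded application-side queue and drop (or count) overflow, so the native channel is drained as fast as messages arrive. A minimal sketch; onMessage is a hypothetical hook to wire into whatever callback your binding's subscription configuration accepts:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedSubscriber {
    private final BlockingQueue<String> inbox = new ArrayBlockingQueue<>(10_000);

    BoundedSubscriber() {
        Thread worker = new Thread(this::drain, "pubsub-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the subscription callback. Returns immediately, so the
    // native push channel never backs up behind slow processing.
    void onMessage(String message) {
        if (!inbox.offer(message)) {
            // Bounded: drop and report rather than grow without limit.
            System.err.println("dropped pub/sub message under backpressure");
        }
    }

    private void drain() {
        try {
            while (true) {
                String message = inbox.take();
                // ... slow application processing happens here ...
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}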

GLIDE’s Java client does not use JVM NIO direct (off-heap) buffers for network I/O. All socket reads and writes are performed in the Rust core; responses are passed as a native pointer across JNI and converted into JVM heap objects (Strings, byte arrays, maps) on the Java side.

Practical consequence: you do not need to tune -XX:MaxDirectMemorySize for GLIDE. The JVM’s direct-buffer pool stays effectively empty regardless of concurrency or payload size. This contrasts with Netty-based clients (e.g. Lettuce), which reserve direct buffers per channel for reads and writes.

The default JVM heap size is usually adequate. Large values and high concurrency put pressure on the heap through the allocation of response objects — the -Xmx you would use for any similar workload is the right starting point.

GLIDE deliberately exposes a small set of memory-relevant configuration, keeping defaults that suit a wide range of workloads.

  • inflightRequestsLimit — maximum concurrent in-flight requests per client. The cap exists precisely to bound queuing memory. Default is 1000.
  • requestTimeout — caps how long commands (and their associated buffers) can sit in flight before being released with an error.
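
A sketch of setting both knobs on the Java configuration builder (illustrative values; this assumes the builder exposes requestTimeout in milliseconds and an inflightRequestsLimit setter):

import glide.api.models.configuration.GlideClientConfiguration;
import glide.api.models.configuration.NodeAddress;

GlideClientConfiguration config = GlideClientConfiguration.builder()
        .address(NodeAddress.builder().host("localhost").port(6379).build())
        .requestTimeout(500)          // ms; times out stuck commands and frees their buffers
        .inflightRequestsLimit(1000)  // the documented default, made explicit
        .build();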

GLIDE does not surface configuration for the Rust runtime’s thread pool, the connection pool size (there is one connection per node — managed by the core), or internal buffer sizes. These are tuned for the general case in the Rust core and are not user-facing knobs.

GLIDE exposes a single in-process statistics call and relies on your runtime / OS for the rest.

From the client: use getStatistics() to read the number of active connections and clients — useful for confirming that multiple GlideClient instances in the same process are sharing the Rust runtime as expected.
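
A short sketch of reading those counters, assuming an already-created client and that getStatistics() returns a map of name/value strings (the exact shape may vary by version):

import java.util.Map;

// `client` is an existing GlideClient instance.
Map<String, String> stats = client.getStatistics();
stats.forEach((name, value) -> System.out.println(name + " = " + value));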

From the runtime:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// JVM heap currently committed by the runtime
System.out.println("Heap committed: " + Runtime.getRuntime().totalMemory());

// Direct buffer pool (stays effectively empty for GLIDE)
for (BufferPoolMXBean b :
        ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
    System.out.println(b.getName() + ": " + b.getMemoryUsed());
}

// Enable NMT at startup with -XX:NativeMemoryTracking=summary,
// then run: jcmd <pid> VM.native_memory summary

From the OS: on Linux, grep VmRSS /proc/$PID/status gives the authoritative resident set size. In Kubernetes / ECS / Fargate, the container runtime reports this as the memory metric your limits apply to.
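
The same figure can be read from inside the process. A small Linux-only sketch:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RssProbe {
    public static void main(String[] args) throws IOException {
        // /proc/self/status is Linux-specific; VmRSS is reported in kB.
        for (String line : Files.readAllLines(Path.of("/proc/self/status"))) {
            if (line.startsWith("VmRSS")) {
                System.out.println(line); // e.g. "VmRSS:    123456 kB"
            }
        }
    }
}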

A safe starting point:

  1. Measure peak runtime heap under your expected workload.
  2. Add headroom for the Rust core. A few tens of megabytes is typical for single-shard workloads; cluster mode adds roughly one connection’s worth per shard.
  3. Leave kernel / TCP buffer headroom (a few MB).
  4. Set the container limit above the sum. Don’t set -Xmx to the full container limit — the Rust core needs space too.
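
For example, with purely illustrative numbers: a measured peak heap of 512 MB, around 64 MB for the Rust core of a multi-shard cluster client, and 16 MB of kernel/TCP headroom sum to roughly 600 MB, so -Xmx512m with a container limit of 768 MB leaves the native side room to breathe.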

Then measure in your environment (peak RSS over representative traffic) and tighten from there.

Because GLIDE’s Rust core allocates outside the language runtime’s managed heap, you need to leave enough container memory for native allocations. Setting the heap limit too high starves the Rust core; setting it too low wastes capacity.

Key points:

  • The native share scales with the number of clients (each adds connections and inflight tracking) and with value sizes (larger payloads mean larger native buffers in transit).
  • Pub/Sub subscriptions add unbounded queue memory on the native side — factor this in if you have high-volume subscriptions.
  • These guidelines are starting points. Profile your workload with -XX:NativeMemoryTracking=summary and compare jcmd VM.native_memory against heap usage to find the right balance for your scenario.