Memory Model

Valkey GLIDE clients have a memory profile shaped by the shared Rust core that sits underneath every language binding. This page describes where that memory lives, what is (and is not) configurable, and how to size your process and container accordingly.

A GLIDE process has three memory regions that contribute to its overall footprint:

  • Language runtime (owner: JVM heap / Python interpreter / V8 heap / Go runtime / etc.): request and response objects, your application code, client wrapper state.
  • Rust core (owner: the Rust allocator; reported as native/off-heap by the language runtime): Tokio runtime, connection buffers, cluster topology, in-flight command state.
  • OS / shared libraries (owner: kernel + loader): thread stacks, loaded native libraries (glide-core, OpenSSL, etc.), TCP kernel buffers.

All three show up in the operating system’s resident set size (RSS) for the process. In a container, RSS is what counts toward your memory limit — not just the language-runtime heap.

The Rust core is shared across every GLIDE client in the same process — it is initialised lazily on the first client creation and re-used for every subsequent client. Opening a second GlideClient in the same JVM/interpreter does not multiply the Rust-runtime cost.
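
A minimal sketch of this sharing, assuming the Java binding's GlideClient.createClient API and a Valkey server reachable on localhost:6379:

import glide.api.GlideClient;
import glide.api.models.configuration.GlideClientConfiguration;
import glide.api.models.configuration.NodeAddress;

public class SharedCoreDemo {
    public static void main(String[] args) throws Exception {
        GlideClientConfiguration config = GlideClientConfiguration.builder()
                .address(NodeAddress.builder().host("localhost").port(6379).build())
                .build();

        // The first createClient() initialises the shared Rust runtime;
        // the second reuses it: no second Tokio runtime, no extra thread stacks.
        try (GlideClient first = GlideClient.createClient(config).get();
             GlideClient second = GlideClient.createClient(config).get()) {
            System.out.println(first.ping().get());
            System.out.println(second.ping().get());
        }
    }
}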

What the core contains at idle:

  • A single Tokio async runtime with a small default worker pool (see below) and 2 MB thread stacks.
  • One TCP connection per Valkey node — cluster-mode-disabled setups use a single connection; cluster-mode-enabled scales with shard count.
  • A small native buffer registry that grows on demand when commands carry large payloads, and shrinks when those buffers are released.

Under steady load, the Rust core also holds:

  • Serialized requests in flight and deserialized responses on the return path.
  • Cluster topology and slot map (cluster mode only).
  • Pub/Sub messages queued in an unbounded push notification channel (tokio::sync::mpsc::unbounded_channel). These accumulate depending on publisher rate and the number of subscribed channels — if the application consumes messages slower than they arrive, this buffer grows without limit.
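
If consumers may fall behind, one way to bound memory is to have the callback hand messages to a bounded application-side queue and drop (or count) overflow, so the native channel is drained as fast as messages arrive. A minimal sketch; onMessage is a hypothetical hook to wire into whatever callback your binding's subscription configuration accepts:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedSubscriber {
    private final BlockingQueue<String> inbox = new ArrayBlockingQueue<>(10_000);

    BoundedSubscriber() {
        Thread worker = new Thread(this::drain, "pubsub-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the subscription callback. Returns immediately, so the
    // native push channel never backs up behind slow processing.
    void onMessage(String message) {
        if (!inbox.offer(message)) {
            // Bounded: drop and report rather than grow without limit.
            System.err.println("dropped pub/sub message under backpressure");
        }
    }

    private void drain() {
        try {
            while (true) {
                String message = inbox.take();
                // ... slow application processing happens here ...
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}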

GLIDE’s Java client does not use JVM NIO direct (off-heap) buffers for network I/O. All socket reads and writes are performed in the Rust core; responses are passed as a native pointer across JNI and converted into JVM heap objects (Strings, byte arrays, maps) on the Java side.

Practical consequence: you do not need to tune -XX:MaxDirectMemorySize for GLIDE. The JVM’s direct-buffer pool stays effectively empty regardless of concurrency or payload size. This contrasts with Netty-based clients (e.g. Lettuce), which reserve direct buffers per channel for reads and writes.

The default JVM heap size is usually adequate. Large values and high concurrency put pressure on the heap through the allocation of response objects — the -Xmx you would use for any similar workload is the right starting point.

GLIDE deliberately exposes a small set of memory-relevant configuration, keeping defaults that suit a wide range of workloads.

  • inflightRequestsLimit — maximum concurrent in-flight requests per client. The cap exists precisely to bound queuing memory. Default is 1000.
  • requestTimeout — caps how long commands (and their associated buffers) can sit in flight before being released with an error.
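
A sketch of setting both knobs on the Java configuration builder (illustrative values; this assumes the builder exposes requestTimeout in milliseconds and an inflightRequestsLimit setter):

import glide.api.models.configuration.GlideClientConfiguration;
import glide.api.models.configuration.NodeAddress;

GlideClientConfiguration config = GlideClientConfiguration.builder()
        .address(NodeAddress.builder().host("localhost").port(6379).build())
        .requestTimeout(500)          // ms; times out stuck commands and frees their buffers
        .inflightRequestsLimit(1000)  // the documented default, made explicit
        .build();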

GLIDE does not surface configuration for the Rust runtime’s thread pool, the connection pool size (there is one connection per node — managed by the core), or internal buffer sizes. These are tuned for the general case in the Rust core and are not user-facing knobs.

GLIDE exposes a single in-process statistics call and relies on your runtime / OS for the rest.

From the client: use getStatistics() to read the number of active connections and clients — useful for confirming that multiple GlideClient instances in the same process are sharing the Rust runtime as expected.
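
A short sketch of reading those counters, assuming an already-created client and that getStatistics() returns a map of name/value strings (the exact shape may vary by version):

import java.util.Map;

// `client` is an existing GlideClient instance.
Map<String, String> stats = client.getStatistics();
stats.forEach((name, value) -> System.out.println(name + " = " + value));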

From the runtime:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// JVM heap currently committed by the runtime
System.out.println("Heap committed: " + Runtime.getRuntime().totalMemory());

// Direct buffer pool (stays effectively empty for GLIDE)
for (BufferPoolMXBean b :
        ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
    System.out.println(b.getName() + ": " + b.getMemoryUsed());
}

// Enable NMT at startup with -XX:NativeMemoryTracking=summary,
// then run: jcmd <pid> VM.native_memory summary

From the OS: on Linux, grep VmRSS /proc/$PID/status gives the authoritative resident set size. In Kubernetes / ECS / Fargate, the container runtime reports this as the memory metric your limits apply to.
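
The same figure can be read from inside the process. A small Linux-only sketch:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RssProbe {
    public static void main(String[] args) throws IOException {
        // /proc/self/status is Linux-specific; VmRSS is reported in kB.
        for (String line : Files.readAllLines(Path.of("/proc/self/status"))) {
            if (line.startsWith("VmRSS")) {
                System.out.println(line); // e.g. "VmRSS:    123456 kB"
            }
        }
    }
}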

A safe starting point:

  1. Measure peak runtime heap under your expected workload.
  2. Add headroom for the Rust core. A few tens of megabytes is typical for single-shard workloads; cluster mode adds roughly one connection’s worth per shard.
  3. Leave kernel / TCP buffer headroom (a few MB).
  4. Set the container limit above the sum. Don’t set -Xmx to the full container limit — the Rust core needs space too.
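
For example, with purely illustrative numbers: a measured peak heap of 512 MB, around 64 MB for the Rust core of a multi-shard cluster client, and 16 MB of kernel/TCP headroom sum to roughly 600 MB, so -Xmx512m with a container limit of 768 MB leaves the native side room to breathe.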

Then measure in your environment (peak RSS over representative traffic) and tighten from there.

Because GLIDE’s Rust core allocates outside the language runtime’s managed heap, you need to leave enough container memory for native allocations. Setting the heap limit too high starves the Rust core; setting it too low wastes capacity.

Key points:

  • The native share scales with the number of clients (each adds connections and inflight tracking) and with value sizes (larger payloads mean larger native buffers in transit).
  • Pub/Sub subscriptions add unbounded queue memory on the native side — factor this in if you have high-volume subscriptions.
  • These guidelines are starting points. Profile your workload with -XX:NativeMemoryTracking=summary and compare jcmd VM.native_memory against heap usage to find the right balance for your scenario.