Elliott Triangular Kernel v4.2 — Constant-Memory Inference

The Problem vs. The Solution

Standard Growing Memory
Memory O(N·d) grows with every token → KV cache keeps expanding until it hits OOM

Elliott Triangular
Memory O(L·d) stays flat at L=2,048 or 8,192 - you choose

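Original y_i = Σ_j [1 − (i−j)/L] · v_j, summed over the m most recent tokens, j = i−m+1 to i (m ≤ L)

Rewrite (O(1) update) y_i = (1/L) × (sum_jv_i − (i − L) × sum_v_i), where sum_v_i = Σ_j v_j and sum_jv_i = Σ_j j·v_j over the same window

Normalized w_sum = m − m(m−1)/(2L), y_i = raw / w_sum, where raw is the unnormalized y_i above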
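
The step from the original sum to the rewrite is short enough to spell out. This is a sketch under one assumption that the text implies but does not state outright: the sum runs over the m most recent tokens, with m ≤ L. S_v and S_jv are shorthand introduced here for sum_v_i and sum_jv_i.

```latex
\begin{aligned}
y_i &= \sum_{j=i-m+1}^{i} \Bigl[\,1 - \frac{i-j}{L}\Bigr] v_j
     = \frac{1}{L} \sum_{j=i-m+1}^{i} \bigl(j - (i - L)\bigr)\, v_j \\
    &= \frac{1}{L} \Bigl( S_{jv} - (i - L)\, S_v \Bigr),
\qquad S_{jv} = \sum_{j=i-m+1}^{i} j\, v_j, \quad S_v = \sum_{j=i-m+1}^{i} v_j \\[4pt]
w_{\mathrm{sum}} &= \sum_{k=0}^{m-1} \Bigl(1 - \frac{k}{L}\Bigr) = m - \frac{m(m-1)}{2L}
\end{aligned}
```

Once the window is full (m = L), the normalizer settles at the constant (L+1)/2.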

L is configurable: a smaller L uses less memory, a larger L remembers further back. Both use the same O(1) update.

Memory stays SAME size forever: circular buffer (L slots) + two running sums
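
The buffer-plus-two-sums state can be sketched in plain Python. This is a minimal illustration, not the kernel's actual implementation: the class name `TriangularWindow`, the helper `naive_output`, and the 0-based token index are all assumptions made for the example. Each step touches one buffer slot and the two running sums, so the work per token does not depend on how many tokens came before.

```python
import math
import random
from typing import List


class TriangularWindow:
    """Hypothetical sketch: L-slot circular buffer plus two running sums."""

    def __init__(self, L: int, d: int) -> None:
        self.L = L
        self.d = d
        self.buf = [[0.0] * d for _ in range(L)]  # circular buffer of v_j
        self.sum_v = [0.0] * d                    # sum of v_j over the window
        self.sum_jv = [0.0] * d                   # sum of j * v_j over the window
        self.i = -1                               # index of newest token (0-based, assumed)

    def step(self, v: List[float]) -> List[float]:
        """Consume one value vector and return the normalized y_i."""
        self.i += 1
        i, L = self.i, self.L
        slot = i % L
        if i >= L:
            # The slot still holds v_{i-L}: remove it from both sums first.
            old = self.buf[slot]
            for k in range(self.d):
                self.sum_v[k] -= old[k]
                self.sum_jv[k] -= (i - L) * old[k]
        self.buf[slot] = list(v)
        for k in range(self.d):
            self.sum_v[k] += v[k]
            self.sum_jv[k] += i * v[k]
        m = min(i + 1, L)                         # tokens currently in the window
        w_sum = m - m * (m - 1) / (2 * L)
        return [(self.sum_jv[k] - (i - L) * self.sum_v[k]) / L / w_sum
                for k in range(self.d)]


def naive_output(values: List[List[float]], i: int, L: int) -> List[float]:
    """Reference: evaluate the original weighted sum directly."""
    lo = max(0, i - L + 1)
    weights = [1 - (i - j) / L for j in range(lo, i + 1)]
    w_sum = sum(weights)
    return [sum(w * values[j][k] for w, j in zip(weights, range(lo, i + 1))) / w_sum
            for k in range(len(values[0]))]


# Check the O(1) update against the direct sum on random data.
random.seed(0)
L, d = 8, 3
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(40)]
win = TriangularWindow(L, d)
for i, v in enumerate(values):
    fast = win.step(v)
    slow = naive_output(values, i, L)
    assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow))
```

One caveat for any running-sum scheme: subtracting evicted terms accumulates floating-point error over very long streams, and the j·v_j terms grow with the absolute position, so a production version would likely need higher precision or an occasional recompute from the buffer.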

Cap per-user memory at ~L·d stored values. Pick L=2,048 for ~64 MB/user or L=8,192 for ~256 MB/user.
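
The two figures above imply roughly 32 KiB of state per buffer slot (64 MB ÷ 2,048). The sketch below reproduces them under a hypothetical configuration chosen only to match that ratio: d = 16,384 stored values per token at 2 bytes each. Neither number comes from the source, and the function name `buffer_bytes` is illustrative.

```python
MIB = 1024 ** 2


def buffer_bytes(L: int, d: int, bytes_per_elem: int) -> int:
    """Circular buffer (L slots of d values) plus the two running sums."""
    return (L + 2) * d * bytes_per_elem


# Hypothetical: d = 16384 values per token, 2 bytes per value (e.g. fp16).
for L in (2048, 8192):
    size_mib = buffer_bytes(L, d=16384, bytes_per_elem=2) / MIB
    print(f"L={L}: {size_mib:.2f} MiB")
```

With these assumed values the result is just over 64 MiB and 256 MiB; the two running sums add only two extra slots' worth of state.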

No quadratic scan. Token 5 or token 500k costs the same — predictable SLOs.

Long chats, agents, and logs can't blow up memory. Eliminate midnight pages.

Auto-forget after L tokens. Perfect for privacy modes and compliance.

Runs on L4 / CPU with tiny footprint. No custom kernels required.

Metric	Standard Attention (32k ctx)	Elliott L=2,048	Elliott L=8,192
Memory per user	~8-12 GB (grows)	~64 MB (flat)	~256 MB (flat)
Concurrent users per 80GB H100	~6-8	~1,000+	~250+
Best use case	short demos	edge VPS, chatbots, AU hosting	premium long-chat, code assistants
Latency at token 50k	degrades	constant	constant

Illustrative numbers; actual figures depend on precision and implementation.
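
The concurrency row is simple division, sketched here with one loud assumption: only 80% of the 80 GB card is treated as available for per-user state, with the rest left as headroom for weights and activations. That fraction and the function name `max_users` are hypothetical, not taken from the source.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_users(gpu_bytes: int, per_user_bytes: int, usable_percent: int = 80) -> int:
    """Users that fit in the usable share of GPU memory (integer arithmetic)."""
    usable = gpu_bytes * usable_percent // 100  # hypothetical headroom assumption
    return usable // per_user_bytes


for label, per_user in (("L=2,048", 64 * MIB), ("L=8,192", 256 * MIB)):
    print(label, max_users(80 * GIB, per_user))
```

Under that assumption the counts land close to the table's "~1,000+" and "~250+"; a different headroom figure shifts them proportionally.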

Listed as ELLIOTT TRIANGULAR – NEW in Kernel Census 2026. Not equivalent to standard Bartlett.