The Problem vs. The Solution
Problem: memory is O(N) and grows forever → the KV cache keeps expanding until it hits OOM
Solution: memory is O(L·d) and stays flat at a window length you choose, e.g. L=2,048 or L=8,192
How It Works
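raw_i = Σ [1 − (i−j)/L] · v_j, summed over the last L tokens (j = i−L+1 … i)
raw_i = (1/L) × (sum_jv_i − (i − L) × sum_v_i), where sum_v_i = Σ v_j and sum_jv_i = Σ j·v_j over the same window
w_sum = m − m(m−1)/(2L), y_i = raw_i / w_sum, where m is the number of tokens currently in the window (m = L once it is full)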
L is configurable: a smaller L uses less memory, a larger L remembers further back. Either way the update is O(1) in sequence length (O(d) work per token).
Memory stays the same size forever: a circular buffer (L slots) plus two running sums
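The update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the shipped implementation: the names `TriangularWindow`, `step`, and `reference_output` are hypothetical, indices are 0-based, and vectors are plain Python lists to keep the example dependency-free. It keeps a circular buffer of L slots plus the two running sums (Σ v_j and Σ j·v_j), and checks the result against the direct weighted sum. One caveat worth noting: because the second sum uses the absolute index j, a real implementation on very long streams would probably need to re-base indices to limit floating-point drift.

```python
class TriangularWindow:
    """Sliding triangular-weighted average over the last L value vectors (sketch)."""

    def __init__(self, L, d):
        self.L = L
        self.d = d
        self.buf = [[0.0] * d for _ in range(L)]  # circular buffer, L slots
        self.sum_v = [0.0] * d    # running sum of v_j over the window
        self.sum_jv = [0.0] * d   # running sum of j * v_j over the window
        self.i = 0                # absolute 0-based index of the next token

    def step(self, v):
        L, i, d = self.L, self.i, self.d
        slot = i % L
        if i >= L:
            # The slot still holds v_{i-L}; remove it from both running sums.
            old, j_old = self.buf[slot], i - L
            for k in range(d):
                self.sum_v[k] -= old[k]
                self.sum_jv[k] -= j_old * old[k]
        self.buf[slot] = list(v)
        for k in range(d):
            self.sum_v[k] += v[k]
            self.sum_jv[k] += i * v[k]
        m = min(i + 1, L)                  # tokens currently in the window
        w_sum = m - m * (m - 1) / (2 * L)  # sum of the triangular weights
        y = [
            (self.sum_jv[k] - (i - L) * self.sum_v[k]) / L / w_sum
            for k in range(d)
        ]
        self.i += 1
        return y


def reference_output(values, i, L):
    """Direct O(L*d) evaluation of the same weighted average, for checking."""
    d = len(values[0])
    raw, w_sum = [0.0] * d, 0.0
    for j in range(max(0, i - L + 1), i + 1):
        w = 1 - (i - j) / L
        w_sum += w
        for k in range(d):
            raw[k] += w * values[j][k]
    return [r / w_sum for r in raw]


if __name__ == "__main__":
    window = TriangularWindow(L=4, d=2)
    for token in ([1.0, 0.0], [0.0, 1.0], [2.0, 2.0]):
        print(window.step(token))
```

Each call to `step` touches one buffer slot and the two sums, so the work per token is proportional to d regardless of how many tokens came before.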
Why Hosting Companies Care
Fixed Memory = Higher Density
Cap per-user memory at ~L·d. Pick L=2,048 for ~64MB/user or L=8,192 for ~256MB/user.
Stable Latency = O(d) per token
No scan over a growing history. Token 5 and token 500k cost the same, which keeps SLOs predictable.
No OOM Crashes
Long chats, agents, and logs can't blow up memory. Eliminate midnight pages.
Built-in Privacy
Auto-forget after L tokens. Perfect for privacy modes and compliance.
Edge Ready
Runs on an NVIDIA L4 or a CPU with a tiny footprint. No custom kernels required.
Cost Math — Example 7B model, d=4096
| Metric | Standard Attention (32k ctx) | Elliott L=2,048 | Elliott L=8,192 |
|---|---|---|---|
| Memory per user | ~8-12 GB (grows) | ~64 MB (flat) | ~256 MB (flat) |
| Concurrent users per 80GB H100 | ~6-8 | ~1,000+ | ~250+ |
| Best use case | short demos | edge VPS, chatbots, AU hosting | premium long-chat, code assistants |
| Latency at token 50k | degrades | constant | constant |
Illustrative numbers; actual figures depend on precision and implementation.
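The table does not say how the concurrent-user figures were derived. One reading that roughly reproduces them, offered here only as an assumption, is to reserve about 14 GB of the 80 GB card for the weights of a 7B model in 16-bit precision and divide the remainder by the per-user memory. The helper name `concurrent_users` and the `weights_gb` default are hypothetical, and 1 GB is treated as 1,024 MB.

```python
def concurrent_users(gpu_gb, per_user_mb, weights_gb=14.0):
    """Users that fit on one GPU after reserving memory for model weights.

    weights_gb=14.0 is an assumption: 7e9 parameters x 2 bytes each.
    """
    free_mb = (gpu_gb - weights_gb) * 1024
    return int(free_mb // per_user_mb)


if __name__ == "__main__":
    scenarios = [
        ("Standard attention, 8 GB/user", 8 * 1024),
        ("Standard attention, 12 GB/user", 12 * 1024),
        ("Elliott L=2,048, 64 MB/user", 64),
        ("Elliott L=8,192, 256 MB/user", 256),
    ]
    for label, per_user_mb in scenarios:
        print(f"{label}: {concurrent_users(80, per_user_mb)} users")
```

Under these assumptions the sketch gives 5 to 8 users for standard attention, 1,056 for L=2,048, and 264 for L=8,192, which is in line with the rounded figures in the table. Activation memory and framework overhead are ignored, so real capacity would be lower.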
Proven Novelty
Listed as ELLIOTT TRIANGULAR – NEW in Kernel Census 2026. Not equivalent to standard Bartlett.