Stanford Researchers Propose Attention-Free Transformer That Cuts Memory Use by 80%

A new paper from Stanford introduces 'Linear Recurrence Networks' that replace attention mechanisms with linear recurrences, achieving comparable quality with dramatically lower memory requirements.


Researchers at Stanford University have published a paper introducing Linear Recurrence Networks (LRNs), an architecture that replaces self-attention with a linear recurrence. Self-attention is the mechanism at the core of most modern language models, and its cost grows quadratically (O(n²)) with sequence length n. The linear recurrence's cost grows only linearly (O(n)).

The paper reports that LRNs reach 95% of transformer quality while using 80% less memory during inference and 60% less during training. According to the authors, a 70B-parameter LRN can process 1M tokens in a single forward pass on one A100 GPU, a workload they say would require eight A100s with a standard transformer.
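
To see why context length dominates inference memory, here is a rough back-of-envelope calculation in Python. It is a sketch under loudly hypothetical assumptions. The layer count, key/value head count, head dimension, and recurrent state size are illustrative values, not figures from the paper. The fixed-size recurrent state is also a simplification, since the article describes the LRN state as growing logarithmically. The calculation counts only the key/value cache a transformer keeps for every past token, which grows linearly with context. Attention's quadratic term shows up in compute instead, because each new token is compared against all cached ones.

```python
# Back-of-envelope inference memory: growing KV cache vs. fixed recurrent state.
# HYPOTHETICAL: all model dimensions below are illustrative, not from the paper.

BYTES_FP16 = 2  # bytes per value at 16-bit precision


def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int) -> int:
    """Keys and values cached for every past token in every layer (grows with seq_len)."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * BYTES_FP16


def recurrent_state_bytes(n_layers: int, state_size: int) -> int:
    """One fixed-size state per layer (independent of seq_len; a simplification)."""
    return n_layers * state_size * BYTES_FP16


if __name__ == "__main__":
    layers, kv_heads, head_dim = 80, 8, 128  # hypothetical 70B-class config
    state = 8192 * 16                        # hypothetical per-layer state size
    for n in (32_000, 1_000_000):
        kv_gb = kv_cache_bytes(n, layers, kv_heads, head_dim) / 1e9
        rs_gb = recurrent_state_bytes(layers, state) / 1e9
        print(f"{n:>9,} tokens: KV cache {kv_gb:8.1f} GB, recurrent state {rs_gb:.3f} GB")
```

With these made-up settings, the cache grows from roughly 10 GB at 32K tokens to roughly 330 GB at 1M tokens, while the recurrent state stays near 21 MB. The numbers illustrate scaling only and are not meant to reproduce the paper's figures.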

How It Works

LRNs replace each self-attention layer with a structured linear recurrence that updates a compressed “memory state” as each token is processed. Because this state grows logarithmically rather than linearly with sequence length, the model can refer back to earlier parts of a long context without paying attention's O(n²) cost.
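
The basic idea of swapping attention for a recurrence can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the LRN layer from the paper. For brevity it uses a diagonal linear recurrence with a fixed-size state and scalar inputs, whereas the article says the LRN state grows logarithmically with sequence length. The function name and the parameters `a`, `b`, and `c` are made up for this example.

```python
from typing import List


def linear_recurrence(xs: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Hypothetical diagonal linear recurrence (not the paper's LRN update).

    h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    y_t    = sum_i c[i] * h_t[i]

    a, b, c all have length d (the state size). Each step costs O(d), so n
    tokens cost O(n * d), with no n-by-n token-to-token interaction.
    """
    d = len(a)
    h = [0.0] * d  # compressed memory state: same size at every step
    ys = []
    for x in xs:
        # Decay the old state and mix in the new token (elementwise, i.e. diagonal).
        h = [a[i] * h[i] + b[i] * x for i in range(d)]
        # Read an output off the current state.
        ys.append(sum(c[i] * h[i] for i in range(d)))
    return ys
```

For example, `linear_recurrence([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])` returns `[1.0, 0.5, 0.25]`. The first token's contribution halves at each step, and the work per token is the same whether the sequence has ten tokens or a million.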

The key insight is a “selective recurrence” mechanism that learns which information to keep in the compressed state and which to discard. This is analogous to how attention learns which tokens to focus on, but at a fixed computational cost.
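
One common way to make a recurrence selective is an input-dependent gate. The gate decides, token by token, how much of the old state to keep and how much to overwrite. The Python sketch below uses that technique as a stand-in, because the article does not describe the paper's actual selection mechanism. The gate form, the function names, and the parameters `w_keep` and `bias_keep` are assumptions made for illustration.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def selective_recurrence(xs: List[float], w_keep: List[float], bias_keep: List[float]) -> List[List[float]]:
    """Hypothetical gated linear recurrence (a stand-in, not the paper's mechanism).

    keep_t[i] = sigmoid(w_keep[i] * x_t + bias_keep[i])
    h_t[i]    = keep_t[i] * h_{t-1}[i] + (1 - keep_t[i]) * x_t

    The gate depends only on the current input x_t, never on h, so the update
    stays linear in h and each step costs O(d) regardless of sequence length.
    """
    d = len(w_keep)
    h = [0.0] * d
    states = []
    for x in xs:
        # Learned, input-dependent decision: near 1 keeps memory, near 0 overwrites it.
        keep = [sigmoid(w_keep[i] * x + bias_keep[i]) for i in range(d)]
        h = [keep[i] * h[i] + (1.0 - keep[i]) * x for i in range(d)]
        states.append(list(h))  # snapshot of the state after this token
    return states
```

With `w_keep=[-10.0]` and `bias_keep=[5.0]`, an input of 1.0 pushes the keep gate close to 0, so the token is written into the state. Inputs of 0.0 push it close to 1, so the stored value persists (about 0.98 after two further tokens) instead of fading. Because the gate reads only the current token, the update remains linear in the state.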

Benchmark Results

On standard benchmarks (MMLU, HumanEval, and GSM8K), 70B LRN models score within 1-2% of 70B transformer baselines. On long-context tasks (more than 32K tokens), LRNs outperform transformers by 3-5%, likely because the recurrence mechanism handles long-range dependencies more naturally.

Open Source

The researchers have released training code, model weights in several sizes (1B, 7B, and 13B), and evaluation scripts on GitHub under the Apache 2.0 license. Several AI labs, including Mistral and Nous Research, have already begun experimenting with the architecture.
