Stanford Researchers Propose Attention-Free Transformer That Cuts Memory Use by 80%

A new paper from Stanford introduces 'Linear Recurrence Networks' that replace attention mechanisms with linear recurrences, achieving comparable quality with dramatically lower memory requirements.

mujeeburehman0000@gmail.com

Contributor

Jul 14, 2025 · 2 min read


Researchers from Stanford University have published a paper introducing Linear Recurrence Networks (LRNs), an architecture that replaces self-attention, the quadratic-cost mechanism at the core of most modern language models, with a linear recurrence whose cost grows as O(n) in sequence length.

According to the paper, LRN models achieve 95% of transformer quality while using 80% less memory during inference and 60% less during training. The authors also report that a 70B-parameter LRN can process 1M tokens in a single forward pass on a single A100 GPU, a workload that would require eight A100s with a standard transformer.
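The memory claims rest on a simple asymmetry: a standard transformer caches a key vector and a value vector for every past token in every layer, so its cache grows linearly with context length, while a recurrent model carries a state whose size does not scale with the number of tokens. The back-of-envelope sketch below makes that concrete. Every model dimension in it (80 layers, 8 key-value heads, head size 128, fp16 values, and a recurrent state of 16 × 8192 values per layer) is a hypothetical placeholder, not the configuration used in the LRN paper, and the sketch simplifies the recurrent state to a fixed size even though the paper describes it as growing logarithmically.

```python
# Back-of-envelope memory arithmetic. All dimensions are hypothetical
# placeholders, not the configuration reported in the LRN paper.


def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes a standard transformer needs to cache keys and values.

    Each layer stores one key and one value vector per KV head per token,
    so the total grows linearly with seq_len.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len


def recurrent_state_bytes(n_layers: int, state_size: int,
                          bytes_per_value: int = 2) -> int:
    """Bytes for a fixed-size recurrent state; independent of seq_len.

    Simplification: the paper describes a state that grows logarithmically,
    which this function does not model.
    """
    return n_layers * state_size * bytes_per_value


# Hypothetical 70B-class configuration (grouped-query attention, fp16).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for tokens in (32_000, 1_000_000):
    gb = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 1e9
    print(f"{tokens:>9,} tokens -> KV cache {gb:6.1f} GB")

state_mb = recurrent_state_bytes(LAYERS, state_size=16 * 8192) / 1e6
print(f"recurrent state (any length) {state_mb:.1f} MB")
```

With these placeholder numbers the key-value cache costs 327,680 bytes per token, roughly 328 GB at 1M tokens before counting the model weights, while the hypothetical recurrent state stays at about 21 MB regardless of length. The sketch illustrates the scaling argument only; it does not reproduce the paper's 80% and 60% figures.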

How It Works

LRNs replace the self-attention layer with a structured linear recurrence that maintains a compressed “memory state” as it processes each token. According to the paper, this state grows logarithmically rather than linearly with sequence length (a transformer's key-value cache, by contrast, grows by a fixed amount with every token it keeps), which lets the model reference earlier parts of a long context without paying the O(n²) cost of attention.
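To see why a recurrence avoids the quadratic cost, it helps to put the two update rules side by side. The pure-Python sketch below is not the LRN layer, whose exact equations the article does not give. It contrasts naive causal self-attention, where every token scores every earlier token, with a generic diagonal linear recurrence of the kind used in state-space models, h_t = a ⊙ h_{t-1} + b · x_t and y_t = c · h_t. Inputs are scalars and the parameter vectors `a`, `b`, `c` are hypothetical, chosen only for illustration; the state here has a fixed size rather than the logarithmic growth the paper describes.

```python
import math


def causal_attention(xs):
    """Naive causal self-attention over scalar features, for comparison.

    Token t scores all t+1 tokens seen so far, so total work is O(n^2)
    and every past input must stay in memory.
    """
    out = []
    for t, q in enumerate(xs):
        past = xs[: t + 1]
        scores = [q * k for k in past]               # t+1 dot products
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, past)) / z)
    return out


def linear_recurrence(xs, a, b, c):
    """Diagonal linear recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c . h_t.

    A single left-to-right pass: O(n) work and a state h whose size is
    len(a), no matter how long the input is.
    """
    h = [0.0] * len(a)                               # compressed memory state
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys
```

For example, `linear_recurrence([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])` returns `[1.0, 0.5, 0.25]`: the first token's contribution is carried forward in the state and decays by the factor `a` at each step, without the model ever revisiting earlier inputs.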

The key insight is a “selective recurrence” mechanism that learns which information to keep in the compressed state and which to discard. This is analogous to how attention learns which tokens to focus on, but at a fixed computational cost.
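The article does not spell out the selective mechanism, so the following is only a minimal sketch of the general idea, assuming an input-dependent gate similar in spirit to selective state-space models such as Mamba. A gate computed from the current token decides how much of the old state survives and how much is overwritten by the new input. The names `w_keep` and `b_keep` are hypothetical gate parameters, and the sequence is reduced to a single scalar channel for readability.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def selective_recurrence(xs, w_keep, b_keep):
    """Input-gated linear recurrence over a scalar sequence (a sketch).

    keep_t = sigmoid(w_keep * x_t + b_keep) decides, per token, how much of
    the previous state survives; the rest is replaced by x_t:

        h_t = keep_t * h_{t-1} + (1 - keep_t) * x_t

    The gate depends only on the input, never on h, so the update stays
    linear in h and costs the same for every token.
    """
    h = 0.0
    states = []
    for x in xs:
        keep = sigmoid(w_keep * x + b_keep)  # learned in a real model
        h = keep * h + (1.0 - keep) * x
        states.append(h)
    return states


# Hypothetical gate: overwrite the state on a "salient" token (x = 1),
# hold it almost unchanged on uninformative tokens (x = 0).
states = selective_recurrence([1.0, 0.0, 0.0], w_keep=-10.0, b_keep=5.0)
```

Here the first token drives the keep-gate to about 0.007, so the state absorbs the input, and the following zeros drive it to about 0.993, so the state is retained rather than washed out. A fixed decay factor could not make that per-token choice, which is the sense in which the recurrence is "selective."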

Benchmark Results

On standard language modeling benchmarks, 70B LRN models match 70B transformer baselines within 1-2% on MMLU, HumanEval, and GSM8K. On long-context tasks (>32K tokens), LRNs actually outperform transformers by 3-5%, likely because the recurrence mechanism handles long-range dependencies more naturally.

Open Source

The researchers have released training code, model weights for several sizes (1B, 7B, 13B), and evaluation scripts on GitHub under the Apache 2.0 license. Several AI labs, including Mistral and Nous Research, have already begun experimenting with the architecture.



Related Articles

DeepMind’s AlphaFold 3 Predicts Drug Interactions with 89% Accuracy

Anthropic Makes Breakthrough in AI Interpretability with Sparse Autoencoders