Algorithm · ML · Open Source · Recommendation Systems

X For You Feed Algorithm
A Deep Dive from Scratch

May 16, 2026 · ~20 min read · xai-org/x-algorithm · May 15, 2026

Every time you open X, a multi-stage pipeline runs to decide what appears in your For You feed. xAI first open-sourced the algorithm on January 20, 2026 at github.com/xai-org/x-algorithm, then published a major update on May 15, 2026, announced by Elon Musk to 30M+ views. This article walks through every component of that latest release (architecture, ML models, filtering logic) with interactive visualizations and real source code.

Big Picture

Your For You feed is built by three co-operating services, all orchestrated by a central Home Mixer. The core pipeline and Thunder are written in Rust, the Phoenix ML models in Python/JAX, and Grox in Python, with gRPC connecting each service. The diagram below shows the real-time data flow on every feed request.

[Interactive diagram: Live Architecture. Nodes: Your Request, Home Mixer, Thunder, Phoenix ML, Grox, Ranked Feed. Pipeline written in Rust, ML in Python/JAX.]

Info

The entire pipeline executes on every single feed request — typically in under 150ms. Parallelism is used aggressively at every stage where dependencies allow.

The Four Core Components

Before diving into each pipeline stage, here is a quick map of which service does what:

Component | Language | Job
Home Mixer | Rust | Orchestrates every pipeline stage (the conductor)
Thunder | Rust | In-memory real-time store of posts from accounts you follow
Phoenix | Python / JAX | ML retrieval + transformer ranking (Grok-based)
Grox | Python | Content understanding: spam, safety, topic classification

Pipeline Walkthrough

The CandidatePipeline trait in Rust defines the exact execution order across eight distinct stages. Use the interactive explorer below to walk through each step — you will see the source code, the logic, and why each stage exists.

rust
// candidate-pipeline/candidate_pipeline.rs
async fn execute(&self, query: Q) -> PipelineResult<Q, C> {
    let hydrated_query = self.hydrate_query(query).await;
    let candidates    = self.fetch_candidates(&hydrated_query).await;
    let hydrated      = self.hydrate(&hydrated_query, candidates).await;
    let (kept, _)     = self.filter(&hydrated_query, hydrated);
    let scored        = self.score(&hydrated_query, kept).await;
    let selected      = self.select(&hydrated_query, scored);
    // ... post-selection hydration, filters, side effects
}
Step 1 of 8

Query Hydration

Load everything about you before fetching a single post.

What happens

  • Your engagement history — likes, replies, reposts, dwell time
  • Accounts you follow, mute, and block
  • Topics you subscribe to
  • Your IP address for geo-aware ranking
  • A Bloom filter of posts you have already seen
  • Previously served post IDs from this session

Note: All hydrators run in parallel and merge into a single ScoredPostsQuery.

Source (Rust)

let hydrate_futures = hydrators.iter().map(|h| h.run(&query));
let results = join_all(hydrate_futures).await; // parallel fetch
for (hydrator, result) in hydrators.iter().zip(results) {
    if let Ok(hydrated) = result {
        hydrator.update(&mut hydrated_query, hydrated);
    }
}
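
The same run-in-parallel-then-merge pattern can be sketched in Python with `asyncio.gather`. This is an illustration only, not the repository's code: the three hydrator functions, their field names, and the dict standing in for ScoredPostsQuery are all hypothetical.

```python
import asyncio

# Hypothetical hydrators: each returns a partial dict of query fields.
async def hydrate_follows(user_id):
    return {"following": [101, 102]}

async def hydrate_seen_posts(user_id):
    return {"seen_post_ids": {9001}}

async def hydrate_failing(user_id):
    raise RuntimeError("backend unavailable")  # simulated failure

async def hydrate_query(user_id, hydrators):
    # Run every hydrator concurrently. return_exceptions=True hands failures
    # back as values instead of raising, so one failed hydrator does not
    # abort the merge (the Rust snippet skips non-Ok results the same way).
    results = await asyncio.gather(
        *(h(user_id) for h in hydrators), return_exceptions=True
    )
    query = {"user_id": user_id}
    for result in results:
        if isinstance(result, Exception):
            continue  # drop the failed hydrator, keep the rest
        query.update(result)
    return query

query = asyncio.run(
    hydrate_query(42, [hydrate_follows, hydrate_seen_posts, hydrate_failing])
)
```

Here the failing hydrator contributes nothing, and the merged query still carries the fields from the two that succeeded.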

The Phoenix ML Model

Phoenix is the recommendation engine responsible for discovering out-of-network content. It has two components: a retrieval model to find candidates, and a ranking model (a full transformer) to score them.

Retrieval — Two-Tower Architecture

The retrieval model uses a classic two-tower neural network:

  • User Tower — encodes your engagement history (post hashes + action types + product surface + dwell time) into a single dense vector.
  • Candidate Tower — encodes each post in the global corpus into a vector, normalized to the unit sphere for stable dot products.
  • Similarity Search — dot product between your user vector and all candidate vectors; top-K are returned to the ranking model.
python
# phoenix/recsys_retrieval_model.py
class CandidateTower(hk.Module):
    """Two modes: MLP projection (default) or mean-pool → L2 norm."""

    def __call__(self, post_author_embedding: jax.Array) -> jax.Array:
        # Mean-pool across hash embeddings, then L2-normalize to unit sphere
        candidate_representation = jnp.mean(post_author_embedding, axis=-2)
        candidate_norm_sq = jnp.sum(candidate_representation**2,
                                    axis=-1, keepdims=True)
        # EPS clamp prevents division-by-zero for zero embeddings
        candidate_norm = jnp.sqrt(jnp.maximum(candidate_norm_sq, EPS))
        candidate_representation = candidate_representation / candidate_norm
        return candidate_representation.astype(post_author_embedding.dtype)
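
To make the similarity-search step concrete, here is a minimal pure-Python sketch of scoring unit-normalized candidate vectors against a user vector and keeping the top K. The `EPS` value, the function names, and the toy vectors are assumptions for illustration. The brute-force scan is also a simplification: at corpus scale a system would likely rely on an approximate nearest-neighbor index instead.

```python
import math

EPS = 1e-12  # assumed clamp value; the real constant is not shown in the excerpt

def l2_normalize(vec):
    # Same idea as the candidate tower: clamp the squared norm, then divide.
    norm = math.sqrt(max(sum(x * x for x in vec), EPS))
    return [x / norm for x in vec]

def top_k(user_vec, candidates, k):
    """Return the k post IDs with the highest dot product against user_vec."""
    scored = []
    for post_id, vec in candidates.items():
        unit = l2_normalize(vec)
        score = sum(u * c for u, c in zip(user_vec, unit))
        scored.append((score, post_id))
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored[:k]]

user_vec = [1.0, 0.0]
candidates = {
    "post_a": [10.0, 0.0],  # same direction as the user vector
    "post_b": [3.0, 4.0],   # partially aligned
    "post_c": [0.0, 5.0],   # orthogonal
}
result = top_k(user_vec, candidates, k=2)
```

Because every candidate is projected onto the unit sphere, only direction matters: `post_a` wins because it points the same way as the user vector, not because its raw vector is large.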

Ranking — Grok-Based Transformer

The ranking model is a Grok-based transformer (ported from the Grok-1 open-source release). It takes three types of input tokens, concatenated into a single sequence:

  1. Your user embedding (1 token)
  2. Your engagement history (up to 128 tokens)
  3. Candidate posts (up to 32 tokens each)
python
# phoenix/recsys_model.py
embeddings = jnp.concatenate(
    [user_embeddings, history_embeddings, candidate_embeddings], axis=1
)

# Candidates CANNOT attend to each other — only to user context.
# This makes scores consistent regardless of what else is in the batch.
model_output = self.model(
    embeddings,
    padding_mask,
    candidate_start_offset=candidate_start_offset,
)

Note

A key design choice: candidates cannot attend to each other — only to the user context. This means a post's score does not change depending on which other posts are in the same batch, making scores deterministic and cache-friendly.
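
One way to picture candidate isolation is as an attention mask. The sketch below is not taken from the repository: it assumes one token per candidate and full (non-causal) attention within the user context, and `build_attention_mask` is a hypothetical helper name.

```python
def build_attention_mask(num_context, num_candidates):
    """mask[q][k] is True when query position q may attend to key position k.

    Positions 0..num_context-1 hold the user and history tokens;
    the remaining positions hold one token per candidate (a simplification).
    """
    total = num_context + num_candidates
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_context:
                mask[q][k] = True  # every position can read the user context
            elif q == k:
                mask[q][k] = True  # a candidate can read itself, nothing else
    return mask

mask = build_attention_mask(num_context=3, num_candidates=2)
```

With this mask, the row for a candidate looks identical no matter which other candidates share the batch, which is why its score does not depend on batch composition.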

Scoring Formula

The RankingScorer computes the weighted score from 22 predicted signals (20 discrete action probabilities + 2 continuous predictions). These are weighted and summed, then author diversity and OON adjustment are applied in the same scorer pass:

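$$\text{weighted} = \sum_i w_i \cdot P(\text{action}_i)$$

$$\text{diversity\_score} = \text{weighted} \times \text{decay}^{\,\text{author\_position}}, \qquad \text{final} = \text{diversity\_score} \times \text{oon\_factor}$$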

Info

Actual weight values are runtime feature-switch parameters, not hardcoded constants — they can be tuned in production without redeploying. The simulator below uses illustrative relative weights to show directionality.
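
The two adjustments applied after the weighted sum can be sketched as follows. Everything numeric here is hypothetical: the `decay` and `oon_factor` values are placeholders, and the sketch assumes that an author's position counts how many higher-scoring posts by the same author precede it, and that the OON factor applies only to out-of-network posts.

```python
def apply_diversity_and_oon(posts, decay=0.5, oon_factor=0.8):
    """Apply author-diversity decay and an out-of-network factor.

    posts: list of dicts with 'id', 'author', 'weighted', 'in_network'.
    decay and oon_factor are placeholder values, not the production ones.
    """
    ranked = sorted(posts, key=lambda p: p["weighted"], reverse=True)
    seen_per_author = {}
    adjusted = []
    for post in ranked:
        position = seen_per_author.get(post["author"], 0)
        seen_per_author[post["author"]] = position + 1
        diversity_score = post["weighted"] * decay ** position
        factor = 1.0 if post["in_network"] else oon_factor
        adjusted.append({**post, "final": diversity_score * factor})
    return sorted(adjusted, key=lambda p: p["final"], reverse=True)

posts = [
    {"id": "a1", "author": "A", "weighted": 4.0, "in_network": True},
    {"id": "a2", "author": "A", "weighted": 3.0, "in_network": True},
    {"id": "b1", "author": "B", "weighted": 2.0, "in_network": False},
]
final_order = [p["id"] for p in apply_diversity_and_oon(posts)]
```

In this toy run the second post from author A drops from 3.0 to 1.5, which lets the out-of-network post from author B (2.0 × 0.8 = 1.6) overtake it.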

The simulator below shows how adjusting each predicted signal moves the final score. Key additions vs. earlier descriptions: retweet (not "repost"), vqv (video quality view), share_via_dm, quoted_vqv, and not_dwelled as a negative signal.

Scoring Formula Simulator

Default slider positions, using illustrative weights. Final score: 3.99.

Positive Signals

Signal | Weight | P | Contribution
Favorite | +1 | 0.30 | +0.30
Reply | +1.2 | 0.30 | +0.36
Retweet | +1.5 | 0.30 | +0.45
Quote | +1.3 | 0.30 | +0.39
Click | +0.5 | 0.30 | +0.15
Profile Click | +0.8 | 0.30 | +0.24
VQV (Video Quality View) | +0.7 | 0.30 | +0.21
Photo Expand | +0.4 | 0.30 | +0.12
Share | +1.4 | 0.30 | +0.42
Share via DM | +1.1 | 0.30 | +0.33
Share via Copy Link | +0.6 | 0.30 | +0.18
Dwell (discrete) | +0.9 | 0.30 | +0.27
Quoted Post Click | +0.5 | 0.30 | +0.15
Quoted Post VQV | +0.5 | 0.30 | +0.15
Dwell Time (continuous) | +0.8 | 0.30 | +0.24
Click Dwell Time (continuous) | +0.6 | 0.30 | +0.18
Follow Author | +2 | 0.30 | +0.60

Negative Signals

Signal | Weight | P | Contribution
Not Interested | -2 | 0.05 | -0.10
Block Author | -5 | 0.05 | -0.25
Mute Author | -3 | 0.05 | -0.15
Report | -4 | 0.05 | -0.20
Not Dwelled | -1 | 0.05 | -0.05

$$\text{Score} = \sum_i w_i \cdot P(\text{action}_i)$$

Current: 4.74 from the positive signals and -0.75 from the negative signals, for a total of 3.99.
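
The simulator's default state can be reproduced in a few lines of Python. The weights are the illustrative ones shown above, not production values, and most of the dictionary keys are hypothetical snake_case labels chosen for this sketch (only names such as vqv, share_via_dm, quoted_vqv, not_dwelled, dwell_time, and click_dwell_time appear in the article).

```python
# Illustrative weights copied from the simulator; NOT production values.
ILLUSTRATIVE_WEIGHTS = {
    "favorite": 1.0, "reply": 1.2, "retweet": 1.5, "quote": 1.3,
    "click": 0.5, "profile_click": 0.8, "vqv": 0.7, "photo_expand": 0.4,
    "share": 1.4, "share_via_dm": 1.1, "share_via_copy_link": 0.6,
    "dwell": 0.9, "quoted_click": 0.5, "quoted_vqv": 0.5,
    "dwell_time": 0.8, "click_dwell_time": 0.6, "follow_author": 2.0,
    "not_interested": -2.0, "block_author": -5.0, "mute_author": -3.0,
    "report": -4.0, "not_dwelled": -1.0,
}

def weighted_score(predictions, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum weight_i * P(action_i) over every predicted signal."""
    return sum(weights[name] * p for name, p in predictions.items())

# Simulator defaults: P = 0.30 for positive signals, P = 0.05 for negative ones.
defaults = {
    name: (0.30 if weight > 0 else 0.05)
    for name, weight in ILLUSTRATIVE_WEIGHTS.items()
}
default_score = round(weighted_score(defaults), 2)

# What-if: the model becomes far more confident the viewer would block the author.
risky = {**defaults, "block_author": 0.50}
risky_score = round(weighted_score(risky), 2)
```

Under these illustrative weights, raising the predicted block probability from 0.05 to 0.50 removes 2.25 points, more than half of the default score.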

Post age is also a factor — bucketed into 1-hour bins up to 80 hours, giving fresher posts an advantage:

python
# phoenix/recsys_model.py
POST_AGE_MAX_MINUTES = 4800  # 80 hours

def compute_post_age_bucket(impr_ts_sec, post_creation_ts_sec,
                             granularity_mins=60):
    post_age_minutes = (impr_ts_sec - post_creation_ts_sec) // 60
    bucket = (post_age_minutes // granularity_mins) + 1
    return jnp.clip(bucket, 0, overflow_bucket)
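
The excerpt does not show how `overflow_bucket` is defined, so the plain-Python sketch below assumes it is one past the last regular bucket. Under that assumption, bucket 0 catches invalid (negative) ages, buckets 1 to 80 cover each hour up to 80 hours, and bucket 81 holds everything older.

```python
POST_AGE_MAX_MINUTES = 4800  # 80 hours, as in the excerpt above

def post_age_bucket(impr_ts_sec, post_creation_ts_sec, granularity_mins=60):
    # ASSUMPTION: the overflow bucket is one past the last regular bucket.
    # The real definition is not shown in the excerpt.
    overflow_bucket = POST_AGE_MAX_MINUTES // granularity_mins + 1
    post_age_minutes = (impr_ts_sec - post_creation_ts_sec) // 60
    bucket = post_age_minutes // granularity_mins + 1
    # Clamp into [0, overflow_bucket], mirroring jnp.clip in the original.
    return max(0, min(bucket, overflow_bucket))

fresh = post_age_bucket(impr_ts_sec=1_800, post_creation_ts_sec=0)      # 30 minutes old
day_old = post_age_bucket(impr_ts_sec=86_400, post_creation_ts_sec=0)   # 24 hours old
ancient = post_age_bucket(impr_ts_sec=1_000_000, post_creation_ts_sec=0)
```

The bucket index is an input feature for the model rather than a score multiplier, so how much freshness helps is something the model learns.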

The Grox Content Pipeline

Grox is a separate content-understanding service that runs classifiers against posts before they enter the main ranking pipeline. It uses Grok (the LLM) to make decisions.

Classifier | Purpose
SpamEapiLowFollowerClassifier | Detects spam from low-follower accounts using Grok
Safety / PTOS | Flags content policy violations
Post Category | Tags posts with topic categories for better matching
Media Processing | Analyses images and video via ASR and vision models
python
# grox/classifiers/content/spam.py
class SpamEapiLowFollowerClassifier(ContentClassifier):
    async def _classify(self, post: Post) -> list[ContentCategoryResult]:
        convo = await self._to_convo(post)
        result = await self._sample(convo)   # calls Grok
        return await self._parse(post, result)

# grox/engine.py — processes tasks from an async queue
async def _run(self, started_event: Event):
    await self._init_run()
    while not self._is_shutdown() or not self._task_queue.empty():
        task = await self._poll_task()
        asyncio.create_task(self._run_task(task))
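
The queue-draining loop can be tried out with a small self-contained sketch. `MiniEngine` and its keyword check are hypothetical: the keyword match stands in for the Grok call, and the polling timeout and in-flight task set are choices made for this example rather than details taken from the repository.

```python
import asyncio

class MiniEngine:
    """Toy stand-in for the engine loop; not the real Grox implementation."""

    def __init__(self):
        self.queue = asyncio.Queue()
        self.shutdown = False
        self.results = []
        self._inflight = set()

    async def _run_task(self, post):
        await asyncio.sleep(0)  # placeholder for the model call
        label = "spam" if "buy now" in post.lower() else "ok"
        self.results.append((post, label))

    async def run(self):
        # Keep polling until shutdown is requested AND the queue is drained.
        while not self.shutdown or not self.queue.empty():
            try:
                post = await asyncio.wait_for(self.queue.get(), timeout=0.05)
            except asyncio.TimeoutError:
                continue  # nothing queued yet; re-check the shutdown flag
            task = asyncio.create_task(self._run_task(post))
            # Hold a reference so the task is not garbage-collected mid-flight.
            self._inflight.add(task)
            task.add_done_callback(self._inflight.discard)
        if self._inflight:
            await asyncio.gather(*self._inflight)

def classify_all(posts):
    async def main():
        engine = MiniEngine()
        for post in posts:
            engine.queue.put_nowait(post)
        engine.shutdown = True  # drain what is queued, then stop
        await engine.run()
        return sorted(engine.results)
    return asyncio.run(main())

results = classify_all(["hello world", "BUY NOW cheap"])
```

Each task is spawned without being awaited inside the loop, so a slow classification does not block the next poll.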

What the Algorithm Rewards

Based on the scoring formula and pipeline structure, here is what genuinely moves posts higher in feeds:

Strong Engagement Signals

  • Likes, retweets, and quote posts each carry a dedicated positive weight; exact values are runtime parameters, so their relative strength can change
  • Replies are weighted positively — a post that sparks conversation rises
  • Long dwell time — if people stop scrolling, the RankingScorer rewards it via both dwell_score (discrete) and dwell_time (continuous)

Account Health

  • Mutual follows with your audience improve in-network retrieval
  • Consistent recency — Thunder trims old posts, AgeFilter removes stale ones
  • Video engagement (vqv_score) and photo expand are dedicated scorer signals — rich media matters

Discoverability

  • Out-of-network retrieval is purely ML-driven via the two-tower model
  • Writing content that matches the embedding profile of engaged users is the only reliable lever
  • Profile clicks from a post signal the author is discovery-worthy

Content Integrity

  • Posts that pass Grox safety checks get to compete in ranking
  • Content that attracts engagement without prompting negative feedback is ideal
  • Brand safety signals are hydrated at the candidate level

What the Algorithm Penalizes

Hard Filters (Instant Removal)

  • Posts from blocked/muted accounts removed before scoring — AuthorSocialgraphFilter
  • Posts containing muted keywords removed entirely — MutedKeywordFilter
  • Posts flagged by Grox as spam, violence, or PTOS — VFFilter
  • Posts older than the retention threshold — AgeFilter
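
A hard filter is a yes/no predicate applied before scoring. The sketch below borrows three filter names from the list above, but the implementations, the `Post` fields, and the query keys (including the 48-hour `max_age_hours`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    author_id: int
    text: str
    age_hours: float

# Each filter returns True when the post may stay in the candidate set.
def author_socialgraph_filter(query, post):
    return post.author_id not in (query["blocked"] | query["muted"])

def muted_keyword_filter(query, post):
    text = post.text.lower()
    return not any(word in text for word in query["muted_keywords"])

def age_filter(query, post):
    return post.age_hours <= query["max_age_hours"]

def run_filters(query, posts, filters):
    """Partition posts into (kept, removed); one failed filter removes a post."""
    kept, removed = [], []
    for post in posts:
        passed = all(f(query, post) for f in filters)
        (kept if passed else removed).append(post)
    return kept, removed

query = {
    "blocked": {2}, "muted": set(),
    "muted_keywords": ["crypto"],
    "max_age_hours": 48,  # placeholder; the real threshold is not stated
}
posts = [
    Post(1, 1, "hello", 1.0),
    Post(2, 2, "hi from a blocked account", 1.0),
    Post(3, 3, "Crypto giveaway", 1.0),
    Post(4, 4, "old news", 100.0),
]
kept, removed = run_filters(
    query, posts, [author_socialgraph_filter, muted_keyword_filter, age_filter]
)
```

Removed posts never reach the scorer, which is the difference between a hard filter and the soft penalties listed next.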

Soft Penalties (Score Reduction)

  • 'Not Interested' — negative weight in RankingScorer pulls down similar content
  • Block author and mute author carry strong negative weights; block author is the largest in the illustrative simulator weights
  • Report is a dedicated negative signal
  • not_dwelled is a negative signal — posts people scroll past fast are penalised
  • RankingScorer's author diversity decay attenuates repeated authors — flooding the feed backfires

Key Design Decisions

01

No Hand-Engineered Features

The transformer learns everything from your engagement sequence. There are no manually crafted relevance signals: "We have eliminated every single hand-engineered feature and most heuristics from the system." Keyword stuffing, hashtag farming, and posting at specific times do not directly influence model predictions.

02

Candidate Isolation in Ranking

During transformer inference, candidates only attend to the user context — not to each other. This means scores are deterministic per (user, post) pair regardless of which other posts are being scored in the same batch.

python
model_output = self.model(
    embeddings,
    padding_mask,
    candidate_start_offset=candidate_start_offset,
)

03

Hash-Based Embeddings

Both retrieval and ranking use multiple hash functions to look up embeddings. Hash 0 is reserved for padding/missing values. This avoids maintaining a vocabulary and handles rare or unseen IDs gracefully.

python
@dataclass
class HashConfig:
    num_user_hashes:   int = 2
    num_item_hashes:   int = 2
    num_author_hashes: int = 2
    num_ip_hashes:     int = 0
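
The idea of looking up an ID through several hash functions, with index 0 kept for padding, can be sketched with the standard library alone. The hashing scheme (salted BLAKE2b), the table layout, and summing the looked-up rows are assumptions for illustration; the repository may hash and combine embeddings differently.

```python
import hashlib

def hash_indices(entity_id, num_hashes, table_size):
    """Map an ID to one table row per hash function; row 0 means padding/missing."""
    if entity_id is None:
        return [0] * num_hashes
    indices = []
    for seed in range(num_hashes):
        digest = hashlib.blake2b(
            str(entity_id).encode(), digest_size=8, salt=bytes([seed])
        ).digest()
        # Shift into 1..table_size-1 so real IDs never land on the padding row.
        indices.append(1 + int.from_bytes(digest, "big") % (table_size - 1))
    return indices

def embed(entity_id, tables):
    """Sum the rows selected by each hash function (summing is an assumption)."""
    idxs = hash_indices(entity_id, len(tables), len(tables[0]))
    rows = [table[i] for table, i in zip(tables, idxs)]
    return [sum(values) for values in zip(*rows)]

# Two tiny tables of 8 rows each; row 0 is the all-zero padding row.
tables = [
    [[0.0, 0.0]] + [[float(i), 1.0] for i in range(1, 8)]
    for _ in range(2)
]
idx = hash_indices(12345, num_hashes=2, table_size=1000)
padding_vec = embed(None, tables)
known_vec = embed(7, tables)
```

Any ID, including one never seen in training, maps to valid rows, so no vocabulary needs to be maintained. Using more than one hash function makes it less likely that two IDs collide on every row.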

04

Multi-Action Prediction (22 signals)

The RankingScorer combines 22 predicted signals (20 discrete action probabilities + 2 continuous predictions: dwell_time and click_dwell_time). Predicting this many distinct engagement types, rather than a single 'relevance' score, lets the feed's character be tuned post-training just by adjusting runtime weight parameters, without retraining. The model outputs discrete logits via an unembedding matrix and continuous predictions via a separate sigmoid head.

05

Composable Pipeline Architecture

Every stage is a pluggable trait. Adding a new filter, scorer, or data source does not require touching the pipeline executor. This is the core pattern that makes the system extensible.

rust
pub trait CandidatePipeline<Q, C> {
    fn query_hydrators(&self)        -> &[Box<dyn QueryHydrator<Q>>];
    fn sources(&self)                -> &[Box<dyn Source<Q, C>>];
    fn hydrators(&self)              -> &[Box<dyn Hydrator<Q, C>>];
    fn filters(&self)                -> &[Box<dyn Filter<Q, C>>];
    fn scorers(&self)                -> &[Box<dyn Scorer<Q, C>>];
    fn selector(&self)               -> &dyn Selector<Q, C>;
    fn post_selection_filters(&self) -> &[Box<dyn Filter<Q, C>>];
    fn side_effects(&self)           -> Arc<Vec<Box<dyn SideEffect<Q, C>>>>;
}
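
The same plug-in idea can be expressed compactly in Python. This is a loose analogue rather than a port of the Rust trait: stages are plain callables, scorer outputs are simply summed (an assumption), and every function and field name below is hypothetical.

```python
class Pipeline:
    """Executor that only knows the stage order, not what each stage does."""

    def __init__(self, sources, filters, scorers, selector):
        self.sources = sources
        self.filters = filters
        self.scorers = scorers
        self.selector = selector

    def execute(self, query):
        candidates = [c for source in self.sources for c in source(query)]
        kept = [c for c in candidates if all(f(query, c) for f in self.filters)]
        scored = [(sum(s(query, c) for s in self.scorers), c) for c in kept]
        return self.selector(query, scored)

# Hypothetical stages.
def in_network_source(query):
    return [{"id": "p1", "p_like": 0.9}, {"id": "p2", "p_like": 0.2}]

def out_of_network_source(query):
    return [{"id": "p3", "p_like": 0.7}, {"id": "p4", "p_like": 0.1}]

def unseen_filter(query, candidate):
    return candidate["id"] not in query["seen"]

def like_scorer(query, candidate):
    return candidate["p_like"]

def top_n(n):
    def select(query, scored):
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [candidate["id"] for _, candidate in ranked[:n]]
    return select

pipeline = Pipeline(
    sources=[in_network_source, out_of_network_source],
    filters=[unseen_filter],
    scorers=[like_scorer],
    selector=top_n(2),
)
feed = pipeline.execute({"seen": {"p2"}})
```

Adding another filter or source means appending to a list; `execute` stays untouched, which is the extensibility property described above.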

06

In-Memory Real-Time Serving (Thunder)

Thunder avoids database reads for in-network content by maintaining post data entirely in memory, updated live from Kafka. This gives sub-millisecond retrieval latency for followed accounts — a crucial performance optimization at X's scale.
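
A toy version of such a store fits in a few lines. `InMemoryPostStore` is a hypothetical sketch, not Thunder's design: it assumes events arrive in timestamp order per author and feeds them in through a plain method call where the real system consumes Kafka.

```python
from collections import defaultdict, deque

class InMemoryPostStore:
    """Per-author queues of recent posts, trimmed by a retention window."""

    def __init__(self, retention_secs):
        self.retention_secs = retention_secs
        self.posts_by_author = defaultdict(deque)  # author_id -> deque of (ts, post_id)

    def ingest(self, author_id, post_id, created_ts):
        # Called once per post-created event; assumes in-order arrival per author.
        self.posts_by_author[author_id].append((created_ts, post_id))

    def trim(self, now_ts):
        cutoff = now_ts - self.retention_secs
        for posts in self.posts_by_author.values():
            while posts and posts[0][0] < cutoff:
                posts.popleft()  # oldest entries sit at the left end

    def fetch_in_network(self, following, now_ts):
        """Return post IDs from followed authors, newest first."""
        self.trim(now_ts)
        found = [
            item
            for author_id in following
            for item in self.posts_by_author.get(author_id, ())
        ]
        return [post_id for _, post_id in sorted(found, reverse=True)]

store = InMemoryPostStore(retention_secs=100)
store.ingest(author_id=1, post_id="a", created_ts=0)
store.ingest(author_id=1, post_id="b", created_ts=950)
store.ingest(author_id=2, post_id="c", created_ts=990)
store.ingest(author_id=3, post_id="d", created_ts=995)
in_network = store.fetch_in_network(following=[1, 2], now_ts=1000)
```

Reads touch only dictionaries and deques already in memory, which is what removes the database round trip from the in-network path.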

Common Misconceptions

These myths persist because the algorithm is opaque by default. The open-source code corrects them:

Myth

More hashtags = more reach

Reality

Hashtags are not in the scoring formula. Engagement patterns drive reach, not metadata.

Myth

Posting at peak hours is the trick

Reality

The age bucket normalizes recency across all hours. There is no peak-hour boost in the model.

Myth

High follower count guarantees visibility

Reality

RankingScorer applies an author diversity decay multiplier. A smaller account with better engagement can outrank a large one with worse engagement.

Myth

The algorithm only shows you people you follow

Reality

Phoenix retrieval surfaces out-of-network content: posts from accounts you have never seen can fill a large portion of your feed.

Myth

Negative feedback only hurts that one post

Reality

The transformer is trained on your full engagement sequence. Consistent negative signals reshape what the model predicts you will like across future sessions.

Myth

Going viral once permanently boosts your account

Reality

Scores are computed per (user, post) pair. Past virality carries no persistent account-level boost — each post is scored independently against each viewer's history.


Based on xai-org/x-algorithm · Released Jan 20, 2026 · Updated May 15, 2026