System Design Interview: Design Twitter News Feed

Designing a news feed like Twitter (or X) is a classic system design interview question. It tests your ability to handle massive scale, write-heavy vs. read-heavy workloads, and complex fanout operations. This guide covers the “Hybrid Approach” often expected by interviewers at top tech companies.

Need a quick revision before interviews? Read the companion cheat sheet: System Design Interview: Twitter News Feed System Design CheatSheet.


Interview Framework: How to Approach This Problem
Step 1: Clarifying Requirements
Step 2: Core Assumptions and Constraints
Step 3: High-Level Architecture
Step 4: The Hardest Problem - Timeline Fanout
- The Complexity
Step 5: Key Technical Decision - Push vs Pull vs Hybrid
Step 6: Database Design and Storage
- Data Classification
- Schema Design (Simplified)
Step 7: Scaling the System
Step 8: Security and Permissions
- Private Accounts
- Authentication
Step 9: Handling Edge Cases
- Edge Case 1: The “Unfollow” Event
- Edge Case 2: Viral Retweets
Step 10: Performance Optimizations
Real-World Implementations
- Twitter (X)
- Facebook News Feed
Common Interview Follow-Up Questions
Conclusion
References
YouTube Videos

Interview Framework: How to Approach This Problem

In a system design interview, when asked to design Twitter, here’s the structured approach you should follow:

Clarify requirements (5 minutes) - Identify if we are focusing on the Home Timeline, User Timeline, or Search.
State assumptions (2 minutes) - Define the scale (e.g., 300M DAU).
High-level design (10 minutes) - Sketch the API gateway, services, and storage.
Deep dive (20 minutes) - Focus on the Fanout Service (this is the heart of Twitter).
Scale and optimize (10 minutes) - Discuss caching strategies (Redis) and Hybrid Fanout.
Edge cases (3 minutes) - Celebrity accounts (Justin Bieber problem).

Key mindset: Don’t just design for “now”. Design for the read/write imbalance (reads >> writes).

Step 1: Clarifying Requirements

Questions to Ask the Interviewer

Q: Are we focusing on the Home Timeline or User Timeline?

Interviewer: Focus on the Home Timeline (aggregating tweets from people you follow).

Q: Does the timeline include retweets, replies, and media?

Interviewer: Yes, for now assume text and images.

Q: How fast should a new tweet appear in a follower’s feed?

Interviewer: Near real-time. Within 5 seconds ideally.

Q: Do we need to support editing tweets?

Interviewer: No, tweets are immutable for this exercise.

Functional Requirements

Post Tweet: User can post a tweet (text/image).
Home Timeline: User can view a scrollable list of tweets from followees.
Follow: User can follow/unfollow others.

Non-Functional Requirements

High Availability: The system should always be available (eventual consistency is acceptable for timelines).
Low Latency: A home timeline should be served within about 200 ms.
Read Heavy: Timeline reads vastly outnumber tweet writes (see the estimates in Step 2).

Step 2: Core Assumptions and Constraints

To design effectively, we need concrete numbers.

DAU (Daily Active Users): 300 Million.
Tweets per day: 500 Million (approx 6k tweets/sec).
Timeline Views: 20 Billion per day (approx 230k req/sec).
Read:Write Ratio: ~40:1 by request count (20B timeline views / 500M tweets).
Followers: Average user has ~200 followers. “Whales” (celebrities) have millions.
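The per-second figures above can be re-derived with a few lines of arithmetic. This sketch uses only the numbers listed. It also shows what the push model implies: with ~200 followers per author, every tweet turns into ~200 timeline inserts, so fanout writes outnumber tweets by about 200x.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day = 500_000_000
timeline_views_per_day = 20_000_000_000
avg_followers = 200  # average follower count from the list above

write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = timeline_views_per_day / SECONDS_PER_DAY
read_write_ratio = timeline_views_per_day / tweets_per_day
# With push fanout, each tweet becomes ~avg_followers timeline inserts.
fanout_inserts_per_sec = tweets_per_day * avg_followers / SECONDS_PER_DAY

print(f"tweets/sec         ~ {write_qps:,.0f}")               # ~ 5,787
print(f"timeline reads/sec ~ {read_qps:,.0f}")                # ~ 231,481
print(f"read:write         ~ {read_write_ratio:.0f}:1")       # ~ 40:1
print(f"fanout inserts/sec ~ {fanout_inserts_per_sec:,.0f}")  # ~ 1,157,407
```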

Constraint: The system must handle the “Celebrity Problem” (one tweet triggering millions of feed updates).

Step 3: High-Level Architecture

“Let me start with a high-level architecture. We’ll separate the Write Path (Posting) from the Read Path (Reading).”

System Flow Diagram

flowchart TD
    User[Client] --> LB[Load Balancer]
    LB --> API[API Gateway]

    API -- Post Tweet --> TweetSvc[Tweet Service]
    API -- Get Timeline --> FeedSvc[Timeline Service]
    API -- Follow --> GraphSvc[User Graph Service]

    TweetSvc --> Cache[(Tweet Cache)]
    TweetSvc --> DB[(Tweet DB)]
    TweetSvc --> Fanout[Fanout Service]

    Fanout --> GraphSvc
    Fanout --> RCache[(Redis Timeline Cache)]

    FeedSvc --> RCache
    FeedSvc --> Cache

Data Flow

Write Path: User posts a tweet via Tweet Service. It is saved to DB/Cache.
Fanout: Tweet Service triggers Fanout Service.
Fanout Logic: Fanout Service fetches followers from Graph Service and injects the Tweet ID into each follower’s “Home Timeline” list in Redis.
Read Path: User requests timeline. Timeline Service fetches the pre-computed list of Tweet IDs from Redis, then hydrates them with content from Tweet Cache.
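A minimal in-memory sketch of this flow, assuming plain Python dicts stand in for the Tweet DB/Cache, the User Graph Service, and the Redis timeline cache (all names here are hypothetical). Fanout runs inline for readability; in the design above it would run asynchronously.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the stores in the diagram.
tweet_store: dict[int, dict] = {}                          # Tweet DB / Tweet Cache
followers: dict[str, list[str]] = defaultdict(list)        # User Graph Service
home_timelines: dict[str, list[int]] = defaultdict(list)   # Redis timeline cache


def post_tweet(tweet_id: int, author: str, text: str) -> None:
    # Write path: persist the tweet, then fan its ID out to followers.
    tweet_store[tweet_id] = {"id": tweet_id, "author": author, "text": text}
    fanout(tweet_id, author)


def fanout(tweet_id: int, author: str) -> None:
    # Inline here; a real Fanout Service would consume this work from a queue.
    for follower in followers[author]:
        home_timelines[follower].insert(0, tweet_id)  # newest ID first


def get_home_timeline(user: str, limit: int = 20) -> list[dict]:
    # Read path: fetch the precomputed ID list, then hydrate from the tweet store.
    ids = home_timelines[user][:limit]
    return [tweet_store[i] for i in ids if i in tweet_store]


followers["alice"] = ["bob", "carol"]
post_tweet(1, "alice", "hello")
post_tweet(2, "alice", "second")
print([t["text"] for t in get_home_timeline("bob")])  # ['second', 'hello']
```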

Why This Architecture?

Decoupled Write/Read: Writing happens asynchronously (fanout). Reading is just fetching a pre-built list (O(1)).
Key Insight: We optimize for the read path because it runs far more often than the write path.

Step 4: The Hardest Problem - Timeline Fanout

The core challenge of Twitter is distribution. When User A tweets, the tweet must appear in the home timelines of all of A’s followers: about 200 on average, but millions for celebrities.

The Complexity

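If User A has 1 million followers, a single write triggers 1 million timeline updates. Doing this synchronously would make the write API time out. The alternative, querying the DB on every load (SELECT * FROM tweets WHERE user_id IN (following_list)), turns each timeline read into a scatter-gather across hundreds of users’ tweets, which the DB cannot sustain at ~230k reads/sec.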

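For typical users, this points to Fanout-on-Write (Push Model), performed asynchronously. Step 5 covers why celebrities need different treatment.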

Step 5: Key Technical Decision - Push vs Pull vs Hybrid

“This is the most critical decision in the interview. Let’s compare approaches.”

Approach 1: Pull Model (Fanout-on-Read)

How: Store tweets. On read, query DB for all followees, merge, sort, return.
Pros: Simple writes. No “celebrity problem”.
Cons: Terrible read latency. The DB struggles with expensive multi-user queries (joins) on every read.
Verdict: ❌ Too slow for 300M DAU.
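To make the read-time cost concrete, here is a minimal sketch of fanout-on-read, assuming each author’s tweet IDs are already sorted newest-first (time-sortable IDs make this possible). Every request does a k-way merge over all followees, so the work grows with the number of accounts the reader follows. The data and function names are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical per-author tweet ID lists, each sorted newest-first.
user_tweets = {
    "alice": [105, 101],
    "bob": [108, 103, 100],
    "carol": [107],
}
following = {"dave": ["alice", "bob", "carol"]}


def pull_timeline(user: str, limit: int = 3) -> list[int]:
    # Fanout-on-read: gather every followee's list at request time and merge.
    streams = [user_tweets.get(f, []) for f in following.get(user, [])]
    merged = heapq.merge(*streams, reverse=True)  # inputs are in descending order
    return list(islice(merged, limit))


print(pull_timeline("dave"))  # [108, 107, 105]
```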

Approach 2: Push Model (Fanout-on-Write)

How: Maintain a pre-computed “Timeline List” for every user in Redis. When X tweets, append the Tweet ID to the lists of all X’s followers.
Pros: Lightning fast reads (O(1)).
Cons: “Justin Bieber Problem”. Writing to 100M timeline lists takes too long.
Verdict: ❌ Fails for celebrities.

Approach 3: Hybrid Approach (The Win!)

How:
1. Normal Users: Use Push Model.
2. Celebrities/Whales: Use Pull Model.
Logic: When a normal user tweets, push to all followers. When Bieber tweets, don’t push.
Read Time: When I load my feed, the system fetches my pre-computed list (pushed tweets) AND queries the tweets of celebrities I follow (pull), then merges them.
Verdict: ✅ Best of both worlds.
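The hybrid logic can be sketched in a few lines. This assumes a hypothetical follower-count threshold for deciding who counts as a celebrity, and in-memory dicts in place of Redis and the Tweet DB. Normal authors are pushed on write; celebrity tweets are pulled and merged at read time.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 1_000_000  # hypothetical cut-off for "whale" accounts

follower_counts = {"bieber": 100_000_000, "alice": 200, "bob": 150}
followers = {"alice": ["dave"], "bob": ["dave"], "bieber": ["dave"]}
following = {"dave": ["alice", "bob", "bieber"]}
author_tweets = {a: [] for a in follower_counts}  # per-author IDs, newest first
home_timelines = {"dave": []}                     # pushed IDs, newest first


def is_celebrity(user: str) -> bool:
    return follower_counts.get(user, 0) >= CELEBRITY_THRESHOLD


def post_tweet(author: str, tweet_id: int) -> None:
    author_tweets[author].insert(0, tweet_id)
    if is_celebrity(author):
        return  # Pull side: celebrity tweets are not fanned out on write.
    for f in followers.get(author, []):  # Push side
        home_timelines.setdefault(f, []).insert(0, tweet_id)


def read_timeline(user: str, limit: int = 10) -> list[int]:
    pushed = home_timelines.get(user, [])
    pulled = [author_tweets[c] for c in following.get(user, []) if is_celebrity(c)]
    # IDs are time-sortable, so a descending merge yields a chronological feed.
    return list(islice(heapq.merge(pushed, *pulled, reverse=True), limit))


post_tweet("alice", 1)
post_tweet("bieber", 2)
post_tweet("bob", 3)
print(home_timelines["dave"])  # [3, 1]  (the celebrity tweet was never pushed)
print(read_timeline("dave"))   # [3, 2, 1]
```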

Step 6: Database Design and Storage

Data Classification

User Data/Auth: SQL (MySQL/PostgreSQL) - Strong consistency needed.
Social Graph (Follows): SQL or Graph DB (Neo4j/TAO). TAO (Facebook’s creation) is ideal here.
Tweets: NoSQL (Cassandra/DynamoDB). High write throughput, simple KV access.
Timelines: Redis. List of Tweet IDs.

Schema Design (Simplified)

Tweet Table (Cassandra)

timestamp | tweet_id | user_id | content | media_url
PK: (user_id, timestamp)  -- Clustered by time for fast retrieval

Timeline Cache (Redis)

Key: user_id_timeline
Value: List[tweet_id_1, tweet_id_2, tweet_id_3...]

Step 7: Scaling the System

Bottleneck 1: Redis Memory

Problem: Storing timelines for 300M users is expensive.
Solution: Only store the last 800 tweet IDs. Older tweets are fetched via DB fallback (Pull) if the user scrolls that far.
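A minimal sketch of a capped timeline with a storage fallback. A `deque` with `maxlen` stands in for a Redis list kept short with LPUSH + LTRIM; this is illustrative, not a Redis client. The `db_fallback` callable is a hypothetical hook that returns older IDs from storage when the reader scrolls past the cached window.

```python
from collections import deque

MAX_CACHED = 800  # per the text: keep only the newest 800 tweet IDs per user


class CappedTimeline:
    """Per-user timeline cache sketch; newest ID is kept at the left."""

    def __init__(self, maxlen: int = MAX_CACHED) -> None:
        self.ids = deque(maxlen=maxlen)

    def push(self, tweet_id: int) -> None:
        # When full, appendleft drops the oldest ID from the right end.
        self.ids.appendleft(tweet_id)

    def page(self, limit, max_id=None, db_fallback=None):
        # Serve IDs older than max_id from the cache first.
        cached = [i for i in self.ids if max_id is None or i < max_id][:limit]
        missing = limit - len(cached)
        if missing and db_fallback is not None:
            # Scrolled past the cached window: pull older IDs from storage.
            cursor = cached[-1] if cached else max_id
            cached += db_fallback(cursor, missing)
        return cached


all_ids = list(range(1, 11))  # IDs 1..10 exist in the (hypothetical) Tweet DB
tl = CappedTimeline(maxlen=4)
for tid in all_ids:
    tl.push(tid)              # the cache now holds only 10, 9, 8, 7


def fake_db(before_id, n):
    older = [i for i in reversed(all_ids) if before_id is None or i < before_id]
    return older[:n]


print(tl.page(3))                                 # [10, 9, 8]
print(tl.page(3, max_id=8, db_fallback=fake_db))  # [7, 6, 5]
```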

Bottleneck 2: Tweet ID Generation

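Problem: We need globally unique, time-sortable IDs that many servers can generate independently, without a central DB auto-increment becoming a bottleneck.
Solution: Snowflake ID (Twitter’s own invention).
- 64-bit integer: 1 unused sign bit, 41 bits timestamp (milliseconds since a custom epoch), 10 bits machine ID, 12 bits sequence (up to 4,096 IDs per millisecond per machine).
- Sorting by ID is therefore roughly equivalent to sorting by creation time (exact within one machine; across machines, only as accurate as their clock synchronization).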
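A minimal, single-process sketch of a Snowflake-style generator using the bit layout above. The custom epoch (2020-01-01 UTC) and the class and function names are illustrative assumptions, not Twitter’s actual code; a real deployment also needs a scheme for assigning unique machine IDs.

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # hypothetical epoch: 2020-01-01T00:00:00Z
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class SnowflakeGenerator:
    """Layout: [1 sign bit = 0][41 bits ms timestamp][10 bits machine][12 bits sequence]."""

    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000 - CUSTOM_EPOCH_MS

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to emit IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4,096 sequence values used this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                (now << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )


def decode(snowflake_id: int) -> tuple[int, int, int]:
    """Split an ID back into (ms since epoch, machine ID, sequence)."""
    return (
        snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS),
        (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID,
        snowflake_id & MAX_SEQUENCE,
    )


gen = SnowflakeGenerator(machine_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids), decode(ids[0])[1])  # True 7
```

Because the timestamp occupies the high bits, comparing two IDs compares their creation times first, which is what lets timelines sort by ID alone.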

Capacity Planning

Storage: 500M tweets * 200 bytes = 100 GB/day ≈ 36.5 TB/year (before replication). Manageable.
Media: Stored in S3/CDN. Only metadata in DB.
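The storage estimate, reproduced as arithmetic. The replication factor of 3 is a hypothetical assumption (a common choice for Cassandra-style stores), included only to show how quickly the raw figure grows once copies are counted.

```python
tweets_per_day = 500_000_000
bytes_per_tweet = 200        # text + metadata, per the estimate above
REPLICATION_FACTOR = 3       # hypothetical; not stated in the text

daily_bytes = tweets_per_day * bytes_per_tweet
yearly_bytes = daily_bytes * 365
replicated_yearly_bytes = yearly_bytes * REPLICATION_FACTOR

print(daily_bytes / 10**9, "GB/day")                      # 100.0 GB/day
print(yearly_bytes / 10**12, "TB/year")                   # 36.5 TB/year
print(replicated_yearly_bytes / 10**12, "TB/year (x3)")   # 109.5 TB/year (x3)
```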

Step 8: Security and Permissions

“We need to ensure users only see content allowed.”

Private Accounts

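If the author’s account is private, the Fanout Service must check is_approved_follower and push the tweet only into approved followers’ timelines. Private tweets must never be written to shared or public caches without strict ACLs, and the Timeline Service should re-check visibility at read time in case approval was revoked after fanout.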
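A sketch of the fanout-side check, assuming the User Graph Service can answer “is this account private?” and “which followers are approved?”. The dicts and the `fanout_targets` name are hypothetical stand-ins.

```python
def fanout_targets(author, followers, private_accounts, approved_followers):
    """Return the followers whose home timelines may receive this author's tweets.

    All arguments are hypothetical in-memory stand-ins for the User Graph Service.
    """
    candidates = followers.get(author, [])
    if author not in private_accounts:
        return list(candidates)
    # Private author: only approved followers are eligible for the push.
    approved = approved_followers.get(author, set())
    return [f for f in candidates if f in approved]


followers = {"eve": ["ann", "ben", "cat"], "sam": ["ann", "ben"]}
private_accounts = {"eve"}
approved_followers = {"eve": {"ann", "cat"}}

print(fanout_targets("eve", followers, private_accounts, approved_followers))  # ['ann', 'cat']
print(fanout_targets("sam", followers, private_accounts, approved_followers))  # ['ann', 'ben']
```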

Authentication

Start with OAuth 2.0 access tokens (for example, JWTs) for API authentication, validated at the API Gateway.

Step 9: Handling Edge Cases

Edge Case 1: The “Unfollow” Event

Scenario: User A unfollows User B. Handling: Asynchronously remove B’s tweets from A’s Redis list. This is expensive, so we might just filter them out at read time until the cached entries expire.
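The read-time filter described above can be as simple as checking each cached tweet’s author against the viewer’s current follow set. This sketch assumes hypothetical lookups for tweet authors and follow relationships, and keeps the viewer’s own tweets.

```python
def visible_timeline(viewer, timeline_ids, tweet_author, following):
    """Read-time filter: keep tweets from authors the viewer still follows,
    plus the viewer's own tweets. All inputs are hypothetical lookups."""
    still_following = following.get(viewer, set())
    return [
        tid for tid in timeline_ids
        if tweet_author[tid] == viewer or tweet_author[tid] in still_following
    ]


following = {"a": {"c"}}                          # A has just unfollowed B
tweet_author = {5: "b", 4: "c", 3: "b", 2: "a"}
cached = [5, 4, 3, 2]                             # A's stale list still holds B's tweets
print(visible_timeline("a", cached, tweet_author, following))  # [4, 2]
```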

Edge Case 2: Viral Retweets

Scenario: A tweet gets 1M retweets in minutes. Handling: Cache the tweet object heavily (CDN + Local Cache) to prevent hot partitions in the Tweet DB.
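A sketch of the local-cache layer, using Python’s `functools.lru_cache` as a stand-in for an in-process cache in the hydration service. `read_tweet_from_db` is a hypothetical stand-in for the Tweet DB. Since tweets are immutable in this exercise, cached entries never go stale on edits, but deletions would still need explicit invalidation (here, `get_tweet.cache_clear()` is the only built-in lever, and there is no TTL).

```python
from functools import lru_cache

db_reads = 0


def read_tweet_from_db(tweet_id: int) -> dict:
    # Hypothetical stand-in for the Tweet DB; counts how often it is hit.
    global db_reads
    db_reads += 1
    return {"id": tweet_id, "text": f"tweet {tweet_id}"}


@lru_cache(maxsize=10_000)
def get_tweet(tweet_id: int) -> dict:
    # In-process cache: repeated hydrations of a hot tweet skip the DB.
    return read_tweet_from_db(tweet_id)


for _ in range(100_000):   # one viral tweet hydrated for many timelines
    get_tweet(42)

print(db_reads, get_tweet.cache_info().hits)  # 1 99999
```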

Step 10: Performance Optimizations

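Pagination: Use max_id (cursor-based pagination) instead of offset. Offset pagination slows down as the offset grows (the DB still walks past the skipped rows) and can repeat or skip tweets when new ones arrive between page loads.
CDN: Cache images and videos at the edge (Cloudflare/Akamai).
Compression: Compress tweet payloads before storage (tweets are short, but metadata adds up).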
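A minimal sketch of max_id-style paging over a time-sorted ID list, with one assumption about semantics: here the cursor is exclusive (return IDs strictly older than `max_id`). The function name is hypothetical. The demo shows why a cursor is stable when a new tweet arrives, while an offset repeats an item.

```python
from bisect import bisect_left


def page_by_cursor(ids_asc, limit, max_id=None):
    """ids_asc: tweet IDs sorted oldest -> newest (time-sortable IDs).

    Returns up to `limit` IDs strictly older than max_id, newest first,
    plus the cursor for the next page (None when there is nothing older).
    """
    end = len(ids_asc) if max_id is None else bisect_left(ids_asc, max_id)
    start = max(0, end - limit)
    page = ids_asc[start:end][::-1]
    next_cursor = page[-1] if start > 0 else None
    return page, next_cursor


ids = [10, 20, 30, 40, 50]
page1, cursor = page_by_cursor(ids, limit=2)
print(page1, cursor)   # [50, 40] 40
ids.append(60)         # a new tweet arrives between page loads
page2, cursor = page_by_cursor(ids, limit=2, max_id=cursor)
print(page2, cursor)   # [30, 20] 20
offset_page2 = ids[::-1][2:4]  # the same request done with offset=2
print(offset_page2)    # [40, 30]  (40 is shown twice)
```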

Real-World Implementations

Twitter (X)

Originally Ruby on Rails, moved to Scala (Finagle).
Uses Fanout-on-Write for most.
Invented Snowflake for IDs.
Uses Manhattan (internal NoSQL) and Redis clusters.

Facebook News Feed

Uses a pull-heavy variant of the hybrid model (mostly fanout-on-read, combined with ranking).
TAO (The Associations and Objects) for social graph data.

Common Interview Follow-Up Questions

Q: How do you handle searching tweets?

Answer: “Search should be a separate read-optimized system:

Write tweets to primary Tweet DB first.
Publish async indexing events to a search pipeline.
Index in Elasticsearch/OpenSearch with language analyzers and hashtag/mention fields.
Accept a short indexing delay (eventual consistency) to keep write latency low.

Trade-off: Slight search lag is acceptable; coupling search in the write path is not.”

Q: How to implement ‘Trends’?

Answer: “Use stream processing over engagement events:

Ingest likes, retweets, replies, and hashtag usage into Kafka.
Aggregate scores in sliding windows (for example 5, 15, 60 minutes).
Normalize by region and baseline volume to avoid always-on popular tags.
Apply anti-spam filters before publishing trend lists.

This catches fast-moving topics while limiting manipulation.”
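The windowing and normalization steps can be illustrated with a small in-memory counter. This is a sketch only: the answer above describes doing this in a stream processor fed by Kafka, and the class name, window length, and baseline values here are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts hashtag events seen in the last `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.events = deque()   # (timestamp, tag), oldest first
        self.counts = Counter()

    def add(self, ts: float, tag: str) -> None:
        self.events.append((ts, tag))
        self.counts[tag] += 1
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop events that have slid out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, now: float, n: int = 3, baseline=None) -> list[str]:
        self._evict(now)
        baseline = baseline or {}
        # Divide by a per-tag baseline so always-popular tags do not dominate.
        scored = {t: c / baseline.get(t, 1.0) for t, c in self.counts.items()}
        return sorted(scored, key=scored.get, reverse=True)[:n]


w = SlidingWindowCounter(window_s=300)  # e.g. a 5-minute window
for _ in range(3):
    w.add(0, "#a")
w.add(200, "#b")
w.add(200, "#b")
w.add(400, "#c")                        # the "#a" events at t=0 have now expired
print(w.top(now=400))                          # ['#b', '#c']
print(w.top(now=400, baseline={"#b": 10.0}))   # ['#c', '#b']
```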

Q: How do you handle celebrity accounts with massive fan-out?

Answer: “I use hybrid fan-out:

Normal users: fan-out-on-write for fast timeline reads.
Celebrity users: fan-out-on-read to avoid pushing to millions of inboxes instantly.
Merge both sources in timeline service with ranking.
Cache merged timelines for short TTL to reduce recomputation.

This controls write amplification without hurting read latency for most users.”

Q: How do you remove deleted tweets from cached home timelines?

Answer: “I treat delete as a high-priority invalidation event:

Emit tombstone event when tweet is deleted or moderated.
Remove tweet ID from timeline cache indexes.
Enforce final read-time filter against authoritative tweet status.
Run periodic cache repair jobs for missed invalidations.

This prevents stale or policy-violating content from lingering in feeds.”

Q: How would you support “Following” (chronological) and “For You” (ranked) feeds?

Answer: “I’d split the serving paths but reuse shared storage:

Following feed: timeline merge sorted by timestamp.
For You feed: candidate generation + ML ranking + business rules.
Share safety filters, user graph, and tweet metadata between both paths.
Expose separate caching policies per tab.

Trade-off: Two pipelines increase complexity, but product flexibility and experimentation improve significantly.”

Conclusion

Designing Twitter is about managing distribution. The Hybrid Fanout pattern is the gold standard answer. By mixing Push for normal users and Pull for celebrities, we balance write load and read latency.

Key Takeaways:

Read-heavy system (optimize for reads).
Fanout-on-write is great, but fails for celebrities.
Hybrid approach solves the scale issue.
Snowflake IDs are crucial for sorting.

References

YouTube Videos

“How to Answer System Design Interview Questions” - Exponent [https://www.youtube.com/watch?v=NtMvNh0WFVM]
“Design Twitter - System Design Interview” - Gaurav Sen [https://www.youtube.com/watch?v=wYk0xPP_P_8]