Ad Serving System Design: Real-Time Bidding, Auctions & Scale
Ad serving is one of the most demanding distributed systems in production today. Google processes over 100 billion ad impressions per day. Each impression triggers an auction that must complete in under 100 milliseconds — including network round-trips to dozens of bidders. Designing a system that operates at this scale while maximizing revenue, maintaining fairness, and detecting fraud is a masterclass in system design trade-offs.
The Ad Serving Pipeline#
An ad serving system processes every page load through a well-defined pipeline:
- Ad Request — The user's browser or app sends a request containing context (page URL, device, geo, user ID).
- User Enrichment — The ad server enriches the request with user profile data, cookie syncs, and audience segments.
- Candidate Selection — Eligible campaigns and creatives are filtered based on targeting rules, budgets, and frequency caps.
- Auction — Eligible bids are scored and ranked. The winner is selected.
- Ad Rendering — The winning creative is returned and rendered in the user's viewport.
- Event Tracking — Impressions, clicks, and conversions are tracked asynchronously.
The entire pipeline from request to response must complete in under 100ms to avoid degrading page load times and user experience.
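The pipeline above can be sketched as a sequence of stages sharing one deadline. This is a minimal illustration, not a production design — the stage functions, the `house_ad` fallback, and the 100ms constant are all assumptions for the sketch:

```python
# Sketch: run pipeline stages in order under a shared 100ms deadline.
# If the deadline passes mid-pipeline, serve a fallback (house) ad instead.
import time

DEADLINE_MS = 100

def serve_ad(request, stages):
    """Run each stage on the request; abort with a fallback on deadline."""
    start = time.monotonic()
    result = request
    for stage in stages:
        if (time.monotonic() - start) * 1000 > DEADLINE_MS:
            return {"ad": "house_ad"}   # fallback creative on timeout
        result = stage(result)
    return result
```

In a real server, each stage (enrichment, candidate selection, auction) would also get its own sub-budget rather than only a shared deadline.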
Real-Time Bidding (RTB)#
RTB is the protocol that enables programmatic ad buying. When a publisher has an ad slot to fill, it sends a bid request to an ad exchange, which fans it out to multiple demand-side platforms (DSPs).
OpenRTB Protocol#
The IAB's OpenRTB specification defines the bid request/response format:
- Bid Request: Contains impression details (size, position, floor price), site/app info, device info, and user data.
- Bid Response: Contains the bid price, creative markup, advertiser domain, and deal ID (if applicable).
- Timeout: Exchanges enforce strict timeouts — typically 80-120ms. Late responses are discarded.
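To make the request/response shapes concrete, here is a minimal OpenRTB 2.x-style exchange, written as Python dicts. Field names (`imp`, `bidfloor`, `tmax`, `seatbid`, `adm`, `adomain`) follow the IAB spec; all values are illustrative:

```python
# Minimal OpenRTB 2.x-style bid request and response (illustrative values).
bid_request = {
    "id": "req-123",                     # unique request ID
    "tmax": 100,                         # max time in ms for the exchange to respond
    "imp": [{                            # the impression being auctioned
        "id": "1",
        "banner": {"w": 300, "h": 250},  # slot size
        "bidfloor": 0.50,                # floor price (CPM)
    }],
    "site": {"domain": "news.example.com", "page": "https://news.example.com/article"},
    "device": {"ua": "Mozilla/5.0 ...", "ip": "203.0.113.7"},
    "user": {"id": "u-abc"},
}

bid_response = {
    "id": "req-123",
    "seatbid": [{"bid": [{
        "impid": "1",                    # which impression this bid is for
        "price": 2.75,                   # bid price (CPM)
        "adm": "<div>...creative...</div>",
        "adomain": ["advertiser.example"],
    }]}],
}
```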
Architecture#
```
Publisher → SSP → Ad Exchange → DSP₁, DSP₂, ... DSPₙ
                              ← bid responses
                  winner selected
          ← creative returned
```
The exchange fans out requests to 20-50 DSPs simultaneously. Each DSP must evaluate the impression against thousands of active campaigns and return a bid — all within the timeout window.
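The fan-out-with-timeout pattern can be sketched with `asyncio`. The DSP call here is simulated with a sleep; in production it would be an HTTP/2 request over a pooled connection:

```python
# Sketch: fan out bid requests to all DSPs in parallel with a hard timeout.
# Responses that miss the window are cancelled and discarded.
import asyncio

TIMEOUT_S = 0.08   # 80ms bid window (assumed)

async def call_dsp(name, delay, price):
    await asyncio.sleep(delay)            # stand-in for the HTTP round-trip
    return {"dsp": name, "price": price}

async def fan_out(dsps):
    tasks = [asyncio.create_task(call_dsp(*d)) for d in dsps]
    done, pending = await asyncio.wait(tasks, timeout=TIMEOUT_S)
    for t in pending:                     # cancel DSPs that missed the window
        t.cancel()
    return [t.result() for t in done]

bids = asyncio.run(fan_out([("dsp1", 0.01, 2.5), ("dsp2", 0.02, 3.1), ("slow", 0.5, 9.9)]))
```

The key property is that one slow DSP cannot stall the auction: the exchange proceeds with whatever bids arrived in time.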
Ad Auction Mechanics#
Second-Price Auction (Vickrey)#
Historically, ad exchanges used second-price auctions: the highest bidder wins but pays the second-highest bid plus one cent. This encourages truthful bidding because bidders maximize utility by bidding their true valuation.
Example: Bidder A bids $5.00, Bidder B bids $3.00. Bidder A wins and pays $3.01.
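The example above as a few lines of code — a toy second-price auction over in-memory bids:

```python
# Sketch: second-price auction — highest bid wins, pays second-highest + $0.01.
def run_second_price(bids):
    ranked = sorted(bids, key=lambda b: b["price"], reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner["bidder"], round(runner_up["price"] + 0.01, 2)

winner, clearing_price = run_second_price([
    {"bidder": "A", "price": 5.00},
    {"bidder": "B", "price": 3.00},
])
# winner == "A", clearing_price == 3.01
```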
First-Price Auction#
Most major exchanges have shifted to first-price auctions where the winner pays exactly what they bid. This simplifies the auction but introduces bid shading — DSPs deliberately bid below their true valuation to avoid overpaying.
Bid Shading Algorithms#
DSPs use ML models to predict the optimal bid:
- Win-rate models: Predict the probability of winning at each price point.
- Landscape models: Estimate the distribution of competing bids.
- Optimal bid: Choose the bid that maximizes P(win) × (value − bid).
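A bid-shading sketch under a strong simplifying assumption: the win-rate model is a logistic curve in the bid price (real DSPs fit these curves per segment from auction logs). We grid-search the bid that maximizes expected surplus:

```python
# Sketch: find the bid maximizing P(win) * (value - bid)
# under an assumed logistic win-rate model.
import math

def win_prob(bid, mid=2.0, slope=3.0):
    """Assumed win-rate curve: logistic in the bid price."""
    return 1.0 / (1.0 + math.exp(-slope * (bid - mid)))

def optimal_bid(value, step=0.01):
    best_bid, best_surplus = 0.0, 0.0
    b = 0.0
    while b <= value:
        surplus = win_prob(b) * (value - b)
        if surplus > best_surplus:
            best_bid, best_surplus = b, surplus
        b += step
    return round(best_bid, 2)

shaded = optimal_bid(4.00)   # strictly below the true valuation of $4.00
```

The result is the shaded bid: below the true valuation, because bidding the full value would win more often but capture zero surplus.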
Targeting Strategies#
Contextual Targeting#
Match ads to page content without relying on user data:
- Keyword extraction from page text using NLP.
- Topic classification using pre-trained models (IAB content taxonomy).
- Sentiment analysis to avoid brand-unsafe placements.
- URL categorization for broad content matching.
Behavioral Targeting#
Build user profiles from browsing history and interaction data:
- Interest segments: Users who visited travel sites in the last 30 days.
- In-market segments: Users actively researching a purchase category.
- Retargeting: Users who visited the advertiser's site but didn't convert.
- Lookalike audiences: Users similar to an advertiser's best customers.
First-Party Data Activation#
With cookie deprecation, first-party data is increasingly important:
- Publisher data: Logged-in user attributes, subscription tier, content preferences.
- Advertiser data: CRM segments uploaded via clean rooms.
- Identity resolution: Probabilistic and deterministic matching across devices.
Ad Ranking & Scoring#
The ad server doesn't just pick the highest bid. It optimizes for expected revenue using a scoring function:
Score = Bid × P(click) × P(conversion | click) × Quality Factor
- P(click): Click-through rate prediction using features like ad position, creative size, user history, and context.
- P(conversion): Post-click conversion probability estimated from historical data.
- Quality Factor: Penalizes low-quality or irrelevant ads to preserve user experience.
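The scoring function above, applied to rank candidates — a minimal sketch with illustrative numbers showing why the highest bid does not always win:

```python
# Sketch: rank candidate ads by expected value, not raw bid.
def score(ad):
    return ad["bid"] * ad["p_click"] * ad["p_conv"] * ad["quality"]

def rank_ads(candidates):
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"id": "a", "bid": 5.0, "p_click": 0.01, "p_conv": 0.05, "quality": 1.0},
    {"id": "b", "bid": 2.0, "p_click": 0.05, "p_conv": 0.05, "quality": 1.0},
]
winner = rank_ads(candidates)[0]   # "b" wins: lower bid, much higher CTR
```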
ML Models for CTR Prediction#
Production systems use deep learning models trained on billions of examples:
- Feature engineering: User embeddings, ad embeddings, cross features, sequential behavior.
- Model architecture: Wide & Deep, DeepFM, or transformer-based models.
- Serving latency: Models must return predictions in under 10ms. Techniques include model distillation, quantization, and feature caching.
Click Tracking & Conversion Attribution#
Click Tracking#
When a user clicks an ad, the request passes through a click tracker before redirecting to the landing page:
- User clicks ad → request hits `click.adserver.com/click?id=xyz`.
- Click server logs the event (impression ID, timestamp, user agent, IP).
- Server responds with a 302 redirect to the advertiser's landing page.
- Total added latency must be under 50ms.
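The steps above as a sketch of the click handler's core logic. The in-memory log list stands in for an async event pipeline (e.g. a Kafka producer), and `lookup_landing_page` is a hypothetical cache lookup:

```python
# Sketch: click-tracker core — parse the click URL, log the event,
# return a 302 redirect to the advertiser's landing page.
import time
from urllib.parse import urlparse, parse_qs

CLICK_LOG = []   # stand-in for an async event pipeline (e.g. Kafka)

def lookup_landing_page(click_id):
    """Hypothetical lookup of the landing page keyed by click ID."""
    return f"https://advertiser.example/landing?clk={click_id}"

def handle_click(url, user_agent, ip):
    qs = parse_qs(urlparse(url).query)
    click_id = qs["id"][0]
    CLICK_LOG.append({"id": click_id, "ts": time.time(), "ua": user_agent, "ip": ip})
    return 302, lookup_landing_page(click_id)
```

Logging is append-only and the redirect is computed from a cache, which is how the handler stays well under the 50ms budget.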
Conversion Attribution#
Attribution connects ad impressions/clicks to downstream conversions:
- Last-click attribution: Credit goes to the last ad clicked before conversion.
- View-through attribution: Credit impressions that were viewed but not clicked, within a lookback window (typically 24 hours).
- Multi-touch attribution (MTA): Distributes credit across multiple touchpoints using Shapley values or data-driven models.
- Incrementality testing: A/B tests that measure the causal lift of ad exposure.
Attribution Pipeline#
```
Events (clicks, impressions) → Stream Processing (Kafka/Flink)
  → Join with Conversion Events → Attribution Model
  → Reporting Database → Advertiser Dashboard
```
Conversion events arrive asynchronously — sometimes days after the impression. The system must efficiently join events across large time windows.
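The core of the join is simple to state even though it is hard to run at scale. A batch sketch of last-click attribution with a 24-hour lookback window (field names assumed; a streaming system would do this incrementally with windowed state in Flink):

```python
# Sketch: last-click attribution — credit each conversion to the most
# recent click from the same user within a 24h lookback window.
LOOKBACK_S = 24 * 3600

def attribute(clicks, conversions):
    """clicks/conversions: dicts with 'user', 'ts' (epoch seconds), 'id'."""
    credited = []
    for conv in conversions:
        eligible = [c for c in clicks
                    if c["user"] == conv["user"]
                    and 0 <= conv["ts"] - c["ts"] <= LOOKBACK_S]
        if eligible:
            last = max(eligible, key=lambda c: c["ts"])
            credited.append({"conversion": conv["id"], "click": last["id"]})
    return credited
```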
Fraud Detection#
Ad fraud costs the industry $80+ billion annually. Detection operates at multiple layers:
Pre-Bid Fraud Detection#
Filter fraudulent traffic before the auction:
- IP reputation: Block known data center IPs, proxy networks, and VPNs.
- Device fingerprinting: Detect emulators, headless browsers, and spoofed user agents.
- Traffic pattern analysis: Flag abnormal request rates, geographic impossibilities, and bot signatures.
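The three pre-bid checks above are deliberately cheap, since they run on every request before the auction. A sketch with assumed blocklists and thresholds:

```python
# Sketch: pre-bid traffic filter — cheap checks before the auction.
DATACENTER_IPS = {"192.0.2.10"}   # assumed blocklist, e.g. from an IP reputation feed
BOT_UA_MARKERS = ("headlesschrome", "phantomjs", "python-requests")

def is_valid_traffic(request):
    if request["ip"] in DATACENTER_IPS:
        return False                      # known data-center / proxy IP
    ua = request["ua"].lower()
    if any(marker in ua for marker in BOT_UA_MARKERS):
        return False                      # bot-like user agent
    if request.get("qps_from_ip", 0) > 50:
        return False                      # abnormal request rate from one IP
    return True
```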
Post-Bid Fraud Detection#
Analyze served impressions for fraud signals:
- Invalid traffic (IVT): Sophisticated bots that mimic human behavior — detected through mouse movement analysis, viewport verification, and JavaScript challenges.
- Domain spoofing: Verify that the ad was served on the declared domain using `ads.txt` and `sellers.json`.
- Click fraud: Detect click farms through clustering analysis on IP, device, and timing patterns.
- Conversion fraud: Identify fake installs and conversions using survival analysis and behavioral scoring.
ML-Based Detection#
- Anomaly detection: Autoencoders trained on legitimate traffic to flag outliers.
- Supervised classification: Gradient-boosted models trained on labeled fraud datasets.
- Graph analysis: Detect fraud rings by analyzing relationships between IPs, devices, and publishers.
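One common graph technique is clustering devices that share IPs and flagging abnormally large clusters for review. A union-find sketch of that idea (thresholds and naming are assumptions):

```python
# Sketch: cluster devices that share IPs using union-find; unusually large
# clusters are fraud-ring candidates for downstream review.
from collections import defaultdict

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def cluster_devices(edges):
    """edges: (device, ip) pairs; devices sharing any IP join one cluster."""
    parent = {}
    def get(x):
        parent.setdefault(x, x)
        return find(parent, x)
    for device, ip in edges:
        ra, rb = get(device), get("ip:" + ip)
        parent[ra] = rb                  # union device with the IP's cluster
    clusters = defaultdict(set)
    for node in parent:
        if not node.startswith("ip:"):
            clusters[find(parent, node)].add(node)
    return list(clusters.values())
```

Real systems extend the graph with more edge types (payment instruments, cookies, publishers) and score clusters rather than hard-flagging them.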
Latency Requirements & Optimization#
The 100ms end-to-end budget is the defining constraint:
| Component | Budget |
|---|---|
| Network (publisher → exchange) | 10-20ms |
| User enrichment & lookup | 5-10ms |
| Bid request fan-out to DSPs | 50-80ms |
| Auction logic | 1-2ms |
| Response serialization | 1-2ms |
Optimization Techniques#
- Edge serving: Deploy ad servers in multiple PoPs close to users.
- Connection pooling: Maintain persistent HTTP/2 connections to DSPs.
- Parallel fan-out: Send bid requests to all DSPs simultaneously.
- Predictive pre-fetching: Pre-compute user profiles and cache them in Redis.
- Tiered timeouts: Fast-path for direct deals; full auction only when needed.
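The tiered-timeout idea can be illustrated as a fast path: if a guaranteed direct deal matches the slot, serve it immediately and skip the RTB fan-out entirely. All names here are assumptions for the sketch:

```python
# Sketch: fast-path direct deals before falling back to the full auction.
def select_ad(request, direct_deals, run_auction):
    for deal in direct_deals:
        if deal["slot"] == request["slot"] and deal["remaining_impressions"] > 0:
            deal["remaining_impressions"] -= 1
            return {"source": "direct", "creative": deal["creative"]}
    return {"source": "auction", **run_auction(request)}
```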
Scale Considerations#
A production ad serving system handles:
- Millions of QPS across the exchange.
- Petabytes of event data per day for tracking and attribution.
- Billions of user profiles updated in near real-time.
- Thousands of ML model updates per day for CTR prediction and fraud detection.
Data Infrastructure#
- Event streaming: Kafka for real-time event ingestion.
- Stream processing: Apache Flink for real-time aggregations, attribution joins, and fraud scoring.
- Batch processing: Spark for daily reporting, model training data preparation, and reconciliation.
- Storage: ClickHouse or Druid for OLAP queries on event data; Redis for low-latency profile lookups.
Key Design Decisions#
- First-price vs. second-price auction — First-price is simpler but requires bid shading; second-price encourages truthful bidding.
- Server-side vs. header bidding — Header bidding gives publishers more control but increases page latency.
- Cookie-based vs. cookieless targeting — Privacy regulations are pushing the industry toward contextual and first-party solutions.
- Real-time vs. batch attribution — Real-time enables faster optimization but is more complex to build reliably.
- Centralized vs. distributed auction — Centralized is simpler; distributed reduces latency but complicates consistency.
Wrapping Up#
Ad serving system design sits at the intersection of distributed systems, machine learning, and economics. The strict latency requirements, massive scale, and adversarial environment (fraud) make it one of the most challenging system design problems. Understanding the auction mechanics, targeting pipeline, and event tracking infrastructure gives you a strong foundation for both interviews and real-world ad tech engineering.