Proximity Service System Design: Geospatial Indexing, Nearby Search & Radius Queries
Proximity Service System Design#
A proximity service answers one question: "What's near me?" Every time you open Yelp, Google Maps, or Uber, a proximity service finds relevant places or drivers within a given radius — fast.
Functional Requirements#
- Nearby search — given a user's location and radius, return nearby businesses/places
- Business CRUD — owners can add, update, or remove their listings
- Detailed view — users can view detailed information about a place
Non-Functional Requirements#
- Low latency — nearby queries must return in < 100 ms
- High availability — the read path must be highly available
- Eventual consistency — business updates can propagate with slight delay
- Scale — 100 M daily active users, 200 M businesses
Scale Estimation#
DAU: 100 M
Searches per user per day: ~5
QPS: 100M * 5 / 86400 ≈ 5,800 → peak ~12,000 QPS
Business records: 200 M × ~1 KB ≈ 200 GB (fits in memory)
Read-heavy workload — optimise the read path aggressively.
High-Level Architecture#
Client → Load Balancer → Location Service (read) → Geospatial Index
→ Business Service (write) → Business DB
The Location Service handles all nearby queries. The Business Service handles CRUD and pushes updates to the geospatial index asynchronously.
The Core Problem: Spatial Queries#
A naive SQL query like WHERE lat BETWEEN x1 AND x2 AND lng BETWEEN y1 AND y2 doesn't scale — it can't use a standard B-tree index on two dimensions simultaneously. We need geospatial indexing.
Geospatial Indexing Strategies#
1. Geohash#
Geohash encodes a 2D coordinate into a 1D string by recursively bisecting latitude and longitude:
(37.7749, -122.4194) → "9q8yyk"
- Each character narrows the bounding box
- Nearby points usually share a common prefix
- Precision levels: 4 chars ≈ 39 km, 5 chars ≈ 5 km, 6 chars ≈ 1.2 km
Querying: To find businesses within a radius, compute the geohash prefix for the user's cell plus its 8 neighbouring cells, then query all 9 prefixes:
SELECT * FROM businesses
WHERE geohash LIKE '9q8yy%'
OR geohash LIKE '9q8yz%'
-- ... 7 more neighbours
Edge problem: Two points across a cell boundary can have completely different geohashes. Querying neighbours solves this.
2. Quadtree#
A quadtree recursively subdivides 2D space into four quadrants. Each leaf node holds up to N businesses.
World → NW | NE | SW | SE
Each quadrant subdivides further until ≤ N items per leaf
- In-memory data structure — fast traversal
- Build time: O(n log n), query time: O(log n)
- Must be rebuilt or partially updated when businesses change
3. Google S2 Geometry#
S2 maps the Earth's surface onto a unit sphere, then projects onto six cube faces. Each face is subdivided into cells at 30 levels.
- Hierarchical cell IDs — a single 64-bit integer
- Supports region coverings — approximate any shape with a set of cells
- Used by Google Maps, Foursquare, Tinder
Comparison:
| Feature | Geohash | Quadtree | S2 |
|---|---|---|---|
| Storage | DB column | In-memory tree | DB column or memory |
| Precision | Fixed grid | Adaptive density | Adaptive cells |
| Edge handling | Query 9 cells | Natural | Region covering |
| Complexity | Simple | Moderate | Complex |
Database Choices#
PostGIS (PostgreSQL)#
CREATE INDEX idx_location ON businesses USING GIST(location);
SELECT * FROM businesses
WHERE ST_DWithin(location, ST_MakePoint(-122.4, 37.7)::geography, 5000);
Best for: moderate scale, rich spatial queries, transactional consistency.
Redis GEO#
GEOADD businesses -122.4194 37.7749 "biz:123"
GEORADIUS businesses -122.4194 37.7749 5 km COUNT 20 ASC
Best for: ultra-low latency, simple radius queries, caching layer.
Elasticsearch#
{
"query": {
"geo_distance": {
"distance": "5km",
"location": { "lat": 37.77, "lon": -122.42 }
}
}
}
Best for: full-text search combined with geospatial, faceted filtering.
Recommended Approach#
Use Redis GEO as the primary query layer for speed, backed by PostgreSQL + PostGIS as the source of truth.
Caching Nearby Results#
Nearby results are highly cacheable — millions of users in the same city see similar results.
Strategy: Geohash-Based Cache Keys#
cache_key = f"nearby:{geohash_prefix}:{category}:{radius}"
- Cache the list of business IDs for each geohash cell + category
- TTL: 5–15 minutes (businesses don't change often)
- Invalidate on business create/update/delete via pub/sub
Cache Hierarchy#
Client cache (60s) → CDN edge (for popular areas)
→ Redis cluster (geohash → business IDs)
→ PostGIS (source of truth)
For the top 1,000 geohash cells (covering major cities), pre-warm the cache on startup.
Real-Time Location Updates#
For use cases like ride-sharing where drivers move constantly:
Challenge#
Updating a geospatial index on every GPS ping (every 3–5 seconds, millions of drivers) is expensive.
Solution: Write-Behind with Pub/Sub#
Driver app → WebSocket → Location Ingestion Service
→ Redis GEO (GEOADD overwrite, O(log N))
→ Kafka topic "location-updates"
→ Consumer → PostGIS batch update (every 30s)
Redis GEO handles GEOADD as an upsert — updating a moving object is a single O(log N) operation. The persistent store is updated in batches.
Geofencing for Efficiency#
Only propagate updates when a driver crosses a geohash cell boundary:
new_geohash = encode(lat, lng, precision=6)
if new_geohash != driver.current_geohash:
update_index(driver_id, lat, lng)
driver.current_geohash = new_geohash
This reduces write volume by 80–90%.
Putting It All Together#
┌─────────────┐
│ API Gateway │
└──────┬──────┘
┌──────┴──────┐
┌─────┤ Router ├─────┐
│ └─────────────┘ │
┌────────▼────────┐ ┌──────────▼──────────┐
│ Location Service │ │ Business Service │
│ (read-heavy) │ │ (write-moderate) │
└────────┬─────────┘ └──────────┬──────────┘
│ │
┌────────▼────────┐ ┌──────────▼──────────┐
│ Redis GEO │ │ PostgreSQL+PostGIS │
│ (query layer) │◄───│ (source of truth) │
└─────────────────┘ └─────────────────────┘
Key Takeaways#
- Geohash is the simplest approach — encode lat/lng, query prefix + neighbours
- Quadtree adapts to density — fewer cells in sparse areas, more in dense areas
- S2 is the most powerful but also the most complex
- Redis GEO gives sub-millisecond radius queries — use it as the hot layer
- Cache aggressively using geohash prefixes as cache keys
- For moving objects, use write-behind with geohash boundary detection
Design, build, and ship — all in one place. Try codelit.io.
This is article #195 in the Codelit engineering blog.
Try it on Codelit
Chaos Mode
Simulate node failures and watch cascading impact across your architecture
Related articles
AI Agent Tool Use Architecture: Function Calling, ReAct Loops & Structured Outputs
6 min read
AI searchAI-Powered Search Architecture: Semantic Search, Hybrid Search, and RAG
8 min read
AI safetyAI Safety Guardrails Architecture: Input Validation, Output Filtering, and Human-in-the-Loop
8 min read
Try these templates
Uber Real-Time Location System
Handles 5M+ GPS pings per second using H3 hexagonal geospatial indexing.
6 componentsScalable SaaS Application
Modern SaaS with microservices, event-driven processing, and multi-tenant architecture.
10 componentsNetflix Video Streaming Architecture
Global video streaming platform with adaptive bitrate, CDN distribution, and recommendation engine.
10 componentsBuild this architecture
Generate an interactive architecture for Proximity Service System Design in seconds.
Try it in Codelit →
Comments