Graph Database Architecture: Model Connected Data with Neo4j and the Property Graph
Graph Database Architecture#
When your data is defined by relationships — who follows whom, what product connects to which supplier, which transactions form a fraud ring — relational databases fight you at every step. Graph databases make connections first-class citizens.
The Property Graph Model#
A property graph has four building blocks:
Nodes — entities (Person, Product, Company)
Labels — categories on nodes (a node can have multiple)
Relationships — directed connections between nodes (FOLLOWS, PURCHASED)
Properties — key-value pairs on both nodes and relationships
Example structure:
(:Person {name: "Alice", age: 32})
-[:FOLLOWS {since: "2024-01"}]->
(:Person {name: "Bob", age: 28})
-[:PURCHASED {amount: 49.99}]->
(:Product {name: "Mechanical Keyboard", category: "electronics"})
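The four building blocks can be sketched as plain data structures. This is only an illustration of the model, not how Neo4j represents it internally; the class and field names (`Node`, `Relationship`, `labels`, `properties`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                               # a node may carry several labels
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str                                 # e.g. "FOLLOWS"
    start: Node                               # relationships are directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice", "age": 32})
bob = Node({"Person"}, {"name": "Bob", "age": 28})
keyboard = Node({"Product"}, {"name": "Mechanical Keyboard", "category": "electronics"})

# Properties live on relationships as well as on nodes.
follows = Relationship("FOLLOWS", alice, bob, {"since": "2024-01"})
purchased = Relationship("PURCHASED", bob, keyboard, {"amount": 49.99})
```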
Why Not Relational?#
Consider "find friends-of-friends who bought the same product":
-- Relational: 5 JOINs, and every extra hop adds another self-join on follows
SELECT DISTINCT fof.name
FROM users u
JOIN follows f1 ON u.id = f1.follower_id
JOIN follows f2 ON f1.followed_id = f2.follower_id
JOIN users fof ON f2.followed_id = fof.id
JOIN purchases p1 ON u.id = p1.user_id
JOIN purchases p2 ON fof.id = p2.user_id
WHERE u.name = 'Alice'
AND p1.product_id = p2.product_id
AND fof.id != u.id;
// Graph: reads like the question itself
MATCH (alice:Person {name: "Alice"})-[:FOLLOWS]->()-[:FOLLOWS]->(fof:Person),
(alice)-[:PURCHASED]->(product)<-[:PURCHASED]-(fof)
WHERE fof <> alice
RETURN DISTINCT fof.name
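The graph query tends to be faster on deep traversals because each hop follows stored references from a node to its relationships instead of resolving another join through an index. The cost of a hop therefore depends on how many relationships the current node has, not on the total size of the graph. At 6+ degrees of separation, chained relational self-joins often become impractically slow, while a graph database keeps a roughly constant cost per hop (the total work still grows with the number of paths explored).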
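The same question can be sketched in Python as two nested neighbor lookups, which is roughly the shape of work a traversal performs. The dictionaries and the names in them are made-up sample data, and a real engine would plan and execute the query quite differently.

```python
# Hypothetical sample data: who follows whom, and who bought what.
follows = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Dave", "Alice"},
    "Carol": {"Erin"},
    "Dave": set(),
    "Erin": set(),
}
purchased = {
    "Alice": {"keyboard"},
    "Bob": {"mouse"},
    "Carol": set(),
    "Dave": {"keyboard", "monitor"},
    "Erin": {"mouse"},
}


def fof_with_shared_purchase(person):
    """Two FOLLOWS hops out, keeping people who share a purchase with `person`."""
    result = set()
    for friend in follows[person]:            # hop 1
        for fof in follows[friend]:           # hop 2
            if fof != person and purchased[person] & purchased[fof]:
                result.add(fof)
    return result


matches = fof_with_shared_purchase("Alice")
```

Each loop only touches the neighbors of the node it is currently expanding, so adding unrelated people to the dictionaries does not change the work done for Alice.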
Cypher Query Language#
Cypher is the SQL of graph databases — declarative and pattern-based.
Core Patterns#
// Create nodes and relationships
CREATE (alice:Person {name: "Alice", role: "engineer"})
CREATE (bob:Person {name: "Bob", role: "designer"})
CREATE (alice)-[:WORKS_WITH {since: 2023}]->(bob)
// Find patterns
MATCH (p:Person)-[:WORKS_WITH]->(colleague)
WHERE p.role = "engineer"
RETURN p.name, collect(colleague.name) AS teammates
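// Variable-length paths (1 to 5 hops); min() keeps one row per person
// when several paths reach them. END is a Cypher keyword, so the
// variable is named `other` rather than `end`.
MATCH path = (start:Person {name: "Alice"})-[:FOLLOWS*1..5]->(other:Person)
RETURN other.name, min(length(path)) AS distance
ORDER BY distance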
// Shortest path
MATCH path = shortestPath(
(a:Person {name: "Alice"})-[:FOLLOWS*]-(b:Person {name: "Zara"})
)
RETURN [node IN nodes(path) | node.name] AS route
Aggregation and Projection#
// Influence by follower count (in-degree); a simple proxy, not true PageRank
MATCH (p:Person)<-[:FOLLOWS]-(follower)
WITH p, count(follower) AS followerCount
ORDER BY followerCount DESC
LIMIT 10
RETURN p.name, followerCount
// Subgraph extraction
MATCH (p:Person)-[r]->(connected)
WHERE p.name IN ["Alice", "Bob"]
RETURN p, r, connected
Traversal Algorithms#
Graph databases excel at algorithmic queries that would require recursive CTEs or application-side logic in relational systems.
Breadth-First Search (BFS)#
Find all nodes within N hops:
// All people within 3 degrees of Alice, with the shortest distance to each
MATCH path = (alice:Person {name: "Alice"})-[:FOLLOWS*1..3]->(reachable:Person)
WHERE reachable <> alice
RETURN reachable.name, min(length(path)) AS distance
ORDER BY distance
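For comparison, here is a minimal breadth-first search in Python that returns every node within a hop limit along with its shortest distance. The function name `within_hops` and the sample graph are assumptions made for illustration.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: distance} for nodes reachable in 1..max_hops directed hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue                          # do not expand past the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in distance:      # first visit is the shortest distance
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[start]                       # report only the other nodes
    return distance


follows = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
    "Erin": ["Frank"],
}
reachable = within_hops(follows, "Alice", 3)
```

Because BFS visits nodes in order of distance, the first time a node is seen is already its shortest distance, so no separate shortest-path call is needed.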
Community Detection#
Identify clusters of densely connected nodes:
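// Using the Neo4j Graph Data Science (GDS) library; assumes a graph
// projection named 'social-graph' already exists in the GDS graph catalog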
CALL gds.louvain.stream('social-graph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS person, communityId
ORDER BY communityId, person
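Louvain works by greedily moving nodes between communities to increase a score called modularity. The sketch below does not implement Louvain itself; it only computes the modularity of a given partition for an undirected, unweighted graph, to show what "densely connected" means numerically. The function name and the sample graph are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}      # community -> number of edges with both ends inside it
    degree_sum = {}    # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_split = modularity(edges, two_groups)    # one community per triangle
q_merged = modularity(edges, one_group)    # everything in one community
```

Splitting the graph along the bridge scores higher than lumping every node together, which is the kind of partition a modularity-based method favors.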
Path Analysis#
// All paths between two nodes (with cycle protection)
MATCH path = (a:Account {id: "ACC-001"})-[:TRANSFERRED_TO*1..6]->(b:Account {id: "ACC-999"})
WHERE ALL(n IN nodes(path) WHERE single(x IN nodes(path) WHERE x = n))
RETURN path,
reduce(total = 0, r IN relationships(path) | total + r.amount) AS totalFlow
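The same idea, enumerating simple paths between two accounts and summing the amounts along each, can be sketched as a depth-first search. The account IDs reuse those from the query above, while the transfer amounts and the function name `simple_paths` are invented for the example.

```python
def simple_paths(transfers, source, target, max_hops):
    """Yield (path, total_amount) for simple paths of 1..max_hops transfers."""
    def walk(node, visited, path, total):
        if node == target and len(path) > 1:
            yield list(path), total
            return
        if len(path) - 1 == max_hops:
            return                            # hop budget used up
        for nxt, amount in transfers.get(node, ()):
            if nxt not in visited:            # cycle protection: no node twice
                visited.add(nxt)
                path.append(nxt)
                yield from walk(nxt, visited, path, total + amount)
                path.pop()
                visited.remove(nxt)

    yield from walk(source, {source}, [source], 0)


# Hypothetical transfers: account -> list of (recipient, amount).
transfers = {
    "ACC-001": [("ACC-002", 500), ("ACC-003", 200)],
    "ACC-002": [("ACC-999", 450), ("ACC-001", 50)],
    "ACC-003": [("ACC-002", 150)],
}
found = list(simple_paths(transfers, "ACC-001", "ACC-999", max_hops=6))
```

The `visited` set plays the role of the `single(...)` predicate in the Cypher version: a path is abandoned as soon as it would revisit a node.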
Neo4j Architecture Internals#
Storage Layer#
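Neo4j uses a native graph storage engine, not a relational database underneath. The record sizes below describe the record-based store layout; exact sizes vary with Neo4j version and store format: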
Node Store: Fixed-size records (15 bytes each)
[inUse|nextRelId|nextPropId|labels|extra]
Relationship Store: Fixed-size records (34 bytes each)
[inUse|firstNode|secondNode|type|
firstPrevRelId|firstNextRelId|
secondPrevRelId|secondNextRelId|
nextPropId]
Property Store: Linked list of property blocks
[type|keyIndex|value/pointer]
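This means traversing a relationship is a pointer chase: because records are fixed-size, a record's position in the store file is its ID multiplied by the record size, so each hop costs O(1) regardless of total graph size.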
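The arithmetic behind fixed-size records is simple enough to show directly. This sketch uses the record sizes quoted above and assumes, for simplicity, that a store file has no header; the function name is hypothetical.

```python
NODE_RECORD_SIZE = 15          # bytes, per the node store layout above
REL_RECORD_SIZE = 34           # bytes, per the relationship store layout above


def record_offset(record_id, record_size):
    """Byte offset of a record in a file of fixed-size records (no header assumed)."""
    return record_id * record_size


node_offset = record_offset(1_000, NODE_RECORD_SIZE)
rel_offset = record_offset(2_000_000, REL_RECORD_SIZE)
```

Locating relationship 2,000,000 takes one multiplication, the same as locating relationship 2, which is why the lookup cost does not grow with the size of the store.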
Index-Free Adjacency#
The key architectural decision: each node physically stores pointers to its adjacent relationships. No index lookup required for traversal.
Relational DB: Find neighbors → scan index → join table → resolve rows
Graph DB: Find neighbors → follow pointer → done
This is why the cost of a hop in a graph database depends only on the node being expanded, while each relational JOIN pays for an index lookup whose cost grows (typically logarithmically) with table size.
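The contrast can be sketched in Python: one lookup searches a global sorted edge list (standing in for an index over a join table), the other follows references stored on the node itself. Both structures and all names here are simplified assumptions, not Neo4j or PostgreSQL internals.

```python
from bisect import bisect_left, bisect_right

# "Join table" style: one sorted list of (follower, followed) pairs for the whole graph.
edge_table = sorted([
    ("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave"),
])


def neighbors_via_index(person):
    """Binary search over the global edge list: O(log E) before reading matches."""
    lo = bisect_left(edge_table, (person, ""))
    hi = bisect_right(edge_table, (person, "\uffff"))
    return [followed for _, followed in edge_table[lo:hi]]


# Index-free adjacency style: each node holds direct references to its neighbors.
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []                         # direct references to followed nodes


nodes = {name: Node(name) for name in ("alice", "bob", "carol", "dave")}
for follower, followed in edge_table:
    nodes[follower].out.append(nodes[followed])


def neighbors_via_pointers(node):
    """Cost depends only on this node's degree, not on the size of the graph."""
    return [n.name for n in node.out]
```

Both functions return the same neighbors; the difference is that the first has to search a structure sized by the whole graph before it can start reading.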
Memory Architecture#
Page Cache: Stores frequently accessed graph pages in memory
Transaction Log: Write-ahead log for durability
ID Buffers: Recycle deleted node/relationship IDs
Query Cache: Compiled query plans
Sizing rule of thumb: allocate enough page cache to hold your entire graph store. A 10 GB graph needs roughly 10 GB of page cache for optimal performance, in addition to the memory reserved for the JVM heap.
Real-World Use Cases#
Social Networks#
// Recommendation: "People you may know"
MATCH (me:User {id: $userId})-[:FRIENDS]->(friend)-[:FRIENDS]->(suggestion)
WHERE NOT (me)-[:FRIENDS]->(suggestion)
AND suggestion <> me
WITH suggestion, count(friend) AS mutualFriends
ORDER BY mutualFriends DESC
LIMIT 10
RETURN suggestion.name, mutualFriends
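The ranking logic in this query, counting how many of my friends point at each non-friend, can be sketched with a `Counter`. The function name and the sample friendships are hypothetical.

```python
from collections import Counter


def people_you_may_know(friends, me, limit=10):
    """Rank non-friends by how many of my friends list them as a friend."""
    mutual = Counter()
    for friend in friends[me]:
        for suggestion in friends.get(friend, ()):
            if suggestion != me and suggestion not in friends[me]:
                mutual[suggestion] += 1
    return mutual.most_common(limit)


friends = {
    "me": {"ann", "ben", "cy"},
    "ann": {"me", "dee", "eli"},
    "ben": {"me", "dee"},
    "cy": {"me", "dee", "eli", "ann"},
}
suggestions = people_you_may_know(friends, "me")
```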
Recommendation Engines#
// Collaborative filtering: users who bought X also bought Y
MATCH (u:User {id: $userId})-[:PURCHASED]->(product)<-[:PURCHASED]-(other),
(other)-[:PURCHASED]->(recommendation)
WHERE NOT (u)-[:PURCHASED]->(recommendation)
WITH recommendation, count(other) AS score
ORDER BY score DESC
LIMIT 5
RETURN recommendation.name, recommendation.price, score
Fraud Detection#
// Find circular money flows (potential money laundering)
MATCH path = (a:Account)-[:TRANSFERRED_TO*3..8]->(a)
WHERE ALL(r IN relationships(path) WHERE r.amount > 10000)
AND ALL(r IN relationships(path) WHERE
// inDays counts whole days; duration.between(...).days leaves out whole months
duration.inDays(r.timestamp, head(
[r2 IN relationships(path)
WHERE startNode(r2) = endNode(r) | r2.timestamp]
)).days < 7)
RETURN path,
reduce(s = 0, r IN relationships(path) | s + r.amount) AS totalCirculated
Fraud detection is where graphs dominate: patterns that span a variable number of entities and relationships are concise in Cypher, whereas SQL needs recursive CTEs or a fixed chain of self-joins for every path length you want to check.
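A stripped-down version of the cycle search can be sketched as a depth-first walk that only follows large transfers. It keeps the hop range and amount threshold from the query but leaves out the seven-day timing rule; the account names, amounts, and function name are invented for the example.

```python
def large_cycles(transfers, min_hops=3, max_hops=8, min_amount=10_000):
    """Yield (cycle, total) for cycles whose every transfer exceeds min_amount."""
    def walk(start, node, path, total):
        for nxt, amount in transfers.get(node, ()):
            if amount <= min_amount:
                continue                      # every hop must be a large transfer
            hops = len(path)                  # hops used if this edge is taken
            if nxt == start and hops >= min_hops:
                yield path + [start], total + amount
            elif nxt not in path and hops < max_hops:
                yield from walk(start, nxt, path + [nxt], total + amount)

    for start in transfers:
        yield from walk(start, start, [start], 0)


# Hypothetical transfers: account -> list of (recipient, amount).
transfers = {
    "A": [("B", 15_000)],
    "B": [("C", 12_000)],
    "C": [("A", 11_000), ("D", 500)],
    "D": [("A", 20_000)],
}
cycles = list(large_cycles(transfers))
```

Like the Cypher pattern, this reports the same ring once per starting account, so a real pipeline would likely deduplicate rotations before alerting.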
Graph vs Relational: Decision Framework#
Choose Graph When:                   Choose Relational When:
──────────────────────────────────   ──────────────────────────────────
Queries involve 3+ JOINs             Data is tabular and uniform
Relationship patterns are complex    Aggregation/reporting is primary
Schema evolves frequently            Schema is stable and well-defined
Traversal depth varies               Queries are predictable
Connected data is the core value     Transactions span many tables
Illustrative performance comparison for "find 4th-degree connections" (actual timings depend on hardware, indexing, and data shape):
Dataset: 1M nodes, 10M relationships
─────────────────────────────────────
Relational (PostgreSQL): ~28 seconds
Graph (Neo4j): ~2 milliseconds
─────────────────────────────────────
Speedup: ~14,000x
Modeling Best Practices#
1. Relationships ARE data — put properties on edges, not just nodes
2. Use specific rel types — :PURCHASED_ON vs generic :RELATED_TO
3. Avoid super nodes — nodes with 1M+ relationships need partitioning
4. Index lookup properties — create indexes on frequently queried node properties
5. Denormalize for reads — duplicate properties to avoid extra traversals
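// Create indexes for common lookups (end each statement with a semicolon when running them together)
CREATE INDEX FOR (p:Person) ON (p.email);
CREATE INDEX FOR (p:Product) ON (p.sku);
// A uniqueness constraint also creates a backing index on u.id
CREATE CONSTRAINT FOR (u:User) REQUIRE u.id IS UNIQUE;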
Key Takeaways#
Graph databases solve a fundamentally different problem than relational stores:
- Property graphs model entities, relationships, and properties as first-class concepts
- Cypher expresses complex traversal patterns in readable, declarative syntax
- Index-free adjacency gives O(1) per-hop traversal regardless of graph size
- Use cases like social networks, recommendations, and fraud detection are natural fits
- Choose graphs when relationships are the core value of your data
If your SQL queries have more JOINs than columns in the SELECT clause, it is time to consider a graph database.
Article #320 in the Codelit engineering series. Explore graph databases, system design, and advanced architectures at codelit.io.