
Graph Database Architecture: Model Connected Data with Neo4j and the Property Graph

March 29, 2026 · 6 min read · By Codelit Team

Graph Database Architecture#

When your data is defined by relationships (who follows whom, which product connects to which supplier, which transactions form a fraud ring), a relational schema pushes that structure into join tables, and every additional hop costs another JOIN. Graph databases make connections first-class citizens.

The Property Graph Model#

A property graph has four building blocks:

Nodes      — entities (Person, Product, Company)
Labels     — categories on nodes (a node can have multiple)
Relationships — directed connections between nodes (FOLLOWS, PURCHASED)
Properties — key-value pairs on both nodes and relationships

Example structure:

(:Person {name: "Alice", age: 32})
  -[:FOLLOWS {since: "2024-01"}]->
(:Person {name: "Bob", age: 28})
  -[:PURCHASED {amount: 49.99}]->
(:Product {name: "Mechanical Keyboard", category: "electronics"})
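To make the four building blocks concrete, here is a minimal in-memory sketch of the same structure. The `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names chosen for illustration; they are not part of Neo4j or any driver.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                      # a node can carry several labels
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str                        # e.g. "FOLLOWS"
    start: Node                      # relationships are directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice", "age": 32})
bob = Node({"Person"}, {"name": "Bob", "age": 28})
keyboard = Node({"Product"}, {"name": "Mechanical Keyboard", "category": "electronics"})

relationships = [
    Relationship("FOLLOWS", alice, bob, {"since": "2024-01"}),
    Relationship("PURCHASED", bob, keyboard, {"amount": 49.99}),
]


def outgoing(node, rel_type):
    """Return the relationships of one type that start at the given node."""
    return [r for r in relationships if r.start is node and r.type == rel_type]


purchases = outgoing(bob, "PURCHASED")
print(purchases[0].end.properties["name"])
```

Properties live on both nodes and relationships here, which is the feature that distinguishes a property graph from a plain adjacency list.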

Why Not Relational?#

Consider "find friends-of-friends who bought the same product":

-- Relational: 5 JOINs, increasingly slow with scale
SELECT DISTINCT fof.name
FROM users u
JOIN follows f1 ON u.id = f1.follower_id
JOIN follows f2 ON f1.followed_id = f2.follower_id
JOIN users fof ON f2.followed_id = fof.id        -- fof must be joined before it is referenced
JOIN purchases p1 ON u.id = p1.user_id
JOIN purchases p2 ON fof.id = p2.user_id
WHERE u.name = 'Alice'
  AND p1.product_id = p2.product_id
  AND fof.id != u.id;

// Graph: reads like the question itself
MATCH (alice:Person {name: "Alice"})-[:FOLLOWS]->()-[:FOLLOWS]->(fof:Person),
      (alice)-[:PURCHASED]->(product)<-[:PURCHASED]-(fof)
WHERE fof <> alice
RETURN DISTINCT fof.name

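The graph query is typically faster for this kind of question because each hop follows relationship pointers stored with the node rather than probing an index and joining another table. The work per hop depends on how many relationships the current node has, not on the size of the whole graph. At 6+ degrees of separation the chain of self-joins becomes impractical in most relational databases, while a graph database only pays for the relationships it actually visits (which can still be a large number on a dense graph).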
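The traversal idea can be sketched without any database: once each person holds a direct reference to the people they follow, the friends-of-friends question becomes two nested loops over neighbor sets. The data and the `fof_with_shared_purchase` function below are made up for illustration.

```python
# Adjacency sets: each person maps directly to the people they follow.
follows = {
    "Alice": {"Bob"},
    "Bob": {"Carol", "Dave"},
    "Carol": set(),
    "Dave": set(),
}

# Products bought by each person.
purchased = {
    "Alice": {"keyboard"},
    "Bob": {"mouse"},
    "Carol": {"keyboard"},
    "Dave": {"monitor"},
}


def fof_with_shared_purchase(person):
    """Friends-of-friends who bought at least one product the person also bought."""
    mine = purchased.get(person, set())
    result = set()
    for friend in follows.get(person, ()):          # hop 1
        for fof in follows.get(friend, ()):         # hop 2
            if fof != person and mine & purchased.get(fof, set()):
                result.add(fof)
    return result


print(fof_with_shared_purchase("Alice"))
```

Only the neighbors of Alice and of her friends are touched; people elsewhere in the data never enter the loops.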

Cypher Query Language#

Cypher is the SQL of graph databases — declarative and pattern-based.

Core Patterns#

// Create nodes and relationships
CREATE (alice:Person {name: "Alice", role: "engineer"})
CREATE (bob:Person {name: "Bob", role: "designer"})
CREATE (alice)-[:WORKS_WITH {since: 2023}]->(bob)

// Find patterns
MATCH (p:Person)-[:WORKS_WITH]->(colleague)
WHERE p.role = "engineer"
RETURN p.name, collect(colleague.name) AS teammates

// Variable-length paths (1 to 5 hops); min() keeps the shortest distance per person
MATCH path = (start:Person {name: "Alice"})-[:FOLLOWS*1..5]->(target:Person)
RETURN target.name, min(length(path)) AS distance
ORDER BY distance

// Shortest path
MATCH path = shortestPath(
  (a:Person {name: "Alice"})-[:FOLLOWS*]-(b:Person {name: "Zara"})
)
RETURN [node IN nodes(path) | node.name] AS route

Aggregation and Projection#

// Follower-count ranking (in-degree centrality: a rough proxy for influence, not PageRank)
MATCH (p:Person)<-[:FOLLOWS]-(follower)
WITH p, count(follower) AS followerCount
ORDER BY followerCount DESC
LIMIT 10
RETURN p.name, followerCount

// Subgraph extraction
MATCH (p:Person)-[r]->(connected)
WHERE p.name IN ["Alice", "Bob"]
RETURN p, r, connected

Traversal Algorithms#

Graph databases excel at algorithmic queries that would require recursive CTEs or application-side logic in relational systems.

Breadth-First Search (BFS)#

Find all nodes within N hops:

// All people within 3 degrees of Alice, with the shortest distance to each
MATCH path = (alice:Person {name: "Alice"})-[:FOLLOWS*1..3]->(reachable:Person)
WHERE reachable <> alice
RETURN reachable.name, min(length(path)) AS distance
ORDER BY distance
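For readers who want to see the algorithm itself, here is a small breadth-first search over an adjacency dictionary that returns every node within a hop limit together with its distance. It is a sketch of the textbook technique, not a description of how Neo4j executes the query above; the graph and the `within_hops` name are illustrative.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue                      # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:  # first visit is the shortest distance
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]                   # the start node is not its own result
    return distance


graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
    "Erin": ["Frank"],
    "Frank": [],
}

print(within_hops(graph, "Alice", 3))
```

Because BFS visits nodes level by level, the first time a node is seen is also its shortest distance, so no separate shortest-path step is needed.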

Community Detection#

Identify clusters of densely connected nodes:

// Using the Neo4j Graph Data Science library; assumes a graph named 'social-graph'
// has already been projected into the GDS graph catalog
CALL gds.louvain.stream('social-graph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS person, communityId
ORDER BY communityId, person
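Louvain works by searching for a partition that maximizes modularity, a score that compares the edges inside each community with what a random graph of the same degrees would have. The sketch below only computes that score for a given partition of an undirected, unweighted graph; it does not implement the Louvain search itself, and the function and variable names are illustrative.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2
    where m is the number of edges, L_c the edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = {}   # L_c per community
    degree = {}     # d_c per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}

print(modularity(edges, two_groups))   # 5/14, roughly 0.357
print(modularity(edges, one_group))    # 0.0
```

Splitting the graph along the bridge scores higher than lumping everything together, which is the kind of signal a modularity-based method follows.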

Path Analysis#

// All paths between two nodes (with cycle protection)
MATCH path = (a:Account {id: "ACC-001"})-[:TRANSFERRED_TO*1..6]->(b:Account {id: "ACC-999"})
WHERE ALL(n IN nodes(path) WHERE single(x IN nodes(path) WHERE x = n))
RETURN path,
       reduce(total = 0, r IN relationships(path) | total + r.amount) AS totalFlow
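The same idea (enumerate paths, never revisit a node, sum the amounts along the way) can be sketched as a depth-first search. The account IDs other than the two endpoints, the amounts, and the `simple_paths` function are invented for illustration, and the sketch assumes the source and target accounts differ.

```python
def simple_paths(transfers, source, target, max_hops):
    """Enumerate cycle-free paths from source to target with their total amount.

    transfers maps an account ID to a list of (to_account, amount) pairs.
    """
    results = []

    def walk(path, total):
        current = path[-1]
        if current == target:
            results.append((path, total))
            return
        if len(path) - 1 == max_hops:        # hop limit reached
            return
        for to_account, amount in transfers.get(current, ()):
            if to_account not in path:       # cycle protection
                walk(path + [to_account], total + amount)

    walk([source], 0)
    return results


transfers = {
    "ACC-001": [("ACC-002", 500), ("ACC-003", 200)],
    "ACC-002": [("ACC-999", 450), ("ACC-001", 10)],
    "ACC-003": [("ACC-002", 150)],
}

for path, total_flow in simple_paths(transfers, "ACC-001", "ACC-999", 6):
    print(" -> ".join(path), total_flow)
```

The `to_account not in path` check plays the role of the `single(...)` predicate in the Cypher query: each node may appear at most once per path.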

Neo4j Architecture Internals#

Storage Layer#

Neo4j uses a native graph storage engine — not a relational database underneath:

Node Store:        Fixed-size records (15 bytes each)
                   [inUse|nextRelId|nextPropId|labels|extra]

Relationship Store: Fixed-size records (34 bytes each)
                   [inUse|firstNode|secondNode|type|
                    firstPrevRelId|firstNextRelId|
                    secondPrevRelId|secondNextRelId|
                    nextPropId]

Property Store:    Linked list of property blocks
                   [type|keyIndex|value/pointer]

This means traversing a relationship is a pointer chase — O(1) per hop regardless of total graph size.
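Fixed-size records are what make the lookup cheap: a record's position can be computed from its ID with one multiplication, with no search involved. The sketch below uses the record sizes quoted above and deliberately ignores details such as file headers and page boundaries, so treat it as a simplified model rather than the actual on-disk layout.

```python
NODE_RECORD_SIZE = 15   # bytes, as quoted for the node store
REL_RECORD_SIZE = 34    # bytes, as quoted for the relationship store


def record_offset(record_id, record_size):
    """Byte offset of a record in a store of fixed-size records (simplified)."""
    return record_id * record_size


print(record_offset(1_000, NODE_RECORD_SIZE))        # 15000
print(record_offset(1_000_000, REL_RECORD_SIZE))     # 34000000
```

The cost of the calculation is the same for record 1,000 and record 1,000,000, which is why following a stored ID does not get slower as the store grows.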

Index-Free Adjacency#

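The key architectural decision: each node record stores a pointer to the first relationship in its chain, and each relationship record points to the next and previous relationships of both of its end nodes. Finding a node's neighbors means walking that chain, with no index lookup required for traversal.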

Relational DB:   Find neighbors → scan index → join table → resolve rows
Graph DB:        Find neighbors → follow pointer → done

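This is why the cost of a hop in a graph database depends on the number of relationships attached to the current node rather than on total graph size, while an index-backed relational JOIN gets slower as the joined tables grow.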
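A toy model of the relationship chain shows what "follow pointer" means in practice. The field names mirror the record layout listed in the Storage Layer section (`nextRelId`, `firstNode`, `secondNode`, `firstNextRelId`, `secondNextRelId`); everything else, including the `NO_REL` sentinel and the sample data, is an assumption made for the sketch, and the previous-relationship pointers are left out for brevity.

```python
NO_REL = -1  # sentinel meaning "end of chain" (an assumption of this sketch)

# Node records: each holds the ID of the first relationship in its chain.
nodes = [
    {"name": "Alice", "nextRelId": 2},
    {"name": "Bob", "nextRelId": 1},
    {"name": "Carol", "nextRelId": 2},
]

# Relationship records: rel 0 = Alice->Bob, rel 1 = Bob->Carol, rel 2 = Alice->Carol.
# Each record links to the next relationship of its first and of its second node.
rels = [
    {"firstNode": 0, "secondNode": 1, "firstNextRelId": NO_REL, "secondNextRelId": NO_REL},
    {"firstNode": 1, "secondNode": 2, "firstNextRelId": 0, "secondNextRelId": NO_REL},
    {"firstNode": 0, "secondNode": 2, "firstNextRelId": 0, "secondNextRelId": 1},
]


def neighbors(node_id):
    """Walk a node's relationship chain and collect neighbor names (both directions)."""
    found = []
    rel_id = nodes[node_id]["nextRelId"]
    while rel_id != NO_REL:
        rel = rels[rel_id]
        if rel["firstNode"] == node_id:
            found.append(nodes[rel["secondNode"]]["name"])
            rel_id = rel["firstNextRelId"]
        else:
            found.append(nodes[rel["firstNode"]]["name"])
            rel_id = rel["secondNextRelId"]
    return found


print(neighbors(0))   # Alice's chain: rel 2, then rel 0
```

The loop runs once per relationship attached to the node, so its cost tracks the node's degree and is unaffected by how many other nodes exist.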

Memory Architecture#

Page Cache:     Stores frequently accessed graph pages in memory
Transaction Log: Write-ahead log for durability
ID Buffers:     Recycle deleted node/relationship IDs
Query Cache:    Compiled query plans

Sizing rule of thumb: allocate enough page cache to hold your entire graph. A 10GB graph needs ~10GB page cache for optimal performance.
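A rough lower bound for the page cache can be derived from the fixed record sizes quoted in the Storage Layer section. The sketch below counts only node and relationship records; properties, indexes, and any growth headroom are not included, so the real requirement will be higher. The `store_bytes` helper is illustrative.

```python
NODE_RECORD_BYTES = 15   # per node record, as quoted above
REL_RECORD_BYTES = 34    # per relationship record, as quoted above


def store_bytes(node_count, rel_count):
    """Bytes needed for node and relationship records only (no properties or indexes)."""
    return node_count * NODE_RECORD_BYTES + rel_count * REL_RECORD_BYTES


size = store_bytes(1_000_000, 10_000_000)
print(f"{size / 1_000_000:.0f} MB")   # 355 MB
```

For a graph with 1M nodes and 10M relationships, the topology alone comes to about 355 MB; the property store usually dominates the rest.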

Real-World Use Cases#

Social Networks#

// Recommendation: "People you may know"
MATCH (me:User {id: $userId})-[:FRIENDS]->(friend)-[:FRIENDS]->(suggestion)
WHERE NOT (me)-[:FRIENDS]->(suggestion)
  AND suggestion <> me
WITH suggestion, count(friend) AS mutualFriends
ORDER BY mutualFriends DESC
LIMIT 10
RETURN suggestion.name, mutualFriends

Recommendation Engines#

// Collaborative filtering: users who bought X also bought Y
MATCH (u:User {id: $userId})-[:PURCHASED]->(product)<-[:PURCHASED]-(other),
      (other)-[:PURCHASED]->(recommendation)
WHERE NOT (u)-[:PURCHASED]->(recommendation)
WITH recommendation, count(other) AS score
ORDER BY score DESC
LIMIT 5
RETURN recommendation.name, recommendation.price, score

Fraud Detection#

// Find circular money flows (potential money laundering)
// duration.inDays counts whole days; duration.between(...).days would return
// only the days component and ignore any whole months in the gap
MATCH path = (a:Account)-[:TRANSFERRED_TO*3..8]->(a)
WHERE ALL(r IN relationships(path) WHERE r.amount > 10000)
  AND ALL(r IN relationships(path) WHERE
      duration.inDays(r.timestamp, head(
        [r2 IN relationships(path)
         WHERE startNode(r2) = endNode(r) | r2.timestamp]
      )).days < 7)
RETURN path,
       reduce(s = 0, r IN relationships(path) | s + r.amount) AS totalCirculated

Fraud detection is where graphs dominate: patterns that span multiple entities and relationships are compact to express in Cypher, whereas the SQL equivalent needs recursive CTEs with hand-written cycle tracking.
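The core of the circular-flow check is a bounded cycle search. The sketch below finds cycles of 3 to 8 transfers in which every amount exceeds a threshold; it leaves out the time-window condition and only reports cycles that do not revisit an account, so it is narrower than the Cypher query. Account names, amounts, and the `find_cycles` function are illustrative.

```python
def find_cycles(transfers, min_amount, min_len=3, max_len=8):
    """Find simple cycles whose every transfer exceeds min_amount.

    transfers maps an account to a list of (to_account, amount) pairs.
    Each cycle is reported once, starting from its smallest account ID.
    """
    cycles = []

    def walk(origin, account, path):
        for dst, amount in transfers.get(account, ()):
            if amount <= min_amount:
                continue                               # transfer too small to matter
            if dst == origin and len(path) >= min_len:
                cycles.append(path + [origin])         # closed the loop
            elif dst > origin and dst not in path and len(path) < max_len:
                walk(origin, dst, path + [dst])

    for origin in sorted(transfers):
        walk(origin, origin, [origin])
    return cycles


transfers = {
    "A": [("B", 15000)],
    "B": [("C", 12000)],
    "C": [("A", 11000), ("D", 500)],
    "D": [("A", 20000)],
}

print(find_cycles(transfers, min_amount=10000))
```

The route through `D` is rejected because the 500 transfer falls below the threshold, leaving only the A, B, C loop.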

Graph vs Relational: Decision Framework#

Choose Graph When:                    Choose Relational When:
─────────────────────                 ──────────────────────
Queries involve 3+ JOINs              Data is tabular and uniform
Relationship patterns are complex     Aggregation/reporting is primary
Schema evolves frequently             Schema is stable and well-defined
Traversal depth varies                Queries are predictable
Connected data is the core value      Transactions span many tables

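Performance comparison for "find 4th-degree connections" (illustrative figures; hardware, schema, and indexing are not stated, so read them as an order of magnitude):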

Dataset: 1M nodes, 10M relationships
─────────────────────────────────────
Relational (PostgreSQL):  ~28 seconds
Graph (Neo4j):            ~2 milliseconds
─────────────────────────────────────
Speedup: ~14,000x
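A back-of-the-envelope estimate helps explain why depth matters so much. Assuming relationships are spread evenly across nodes (real graphs rarely are), the number of paths to explore grows as the average degree raised to the number of hops. The `expected_paths` helper is a hypothetical name for this estimate.

```python
def expected_paths(node_count, rel_count, hops):
    """Estimate paths of a given length, assuming a uniform out-degree."""
    avg_degree = rel_count / node_count
    return avg_degree ** hops


# Dataset from the comparison above: 1M nodes, 10M relationships.
for hops in range(1, 5):
    print(hops, expected_paths(1_000_000, 10_000_000, hops))
```

With an average of 10 relationships per node, four hops means on the order of 10,000 paths from a single starting node. A traversal touches roughly that many records, while each relational self-join has to match rows against a 10M-row table.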

Modeling Best Practices#

1. Relationships ARE data    — put properties on edges, not just nodes
2. Use specific rel types    — :PURCHASED_ON vs generic :RELATED_TO
3. Avoid super nodes         — nodes with 1M+ relationships need partitioning
4. Index lookup properties   — create indexes on frequently queried node properties
5. Denormalize for reads     — duplicate properties to avoid extra traversals

// Create indexes for common lookups (run each statement separately)
CREATE INDEX FOR (p:Person) ON (p.email);
CREATE INDEX FOR (p:Product) ON (p.sku);
CREATE CONSTRAINT FOR (u:User) REQUIRE u.id IS UNIQUE;

Key Takeaways#

Graph databases solve a fundamentally different problem than relational stores:

Property graphs model entities, relationships, and properties as first-class concepts
Cypher expresses complex traversal patterns in readable, declarative syntax
Index-free adjacency gives O(1) per-hop traversal regardless of graph size
Use cases like social networks, recommendations, and fraud detection are natural fits
Choose graphs when relationships are the core value of your data

If your SQL queries have more JOINs than columns in the SELECT clause, it is time to consider a graph database.

Article #320 in the Codelit engineering series. Explore graph databases, system design, and advanced architectures at codelit.io.


