Database Types Comparison: SQL, Document, Graph, Vector & Beyond
Choosing the right database is one of the most consequential architecture decisions you will make. Each database type optimizes for a different access pattern, and picking the wrong one means fighting the storage engine instead of leveraging it.
Relational (SQL) Databases#
Relational databases store data in tables with rows and columns. They enforce schemas, support ACID transactions, and use SQL for querying.
Examples: PostgreSQL, MySQL, SQL Server, Oracle, CockroachDB.
Strengths:
- Strong consistency and transactional guarantees.
- Mature tooling, indexing, and query optimization.
- Joins across tables are first-class operations.
- Schema enforcement catches data quality issues early.
Best for: Financial systems, ERP, inventory management, any workload where correctness and complex queries matter more than horizontal scale.
Trade-offs: Vertical scaling hits limits. Sharding is possible but adds operational complexity. Schema migrations on large tables can be disruptive.
Document Databases#
Document databases store data as semi-structured documents — typically JSON or BSON. Each document can have a different structure, and nesting is natural.
Examples: MongoDB, Couchbase, Amazon DocumentDB, Firestore.
Strengths:
- Flexible schema — evolve structure without migrations.
- Natural mapping to application objects.
- Horizontal scaling through sharding.
- Reads that fetch a single entity are fast when all related data is embedded.
Best for: Content management, user profiles, product catalogs, event-driven applications with heterogeneous data shapes.
Trade-offs: Joins across documents are expensive or unsupported. Denormalization leads to data duplication. Transactions across documents are limited (though improving).
Key-Value Stores#
Key-value stores are the simplest database model: a key maps to a value. The database treats the value as an opaque blob.
Examples: Redis, Memcached, Amazon DynamoDB, etcd, Riak.
Strengths:
- Extremely low latency — sub-millisecond reads and writes.
- Simple API: GET, PUT, DELETE.
- Horizontal scaling is straightforward via consistent hashing.
Best for: Session management, caching, feature flags, rate limiting, leaderboards, shopping carts.
Trade-offs: No query language beyond key lookup. Range queries require careful key design (DynamoDB adds sort keys to address this). No joins, no aggregations at the storage layer.
Graph Databases#
Graph databases model data as nodes (entities) and edges (relationships). Traversals across relationships are first-class operations.
Examples: Neo4j, Amazon Neptune, ArangoDB, Dgraph, JanusGraph.
Strengths:
- Relationship traversal in O(1) per hop — no expensive joins.
- Pattern matching queries (Cypher, Gremlin, SPARQL).
- Natural fit for interconnected data.
Best for: Social networks, recommendation engines, fraud detection, knowledge graphs, network topology, access control graphs.
Trade-offs: Aggregation queries are slower than in relational databases. Sharding graphs without cutting important edges is hard. Smaller ecosystem and fewer managed offerings.
Time-Series Databases#
Time-series databases optimize for append-heavy workloads where data is indexed by timestamp and typically queried over time ranges.
Examples: InfluxDB, TimescaleDB (PostgreSQL extension), QuestDB, Amazon Timestream, Prometheus (for metrics).
Strengths:
- Columnar compression yields 10-20x storage savings over row-based stores.
- Built-in downsampling, retention policies, and continuous aggregations.
- Fast range scans and time-windowed aggregations.
Best for: Infrastructure monitoring, IoT sensor data, financial tick data, application metrics, event logs.
Trade-offs: Not designed for random updates or deletes. Point lookups by non-time keys are slow. Schema is typically narrow (timestamp + tags + values).
Wide-Column Stores#
Wide-column stores organize data into rows and column families. Each row can have a different set of columns, and columns are grouped for efficient disk access.
Examples: Apache Cassandra, Apache HBase, Google Bigtable, ScyllaDB.
Strengths:
- Massive write throughput with linear horizontal scaling.
- Tunable consistency (Cassandra offers per-query consistency levels).
- Efficient range scans within a partition.
Best for: High-volume event logging, messaging platforms, recommendation data, time-partitioned analytics, IoT at scale.
Trade-offs: Data modeling is query-driven — you design tables around access patterns, not entities. No joins. Secondary indexes are limited. Read-before-write patterns are expensive.
Vector Databases#
Vector databases store high-dimensional embeddings and support approximate nearest neighbor (ANN) search. They power similarity-based retrieval for AI applications.
Examples: Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector (PostgreSQL extension).
Strengths:
- Sub-second similarity search over millions of vectors.
- Support for hybrid search (vector + metadata filtering).
- Integration with embedding pipelines and LLM workflows.
Best for: Semantic search, RAG (retrieval-augmented generation), image similarity, recommendation systems, anomaly detection on embeddings.
Trade-offs: ANN search is approximate — recall is not 100%. Index building is compute-intensive. Updates to individual vectors can trigger expensive re-indexing. The space is young and APIs are still evolving.
Decision Framework#
When choosing a database, start with your access patterns:
What is your primary access pattern?
│
├── Transactions + complex queries ──────▶ Relational (PostgreSQL)
├── Flexible documents, rapid iteration ─▶ Document (MongoDB)
├── Simple key lookup, sub-ms latency ──▶ Key-Value (Redis, DynamoDB)
├── Relationship traversal ─────────────▶ Graph (Neo4j)
├── Time-ordered append-heavy writes ───▶ Time-Series (TimescaleDB)
├── Massive write throughput at scale ──▶ Wide-Column (Cassandra)
└── Similarity search on embeddings ────▶ Vector (Pinecone, pgvector)
Polyglot Persistence#
Most production systems use more than one database. A typical e-commerce platform might use:
- PostgreSQL for orders and inventory (ACID transactions).
- Redis for session caching and rate limiting.
- Elasticsearch for product search.
- A vector database for "similar items" recommendations.
The key is to let each database do what it does best rather than forcing a single engine to serve every access pattern.
Key Takeaways#
- There is no universal best database. The right choice depends on access patterns, consistency requirements, and scale.
- Relational databases remain the default for transactional workloads with complex queries.
- Document and key-value stores trade query flexibility for horizontal scale and developer velocity.
- Graph databases excel when relationships are the primary query target.
- Time-series and wide-column stores handle high-volume, append-heavy workloads that would overwhelm row-based engines.
- Vector databases are essential infrastructure for AI-powered search and retrieval.
- Polyglot persistence — using multiple database types together — is the norm in modern architectures.
Understand your read/write ratio, consistency needs, and query patterns before committing to a database. The storage engine you choose will shape your data model, your scaling strategy, and your operational burden for years.
Build and explore system design concepts hands-on at codelit.io.
291 articles on system design at codelit.io/blog.
Try it on Codelit
GitHub Integration
Paste any repo URL to generate an interactive architecture diagram from real code
Related articles
Try these templates
Build this architecture
Generate an interactive architecture for Database Types Comparison in seconds.
Try it in Codelit →
Comments