Object Storage Architecture: File, Block & Object Storage Explained#
Every application stores files — user uploads, images, logs, backups, ML datasets. The architecture you choose for storage determines cost, performance, and scalability.
Three Storage Paradigms#
Block Storage#
Raw storage volumes attached to a single compute instance. Think of it as a virtual hard drive.
Block storage:
/dev/sda1 → 500GB EBS volume → attached to one EC2 instance
Low latency (~1ms), fixed size, no built-in sharing
Use cases: Databases, OS boot volumes, high-IOPS workloads. Examples: AWS EBS, Google Persistent Disk, Azure Managed Disks.
File Storage#
A shared filesystem accessible by multiple machines over a network (NFS/SMB).
File storage:
/shared/uploads/ → NFS mount → accessible by 10 app servers
Hierarchical directories, file locking, POSIX semantics
Use cases: Shared application data, home directories, CMS media. Examples: AWS EFS, Google Filestore, Azure Files.
Object Storage#
Flat namespace of objects (blobs) accessed via HTTP APIs. No directories — just buckets and keys.
Object storage:
s3://my-bucket/users/42/avatar.png
↑ bucket ↑ key (not a directory path)
HTTP PUT/GET/DELETE — no filesystem semantics
Virtually unlimited capacity, pay-per-GB
Use cases: User uploads, static assets, backups, data lakes, ML training data. Examples: AWS S3, Google Cloud Storage, Azure Blob Storage.
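The flat-namespace model above can be sketched in a few lines. This is a hypothetical in-memory `Bucket` class, not a real S3 client; it just shows that "directories" are nothing more than a shared key prefix you can list by.

```javascript
// Minimal in-memory sketch of object-storage semantics: a flat map of
// key → bytes, with prefix listing standing in for "directories".
class Bucket {
  constructor(name) {
    this.name = name;
    this.objects = new Map(); // flat namespace: full key → value
  }
  put(key, value) { this.objects.set(key, value); }  // HTTP PUT
  get(key) { return this.objects.get(key); }         // HTTP GET
  delete(key) { this.objects.delete(key); }          // HTTP DELETE
  list(prefix) {                                     // LIST by prefix
    return [...this.objects.keys()].filter(k => k.startsWith(prefix));
  }
}

const bucket = new Bucket('my-bucket');
bucket.put('users/42/avatar.png', Buffer.from('...png bytes...'));
bucket.put('users/42/cover.jpg', Buffer.from('...jpg bytes...'));
// "users/42/" is not a directory — just a key prefix shared by two objects:
console.log(bucket.list('users/42/').length); // 2
```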
Comparison#
| Feature | Block | File | Object |
|---|---|---|---|
| Access pattern | Mounted volume | Network filesystem | HTTP API |
| Max size | TBs per volume | PBs | Unlimited |
| Latency | ~1ms | ~5-10ms | ~50-200ms |
| Concurrent access | Single instance | Multiple instances | Unlimited |
| Cost (per GB/mo) | $0.08-0.10 | $0.30 | $0.023 |
| Metadata | Filesystem attrs | Filesystem attrs | Custom key-value |
Object storage wins on cost and scalability. That's why it dominates modern architectures.
S3 Architecture Internals#
Amazon S3 — the de facto standard — is engineered for 99.999999999% (11 nines) durability.
How S3 Stores Data#
PUT s3://bucket/photo.jpg
1. Object split into chunks
2. Each chunk erasure-coded (not simple replication)
3. Coded fragments distributed across multiple AZs
4. Metadata stored in a distributed index
5. ACK returned to client
Erasure coding is more space-efficient than 3x replication while providing equivalent durability. A typical scheme like Reed-Solomon 8/4 stores 12 fragments — any 8 can reconstruct the object.
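To make the erasure-coding idea concrete, here is the simplest possible scheme: single XOR parity, as used in RAID-5. This is an illustration only, not the Reed-Solomon 8/4 math S3 actually uses, but it shows the core property: a lost fragment can be rebuilt from the survivors plus parity.

```javascript
// Illustration: XOR parity, the simplest erasure code. With k data
// fragments plus one parity fragment, any single lost fragment can be
// reconstructed from the remaining k fragments.
function xorParity(fragments) {
  const parity = Buffer.alloc(fragments[0].length);
  for (const f of fragments) {
    for (let i = 0; i < f.length; i++) parity[i] ^= f[i];
  }
  return parity;
}

// Rebuilding a missing fragment is the same operation: XOR the
// surviving fragments together with the parity.
function rebuild(survivors, parity) {
  return xorParity([...survivors, parity]);
}

const data = [Buffer.from('AAAA'), Buffer.from('BBBB'), Buffer.from('CCCC')];
const parity = xorParity(data);
// Lose fragment 1; reconstruct it from fragments 0, 2 and the parity:
const recovered = rebuild([data[0], data[2]], parity);
console.log(recovered.toString()); // "BBBB"
```

Reed-Solomon generalizes this to survive multiple simultaneous fragment losses, which is what makes the 11-nines durability claim possible across AZ failures.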
Key Design Decisions#
- Flat namespace — No directories. The `/` in keys is just a convention
- Immutable objects — You overwrite, not modify in place
- Read-after-write consistency — S3 provides strong consistency for all operations (since 2020)
- Unlimited scale — No provisioning. Throughput scales automatically
Presigned URLs#
Presigned URLs let clients upload/download directly to S3 without exposing credentials.
// Server generates a presigned upload URL (AWS SDK for JavaScript v2)
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const url = s3.getSignedUrl('putObject', {
  Bucket: 'uploads',
  Key: `users/${userId}/${fileId}`,
  ContentType: 'image/jpeg',
  Expires: 300, // URL valid for 5 minutes
});
// Client uploads directly to S3
// PUT https://uploads.s3.amazonaws.com/users/42/abc123?X-Amz-Signature=...
Architecture Flow#
1. Client → Server: "I want to upload a 5MB JPEG"
2. Server → S3: generate presigned PUT URL (5min TTL)
3. Server → Client: presigned URL
4. Client → S3: PUT file directly (server never touches the bytes)
5. S3 → Lambda/webhook: notify server of completed upload
6. Server: validate, process, update database
This offloads bandwidth and CPU from your servers entirely.
Multipart Uploads#
For large files (>100MB), multipart uploads provide reliability and parallelism.
Multipart upload flow:
1. Initiate → S3 returns uploadId
2. Upload parts in parallel (5MB-5GB each)
Part 1: bytes 0-10MB → ETag "abc"
Part 2: bytes 10MB-20MB → ETag "def"
Part 3: bytes 20MB-30MB → ETag "ghi"
3. Complete → send ordered list of ETags
4. S3 assembles final object
Failed part? Retry just that part, not the entire file.
Benefits:
- Parallel uploads — Saturate bandwidth with concurrent parts
- Retry granularity — Only re-upload failed parts
- Pause/resume — Upload can span multiple sessions
- Required for objects >5GB
Storage Tiers#
Not all data is accessed equally. Tiering reduces costs dramatically.
| Tier | Access Frequency | Retrieval Time | Cost/GB/mo | Example |
|---|---|---|---|---|
| Hot (Standard) | Frequent | Instant | $0.023 | Active user uploads |
| Warm (IA) | Monthly | Instant | $0.0125 | Old user files |
| Cold (Glacier Instant) | Quarterly | Instant | $0.004 | Compliance archives |
| Archive (Glacier Deep) | Rarely | 12-48 hours | $0.00099 | Legal holds, raw logs |
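A quick back-of-envelope calculation using the per-GB prices from the table shows why tiering matters. Assume a hypothetical 10TB dataset where 80% of the data goes cold after the first month:

```javascript
// Per-GB/month prices from the tier table above.
const PRICE = { standard: 0.023, ia: 0.0125, glacierInstant: 0.004 };

// Sum the monthly bill for a given distribution of GB across tiers.
function monthlyCost(gbByTier) {
  return Object.entries(gbByTier)
    .reduce((sum, [tier, gb]) => sum + gb * PRICE[tier], 0);
}

const allHot = monthlyCost({ standard: 10_000 });
const tiered = monthlyCost({ standard: 2_000, ia: 3_000, glacierInstant: 5_000 });
console.log(allHot.toFixed(2)); // 230.00
console.log(tiered.toFixed(2)); // 103.50 — less than half the cost
```

Retrieval fees for the colder tiers are omitted here; they matter if the "cold" data turns out to be accessed more than expected.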
Lifecycle Policies#
Automate transitions between tiers:
{
"Rules": [
{
"ID": "TierDown",
"Status": "Enabled",
"Transitions": [
{ "Days": 30, "StorageClass": "STANDARD_IA" },
{ "Days": 90, "StorageClass": "GLACIER_INSTANT_RETRIEVAL" },
{ "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
],
"Expiration": { "Days": 2555 }
}
]
}
Intelligent Tiering#
S3 Intelligent-Tiering automatically moves objects between tiers based on access patterns. Small monitoring fee per object, but zero retrieval fees.
CDN Integration#
Object storage + CDN = fast global delivery.
Architecture:
User → CloudFront/Cloudflare edge (cache HIT) → response in 5ms
User → CloudFront/Cloudflare edge (cache MISS) → S3 origin → cache + respond
Cache key: bucket + object key + query params
TTL: configured per path pattern (images: 30d, API: 0)
Cache Invalidation Strategies#
- Immutable keys — Include a content hash in the filename: avatar-a3f8b2c1.jpg
- Versioned keys — v3/logo.png
- Explicit invalidation — Purge specific paths (slow, costly at scale)
Immutable keys are the gold standard — no invalidation needed, infinite cache TTL.
Metadata and Indexing#
Object storage has limited query capability. For searchable metadata, maintain a separate index.
S3 object:
Key: uploads/user-42/invoice-2026-03.pdf
Custom metadata: { "userId": "42", "type": "invoice", "month": "2026-03" }
System metadata: { "size": 245000, "contentType": "application/pdf", "lastModified": "..." }
Separate index (DynamoDB/Postgres):
{ key, userId, type, uploadedAt, size, status, thumbnailKey }
S3 can list objects by prefix, but cannot query by metadata values. Always maintain an external index for search.
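Building the index row typically happens in the upload-notification handler. A sketch, assuming a hypothetical `uploads/user-<id>/<filename>` key convention (the event shape here is simplified, not the exact S3 event payload):

```javascript
// Derive a searchable index row from an upload event. The row is what
// lands in DynamoDB/Postgres — S3 itself is never queried for metadata.
function toIndexRecord(event) {
  const match = event.key.match(/^uploads\/user-(\d+)\/(.+)$/);
  if (!match) throw new Error(`unexpected key: ${event.key}`);
  return {
    key: event.key,
    userId: match[1],
    filename: match[2],
    size: event.size,
    contentType: event.contentType,
    uploadedAt: event.uploadedAt,
    status: 'pending', // flips to 'processed' after validation
  };
}

const record = toIndexRecord({
  key: 'uploads/user-42/invoice-2026-03.pdf',
  size: 245000,
  contentType: 'application/pdf',
  uploadedAt: '2026-03-01T12:00:00Z',
});
console.log(record.userId); // "42"
```

Queries like "all invoices for user 42 in March" then hit the database index, never an S3 LIST.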
Deduplication#
Avoid storing duplicate files to save cost and bandwidth.
Content-Addressable Storage#
Hash the file content and use the hash as the key:
Upload flow:
1. Client hashes file → SHA-256: "a3f8b2c1..."
2. Client → Server: "Do you have a3f8b2c1?"
3. Server checks index → if exists, skip upload
4. If new, upload to s3://bucket/blobs/a3f8b2c1
5. Store reference: user-42/report.pdf → a3f8b2c1
Multiple users upload same file → stored once
Reference Counting#
Track how many logical files point to each physical blob. Delete the blob only when the reference count hits zero.
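A minimal sketch of that bookkeeping (an in-memory map standing in for a database counter):

```javascript
const refCounts = new Map(); // content hash → number of logical files

// A new logical file starts pointing at the blob.
function addRef(hash) {
  refCounts.set(hash, (refCounts.get(hash) ?? 0) + 1);
}

// A logical file is deleted. Returns true only when the last reference
// is gone — the caller's cue to actually DELETE the blob from storage.
function release(hash) {
  const n = (refCounts.get(hash) ?? 0) - 1;
  if (n <= 0) { refCounts.delete(hash); return true; }
  refCounts.set(hash, n);
  return false;
}

addRef('a3f8b2c1'); // user-42/report.pdf
addRef('a3f8b2c1'); // user-99/report.pdf
console.log(release('a3f8b2c1')); // false: one reference remains
console.log(release('a3f8b2c1')); // true: safe to delete the blob
```

In a real system the count lives in the database and must be updated transactionally with the reference table, or concurrent deletes can orphan or prematurely delete blobs.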
Tools and Providers#
| Tool | Type | Standout Feature |
|---|---|---|
| AWS S3 | Cloud | Industry standard, deepest ecosystem |
| Google Cloud Storage | Cloud | Tight BigQuery/ML integration |
| MinIO | Self-hosted | S3-compatible, runs on Kubernetes |
| Cloudflare R2 | Cloud | Zero egress fees |
| Backblaze B2 | Cloud | Lowest cost per GB ($0.006/GB/mo) |
Choosing the Right Tool#
- Default choice: S3 — widest tooling support, most documentation
- Egress-heavy workloads: R2 — zero egress fees save thousands per month
- Budget storage: Backblaze B2 — 1/4 the cost of S3 for archival
- Self-hosted/air-gapped: MinIO — full S3 API compatibility on your infrastructure
- GCP ecosystem: GCS — native integration with BigQuery, Vertex AI, Cloud Functions
Architecture Checklist#
- Choose access pattern — Direct S3 access vs presigned URLs vs CDN
- Set lifecycle policies — Automate tier transitions from day one
- Enable versioning — Protect against accidental overwrites and deletions
- Configure CORS — Required for browser-based direct uploads
- Implement deduplication — Content-addressable storage for user-uploaded files
- Maintain metadata index — External database for searchable file metadata
- Set up CDN — CloudFront or Cloudflare in front of your bucket
- Monitor costs — Storage, requests, and especially egress
Key Takeaways#
- Object storage is the default — Use block/file storage only for specific needs
- Presigned URLs offload your servers — Clients upload directly to S3
- Tiering is free money — Move cold data to cheaper tiers automatically
- CDN is non-negotiable — Cache static assets at the edge
- Index externally — S3 is a blob store, not a database
- Watch egress costs — Consider R2 or B2 if bandwidth is significant
Object storage underpins nearly every modern application. For deeper dives into storage patterns, CDN architecture, and system design, visit codelit.io.
This is article #170 in the Codelit engineering blog series.