
gRPC Error Handling — Status Codes, Rich Errors, Retries, and Interceptors

March 29, 2026 · 8 min read · By Codelit Team

Why gRPC errors are different#

REST uses HTTP status codes — 200, 404, 500. Simple but limited. gRPC has its own status code system with richer semantics and a built-in mechanism for attaching structured error details.

If you treat gRPC errors like HTTP errors, you lose half the power.

gRPC status codes#

gRPC defines 17 status codes, numbered 0 to 16. Every RPC ends with exactly one.

Code	Number	Meaning
OK	0	Success
CANCELLED	1	Client cancelled the request
UNKNOWN	2	Unknown error (often a server panic)
INVALID_ARGUMENT	3	Client sent bad input
DEADLINE_EXCEEDED	4	Timeout — operation took too long
NOT_FOUND	5	Resource does not exist
ALREADY_EXISTS	6	Resource already exists (conflict)
PERMISSION_DENIED	7	Caller lacks permission
RESOURCE_EXHAUSTED	8	Rate limit or quota exceeded
FAILED_PRECONDITION	9	Operation rejected due to system state
ABORTED	10	Operation aborted (concurrency conflict)
OUT_OF_RANGE	11	Operation outside valid range
UNIMPLEMENTED	12	Method not implemented
INTERNAL	13	Internal server error
UNAVAILABLE	14	Service temporarily unavailable
DATA_LOSS	15	Unrecoverable data loss
UNAUTHENTICATED	16	Missing or invalid authentication

Choosing the right code#

INVALID_ARGUMENT vs FAILED_PRECONDITION: Use INVALID_ARGUMENT when the input is bad regardless of system state (malformed email). Use FAILED_PRECONDITION when the input is valid but the system is not in the right state (deleting a non-empty directory).

UNAVAILABLE vs INTERNAL: Use UNAVAILABLE for transient failures the client should retry (service restarting). Use INTERNAL for bugs the client cannot fix by retrying.

NOT_FOUND vs PERMISSION_DENIED: If revealing that a resource exists is a security concern, return PERMISSION_DENIED whether or not the resource exists, so unauthorized callers cannot use NOT_FOUND to probe for it.

Returning errors from a server#

Basic error response (Go)#

import (
    "context"
    "database/sql"
    "errors"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

func (s *server) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
    if req.UserId == "" {
        return nil, status.Error(codes.InvalidArgument, "user_id is required")
    }

    user, err := s.db.FindUser(ctx, req.UserId)
    if err != nil {
        if errors.Is(err, sql.ErrNoRows) {
            return nil, status.Errorf(codes.NotFound, "user %s not found", req.UserId)
        }
        return nil, status.Error(codes.Internal, "failed to fetch user")
    }

    return user, nil
}

Basic error response (Python)#

import grpc
import user_pb2_grpc  # stubs generated from the service's .proto file

class UserService(user_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        if not request.user_id:
            context.abort(grpc.StatusCode.INVALID_ARGUMENT, "user_id is required")

        user = self.db.find_user(request.user_id)
        if user is None:
            context.abort(grpc.StatusCode.NOT_FOUND, f"user {request.user_id} not found")

        return user

Never leak internal details. The status message goes to the client. "failed to fetch user" is fine. "connection refused to postgres://prod-db:5432" is not.

The rich error model#

A status code and message are often not enough. The client needs to know which field was invalid, how long to wait before retrying, or what went wrong in detail.

gRPC's rich error model lets you attach structured error details using the protobuf messages defined in google/rpc/error_details.proto (package google.rpc).

Common error detail types#

BadRequest — field-level validation errors
RetryInfo — how long the client should wait before retrying
DebugInfo — stack traces and debug data (do not expose to external clients)
ErrorInfo — machine-readable error reason, domain, and metadata
QuotaFailure — which quota was exceeded
PreconditionFailure — which precondition was not met

Example: field validation errors (Go)#

import (
    "google.golang.org/genproto/googleapis/rpc/errdetails"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

func validateCreateUser(req *pb.CreateUserRequest) error {
    var violations []*errdetails.BadRequest_FieldViolation

    if req.Email == "" {
        violations = append(violations, &errdetails.BadRequest_FieldViolation{
            Field:       "email",
            Description: "email is required",
        })
    }
    if len(req.Password) < 8 {
        violations = append(violations, &errdetails.BadRequest_FieldViolation{
            Field:       "password",
            Description: "password must be at least 8 characters",
        })
    }

    if len(violations) > 0 {
        st := status.New(codes.InvalidArgument, "invalid request")
        detailed, err := st.WithDetails(&errdetails.BadRequest{
            FieldViolations: violations,
        })
        if err != nil {
            return st.Err()
        }
        return detailed.Err()
    }
    return nil
}

Example: retry info for rate limiting#

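func (s *server) ProcessOrder(ctx context.Context, req *pb.OrderRequest) (*pb.OrderResponse, error) {
    if !s.rateLimiter.Allow() {
        st := status.New(codes.ResourceExhausted, "rate limit exceeded")
        // durationpb is google.golang.org/protobuf/types/known/durationpb
        detailed, err := st.WithDetails(&errdetails.RetryInfo{
            RetryDelay: durationpb.New(30 * time.Second),
        })
        if err != nil {
            // Fall back to the plain status; calling Err() on a nil
            // *status.Status would return a nil error and hide the failure.
            return nil, st.Err()
        }
        return nil, detailed.Err()
    }
    // process order...
    return &pb.OrderResponse{}, nil
}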

Retry policies#

gRPC has built-in client-side retry support. Configure it through the service config and the client library performs the retries; call sites need no retry loop of their own.

{
  "methodConfig": [{
    "name": [{"service": "mypackage.MyService"}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "10s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
    }
  }]
}
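To make the numbers in this policy concrete, the sketch below computes the nominal delay before each retry: initialBackoff multiplied by backoffMultiplier once per earlier retry, capped at maxBackoff. The function name and the millisecond units are choices made for this example, and real gRPC clients also apply random jitter, so treat the results as nominal values rather than exact waits.

```go
package main

import (
	"fmt"
	"math"
)

// backoffMillis returns the nominal delay in milliseconds before the given
// retry (1 = first retry), before any jitter is applied.
func backoffMillis(initialMs, maxMs, multiplier float64, retry int) float64 {
	if retry < 1 {
		return 0
	}
	delay := initialMs * math.Pow(multiplier, float64(retry-1))
	return math.Min(delay, maxMs)
}

func main() {
	// Values from the retryPolicy above: 0.1s initial, 10s max, multiplier 2.
	// maxAttempts 4 means one original call plus up to three retries.
	for retry := 1; retry <= 3; retry++ {
		fmt.Printf("retry %d: nominal delay %.0f ms\n", retry, backoffMillis(100, 10000, 2, retry))
	}
}
```

With these settings the three retries are spaced at roughly 100 ms, 200 ms, and 400 ms, so the 10s cap only matters for policies with a larger multiplier or initial backoff.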

Which codes to retry#

UNAVAILABLE — always retry. The server is temporarily down.
DEADLINE_EXCEEDED — retry with caution. The operation might have partially completed.
RESOURCE_EXHAUSTED — retry after the delay from RetryInfo.
ABORTED — retry. Concurrency conflict that may resolve.
INTERNAL — usually do not retry. This is a bug, not a transient failure.
INVALID_ARGUMENT — never retry. The request is wrong.
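The list above can be encoded as a small lookup, which is handy when a client wraps calls in its own retry helper. This is a sketch only: the code names match the strings used in service config, but the returned labels are invented for this example, and treating DEADLINE_EXCEEDED as "retry only if the call is idempotent" is one reading of "retry with caution".

```go
package main

import "fmt"

// retryAdvice maps a status code name to a retry decision.
// The labels are hypothetical and exist only for this example.
func retryAdvice(code string) string {
	switch code {
	case "UNAVAILABLE", "ABORTED":
		return "retry"
	case "DEADLINE_EXCEEDED":
		// The first attempt may have partially completed.
		return "retry-if-idempotent"
	case "RESOURCE_EXHAUSTED":
		// Wait for the delay carried in RetryInfo, if present.
		return "retry-after-delay"
	default:
		// INTERNAL, INVALID_ARGUMENT, and anything unlisted: do not retry.
		return "no-retry"
	}
}

func main() {
	for _, c := range []string{"UNAVAILABLE", "RESOURCE_EXHAUSTED", "INVALID_ARGUMENT"} {
		fmt.Printf("%s: %s\n", c, retryAdvice(c))
	}
}
```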

Hedged requests#

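For latency-sensitive calls, gRPC supports hedging: the client sends additional copies of the same request without waiting for a response, one every hedgingDelay up to maxAttempts, and uses the first response that arrives. Hedge only idempotent methods, and configure with care because it multiplies load.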

{
  "methodConfig": [{
    "name": [{"service": "mypackage.ReadService"}],
    "hedgingPolicy": {
      "maxAttempts": 3,
      "hedgingDelay": "0.5s",
      "nonFatalStatusCodes": ["UNAVAILABLE", "INTERNAL"]
    }
  }]
}
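The mechanics behind a hedging policy can be sketched with goroutines and a ticker. This is not how the gRPC client is implemented; it is a standard-library illustration of the idea, and `attemptFunc`, `hedge`, and `onlySecondIsFast` are hypothetical names. For simplicity it treats every error as non-fatal, whereas gRPC stops hedging on codes not listed in nonFatalStatusCodes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// attemptFunc is a hypothetical stand-in for one RPC attempt.
type attemptFunc func(ctx context.Context, attempt int) (string, error)

type result struct {
	val string
	err error
}

// hedge starts up to maxAttempts copies of call, one every delay, and returns
// the first successful result. Attempts still in flight are cancelled via ctx.
func hedge(ctx context.Context, maxAttempts int, delay time.Duration, call attemptFunc) (string, error) {
	if maxAttempts < 1 || delay <= 0 {
		return "", errors.New("maxAttempts and delay must be positive")
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Buffered so late attempts can report and exit without blocking.
	results := make(chan result, maxAttempts)
	launched := 0
	launch := func() {
		n := launched
		launched++
		go func() {
			v, err := call(ctx, n)
			results <- result{v, err}
		}()
	}

	launch()
	ticker := time.NewTicker(delay)
	defer ticker.Stop()

	var lastErr error
	for finished := 0; finished < maxAttempts; {
		select {
		case r := <-results:
			finished++
			if r.err == nil {
				return r.val, nil
			}
			lastErr = r.err
			if launched < maxAttempts {
				launch() // a failed attempt triggers the next hedge at once
			}
		case <-ticker.C:
			if launched < maxAttempts {
				launch()
			}
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return "", lastErr
}

// onlySecondIsFast simulates backends where only attempt 1 answers quickly.
func onlySecondIsFast(ctx context.Context, attempt int) (string, error) {
	if attempt != 1 {
		select {
		case <-time.After(2 * time.Second):
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return fmt.Sprintf("attempt %d", attempt), nil
}

// demoHedge runs the sketch with three attempts spaced 10 ms apart.
func demoHedge() string {
	v, err := hedge(context.Background(), 3, 10*time.Millisecond, onlySecondIsFast)
	if err != nil {
		return "error: " + err.Error()
	}
	return v
}

func main() {
	fmt.Println(demoHedge()) // prints "attempt 1"
}
```

The slow first attempt is still running when the hedge wins, which is why the cost shows up as extra backend load rather than extra latency.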

Deadline propagation#

Every gRPC call should have a deadline. Deadlines prevent requests from hanging forever and propagate through the call chain as long as each service passes its incoming context to the calls it makes downstream.

Client (5s deadline) → Service A (4.8s remaining) → Service B (4.5s remaining) → Database

Setting deadlines (Go)#

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

resp, err := client.GetUser(ctx, &pb.GetUserRequest{UserId: "123"})
if err != nil {
    st := status.Convert(err)
    if st.Code() == codes.DeadlineExceeded {
        // handle timeout
    }
}

Propagation rules#

The deadline propagates through context — every downstream call inherits the remaining time
A downstream service cannot extend the deadline, only shorten it
When the deadline expires, all in-flight RPCs in the chain are cancelled
Always check ctx.Err() before starting expensive operations
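Two of these rules can be observed with Go's standard context package alone, which is what grpc-go uses to carry the deadline. In this sketch, `cappedByParent`, `expensiveStep`, and `skipsWhenExpired` are hypothetical helper names; no gRPC call is made.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cappedByParent reports whether a child context that requests childMs ends
// up with exactly the deadline of a parent that has parentMs remaining.
func cappedByParent(parentMs, childMs int) bool {
	parent, cancelParent := context.WithTimeout(context.Background(), time.Duration(parentMs)*time.Millisecond)
	defer cancelParent()
	child, cancelChild := context.WithTimeout(parent, time.Duration(childMs)*time.Millisecond)
	defer cancelChild()

	pd, _ := parent.Deadline()
	cd, _ := child.Deadline()
	return cd.Equal(pd)
}

// expensiveStep is a hypothetical handler step that checks the context first.
func expensiveStep(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err // deadline passed or call cancelled: skip the work
	}
	// ... do the expensive work here ...
	return nil
}

// skipsWhenExpired runs expensiveStep with a deadline that is already past.
func skipsWhenExpired() bool {
	ctx, cancel := context.WithTimeout(context.Background(), -time.Second)
	defer cancel()
	return expensiveStep(ctx) == context.DeadlineExceeded
}

func main() {
	fmt.Println(cappedByParent(5000, 10000)) // child asked for more: capped, prints true
	fmt.Println(cappedByParent(5000, 1000))  // child asked for less: shortened, prints false
	fmt.Println(skipsWhenExpired())          // prints true
}
```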

Common mistake: no deadline#

A call without a deadline waits forever. If the server is slow, the client's goroutines/threads pile up. Eventually the client runs out of resources. Always set a deadline.
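One way to guard against this is a helper that applies a fallback timeout only when the caller has not set a deadline. The sketch below uses just the standard library; `withDefaultTimeout` is a hypothetical name, and in practice the same logic could sit in a client wrapper or a client interceptor.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// withDefaultTimeout returns ctx unchanged if it already has a deadline,
// otherwise a derived context that expires after fallback.
func withDefaultTimeout(ctx context.Context, fallback time.Duration) (context.Context, context.CancelFunc) {
	if _, ok := ctx.Deadline(); ok {
		return ctx, func() {} // the caller's deadline wins; nothing to cancel
	}
	return context.WithTimeout(ctx, fallback)
}

// addsDeadlineWhenMissing checks that a bare context gains a deadline.
func addsDeadlineWhenMissing() bool {
	ctx, cancel := withDefaultTimeout(context.Background(), 5*time.Second)
	defer cancel()
	_, ok := ctx.Deadline()
	return ok
}

// keepsExistingDeadline checks that an existing deadline is left untouched.
func keepsExistingDeadline() bool {
	parent, cancelParent := context.WithTimeout(context.Background(), time.Second)
	defer cancelParent()
	ctx, cancel := withDefaultTimeout(parent, time.Hour)
	defer cancel()

	pd, _ := parent.Deadline()
	d, _ := ctx.Deadline()
	return d.Equal(pd)
}

func main() {
	fmt.Println(addsDeadlineWhenMissing()) // prints true
	fmt.Println(keepsExistingDeadline())   // prints true
}
```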

Error interceptors#

Interceptors (middleware) let you handle errors in one place instead of every RPC method.

Server-side error interceptor (Go)#

func errorInterceptor(
    ctx context.Context,
    req interface{},
    info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler,
) (interface{}, error) {
    resp, err := handler(ctx, req)
    if err != nil {
        // Log the full error internally
        log.Errorf("RPC %s failed: %v", info.FullMethod, err)

        // Record metrics
        errorCounter.WithLabelValues(
            info.FullMethod,
            status.Code(err).String(),
        ).Inc()

        // Sanitize: do not leak internal details to clients.
        // Plain Go errors convert to Unknown and carry their raw message.
        st := status.Convert(err)
        if st.Code() == codes.Internal || st.Code() == codes.Unknown {
            return nil, status.Error(st.Code(), "internal error")
        }
    }
    return resp, err
}

server := grpc.NewServer(
    grpc.UnaryInterceptor(errorInterceptor),
)

What to do in interceptors#

Log every error with full context (method, request ID, stack trace)
Record metrics — error rate by method and status code
Sanitize — strip internal details from INTERNAL and UNKNOWN errors
Translate — convert domain errors to gRPC status codes
Add metadata — inject request IDs or trace IDs into error details
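The "translate" step is often a single function that an interceptor calls before building the status. The sketch below stays self-contained by using a local `Code` type whose values mirror the numbers in the status code table; in a real server these would be codes.NotFound and friends from google.golang.org/grpc/codes. The sentinel errors and `toCode` are hypothetical names for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Code is a local stand-in for codes.Code from google.golang.org/grpc/codes.
type Code int

const (
	InvalidArgument Code = 3
	NotFound        Code = 5
	AlreadyExists   Code = 6
	Internal        Code = 13
)

// Hypothetical domain errors returned by the business layer.
var (
	ErrUserNotFound = errors.New("user not found")
	ErrEmailTaken   = errors.New("email already registered")
	ErrBadInput     = errors.New("invalid input")
)

// toCode maps a domain error to a status code. Wrapped errors still match
// because errors.Is walks the wrap chain.
func toCode(err error) Code {
	switch {
	case errors.Is(err, ErrUserNotFound):
		return NotFound
	case errors.Is(err, ErrEmailTaken):
		return AlreadyExists
	case errors.Is(err, ErrBadInput):
		return InvalidArgument
	default:
		return Internal // unrecognised errors are treated as bugs
	}
}

func main() {
	wrapped := fmt.Errorf("load profile: %w", ErrUserNotFound)
	fmt.Println(toCode(wrapped)) // prints 5
}
```

Keeping the mapping in one place means handlers can return ordinary domain errors and never import the status package directly.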

Client-side handling#

Extracting error details (Go)#

resp, err := client.CreateUser(ctx, req)
if err != nil {
    st := status.Convert(err)

    // Check the status code
    switch st.Code() {
    case codes.InvalidArgument:
        // Extract field violations
        for _, detail := range st.Details() {
            if badReq, ok := detail.(*errdetails.BadRequest); ok {
                for _, v := range badReq.FieldViolations {
                    fmt.Printf("Field %s: %s\n", v.Field, v.Description)
                }
            }
        }
    case codes.ResourceExhausted:
        // Extract retry delay
        for _, detail := range st.Details() {
            if retryInfo, ok := detail.(*errdetails.RetryInfo); ok {
                time.Sleep(retryInfo.RetryDelay.AsDuration())
                // retry the call
            }
        }
    case codes.Unavailable:
        // Transient failure: with a retry policy configured, the built-in retries are already exhausted here
    default:
        log.Errorf("unexpected error: %s — %s", st.Code(), st.Message())
    }
}

Client-side best practices#

Always check the status code before the message — codes are stable, messages are not
Extract error details for actionable information (field violations, retry delays)
Do not parse the message string — it is for humans, not machines
Handle UNAVAILABLE and DEADLINE_EXCEEDED with retries
Log the full error including details for debugging

Key takeaways#

Use the right status code — INVALID_ARGUMENT for bad input, UNAVAILABLE for transient failures, INTERNAL for bugs
Attach rich error details — field violations, retry info, and error reasons give clients actionable information
Configure retry policies in service config — retry UNAVAILABLE and DEADLINE_EXCEEDED, never retry INVALID_ARGUMENT
Always set deadlines — a call without a deadline is a resource leak waiting to happen
Use interceptors for centralized logging, metrics, and error sanitization
Never leak internal details — sanitize INTERNAL errors before they reach the client

Article #436 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.
