Database Connection Leak Detection — Find and Fix Leaks Before They Take You Down
What is a connection leak?#
Your application opens a database connection, uses it, and never returns it to the pool. The pool shrinks. Eventually all connections are exhausted. New requests block or fail. Your application is down — not because the database is overloaded, but because your code forgot to close a connection.
Connection leaks are one of the most common causes of production outages in database-backed applications. They are insidious because they happen slowly, then all at once.
Symptoms of a connection leak#
Before you know the root cause, you see these patterns:
Gradual degradation#
- Response times creep up over hours or days
- The application works fine after a restart but degrades again
- Database connections grow monotonically — they go up but never come back down
Sudden failure#
- Connection pool exhausted errors:
  - Unable to acquire connection from pool
  - PostgreSQL: FATAL: too many connections for role
  - MySQL: Too many connections
- Application threads blocked waiting for a connection
Misleading metrics#
- Database CPU and memory look fine — the database is not the bottleneck
- Application memory grows slowly (each leaked connection holds buffers)
- Load balancer health checks start failing intermittently
Detection with PostgreSQL pg_stat_activity#
PostgreSQL exposes every active connection through the pg_stat_activity view. This is your primary diagnostic tool.
Count connections by state#
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'your_database'
GROUP BY state
ORDER BY count DESC;
Healthy output shows most connections as idle (returned to pool) with a few active (executing queries). If you see hundreds of idle connections growing over time, connections are being opened but not reused properly.
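The "idle count only goes up" pattern can be checked mechanically from periodic snapshots of the query above. A minimal Python sketch (the function name, sampling approach, and the 50-connection threshold are illustrative assumptions, not from any monitoring library):

```python
def idle_growing(samples, min_increase=50):
    """samples: chronological idle-connection counts, e.g. the 'idle'
    row of the state-count query sampled once a minute.
    Flags a suspected leak when the count rises monotonically by more
    than min_increase over the window. A healthy pool fluctuates;
    a leaking one only climbs."""
    if len(samples) < 2:
        return False
    monotonic = all(b >= a for a, b in zip(samples, samples[1:]))
    return monotonic and samples[-1] - samples[0] >= min_increase
```

Feed it the same window you would eyeball on a dashboard; a dip anywhere in the window clears the alarm.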
Find long-lived idle connections#
SELECT pid, usename, application_name, client_addr,
backend_start, state_change,
now() - state_change AS idle_duration,
query
FROM pg_stat_activity
WHERE state = 'idle'
AND now() - state_change > interval '10 minutes'
ORDER BY idle_duration DESC;
Connections idle for longer than your pool's max idle time are likely leaked. The application_name and client_addr fields tell you which service is leaking.
Track connection age#
SELECT pid, usename, application_name,
backend_start,
now() - backend_start AS connection_age
FROM pg_stat_activity
WHERE datname = 'your_database'
ORDER BY connection_age DESC
LIMIT 20;
In a healthy pool, connections are recycled. If you see connections that are hours or days old while the pool's max lifetime is 30 minutes, those connections were leaked and never returned.
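Given rows from the query above, flagging never-recycled connections is a short filter. A hedged sketch (assumes `(pid, backend_start)` pairs and the 30-minute max lifetime used in the text):

```python
from datetime import datetime, timedelta, timezone

def suspect_leaks(connections, max_lifetime=timedelta(minutes=30)):
    """connections: (pid, backend_start) pairs from pg_stat_activity.
    Any connection older than the pool's configured max lifetime was
    never recycled by the pool, so it is a leak suspect."""
    now = datetime.now(timezone.utc)
    return [pid for pid, started in connections if now - started > max_lifetime]
```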
Identify the leaking query#
SELECT pid, query, state,
now() - query_start AS query_duration,
now() - backend_start AS connection_age
FROM pg_stat_activity
WHERE state != 'idle'
AND now() - query_start > interval '5 minutes'
ORDER BY query_duration DESC;
Long-running queries that never complete can hold connections indefinitely. Look for queries stuck in idle in transaction state — this means a transaction was started but never committed or rolled back.
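The query above only catches connections actively running something; `idle in transaction` sessions can be flagged from the same view. A sketch (the tuple shape and the 5-minute threshold are illustrative):

```python
from datetime import datetime, timedelta, timezone

def stuck_transactions(rows, threshold=timedelta(minutes=5)):
    """rows: (pid, state, state_change) tuples from pg_stat_activity.
    Returns pids stuck in 'idle in transaction', i.e. a transaction
    was opened but never committed or rolled back, pinning both a
    connection and any locks it holds."""
    now = datetime.now(timezone.utc)
    return [
        pid for pid, state, changed in rows
        if state == "idle in transaction" and now - changed > threshold
    ]
```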
Connection pool monitoring#
Your connection pool is the chokepoint. Monitor it aggressively.
HikariCP (Java)#
HikariCP exposes metrics via JMX or Micrometer:
- hikaricp_connections_active: currently in use
- hikaricp_connections_idle: available in pool
- hikaricp_connections_pending: threads waiting for a connection
- hikaricp_connections_timeout_total: connection acquisition timeouts
Alert when pending is greater than 0 for more than 30 seconds. This means demand exceeds supply.
Alert when active equals maximum_pool_size for more than 60 seconds. The pool is saturated.
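Both alert conditions have the form "predicate holds continuously for N seconds". A minimal sketch of that check over timestamped metric samples (all names and numbers here are illustrative):

```python
def sustained(samples, predicate, duration):
    """samples: chronological (timestamp_seconds, value) pairs.
    True if predicate(value) has held continuously for at least
    `duration` seconds up to and including the latest sample."""
    if not samples:
        return False
    start = None
    for ts, value in samples:
        if predicate(value):
            start = ts if start is None else start  # streak continues
        else:
            start = None  # streak broken
    return start is not None and samples[-1][0] - start >= duration

# pending > 0 for more than 30 seconds: demand exceeds supply
pending = [(0, 0), (10, 2), (20, 3), (30, 1), (45, 4)]
alert = sustained(pending, lambda v: v > 0, 30)
```

The same helper covers the saturation rule by passing `lambda v: v >= max_pool_size` and a 60-second duration.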
pgBouncer#
If you use pgBouncer as an external connection pooler:
SHOW POOLS;
SHOW CLIENTS;
SHOW SERVERS;
Key fields from SHOW POOLS:
- cl_active: client connections actively linked to a server connection
- cl_waiting: client connections waiting for a server connection
- sv_active: server connections in use
- sv_idle: server connections available
Alert when cl_waiting is greater than 0 persistently.
Generic metrics to track#
Regardless of your pool implementation, track these:
| Metric | Healthy | Leaking |
|---|---|---|
| Active connections | Fluctuates with load | Grows monotonically |
| Idle connections | Stable | Shrinks to zero |
| Pending requests | 0 or brief spikes | Sustained and growing |
| Connection age | Below max lifetime | Exceeds max lifetime |
| Pool utilization | 20-70% | Approaching 100% |
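The table collapses into a rough triage function. A sketch using the article's rules of thumb (the 90% cutoff is an assumed stand-in for "approaching 100%"):

```python
def pool_looks_leaky(utilization, pending_sustained, max_conn_age, max_lifetime):
    """Triage from the table's signals: near-full utilization,
    sustained pending requests, or connections outliving the pool's
    configured max lifetime all point at a leak."""
    return (
        utilization > 0.9
        or pending_sustained
        or max_conn_age > max_lifetime
    )
```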
Leak prevention patterns#
Always use try-with-resources (or equivalent)#
The most common leak: opening a connection and not closing it in the error path.
Java:
// Correct — connection is always returned
try (Connection conn = dataSource.getConnection()) {
    // use connection
} // auto-closed even if exception is thrown
Python:
# Correct — context manager returns connection
with engine.connect() as conn:
    result = conn.execute(query)
# connection returned to pool here
Go:
// Correct — defer ensures close
rows, err := db.Query(query)
if err != nil {
    return err
}
defer rows.Close() // always close rows
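To make the failure mode concrete, here is a toy pool (purely illustrative, not a real driver) that only counts checkouts. The happy-path version leaks one connection when the query raises; the context-manager version does not:

```python
import contextlib

class ToyPool:
    """Minimal stand-in for a connection pool: tracks checkouts only."""
    def __init__(self):
        self.checked_out = 0

    def get(self):
        self.checked_out += 1
        return object()  # pretend connection

    def release(self, conn):
        self.checked_out -= 1

    @contextlib.contextmanager
    def connection(self):
        conn = self.get()
        try:
            yield conn
        finally:
            self.release(conn)  # returned even on exception

def leaky(pool):
    # Leaky pattern: the error path skips release().
    conn = pool.get()
    raise RuntimeError("query failed")
    pool.release(conn)  # never reached

def safe(pool):
    # Safe pattern: cleanup is guaranteed by the context manager.
    with pool.connection():
        raise RuntimeError("query failed")
```

Run both against the same pool and only `leaky` leaves `checked_out` incremented; this is exactly the error-path leak the try-with-resources pattern eliminates.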
Set aggressive pool timeouts#
Do not let connections live forever. Configure:
- Max lifetime: 30 minutes. Connections older than this are closed and replaced.
- Idle timeout: 10 minutes. Idle connections beyond this are removed.
- Connection timeout: 5 seconds. If a connection cannot be acquired in 5 seconds, fail fast rather than queue indefinitely.
- Leak detection threshold: 60 seconds. If a connection is checked out for longer than this, log a warning with the stack trace.
Enable leak detection in your pool#
HikariCP has built-in leak detection:
spring:
  datasource:
    hikari:
      leak-detection-threshold: 60000 # 60 seconds
When a connection is held longer than the threshold, HikariCP logs the stack trace of the code that borrowed it. This pinpoints the exact line that forgot to return the connection.
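Outside the JVM you can approximate the same idea by recording a stack trace at checkout time and reporting any connection held past the threshold. A sketch (the class and method names are invented for illustration, not a real pool API):

```python
import time
import traceback

class LeakDetectingPool:
    """HikariCP-style leak detection, sketched in Python: remember
    when and where each connection was borrowed, and report any
    held past the threshold along with the borrowing stack trace."""
    def __init__(self, threshold_seconds=60):
        self.threshold = threshold_seconds
        self.borrowed = {}  # conn_id -> (checkout_time, stack)

    def checkout(self, conn_id):
        self.borrowed[conn_id] = (time.monotonic(), traceback.format_stack())

    def checkin(self, conn_id):
        self.borrowed.pop(conn_id, None)

    def leaks(self):
        now = time.monotonic()
        return {
            conn_id: stack
            for conn_id, (t, stack) in self.borrowed.items()
            if now - t > self.threshold
        }
```

The captured stack points at the call site that borrowed the connection, which is the line that forgot to return it.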
Use statement timeouts#
Set a maximum query execution time at the database level:
-- PostgreSQL: kill queries running longer than 30 seconds
ALTER ROLE app_user SET statement_timeout = '30s';
This prevents a single runaway query from holding a connection indefinitely.
Wrap connection acquisition in middleware#
In web applications, acquire the connection at the start of the request and return it at the end — regardless of what happens in between:
@app.middleware("http")
async def db_session_middleware(request, call_next):
    async with get_db_session() as session:
        request.state.db = session
        response = await call_next(request)
    # session is returned here, guaranteed
    return response
Automated alerting#
Prometheus + Alertmanager rules#
groups:
  - name: connection_pool_alerts
    rules:
      - alert: ConnectionPoolExhausted
        expr: hikaricp_connections_active / hikaricp_connections_max == 1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Connection pool is fully exhausted"
      - alert: ConnectionPoolLeakSuspected
        expr: delta(hikaricp_connections_active[1h]) > 10 and hikaricp_connections_idle == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Possible connection leak — active growing, idle at zero"
PostgreSQL connection count alert#
- alert: PostgreSQLTooManyConnections
  expr: pg_stat_activity_count > (pg_settings_max_connections * 0.8)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "PostgreSQL connections at 80% of max_connections"
Dashboard checklist#
Build a Grafana dashboard with these panels:
- Connection pool utilization — active / max over time
- Pending connection requests — should be zero
- Connection age distribution — histogram of how long connections have been alive
- pg_stat_activity by state — stacked area chart of idle, active, idle in transaction
- Connection acquisition latency — p50, p95, p99
Emergency response#
When connections are exhausted in production:
- Restart the leaking service — immediate relief, buys time
- Kill idle connections in PostgreSQL:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND now() - state_change > interval '30 minutes';
- Increase max_connections temporarily — not a fix, but buys time
- Deploy a connection pooler (pgBouncer) if you do not have one
- Find and fix the leak using the detection methods above
Key takeaways#
- Connection leaks degrade slowly then fail suddenly — monitor pool utilization before it hits 100%
- pg_stat_activity is your best friend — query it regularly to spot idle and long-lived connections
- Always use try-with-resources or context managers to guarantee connection return
- Enable leak detection in your pool (HikariCP's leak-detection-threshold logs the offending stack trace)
- Set aggressive timeouts — max lifetime, idle timeout, statement timeout
- Alert on pool saturation and monotonically growing active connections before users notice
Article #423 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.