Tomcat Graceful Shutdown — Connection Draining and Zero-Downtime Redeployment

When operating a production service, you will inevitably encounter situations requiring a Tomcat restart — deployments, emergency patches, or server maintenance. If you simply force-kill the Tomcat process (kill -9) without any preparation, all in-flight HTTP requests are immediately terminated, and clients experience 502 Bad Gateway or Connection Reset errors. Graceful Shutdown is the key technique to prevent this problem: it waits for existing requests to complete before safely shutting down the server.

Why Graceful Shutdown Is Necessary

Looking at the problems that occur with a forced kill (SIGKILL) in a real production environment makes it clear how dangerous abrupt termination can be.

Problems caused by forced termination:

  • Loss of in-progress HTTP response data
  • Incomplete database transaction rollbacks causing data inconsistencies
  • File corruption from interrupted writes
  • Loss of user session data
  • Catastrophic errors such as duplicate payments or duplicate processing when external API calls are cut off mid-flight

Graceful Shutdown stops accepting new requests (by removing the instance from the load balancer), waits for all currently processing requests to complete, and then safely shuts down.
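The difference between SIGTERM and SIGKILL is easy to demonstrate with a toy worker: a process that traps SIGTERM can finish its current unit of work before exiting, while SIGKILL gives it no chance. A minimal, self-contained sketch (the worker script and paths are illustrative only):

```shell
# A toy "request handler" that traps SIGTERM and finishes its in-flight
# work before exiting -- the behavior Graceful Shutdown relies on.
cat > /tmp/toy-worker.sh <<'EOF'
#!/bin/bash
trap 'echo "SIGTERM received, finishing in-flight work..."; sleep 1; echo "work complete"; exit 0' TERM
while true; do sleep 0.2; done
EOF
chmod +x /tmp/toy-worker.sh

/tmp/toy-worker.sh > /tmp/toy-worker.log &
WORKER_PID=$!
sleep 1

kill -TERM "$WORKER_PID"   # graceful: the trap runs and the worker exits 0
wait "$WORKER_PID"
WORKER_EXIT=$?
echo "worker exit code: $WORKER_EXIT"
cat /tmp/toy-worker.log
```

Had the worker been killed with `kill -KILL` instead, the trap would never run and the "work complete" line would never be written.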

Tomcat 9+ server.xml Graceful Shutdown Configuration

Tomcat's own Graceful Shutdown is configured in server.xml.

<!-- /opt/tomcat/conf/server.xml -->
<Server port="8005" shutdown="SHUTDOWN">

  <!-- Connector configuration for Graceful Shutdown:
         maxKeepAliveRequests  maximum requests per keep-alive connection
         keepAliveTimeout      keep-alive connection timeout (ms)
         maxThreads            maximum number of worker threads
         minSpareThreads       minimum number of spare threads
         acceptCount           accept queue (backlog) size for pending connections
         connectionLinger      socket linger on close; -1 disables linger (default) -->
  <Connector port="8080"
             protocol="HTTP/1.1"
             connectionTimeout="20000"
             redirectPort="8443"
             maxKeepAliveRequests="100"
             keepAliveTimeout="5000"
             maxThreads="200"
             minSpareThreads="10"
             acceptCount="100"
             connectionLinger="-1" />

  <Engine name="Catalina" defaultHost="localhost">
    <Host name="localhost" appBase="webapps"
          unpackWARs="true" autoDeploy="false">
    </Host>
  </Engine>
</Server>

Tomcat shuts down when the configured shutdown string (SHUTDOWN above) is sent to the shutdown port (default 8005); this is what catalina.sh stop does under the hood. On its own, however, this stop sequence does not wait for in-flight requests to finish, so implementing true Graceful Shutdown requires additional configuration and scripting.
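For reference, the shutdown string can be delivered by any plain TCP client on the Tomcat host itself. The helper below is a hypothetical sketch and is defined but not invoked, since it needs a live Tomcat listening on the shutdown port:

```shell
# Hypothetical helper: send the shutdown command string to Tomcat's
# shutdown port. The string must match the shutdown="..." attribute in
# server.xml ("SHUTDOWN" in the configuration above). Not invoked here.
send_shutdown_command() {
  local port="${1:-8005}"
  local command="${2:-SHUTDOWN}"
  # bash's /dev/tcp pseudo-device opens a TCP connection to a host/port
  echo "$command" > "/dev/tcp/localhost/${port}"
}
```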

Connector-Level Connection Draining Configuration

Connection Draining is the process of waiting for existing connections to close naturally. When using the Tomcat NIO Connector, configure it as follows.

<!-- Connection Draining optimization with the NIO Connector:
       keepAliveTimeout  shortened to 3000 ms so idle keep-alive
                         connections close quickly and drain faster
       maxConnections    upper bound on concurrent connections
       connectionLinger  socket linger on close; -1 disables linger (default) -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           redirectPort="8443"
           maxThreads="200"
           minSpareThreads="10"
           keepAliveTimeout="3000"
           maxConnections="1000"
           connectionLinger="-1" />
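Whether draining is actually making progress can be checked from the shell by counting ESTABLISHED sockets on the connector port and watching the number fall to zero. A small sketch using ss (Linux; port 8080 is an assumption matching the connector above):

```shell
# Count ESTABLISHED TCP connections on a connector port; useful for
# watching Connection Draining progress while Tomcat shuts down.
count_established() {
  local port="${1:-8080}"
  # -H: no header, -t: TCP, -n: numeric; errors (e.g. missing ss) count as 0
  ss -Htn state established "( sport = :${port} )" 2>/dev/null | wc -l
}

ACTIVE_CONNS=$(count_established 8080)
echo "active connections on :8080: ${ACTIVE_CONNS}"
```

Running this in a loop during shutdown shows the connection count decaying as keep-alive connections time out and in-flight requests complete.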

Connection Draining in the Shutdown Script

#!/bin/bash
# /opt/tomcat/bin/graceful-shutdown.sh

TOMCAT_HOME=/opt/tomcat
TOMCAT_PID_FILE=$TOMCAT_HOME/tomcat.pid
DRAIN_WAIT=30         # Connection draining wait time (seconds)
SHUTDOWN_TIMEOUT=60   # Maximum shutdown wait time (seconds)

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting Graceful Shutdown"

# 1. Get Tomcat PID
if [ -f "$TOMCAT_PID_FILE" ]; then
  TOMCAT_PID=$(cat "$TOMCAT_PID_FILE")
else
  TOMCAT_PID=$(pgrep -f 'catalina' || true)
fi

if [ -z "$TOMCAT_PID" ]; then
  echo "Tomcat is not running."
  exit 0
fi

echo "Tomcat PID: $TOMCAT_PID"

# 2. Send SIGTERM (begin Graceful Shutdown)
kill -SIGTERM "$TOMCAT_PID"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] SIGTERM sent, waiting for draining..."

# 3. Wait for connection draining
sleep "$DRAIN_WAIT"

# 4. Confirm process termination (wait up to SHUTDOWN_TIMEOUT seconds)
ELAPSED=0
while kill -0 "$TOMCAT_PID" 2>/dev/null; do
  if [ "$ELAPSED" -ge "$SHUTDOWN_TIMEOUT" ]; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Timeout! Forcing kill"
    kill -SIGKILL "$TOMCAT_PID"
    break
  fi
  sleep 5
  ELAPSED=$((ELAPSED + 5))
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] Still waiting... (${ELAPSED}s elapsed)"
done

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Tomcat shutdown complete"

Using JVM Shutdown Hooks

Java applications can detect shutdown signals through JVM Shutdown Hooks and perform cleanup tasks.

// GracefulShutdownManager.java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulShutdownManager {

    private static final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
    private static final int SHUTDOWN_TIMEOUT_SECONDS = 30;

    public static void registerShutdownHook(ExecutorService executorService) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("[ShutdownHook] JVM shutdown signal detected, starting Graceful Shutdown");
            isShuttingDown.set(true);

            // Stop accepting new tasks
            executorService.shutdown();

            try {
                // Wait for existing tasks to complete
                if (!executorService.awaitTermination(SHUTDOWN_TIMEOUT_SECONDS, TimeUnit.SECONDS)) {
                    System.err.println("[ShutdownHook] Timeout! Forcing shutdown");
                    executorService.shutdownNow();
                    if (!executorService.awaitTermination(10, TimeUnit.SECONDS)) {
                        System.err.println("[ShutdownHook] ExecutorService did not terminate");
                    }
                }
            } catch (InterruptedException e) {
                executorService.shutdownNow();
                Thread.currentThread().interrupt();
            }

            System.out.println("[ShutdownHook] Graceful Shutdown complete");
        }, "shutdown-hook-thread"));
    }

    public static boolean isShuttingDown() {
        return isShuttingDown.get();
    }
}

// HealthCheckServlet.java: reflect shutdown state in the health check endpoint
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class HealthCheckServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("application/json");
        if (GracefulShutdownManager.isShuttingDown()) {
            // Returning 503 tells the load balancer to stop sending traffic here
            resp.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            resp.getWriter().write("{\"status\": \"shutting_down\"}");
        } else {
            resp.setStatus(HttpServletResponse.SC_OK);
            resp.getWriter().write("{\"status\": \"healthy\"}");
        }
    }
}
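The load balancer consumes this endpoint by polling it and acting on the status code, and the same check is handy for manual verification during a deployment. A small sketch (the URL is an assumption; curl reports 000 when the server is unreachable):

```shell
# Probe a health endpoint the way a load balancer health check would:
# 200 -> keep in rotation, 503 -> draining, 000 -> down or unreachable.
probe_health() {
  local url="${1:-http://localhost:8080/health}"
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url" || true
}

STATUS=$(probe_health "http://localhost:8080/health")
case "$STATUS" in
  200) echo "healthy: keep in rotation" ;;
  503) echo "shutting down: remove from rotation" ;;
  *)   echo "down or unreachable (code: ${STATUS:-000})" ;;
esac
```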

Spring Boot Embedded Tomcat Graceful Shutdown

Since Spring Boot 2.3.0, embedded Tomcat Graceful Shutdown can be simply configured in application.yml.

# application.yml
server:
  # Enable Graceful Shutdown
  shutdown: graceful
  port: 8080
  tomcat:
    threads:
      max: 200
      min-spare: 10
    connection-timeout: 20s
    keep-alive-timeout: 5s

spring:
  lifecycle:
    # Maximum wait time for Graceful Shutdown
    timeout-per-shutdown-phase: 30s

// Custom shutdown event handler
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.stereotype.Component;

@Component
public class AppShutdownListener implements ApplicationListener<ContextClosedEvent> {

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        System.out.println("[Spring] Application context close event received");
        // Clean up DB connections, external API connections, etc.
        performCleanup();
    }

    private void performCleanup() {
        System.out.println("[Spring] Performing resource cleanup...");
        // Actual cleanup logic
    }
}

Removing a Node from the Load Balancer (Nginx)

Before shutting down Tomcat, you must first remove the target node from the load balancer (Nginx). This prevents new requests from being routed to the server being shut down.

# /etc/nginx/conf.d/upstream.conf
upstream tomcat_cluster {
    least_conn;

    # Server 1 (deployment target)
    server 192.168.1.10:8080 weight=1;

    # Server 2 (running)
    server 192.168.1.11:8080 weight=1;

    # Backup server
    server 192.168.1.12:8080 backup;

    keepalive 32;
}

#!/bin/bash
# nginx-drain-node.sh: draining a node from the Nginx upstream

NGINX_CONF="/etc/nginx/conf.d/upstream.conf"
TARGET_SERVER="192.168.1.10:8080"
DRAIN_WAIT=30

# 1. Mark the target server as 'down'
# With Nginx Plus (dynamic upstream management via the API module):
#   curl -X PATCH http://localhost/api/9/http/upstreams/tomcat_cluster/servers/0 \
#        -d '{"down": true}' -H "Content-Type: application/json"

# With open-source Nginx: edit the config file and reload
sed -i "s|server ${TARGET_SERVER} weight=1;|server ${TARGET_SERVER} down;|g" "$NGINX_CONF"

# 2. Validate the Nginx configuration; roll back the edit on failure
if ! nginx -t; then
  echo "Nginx config error! Canceling drain"
  sed -i "s|server ${TARGET_SERVER} down;|server ${TARGET_SERVER} weight=1;|g" "$NGINX_CONF"
  exit 1
fi

# 3. Reload Nginx (zero-downtime)
nginx -s reload
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Removed ${TARGET_SERVER} from Nginx"

# 4. Wait for existing connections to complete
echo "Waiting for draining... (${DRAIN_WAIT}s)"
sleep "$DRAIN_WAIT"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Draining complete, Tomcat can now be shut down"

Complete Redeployment Script

A script that automates the full process: drain → shutdown → deploy → health check → restore upstream.

#!/bin/bash
# full-deploy.sh: complete zero-downtime redeployment script

set -euo pipefail

# Configuration
TARGET_SERVER="192.168.1.10"
NGINX_HOST="192.168.1.1"          # host running the Nginx load balancer
TOMCAT_PORT=8080
TOMCAT_HOME="/opt/tomcat"
WAR_FILE="/deploy/app.war"
NGINX_CONF="/etc/nginx/conf.d/upstream.conf"
HEALTH_CHECK_URL="http://${TARGET_SERVER}:${TOMCAT_PORT}/health"
DRAIN_WAIT=30
HEALTH_CHECK_RETRIES=12
HEALTH_CHECK_INTERVAL=5

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# Phase 1: Remove the node from the Nginx upstream (runs on the LB host)
log "Phase 1: Removing ${TARGET_SERVER} from Nginx upstream"
ssh deploy@${NGINX_HOST} "
  sed -i 's|server ${TARGET_SERVER}:${TOMCAT_PORT} weight=1;|server ${TARGET_SERVER}:${TOMCAT_PORT} down;|g' ${NGINX_CONF}
  nginx -t && nginx -s reload
"
log "Nginx reload complete"

# Phase 2: Wait for connection draining
log "Phase 2: Waiting for connection draining (${DRAIN_WAIT}s)"
sleep $DRAIN_WAIT

# Phase 3: Tomcat Graceful Shutdown
log "Phase 3: Starting Tomcat Graceful Shutdown"
ssh deploy@${TARGET_SERVER} "
  TOMCAT_PID=\$(pgrep -f catalina || true)
  if [ -n \"\$TOMCAT_PID\" ]; then
    kill -SIGTERM \$TOMCAT_PID
    echo 'SIGTERM sent'
    for i in \$(seq 1 12); do
      if ! kill -0 \$TOMCAT_PID 2>/dev/null; then
        echo 'Tomcat shutdown complete'
        break
      fi
      echo \"Still waiting... (\$((i * 5))s)\"
      sleep 5
    done
    if kill -0 \$TOMCAT_PID 2>/dev/null; then
      kill -SIGKILL \$TOMCAT_PID
      echo 'Forced kill executed'
    fi
  fi
"

# Phase 4: Deploy the new WAR
log "Phase 4: Deploying new WAR file"
scp $WAR_FILE deploy@${TARGET_SERVER}:${TOMCAT_HOME}/webapps/ROOT.war
ssh deploy@${TARGET_SERVER} "${TOMCAT_HOME}/bin/startup.sh"
log "Tomcat start complete"

# Phase 5: Health check
log "Phase 5: Starting health check (max $((HEALTH_CHECK_RETRIES * HEALTH_CHECK_INTERVAL))s)"
for i in $(seq 1 $HEALTH_CHECK_RETRIES); do
  HTTP_STATUS=$(curl -s -o /dev/null -w '%{http_code}' "$HEALTH_CHECK_URL" || true)
  HTTP_STATUS=${HTTP_STATUS:-000}
  if [ "$HTTP_STATUS" == "200" ]; then
    log "Health check passed! (HTTP $HTTP_STATUS)"
    break
  fi
  log "Health check attempt ${i}/${HEALTH_CHECK_RETRIES}: HTTP ${HTTP_STATUS}"
  if [ $i -eq $HEALTH_CHECK_RETRIES ]; then
    log "Health check failed! Please perform a rollback."
    exit 1
  fi
  sleep $HEALTH_CHECK_INTERVAL
done

# Phase 6: Restore the node in the Nginx upstream (runs on the LB host)
log "Phase 6: Restoring ${TARGET_SERVER} in Nginx upstream"
ssh deploy@${NGINX_HOST} "
  sed -i 's|server ${TARGET_SERVER}:${TOMCAT_PORT} down;|server ${TARGET_SERVER}:${TOMCAT_PORT} weight=1;|g' ${NGINX_CONF}
  nginx -t && nginx -s reload
"

log "Redeployment complete! ${TARGET_SERVER} is now receiving traffic."
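The failure branch in Phase 5 leaves the rollback to the operator. One way to make it scriptable is to copy the previous WAR aside (for example to ROOT.war.bak) before Phase 4 and restore it on failure; the helper below is a sketch under that assumption, with the backup name, host, and paths all hypothetical:

```shell
# Hypothetical rollback helper: restores a WAR previously saved as
# ROOT.war.bak and restarts Tomcat. Defined only; not invoked here.
rollback_deploy() {
  local server="$1"
  local tomcat_home="${2:-/opt/tomcat}"
  ssh "deploy@${server}" "
    if [ -f ${tomcat_home}/webapps/ROOT.war.bak ]; then
      mv ${tomcat_home}/webapps/ROOT.war.bak ${tomcat_home}/webapps/ROOT.war
      rm -rf ${tomcat_home}/webapps/ROOT    # drop the exploded bad build
      ${tomcat_home}/bin/shutdown.sh || true
      ${tomcat_home}/bin/startup.sh
    else
      echo 'No backup WAR found; manual intervention required'
      exit 1
    fi
  "
}
```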

Graceful Shutdown with HAProxy

When using HAProxy as a load balancer, you can dynamically change server state through the Stats Socket.

# /etc/haproxy/haproxy.cfg relevant configuration
global
    # Enable the stats socket (for dynamic server state changes)
    stats socket /var/run/haproxy/admin.sock mode 660 level admin

backend tomcat_backend
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200

    server tomcat1 192.168.1.10:8080 check inter 2s fall 3 rise 2
    server tomcat2 192.168.1.11:8080 check inter 2s fall 3 rise 2

#!/bin/bash
# haproxy-graceful-drain.sh

HAPROXY_SOCK="/var/run/haproxy/admin.sock"
TARGET_BACKEND="tomcat_backend"
TARGET_SERVER="tomcat1"
DRAIN_WAIT=30

# 1. Set the server to DRAIN state (existing sessions continue, new ones are refused)
echo "set server ${TARGET_BACKEND}/${TARGET_SERVER} state drain" | \
  socat stdio "$HAPROXY_SOCK"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Set ${TARGET_SERVER} to DRAIN state"

# 2. Check the current session count
#    (scur, current sessions, is the 5th column of the 'show stat' CSV)
check_sessions() {
  echo "show stat" | socat stdio "$HAPROXY_SOCK" | \
    awk -F, -v b="$TARGET_BACKEND" -v s="$TARGET_SERVER" '$1 == b && $2 == s { print $5 }'
}

# 3. Wait until all sessions are complete
echo "Waiting for session draining..."
ELAPSED=0
while true; do
  SESSIONS=$(check_sessions)
  echo "Remaining sessions: ${SESSIONS} (${ELAPSED}s elapsed)"
  if [ "$SESSIONS" == "0" ] || [ "$ELAPSED" -ge "$DRAIN_WAIT" ]; then
    break
  fi
  sleep 5
  ELAPSED=$((ELAPSED + 5))
done

# 4. Set the server to maintenance state
echo "set server ${TARGET_BACKEND}/${TARGET_SERVER} state maint" | \
  socat stdio "$HAPROXY_SOCK"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Set ${TARGET_SERVER} to MAINT state"

# To restore after deployment:
#   echo "set server tomcat_backend/tomcat1 state ready" | socat stdio /var/run/haproxy/admin.sock

Graceful Shutdown in Docker Compose

# docker-compose.yml
version: '3.8'

services:
  tomcat:
    image: tomcat:9.0-jdk17
    ports:
      - "8080:8080"
    volumes:
      - ./webapps:/usr/local/tomcat/webapps
      - ./conf/server.xml:/usr/local/tomcat/conf/server.xml
    environment:
      - JAVA_OPTS=-Dserver.shutdown=graceful
    # Grace period: docker stop sends SIGTERM, then SIGKILL after this duration
    stop_grace_period: 60s
    stop_signal: SIGTERM
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

  nginx:
    image: nginx:1.25
    ports:
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      tomcat:
        condition: service_healthy
    stop_grace_period: 30s

# Restarting a specific service in Docker Compose
# 1. Stop the service (SIGTERM -> wait up to 60s -> SIGKILL)
docker compose stop tomcat

# 2. Update to the new image and start
docker compose up -d tomcat

# 3. Wait until the container's health check reports healthy
#    (note: 'docker compose wait' blocks until a container *stops*,
#    so poll the health status instead)
until [ "$(docker inspect --format '{{.State.Health.Status}}' "$(docker compose ps -q tomcat)")" = "healthy" ]; do
  sleep 5
done

Kubernetes Environment Hints

In Kubernetes, use terminationGracePeriodSeconds and preStop hooks.

# Kubernetes Deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tomcat-app
  template:
    metadata:
      labels:
        app: tomcat-app
    spec:
      # Maximum wait time when terminating a Pod (default 30s)
      terminationGracePeriodSeconds: 60
      containers:
        - name: tomcat
          image: tomcat:9.0-jdk17
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                # Wait for draining before the container receives SIGTERM
                command: ["/bin/sh", "-c", "sleep 15"]
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10

During a Kubernetes Rolling Update:

  1. Wait until the new Pod is in Ready state
  2. Remove the old Pod from Service endpoints and run its preStop hook (the sleep 15 provides draining time while traffic moves away)
  3. Send SIGTERM to the old Pod's containers
  4. If the Pod has not terminated within terminationGracePeriodSeconds, send SIGKILL
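The rollout itself is driven with kubectl. The commands below sketch triggering a rolling restart of the Deployment above and blocking until it completes; they are wrapped in a function and not executed here, since they need access to a live cluster:

```shell
# Trigger a rolling restart of the Deployment and wait for it to finish.
# Requires kubectl configured against the target cluster; not invoked here.
rolling_restart() {
  local deployment="${1:-tomcat-app}"
  kubectl rollout restart "deployment/${deployment}"
  kubectl rollout status "deployment/${deployment}" --timeout=300s
}
```

`kubectl rollout status` exits non-zero if the rollout does not complete within the timeout, which makes the function usable as a gate in a CI/CD pipeline.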

Testing Graceful Shutdown Under Load

A test to verify that Graceful Shutdown works correctly under real traffic conditions.

#!/bin/bash
# graceful-shutdown-test.sh

TARGET_URL="http://192.168.1.10:8080/api/test"
TOMCAT_PID=$(pgrep -f catalina)

echo "=== Graceful Shutdown Test Start ==="
echo "Tomcat PID: $TOMCAT_PID"

# 1. Send continuous requests in the background
ab -n 1000 -c 50 -k "$TARGET_URL" > /tmp/ab_result.txt 2>&1 &
AB_PID=$!
echo "Load generation started (ab PID: $AB_PID)"

# 2. Trigger Graceful Shutdown after 3 seconds
sleep 3
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting Graceful Shutdown"
kill -SIGTERM "$TOMCAT_PID"

# 3. Wait for ab to complete
wait "$AB_PID"

# 4. Analyze results
echo ""
echo "=== Test Results ==="
grep -E "(Failed requests|Non-2xx|Complete requests)" /tmp/ab_result.txt

echo ""
echo "If Graceful Shutdown worked correctly, 'Failed requests' should be 0."

Test success criteria:

  • Failed requests: 0 — all requests completed normally
  • Non-2xx responses: 0 — no error responses such as 503
  • New requests sent after shutdown begins are routed by Nginx to another upstream server instead of failing with 502

With a properly implemented Graceful Shutdown, users experience zero errors whether the restart is for a deployment, an emergency patch, or server maintenance.