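Tomcat Graceful Shutdown: Connection Draining and Zero-Downtime Redeployment

When you run a service, you will inevitably need to restart Tomcat for deployments, emergency patches, or server maintenance. If you force-kill the Tomcat process (kill -9) without any preparation, every HTTP request in flight is aborted, and clients see 502 Bad Gateway or Connection Reset errors. Graceful shutdown is the core technique for avoiding this: the server waits for existing requests to complete and then shuts down safely.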

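Why Graceful Shutdown Is Needed

The problems seen in real production environments show how dangerous a plain forced kill (SIGKILL) is.

Problems caused by a forced kill:

Loss of response data for in-flight HTTP requests
Data inconsistency when database transactions are cut off before they commit or roll back cleanly
File corruption when a write is interrupted midway
Loss of user session data
Critical errors such as duplicate payments or duplicate processing when an external API call is cut off midway

With graceful shutdown, the server stops receiving new requests (it is removed at the load balancer), waits for requests already in progress to complete, and shuts down safely once they have all finished.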

Tomcat 9+ `server.xml` Graceful Shutdown Configuration

Tomcat's own shutdown behavior is configured in server.xml.

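<!-- /opt/tomcat/conf/server.xml -->
<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">

    <!--
      Connector settings relevant to graceful shutdown.
      XML does not allow comments inside a tag, so the attributes are described here:
        maxKeepAliveRequests: maximum number of requests served on one keep-alive connection
        keepAliveTimeout:     keep-alive idle timeout (ms)
        maxThreads:           maximum number of request-processing threads
        minSpareThreads:      minimum number of idle threads kept running
        acceptCount:          queue length for incoming connections waiting to be accepted
        connectionLinger:     socket SO_LINGER in seconds; -1 disables it
    -->
    <Connector port="8080"
               protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"
               maxKeepAliveRequests="100"
               keepAliveTimeout="5000"
               maxThreads="200"
               minSpareThreads="10"
               acceptCount="100"
               connectionLinger="-1" />

    <Engine name="Catalina" defaultHost="localhost">
      <Host name="localhost" appBase="webapps"
            unpackWARs="true" autoDeploy="false">
      </Host>
    </Engine>
  </Service>
</Server>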

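Sending the string SHUTDOWN to the shutdown port (8005 by default) tells Tomcat to stop. On its own, however, this gives in-flight requests little time to finish, so a real graceful shutdown needs additional configuration and scripting.

Connector-Level Connection Draining Configuration

Connection draining is the process of waiting for existing connections to close naturally. With the Tomcat NIO Connector, configure it as follows.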

<!--
  NIO Connector tuned for connection draining.
    keepAliveTimeout: kept short (ms) so idle keep-alive connections close quickly while draining
    maxConnections:   maximum number of connections the Connector handles at once
    connectionLinger: socket SO_LINGER in seconds; -1 disables it
  connectionTimeout also acts as the socket read timeout (SO_TIMEOUT),
  so no separate soTimeout attribute is set.
-->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           redirectPort="8443"
           maxThreads="200"
           minSpareThreads="10"
           keepAliveTimeout="3000"
           maxConnections="1000"
           connectionLinger="-1" />
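
Connector settings only shorten how long connections linger; they do not tell you when draining has actually finished. The sketch below counts the connections still established on the Connector port so a script can wait for zero instead of guessing. It assumes the `ss` utility from iproute2 and its default `ss -tn` column layout (state in column 1, local address in column 4); `count_established` is a hypothetical helper name, not a Tomcat feature.

```shell
#!/bin/sh
# count_established PORT
# Hypothetical helper: reads `ss -tn` output on stdin and prints how many
# ESTAB connections have PORT as their local port.
count_established() {
    port=$1
    awk -v p=":$port" '
        # Column 1 is the TCP state, column 4 is "local-address:port"
        $1 == "ESTAB" && substr($4, length($4) - length(p) + 1) == p { n++ }
        END { print n + 0 }
    '
}
```

A drain loop could then poll `ss -tn | count_established 8080` until it prints 0 or a timeout expires.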

Connection Draining in the Shutdown Script

#!/bin/bash
# /opt/tomcat/bin/graceful-shutdown.sh

TOMCAT_HOME=/opt/tomcat
TOMCAT_PID_FILE="$TOMCAT_HOME/tomcat.pid"
DRAIN_WAIT=30  # connection draining wait time (seconds)
SHUTDOWN_TIMEOUT=60  # maximum wait for the process to exit (seconds)

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting graceful shutdown"

# 1. Find the Tomcat PID
if [ ! -f "$TOMCAT_PID_FILE" ]; then
    # No PID file: fall back to the first process whose command line matches
    TOMCAT_PID=$(pgrep -f 'catalina' | head -n 1)
else
    TOMCAT_PID=$(cat "$TOMCAT_PID_FILE")
fi

if [ -z "$TOMCAT_PID" ]; then
    echo "Tomcat is not running."
    exit 0
fi

echo "Tomcat PID: $TOMCAT_PID"

# 2. Send SIGTERM (starts the graceful shutdown)
kill -TERM "$TOMCAT_PID"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] SIGTERM sent, waiting for draining..."

# 3. Wait for connections to drain
sleep "$DRAIN_WAIT"

# 4. Check that the process has exited (wait up to SHUTDOWN_TIMEOUT seconds)
ELAPSED=0
while kill -0 "$TOMCAT_PID" 2>/dev/null; do
    if [ "$ELAPSED" -ge "$SHUTDOWN_TIMEOUT" ]; then
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] Timed out! Forcing termination"
        kill -KILL "$TOMCAT_PID"
        break
    fi
    sleep 5
    ELAPSED=$((ELAPSED + 5))
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] Waiting for exit... (${ELAPSED}s elapsed)"
done

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Tomcat stopped"
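
The poll-then-kill step of the script above can be factored into a small reusable function. This is a minimal sketch; the name `wait_for_exit` and its arguments are illustrative assumptions, not part of Tomcat.

```shell
#!/bin/sh
# wait_for_exit PID TIMEOUT [INTERVAL]
# Hypothetical helper: polls until PID disappears or TIMEOUT seconds pass.
# Returns 0 if the process exited in time, 1 if it is still alive.
wait_for_exit() {
    pid=$1
    timeout=$2
    interval=${3:-1}
    elapsed=0
    # kill -0 sends no signal; it only checks that the process exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

With it, the tail of the script reduces to `kill -TERM "$TOMCAT_PID"` followed by `wait_for_exit "$TOMCAT_PID" 60 5 || kill -KILL "$TOMCAT_PID"`, which also returns as soon as the process is gone rather than sleeping for a fixed period.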

Using a JVM Shutdown Hook

A Java application can detect the termination signal through a JVM shutdown hook and perform cleanup work.

// GracefulShutdownManager.java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulShutdownManager {

    private static final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
    private static final int SHUTDOWN_TIMEOUT_SECONDS = 30;

    public static void registerShutdownHook(ExecutorService executorService) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("[ShutdownHook] JVM shutdown signal detected, starting graceful shutdown");
            isShuttingDown.set(true);

            // Stop accepting new tasks
            executorService.shutdown();

            try {
                // Wait for existing tasks to complete
                if (!executorService.awaitTermination(SHUTDOWN_TIMEOUT_SECONDS, TimeUnit.SECONDS)) {
                    System.err.println("[ShutdownHook] Timed out! Forcing shutdown");
                    executorService.shutdownNow();
                    // Wait again after the forced shutdown
                    if (!executorService.awaitTermination(10, TimeUnit.SECONDS)) {
                        System.err.println("[ShutdownHook] ExecutorService did not terminate");
                    }
                }
            } catch (InterruptedException e) {
                executorService.shutdownNow();
                Thread.currentThread().interrupt();
            }

            System.out.println("[ShutdownHook] Graceful shutdown complete");
        }, "shutdown-hook-thread"));
    }

    public static boolean isShuttingDown() {
        return isShuttingDown.get();
    }
}

// HealthCheckServlet.java: reflects the shutdown state in the health check endpoint
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class HealthCheckServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        if (GracefulShutdownManager.isShuttingDown()) {
            // Returning 503 makes the load balancer stop sending traffic to this instance
            resp.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            resp.getWriter().write("{\"status\": \"shutting_down\"}");
        } else {
            resp.setStatus(HttpServletResponse.SC_OK);
            resp.getWriter().write("{\"status\": \"healthy\"}");
        }
    }
}

Graceful Shutdown with Spring Boot's Embedded Tomcat

Since Spring Boot 2.3.0, graceful shutdown of the embedded Tomcat can be configured simply in application.yml.

# application.yml
server:
  # Enable graceful shutdown
  shutdown: graceful
  port: 8080
  tomcat:
    # Thread pool settings
    threads:
      max: 200
      min-spare: 10
    # Connection timeout
    connection-timeout: 20s
    # Keep-alive idle timeout (how long to wait for the next request on a connection)
    keep-alive-timeout: 5s

spring:
  lifecycle:
    # Maximum wait for graceful shutdown
    timeout-per-shutdown-phase: 30s

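// How Spring Boot graceful shutdown behaves
// On SIGTERM, Spring Boot automatically:
// 1. Stops accepting new requests (embedded Tomcat stops accepting them at the network layer)
// 2. Waits for in-flight requests to complete (up to timeout-per-shutdown-phase)
// 3. Closes the ApplicationContext
// 4. Exits the JVM

// Custom shutdown event handling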
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.stereotype.Component;

@Component
public class AppShutdownListener implements ApplicationListener<ContextClosedEvent> {

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
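        System.out.println("[Spring] Application context closed event received");
        // Clean up the DB connection pool, release external API connections, etc.
        // Note: this event is published at the start of context shutdown, so avoid
        // releasing resources here that in-flight requests may still need.
        performCleanup();
    }

    private void performCleanup() {
        System.out.println("[Spring] Performing resource cleanup...");
        // Actual cleanup logic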
    }
}

Removing a Node from the Load Balancer (Nginx)

Before stopping Tomcat, always remove the node from the load balancer (Nginx) first. This prevents new requests from being forwarded to the server that is about to shut down.

# /etc/nginx/conf.d/upstream.conf
upstream tomcat_cluster {
    least_conn;

    # Server 1 (deployment target)
    server 192.168.1.10:8080 weight=1;

    # Server 2 (in service)
    server 192.168.1.11:8080 weight=1;

    # Backup server
    server 192.168.1.12:8080 backup;

    keepalive 32;
}

#!/bin/bash
# nginx-drain-node.sh: drain a node from the Nginx upstream

NGINX_CONF="/etc/nginx/conf.d/upstream.conf"
TARGET_SERVER="192.168.1.10:8080"
DRAIN_WAIT=30

# 1. Mark the target server as 'down'
# With NGINX Plus (dynamic upstream management); replace <version> with your API version:
# curl -X PATCH http://localhost/api/<version>/http/upstreams/tomcat_cluster/servers/0 \
#      -d '{"down": true}' -H "Content-Type: application/json"

# With open source Nginx: edit the config file, then reload
sed -i "s|server ${TARGET_SERVER} weight=1;|server ${TARGET_SERVER} down;|g" "$NGINX_CONF"

# 2. Validate the Nginx configuration
if ! nginx -t; then
    echo "Nginx configuration error! Draining cancelled"
    # Restore the original configuration
    sed -i "s|server ${TARGET_SERVER} down;|server ${TARGET_SERVER} weight=1;|g" "$NGINX_CONF"
    exit 1
fi

# 3. Reload Nginx (no downtime)
nginx -s reload
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Removed ${TARGET_SERVER} from Nginx"

# 4. Wait for existing connections to finish
echo "Waiting for draining... (${DRAIN_WAIT}s)"
sleep "$DRAIN_WAIT"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Draining complete, Tomcat can now be stopped"
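
One weakness of the plain `sed -i` substitution is that it succeeds silently when no line matches, for example if the entry uses a different weight, and the node then keeps receiving traffic. A minimal sketch of a guarded variant follows; `mark_server_down` is a hypothetical helper name, and it assumes the `server <addr> weight=1;` entry format shown in the upstream file above.

```shell
#!/bin/sh
# mark_server_down CONF SERVER
# Hypothetical helper: rewrites "server SERVER weight=1;" to "server SERVER down;"
# in CONF, keeping a backup as CONF.bak.
# Returns 1 without touching the file if no matching entry exists.
mark_server_down() {
    conf=$1
    server=$2
    if ! grep -qF "server ${server} weight=1;" "$conf"; then
        echo "no active entry for ${server} in ${conf}" >&2
        return 1
    fi
    cp "$conf" "${conf}.bak"
    sed "s|server ${server} weight=1;|server ${server} down;|" "${conf}.bak" > "$conf"
}
```

If `nginx -t` rejects the result, the `.bak` copy can simply be moved back into place instead of running a reverse substitution.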

Complete Redeployment Script

This script automates the whole sequence: drain → shut down → deploy → health check → restore the upstream.

#!/bin/bash
# full-deploy.sh: complete zero-downtime redeployment script

set -euo pipefail

# Configuration
TARGET_SERVER="192.168.1.10"
# Host whose Nginx config is edited. It defaults to the target server;
# change it if Nginx runs on a separate load balancer host.
NGINX_HOST="${TARGET_SERVER}"
TOMCAT_PORT=8080
TOMCAT_HOME="/opt/tomcat"
WAR_FILE="/deploy/app.war"
NGINX_CONF="/etc/nginx/conf.d/upstream.conf"
HEALTH_CHECK_URL="http://${TARGET_SERVER}:${TOMCAT_PORT}/health"
DRAIN_WAIT=30
HEALTH_CHECK_RETRIES=12
HEALTH_CHECK_INTERVAL=5

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# Phase 1: remove the node from the Nginx upstream
log "Phase 1: removing ${TARGET_SERVER} from the Nginx upstream"
ssh deploy@${NGINX_HOST} "
    sed -i 's|server ${TARGET_SERVER}:${TOMCAT_PORT} weight=1;|server ${TARGET_SERVER}:${TOMCAT_PORT} down;|g' ${NGINX_CONF}
    nginx -t && nginx -s reload
"
log "Nginx reloaded"

# Phase 2: wait for connection draining
log "Phase 2: waiting for connection draining (${DRAIN_WAIT}s)"
sleep "$DRAIN_WAIT"

# Phase 3: Tomcat graceful shutdown
log "Phase 3: starting Tomcat graceful shutdown"
# The [c] bracket keeps pgrep -f from matching this remote shell's own command line
ssh deploy@${TARGET_SERVER} "
    TOMCAT_PID=\$(pgrep -f '[c]atalina' | head -n 1 || true)
    if [ -n \"\$TOMCAT_PID\" ]; then
        kill -TERM \$TOMCAT_PID
        echo 'SIGTERM sent'
        # Wait for exit
        for i in \$(seq 1 12); do
            if ! kill -0 \$TOMCAT_PID 2>/dev/null; then
                echo 'Tomcat stopped'
                break
            fi
            echo \"Waiting for exit... (\$((i * 5))s)\"
            sleep 5
        done
        # Force-kill if it is still alive
        if kill -0 \$TOMCAT_PID 2>/dev/null; then
            kill -KILL \$TOMCAT_PID
            echo 'Forced termination'
        fi
    fi
"

# Phase 4: deploy the new WAR
log "Phase 4: deploying the new WAR file"
scp "$WAR_FILE" deploy@${TARGET_SERVER}:${TOMCAT_HOME}/webapps/ROOT.war
ssh deploy@${TARGET_SERVER} "${TOMCAT_HOME}/bin/startup.sh"
log "Tomcat started"

# Phase 5: health check
log "Phase 5: starting health check (waiting up to $((HEALTH_CHECK_RETRIES * HEALTH_CHECK_INTERVAL))s)"
for i in $(seq 1 $HEALTH_CHECK_RETRIES); do
    # curl already prints 000 when the connection fails, so only swallow its exit code
    HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$HEALTH_CHECK_URL" || true)
    if [ "$HTTP_STATUS" == "200" ]; then
        log "Health check passed! (HTTP $HTTP_STATUS)"
        break
    fi
    log "Health check attempt ${i}/${HEALTH_CHECK_RETRIES}: HTTP ${HTTP_STATUS}"
    if [ "$i" -eq "$HEALTH_CHECK_RETRIES" ]; then
        log "Health check failed! Perform a rollback."
        exit 1
    fi
    sleep "$HEALTH_CHECK_INTERVAL"
done

# Phase 6: restore the Nginx upstream
log "Phase 6: restoring ${TARGET_SERVER} in the Nginx upstream"
ssh deploy@${NGINX_HOST} "
    sed -i 's|server ${TARGET_SERVER}:${TOMCAT_PORT} down;|server ${TARGET_SERVER}:${TOMCAT_PORT} weight=1;|g' ${NGINX_CONF}
    nginx -t && nginx -s reload
"

log "Redeployment complete! ${TARGET_SERVER} is receiving traffic again."
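
The health check phase is a bounded retry loop, and the same pattern works for any step that may need a few attempts. A minimal sketch of a generic version follows; `retry` is a hypothetical helper name, and the command it runs is whatever you pass in.

```shell
#!/bin/sh
# retry ATTEMPTS INTERVAL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping INTERVAL seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    attempts=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For the health check this could be called as `retry 12 5 curl -fsS -o /dev/null "$HEALTH_CHECK_URL"`, relying on curl's `-f` flag to return a non-zero exit code for HTTP error responses.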

Graceful Procedure with HAProxy

When HAProxy is the load balancer, server state can be changed dynamically through the stats socket.

# Relevant settings in /etc/haproxy/haproxy.cfg
global
    # Enable the stats socket (used to change server state dynamically)
    stats socket /var/run/haproxy/admin.sock mode 660 level admin

backend tomcat_backend
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200

    # Server definitions
    server tomcat1 192.168.1.10:8080 check inter 2s fall 3 rise 2
    server tomcat2 192.168.1.11:8080 check inter 2s fall 3 rise 2

#!/bin/bash
# haproxy-graceful-drain.sh

HAPROXY_SOCK="/var/run/haproxy/admin.sock"
TARGET_BACKEND="tomcat_backend"
TARGET_SERVER="tomcat1"
DRAIN_WAIT=30

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# 1. Put the server into DRAIN state (existing sessions continue, new sessions are blocked)
echo "set server ${TARGET_BACKEND}/${TARGET_SERVER} state drain" | \
    socat stdio "$HAPROXY_SOCK"
log "${TARGET_SERVER} set to DRAIN"

# 2. Read the current session count
# "show stat" prints CSV; field 5 (scur) is the number of current sessions
check_sessions() {
    echo "show stat" | socat stdio "$HAPROXY_SOCK" | \
        awk -F, -v b="$TARGET_BACKEND" -v s="$TARGET_SERVER" '$1 == b && $2 == s { print $5 }'
}

# 3. Wait until all sessions have finished
log "Waiting for sessions to drain..."
ELAPSED=0
while true; do
    SESSIONS=$(check_sessions)
    echo "Remaining sessions: ${SESSIONS} (${ELAPSED}s elapsed)"
    if [ "$SESSIONS" == "0" ] || [ "$ELAPSED" -ge "$DRAIN_WAIT" ]; then
        break
    fi
    sleep 5
    ELAPSED=$((ELAPSED + 5))
done

# 4. Put the server into maintenance state
echo "set server ${TARGET_BACKEND}/${TARGET_SERVER} state maint" | \
    socat stdio "$HAPROXY_SOCK"
log "${TARGET_SERVER} set to MAINT"

# Command to restore the server after deployment:
# echo "set server tomcat_backend/tomcat1 state ready" | socat stdio /var/run/haproxy/admin.sock

Graceful Shutdown in a Docker Compose Environment

# docker-compose.yml
version: '3.8'

services:
  tomcat:
    image: tomcat:9.0-jdk17
    ports:
      - "8080:8080"
    volumes:
      - ./webapps:/usr/local/tomcat/webapps
      - ./conf/server.xml:/usr/local/tomcat/conf/server.xml
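    # Note: server.shutdown=graceful is a Spring Boot property and has no effect on
    # a standalone Tomcat image, so it is not passed through JAVA_OPTS here.
    # Grace period for shutdown:
    # docker stop sends SIGTERM, then SIGKILL after this much time
    stop_grace_period: 60s
    # Signal used to stop the container
    stop_signal: SIGTERM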
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

  nginx:
    image: nginx:1.25
    ports:
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      tomcat:
        condition: service_healthy
    stop_grace_period: 30s

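# Restart a single service in Docker Compose
# 1. Stop the service (SIGTERM → wait up to 60 seconds → SIGKILL)
docker compose stop tomcat

# 2. Start it again with the updated image and block until the health check passes
#    (docker compose wait would instead block until the container stops)
docker compose up -d --wait tomcat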

Kubernetes Hints

In Kubernetes, use terminationGracePeriodSeconds together with a preStop hook.

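# Example Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-app
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the Pod template labels
  # (the label value here is illustrative)
  selector:
    matchLabels:
      app: tomcat-app
  template:
    metadata:
      labels:
        app: tomcat-app
    spec:
      # Maximum time allowed for Pod termination (default 30 seconds)
      terminationGracePeriodSeconds: 60
      containers:
      - name: tomcat
        image: tomcat:9.0-jdk17
        ports:
        - containerPort: 8080
        lifecycle:
          preStop:
            exec:
              # Wait for draining before the container receives SIGTERM
              command: ["/bin/sh", "-c", "sleep 15"]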
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10

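During a rolling update in Kubernetes, the flow is as follows.

Wait until the new Pod becomes Ready
Run the preStop hook on the old Pod while it is being removed from the Service endpoints (sleep 15 buys time for draining)
Send SIGTERM to the container once the preStop hook has finished
Send SIGKILL if the container has not exited within terminationGracePeriodSeconds, which also counts the time spent in preStop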
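
Because the preStop delay counts against the grace period, the application's own shutdown timeout has to fit in what remains. With the values used in this article that is 15 s of preStop plus a 30 s application timeout, which is 45 s and fits inside the 60 s grace period; with the default 30 s grace period it would not. A minimal sketch of that check follows; `grace_budget_ok` is a hypothetical helper name, and treating the budget as a simple sum is an assumption.

```shell
#!/bin/sh
# grace_budget_ok GRACE PRESTOP APP_TIMEOUT
# Hypothetical helper, all values in seconds. Succeeds when the preStop delay
# plus the application's shutdown timeout fits inside the grace period.
grace_budget_ok() {
    grace=$1
    prestop=$2
    app_timeout=$3
    [ $((prestop + app_timeout)) -le "$grace" ]
}
```

A deployment pipeline could run `grace_budget_ok 60 15 30 || echo "grace period too short" >&2` before applying the manifest.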

Testing Graceful Shutdown Under Load

This test verifies that graceful shutdown works correctly while real traffic is flowing.

#!/bin/bash
# graceful-shutdown-test.sh

TARGET_URL="http://192.168.1.10:8080/api/test"
TOMCAT_PID=$(pgrep -f catalina | head -n 1)

echo "=== Graceful shutdown test started ==="
echo "Tomcat PID: $TOMCAT_PID"

# 1. Send requests continuously in the background
ab -n 1000 -c 50 -k "$TARGET_URL" > /tmp/ab_result.txt 2>&1 &
AB_PID=$!
echo "Load generation started (ab PID: $AB_PID)"

# 2. Start the graceful shutdown after 3 seconds
sleep 3
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting graceful shutdown"
kill -TERM "$TOMCAT_PID"

# 3. Wait for ab to finish
wait "$AB_PID"

# 4. Analyze the results
echo ""
echo "=== Test results ==="
grep -E "(Failed requests|Non-2xx|Complete requests)" /tmp/ab_result.txt

echo ""
echo "If graceful shutdown worked correctly, Failed requests should be 0."

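Success criteria:

Failed requests: 0 (every request completed normally)
Non-2xx responses: 0 (no error responses such as 503)
Requests arriving after the shutdown are forwarded by Nginx to another server instead of failing with a 502. This only holds when the load is sent through the load balancer; requests sent directly to the stopping Tomcat are refused once its connector closes.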
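
The first two criteria can be checked automatically instead of by reading the grep output. The sketch below assumes the usual ApacheBench report lines (`Failed requests:` is always printed, `Non-2xx responses:` only when the count is non-zero); `ab_passed` is a hypothetical helper name.

```shell
#!/bin/sh
# ab_passed
# Hypothetical helper: reads ApacheBench output on stdin and succeeds only if
# a "Failed requests" line reports 0 and no "Non-2xx responses" line reports
# a count above 0.
ab_passed() {
    awk -F: '
        /^Failed requests/   { failed = $2 + 0; seen = 1 }
        /^Non-2xx responses/ { non2xx = $2 + 0 }
        END { exit !(seen && failed == 0 && non2xx == 0) }
    '
}
```

At the end of the test script this could be used as `ab_passed < /tmp/ab_result.txt && echo PASS || echo FAIL`.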

When graceful shutdown is implemented correctly, deployments, emergency patches, and server maintenance can all be carried out without users seeing errors.
