
Pro Tips — Docker Environment Operations

Running Docker containers and running them reliably are two different things. This chapter systematically covers the techniques that production experts use: health checks, rolling updates, resource limits, security hardening, and vulnerability scanning — everything you need for stable container operations.


Health Checks: The HEALTHCHECK Directive

Even when a container's process is running, Docker has no idea whether the service inside it is actually healthy. The HEALTHCHECK directive lets Docker probe the container periodically and mark it as healthy or unhealthy.

Defining HEALTHCHECK in a Dockerfile

# Node.js app example
FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .

EXPOSE 8080

# Health check configuration
# (node:20-alpine has no curl — use busybox wget for the probe)
HEALTHCHECK --interval=30s \
            --timeout=10s \
            --start-period=40s \
            --retries=3 \
    CMD wget -q --spider http://localhost:8080/health || exit 1

CMD ["node", "server.js"]

# Nginx example
FROM nginx:1.25-alpine

# Alpine images lack curl — use wget instead
HEALTHCHECK --interval=30s \
            --timeout=5s \
            --start-period=10s \
            --retries=3 \
    CMD wget -q --spider http://localhost/health || exit 1
Option          Default  Description
--interval      30s      Interval between health checks
--timeout       30s      Timeout for each individual check
--start-period  0s       Grace period after start; failures during it don't count
--retries       3        Consecutive failures before the container is marked unhealthy

healthcheck Configuration in docker-compose.yml

Even without a HEALTHCHECK in the Dockerfile, you can define one in docker-compose.yml (and if the Dockerfile has one, the Compose setting overrides it):

services:
  app:
    image: my-app:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

Checking health check status:

# Check container status (STATUS column shows healthy/unhealthy)
docker ps

# View detailed health check history
docker inspect --format='{{json .State.Health}}' app | python3 -m json.tool

depends_on + condition: service_healthy Pattern

Using depends_on alone only waits for the dependency container to be started — it doesn't guarantee the service inside it is ready. With condition: service_healthy, the dependent container starts only after the dependency reports healthy.

version: "3.9"

services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3

  app:
    image: my-app:latest
    depends_on:
      db:
        condition: service_healthy    # Start only when DB is healthy
      redis:
        condition: service_healthy    # Start only when Redis is healthy
    environment:
      DATABASE_URL: postgresql://user:password@db:5432/mydb
      REDIS_URL: redis://redis:6379

  nginx:
    image: nginx:1.25-alpine
    depends_on:
      app:
        condition: service_healthy    # Start only when app is healthy
    ports:
      - "80:80"

Rolling Updates: docker compose up --no-deps --build

A pattern for low-downtime service updates using docker compose on a single server. Note that the container is briefly recreated, so true zero-downtime requires multiple replicas behind a proxy.

# Rebuild and restart only the app (without touching other services)
docker compose up -d --no-deps --build app

# Update multiple services sequentially
docker compose up -d --no-deps --build nginx app

# Check status after update
docker compose ps
docker compose logs -f app --tail 50

  • --no-deps: don't start or restart the services it depends on (db, redis, etc. keep running)
  • --build: rebuild the image before recreating the container

Blue-Green Deployment Script

#!/bin/bash
# blue-green-deploy.sh

set -e

IMAGE_NAME="my-app"
IMAGE_TAG="${1:-latest}"
export IMAGE_TAG  # the compose file should reference my-app:${IMAGE_TAG:-latest}

echo "==> Building new image: ${IMAGE_NAME}:${IMAGE_TAG}"
docker build -t "${IMAGE_NAME}:${IMAGE_TAG}" .

echo "==> Replacing app container"
docker compose up -d --no-deps app

echo "==> Waiting for health check (up to 60 seconds)"
CID=$(docker compose ps -q app)  # Compose container names vary; resolve the ID
STATUS="unknown"
for i in $(seq 1 12); do
    STATUS=$(docker inspect --format='{{.State.Health.Status}}' "$CID" 2>/dev/null || echo "unknown")
    if [ "$STATUS" = "healthy" ]; then
        echo "==> App is healthy."
        break
    fi
    echo "    Waiting... (${i}/12) Current status: ${STATUS}"
    sleep 5
done

# Abort instead of routing traffic to an unhealthy container
if [ "$STATUS" != "healthy" ]; then
    echo "==> App did not become healthy in time. Aborting." >&2
    exit 1
fi

echo "==> Reloading Nginx"
docker compose exec nginx nginx -s reload

echo "==> Cleaning up old images"
docker image prune -f

echo "==> Deployment complete"

Docker Swarm Rolling Updates

In Docker Swarm mode, rolling updates are supported natively when updating services.

# docker-compose.swarm.yml
version: "3.9"

services:
  app:
    image: my-app:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1            # Update 1 container at a time
        delay: 10s                # Wait time between each update
        failure_action: rollback  # Auto-rollback on failure
        monitor: 60s              # Monitoring time after update
        max_failure_ratio: 0.1    # Rollback if more than 10% fail
      rollback_config:
        parallelism: 1
        delay: 5s
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3

# Swarm service rolling update
docker service update \
  --image my-app:v2.0 \
  --update-parallelism 1 \
  --update-delay 10s \
  my_app

# Check update status
docker service ps my_app

# Rollback
docker service rollback my_app

Resource Limits: deploy.resources and ulimits

Containers that consume unlimited resources affect the entire host server. Resource limits are core to service stability.

version: "3.9"

services:
  app:
    image: my-app:latest
    deploy:
      resources:
        limits:
          cpus: "1.0"      # Max 1 CPU core
          memory: 512M     # Max 512MB memory
        reservations:
          cpus: "0.25"     # Minimum guaranteed CPU
          memory: 128M     # Minimum guaranteed memory
    ulimits:
      nofile:
        soft: 65535        # File descriptor limit (soft)
        hard: 65535        # File descriptor limit (hard)
      nproc:
        soft: 4096
        hard: 4096

  db:
    image: postgres:16-alpine
    shm_size: "256m"       # PostgreSQL shared memory (service-level option)
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M

To apply deploy.resources on a single host, use the --compatibility flag:

docker compose --compatibility up -d

Or use mem_limit and cpus fields directly in newer Compose versions:

services:
  app:
    image: my-app:latest
    mem_limit: 512m
    cpus: 1.0
    mem_reservation: 128m

Container Security Hardening

Running as Non-Root User

FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

COPY --chown=node:node . .

# Run as node user, not root
USER node

EXPOSE 8080
CMD ["node", "server.js"]

# docker-compose.yml equivalent: set the user at runtime
services:
  app:
    image: my-app:latest
    user: "1000:1000"   # Specify UID:GID directly

Read-Only Filesystem

services:
  app:
    image: my-app:latest
    read_only: true   # Container filesystem is read-only
    tmpfs:
      - /tmp          # Mount only paths needing temp files as tmpfs
      - /var/run

Limiting Linux Capabilities

services:
  nginx:
    image: nginx:1.25-alpine
    cap_drop:
      - ALL                 # Remove all capabilities
    cap_add:
      - NET_BIND_SERVICE    # Add only what's needed for port 80/443 binding
    security_opt:
      - no-new-privileges:true   # Prevent privilege escalation

Applying seccomp Profiles

services:
  app:
    image: my-app:latest
    security_opt:
      - seccomp:./seccomp/app-profile.json
      - no-new-privileges:true
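The referenced profile is a JSON allowlist of syscalls. An illustrative skeleton only — a real profile needs a far longer names list (typically derived from Docker's default profile) and additional architectures:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["accept4", "bind", "close", "exit_group", "read", "write"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Anything not in the allowlist fails with an error (SCMP_ACT_ERRNO), so build the list from what the app actually calls rather than guessing.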

Image Vulnerability Scanning

Docker Scout

# Scan image with Docker Scout
docker scout cves my-app:latest

# Filter by severity
docker scout cves --only-severity critical,high my-app:latest

# Show only fixable vulnerabilities
docker scout cves --only-fixable my-app:latest

# Generate SBOM (Software Bill of Materials)
docker scout sbom my-app:latest

Trivy (Open-Source Vulnerability Scanner)

# Install Trivy (Ubuntu/Debian)
# Trivy is not in the default repos — add Aqua Security's apt
# repository first (see the official Trivy install docs), then:
sudo apt-get update && sudo apt-get install -y trivy

# Scan image
trivy image my-app:latest

# Filter by severity (CRITICAL, HIGH only)
trivy image --severity CRITICAL,HIGH my-app:latest

# CI/CD pipeline mode (return result as exit code)
trivy image --exit-code 1 --severity CRITICAL my-app:latest

# Filesystem scan (includes Dockerfile, dependency files)
trivy fs .

# Generate JSON report
trivy image --format json --output report.json my-app:latest

Trivy auto-scan in CI pipeline (GitHub Actions):

# .github/workflows/security-scan.yml
name: Security Scan

on: [push, pull_request]

jobs:
  trivy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t my-app:${{ github.sha }} .

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: my-app:${{ github.sha }}
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH

.env File Security and Docker Secrets

.env files are convenient for development, but Docker Secrets are recommended for production.

Add .env to .gitignore

.env
.env.production
.env.local
*.pem
*.key

Docker Secrets (Swarm Mode)

# Create a secret
echo "my-secret-password" | docker secret create db_password -
cat ./ssl/privkey.pem | docker secret create ssl_key -

# List secrets
docker secret ls

# docker-compose.swarm.yml
version: "3.9"

services:
  app:
    image: my-app:latest
    secrets:
      - db_password
      - ssl_key
    environment:
      # Secrets are mounted as files under /run/secrets/
      DB_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    external: true
  ssl_key:
    external: true
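The app itself still has to read the mounted file. A common pattern is an entrypoint that expands *_FILE variables into plain environment variables before starting the main process — a minimal POSIX-sh sketch (hypothetical entrypoint.sh; the DB_PASSWORD_FILE name matches the example above):

```shell
#!/bin/sh
# Expand VAR_FILE-style secrets into plain environment variables.
file_env() {
    var="$1"
    # Resolve the value of e.g. $DB_PASSWORD_FILE dynamically
    eval "file_path=\${${var}_FILE:-}"
    if [ -n "$file_path" ] && [ -f "$file_path" ]; then
        # Export the file's contents as $DB_PASSWORD etc.
        # (command substitution strips the trailing newline)
        eval "export ${var}=\"\$(cat \"\$file_path\")\""
    fi
}

file_env DB_PASSWORD

# Hand off to the container's main process (CMD)
exec "$@"
```

Set it as the image's ENTRYPOINT so the CMD still runs unchanged; the secret never appears in docker inspect output the way a plain environment variable would.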

Useful Debugging Commands

# Check status of all running containers
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Execute commands inside a container
docker exec -it app sh
docker exec -it app bash

# Real-time resource usage monitoring
docker stats
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"

# View detailed container information
docker inspect app

# Check container IP address
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' app

# View container filesystem changes
docker diff app

# Analyze image layers
docker history my-app:latest --no-trunc

# Volume usage
docker volume ls
docker volume inspect my_volume

# Network inspection
docker network ls
docker network inspect my_network

# Clean up unused resources
docker system prune -f # Stopped containers, unused networks, dangling images
docker system prune --volumes -f # Include volumes (CAUTION: deletes data)
docker image prune -a -f # Delete all unused images

# Disk usage analysis
docker system df
docker system df -v

Production Checklist

Items to verify before deploying to production.

# Recommended production docker-compose.yml pattern
version: "3.9"

services:
  app:
    image: my-app:${IMAGE_TAG:-latest}
    restart: unless-stopped        # restart policy configured
    read_only: true                # read-only filesystem
    user: "1000:1000"              # non-root user
    security_opt:
      - no-new-privileges:true     # prevent privilege escalation
    cap_drop:
      - ALL                        # remove unnecessary capabilities
    mem_limit: 512m                # memory limit
    cpus: 1.0                      # CPU limit
    healthcheck:                   # health check configured
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging:                       # log driver and rotation
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
    environment:
      - NODE_ENV=production
    env_file:
      - .env.production            # separate environment variable file
    tmpfs:
      - /tmp                       # tmpfs for temp directory
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"

Checklist summary:

  • restart: unless-stopped or always configured
  • Memory and CPU resource limits set
  • Health check (HEALTHCHECK) configured
  • Log driver max-size, max-file rotation set
  • Process runs as non-root user
  • read_only: true filesystem applied
  • no-new-privileges:true security option set
  • .env file added to .gitignore
  • Image vulnerability scanning (Trivy/Scout) integrated in CI
  • depends_on + condition: service_healthy for startup order guarantee
  • Volume data backup strategy established
  • Network internal: true to minimize external exposure
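The last checklist item can be sketched as a Compose fragment (hypothetical network names; only nginx publishes a port, while backend services sit on an internal-only network):

```yaml
networks:
  frontend: {}          # reachable via published ports
  backend:
    internal: true      # no external connectivity on this network

services:
  nginx:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
    networks: [frontend, backend]

  app:
    image: my-app:latest
    networks: [backend]   # never exposed directly to the outside
```

Services on the internal network can still talk to each other by service name; only the proxy bridges to the outside world.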