# Performance Optimization Pro Tips
Setting up caching, compression, and HTTP/2 is only the start. To finish the job, you need to validate with load tests how much faster things actually got, and to pinpoint the remaining bottlenecks precisely.
## Load Testing Tool Comparison
| Tool | Highlights | Protocol | Best For |
|---|---|---|---|
| ab (Apache Bench) | Simple, quick to install | HTTP/1.1 | Fast baseline measurement |
| wrk | High-performance, Lua scripting | HTTP/1.1 | Throughput & latency |
| k6 | JS scenarios, visualization | HTTP/1.1, HTTP/2 | Scenario-based load tests |
| h2load | HTTP/2 dedicated | HTTP/2 | HTTP/2 performance validation |
| locust | Python, distributed | HTTP | Complex user-flow scenarios |
### ab (Apache Bench)
```shell
# Basic usage
# -n: total requests, -c: concurrent connections, -k: Keep-Alive
ab -n 10000 -c 100 -k https://example.com/api/products

# Reading the output
# Requests per second: 1234.56 [#/sec]   ← throughput (higher is better)
# Time per request:    81.00 [ms]        ← mean latency
# Transfer rate:       456.78 [Kbytes/sec]

# Percentile distribution (p99 > 1000 ms needs attention)
#  50%     45 ms
#  75%     78 ms
#  90%    120 ms
#  95%    180 ms
#  99%    450 ms
# 100%   1200 ms (slowest request)

# POST request test (-T sets the Content-Type for the POST body)
ab -n 5000 -c 50 -k \
  -T "application/json" \
  -p /tmp/payload.json \
  https://example.com/api/orders
```
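As a sanity check, ab's mean latency and throughput are linked: Time per request ≈ concurrent connections ÷ requests per second. Plugging in the sample figures above (`-c 100` at 1234.56 req/s):

```shell
# Mean latency ≈ concurrency / throughput, in ms
awk 'BEGIN { printf "%.2f ms\n", 100 / 1234.56 * 1000 }'
# → 81.00 ms
```

If these two numbers disagree in a real run, part of the request time is being spent queueing rather than being served.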
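The `/tmp/payload.json` file referenced above has to exist before the run; its contents here are purely illustrative:

```shell
# Create a sample JSON body for the POST test (fields are made up)
cat > /tmp/payload.json << 'EOF'
{"userId": 1, "productId": 100, "qty": 2}
EOF
```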
### wrk – High-Performance Benchmark
```shell
# Install
sudo apt install wrk

# Basic usage
# -t: threads, -c: connections, -d: duration
wrk -t4 -c100 -d30s https://example.com/api/products

# Running 30s test @ https://example.com/api/products
#   4 threads and 100 connections
#
#   Thread Stats   Avg      Stdev     Max   +/- Stdev
#     Latency    45.67ms   12.34ms  234.56ms   87.65%
#     Req/Sec    567.89     45.67   678.90     78.90%
#   68145 requests in 30s, 12.34MB read
# Requests/sec:   2271.50
# Transfer/sec:    421.12KB

# POST request with Lua script
cat > /tmp/post.lua << 'EOF'
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = '{"userId": 1, "productId": 100, "qty": 2}'
EOF
wrk -t4 -c50 -d30s -s /tmp/post.lua https://example.com/api/orders

# Show latency percentile distribution
wrk -t4 -c100 -d30s --latency https://example.com/
```
### k6 – Scenario-Based Load Testing
```shell
# Install (k6 is not in the default Ubuntu repositories; add Grafana's
# k6 apt repository first, per the official install docs)
sudo apt install k6
```
```javascript
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export const options = {
  stages: [
    { duration: '1m', target: 50 },   // ramp up to 50 users over 1 min
    { duration: '3m', target: 200 },  // ramp from 50 to 200 users over 3 min
    { duration: '1m', target: 0 },    // ramp down over 1 min
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95th percentile < 500 ms
    errors: ['rate<0.01'],            // error rate < 1%
  },
};

export default function () {
  const res = http.get('https://example.com/api/products', {
    headers: { 'Accept': 'application/json' },
  });
  check(res, {
    'status 200': (r) => r.status === 200,
    'response < 500ms': (r) => r.timings.duration < 500,
  });
  errorRate.add(res.status !== 200);
  sleep(1);
}
```
```shell
# Run
k6 run load-test.js

# Stream results to Grafana (InfluxDB)
k6 run --out influxdb=http://localhost:8086/k6 load-test.js
```
### h2load – HTTP/2 Performance
```shell
# Install
sudo apt install nghttp2-client

# HTTP/2 load test
# -n: total requests, -c: connections, -m: concurrent streams per connection
h2load -n 10000 -c 50 -m 10 https://example.com/api/products

# finished in 5.23s, 1912.04 req/s, 2.34MB/s
# requests: 10000 total, 10000 succeeded
# status codes: 10000 2xx
# time for request: min=5ms max=234ms mean=52ms
```
## Bottleneck Analysis Methodology
### Step 1: Decompose Response Time
```shell
# Each curl timing is cumulative from the start of the request
curl -o /dev/null -s -w "
DNS:         %{time_namelookup}s
TCP Connect: %{time_connect}s
TLS:         %{time_appconnect}s
TTFB:        %{time_starttransfer}s
Total:       %{time_total}s
Size:        %{size_download} bytes
" https://example.com/api/products
```
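Because curl reports each timing as cumulative, the cost of an individual phase is the delta between adjacent values. A small sketch with made-up sample timings (seconds):

```shell
# Hypothetical cumulative timings from one curl run
dns=0.012; connect=0.034; tls=0.078; ttfb=0.245; total=0.260

awk -v d="$dns" -v c="$connect" -v a="$tls" -v s="$ttfb" -v t="$total" 'BEGIN {
  printf "DNS lookup   : %.3fs\n", d       # time_namelookup
  printf "TCP handshake: %.3fs\n", c - d   # time_connect - time_namelookup
  printf "TLS handshake: %.3fs\n", a - c   # time_appconnect - time_connect
  printf "Server (TTFB): %.3fs\n", s - a   # time_starttransfer - time_appconnect
  printf "Transfer     : %.3fs\n", t - s   # time_total - time_starttransfer
}'
```

Here the server phase (0.167 s) dwarfs everything else, which would point the investigation at the backend rather than the network.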
### Step 2: Separate Nginx vs Backend Time
```nginx
log_format perf '$remote_addr [$time_local] '
                '"$request" $status '
                'req=$request_time '
                'upstream=$upstream_response_time '
                'cache=$upstream_cache_status';

# Nginx overhead = $request_time - $upstream_response_time
# A large gap here means Nginx itself is the bottleneck (check cache, compression)
```
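With that format in place, the overhead can be averaged straight from the access log. A sketch against a couple of hypothetical log lines (the field layout matches the `perf` format above; cache hits report `upstream=-` and are skipped):

```shell
# Two made-up log lines in the perf format
cat > /tmp/perf.log << 'EOF'
10.0.0.1 [01/Jan/2025:12:00:00 +0000] "GET /api/products HTTP/1.1" 200 req=0.250 upstream=0.240 cache=MISS
10.0.0.2 [01/Jan/2025:12:00:01 +0000] "GET /api/products HTTP/1.1" 200 req=0.002 upstream=- cache=HIT
EOF

# Average Nginx overhead (request_time - upstream_response_time) on proxied requests
awk 'match($0, /req=[0-9.]+/) {
       r = substr($0, RSTART + 4, RLENGTH - 4)
       if (match($0, /upstream=[0-9.]+/)) {
         u = substr($0, RSTART + 9, RLENGTH - 9)
         sum += r - u; n++
       }
     }
     END { if (n) printf "avg overhead: %.3fs over %d request(s)\n", sum / n, n }' /tmp/perf.log
# → avg overhead: 0.010s over 1 request(s)
```

Run this periodically (or per deploy) and an overhead creeping above a few milliseconds is an early signal to inspect the Nginx tier itself.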
### Step 3: Diagnose the Bottleneck Location
If response is slow, walk this checklist:

1. DNS slow? → check DNS TTL and caching
2. TCP/TLS slow? → enable Keep-Alive, TLS session cache
3. TTFB slow? → backend logic, DB queries
4. Transfer slow? → response size (compression), network bandwidth

If TTFB is slow (backend bottleneck):

- DB slow query log: `EXPLAIN ANALYZE`
- N+1 query problem
- Missing timeout on external API calls
- Thread pool exhaustion (Tomcat maxThreads exceeded)

If transfer is slow (network/size bottleneck):

- Compression not enabled (check gzip/brotli)
- Cache not applied (`X-Cache-Status: MISS`)
- Oversized response payload (check pagination)
## Performance SLO Targets
| Metric | Target | How to Improve |
|---|---|---|
| TTFB | < 200 ms | Caching, DB query optimization |
| p95 response time | < 500 ms | Thread pool tuning, caching |
| p99 response time | < 1000 ms | Timeouts, circuit breakers |
| Error rate | < 0.1% | Health checks, auto-retry |
| Compression ratio | > 60% (text) | Verify gzip/brotli settings |
| Cache hit rate | > 80% | Optimize proxy_cache config |
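The compression-ratio target in the table can be spot-checked by comparing response sizes with and without `Accept-Encoding`. A helper sketch (the curl calls are shown as comments; the sample byte counts are made up):

```shell
# compression_ratio RAW_BYTES COMPRESSED_BYTES → percent saved
compression_ratio() {
  awk -v r="$1" -v g="$2" 'BEGIN { printf "%.1f%%\n", (1 - g / r) * 100 }'
}

# In practice the two sizes come from curl, e.g.:
#   raw=$(curl -so /dev/null -w '%{size_download}' "$URL")
#   gz=$(curl -so /dev/null -w '%{size_download}' -H 'Accept-Encoding: gzip' "$URL")
compression_ratio 120000 34000
# → 71.7%  (above the 60% target for text)
```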
## Automated Performance Gate (CI/CD)
```shell
#!/bin/bash
# performance-gate.sh – validate performance before deployment
ENDPOINT="https://staging.example.com/api/products"
MAX_P99_MS=1000
MIN_RPS=100

echo "=== Performance Gate Check ==="
result=$(wrk -t4 -c100 -d30s --latency "$ENDPOINT" 2>&1)

# wrk --latency prints the 50/75/90/99 percentiles, so gate on p99
p99=$(echo "$result" | grep "99%" | awk '{print $2}' | sed 's/ms//')
rps=$(echo "$result" | grep "Requests/sec" | awk '{print $2}')

echo "p99: ${p99}ms (threshold: ${MAX_P99_MS}ms)"
echo "RPS: ${rps} (threshold: ${MIN_RPS})"

if (( $(echo "$p99 > $MAX_P99_MS" | bc -l) )); then
  echo "❌ Performance gate FAILED: p99 ${p99}ms > ${MAX_P99_MS}ms"
  exit 1
fi
if (( $(echo "$rps < $MIN_RPS" | bc -l) )); then
  echo "❌ Performance gate FAILED: ${rps} req/s < ${MIN_RPS}"
  exit 1
fi
echo "✅ Performance gate PASSED"
```