
Log Analysis Fundamentals

Web server logs are the core data source for understanding traffic patterns, diagnosing failures, and detecting security threats. This chapter covers Nginx and Apache log formats in depth, and walks through practical log analysis using standard Linux tools — awk, grep, sed, cut, sort, and uniq. From JSON log format configuration to real-time monitoring, every technique here is production-ready.


Understanding Nginx Log Formats

The main Format (Default)

Nginx's default log format is defined as main in nginx.conf.

# /etc/nginx/nginx.conf
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;
}

Sample log line:

192.168.1.100 - john [28/Mar/2026:10:15:32 +0900] "GET /api/users HTTP/1.1" 200 1523 "https://example.com" "Mozilla/5.0 (Windows NT 10.0)" "-"

Field reference:

| Field | Example | Description |
|-------|---------|-------------|
| $remote_addr | 192.168.1.100 | Client IP address |
| $remote_user | john | HTTP auth user (- if none) |
| $time_local | 28/Mar/2026:10:15:32 +0900 | Request timestamp (local timezone) |
| $request | GET /api/users HTTP/1.1 | HTTP method, URI, protocol |
| $status | 200 | HTTP response status code |
| $body_bytes_sent | 1523 | Response body bytes (headers excluded) |
| $http_referer | https://example.com | Referer header |
| $http_user_agent | Mozilla/5.0 ... | Client User-Agent |
| $http_x_forwarded_for | - | Original client IP when behind a proxy |
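Every awk recipe in this chapter relies on these whitespace-split field positions, so a quick sanity check pays off before writing one-liners. A sketch against the sample line above (note that the bracketed timestamp occupies fields 4 and 5, so the status code lands in field 9):

```shell
# Sample line from above; awk splits on whitespace
line='192.168.1.100 - john [28/Mar/2026:10:15:32 +0900] "GET /api/users HTTP/1.1" 200 1523 "https://example.com" "Mozilla/5.0 (Windows NT 10.0)" "-"'
echo "$line" | awk '{print "ip=" $1, "uri=" $7, "status=" $9, "bytes=" $10}'
# → ip=192.168.1.100 uri=/api/users status=200 bytes=1523
```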

Adding Response Time Fields

The default format does not include response time. Add $request_time and $upstream_response_time for performance analysis.

# /etc/nginx/nginx.conf
http {
    log_format timed '$remote_addr - $remote_user [$time_local] "$request" '
                     '$status $body_bytes_sent "$http_referer" '
                     '"$http_user_agent" '
                     'rt=$request_time uct=$upstream_connect_time '
                     'uht=$upstream_header_time urt=$upstream_response_time';

    access_log /var/log/nginx/access.log timed;
}

| Field | Description |
|-------|-------------|
| $request_time | Total time from request receipt to response completion (seconds, 3 decimal places) |
| $upstream_connect_time | Time to establish a connection with the upstream |
| $upstream_header_time | Time until upstream response headers are received |
| $upstream_response_time | Total upstream response time |
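To illustrate, here is a synthetic line in the timed format and a field-anchored way to pull out the rt= value. The anchor (^rt=) matters: a bare rt= substring also matches the urt= field.

```shell
# Synthetic sample line in the timed format
line='192.168.1.100 - - [28/Mar/2026:10:15:32 +0900] "GET /api/users HTTP/1.1" 200 1523 "-" "curl/8.5.0" rt=0.042 uct=0.001 uht=0.038 urt=0.040'
# ^rt= matches only the rt= field, not urt=
echo "$line" | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^rt=/) print substr($i, 4)}'
# → 0.042
```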

Understanding Apache Log Formats

The combined Format

Apache's combined format (the default CustomLog format in many distribution configurations) is nearly identical to Nginx's main.

# /etc/apache2/apache2.conf or httpd.conf
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/apache2/access.log combined

Sample log line:

203.0.113.5 - - [28/Mar/2026:09:45:12 +0900] "POST /login HTTP/1.1" 302 0 "-" "curl/7.81.0"

| Token | Description |
|-------|-------------|
| %h | Client host (IP address) |
| %l | RFC 1413 identity (usually -) |
| %u | Authenticated username |
| %t | Request timestamp |
| %r | Full request line |
| %>s | Final response status code |
| %O | Bytes sent including headers |
| %{Referer}i | Referer header |
| %{User-Agent}i | User-Agent header |

Log Parsing with awk/grep/sed

Status Code Distribution

# Count requests per HTTP status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

Sample output:

  15823 200
   2341 304
    876 404
    123 500
     45 302
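The raw counts can be turned into percentages with one more awk pass. A sketch using inline sample data in place of the real status column:

```shell
# Replace the printf sample with: awk '{print $9}' /var/log/nginx/access.log
printf '%s\n' 200 200 200 304 500 | sort | uniq -c | \
  awk '{c[$2] = $1; total += $1}
       END {for (s in c) printf "%s %d (%.1f%%)\n", s, c[s], c[s]*100/total}' | sort
# → 200 3 (60.0%)
#   304 1 (20.0%)
#   500 1 (20.0%)
```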

Top 10 Client IPs

# Top 10 IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Same approach works for Apache combined logs
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10

Top 10 Requested URLs

# Top 10 URLs by request count
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Strip query strings before counting
awk '{print $7}' /var/log/nginx/access.log | cut -d'?' -f1 | sort | uniq -c | sort -rn | head -10

Filtering by Time Range

# Extract all requests for a specific date
grep "28/Mar/2026" /var/log/nginx/access.log | awk '{print $9}' | sort | uniq -c

# Count requests during a specific hour (10:xx)
awk '/28\/Mar\/2026:10:/' /var/log/nginx/access.log | wc -l

Error Rate Calculation and 4xx/5xx Aggregation

Aggregating 4xx Errors

# Total 4xx error count
awk '$9 ~ /^4/' /var/log/nginx/access.log | wc -l

# Breakdown by 4xx status code
awk '$9 ~ /^4/ {print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

5xx Errors and Error Rate

# Total 5xx error count
awk '$9 ~ /^5/' /var/log/nginx/access.log | wc -l

# Error rate calculation script
#!/bin/bash
LOG="/var/log/nginx/access.log"
TOTAL=$(wc -l < "$LOG")

# Guard against division by zero on an empty log
if [ "$TOTAL" -eq 0 ]; then
    echo "Log is empty"
    exit 0
fi

ERR5XX=$(awk '$9 ~ /^5/' "$LOG" | wc -l)
ERR4XX=$(awk '$9 ~ /^4/' "$LOG" | wc -l)

echo "Total requests : $TOTAL"
echo "4xx errors : $ERR4XX ($(echo "scale=2; $ERR4XX * 100 / $TOTAL" | bc)%)"
echo "5xx errors : $ERR5XX ($(echo "scale=2; $ERR5XX * 100 / $TOTAL" | bc)%)"

Top 10 URLs Generating 5xx Errors

awk '$9 ~ /^5/ {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
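It can also help to keep the status code attached to the URL, so that 500s and 502s for the same path stay distinguishable. A sketch with inline sample lines standing in for the access log:

```shell
# The printf lines stand in for real access-log entries
printf '%s\n' \
  '203.0.113.7 - - [28/Mar/2026:10:00:01 +0900] "GET /api/orders HTTP/1.1" 502 0 "-" "-"' \
  '203.0.113.7 - - [28/Mar/2026:10:00:02 +0900] "GET /api/orders HTTP/1.1" 502 0 "-" "-"' \
  '198.51.100.3 - - [28/Mar/2026:10:00:03 +0900] "GET /api/pay HTTP/1.1" 500 0 "-" "-"' | \
  awk '$9 ~ /^5/ {print $9, $7}' | sort | uniq -c | sort -rn
```

Against a live server, swap the printf sample for the access log path, exactly as in the one-liner above.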

Response Time Analysis

Assuming the timed log format with rt= prefix for request time:

Average and Maximum Response Time

# Extract response times from rt= fields and compute stats
awk '{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^rt=/) {
            split($i, a, "=")
            sum += a[2]
            if (a[2]+0 > max) max = a[2]+0
            count++
        }
    }
}
END {
    if (count > 0) printf "Avg response time: %.3fs\nMax: %.3fs\n", sum/count, max
}' /var/log/nginx/access.log
# Simpler approach using grep and awk; the leading space in the pattern
# prevents the urt= values from being matched as well
grep -o ' rt=[0-9.]*' /var/log/nginx/access.log | cut -d= -f2 | \
awk '{sum+=$1; if($1>max) max=$1; count++} END {printf "avg=%.3f max=%.3f count=%d\n", sum/count, max, count}'

Extract Requests Slower Than 1 Second

# Find slow requests taking 1 second or more
awk '{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^rt=/) {
            split($i, a, "=")
            if (a[2]+0 >= 1.0) print $0
        }
    }
}' /var/log/nginx/access.log | head -20

Detecting Abnormal Patterns (Scanners and Brute Force)

Detecting Scanners via 404 Floods

# IPs generating excessive 404s (likely scanners)
awk '$9 == "404" {print $1}' /var/log/nginx/access.log | \
sort | uniq -c | sort -rn | awk '$1 > 50 {print "SCANNER SUSPECT:", $2, "404 count:", $1}'

Detecting Brute Force on Login Endpoints

# IPs sending excessive POST requests to login endpoints
awk '$6 == "\"POST" && ($7 ~ /\/login/ || $7 ~ /\/api\/auth/) {print $1}' /var/log/nginx/access.log | \
sort | uniq -c | sort -rn | awk '$1 > 20 {print "BRUTE-FORCE SUSPECT:", $2, "attempts:", $1}'
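The suspect lists above can feed an nginx deny file. A sketch with inline sample IPs (the threshold and the output path mentioned in the comments are illustrative):

```shell
# Sample IP stream stands in for: awk '$9 == "404" {print $1}' access.log
printf '%s\n' 198.51.100.7 198.51.100.7 198.51.100.7 192.0.2.1 | \
  sort | uniq -c | awk '$1 >= 3 {print "deny " $2 ";"}'
# → deny 198.51.100.7;
# Redirect the output to e.g. /etc/nginx/conf.d/denylist.conf, include that
# file from your server block, then run: nginx -t && systemctl reload nginx
```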

Detecting Known Malicious User-Agents

# Flag requests from known scanner/bot user-agents
grep -E '"(sqlmap|nikto|nmap|masscan|zgrab|python-requests|curl)' /var/log/nginx/access.log | \
awk '{print $1, $6, $7}' | sort | uniq -c | sort -rn | head -20

High-Frequency Request Detection (Per-Minute)

# Top IPs by per-minute request volume (threshold: 50 requests/min)
# Top IPs by per-minute request volume (threshold: 50 requests/min)
# substr($4, 2, 17) takes "28/Mar/2026:10:15" from "[28/Mar/2026:10:15:32",
# dropping the seconds so requests group by minute (works in any POSIX awk)
awk '{
    key = $1 " " substr($4, 2, 17)
    count[key]++
}
END {
    for (k in count) if (count[k] >= 50) print count[k], k
}' /var/log/nginx/access.log | sort -rn | head -20

One-Liner Reference: cut/sort/uniq/awk

# 1. HTTP method distribution (GET/POST/PUT/DELETE)
awk '{print $6}' /var/log/nginx/access.log | tr -d '"' | sort | uniq -c | sort -rn

# 2. Top 10 referer domains
awk '{print $11}' /var/log/nginx/access.log | grep -v '"-"' | tr -d '"' | \
cut -d/ -f3 | sort | uniq -c | sort -rn | head -10

# 3. Request count by hour (histogram)
awk '{print $4}' /var/log/nginx/access.log | \
cut -d: -f2 | sort | uniq -c

# 4. Top 10 URLs by total bandwidth (bytes)
awk '{bytes[$7]+=$10} END {for(u in bytes) print bytes[u], u}' /var/log/nginx/access.log | \
sort -rn | head -10

# 5. All request paths from a specific IP on a specific date
grep "28/Mar/2026" /var/log/nginx/access.log | grep "192.168.1.100" | awk '{print $7}'

# 6. HTTP protocol version distribution
awk '{print $8}' /var/log/nginx/access.log | tr -d '"' | sort | uniq -c

# 7. Detect zero-byte responses
awk '$10 == "0" || $10 == "-" {print $1, $7, $9}' /var/log/nginx/access.log | head -20

# 8. Average requests per minute
TOTAL=$(wc -l < /var/log/nginx/access.log)
MINUTES=$(awk '{print $4}' /var/log/nginx/access.log | cut -d: -f1-3 | sort -u | wc -l)
echo "Avg requests/min: $(echo "scale=1; $TOTAL / $MINUTES" | bc)"

JSON Log Format Configuration and jq Analysis

Nginx JSON Log Format

# /etc/nginx/nginx.conf
http {
    log_format json_combined escape=json
        '{'
        '"time":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"method":"$request_method",'
        '"uri":"$uri",'
        '"args":"$args",'
        '"status":$status,'
        '"bytes_sent":$body_bytes_sent,'
        '"request_time":$request_time,'
        '"upstream_response_time":"$upstream_response_time",'
        '"referer":"$http_referer",'
        '"user_agent":"$http_user_agent",'
        '"x_forwarded_for":"$http_x_forwarded_for"'
        '}';

    access_log /var/log/nginx/access.json json_combined;
}

Apply the configuration:

nginx -t && systemctl reload nginx

Analyzing JSON Logs with jq

# Install jq
apt-get install -y jq # Debian/Ubuntu
yum install -y jq # CentOS/RHEL

# Filter 500 errors
jq 'select(.status == 500) | {time, uri, remote_addr}' /var/log/nginx/access.json

# Top 10 slowest requests
jq -s 'sort_by(.request_time) | reverse | .[0:10] | .[] | {uri, request_time, status}' \
/var/log/nginx/access.json

# Request count by status code
jq -s 'group_by(.status) | map({status: .[0].status, count: length}) | sort_by(.count) | reverse[]' \
/var/log/nginx/access.json

# All requests from a specific IP
jq 'select(.remote_addr == "192.168.1.100") | {time, method, uri, status}' \
/var/log/nginx/access.json

# Calculate average response time
jq -s '[.[].request_time] | add / length' /var/log/nginx/access.json

# List URIs with response time >= 1 second
jq 'select(.request_time >= 1.0) | .uri' /var/log/nginx/access.json | sort | uniq -c | sort -rn

Real-Time Log Monitoring

Basic tail -f

# Stream access log in real time
tail -f /var/log/nginx/access.log

# Stream error log in real time
tail -f /var/log/nginx/error.log

# Monitor both access and error logs simultaneously
tail -f /var/log/nginx/access.log /var/log/nginx/error.log

Combining tail -f with grep Filters

# Show only 5xx errors in real time
tail -f /var/log/nginx/access.log | grep ' 5[0-9][0-9] '

# Track requests from a specific IP in real time
tail -f /var/log/nginx/access.log | grep "203.0.113.5"

# Detect slow requests in real time (response time >= 2 seconds)
tail -f /var/log/nginx/access.log | awk '{
for(i=1;i<=NF;i++) if($i~/^rt=/) {split($i,a,"="); if(a[2]+0>=2.0) print $0}
}'

Periodic Statistics with watch

# Refresh error counts every 10 seconds
watch -n 10 "awk '\$9~/^[45]/{print \$9}' /var/log/nginx/access.log | sort | uniq -c"

# Refresh top 5 IPs every 10 seconds
watch -n 10 "awk '{print \$1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -5"

Tracking Logs Through Rotation

# --follow=name tracks by filename; with --retry, tail reopens the new file after rotation
tail --follow=name --retry /var/log/nginx/access.log

# tail -F is shorthand for --follow=name --retry
tail -F /var/log/nginx/access.log

# Merge current and previous rotated log
tail -f /var/log/nginx/access.log.1 /var/log/nginx/access.log
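The rotation itself is usually handled by logrotate. A typical policy looks roughly like the sketch below; exact directives vary by distribution, so check the /etc/logrotate.d/nginx your package installed.

```
# /etc/logrotate.d/nginx (sketch of a typical policy)
/var/log/nginx/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # USR1 tells nginx to reopen its log files
        [ -f /var/run/nginx.pid ] && kill -USR1 "$(cat /var/run/nginx.pid)"
    endscript
}
```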

Pro Tip: Daily Log Report Automation

In production environments, register a daily report script in cron to automatically summarize the previous day's traffic every morning.

#!/bin/bash
# /usr/local/bin/daily-log-report.sh
# cron: 0 6 * * * /usr/local/bin/daily-log-report.sh >> /var/log/nginx/daily-report.log 2>&1

LOG="/var/log/nginx/access.log.1" # Yesterday's rotated log
DATE=$(date -d "yesterday" +%d/%b/%Y)
REPORT="/tmp/nginx-report-$(date +%Y%m%d).txt"

{
echo "===== Nginx Daily Report: $DATE ====="
echo ""
echo "[Total Requests]"
grep "$DATE" "$LOG" | wc -l

echo ""
echo "[Status Code Distribution]"
grep "$DATE" "$LOG" | awk '{print $9}' | sort | uniq -c | sort -rn

echo ""
echo "[Top 10 Client IPs]"
grep "$DATE" "$LOG" | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

echo ""
echo "[Top 10 URLs]"
grep "$DATE" "$LOG" | awk '{print $7}' | cut -d? -f1 | sort | uniq -c | sort -rn | head -10

echo ""
echo "[5xx Error URLs]"
grep "$DATE" "$LOG" | awk '$9~/^5/{print $7}' | sort | uniq -c | sort -rn | head -10
} > "$REPORT"

cat "$REPORT"

For JSON-format logs, replace the awk pipelines with jq queries for more structured and flexible reporting. This script can also be extended to send the report via email or a Slack webhook.
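For the Slack route, a minimal sketch is shown below. The webhook URL is a placeholder, and the curl line is left commented out so the snippet stays side-effect-free until a real webhook is wired in.

```shell
# WEBHOOK_URL is a placeholder; swap in your Slack incoming-webhook URL
WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"
REPORT="/tmp/nginx-report-$(date +%Y%m%d).txt"

# Minimal JSON payload; pasting the report body in here would need JSON escaping
payload=$(printf '{"text":"Nginx daily report ready: %s"}' "$REPORT")
echo "$payload"

# Uncomment once a real webhook URL is configured:
# curl -s -X POST -H 'Content-Type: application/json' --data "$payload" "$WEBHOOK_URL"
```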