Log Analysis Fundamentals
Web server logs are the core data source for understanding traffic patterns, diagnosing failures, and detecting security threats. This chapter covers Nginx and Apache log formats in depth, and walks through practical log analysis using standard Linux tools — awk, grep, sed, cut, sort, and uniq. From JSON log format configuration to real-time monitoring, every technique here is production-ready.
Understanding Nginx Log Formats
The main Format (Default)
The stock nginx.conf ships with a log format named main, which extends Nginx's built-in combined format with $http_x_forwarded_for.
# /etc/nginx/nginx.conf
http {
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
}
Sample log line:
192.168.1.100 - john [28/Mar/2026:10:15:32 +0900] "GET /api/users HTTP/1.1" 200 1523 "https://example.com" "Mozilla/5.0 (Windows NT 10.0)" "-"
Field reference:
| Field | Example | Description |
|---|---|---|
| $remote_addr | 192.168.1.100 | Client IP address |
| $remote_user | john | HTTP auth user (- if none) |
| $time_local | 28/Mar/2026:10:15:32 +0900 | Request timestamp (local timezone) |
| $request | GET /api/users HTTP/1.1 | HTTP method, URI, protocol |
| $status | 200 | HTTP response status code |
| $body_bytes_sent | 1523 | Response body bytes (headers excluded) |
| $http_referer | https://example.com | Referer header |
| $http_user_agent | Mozilla/5.0 ... | Client User-Agent |
| $http_x_forwarded_for | - | Original client IP when behind a proxy |
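Since the main format is whitespace-delimited, awk's default field splitting maps each value to a fixed position. A self-contained sanity check against the sample line above (inlined here so the snippet runs on its own):

```shell
# The sample main-format line from above, inlined for a self-contained check
LINE='192.168.1.100 - john [28/Mar/2026:10:15:32 +0900] "GET /api/users HTTP/1.1" 200 1523 "https://example.com" "Mozilla/5.0 (Windows NT 10.0)" "-"'

# $1 = client IP, $7 = request URI, $9 = status code
echo "$LINE" | awk '{print "ip=" $1, "uri=" $7, "status=" $9}'
# → ip=192.168.1.100 uri=/api/users status=200
```

Note that the quoted User-Agent contains spaces, so field positions beyond $11 are not stable; the pipelines in this chapter stick to $1 through $11 for that reason.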
Adding Response Time Fields
The default format does not include response time. Add $request_time and $upstream_response_time for performance analysis.
# /etc/nginx/nginx.conf
http {
log_format timed '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" '
'rt=$request_time uct=$upstream_connect_time '
'uht=$upstream_header_time urt=$upstream_response_time';
access_log /var/log/nginx/access.log timed;
}
| Field | Description |
|---|---|
| $request_time | Total time from request receipt to response completion (seconds, 3 decimal places) |
| $upstream_connect_time | Time to establish a connection with the upstream |
| $upstream_header_time | Time until the upstream response headers are received |
| $upstream_response_time | Total upstream response time |
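A line produced by the timed format would look roughly like the following (the line itself is a constructed example, not real traffic), and the rt= value can be pulled out with grep -o:

```shell
# Constructed example line in the timed format (not real traffic)
LINE='10.0.0.7 - - [28/Mar/2026:10:15:32 +0900] "GET /api/orders HTTP/1.1" 200 412 "-" "curl/7.81.0" rt=0.087 uct=0.002 uht=0.085 urt=0.085'

# Extract the total request time; the leading space in the pattern keeps
# a bare 'rt=' from also matching inside 'urt='
echo "$LINE" | grep -o ' rt=[0-9.]*' | cut -d= -f2
# → 0.087
```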
Understanding Apache Log Formats
The combined Format
Apache's default combined format is nearly identical to Nginx's main.
# /etc/apache2/apache2.conf or httpd.conf
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/apache2/access.log combined
Sample log line:
203.0.113.5 - - [28/Mar/2026:09:45:12 +0900] "POST /login HTTP/1.1" 302 0 "-" "curl/7.81.0"
| Token | Description |
|---|---|
| %h | Client host (IP address) |
| %l | RFC 1413 identity (usually -) |
| %u | Authenticated username |
| %t | Request timestamp |
| %r | Full request line |
| %>s | Final response status code |
| %O | Total bytes sent including headers (requires mod_logio) |
| %{Referer}i | Referer header |
| %{User-Agent}i | User-Agent header |
Log Parsing with awk/grep/sed
Status Code Distribution
# Count requests per HTTP status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
Sample output:
15823 200
2341 304
876 404
123 500
45 302
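The sort before uniq -c is not optional: uniq -c only collapses adjacent duplicate lines, which is why every counting pipeline in this chapter sorts first. A self-contained demonstration on a handful of inline status codes:

```shell
# uniq -c only collapses adjacent duplicates, so always sort first
printf '200\n404\n200\n200\n404\n500\n' | sort | uniq -c | sort -rn
# counts come out largest first: 3x 200, 2x 404, 1x 500
```

Dropping the first sort would report 200 three separate times instead of once with a count of 3.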
Top 10 Client IPs
# Top 10 IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Same approach works for Apache combined logs
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10
Top 10 Requested URLs
# Top 10 URLs by request count
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Strip query strings before counting
awk '{print $7}' /var/log/nginx/access.log | cut -d'?' -f1 | sort | uniq -c | sort -rn | head -10
Filtering by Time Range
# Extract all requests for a specific date
grep "28/Mar/2026" /var/log/nginx/access.log | awk '{print $9}' | sort | uniq -c
# Count requests during a specific hour (10:xx)
awk '/28\/Mar\/2026:10:/' /var/log/nginx/access.log | wc -l
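The hour filter works because the regex anchors on the `:10:` that follows the date. On synthetic lines (the addresses and timestamps below are invented), only the two requests inside the 10:00 hour are counted:

```shell
# Count only lines whose timestamp falls inside the 10:xx hour
printf '%s\n' \
    'a - - [28/Mar/2026:10:15:32 +0900] "GET / HTTP/1.1" 200 1' \
    'a - - [28/Mar/2026:10:59:01 +0900] "GET / HTTP/1.1" 200 1' \
    'a - - [28/Mar/2026:11:00:00 +0900] "GET / HTTP/1.1" 200 1' | \
awk '/28\/Mar\/2026:10:/' | wc -l
# → 2
```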
Error Rate Calculation and 4xx/5xx Aggregation
Aggregating 4xx Errors
# Total 4xx error count
awk '$9 ~ /^4/' /var/log/nginx/access.log | wc -l
# Breakdown by 4xx status code
awk '$9 ~ /^4/ {print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
5xx Errors and Error Rate
# Total 5xx error count
awk '$9 ~ /^5/' /var/log/nginx/access.log | wc -l
# Error rate calculation script
#!/bin/bash
LOG="/var/log/nginx/access.log"
TOTAL=$(wc -l < "$LOG")
ERR5XX=$(awk '$9 ~ /^5/' "$LOG" | wc -l)
ERR4XX=$(awk '$9 ~ /^4/' "$LOG" | wc -l)
echo "Total requests : $TOTAL"
echo "4xx errors : $ERR4XX ($(echo "scale=2; $ERR4XX * 100 / $TOTAL" | bc)%)"
echo "5xx errors : $ERR5XX ($(echo "scale=2; $ERR5XX * 100 / $TOTAL" | bc)%)"
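The script above reads the log three times (once for the total, once per error class). On large logs a single awk pass is noticeably cheaper and also sidesteps bc; a sketch of the same report in one pass, using the same field positions and the same log path:

```shell
#!/bin/bash
# One-pass variant: count totals and 4xx/5xx in a single read of the log
LOG="/var/log/nginx/access.log"
awk '
    { total++ }
    $9 ~ /^4/ { e4++ }
    $9 ~ /^5/ { e5++ }
    END {
        if (total == 0) { print "no requests"; exit }
        printf "Total requests : %d\n", total
        printf "4xx errors     : %d (%.2f%%)\n", e4, e4 * 100 / total
        printf "5xx errors     : %d (%.2f%%)\n", e5, e5 * 100 / total
    }
' "$LOG"
```

The empty-log guard also avoids the divide-by-zero that the bc version hits when the log has just been rotated.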
Top 10 URLs Generating 5xx Errors
awk '$9 ~ /^5/ {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
Response Time Analysis
Assuming the timed log format with rt= prefix for request time:
Average and Maximum Response Time
# Extract response times from rt= fields and compute stats
awk '{
for(i=1; i<=NF; i++) {
if($i ~ /^rt=/) {
split($i, a, "=")
sum += a[2]
if(a[2]+0 > max) max = a[2]+0
count++
}
}
}
END {
if(count > 0) printf "Avg response time: %.3fs\nMax: %.3fs\n", sum/count, max
}' /var/log/nginx/access.log
# Simpler approach using grep and awk
# (note the leading space in the pattern: a bare 'rt=' would also match inside 'urt='
# and silently double-count)
grep -o ' rt=[0-9.]*' /var/log/nginx/access.log | cut -d= -f2 | \
awk '{sum+=$1; if($1>max) max=$1; count++} END {printf "avg=%.3f max=%.3f count=%d\n", sum/count, max, count}'
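Averages hide tail latency, so a percentile is often more useful. A sketch computing p95 over the timed format, under the same assumption that request times appear as a space-prefixed rt= token: sort the extracted times numerically, then index at 95% of the count.

```shell
# p95 of response times: sort numerically, then pick the value at 95% of the count
grep -o ' rt=[0-9.]*' /var/log/nginx/access.log | cut -d= -f2 | sort -n | \
awk '{ v[NR] = $1 }
END {
    if (NR == 0) exit
    i = int(NR * 0.95); if (i < 1) i = 1
    printf "p95=%.3fs (n=%d)\n", v[i], NR
}'
```

With 100 samples this prints the 95th-smallest value; the i < 1 clamp keeps tiny samples from indexing v[0].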
Extract Requests Slower Than 1 Second
# Find slow requests taking 1 second or more
awk '{
for(i=1; i<=NF; i++) {
if($i ~ /^rt=/) {
split($i, a, "=")
if(a[2]+0 >= 1.0) print $0
}
}
}' /var/log/nginx/access.log | head -20
Detecting Abnormal Patterns (Scanners and Brute Force)
Detecting Scanners via 404 Floods
# IPs generating excessive 404s (likely scanners)
awk '$9 == "404" {print $1}' /var/log/nginx/access.log | \
sort | uniq -c | sort -rn | awk '$1 > 50 {print "SCANNER SUSPECT:", $2, "404 count:", $1}'
Detecting Brute Force on Login Endpoints
# IPs sending excessive POST requests to login endpoints
awk '$6 == "\"POST" && ($7 ~ /\/login/ || $7 ~ /\/api\/auth/) {print $1}' /var/log/nginx/access.log | \
sort | uniq -c | sort -rn | awk '$1 > 20 {print "BRUTE-FORCE SUSPECT:", $2, "attempts:", $1}'
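To see the thresholding behaviour without real traffic, the same pipeline can be fed synthetic lines (the IPs, path, and volume below are invented):

```shell
# 25 fake POST /login attempts from one IP plus one benign request,
# run through the same detection pipeline
{
    for i in $(seq 1 25); do
        echo '198.51.100.9 - - [28/Mar/2026:11:00:01 +0900] "POST /login HTTP/1.1" 401 52'
    done
    echo '192.0.2.10 - - [28/Mar/2026:11:00:02 +0900] "GET /index.html HTTP/1.1" 200 1024'
} | awk '$6 == "\"POST" && $7 ~ /\/login/ {print $1}' | \
sort | uniq -c | sort -rn | awk '$1 > 20 {print "BRUTE-FORCE SUSPECT:", $2, "attempts:", $1}'
# → BRUTE-FORCE SUSPECT: 198.51.100.9 attempts: 25
```

The benign GET never reaches the counter, and only IPs over the threshold of 20 are reported.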
Detecting Known Malicious User-Agents
# Flag requests from known scanner/bot user-agents
grep -E '"(sqlmap|nikto|nmap|masscan|zgrab|python-requests|curl)' /var/log/nginx/access.log | \
awk '{print $1, $6, $7}' | sort | uniq -c | sort -rn | head -20
High-Frequency Request Detection (Per-Minute)
# Top IPs by per-minute request volume (threshold: 50 requests/min)
awk '{
    key = $1 " " substr($4, 2, 17)   # IP + "dd/Mon/yyyy:hh:mm" (minute precision)
    count[key]++
}
END {
    for(k in count) if(count[k] >= 50) print count[k], k
}' /var/log/nginx/access.log | sort -rn | head -20
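Building the per-minute key relies on the timestamp being fixed width: dd/Mon/yyyy:hh:mm is always the first 17 characters after the opening bracket, so substr on $4 extracts it portably in any POSIX awk (a three-argument match() with a capture group is gawk-only):

```shell
# $4 of a combined-format line is "[dd/Mon/yyyy:hh:mm:ss";
# substr(..., 2, 17) drops the bracket and the :ss seconds
echo '[28/Mar/2026:10:15:32' | awk '{print substr($1, 2, 17)}'
# → 28/Mar/2026:10:15
```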
One-Liner Reference: cut/sort/uniq/awk
# 1. HTTP method distribution (GET/POST/PUT/DELETE)
awk '{print $6}' /var/log/nginx/access.log | tr -d '"' | sort | uniq -c | sort -rn
# 2. Top 10 referer domains
awk '{print $11}' /var/log/nginx/access.log | grep -v '"-"' | \
tr -d '"' | cut -d/ -f3 | sort | uniq -c | sort -rn | head -10
# 3. Request count by hour (histogram)
awk '{print $4}' /var/log/nginx/access.log | \
cut -d: -f2 | sort | uniq -c
# 4. Top 10 URLs by total bandwidth (bytes)
awk '{bytes[$7]+=$10} END {for(u in bytes) print bytes[u], u}' /var/log/nginx/access.log | \
sort -rn | head -10
# 5. All request paths from a specific IP on a specific date
grep "28/Mar/2026" /var/log/nginx/access.log | grep "192.168.1.100" | awk '{print $7}'
# 6. HTTP protocol version distribution
awk '{print $8}' /var/log/nginx/access.log | tr -d '"' | sort | uniq -c
# 7. Detect zero-byte responses
awk '$10 == "0" || $10 == "-" {print $1, $7, $9}' /var/log/nginx/access.log | head -20
# 8. Average requests per minute
TOTAL=$(wc -l < /var/log/nginx/access.log)
MINUTES=$(awk '{print $4}' /var/log/nginx/access.log | cut -d: -f1-3 | sort -u | wc -l)
echo "Avg requests/min: $(echo "scale=1; $TOTAL / $MINUTES" | bc)"
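One-liner 4 above sums $10 (body bytes) per URI. On synthetic lines the aggregation looks like this (paths and sizes are invented):

```shell
# Sum response bytes per URI and rank by total bandwidth
printf '%s\n' \
    'a - - [d +0900] "GET /big.iso HTTP/1.1" 200 5000' \
    'a - - [d +0900] "GET /big.iso HTTP/1.1" 200 5000' \
    'a - - [d +0900] "GET /index.html HTTP/1.1" 200 300' | \
awk '{bytes[$7]+=$10} END {for(u in bytes) print bytes[u], u}' | sort -rn
# → 10000 /big.iso
#   300 /index.html
```

Note that for-in iteration order in awk is unspecified, which is why the trailing sort -rn is required for a stable ranking.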
JSON Log Format Configuration and jq Analysis
Nginx JSON Log Format
# /etc/nginx/nginx.conf
http {
log_format json_combined escape=json
'{'
'"time":"$time_iso8601",'
'"remote_addr":"$remote_addr",'
'"method":"$request_method",'
'"uri":"$uri",'
'"args":"$args",'
'"status":$status,'
'"bytes_sent":$body_bytes_sent,'
'"request_time":$request_time,'
'"upstream_response_time":"$upstream_response_time",'
'"referer":"$http_referer",'
'"user_agent":"$http_user_agent",'
'"x_forwarded_for":"$http_x_forwarded_for"'
'}';
access_log /var/log/nginx/access.json json_combined;
}
Apply the configuration:
nginx -t && systemctl reload nginx
Analyzing JSON Logs with jq
# Install jq
apt-get install -y jq # Debian/Ubuntu
yum install -y jq # CentOS/RHEL
# Filter 500 errors
jq 'select(.status == 500) | {time, uri, remote_addr}' /var/log/nginx/access.json
# Top 10 slowest requests
jq -s 'sort_by(.request_time) | reverse | .[0:10] | .[] | {uri, request_time, status}' \
/var/log/nginx/access.json
# Request count by status code
jq -s 'group_by(.status) | map({status: .[0].status, count: length}) | sort_by(.count) | reverse[]' \
/var/log/nginx/access.json
# All requests from a specific IP
jq 'select(.remote_addr == "192.168.1.100") | {time, method, uri, status}' \
/var/log/nginx/access.json
# Calculate average response time
jq -s '[.[].request_time] | add / length' /var/log/nginx/access.json
# List URIs with response time >= 1 second
jq 'select(.request_time >= 1.0) | .uri' /var/log/nginx/access.json | sort | uniq -c | sort -rn
Real-Time Log Monitoring
Basic tail -f
# Stream access log in real time
tail -f /var/log/nginx/access.log
# Stream error log in real time
tail -f /var/log/nginx/error.log
# Monitor both access and error logs simultaneously
tail -f /var/log/nginx/access.log /var/log/nginx/error.log
Combining tail -f with grep Filters
# Show only 5xx errors in real time (match the status field itself;
# grepping for ' 5[0-9][0-9] ' would also hit byte counts like " 523 ")
tail -f /var/log/nginx/access.log | awk '$9 ~ /^5/'
# Track requests from a specific IP in real time
tail -f /var/log/nginx/access.log | grep "203.0.113.5"
# Detect slow requests in real time (response time >= 2 seconds)
tail -f /var/log/nginx/access.log | awk '{
for(i=1;i<=NF;i++) if($i~/^rt=/) {split($i,a,"="); if(a[2]+0>=2.0) print $0}
}'
Periodic Statistics with watch
# Refresh error counts every 10 seconds
watch -n 10 "awk '\$9~/^[45]/{print \$9}' /var/log/nginx/access.log | sort | uniq -c"
# Refresh top 5 IPs every 10 seconds
watch -n 10 "awk '{print \$1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -5"
Tracking Logs Through Rotation
# --follow=name tracks by filename and automatically switches to the new file after rotation
# (tail -F is a convenient shorthand for --follow=name --retry)
tail --follow=name /var/log/nginx/access.log
# Show the previous rotated log alongside the live log
# (the rotated .1 file no longer grows; it is read once for context)
tail -f /var/log/nginx/access.log.1 /var/log/nginx/access.log
Pro Tip: Daily Log Report Automation
In production environments, register a daily report script in cron to automatically summarize the previous day's traffic every morning.
#!/bin/bash
# /usr/local/bin/daily-log-report.sh
# cron: 0 6 * * * /usr/local/bin/daily-log-report.sh >> /var/log/nginx/daily-report.log 2>&1
LOG="/var/log/nginx/access.log.1" # Yesterday's rotated log
DATE=$(date -d "yesterday" +%d/%b/%Y)
REPORT="/tmp/nginx-report-$(date +%Y%m%d).txt"
{
echo "===== Nginx Daily Report: $DATE ====="
echo ""
echo "[Total Requests]"
grep "$DATE" "$LOG" | wc -l
echo ""
echo "[Status Code Distribution]"
grep "$DATE" "$LOG" | awk '{print $9}' | sort | uniq -c | sort -rn
echo ""
echo "[Top 10 Client IPs]"
grep "$DATE" "$LOG" | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
echo ""
echo "[Top 10 URLs]"
grep "$DATE" "$LOG" | awk '{print $7}' | cut -d? -f1 | sort | uniq -c | sort -rn | head -10
echo ""
echo "[5xx Error URLs]"
grep "$DATE" "$LOG" | awk '$9~/^5/{print $7}' | sort | uniq -c | sort -rn | head -10
} > "$REPORT"
cat "$REPORT"
For JSON-format logs, replace the awk pipelines with jq queries for more structured and flexible reporting. This script can also be extended to send the report via email or a Slack webhook.