Blocked IP Logging

An append-only, Common Log Format (CLF)-inspired specification for recording blocked IP addresses. Site owners can use it to standardize how abusive requests are logged, rotated, and analyzed.

1. Introduction

Blocked IP logging records every flagged (abusive) client request in monthly text files. Because the entries emulate CLF, operators can use familiar tooling (grep, awk, logrotate, ELK, Splunk, etc.) with little or no custom parsing. This document defines the log format and file location, and gives guidance on rotation and retention, analysis, monitoring, and troubleshooting.

2. Purpose

  • Visibility: Maintain an auditable record of all abuse events with automatic monthly organization.
  • Simplicity: Use a single line per event, ready for CLI or log‑analysis platforms.
  • Scalability: Monthly file rotation prevents individual files from growing too large.
  • Extensibility: Allow additional metadata fields if needed.

3. Log File Location

The middleware writes to monthly log files organized by year and month:

<APP_ROOT>/var/logs/YYYY/MM/blocked_ips.log

Examples:

  • <APP_ROOT>/var/logs/2025/05/blocked_ips.log (May 2025)
  • <APP_ROOT>/var/logs/2025/06/blocked_ips.log (June 2025)

Ensure the directory structure is writable by the application process. The middleware automatically creates directories as needed with 0755 permissions. Operators should protect these files from unauthorized access and include them in backup schedules only if compliance requires.
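
The path layout above can be sketched as follows. This is a minimal illustration, not the middleware's code: the function names are hypothetical, and it assumes the year and month are taken from UTC, matching the +0000 timestamps used in the entries.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def monthly_log_path(app_root, when=None):
    """Return <app_root>/var/logs/YYYY/MM/blocked_ips.log for the given moment."""
    # Assumption: the month boundary follows UTC, like the logged timestamps.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return (Path(app_root) / "var" / "logs"
            / when.strftime("%Y") / when.strftime("%m") / "blocked_ips.log")


def ensure_log_dir(log_path):
    """Create the year/month directories if they are missing."""
    # 0o755 applies to the final (month) directory; intermediate directories
    # get the process default, which is also 0755 under the common umask 022.
    os.makedirs(log_path.parent, mode=0o755, exist_ok=True)
    return log_path
```

For example, `monthly_log_path("/srv/app", datetime(2025, 5, 27, tzinfo=timezone.utc))` resolves to `/srv/app/var/logs/2025/05/blocked_ips.log`, where `/srv/app` is a placeholder application root.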

4. Log Entry Format

Entries follow this CLF-derived pattern:

%h - - [%t] "BLOCKEDIP %U" %s %b "%{User-Agent}i"

| Placeholder | Meaning |
| --- | --- |
| %h | Remote host (client IP) |
| - - | RFC 1413 identity and authenticated user (both unused) |
| [%t] | Timestamp in dd/Mon/yyyy:HH:mm:ss +0000 (e.g., [27/May/2025:20:15:00 +0000]) |
| "BLOCKEDIP %U" | Literal BLOCKEDIP action and raw request URI path |
| %s | HTTP status code (429) |
| %b | Response size in bytes (0 for blocked responses) |
| "%{User-Agent}i" | Quoted and sanitized User‑Agent header |

4.1. Example Entry

203.0.113.45 - - [27/May/2025:20:15:00 +0000] "BLOCKEDIP /wp-login.php" 429 0 "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
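
As an illustration of how the placeholders combine, the sketch below assembles one entry. The helper name `format_entry` and its parameters are hypothetical, and the User-Agent is assumed to be sanitized already.

```python
from datetime import datetime, timezone

# Month abbreviations are spelled out so the output does not depend on the
# process locale (strftime("%b") would).
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_entry(ip, uri, user_agent="-", when=None, status=429, size=0):
    """Build one log line following the pattern in section 4."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = "{:02d}/{}/{:04d}:{} +0000".format(
        when.day, _MONTHS[when.month - 1], when.year, when.strftime("%H:%M:%S"))
    return f'{ip} - - [{timestamp}] "BLOCKEDIP {uri}" {status} {size} "{user_agent}"'


line = format_entry(
    "203.0.113.45",
    "/wp-login.php",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    when=datetime(2025, 5, 27, 20, 15, 0, tzinfo=timezone.utc),
)
print(line)  # reproduces the example entry above
```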

4.2. User-Agent Sanitization

User-Agent headers are automatically sanitized for safe logging:

  • Control characters (\x00-\x1F, \x7F) are removed
  • Double quotes, newlines, carriage returns, and backslashes are escaped
  • Empty User-Agent headers are replaced with -
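
The rules above do not state an order, and newlines and carriage returns are themselves control characters. The sketch below is one possible reading, not the middleware's implementation: it assumes escaping happens first, and that any control characters still present afterwards are removed. The function name is hypothetical.

```python
import re

# Control characters as listed above: \x00-\x1F and \x7F.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_user_agent(raw):
    """Return a User-Agent value that is safe to place inside a quoted log field."""
    if not raw:
        return "-"
    # Assumed order: escape backslashes first so later escapes are not doubled.
    value = raw.replace("\\", "\\\\")
    value = value.replace('"', '\\"')
    value = value.replace("\n", "\\n").replace("\r", "\\r")
    # Remove whatever control characters remain after escaping.
    value = _CONTROL_CHARS.sub("", value)
    # A header made up only of control characters ends up empty.
    return value or "-"
```

Escaping before stripping keeps a trace of embedded line breaks in the log; stripping first would silently drop them. Either order prevents a crafted header from injecting a forged log line.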

5. Integration

The BlockIpMiddleware handles all logging automatically when an upstream middleware sets the is_abusive request attribute to true. No additional integration is required. The framework:

  • Validates IP addresses before logging
  • Sanitizes User-Agent headers for safe storage
  • Creates monthly directories automatically
  • Handles file writing with proper locking and error recovery
  • Logs events to the framework logger for monitoring
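
The locked-append step can be sketched as below. This is a Unix-only illustration using an advisory `flock`, which is an assumption about how the locking works; `append_entry` is a hypothetical name, and the real middleware may recover from errors differently.

```python
import fcntl
import os


def append_entry(log_path, line):
    """Append one entry under an exclusive advisory lock; return False on failure."""
    try:
        directory = os.path.dirname(log_path)
        if directory:
            os.makedirs(directory, mode=0o755, exist_ok=True)
        with open(log_path, "a", encoding="utf-8") as handle:
            fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
            try:
                handle.write(line.rstrip("\n") + "\n")
                handle.flush()
            finally:
                fcntl.flock(handle, fcntl.LOCK_UN)
        return True
    except OSError:
        # A logging failure is reported to the caller instead of raised, so a
        # full disk or bad permissions need not break the blocked response.
        return False
```

A caller would invoke this only when the request carries `is_abusive` set to true, and report a `False` result to the framework logger.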

6. Log Rotation & Retention

6.1. Automatic Monthly Rotation

The framework's middleware automatically creates new monthly files, eliminating the need for complex rotation scripts. However, you may still want to implement retention policies.

6.2. Cleanup Script Example

bash
#!/bin/bash
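# Remove blocked IP logs not modified in the last 180 days (about 6 months)
LOG_ROOT="<APP_ROOT>/var/logs"   # replace <APP_ROOT> with your application root
find "$LOG_ROOT" -name "blocked_ips.log" -path "*/????/??/*" -mtime +180 -delete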
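
As an alternative to modification times, retention can be decided from the YYYY/MM directory names. The sketch below is a hypothetical helper that only lists candidates; it assumes "6 months" means keeping the six most recent monthly files, including the current one.

```python
from pathlib import Path


def expired_logs(log_root, year, month, keep_months=6):
    """Return blocked_ips.log files whose YYYY/MM directory falls outside the window."""
    current = year * 12 + (month - 1)
    pattern = "[0-9][0-9][0-9][0-9]/[0-9][0-9]/blocked_ips.log"
    expired = []
    for path in sorted(Path(log_root).glob(pattern)):
        file_year, file_month = int(path.parts[-3]), int(path.parts[-2])
        age_in_months = current - (file_year * 12 + (file_month - 1))
        if age_in_months >= keep_months:
            expired.append(path)
    return expired
```

Printing the returned paths for review before calling `unlink()` on them gives a dry run, which is useful when retention is tied to compliance requirements.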

6.3. Logrotate Configuration (Optional)

For additional compression of older files:

# Replace <APP_ROOT> with your application root
<APP_ROOT>/var/logs/????/??/blocked_ips.log {
    monthly
    rotate 6
    missingok
    notifempty
    compress
    delaycompress
    nocopytruncate
    postrotate
        # Optional: signal application to refresh file handles
    endscript
}

7. Parsing & Analysis

7.1. Command Line Examples

bash
LOG_ROOT="<APP_ROOT>/var/logs"   # replace <APP_ROOT> with your application root
# Timestamps are logged in UTC; this assumes the month directories follow UTC too
THIS_MONTH="$LOG_ROOT/$(date -u +%Y/%m)/blocked_ips.log"

# Count total blocks for current month
grep -c "BLOCKEDIP" "$THIS_MONTH"

# Top 10 blocked IPs this month
awk '{print $1}' "$THIS_MONTH" | sort | uniq -c | sort -nr | head -10

# Blocks by hour for today (UTC); splitting field 4 also works for IPv6 clients
grep "\[$(date -u +%d/%b/%Y):" "$THIS_MONTH" | awk '{split($4, t, ":"); print t[2]}' | sort | uniq -c

# Find all blocks for a specific IP across all months (anchored, dots escaped)
find "$LOG_ROOT" -name "blocked_ips.log" -exec grep -h '^203\.0\.113\.45 ' {} + | wc -l
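
For analysis beyond one-liners, entries can be parsed into named fields. The regular expression below is a sketch derived from the pattern in section 4; the names `ENTRY_RE`, `parse_entry`, and `top_blocked_ips` are illustrative.

```python
import re
from collections import Counter

ENTRY_RE = re.compile(
    r'^(?P<ip>\S+) - - \[(?P<timestamp>[^\]]+)\] '
    r'"BLOCKEDIP (?P<uri>.*?)" (?P<status>\d{3}) (?P<size>\d+) "(?P<user_agent>.*)"$'
)


def parse_entry(line):
    """Return the fields of one entry as a dict of strings, or None if malformed."""
    match = ENTRY_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def top_blocked_ips(lines, limit=10):
    """Count entries per client IP, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            counts[entry["ip"]] += 1
    return counts.most_common(limit)
```

Because `top_blocked_ips` accepts any iterable of lines, an open file handle for a monthly log can be passed directly.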

7.2. Log Analysis Tools

  • Splunk/ELK: Ingest as CLF with custom pattern for BLOCKEDIP %U
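  • GoAccess: Define a custom --log-format with matching --date-format and --time-format; the predefined COMBINED format expects a Referer field that these entries do not contain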
  • AWStats: Configure custom log format for blocked requests

8. Monitoring & Alerting

8.1. Disk Usage Monitoring

bash
# Alert if any monthly log exceeds 100MB
LOG_ROOT="<APP_ROOT>/var/logs"   # replace <APP_ROOT> with your application root
find "$LOG_ROOT" -name "blocked_ips.log" -size +100M -exec echo "Large blocked IP log: {}" \;

8.2. Abuse Pattern Detection

bash
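# Alert if more than 1000 blocks were logged during the previous clock hour (UTC)
LOG_ROOT="<APP_ROOT>/var/logs"   # replace <APP_ROOT> with your application root
# Requires GNU date; path and timestamp use the same reference time so the
# check still works in the first hour of a new month
log_file="$LOG_ROOT/$(date -u -d '1 hour ago' +%Y/%m)/blocked_ips.log"
hour_stamp=$(date -u -d '1 hour ago' +'%d/%b/%Y:%H')
recent_blocks=$(grep -c "\[$hour_stamp:" "$log_file" 2>/dev/null || true)
if [ "${recent_blocks:-0}" -gt 1000 ]; then
    echo "High abuse detected: $recent_blocks blocks in last hour"
fi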
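
Matching on the hour prefix counts a clock hour, not the last 60 minutes. A rolling window could be sketched as below. `count_recent_blocks` is a hypothetical helper, and `%b` parsing assumes an English or C locale.

```python
from datetime import datetime, timedelta, timezone


def count_recent_blocks(lines, now=None, window=timedelta(hours=1)):
    """Count entries whose timestamp lies in the interval (now - window, now]."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    count = 0
    for line in lines:
        # The timestamp is the first bracketed field of an entry.
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue
        try:
            stamp = datetime.strptime(line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip malformed lines
        if cutoff < stamp <= now:
            count += 1
    return count
```

Near a month boundary, the caller would pass lines from both the previous and the current monthly file.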

9. Configuration

The log format and the monthly file organization are fixed by the framework. The middleware is designed as a complete solution and needs no customization for typical use cases; its behavior is driven entirely by the framework's existing middleware chain and request attributes.

10. Troubleshooting

| Issue | Cause & Fix |
| --- | --- |
| Directory creation fails | Check filesystem permissions; ensure parent directories exist |
| Missing entries | Verify is_abusive flag is set; check IP validation logic |
| File write failures | Monitor disk space; check file permissions and locks |
| Large file sizes | Review retention policies; consider archive compression |
| Invalid log entries | Ensure User-Agent sanitization is working properly |

11. Performance Considerations

  • File locking: LOCK_EX provides safety but may cause brief delays under high load
  • Directory structure: Monthly organization prevents filesystem performance issues with large directories
  • Buffer size: For high-traffic sites, consider buffering writes or using async logging
  • Monitoring overhead: Log analysis should run during off-peak hours for large datasets