
Cloudflare WAF Rules

Bullet‑Proofing Your Website: A Practical Guide to Custom Cloudflare WAF Rules

Running a public‑facing site today means battling everything from opportunistic credential‑stuffers to bandwidth‑hogging scrapers. Cloudflare's managed WAF packs a punch, but the real power comes when you layer custom rules tailored to your threat model. Below I break down four battle‑tested rule sets our team uses in production, complete with copy‑pastable expressions, deployment tips, and a maintenance workflow.

TL;DR: Wave the good bots through, throttle the noisy crawlers, block the bad neighborhoods, and interrogate anyone who shows up at a login page from a VPN.


1 Allowed Bots (Bypass)

Let legitimate automation (fast indexing, uptime checks, accessibility tests) do its job without wasting CPU cycles on extra challenges.

wfl
(
  cf.client.bot
  and cf.verified_bot_category in {
    "Search Engine Crawler"
    "Search Engine Optimization"
    "Monitoring & Analytics"
    "Advertising & Marketing"
    "Page Preview"
    "Academic Research"
    "Security"
    "Accessibility"
    "Webhooks"
    "Feed Fetcher"
  }
)
or (
  http.user_agent contains "letsencrypt"
  and http.request.uri.path contains "/.well-known/acme-challenge/"
)

Action : Bypass  Priority : 1

Why it matters

  • SEO karma: Googlebot/Bingbot index without friction.
  • Cert renewals: Let's Encrypt's ACME challenge requests pass unimpeded, preventing renewal failures.
  • Performance: Good bots skip the heavier rules downstream, trimming evaluation time.
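The Let's Encrypt clause pairs a spoofable user‑agent check with a path check, so faking a "letsencrypt" UA only earns a bypass on the ACME challenge path. Here is a minimal Python sketch of the expression's logic under a simplified request model: the `Request` class and its field names are hypothetical stand‑ins for the Cloudflare fields noted in the comments, `contains` is treated as a case‑sensitive substring test, and the sample UA string is illustrative. It is not Cloudflare's evaluator.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical request model; each field stands in for the Cloudflare field in its comment.
@dataclass
class Request:
    verified_bot: bool           # cf.client.bot
    bot_category: Optional[str]  # cf.verified_bot_category
    user_agent: str              # http.user_agent
    path: str                    # http.request.uri.path


ALLOWED_CATEGORIES = {
    "Search Engine Crawler", "Search Engine Optimization", "Monitoring & Analytics",
    "Advertising & Marketing", "Page Preview", "Academic Research", "Security",
    "Accessibility", "Webhooks", "Feed Fetcher",
}


def allowed_bot(req: Request) -> bool:
    """Mirror of the Allowed Bots expression, with `contains` as a substring test."""
    verified = req.verified_bot and req.bot_category in ALLOWED_CATEGORIES
    acme = ("letsencrypt" in req.user_agent
            and "/.well-known/acme-challenge/" in req.path)
    return verified or acme


# Assumed validation-server UA, for illustration only.
LE_UA = "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

print(allowed_bot(Request(False, None, LE_UA, "/.well-known/acme-challenge/abc123")))  # True
print(allowed_bot(Request(False, None, LE_UA, "/wp-login.php")))                       # False
print(allowed_bot(Request(True, "Search Engine Crawler", "Googlebot/2.1", "/")))       # True
```

A spoofed "letsencrypt" UA on any other path gets no free pass; it simply falls through to the rules below.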

2 Limit Crawlers (Managed Challenge)

Yandex, Ahrefs, Semrush… whether you use them or not, they can hammer your origin. We play nice: if they solve the challenge, fine; otherwise they back off.

wfl
(
  http.user_agent contains "yandex"
  or http.user_agent contains "sogou"
  or http.user_agent contains "semrush"
  or http.user_agent contains "ahrefs"
  or http.user_agent contains "baidu"
  or http.user_agent contains "python-requests"
  or http.user_agent contains "neevabot"
  or http.user_agent contains "CF-UC"
  or http.user_agent contains "sitelock"
  or http.user_agent contains "mj12bot"
  or http.user_agent contains "zoominfobot"
  or http.user_agent contains "mojeek"
)
or (
  (
    http.user_agent contains "crawl"
    or http.user_agent contains "spider"
    or http.user_agent contains "bot"
  )
  and not cf.client.bot
)
or (
  ip.src.asnum in {135061 23724 4808}
  and http.user_agent contains "siteaudit"
)

Action : Managed Challenge  Priority : 10

Why it matters

  • Bandwidth control: Scrapers that can't pass the challenge stop at Cloudflare's edge instead of eating origin bandwidth.
  • False‑positive safety: Verified bots (Google, Bing, etc.) are exempt from the generic crawl/spider/bot clause because cf.client.bot is true for them.
  • Quick edits: One contains clause per user agent keeps the expression readable when you need to drop or add UAs.
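One caveat worth checking against your own logs is that `contains` is case‑sensitive. Lowercase tokens still match when a UA happens to embed a lowercase URL, but the generic clause looks only for lowercase "crawl", "spider", and "bot", so a UA such as the hypothetical `ExampleSpider/2.0` slips past it. Wrapping the field as `lower(http.user_agent)` is one way to make the match case‑insensitive. If you do that, lowercase mixed‑case tokens like "CF-UC" too, and keep in mind that it widens the net, which a Managed Challenge tolerates better than a Block would. The Python sketch below models both variants. It is an illustration under those assumptions, not Cloudflare's engine.

```python
GENERIC_TOKENS = ("crawl", "spider", "bot")


def generic_crawler_clause(user_agent: str, verified_bot: bool, lowercase: bool) -> bool:
    """Mirror of the crawl/spider/bot clause.

    lowercase=False models `http.user_agent contains "..."` (case-sensitive);
    lowercase=True models `lower(http.user_agent) contains "..."`.
    """
    ua = user_agent.lower() if lowercase else user_agent
    return any(token in ua for token in GENERIC_TOKENS) and not verified_bot


ua = "ExampleSpider/2.0"  # hypothetical UA for illustration
print(generic_crawler_clause(ua, verified_bot=False, lowercase=False))  # False: "Spider" is not "spider"
print(generic_crawler_clause(ua, verified_bot=False, lowercase=True))   # True
print(generic_crawler_clause("Googlebot/2.1", verified_bot=True, lowercase=True))  # False: verified bots exempt
```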

3 Block Hostnets & Exploits (Block)

Some networks are a never‑ending source of spam and attacks. Combine that with classic exploit probes and rogue AI scrapers, then drop them cold.

wfl
(
  ip.src.asnum in {
    200373 198571 26496 31815 18450 398101 50673 7393 14061
    205544 199610 21501 16125 51540 264649 39020 30083 35540
    55293 36943 32244 6724 63949 7203 201924 30633 208046 36352
    25264 32475 23033 31898 210920 211252 16276 23470 136907
    12876 210558 132203 61317 212238 37963 13238 2639 20473
    63018 395954 19437 207990 27411 53667 27176 396507 206575
    20454 51167 60781 62240 398493 206092 63023 213230 26347
    20738 45102 24940 57523 8100 8560 6939 14178 46606 197540
    397630 9009 11878
  }
)
or (
  cf.verified_bot_category in {
    "AI Crawler"
    "Other"
  }
)
or (
  http.request.uri.path contains "/xmlrpc.php"
  or http.request.uri.path contains "/wp-config.php"
  or http.request.uri.path contains "/wlwmanifest.xml"
)
or (
  ip.src.country in {
    "T1"
  }
)

Action : Block  Priority : 20

Why it matters

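  • Zero mercy: Known hostile ASNs and Tor traffic (Cloudflare's country code T1) get blocked instantly.
  • Exploit probes stopped early: Requests for common attack paths are dropped at the edge and never reach PHP.
  • AI scrapers: Verified bots in the "AI Crawler" (and "Other") categories stay blocked until you decide whether they may crawl.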
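Because the exploit clause uses `contains` rather than an exact `==` match, it also catches probes against subdirectory installs and backup copies, while remaining case‑sensitive. A small Python sketch of that difference, treating `contains` as a plain substring test (the function names are hypothetical):

```python
EXPLOIT_PATHS = ("/xmlrpc.php", "/wp-config.php", "/wlwmanifest.xml")


def probe_contains(path: str) -> bool:
    """Models: http.request.uri.path contains "<p>" for any listed p."""
    return any(p in path for p in EXPLOIT_PATHS)


def probe_exact(path: str) -> bool:
    """Models a stricter http.request.uri.path == "<p>" variant, for comparison."""
    return path in EXPLOIT_PATHS


for path in ["/xmlrpc.php", "/blog/xmlrpc.php", "/wp-config.php.bak",
             "/wp-includes/wlwmanifest.xml", "/XMLRPC.PHP"]:
    print(f"{path:32} contains={probe_contains(path)!s:5} exact={probe_exact(path)}")
```

Only the first path matches both variants; the subdirectory and backup‑file probes match `contains` alone, and the uppercase variant slips through both. If your origin treats paths case‑insensitively, `lower(http.request.uri.path)` closes that gap.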

4 VPN & Login Protection (Managed Challenge)

Credential‑stuffers love VPNs + /login. Throw them a challenge first; legitimate users will pass it.

wfl
(
  ip.src.asnum in {
    60068 9009 16247 51332 212238 131199 22298 29761
    62639 206150 210277 46562 8100 3214 206092 206074
    206164 213074
  }
)
and (
  http.request.uri.path contains "login"
)

Action : Managed Challenge  Priority : 30

Why it matters

  • Brute‑force dampener: Visitors from VPN/proxy networks get a challenge before the login form renders.
  • Wide coverage: The contains "login" match picks up /user/login, /admin/login, and similar paths, not just wp-login.php.
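Rule order matters here. Block runs at priority 20, before this rule at 30, and it matches on the ASN alone, so any ASN that appears in both lists is blocked outright and never sees the challenge. A quick Python check of the two lists as published above (set bodies pasted verbatim) shows which entries overlap:

```python
from typing import Set

# Set bodies copied verbatim from the Block Hostnets and VPN & Login Protection expressions.
BLOCK_ASNS_TEXT = """
200373 198571 26496 31815 18450 398101 50673 7393 14061
205544 199610 21501 16125 51540 264649 39020 30083 35540
55293 36943 32244 6724 63949 7203 201924 30633 208046 36352
25264 32475 23033 31898 210920 211252 16276 23470 136907
12876 210558 132203 61317 212238 37963 13238 2639 20473
63018 395954 19437 207990 27411 53667 27176 396507 206575
20454 51167 60781 62240 398493 206092 63023 213230 26347
20738 45102 24940 57523 8100 8560 6939 14178 46606 197540
397630 9009 11878
"""

VPN_ASNS_TEXT = """
60068 9009 16247 51332 212238 131199 22298 29761
62639 206150 210277 46562 8100 3214 206092 206074
206164 213074
"""


def parse_asn_set(text: str) -> Set[int]:
    """Parse a whitespace-separated set body, as written inside ip.src.asnum in {...}."""
    return {int(token) for token in text.split()}


block_asns = parse_asn_set(BLOCK_ASNS_TEXT)
vpn_asns = parse_asn_set(VPN_ASNS_TEXT)

# ASNs listed in both rules; the Block rule (priority 20) matches them first.
overlap = sorted(block_asns & vpn_asns)
print(overlap)  # [8100, 9009, 206092, 212238]
```

Those four entries are effectively dead weight in the VPN list. If you'd rather challenge those networks than block them, remove them from the Block list instead.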

Deployment Workflow (10‑Minute Setup)

  1. Create Rules – In the Cloudflare dashboard or via Terraform/API, paste each expression with its action and priority.
  2. Tag for Audit – Add a label like managed_by=SOP-CF-WAF-2025 so you can filter events later.
  3. Smoke Test – Run sample requests through Cloudflare Trace and confirm each one hits the expected rule.
  4. Monitor – Check Security → Events daily for the first week; adjust if legitimate services get caught.
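Before clicking through the dashboard, it can help to reason about how priorities interact. The Python sketch below is a deliberately simplified local model, not Cloudflare's engine. It assumes the lowest priority number that matches decides the outcome and that nothing after it runs; check Cloudflare's documentation for how Bypass and challenge actions actually affect later rules. Its predicates are hypothetical, heavily trimmed stand‑ins for the first three rules, and the sample requests are illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical flat request model with keys "verified_bot", "asn", "ua", "path".
Request = Dict[str, object]
# (priority, action, predicate)
Rule = Tuple[int, str, Callable[[Request], bool]]

# Trimmed stand-ins: a couple of ASNs/tokens per rule, listed out of order on purpose.
RULES: List[Rule] = [
    (20, "block", lambda r: r["asn"] in {14061, 16276} or "/xmlrpc.php" in str(r["path"])),
    (1, "bypass", lambda r: bool(r["verified_bot"])),
    (10, "managed_challenge", lambda r: "python-requests" in str(r["ua"])),
]


def evaluate(request: Request, rules: List[Rule]) -> Optional[str]:
    """Return the action of the first matching rule in ascending priority order, else None."""
    for _priority, action, predicate in sorted(rules, key=lambda rule: rule[0]):
        if predicate(request):
            return action
    return None  # no rule matched; the request continues to the origin


samples = {
    "verified crawler": {"verified_bot": True, "asn": 15169, "ua": "Googlebot/2.1", "path": "/"},
    "script on hosting ASN": {"verified_bot": False, "asn": 14061, "ua": "python-requests/2.31", "path": "/"},
    "browser probing xmlrpc": {"verified_bot": False, "asn": 7922, "ua": "Mozilla/5.0", "path": "/xmlrpc.php"},
    "ordinary visitor": {"verified_bot": False, "asn": 7922, "ua": "Mozilla/5.0", "path": "/about"},
}
for label, req in samples.items():
    print(f"{label:24} -> {evaluate(req, RULES)}")
```

In this model the script from a blocked hosting ASN meets the priority‑10 challenge before the priority‑20 block. That is the kind of interaction worth confirming with a real test request before you rely on it.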

Keeping It Fresh

  • Weekly: Scan the logs for new aggressive UAs and add them to Limit Crawlers if needed.
  • Quarterly: Refresh the hostile ASN list from your threat‑intel feed.
  • Incident‑Driven: If a zero‑day starts hitting wp-json tomorrow, clone Block Hostnets, add a path clause for the new target, and go live in 60 seconds.
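For the quarterly refresh, a small script can diff the current set against a fresh feed and print a deduplicated, sorted set body ready to paste back into `ip.src.asnum in {...}`. This sketch assumes your feed arrives as plain lines of ASNs. The feed format, the optional "AS" prefix and `#` comment handling, and the function names are all assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple


def parse_asns(lines: Iterable[str]) -> Set[int]:
    """Accept '14061', 'AS14061' or 'as14061'; skip blank lines and '#' comments."""
    asns: Set[int] = set()
    for line in lines:
        token = line.split("#", 1)[0].strip().upper()
        if not token:
            continue
        if token.startswith("AS"):
            token = token[2:]
        asns.add(int(token))
    return asns


def diff_asns(current: Set[int], feed: Set[int]) -> Tuple[List[int], List[int]]:
    """Return (added, removed) relative to the ASNs currently in the rule."""
    return sorted(feed - current), sorted(current - feed)


def render_set_body(asns: Set[int], per_line: int = 9) -> str:
    """Render a sorted, space-separated set body, wrapped every `per_line` entries."""
    ordered = [str(a) for a in sorted(asns)]
    rows = [" ".join(ordered[i:i + per_line]) for i in range(0, len(ordered), per_line)]
    return "{\n" + "\n".join("  " + row for row in rows) + "\n}"


current = {14061, 16276, 24940}
feed = parse_asns(["AS14061", "24940  # hosting", "", "AS63949"])
added, removed = diff_asns(current, feed)
print("added:", added, "removed:", removed)
print("ip.src.asnum in " + render_set_body(feed))
```

Reviewing the added/removed lists before pasting keeps a bad feed update from silently unblocking a network you still care about.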

Final Thoughts

Cloudflare’s managed rules are a great foundation, but bespoke rules close the gaps specific to your stack and traffic profile. Start with the four above, tune them for your environment, and you’ll neutralize >90% of noisy traffic before it touches your server.