Bullet‑Proofing Your Website: A Practical Guide to Custom Cloudflare WAF Rules
Running a public-facing site today means battling everything from opportunistic credential-stuffers to bandwidth-hogging scrapers. Cloudflare’s managed WAF packs a punch, but the real power comes when you layer custom rules tailored to your threat model. Below I break down four battle-tested rule sets our team uses in production, complete with copy-pastable expressions, deployment tips, and a maintenance workflow.
TL;DR: Let the good bots skip the checks, throttle the noisy crawlers, block the bad neighborhoods, and interrogate anyone who shows up at a login page or from a VPN.
1 Allowed Bots (Bypass)
Let the legit automation do its job fast (indexing, uptime checks, accessibility tests) without wasting CPU cycles on extra challenges.
(
cf.client.bot
and cf.verified_bot_category in {
"Search Engine Crawler"
"Search Engine Optimization"
"Monitoring & Analytics"
"Advertising & Marketing"
"Page Preview"
"Academic Research"
"Security"
"Accessibility"
"Webhooks"
"Feed Fetcher"
}
)
or (
http.user_agent contains "letsencrypt"
and http.request.uri.path contains "/.well-known/acme-challenge/"
)
Action: Bypass | Priority: 1
Why it matters
- SEO karma: Googlebot/Bingbot index without friction.
- Cert renewals: Let’s Encrypt’s ACME‑challenge passes unimpeded, preventing renewal failures.
- Performance: Good bots skip the heavier rules down‑stream, trimming eval time.
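The bypass-first ordering can be modelled in a few lines of Python. This is a simplified sketch, not Cloudflare's engine: the request dict shape, the Rule class, and evaluate() are all hypothetical, and it assumes that lower priority numbers run first and that the first matching rule decides the outcome.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical request shape: {"verified_bot": bool, "ua": str, "path": str}
Request = dict


@dataclass
class Rule:
    name: str
    priority: int
    action: str
    matches: Callable[[Request], bool]


def evaluate(rules: list[Rule], request: Request) -> Optional[Rule]:
    """Return the first matching rule in ascending priority order, or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(request):
            return rule
    return None


# Deliberately listed out of order to show that priority, not position, decides.
RULES = [
    Rule("Limit Crawlers", 10, "managed_challenge",
         lambda r: "bot" in r["ua"].lower() and not r["verified_bot"]),
    Rule("Allowed Bots", 1, "bypass", lambda r: r["verified_bot"]),
]

googlebot = {"verified_bot": True, "ua": "Googlebot/2.1", "path": "/"}
fake_bot = {"verified_bot": False, "ua": "totally-a-bot/1.0", "path": "/"}
human = {"verified_bot": False, "ua": "Mozilla/5.0 Firefox/126.0", "path": "/"}

for label, req in [("googlebot", googlebot), ("fake_bot", fake_bot), ("human", human)]:
    hit = evaluate(RULES, req)
    print(label, "->", hit.action if hit else "no custom rule matched")
```

In this model the verified bot is settled by the priority-1 rule and never reaches the crawler check, which is the effect the Bypass rule is meant to have.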
2 Limit Crawlers (Managed Challenge)
Yandex, Ahrefs, Semrush… use them or not, they can hammer your origin. We play nice: if they solve a JS challenge, fine; otherwise they back off.
(
http.user_agent contains "yandex"
or http.user_agent contains "sogou"
or http.user_agent contains "semrush"
or http.user_agent contains "ahrefs"
or http.user_agent contains "baidu"
or http.user_agent contains "python-requests"
or http.user_agent contains "neevabot"
or http.user_agent contains "CF-UC"
or http.user_agent contains "sitelock"
or http.user_agent contains "mj12bot"
or http.user_agent contains "zoominfobot"
or http.user_agent contains "mojeek"
)
or (
(
http.user_agent contains "crawl"
or http.user_agent contains "spider"
or http.user_agent contains "bot"
)
and not cf.client.bot
)
or (
ip.src.asnum in {135061 23724 4808}
and http.user_agent contains "siteaudit"
)
Action: Managed Challenge | Priority: 10
Why it matters
- Bandwidth control: The challenge forces scrapers to slow down dramatically.
- False-positive safety: Verified bots (Google, Bing, etc.) are exempt from the generic crawl/spider/bot match because that clause is guarded by not cf.client.bot.
- Quick edits: Each UA is its own or clause, so adding or dropping one is a one-line change.
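If the UA list changes often, the first clause of the expression can be generated rather than hand-edited. The helper below is a minimal sketch; build_ua_expression() is a hypothetical name, not part of any Cloudflare tooling. It keeps tokens exactly as typed, because the contains operator is case-sensitive.

```python
def build_ua_expression(tokens: list[str], field: str = "http.user_agent") -> str:
    """Render a list of UA substrings as a chain of `contains` clauses.

    Duplicates and blank entries are dropped; order is preserved.
    """
    seen: set[str] = set()
    clauses: list[str] = []
    for token in tokens:
        token = token.strip()
        if not token or token in seen:
            continue
        # Refuse characters that would break out of the quoted string.
        if '"' in token or "\\" in token:
            raise ValueError(f"unsupported character in token: {token!r}")
        seen.add(token)
        clauses.append(f'{field} contains "{token}"')
    if not clauses:
        raise ValueError("at least one token is required")
    return "(\n  " + "\n  or ".join(clauses) + "\n)"


expression = build_ua_expression(["yandex", "semrush", "yandex", ""])
print(expression)
```

The printed text can be pasted into the rule editor in place of the hand-written UA block.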
3 Block Hostnets & Exploits (Block)
Some networks are a never‑ending source of spam and attacks. Combine that with classic exploit probes and rogue AI scrapers, then drop them cold.
(
ip.src.asnum in {
200373 198571 26496 31815 18450 398101 50673 7393 14061
205544 199610 21501 16125 51540 264649 39020 30083 35540
55293 36943 32244 6724 63949 7203 201924 30633 208046 36352
25264 32475 23033 31898 210920 211252 16276 23470 136907
12876 210558 132203 61317 212238 37963 13238 2639 20473
63018 395954 19437 207990 27411 53667 27176 396507 206575
20454 51167 60781 62240 398493 206092 63023 213230 26347
20738 45102 24940 57523 8100 8560 6939 14178 46606 197540
397630 9009 11878
}
)
or (
cf.verified_bot_category in {
"AI Crawler"
"Other"
}
)
or (
http.request.uri.path contains "/xmlrpc.php"
or http.request.uri.path contains "/wp-config.php"
or http.request.uri.path contains "/wlwmanifest.xml"
)
or (
ip.src.country in {
"T1"
}
)
Action: Block | Priority: 20
Why it matters
- Zero mercy: Known hostile ASNs and Tor exit nodes (Cloudflare's T1 country code) get blocked instantly.
- Exploit probes stopped early: Common attack paths never hit PHP.
- AI scrapers: Block the “AI Crawler” category until you decide if they can crawl.
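A long ASN set is easier to maintain when it is rendered from a list instead of edited by hand. The following is a small sketch under that assumption; build_asn_expression() is a hypothetical helper, and the three ASNs in the example are taken from the list above purely for illustration.

```python
import textwrap


def build_asn_expression(asns, width: int = 60) -> str:
    """Render ASNs as an `ip.src.asnum in {...}` clause, deduplicated and sorted."""
    unique = sorted({int(a) for a in asns})  # int() also tolerates padded strings
    if not unique:
        raise ValueError("at least one ASN is required")
    # Wrap the space-separated set so no line grows past `width` characters.
    body = textwrap.fill(" ".join(str(a) for a in unique), width=width)
    return "ip.src.asnum in {\n" + textwrap.indent(body, "  ") + "\n}"


clause = build_asn_expression(["14061", 16276, 14061, " 24940 "])
print(clause)
```

Sorting the set makes diffs between quarterly refreshes easy to read, since an added or removed ASN shows up as a single changed token.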
4 VPN & Login Protection (Managed Challenge)
Credential-stuffers love VPNs + /login. Throw them a JavaScript puzzle first; if they’re legit users, they’ll pass.
(
ip.src.asnum in {
60068 9009 16247 51332 212238 131199 22298 29761
62639 206150 210277 46562 8100 3214 206092 206074
206164 213074
}
)
or (
http.request.uri.path contains "login"
)
Action: Managed Challenge | Priority: 30
Why it matters
- Brute-force dampener: VPN/proxy networks see a challenge before the login form renders.
- Wide coverage: The substring match picks up /user/login, /admin/login, etc., not just wp-login.php.
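Because the path clause is a plain substring match, it is worth checking which URLs it catches before deploying. The sketch below models only that clause, using Python's case-sensitive `in` as a stand-in for contains; the sample paths are made up for illustration.

```python
def matches_login_rule(path: str) -> bool:
    """Model of `http.request.uri.path contains "login"` (case-sensitive)."""
    return "login" in path


SAMPLES = [
    "/user/login",
    "/admin/login",
    "/wp-login.php",
    "/blog/login-page-design-tips",  # matches too: a likely false positive
    "/Login",                        # capital L: not matched
    "/account/signin",               # different wording: not matched
]

for path in SAMPLES:
    verdict = "challenge" if matches_login_rule(path) else "pass"
    print(f"{path:32} -> {verdict}")
```

The fourth and fifth samples show the trade-off: the match is broad enough to challenge unrelated content pages, yet it misses a login route that is capitalized differently.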
Deployment Workflow (10‑Minute Setup)
- Create Rules – In the Cloudflare dashboard or via Terraform/API, paste each expression with its action and priority.
- Tag for Audit – Add a label like managed_by=SOP-CF-WAF-2025 so you can filter events later.
- Smoke Test – Use Firewall → Tools → Simulate to run sample requests and confirm expected outcomes.
- Monitor – Check Security → Events daily for the first week; adjust if legitimate services get caught.
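For the API route, the four rules can be assembled into one request body. The sketch below only builds and prints the JSON; it makes no network call. Several things here are assumptions to verify against the current Cloudflare Rulesets API reference: that custom rules live in the http_request_firewall_custom phase, that the dashboard's Bypass corresponds to the skip action with a ruleset parameter, and that list order stands in for the priority numbers. The expressions are abbreviated placeholders, and build_payload() is a hypothetical helper.

```python
import json

# Assumed mapping from the action names used in this article to API action names.
ACTION_MAP = {
    "Bypass": "skip",
    "Managed Challenge": "managed_challenge",
    "Block": "block",
}

# Expressions are shortened placeholders; paste the full ones from the sections above.
RULES = [
    {"name": "Block Hostnets & Exploits", "priority": 20, "action": "Block",
     "expression": 'http.request.uri.path contains "/xmlrpc.php"'},
    {"name": "Allowed Bots", "priority": 1, "action": "Bypass",
     "expression": "cf.client.bot"},
    {"name": "VPN & Login Protection", "priority": 30, "action": "Managed Challenge",
     "expression": 'http.request.uri.path contains "login"'},
    {"name": "Limit Crawlers", "priority": 10, "action": "Managed Challenge",
     "expression": 'http.user_agent contains "semrush"'},
]


def build_payload(rules: list[dict], tag: str) -> dict:
    """Order rules by priority and shape them as a ruleset update body."""
    api_rules = []
    for rule in sorted(rules, key=lambda r: r["priority"]):
        entry = {
            # Assumption: the audit tag is carried in the description field.
            "description": f'{rule["name"]} [{tag}]',
            "expression": rule["expression"],
            "action": ACTION_MAP[rule["action"]],
        }
        if entry["action"] == "skip":
            # Assumption: skip the remaining rules in the current ruleset.
            entry["action_parameters"] = {"ruleset": "current"}
        api_rules.append(entry)
    return {"rules": api_rules}


payload = build_payload(RULES, "managed_by=SOP-CF-WAF-2025")
print(json.dumps(payload, indent=2))
```

Keeping the rules in a file like this also gives you a reviewable history of every change, which the dashboard alone does not.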
Keeping It Fresh
- Weekly: Glance at logs for new aggressive UAs; add them to Limit Crawlers if needed.
- Quarterly: Refresh the hostile ASN list from your threat‑intel feed.
- Incident-Driven: If a zero-day starts hitting wp-json tomorrow, clone Block Hostnets, add a new pattern, and go live in 60 seconds.
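A quarterly refresh is also a good moment to check the ASN lists against each other. The sketch below compares the Block Hostnets list with the VPN & Login Protection list exactly as they appear in this article; parse_asns() and overlap() are hypothetical helpers.

```python
# ASN lists copied verbatim from rules 3 and 4 above.
BLOCK_ASNS = """
200373 198571 26496 31815 18450 398101 50673 7393 14061
205544 199610 21501 16125 51540 264649 39020 30083 35540
55293 36943 32244 6724 63949 7203 201924 30633 208046 36352
25264 32475 23033 31898 210920 211252 16276 23470 136907
12876 210558 132203 61317 212238 37963 13238 2639 20473
63018 395954 19437 207990 27411 53667 27176 396507 206575
20454 51167 60781 62240 398493 206092 63023 213230 26347
20738 45102 24940 57523 8100 8560 6939 14178 46606 197540
397630 9009 11878
"""

VPN_ASNS = """
60068 9009 16247 51332 212238 131199 22298 29761
62639 206150 210277 46562 8100 3214 206092 206074
206164 213074
"""


def parse_asns(text: str) -> set[int]:
    """Turn a whitespace-separated ASN list into a set of integers."""
    return {int(token) for token in text.split()}


def overlap(first: str, second: str) -> list[int]:
    """Return the ASNs present in both lists, sorted ascending."""
    return sorted(parse_asns(first) & parse_asns(second))


duplicates = overlap(BLOCK_ASNS, VPN_ASNS)
print(duplicates)  # [8100, 9009, 206092, 212238]
```

Assuming lower priority numbers are evaluated first, those four ASNs are already blocked at priority 20 and never reach the challenge at priority 30, so they can be dropped from one list or the other depending on which outcome you want.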
Final Thoughts
Cloudflare’s managed rules are a great foundation, but bespoke rules close the gaps specific to your stack and traffic profile. Start with the four above, tune them for your environment, and you’ll neutralize >90% of noisy traffic before it touches your server.