---
name: betterbrowsing-tester
description: Use this skill to automate testing of a network intermediary (proxy, anti-bot service, ad-fraud filter, automation guard, classification service, etc.) using the betterbrowsing.org compliance test harness. The skill explains how to drive each of the harness's endpoints to exercise specific intermediary protections, and how to compare direct vs. through-intermediary results. Invoke this skill when the user asks about testing an intermediary against betterbrowsing.org, or wants to run a battery of probes against any of the harness's endpoints.
compatibility: Works with any HTTP client. The bundle ships two probe scripts — `probe.sh` (bash + curl, zero extra deps, covers HTTP-level probes) and `probe_browser.py` (Python + Playwright, covers browser-only probes — HttpOnly visibility, multi-step nav, fingerprint, JS-driven XHR login — plus active bypass-probing of intermediary protections: DOM-mutation variants for password-typing blocks, alternative submit-trigger and request APIs, cross-realm iframe + Web Worker bypasses, and URL obfuscation against static path matchers). Use whichever fits the question; for through-intermediary testing you typically want both, since proxy automation guards behave very differently for raw HTTP vs. a CDP-driven browser.
---

## What this site simulates

`https://betterbrowsing.org/` is a controlled destination operated by AppEsteem Corporation for evaluating how proxy, anti-bot, anti-fraud, and category-classification systems handle a wide range of request shapes.

**Core principle: every "user-facing" endpoint behaves like a real production website would.** Login forms 302-redirect to home with a session cookie. Rate-limited endpoints return real `429 Too Many Requests` with `Retry-After`. Ad-click trackers 302 to a landing URL. Comment POSTs redirect to the thread. Response bodies do not include diagnostic JSON — if they did, the intermediary's behavior couldn't be measured honestly.

A separate set of endpoints (the diagnostic endpoints — `/api/echo/*`, `/api/status/<code>`, `/api/redirect/<n>`, `/api/delay/<s>`, `/fingerprint/data`) are explicit testing tools whose JSON output **is** the contract. A real production site wouldn't expose these.

When testing through an intermediary, you typically want to do each probe **twice**: once directly (no intermediary) to establish a baseline, then again through the intermediary. The diff exposes the intermediary's behavior.

## How to use this skill

When the user asks to test an intermediary, follow this loop:

1. **Pick the protections** to evaluate, from the list below. Don't run all of them unless asked — match the probes to the user's question (e.g. "does this proxy throttle on 429s?" → use the rate-cap test only).
2. **Choose direct or through-intermediary**, and configure your client accordingly. For through-intermediary probes the user has typically already configured an HTTP proxy or set up a CDP browser zone; ask if it's not clear.
3. **Drive the relevant endpoints** following the table below.
4. **Compare** through-intermediary results to direct baseline. Report what was different — status code, headers, cookies, body, latency, whether the request even reached the origin.
5. **Cite specific evidence** (status codes, header diffs, cookie presence) — not vibes.
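
The compare step can be sketched as a small diff over two captured observations. This is a minimal illustration, not part of the bundle: the `Observation` shape and the `diff_observations` name are assumptions made for the example, and you would fill the fields from whatever your client records.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """One probe result as captured by the client (hypothetical shape)."""
    status: int
    headers: dict = field(default_factory=dict)  # header name -> value
    cookies: set = field(default_factory=set)    # cookie names the client ended up with
    body_len: int = 0


def diff_observations(direct: Observation, proxied: Observation) -> list[str]:
    """Return human-readable differences between the baseline and proxied runs."""
    findings = []
    if direct.status != proxied.status:
        findings.append(f"status: {direct.status} -> {proxied.status}")
    # Header names are case-insensitive, so compare on lowercase keys.
    d = {k.lower(): v for k, v in direct.headers.items()}
    p = {k.lower(): v for k, v in proxied.headers.items()}
    for name in sorted(d.keys() | p.keys()):
        if d.get(name) != p.get(name):
            findings.append(f"header {name}: {d.get(name)!r} -> {p.get(name)!r}")
    for name in sorted(direct.cookies - proxied.cookies):
        findings.append(f"cookie missing through intermediary: {name}")
    for name in sorted(proxied.cookies - direct.cookies):
        findings.append(f"cookie added by intermediary: {name}")
    if direct.body_len != proxied.body_len:
        findings.append(f"body length: {direct.body_len} -> {proxied.body_len}")
    return findings
```

An empty list means the two runs matched on every recorded field; anything else is evidence to cite in the report.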

## Endpoint reference

| Category | Protection under test | How to drive |
|---|---|---|
| Volumetric | Rate-limit reaction (back-off, throttling) | `GET /api/ratecap?fail_pct=N` — N% real 429s with `Retry-After: 30`, rest 200s. Default 50%. Drive sustained load, watch for intermediary throttling. |
| Volumetric | Origin-unhealthy detection | Drive `GET /api/fail` (always 500) repeatedly. |
| Volumetric | Slow-response handling | `GET /api/delay/<secs>` (capped at 20). Look for intermediary timeouts. |
| Credentials | Credential-stuffing | `POST /login/post` with synthetic emails/passwords. Each accepts with 302 to `/` + `__session` cookie. |
| Credentials | Password-typing automation | Drive a browser through `/login/post-form` (canonical `type=password`), `/login/post-text-pw` (type=text), `/login/post-renamed`, `/login/xhr` (no form), `/login/multistep`. |
| Credentials | Query-string credential handling | `/login/get-form` — GET with creds in URL. |
| Credentials | reCAPTCHA score / bot detection | `POST /login/captcha-post` via the form at `/login/captcha-form`. Score ≥ 0.5 → 302 to `/`. Below → 401. **Score is in the response `x-recaptcha-score` header** (on both accept and reject paths). Nothing is logged server-side. |
| Accounts | Signup-rate detection | `POST /signup/post` at volume. Each accepts (cookie + 302). |
| Content | Comment spam | `POST /comment/post` repeatedly with identical bodies. 302 to `/comment`. |
| Content | Marketplace listing spam | `POST /listing/post`. 302 to `/listing`. |
| Content | Engagement / like fraud | `POST /engagement/like` (AJAX-style — returns `{"liked": true}`). |
| Ad fraud | IVT (Invalid Traffic) detection | `GET /ad/click?creative=X&placement=Y&campaign=Z` at volume. Each 302s to `/`. |
| Cookies | Set-Cookie passthrough | Log in via any sink; verify browser ends up with `__session` (HttpOnly) + `bb_session_js` (JS-visible). |
| Cookies | HttpOnly / SameSite | Visit `/cookies/`. JS sets plain / SameSite=Strict/Lax/None cookies. Verify each. HttpOnly demo is the login above. |
| Impersonation | Fingerprint exposure | `GET /fingerprint/data` — JSON of headers, IP, UA, Sec-CH-UA, cookies. Compare direct vs proxied. |
| Recon | Scan-path passthrough | `GET /admin`, `/wp-login.php`, `/server-status`, `/phpmyadmin/`, `/.env`, `/.git/config`. |
| Recon | robots.txt-disallowed crawling | `/robots.txt` disallows `/private/`. Hit `/private/` with crawler vs scanner UA. |
| Network | Status-code passthrough | `GET /api/status/<n>` for n in 200..599. **Diagnostic — JSON body is the contract.** |
| Network | Redirect-chain handling | `GET /api/redirect/<n>` — n 302s before a 200. |
| Network | Header / IP / UA reflection | `GET /api/echo/{headers,ip,user-agent,get,post}`. **Diagnostic.** |

The full table is also at `https://betterbrowsing.org/usage` in HTML form.

## Critical caveats (read before running probes)

**Firebase Hosting strips cookies.** Every cookie name except `__session` is stripped before the request reaches the function. So:
- The session cookie the site issues is **named `__session`** — that's the only name that round-trips correctly.
- The companion JS-visible cookie (`bb_session_js`) is the same value but only the client sees it.
- Any cookie you set yourself (via `/cookies/` or `document.cookie`) won't reach the server. If you need the server to see a custom cookie, use the name `__session`.

**`/api/status/<n>` is clamped to 200..599.** Values below 200 are clamped to 200, because 1xx codes don't survive the gateway as final responses.

**Accept-Language is normalized at the edge.** The Hosting CDN strips region tags and quality values (`en-US,en;q=0.9` → `en,en`). This shows up in `/fingerprint/data`. The function can't see the original value.
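
To avoid reporting these edge effects as intermediary behavior, it can help to compute what the origin is expected to see before diffing. The sketch below encodes the three caveats above as helpers. The function names are hypothetical, the upper clamp bound is inferred from the "200..599" wording, and the Accept-Language logic only reproduces the single example given here, not a documented CDN algorithm.

```python
def clamp_status(n: int) -> int:
    """Status code /api/status/<n> is expected to return (assumes both bounds clamp)."""
    return max(200, min(599, n))


def cookies_reaching_origin(cookies: dict) -> dict:
    """Only the cookie named __session survives the hosting layer."""
    return {name: value for name, value in cookies.items() if name == "__session"}


def normalize_accept_language(value: str) -> str:
    """Mimic the edge normalization: drop q-values and region subtags."""
    tags = []
    for part in value.split(","):
        tag = part.split(";", 1)[0].strip()    # drop ";q=0.9"
        if tag:
            tags.append(tag.split("-", 1)[0])  # "en-US" -> "en"
    return ",".join(tags)
```

For example, `normalize_accept_language("en-US,en;q=0.9")` gives `"en,en"`, matching what `/fingerprint/data` reports on a direct run.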

**The reCAPTCHA endpoint costs a real Google API call.** Don't drive it at high volume in tests.

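**Don't submit real PII.** Nothing is persisted server-side, but every request still transits the hosting CDN and the intermediary under test, either of which may log it. Use synthetic data.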

## Concrete recipes

### A. "Does the proxy throttle on 429s?"
1. Direct baseline: `for i in $(seq 1 60); do curl -s -o /dev/null -w "%{http_code} %{time_total}\n" 'https://betterbrowsing.org/api/ratecap?fail_pct=60'; done`
2. Through-proxy: same loop, with proxy configured.
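3. Compare the status mix and `time_total` values between the two runs. An intermediary that reacts to the 429s should start throttling: expect growing latencies, or responses generated by the intermediary itself, so that fewer requests reach the origin.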
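
One way to make that comparison concrete is to save each loop's output to a file and summarize it. The helper below is a sketch under the assumption that each line has the `"<http_code> <time_total>"` shape produced by the `-w` format in step 1; the `summarize` name and the file names in the usage note are illustrative.

```python
from collections import Counter


def summarize(lines):
    """Summarize '<http_code> <time_total>' lines from the curl loop."""
    codes = Counter()
    times = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        codes[parts[0]] += 1
        times.append(float(parts[1]))
    mean = sum(times) / len(times) if times else 0.0
    return {
        "total": len(times),
        "codes": dict(codes),
        "mean_time": round(mean, 3),
        "max_time": max(times, default=0.0),
    }
```

Redirect each loop to a file (`> direct.txt`, `> proxied.txt`), then compare `summarize(open("direct.txt"))` with `summarize(open("proxied.txt"))`. A lower share of 429s together with a higher mean or max time in the proxied run would be consistent with throttling.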

### B. "Does the proxy let login POSTs through?"
1. `curl -i -X POST 'https://betterbrowsing.org/login/post' -d 'email=test@example.com&password=x'` — expect 302 + Set-Cookie: __session=...
2. Through proxy: same. Expect same shape. If the intermediary intercepts password POSTs, you'll see a block page or a redirect to a challenge instead.
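
The "same shape" check can be written down explicitly. The following sketch classifies a login response from its status and headers. The function name and the three labels are hypothetical, and accepting the absolute form `https://betterbrowsing.org/` for `Location` is an assumption in case the origin or an intermediary rewrites the relative redirect.

```python
def classify_login_response(status, headers):
    """Classify a /login/post response.

    headers is a list of (name, value) pairs, since Set-Cookie can repeat.
    """
    location = None
    cookie_names = set()
    for name, value in headers:
        lname = name.lower()
        if lname == "location":
            location = value
        elif lname == "set-cookie":
            # The cookie name is everything before the first "=".
            cookie_names.add(value.split("=", 1)[0].strip())
    home = ("/", "https://betterbrowsing.org/")  # absolute form is an assumption
    if status == 302 and location in home and "__session" in cookie_names:
        return "origin-accept"
    if status == 302:
        return "redirect-elsewhere"  # e.g. a challenge page
    return "blocked-or-altered"
```

Run it on both the direct and the through-proxy response. Anything other than `origin-accept` on the proxied side, when the baseline was `origin-accept`, points at the intermediary.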

### C. "What does the proxy expose about itself in fingerprint?"
1. Direct baseline: `curl -s 'https://betterbrowsing.org/fingerprint/data' | jq .`
2. Same through the proxy. Compare `direct_ip`, `x_forwarded_for`, `user_agent`, `sec_ch_ua`, `all_headers`.

### D. "Through the proxy, can I scan for /admin, /.env, etc.?"
1. `for path in /admin /wp-login.php /server-status /phpmyadmin/ /.env /.git/config; do echo -n "$path: "; curl -s -o /dev/null -w "%{http_code}\n" "https://betterbrowsing.org$path"; done` — direct baseline (expect all 200).
2. Through proxy — expect either 200 (proxy passes scan-shaped traffic) or 403/429/blocked.

### E. "Does the proxy automation guard fire on password fields?"
- Drive a Playwright browser through the proxy:
  - Visit `/login/post-form` (`<input type="password">`)
  - Try to type into the password field
  - If the proxy intercepts, the page navigates away or the input is blocked.
  - Compare to `/login/post-text-pw` (type=text masquerading) — if the proxy uses heuristics other than the `type` attribute, both will be blocked.
- For the full bypass matrix (which DOM-mutation, submit-trigger, or cross-realm technique slips past the guard), run `probe_browser.py` and read sections `2b`/`2c`/`2d` — see recipe G below for how to interpret the rows.

### F. Run the bundled browser probe (`probe_browser.py`)
Hits every browser-only test point in one go (cookie page loads, JS-set cookie visible in `document.cookie`, login form submit redirects to `/` with cookies, HttpOnly invisibility, XHR login observes 302, multi-step nav, fingerprint headers, robots.txt-disallowed `/private/*` fetches, ratecap from a `fetch()` call) **and probes intermediary bypass surfaces** (DOM mutation, alternative submit/request APIs, cross-realm iframe and Web Worker, URL obfuscation — see recipe G for interpretation). Use this to baseline a new intermediary:

```bash
pip install playwright && playwright install chromium

# Direct baseline (JSON output):
python3 probe_browser.py --json > direct.json

# Through an HTTP proxy:
python3 probe_browser.py --json --proxy http://your-proxy:8080 > proxied.json

# Through a CDP browser zone (e.g. a remote browser service):
python3 probe_browser.py --json --cdp wss://USER:PASS@host:9222 > cdp.json

diff direct.json proxied.json
diff direct.json cdp.json
```

The script exits non-zero if any probe fails. Add `--headed` to watch what's happening.

### G. Interpreting BYPASS rows in `probe_browser.py` output

Sections `2b`, `2c`, `2d`, `7b`, `7c`, `7d` don't just exercise endpoints — each row probes a specific *bypass technique* against an intermediary protection.

**Polarity is inverted vs. standard probes.** The markers printed by `probe_browser.py` mean, *in these sections specifically*:

- `[ ok ]` = bypass worked = intermediary failed to block (a finding *against* the intermediary's protection)
- `[FAIL]` = intermediary blocked the bypass attempt = protection held

When run direct, every row in these sections is `[ ok ]` (the server doesn't care how the request was constructed). The diff against a through-intermediary run names exactly which hooks the intermediary installed and which surfaces it left open.

| Section | Hook the intermediary likely installed | Example row labels (verbatim from probe output) |
|---|---|---|
| `2b` (4 rows) | `<input type=password>` typing block | `mutation 'type-property-setter'`, `mutation 'setAttribute('type','text')'`, `mutation 'removeAttribute('type')'`, `mutation 'replace-element-with-text-input'` |
| `2c` (7 rows) | Submit / credential-POST detection | `submit via button.click()`, `submit via form.submit() (no submit event)`, `submit via form.requestSubmit()`, `submit via dispatch synthetic click MouseEvent`, `submit via fetch(POST URLSearchParams)`, `submit via fetch(POST FormData) multipart`, `submit via fetch(POST JSON body)` |
| `2d` (3 rows) | All top-frame DOM hooks (meta-bypass) | `iframe-realm type setter (flip type=text)`, `iframe-realm value setter (no typing)`, `iframe-realm form.submit on parent form` — pristine impls pulled from a same-origin `iframe.contentWindow`, defeating any hook installed via `evaluateOnNewDocument` / CDP injection that only patched the top realm |
| `7b` (6 rows) | Path-block via `fetch` only | `/{target} via XMLHttpRequest GET`, `/{target} via navigator.sendBeacon (POST)`, `/{target} via dynamic import()` — for both `/private/data.json` and `/admin` |
| `7c` (4 rows) | All top-realm path filters (meta-bypass) | `/{target} via iframe.contentWindow.fetch`, `/{target} via Worker (Blob) fetch` — for both targets |
| `7d` (4 rows) | Static `/admin` URL string match | `fetch('/%61dmin')`, `fetch('/Admin')`, `fetch('/admin/')`, `fetch('/admin/./')` |

Reading the diff (remember the inverted polarity):

- A `2b` row going `[FAIL]` while other `2b` rows stay `[ ok ]` → intermediary hooks that *specific* mutation API only (e.g. Bright Data MCP Browser hooks the `type` property setter but not `setAttribute`).
- All of `2b` `[FAIL]` but any `2d` row `[ ok ]` → intermediary hooks every top-realm mutation API but missed the iframe realm. Same logic applies to `7b` all-fail + `7c` any-pass.
- All `7b` rows `[ ok ]` while plain `fetch /private/*` (section 7) `[FAIL]` → intermediary hooks `fetch` only, leaving every alternative request API as a bypass.
- A `7d` row `[ ok ]` while plain `fetch /admin` `[FAIL]` → intermediary's URL matcher doesn't normalize before comparison.
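
To illustrate why the `7d` variants slip past a static string match, the sketch below contrasts a naive comparison with one that normalizes first. It is an illustration of the general idea, not how any particular intermediary works; the helper names are hypothetical, and lowercasing the path is a policy choice that only makes sense if the matcher wants to treat `/Admin` as `/admin`.

```python
import posixpath
from urllib.parse import unquote


def normalize_path(path: str) -> str:
    """Percent-decode, collapse './' and trailing slashes, then case-fold."""
    decoded = unquote(path)                  # "/%61dmin" -> "/admin"
    collapsed = posixpath.normpath(decoded)  # "/admin/./" -> "/admin"
    return collapsed.lower()                 # "/Admin" -> "/admin"


def naive_match(path: str, blocked: str = "/admin") -> bool:
    """Static string comparison, as a non-normalizing matcher would do."""
    return path == blocked


def normalized_match(path: str, blocked: str = "/admin") -> bool:
    """Comparison after normalization."""
    return normalize_path(path) == blocked


VARIANTS = ["/%61dmin", "/Admin", "/admin/", "/admin/./"]
```

Every entry in `VARIANTS` fails `naive_match` but passes `normalized_match`, which is the gap a `7d` row marked `[ ok ]` is pointing at.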

When writing findings, enumerate *which* surfaces the intermediary instrumented and *which* it left open — that's the actionable map. A protection enforced on one API but not its three sibling APIs is effectively no protection.
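
The reading rules above can be applied mechanically once the rows are in a structured form. The sketch below assumes you have already parsed the probe output into a mapping of section to `{row label: True if "[ ok ]", False if "[FAIL]"}`. That input shape, the `interpret` name, and the `plain_fetch_blocked` flag (whether the plain `fetch` probe was blocked) are assumptions for illustration, not the script's actual JSON schema.

```python
def interpret(rows, plain_fetch_blocked):
    """Turn bypass-section results into findings (inverted polarity applies)."""
    findings = []

    def blocked(section):
        return [label for label, ok in rows.get(section, {}).items() if not ok]

    def still_open(section):
        return [label for label, ok in rows.get(section, {}).items() if ok]

    # Top-realm sections paired with their cross-realm meta-bypass sections.
    for top, realm in (("2b", "2d"), ("7b", "7c")):
        if blocked(top) and still_open(top):
            findings.append(f"{top}: only some APIs hooked; blocked={blocked(top)}")
        elif blocked(top) and still_open(realm):
            findings.append(
                f"{top} fully hooked in top realm, but {realm} bypass open: {still_open(realm)}"
            )
    if plain_fetch_blocked and rows.get("7b") and not blocked("7b"):
        findings.append("7b: fetch hooked only; alternative request APIs open")
    if plain_fetch_blocked and still_open("7d"):
        findings.append(f"7d: URL matcher does not normalize; open={still_open('7d')}")
    return findings
```

The output is a starting list of instrumented versus open surfaces; the row labels it quotes should still be cited verbatim in the write-up.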

### H. Wrap the probes for a specific vendor

If you're driving this skill against a residential-proxy / browser-zone
vendor and want a clean per-vendor entry point (creds in one file,
country iteration, evidence written to dated dirs), copy the bundled
templates at `samples/` into your project:

```bash
cp -r samples/ <yourenv>/scripts/skill/
# edit <yourenv>/scripts/skill/_auth.sh — fill in VENDOR_USER, VENDOR_HOST,
# country modifier syntax, and (optional) VENDOR_CACERT
./<yourenv>/scripts/skill/run_probe.sh --countries us,fr
./<yourenv>/scripts/skill/run_probe_browser.sh --countries us,fr
```

`samples/README.md` lists the vendor-specific slots and explains how
`--cacert` flows through both probes (strict pinning for `probe.sh`,
`ignore_https_errors` for `probe_browser.py`). The wrappers are
shape-compatible with the per-vendor wrappers in the proxytest framework
— if you already have an `_auth.sh` over there, reuse it verbatim.

## Verifying what reached the server

The site doesn't echo submissions in the response (that would defeat the simulation), and **nothing is logged or persisted server-side**. The function processes each request, sets response headers (`x-request-id` on every response; `x-recaptcha-score` on the captcha sink), optionally echoes the request in a diagnostic response (for `/api/echo/*` and `/fingerprint/data`), and forgets.

To capture probe results across many runs, instrument your client. Examples:

- `curl -v` — full request and response shown on stderr.
- `curl -sS -D headers.txt -o body.txt URL` — split response headers and body to files.
- A Python script using `requests`, with structured logging of `r.status_code`, `r.headers`, `r.text` per probe.
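
As one possible shape for such a script, here is a standard-library sketch (using `urllib` rather than `requests`, so it runs without extra dependencies). The `run_probe` and `make_record` names and the record fields are illustrative assumptions. Redirects are deliberately not followed so that the 302 itself is what gets recorded.

```python
import json
import time
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx so the redirect response is observed."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def make_record(name, url, status, headers, body, elapsed):
    """Build one JSON-serializable record per probe."""
    return {
        "probe": name,
        "url": url,
        "status": status,
        "headers": [[k, v] for k, v in headers],  # list keeps repeated Set-Cookie
        "body_len": len(body),
        "elapsed_s": round(elapsed, 3),
    }


def run_probe(name, url, data=None, proxy=None):
    """Send one request (POST if data is bytes) and return its record."""
    handlers = [NoRedirect()]
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    start = time.monotonic()
    try:
        resp = opener.open(urllib.request.Request(url, data=data), timeout=30)
    except urllib.error.HTTPError as err:
        resp = err  # 3xx/4xx/5xx arrive as HTTPError but still carry code and headers
    body = resp.read()
    return make_record(name, url, resp.code, resp.headers.items(), body,
                       time.monotonic() - start)
```

A typical use is `print(json.dumps(run_probe("ratecap", "https://betterbrowsing.org/api/ratecap")))`, appended to a per-run log file, once without `proxy` and once with it.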

For the captcha sink specifically, the score lives only in the `x-recaptcha-score` response header — read it as soon as the request returns.

## Reporting findings

Write up each probe's result against a specific hypothesis. Example:

> **Hypothesis**: residential-proxy zone passes scan-shaped traffic through without intercepting.
> **Probe**: 6 GETs through `residential_proxy1`: `/admin`, `/wp-login.php`, `/server-status`, `/phpmyadmin/`, `/.env`, `/.git/config`.
> **Direct baseline**: all six return 200 with their respective synthetic content.
> **Through proxy**: same — all six return 200 with matching content.
> **Conclusion**: proxy does not intercept these scan paths. (Cycle 3, 2026-05-22.)

Always cite the request URLs, the status codes you got, the response sizes, and any header diffs. Don't summarize without evidence.