How to Bypass Amazon Captcha When Scraping in 2026

Amazon serves a “Robot Check” page instead of real product HTML when it suspects automation. The page returns HTTP 200, so naive scripts parse it as product data and silently break. The fix is almost never about solving the CAPTCHA. It is about never triggering it in the first place, and the three levers that matter are the IP, the TLS fingerprint, and the request rate.

This guide shows how to detect the robot check, prevent it with residential proxies and curl_cffi browser impersonation, and what to do when prevention fails. All code is Python. If you want to skip every step of this and ship today, the Amazon Scraper API handles CAPTCHA, proxy rotation, and retries on its side and returns clean HTML or JSON for as little as $0.50 per 1,000 requests on Custom plans ($0.90 per 1,000 on pay-as-you-go).

The Answer

The Amazon CAPTCHA (officially “Robot Check”) is triggered by a mix of IP reputation, TLS fingerprint, header patterns, and request rate. Detect it by looking for the string “Robot Check” in the page title or the captcha query parameter in the final URL. Prevent it by combining a residential or mobile proxy pool with curl_cffi’s impersonate="chrome" mode and a realistic header set. If the page still shows, rotate the IP and back off rather than trying to solve the image puzzle. For most commercial workloads, offloading the problem to a managed Amazon scraper API is faster than maintaining your own stack.

What Is the Amazon CAPTCHA and Why Does It Appear?

The Amazon CAPTCHA is a “Robot Check” page that replaces product content with an image puzzle whenever Amazon’s bot detection flags a request as automated. The response still comes back with HTTP status 200, which is the single biggest trap for new scrapers. Your code fetches the URL, gets a success code, parses the HTML, and finds no #productTitle, no price, and no reviews. The page is not broken. You are being asked to prove you are human.

Amazon runs this check through AWS WAF Bot Control, a proprietary risk model, and an internal fingerprinting stack that inspects the TLS client hello, HTTP/2 frame order, the User-Agent string, Accept-Language, cookies, referral source, and request cadence. According to AWS’s own documentation, CAPTCHA challenges in AWS WAF are designed to distinguish humans from bots using a combination of puzzles that are hard for machines but easy for people. A single signal rarely trips the check. The combination does. That is why changing only the User-Agent or only the proxy almost never works on Amazon.

Four conditions reliably trigger the robot check:

  • Datacenter IP addresses, because Amazon sees the ASN (AS14061 DigitalOcean, AS16509 AWS itself, AS24940 Hetzner, and similar commercial ranges) in the first milliseconds of the TCP handshake and classifies the network origin before a single header arrives.
  • A Python or curl TLS fingerprint, because the default requests library and urllib3 negotiate a cipher suite order and ALPN extension set that no real browser produces.
  • Too many requests per minute from the same IP, especially on high-traffic listing pages and seller profiles.
  • Missing or inconsistent browser headers, particularly Sec-Fetch-*, Accept-Language, and the referer chain.

How Do You Detect an Amazon Robot Check Page in Python?

You detect an Amazon robot check page by inspecting the HTML for three markers: the title “Robot Check”, the phrase “Enter the characters you see below”, and the /errors/validateCaptcha path in the response URL or any inline form action. Any one of them is enough to confirm the CAPTCHA was served. Status code alone is useless because Amazon returns 200 whether it serves the real product page or the robot check.

Here is a minimal detector you can drop into any scraper:

from curl_cffi import requests

# Strings that only appear on the Amazon "Robot Check" challenge page.
ROBOT_MARKERS = (
    "Robot Check",
    "Enter the characters you see below",
    "/errors/validateCaptcha",
    "api-services-support@amazon.com",
)

def is_robot_check(html: str, final_url: str) -> bool:
    # The challenge page can also show up as a redirect to /errors/validateCaptcha.
    if "captcha" in final_url.lower():
        return True
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in ROBOT_MARKERS)

def fetch(url: str, proxy: str | None = None) -> tuple[str, bool]:
    resp = requests.get(
        url,
        impersonate="chrome",
        proxies={"http": proxy, "https": proxy} if proxy else None,
        timeout=25,
    )
    blocked = is_robot_check(resp.text, str(resp.url))
    return resp.text, blocked

The is_robot_check function catches the three HTML fingerprints, the support email address that only appears on the challenge page, and the captcha URL parameter Amazon uses when it serves the challenge. Do this check before every parse. A silent fail at this stage turns into hours of debugging later when you notice 40% of your price history is missing.

How Do You Prevent the Amazon CAPTCHA From Triggering?

You prevent the Amazon CAPTCHA from triggering by sending requests that look statistically indistinguishable from a real Chrome browser running on a residential IP. That means four things at once: a residential or mobile proxy, a browser-shaped TLS fingerprint, a full and consistent header set, and realistic pacing. Fixing only one layer leaves the other three as detection signals.

Which Proxies Actually Work on Amazon?

Residential and mobile proxies are the only proxy types that work reliably on Amazon, because Amazon’s detection stack classifies IPs by ASN and blocks datacenter ranges before the HTTP layer ever runs. Independent benchmarks from Proxyway and Proxies.sx place residential success rates on Amazon at 85% to 99% depending on provider tier. Mobile proxies, which use CGNAT IPs shared by thousands of real phone users, hit 88% or higher and are the most reliable tier on protected pages like seller profiles, where one Proxies.sx benchmark recorded an 85% mobile success rate against 10% for datacenter.

Budget and enterprise residential pools are not interchangeable. A cheap residential pool that recycles IPs flagged on other scraping jobs can score below 50%. A premium pool with strict IP hygiene and sticky sessions scores above 95% on the same product pages. Recent AIMultiple benchmarks show the gap costs real money at scale, because a blocked request is not free: it wastes the bandwidth, the retry, the CAPTCHA-solve budget, and the database round trip that logs the failure.

Rotate IPs aggressively on product detail pages (one IP per request if possible) and keep a sticky session only when you need to paginate through search results that carry a session cookie. Never send more than 3 to 5 requests per minute from the same residential IP to the same subdomain.
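
A small pacing guard makes that rule concrete. The sketch below is illustrative rather than production code and assumes a plain list of proxy URLs: it refuses to hand out any IP that was used in the last 15 seconds, which caps each one at roughly four requests per minute.

import random
import time

MIN_GAP = 15.0  # seconds between uses of the same IP (~4 requests/minute)

class ProxyRotator:
    def __init__(self, proxies: list[str]):
        self.proxies = proxies
        self.last_used: dict[str, float] = {}

    def pick(self) -> str:
        """Return a proxy that has been idle for at least MIN_GAP seconds."""
        now = time.monotonic()
        idle = [p for p in self.proxies if now - self.last_used.get(p, 0.0) >= MIN_GAP]
        if not idle:
            # Every IP was used recently; wait out the shortest remaining cooldown.
            soonest = min(self.last_used.get(p, 0.0) + MIN_GAP for p in self.proxies)
            time.sleep(max(0.0, soonest - now))
            return self.pick()
        proxy = random.choice(idle)
        self.last_used[proxy] = time.monotonic()
        return proxy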

How Do You Fix Your TLS Fingerprint With curl_cffi?

You fix your TLS fingerprint with curl_cffi by calling requests.get(url, impersonate="chrome"), which routes the request through a curl build that replays Chrome’s exact TLS client hello, JA3 hash, and HTTP/2 frame order. This is the single highest-leverage change you can make for Amazon. The default Python requests library produces a TLS handshake no real browser has ever sent, and Amazon’s WAF flags it in the first packet.

The curl_cffi project is a Python binding for curl-impersonate, an open-source patched curl build that replays real browser TLS fingerprints. It supports Chrome, Safari, Firefox, and Edge profiles across recent versions. Independent recent benchmarks show that TLS-fingerprint spoofing alone lifts success rates on basic anti-bot sites from roughly 30% (default Python) to 60 to 70%, and stacking it with residential proxies reaches 85 to 90% on most targets.

Install and use it like this:

# pip install curl_cffi
from curl_cffi import requests

resp = requests.get(
    "https://www.amazon.com/dp/B08N5WRWNW",
    impersonate="chrome",
    headers={
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
    },
    proxies={"https": "http://user:pass@residential.provider.com:8001"},
    timeout=25,
)

impersonate="chrome" auto-sets the User-Agent to a matching Chrome version. Do not override it with a mismatched string or you recreate the fingerprint inconsistency the library just fixed.

Which Headers Does Amazon Check?

Amazon checks Accept-Language, Accept-Encoding, User-Agent, the Sec-Fetch-* family, Upgrade-Insecure-Requests, and the referer chain. The detection is not about the presence of any single header; it is about whether the combination matches a real browser session. A request with a Chrome User-Agent, no Sec-Fetch-Dest, and Accept-Language: */* is a clearer bot signal than a request with an obviously wrong User-Agent.

Two rules cover most cases. First, let curl_cffi generate the headers for you via the impersonation profile and only add Accept-Language, which you should set to match the storefront locale (en-US for amazon.com, de-DE for amazon.de, en-GB for amazon.co.uk). Second, set a realistic Referer when you land on a product page after a search. Navigating straight to a deep product URL with no referer, no search history, and no cookies is a minority pattern among real users.
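
As a concrete illustration, the helper below layers only those two headers on top of whatever curl_cffi's impersonation profile already sends. The locale map covers just three storefronts and the search-referer format is a simplified assumption, so treat it as a sketch rather than a complete mapping.

from curl_cffi import requests

# Illustrative locale map for a few storefronts; extend as needed.
LOCALES = {
    "amazon.com": "en-US,en;q=0.9",
    "amazon.de": "de-DE,de;q=0.9,en;q=0.5",
    "amazon.co.uk": "en-GB,en;q=0.9",
}

def storefront_headers(domain: str, from_search: str | None = None) -> dict[str, str]:
    """Add only the headers the impersonation profile does not set for us."""
    headers = {"Accept-Language": LOCALES.get(domain, "en-US,en;q=0.9")}
    if from_search:
        # Landing on a product page from a search result carries a Referer.
        headers["Referer"] = f"https://www.{domain}/s?k={from_search}"
    return headers

resp = requests.get(
    "https://www.amazon.de/dp/B08N5WRWNW",
    impersonate="chrome",
    headers=storefront_headers("amazon.de", from_search="echo+dot"),
    timeout=25,
)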

How Fast Can You Send Requests Without Tripping Detection?

You can send roughly one request every 2 to 5 seconds per residential IP without tripping detection, and no more than one request per 10 seconds per datacenter IP on pages that even accept datacenter traffic. The exact ceiling depends on the page type. Product detail pages tolerate higher rates than search result pages, and search pages tolerate higher rates than seller profile pages. Seller profile pages are the most aggressively protected surface on Amazon outside of the reviews endpoint.

For anything above a few hundred ASINs per hour, the practical answer is not to rate-limit harder but to run requests through a large proxy pool with one IP per request. A 1,000-IP residential pool running at 50 concurrent threads and 2 seconds per IP delivers 1,500 ASINs per minute without any single IP ever crossing the per-IP rate threshold. That is exactly how the Amazon Scraper API handles volume on the managed side: up to 50 concurrent on Pro (scales higher on Custom), 19 marketplaces, and a 2.6-second median latency per ASIN under steady load.
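
A minimal version of that fan-out looks like the sketch below: a ThreadPoolExecutor with 50 workers and a round-robin pick from the pool, so each request leaves on a different IP. The proxy URLs and ASIN list are placeholders, and production code would wrap each call in the robot-check detection and retry logic from the other sections.

from concurrent.futures import ThreadPoolExecutor
from curl_cffi import requests

# Placeholder pool: 1,000 residential endpoints, one IP per request.
PROXIES = [f"http://user:pass@residential.provider.com:{8000 + i}" for i in range(1000)]

def fetch_asin(i: int, asin: str) -> tuple[str, str]:
    proxy = PROXIES[i % len(PROXIES)]  # round-robin keeps per-IP rate near zero
    resp = requests.get(
        f"https://www.amazon.com/dp/{asin}",
        impersonate="chrome",
        proxies={"https": proxy},
        timeout=25,
    )
    return asin, resp.text

asins = ["B08N5WRWNW", "B0CHX3QBCH"]  # a few hundred per batch in practice
with ThreadPoolExecutor(max_workers=50) as pool:
    for asin, html in pool.map(fetch_asin, range(len(asins)), asins):
        print(asin, len(html))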

What Should You Do When the CAPTCHA Still Appears?

When the CAPTCHA still appears, the correct first move is to rotate to a fresh IP and retry, not to solve the puzzle. Amazon does not whitelist an IP after a single CAPTCHA solve, so the cost of solving is paid in dollars and latency without a lasting benefit. Retrying from a clean residential IP typically returns real HTML on the next attempt. If 2 or 3 retries all return the robot check, the issue is upstream (TLS fingerprint, header shape, IP pool quality), not the specific request.

Should You Solve the CAPTCHA or Retry With a New IP?

You should retry with a new IP in almost every case and only fall back to a CAPTCHA solver when retries also fail or when you are locked into a session that cannot be rebuilt. Retrying costs the price of one extra request. Solving an Amazon image CAPTCHA through a third-party service like 2Captcha or CapSolver costs $1 to $3 per 1,000 solves plus 10 to 30 seconds of latency per solve, and it still fails 10 to 20% of the time. For a scraper making tens of thousands of requests per day, the retry path is cheaper by two orders of magnitude.

Here is a retry pattern that pairs with the detector above:

import time
from curl_cffi import requests

def fetch_with_retry(url: str, proxy_pool: list[str], max_attempts: int = 4) -> str:
    for attempt in range(max_attempts):
        proxy = proxy_pool[attempt % len(proxy_pool)]
        try:
            resp = requests.get(
                url,
                impersonate="chrome",
                proxies={"https": proxy},
                timeout=25,
            )
            if not is_robot_check(resp.text, str(resp.url)):
                return resp.text
        except Exception:
            pass  # network errors and timeouts also trigger a rotate-and-retry
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, 8s
    raise RuntimeError(f"Amazon CAPTCHA persisted across {max_attempts} attempts")

Exponential backoff with IP rotation is the single most effective pattern in Amazon scraping.

Does a Headless Browser With Stealth Plugins Help?

A headless browser with stealth plugins helps on pages that need real JavaScript execution, but it is slower, heavier, and not required for the vast majority of Amazon pages. Playwright with the playwright-stealth plugin, Puppeteer with puppeteer-extra-plugin-stealth, or SeleniumBase in its UC (undetected-chrome) mode all patch the automation fingerprints (navigator.webdriver, canvas/WebGL hashes, automation-related permissions) that Amazon uses to flag headless sessions.

The trade-off is cost. A real browser consumes 5 to 10x more RAM and roughly 3 to 5x more bandwidth than a curl_cffi request, and startup latency adds 1 to 3 seconds per session. For pricing, listing, and review scraping, the product HTML is rendered server-side and is fully available without JavaScript. That is why most production Amazon scrapers stay on a curl_cffi plus residential proxy stack and reach for a browser only on edge cases like certain AJAX-loaded price widgets or A/B-tested variant grids.
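
For the cases that do need a real browser, a minimal Playwright sketch looks like the following. It assumes the playwright-stealth package (the stealth_sync helper from its 1.x API; newer releases expose a Stealth class instead), and the proxy argument is optional.

# pip install playwright playwright-stealth && playwright install chromium
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync  # 1.x API; newer versions expose a Stealth class

def render_page(url: str, proxy: str | None = None) -> str:
    with sync_playwright() as p:
        launch_args = {"headless": True}
        if proxy:
            launch_args["proxy"] = {"server": proxy}
        browser = p.chromium.launch(**launch_args)
        page = browser.new_page()
        stealth_sync(page)  # patches navigator.webdriver and related automation fingerprints
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
        return html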

Is There a Faster Way to Avoid Amazon CAPTCHA Entirely?

The faster way to avoid Amazon CAPTCHA entirely is to delegate the whole unblock layer to a managed scraper API. You send the ASIN or the product URL, the API handles proxy rotation, TLS fingerprinting, header shaping, CAPTCHA detection, and retries on its side, and you get back clean HTML or structured JSON. You pay per successful request, not per failure, and you stop maintaining a proxy pool and a fingerprint library as part of your core product.

The Amazon Scraper API is built for exactly this. It charges $0.90 per 1,000 requests on pay-as-you-go (down to $0.50 per 1,000 on Custom plans), gives 1,000 free requests on signup for testing, supports up to 50 concurrent in-flight requests on paid plans, and covers 19 marketplaces including US, UK, DE, FR, IT, ES, NL, PL, SE, CA, MX, BR, AU, JP, SG, IN, TR, AE, and SA. Failed requests are not billed, so you do not pay for CAPTCHA pages. For high-volume batch work, the async endpoint accepts up to 1,000 ASINs per POST and delivers the results via webhook, which removes the need to orchestrate retries in your own code. A request to the sync endpoint looks like this:

import requests

resp = requests.get(
    "https://api.amazonscraperapi.com/api/v1/amazon/product",
    params={"query": "B08N5WRWNW", "domain": "com"},
    headers={"Authorization": "Bearer asa_live_YOUR_KEY"},
    timeout=30,
)
product = resp.json()
print(product["title"], product["price"])

The same request behind the scenes routes through a residential pool, replays a Chrome TLS fingerprint, detects any robot check, retries on a fresh IP, and only returns once the page parses cleanly.
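
For batch work against the async endpoint, a submission would look roughly like the sketch below. The /batch path, the payload field names, and the webhook parameter are assumptions for illustration, not documented values, so check the API reference for the exact contract.

import requests

# Hypothetical sketch of a batch submission; the endpoint path and field names
# below are assumptions, not documented values.
resp = requests.post(
    "https://api.amazonscraperapi.com/api/v1/amazon/product/batch",  # assumed path
    json={
        "queries": ["B08N5WRWNW", "B0CHX3QBCH"],  # up to 1,000 ASINs per POST
        "domain": "com",
        "webhook_url": "https://example.com/hooks/amazon-results",  # assumed field
    },
    headers={"Authorization": "Bearer asa_live_YOUR_KEY"},
    timeout=30,
)
print(resp.status_code, resp.json())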

Is It Legal to Bypass the Amazon CAPTCHA?

Bypassing the Amazon CAPTCHA for public product data is generally legal in the United States, based on the Ninth Circuit ruling in hiQ Labs v. LinkedIn, which established that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act (CFAA). That precedent covers product pages, prices, and reviews that any logged-out user can view. It does not cover scraping behind a login, purchase flows, or personal account data, and it does not override Amazon’s Conditions of Use, which prohibit automated access. Violating the Conditions of Use is a contract issue, not a criminal issue, but it can still result in IP bans, account suspension, and civil liability if Amazon decides to pursue it.

The practical, defensible posture for a commercial scraper is to scrape only data that a logged-out browser can see, respect robots.txt and rate limits, avoid any flow that requires a login, and avoid reselling raw scraped content in a way that competes with Amazon’s own products. If your use case is price monitoring, catalog enrichment, MAP compliance, or market research, you are in the mainstream of what the hiQ ruling protects. This is not legal advice, and anyone operating at scale should review the details with counsel in their jurisdiction.

FAQ

Does the Amazon CAPTCHA return HTTP 403 or 429?

No. Amazon returns HTTP 200 with the robot check HTML body, which is why naive scrapers silently fail. Always inspect the response body for "Robot Check" or /errors/validateCaptcha before parsing.

Can I solve the Amazon CAPTCHA with a vision model?

Technically yes, and projects like ziplokk1/scrapy-amazon-robot-middleware have used image recognition against the Amazon image puzzle. In practice, retry-with-new-IP is cheaper and faster, and commercial solvers like 2Captcha and CapSolver exist if you need the fallback.

Are free proxy lists usable on Amazon?

No. Free proxy lists are almost entirely datacenter IPs already flagged on Amazon, plus hijacked residential hosts that get rotated out within minutes. Success rates on Amazon from free lists run in the single digits.

What is the JA3 fingerprint and does Amazon use it?

JA3 is a hash of the TLS client hello that identifies the client software. Amazon’s WAF uses TLS fingerprinting signals including JA3 to flag non-browser clients. curl_cffi with impersonate="chrome" replays a Chrome JA3 hash, which is why it is more effective than requests plus a spoofed User-Agent.

Do I need a headless browser for Amazon price scraping?

No. Amazon renders product titles, prices, ratings, Buy Box seller, and availability server-side in the initial HTML. A curl_cffi request plus a residential proxy is sufficient for price scraping. Save the headless browser for AJAX-heavy edge cases.

How do I know when to upgrade from a DIY stack to a managed API?

Upgrade when the maintenance cost of proxies, fingerprints, and retries exceeds the per-request cost of a managed API. For most teams, that crossover happens between 50,000 and 500,000 requests per month. At $0.90 per 1,000 requests on pay-as-you-go (or $0.50 per 1,000 on Custom plans) on the Amazon Scraper API, a 200,000-request month costs $180 of API spend on PAYG or $100 on Custom versus several engineer-days of proxy and retry maintenance.

Sources