State of Amazon Scraping 2026
Published 2026-05-08 | 24 min read | 5 interactive charts | 50+ cited sources
This is a field report on Amazon scraping in 2026 - what changed in the last 18 months, who actually wins on success rate, where prices really sit, and what the demand picture looks like. Every number below has a source, every conflict is flagged. Skip to the section you need; the contents are independent.
TL;DR (the 60-second version)
- Amazon hardened its perimeter three times in 9 months. Reviews went behind a login wall on 2024-11-05; AWS WAF added native JA4 TLS fingerprinting on 2025-03-06; robots.txt blocked Meta/Google/Mistral/Huawei AI crawlers in August 2025.
- TLS fingerprinting is the dominant chokepoint. Vanilla Python requests returns roughly 2% success on Amazon. curl_cffi with a real Chrome impersonation profile returns ~94% on the same workload. A 47x improvement from one library swap.
- There is a roughly 17x price spread on residential bandwidth. The cheapest list-rate sells at $0.49/GB, the most expensive entry-tier at $8.40/GB and only drops below $4 above 332GB.
- There is a 20x price spread on Amazon scrape APIs. Scrape.do publishes $0.12/1k requests; ScraperAPI's Hobby plan works out to $2.45/1k.
- Independent benchmarks put Amazon success rate around 93%. Proxyway 2025 measured 11 providers at 2 req/s on 15 protected sites; Zyte led at 93.14% overall and 93.30% on Amazon as a single target. Vendor-self-published benchmarks claim 100%, but no truly neutral 2026 benchmark exists.
- AI is the demand engine, not the disruptor. 70% of LLMs are trained on scraped data. GPTBot crawl traffic was up 305% YoY in May 2025. User-driven AI scraping grew 15x in 2025. Only 33.8% of practitioners say they will not try AI-assisted scraping (Apify 2026 survey).
- Reviews stopped being scrapable for most providers. Pre-November 2024 you could pull up to 500 reviews per ASIN across paginated pages. Today, unauthenticated requests see 3-4 reviews and the product-reviews/ URL pattern redirects to login.
- The legal floor is "log out and you're a scraper, log in and you're a contract-breacher." Meta v. Bright Data (2024-01-23) confirmed logged-out public scraping does not violate platform terms. hiQ v. LinkedIn settled in December 2022 on breach-of-contract: $500K, all data deleted, permanent injunction.
Why this report exists
Most "state of scraping" reports are vendor blog posts. They cite themselves. They quote market sizes from one analyst firm and present "98% success rate" without disclosing that the benchmark was their own.
This report uses publicly cited numbers from Proxyway, AIMultiple, Mordor Intelligence, Cloudflare, Imperva, Apify, AWS, the courts, and the actual scraper APIs themselves. Conflicts between sources are flagged. Where no defensible source exists, the gap is named.
It focuses specifically on Amazon (not "web scraping" broadly) because Amazon is the most-scraped commercial site in the world and the one whose anti-bot stack changed the most in 2024-2025.
Amazon's hardening timeline
Three discrete, dated events in 9 months reshaped how Amazon scraping works.
2024-11-05: the review lockdown
Amazon locked nearly all product reviews behind a login wall. The https://www.amazon.com/product-reviews/{ASIN} URL pattern, which previously paginated through hundreds of reviews per product, now redirects unauthenticated requests to the login page. Logged-out users see 3-4 reviews per product and nothing more.
Industry coverage names Bright Data as one of the providers visibly stuck on the change. Cookie-pool / authenticated scraping became the only practical path for review extraction at scale.
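One way to notice the lockdown in a pipeline is to compare the URL that was requested with the URL the request finally landed on. The sketch below is a minimal illustration, assuming the sign-in page is served under an /ap/signin path (an assumption, not something this report verified) and that the caller already has the final URL after redirects. The names review_url and is_login_redirect are hypothetical helpers.

```python
from urllib.parse import urlparse

# Assumption: sign-in pages are served under this path prefix.
SIGNIN_PATH_PREFIX = "/ap/signin"


def review_url(asin: str) -> str:
    """Build the review URL pattern quoted in the text for one ASIN."""
    return f"https://www.amazon.com/product-reviews/{asin}"


def is_login_redirect(requested_url: str, final_url: str) -> bool:
    """Return True when a review request was bounced to the sign-in page."""
    requested = urlparse(requested_url)
    final = urlparse(final_url)
    was_review_request = requested.path.startswith("/product-reviews/")
    landed_on_signin = final.path.startswith(SIGNIN_PATH_PREFIX)
    return was_review_request and landed_on_signin
```

Counting these redirects separately from ordinary blocks keeps the review lockdown from being misread as a proxy or fingerprint failure.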
2025-03-06: AWS WAF makes JA4 fingerprinting click-deployable
AWS announced general availability of JA4 fingerprinting and JA3/JA4 as rate-based-rule aggregation keys for AWS WAF on March 6, 2025. JA4 is the next-generation TLS-handshake fingerprinting standard that supersedes JA3.
The empirical impact on Amazon was already visible by late 2025. A technical deep-dive published 2025-12-29 measured ~2% success on Amazon for vanilla Python requests and ~94% for curl_cffi configured with a Chrome impersonation profile on the same URLs.
The 47x library swap
Amazon success rate on identical URLs, vanilla requests vs curl_cffi with Chrome impersonation.
| Label | Amazon success rate (%) |
|---|---|
| Vanilla Python requests | 2.00% |
| curl_cffi (Chrome impersonate) | 94.00% |
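The comparison above can be reproduced in spirit with a small harness. This is a hedged sketch, not the methodology of the cited deep-dive: it assumes curl_cffi is installed, uses its impersonate="chrome" option, and treats HTTP 200 as "success". The function names fetch_status_impersonated and success_rate are hypothetical.

```python
from typing import Callable, Iterable


def fetch_status_impersonated(url: str) -> int:
    """Fetch a URL with a Chrome-like TLS fingerprint and return the HTTP status."""
    # Imported lazily so the harness below also works without curl_cffi installed.
    from curl_cffi import requests as cffi_requests

    response = cffi_requests.get(url, impersonate="chrome", timeout=30)
    return response.status_code


def success_rate(fetch_status: Callable[[str], int], urls: Iterable[str]) -> float:
    """Percentage of URLs that returned HTTP 200; exceptions count as failures."""
    total = 0
    ok = 0
    for url in urls:
        total += 1
        try:
            if fetch_status(url) == 200:
                ok += 1
        except Exception:
            pass  # network or TLS errors count as failed requests
    return 100.0 * ok / total if total else 0.0
```

Passing the fetch function in as a parameter lets the same harness score a vanilla client and an impersonating client on identical URLs. HTTP 200 is a loose proxy: a block or CAPTCHA page can also return 200, so a production check would inspect the body as well.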
2025-08: AI crawler robots.txt block
In August 2025, Amazon updated its robots.txt to explicitly disallow crawlers from Meta, Google, Huawei, Mistral, and other AI companies. Combined with the Conditions of Use update on 2025-05-30 banning automated access, the contract-side stance is now explicit.
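How a per-crawler disallow behaves can be checked with Python's standard urllib.robotparser. The robots.txt excerpt below is illustrative only and is not a copy of Amazon's file; the user-agent tokens and the example.com URLs are assumptions chosen to resemble AI crawlers.

```python
from urllib.robotparser import RobotFileParser

# Illustrative excerpt only; NOT a copy of Amazon's actual robots.txt.
# The user-agent tokens are assumptions chosen to resemble AI crawlers.
SAMPLE_ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /gp/cart
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return bool(parser.can_fetch(user_agent, url))
```

A named crawler with "Disallow: /" is refused everywhere, while other agents fall through to the wildcard group. robots.txt is advisory: it states the site's position and does not technically block a request.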
The market: what is actually known
Five major firms publish a "global web scraping market size" number for 2025. They disagree by 4x. The honest line is "estimates range $0.5-4B for 2025; CAGR consensus is 13-18%." See our Global Web Scraping Market Forecast for the multi-source aggregation, methodology, and projection scenarios.
What is more defensible than market size is the demand picture:
- 81% of US retailers use automated price scraping for dynamic repricing, up from 34% in 2020 (Mordor 2026).
- 82% of e-commerce companies use web scraping to collect competitive data (GroupBWT 2026).
- Retail and e-commerce account for ~37% of total web scraping market activity (same source).
- Price monitoring alone = 25.8% of all scraping applications.
- 65% of enterprises used scraping to feed AI/ML projects in 2024 (Mordor).
Bot traffic context: Imperva's 2025 Bad Bot Report found 51% of all 2024 web traffic was automated - the first year on record where bots exceeded humans. Bad bots specifically were 37% of internet traffic, up from 32% in 2023.
The price spread story
Two price spreads are large enough to be the headline of any procurement deck.
Residential proxy bandwidth: roughly 17x range
Residential proxy entry-tier price ($/GB, 2026)
Lower is better. Same product (residential rotating proxies), roughly 17x ratio between cheapest and most expensive entry tier.
| Label | Entry $/GB |
|---|---|
| Evomi | $0.49 |
| IPRoyal PAYG | $1.75 |
| SOAX (smallest) | $4 |
| Smartproxy | $8.5 |
| Oxylabs (10GB) | $8 |
| Bright Data (10GB) | $8.4 |
That's roughly a 17x ratio between the cheapest and most expensive entry tier on the same product. JoinMassive characterizes the broader range as "$2-15/GB with most reputable providers $5-10/GB". Volume tiers narrow the gap: Bright Data falls to $3.30/GB at 10TB; Oxylabs and Smartproxy reach similar floors at high commit. For anyone scraping less than 1TB/month, residential bandwidth pricing is the single largest variable cost line.
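To see what these $/GB figures mean per scrape, the entry prices can be converted into bandwidth cost per 1,000 product pages. This is a rough sketch under stated assumptions: about 250 KB transferred per page (the figure this report measured on its own traffic, with no public benchmark behind it), decimal gigabytes, and no retries or extra assets. The function name is hypothetical.

```python
# Entry-tier list prices from the table above, in USD per GB.
ENTRY_PRICE_PER_GB = {
    "Evomi": 0.49,
    "IPRoyal PAYG": 1.75,
    "SOAX (smallest)": 4.00,
    "Smartproxy": 8.50,
    "Oxylabs (10GB)": 8.00,
    "Bright Data (10GB)": 8.40,
}

# Assumption: ~250 KB transferred per product page, with 1 GB = 1,000,000 KB.
PAGE_SIZE_KB = 250
KB_PER_GB = 1_000_000


def bandwidth_cost_per_1k_pages(price_per_gb: float, page_kb: float = PAGE_SIZE_KB) -> float:
    """USD of proxy bandwidth consumed by 1,000 page fetches."""
    gb_per_1k_pages = 1000 * page_kb / KB_PER_GB
    return price_per_gb * gb_per_1k_pages


if __name__ == "__main__":
    for provider, price in ENTRY_PRICE_PER_GB.items():
        print(f"{provider}: ${bandwidth_cost_per_1k_pages(price):.2f} per 1k pages")
```

Under these assumptions 1,000 pages consume 0.25 GB, so failed requests and retries scale the bandwidth bill directly.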
Per-1k Amazon scrape pricing: 20x range
Per-1,000 Amazon-scrape entry pricing ($/1k, 2026)
Lower is better. Reflects packaging more than technology - all top providers cluster near 95% success.
| Label | Entry price ($/1k) |
|---|---|
| Scrape.do | $0.12 |
| ScrapingBee | $0.2 |
| ScrapingDog | $0.2 |
| Decodo | $0.25 |
| ScraperAPI Enterprise | $0.475 |
| Oxylabs Micro | $0.5 |
| Bright Data | $0.75 |
| ZenRows | $1 |
| WebScrapingAPI | $2.45 |
| ScraperAPI Hobby | $2.45 |
| Apify entry | $6.67 |
The procurement headline: the price you pay for the same Amazon scrape varies by 20x and the variance reflects packaging, not technology. If you have a budget per 1,000 requests, your provider choice matters more than your volume.
Provider success rates (with caveats)
Three benchmarks dominate citations in 2026: Proxyway 2025, AIMultiple 2026, and Scrape.do 2026. Two are vendor-funded; one is a vendor-research firm with paid access. No truly neutral third-party benchmark exists.
Proxyway 2025: provider success rate by target site
Average across 11 providers at 2 req/s. Amazon at 93.30% sits between Walmart and Google - mid-difficulty, not hardest.
| Label | Success rate (%) |
|---|---|
| Zillow | 97.85% |
| Google | 94.78% |
| Amazon | 93.30% |
| Walmart | 93.05% |
| Hyatt | 43.75% |
| G2 | 36.63% |
| Shein | 21.88% |
Amazon at 93.30% sits in the easier cluster alongside Zillow (97.85%), Google (94.78%), and Walmart (93.05%), far ahead of Hyatt, G2, and Shein. It is mid-difficulty at most, not the hardest target. The narrative that Amazon is uniquely difficult is provider marketing.
AIMultiple 2026: data-depth comparison
Fields extracted per Amazon product page (AIMultiple 2026)
Higher is more comprehensive. Industry reference: ~350 fields. Bright Data and Apify price themselves accordingly.
| Label | Fields per product |
|---|---|
| Bright Data | 686 |
| Apify | 577 |
| Industry reference | 350 |
| Decodo | 286 |
| Zyte | 131 |
The choice is not "who is most accurate" - everyone above 95% is in the same band - it's "how rich is the extracted record." AIMultiple's broader e-commerce benchmark across 1,700 URLs puts Oxylabs at 98.50%, Zyte at 98.38%, Bright Data at 97.90%, Decodo at 96.29%.
What people scrape Amazon for in 2026
The use-case mix shifted in 2024-2025 because of the review lockdown. The current ranking, by approximate share of Amazon scraping spend:
- Price monitoring and dynamic repricing. The dominant use case. 81% of US retailers automate price scraping. Repricer market = $710M (2025).
- Catalog and dataset enrichment. 200M+ Amazon-product datasets refreshed daily on standard plans, hourly on enterprise.
- MAP compliance. Brand protection, hijack detection. Vendors monitor 10,000+ sellers each across 195 countries.
- Review monitoring (collapsed since 2024-11-05). Pre-lockdown: full review extraction. Post-lockdown: 3-4 visible reviews per ASIN as the ceiling for unauthenticated scraping.
- BSR tracking and stock alerts. Standard feature in seller-software stacks.
- Affiliate / Amazon Associates site population. Smaller share. Mostly served by Amazon's official APIs (PAAPI, deprecated in favor of Creators API).
- AI training and alternative-data feeds. Distinct because the customer is not retail-facing. 67% of US investment advisers use scraping in alternative-data programs.
The legal landscape
Two cases set the modern frame for US scraping law.
hiQ v. LinkedIn (final status: December 2022)
The Ninth Circuit affirmed twice that scraping public data does not violate the CFAA. hiQ then lost on breach of contract and settled on 2022-12-06 with a permanent injunction, deletion of all scraped data and derived algorithms, and $500,000 in damages.
Meta v. Bright Data (2024-01-23)
Judge Edward Chen (N.D. Cal.) granted Bright Data summary judgment. Meta's terms only bind logged-in users, so logged-out scraping of public Facebook and Instagram pages is fine. Meta dropped the remaining tortious-interference claim on 2024-02-23 and waived its right to appeal.
Applied to Amazon: scraping public product pages without logging in is on solid ground. Anything that requires authentication - including Amazon reviews post-November 2024 - is a contract question first.
GDPR and CCPA on review data
GDPR penalties run up to €20M or 4% of global turnover. Per CNIL (France) 2025 guidance, even "public" pages may contain personal data requiring GDPR safeguards. The scope explicitly includes user reviews. 86% of organizations increased compliance spending in 2024.
The toolchain weapons race
Three libraries quietly won the practitioner-stack war between 2023 and 2026.
Headless browser weekly npm downloads (early 2026)
Playwright now leads Puppeteer by roughly 5x in weekly downloads, and Selenium WebDriver by far more.
| Label | Weekly npm downloads (millions) |
|---|---|
| Playwright | 33 |
| Puppeteer | 6 |
| Cypress | 6.5 |
| Selenium WebDriver | 0.5 |
- Playwright: 33M weekly npm downloads. Job postings up 180% YoY in 2025.
- curl_cffi: 5.6k GitHub stars, 480 forks. The de facto Python TLS-impersonation tool.
- undici: 72-97M weekly npm downloads. Powers Node.js's built-in fetch() from v18.
Apify's 2026 survey of practicing scrapers: Python dominates at 69.6%. Library mix: BeautifulSoup 43.5%, Crawlee 34.8%, Selenium 26.1%, Playwright 26.1%. 46.7% rely exclusively on internal/in-house code; 41.7% combine internal + external tools.
AI is the demand engine, not the disruptor
Two questions get asked about AI's effect on scraping. The data answers both.
- 70% of all generative AI / LLM models are trained primarily on scraped web data (Tendem.ai 2026).
- 65% of enterprises used scraping to feed AI/ML projects in 2024 (Mordor 2026).
- GPTBot share grew from 5% (May 2024) to 30% (May 2025) - request volume +305% YoY.
- AI training crawl traffic is ~8x search-crawl volume and ~32x user-triggered crawling.
- User-driven AI bot crawling grew 15x in 2025.
- Apify 2026: 54.2% of practitioners do not currently use AI in scraping. 45.8% do. 66.2% plan to try AI-assisted scraping. Among current AI users: 72.7% report productivity gain, 100% plan to increase AI tool usage.
Headline case study: one enterprise replaced a 15-person manual scraping team with an AI-driven system; year-1 cost dropped from $4.1M to $270K, data accuracy went from 71% to 96%.
Practical recommendations
If you are buying scraping services
- Pick by total cost per 1,000 successful Amazon scrapes at your actual volume, not by entry-tier per-1k price.
- Treat published vendor success rates as marketing; lean on Proxyway 2025 for cross-comparison and AIMultiple 2026 for field-depth.
- Verify your provider routes Amazon scrapes through residential exits matched to the marketplace TLD.
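The first recommendation above reduces to one formula: plan cost divided by the successful scrapes you actually get from it. A minimal sketch, assuming a flat monthly fee that is paid in full regardless of usage. The function name and the plan figures in the example are hypothetical, not any vendor's published pricing.

```python
def cost_per_1k_successful(monthly_fee_usd: float, attempts_used: int, success_rate: float) -> float:
    """Effective USD per 1,000 successful scrapes on a flat-fee plan.

    success_rate is a fraction between 0 and 1 (for example 0.93).
    """
    successes = attempts_used * success_rate
    if successes <= 0:
        raise ValueError("no successful requests; cost per success is undefined")
    return 1000 * monthly_fee_usd / successes


if __name__ == "__main__":
    # Hypothetical plan: $49/month with 100,000 requests included.
    full_use = cost_per_1k_successful(49, 100_000, 0.93)
    light_use = cost_per_1k_successful(49, 20_000, 0.93)
    print(f"plan fully used: ${full_use:.2f} per 1k successful")
    print(f"plan 20% used:   ${light_use:.2f} per 1k successful")
```

The same plan costs five times more per successful scrape when only a fifth of the quota is used, which is why the entry-tier per-1k price alone is a poor basis for comparison.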
If you are building on top
- TLS impersonation is non-optional. Use curl_cffi / impit / cycletls or pay an API that handles it.
- Cache aggressively at the application layer.
- Reviews are no longer a public dataset.
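Application-layer caching can be as simple as a time-to-live map keyed by ASIN or URL. The sketch below is one possible shape, assuming an in-memory store and a caller-supplied fetch function; the class name TTLCache and its parameters are hypothetical, and a production system would more likely use a shared store.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory cache that re-fetches a key only after its TTL has expired."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], str]) -> str:
        """Return the cached value for key, calling fetch(key) on a miss or expiry."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch(key)
        self._store[key] = (now, value)
        return value
```

Every cache hit is a request that is neither billed nor exposed to blocking, so the TTL is effectively a trade between data freshness and cost.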
If you are scraping internally
- Default to a paid scraper API for production Amazon workloads under ~5M/month.
- Track success rate, p95 latency, and per-1k cost as your three quality SLOs.
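The three SLOs named above can be computed from a plain request log. This is a minimal sketch, assuming each logged request records whether it succeeded, its latency, and what it cost; the RequestRecord fields and function names are hypothetical, and p95 uses the nearest-rank method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    ok: bool           # True if the scrape returned usable data
    latency_ms: float  # wall-clock time for the request
    cost_usd: float    # what this request was billed at


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def slo_summary(records: List[RequestRecord]) -> Dict[str, float]:
    """Success rate, p95 latency, and cost per 1,000 successful requests."""
    if not records:
        raise ValueError("no records to summarize")
    successes = sum(1 for r in records if r.ok)
    spend = sum(r.cost_usd for r in records)
    return {
        "success_rate_pct": 100.0 * successes / len(records),
        "p95_latency_ms": p95([r.latency_ms for r in records]),
        "cost_per_1k_successful_usd": 1000.0 * spend / successes if successes else float("inf"),
    }
```

Dividing spend by successful requests rather than all requests means a drop in success rate shows up in the cost figure as well.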
Methodology and data caveats
All numbers in this report are sourced from publicly cited reports, vendor disclosures, court filings, or product documentation as of 2026-05-08. Where multiple credible sources conflict, all are cited and the conflict is flagged. Where no defensible source exists for a claim worth checking, the gap is named.
Open data gaps that we did not fill:
- Empirical gzipped wire bytes per Amazon product page (we measured ~250KB on our own 2026 traffic; no public benchmark located).
- Amazon storefront's actual per-IP rate threshold (deliberately undocumented).
- Customer counts or revenue disclosures for top-5 scraping APIs.
Spot a claim that drifts or a source that is now stale? Drop a note - this report refreshes annually.
About the publisher
Amazon Scraper API is a managed Amazon scraping API priced at $0.90 per 1,000 successful requests on pay-as-you-go, with 1,000 free per month on signup. Used by repricers, dataset operators, and AI training pipelines.