How to Scrape Amazon Best Sellers with Python (2026 Guide)

The Answer

Scraping Amazon Best Sellers means fetching a category chart at amazon.com/Best-Sellers-<Category>/zgbs/<node>/ and parsing the ranked grid of 100 products (two pages of 50) for rank, ASIN, title, price, and rating. The catch that breaks most tutorials: the grid is lazy-loaded with JavaScript, so a plain requests plus BeautifulSoup call returns only about 30 of the 50 products on each page, and getting all 100 needs a headless browser (Playwright or Selenium) or a rendering API. Once you have the chart’s ASINs, Amazon Scraper API turns them into full product records at $0.90 per 1,000 successful requests, with 1,000 free on signup.

What Are Amazon Best Sellers?

Amazon Best Sellers are Amazon’s frequently refreshed (roughly hourly) charts of the top 100 products in every category, ranked by sales. The list lives at amazon.com/gp/bestsellers (the human-readable form is amazon.com/Best-Sellers/zgbs), and every category and subcategory has its own chart at a node URL such as amazon.com/Best-Sellers-Electronics/zgbs/electronics/ or a numeric browse-node id like .../zgbs/home-garden/284507/. The zgbs path segment is the stable marker for any Best Sellers page.

Each chart is a pre-ranked list ordered by sales, not a search result. You do not pass a keyword. You pass a category, and Amazon returns positions 1 through 100 in sales order. The top product in a category with at least 100 listings earns the orange “Best Seller” badge.
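The zgbs marker described above can be checked in code. This is a minimal sketch, assuming you want to filter a mixed list of Amazon URLs down to chart pages. The function names is_best_sellers_url and chart_segments are illustrative, not part of any library, and the check covers only the two URL forms mentioned in this section.

```python
from typing import List
from urllib.parse import urlparse


def _segments(url: str) -> List[str]:
    """Split the URL path into its non-empty segments."""
    return [s for s in urlparse(url).path.split("/") if s]


def is_best_sellers_url(url: str) -> bool:
    """True for URLs with a zgbs path segment or the /gp/bestsellers form."""
    segments = _segments(url)
    return "zgbs" in segments or segments[:2] == ["gp", "bestsellers"]


def chart_segments(url: str) -> List[str]:
    """Return the category/node segments that follow zgbs, if any."""
    segments = _segments(url)
    if "zgbs" not in segments:
        return []
    tail = segments[segments.index("zgbs") + 1:]
    # Drop tracking segments such as ref=..., which are not part of the node.
    return [s for s in tail if not s.startswith("ref=")]


url = "https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/"
print(is_best_sellers_url(url))
print(chart_segments(url))
```

Matching on path segments rather than a substring avoids false positives from product titles or query strings that happen to contain "zgbs".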

Best Sellers is one of five Amazon chart families, each with the same structure and a different ranking signal:

  • Best Sellers (/gp/bestsellers) - top 100 by sales volume.
  • New Releases (/gp/new-releases) - best-selling new and upcoming products.
  • Movers & Shakers (/gp/movers-and-shakers) - biggest sales-rank gainers in the past 24 hours.
  • Most Wished For (/gp/most-wished-for) - most-added to wish lists.
  • Gift Ideas (/gp/most-gifted) - most-gifted products.

The scraping technique in this guide works on all five, because they share the same gridItemRoot card layout. Only the base URL changes.
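Since only the base URL changes, the five families can be kept in one lookup table. The sketch below uses the paths listed above. The dictionary keys, the chart_url helper, and the default host are illustrative names chosen for this example.

```python
# Base paths for the five chart families, as listed in this guide.
CHART_PATHS = {
    "best_sellers": "/gp/bestsellers",
    "new_releases": "/gp/new-releases",
    "movers_and_shakers": "/gp/movers-and-shakers",
    "most_wished_for": "/gp/most-wished-for",
    "gift_ideas": "/gp/most-gifted",
}


def chart_url(family: str, host: str = "www.amazon.com") -> str:
    """Build the top-level URL for a chart family on a given marketplace host."""
    try:
        path = CHART_PATHS[family]
    except KeyError:
        raise ValueError(f"unknown chart family: {family!r}") from None
    return f"https://{host}{path}"


for family in CHART_PATHS:
    print(family, chart_url(family))
```

The same fetch and parse functions can then be pointed at any of these URLs without modification.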

What Is Amazon Best Sellers Rank (BSR)?

Amazon Best Sellers Rank (BSR) is a per-category sales rank that appears on the product detail page and indicates how a product is selling compared to similar products in the same category. Amazon states that BSR is calculated using sales volume data, and that “both recent sales and all-time sales factor into a BSR, though recent sales count more than older sales.”

A single product can carry more than one BSR, because it can sit in multiple categories at once. A water bottle might be #2 in “Kitchen & Dining” and #1 in “Tumblers & Water Glasses” simultaneously. When you scrape a chart you are reading the category-level rank directly (position in the list), which is the same ordering BSR produces for that node.

BSR is the field that makes Best Sellers data commercially useful. Sellers track it to gauge demand, spot rising products before they saturate, and benchmark their own catalog against the category leaders.

How Often Does the Amazon Best Sellers List Update?

The Amazon Best Sellers list updates frequently, commonly reported as hourly, based on recent sales velocity. Amazon’s own documentation describes the ranks as “updated frequently” rather than committing to a fixed interval, and third-party trackers consistently observe roughly hourly movement in both the chart positions and the BSR numbers behind them.

The practical consequence for scraping: a chart is a moving target. If you are monitoring demand, a daily snapshot misses intraday swings, and an hourly cadence matches how fast the underlying data actually changes. Cache accordingly. Pulling the same chart every few minutes wastes requests on data that has not moved.

This update frequency is also why monitoring Best Sellers at scale is a recurring job, not a one-off scrape. That is where per-success API pricing starts to matter more than a script you babysit.
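The caching advice above can be sketched as a small time-to-live cache keyed by chart URL. This is an illustration under assumptions: ChartCache is a hypothetical helper, the one-hour default mirrors the roughly hourly cadence rather than any documented interval, and fetch is whatever function you use to download a chart.

```python
import time
from typing import Callable, Dict, Tuple


class ChartCache:
    """Serve a cached chart until it is older than ttl_seconds, then refetch."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no request sent
        html = self._fetch(url)
        self._store[url] = (now, html)
        return html
```

Wrapping your fetch function this way means a scheduler can call get() as often as it likes while the number of real requests stays bounded by the TTL.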

How Is Scraping Best Sellers Different From Scraping Search Results?

Scraping Best Sellers differs from scraping search results in three ways: the chart is a pre-ranked list rather than a query response, it is fixed at 100 products across two pages, and it is lazy-loaded with JavaScript rather than fully served in the initial HTML.

  • Pre-ranked vs query-driven. A Best Sellers chart is ordered by sales, position 1 to 100, and you select it by category node. A search result is ordered by relevance to a keyword (?k=...), with variable depth and a sponsored-listing mix.
  • Fixed pagination. Best Sellers is always 100 products: page one carries 50, page two carries 50 at the pg=2 URL. Search pagination runs deeper and is capped differently.
  • Rendering. The Best Sellers grid is client-side rendered and lazy-loaded, so a raw HTML fetch returns a partial list. Amazon search result pages put their core data in the initial HTML, so requests plus a parser is usually enough there.

That third difference is the one that catches people, so it gets its own section below.

How Do You Set Up the Python Stack for Best Sellers?

You set up the Python stack with three layers: an HTTP or browser client for fetching, BeautifulSoup with lxml for parsing, and residential proxies for the IP layer. For the initial (partial) approach you only need curl_cffi and BeautifulSoup:

pip install curl_cffi beautifulsoup4 lxml

The curl_cffi library matters because it impersonates a real Chrome TLS fingerprint, which Amazon’s AWS WAF inspects during the TLS handshake, before it reads any HTTP headers. Standard requests produces a non-browser fingerprint that gets flagged regardless of how clean the rest of your headers are. For setup details on the proxy layer, see our best proxies for Amazon scraping guide.

For the full 100-product approach you also need a headless browser, because of the lazy load. Playwright is the cleaner modern choice:

pip install playwright
playwright install chromium

How Do You Scrape the Best Sellers Page With Python?

You scrape the Best Sellers page by fetching the chart URL, then parsing each gridItemRoot card for rank, ASIN, title, price, and rating. The parser is the same whether the HTML came from a raw fetch or a rendered browser:

import re
from dataclasses import dataclass
from typing import Optional
from bs4 import BeautifulSoup

@dataclass
class BestSeller:
    rank: Optional[int]
    asin: Optional[str]
    title: Optional[str]
    price: Optional[float]
    rating: Optional[float]
    url: Optional[str]

def parse_best_sellers(html: str) -> list[BestSeller]:
    soup = BeautifulSoup(html, "lxml")
    items = []
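    # Prefer the outer gridItemRoot cards. Fall back to the inner faceout only when
    # none are found, so a faceout nested inside a card is not counted twice.
    cards = soup.select("div#gridItemRoot") or soup.select("div.zg-grid-general-faceout")
    for card in cards: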
        link = card.select_one("a.a-link-normal[href*='/dp/']")
        href = link.get("href") if link else None
        asin_match = re.search(r"/dp/([A-Z0-9]{10})", href or "")
        rank_el = card.select_one("span.zg-bdg-text")
        title_el = card.select_one("div[class*='p13n-sc-css-line-clamp'], img[alt]")
        price_el = card.select_one("span[class*='p13n-sc-price'], span.a-price span.a-offscreen")
        rating_el = card.select_one("span.a-icon-alt, i.a-icon-star span")
        items.append(BestSeller(
            rank=_int(rank_el.get_text()) if rank_el else None,
            asin=asin_match.group(1) if asin_match else None,
            title=_title(title_el),
            price=_price(price_el),
            rating=_rating(rating_el),
            url=f"https://www.amazon.com{href.split('?')[0]}" if href else None,
        ))
    return items

def _int(text: str) -> Optional[int]:
    m = re.search(r"(\d[\d,]*)", text or "")
    return int(m.group(1).replace(",", "")) if m else None

def _price(el) -> Optional[float]:
    if not el:
        return None
    m = re.search(r"([0-9][0-9,]*\.?[0-9]*)", el.get_text())
    return float(m.group(1).replace(",", "")) if m else None

def _rating(el) -> Optional[float]:
    if not el:
        return None
    m = re.match(r"([0-9.]+)", el.get_text(strip=True))
    return float(m.group(1)) if m else None

def _title(el) -> Optional[str]:
    if not el:
        return None
    return el.get("alt") if el.name == "img" else el.get_text(strip=True)

The rank badge (span.zg-bdg-text) renders as “#1”, “#2”, and so on. The ASIN is the most stable field because it lives in the /dp/<ASIN>/ URL path, which has not changed in years. The title, price, and rating selectors use attribute-contains matching ([class*='p13n-sc-price']) on purpose, because Amazon’s exact class names are build-generated hashes like _cDEzb_p13n-sc-price_3mJ9Z that rotate without notice.

Why Does requests + BeautifulSoup Only Return 30 Products?

A requests plus BeautifulSoup scrape of the Best Sellers page returns only about 30 of the 50 products on a page because the grid is lazy-loaded with JavaScript. The initial HTML that arrives from a raw HTTP fetch contains the first batch of cards. The remaining cards are injected into the DOM only after the browser scrolls toward them, which a raw fetch never does.

from curl_cffi import requests

def fetch_raw(url: str, proxy: str) -> str:
    resp = requests.get(
        url,
        impersonate="chrome",
        proxies={"http": proxy, "https": proxy},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text

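# Placeholder proxy URL: replace with your own residential proxy endpoint.
PROXY = "http://user:pass@proxy.example.com:8000"

html = fetch_raw("https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/", PROXY)
products = parse_best_sellers(html)  # parser defined in the previous section
print(len(products))  # ~30, not 50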

The count is also non-deterministic. One run returns 30, another 38, another 24, depending on what made it into the server-rendered batch. This is not a bug in your parser. It is the lazy-load boundary. Treat any raw-fetch result under 50 per page as incomplete, not as the real chart.

This is the wall every Best Sellers tutorial hits, and it forces a decision: render the page yourself, or have something render it for you.
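One way to act on the under-50 rule is to try the cheap raw fetch first and fall back to rendering only when the page comes back short. The sketch below is an assumption about how you might wire that up, not a required pattern: fetch_complete_page is a hypothetical helper, and the fetchers and parser are passed in as plain callables that take a URL or HTML string.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
PAGE_SIZE = 50  # products per chart page, per the two-pages-of-50 layout


def fetch_complete_page(
    url: str,
    fetch_raw: Callable[[str], str],
    fetch_rendered: Callable[[str], str],
    parse: Callable[[str], List[T]],
    expected: int = PAGE_SIZE,
) -> Tuple[List[T], bool]:
    """Return (items, used_browser).

    The raw fetch is tried first. If it yields fewer than `expected` items,
    the result is treated as incomplete and the rendered fetch is used instead.
    """
    items = parse(fetch_raw(url))
    if len(items) >= expected:
        return items, False
    return parse(fetch_rendered(url)), True
```

With the functions in this guide you would bind the proxy argument first, for example with functools.partial(fetch_raw, proxy=PROXY), and pass parse_best_sellers as the parser. The caller should still check the length of the rendered result, since a blocked or partially scrolled page can also come back short.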

How Do You Get All 100 Best Sellers?

You get all 100 Best Sellers by combining two things: rendering each page with a headless browser that scrolls to trigger the lazy load, and visiting both pages (the base URL and pg=2). Playwright handles both:

from playwright.sync_api import sync_playwright

def fetch_rendered(url: str, proxy: str) -> str:
    with sync_playwright() as p:
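        browser = p.chromium.launch(
            headless=True,
            # If the proxy needs auth, Playwright takes separate
            # "username" and "password" keys alongside "server".
            proxy={"server": proxy},
        )
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=45000)
            # Scroll in steps so every lazy-loaded card enters the DOM.
            for _ in range(0, 6000, 800):
                page.mouse.wheel(0, 800)
                page.wait_for_timeout(400)
            # Final pause so the last lazy batch can finish rendering.
            page.wait_for_timeout(1500)
            return page.content()
        finally:
            # Close the browser even if navigation or scrolling fails.
            browser.close()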

def scrape_chart(category_url: str, proxy: str) -> list[BestSeller]:
    page1 = parse_best_sellers(fetch_rendered(category_url, proxy))
    page2_url = category_url.rstrip("/") + "/ref=zg_bs_pg_2?_encoding=UTF8&pg=2"
    page2 = parse_best_sellers(fetch_rendered(page2_url, proxy))
    seen, merged = set(), []
    for item in page1 + page2:
        if item.asin and item.asin not in seen:
            seen.add(item.asin)
            merged.append(item)
    return merged

Three details make this reliable:

  • Scroll in steps, then wait. A single jump to the bottom can skip cards. Stepped scrolling with a short pause after each step lets each lazy batch render. A final wait of one to two seconds catches the last batch.
  • Hit pg=2 explicitly. The second 50 products live at the pg=2 URL. Without it you cap at 50.
  • Dedupe by ASIN. Overlap between the page-one tail and the page-two head happens occasionally. Deduping by ASIN keeps the merged list clean.

Selenium with webdriver-manager works the same way if you prefer it. The trade-off is the same either way: you now run and maintain a browser fleet, which is the cost that pushes higher-volume jobs toward an API.
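Whichever browser you use, it helps to verify the merged chart before storing it. A minimal sketch, assuming each parsed item carries an integer rank (or None when the badge was not found). The missing_ranks helper is an illustrative name.

```python
from typing import Iterable, List, Optional


def missing_ranks(ranks: Iterable[Optional[int]], expected: int = 100) -> List[int]:
    """Return the chart positions in 1..expected that were not scraped."""
    seen = {r for r in ranks if r is not None}
    return [r for r in range(1, expected + 1) if r not in seen]


# Example: positions 3 and 5 were never rendered.
print(missing_ranks([1, 2, 4, None], expected=5))
```

Applied to the output of scrape_chart, this looks like missing_ranks(item.rank for item in products). An empty list means every position from 1 to 100 was captured, while a run of missing positions near the end of a page usually points to a lazy batch that never loaded.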

How Do You Avoid Getting Blocked Scraping Best Sellers?

Five practices materially raise the success rate of a Best Sellers scraper:

  • Country-matched residential proxies. A US IP scraping amazon.de/gp/bestsellers trips geo-mismatch detection. Match the proxy country to the marketplace TLD, and use residential rather than datacenter IPs for chart pages.
  • Rotate IPs per request. Amazon’s per-IP rate limiting reportedly kicks in at around 30 requests per minute. Rotating per request keeps any single IP under the threshold.
  • Randomized delays. Fixed-interval requests look automated. A random 3 to 10 second gap between chart fetches mimics human pacing.
  • Realistic headers. A current Chrome User-Agent plus an Accept-Language that matches the marketplace. A python-requests User-Agent is an instant flag.
  • Cache to the update cadence. Charts move roughly hourly, so caching a chart for 30 to 60 minutes cuts your request volume and, with it, your exposure to detection.
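Two of the practices above, randomized delays and marketplace-matched headers, are easy to sketch in code. Treat this as an illustration: the Accept-Language values are plausible examples rather than values taken from Amazon, and next_delay and accept_language_for are hypothetical helper names.

```python
import random
from typing import Optional

# Example Accept-Language values per marketplace TLD (illustrative assumptions).
ACCEPT_LANGUAGE = {
    "com": "en-US,en;q=0.9",
    "co.uk": "en-GB,en;q=0.9",
    "de": "de-DE,de;q=0.9,en;q=0.8",
    "fr": "fr-FR,fr;q=0.9,en;q=0.8",
}

_RNG = random.Random()


def accept_language_for(host: str) -> str:
    """Pick an Accept-Language value that matches the marketplace TLD."""
    # Check longer suffixes first so amazon.co.uk matches "co.uk".
    for tld in sorted(ACCEPT_LANGUAGE, key=len, reverse=True):
        if host.endswith("." + tld):
            return ACCEPT_LANGUAGE[tld]
    return ACCEPT_LANGUAGE["com"]


def next_delay(rng: Optional[random.Random] = None, low: float = 3.0, high: float = 10.0) -> float:
    """Return a random pause in seconds between chart fetches."""
    return (rng or _RNG).uniform(low, high)


print(accept_language_for("www.amazon.de"))
```

Between fetches you would call time.sleep(next_delay()) and pass {"Accept-Language": accept_language_for(host)} in the request headers.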

Amazon’s anti-bot stack (AWS WAF, IP rate limits, CAPTCHAs, and invisible JavaScript challenges) is covered in depth in our bypass Amazon CAPTCHA guide.

What’s the Managed API Alternative?

The managed alternative removes the two costs that make Best Sellers scraping painful at scale: running a headless browser fleet for the JavaScript render, and maintaining per-product proxy and parsing logic. The workflow splits cleanly into two stages, and an API fits the expensive second stage exactly.

Stage one is the chart itself: scrape the ranked list to get the top-100 ASINs per category. Stage two is enrichment: pull the full product detail (Buy Box price, current BSR, variations, images, A+ content) for each ASIN. Stage two is where the volume lives, because a single category is 100 products and a catalog of categories multiplies fast.

Amazon Scraper API handles stage two through its product endpoint and a batch endpoint that takes up to 1,000 ASINs in one async call. You hand it the ASINs you pulled from the chart and get back structured JSON for all of them, with the residential proxies, locale-aware parsing, and retries handled internally. Pricing is success-only at $0.90 per 1,000 requests (down to $0.50 per 1,000 on monthly plans), billing nothing for the delisted or wrong-marketplace ASINs that always show up in a large list. Every account starts with 1,000 free requests per month. Compared with the DIY path, you trade per-product proxy bandwidth plus an estimated 4 to 8 hours per month of selector maintenance for a flat per-success rate.
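Feeding a batch endpoint with a 1,000-ASIN limit means splitting your chart ASINs into batches first. The sketch below shows only that splitting step and deliberately makes no API call, since the request format is not covered in this guide. chunk_asins is a hypothetical helper.

```python
from typing import Iterable, Iterator, List


def chunk_asins(asins: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield de-duplicated ASINs in batches of at most `size`, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    seen = set()
    batch: List[str] = []
    for asin in asins:
        if asin in seen:
            continue  # the same product can appear in several category charts
        seen.add(asin)
        batch.append(asin)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Illustrative ASIN-shaped strings, not real products.
sample = [f"B{n:09d}" for n in range(2500)]
print([len(b) for b in chunk_asins(sample)])
```

De-duplicating before batching matters when you pull many related categories, because the same ASIN often ranks in a parent category and one of its subcategories.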

For a single category checked occasionally, the DIY Playwright path is fine. For monitoring many categories on the hourly cadence the charts actually move at, the API path wins on total cost of ownership.
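To make the cost side concrete, here is a rough estimator using the per-1,000 rates quoted above. It is a back-of-the-envelope sketch with loud assumptions: every product on every chart is enriched on every pull, every request succeeds and is billed, and free requests are ignored. The function names are illustrative.

```python
def monthly_requests(categories: int, pulls_per_day: int,
                     products_per_chart: int = 100, days: int = 30) -> int:
    """Enrichment requests per month if every charted product is refreshed on every pull."""
    return categories * pulls_per_day * products_per_chart * days


def monthly_cost(requests: int, rate_per_1000: float = 0.90) -> float:
    """Cost in dollars at a flat per-1,000-successful-requests rate."""
    return requests / 1000 * rate_per_1000


# 20 categories pulled hourly: 20 x 24 x 100 x 30 requests per month.
requests = monthly_requests(categories=20, pulls_per_day=24)
print(requests, round(monthly_cost(requests), 2), round(monthly_cost(requests, 0.50), 2))
```

For 20 categories on an hourly cadence that works out to 1,440,000 requests a month, or $1,296 at $0.90 per 1,000 and $720 at $0.50 per 1,000. In practice, enriching only the ASINs whose rank changed since the last pull reduces the volume considerably.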

FAQ

How many products are on an Amazon Best Sellers list?

An Amazon Best Sellers list holds the top 100 products in a category, split across two pages of 50. The base chart URL serves the first 50; the pg=2 URL serves the second 50. New Releases, Movers & Shakers, Most Wished For, and Gift Ideas charts follow the same 100-product, two-page structure.

Can you scrape Amazon Best Sellers with BeautifulSoup alone?

Not completely. BeautifulSoup parses whatever HTML you give it, but a raw requests fetch of a Best Sellers page returns only about 30 of the 50 products on a page because the grid lazy-loads with JavaScript. BeautifulSoup is the right parser, but you need a headless browser (Playwright or Selenium) or a rendering API to produce the full HTML first.

How do you get the ASIN from the Best Sellers page?

The ASIN sits in the product link’s URL path, immediately after /dp/. The regex /dp/([A-Z0-9]{10}) extracts it from any card’s a.a-link-normal href. ASINs are always 10 uppercase alphanumeric characters. The /dp/<ASIN>/ path is the most stable field on the page, unlike the hashed CSS class names that rotate.

Is it legal to scrape Amazon Best Sellers?

Scraping publicly visible Best Sellers data sits in the same legal area as any public-web scraping: courts have generally held that scraping public data is not a Computer Fraud and Abuse Act violation, while Amazon’s Conditions of Use prohibit automated access. The practical line most teams follow is to scrape only public, logged-out pages and never login-gated data. See our full guide to Amazon scraping and the law.

What’s the best way to monitor Best Sellers rank over time?

Snapshot the chart on the cadence it moves at (roughly hourly) and store each pull with a timestamp, then track position changes per ASIN. Because the list is 100 products per category and updates frequently, this becomes a recurring high-volume job. A scheduled scrape feeding a per-success API for the product-level enrichment is the common production pattern.
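The per-ASIN tracking step can be sketched as a comparison of two timestamped snapshots. This is a minimal illustration, assuming each snapshot maps ASIN to chart position. The Snapshot class and the helper names are hypothetical, and the ASIN-like keys are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    taken_at: datetime
    ranks: Dict[str, int]  # ASIN -> chart position at that time


def rank_changes(previous: Snapshot, current: Snapshot) -> Dict[str, Optional[int]]:
    """Positions gained per ASIN (positive = moved toward #1). None marks a new entry."""
    changes: Dict[str, Optional[int]] = {}
    for asin, rank in current.ranks.items():
        old = previous.ranks.get(asin)
        changes[asin] = None if old is None else old - rank
    return changes


def dropped_out(previous: Snapshot, current: Snapshot) -> List[str]:
    """ASINs that were on the earlier chart but are missing from the later one."""
    return [asin for asin in previous.ranks if asin not in current.ranks]


earlier = Snapshot(datetime(2026, 1, 1, 9, tzinfo=timezone.utc), {"A": 1, "B": 5, "C": 9})
later = Snapshot(datetime(2026, 1, 1, 10, tzinfo=timezone.utc), {"A": 2, "B": 3, "D": 10})
print(rank_changes(earlier, later))
print(dropped_out(earlier, later))
```

Storing the raw snapshots rather than only the deltas keeps the history replayable if you later change how movement is scored.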

Can you scrape Best Sellers without running a browser?

Yes, by fetching the rendered page through a scraping API that runs the JavaScript for you, then parsing the returned HTML with the same gridItemRoot logic. That removes the headless-browser fleet from your side. The alternative is to render it yourself with Playwright or Selenium, which works but adds the browser as something you operate and maintain.