Noah Bennett · ChocoData Amazon data expert · 8 min read

How to Scrape All Products From an Amazon Seller (Python)

The Answer

Scraping all products from an Amazon seller means finding the seller’s ID (a 13 to 15 character string that starts with A, shown in the “Sold by” link as amazon.com/sp?seller=<ID>), then paging the storefront at amazon.com/s?me=<SELLER_ID>, which is server-rendered so requests plus BeautifulSoup works without a browser. Amazon caps that pagination at roughly the first few hundred products, so for a fuller catalog you subdivide by category and price filters and dedupe by ASIN. To turn the ASINs you collect into full product records, Amazon Scraper API handles up to 1,000 at a time at $0.90 per 1,000 successful requests, with 1,000 free on signup.

What Is an Amazon Seller Storefront?

An Amazon seller storefront is the full catalog of products offered by a single selling account, reachable at amazon.com/s?me=<SELLER_ID>. It is rendered as a normal search results page, except the results are filtered to one merchant instead of one keyword. This is the surface to scrape when you want every product a third-party seller lists.

There are two distinct “store” surfaces on Amazon, and they are not interchangeable:

Third-party seller storefront (s?me=<SELLER_ID>) - keyed to a selling account. Server-rendered, paginated like search results, easy to parse. This is the right target for a complete product catalog.
Amazon Brand Store (amazon.com/stores/<brand>/page/<id>) - keyed to a brand registered in Amazon Brand Registry. A curated marketing microsite with JavaScript-rendered product widgets and no clean &page= pagination.

Most “scrape a seller’s products” jobs mean the first one. The storefront search endpoint is the workhorse, and the rest of this guide builds on it, with a separate section for Brand Stores at the end.
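Because the two surfaces need different scrapers, it can help to route a URL to the right one before fetching anything. The sketch below is one way to do that, assuming only the two URL shapes described above; the function name classify_store_url and its return labels are illustrative, not anything Amazon defines.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs
import re

# /stores/<brand>/page/<id>; the brand segment is treated as optional here (an assumption).
_BRAND_STORE_PATH = re.compile(r"^/stores/(?:[^/]+/)?page/([^/]+)")

def classify_store_url(url: str) -> tuple[str, Optional[str]]:
    """Return (kind, key), where kind is 'seller_storefront', 'brand_store', or 'other'."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    # Seller storefront: the search path /s with a me=<SELLER_ID> parameter.
    if parts.path.rstrip("/") == "/s" and query.get("me"):
        return "seller_storefront", query["me"][0]
    # Brand Store: the key is the page ID at the end of the path.
    m = _BRAND_STORE_PATH.match(parts.path)
    if m:
        return "brand_store", m.group(1)
    return "other", None
```

A storefront URL returns the Seller ID as its key and a Brand Store URL returns the page ID, so the caller can pick the requests path or the headless-browser path accordingly.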

How Do You Find an Amazon Seller ID?

You find an Amazon Seller ID in the URL of the “Sold by” link on any product the seller offers. On a product page, the Buy Box shows “Sold by <seller name>”, with the seller name as a link. That link points to the seller’s profile page at amazon.com/sp?ie=UTF8&seller=<SELLER_ID>, and the value after seller= is the ID you need.

The Seller ID, Merchant ID, and Merchant Token are three names for the same identifier. It is an alphanumeric string, commonly 13 to 15 characters, that almost always starts with a capital A (for example A2L77EE7U53NWQ, the ID associated with Amazon Warehouse). It is stable: a seller keeps the same ID for the life of the account, which is what makes it a reliable key for monitoring a competitor’s catalog over time.
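A quick format check catches obvious mistakes, such as pasting an ASIN where a Seller ID belongs. This is a minimal sketch that encodes only the common shape described above (a capital A followed by 12 to 14 uppercase letters or digits). Since that shape is "almost always" rather than guaranteed, treat a failed check as a warning, not proof the ID is invalid. The name looks_like_seller_id is illustrative.

```python
import re

# Heuristic only: "A" plus 12-14 uppercase letters/digits, i.e. 13-15 characters total.
SELLER_ID_RE = re.compile(r"A[A-Z0-9]{12,14}")

def looks_like_seller_id(value: str) -> bool:
    """True if the value matches the common Seller ID shape."""
    return SELLER_ID_RE.fullmatch(value.strip()) is not None
```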

To extract it programmatically, scrape one of the seller’s product pages and pull the seller= parameter from the “Sold by” link:

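import re
from html import unescape
from typing import Optional
from urllib.parse import urlparse, parse_qs

def seller_id_from_product(html: str) -> Optional[str]:
    # Find the first seller-profile link (/sp?...seller=<ID>) in the page HTML.
    m = re.search(r'href="(/sp\?[^"]*seller=[^"]+)"', html)
    if not m:
        return None
    # Attribute values are HTML-escaped (&amp;), so unescape before parsing the query.
    qs = parse_qs(urlparse(unescape(m.group(1))).query)
    return qs.get("seller", [None])[0]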

Once you have the ID, you never need the product page again. Everything else runs off the storefront URL.
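Building that URL with urlencode rather than string concatenation keeps the query string well-formed as more parameters are added. A minimal sketch follows; the domain argument is an assumption added so the same helper could target another marketplace, and storefront_url is an illustrative name.

```python
from urllib.parse import urlencode

def storefront_url(seller_id: str, page: int = 1, domain: str = "www.amazon.com") -> str:
    """Build the s?me=<SELLER_ID> storefront URL for one results page."""
    query = urlencode({"me": seller_id, "page": page})
    return f"https://{domain}/s?{query}"
```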

How Do You List All Products From a Seller?

You list all products from a seller by requesting amazon.com/s?me=<SELLER_ID> and paging through the results with the &page= parameter. The storefront page is server-rendered, so each result card (with its ASIN, title, price, and rating) is in the initial HTML, and an HTTP client plus BeautifulSoup reads it directly without a browser. The code below uses curl_cffi, which exposes a requests-compatible API.

import re
from dataclasses import dataclass
from typing import Optional
from curl_cffi import requests
from bs4 import BeautifulSoup

@dataclass
class SellerProduct:
    asin: str
    title: Optional[str]
    price: Optional[float]
    rating: Optional[float]
    url: Optional[str]

def fetch_storefront_page(seller_id: str, page: int, proxy: str) -> str:
    url = f"https://www.amazon.com/s?me={seller_id}&page={page}"
    resp = requests.get(
        url,
        impersonate="chrome",
        proxies={"http": proxy, "https": proxy},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text

def parse_results(html: str) -> list[SellerProduct]:
    soup = BeautifulSoup(html, "lxml")
    out = []
    for card in soup.select("div[data-asin]"):
        asin = card.get("data-asin", "").strip()
        if not asin:
            continue
        title_el = card.select_one("h2 a span, [data-cy='title-recipe'] span")
        price_el = card.select_one("span.a-price span.a-offscreen")
        rating_el = card.select_one("span.a-icon-alt")
        out.append(SellerProduct(
            asin=asin,
            title=title_el.get_text(strip=True) if title_el else None,
            price=_price(price_el),
            rating=_rating(rating_el),
            url=f"https://www.amazon.com/dp/{asin}",
        ))
    return out

def _price(el) -> Optional[float]:
    if not el:
        return None
    m = re.search(r"([0-9][0-9,]*\.?[0-9]*)", el.get_text())
    return float(m.group(1).replace(",", "")) if m else None

def _rating(el) -> Optional[float]:
    if not el:
        return None
    m = re.match(r"([0-9.]+)", el.get_text(strip=True))
    return float(m.group(1)) if m else None

The div[data-asin] selector is the durable anchor. Every search result card carries the ASIN in a data-asin attribute, and selecting on the attribute survives Amazon’s frequent class-name churn far better than targeting the hashed layout classes. The per-page result count is variable (roughly 16 to 48 depending on layout), so do not hard-code it. Loop pages until you stop getting new ASINs.
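The "loop until no new ASINs" rule can be sketched independently of how a page is fetched. In the version below, fetch_asins is a hypothetical callable that takes a page number and returns that page's ASINs; the default of 20 pages mirrors the cap discussed in the next section and should be treated as an assumption.

```python
from typing import Callable, Iterable

def collect_asins(fetch_asins: Callable[[int], Iterable[str]], max_pages: int = 20) -> list[str]:
    """Page from 1 upward until a page adds no new ASINs or max_pages is reached."""
    seen: dict[str, None] = {}  # a dict dedupes and keeps first-seen order
    for page in range(1, max_pages + 1):
        before = len(seen)
        for asin in fetch_asins(page):
            seen.setdefault(asin, None)
        if len(seen) == before:
            break  # empty page or only repeats: treat as the end of the listing
    return list(seen)
```

With the functions above, fetch_asins could be a lambda that calls fetch_storefront_page, passes the HTML to parse_results, and returns each row's asin.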

Why Can’t You Get More Than a Few Hundred Products?

You cannot get a seller’s entire catalog from a single s?me= query because Amazon caps how deep its search and storefront pagination goes. The reported cap has varied over time (older reports cite around 7 pages, more recent ones around 20, with page 21 blocked), so one query realistically yields only the first few hundred products regardless of how many the seller actually lists. Verify the current cap on the day you run the job, because Amazon adjusts it.

The workaround is to subdivide the result set so each slice fits under the cap, then merge and dedupe:

Add a category or department filter. Append the search department (&i=<category>) or use the left-rail category facets to split the storefront into smaller, separately paginated result sets.
Add price-band filters. Slice by &low-price= and &high-price= ranges. Five or six bands usually fragment even a large catalog into cap-sized chunks.
Change the sort. Sorting by something other than “Featured” (&s=price-asc-rank, &s=review-rank) resurfaces different products near the top, catching items the default sort buries.
Dedupe by ASIN. Every slice overlaps. Collect everything into a dict keyed by ASIN so each product is counted once.

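def fetch_storefront_page_filtered(seller_id: str, page: int, low: int, high: int, proxy: str) -> str:
    # Same request as fetch_storefront_page, with the price-band filters added.
    url = (
        f"https://www.amazon.com/s?me={seller_id}&page={page}"
        f"&low-price={low}&high-price={high}"
    )
    resp = requests.get(url, impersonate="chrome", proxies={"http": proxy, "https": proxy}, timeout=30)
    resp.raise_for_status()
    return resp.text

def scrape_seller_catalog(seller_id: str, proxy: str, max_pages: int = 20) -> list[SellerProduct]:
    seen: dict[str, SellerProduct] = {}  # keyed by ASIN so overlapping slices dedupe
    price_bands = [(0, 25), (25, 50), (50, 100), (100, 250), (250, 100000)]
    for low, high in price_bands:
        for page in range(1, max_pages + 1):
            html = fetch_storefront_page_filtered(seller_id, page, low, high, proxy)
            rows = parse_results(html)
            new = [r for r in rows if r.asin not in seen]
            for r in new:
                seen[r.asin] = r
            if not rows or not new:
                break  # no results, or every ASIN already seen -> next band
    return list(seen.values())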

This is best-effort, not guaranteed-complete. Amazon does not expose a clean “give me every product” endpoint for a seller, so the realistic goal is high coverage through smart subdivision, not a perfect dump.
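To combine price bands with alternate sorts, it helps to enumerate the slice URLs up front and inspect them before spending requests. The sketch below uses the parameter names given above (low-price, high-price, s); whether Amazon still honors each of them is something to verify on the day, and slice_urls is an illustrative name.

```python
from itertools import product
from typing import Optional, Sequence
from urllib.parse import urlencode

def slice_urls(
    seller_id: str,
    price_bands: Sequence[tuple[int, int]],
    sorts: Sequence[Optional[str]] = (None,),
    page: int = 1,
) -> list[str]:
    """Build one storefront URL per (price band, sort) combination."""
    urls = []
    for (low, high), sort in product(price_bands, sorts):
        params = {"me": seller_id, "low-price": low, "high-price": high, "page": page}
        if sort:  # None means Amazon's default sort, so no s= parameter
            params["s"] = sort
        urls.append("https://www.amazon.com/s?" + urlencode(params))
    return urls
```

Two bands and two sorts produce four slices, each paginated separately and merged through the same ASIN-keyed dict.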

How Do You Scrape an Amazon Brand Store?

You scrape an Amazon Brand Store at amazon.com/stores/<brand>/page/<id> with a headless browser, because Brand Store product grids are JavaScript-rendered widgets rather than server-rendered search results. The product tiles are not in the initial HTML, so requests plus BeautifulSoup returns an empty catalog. Playwright renders the widgets:

from playwright.sync_api import sync_playwright

def fetch_brand_store(store_url: str, proxy: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy={"server": proxy})
        try:
            page = browser.new_page()
            page.goto(store_url, wait_until="networkidle", timeout=45000)
            # Scroll in steps so lazy-loaded product tiles render before capture.
            for _ in range(8):
                page.mouse.wheel(0, 1000)
                page.wait_for_timeout(500)
            return page.content()
        finally:
            browser.close()  # release the browser even if navigation times out

The Brand Store URL carries a store entity ID in UUID form (/stores/<brand>/page/<8-4-4-4-12>). Stores span multiple sub-pages rather than numbered result pages, so you traverse the in-store navigation links and parse each sub-page’s rendered tiles for the /dp/<ASIN>/ links. A third-party seller’s s?me= storefront is the cleaner target when one exists; reserve the Brand Store path for first-party or vendor brands that have no equivalent seller storefront.
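Pulling ASINs out of the rendered HTML can be as simple as a regular expression over the /dp/ links. This is a minimal sketch that assumes ASINs are 10 uppercase letters or digits and that product links contain /dp/<ASIN>; asins_from_html is an illustrative name.

```python
import re

# /dp/<ASIN>, where the ASIN is exactly 10 uppercase letters or digits.
_DP_LINK = re.compile(r"/dp/([A-Z0-9]{10})(?![A-Z0-9])")

def asins_from_html(html: str) -> list[str]:
    """Return unique ASINs found in /dp/ links, in first-seen order."""
    return list(dict.fromkeys(_DP_LINK.findall(html)))
```

Running it on the string returned by fetch_brand_store for each sub-page, then merging the lists, gives the same kind of ASIN list the storefront path produces.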

How Do You Set Up the Python Stack?

You set up the Python stack with curl_cffi and BeautifulSoup for the server-rendered storefront pages, plus Playwright only if you also need Brand Stores:

pip install curl_cffi beautifulsoup4 lxml
pip install playwright && playwright install chromium   # Brand Stores only

The curl_cffi library impersonates a Chrome TLS fingerprint, which is the single biggest factor in not getting flagged on the first request. The storefront s?me= pages do not need a browser, which keeps the common case fast and cheap. For the proxy layer, residential IPs matched to the marketplace country are covered in our best proxies for Amazon scraping guide.

How Do You Avoid Getting Blocked?

Five practices raise the success rate of a seller-catalog scraper:

Country-matched residential proxies. Match the proxy country to the marketplace TLD, and use residential rather than datacenter IPs. Storefront pagination at depth is exactly the pattern Amazon’s anti-bot watches for.
Rotate IPs per request. Paging a large catalog means many requests to one host. Rotating per request keeps any single IP under the roughly 30-requests-per-minute threshold commonly cited for Amazon, which Amazon does not publish.
Randomized delays. A random 3 to 10 second gap between page requests breaks the fixed-interval signature that flags automation.
Realistic headers. A current Chrome User-Agent and a matching Accept-Language. Header consistency matters as much as the values.
Cache aggressively. A seller’s catalog changes slowly. Caching storefront pages for an hour or more cuts request volume, cost, and detection risk together.

Amazon serves its anti-bot challenges through AWS WAF, IP rate limiting, and CAPTCHAs. The deeper treatment is in our bypass Amazon CAPTCHA guide.
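The rotation and delay practices can be combined in one small generator. The sketch below assumes a hypothetical list of proxy URLs (many residential providers rotate behind a single gateway instead, in which case the list has one entry) and uses the 3 to 10 second range suggested above. The sleep function is injectable so the pacing can be tested without waiting.

```python
import itertools
import random
import time
from typing import Callable, Iterable, Iterator

def paced_requests(
    urls: Iterable[str],
    proxies: Iterable[str],
    min_delay: float = 3.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[str, str]]:
    """Yield (url, proxy) pairs, rotating proxies and pausing randomly between them."""
    proxy_list = list(proxies)
    if not proxy_list:
        raise ValueError("at least one proxy is required")
    pool = itertools.cycle(proxy_list)
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            sleep(random.uniform(min_delay, max_delay))
        yield url, next(pool)
```

Each yielded pair can be passed straight to a fetch function, so the pacing logic stays separate from the HTTP code.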

What’s the Managed API Alternative?

The managed alternative removes the proxy rotation, the CAPTCHA handling, and the per-product parsing, and it is the natural fit for the enrichment half of this job. Scraping the storefront gives you a list of ASINs; turning each ASIN into a full product record (Buy Box price, variations, images, rating breakdown, availability) is the part that scales with catalog size and breaks most often when Amazon ships a layout change.

Amazon Scraper API handles that enrichment through its product endpoint and a batch endpoint that accepts up to 1,000 ASINs in a single async call. You scrape the s?me= storefront for the ASIN list, hand the list to the batch endpoint, and get structured JSON back for the whole catalog, with proxies, retries, and locale-aware parsing handled internally. Billing is success-only at $0.90 per 1,000 requests (down to $0.50 per 1,000 on monthly plans), so the delisted and out-of-marketplace ASINs that show up in any large catalog cost nothing. For keyword-driven discovery rather than a single seller, the search endpoint covers that side.
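Whatever client you use, the ASIN list has to be split into batches that respect the 1,000-item limit, and it is useful to estimate the bill first. The sketch below covers only that client-side arithmetic, using the figures quoted above. It deliberately does not show the API call itself, because the endpoint details are not given here. The names batches and max_cost_usd are illustrative.

```python
from typing import Iterator, Sequence

BATCH_LIMIT = 1000      # batch size quoted above
PRICE_PER_1000 = 0.90   # pay-as-you-go rate quoted above, in USD

def batches(asins: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[list[str]]:
    """Split the ASIN list into consecutive chunks of at most `size` items."""
    for start in range(0, len(asins), size):
        yield list(asins[start:start + size])

def max_cost_usd(n_asins: int, price_per_1000: float = PRICE_PER_1000) -> float:
    """Upper bound on cost: assumes every request succeeds and is billed."""
    return n_asins * price_per_1000 / 1000
```

A 2,500-ASIN catalog splits into batches of 1,000, 1,000, and 500, with a ceiling of $2.25 at the pay-as-you-go rate; failed lookups would bring the actual figure lower under success-only billing.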

For a one-off scrape of a small seller, the DIY s?me= path is fine. For monitoring competitor catalogs on a schedule, the API path removes the maintenance that a self-run storefront scraper accumulates.

FAQ

What is the difference between a Seller ID and an ASIN?

A Seller ID identifies a selling account (the merchant), and an ASIN identifies a product. The Seller ID (13 to 15 characters, usually starting with A) keys the storefront URL s?me=<SELLER_ID>. The ASIN (10 uppercase alphanumeric characters) keys a product page dp/<ASIN>. You use the Seller ID to list a seller’s products, and the ASIN to pull the detail for each one.

Can you scrape an Amazon seller’s products without a browser?

Yes, for a third-party seller storefront. The s?me=<SELLER_ID> storefront is server-rendered, so requests (or curl_cffi) plus BeautifulSoup reads every result card from the initial HTML. You only need a headless browser like Playwright for Amazon Brand Stores at /stores/, whose product widgets are JavaScript-rendered.

How many products can you scrape from one seller?

In practice the first few hundred per query, because Amazon caps storefront and search pagination (around 20 pages recently, page 21 blocked). To approach a seller’s full catalog, subdivide the storefront with category and price-band filters so each slice fits under the cap, then dedupe by ASIN. There is no single endpoint that returns a seller’s entire catalog in one call.

How do you find an Amazon Brand Store URL?

A Brand Store URL has the form amazon.com/stores/<brand>/page/<id>, where the id is a UUID. The fastest way to find it is the “Visit the Store” link under the title on any of the brand’s product pages, or the brand link in the byline. Brand Stores are registered to brands in Amazon Brand Registry, so only brands that built one will have a /stores/ URL.

Is scraping an Amazon seller’s storefront allowed?

Scraping public, logged-out storefront pages sits in the same legal space as other public-web scraping: courts have generally treated public-data scraping as outside the Computer Fraud and Abuse Act, while Amazon’s Conditions of Use prohibit automated access. Most teams scrape only public pages and avoid anything behind a login. The full picture is in our guide to Amazon scraping and the law.