How to Scrape Amazon Search Results in 2026
The Answer
Scraping Amazon search results means fetching amazon.com/s?k=<keyword> and parsing the result grid into structured data: organic vs sponsored positions, ASINs, titles, prices, ratings, review counts, image URLs, and Prime eligibility. The durable CSS selectors are [data-asin] for the result containers, [data-component-type="s-search-result"] for organic-only filtering, .s-pagination-next for pagination, and .puis-sponsored-label-text for sponsored-position detection. Python with curl_cffi plus residential proxies handles the request layer; BeautifulSoup parses the HTML; the typical success rate is 90 to 95% on the first attempt and 98%+ with one retry. For production workloads above a few thousand keyword scrapes per month, a managed search endpoint like Amazon Scraper API’s search returns the same structured data without the proxy + fingerprint maintenance, starting at $0.90 per 1,000 successful requests with 1,000 free on signup.
What Data Is in an Amazon Search Result?
An Amazon search result page contains, per result: an ASIN, a product title, a Buy Box price (sometimes a strikethrough price), a star rating and review count, an image URL, a sponsored or organic label, and structural metadata like the position in the grid and the page number. The full search response also carries pagination links, related search suggestions, brand filters, and category breadcrumbs at the page level.
What you can pull from a single search request:
- Per-result fields: ASIN, title, Buy Box price, list price, currency, star rating, review count, hero image URL, Prime badge, sponsored badge, badge text (“Best Seller”, “Amazon’s Choice”), seller name (when displayed), delivery estimate.
- Page-level fields: total result count, current page number, last page number, related searches, brand filter list, department breadcrumb.
The data is denser than a single product detail page because Amazon optimizes the search SERP for browsing breadth rather than depth. A typical search returns 16 to 60 results per page (varies by category), each one a compact summary card with ~12 to 15 fields rather than the ~55 fields a product detail page carries.
Why Scrape Amazon Search Results?
Five common use cases for Amazon search scraping, ordered by commercial frequency:
- Keyword rank tracking. Tracking which ASINs rank for which keywords over time, the same way SEO teams track Google rankings. Drives FBA seller decisions, ad-spend allocation, and content updates.
- Sponsored ad monitoring. Identifying which competitors are bidding on which keywords, what positions they hold, and what ad copy they run. Drives PPC strategy.
- Category trend analysis. Aggregating top-N results across category-level keywords (“running shoes”, “wireless earbuds”) to detect entrants, exits, and price shifts.
- Brand protection. Catching unauthorized resellers and counterfeits in the search results for branded keywords.
- Catalog enrichment. Building seed ASIN lists for further scraping by querying broad keywords and harvesting result ASINs.
Each use case has different sensitivity to organic-vs-sponsored split, freshness, and geography. Rank tracking needs daily or sub-daily fresh data. Sponsored monitoring needs hourly fresh data during ad campaign launches. Category analysis can tolerate weekly cadence.
How Do You Scrape Amazon Search Results With Python?
You scrape Amazon search results with Python by sending a GET request to https://www.amazon.com/s?k=<encoded-keyword> through a residential proxy, then parsing the returned HTML with BeautifulSoup using stable CSS selectors. The pattern is the same as product page scraping but with different selectors and a different result-extraction loop.
```bash
pip install curl_cffi beautifulsoup4 lxml
```
The request layer using curl_cffi (which replays a real Chrome TLS fingerprint to avoid Amazon’s first-packet detection):
```python
from urllib.parse import quote_plus

from curl_cffi import requests
from bs4 import BeautifulSoup

PROXY = "http://user:[email protected]:8080"

ROBOT_MARKERS = (
    "captchacharacters",
    "Enter the characters you see below",
    "Robot Check",
)


class AmazonBlocked(RuntimeError):
    pass


def fetch_search(keyword: str, page: int = 1) -> str:
    url = f"https://www.amazon.com/s?k={quote_plus(keyword)}&page={page}"
    resp = requests.get(
        url,
        impersonate="chrome",
        proxies={"http": PROXY, "https": PROXY},
        timeout=30,
    )
    resp.raise_for_status()
    if any(marker in resp.text for marker in ROBOT_MARKERS):
        raise AmazonBlocked(f"Robot check on {url}")
    return resp.text
```
The robot-check detection is non-optional. Amazon returns HTTP 200 for its CAPTCHA page, so a naive scraper that only checks the status code will happily hand empty parsed results to downstream code without ever failing loudly. The marker check catches the three HTML signatures Amazon uses for its bot challenges.
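The marker check can also be factored into a small pure helper, which makes it easy to unit-test against saved HTML without hitting the network (a sketch; the `looks_blocked` name is ours, not part of any library):

```python
ROBOT_MARKERS = (
    "captchacharacters",
    "Enter the characters you see below",
    "Robot Check",
)


def looks_blocked(html: str) -> bool:
    # Amazon serves its CAPTCHA page with HTTP 200, so the response
    # body must be inspected regardless of the status code.
    return any(marker in html for marker in ROBOT_MARKERS)


print(looks_blocked("<title>Robot Check</title>"))  # True
print(looks_blocked("<div data-asin='B0TEST123'>…</div>"))  # False
```

Keeping this as a standalone function means the same guard can be reused by the search, product, and review scrapers.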
What CSS Selectors Pull Amazon Search Result Data?
The CSS selectors that work as of April 2026, ranked by stability:
- Result container: `[data-asin]` (any element with a `data-asin` attribute is a search result card)
- Organic-only result container: `[data-component-type="s-search-result"]` (excludes sponsored brand banners and editorial cards)
- Title: `h2 a span` or `.a-text-normal` inside the card
- Title link: `h2 a[href]`
- Buy Box price (whole + fraction): `.a-price[data-a-color="base"] .a-offscreen` (combines whole + fraction in one accessibility-text element)
- Strikethrough price: `.a-price[data-a-color="secondary"] .a-offscreen`
- Star rating: `.a-icon-alt` inside an `[aria-label*="out of 5"]` parent
- Review count: `[aria-label$="ratings"]` or the matching link adjacent to the star icon
- Image URL: `img.s-image` (`src` attribute)
- Sponsored badge: `.puis-sponsored-label-text` or `.s-sponsored-label-info-icon` (presence of either marks a sponsored result)
- Position: the index of the card within the parent grid container
- Page total: `.s-pagination-item.s-pagination-disabled` (last visible page number)
The accessibility-text selectors (.a-offscreen) are the most durable. Amazon refactors the visible price layout every 4 to 8 weeks, but the hidden screen-reader spans stay stable for years because their ARIA contract does not change.
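To see why the hidden span is the easy target, here is an illustrative card fragment (the HTML below is a simplified sketch, not a verbatim Amazon response) with the price pulled out by a plain regex; the visible price is split across `.a-price-whole` and `.a-price-fraction`, while the screen-reader span carries it as one string:

```python
import re

# Simplified sketch of a price widget: visible price split across
# spans, accessibility span holding the full value in one place.
card_html = (
    '<span class="a-price" data-a-color="base">'
    '<span class="a-offscreen">$39.99</span>'
    '<span aria-hidden="true">'
    '<span class="a-price-whole">39</span>'
    '<span class="a-price-fraction">99</span>'
    '</span></span>'
)

m = re.search(r'class="a-offscreen">\$?([0-9][0-9,]*\.[0-9]{2})<', card_html)
price = float(m.group(1).replace(",", "")) if m else None
print(price)  # 39.99
```

A regex is fine for a demo; production parsing should still go through BeautifulSoup as shown below, since the selector survives attribute reordering and the regex does not.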
How Do You Parse Search Results Into Structured Data?
The parsing function combines the selectors above into a per-result extractor and an outer page-level loop:
```python
import re
from dataclasses import dataclass
from typing import Optional

from bs4 import BeautifulSoup


@dataclass
class SearchResult:
    asin: str
    position: int
    sponsored: bool
    title: Optional[str]
    url: Optional[str]
    price: Optional[float]
    list_price: Optional[float]
    rating: Optional[float]
    reviews_count: Optional[int]
    image_url: Optional[str]
    prime: bool


def parse_search_page(html: str) -> list[SearchResult]:
    soup = BeautifulSoup(html, "lxml")
    results = []
    cards = soup.select("[data-asin]")
    organic_position = 0
    for card in cards:
        asin = card.get("data-asin", "").strip()
        if not asin:
            continue
        sponsored = bool(card.select_one(".puis-sponsored-label-text"))
        if not sponsored:
            organic_position += 1
        title_el = card.select_one("h2 a span")
        title = title_el.get_text(strip=True) if title_el else None
        href_el = card.select_one("h2 a[href]")
        url = (
            f"https://www.amazon.com{href_el['href']}"
            if href_el and href_el.get("href")
            else None
        )
        price = _parse_price(card.select_one('.a-price[data-a-color="base"] .a-offscreen'))
        list_price = _parse_price(card.select_one('.a-price[data-a-color="secondary"] .a-offscreen'))
        rating = _parse_rating(card.select_one(".a-icon-alt"))
        reviews_count = _parse_reviews(card)
        img_el = card.select_one("img.s-image")
        image_url = img_el.get("src") if img_el else None
        prime = bool(card.select_one('[aria-label*="Prime"]'))
        results.append(
            SearchResult(
                asin=asin,
                position=organic_position if not sponsored else 0,
                sponsored=sponsored,
                title=title,
                url=url,
                price=price,
                list_price=list_price,
                rating=rating,
                reviews_count=reviews_count,
                image_url=image_url,
                prime=prime,
            )
        )
    return results


def _parse_price(el) -> Optional[float]:
    if not el:
        return None
    text = el.get_text()
    m = re.search(r"([0-9][0-9,]*\.[0-9]{2})", text)
    return float(m.group(1).replace(",", "")) if m else None


def _parse_rating(el) -> Optional[float]:
    if not el:
        return None
    m = re.match(r"([0-9.]+) out of", el.get_text())
    return float(m.group(1)) if m else None


def _parse_reviews(card) -> Optional[int]:
    el = card.select_one('[aria-label$="ratings"], [aria-label$="rating"]')
    if not el:
        return None
    label = el.get("aria-label", "")
    m = re.search(r"([0-9][0-9,]*)", label)
    return int(m.group(1).replace(",", "")) if m else None
```
The organic_position counter is the field that makes this useful for rank tracking. Without separating sponsored from organic, position numbers across pages are noisy: a sponsored result at the top can push organic position 1 to actual grid position 4 or 5. The separator counter gives you the rank a buyer actually sees in the organic results.
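The counter logic in isolation, as a minimal sketch (the `organic_ranks` helper is ours, for illustration; the real code runs the same counter inline in the parsing loop):

```python
def organic_ranks(sponsored_flags: list[bool]) -> list[int]:
    # Map grid order to organic rank: sponsored cards get 0,
    # organic cards get 1, 2, 3, ... in the order a buyer sees them.
    ranks, organic = [], 0
    for sponsored in sponsored_flags:
        if sponsored:
            ranks.append(0)
        else:
            organic += 1
            ranks.append(organic)
    return ranks


# Two sponsored cards at the top push the first organic result to
# grid position 3, but its organic rank is still 1.
print(organic_ranks([True, True, False, False, True, False]))
# [0, 0, 1, 2, 0, 3]
```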
How Do You Handle Pagination?
Amazon search results paginate through the &page=N URL parameter. Default page size is 16 to 24 cards depending on category. Maximum visible pagination is usually 7 to 20 pages, capped by Amazon at a per-keyword limit (no keyword on amazon.com returns more than ~400 unique results regardless of how the URL is constructed).
The pagination loop:
```python
import time


def scrape_all_pages(keyword: str, max_pages: int = 7) -> list[SearchResult]:
    all_results = []
    for page in range(1, max_pages + 1):
        try:
            html = fetch_search(keyword, page=page)
        except AmazonBlocked:
            time.sleep(5)  # back off once, then retry the same page
            html = fetch_search(keyword, page=page)
        results = parse_search_page(html)
        if not results:
            break
        all_results.extend(results)
        time.sleep(2)  # respect Amazon's per-IP rate limit
    return all_results
```
Two patterns matter:
- Empty-result early exit. If a page returns zero cards, stop. Amazon has no more results for the keyword.
- Per-page sleep. A 2-second gap between pages holds the per-IP rate at ~30 requests per minute, at or just under the empirical rate-limit threshold for residential proxies.
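A fixed `time.sleep(2)` works, but it over-waits when parsing itself takes time. A small helper can compute only the remaining gap (a sketch; `throttle_delay` is our name, not from any library):

```python
def throttle_delay(last_request_ts: float, now: float, min_gap: float = 2.0) -> float:
    # Seconds left to sleep so consecutive requests from one IP stay
    # at least min_gap apart (~30 requests/minute at min_gap=2.0).
    return max(0.0, min_gap - (now - last_request_ts))


print(throttle_delay(100.0, 100.5))  # 1.5 — half a second elapsed, wait 1.5 more
print(throttle_delay(100.0, 103.0))  # 0.0 — gap already satisfied
```

In the loop, call it with `time.monotonic()` before each request and sleep for the returned value.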
How Do You Scrape Amazon Search Results by Zip Code?
Amazon shows different prices, availability, and Prime delivery estimates based on the buyer’s delivery zip code. Scraping by zip code requires setting Amazon’s delivery-location cookie before the search request.
The cookie is lc-main, set via Amazon’s /gp/delivery/ajax/address-change.html endpoint. The pattern is a two-step request: first POST the zip code to set the cookie on the session, then GET the search with the same session.
```python
SESSION = requests.Session(impersonate="chrome")


def set_zipcode(zip_code: str) -> None:
    SESSION.post(
        "https://www.amazon.com/gp/delivery/ajax/address-change.html",
        data={
            "locationType": "LOCATION_INPUT",
            "zipCode": zip_code,
            "storeContext": "generic",
            "deviceType": "web",
            "pageType": "Gateway",
            "actionSource": "glow",
        },
        proxies={"http": PROXY, "https": PROXY},
    )


def fetch_search_zipped(keyword: str, zip_code: str, page: int = 1) -> str:
    set_zipcode(zip_code)
    return SESSION.get(
        f"https://www.amazon.com/s?k={quote_plus(keyword)}&page={page}",
        proxies={"http": PROXY, "https": PROXY},
        timeout=30,
    ).text
```
Zip-code targeting matters most for grocery, household goods, and any category where same-day or next-day delivery affects the displayed price. For electronics and books, the zip-code effect is minimal.
What Are the Common Mistakes When Scraping Amazon Search Results?
Five mistakes that account for most scraper failures:
- Treating sponsored and organic as the same. Sponsored results inflate apparent rank for the organic positions you care about. The position counter must skip sponsored cards.
- Using brittle visual selectors. `.a-price-whole` and `.a-price-fraction` change every 4 to 8 weeks. The accessibility-text selector `.a-offscreen` is stable.
- Forgetting the robot-check guard. A 200-status response is not the same as a successful scrape. Always check the response body for the three robot-check markers before parsing.
- Ignoring pagination caps. Amazon caps results per keyword at around 400 across all pages regardless of the URL parameter. Trying to fetch page 50 returns the same page 7 results, not new ones.
- No country-matched proxy. A US residential IP scraping `amazon.de` triggers Amazon's geo-mismatch detection in milliseconds. Match the proxy country to the TLD.
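The pagination loop earlier retries once after a fixed 5-second sleep; for repeated robot checks, an exponential backoff schedule is one common extension (a sketch; `backoff_delay` is our illustrative helper, and the 5s base / 60s cap are assumptions, not Amazon-documented values):

```python
def backoff_delay(attempt: int, base: float = 5.0, cap: float = 60.0) -> float:
    # Exponential backoff: 5s, 10s, 20s, 40s, then capped at 60s.
    # Capping avoids unbounded waits when an IP is hard-flagged;
    # after a few attempts, rotating the proxy beats waiting longer.
    return min(cap, base * (2 ** attempt))


print([backoff_delay(a) for a in range(5)])  # [5.0, 10.0, 20.0, 40.0, 60.0]
```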
What’s the Difference Between Scraping Search and Using a Search API?
Three differences matter:
- Maintenance burden. A scraper needs ongoing CSS-selector maintenance every 4 to 8 weeks. A managed search API absorbs that maintenance on the vendor side.
- Anti-bot orchestration. A scraper needs the proxy + TLS + retry orchestration. A managed API delivers a structured JSON response without exposing any of the request-rotation logic.
- Cost predictability. A scraper costs proxy bandwidth plus engineering time. A managed API is a flat per-success rate.
For ad-hoc, low-volume scraping (under a few thousand keywords per month), the scraper path is competitive on cost. Above that, the API path wins on total cost of ownership. Amazon Scraper API’s search endpoint returns the same structured data the parser above produces, plus the country-matched residential proxy and retry layers, at $0.50 to $0.90 per 1,000 successful requests.
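The break-even arithmetic is easy to run yourself. A back-of-envelope helper at the article's $0.90 per 1,000 successful requests (the function and its 30-day month are our simplifications):

```python
def monthly_api_cost(keywords: int, pages_per_keyword: int,
                     scrapes_per_day: int, price_per_1k: float = 0.90) -> float:
    # Successful requests per month times the per-success rate,
    # assuming a 30-day month and one request per page.
    requests_per_month = keywords * pages_per_keyword * scrapes_per_day * 30
    return requests_per_month * price_per_1k / 1000


# 500 keywords, 3 pages each, scraped once daily:
print(round(monthly_api_cost(500, 3, 1), 2))  # 40.5
```

Compare that figure against your proxy bandwidth bill plus the engineering hours spent on selector maintenance to decide which side of the break-even you are on.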
FAQ
How many results can you scrape per Amazon search keyword?
Amazon caps total search results per keyword at approximately 400, regardless of how many pages you request. Fetching beyond the 7th to 20th page (varies by category) returns either an empty page or the last fully-populated page repeating. The cap exists at Amazon’s index level, not at the scraper level.
Can you scrape Amazon search without a proxy?
Yes for very low volumes (under 20 search requests per day from one home IP). Above that, Amazon’s per-IP rate limit kicks in and your IP gets flagged for hours. Production workflows need residential proxies or a managed API. See our proxies for Amazon scraping guide for the full provider comparison.
How do you tell sponsored results from organic in Amazon search?
The CSS selector .puis-sponsored-label-text (or its sibling .s-sponsored-label-info-icon) is present on every sponsored card and absent on every organic card. The text “Sponsored” also appears in the visible label, but the class-based selector is more reliable because it survives layout changes.
What’s the rate limit for scraping Amazon search?
Empirically, around 30 requests per minute per residential IP before Amazon’s per-IP rate limit serves a 503 or robot-check page. Per-IP-per-minute is the right granularity, not per-account; Amazon’s anti-bot is IP-aware before it is account-aware. With a rotating residential pool, total throughput scales linearly with pool size.
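Sizing the pool from a target throughput is then a one-line calculation (a sketch using the article's empirical ~30 requests/minute per IP; `pool_size_for` is our illustrative name):

```python
import math


def pool_size_for(target_rpm: int, per_ip_rpm: int = 30) -> int:
    # Residential IPs needed to sustain target_rpm while keeping
    # each individual IP under the empirical per-IP limit.
    return math.ceil(target_rpm / per_ip_rpm)


print(pool_size_for(1000))  # 34 IPs for 1,000 requests/minute
```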
Can you scrape Amazon search results in real time?
Yes, but the cost-per-keyword is the limiting factor. Real-time means re-scraping the same keyword on a 1 to 5 minute cadence, which translates to 12 to 60 requests per keyword per hour. At a few hundred keywords, that is a few thousand requests per hour, easily handled by a managed API at $0.90 per 1,000 successful requests.
How do you scrape Amazon search results in countries other than the US?
Replace amazon.com in the URL with the matching country TLD: amazon.co.uk, amazon.de, amazon.co.jp, amazon.com.br, etc. Match the proxy country to the TLD. The selectors and parsing logic are identical across all 20 Amazon marketplaces. The full list of TLDs is in our features page under marketplace coverage.
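Since only the domain changes, a small lookup table keeps the URL construction in one place (a sketch; the `MARKETPLACES` dict below is an illustrative subset, not the full list of 20):

```python
from urllib.parse import quote_plus

# Illustrative subset of marketplace TLDs; match the proxy
# country to whichever domain you request.
MARKETPLACES = {
    "US": "amazon.com",
    "UK": "amazon.co.uk",
    "DE": "amazon.de",
    "JP": "amazon.co.jp",
    "BR": "amazon.com.br",
}


def build_search_url(keyword: str, country: str, page: int = 1) -> str:
    domain = MARKETPLACES[country]
    return f"https://www.{domain}/s?k={quote_plus(keyword)}&page={page}"


print(build_search_url("wireless earbuds", "DE"))
# https://www.amazon.de/s?k=wireless+earbuds&page=1
```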
What’s the best Amazon search scraper API?
For pay-per-success billing across 20 marketplaces with structured JSON output, Amazon Scraper API starts at $0.90 per 1,000 successful requests. Bright Data’s structured-data Amazon endpoint also covers search at higher pricing. ScraperAPI’s Amazon search endpoint is competitive at similar pricing. The full vendor comparison is in our best Amazon scrapers post.
Sources
- Scrape.do - Scrape Amazon Search - selector reference + sponsored detection
- Amazon - product detail page documentation - ASIN format reference
- BeautifulSoup documentation - selector syntax
- curl_cffi GitHub repository - browser TLS impersonation
- Aimultiple - 7 Best Amazon Scrapers Ranked by Performance 2026 - independent benchmarks