
How to Scrape Amazon Prices With Python in 2026


The Answer

Scraping Amazon prices with Python means pulling five related numbers off a product page: the current Buy Box price, the “was” price (list price), any Subscribe and Save price, per-variant prices, and the prices of any other-seller offers. BeautifulSoup plus requests handles the HTML parsing, the durable CSS selectors are .a-price-whole / .a-price-fraction and .a-offscreen, and the real cost of a DIY scraper shows up in rate limits and robot checks rather than in parsing. For a price tracker running a handful of ASINs, the Python script below works. For a repricer or a monitoring product shipping to customers, the Amazon Scraper API is the faster path, starting at $0.90 per 1,000 successful requests on pay-as-you-go (as low as $0.50 per 1,000 on Custom plans).

Why Are Amazon Prices Hard to Scrape?

Amazon prices are hard to scrape because they change frequently, they vary by user context, and the Buy Box (the “add to cart” price) rotates between sellers independently of the other offers visible on the page. Amazon reprices millions of products every day, and according to industry monitoring data, 37 percent of active Amazon monitors check at least hourly and 12 percent run at five-minute intervals. Electronics and trending categories can move dozens of times per day. Books and household staples move slowly.

Three properties of the price make it different from other product fields:

  • Buy Box rotation - the price shown in the big yellow button depends on which seller currently holds the Buy Box. That can change multiple times per hour on a competitive listing, which is why the seller name and the price have to be scraped together.
  • Per-user personalization - Amazon shows different prices to different users based on cookies, location, and Prime membership. A clean unauthenticated scraper gets the “guest” price, which is usually but not always the same as the logged-out retail price.
  • Multiple prices on one page - a typical product detail page has the Buy Box price, a struck-through list price, a Subscribe and Save price, a per-variant price grid, and sometimes an other-sellers price ladder. A thorough scraper pulls them all so downstream analytics can see the discount.

All of this sits behind Amazon’s standard anti-bot layer, so a price scraper has the same rate-limit and robot-check problems as any other Amazon scraping project.

Where Does the Price Live in the Amazon HTML?

The price lives in a span tree anchored by class .a-price, with the dollar portion in .a-price-whole and the cents portion in .a-price-fraction. There is also an accessibility span with class .a-offscreen that contains the full formatted price as a single string (for screen readers), and that span is the most reliable fallback when the split structure changes.

In raw HTML the block looks like this:

<span class="a-price" data-a-size="xl" data-a-color="price">
  <span class="a-offscreen">$19.99</span>
  <span aria-hidden="true">
    <span class="a-price-symbol">$</span>
    <span class="a-price-whole">19<span class="a-price-decimal">.</span></span>
    <span class="a-price-fraction">99</span>
  </span>
</span>

Two selectors cover almost every product page:

  • Primary: .a-price .a-price-whole and .a-price .a-price-fraction (note the space: the split spans are descendants of .a-price, not extra classes on the same element)
  • Fallback: .a-price .a-offscreen for the full formatted string

The offscreen span is more resilient because it is used by Amazon’s accessibility layer, which changes less often than the visual layout. When Amazon renames or restructures the visible price widget (which happens every few weeks on individual categories), the offscreen span tends to survive.
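Before pointing the selectors at live pages, you can sanity-check both against the sample block above. A quick sketch (html.parser stands in for lxml so the snippet needs only beautifulsoup4):

```python
import re
from bs4 import BeautifulSoup

# The sample price block from earlier, used here as a fixture.
SAMPLE = """
<span class="a-price" data-a-size="xl" data-a-color="price">
  <span class="a-offscreen">$19.99</span>
  <span aria-hidden="true">
    <span class="a-price-symbol">$</span>
    <span class="a-price-whole">19<span class="a-price-decimal">.</span></span>
    <span class="a-price-fraction">99</span>
  </span>
</span>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Primary selectors: note the space (descendant combinator) between the classes.
whole = soup.select_one(".a-price .a-price-whole").get_text()    # "19." (includes the nested decimal span)
frac = soup.select_one(".a-price .a-price-fraction").get_text()  # "99"

# Fallback: the accessibility span carries the full formatted string.
offscreen = soup.select_one(".a-price .a-offscreen").get_text()  # "$19.99"

# Same digit-stripping the full parser below uses.
price = float(f"{re.sub(r'[^0-9]', '', whole)}.{frac}")
print(price, offscreen)
```

Note that .a-price-whole's text includes the nested decimal-point span, which is why the parser strips non-digits before recombining.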

What Do You Need to Scrape Amazon Prices in Python?

You need Python 3.9 or newer, the requests and beautifulsoup4 libraries, and a realistic browser User-Agent. For anything beyond a few requests per hour, you also need a residential proxy because Amazon rate-limits and fingerprints aggressively.

pip install requests beautifulsoup4 lxml

Set the User-Agent to a current Safari on macOS or Chrome on Windows string. Python’s default python-requests/2.x header is typically blocked on the very first request.

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.6 Safari/605.1.15"
)

If you plan to run the scraper on a schedule (a price tracker, a repricer, or a monitoring job), add a cron or APScheduler layer around it and pick a check frequency that matches your category’s price volatility. Hourly is enough for books. 5 to 15 minutes is the sensible cap for electronics and trending categories during sales events.
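With cron as the scheduler, an hourly run looks like the entry below. The paths are placeholders, and it assumes a single-pass variant of the tracker (one check per invocation) rather than a long-running loop:

```shell
# m h dom mon dow  command — run the tracker at the top of every hour
0 * * * * /usr/bin/python3 /path/to/price_tracker.py >> /var/log/price_tracker.log 2>&1
```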

How Do You Fetch and Parse the Current Price?

You fetch the product page with a single GET request and extract the price with two selectors: the split .a-price-whole / .a-price-fraction pair, with .a-offscreen as a fallback.

import re
from dataclasses import dataclass
from typing import Optional

import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": USER_AGENT,
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
}

ROBOT_MARKERS = (
    "captchacharacters",
    "Enter the characters you see below",
    "To discuss automated access",
    "Robot Check",
)

class AmazonBlocked(RuntimeError):
    pass

@dataclass
class PriceSnapshot:
    asin: str
    current_price: Optional[float]
    currency: Optional[str]
    list_price: Optional[float]
    savings_pct: Optional[int]
    buybox_seller: Optional[str]
    available: bool

CURRENCY_MAP = {"$": "USD", "£": "GBP", "€": "EUR", "¥": "JPY", "₹": "INR"}

def fetch_page(asin: str, domain: str = "com", proxy_url: Optional[str] = None) -> str:
    url = f"https://www.amazon.{domain}/dp/{asin}"
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None
    resp = requests.get(url, headers=HEADERS, proxies=proxies, timeout=30)
    resp.raise_for_status()
    if any(m in resp.text for m in ROBOT_MARKERS):
        raise AmazonBlocked(f"Robot check on {url}")
    return resp.text

def _parse_price_from_split(soup: BeautifulSoup) -> tuple[Optional[float], Optional[str]]:
    whole = soup.select_one(".a-price .a-price-whole")
    frac = soup.select_one(".a-price .a-price-fraction")
    sym = soup.select_one(".a-price .a-price-symbol")
    if not whole:
        return None, None
    w = re.sub(r"[^0-9]", "", whole.get_text())
    f = re.sub(r"[^0-9]", "", frac.get_text()) if frac else "00"
    try:
        price = float(f"{w or '0'}.{f or '00'}")
    except ValueError:
        return None, None
    currency = CURRENCY_MAP.get(sym.get_text(strip=True)) if sym else None
    return price, currency

def _parse_price_from_offscreen(soup: BeautifulSoup) -> tuple[Optional[float], Optional[str]]:
    el = soup.select_one("#corePrice_feature_div .a-offscreen, .a-price .a-offscreen")
    if not el:
        return None, None
    text = el.get_text()
    m = re.search(r"([0-9][0-9,]*(?:\.[0-9]+)?)", text)
    if not m:
        return None, None
    try:
        price = float(m.group(1).replace(",", ""))
    except ValueError:
        return None, None
    currency = None
    for sym, code in CURRENCY_MAP.items():
        if sym in text:
            currency = code
            break
    return price, currency

The split parser runs first because it gives you a clean numeric price with no locale-formatting ambiguity. The offscreen parser is the fallback that handles most of the layouts Amazon serves, including cases where the split structure is absent (third-party seller listings, certain refurbished products, some international marketplaces).

How Do You Extract the List Price and Savings?

You extract the list price (the struck-through “was” price) from the same .a-price tree but scoped to a different container, typically #corePriceDisplay_desktop_feature_div or a .basisPrice block. Depending on the template, Amazon marks the list price with the .a-text-price class or a data-a-strike="true" attribute.

def _parse_list_price(soup: BeautifulSoup) -> Optional[float]:
    candidates = [
        "#corePriceDisplay_desktop_feature_div .a-text-price .a-offscreen",
        ".basisPrice .a-offscreen",
        "span[data-a-strike='true'] .a-offscreen",
    ]
    for sel in candidates:
        el = soup.select_one(sel)
        if el:
            m = re.search(r"([0-9][0-9,]*(?:\.[0-9]+)?)", el.get_text())
            if m:
                try:
                    return float(m.group(1).replace(",", ""))
                except ValueError:
                    continue
    return None

def _savings_pct(current: Optional[float], list_price: Optional[float]) -> Optional[int]:
    if not current or not list_price or list_price <= 0:
        return None
    pct = round((1 - current / list_price) * 100)
    return pct if pct > 0 else None

Savings is a derived field rather than a scraped one. Amazon sometimes prints “You save $X (Y%)” on the page, but the percentage rounding is inconsistent across marketplaces. Computing it yourself from current / list_price is the reliable approach.

How Do You Identify the Buy Box Seller?

You identify the Buy Box seller by reading the #merchant-info or #sellerProfileTriggerId elements, which hold the name of whichever seller currently has the Buy Box. The price and the seller move together, so a scraper that records just the price without the seller is missing the context an analytics pipeline needs.

def _parse_buybox_seller(soup: BeautifulSoup) -> Optional[str]:
    el = soup.select_one("#sellerProfileTriggerId")
    if el and el.get_text(strip=True):
        return el.get_text(strip=True)
    merchant = soup.select_one("#merchant-info")
    if merchant:
        text = merchant.get_text(" ", strip=True)
        m = re.search(r"(?:Sold by|Ships from and sold by)\s+(.+?)(?:\s+and Fulfilled by|\.|$)", text)
        if m:
            return m.group(1).strip()
    return None

def _parse_availability(soup: BeautifulSoup) -> bool:
    el = soup.select_one("#availability")
    if not el:
        return True
    text = el.get_text(" ", strip=True).lower()
    return not any(k in text for k in ("unavailable", "out of stock", "currently unavailable"))

Availability matters for pricing because an out-of-stock listing frequently shows a stale price that no seller is actually honoring. Tagging snapshots as “available: false” lets downstream analytics filter those out of competitive benchmarks.

How Do You Put the Price Scraper Together?

You put the scraper together by chaining the fetch and parse functions, returning a PriceSnapshot dataclass, and adding a simple main entry point for command-line use.

def scrape_price(asin: str, domain: str = "com", proxy_url: Optional[str] = None) -> PriceSnapshot:
    html = fetch_page(asin, domain=domain, proxy_url=proxy_url)
    soup = BeautifulSoup(html, "lxml")

    current, currency = _parse_price_from_split(soup)
    if current is None:
        current, currency = _parse_price_from_offscreen(soup)
    list_price = _parse_list_price(soup)

    return PriceSnapshot(
        asin=asin,
        current_price=current,
        currency=currency,
        list_price=list_price,
        savings_pct=_savings_pct(current, list_price),
        buybox_seller=_parse_buybox_seller(soup),
        available=_parse_availability(soup),
    )

Running scrape_price("B09HN3Q81F") returns a PriceSnapshot that serializes to something like:

{
  "asin": "B09HN3Q81F",
  "current_price": 189.99,
  "currency": "USD",
  "list_price": 249.00,
  "savings_pct": 24,
  "buybox_seller": "Amazon.com",
  "available": true
}

That JSON is enough to feed a price history database. Snapshot every N minutes, diff against the previous row, and emit an alert whenever current_price changes by more than a threshold. Tools like Keepa and camelcamelcamel run exactly this pattern at massive scale, with their own curated data stores on top.
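The diff-and-alert step reduces to a small pure function. A sketch: price_alert and the 5 percent default threshold are assumptions for illustration, not anything the scraper above defines.

```python
from typing import Optional

def price_alert(prev: Optional[float], curr: Optional[float],
                threshold_pct: float = 5.0) -> Optional[str]:
    """Return an alert message when the price moved more than threshold_pct."""
    if prev is None or curr is None or prev <= 0:
        return None  # no prior snapshot, or the price field was missing
    change_pct = (curr - prev) / prev * 100
    if abs(change_pct) < threshold_pct:
        return None
    direction = "dropped" if change_pct < 0 else "rose"
    return f"price {direction} {abs(change_pct):.1f}%: {prev} -> {curr}"

# Compare each new snapshot against the previous row for the same ASIN:
# msg = price_alert(prev_row.current_price, snap.current_price)
```

Keeping the comparison pure (no I/O) makes it trivial to unit-test and to swap the threshold per category.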

How Do You Build a Simple Amazon Price Tracker?

You build a price tracker by calling scrape_price on a schedule, writing results to a database or CSV, and comparing each new snapshot against the prior value. The core loop is four lines of code. The hard parts are schedule spacing, rate-limit handling, and alert thresholds.

import csv
import time
from datetime import datetime, timezone

WATCHLIST = ["B09HN3Q81F", "B000ALVUM6", "B08N5WRWNW"]
CHECK_INTERVAL_SEC = 3600  # hourly

def track_prices(out_path: str = "prices.csv") -> None:
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            for asin in WATCHLIST:
                try:
                    snap = scrape_price(asin)
                except AmazonBlocked as e:
                    print(f"blocked: {asin}: {e}")
                    continue
                writer.writerow([
                    datetime.now(timezone.utc).isoformat(),
                    snap.asin,
                    snap.current_price,
                    snap.list_price,
                    snap.buybox_seller,
                    snap.available,
                ])
                f.flush()
                time.sleep(5)  # be nice between requests
            time.sleep(CHECK_INTERVAL_SEC)

if __name__ == "__main__":
    track_prices()

Two rules keep this from triggering Amazon’s anti-bot layer on a single-IP run:

  • Keep the inter-request sleep at 3 to 10 seconds between ASINs in the same check cycle.
  • Keep the check interval at 1 hour minimum for a watchlist of dozens of ASINs. Faster cadences need a proxy pool.

For categories where prices move every 5 to 10 minutes (Prime Day pricing, electronics during Black Friday), the DIY version becomes impractical. Rotating residential proxies and concurrent workers turn a small price tracker into real infrastructure in a hurry.
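A proxy pool does not have to be complicated to start with: round-robin selection over a list of endpoints covers the basic case. A sketch with placeholder proxy URLs (real residential pools usually rotate exits behind a single gateway endpoint instead):

```python
from itertools import cycle

# Placeholder endpoints; a real residential pool issues these per session.
PROXY_POOL = cycle([
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
    "http://user:pass@proxy-3.example.com:8000",
])

def next_proxy() -> str:
    """Hand each request the next proxy in round-robin order."""
    return next(PROXY_POOL)

# Each ASIN in a check cycle then gets its own exit IP:
# snap = scrape_price(asin, proxy_url=next_proxy())
```

Round-robin spreads requests evenly; a production pool also needs health checks that evict proxies returning robot-check pages.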

Why Does the Scraper Break at Scale?

The scraper breaks at scale for three concrete reasons: robot-check pages start appearing stochastically after a few hundred requests from one IP, Amazon’s per-variant price rendering changes by category in ways selectors alone cannot solve, and the Buy Box rotates fast enough that a slow scraper captures stale data.

Specific failure modes in order of how often they hit you:

  • Robot-check HTML returned with HTTP 200. The ROBOT_MARKERS tuple catches most variants but Amazon rotates the wording. You will hit a new marker every few months.
  • Split price structure missing on third-party listings. Some seller-offered listings only render .a-offscreen, which is why the fallback is non-optional.
  • Variant grid pricing. Products with multiple sizes or colors render per-variant prices in a grid (#twisterContainer or #variation_... blocks). Extracting the grid correctly requires parsing the JavaScript blob Amazon embeds inline, which is not something a pure BeautifulSoup scraper can do cleanly.
  • TLS fingerprinting. Python requests uses OpenSSL defaults that match neither Chrome nor Safari. Amazon’s edge fingerprints the TLS handshake and blocks mismatches, which is why curl_cffi exists.
  • Per-user price personalization. A scraper without cookies sees the “clean” guest price, which is usually what you want for competitive intelligence but not what a logged-in Prime member would see in their own cart.

Any one of these is fixable. All five together turn a weekend scraper into a steady 10 to 20 percent time sink on maintenance.

When Should You Use a Managed Amazon Price API?

You should use a managed Amazon price API when your volume exceeds 500 ASINs per day and you need uptime guarantees, when you need variants and other-seller offers in the same response, or when you are shipping a repricer or analytics product to paying customers. The break-even against a DIY scraper is labor cost rather than proxy cost.

The Amazon Scraper API returns price.current, price.was, price.currency, price.savings_pct, buybox.seller, buybox.prime, variants, and the other-sellers offer ladder on a single product-endpoint call. Pricing starts at $0.90 per 1,000 successful requests on pay-as-you-go (as low as $0.50 per 1,000 on Custom plans), non-2xx responses are free, and signup includes 1,000 free requests with no card. The async batch endpoint accepts up to 1,000 ASINs per POST and posts the results to a webhook when the batch finishes. Median latency on the provider’s own benchmarks is around 2.6 seconds per product.

A quick decision framework:

  • Under 50 ASINs, hourly checks - The Python scraper above runs fine from a single residential IP.
  • 50 to 500 ASINs, 5 to 15 minute checks - DIY with a rotating proxy pool. Budget 1 day of eng time per month on selector maintenance.
  • 500+ ASINs or any customer-facing product - Use the managed API. At $50 per month for 100,000 requests, it is cheaper than the proxy bill plus the eng hours, and it handles variants and other-seller offers out of the box.

Scraping publicly displayed Amazon prices is generally considered legal in the United States under the Ninth Circuit’s 2022 ruling in hiQ Labs v. LinkedIn, which held that scraping data visible without authentication does not violate the Computer Fraud and Abuse Act. Prices on Amazon product pages are visible without login, so they fall under that precedent.

Amazon’s Terms of Service prohibit automated access by anyone who is logged in. That part matters for sellers who run scrapers under their own Seller Central credentials, because Amazon can and does suspend seller accounts that are detected doing authenticated scraping. The safe posture is to scrape only unauthenticated traffic and never log in through a scraper. Respect robots.txt, cache responses where you can, and do not hammer a single ASIN every 30 seconds when hourly is enough for your use case.

FAQ

How often does Amazon change prices?

Amazon reprices millions of products every day, with some electronics and trending items moving dozens of times per day. Books and household staples are more stable, sometimes unchanged for weeks. Industry monitoring data shows around 37 percent of active Amazon monitors check at least hourly.

Can I scrape Amazon prices from amazon.de or amazon.co.uk with the same code?

The same code works on international marketplaces after two tweaks: pass domain="de" or domain="co.uk" to fetch_page, and handle the comma decimal separator in European pricing. The .a-offscreen fallback already captures the full formatted price, so the numeric parsing needs only a minor regex swap for , versus ..
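That regex swap can live in one helper. A sketch, with the decimal_sep parameter as an assumption you would map per marketplace domain:

```python
import re

def parse_price_text(text: str, decimal_sep: str = ".") -> float:
    """Parse '$1,234.56' (decimal_sep='.') or '1.234,56 €' (decimal_sep=',')."""
    m = re.search(r"[0-9][0-9.,]*", text)
    if not m:
        raise ValueError(f"no number in {text!r}")
    raw = m.group(0)
    if decimal_sep == ",":
        # European style: dots group thousands, comma is the decimal point.
        raw = raw.replace(".", "").replace(",", ".")
    else:
        # US/UK style: commas group thousands.
        raw = raw.replace(",", "")
    return float(raw)

print(parse_price_text("$1,234.56"))        # 1234.56
print(parse_price_text("1.234,56 €", ","))  # 1234.56
```

Feeding the .a-offscreen text through this helper keeps the rest of the scraper marketplace-agnostic.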

Does the scraper capture Subscribe and Save prices?

The core scraper captures only the main Buy Box price. Subscribe and Save prices render in a separate #oneTimePurchase_feature_div or #snsAccordionRowMiddle block and need dedicated selectors. Most managed APIs return both in one response.
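A dedicated extractor can reuse the containers named above. This is a sketch, and the inner markup is an assumption: it presumes the SnS block nests a standard .a-offscreen price span, which is worth verifying against a live page.

```python
import re
from typing import Optional
from bs4 import BeautifulSoup

# Container IDs named above; order them by how specific they are.
SNS_CONTAINERS = ("#snsAccordionRowMiddle", "#oneTimePurchase_feature_div")

def parse_sns_price(soup: BeautifulSoup) -> Optional[float]:
    """Look for an .a-offscreen price inside the Subscribe and Save containers."""
    for container in SNS_CONTAINERS:
        el = soup.select_one(f"{container} .a-offscreen")
        if el:
            m = re.search(r"[0-9][0-9,]*(?:\.[0-9]+)?", el.get_text())
            if m:
                return float(m.group(0).replace(",", ""))
    return None

# Synthetic fragment for illustration only:
sample = '<div id="snsAccordionRowMiddle"><span class="a-offscreen">$17.99</span></div>'
print(parse_sns_price(BeautifulSoup(sample, "html.parser")))  # 17.99
```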

How do I track price history over time?

Write every snapshot to a timestamped row in a database or append-only CSV, then compute diffs between consecutive rows per ASIN. Keepa and camelcamelcamel both run this exact pattern at scale and expose it as a consumer product.

Why do I see a different price than the scraper?

You see a different price than the scraper usually because Amazon personalizes prices by location, Prime membership, cookies, and session history. A clean unauthenticated scraper gets the guest price. A logged-in Prime scraper would see Prime-only deals that the guest scraper does not.
