5 November 2024 · 10 min read

Building a Price Tracker with Next.js and Web Scraping

How I built MyCart — a tool that monitors prices across 8 e-commerce platforms — and the gotchas that come with scraping JavaScript-rendered pages.

Next.js · Web Scraping · Node.js · SQLite

If you've ever watched a product price fluctuate by hundreds of rupees over a few days, you know the frustration. MyCart started as a personal itch — I wanted to buy a pair of shoes but wasn't sure if the price was good. Six months later it was tracking 400+ products across 8 platforms.

The Architecture

The stack is intentionally boring: Next.js for the frontend and API routes, SQLite for persistence, and Node.js child processes for scraping. Boring stacks ship. Exciting stacks become architecture blog posts that never turn into products.

Data flow

User adds a product URL via the UI
API route validates the URL and queues a scrape job
Scraper runs on a cron (every 6 hours) and writes price to SQLite
Chart renders historical data from the DB
Email alert fires if the current price drops below the lowest previously recorded price
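
The alert in the last step is the one piece of real decision logic in this flow. Here is a minimal sketch of that check. It assumes each product's history arrives as an array of `{ price, scrapedAt }` rows already read from SQLite. The row shape and the name `shouldAlert` are hypothetical, not MyCart's actual code:

```javascript
// Hypothetical row shape read from SQLite: { price: number, scrapedAt: string }
// Returns true when the newest price is strictly below every earlier price.
function shouldAlert(history) {
  if (history.length < 2) return false; // first scrape: no earlier low to beat

  // Sort a copy oldest-first so the caller's array is not mutated.
  const rows = [...history].sort(
    (a, b) => new Date(a.scrapedAt) - new Date(b.scrapedAt)
  );

  const current = rows[rows.length - 1].price;
  // Lowest price among the *earlier* rows only (the newest row is excluded).
  const previousLow = Math.min(...rows.slice(0, -1).map(r => r.price));
  return current < previousLow;
}
```

Excluding the newest row from the minimum matters. If it were included, `current < lowest` could never be true.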

The Scraping Problem

Static HTML scraping (cheerio + fetch) works fine for Flipkart product pages. But platforms like Myntra and AJIO render prices client-side with JavaScript. For those, I had to use Puppeteer, a Node.js library that drives a headless Chrome browser, so the page's JavaScript actually runs before you read the price.
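
A per-host lookup can choose between the two scraping paths, and the same lookup can double as the URL check from step 2 of the data flow. This is a sketch for illustration. The table, the exact hostnames, and `strategyFor` are assumptions, and only the three platforms named here are filled in:

```javascript
// Hypothetical routing table: which scraping path each host needs.
// Only the platforms named in this post are listed; others would be added the same way.
const STRATEGY_BY_HOST = {
  'www.flipkart.com': 'static',  // price is present in the server-rendered HTML
  'www.myntra.com': 'headless',  // price is rendered client-side
  'www.ajio.com': 'headless',
};

// Returns 'static', 'headless', or null for URLs the tracker does not support.
function strategyFor(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a parseable URL at all
  }
  // Reject javascript:, file:, and other non-web schemes.
  if (!['http:', 'https:'].includes(url.protocol)) return null;

  // URL already lowercases the hostname; treat bare and www. domains the same.
  const host = url.hostname.startsWith('www.') ? url.hostname : `www.${url.hostname}`;
  return STRATEGY_BY_HOST[host] ?? null;
}
```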

Puppeteer feels like cheating. You're literally using a real browser, just without a screen.

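const puppeteer = require('puppeteer');

// Returns the price element's text exactly as displayed on the page.
async function scrapePrice(url) {
  const browser = await puppeteer.launch({ headless: 'new' });
  try {
    const page = await browser.newPage();

    // Skip assets that don't affect the price to cut page-load time.
    await page.setRequestInterception(true);
    page.on('request', req => {
      if (['image', 'stylesheet', 'font'].includes(req.resourceType())) {
        req.abort();
      } else {
        req.continue();
      }
    });

    await page.goto(url, { waitUntil: 'networkidle2' });
    // Make sure the client-side render has put the price in the DOM before reading it.
    await page.waitForSelector('.pdp-price');
    const price = await page.$eval('.pdp-price', el => el.textContent.trim());
    return price;
  } finally {
    // Close the browser even if navigation or the selector fails;
    // otherwise every failed scrape leaks a Chrome process.
    await browser.close();
  }
}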
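
`$eval` returns the price as display text, with the currency symbol and separators included. The chart and the alert check both need a number. The small parser below assumes prices are shown in rupees with comma grouping, including the Indian lakh style (`1,29,999`). `parsePrice` is a hypothetical helper, not part of Puppeteer:

```javascript
// Extracts the first number from display text such as "₹1,299", "Rs. 2,499.00"
// or "₹ 1,29,999". Returns null when the text contains no digits.
function parsePrice(text) {
  // First run of digits and commas, with an optional decimal part.
  // Starting the match at a digit keeps the "." in "Rs." from being read as a decimal point.
  const match = /\d[\d,]*(?:\.\d+)?/.exec(text);
  if (!match) return null;
  // Drop grouping commas (every 3 or every 2 digits) and convert.
  return Number(match[0].replace(/,/g, ''));
}
```

The first number in the text wins. If a platform's price element also contains the struck-out MRP, point the selector at the selling price alone.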

Anti-Bot Measures

E-commerce sites don't want to be scraped. Rate limiting, IP bans, and bot detection are real. The mitigations I used: random delays between requests (500–3000ms), rotating user-agents, and respecting robots.txt for the most aggressively protected sites.
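
The first two mitigations are easy to sketch. This version assumes a sequential loop over URLs and a caller-supplied `scrapeOne(url, userAgent)` function. `scrapeAllPolitely`, the helper names, and the user-agent strings are illustrative placeholders rather than MyCart's code:

```javascript
// Placeholder pool; real strings should track current browser releases.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
];

// Uniform random integer in [min, max], inclusive.
function randomDelayMs(min = 500, max = 3000) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

function pickUserAgent() {
  return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Scrapes URLs one at a time with a random pause (default 500–3000 ms)
// between requests and a freshly picked user agent for each request.
async function scrapeAllPolitely(urls, scrapeOne, { minMs = 500, maxMs = 3000 } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(randomDelayMs(minMs, maxMs)); // no pause before the first request
    results.push(await scrapeOne(urls[i], pickUserAgent()));
  }
  return results;
}
```

With Puppeteer, the chosen string would be applied with `page.setUserAgent(userAgent)` before `page.goto`.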

SQLite Was the Right Call

I nearly reached for PostgreSQL out of habit. But this runs locally, data fits in a single file, and SQLite handles concurrent reads perfectly fine for a personal tool. The entire database is one .db file you can copy and back up.

What I'd Do Differently

Use a proper job queue (BullMQ) instead of cron for the scraping pipeline
Add a browser extension so users can add products with one click
Cache scrape results to avoid redundant runs when multiple users track the same URL
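
The caching idea in the last bullet mostly comes down to keying scrapes by a normalized URL, so that two users tracking the same product share one result. The sketch below uses an in-memory TTL map. `ScrapeCache`, `cacheKey`, the 6-hour default TTL (chosen to match the cron interval), and the normalization rules are all assumptions. A persistent version would store entries in the SQLite file instead:

```javascript
// Normalizes a product URL so trivially different links share a key:
// lowercase host (done by URL), no #fragment, no trailing slash.
// The query string is kept, since some platforms put product IDs there.
function cacheKey(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = '';
  url.pathname = url.pathname.replace(/\/+$/, '') || '/';
  return url.toString();
}

class ScrapeCache {
  constructor(ttlMs = 6 * 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;            // injectable clock, handy for tests
    this.entries = new Map();  // key -> { value, expiresAt }
    this.inFlight = new Map(); // key -> Promise, so concurrent callers share one scrape
  }

  // Returns a cached value if still fresh; otherwise runs scrape(rawUrl) once.
  async get(rawUrl, scrape) {
    const key = cacheKey(rawUrl);
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    let pending = this.inFlight.get(key);
    if (!pending) {
      // Promise.resolve().then defers the scrape, so the map entry below is
      // registered before the cleanup in finally() can run.
      pending = Promise.resolve()
        .then(() => scrape(rawUrl))
        .then(value => {
          this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
          return value;
        })
        .finally(() => this.inFlight.delete(key));
      this.inFlight.set(key, pending);
    }
    return pending;
  }
}
```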

The code is on GitHub if you want to run it yourself or contribute improvements.

Jatin Dahiya

Systems Engineer · Software Developer
