This is a Node.js + Express-based microservice that uses Puppeteer to return the fully rendered HTML of any dynamic webpage. It's designed to work with JavaScript-heavy websites.
- Accepts a public URL via query parameter (
?url=
) - Launches a real headless browser using Puppeteer
- Waits for JavaScript-based content to load
- Returns the full rendered HTML of the page
Before using this microservice, make sure you have the following installed on your system:
- Node.js (https://nodejs.org)
- Git (https://git-scm.com)
To install project dependencies, open your terminal in the project folder and run:
npm install
This will install the following NPM packages:
express
: A web framework for handling HTTP requestspuppeteer
: A library to control a headless browser
These are listed in package.json
.
Follow these steps to set up the project:
- Clone this repository:
git clone https://github.com/Sohaibgit/puppeteer-microservice.git
cd puppeteer-microservice
- Install all dependencies:
npm install
- Make sure your
package.json
includes a start script like this:
"scripts": {
"start": "node index.js"
}
To run the project locally:
node index.js
Once the server starts, open your browser or Postman and visit:
http://localhost:3000/scrape?url=https://example.com
This will return the fully rendered HTML of the provided URL.
Required Query Parameter:
url
: The full URL of the page you want to scrape
Example:
/scrape?url=https://example.com
Response:
The API will return the complete HTML source of the page after dynamic content is loaded.
This project is deployable to cloud platforms like:
- Railway (https://railway.app)
- Render (https://render.com)
- Fly.io (https://fly.io)
- Any VPS or Node.js-compatible server
When deploying to hosting platforms, use the following Puppeteer config:
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
This ensures compatibility with container-based platforms.
This tool is intended for educational use only. Scraping websites may violate their terms of service. You are responsible for ensuring you comply with legal and ethical use.
Do not use this service to scrape websites without reviewing their policies.
Sohaib Khan
GitHub: https://github.com/Sohaibgit