[Scraping]: Learn how to scrape the web with Puppeteer
scraping
09/26/2019
Puppeteer is a Node.js library, maintained by the Chrome DevTools team at Google, that controls Chrome or Chromium. By default it runs the browser headless, that is, without a visible window. It can be used to generate screenshots and PDFs of pages, automate form submission, and much more.
1. Install Node.js
If Node.js is not already installed, install it first.
2. Install Puppeteer
BASH
$ npm install puppeteer
3. Create a JS file for testing
scraper.js
JS
const puppeteer = require("puppeteer")

;(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto("https://ellismin.github.io/")
  await page.screenshot({ path: "screenshot.png" })

  await browser.close()
})()
- Code snippet adapted from the Puppeteer project
4. Test the scraper
BASH
$ node scraper.js
- The screenshot is saved automatically as screenshot.png in the directory you ran the script from
screenshot.png
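If the page fails to load, the script above throws before it reaches browser.close() and can leave a headless Chrome process running. Below is a minimal sketch of one way to guard against that with try/finally. The function name captureScreenshot is hypothetical, and passing the puppeteer module in as an argument is just an assumption made here so the function can also be tried with a stub.

```javascript
// Sketch: take a screenshot and always close the browser afterwards.
// `puppeteer` is the module itself, e.g. require("puppeteer").
// captureScreenshot is a hypothetical helper, not part of Puppeteer.
async function captureScreenshot(puppeteer, url, path) {
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.goto(url)
    // fullPage: true captures the whole scrollable page, not only the viewport
    await page.screenshot({ path, fullPage: true })
    return path
  } finally {
    // Runs on success and on failure, so the browser process is not left behind
    await browser.close()
  }
}

// Usage (assumes puppeteer is installed as in step 2):
// captureScreenshot(require("puppeteer"), "https://ellismin.github.io/", "screenshot.png")
//   .catch(console.error)
```

The fullPage option is optional. Without it, the screenshot covers only the visible viewport, as in the original script.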
Resizing the viewport
JS
const puppeteer = require("puppeteer")

;(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  // setViewport returns a promise, so await it before navigating
  await page.setViewport({ width: 1280, height: 926 })
  await page.goto("https://ellismin.github.io")
  await page.screenshot({ path: "screenshot.png" })

  await browser.close()
})()
JS
await page.setViewport({ width: 1280, height: 926 })
The line above sets the viewport size in pixels. page.setViewport returns a promise, so await it before navigating to the page.
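To compare how a page renders at different sizes, the same call can be repeated in a loop. The sketch below is one possible approach, not the only one: the VIEWPORTS list, the screenshotAtSizes name, and the screenshot-<name>.png file names are all illustrative assumptions, and the puppeteer module is passed in as an argument so the function can be tried with a stub.

```javascript
// Sketch: one screenshot per viewport size.
// The sizes and names below are illustrative assumptions.
const VIEWPORTS = [
  { name: "desktop", width: 1280, height: 926 },
  { name: "mobile", width: 375, height: 667 },
]

// screenshotAtSizes is a hypothetical helper, not part of Puppeteer.
// `puppeteer` is the module itself, e.g. require("puppeteer").
async function screenshotAtSizes(puppeteer, url, viewports) {
  const browser = await puppeteer.launch()
  const saved = []
  try {
    const page = await browser.newPage()
    for (const { name, width, height } of viewports) {
      // Apply the size first, then load the page at that size
      await page.setViewport({ width, height })
      await page.goto(url)
      const path = `screenshot-${name}.png`
      await page.screenshot({ path })
      saved.push(path)
    }
  } finally {
    await browser.close()
  }
  return saved
}

// Usage (assumes puppeteer is installed as in step 2):
// screenshotAtSizes(require("puppeteer"), "https://ellismin.github.io", VIEWPORTS)
//   .then(console.log)
//   .catch(console.error)
```

The function resolves to the list of file paths it wrote, in the same order as the viewport list.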
Move on to more web scraping