[Scraping]: Learn how to scrape the web with Puppeteer

scraping

09/26/2019


Puppeteer is a Node.js library, maintained by the Chrome team at Google, that controls Chrome or Chromium. By default it runs the browser in headless mode, that is, without a visible window. It can be used to generate screenshots and PDFs of pages, automate form submission, and much more.

1. Install Node.js

If Node.js is not already installed, install it first.

2. Install Puppeteer

BASH
$ npm install puppeteer

3. Create a JS file for testing

scraper.js

JS
const puppeteer = require("puppeteer")

;(async () => {
  // Launch a headless browser and open a new tab
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  // Load the page, then save a screenshot of the visible viewport
  await page.goto("https://ellismin.github.io/")
  await page.screenshot({ path: "screenshot.png" })
  await browser.close()
})()

4. Test the scraper

BASH
$ node scraper.js
  • The screenshot is created automatically and saved as screenshot.png in the directory you ran the command from
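
The script in step 3 assumes every call succeeds. As a minimal sketch, and not the only way to structure it, the variant below closes the browser even when navigation fails and names the image after the page's host. The helper names `screenshotPathFor` and `capture`, the `networkidle2` wait condition, and the `fullPage` option are choices made here for illustration; they are not part of the original script.

```javascript
// capture.js - usage: node capture.js https://ellismin.github.io/

// Hypothetical helper: derive a file name such as "ellismin.github.io.png" from a URL
function screenshotPathFor(url) {
  return `${new URL(url).hostname}.png`
}

// Hypothetical helper: open the page, save a full-page screenshot, return the file path
async function capture(url) {
  // Loaded lazily so the helper above can be used without launching a browser
  const puppeteer = require("puppeteer")
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    // Assumption: wait until the network is mostly idle before capturing
    await page.goto(url, { waitUntil: "networkidle2" })
    const path = screenshotPathFor(url)
    // fullPage captures the whole scrollable page, not only the visible viewport
    await page.screenshot({ path, fullPage: true })
    return path
  } finally {
    // Runs on success and on error, so no browser process is left behind
    await browser.close()
  }
}

// Only run when a URL is passed on the command line
if (process.argv[2]) {
  capture(process.argv[2])
    .then(path => console.log(`Saved ${path}`))
    .catch(err => {
      console.error(err)
      process.exitCode = 1
    })
}
```

Wrapping the work in `try`/`finally` matters most in longer scraping scripts, where an uncaught error would otherwise leave a Chromium process running in the background.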

Changing the screen size

JS
const puppeteer = require("puppeteer")

;(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  // Set the viewport before navigating; setViewport returns a promise, so await it
  await page.setViewport({ width: 1280, height: 926 })
  await page.goto("https://ellismin.github.io")
  await page.screenshot({ path: "screenshot.png" })
  await browser.close()
})()
JS
await page.setViewport({ width: 1280, height: 926 })

You can change the size of the screen, and with it the size of the screenshot, with the code above.
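
To see how the viewport affects the result, the same page can be captured at more than one size. The sketch below is one possible approach. The preset names and dimensions in `VIEWPORTS` and the helper names `viewportFor` and `captureAt` are made up for this example; only the 1280 × 926 size comes from the code above.

```javascript
// viewports.js - usage: node viewports.js https://ellismin.github.io desktop mobile

// Hypothetical presets; only the "desktop" size comes from the example above
const VIEWPORTS = {
  desktop: { width: 1280, height: 926 },
  tablet: { width: 768, height: 1024 },
  mobile: { width: 375, height: 667 },
}

// Look up a preset by name and fail loudly on an unknown name
function viewportFor(name) {
  const viewport = VIEWPORTS[name]
  if (!viewport) {
    throw new Error(`Unknown viewport preset: ${name}`)
  }
  return viewport
}

// Capture one screenshot per preset, saved as screenshot-<name>.png
async function captureAt(url, names) {
  // Loaded lazily so viewportFor can be used without launching a browser
  const puppeteer = require("puppeteer")
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    for (const name of names) {
      // Set the size before navigating so the page is laid out at that size
      await page.setViewport(viewportFor(name))
      await page.goto(url)
      await page.screenshot({ path: `screenshot-${name}.png` })
    }
  } finally {
    // Close the browser whether or not every capture succeeded
    await browser.close()
  }
}

// Only run when a URL is passed on the command line; default to the desktop preset
if (process.argv[2]) {
  const names = process.argv.slice(3)
  captureAt(process.argv[2], names.length ? names : ["desktop"]).catch(err => {
    console.error(err)
    process.exitCode = 1
  })
}
```

Running `node viewports.js https://ellismin.github.io desktop mobile` should leave two files, screenshot-desktop.png and screenshot-mobile.png, in the current directory.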

Move on to more web scraping

WRITTEN BY

Keeping a record