Musings of a Fondue

Web Scraping With Nokogiri

This is a small script I originally wrote around October 2013.

It’s based on this excellent 10-minute tutorial by Ryan Bates (RailsCasts).

The script visits a website and extracts (‘scrapes’) information from it, so you don’t have to manually open the page, search it, and copy and paste what you need. This makes it invaluable for batch jobs or any task you want automated.

I recall being ecstatic at the time - or more accurately, my mind EXPLODING into several pieces!

I didn’t take any screenshots back then, but here are some from running the same script today (May 12, 2015):

[Image: target website]
[Image: scraped content]


#!/usr/bin/env ruby

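# To run, navigate to the folder containing this file, then run 'ruby [filename]' on the command line.
require 'nokogiri'
require 'open-uri'

url = "http://www.tested.com"
# URI.open (provided by open-uri) fetches the page; a bare open(url) no longer accepts URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open(url))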

puts doc.css("title").text        # prints the page title (Tested-Tech)

# Print the title and author of each <article> element on the page
doc.css("article").each do |article|
  title = article.css(".title").text
  author = article.css(".author").text
  puts "#{title} - #{author}"
end

Be sure to check out the awesomeness that is the RailsCasts tutorials by Ryan Bates!
