Ruby on Rails

Depends how you interpret things

Category Archives: Ruby

ImdbCelebrity – A new rubygem for parsing celebrity data from www.imdb.com

Last week I was working on a scraper to collect IMDb celebrity data for a specific purpose. I searched for an existing plugin or gem that did the same and, to my surprise, could not find what I was looking for. Then an idea struck me: why not create one of my own? So I started working on this small rubygem named “imdb_celebrity”. You can see the public repository of the gem on GitHub (imdb celebrity). The gem is also hosted on GemCutter.

imdb_celebrity is a rubygem for scraping celebrity pages from www.imdb.com. You can install it with:

  gem install imdb_celebrity

With the initial release we can search for a celebrity by name, or fetch the content for a celebrity with a given IMDb id and/or name, specifying which parser we want to use. Right now Hpricot and Nokogiri are supported as parser classes.

Usage:

  require 'imdb_celebrity'

** Searching for a celebrity

  imdb_celebs = ImdbCelebrity::Search.new("Brad Pitt")
  imdb_celebs.celebrities
  # this will return an array of celebrity objects

You can also define the parser class you want to use (right now you have the choice of either Nokogiri or Hpricot):

  # by default it will be HpricotParser
  imdb_celebs = ImdbCelebrity::Search.new("Brad Pitt", "NokogiriParser")
  imdb_celebs.celebrities

** Fetching data for a celebrity

  # give the 7-digit IMDb id number
  celeb = ImdbCelebrity::Celebrity.new("0000093", "brad pitt")

  celeb.parser
  # => the type of parser class it is using

  celeb.public_methods(false)
  # => the public methods of the ImdbCelebrity::Celebrity class only

  celeb.to_s
  # => an array of celebrity data items containing name, real_name, biography, nationality, height, url
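
For the curious, here is a minimal sketch of how a parser name passed as a string (such as "NokogiriParser") could be mapped to a class, with HpricotParser as the default. This is only an illustration: the ParserSketch module, its empty classes and the resolve method are hypothetical stand-ins and not the gem's actual internals.

```ruby
# Hypothetical stand-ins for the gem's parser classes; the real
# implementations inside imdb_celebrity may look quite different.
module ParserSketch
  class HpricotParser; end
  class NokogiriParser; end

  # Only the names listed here are accepted.
  PARSERS = {
    "HpricotParser"  => HpricotParser,
    "NokogiriParser" => NokogiriParser
  }.freeze

  # Map a parser name to its class, defaulting to HpricotParser and
  # raising ArgumentError for names that are not in the table.
  def self.resolve(name = "HpricotParser")
    PARSERS.fetch(name) { raise ArgumentError, "unknown parser: #{name}" }
  end
end

puts ParserSketch.resolve
puts ParserSketch.resolve("NokogiriParser")
```

Using a fixed lookup table rather than evaluating the string keeps unknown names from being turned into arbitrary constants.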

Requirements:

* Hpricot gem should be installed.

* Nokogiri gem should be installed.

RailsCast’s Crawler – RubyScript

First of all, thanks to Ryan Bates for his valuable contribution to the Rails community with those short and easy-to-understand Railscasts videos. For the past few months I have been watching them continuously. Yesterday, while downloading one of those videos, I realized that rather than downloading them one at a time as needed, it would be better to write a short and sweet script that does it for me without giving me any trouble. After looking for existing Ruby scripts on the internet, I ended up writing my own.

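require 'rubygems'
require 'hpricot'
require 'open-uri'

class GetRailsCasts
  # NOTE: the original script never assigned @host; the default below is
  # an assumed episode base URL, so pass your own if it differs.
  def initialize(host = "http://railscasts.com/episodes/")
    @host = host
  end

  def start
    1.upto(229) { |eps|
      # skip episodes whose page cannot be fetched or parsed
      eps_doc = Hpricot(open("#{@host}#{eps}")) rescue nil
      if eps_doc
        # cd to the folder where you want to store the Railscasts videos
        `cd /Users/sandy/railscasts; wget #{(eps_doc/".download/a").first[:href]}`
      end
    }
  end
end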
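
The backtick call above goes through a shell and relies on cd. As a possible alternative, here is a small sketch that hands wget its arguments directly and uses wget's -P option to choose the target directory. The helper names wget_command and download are my own for illustration and are not part of the script above.

```ruby
# Build the argument list for wget without going through a shell.
# -P tells wget which directory to save into, so no `cd` is needed.
def wget_command(url, target_dir)
  ["wget", "-P", target_dir, url]
end

# Download one file; returns true when wget exits successfully.
# Passing the arguments separately to system avoids shell interpolation
# of whatever text was scraped from the page.
def download(url, target_dir)
  system(*wget_command(url, target_dir))
end
```

Inside the loop this would be called as download(link, "/Users/sandy/railscasts"), where link is the href taken from the episode page.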
Comments are most welcome.

Enhancing script/console

I use the Rails ./script/console a lot for debugging my Rails apps. Every time I debugged something, I had to keep the log file (*.log) open to see which queries were being generated to produce the result I wanted.

I found a better solution: I enhanced my script/console to enable on-screen query logging for my Rails app. I made a small change to the #{RAILS_ROOT}/script/console.rb file so that it loads any additional Ruby files residing in #{RAILS_ROOT}/console_scripts/, if available.

I have also included the script I wanted to load in the first place. I find it quite handy to be able to see what kind of queries my commands are generating.

# updates /script/console.rb
require 'find'

LOAD_HOOK_DIRECTORY = "#{RAILS_ROOT}/console_scripts"

# add a -r flag for every Ruby file under the hook directory
Find.find( LOAD_HOOK_DIRECTORY ) do |filename|
  if filename =~ /\.rb$/
    puts "Adding #{filename} to load-path"
    libs << " -r #{filename}"
  end
end

# add sql_log.rb file to /console_scripts/
# points the ActiveRecord logger at the given stream (the console by default)
def log_to(stream=STDOUT, colorize=true)
  ActiveRecord::Base.logger = Logger.new(stream)
  ActiveRecord::Base.clear_active_connections!
  ActiveRecord::Base.colorize_logging = colorize
end
log_to
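
One thing to watch: Find.find raises an error when the hook directory does not exist. Below is a rough sketch of the same "collect every .rb file and build -r flags" step that tolerates a missing directory and sorts the files for a predictable load order. The hook_flags helper is a hypothetical name of mine, and the temporary directory is only there to demonstrate it outside a Rails app.

```ruby
require "tmpdir"
require "fileutils"

# Return one " -r <file>" flag per Ruby file found under dir, sorted so
# the load order is predictable. A missing directory yields no flags.
def hook_flags(dir)
  return [] unless File.directory?(dir)
  Dir.glob(File.join(dir, "**", "*.rb")).sort.map { |file| " -r #{file}" }
end

# Demonstration against a throwaway directory: only the .rb file is
# turned into a flag, the text file is ignored.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "sql_log.rb"))
  FileUtils.touch(File.join(dir, "notes.txt"))
  puts hook_flags(dir)
end
```

In the console script the result could be appended with something like hook_flags(LOAD_HOOK_DIRECTORY).each { |flag| libs << flag }.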

Blackbook webmail import gem: adding a mail provider

Blackbook is one of the easiest ways to add a webmail contact-import feature to your application. Recently I have been working on a social network cum movie portal site. By default, Blackbook supports Gmail, Yahoo, Hotmail, AOL and CSV. But for my requirements I had to add 20-30 mail providers from different countries. I started with http://www.free.fr.

require "blackbook"
require "hpricot"
require 'blackbook/importer/page_scraper'

class Free < Blackbook::Importer::PageScraper
  # your code
end

How to log in to the account of the user whose contacts we are fetching:


def login
  page = agent.get('http://imp.free.fr/')
  form = page.form_with(:name => "implogin")
  form.imapuser = options[:username]
  form.fields[4].value = options[:password]
  page = agent.submit(form, form.buttons.first)
  raise( Blackbook::BadCredentialsError, "That username and password was not accepted. Please check them and try again." ) if page.body =~ /Username and password do not match/
end


def scrape_contacts
  unless agent.cookies.find{|c| c.name == "Horde"}
    raise( Blackbook::BadCredentialsError, "Must be authenticated to access contacts." )
  end

  page = agent.get('http://imp.free.fr/horde/turba/browse.php')
  contact_rows = (page.search("form[@name=contacts]"))
  data = Hpricot(contact_rows.to_s)
  data = (data/"table[1]")/".listitem"
  data.collect do |row|
    {
      :name  => (row/"a").first.inner_text,
      :email => (row/"a").last.inner_text
    }
  end
end
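
Since every provider's page looks different, the hashes coming out of scrape_contacts can be messy. Here is a small sketch of a clean-up step one might run on them. The normalize_contacts helper is my own illustration and not part of Blackbook, and the email pattern is deliberately loose.

```ruby
# Clean up the array of {:name, :email} hashes a scraper returns:
# strip whitespace, lowercase addresses, drop rows without a plausible
# address, and keep only the first entry for each address.
def normalize_contacts(contacts)
  cleaned = contacts.map do |c|
    { :name => c[:name].to_s.strip, :email => c[:email].to_s.strip.downcase }
  end
  valid = cleaned.select { |c| c[:email] =~ /\A[^@\s]+@[^@\s]+\z/ }
  valid.uniq { |c| c[:email] }
end

sample = [
  { :name => " Ann ", :email => "Ann@Free.fr " },
  { :name => "Ann",   :email => "ann@free.fr" },
  { :name => "Group", :email => "Mailing list" }
]
p normalize_contacts(sample)
```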

Points to keep in mind

  • Scraping contacts differs from one mail provider to another.
  • Dependencies: mechanize and fastercsv for page scraping.
  • We can add the mail provider inside the Blackbook gem itself or within your own app (no need to patch the gem).