Ruby on Rails

Depends how you interpret things

Category Archives: Ruby Gem

ImdbCelebrity: a new RubyGem for parsing celebrity data from www.imdb.com

Last week I was working on a scraper to collect IMDb celebrity data for a project. I searched for an existing plugin or gem that did this and, surprisingly, could not find one that fit what I was looking for. So I decided to write my own and started work on a small RubyGem named “imdb_celebrity”. The public repository is on GitHub (imdb celebrity), and the gem is also hosted on GemCutter.

imdb_celebrity is a Ruby gem for scraping celebrity pages from www.imdb.com. You can install it with:

  gem install imdb_celebrity

The initial release can search for a celebrity by name, or fetch the data for a celebrity given an IMDb id and/or name. In both cases you can specify which parser to use; Hpricot and Nokogiri are currently supported as parser classes.

Usage:

require 'imdb_celebrity'

** Searching for a celebrity

imdb_celebs = ImdbCelebrity::Search.new("Brad Pitt")
imdb_celebs.celebrities

# This returns an array of celebrity objects.

You can also specify the parser class to use (currently either Nokogiri or Hpricot):

imdb_celebs = ImdbCelebrity::Search.new("Brad Pitt", "NokogiriParser") # the default is HpricotParser
imdb_celebs.celebrities
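As an illustration of the parser option, here is a minimal sketch of how a parser name passed as a string could be mapped to a class, with Hpricot as the default. The module `ParserSketch`, its `PARSERS` table and the `resolve` method are hypothetical stand-ins for this post; the gem's own internals may well work differently.

```ruby
# Hypothetical sketch: these classes are empty stand-ins, not the gem's parsers.
module ParserSketch
  class HpricotParser; end
  class NokogiriParser; end

  # Explicit whitelist of parser names, so arbitrary strings are rejected.
  PARSERS = {
    "HpricotParser"  => HpricotParser,
    "NokogiriParser" => NokogiriParser
  }.freeze

  # Return the parser class for a name (string or symbol); default is Hpricot.
  def self.resolve(name = "HpricotParser")
    PARSERS.fetch(name.to_s) { raise ArgumentError, "unknown parser: #{name}" }
  end
end

ParserSketch.resolve                    # => ParserSketch::HpricotParser
ParserSketch.resolve("NokogiriParser")  # => ParserSketch::NokogiriParser
```

Using an explicit table rather than looking the constant up by name keeps an unexpected string from resolving to an unrelated class.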

** Fetching data for a celebrity

celeb = ImdbCelebrity::Celebrity.new("0000093", "brad pitt") # pass the 7-digit IMDb id

celeb.parser

=> returns the parser class in use

celeb.public_methods(false)

=> returns the public methods defined on the ImdbCelebrity::Celebrity class only

celeb.to_s

=> returns an array of celebrity data items: name, real_name, biography, nationality, height, url.
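Since the constructor expects the id as a 7-digit string, a small helper can tidy up ids copied from an IMDb URL (person ids there carry an "nm" prefix) or stored as integers. This is only a sketch: `normalize_imdb_id` is a hypothetical helper written for this post, not something imdb_celebrity provides.

```ruby
# Hypothetical helper, not part of the imdb_celebrity gem.
# Accepts 93, "93", "0000093" or "nm0000093" and returns "0000093".
def normalize_imdb_id(raw)
  digits = raw.to_s.sub(/\Anm/i, "")   # drop the "nm" prefix used in IMDb URLs
  unless digits =~ /\A\d{1,7}\z/
    raise ArgumentError, "invalid IMDb id: #{raw.inspect}"
  end
  digits.rjust(7, "0")                 # left-pad with zeros to 7 digits
end

normalize_imdb_id(93)           # => "0000093"
normalize_imdb_id("nm0000093")  # => "0000093"
```

The result can then be passed as the first argument to ImdbCelebrity::Celebrity.new.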

Requirements:

* The Hpricot gem must be installed.

* The Nokogiri gem must be installed.


Black Book webmail import gem: adding a mail provider

Black Book is one of the easiest ways to add webmail contact importing to your application. I have recently been working on a social network cum movie portal site. By default, Black Book supports Gmail, Yahoo, Hotmail, AOL and CSV, but my requirements call for 20-30 more mail providers from different countries. I started with http://www.free.fr.

require "blackbook"
require "hpricot"
require 'blackbook/importer/page_scraper'

class Free < Blackbook::Importer::PageScraper
  # your code
end

How to log in to the account of the user whose contacts we are fetching:


def login
  page = agent.get('http://imp.free.fr/')
  form = page.form_with(:name => "implogin")
  form.imapuser = options[:username]
  # The password is the fifth field of the login form.
  form.fields[4].value = options[:password]
  page = agent.submit(form, form.buttons.first)
  if page.body =~ /Username and password do not match/
    raise(Blackbook::BadCredentialsError, "That username and password was not accepted. Please check them and try again.")
  end
end


def scrape_contacts
  # A "Horde" cookie is taken as the sign of an authenticated session.
  unless agent.cookies.find { |c| c.name == "Horde" }
    raise(Blackbook::BadCredentialsError, "Must be authenticated to access contacts.")
  end

  page = agent.get('http://imp.free.fr/horde/turba/browse.php')
  contact_rows = page.search("form[@name=contacts]")
  data = Hpricot(contact_rows.to_s)
  data = (data/"table[1]")/".listitem"
  # In each row, the first link holds the name and the last link the email.
  data.collect do |row|
    {
      :name  => (row/"a").first.inner_text,
      :email => (row/"a").last.inner_text
    }
  end
end

Points to keep in mind

  • Contact scraping differs from one mail provider to another.
  • Dependencies: mechanize and faster-csv, for page scraping.
  • You can add the mail provider inside the Black Book gem itself or within your own app (no need to patch the gem).
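With 20-30 providers, something has to decide which importer handles a given username. One way to picture that is a small registry keyed on the email domain. The sketch below is plain Ruby and purely illustrative: `ImporterRegistry`, `FreeImporter` and `OtherImporter` are hypothetical names, and this is not Black Book's own registration API.

```ruby
# Hypothetical registry for illustration only; NOT Black Book's API.
class ImporterRegistry
  def initialize
    @entries = []
  end

  # Associate an email-domain pattern with an importer class.
  def register(pattern, importer)
    @entries << [pattern, importer]
    self
  end

  # Return the first importer whose pattern matches the username, or nil.
  def importer_for(username)
    entry = @entries.find { |pattern, _| username.to_s =~ pattern }
    entry && entry.last
  end
end

class FreeImporter; end    # stand-in for the Free page scraper
class OtherImporter; end   # stand-in for any other provider

registry = ImporterRegistry.new
registry.register(/@free\.fr\z/i, FreeImporter)
registry.register(/@example\.com\z/i, OtherImporter)

registry.importer_for("someone@free.fr")   # => FreeImporter
registry.importer_for("someone@nowhere")   # => nil
```

Keeping the domain patterns next to each importer class means a new provider can be added in the app without touching the others.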