Archive for September, 2008

How to strip tags, scripts and styles from HTML

September 21st, 2008

Haven’t you sometimes wished that all the HTML tags, scripts and styles would just vanish from an HTML page, leaving you with pure text (for fun and profit)? Well, here’s how I am managing it :)

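require "open-uri"
require "hpricot"
require "sanitize"

# Fetch the page and parse it
html = open("http://www.google.com")
hp = Hpricot(html.read)
# Remove script and style elements, contents included
hp.search("script").remove
hp.search("style").remove
# Strip the remaining tags (okTags is empty, so no tag is kept) and print the text
puts sanitize(hp.innerHTML, okTags="")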

And the output?

“GoogleWeb Images News Orkut Groups Gmail more ▼ Books Scholar Blogs YouTube Calendar Photos Documents Reader even more » iGoogle | Sign inIndia   Advanced Search  Preferences  Language ToolsSearch: the web pages from India Google.co.in offered in: Hindi Bengali Telugu Marathi Tamil Gujarati Kannada Malayalam Punjabi Advertising Programs - About Google - Go to Google.com©2008 - Privacy”

Now you can put this text to any imaginable use - as I mentioned earlier, maybe for fun and profit :)

Libraries - hpricot, sanitize, open-uri
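
If those libraries are not at hand, the same three steps (drop scripts, drop styles, strip the remaining tags) can be roughly approximated with the standard library alone. This is only a sketch and not the hpricot/sanitize approach above: strip_markup is a made-up helper name, the sample HTML is invented for illustration, and regular expressions are not a real HTML parser, so expect it to trip over malformed markup.

```ruby
# Rough, stdlib-only approximation of the steps above.
# strip_markup is a hypothetical helper, not part of hpricot or sanitize.
def strip_markup(html)
  text = html.dup
  # Drop <script> and <style> elements together with their contents.
  text.gsub!(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, " ")
  # Drop HTML comments.
  text.gsub!(/<!--.*?-->/m, " ")
  # Drop every remaining tag, keeping the text between tags.
  text.gsub!(/<[^>]+>/, " ")
  # Collapse runs of whitespace into single spaces.
  text.gsub(/\s+/, " ").strip
end

# Invented sample page, so the example runs without network access.
sample = <<~HTML
  <html>
    <head>
      <title>Demo</title>
      <style>body { color: red; }</style>
      <script>alert("hi");</script>
    </head>
    <body><h1>Hello</h1><p>Plain <b>text</b> only.</p></body>
  </html>
HTML

puts strip_markup(sample)
# prints "Demo Hello Plain text only."
```

Unlike the output shown earlier, this sketch puts a space wherever a tag used to be, so neighbouring words do not run together. It does not decode entities such as &amp;amp;.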

Have fun!

Posted in Ruby | Comments (1)

Earthquake in Pune

September 17th, 2008

This is probably the first time in my life that I have been fully aware of a quake - and close enough to a laptop to blog about it. As per timeanddate.com, it’s 3:20 AM on the morning of 17th Sept 08. Let’s see what the newspapers report tomorrow :)

Posted in A strong urge to blog... | Comments (3)

Hello World - all over again!

September 10th, 2008

Hello World FF Extensions
Saying hello to FF extensions :)

Posted in Startups | Comments (0)