Archive for September, 2008
How to strip tags, script and style off the HTML
September 21st, 2008
Haven’t you sometimes wished that all the HTML, script and style tags would just vanish from a web page, leaving you with pure text (for fun and profit)? Well, here’s how I am managing it:
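require "open-uri"
require "hpricot"
require "sanitize"
# Fetch the page and parse it with Hpricot
html = open("http://www.google.com")
hp = Hpricot(html.read)
# Remove the script and style elements along with their contents
hp.search("script").remove
hp.search("style").remove
# Strip the remaining tags (okTags is empty, so no tag is kept) and print the text
puts sanitize(hp.innerHTML, okTags="")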
And the output?
“GoogleWeb Images News Orkut Groups Gmail more ▼ Books Scholar Blogs YouTube Calendar Photos Documents Reader even more » iGoogle | Sign inIndia Advanced Search Preferences Language ToolsSearch: the web pages from India Google.co.in offered in: Hindi Bengali Telugu Marathi Tamil Gujarati Kannada Malayalam Punjabi Advertising Programs - About Google - Go to Google.com©2008 - Privacy”
Now you can put this text to any imaginable use - as I mentioned earlier - maybe fun & profit.
Libraries - hpricot, sanitize, open-uri
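If those libraries are not at hand, the same idea (drop script and style blocks first, then every remaining tag) can be roughly sketched with nothing but Ruby's standard library. This is not the code used above: strip_html and the sample page are made-up names for illustration, and regular expressions are not a real HTML parser, so treat it as an approximation for simple pages only.

```ruby
# Rough stdlib-only sketch. strip_html is a hypothetical helper, not part of
# hpricot or sanitize. Regexes do not parse HTML properly, so this is only
# an approximation for simple, reasonably well-formed pages.
def strip_html(html)
  text = html.dup
  # Drop <script> and <style> elements together with their contents.
  text.gsub!(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, " ")
  # Drop HTML comments.
  text.gsub!(/<!--.*?-->/m, " ")
  # Replace every remaining tag with a space so neighbouring words stay apart.
  text.gsub!(/<[^>]+>/, " ")
  # Collapse runs of whitespace and trim the ends.
  text.gsub(/\s+/, " ").strip
end

# Made-up sample page, used instead of a live HTTP request.
sample = <<~HTML
  <html><head><style>body { color: red; }</style>
  <script>alert("hi");</script></head>
  <body><h1>Hello</h1><p>pure <b>text</b></p></body></html>
HTML

puts strip_html(sample)
```

Because each tag is replaced by a space, a word with inline markup in the middle of it would be split apart, and HTML entities such as &amp; are left untouched. A real parser handles those cases better.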
Have fun!
Posted in Ruby | Comments (1)
Earthquake in Pune
September 17th, 2008
This is probably the first time in my life that I was fully aware of a quake - and close enough to a laptop to blog about it. As per timeanddate.com, it's 3:20 AM on the morning of 17th Sept 08. Let's see what the newspapers report tomorrow.
Posted in A strong urge to blog... | Comments (3)
Hello World - all over again!
September 10th, 2008
Posted in Startups | Comments (0)
