How to effectively serialize a WWW::Mechanize object

October 8th, 2007

For the last three days I was almost pulling my hair out, trying to figure out how to serialize a WWW::Mechanize object to the database.

The problem surfaced when I tried to store a Mechanize instance in the session and, oddly enough, Rails threw the “We’re sorry, but something went wrong” error at me! I reported this on the RailsForum on this thread, but without any luck. In fact, my newbie’ness with Ruby/Rails made me blindly post the error message on RailsForum without thinking much about the reasons.

I even posted this on the Pragmatic Studio Rails mailing list, where many helpful people suggested solutions and reasons. I had to either dig into the problem or change the problem statement altogether!

After understanding the problem, the best way to solve the issue was to choose the latter option and change the problem statement, i.e. “How can a Mechanize object be serialized?”

Ruby’s YAML support is fantastic. Just requiring “yaml” in your script gives all your objects the ability to export and import themselves as YAML using:

yaml_text = some_object.to_yaml
some_other_object = YAML.load(yaml_text)

But as you can see in the forum post, calling to_yaml on a Mechanize object (one that has previously been used to fetch a page) raises a TypeError:

TypeError (can't dump TCPSocket):
/usr/lib/ruby/1.8/pstore.rb:349:in `dump'

Some googling and RTFM’ing would tell you that IO classes in Ruby cannot be serialized! DRAT!!! Now how do I serialize the object? Behind the scenes when storing anything in the session, serialization is required - & that was exactly why I was not able to store the Mechanize instance into the session!
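To make the failure concrete without touching the network, here is a minimal sketch. FakeAgent is a hypothetical stand-in (not part of Mechanize) for any object that holds an open IO handle next to plain data. I use Marshal here on the assumption that it is what the session store calls, since the trace above points at pstore.rb.

```ruby
# FakeAgent is a hypothetical stand-in for an object that keeps
# an open IO handle alongside ordinary data.
class FakeAgent
  attr_reader :cookies

  def initialize
    @cookies = { "SID" => "abc123" }   # plain data: can be serialized
    @reader, @writer = IO.pipe         # open IO handles: cannot be serialized
  end

  def close
    @reader.close
    @writer.close
  end
end

agent = FakeAgent.new

# Dumping the whole object fails as soon as Marshal reaches an IO.
dump_error = begin
  Marshal.dump(agent)
  nil
rescue TypeError => e
  e
end

# Dumping only the plain-data part works fine.
cookie_bytes = Marshal.dump(agent.cookies)
restored_cookies = Marshal.load(cookie_bytes)

agent.close
```

So the trick is not to serialize the whole object, but only the part of its state that is plain data.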

But creating a new Mechanize object & then authenticating again & again would be too big a hit on my app’s performance. So what now?

Let’s try to re-engineer the solution!
My project deals with remotely logging into a website & then scraping content off it as and when the user requests it. So if I can somehow maintain the Mechanize session (cookies & all) across the actions/controllers, I am good.

A quick look at the Mechanize documentation & I found that it uses a CookieJar class to manage cookies. So if I can serialize this object, I will have the functionality I was looking for :)

Voila! I have a solution!

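require "mechanize"
require "yaml"
require "pp"

# Fetch a page so the agent picks up some cookies
agent = WWW::Mechanize.new
agent.get("http://www.google.com")
# Serialize only the cookie jar, not the whole agent
yaml_text = agent.cookie_jar.to_yaml

# Restore the cookies into a fresh agent
new_agent = WWW::Mechanize.new
new_agent.cookie_jar = YAML.load(yaml_text)
pp new_agent.cookie_jar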

And since the CookieJar can be serialized, I don’t even need a database anymore! My session would work just fine by storing the CookieJar directly into it, per user, per session!
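The per-session storage could be wrapped in two small helpers. This is only a sketch under assumptions: save_jar, load_jar and the "cookie_jar_yaml" key are names I made up, a plain Hash stands in for the Rails session, and another plain Hash stands in for the CookieJar so that it runs without the mechanize gem. A real CookieJar may need different YAML loading options depending on your Ruby version.

```ruby
require "yaml"

# Hypothetical helper: store the jar in the session as YAML text.
def save_jar(session, jar)
  session["cookie_jar_yaml"] = jar.to_yaml
end

# Hypothetical helper: rebuild the jar from the session,
# or start with an empty one if nothing has been stored yet.
def load_jar(session)
  yaml_text = session["cookie_jar_yaml"]
  yaml_text ? YAML.load(yaml_text) : {}
end

session = {}                                  # stands in for the Rails session
jar = { "SID" => "abc123", "lang" => "en" }   # stands in for agent.cookie_jar

empty_jar = load_jar(session)      # nothing stored yet
save_jar(session, jar)
restored_jar = load_jar(session)   # what a later request would see
```

In a controller, the idea would be to call something like load_jar before building the agent and save_jar after each request to the remote site.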

Wheeehaa!

Posted in Mechanize, Rails
