How to effectively serialize a WWW::Mechanize object
October 8th, 2007
For the last three days I was nearly pulling my hair out, trying to figure out how I could serialize a WWW::Mechanize object to the database.
The problem surfaced when I tried to store a Mechanize instance in the session: oddly enough, Rails kept throwing the “We’re sorry, but something went wrong” error at me! I reported this in a thread on RailsForum, but without any luck. In fact, being a Ruby/Rails newbie, I had just blindly posted the error message there without thinking much about its cause.
I also posted it on the Pragmatic Studio Rails mailing list, where many helpful people suggested possible causes and solutions. Either I had to dig into the problem, or change the problem statement altogether!
After understanding the problem, I decided the best way to solve it was to choose the latter option and change the problem statement, i.e. “How can a Mechanize object be serialized?”
Ruby’s YAML support is fantastic. Just requiring “yaml” in your script gives all your objects the ability to export themselves to YAML and import themselves back:
yaml_text = some_object.to_yaml
some_other_object = YAML.load(yaml_text)
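As a quick illustration of that round-trip, here is a minimal sketch using a hypothetical Point class (not part of the original post). One caveat that postdates this post: since Psych 4 (Ruby 3.1), YAML.load only accepts basic types by default, so the sketch falls back to YAML.unsafe_load when it exists. That is only appropriate for YAML you produced yourself.

```ruby
require "yaml"

# Hypothetical plain class, used only to demonstrate a YAML round-trip.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Psych 4 (Ruby 3.1+) made YAML.load refuse arbitrary classes, so use
# unsafe_load when it is available. Only do this with trusted YAML.
def load_trusted_yaml(text)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML.load(text)
end

yaml_text = Point.new(3, 4).to_yaml   # "--- !ruby/object:Point ..." document
copy = load_trusted_yaml(yaml_text)   # a new Point rebuilt from the YAML
puts copy.x + copy.y
```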
But as you can see in the forum post, calling to_yaml on a Mechanize object that has already been used to fetch a page raises a TypeError:
TypeError (can't dump TCPSocket):
/usr/lib/ruby/1.8/pstore.rb:349:in `dump'
Some googling and RTFM’ing will tell you that IO objects in Ruby (files, and sockets such as that TCPSocket) cannot be serialized. DRAT! So how do I serialize the object? Behind the scenes, anything stored in the session has to be serialized, and that was exactly why I could not store the Mechanize instance in the session!
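The failure is easy to reproduce without touching the network. The backtrace points into pstore.rb, and PStore writes its data with Marshal, which refuses to dump IO objects. In this sketch FakeAgent is a hypothetical stand-in for a Mechanize agent, and an IO.pipe plays the role of the TCPSocket:

```ruby
# Hypothetical stand-in for a Mechanize agent: it owns live IO handles
# (a pipe here instead of a TCPSocket) next to plain cookie data.
class FakeAgent
  attr_reader :cookies

  def initialize
    @connection, @writer = IO.pipe              # live IO, like an open socket
    @cookies = { "SESSIONID" => "abc123" }      # plain, serializable data
  end
end

agent = FakeAgent.new

# Marshal walks the object's instance variables and gives up at the IO.
error = begin
  Marshal.dump(agent)
  nil
rescue TypeError => e
  e
end

puts error.message
```

The cookies hash on its own would dump without complaint; it is only the live connection that poisons the whole object.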
But creating a new Mechanize object and logging in again on every request would be too big a hit on my app’s performance. So what now?
Let’s try to re-engineer the solution!
My project logs into a remote website and then scrapes content off it whenever the user requests it. So if I can somehow maintain the Mechanize session (cookies and all) across actions and controllers, I am good.
A quick look at the Mechanize documentation showed that it uses the CookieJar class to manage cookies. So if I can serialize that object, I will have the functionality I was looking for.
Voila! I have a solution!
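require "mechanize"
require "yaml"
require "pp"
# Fetch a page so the agent's cookie jar picks up the site's cookies
agent = WWW::Mechanize.new
agent.get("http://www.google.com")
# Serialize only the cookie jar; unlike the agent, it holds no open sockets
yaml_text = agent.cookie_jar.to_yaml
# Later (e.g. in the next request), hand the saved cookies to a fresh agent
new_agent = WWW::Mechanize.new
new_agent.cookie_jar = YAML.load(yaml_text)
pp new_agent.cookie_jar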
And since the CookieJar can be serialized, I don’t even need a database anymore! My session works just fine with the CookieJar stored directly in it, per user, per session!
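Here is a rough sketch of that flow, assuming a Marshal-based session store like the PStore one in the backtrace above (newer Rails versions default to a JSON cookie serializer, where storing a Ruby object directly would not work as-is). A plain Hash plays the role of one user’s session, and FakeCookieJar is a hypothetical stand-in for Mechanize’s CookieJar:

```ruby
# Hypothetical stand-in for Mechanize's cookie jar: plain data, no IO handles.
class FakeCookieJar
  attr_reader :cookies

  def initialize(cookies = {})
    @cookies = cookies
  end
end

# A plain Hash stands in for one user's session.
session = {}

# Request 1: after logging in, store the jar itself in the session.
session[:cookie_jar] = FakeCookieJar.new("SESSIONID" => "abc123")

# Between requests the (assumed Marshal-based) store serializes the session.
# This succeeds because the jar holds no sockets or files.
stored = Marshal.dump(session)

# Request 2: the session is loaded back and the jar is ready for a new agent.
restored = Marshal.load(stored)
jar = restored[:cookie_jar]
puts jar.cookies["SESSIONID"]
```

In the real app, the restored jar would then be assigned to a fresh agent with cookie_jar=, exactly as in the snippet above.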
Wheeehaa!