Implementing Your Own Caching Layer
I recently had to deal with performance problems in a very large application with a considerable number of SQL queries (i.e. object.find_by_sql or object.paginate_by_sql). And while we can argue whether or not using sql directly in an ActiveRecord context is good, some of these were complex enough (think sum operations, etc) that I didn't want to go and rewrite them as ActiveRecord. And, given a table that is being changed constantly by a crawler, the MySQL query cache wasn't an option*.
So I started things off, as I do so often, by talking to a buddy and discussing the issues. Oddly, he argued against using the built-in Rails caching tools and for doing it myself. Now this is unusual to say the least. Normally he always argues for the built-in frameworks but he and I have had issues in the past around caching and, in particular, cache expiration. So after that discussion, I came to a conceptual approach of this:
- Use an ActiveRecord model to start the data
- Use created_at as a tool to manage the cache expiration
- Serialize the data after fetch to store it away
- Write a get_latest method inside the model to test whether or not to fetch the data from the cache or the source
The first real problem came from needing to deal with not just straight ActiveRecord (AR) objects but will_paginate collections that wrap around the AR objects. Here's something brilliant about ActiveRecord, irrelevant for me, but brilliant:
serialize :data
If you put that at the top of your AR model file then that element of the model will be automatically serialized IN and OUT of the database. Outstanding -- but it kept failing for me. Why? Because I had will_paginate collections over the top of the AR objects. Oy. So now that that wasn't working, at all, I turned to Google and I did some research via a great Skorks article. Apparently you can serialize in Rails, automatically, via YAML or by using the marshal command. The benefit to using marshal is its binary which means its smokingly fast. Or at least as fast as anything in Rails is.
So I tried wrapping my data element like this:
Marshal.dump(res)
to store it (res was the result of the database operation).
and
Marshal.load(cache_result.data)
And no matter what I did, it just plain failed. So the normal walk away from the computer and ponder deeply while I wander the halls of my home looking contemplatively** around while I cogitate made me realize this: IT IS BINARY BUTTHEAD!
MySQL doesn't store binary data by default so this would require a migration change and a db:migrate:redo. So a quick dash back to the migration and I ended up with this:
class CreateQueryResultsCaches < ActiveRecord::Migration
def self.up
create_table :query_results_caches do |t|
t.string :q_hash
t.text :q
t.column :data, :binary, :limit => 10.megabyte
t.timestamps
end
add_index :query_results_caches, :q_hash
end
def self.down
remove_index :query_results_caches, :q_hash
drop_table :query_results_caches
end
end
Useful reference on creating blobs via migrations.
And that actually worked! If you notice the q_hash column, you may be wondering what that is. Given that my queries are long, its faster to hash the query and then use that hash for the lookup instead of trying to look up on a query that's 500 bytes or longer. Now there's only a few more bits to share.
The routine which evaluates the cache result:
def self.get_latest(q_hash)
latest = self.find(:first, :conditions => {:q_hash => q_hash}, :order => "created_at DESC")
# if within last 10 minutes then return else run the real query and store results
if latest && latest.created_at.between?(20.minutes.ago,Time.now)
return latest
else
nil
end
end
The two methods on the QueryCacheResult object for fetching from the cache and/or populating the cache with and without pagination:
def self.cache_it_or_create_it_by_sql_with_pagination(obj,q,page)
if page
q_hash = Digest::SHA1.hexdigest(q + page).to_s
else
q_hash = Digest::SHA1.hexdigest(q).to_s
end
cache_result = self.get_latest(q_hash)
if cache_result.nil?
res = obj.paginate_by_sql(q, :page => page, :per_page => 40)
QueryResultsCache.create(:q_hash => q_hash, :data => Marshal.dump(res), :q => q)
else
res = Marshal.load(cache_result.data)
end
res
end
And...
def self.cache_it_or_create_it_by_sql(obj,q)
q_hash = Digest::SHA1.hexdigest(q).to_s
cache_result = self.get_latest(q_hash)
if cache_result.nil?
res = obj.find_by_sql(q)
QueryResultsCache.create(:q_hash => q_hash, :data => Marshal.dump(res), :q => q)
else
res = Marshal.load(cache_result.data)
end
res
end
As a final note, here's an example how this is used from a controller:
@apps = QueryResultsCache.cache_it_or_create_it_by_sql_with_pagination(App,q,params[:page])
Clearly there's more that can be done here but when you find that the built-in Rails caching mechanisms aren't working for you --or-- you feel that stepping out of the framework will teach you something, implementing your own caching approach isn't all that difficult. Learn to use one of the serialization tools and you're off to the races!
*As an aside, I'd point out that the MySQL query cache just ain't all that great but that's another story for another day.
**Ok I went to the can.
