In Praise of Rails Machine

Written by Scott Johnson on

It isn't often that you can pull a nearly 48-hour day deploying a new code base into production and come out of it with not only a smile but a blog post praising your hosting company. I just finished such an endeavor (ordeal?) and it was the typical sort of experience: constant server tuning, reboots to get around load issues, crawlers flooding the site and, as a result, fine-grained ActiveRecord optimizations, etc.

Now the site in question is hosted at Rails Machine and, to paraphrase, I come not to bury them but to praise them. During the course of this project, Rails Machine:

  • Was nothing but professional
  • Turned me on to their new Moonshine project, which makes server tuning magnificently easy (imagine setting your Passenger configuration options from within Ruby code)
  • Didn't bat an eye at multiple reboots, even late on a Saturday night
  • Replaced our server without being asked when we had major problems; all I had to do was point out that the issue had happened twice and *whammo* new server

And then, to add icing to a delicious cake, I just got this email:

Hi Scott -

I just wanted to check in and see if you were able to get this to work. Please let me know if you need any further help!

--Ahesan

Ahesan, you magnificent bastard! Thank you for sending this. I can't tell you how good it made me feel to get this at 3:24 am.

So here's my personal bottom line for hosting Rails apps. If you need quality, reliable hosting for Rails with outstanding support, then you need to run, not walk, and sign up with them. I won't tell you it's cheap -- it isn't -- but Rails Machine knows their stuff and does an outstanding job. Overall I've been more impressed with Rails Machine than any other Rails hosting company I've worked with.

Note 1: I've been a Rails Machine customer for over 3 years now and they've kept every one of the Rails apps I've been a lead developer on running like a champ but their customer service over this past weekend was well over the top. Thanks guys. Appreciated.

Note 2: Since I wrote this up in the bowels of a Sunday night 3 am debugging session, I've had follow-up from others at Rails Machine, including Josh and Will. Both took the time to go above and beyond just as Ahesan did. Thanks guys. Will in particular cobbled together an excellent suggestion showing how to use Moonshine to do something not explicitly supported yet.

Implementing Your Own Caching Layer

Written by Scott Johnson on

I recently had to deal with performance problems in a very large application with a considerable number of raw SQL queries (i.e. object.find_by_sql or object.paginate_by_sql). And while we can argue whether or not using SQL directly in an ActiveRecord context is good, some of these were complex enough (think sum operations, etc.) that I didn't want to go and rewrite them as ActiveRecord. And, given a table that is being changed constantly by a crawler, the MySQL query cache wasn't an option*.

So I started things off, as I do so often, by talking to a buddy and discussing the issues. Oddly, he argued against using the built-in Rails caching tools and for doing it myself. Now this is unusual to say the least. Normally he argues for the built-in frameworks, but he and I have had issues in the past around caching and, in particular, cache expiration. So after that discussion, I came to this conceptual approach:

  • Use an ActiveRecord model to store the data
  • Use created_at as a tool to manage the cache expiration
  • Serialize the data after fetch to store it away
  • Write a get_latest method inside the model to test whether or not to fetch the data from the cache or the source

The first real problem came from needing to deal with not just straight ActiveRecord (AR) objects but will_paginate collections that wrap around the AR objects. Here's something brilliant about ActiveRecord, irrelevant for me, but brilliant:


    serialize :data

If you put that at the top of your AR model file then that element of the model will be automatically serialized IN and OUT of the database. Outstanding -- but it kept failing for me. Why? Because I had will_paginate collections over the top of the AR objects. Oy. So now that that wasn't working, at all, I turned to Google and did some research via a great Skorks article. Apparently you can serialize in Rails, automatically, via YAML or by using Ruby's Marshal module. The benefit of using Marshal is that it's binary, which means it's smokingly fast. Or at least as fast as anything in Rails is.
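To make the YAML versus Marshal choice concrete, here is a minimal plain-Ruby sketch of both round trips. The `rows` array is a made-up stand-in for a query result (an assumption, not the real data), and no Rails is involved:

```ruby
require 'yaml'

# Hypothetical stand-in for a query result: an array of plain hashes.
rows = [
  { 'id' => 1, 'name' => 'alpha' },
  { 'id' => 2, 'name' => 'beta' }
]

# YAML produces human-readable text.
yaml_blob = YAML.dump(rows)
yaml_copy = YAML.load(yaml_blob)

# Marshal produces a compact binary string.
marshal_blob = Marshal.dump(rows)
marshal_copy = Marshal.load(marshal_blob)

puts yaml_copy == rows
puts marshal_copy == rows
puts marshal_blob.encoding
```

Both copies come back equal to the original. The difference that matters later is that the Marshal output is a binary string rather than text.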

So I tried wrapping my data element like this:


    Marshal.dump(res)

to store it (res was the result of the database operation)

and like this to read it back:


    Marshal.load(cache_result.data)

And no matter what I did, it just plain failed. So the usual walk away from the computer, pondering deeply while wandering the halls of my home and looking contemplatively** around, made me realize this: IT IS BINARY, BUTTHEAD!

MySQL's string and text columns aren't meant for raw binary data, so this needed a binary column, which meant a migration change and a db:migrate:redo. So a quick dash back to the migration and I ended up with this:
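A quick way to see why the marshaled data needs a binary column: the bytes Marshal emits are not guaranteed to be valid text. This is a small sketch with a made-up value (an integer standing in for real query data), checking whether the dumped bytes would pass as UTF-8:

```ruby
# Hypothetical value: any Ruby object works, an integer keeps it short.
blob = Marshal.dump(94_142)

# Marshal.dump returns a string tagged as binary.
binary_tagged = (blob.encoding == Encoding::ASCII_8BIT)

# Reinterpret a copy of the same bytes as UTF-8 and ask whether they are legal text.
looks_like_text = blob.dup.force_encoding('UTF-8').valid_encoding?

# Left untouched, the bytes load back to the original value.
restored = Marshal.load(blob)

puts binary_tagged
puts looks_like_text
puts restored
```

Bytes like these are the sort of payload a character-typed column may reject or mangle, which is consistent with the failures described above.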


    class CreateQueryResultsCaches < ActiveRecord::Migration
      def self.up
        create_table :query_results_caches do |t|
          t.string     :q_hash
          t.text       :q
          t.column :data, :binary, :limit => 10.megabyte
          t.timestamps
        end
        add_index :query_results_caches, :q_hash
      end
      def self.down
        remove_index :query_results_caches, :q_hash
        drop_table :query_results_caches
      end
    end

Useful reference on creating blobs via migrations.

And that actually worked! If you noticed the q_hash column, you may be wondering what that is. Given that my queries are long, it's faster to hash the query and then use that hash for the lookup instead of trying to look up on a query that's 500 bytes or longer. Now there are only a few more bits to share.
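As a rough illustration of the hashing idea, here is a sketch using a made-up long query (the SQL text is an assumption, not one of the real queries). However long the query gets, the SHA1 hex digest is always 40 characters, which makes for a short, indexable lookup key:

```ruby
require 'digest/sha1'

# Hypothetical long query standing in for the real 500+ byte SQL.
q = "SELECT id FROM apps WHERE " +
    (1..100).map { |i| "id <> #{i}" }.join(" AND ")

q_hash = Digest::SHA1.hexdigest(q)

puts q.bytesize     # length of the raw query
puts q_hash.length  # length of the lookup key
puts q_hash == Digest::SHA1.hexdigest(q)  # same query, same key
```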

The routine which evaluates the cache result:


    def self.get_latest(q_hash)
      latest = self.find(:first, :conditions => {:q_hash => q_hash}, :order => "created_at DESC")
      # Return the newest entry if it was created within the last 20 minutes;
      # otherwise return nil so the caller runs the real query and stores the result.
      if latest && latest.created_at.between?(20.minutes.ago, Time.now)
        latest
      else
        nil
      end
    end

The two methods on the QueryResultsCache class for fetching from the cache and/or populating the cache, with and without pagination:


    def self.cache_it_or_create_it_by_sql_with_pagination(obj, q, page)
      # Include the page in the key so each page is cached separately.
      # Interpolation copes with a nil, String or Integer page.
      q_hash = Digest::SHA1.hexdigest("#{q}#{page}")
      cache_result = self.get_latest(q_hash)
      if cache_result.nil?
        # Cache miss: run the real query and store the marshaled result.
        res = obj.paginate_by_sql(q, :page => page, :per_page => 40)
        QueryResultsCache.create(:q_hash => q_hash, :data => Marshal.dump(res), :q => q)
      else
        # Cache hit: rebuild the collection from the stored bytes.
        res = Marshal.load(cache_result.data)
      end
      res
    end

And...


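    def self.cache_it_or_create_it_by_sql(obj, q)
      # hexdigest already returns a String, so no to_s is needed.
      q_hash = Digest::SHA1.hexdigest(q)
      cache_result = self.get_latest(q_hash)
      if cache_result.nil?
        # Cache miss: run the real query and store the marshaled result.
        res = obj.find_by_sql(q)
        QueryResultsCache.create(:q_hash => q_hash, :data => Marshal.dump(res), :q => q)
      else
        # Cache hit: rebuild the objects from the stored bytes.
        res = Marshal.load(cache_result.data)
      end
      res
    end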
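For anyone who wants to poke at the fetch-or-populate flow without a database, here is a minimal in-memory sketch of the same idea in plain Ruby. Everything in it is hypothetical: the `TinyQueryCache` class, its `fetch` method and the fake query lambda are illustration only, not part of the application code above. A Hash stands in for the table and the caller passes the "real query" as a block:

```ruby
require 'digest/sha1'

# Hypothetical in-memory stand-in for the query_results_caches table.
class TinyQueryCache
  Entry = Struct.new(:data, :created_at)

  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Returns the cached result for q if it is still fresh; otherwise runs
  # the block, stores its marshaled result, and returns that result.
  def fetch(q, now = Time.now)
    q_hash = Digest::SHA1.hexdigest(q)
    entry = @entries[q_hash]
    if entry && (now - entry.created_at) <= @ttl
      return Marshal.load(entry.data)
    end
    res = yield
    @entries[q_hash] = Entry.new(Marshal.dump(res), now)
    res
  end
end

cache = TinyQueryCache.new(20 * 60)  # 20 minute expiry, in seconds
calls = 0
# Fake "database query" that counts how often it actually runs.
run_query = lambda { calls += 1; [{ 'id' => 1 }, { 'id' => 2 }] }

t0 = Time.now
first  = cache.fetch('SELECT * FROM apps', t0) { run_query.call }            # miss
second = cache.fetch('SELECT * FROM apps', t0 + 60) { run_query.call }       # hit
third  = cache.fetch('SELECT * FROM apps', t0 + 21 * 60) { run_query.call }  # expired

puts calls
```

Passing `now` in explicitly is just a convenience for trying out the expiry logic; the query only runs on the miss and again after the entry has aged past the expiry window.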

As a final note, here's an example of how this is used from a controller:


    @apps = QueryResultsCache.cache_it_or_create_it_by_sql_with_pagination(App,q,params[:page])

Clearly there's more that can be done here but when you find that the built-in Rails caching mechanisms aren't working for you --or-- you feel that stepping out of the framework will teach you something, implementing your own caching approach isn't all that difficult. Learn to use one of the serialization tools and you're off to the races!

*As an aside, I'd point out that the MySQL query cache just ain't all that great but that's another story for another day.

**Ok I went to the can.

153 Characters to Save 700 Megabytes of RAM

Written by Scott Johnson on

I recently had to run a Rake task on a Rails app which generated a CSV file from a table containing 94,000-odd rows (94,142 to be pedantically specific) and while running it, something I normally do overnight, I noticed my laptop freeze solid. After minor cursing, when the machine became responsive again, I checked Activity Monitor to find this:

[Screenshot: Activity Monitor, 2010-07-20 at 9:05 AM]

Yep. That's right -- a single Ruby process using 952 megs of RAM. Oy. And, just to be safe, I confirmed that the Ruby process chewing 952 megabytes was in fact the Rake task. And, unfortunately, it was.

So this brings up the question of what exactly the table looks like. Now since this is a customer's application, I can't give the exact field names but here are the datatypes in the columns:


    +---------------------+
    | Type                |
    +---------------------+
    | bigint(20)          |
    | int(11)             |
    | varchar(255)        |
    | varchar(255)        |
    | varchar(255)        |
    | text                |
    | varchar(255)        |
    | varchar(255)        |
    | varchar(32)         |
    | int(10) unsigned    |
    | int(11)             |
    | int(11)             |
    | double              |
    | double              |
    | tinyint(3) unsigned |
    | tinyint(3) unsigned |
    | tinyint(3) unsigned |
    | datetime            |
    | datetime            |
    | int(10) unsigned    |
    | int(11)             | 
    | int(11)             |
    | double              |
    | double              |
    | double              |
    | double              |
    | int(10) unsigned    |
    | int(11)             |
    | int(11)             | 
    | double              |
    | double              |
    | double              |
    | tinyint(1)          |
    | tinyint(1)          |
    | int(11)             |
    | int(11)             |
    | int(11)             |
    | float               |
    | float               |
    | int(11)             |
    | int(11)             |
    | int(11)             |
    | float               |
    | float               |
    | float               |
    | float               |
    | int(11)             |
    | int(11)             |
    | int(11)             |
    | float               |
    | float               |
    | float               |
    | tinyint(1)          |
    +---------------------+

Yes, it's a big table, but it's not monstrous. Two thoughts came to mind to reduce the memory used by this:

  • Fetch the objects one by one by incrementing the id value. This would work, but it's slow, and since the id isn't solely an AutoIncrement column but a BigInt supplied from an external data source, it won't work well at all.
  • Fetch less data

Being an old school database person, when I first came into the Rails world, I was initially dismayed by the prevalence of "SELECT * FROM table". I've seen all too many times that overly large fetches have a performance cost but, as I worked with Rails, I simply grew accustomed to it. An interesting blog post I found recently, Five ActiveRecord Tips, pointed out the :select parameter which I had never seen.

The idea behind :select is that you supply a SQL string which represents the attributes of the objects you want to fetch (or columns in the row if you're me and old school). Let's say that you want to get only the id of the object, the created_at and the updated_at in a table called apps then your :select would look like this:


    :select => "apps.id, " + "apps.created_at, " + "apps.updated_at "

And that will be injected into your query by ActiveRecord so that only those 3 attributes will be retrieved per object.

Now here is the before and after of the magic 153 characters that saved 731 megabytes:

Before:


    @apps = App.find(:all, :order => 'id ASC')

After:


    @apps = App.find(:all, :select => "apps.id, " + 
                                      "apps.developer_id, " + 
                                      "apps.display_name, " + 
                                      "apps.canvas_name, " +  
                                      "apps.url, " + 
                                      "apps.description, " + 
                                      "apps.api_key ", :order => 'id ASC')

Keep in mind that standard SQL syntax matters so you have to use commas between the elements and no comma after the last element. Spaces also are important.

Now given that this is Ruby, we can make this a bit cleaner. Here's a first pass at that:


    # NOTE: :limit => 10 caps the result at 10 rows; omit it to fetch the whole table as before.
    @apps = App.find(:all, 
                     :select => %w(apps.id apps.developer_id apps.display_name apps.canvas_name apps.url apps.description apps.api_key).join(', '), 
                     :order => 'id ASC', 
                     :limit => 10)

Now given that this is a single table query, we can eliminate the 'apps.' entirely:


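    # Unqualified column names are fine because only one table is involved.
    # NOTE: :limit => 10 caps the result at 10 rows; omit it to fetch the whole table.
    @apps = App.find(:all, 
                     :select => %w(id developer_id display_name canvas_name url description api_key).join(', '), 
                     :order => 'id ASC', 
                     :limit => 10)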
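If it isn't obvious what those %w(...).join(', ') calls hand to ActiveRecord, this small plain-Ruby sketch prints the strings. The `select_list` helper is hypothetical (it is not part of Rails); it just wraps the join and optionally adds the table prefix:

```ruby
# Hypothetical helper: build a :select string from a list of column names,
# optionally qualifying each one with a table name.
def select_list(columns, table = nil)
  columns.map { |c| table ? "#{table}.#{c}" : c.to_s }.join(', ')
end

columns = %w(id developer_id display_name canvas_name url description api_key)

plain    = select_list(columns)
prefixed = select_list(columns, 'apps')

puts plain
puts prefixed
```

Since join only puts the separator between elements, the no-trailing-comma rule takes care of itself.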

So here's our memory utilization after:

[Screenshot: Activity Monitor, 2010-07-20 at 9:54 AM]

Now I'd argue that no matter how hard core a Ruby / Rails person you are, trading off 153 characters for 700 megabytes is a hell of a savings.

153 Characters versus Slim Scrooge

Written by Scott Johnson on

Jason, in the comments on my last post, pointed out that I should look at Slim Scrooge. He's actually both right and wrong and, to me, it's a case of context. For this application I've moved from a development context to a production context. In development, I have no problem using tools (and I've actually looked at Slim Scrooge recently) and I do. Regularly. But this is now live and I've been dealing with LOTS of memory issues. And I mean LOTS of them -- yes, there will be more posts on this topic. And my concern around Slim Scrooge is just that I don't understand it. Yes, it might magically make everything better, but it might not. I opted to go with what I firmly understood even though it might have taken me more time than a plugin. Now, if I had noticed this 4 weeks earlier in the cycle, before we had gone live, I would have been all over Slim Scrooge. For now at least, it's :select to my personal rescue.

But thank you Jason. Slim Scrooge seems to be an excellent option and you were 100% right and on the money to point it out.