Message from 2021
This post is pretty old! Opinions and technical information in it are almost certainly oudated. Commands and configurations will probably not work. Consider the age of the content before putting any of it into practice.
Found the problem making that SQL query from yesterday take so long. As it turns out, there was a quirk in our ad hoc “configuration management” system.
About a week before competition, we started keeping a database pre-loaded with teams and services updated, so that it didn’t strike 9:50 on 4/20 and leave us with a big gob of work to be done in incredibly short order.
Compounded with this, we had been working on the scoreboard pretty heavily over the same time span, trying to speed it up. It was during this period in which the big query (from yesterday) was written, as well as a migration to add indices to a table that needed them:
def self.up add_index(:uptimes, [:team_id, :service_id, :flag_id, :value]) add_index(:uptimes, :service_id) add_index(:uptimes, :team_id) add_index(:uptimes, :flag_id) add_index(:uptimes, :value) end def self.down remove_index(:uptimes, [:team_id, :service_id, :flag_id, :value]) remove_index(:uptimes, :team_id) remove_index(:uptimes, :service_id) remove_index(:uptimes, :flag_id) remove_index(:uptimes, :value) end end
We kept testing the whole week, satisfied with the speed of the scoreboard on the test data that had been accumulating all week just by running the uptime checker/flag placer against a handful of VMs.
Morning of, we’ve got the whole setup from the previous day in a satisfactory condition, so we copy the directory for the database right on top of the one filled with test data.
RAILS_ENV=production rake db:migrate
That’s what we forgot. With the indices above, the big query ran in 0.6 seconds instead of the 6 it was taking on the machine we tested with today.
So even if we do nothing else, the scoreboard next game won’t be as bad, because instead of a cache miss taking 20 seconds (and queueing up 20 seconds worth of cascading cache misses), it’ll now only queue up 1 second of cascading cache misses.
And I think we have a solution for that too.
Prefetching in Rails
The plan right now is to use caching on both the SQL output and page caching. Since we can’t completely negate the penalty of a page cache miss (or the cascading cache misses), we’re planning to also cache SQL output. Since you can easily populate the SQL cache from outside an HTTP request (using
script/runner for example), we can set up the two sets of caches to complement each other, operated by a script parallel to the HTTP service:
- Expire (either by force or a cache that knows expiration) the SQL cache.
- Perform SQL requests that will get cached.
- Expire the page cache.
The next request might still cause a cascading cache miss, but since it’ll all be in-process misses (the data will be cooling its heels in the cache), it shouldn’t bring anything to its knees.
I’m still hoping that this won’t be necessary for our scoreboard, but I’m throwing it out here for criticism and in case somebody else needs it.