We finally hosted the USF CTF 2007 event last Friday. It’s been a few days, so now it’s time for me to post what I’ve thought of it so far.

What I Did

I was responsible for the privatemessage (Rails), profile (Rails), and psifertex (Camping) services. I also created the file recovery (hatten_ar_din.dmg) challenge. If you desperately need hints for these, please contact me privately (I don’t want to spoil them publicly until at least a year has passed). Otherwise, feel free to comment here.

I also worked on small parts of the scoreboard, including the interaction model for the flag depositor (nc 31336) and research for other parts of the system.

Did Like

The scripts scoring my services seemed to work without issue. We spent an awful lot of time beforehand troubleshooting these, and originally I wasn’t sure they’d hold up.

My services seemed to be about where I wanted them to be difficulty-wise. Psifertex and Privatemessage were relatively easy, both to attack and defend. Profile was more subtle, with only one team actually stealing more than two flags from it. While defg, the Georgia Tech team, was able to score on it reliably, I’m not certain they ever scripted an attack on it (our baseline for granting a breakthrough bonus).

The file recovery challenge was never solved. I’m not going to talk about this much publicly, because I still don’t think anybody has recovered the password from it.

Did Not Like

The scoreboard was SLOW. By the end of the game, even after we had taken all but a single request off of it, the overview page was taking ten seconds to render, almost all of that spent waiting on one massive query:

    def count_all(services)
      # Per-team totals. `t` is the alias for teams.id in the outer SELECT.
      total_o_query = "IFNULL((SELECT COUNT(DISTINCT captures.id) FROM captures LEFT OUTER JOIN flags ON flags.id = captures.flag_id WHERE captures.team_id = t),0) AS total_o"
      total_d_query = "IFNULL((SELECT SUM(uptimes.value) FROM uptimes WHERE team_id=t),0) AS total_d"
      total_bonus_query = "IFNULL((SELECT SUM(amount) FROM bonuses WHERE team_id=t),0) AS bonuses"

      # Timestamps of each team's most recent capture and uptime record.
      last_capture = "IFNULL((SELECT UNIX_TIMESTAMP(time) FROM captures WHERE captures.team_id = t ORDER BY captures.time DESC LIMIT 0,1),0) AS l_cap"
      last_defend = "IFNULL((SELECT UNIX_TIMESTAMP(time) FROM uptimes WHERE team_id = t ORDER BY time DESC LIMIT 0,1),0) AS l_def"

      final_query = "SELECT teams.longname, IFNULL(teams.ip,'') AS ip, teams.id AS t, teams.shortname AS sname, #{total_o_query}, #{total_d_query}, #{total_bonus_query}"
      final_query = final_query + ", #{last_capture}, #{last_defend}"

      # Two more correlated subqueries per service: offense and defense.
      services.each do |s|
        service_o_query = "IFNULL((SELECT COUNT(captures.id) FROM captures,flags WHERE service_id=#{s.id} AND flags.id = captures.flag_id AND captures.team_id=t GROUP BY captures.team_id),0) AS #{s.shortname}_o"
        service_d_query = "IFNULL((SELECT SUM(uptimes.value) FROM uptimes WHERE service_id=#{s.id} and team_id=t),0) AS #{s.shortname}_d"
        final_query = final_query + ", #{service_o_query}, #{service_d_query}"
      end

      final_query = final_query + " FROM teams ORDER BY (SELECT SUM(total_o+total_d+bonuses)) DESC, t ASC"

      result = ActiveRecord::Base.connection.select_all(final_query)
      # Check nil? first so a nil result cannot raise on empty?.
      return result unless result.nil? or result.empty?
    end

Ten seconds for a page is crap, and we anticipated page loads being a problem, so we configured the scoreboard to use page caching for anything world-facing. However, a confluence of conditions (+4 alliteration bonus) broke this.

  • A request that hits a cached page will bypass Rails entirely. Conversely, a request that is a cache miss will hit Rails. This is by design.
  • The only way to expire a cached page is deletion.
  • Rails writes the cached page at the end of a request, because the information necessary to write it isn’t available beforehand.
  • If two requests for the same uncached page arrive close enough in time that the second one starts before the first one finishes, both will miss.
  • Mongrel instances can only handle a certain number of requests each, to prevent a flood of requests from using all the server resources.
  • When Mongrel runs out of free request “slots”, the oldest request gets closed before completion.
  • When your SQL server takes 50 seconds to fulfill a query, your Mongrel will probably close that request before it finishes.
  • So if you keep getting hit, the page never writes out and you always have misses driving up load on SQL.
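
To make that chain of events concrete, here is a toy model of it in Ruby. It is only a sketch of the behavior described in the list, not code from the scoreboard: the method name, the fixed render time, and the "drop the oldest request when the slots are full" rule are simplifying assumptions, and it ignores the fact that real query times grow with load.

```ruby
# Toy model of the page-cache pile-up (hypothetical helper, not scoreboard code).
# arrivals:       times (in seconds) at which requests for the same page arrive
# render_seconds: how long one uncached render takes (assumed constant)
# slots:          how many requests may be in flight before the oldest is dropped
# Returns [misses, cached_at], where cached_at is the time the cached page
# landed on disk, or nil if it had not been written by the last arrival.
def simulate_page_cache(arrivals, render_seconds, slots)
  in_flight = []   # start times of requests still rendering
  cached_at = nil
  misses = 0

  arrivals.sort.each do |now|
    # Requests that finished rendering by `now` write the cached page.
    finished, in_flight = in_flight.partition { |start| start + render_seconds <= now }
    finish_times = finished.map { |start| start + render_seconds }
    cached_at = ([cached_at] + finish_times).compact.min

    next if cached_at  # cache hit: served from disk, Rails never sees it

    # Cache miss: the request reaches Rails and starts the slow render.
    misses += 1
    in_flight << now
    # Out of slots: the oldest request is closed before it can write the page.
    in_flight.shift if in_flight.size > slots
  end

  [misses, cached_at]
end

# Plenty of slots: four early requests all miss, then the page is cached.
p simulate_page_cache([0, 1, 2, 3, 12], 10, 10)   # => [4, 10]

# Two slots and one request per second: nothing ever finishes.
p simulate_page_cache((0..30).to_a, 10, 2)        # => [31, nil]
```

In the second run every request is a miss and the cached page is never written, which is the situation we ended up in.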

The ghetto competition way we fixed this was putting plain ol’ Apache on one machine, and using wget to fetch its pages from Mongrel. We could have it wait 20 seconds between requests, and this reduced load allowed queries to finish in less than 10 seconds, while still allowing the depositor and scoring scripts to continue their business.
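
The same idea can be sketched in Ruby instead of wget. This is an illustration, not what we ran during the game: the method names, the page list, and the docroot are all hypothetical, and the fetcher is passed in as a callable so the sketch does not assume any particular URL or port.

```ruby
require "fileutils"

# Write `body` to `path` via a temp file and a rename, so the static web
# server never serves a half-written page. (Hypothetical helper.)
def publish_snapshot(path, body)
  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.tmp.#{Process.pid}"
  File.write(tmp, body)
  File.rename(tmp, path)
end

# Fetch each page once and publish it under `docroot`.
# `fetch` is any callable that takes a page name and returns HTML,
# or nil/"" when the fetch failed.
def snapshot_pages(fetch, pages, docroot)
  pages.each do |page|
    body = fetch.call(page)
    next if body.nil? || body.empty?  # keep the previous snapshot on failure
    publish_snapshot(File.join(docroot, page), body)
  end
end

# Possible driver loop (the address and paths below are made up):
#   require "net/http"
#   fetch = ->(page) { Net::HTTP.get(URI("http://127.0.0.1:3000/#{page}")) }
#   loop do
#     snapshot_pages(fetch, ["index.html"], "/var/www/scoreboard")
#     sleep 20
#   end
```

The point of the design is that only one client ever asks Mongrel for the page, so the slow query runs at most once every 20 seconds no matter how many people are hitting refresh.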

I’d like to fix this by separating out the monolithic SQL into smaller, maintainable queries and caching the output of those with a smarter algorithm (maybe one that generates the next result prior to the previous one expiring).
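
As a rough sketch of what "generate the next result before the previous one expires" could look like, here is a small refresh-ahead cache in plain Ruby. Nothing here exists in the scoreboard yet: the class name, the `ttl` and `refresh_margin` parameters, and the idea of registering one producer block per small query are all assumptions for illustration.

```ruby
# Refresh-ahead cache sketch (hypothetical; not part of the scoreboard).
# Readers only ever read stored values. A single background worker calls
# refresh_due, which recomputes entries shortly before they would expire.
class RefreshAheadCache
  Entry = Struct.new(:value, :computed_at)

  # ttl:            seconds a value is considered fresh
  # refresh_margin: recompute this many seconds before the ttl runs out
  # clock:          callable returning the current time in seconds
  def initialize(ttl:, refresh_margin:, clock: -> { Time.now.to_f })
    @ttl = ttl
    @refresh_margin = refresh_margin
    @clock = clock
    @producers = {}
    @entries = {}
  end

  # Register a producer block for a key, e.g. one small SQL query.
  def register(key, &producer)
    @producers[key] = producer
  end

  # Never runs a producer, so page requests cannot pile up on the database.
  def read(key)
    entry = @entries[key]
    entry && entry.value
  end

  # Recompute every entry that is missing or within refresh_margin of expiry.
  def refresh_due
    now = @clock.call
    @producers.each do |key, producer|
      entry = @entries[key]
      next if entry && now - entry.computed_at < @ttl - @refresh_margin
      @entries[key] = Entry.new(producer.call, now)
    end
  end
end

# Usage with a stand-in producer; a real one would run one of the small queries.
cache = RefreshAheadCache.new(ttl: 30, refresh_margin: 10)
cache.register(:team_totals) { { "defg" => 0 } }  # placeholder data
cache.refresh_due
p cache.read(:team_totals)
```

With this shape, a page render is just a handful of `read` calls, and the database sees a fixed number of queries per refresh interval regardless of traffic.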

Other than that, it was a good game, and I’d like to thank everybody who put up with the slowness and had a good time. We plan on making this an annual event, so that should give us plenty of time to tighten up the issues that we had.