Roses are Red, Bombs Explode; How I Failed to Test My Own Code

After my RbCAW-related tasks are (mostly) done on the last sprint, my current responsibility as a back-back-end developer is to implement caching so hopefully our homepage will load in less than half a minutecrosses finger. Unfortunately, testing the caching mechanism is a pain in the back.

Our current data flow schema is as follows:
phab flow“What is that ‘Phrefetch’ thing?” you may ask. Following the silly module name that I gave to the API wrapper, ‘Phrefetch’ is what I call the caching mechanism. Fancy, eh?

So, TL;DR: Phrefetch fetch information such as Projects, Tasks, and Users from Phabricator through RbCAW and store those in the database. The magic that is frontend will then turn the raw DB entries into grid of colored pixels that is user interface. We use PostgreSQL as relational DB is more suitable on storing the information. Also, it is free on Heroku.

As almost every information on each table (Projects, Tasks, and Users) are fetched on one query, cache invalidation is performed not based on expiry but on a set interval. That means Phrefetch will need to run alongside Rails’ web server (Puma) and invoke the caching logic every K seconds or so.

This calls for concurrency, where a set of code runs alongside the others in parallel. Phrefetch will run as a background job, and is started right after Rails had finished initializing. I use concurrency-ruby as its TimerTask is exactly what I need.

Logically, Phrefetch can only be instantiated once, so I applied the singleton design pattern with the help of Ruby’s Singleton mixin.

require 'concurrent'
require 'singleton'

module Phrefetch
  class Phrefetch
    include Singleton

    def initialize
      @running = false
      @task = nil

      # Do other stuff
    end

    def start
      raise StandardError, 'Phrefetch already running' if running?

      @running = true

      @task = Concurrent::TimerTask.new(run_now: true) do |task|
        # Do the caching
      end

      @task.execute
    end

    def stop
      return unless running?

      @task.shutdown
      @running = false
    end

    def restart
      stop
      start
    end

    def running?
      return @running
    end
  end
end

That’s the basic outline of Phrefetch. The Concurrent::TimerTask will spawn a new thread when it’s invoked. As the caching logic will do HTTP requests and DB operations, it is bound to be rather slow, so I also placed a safeguard using Timeout::timeout() to forcefully stop the caching logic if it took too long so it won’t block the next scheduled TimerTask from running.

I’ve been rambling for 386 words and you might be wondering “What’s the problem then? It seems to be okay”.

Wrong.

I wanted to test that the periodical caching works properly: 1) it can be started, 2) it will finish if everything went well, and 3) it can be shut down and restarted properly. The problem lies in #2, we don’t really know how long the caching will run before it is finished.

So I opted for the simplest solution: sleep while the task is not done or timed out yet. It actually works fine until it does not — the test will randomly fail due to the caching took way too long to finish. This happens irregularly — it might fail out of nowhere and somehow finish properly after I put a debugging breakpoint.

Selection_001
“You thought this won’t raise an error, but it was me, Timeout:Error!”

This raised some questions because the tests for caching logic (without the concurrency stuff) don’t take that long. Even more questions and eyebrows were raised as stubbing the ActiveRecord methods ain’t do nothing.

Selection_002.png
I stubbed the User.find method and placed a breakpoint right before the ActiveRecord is saved into the DB. Guess what? It did not work.

After some research, apparently what I was trying to do is kind of redundant because RSpec stubbed ActiveRecord methods by default. Putting two and two together, this is why the caching logic tests run faster than the concurrency tests: RSpec failed to stub the ActiveRecord methods on the concurrency tests, thus those tests will do a real write operation to the database. Read/Write operation on the database are slower than what RSpec stubs do, which I suspect is using in-memory database akin to SQLite as a substitute to real DB.

pjimage
Left: User table during concurrency tests. Right: User table during caching logic tests.

So another TL;DR: concurrency tests might be slow because those involve the database directly. Why? Because threads.

Apparently mocking libraries like Mocha or the one used by RSpec are not thread safe. It means that what I mocked or stubbed on the main testing thread will not be mocked/stubbed on another thread. Great.

So even if RSpec stubbed the ActiveRecord magics on its own thread, the DB operations which inside the Concurrent::TimerTask will not be stubbed, thus accessing the DB directly.

pjimage (1).jpg
Top: on the same thread as RSpec’s thread (#2), stubbed method works properly. Bottom: inside the Concurrent::TimerTask thread (#31), stubbing don’t work.

After hours and hours of googling, I’ve yet to found a solution to this problem.

Some suggest using NullDB to mock the actual database rather than just the DB operation methods. But apparently, NullDB have compatibility issue with PostgreSQL’s schema.

Some suggestion revolves on mocking the multithreading completely, leaving only the test about logic that happen inside the concurrent thread. But, that defeats the whole point of my test — to ensure the background job works properly. Also, I already done  the synchronous test with the caching logic tests.

The moral of this story is: I suck at googling. I believe there must be some solution to this problem, but alas I’ve not find it yet. So, until then I think we will have to bear with restarting the RSpec tests whenever that one concurrency test fail. Peculiarly, this have not happened yet on our Gitlab’s pipeline, but I think it will, sooner or later.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s