Fix a bunch of races in the tests
Once we switch to bluebird, suddenly a load of timing issues come out of the
woodwork. Basically, we need to try harder when flushing requests. Bump to
matrix-mock-request 1.1.0, which provides `flushAllExpected`, and waits for
requests to arrive when given a `numToFlush`; then use `flushAllExpected` in
various places to make the tests more resilient.