Re-Implementation of “BQ: A Lock-Free Queue with Batching” Using MR Locks



Introduction
This implementation is based on a publication by Gal Milman, Alex Kogan, Yossi Lev, Victor Luchangco, and Erez Petrank in 2018. [1] Titled "BQ: A Lock-Free Queue with Batching", they describe the data structure's functionality and correctness, along with the previous work it is based on. [1] All implementation details are from this paper unless otherwise stated. In this series of papers, we will be reimplementing and performance testing their data structure. This first paper covers an initial implementation involving no concurrency, using an MR Lock [2] to restrict threads' access to the data structure. This ensures correctness as the operations on the queue are guaranteed to be mutually exclusive.
Our source code is available here: https://github.com/ClayDiGiorgio/ParallelTeam24
To compile the following implementations, use the command-line arguments below:

Description of the Data Structure
A BQ is a thread-safe queue that differs from a normal thread-safe queue in that it supports batch operations. Batch operations work as follows: a thread reports to the queue that it would like to enqueue or dequeue a value at some point. The queue returns a log (known as a "future") of this request. When the thread is ready to execute those future operations (or when it wants to enqueue or dequeue immediately), the queue performs the pending requests with the help of other threads. The original thread can then retrieve the results of its operations from the futures originally returned by the queue. [1]

Structures
The queue is stored in a linked list that maintains globally accessible head and tail pointers. It also maintains counts at the head and tail to keep track of the size of the list. The head node in this structure is a sentinel node, meaning that the data stored in it is not part of the queue itself. [1]
Each thread has its own ThreadData struct, which contains its own collection of operations and its own linked list of enqueued elements. It also tracks the number of excessDequeues, which is used to ensure that linearizability is maintained. [1]

List of BQ's Public Methods
enq() Immediate enqueue. Executes the stored batch before executing itself.
deq() Immediate dequeue. Executes the stored batch before executing itself.
futureEnq() Report to the queue that an enqueue will happen at some point. Returns a log (future) of this notification.
futureDeq() Report to the queue that a dequeue will happen at some point. Returns a log (future) of this notification.
execute() Executes all of the calling thread's pending future operations as a batch.

BQLocked.cpp
The modification of the BQ data structure to work non-concurrently with an MR Lock is our original work, though the BQ and MR Lock definitions are by their respective authors. [1] [2] BQLocked.cpp is a near carbon copy of BQSequential.cpp, with a few alterations to implement locking using the provided mrlock.h and bitset.h header files. [2] The init() function has been updated to instantiate the MRLock class with two resources. As there are only two resources that require locking, the head and the tail, the resources are controlled through an enum, LockType, defined in BQStructs.h. This enum maps head, tail, and both to 1, 2, and 3, respectively. enq() and deq() now lock TAIL and HEAD, respectively, whereas a call to execute(), which handles batch operations, locks BATCH (both head and tail).

Performance
BQLocked.cpp and BQSequential.cpp also include a main() function with code for performance testing. In both, we initialize an array of values from which enqueue operations can select. Each test randomly chooses, 400 times, between performing a single enqueue, a single dequeue, or a batch of futures. For BQSequential, this is done in the main function; in BQLocked, each of the 4 spawned threads performs a quarter of these operations.
Whenever a batch is performed, a randomized sequence of 10 enqueue and dequeue Futures is created and executed. It is impossible to preallocate Futures prior to testing without substantially altering the data structure: with pre-allocation, threads would be able to update and change the Futures of other threads. As such, Futures are allocated as futureEnq() and futureDeq() are called, as the structure defines, and the execute() function executes all of the calling thread's futures.
Each main() method was called 10 times, and the average runtimes for each are graphed to the left.
As we can see, the sequential version of the queue performed substantially better than the parallel version. We attribute this to the wait time introduced by locking. However, the wait-time overhead introduced by locking shrinks as the number of operations increases: while both implementations scale more or less linearly, the sequential implementation has a steeper slope.