Recent hardware-aware algorithms for finite-element (FE) matrix-vector multiplication suggest that evaluating matrix-vector products on the fly, without storing the cell-level matrices, reduces both the arithmetic complexity and the data-access costs. Current implementations of such matrix-free algorithms handle only a single vector and are not readily applicable to matrix-multivector products. We propose an efficient implementation procedure for the matrix-free algorithm to compute FE-discretized matrix-multivector products on hybrid CPU-GPU architectures.