We introduce a novel transpiler for the placement and routing of quantum circuits on arbitrary target hardware architectures. We use finite-horizon, and optionally discounted, reward functions to heuristically find a suitable placement and routing policy. When several policies tie, we employ a finite lookahead to refine the reward functions and break the tie. We benchmark our transpiler against multiple alternative solutions on various test sets of quantum algorithms to demonstrate the benefits of our approach.
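
As a rough illustration of the idea summarized above, the following sketch scores candidate SWAPs with a finite-horizon, optionally discounted reward and extends the horizon by a finite lookahead only when candidates tie. It is not the reward function of this work: the reward used here (reduction in coupling-graph distance between the qubits of upcoming two-qubit gates), the names `distances`, `discounted_reward`, and `choose_swap`, and the parameters `horizon`, `lookahead`, and `gamma` are all assumptions made for the example.

```python
from collections import deque


def distances(adjacency):
    """All-pairs shortest-path lengths (BFS) on an unweighted coupling graph."""
    dist = {}
    for src in adjacency:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[src] = seen
    return dist


def discounted_reward(layout, swap, gates, dist, horizon, gamma=1.0):
    """Hypothetical reward: discounted distance reduction over the next `horizon` gates.

    `layout` maps logical qubits to physical qubits; `swap` is a pair of
    physical qubits; `gates` is the ordered list of upcoming two-qubit gates.
    """
    a, b = swap
    # Layout that would result from applying the candidate SWAP.
    swapped = {q: (b if p == a else a if p == b else p) for q, p in layout.items()}
    total = 0.0
    for t, (q1, q2) in enumerate(gates[:horizon]):
        before = dist[layout[q1]][layout[q2]]
        after = dist[swapped[q1]][swapped[q2]]
        total += gamma ** t * (before - after)
    return total


def choose_swap(layout, gates, adjacency, horizon=1, lookahead=1, gamma=0.5):
    """Pick the best SWAP; on a tie, re-score the tied SWAPs with a longer horizon."""
    dist = distances(adjacency)
    # Every edge of the coupling graph is a candidate SWAP.
    swaps = sorted({tuple(sorted((u, v))) for u in adjacency for v in adjacency[u]})
    scores = {s: discounted_reward(layout, s, gates, dist, horizon, gamma) for s in swaps}
    best = max(scores.values())
    tied = [s for s in swaps if scores[s] == best]
    if len(tied) > 1:
        # Finite lookahead: refine the reward only to break the tie.
        tied.sort(
            key=lambda s: discounted_reward(layout, s, gates, dist, horizon + lookahead, gamma),
            reverse=True,
        )
    return tied[0]


# Four physical qubits on a line: 0 - 1 - 2 - 3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
layout = {"a": 0, "b": 1, "c": 2, "d": 3}
gates = [("a", "d"), ("b", "d")]
chosen = choose_swap(layout, gates, adjacency, horizon=1, lookahead=1, gamma=0.5)
```

In this toy instance, SWAPs on edges (0, 1) and (2, 3) both bring `a` and `d` one step closer, so they tie at horizon 1. The lookahead also considers the second gate, which favors (2, 3) because it moves `d` toward `b`.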