The discovery of the backpropagation algorithm ranks among the most important moments in the history of machine learning, and has made possible the training of large-scale neural networks by computing gradients at roughly the same computational cost as model evaluation. Despite its importance, an analogous backpropagation-like scaling for the gradient evaluation of parameterised quantum circuits has remained elusive. Currently, the best known method requires sampling from a number of circuits that scales with the number of circuit parameters, making the training of large-scale quantum circuits prohibitively expensive in practice. Here we address this problem by introducing a class of structured circuits that admit gradient estimation with significantly fewer circuits. In the simplest case, in which the parameters feed into commuting quantum gates, these circuits allow for fast estimation of the gradient, higher-order partial derivatives and the Fisher information matrix, and are not expected to suffer from the problem of vanishing gradients. Moreover, there exist classes of parameterised circuits for which the cost of gradient estimation scales in line with classical backpropagation, and which can thus be trained at scale. In a toy classification problem on 16 qubits, such circuits show performance competitive with other methods, while reducing the training cost by about two orders of magnitude.
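To make the scaling problem concrete (this illustrates the generic baseline, not the paper's construction): the standard parameter-shift rule estimates each partial derivative from two shifted circuit evaluations, so the full gradient of an M-parameter circuit costs 2M evaluations, in contrast to the roughly constant-factor overhead of a classical backward pass. Below is a minimal NumPy sketch on a single-qubit toy circuit; the circuit layout and all names are illustrative assumptions.

```python
import numpy as np

# Pauli matrices used as gate generators and as the measured observable.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(gen, angle):
    # exp(-i * angle/2 * gen) for a Pauli generator (gen @ gen == identity).
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * gen

def expval(theta):
    # One "circuit evaluation": prepare |0>, apply alternating
    # RX/RZ rotations, and return the expectation value <Z>.
    state = np.array([1, 0], dtype=complex)
    for k, angle in enumerate(theta):
        state = rot(X if k % 2 == 0 else Z, angle) @ state
    return float(np.real(state.conj() @ Z @ state))

def parameter_shift_grad(theta):
    # Exact gradient for Pauli-generated gates via the parameter-shift rule:
    # df/dtheta_j = [f(theta_j + pi/2) - f(theta_j - pi/2)] / 2.
    # Cost: 2 * len(theta) circuit evaluations for the full gradient.
    grad = np.zeros(len(theta))
    for j in range(len(theta)):
        shift = np.zeros(len(theta))
        shift[j] = np.pi / 2
        grad[j] = 0.5 * (expval(theta + shift) - expval(theta - shift))
    return grad

theta = np.array([0.3, 1.2, -0.7, 0.5])
print(parameter_shift_grad(theta))  # uses 2 * 4 = 8 circuit evaluations
```

On hardware each evaluation is itself estimated from samples, so the factor of 2M multiplies the total shot budget; circumventing this per-parameter overhead is what a backpropagation-like scaling would provide.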