Quantum interfaces between photons and ensembles of atoms have emerged as powerful tools for quantum technologies. A major objective for such interfaces is high-fidelity storage and retrieval of a photon in a collective quantum state of many atoms. This requires long-lived collective superposition states, which are typically achieved with immobilized atoms. Thermal atomic vapors, which present a simple and scalable resource, have so far only been used for continuous-variable processing, or for discrete-variable processing on short time scales where atomic motion is negligible. We develop a theory based on the concept of motional averaging to enable room-temperature discrete-variable quantum memories and coherent single-photon sources. The atoms are kept under spin-protecting conditions, and the interaction time is chosen long enough that each atom crosses the light beam several times during the interaction. We show that this choice, combined with suitable spectral filtering, erases the "which atom" information and yields an efficient and homogeneous coupling between all atoms and the light. Heralded single excitations can thus be created and stored as collective spin waves, which can later be read out to produce coherent single photons in a scalable fashion. We demonstrate the feasibility of this approach to scalable quantum memories with a proof-of-principle experiment using room-temperature atoms contained in microcells with a spin-protecting coating, placed inside an optical cavity. The experiment is performed under conditions corresponding to a few photons per pulse and clearly demonstrates a long coherence time of the forward-scattered photons, which is the essential feature of motional averaging.
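The motional-averaging idea described above can be illustrated with a rough numerical sketch. It is a toy model and not the theory developed in this work. It assumes a one-dimensional cell, a Gaussian beam profile centered in the cell, and diffuse wall collisions that preserve the spin state. The function name `coupling_spread` and all parameter values (`cell`, `waist`, `v_th`, `dt`) are hypothetical, chosen only for illustration. The sketch computes the time-averaged coupling of each atom to the beam and returns the relative spread of that average across the ensemble. A small spread means the atoms couple nearly identically to the light, so the "which atom" information is suppressed.

```python
import math
import random


def coupling_spread(n_steps, n_atoms=100, cell=1.0, waist=0.2,
                    v_th=1.0, dt=0.01, seed=0):
    """Relative spread (std/mean) of the time-averaged coupling across atoms.

    Toy 1D model with assumed, dimensionless parameters:
    cell  - cell width, waist - beam waist, v_th - thermal velocity scale,
    dt    - time step, n_steps * dt - total interaction time.
    """
    rng = random.Random(seed)
    center = cell / 2.0
    averages = []
    for _ in range(n_atoms):
        x = rng.uniform(0.0, cell)   # random starting position in the cell
        v = rng.gauss(0.0, v_th)     # thermal (Maxwell) velocity in 1D
        total = 0.0
        for _ in range(n_steps):
            x += v * dt
            if x < 0.0 or x > cell:
                # Diffuse wall collision: redraw the speed from a
                # flux-weighted (Rayleigh) distribution, heading back inside.
                speed = v_th * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
                if x < 0.0:
                    x, v = 0.0, speed
                else:
                    x, v = cell, -speed
            # Gaussian beam profile centered in the cell (assumed form).
            total += math.exp(-((x - center) ** 2) / waist ** 2)
        averages.append(total / n_steps)
    mean = sum(averages) / n_atoms
    var = sum((a - mean) ** 2 for a in averages) / n_atoms
    return math.sqrt(var) / mean


if __name__ == "__main__":
    # Interaction much shorter than a beam transit: atoms are nearly frozen.
    print("short interaction:", coupling_spread(n_steps=10))
    # Interaction spanning many beam crossings: the coupling averages out.
    print("long interaction: ", coupling_spread(n_steps=5000))
```

In this sketch the short interaction leaves each atom close to its starting position, so the coupling varies strongly from atom to atom. The long interaction lets each atom sample the whole cell, so the time-averaged couplings converge toward a common value. Spectral filtering and the cavity are not modeled here.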