Although the performance of 3D human shape reconstruction methods has improved considerably in recent years, most methods focus on a single person, reconstruct a root-relative 3D shape, and rely on ground-truth absolute depth to convert the reconstruction result to the camera coordinate system. In this paper, we propose an end-to-end learning-based model for single-shot 3D multi-person shape reconstruction in the camera coordinate system from a single RGB image. To reconstruct the 3D shapes of multiple persons in a single shot, our network produces output tensors divided into grid cells, where each grid cell contains information about a subject. Moreover, our network predicts the absolute position of the root joint while reconstructing the root-relative 3D shape, which enables it to reconstruct the 3D shapes of multiple persons in the camera coordinate system without ground-truth depth. The proposed network can be trained end to end and processes images at about 37 fps, performing 3D multi-person shape reconstruction in real time.
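To illustrate why predicting the absolute root position is sufficient to place each person in the camera coordinate system, the following is a minimal sketch and not the network's actual implementation. It assumes that each root-relative shape is expressed as offsets from that person's root joint, in the same units and axis orientation as the camera frame. The function name `to_camera_coords` and the array shapes are hypothetical choices made for this example.

```python
import numpy as np


def to_camera_coords(root_relative, root_position):
    """Translate root-relative 3D points into the camera coordinate system.

    root_relative: array of shape (P, N, 3), N points for each of P persons,
                   given as offsets from that person's root joint (assumed).
    root_position: array of shape (P, 3), absolute root-joint position of
                   each person in camera coordinates (assumed).
    """
    root_relative = np.asarray(root_relative, dtype=np.float64)
    root_position = np.asarray(root_position, dtype=np.float64)
    # Broadcast each person's root position over all of that person's points.
    return root_relative + root_position[:, None, :]


# Two persons with four points each; values are made up for illustration.
shapes = np.zeros((2, 4, 3))
shapes[:, 1, 0] = 0.1  # one point offset 0.1 along x from the root
roots = np.array([[0.0, 0.0, 3.0],
                  [1.0, 0.0, 5.0]])
camera_shapes = to_camera_coords(shapes, roots)
```

Under these assumptions the conversion is a per-person translation, so no ground-truth depth is needed once the root position has been predicted.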