The approximation of a discrete probability distribution \(\mathbf{t}\) by an \(M\)-type distribution \(\mathbf{p}\) is considered. The approximation error is measured by the informational divergence \(\mathbb{D}(\mathbf{t}\Vert\mathbf{p})\), which is an appropriate measure, e.g., in the context of data compression. Properties of the optimal approximation are derived, and asymptotically tight bounds on the approximation error are presented. It is shown that the \(M\)-type approximations that minimize \(\mathbb{D}(\mathbf{t}\Vert\mathbf{p})\), \(\mathbb{D}(\mathbf{p}\Vert\mathbf{t})\), or the variational distance \(\Vert\mathbf{p}-\mathbf{t}\Vert_1\) can all be found by specific instances of the same general greedy algorithm.
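
As an illustration of the greedy idea for the \(\mathbb{D}(\mathbf{t}\Vert\mathbf{p})\) case, the following is a minimal sketch and not necessarily the exact algorithm of the paper. It assumes that an \(M\)-type distribution has the form \(p_i = c_i/M\) with nonnegative integers \(c_i\) summing to \(M\), that \(M\) is at least the number of nonzero entries of \(\mathbf{t}\) (otherwise the divergence is infinite), and that natural logarithms are used. The names `greedy_m_type` and `divergence` are hypothetical. Each of the \(M\) unit masses is assigned to the index whose increment reduces the divergence the most.

```python
import heapq
import math


def greedy_m_type(t, M):
    """Return integer counts c with sum(c) == M; p_i = c_i / M approximates t.

    Assumption: minimizing D(t||p) over M-type p is equivalent to maximizing
    sum_i t_i * log(c_i), so the gain of raising c_i by one is
    t_i * log((c_i + 1) / c_i), which is infinite when c_i == 0 and t_i > 0.
    """
    support = [i for i, ti in enumerate(t) if ti > 0]
    if M < len(support):
        raise ValueError("M must be at least the support size of t")

    # The infinite first gains mean every support index gets one unit first.
    c = [0] * len(t)
    for i in support:
        c[i] = 1

    # Max-heap of the next gains, stored negated because heapq is a min-heap.
    heap = [(-t[i] * math.log(2.0), i) for i in support]
    heapq.heapify(heap)

    for _ in range(M - len(support)):
        _, i = heapq.heappop(heap)
        c[i] += 1
        heapq.heappush(heap, (-t[i] * math.log((c[i] + 1) / c[i]), i))
    return c


def divergence(t, c, M):
    """Informational divergence D(t||p) in nats for p_i = c_i / M."""
    return sum(ti * math.log(ti / (ci / M)) for ti, ci in zip(t, c) if ti > 0)


if __name__ == "__main__":
    t = [0.5, 0.25, 0.25]
    c = greedy_m_type(t, 4)
    print(c, divergence(t, c, 4))
```

Because the per-index gains decrease as \(c_i\) grows, always taking the largest available gain is a natural choice here; the variants for \(\mathbb{D}(\mathbf{p}\Vert\mathbf{t})\) and the variational distance would use different gain functions, which this sketch does not cover.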