Costa Rica has one of the highest densities of species biodiversity in the world. More than 2,000 tree species have already been identified, many of which are used in the building, furniture, and packaging industries (Grayum et al. 2003). This rich diversity makes the correct identification of tree species very difficult. As a result, timber in the national market is commonly sold under mistaken identifications, which makes quality control particularly challenging. In addition, because 90 timber tree species have been classified as “threatened” in Costa Rica, correct identification is indispensable for law enforcement. The traditional system for tree species identification is based on macroscopic and microscopic evaluation of wood anatomy. It entails assessing anatomical features such as patterns of vessels, parenchyma, and fibers. Typically, cut pieces of wood measuring 7.7 x 10 cm are used to identify the tree species (Pan and Kudo 2011, Yusof et al. 2013). However, assessing these features is extremely difficult for taxonomists because wood properties can vary considerably with environmental conditions and intra-specific genetic variability. Deep learning techniques have recently been used to identify plant species (Carranza-Rojas et al. 2017a, Carranza-Rojas et al. 2017b) and are potentially useful for detecting subtle differences in patterns of vessels, parenchyma, and other anatomical features of wood. However, these techniques require a large collection of macroscopic photographs of individuals from various parts of the country (Pan and Kudo 2011). As a first step in the application of deep learning techniques, we have defined a formal, standard protocol for collecting wood samples, physically processing them, taking pictures, performing data augmentation, and using metadata to provide the primary data necessary for deep learning applications.
Whereas traditional xylotheque sampling methods destroy trees or use wood from fallen trees, we propose a method that extracts small samples of sufficient quality for anatomical characterization without affecting the growth and survival of the individual. This study has been carried out in three permanent forest plots in Costa Rica, all of which have historical growth data covering the last 20 years. We have so far evaluated 40 species (10 individuals per species) with diameters greater than 20 cm. From each individual, a cylindrical sample 12 mm in diameter and 7.5 cm in length is extracted with a cordless drill. Each sample is then cut into five 8 x 8 x 8 mm cubes and further processed to yield three outputs: curated xylotheque samples, a dataset with all relevant metadata and original images, and a dataset with images obtained by performing data augmentation on the original images.
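The data augmentation step can be illustrated with a minimal sketch. The abstract does not specify which transformations the protocol applies, so the example below assumes the simplest label-preserving ones for a square image of a wood cube face: the four 90-degree rotations and their mirror images. The metadata fields (`species`, `individual_id`, `cube_index`, `plot`) and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A grayscale image as rows of pixel values (stand-in for a real photograph).
Image = List[List[int]]


@dataclass(frozen=True)
class CubeRecord:
    """Hypothetical metadata attached to one wood cube image."""
    species: str
    individual_id: str
    cube_index: int  # position of the cube along the extracted core
    plot: str


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def augment(img: Image) -> List[Image]:
    """Return 8 variants: each of the 4 rotations plus its mirror image."""
    variants: List[Image] = []
    current = [list(row) for row in img]  # copy so the input is not mutated
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


def augment_with_metadata(
    record: CubeRecord, img: Image
) -> List[Tuple[CubeRecord, int, Image]]:
    """Pair every variant with its source record and a variant index."""
    return [(record, i, v) for i, v in enumerate(augment(img))]


if __name__ == "__main__":
    # Placeholder values only; not data from the study.
    record = CubeRecord("Example species", "IND-001", 0, "Plot A")
    original = [[1, 2], [3, 4]]
    for rec, idx, variant in augment_with_metadata(record, original):
        print(rec.species, idx, variant)
```

Keeping the metadata record alongside each augmented image means every derived image can be traced back to the individual tree and cube it came from, which matters when splitting training and test sets by individual rather than by image.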