Background Lung segmentation constitutes a critical procedure for any clinical-decision supporting system aimed to improve the early diagnosis and treatment of lung diseases. Abnormal lungs mainly include lung parenchyma with commonalities on CT images across subjects, diseases and CT scanners, and lung lesions presenting various appearances. Segmentation of lung parenchyma can help locate and analyze the neighboring lesions, but is not well studied in the framework of machine learning. Methods We proposed to segment lung parenchyma using a convolutional neural network (CNN) model. To reduce the workload of manually preparing the dataset for training the CNN, one clustering algorithm based method is proposed firstly. Specifically, after splitting CT slices into image patches, the k-means clustering algorithm with two categories is performed twice using the mean and minimum intensity of image patch, respectively. A cross-shaped verification, a volume intersection, a connected component analysis and a patch expansion are followed to generate final dataset. Secondly, we design a CNN architecture consisting of only one convolutional layer with six kernels, followed by one maximum pooling layer and two fully connected layers. Using the generated dataset, a variety of CNN models are trained and optimized, and their performances are evaluated by eightfold cross-validation. A separate validation experiment is further conducted using a dataset of 201 subjects (4.62 billion patches) with lung cancer or chronic obstructive pulmonary disease, scanned by CT or PET/CT. The segmentation results by our method are compared with those yielded by manual segmentation and some available methods. Results A total of 121,728 patches are generated to train and validate the CNN models. After the parameter optimization, our CNN model achieves an average F-score of 0.9917 and an area of curve up to 0.9991 for classification of lung parenchyma and non-lung-parenchyma. The obtain model can segment the lung parenchyma accurately for 201 subjects with heterogeneous lung diseases and CT scanners. The overlap ratio between the manual segmentation and the one by our method reaches 0.96. Conclusions The results demonstrated that the proposed clustering algorithm based method can generate the training dataset for CNN models. The obtained CNN model can segment lung parenchyma with very satisfactory performance and have the potential to locate and analyze lung lesions.