To investigate the problems of the large grayscale difference between infrared and Synthetic Aperture Radar (SAR) images and their fusion image not being fit for human visual perception, we propose a fusion method for SAR and infrared images in the complex contourlet domain based on joint sparse representation. First, we perform complex contourlet decomposition of the infrared and SAR images. Then, we employ the KSingular Value Decomposition (K-SVD) method to obtain an over-complete dictionary of the low-frequency components of the two source images. Using a joint sparse representation model, we then generate a joint dictionary. We obtain the sparse representation coefficients of the low-frequency components of the source images in the joint dictionary by the Orthogonal Matching Pursuit (OMP) method and select them using the selection maximization strategy. We then reconstruct these components to obtain the fused low-frequency components and fuse the high-frequency components using two criteria——the coefficient of visual sensitivity and the degree of energy matching. Finally, we obtain the fusion image by the inverse complex contourlet transform. Compared with the three classical fusion methods and recently presented fusion methods, e.g., that based on the Non-Subsampled Contourlet Transform (NSCT) and another based on sparse representation, the method we propose in this paper can effectively highlight the salient features of the two source images and inherit their information to the greatest extent.