Object detection and recognition are important topics with significant research value. This research develops an optimised moving-target identification model based on a convolutional neural network (CNN) to address the problems of insufficient positioning information and low target detection accuracy. In this article, target classification information and semantic location information are obtained by fusing a target detection model with a deep semantic segmentation model. The classification and localization branches of the target detection model are fed by simultaneously fusing image features that carry different kinds of information with a pyramid of multiscale image features, so that the detection model can use the matched fused features to detect targets of various sizes and shapes. According to the experimental findings, the method reaches an accuracy of 0.941, which is 0.189 higher than that of the LSTM-NMS algorithm. Through CNN transfer learning and the learning of context information, the technique is robust and improves both the scene adaptability of feature extraction and the accuracy of moving-target position detection.
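The multiscale pyramid fusion mentioned above can be illustrated with a minimal sketch. It assumes a top-down scheme in which each coarser feature map is upsampled by a factor of two and added element-wise to the next finer map. The abstract does not specify the fusion operator, so the scheme, the single-channel maps, and the names `upsample2x`, `add_maps`, and `fuse_pyramid` are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (hypothetical layout)


def upsample2x(fmap: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 along both axes."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two feature maps of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_pyramid(levels: List[Grid]) -> List[Grid]:
    """Top-down fusion (assumed scheme, not taken from the paper).

    `levels` is ordered fine to coarse, each level half the resolution
    of the previous one. The result keeps the same ordering.
    """
    if not levels:
        return []
    fused = [levels[-1]]                       # coarsest level is kept as-is
    for fine in reversed(levels[:-1]):
        fused.append(add_maps(fine, upsample2x(fused[-1])))
    fused.reverse()                            # back to fine-to-coarse order
    return fused


# Toy pyramid: 4x4, 2x2 and 1x1 maps with constant values.
fine = [[1.0] * 4 for _ in range(4)]
mid = [[2.0] * 2 for _ in range(2)]
coarse = [[3.0]]
fused = fuse_pyramid([fine, mid, coarse])
```

In this toy case every fused level accumulates the context of all coarser levels, which is the property that lets a detector match targets of different sizes against features at the appropriate scale.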