Although deep learning holds great promise as a prognostic tool in psychiatry, the method requires large training samples to achieve replicable accuracy. This is problematic for fMRI, because fMRI datasets are typically small owing to the considerable time, cost, and resources needed to acquire them. A recently developed data augmentation method called Mixup may help overcome this challenge. Mixup combines pairs of training instances to produce a virtual instance whose input and label are the same weighted linear combination of the two original inputs and labels. This procedure is also well suited to the coregistered images typically found in fMRI datasets. Here we compared the performance of a task fMRI-based deep learner trained with and without Mixup in predicting treatment response in recent-onset psychosis. Whole-brain fMRI time series were extracted from a cognitive control task in 82 patients with recent-onset psychosis and used to predict "Improver" (n = 47) vs "Non-Improver" (n = 35) status, with Improver defined as a 20% reduction in total Brief Psychiatric Rating Scale score after 1 year of treatment. Mixup significantly improved performance (accuracy without Mixup: 76.5% [95% CI: 75.9–77.1%]; accuracy with Mixup: 80.1% [95% CI: 79.4–80.8%]). Ablation showed that the gain reflected better classification of both Improvers and Non-Improvers. These results suggest that Mixup may significantly improve performance and reduce overfitting in fMRI-based prognostic deep learners, and may help overcome the small-sample challenge inherent to many neuroimaging datasets.
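To make the Mixup step concrete, the sketch below shows the generic form of the technique on a batch of arrays. It is an illustration, not the authors' pipeline. Assumptions: the mixing weight is drawn from a Beta(alpha, alpha) distribution with a hypothetical `alpha=0.2`, each instance is paired with a random partner from the same batch, and labels are one-hot vectors (for example, Non-Improver = [1, 0] and Improver = [0, 1]). The function name `mixup_batch` and the toy array shapes are hypothetical.

```python
import numpy as np


def mixup_batch(x, y, alpha=0.2, rng=None):
    """Return a Mixup-augmented copy of a batch (illustrative sketch).

    x: array of shape (batch, ...), e.g. coregistered fMRI volumes or time series.
    y: one-hot labels of shape (batch, n_classes).
    alpha: Beta(alpha, alpha) parameter; 0.2 is an assumed, not reported, value.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)             # mixing weight in [0, 1]
    perm = rng.permutation(x.shape[0])       # pair each instance with a random partner
    x_mix = lam * x + (1.0 - lam) * x[perm]  # same weights applied voxel by voxel
    y_mix = lam * y + (1.0 - lam) * y[perm]  # identical combination of the labels
    return x_mix, y_mix, lam


# Hypothetical toy batch: 4 "scans" of 2 time points x 3 x 3 voxels.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2, 3, 3))
y = np.eye(2)[[1, 0, 1, 0]]  # one-hot labels: Improver, Non-Improver, ...
x_mix, y_mix, lam = mixup_batch(x, y, rng=np.random.default_rng(1))
print(x_mix.shape, y_mix.round(2), round(lam, 3))
```

Because the inputs are combined element-wise, each mixed voxel averages the same location in two scans. This is why coregistration to a common space matters. The mixed label is a soft target, so training would typically use a loss that accepts probability vectors, such as cross-entropy against `y_mix`.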