A properly designed network coding technique can increase the throughput and reliability of multi-hop wireless networks by taking advantage of the broadcast nature of the wireless medium. In many inter-flow network coding schemes, nodes are encouraged to overhear their neighbours' traffic in order to improve coding opportunities at the transmitter nodes. A study of these schemes reveals that some of the overheard packets are not useful for the coding operation, and thus this forced overhearing increases energy consumption dramatically. In this paper, we formulate network-coding-aware sleep/wakeup scheduling as a semi-Markov decision process (SMDP) that leads to optimal node operation. In the proposed solution to the SMDP, the network nodes learn when to switch off their transceivers in order to conserve energy and when to stay awake to overhear useful packets. One of the main challenges here is the delay with which nodes obtain reward signals. We employ a modified Reinforcement Learning (RL) method based on continuous-time Q-learning to overcome this challenge in the learning process. Our simulation results confirm the optimality of the new methodology.
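As an illustration of the learning rule this approach builds on, the following is a minimal sketch of the standard continuous-time (SMDP) Q-learning update with exponential discounting. It is not the modified method proposed in this paper and does not model delayed reward signals. The state labels, the two-action set, the function name `smdp_q_update`, and the values of `alpha` and `beta` are all assumptions made for the example.

```python
import math
from collections import defaultdict

# Hypothetical action set: switch the transceiver off, or stay awake to overhear.
ACTIONS = ("sleep", "awake")


def smdp_q_update(q, state, action, reward_rate, tau, next_state,
                  alpha=0.1, beta=0.05):
    """Apply one standard SMDP Q-learning update and return the new Q-value.

    reward_rate: reward earned per unit time during the sojourn (assumed constant)
    tau:         sojourn time spent in `state` before reaching `next_state`
    alpha:       learning rate (assumed value)
    beta:        continuous-time discount rate, must be > 0 (assumed value)
    """
    # Discount factor applied to value obtained after a sojourn of length tau.
    discount = math.exp(-beta * tau)
    # Discounted reward accumulated over the sojourn:
    # integral from 0 to tau of exp(-beta * t) * reward_rate dt.
    accumulated = reward_rate * (1.0 - discount) / beta
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = accumulated + discount * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: one update from an all-zero Q-table with made-up state labels.
q = defaultdict(float)
new_value = smdp_q_update(q, "idle", "sleep", reward_rate=1.0, tau=2.0,
                          next_state="busy")
```

Because the sojourn time `tau` enters both the accumulated reward and the discount factor, long sleep periods and short overhearing periods are weighed on a common time scale, which is what distinguishes this update from discrete-time Q-learning.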