In this paper, we investigate the extent to which task difficulty in the software testing domain can be classified using psycho-physiological sensors. Following a literature review, we selected the study of Fritz et al. (2014), which was conducted among software developers, and adapted it to the testing domain. We present the results of a study in which 16 professional software testers carried out predefined tasks in a lab setting while we collected eye tracking, electroencephalogram (EEG) and electrodermal activity (EDA) data. On average, each participant took part in a two-hour data-collection session. Over the course of the study, we captured approximately 14 GB of biometric data, consisting of more than 120 million data points.
Using this data, we trained 21 naïve Bayes classifiers to predict task difficulty from three perspectives (by participant, by task and by participant-task), one for each of the seven possible combinations of the three sensors. Our results confirm that we can predict task difficulty for a new tester with a precision of 74.4% and a recall of 72.5% using just an eye tracker, and for a new task with a precision of 72.2% and a recall of 70.0% using eye tracking and electrodermal activity. These results are largely consistent with the work of Fritz et al. (2014). We conclude by providing insights into which combinations of sensors provide the best results, and how this work could be used to enhance well-being and workflow support tools in an industry setting.
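The figure of 21 classifiers follows from pairing each of the three perspectives with each of the seven non-empty subsets of the three sensors. The short Python sketch below only illustrates that count; the identifiers (`SENSORS`, `PERSPECTIVES`, `sensor_combinations`, `classifier_configurations`) are hypothetical names chosen for this example and are not taken from the study's own tooling.

```python
from itertools import combinations, product

# Illustrative labels only; the study does not prescribe these identifiers.
SENSORS = ("eye_tracking", "EEG", "EDA")
PERSPECTIVES = ("by_participant", "by_task", "by_participant_task")


def sensor_combinations(sensors):
    """Return every non-empty subset of the sensors, smallest subsets first."""
    return [
        combo
        for size in range(1, len(sensors) + 1)
        for combo in combinations(sensors, size)
    ]


def classifier_configurations(perspectives, sensors):
    """Pair each evaluation perspective with each sensor combination."""
    return list(product(perspectives, sensor_combinations(sensors)))


configs = classifier_configurations(PERSPECTIVES, SENSORS)
for perspective, combo in configs:
    print(perspective, "+".join(combo))
```

With three sensors there are 2^3 - 1 = 7 non-empty subsets, so three perspectives give 3 x 7 = 21 configurations, one per trained classifier.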
Anders, S., Lotze, M., Erb, M., Grodd, W. and Birbaumer, N. (2004), ‘Brain activity underlying emotional valence and arousal: A response-related fMRI study’, Human Brain Mapping 23, 200–9.
Barrett, L. (1998), ‘Discrete emotions or dimensions? The role of valence focus and arousal focus’, Cognition and Emotion 12, 579–599.
Blaiech, H., Neji, M., Wali, A. and Alimi, A. (2013), Emotion recognition by analysis of EEG signals, pp. 312–318.
Brookings, J., Wilson, G. F. and Swain, C. R. (1996), ‘Psychophysiological responses to changes in workload during simulated air traffic control’, Biological Psychology 42, 361–77.
Carter, J. and Dewan, P. (2010), Design, implementation, and evaluation of an approach for determining when programmers are having difficulty, pp. 215–224.
Crosby, M. and Stelovsky, J. (1990), ‘How do we read algorithms?: A case study’, Computer 23, 25–35.
Florea, R. and Stray, V. (2019), ‘The skills that employers look for in software testers’, Software Quality Journal 27(4), 1449–1479.
Fowles, D. C., Christie, M. J., Edelberg, R., Grings, W. W., Lykken, D. T. and Venables, P. H. (1981), ‘Publication recommendations for electrodermal measurements’, Psychophysiology 18(3), 232–239.
Fritz, T., Begel, A., Müller, S. C., Yigit-Elliott, S. and Züger, M. (2014), ‘Using psycho-physiological measures to assess task difficulty in software development’, Proceedings of the 36th International Conference on Software Engineering. ACM.
Graziotin, D., Wang, X. and Abrahamsson, P. (2013), Are happy developers more productive? The correlation of affective states of software developers and their self-assessed productivity.
Haapalainen Ferreira, E., Kim, S., Forlizzi, J. and Dey, A. (2010), Psycho-physiological measures for assessing cognitive load, pp. 301–310.
Hart, S. G. (1986), ‘NASA Task Load Index (TLX): Paper and pencil package’, NASA Ames Research Center, Moffett Field, CA United States 1.
Hart, S. G. and Staveland, L. E. (1988), Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research, in P. A. Hancock and N. Meshkati, eds, ‘Human Mental Workload’, Vol. 52 of Advances in Psychology, North-Holland, pp. 139–183.
Hess, E. H. and Polt, J. M. (1960), ‘Pupil size as related to interest value of visual stimuli’, Science 132(3423), 349–350.
Khan, I., Brinkman, W.-P. and Hierons, R. (2011), ‘Do moods affect programmers’ debug performance?’, Cognition, Technology & Work 13, 245–258.
Khan, I., Brinkman, W.-P. and Hierons, R. (2013), ‘Towards estimating computer users’ mood from interaction behaviour with keyboard and mouse’, Frontiers of Computer Science 7.
Kim, K., Bang, S. and Kim, S. (2004), ‘Emotion recognition system using short-term monitoring of physiological signals’, Medical & Biological Engineering & Computing 42, 419–27.
Kramer, A. F. (1991), ‘Physiological metrics of mental workload: A review of recent progress’, pp. 279–328.
Lee, J. and Tan, D. (2006), Using a low-cost electroencephalograph for task classification in HCI research, pp. 81–90.
Micallef, M., Porter, C. and Borg, A. (2016), Do exploratory testers need formal training? An investigation using HCI techniques, in ‘2016 IEEE Ninth International Conference on Software Testing, Verification and Validation Workshops (ICSTW)’, IEEE, pp. 305–314.
Müller, S. C. (2015), Measuring software developers’ perceived difficulty with biometric sensors, in ‘2015 IEEE/ACM 37th IEEE International Conference on Software Engineering’, Vol. 2, pp. 887–890.
Müller, S. C. and Fritz, T. (2015), Stuck and frustrated or in flow and happy: Sensing developers’ emotions and progress, in ‘2015 IEEE/ACM 37th IEEE International Conference on Software Engineering’, Vol. 1, pp. 688–699.
Ostberg, J., Graziotin, D., Wagner, S. and Derntl, B. (2017), Towards the assessment of stress and emotional responses of a salutogenesis-enhanced software tool using psychophysiological measurements, in ‘2017 IEEE/ACM 2nd International Workshop on Emotion Awareness in Software Engineering (SEmotion)’, pp. 22–25.
Parnin, C. (2011), Subvocalization - toward hearing the inner thoughts of developers, pp. 197–200.
Ramirez, R., Palencia-Lefler, M., Giraldo, S. and Vamvakousis, Z. (2015), ‘Musical neurofeedback for treating depression in elderly people’, Frontiers in Neuroscience 9.
Salimpoor, V. N., Benovoy, M., Longo, G., Cooperstock, J. R. and Zatorre, R. J. (2009), ‘The rewarding aspects of music listening are related to degree of emotional arousal’, PLOS ONE 4(10), 1–14.
Schmidt, S. and Walach, H. (2000), ‘Electrodermal activity (EDA) - state-of-the-art measurement and techniques for parapsychological purposes’, Journal of Parapsychology 64, 139–163.
Shaw, T. (2004), The emotions of systems developers: an empirical study of affective events theory, pp. 124–126.
Siegmund, J., Peitek, N., Parnin, C., Apel, S., Hofmeister, J., Kästner, C., Begel, A., Bethmann, A. and Brechmann, A. (2017), Measuring neural efficiency of program comprehension, in ‘Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering’, ESEC/FSE 2017, Association for Computing Machinery, New York, NY, USA, pp. 140–150.
Smith, S. W. (1997), The Scientist and Engineer’s Guide to Digital Signal Processing, California Technical Publishing, USA.
Vanitha, A. and Alagarsamy, K. (2019), ‘Software testing in cloud platform: A survey’.
Wróbel, M. (2013), Emotions in the software development process, pp. 518–523.