The ABCD of Usability Testing.

We introduce a methodology for tracking and auditing feedback, errors and suggestions for software packages. This short paper describes how we innovate on the evaluation mechanism, introducing an Antecedent, Barrier, Consequence and Development (ABCD) form, embedded within a platform to enable end users to report usability issues easily. The evaluation platform we use is the STEP cloud e-Participation platform (part of the STEP Horizon 2020 project, http://step4youth.eu/), which will be tested and validated in real-life contexts with the participation of public authorities that will integrate the service into their regular decision-making practices, involving young people through engagement and motivation strategies. The pilot evaluation aims to demonstrate how open engagement needs to be embedded within public-sector processes and to identify the key barriers to wide-scale deployment of the platform.


INTRODUCTION
There is a widespread problem in the development of software, screen media and large-scale public engagement projects: how to get feedback from all user groups, stakeholders and developers quickly, unobtrusively and effectively (Albert, Tullis & Tedesco, 2009). Inevitably, in such projects, the delivery time window is constrained by the stakeholders and the underlying aims of the project. In addition, the budget is always constrained, and pressure to increase the volume of content or the scope of new functions can absorb additional budget (Hawk and Dos Santos, 1991), often leaving the usability budget in deficit before it has even been discussed (see Tullis & Albert, 2013). Besides this structural barrier to adoption, there is a further problem of how to empower users to document their insights and issues. If a new system is presented as the result of considerable investment, then testers can carry an unconscious bias towards rating any function positively. This tendency for people to want to give positive feedback is well documented in HCI and Experimental Psychology (Nichols & Maner, 2008), particularly if the participants themselves are pro-social in their outlook and may succumb to the Hawthorne Effect of trying much harder at the task because they are conscious of being observed (e.g. Franke & Kaul, 1978).
Compounding the tendency to evaluate novelty positively is the problem of how to enable testers to reflect on their experience and provide suggestions for improvements. If testing is restricted to a single episode, then any idea that emerges after the interview or test is lost, since the time-limited questionnaire does not provide an 'incubation' period (e.g. Dorfman, Shames and Kihlstrom, 1996). Clearly, testing over a longer time frame, with a paradigm that allows interaction and reflection, would increase the chance of feedback and insight reaching the team of developers. Finally, however, any insight generated is only of use if it is fed back to the team in a timely manner (McKeen, Guimaraes & Wetherbe, 1994). The structure of some experiment-style usability tests can inhibit rapid feedback, since it necessarily involves a temporal disconnect between construction of the system and its testing. Whenever there is a delay for collection, collation and analysis of data, insight can be lost. The problem, then, is to find a way to share the insights from all partners quickly and efficiently.
Nielsen (1997) documents many of the techniques available in usability and user experience (UX) research, but there is a problem with the requirement to give feedback. Traditionally, UX research has been seen as 'laboratory' research with a moderator present. The focus on getting feedback in real time in the company of a moderator can break up the 'flow' of the user experience. Even when an experience is self-paced, if the user has to stop and write down every issue as it arises, the experience is broken up. Testers (users, developers and stakeholders) would often rather just get their hands on the system and try it out. A review-based feedback system would therefore be helpful. Pushing the review exercise out into a self-guided, online, remote system allows for large-scale, asynchronous collection of insight data (Albert & Tullis, 2013).
The problems discussed for users in UX testing also exist for developers. There can be enormous time pressure to fix ongoing issues. It can be difficult to switch gears and note down relatively minor issues when the core functionality is the priority, and it can be hard to remember problems and reflect on solutions while so much problem solving remains outstanding and work is in hand. For developers, too, a remote, reflective, asynchronous, review-based feedback system would be helpful.

Innovation via the STEP approach.
The STEP project is developing an online platform (https://en.step.green/) with web/social media mining, gamification, machine translation and visualisation features to engage young people in the decision-making process on environmental issues. For this to happen, the platform itself must be both easy to use and appealing to young people. The project is about to enter the pilot phase, in which it will be tested in six different European municipalities.
The novelty of the evaluation approach for the STEP project is the inclusion of an online log to enable users, stakeholders and designers to record information about user experiences. It is set up to allow for both live observation and retrospective recall of specific incidents, and in particular to be as non-intrusive as possible. This guided defect-reporting mechanism leads users through the process, helping them to present clear, unbiased reports with rich contextual detail, in the language of the end user; it should provide the developers and the project team with much better feedback for improving the platform.
We have developed an Antecedent, Barrier, Consequence and Development (ABCD) report form, intended to supplement other formal evaluations of user experience. Its purpose is to give stakeholders a chance to record the experiences and issues that arise when using the STEP platform in a way that allows for their subsequent review, analysis and reporting. Crucial to the STEP project is the facility for the testing and validation to take place in real-life contexts. By creating a retrospective log, users can immerse themselves in the actual system and complete a user journey from start to finish, while being enabled, prompted and empowered to record their experiences and insights.
By allowing users to record the details of their experience in an open-prompt format, the system avoids some of the 'leading question' problems (Gabbert et al., 2010) associated with structured questionnaires and investigative interviews. Making the responses self-administered allows participants to record events at a time and place that suits them and to draw on their autobiographical memory in responding.

Functional Behaviour Assessment
The 'ABC' method derives originally from Clinical Psychology, where it allows reflective and unobtrusive collection of incident data retrospectively, but in a structured way, so that insight can develop (Kamps, 2002; Pratt & Dubie, 2008). The overall approach of 'Functional Behaviour Assessment' (Toogood and Timlin, 1996) in Clinical Psychology is to identify the critical behaviour (B), then reflect on what the antecedent (A) to the behaviour was, and finally record the consequences (C) of the behaviour. The practicing clinical psychologist records these (or asks an observer or carer to record them) and reviews them later to spot emerging behavioural patterns and to identify solutions.

ABC(D): ANTECEDENT, 'BEHAVIOUR', CONSEQUENCE (& DEVELOPMENT).
In the current project we have modified the ABC system to create the ABCD (Antecedent, Barrier, Consequence, Development) chart. The chart is shown in Table 1 and can be filled in online or on paper.
The preferred sequence of completion is as follows:
- What was the 'A'? The 'Antecedent' context: what was the user or system doing before the issue arose?
- What was the 'B'? The problem 'Barrier': what happened to stop the goal being achieved?
- What was the 'C'? The 'Consequence' for the user: what was the impact on the user experience?
- What is the 'D'? What 'Development' or decision is needed: what new features or fixes are required?
The ABCD form is for administrators, designers and facilitators of the STEP project to record information about user experiences using a structured paradigm, with the aim of enhancing the usability of the platform being developed. The forms can be used in any Agile project-management setting, since they allow the inclusion of a suggested remedy. However, this final category is not an obligatory response: in ABC chart paradigms, the solution and causal pattern are expected to emerge only after a period of data collection.
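To make the structure concrete, the form can be represented as a simple data record. The following Python sketch is purely illustrative; the field names and example values are our own and are not drawn from the STEP platform's implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class ABCDReport:
    """One usability incident, recorded in the ABCD structure."""
    antecedent: str                     # A: what the user or system was doing before the issue arose
    barrier: str                        # B: what stopped the goal being achieved
    consequence: str                    # C: the impact on the user experience
    development: Optional[str] = None   # D: suggested fix or feature (optional by design)
    reporter: str = "anonymous"
    recorded_at: datetime = field(default_factory=datetime.now)

# A hypothetical entry from a pilot session:
report = ABCDReport(
    antecedent="Trying to comment on an environmental proposal from a phone",
    barrier="The comment box was hidden below the map and never scrolled into view",
    consequence="Gave up and closed the page without commenting",
)
```

Leaving the `development` field optional mirrors the point above: a suggested remedy can be recorded if one occurs to the reporter, but the causal pattern is expected to emerge later, from the pooled data.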

Cued Retrospective Commentaries
If time and facilities permit, the ABCD tester can record the whole testing episode with video screen capture and then carry out an independent post-hoc review of the user experience, with the user explaining what they did in a think-aloud protocol. Using video 'logs' in this way maximizes the potential for insight generation, since it promotes reflective and recollective processes within a systematic and complete record of activity. By replaying a work episode, the memory component of the task is eliminated, giving time for reflective insight (an 'aha!' moment, for example).
Pooling all of these data creates a capability for meta-reflection: synthesizing all of the ABCD material and the commentary reviews in a feedback loop. This also gives a (non-obligatory) pathway for developers to contribute ABCDs retrospectively, after a period of work but while the experience is still fresh in the mind.
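As a minimal sketch of what such pooling might look like computationally, assuming the illustrative `ABCDReport` record above, recurring barriers can be tallied to surface candidate patterns for review (in practice a reviewer would also cluster related reports by hand):

```python
from collections import Counter

def recurring_barriers(reports, top_n=5):
    """Tally the most frequently reported barriers to highlight emerging patterns."""
    counts = Counter(r.barrier.strip().lower() for r in reports)
    return counts.most_common(top_n)

# e.g. recurring_barriers(pooled_reports) over all ABCDReport records from a pilot
```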

MISMATCH BETWEEN USER REPORTING AND DEVELOPER NEEDS
Reporting usability defects can be challenging; in particular, getting the developers to understand the issue and to appreciate its importance is often a major hurdle. Yusop et al. (2016) surveyed both open-source and industrial software developers about their usability-defect reporting practices and found that the 'cause' of the problem is the most difficult information for reporters to provide, whereas software developers consider it the most helpful information in enabling them to fix the issue.
Getting information to developers in an appropriate format is itself an issue: developers' issue tracking is set up very much for developer use, not for end users to report issues. In addition, developers may have a tendency to view usability issues as 'nitpicking' and of less importance than other types of software issue (Yusop & Vasa, 2016). One of the authors has previous experience of working on usability issues in developing scientific software. The experience of that project highlighted that, to integrate usability and user-centred design (UCD) into the software platform, effective communication between end users, the usability team and the developers was vital. Each week, testing was carried out with end users and issues were recorded (with details provided in a similar way to the ABCD report, although no suggested 'fix' was listed). The reports were then shown to the developers, who added their own column at the end for a suggested fix, labelled 'ticket'. These tickets were then transferred by the developers, using their own language and terminology, into the Trac® issue tracker that they used for the software development process.
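For illustration only, a flow along those lines could map an ABCD report into a generic draft ticket before a developer rewrites it in their own terminology; the payload fields below are assumptions of ours, not Trac's actual schema or API:

```python
def abcd_to_ticket(report):
    """Translate an end-user ABCD report into a draft developer-facing ticket.

    Illustrative mapping only: a developer would rewrite the summary in
    their own terminology before filing it in the real tracker.
    """
    return {
        "summary": report.barrier,  # the defect, kept in the user's own words
        "description": (
            f"Antecedent: {report.antecedent}\n"
            f"Barrier: {report.barrier}\n"
            f"Consequence: {report.consequence}\n"
            f"Suggested fix: {report.development or 'none suggested'}"
        ),
        "type": "usability-defect",
        "reporter": report.reporter,
    }
```

Keeping the user's wording in the draft, and leaving the translation into developer terminology as an explicit step, is what reduced the misunderstanding problems described above.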
This avoided some of the problems of misunderstood issue reports identified by Yusop & Vasa (2016), reducing the gap between end-user needs and the software developer's need to understand the issue. By following this approach, the aim is both to reduce the barriers to user issue reporting and to lead the user into making more useful and meaningful reports.

CONCLUSIONS
Taking heed of Kujala's (2003) advice that user tests must be cheaper to implement, we present a method that can be flexibly deployed in an iterative design project to attract feedback and gain usability insights in the field. Using it in an online format allows for optimal access to data over time, and also ensures that data from usability testing can be added in a meaningful way.
The ABCD chart can be flexibly incorporated into existing empirical usability techniques and more novel methods such as Cognitive Walkthroughs (e.g. Polson, Lewis, Rieman, and Wharton, 1992) and Heuristic Evaluation (Nielsen and Molich, 1990; Nielsen, 1992; Nielsen and Phillips, 1993). Cognitive walkthroughs and cued retrospective commentaries are planned as part of the STEP project evaluation, and these, in combination with post-study interviews and surveys, should be effective in capturing usability and other issues with the platform. Previous studies have shown that combining appropriate usability methods is much more effective than using a single method (Walji et al., 2016; Middleton et al., 2013). User testing is best for detecting specific performance problems, as they can be reported as they are encountered. Survey and interview methods can help to verify recurring usability problems, but are of limited value in isolation. Walji et al. (2016) concluded that no single method was successful at capturing all usability problems; therefore a combination of complementary techniques is necessary to adequately evaluate a system.
The ABCD form provides a means for reflection to be incorporated into feedback, using retrospective recording of insights in an online form. Unlike experimenter-led usability sessions, this method provides space after a user experience in which to provide feedback. Harnessing autobiographical and episodic memory creates an opportunity for users to 'join up' insights from multiple instances. This latter element can be achieved by the testing team rather than any individual, since the pooled reports can be synthesized across users and sessions. The approach is consistent with recent reviews of usability testing that emphasise the benefits of qualitative as well as quantitative data obtained from transcripts of reviews (Ebling and John, 2000).