A Taxonomy of Number Entry Error

People are prone to errors in many aspects of life, including when entering numbers. The effects of these errors can be disastrous, for example when an incorrect number is entered when programming a medical infusion pump or when entering financial information into a system. Designing better systems may help to prevent these errors; however, in order to do this we need to understand far more about the types of errors being made, and their causes. Unfortunately, there are very few documented examples of number entry errors, and thus many of the studies conducted so far rely upon modelled rather than real-world data. This paper reports a study that was designed to elicit number entry errors and the subsequent process of creating a taxonomy of errors from the information gathered. A total of 350 errors were gathered. A method for classifying the errors using 21 codes is proposed. This is a significantly higher figure than previously suggested, showing that the range of number entry error types is currently underestimated. These codes are then organised into a taxonomy similar to that of Zhang et al (2004). We show how this taxonomy can be used to guide future research into number entry errors by suggesting experimental conditions needed to provoke certain errors. The taxonomy may also be used during the initial stages of design to help the designer understand the categories of errors that users are most likely to make and thus design accordingly.


MOTIVATION
As humans we are all prone to error, even in tasks we complete every day. Normally the consequences of such errors are unimportant and can be easily rectified. However, in the medical domain, for example, an error can cost lives. If a medic programs an infusion pump incorrectly, the patient may suffer ill consequences or even die as a result of an overdose or underdose, as noted in an analysis by the Institute of Safe Medication Practices Canada (2007). Reason (1990) defines two types of errors: slips (or lapses) and mistakes. Mistakes occur when a person has incorrect or absent knowledge of the task they are aiming to complete. Slips occur when a person has the knowledge needed to perform a task but for some reason takes the wrong actions in completing the task; this may be during the execution stage or during planning. Errors due to mistakes are remedied by providing training to people in order to complete their knowledge about the task and how to perform it. Slip errors, however, are more difficult to avoid: they occur even when we are very skilled at a task.
Before we can begin to prevent these slip errors occurring, we must first understand the errors being made. In this paper we present a corpus of real-life errors, specifically number entry errors, gathered in experimental conditions, and a method for classifying and organising the errors collected in order to facilitate further research into causes and solutions.

Current taxonomies of error
The types and causes of both mistake and slip errors can be further broken down using Norman's Action Cycle (1990). Identifying the points within the Cycle at which error can occur helps to conceptualise the errors in terms of their causes. An example of this practice can be seen in Zhang et al's (2004) taxonomy of medical error. Using Norman's Action Cycle, Zhang et al have been able to classify various medical errors into a taxonomy that allows errors to be grouped according to their position within Norman's Action Cycle. This taxonomy focuses closely on cognitive causes for error and suggests solutions on a similarly fine-grained level.
Another example of a taxonomy created to classify medical error is the International Taxonomy for Errors in General Practice created by Makeham et al (2002). This taxonomy was created using a set of reported errors from various countries with similar levels of primary healthcare. These errors were then analysed by a set of investigators. The purpose of the taxonomy was to compare medical incidents across countries and different healthcare systems. The taxonomy provided a universal language for reporting medical error.
The need for a common language used to describe error is a common motivation for the creation of an error taxonomy. Researchers believe that a universal language will encourage self-reporting from medical professionals and make the necessary comparison and collation of error statistics an easier task. This was the reason behind the creation of perhaps the most widely used error taxonomy: the one designed for use by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO). The Patient Safety Event Taxonomy was developed by Chang et al (2005) and based upon existing terminology in the medical domain, including colloquial terms. This taxonomy is in use today by the JCAHO when gathering reports about incidents, which are then used to create a database of medical errors in order to provide both statistical information about the errors occurring in hospitals and to flag any growing concerns and issue alerts about common errors.

Studies of slip error
Slip errors that occur in the programming of devices such as syringe drivers and infusion pumps can occur as the result of an incorrect procedure being followed, or by incorrect data being entered into the system. There have been a number of studies into the effects and causes of procedural errors, such as Back et al's (2010) investigation into the effects of lock-out periods on post completion errors (PCEs) and Li et al's (2006) studies into the effects of an interruption on PCEs. Byrne and Bovair (1997) suggest that a high working memory load at the time the post completion step is to be completed may be the cause of these errors.
Much of the research in this area is based upon Altmann and Trafton's (2002) work on the activation based goal memory (AGM) model. The model suggests that goals are subject to decay over time and have certain activation levels, which means interruptions in tasks can cause goals to be lost. The effectiveness of cues in the environment to reactivate the goals of a task has been investigated in both Chung and Byrne's (2004) Phaser task and Li et al's (2005) River Crossing task. However, errors in remembering how to complete the correct procedure are not the only errors that occur when using medical devices, yet little has been done to understand the other slip errors occurring, such as those at the data entry stage. Such data in the medical domain usually takes the form of a series of digits that are required to specify values such as rate of infusion. Data and key-logging information from such medical devices are not freely distributed and therefore we have no corpora of number entry errors available. In fact many medical devices only log the final value entered into the device and not the stream of input beforehand. Until very recently, research into number entry slip errors has therefore only been able to use simulated or hypothesised data. For example, Thimbleby and Cairns (2010) recently modelled the errors that could occur when interacting with a particular type of infusion pump. However, this paper mentions only three types of error (out by r error, termination slip and leading zero omission) and provides a solution to one specifically, the out by ten error. They provide a solution that is shown to effectively reduce the occurrence of this error. Additionally, in a study of the effect of input system on the awareness of errors, Oladimeji, Thimbleby and Cox (personal communication) identified five categories of number entry slip errors ('skipped', 'transposition', 'wrong Whole Number', 'missing Decimal', and 'other'). However, they make no suggestion that this list is complete, nor provide any suggestion of the underlying causes.
The aim of this paper therefore is to report a study of number entry error which aims to generate a corpus of errors. The errors gathered from the study were coded into 21 categories of number entry error. Inspired by Zhang et al's taxonomy of medical errors, we have used the error categories to create a taxonomy of number entry errors. The taxonomy may serve as the basis for a design tool to aid device designers in understanding the types of errors their users may make and the solutions to these errors. The taxonomy also allows future research to focus on specific number entry errors and suggests conditions that elicit certain number entry errors. In this way, researchers no longer have to estimate or simulate data but will be able to directly manipulate experimental conditions in order to gather the number entry errors they wish to study.

DATA COLLECTION
Slip errors do occur naturally; however, they are highly infrequent and thus hard to study under natural conditions. For this reason, the tasks within the study were designed specifically to provoke error in the user. Two tasks were used in this study: a memory task and an audio task. The memory task provoked number entry errors by overloading participants' memories, asking them to memorise and recall multiple numbers. The audio task required participants to enter numbers whilst listening to a series of tones and reacting to a particular tone. There were two conditions in the audio task, which are explained in more detail below.

Participants
In total 20 participants took part in this study; all but one were students at Masters or PhD level. The participants' ages ranged from 22 to 27 with a median age of 24.

Design
The study was conducted as a mixed design. The within groups independent variable had two levels: a memory task and an audio task. The between groups variable was within the audio task. There were two levels: 'After-enter' and 'Mid-type' (depending on when the interruption tones occurred). The order of the tasks was counterbalanced between all participants.
The dependent variables consisted of keystroke data, including the time and type of each key press, and the number of errors made. Errors included corrected errors, which occurred when a participant made an error but fixed it, and uncorrected errors, which occurred when an error was made but went unnoticed by the participant.
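The distinction between corrected and uncorrected errors can be illustrated with a minimal sketch. The log format, the `KeyPress` record and the `DEL` and `ENT` key labels below are assumptions made for illustration only; they are not the logging system used in the study. A trial is treated as containing a corrected error if the typed text strayed from the target at some point but the confirmed value matched it.

```python
from dataclasses import dataclass

# Hypothetical key labels; the study's actual key log format is not described.
DELETE = "DEL"
ENTER = "ENT"


@dataclass
class KeyPress:
    time_ms: int  # time of the key press
    key: str      # a digit, ".", DELETE or ENTER


def classify_trial(target, presses):
    """Classify one number entry as 'correct', 'corrected' or 'uncorrected'."""
    buffer = []
    strayed = False  # True once the typed text stops being a prefix of the target
    for press in presses:
        if press.key == ENTER:
            break
        if press.key == DELETE:
            if buffer:
                buffer.pop()
        else:
            buffer.append(press.key)
        if not target.startswith("".join(buffer)):
            strayed = True
    entered = "".join(buffer)
    if entered != target:
        return "uncorrected"
    return "corrected" if strayed else "correct"
```

For example, typing 7, 8, 5, delete, 4 and then confirming against the target 784 would be classified as a corrected error, whereas confirming 785 would be an uncorrected error.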

Materials
Both experiments took place on an Apple iPad device.

Memory Task
In this task, a series of numbers was shown to participants one at a time; participants were able to control the speed of presentation. The participants were required to memorise each number they saw. The participants then had to recall and enter the previous number they had memorised. When the participant began typing, the current number would disappear from the screen. In this way the participants always had to store two numbers in memory.
Participants had to enter 30 numbers in total. 15 of these were integers and 15 contained decimal points. The length of the numbers ranged from 2 to 5 digits. The composition of the numbers and their ordering within the task were randomised, with each participant getting a different set of numbers to memorise and enter.
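A stimulus set with these properties could be generated along the following lines. This is only a sketch under stated assumptions, not the procedure used in the study: the function names are hypothetical, length is taken to count digits only, numbers are assumed not to start with zero, and the decimal point is assumed to fall between two digits.

```python
import random


def make_number(rng, n_digits, with_decimal):
    """Build one stimulus as a string of n_digits digits, optionally with a decimal point."""
    # Assumption: no leading zero.
    digits = [str(rng.randint(1, 9))]
    digits += [str(rng.randint(0, 9)) for _ in range(n_digits - 1)]
    if with_decimal:
        # Assumption: the point sits between two digits, never first or last.
        digits.insert(rng.randint(1, n_digits - 1), ".")
    return "".join(digits)


def make_stimuli(seed, n_integers=15, n_decimals=15, min_digits=2, max_digits=5):
    """Return a randomly ordered list of integer and decimal stimuli for one participant."""
    rng = random.Random(seed)  # one seed per participant gives each a different set
    kinds = [False] * n_integers + [True] * n_decimals
    rng.shuffle(kinds)
    return [make_number(rng, rng.randint(min_digits, max_digits), kind) for kind in kinds]
```

Seeding the generator per participant keeps each set different while allowing the exact stimuli to be regenerated when the key logs are analysed.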

Audio Task
The audio task required the participants to listen to the noises the device made, and so headphones were provided for the participants to use. The volume was set at the same level each time. This task required participants to enter numbers whilst paying attention to an audio stimulus in the form of a regular two-note sound occurring every 2 seconds. At times the sound would alter from the 'Normal tone' to an 'Emergency tone'. The two were not easily distinguishable from one another and so required careful monitoring. Once the emergency tone sounded, the device had to be reset by pressing the reset button, which would then restart the normal tone.
There were, however, two conditions in this task that determined when the participant could reset the device. One group of participants, the 'Mid-type' group, were required to immediately reset the device regardless of how far through typing a number they were. The second group, the 'After-enter' group, were only able to reset the tone once they had finished typing and had confirmed the current number they were on. All participants were made aware that it was imperative to reset the tone as soon as possible. The numbers the participants were entering were 8 digits long and were both integers and decimal numbers.
The purpose of the two conditions was initially to investigate whether one condition would elicit significantly more errors than the other. This was not the case, and so the two conditions merely act as different methods of eliciting error.

Procedure
The experiments were all conducted within the same room so as to prevent any environmental confounding variables. The participants sat at a desk across from the experimenter.
Prior to the beginning of each of the two tasks, participants experienced a trial period. This trial period lasted until the participant felt they understood what the task involved. The order of the two tasks was counterbalanced between participants.

Results
Every error the participants made in the study was logged with the key logging system. This includes both uncorrected errors and corrected errors.

Memory task
The memory task elicited a total of 226 uncorrected errors. Each participant made a mean of 11.3 (sd=5.42) uncorrected errors. Every participant made at least 3 uncorrected errors.
In total there were 35 corrected errors in the task, with each participant contributing on average 1.75 (sd=2.49). Only 13 of the 20 participants made a corrected error.

Audio task
As we are not concerned with differences between the two groups on the audio task, we collapse across the groups and report the number of errors made in total in the audio task.
A total of 26 uncorrected errors were gathered from this task. Each participant made an average of 1.3 (sd=1.13) uncorrected errors. Of the 20 participants, 6 made no uncorrected errors.
The participants made 64 corrected errors in total. On average each participant made 3.2 (sd=1.58) corrected errors. Whereas in the memory task uncorrected errors were more common, in the audio task corrected errors were more common. In this task every participant made at least one corrected error.
The rate of error can be calculated by looking at the number of opportunities for error. For uncorrected errors this is an easy figure to come by: it is simply the number of numbers each participant was required to enter. For each task this was 30 numbers per participant, or 600 entries across the 20 participants. The rate of uncorrected errors in the memory task was therefore one uncorrected error every 2.65 numbers entered. The rate is substantially lower for the audio task, with one uncorrected error only every 23.08 numbers entered.
Calculating the error rate for corrected errors is less well defined. The number of chances to make a corrected error is technically infinite, as there was no limit to the number of times a participant could delete and retype a digit. In reality, though, the participants did not repeatedly retype the same digit. It was decided that the number of chances for making a corrected error should be the number of characters the participants were asked to type. In the memory task the participants typed 2470 characters (including decimal points) in total. The rate of corrected errors for this task is then one corrected error per 70.57 characters typed. The audio task involved typing far more characters as the numbers were longer. In total the participants typed 4932 characters and thus the corrected error rate for the audio task was one error per 77.06 characters typed.
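The rates above can be reproduced from the reported counts with a short calculation. The sketch below uses only figures given in this section; the helper name `opportunities_per_error` is introduced here for illustration.

```python
PARTICIPANTS = 20
NUMBERS_PER_TASK = 30  # numbers entered by each participant in each task


def opportunities_per_error(opportunities, errors):
    """Return how many opportunities occurred per observed error."""
    if errors <= 0:
        raise ValueError("at least one error is needed to compute a rate")
    return opportunities / errors


entries_per_task = PARTICIPANTS * NUMBERS_PER_TASK  # 600 numbers per task

rates = {
    "memory, uncorrected (numbers per error)": opportunities_per_error(entries_per_task, 226),
    "audio, uncorrected (numbers per error)": opportunities_per_error(entries_per_task, 26),
    "memory, corrected (characters per error)": opportunities_per_error(2470, 35),
    "audio, corrected (characters per error)": opportunities_per_error(4932, 64),
}

for label, value in rates.items():
    print(f"{label}: {value:.2f}")
# memory, uncorrected (numbers per error): 2.65
# audio, uncorrected (numbers per error): 23.08
# memory, corrected (characters per error): 70.57
# audio, corrected (characters per error): 77.06
```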

Discussion
The memory task was the more effective task for eliciting both uncorrected and corrected errors. Although the audio task did produce more corrected errors, the rate of error was not as high as in the memory task. In this sense the memory task was the more effective study for provoking number entry errors. However, the audio task still produced a substantial number of errors for coding.
It is perhaps not surprising that the audio task should elicit a lower number of uncorrected errors as, unlike in the memory task, participants were able to look at the target number on the device and confirm their input whereas during the memory task participants had to rely on the number stored in their memory.

Method
A coding system was developed to categorise and sort the errors gathered into a more meaningful and understandable set. The codes were developed iteratively throughout the process of gathering the errors. The first codes were generated at the pilot testing stage to test whether a coding system would be suitable for categorising the errors. Once this had been established, the studies continued and more errors were gathered, which were subsequently coded.
The codes themselves were created at first using experimenter intuition by grouping similar errors together. At this point the groups were assigned names that described the errors contained. These names became the first codes used to categorise the errors. These codes were used for all errors until an error did not fit a code; in this situation a new code was added to the set.
For many codes it was clear exactly when an error fell into that category; for example, the code "Decimal added" is clearly used when a decimal point has been added unnecessarily to the number. However, codes such as "Incorrect Pattern Use" appeared more subjective and required a form of grammar to describe them and make them more objective. This also proved a test for some codes: if a grammar could not describe the code, then the grouping was purely down to the experimenter's arbitrary selection and thus not usable by others.
The codes could not be applied to the errors as they were. A hierarchy was necessary to determine which code should be used when more than one was applicable to an error. For example, if a participant typed 785 instead of 784, the code "One digit wrong" could apply. However, this is a special case of one digit wrong in which the digit that was wrong was only out by one. Thus the "Out by one" code, where it applies, should be applied before "One digit wrong".
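The precedence between codes can be sketched as an ordered list of tests in which the first match wins. The snippet below is an illustration only: it covers just the two codes from the example plus the 'No Clear Reason' fallback, and the precise definitions used for each code (same length, exactly one differing digit) are assumptions rather than the grammar used in the study.

```python
def _single_digit_difference(target, entered):
    """Return (target_digit, entered_digit) if exactly one digit differs, else None."""
    if len(target) != len(entered):
        return None
    diffs = [(t, e) for t, e in zip(target, entered) if t != e]
    if len(diffs) != 1:
        return None
    t, e = diffs[0]
    if not (t.isdigit() and e.isdigit()):
        return None  # e.g. a decimal point swapped for a digit belongs to another code
    return int(t), int(e)


def out_by_one(target, entered):
    pair = _single_digit_difference(target, entered)
    return pair is not None and abs(pair[0] - pair[1]) == 1


def one_digit_wrong(target, entered):
    return _single_digit_difference(target, entered) is not None


# More specific codes come first so that they take precedence over general ones.
CODES_IN_PRIORITY_ORDER = [
    ("Out by one", out_by_one),
    ("One digit wrong", one_digit_wrong),
]


def code_error(target, entered):
    """Return the highest-priority matching code, or None if there was no error."""
    if entered == target:
        return None
    for name, matches in CODES_IN_PRIORITY_ORDER:
        if matches(target, entered):
            return name
    return "No Clear Reason"


print(code_error("784", "785"))  # Out by one
print(code_error("784", "789"))  # One digit wrong
print(code_error("55", "78"))    # No Clear Reason
```

Keeping the hierarchy as data rather than nested conditionals means a new code can be added by inserting one entry at the right position in the list.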

Results
A list of all the codes applied to the resulting errors, their frequencies and examples from the studies can be seen in Table 1.
The results of the study show how vast the range of types of number entry errors could be. In fact, the errors discussed by Thimbleby and Cairns (starred in Table 1) account for only 5.72% of all errors collected in this study. The errors noted by Oladimeji, Thimbleby and Cox make up 27.14% of the errors generated in this study. It can be seen, then, that research to date has focused on only a small proportion of the number entry errors that exist.

Discussion
The most frequent error was 'No Clear Reason'. This code covered all instances of error that could not be explained easily. For example, one participant typed the number 78 instead of 55, and another typed 256 for 930. This category encompassed all errors that did not fall into other categories, so it is not surprising that it is also the largest category, although it accounts for less than 17% of all errors.
The next most frequent errors appear to be those caused by lapses, that is, actions with malformed intentions. Examples include the 'Anagram' error and the adding or removing of digits from the target number. These errors make up around 30% of the errors collected. The slightly less frequent errors, for example '0 instead of decimal' and 'Out by an order of magnitude', appear to be due to slips of the finger whilst typing. In fact this division between error types is seen in the taxonomy developed to categorise the collected errors and is discussed further in section 4.3.

ERROR FRAMEWORK
The codes generated were useful for categorising number entry errors; however, the volume of errors called for further categorisation to add more meaning to the collected errors. We therefore employed Norman's Action Cycle (1990) as a framework for categorising the codes according to their underlying causes, as has been done in other domains by Zhang et al (2004).

Method
In order to group the codes, the cause of each error needs to be identified. The study reported here aimed only to extract the errors themselves and not their underlying causes. For this reason hypotheses needed to be made about likely causes for each error; these were based upon observations during the study and assumptions based upon experience. The two codes 'Redundant' and 'No clear reason' were omitted from the categorisation, as these errors were the least clear and thus it was hardest to determine possible causes for them.
The generated codes were organised into any of the categories and sub-categories of the taxonomy that they could occur in. As there may be multiple causes of some specific error types, some error codes were placed in multiple rows of the table. For example, the errors where participants mixed up the decimal and nought keys appear as both Action Specification Slips and Action Execution Slips. An argument is made in the table to justify their placement as specification slips, but equally these errors could just be due to inaccurate typing.

Results
The framework, shown in Tables 2 and 3, is split into two tables to represent each side of Norman's Action Cycle: one for errors whilst taking action and the other for errors whilst evaluating action. There is a certain amount of mirroring between the placement of the codes within the tables, in that the codes within the Goal Slips row also occur in the Action Evaluation Slips row. An error caused by a goal slip has the potential to be noticed and then corrected unless there is also a slip in the action evaluation, in which case the error goes unnoticed and becomes an uncorrected error. This is in fact the case for all types of error, in that every slip made during the execution stages can be corrected if there is not a further slip during the evaluation stages.

Discussion
The more frequent error types are those categorised as Goal slips and the less frequent are Action Execution slips. In fact the top five most commonly occurring errors (after the 'No clear reason' error and excluding 'Skipped') can be found in the Goal Slips section of the taxonomy. This division was identified after finalising the taxonomy and proves to be an interesting result that will require further investigation into the differing frequencies of number entry errors caused by poor goal formation and those caused by poor goal execution. At present the results seem to imply, for the tasks used in this study, that the most common number entry errors occur before the user has begun the number entry process and are in fact caused by faulty processes in the mind.

GENERAL DISCUSSION AND CONCLUSIONS
Many studies involving the creation of a framework or taxonomy state the benefits of producing a 'universal' language for comparison of error events. However, the number of taxonomies being created threatens to undermine the main advantage of the taxonomy by generating too many varying terms. In this exercise it has been shown how the coding of number entry error can be adapted to work in conjunction with an existing framework, rather than against it.
From the framework of errors there are two clear future research paths that need to be taken: firstly, the study of the causes of the number entry errors discovered and, secondly, the design implications of and solutions to these errors.

Investigating cause
The categorisation work completed for this study has produced a theoretical framework for grouping number entry errors. With this framework established it is now possible to begin systematically studying number entry errors in order to understand their causes, and thus to re-evaluate and evolve the current positioning of errors within the framework.
Organising the error codes using Norman's Action Cycle into a framework similar to Zhang et al's has identified associated cognitive mechanisms that are likely to cause each of the errors listed. We investigate human error because the underlying cognitive system that causes the error is often the same system that causes correct behaviour. Future research will be able to investigate certain types of error by manipulating the study conditions to affect particular cognitive mechanisms and thus elicit particular types of entry error. Research in this domain could therefore tell us about the cognitive processes involved in the kinds of tasks where people have to copy a piece of information (a number) from one place to another (such as nurses taking a number from a chart and inputting it into an infusion pump).

Designing solutions
The ultimate aim of investigating error is to be able to prevent or at least reduce the number of errors occurring. The next line of investigation using these number entry error codes is therefore to begin designing solutions to prevent them. This step can only occur, however, after the causes of errors have been investigated and confirmed.
A small selection of broad solutions is already provided by Zhang et al to mitigate the various types of slip errors. However, these solutions are arguably far too abstract to be applicable to the current issue of number entry errors. Indeed, the list was not designed with number entry in mind, meaning the solutions are at a fairly high level and are not comprehensive enough to be directly applied. However, they do provide a starting point and basis for investigation.
This process can be applied to two of the current coded errors whose causes are already determined: the errors '0 instead of decimal' and 'Decimal instead of 0'. These errors are caused by the placement of the decimal point and 0 keys being the opposite of a standard calculator or computer keyboard number pad layout. Although one might assume that the layout of such number entry keypads is standardised, this is not the case. The 0 and decimal point keys do not occur in their standard positions on the Baxter Flo-Guard 6201 infusion pump and the Alaris IVAC Signature Edition Gold infusion pump, for example. One solution to errors such as these suggested by Zhang et al is "Train users". Although this solution would probably help to make users aware of the switch from the 'normal' layout of keys, it is not ideal. In addition, it has already been demonstrated that training does not reduce slip errors effectively.
Looking at the problem from another angle, we could say that the users had already been trained, by long periods of use, to use devices with the common arrangement of decimal and 0 keys, and therefore the underlying cause of this slip error is negative transfer from this more common design layout. This training could be utilised and the device design altered to match it.

Conclusion
Prior to this study, it was unclear what number entry errors occurred, and previous research in the area of number entry error has relied only on speculation about the types and frequencies of errors. This work shows that there is a wider range of possible number entry errors than has been considered previously. A taxonomy of these errors has been produced that can be used as a basis for further directed research, helping future researchers to focus their studies on specific errors and their underlying causes. We will focus our future research effort on understanding the causes of those error types that are likely to have the biggest real-world impact in safety-critical domains, with the aim of identifying ways in which we might mitigate them.
Once solutions to a range of error types have been suggested, the taxonomy also lends itself to being made into a design tool providing guidance to designers in many areas including the medical domain about the types of errors that might be most prevalent in the situation in which their system is going to be used, and making design recommendations regarding how best to design out such possibilities.

Table 1: Frequency of all errors made during pilot studies and main experiments. Starred entries are those noted by Thimbleby and Cairns.

Table 2: Error codes generated from the study placed into a framework similar to that suggested by Zhang et al. Bold codes are those explored in more detail in the Example column. (Execution Slips)

Table 3: Error codes generated from the study placed into a framework similar to that suggested by Zhang et al. Bold codes are those explored in more detail in the Example column. (Evaluation Slips)