Self-Previewing Gestures and the Gesture-and-Effect Model: Experimentation with Responsive Visual Feedback for New and Unlearned Interactions

s on human factors in computing systems (pp. 815–828). ACM New York, NY, USA.
Hudlicka, E. (1997). Summary of knowledge elicitation techniques for requirements analysis (Course material for human computer interaction). Worcester Polytechnic Institute.
Kaptelinin, V., & Nardi, B. (2012). Affordances in HCI: Toward a mediated action perspective. In Proceedings of CHI '12 (pp. 967–976).


INTRODUCTION
Previous work (e.g. Norman, 2012; Vermeulen et al., 2013; Wigdor & Wixton, 2011) has observed that users of gestural interfaces are often unaware of, or fail to discover, hidden gestures and UI tools, which leaves them unable to perform some gestures and prone to errors in executing others. Norman and Nielsen (2010) commented: '…How is anyone to know, first, that this magical gesture exists, and second, in which settings it operates?' Norman (2012) reiterates: 'One of the powers of modern computers is discoverability, you can explore, but with gesture systems it's a pain. It's amazing how many things people don't know about the computers they use and there's no way to find out.' These issues challenge the notions of perceptible affordances, feedforward and feedback within the domain of Human-Computer Interaction (Baerentsen & Trettvik, 2002; Gaver, 1991; Kaptelinin & Nardi, 2012; McGrenere, 2000; Norman, 1988, 1999; St. Amant, 1999; Hartson, 2003; Turner, 2005), which scrutinise how UI objects (e.g. controls, menus, toolbars, modes, domain objects, etc.) should be represented and respond to user action or inaction. Researchers are re-thinking these theories (Djajadiningrat et al., 2004; Vermeulen et al., 2013; Wigdor & Wixton, 2011) in order to adapt graphical user interfaces (GUIs) to systems that afford novel input methods and, as a result, facilitate users' adoption of them. The resulting systems are generally called natural user interfaces (NUIs) (Norman, 2010; Sorensen, 2010; Wigdor, 2010). In many cases, NUIs lack traditional WIMP-GUI (windows, icons, menus, pointer) controls and embed post-WIMP (Beaudouin-Lafon, 2000; Dam, 1997) user interfaces adapted to touch and other forms of 'natural' input.
In addition, the literature assessing the conceptual implications of novel forms of UI for gestural interactions is, at best, sparse. To bridge this gap, this paper reports a study exploring participants' assessments of automatic visual prompts that depict touch and gestures on the screen of a tablet. To support the study and enable a detailed analysis of the data, we developed a 'gesture-and-effect' model of touch-based interactions and a corresponding rating system. The rating system allowed us to assess users' success or failure in identifying potential gestures and correctly predicting their outcomes.
The common factor in most findings from extant research is lower error rates (around 20%) in execution (e.g. Paperlens, see Spindler et al., 2009; Ripples, see Wigdor et al., 2010; ShadowGuides, see Freeman et al., 2009; SimpleFlow, see Bennett et al., 2011) for gestural interfaces that implement feedforward prompts in response to participants starting interactions, compared to higher rates (up to 50%) for basic interfaces in which participants were left to discover gestures with no visual aid. In these studies, users are shown the continuation (see the terminology used in Wu, 2005), which is the effect of gestural interactions, but not how to start them. This approach is extended here by examining the benefits of feedforward before touch occurs, following Vermeulen's (2013, p. 1938) reframing of the feedforward technique. We hypothesised that moving this feature to earlier in the interaction sequence could improve the discovery of unfamiliar gestures and result in fewer errors in execution. We have termed this approach "Self-Previewing Gestures" (SPG).

DESIGNING THE INTERACTIONS
In this study, two alternative visual designs were deployed, each incorporating SPG. These designs present contrasting approaches to communicating with users. The study employed an unfamiliar gesture vocabulary in order to challenge both expert users (e.g. those familiar with iOS gestures) and users new to touch devices. The SPG, rather than previous knowledge, was the most important visual aid on which participants relied to identify available gestures. The gestures did not match the expected effects observed in current tablets. The two designs are:
1. 'Iconic': The iconic version displays a pictorial representation of a hand touching the screen with the appropriate number of fingers to initiate registration of the gesture. It also used text labels to indicate the action required (e.g. 'open') or the UI object to be triggered (e.g. 'menu').
2. 'Symbolic': The symbolic version uses simple geometric forms to depict touch points over the screen. The touch points were drawn unevenly to simulate human touch on the screen. Notably, this design style does not use textual support. It is more abstract and provides a less direct visual metaphor.
Both designs used arrows to demonstrate direction.
Three interactions were selected as being unfamiliar on touch devices and, therefore, supposedly more challenging for users. For each of the three interactions, an image depicting the touch points required (the SPG), the movement and the type of action (such as a tap or pinch) was presented onscreen, together with an animation of the system's response to the gesture. The user was then asked to replicate the gesture. The system response was kept as a constant factor, meaning that the feedforward created was identical in both designs. The preview of the 'effect' of the gesture (or system response) comprised a 'ghost' (a term used in this study to indicate a translucent clone of an object) that was animated along with the gestural affordance to preview the correct position for activation. Insights for the interaction techniques were drawn from Wigdor et al.'s (2009) Ripples technique, Wigdor and Wixton's (2011) self-revealing gestures 'chrome' layer for MS Surface and Hofmeester's (2012) prototype work for Windows 8 touch. The three interactions are shown in Figure 1, Figure 2 and Figure 3. The gesture depictions are shown in a magnified view at the bottom of each picture, and the effect of each gesture is shown in a sequence of frames. The screen size is 2048 × 1536 px at 264 ppi (Apple iPad models, 2013), and the interactions are displayed over a fictitious booklet application.

Open application:
In the prototype screen, a multi-touch gesture opens an application, which is traditionally opened by a single tap. The 'open application' interaction proceeds as follows: The gestural affordance appears over the button (Figure 1-a). The animation demonstrates the button moving down along with the gesture (b) before returning to its place. The user is expected to move the button in the appropriate fashion to open the application (c).

Pull hidden menu:
This gesture requires the participant to swipe horizontally from an 'invisible' activation area to reveal a hidden menu. Neither the optimum touch range nor the UI component is visible. The interaction proceeds as follows: The gestural affordance appears on the left-hand side of the screen (shown in zoom in Figure 2-a). The animation demonstrates the menu moving sideways along with the gesture (b) and then returning to its place. The user is expected to swipe horizontally from the left bezel towards the centre of the screen to reveal the menu (c).

Touch and hold a picture:
Norman and Nielsen (2010), Vermeulen et al. (2013) and Norman (2014) note that this type of gesture is particularly difficult for inexperienced users of gestural interfaces. To avoid learning effects, we used the touch-and-hold gesture in a context that would be unfamiliar to most users. First, the gestural affordance appears over the top picture (shown in zoom in Figure 3-a). In the animation, the top right corner demonstrates that the picture is selected (b) before fading out. The user can select as many pictures as desired and then drag and drop them over the booklet (c). This interaction requires the user to complete two steps. Therefore, it is a 'sequential' affordance (Gaver, 1991, p. 82), as the picture 'affords' selection as a response to a touch-and-hold interaction. This affordance takes the shape of a small bent corner on the picture's top right (Figure 3-b), which remains active following a successful interaction to indicate a change of mode to 'selected' and available for dragging.
The next section describes the gesture-and-effect model used to assess the SPG.

THE GESTURE-AND-EFFECT (GEM) MODEL
The new forms of visual intervention that depict touch over a UI require a new framework of analysis to investigate the effectiveness of the SPG technique. To provide the theoretical foundations for such a framework, the works of Norman (1988), Wu et al. (2005) and Golod et al. (2013) were referenced to create a gesture-and-effect model (GEM) for evaluating touch interfaces. Figure 4 shows the GEM.
The initial approach when creating the model was to separate the user's planning and action into smaller steps. Norman's (1988, pp. 45-53) Theory of Action provides a generalized view of a person interacting with 'the world'. A user must form an intention and plan an action sequence in order to execute actions to fulfil his or her goal (the stage of execution). Following execution, a new phase of evaluation occurs, in which the user reassesses the environment to check whether his or her goal has been achieved. Thus, evaluation and execution form a cycle. Using a 'self-previewing' interface, which shows actions in relation to context(s), it is possible to reconsider Norman's (1988) theory beginning with the evaluation stage. Wu et al. (2005) focus on the execution of gestures. The authors created the registration-continuation-termination (RCT) model to break down the execution of a gesture into 'micro' parts. In a similar fashion, Golod et al. (2013, p. 17) describe a 'gesture phrase' model, which segments the execution of a gesture into 'microinteractions'. Golod et al.'s (2013) 'gesture phrase' and Wu et al.'s (2005) RCT model were adapted into the GEM execution phase. The term 'micro-phase' was also adopted and used to describe the phases within the model.

Restore Status (Undo a gesture)
NUIs, which differ fundamentally from the predominant desktop metaphor paradigm, present a challenging context for undo actions (see Dix, 1996; Shneiderman, 2010), and mainstream touch devices (e.g. smartphones and tablets) rarely provide 'undo' buttons or options within their 'edit' menus.
However, Norman's (1988) model does not separately address the issue of an undo option. We therefore needed to add a separate 'Restoration' phase to the model, especially since performing an undo is so complex in the domain of gestural interfaces. The micro-phase 'undo' or 'restore' is represented in the model with the arrowed pathways labelled 'C' and 'D'. This phase implies a new evaluation of the system status and the need for a new gesture (execution phase), different from the initial gesture, to undo an action or restore the system to its previous state.

Tap to Preview
As can be seen in the bottom portion of the model, another feature was included to represent an emerging interaction technique in gestural interfaces, which we call 'tap-to-preview'. This interaction is depicted in the GEM as an 'alternative execution' (arrowed pathways 'E' and 'F') that triggers a new evaluation of the available gesture. Wigdor and Wixton's (2011, p. 153) 'just-in-time chrome' technique and Hofmeester's (2012) 'teaching gesture' for Windows 8 touch were highly influential in creating this variant within the GEM.
The next section examines previous work using a rating system relevant to the study goals.

RATING SYSTEM
One measure of the difficulty users experience with an interface is the number of attempts required to successfully execute a gesture. Freeman et al. (2009) and Bragdon et al. (2009) defined a priori rating criteria to judge success or failure in participants' attempts to perform gestures.
The study reported here sought to discriminate between participants' 'quick success' and success obtained at great cost (e.g. after many attempts) in a prototype design with untaught gestures. Thus, Bragdon et al.'s (2009) criteria seemed directly relevant. Applying Bragdon et al.'s (2009) rating scheme at a more strategic level, this study counted the number of attempts to determine a nominal coding. For instance, up to six attempts could indicate success, but more than six could indicate significant effort to understand a visual prompt or execute an action.
A pilot study (n=10, 6F, 4M, age 22 to 54) was organised with acquaintances from other departments within City University (excluding colleagues from the Centre for HCI Design). It was used to validate the pre-set rating criteria and was not intended to produce a formal coding or statistical analysis. A preliminary set of two designs was used to validate the number of attempts necessary for participants to succeed or fail in understanding and executing gestures.
No participants' verbalisations indicated a need to change the rating criteria. It was observed that most participants who fully understood the visual prompt produced an acceptable description and executed the gesture in the first three to six attempts. Participants who struggled to comprehend the prompt managed to describe and perform it in up to seven attempts, rarely more. However, some participants failed to describe or execute the gesture at all. The participants' struggles to execute the gestures showed that a 'partial' rating may provide finer-grained information. It is important to note that the partial rating, although it stemmed from a successful description, was defined in such a way as to differentiate it from full success.
Therefore, in the final rating scheme, a 'success' (1) was measured by the clarity of the user's description of the meaning of what he or she saw before a successful execution. Both the precision of the description and the number of attempts taken to arrive at the final assessment were considered. An accurate description was considered to include the number of touch points required, the motion to be performed and, at the most rudimentary level, the ability to perform a gesture within the first six attempts. A 'failure' (2) was considered to be a complete inability to describe the visual cue or execute the gesture. A 'partial success' (3) was considered to be a correct assessment from the seventh attempt onwards up to a successful execution.
The participants' evaluations and executions were assessed separately, meaning that a participant might succeed at one phase but fail at another. For instance, a user might correctly identify a gesture as requiring two touch points but fail to identify the movement he or she should follow. This would result in a 'correct' assessment for the number of fingers, but a failed physical execution of the gesture (e.g. due to swiping in the wrong direction).
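The three-way rating described above can be sketched in a few lines. This is a minimal illustration rather than the coding procedure actually used in the study: the record layout (`PhaseObservation`) and the function name `rate` are hypothetical, and the qualitative judgement of how precise a description was is reduced here to a single boolean.

```python
from dataclasses import dataclass

# Numeric codes follow the text: success (1), failure (2), partial success (3).
SUCCESS, FAILURE, PARTIAL = 1, 2, 3
MAX_ATTEMPTS_FOR_SUCCESS = 6  # "within the first six attempts"


@dataclass
class PhaseObservation:
    """One participant's outcome for one phase (hypothetical record layout)."""
    achieved: bool  # did the participant eventually describe/execute correctly?
    attempts: int   # number of attempts taken up to that point


def rate(obs: PhaseObservation) -> int:
    """Map an observation to the three-way rating."""
    if not obs.achieved:
        return FAILURE            # never described or executed the gesture
    if obs.attempts <= MAX_ATTEMPTS_FOR_SUCCESS:
        return SUCCESS            # correct within the first six attempts
    return PARTIAL                # correct, but only from the seventh attempt on


# Evaluation and execution are rated separately, so one phase can succeed
# while the other fails.
evaluation = rate(PhaseObservation(achieved=True, attempts=2))
execution = rate(PhaseObservation(achieved=False, attempts=9))
print(evaluation, execution)
```

Keeping the threshold in one named constant makes it easy to see how sensitive the partial-success counts are to the choice of six attempts.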

STUDY HYPOTHESES
Hypothesis (a) stated: Depicting the touch points in the user interface will improve gestural learning and reduce user error in executing gestures. Following the approach of laboratory-based research through design (RTD) for design (Cross, 1999; Frayling, 1993; Koskinen et al., 2011; OECD Frascati Manual, 2015; Zimmerman et al., 2007), the two visual designs described in section 3 were deployed.
Second, this study sought to support or reject hypothesis (b), which stated: Displaying automatic visual cues before interaction will facilitate the discovery of gestures and reduce errors in execution. Its aim was to explore the SPG combinations that yield lower error rates in participants' executions of gestures.
Third, hypothesis (c) stated: A rating system that segments users' gestural interactions into smaller phases will help to reveal issues with users' evaluations and executions of gestures. This study used the GEM rating system to evaluate prototype designs in a within-subjects experimental design and explore the specific moments within assessment and physical executions that posed the most difficulties for participants.
The next section describes the methodology used to undertake the empirical study, including the criteria for the selection and recruitment of participants, the materials utilised and the study design.

METHODOLOGY
Motivated by the review of previous research in designing gestural interactions, the first design consideration was how to trigger the appearance of the cue (the visual prompt) that informed the user of an available gesture. For this study, five versions of the UI were prepared and evaluated. The iconic and symbolic designs were used as visual prompts to indicate the touch gestures. The three interactions described in section 3 were used as tasks for participants to undertake. In short, these five versions included: The effect is shown only to users touching and holding target objects.
Version 5 (V5). 'Complete set': Version 5 was one of the representatives of SPG. It was the most complete set, combining animations that 'self-preview' visual cues for gestures and their effects.
Unlike V2, it also included the tap-to-preview feature, which was incorporated to determine whether this feature could further improve gesture discoverability by allowing participants to play back the gesture and effect in the event of missing the automatic cue.

Participants
The study recruited a total of 45 participants.
Participants were recruited via leaflets placed across City University. The recruitment process sought participants from diverse backgrounds to ensure that the designs were assessed across a broad pool of users. All participants were coded according to their participant number, age and gender (e.g. P1, 37, M) to ensure anonymity.
Study participants were between 19 and 64 years old. This sample selection sought to avoid ethical issues, since participants under 18 or over 65 years old require special protection consent, as mandated by the university's ethical regulations. In more detail, 27 participants were between 19 and 33, 12 were between 34 and 48 and 6 were between 49 and 64. With respect to gender, 18 were male and 27 were female. Regarding desktop computer use, 27 participants used Windows, 14 used MacOS and 4 did not report an OS preference. In terms of smartphones, 18 owned iPhones, 17 owned Android devices, 2 owned Blackberry devices, 7 had regular cell phones and 1 did not specify a phone platform.
For tablet devices, 25 participants were iPad users, 4 used other brands and 16 did not possess tablets.

Study Design
Proceedings of British HCI 2017 - Digital Make-Believe, Sunderland, UK

The study took place at the Interaction Lab (Centre for HCI Design), City University of London. The test was set up with an iPad running iOS 7 attached to a metal stand for testing mobile devices. A Microsoft camera with a built-in microphone was positioned to record the screen and comments. Only the participant and the facilitator (the researcher) were present in the room during the test.
The prototype application was developed by a third party on Linux and OS X using Xcode, and it ran on iOS 7 (which was, at the time, the current version of iOS). The application was implemented in JavaScript.
A within-subjects experimental design was used. The first set of independent variables (IVs) consisted of the five versions of the application. The second set of IVs comprised the 10 micro-phases in the GEM, including 'tap-to-preview'. Each micro-phase was rated according to the a priori criteria (Incorrect, Correct and Partial ratings).
The first micro-phase within 'evaluation' (Notice visual cue) was removed from the analysis due to a low error rate and a limited error scale. Specifically, 42 of the 45 participants (93%) detected the visual prompt when the study session started. Three did not see the cue and had to be prompted, but were still considered able to continue the study. The low error rate for this phase might have been due to a learning effect. Removing the data reduced the risk of producing spuriously significant results from small sample sizes (i.e. of errors).

Randomization set
The designs (2), interactions (3) and application versions (5) were randomised using a Latin square set, yielding a total of 30 combinations. To verify the methodology and to check if the designs or versions required any further improvement, a pilot study (similar to the set used to validate the rating system) was organised (n=4, 2F, 2M, age 26 to 43).
The participants produced no relevant comments suggesting the need for design alterations. Furthermore, most participants managed to produce adequate descriptions of the SPG within the first 10 presentations, indicating that showing all possible combinations throughout the test would be unnecessary. Therefore, it was decided to display only 20 combinations per sequence and to expose each participant to each of the five versions four times in a balanced fashion.
Finally, to avoid biasing results by showing the same sequence to all participants, we organised three different randomised sequences. It was considered that 15 participants per set would provide a sufficient sample.
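The arithmetic of the randomisation set can be illustrated with a short sketch: 2 designs × 3 interactions × 5 versions give 30 combinations, from which a 20-item sequence is drawn so that each version appears four times. The sampling procedure below is an assumption made for illustration (plain seeded random sampling, with the hypothetical helper `build_sequence`); it does not reproduce the Latin square construction used in the study, whose details are not given here.

```python
import itertools
import random
from collections import Counter

DESIGNS = ["iconic", "symbolic"]
INTERACTIONS = ["open application", "pull hidden menu", "touch and hold"]
VERSIONS = ["V1", "V2", "V3", "V4", "V5"]

# Full factorial set: 2 x 3 x 5 = 30 combinations.
ALL_COMBINATIONS = list(itertools.product(DESIGNS, INTERACTIONS, VERSIONS))


def build_sequence(seed: int, per_version: int = 4):
    """Return a shuffled sequence with `per_version` trials for each version.

    Assumed procedure (not the study's): for every version, sample distinct
    (design, interaction) pairs without replacement, then shuffle the order.
    """
    rng = random.Random(seed)
    pairs = list(itertools.product(DESIGNS, INTERACTIONS))  # 6 possible pairs
    sequence = []
    for version in VERSIONS:
        for design, interaction in rng.sample(pairs, per_version):
            sequence.append((design, interaction, version))
    rng.shuffle(sequence)
    return sequence


# Three different randomised sequences, one per group of 15 participants.
sequences = [build_sequence(seed) for seed in (1, 2, 3)]
for seq in sequences:
    counts = Counter(version for _, _, version in seq)
    print(len(seq), dict(counts))
```

Seeding each sequence makes the presentation order reproducible, which helps when session videos are later matched against the order in which prompts were shown.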

Elicitation Method
An 'oral structured interview' (Geiwitz et al., 1990; Hudlicka, 1997) was used to interview the study participants. The oral structured interview method combines situational and behavioural question types and, unlike 'contextual inquiry', can be used in a controlled laboratory environment. Furthermore, unlike long, conventional post-task interviews (e.g. those used in 'contextual inquiries'), this method uses short questions to elicit micro-responses, rather than including a time-consuming addendum at the end of the test.

Videos of the participant sessions were systematically reviewed in detail. Verbalisations were transcribed, and the performance of evaluations and executions for each micro-phase was rated and recorded in a spreadsheet for analysis. The application logs were consulted to further assess participants' success at each step of evaluation and execution.

FINDINGS
The results of a generalized linear model (GLM) were assessed by employing different tests: a log-linear analysis, a chi-square test, a Mann-Whitney test and a Kruskal-Wallis test. Initially, a Shapiro-Wilk test of normality was conducted, and the results indicated that the null hypothesis (H0) could be rejected (p < 0.05), which demonstrated that the distribution of results was non-normal. The difference between the designs was then calculated using a Mann-Whitney test.
To determine any reliable significant differences and as a first step in the analysis, a global log-linear analysis was conducted for the evaluation phase (descriptive values in Table 6 and log results in Table 8 within the appendices). The analysis of the scores used all three dimensions: a) the three ratings of user performance, b) model micro-phases 2 to 6 and c) the five separate versions. The results of the global test across all factors were statistically significant: G² = 514.4, df = 64, p < 0.001.
Following the same procedure used to assess the evaluation phase, a global log-linear analysis was conducted to verify the significance between the independent and dependent variables for the execution phase (descriptive values in Table 7 and log results in Table 9 within the appendices). The analysis used three dimensions: a) the three ratings of user performance, b) model micro-phases 'T' to 10 and c) the five separate versions. The results of the global test across all factors were statistically significant: G² = 1364.26, df = 64, p < 0.001.
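To make the global test concrete, the sketch below computes a likelihood-ratio statistic G² for a three-way table of ratings × micro-phases × versions. It assumes the test is one of mutual independence between the three factors; the text does not state which log-linear model was fitted, although that assumption reproduces the reported degrees of freedom for a 3 × 5 × 5 table (75 − 1 − 2 − 4 − 4 = 64). The function name and the counts are hypothetical and are not the study's data.

```python
import math


def g_squared_mutual_independence(table):
    """Likelihood-ratio statistic G^2 for mutual independence in an I x J x K table.

    `table[i][j][k]` holds observed counts. Returns (G^2, degrees of freedom).
    """
    I, J, K = len(table), len(table[0]), len(table[0][0])
    cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
    n = sum(table[i][j][k] for i, j, k in cells)

    # One-way marginal totals for each of the three factors.
    row = [sum(table[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    col = [sum(table[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    lay = [sum(table[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]

    g2 = 0.0
    for i, j, k in cells:
        observed = table[i][j][k]
        expected = row[i] * col[j] * lay[k] / n ** 2
        if observed > 0:  # a cell with zero count contributes nothing
            g2 += 2.0 * observed * math.log(observed / expected)

    df = I * J * K - 1 - (I - 1) - (J - 1) - (K - 1)
    return g2, df


# Made-up counts: 3 ratings x 5 micro-phases x 5 versions.
independent = [[[(i + 1) * (j + 2) * (k + 3) for k in range(5)]
                for j in range(5)] for i in range(3)]
skewed = [[[cell + (40 if (i, j) == (0, 0) else 0) for cell in layer]
           for j, layer in enumerate(plane)] for i, plane in enumerate(independent)]

print(g_squared_mutual_independence(independent))
print(g_squared_mutual_independence(skewed))
```

The first table is built so that every cell equals its expected value, giving G² of essentially zero; the second adds counts to one rating and micro-phase combination, which pushes G² well above zero.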

Addressing hypothesis 'a'
The first hypothesis states that depicting the touch points in the user interface will improve gesture learning and reduce user error in executing gestures. To either support or reject the null hypothesis, it was necessary to compare Version 4, which did not visually depict touch points, to all versions that did depict touch points (V1, V2, V3 and V5). The dependent variable (DV) was the number of errors (signalled by correct, partial and incorrect ratings) per version per 10 micro-phases.
This section examines whether there were marked differences in the micro-phase results between versions. As can be observed in Table 4 (appendices), Version 4 produced the second most errors in execution for micro-phases 8 (Touch configuration) (18%) and 9 (Perform direction) (4%). Micro-phase 10 (System status) is treated separately, since the result was simply true or false, with no option for 'Partially correct'. This micro-phase is critical because it is the final determination of whether a participant succeeded or failed in executing a given gesture. A Kruskal-Wallis test was used to analyse versions across this micro-phase, showing a significant difference (H = 2.551, df = 895, p < 0.05). In order to assess the differences in system status between the versions, each version was tested for the likelihood of correct and incorrect executions. In this final analysis, the general performance for all versions across both evaluation and execution was 81%.
As can be seen in Table 1, there was a statistically significant difference in performance across all versions (χ² = 10.145, df = 4, p < 0.05), demonstrating that the results of all later analyses are statistically significant (p < 0.05). Table 1 shows that Version 5 had the most correct executions across the board (87%), while Versions 1 and 4 had the fewest (76%). With significance verified across the various micro-phases and application versions, the analysis can focus on supporting or rejecting the hypothesis by comparing the versions that show visual depictions of touch points (V1, V2, V3 and V5) with the version that does not (V4). Figure 5 shows the descriptive (N) and expected values for correct and incorrect responses regarding system status for the compared groups. Version 4 had more incorrect executions (24%) than the group of versions that show visual touch (V1, V2, V3, V5; 18%). As reported above, the difference is statistically significant (p < 0.05). In summary, the evidence supports this hypothesis by demonstrating significance across the model. The versions that visually depict the gesture (V5 and V2) produced the lowest error rates, outperforming the version that does not include a visual depiction (V4).
The qualitative evidence further demonstrates the benefits of providing visual depictions to users. Version 4, which displayed only the effect of the action and no gestural affordance, yielded high error rates for evaluation and execution. This was clearly observed in Interaction 1 and Interaction 2, in which six participants complained about the lack of visual cues for gesture or touch points. For example: "Similar to the one before but with no dots…still unclear" (P40, M, 42); "I'd try to see if the sign comes back" (P46, M, 54); "Doesn't seem to have much point in that. Doesn't tell you anything" (P34, F, 35); and "This time I got the same symbol but without the fingers circles" (P46, M, 54).
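The comparison of correct and incorrect executions across versions rests on a chi-square test of independence, in which observed counts are compared with the counts expected if version had no effect. A minimal sketch follows; the function name is hypothetical and the counts are illustrative placeholders, not the values from Table 1 or Figure 5.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for an R x C contingency table.

    Returns (chi2, df, expected), where `expected` has the same shape as `table`.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Illustrative counts only (NOT the study's data): rows are correct/incorrect
# executions, columns are the five application versions V1..V5.
observed = [
    [80, 85, 82, 75, 90],  # correct
    [20, 15, 18, 25, 10],  # incorrect
]
chi2, df, expected = pearson_chi_square(observed)
print(f"chi2 = {chi2:.3f}, df = {df}")
print("expected incorrect per version:", [round(e, 1) for e in expected[1]])
```

With two outcome categories and five versions the test has (2 − 1) × (5 − 1) = 4 degrees of freedom, which matches the df reported for the comparison across all versions.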

Addressing hypothesis 'b'
This hypothesis stated that displaying automatic visual cues before interaction will facilitate the discovery of gestures and reduce errors in execution. In order to support or reject this hypothesis, it was necessary to compare the application versions (the IVs) that self-preview visual depictions of touch points (V1, V2 and V5) with the application version (V3) that uses 'tap-to-preview' (requiring user interaction to display the visual cues). Version 4 was removed from the analysis because it does not use visual depiction for touch. The DV is the number of errors (signalled by correct, partial and incorrect ratings) per version per 10 micro-phases.
A simple mean was drawn from the execution rates (see Table 4, appendices), and a higher success rate was found for Versions 1, 2 and 5 (82.23%) than for Version 3 (80.6%). Similar results were observed when analysing micro-phases within execution in detail. For instance, micro-phases 7 (Touch to confirm) (χ² = 30.289, df = 8, p < 0.001), 8 (Touch configuration) (χ² = 24.940, df = 8, p < 0.05) and 9 (Perform direction) (χ² = 49.924, df = 8, p < 0.001) yielded similar results. In fact, all versions that displayed automatic SPG yielded lower error rates for these micro-phases. Notably, the versions that self-preview visual touch slightly outperform the version that requires touch interaction. The self-preview group (V1, V2 and V5) had fewer incorrect executions (18%) than Version 3, which uses tap-to-preview (19.4%). This hypothesis aimed to provide a detailed examination of the micro-phases. The analysis demonstrated significant results for three of them; however, the final analysis of system status revealed no statistically significant difference between the grouped versions (V1, V2 and V5) and Version 3, so the null hypothesis could not be rejected. Given the small difference in measurements (Figure 6), it appears that there was no effect. This is seen in the similarity in error rates achieved at the end of the interaction, since, in both cases, more than 80% of participants completed the gesture successfully.

Selected participants' comments
A few comments show participants' reactions to automatic events, followed by correct descriptions of the implied actions of a given visual prompt.
Comments include: "This came up before I touch it this time. The corner thing again. Maybe it says it is selected, no?" (P36, F, 24); and "Something showed and disappeared...two circles...maybe zoom perhaps? Ah, that was two dots, guess had to bring down" (P43, F, 35). However, five participants expressed surprise at affordances being displayed automatically without any interaction on their part. Comments include: "It did that because I tapped or would come anyway?" (P36, F, 24); "I didn't touch that" (P39, F, 56); "Will this always appear in the program?" (P43, F, 35); and "But I haven't done anything! Feels like it was doing something I didn't ask for" (P43, F, 35).

Addressing hypothesis 'c'
Hypothesis 'c' stated that a rating system that segments users' gestural interactions into smaller phases will help reveal issues with users' evaluations and executions of gestures. The null hypothesis states that a statistical analysis will show no significant differences between phases or between the evaluation and execution of gestures.
Statistical significance has already been observed across micro-phases within evaluation and execution (see 8.1 Addressing hypothesis 'a' and 8.2 Addressing hypothesis 'b'). These results support the current hypothesis. However, this hypothesis also applies to each micro-phase individually, considering a total of 10 evaluation and execution phases. The null hypothesis cannot be rejected for micro-phases 6 (Effect on system-status; χ² = 3.969, df = 8, p = 0.860) or T (Tap-to-preview; χ² = 5.663, df = 2, p = 0.059), since these did not show significant differences.
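The relationship between a reported chi-square statistic, its degrees of freedom and the p-value can be checked by hand. The sketch below uses the closed form of the chi-square upper tail, which holds only for even degrees of freedom (all the df values quoted in this section are even). The function name is hypothetical, and the snippet is a consistency check rather than part of the study's analysis.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For even df the tail has the closed form
        exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Statistics as reported in the text.
reported = [
    ("micro-phase 6 (Effect on system-status)", 3.969, 8),
    ("micro-phase T (Tap-to-preview)", 5.663, 2),
    ("system status across versions", 10.145, 4),
]
for label, stat, df in reported:
    print(f"{label}: chi2 = {stat}, df = {df}, p = {chi2_sf_even_df(stat, df):.3f}")
```

Under this formula, χ² = 3.969 with df = 8 gives p ≈ 0.860 and χ² = 5.663 with df = 2 gives p ≈ 0.059, both above the 0.05 threshold, while χ² = 10.145 with df = 4 falls just below it.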
Several additional findings that emerged from the analysis support the current hypothesis. The proportion of success and failure between the evaluation and execution phases had the added distinction of the 'partial success' rating (as explained in '5. Rating System'). A partial success outcome indicates that a user was eventually successful, but only after a number of errors in initially assessing the visual cues. Ideally, a good design will have not only a low error rate, but also a low rate of partial success. To assess how the five different versions fared across the board in relation to partial success rates, 'observed' (obs.) and 'expected' (exp.) results were drawn for both evaluation and execution (Table 3 and Table 5 respectively, within the appendices). The 'observed' values are the real data and the 'expected' values are equally distributed across each version, thus reflecting the null hypothesis that differences are randomly discovered in different conditions. The bottom row shows the average results for all versions, demonstrating that the expected proportions are valid. Each micro-phase has different levels of expected partial success; however, we do not balance for each version, since the variation of each version from the average level is what we are testing. Figure 7 (within the appendices) shows the total 'Observed' values from each version compared across the evaluation and execution phases. In each case, the number of attempts to execute a gesture was fewer than the number of attempted evaluations.
For instance, in Version 1, a mean partial success rate was observed for 20 evaluations versus 7 executions (20/7). Thus, participants' attempts at execution were about a third of their attempts to evaluate the visual prompts. The same rate was observed for Versions 2 and 5. Versions 3 and 4 showed a more marked difference, with the number of execution attempts representing only about a fifth of the number of evaluation attempts (19/4). In terms of the total number of attempts, Versions 2 and 5 required fewer evaluations than the other versions. Across all versions, the number of participant executions was a quarter of the number of evaluations.
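The observed-versus-expected comparison described above can be sketched as follows. This is a minimal illustration rather than the study's actual analysis: the per-version counts in hypothetical_observed are invented for the example, and the helper names (expected_counts, chi_square_statistic, execution_share) are assumptions. Only the 20/7 and 19/4 figures come from the text.

```python
def expected_counts(observed):
    """Spread the observed total evenly across versions (null hypothesis)."""
    total = sum(observed.values())
    share = total / len(observed)
    return {version: share for version in observed}


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all versions."""
    return sum(
        (observed[v] - expected[v]) ** 2 / expected[v] for v in observed
    )


def execution_share(evaluations, executions):
    """Execution attempts as a fraction of evaluation attempts."""
    return executions / evaluations


# Hypothetical partial-success counts per version (illustration only)
hypothetical_observed = {"V1": 12, "V2": 8, "V3": 10, "V4": 14, "V5": 6}
expected = expected_counts(hypothetical_observed)
statistic = chi_square_statistic(hypothetical_observed, expected)

# Figures reported in the text: 20 evaluations vs 7 executions (Version 1),
# and 19 evaluations vs 4 executions (Versions 3 and 4)
share_v1 = execution_share(20, 7)   # roughly a third
share_v3 = execution_share(19, 4)   # roughly a fifth
```

A larger statistic indicates that the versions deviate more from the equal distribution assumed under the null hypothesis.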

DISCUSSION
Due to the lack of guidelines on effective design practices for communicating gestures to users, the visual solutions for the self-previewing gestures were each created with a certain degree of risk. However, the design choices were not random; they were informed by research for design. Furthermore, none of the versions used in the empirical study was intended to be ideal. Without empirical evidence, any preference would be mere speculation.
Based on the experience and insights gained from the previous study, there were good reasons to suspect that some approaches would be suboptimal. For example, Version 1 (static), while representing much of standard common practice, is potentially less effective in communicating direction of movement. The study also reveals that animated feedforward, in the form of self-previewing gestures, is superior to static affordance. Furthermore, tap-to-preview, on its own, proved to have limited effectiveness, although it is more effective than a basic static approach. Self-previewing affordances were found to be more effective, and it appears that the option of tapping to repeat a recent affordance may be beneficial. However, more evidence is required to determine the importance of this option to users.

CONCLUSIONS AND FUTURE WORK
Study hypothesis (a) was supported, demonstrating significance across the model. The versions showing a visual depiction for the gesture outperformed the version that did not. The qualitative evidence further demonstrates the benefits of providing visual depictions to users. Study hypothesis (c) was also supported, as the three-way rating criteria revealed statistically significant differences between micro-phases. The null hypothesis for hypothesis (b) could not be rejected; thus, this requires further research. The model-based rating system proved helpful in distinguishing different aspects of user performance during both the evaluation and execution stages of the interaction. The following discusses the limitations of each contribution and offers suggestions for future work:
1. GEM model: A deductive process defined the stages and micro-phases pertaining to the GEM. In order to refine the GEM, independent researchers should apply the method themselves. The data from these studies should then be compared to more closely examine the utility of the short questions corresponding to the model's micro-phases. This could identify any redundant micro-phases within the model or support the development of improved questions to elicit user responses.
2. Rating system: Ideally, any evaluation system should be validated and proved reliable by other independent researchers. Further data on the threefold rating criteria would allow the rating scheme to be tested for inter-rater reliability and, in cases of vague definitions or ill-defined or debatable heuristics, to be

Figure 1: Interaction 1 'Open application'. The SPG appears over the button (a). The animation demonstrates the button moving down along with the gesture (b) before returning to its place. The user is expected to move the button in the appropriate fashion to open the application (c). Either Design 1 'Iconic' or Design 2 'Symbolic' is shown, in a randomised sequence.

Figure 2: Interaction 2 'Pull hidden menu'. The SPG appears on the left-hand side of the screen (a). The animation demonstrates the menu moving sideways along with the gesture (b) and then returning to its place. The user is expected to swipe horizontally from the left bezel towards the centre of the screen to reveal the menu (c). Either Design 1 'Iconic' or Design 2 'Symbolic' is shown, in a randomised sequence.

Figure 4: The gesture-and-effect model (GEM) for touch interactions. The GEM is based on Norman's Theory of Action (1988): it retains the evaluation and execution stages but adapts them to touch-based interactions.

Version 1 (V1). 'Static gestures': The visual cue that depicts touch points on the screen only fades in and out for an appointed time. Thus, Version 1 provided an industrial baseline, representing a common form of cue found in contemporary software on iOS, Android and other touch-screen operating systems. Mobile applications still rely on static (and inefficient) visual prompts that are either displayed only once upon the first run of an application or presented in step-by-step tutorials, which are often ignored by users.
Version 2 (V2). 'Animated gestures and effect': Version 2 was the SPG technique. This version showed the number of touch points and the effect of the gesture (e.g. a hidden menu) before the user touched the screen.
Version 3 (V3). 'Tap-to-preview animated gesture': Version 3 provided a research baseline using an existing and proven method for guiding users during the execution of a gesture. It displayed the registration pose (the touch points over the screen) after a user began touching the screen. This version drew from Freeman et al.'s (2009) ShadowGuides technique and the work of Hofmeester (2012). The tap-to-preview feature was included to prevent participants from missing the automatic presentation of visual cues.
Version 4 (V4). 'Tap to reveal animated UI response': Version 4 provided an additional research baseline. It drew from Bau and Mackay's (2008) OctoPocus and Bennett et al.'s (2011) SimpleFlow, which lack the visual prompts necessary to start a gesture and show only the effect of the gesture in the form of a 'gesture-completion path'.
8.1.1. Selected participants' comments
© Chueke et al. Published by BCS Learning and Development. Proceedings of British HCI 2017 - Digital Make-Believe, Sunderland, UK
Clear evidence was observed that the visual cue supported the user's identification of an unfamiliar gesture. For instance, Interaction 1 required the participants to use an unfamiliar two-finger gesture to achieve what is otherwise a commonplace interaction (with one finger). Even faced with a challenge to both unlearn an existing association and learn a new association, the versions could help participants successfully make a difficult leap: "...but thanks to the interactive description otherwise I wouldn't really try to use 2 fingers" (P2, M, 30); "To activate it I have to do what the hand is doing" (P29, M, 48); and "It's actually easy looking the way you're doing it better than reading the instruction. If you want to show someone things it's better to show someone a picture or video…I know I have to put two fingers and move it down" (P32, F, 27).

Figure 6: Comparison of V1, V2 and V5 versus V3 in terms of system status.

Table 1: Expected and actual executions for versions (system status).