Predictive Modelling for HCI Problems in Novice Program Editors

We extend previous cognitive modelling work to four new programming systems, with results contributing to the development of a new novice programming editor. The results of a previous paper, which quantified differences between certain visual languages, and feedback expressing interest in that work, suggested that the technique could be applied to more systems. This short paper reports on a second series of models, discusses their strengths and weaknesses, and draws comparisons with the first. This matters because we believe interaction-design "bottlenecks" are an issue in some beginner languages: painfully slow interactions may not be noticeable at first, but become intrusive as programs grow larger. Conversely, text-based languages are generally less viscous (easier to change once written), but often use difficult symbols and terminology, and can be highly error-prone. Based on the models presented here, we propose some simple design choices that appear to make a useful and substantive difference to the editing problems discussed.


INTRODUCTION
Programming education is highly topical, and there are several actively developed novice programming tools that have been widely used and cited in the literature. These range from "block" building systems aimed at children (like Scratch), to Greenfoot (a Java game-based development tool used in schools), to "pure" visual programming systems based on flow-chart-style diagrams (such as Lego Mindstorms). Other systems sit between these, such as Alice, StarLogo TNG, and numerous other variations on the "block" metaphor.
There are also "mainstream" programming languages that are judged to be the simplest of their kind and are used to teach beginners (such as Python, Java, or variants of Basic). All of these systems look, superficially, very different: they range from toy-like graphics, to monospace text, to complex flow diagrams and lines. However, some interactions are common to several differently styled editor types, and some systems that look similar behave very differently in terms of interaction design.
In this paper, we extend previous cognitive modelling work (McKay 2012) to four new programming systems. The initial goal of that study was to compare several "benchmark" systems with a new editor in development, as part of the design process. The results of the previous paper, which highlighted differences between some visually similar languages, and the feedback we received, suggested that other systems could be approached in this way. This short paper reports on a second series of models, discusses their strengths and weaknesses, and compares them with each other and with the systems in the first set.
We acknowledge that viscosity, as measured through task time, is only one of the issues in novice programming systems. A system with low viscosity would not necessarily meet the other (educational) requirements for beginner systems, but observations suggest that excessively viscous interactions may still be problematic for some types of novice user.

Psychology of programming
One important idea from the psychology of programming is viscosity. When working with notations (in this case, computer programs), viscosity is defined as resistance to change (Green 1989). For example, once a program has been (partly) written, viscosity might be encountered when editing an existing statement, rearranging the program, or inserting new code into code that already exists.

Predictive Modelling for HCI Problems in Novice Program Editors McKay Kölling
Green & Blackwell (1998) define six "cognitive activities": incrementation (adding new code), transcription (copying a design into code, or copying code from elsewhere), modification, exploratory design, searching, and exploratory understanding. The primary activities dealt with in our previous paper were incrementation and modification. Together, these cover adding and modifying statements, moving them once in place, and removing them. In terms of the Time Scale of Human Action used by Newell and Card (1985), these tasks take place at the "task" and "unit task" scales of the rational and cognitive bands.

Previous paper
In a previous paper [...]. Where some groups are larger than others, it is because there are multiple variants of the task, applying to different types of statements. The tasks were all simulated using the same cognitive architecture/agent; the software used to conduct the simulations is explicitly designed to facilitate comparing two or more designs like-for-like. As well as task times, we were able to observe the number of steps involved in completing a task. From the differences in simulated times between systems, we noticed trends in which types of task generally required more steps and/or time in similar systems, and which required fewer. The prior paper covered only Scratch, Alice, Greenfoot, and a new prototype design (which was described in that paper).

Novice program editors
New programming languages appear frequently; as in any other language domain, a great number of languages have been considered educational. Kelleher & Pausch (2005), for example, categorised 87 educational programming systems in a paper now nearly a decade old. Indeed, two of the triad of "major" systems often discussed in the computing education world, Greenfoot and Scratch (the third being Alice), were not yet available at that time. It would not have been possible here to investigate every possible variation. Instead, we have chosen a small number of systems that exemplify certain editing and notational styles, and that are used in a "real" educational context (that is, they are not purely research languages). The original selection of Scratch, Alice and Greenfoot was based on their respective similarities to the new editor we were developing at the time. It was hypothesised that the main differences would occur between Greenfoot (representing text in general) and Scratch and Alice (as a pair, since they are superficially similar in structure). In fact, Scratch and Alice behaved very differently in several areas (for example, when deleting or moving an existing statement, though not when adding new ones).
The rationale for selecting the four additional example systems is discussed in the methodology section, but it is appropriate to describe the distinguishing features of each of the systems here, for readers who are unfamiliar with them.

Alice
Alice (Cooper, Dann & Pausch 2003) programs are composed of drag-and-drop blocks that represent program statements.Because drag-and-drop allows for validation, syntax errors can be prevented (it is not possible to drop an invalid statement at a given point).Adjustable parameters for a statement can be added or changed through context menus.The structure of the statement remains intact, and cannot be broken up.The block has to be entirely removed, and replaced, if the programmer wants to modify the type of block.

Scratch
At first glance, Scratch (Maloney et al. 2004) appears visually similar to Alice, though it uses a much stronger colour scheme. Programs are composed of blocks, which must be dragged to the composition area using a mouse. One difference between Alice and Scratch, noted in the prior paper, is that Scratch blocks "stick" to the blocks above them when they are dragged. This means that additional steps are needed to move a single block, since it must first be detached from its neighbours (so as not to bring them with it). As shown later in this paper, this is a critical point in discussing Scratch's overall results.

Greenfoot (inc. Java)
Greenfoot (Henriksen, Kölling 2004) is a Java-based system that emphasises object-oriented programming through games (it is closely related to BlueJ). Though Greenfoot's Java text editor uses font colour and background to some effect in code presentation, its interactions are essentially the same as those of other text editors (Greenfoot's focus is on the games-based approach, rather than on the specific program code used).

Cognitive models
Keystroke-level models can be used to measure the "overt", or mechanical, movements that a user makes (Card, Moran & Newell 1980). Cognitive models additionally measure hidden "mental" operators, such as eye movements and reading and thinking time. These models, however, are complex and difficult to construct accurately by hand; non-experts, in particular, can introduce errors into the calculations (John 2010). CogTool (John et al. 2004) is a prototyping tool that automates the creation of cognitive models for specific tasks. The evaluator leads CogTool step by step through screenshots or storyboards, demonstrating the end-user's workflow for the chosen task (such as clicking a button or menu item). CogTool uses a computer model of human cognition to generate a model of the task, and predicts both overall task time and the times for subtasks. Because it automates the error-prone parts of the modelling process, CogTool improves prediction accuracy considerably compared to manually created models (John 2010).
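As a rough illustration of the keystroke-level idea, the sketch below sums operator times for a sequence of mouse and keyboard actions. It is a hand-built simplification, not how CogTool works internally (CogTool uses a full model of human cognition). The operator values are commonly cited textbook estimates and should be treated as assumptions, and the two task encodings (`DRAG_BLOCK`, `TYPE_LITERAL`) are hypothetical.

```python
# Minimal keystroke-level model (KLM) sketch.
# Operator times in seconds are commonly cited textbook values (assumption);
# they are not the parameters CogTool uses.
KLM_TIMES = {
    "K": 0.28,  # keystroke, average non-secretarial typist
    "B": 0.10,  # mouse button press or release
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_time(operators: str) -> float:
    """Return the predicted time for a string of KLM operators, e.g. 'MPBPB'."""
    total = 0.0
    for op in operators:
        if op not in KLM_TIMES:
            raise ValueError(f"unknown KLM operator: {op!r}")
        total += KLM_TIMES[op]
    return total


# Hypothetical encodings of two editing actions:
# drag a block = think, point at it, press, point at the drop site, release
DRAG_BLOCK = "MPBPB"
# type a three-digit literal after moving a hand to the keyboard
TYPE_LITERAL = "MH" + "K" * 3

if __name__ == "__main__":
    print(round(klm_time(DRAG_BLOCK), 2))    # 3.75
    print(round(klm_time(TYPE_LITERAL), 2))  # 2.59
```

Encoding each task as an operator string makes it easy to see how an extra drag or an extra pointing step adds a fixed chunk of time, which is the kind of difference the models in this paper quantify more rigorously.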

MODELS
The CogTool models here are based on 46 exemplar tasks, each of which was modelled in all eight systems. The tasks were chosen to cover each of the cognitive activities found in the literature. They are based on use-cases: all of the places where a statement can be entered, or moved from one context to another, and so on. They are not necessarily "equal" in terms of how frequently they occur in real tasks. We hope to obtain additional data in future showing which use-cases are the most frequent in real-world novice programs.
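Because the tasks are not equally frequent, a whole-program comparison would need to weight each task group by how often it occurs. A minimal sketch of such a frequency-weighted mean follows; the function name is ours, and the group times and occurrence counts are invented purely for illustration, not results from our models.

```python
def weighted_mean_time(group_times: dict[str, float],
                       frequencies: dict[str, float]) -> float:
    """Mean task time with each group weighted by how often it occurs."""
    total_weight = sum(frequencies[group] for group in group_times)
    if total_weight <= 0:
        raise ValueError("frequencies must sum to a positive value")
    weighted = sum(group_times[g] * frequencies[g] for g in group_times)
    return weighted / total_weight


# Hypothetical per-group mean times (seconds) for one editor...
times = {"insert": 4.0, "move": 8.0, "modify": 2.0}
# ...and hypothetical counts of how often each task type occurs in real programs.
counts = {"insert": 30, "move": 5, "modify": 15}

unweighted = sum(times.values()) / len(times)  # 4.666...
weighted = weighted_mean_time(times, counts)   # 3.8
```

With these made-up numbers, a rare but slow "move" task dominates the unweighted mean far more than it would in practice, which is why realistic frequency data would matter for any whole-program estimate.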
In the previous paper, Scratch, Alice and Greenfoot were modelled against a prototype editor. Those systems were selected because of their (apparent) similarities to the prototype. To augment those findings, we modelled StarLogo TNG (a block language similar to Scratch) and Mindstorms NXT (a diagrammatic visual language, also used in education). While Greenfoot was the only text-based system tested in the previous work, NetBeans's Java editor has now been included. This separates any effects that are unique to (the current version of) Greenfoot from those that are related to Java syntax (or, perhaps, to text in general). The final system used here is Python, edited in Visual Studio's Python editor.
Python is used in some teaching contexts (though to a lesser extent than Java) and is included to provide an additional point of comparison for text languages. Python's syntax is considerably different from Java's, and it uses fewer special symbols (such as semicolons or braces).

RESULTS
Mean task times produced by the model, grouped by task type, are given in Table 1 and illustrated in Figure 1. As shown, no system is universally "best" across all task types, and the scale of the differences between systems (for a given task type) varies. Mean times for the "insert" group, for example, range from 1.644 s to 15.950 s, with some systems clustered around 3 to 6 seconds. The "replace" group has a similar distribution, whereas there is less variation in the "modify" group. An analysis of variance (ANOVA) finds that the differences between systems are significant in all groups except "modify".
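For readers less familiar with ANOVA, the sketch below computes a one-way F statistic from groups of task times: the ratio of between-group to within-group variance. The exact grouping used in our analysis is not detailed here, so the data and the helper name are hypothetical; in practice a library routine such as SciPy's `scipy.stats.f_oneway` would also supply the p-value.

```python
from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical task times (seconds) for two systems on the same task group
system_a = [1.0, 2.0, 3.0]
system_b = [4.0, 5.0, 6.0]
print(one_way_anova_f([system_a, system_b]))  # (13.5, 1, 4)
```

A large F means the systems' means differ by much more than the spread within each system; the "modify" group's non-significant result corresponds to an F too small to rule out chance.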

Presentation vs. interaction
Although Scratch, Alice and StarLogo TNG could be referred to as "block" languages, and in some ways look very similar, they are very different in terms of interactions.
The relationship between language/notation and editing system is best seen by comparing Greenfoot and NetBeans. Both are Java systems but, as seen above, they produced differing results in some groups.
In the previous paper, we had not expected the relatively long task times for selecting and moving code in Greenfoot. The CogTool graphs subsequently showed that much of the time and (virtual) effort came from the user having to select exactly the right delimiting characters in a Java program construct (semicolons, { } braces, etc.). These are small mouse targets compared to the whole blocks a user manipulates in the block-based systems: an example of Fitts' (1954) law, under which pointing takes longer as targets become smaller.
Though its notation differs visually from the block-based systems, Mindstorms NXT has an overall task-time profile similar to theirs. Most of the differences occur in tasks that involve manipulating the very small "wires" that connect NXT's circuit-like symbols. In these cases, the fine manipulation required appears to increase task time, similar to the effect of selecting brackets and other delimiters in the text systems here.
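The effect of target size can be illustrated with the Shannon formulation of Fitts' law, MT = a + b * log2(D/W + 1), where D is the distance to the target and W its width. The constants a and b below are hypothetical (real values are fitted per device and user), as are the pixel widths chosen for a brace, an NXT-style wire end, and a whole block; the sketch only shows why smaller targets cost more time at the same distance.

```python
import math

# Hypothetical device constants: intercept a (s) and slope b (s per bit).
A_SECONDS = 0.1
B_SECONDS_PER_BIT = 0.15


def fitts_time(distance: float, width: float,
               a: float = A_SECONDS, b: float = B_SECONDS_PER_BIT) -> float:
    """Predicted pointing time (Shannon formulation of Fitts' law)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# Hypothetical on-screen target widths (pixels), all 400 px away
for name, width in [("closing brace", 8), ("wire end", 12), ("whole block", 200)]:
    print(f"{name:>13}: {fitts_time(400, width):.2f} s")
```

Because the index of difficulty grows with D/W, shrinking a target from block size to brace size raises the predicted time even though the hand travels the same distance.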
Modelling another Java editor (NetBeans) provides an additional set of results. In tasks that require the user to select around a statement, NetBeans is still more viscous than some of the alternatives. However, it is less viscous than Greenfoot in the "move" task group. A detailed look shows that while the two environments approach selection in a similar way, and give similar results, the extra time cost in Greenfoot occurs in the "move" part of the task. In some text editors (including NetBeans), a highlighted portion of code can be cut and pasted with drag and drop, though this requires small, fine movements at the end location. Because Greenfoot does not support this, the user must cut and paste "manually", which is most quickly done through a right-click context menu.
Modelling Python (another text language, but one without C/Java-style braces) demonstrates the difference. Visual Studio's Python editor supports the highlight-and-drag feature mentioned above, making its behaviour in the "move" task type similar to that of NetBeans. However, it was more efficient at selecting the code in the first place, because Python lacks the small delimiting characters. Another factor in Java (and languages with a similar syntax) is that pairs of braces must often be "closed up" when part of the code is moved away: surplus braces might need to be removed, and/or extra ones added. The cognitive dimensions framework refers to these situations as "knock-on" viscosity, where a single change has an indirect effect on other parts of the program (Green, Blackwell 1998).
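The knock-on effect of moving code out of a brace-delimited construct can be shown with a simple brace counter. In the hypothetical Java-like snippet below, a selection that stops just after an opening brace leaves both the moved fragment and the remaining program unbalanced, so extra edits are needed to repair them. The helper `unmatched_braces` is our own illustration; it ignores braces inside strings and comments, and the statements are illustrative only.

```python
def unmatched_braces(source: str) -> tuple[int, int]:
    """Return (unclosed '{' count, unmatched '}' count); ignores strings/comments."""
    depth = 0
    stray_closers = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                stray_closers += 1
            else:
                depth -= 1
    return depth, stray_closers


program = "if (atEdge()) {\n    turn(17);\n}\nmove(4);\n"

# A selection that ends just after the opening brace (a near miss)
cut_end = program.index("{") + 1
moved, remaining = program[:cut_end], program[cut_end:]

print(unmatched_braces(program))    # (0, 0)
print(unmatched_braces(moved))      # (1, 0)
print(unmatched_braces(remaining))  # (0, 1)
```

An indentation-based language has no such paired delimiters to miss, which is consistent with the shorter selection times observed for Python.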

Viscosity and "sticky" blocks
One of the biggest causes of viscosity in Scratch and StarLogo is the way blocks "stick" to their neighbours when moved. When a block is dragged from the middle of a program, any blocks below it are dragged along too. Therefore, when manipulating a single statement, a novice programmer has to detach the blocks below it, carry out the main task, and then reattach the "unwanted" blocks in their original positions (closing the gap that was left). Users noticed this when we previously conducted a qualitative pilot study using Scratch (McKay, Kölling 2012): in their opinion, it adds several tedious or unhelpful steps that are not needed in a system like Alice. The problem is caused not by the visual structure of the blocks (which is often useful), but by the "sticking" effect when interacting with more than one unit of code.
The net effect of this "sticking" varies from program to program. However, the models have been used here to compare a number of tasks in their "with-follower" and "without-follower" variants. This is, for example, the difference between deleting two identical statements: one at the end of a scope/stack/method, with no others trailing underneath; the other mid-program, with neighbours both above and below. This distinction applied to fourteen of the modelled tasks, and the effect is summarised in Figure 2.
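The extra steps caused by "sticky" blocks can be counted with a small model of a single script. The sketch below is our own simplification of the behaviour described above, not the CogTool model: it assumes the stack hangs beneath a fixed event block (so closing the gap always takes a drag), and the function and block names are hypothetical.

```python
def drags_to_move_block(stack: list[str], index: int, sticky: bool) -> list[str]:
    """List the drag operations needed to move only stack[index] elsewhere.

    In a 'sticky' editor, dragging a block also drags every block below it.
    """
    if not 0 <= index < len(stack):
        raise IndexError("index out of range")
    block, followers = stack[index], stack[index + 1:]
    steps = []
    if sticky and followers:
        # Detach the followers first, so they are not carried along.
        steps.append(f"drag {followers[0]!r} (with {len(followers) - 1} below) aside")
    steps.append(f"drag {block!r} to its destination")
    if sticky and followers:
        # Reattach the followers to close the gap left behind.
        steps.append(f"drag {followers[0]!r} stack back into the gap")
    return steps


script = ["move", "turn", "say", "wait"]
for sticky in (True, False):
    mid = len(drags_to_move_block(script, 1, sticky))  # block has followers
    end = len(drags_to_move_block(script, 3, sticky))  # block at end of stack
    print(f"sticky={sticky}: mid-program {mid} drag(s), end of stack {end} drag(s)")
```

Under these assumptions, a mid-program move costs three drags in a sticky editor but one in a non-sticky one, while the end-of-stack case costs one drag in both, mirroring the with-follower/without-follower contrast.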
Task time appears unaffected by end-of-stack positioning in Alice (a difference appears in only one task, and it is less than 0.04%). In both of the affected systems (Scratch and StarLogo), task time increased for all of these tasks.

Text vs. block-based literals
Block languages differ from each other in their treatments of text and numeric literals. Though the program statements in these languages are shown as individual blocks, there is a difference between those that use "plug-in" value blocks in those statements, and those that have character-based text entry slots as parameters instead. StarLogo TNG blocks are visually similar to Scratch's. However, alignment and layout affect a block's meaning differently in the two systems. More importantly for this work, Scratch allows text literals to be entered into a textbox from the keyboard, whereas in StarLogo literals must be added as separate blocks (Figure 4). The results in this paper show that this makes StarLogo more viscous to use.
Scratch allows plain text and drag-and-drop value blocks to be used together in the same expressions. In StarLogo, as noted, literals are created as special blocks, which are snapped together horizontally (whereas statements are arranged vertically) to create expressions; there is no direct way to enter text without a block. Alice's design approach is different still: while it allows drag-and-drop style blocks, it also uses selection from hierarchical menus. Thus, clicking on the empty space for a parameter opens a (large) context menu, from which the user can choose any possible value. This makes menus very large in programs with many variables, or when writing complex, multiply nested expressions.
StarLogo's block-only approach is conceptually consistent, but it increases task times for those tasks that require extra blocks for each literal. Once the block is in place, changing its value is less problematic. In Figure 3, the mean task times for changing a literal, once its block has been added, are approximately the same for all three systems (since this involves entering text from the keyboard, as normal).

Incidental to the interface
Some results indicate a problem with the general UI of the editor, rather than with the program notation itself. Some of the differences between Scratch and StarLogo TNG are of this kind; StarLogo, for example, uses a series of panels on the left of the screen, which must be cycled through in a fixed order. There are panels for blocks that apply to the individual object, for blocks that apply to that class of object, and for control statements.
A further example is Greenfoot's cut and paste options. In most text editors (whether for programming or other domains), selected text can be dragged to another point in the file, an action interpreted as a combined cut and paste operation.
In the version of Greenfoot that was tested, this interaction is not present. Although this seems trivial, it increases the task time for Greenfoot tasks that rely on moving statements or rearranging the order of different program parts.

Limitations of the predictive model
The modelling approach used here does not take into account the time a user might spend designing a program, or attempting to understand existing code. Programming is a more cognitively complex activity than, for example, most Internet browsing, and involves additional processes. This program-design element means that, in practice, writing a program will take longer than the sum of the individual cognitive and "mechanical" tasks; we proceed on the basis that there is still value in comparing those aspects of tasks like-for-like in different systems.
The choice of tasks clearly affects mean task times. If we were studying the overall effect that these HCI problems have on writing a whole program, it would be important to weight the task groups appropriately, taking account of the proportion of their time a "real" programmer is likely to spend on different activities. Jadud (2006) observed such users, and provides programmer workflows of the sort that could be used to determine the importance of different tasks.

CONCLUSIONS
This paper extends previous cognitive modelling work to compare different block-based and traditional text programming languages. The discussion has concentrated on the viscosity incurred when statements cling to each other during editing, and on the ways in which parameters, particularly literals such as numbers or text, are handled in the (semi-)visual notations. Comparisons of systems that look superficially alike have proved particularly useful. The differences between literals in Scratch and StarLogo TNG, for example, cause an observable effect on task completion time for the same sets of tasks. Mindstorms NXT looks very different from the block-based languages, but its treatment of literals is closer to Scratch's than Scratch's is to either Alice's or StarLogo's (and this approach appears more usable).
We are developing an editor that has some resemblance to block-based editors, and our findings suggest that a Scratch-style textbox approach to literals would be preferable to StarLogo's horizontal-blocks design. Compared to Alice, Scratch and StarLogo are very viscous when editing existing statements; for this reason, we believe that adopting the Scratch/StarLogo "sticky" block design in our editor would be detrimental. While some of these factors might be noticed through use of the systems, we believe that this work is a first step towards quantifying them systematically. Given the accuracy predictive models have shown elsewhere, we have reason to believe that the trends, at least, would be similar in real user testing, and this is a step we now intend to pursue.

Figure 2: Task times whether or not mid-program
Figure 3: Adding blocks to hold literals

Table 1: Old and new predictions for task types (times in seconds); * = least viscous/most efficient
Figure 1: Mean system times for each task type