Qualifying counterfactuals: Students' use of counterfactuals for evaluating historical explanations
History Education Research Journal 17 (1) 2020

The study investigates upper secondary school students' use of counterfactual reasoning when engaging in a task concerning historical explanation. It analyses student answers to a prompt asking them to evaluate the causal importance of a historical actor for a historical event. The aim is to characterize the counterfactuals used and to apply possible criteria for what can be considered a qualified counterfactual. The criteria for qualification are based on theoretical proposals about the potential of counterfactuals in relation to historical explanation. The findings indicate that a majority of the students use counterfactuals in their reasoning about explanatory importance, and most of them apply counterfactual reasoning to the historical actor. The analysis of qualification indicates that student reasoning becomes more qualified when students focus on structural factors instead, include both structures and actors in their counterfactual reasoning, or support their reasoning by making comparisons.


Introduction
Historical explanation can be considered a core aspect of teaching and learning history (Van Drie and Van Boxtel, 2008; Seixas and Morton, 2013). Among the aspects considered important are that historical causal relationships should be presented as complex and multifaceted, and that students need to learn how to evaluate the importance of different causal factors (Lee and Shemilt, 2009). At the same time, one of the pitfalls of learning historical explanation is the risk that students come to view history as predetermined, implying that what actually happened was the only thing that could have happened (Barton, 2008; compare Lee and Shemilt, 2009). One method of counteracting this tendency, suggested by several researchers, is the use of counterfactual reasoning in relation to historical explanation (Chapman, 2003; Lee and Shemilt, 2009; Woodcock, 2011; Seixas and Morton, 2013). This study investigates how students employ counterfactual reasoning when explaining a historical event.
The use of counterfactual reasoning in history is not uncontroversial. Several historians have frowned upon it, since the basis of a counterfactual argument, asking 'what if', opens up questions that cannot be empirically verified, so that the argument risks straying into pure speculation (Carr, 1961; Fischer, 1971; Evans, 2016). Others, however, have emphasized the merits of counterfactual reasoning for engaging with historical explanation (Gaddis, 2002; Megill, 2007; Lebow, 2010; Sunstein, 2016). Historiographical research on counterfactuals indicates that historians use them relatively frequently, intentionally or not, to support and defend their interpretations (Kozuchowski, 2015; compare Rosenfeld, 2016).
Within history education research, the approach towards counterfactuals for engaging in historical explanation has been cautiously optimistic. For example, Woodcock (2011: 128) suggests that 'they can lead to fresh insight into how and why a particular event or process was caused and into how important particular causes were', before warning that they must be used 'with caution' (compare Seixas and Morton, 2013). Empirical researchers have used counterfactuals in lesson design, finding that they help students evaluate the importance of causal factors (Chapman, 2003; Woodcock, 2005; Buxton, 2010; Lilliestam, 2013; Roberts, 2011; Carroll, 2018). Besides the argument by Woodcock (2011) mentioned above, the purported advantages of using counterfactuals in history education are that they help students to analyse agent-structure relations (Lilliestam, 2013) and to understand the implications of the historical context for decision-making (Chapman, 2003); that they engage students (Buxton, 2010; Huijgen and Holthuis, 2014); and that they can help to avoid a predetermined view of history by indicating that alternative outcomes were possible (Lee and Shemilt, 2009; Nolan, 2013). However, research so far has focused on how teachers can use counterfactuals to stimulate engagement with historical explanation; relatively little is known about how students themselves use counterfactual reasoning. In particular, there is a lack of concepts or criteria for determining whether counterfactual arguments made by students can be understood as more or less qualified in relation to historical explanation.
This study aims to contribute to knowledge about counterfactual reasoning in history education by investigating how history students use counterfactuals when reasoning about historical explanation. The informants are Swedish history students in upper secondary school (aged 16-17) who, to a large extent, used counterfactuals in a written assignment designed to assess their handling of historical explanation, which asked them to evaluate an individual's (Adolf Hitler's) importance for explaining a historical event (the Nazi seizure of power in Germany). The purpose of the study is to analyse students' uses of counterfactuals as more or less qualified for advancing their understanding of historical explanation. A subsidiary aim is to test theoretical criteria for more or less qualified historical counterfactuals. I aim to answer two research questions:
(1) How can students' counterfactuals be characterized?
(2) In what ways can students' historical counterfactuals be considered more or less qualified in relation to evaluating historical explanations?
The first question is descriptive in intent, aiming to provide a categorization of students' counterfactuals based on what factors they change in their counterfactual reasoning, and how that affects their explanatory reasoning. The second question tackles the issue of qualification, using theoretically derived ideas about what can be considered qualifying features of counterfactuals.

Theoretical considerations
The analysis of historical counterfactuals in this study is inspired by Woodward's (2003) counterfactual theory of causal explanation. According to Woodward (2003: 191), causal explanations can be understood as 'exhibiting patterns of counterfactual dependence', meaning that they can, at least hypothetically, answer the question 'what if things had been different?' Woodward's (2003) theory is part of a tradition of understanding causation from a counterfactual point of view that was pioneered by Lewis (1973) and Mackie (1974), and that is relatively well established in qualitative research (Mahoney and Goertz, 2006). A strength of the counterfactual theory is that it does not rely on laws as fundamental to causal claims, thus presenting an alternative to the deductive-nomological model, which has traditionally been seen as the default model for understanding causality (Woodward, 2003; Seppälä, 2012; Hewitson, 2014). Furthermore, proponents of some variations of the counterfactual theory of causation have argued that counterfactuals are an integral part of historical explanation (Megill, 2007; Nolan, 2013; Seppälä, 2012; Sunstein, 2016).

Analytical categories
One of the most fundamental aspects of a counterfactual is what is manipulated. Lilliestam (2013) points out that counterfactual manipulation can target either structural factors or historical actors. A manipulation of a structural factor entails a change in a condition influencing the studied event, for instance removing the war reparations that Germany had to pay as a result of the Treaty of Versailles. A manipulation of a historical actor, on the other hand, changes something about that actor. In the context of this study, the obvious manipulation is removing Hitler. Lilliestam points out that there are other possibilities, such as shifting the actor to another time or place, or reasoning about which different actions were possible for an actor (Lilliestam, 2013; compare Nolan, 2013).
Furthermore, it is worth considering what effect, if any, the counterfactual manipulation has on the argument being made. Here, the research interest lies mainly in how counterfactuals can be used for evaluating the importance of causes, although counterfactuals may also help clarify what alternatives were available to historical actors, and thus serve as a possible scaffold for historical empathy (Lilliestam, 2013; Huijgen and Holthuis, 2014). Logically, the counterfactual manipulation of an antecedent may lead to two different results: either the manipulation leads to the conclusion that the consequent would not have happened had this hypothetical change occurred, or it leads to the conclusion that the consequent would have happened anyway. In the first case, the counterfactual manipulation becomes an argument for the necessity of the antecedent, while in the second case, it becomes an argument against the necessity of the antecedent (see Lilliestam, 2013; Rosenfeld, 2016). This distinction makes it possible to investigate whether students tend to reinforce actual history when using counterfactuals, or whether they see possible alternative outcomes.
These two questions, about what is manipulated and whether the manipulation allows for the possibility of an alternative outcome, form the basic categories of the study. They are not inherently linked to any form of qualification, since there is no a priori reason for a manipulation of actors to be more qualified than a manipulation of structures. However, they may be correlated with other aspects that say something about the quality of the counterfactual. Theoretical proponents of counterfactuals point to three criteria that can possibly be used: plausibility, context sensitivity, and support by comparison and/or generalization.

Plausibility
It is important to recognize that not all counterfactuals are alike. Nolan (2013: 318) gives the following examples (among others):
(1) If Napoleon had not invaded Russia in 1812, Paris would not have fallen in 1814.
(2) If an equal number of Japanese ninjas and Caribbean pirates had fought a battle, the pirates would have won.
Nolan's point is that Statement 2 appears to be an obviously irrelevant flight of fancy, while Statement 1 appears at least potentially plausible. Why? Megill (2007) makes a distinction between 'restrained' and 'exuberant' counterfactuals, arguing that restrained counterfactuals are the ones with value for historical reasoning. In a restrained counterfactual, the historian looks for a cause that, if changed, could have led to a different outcome than what actually happened (Megill, 2007; compare Sunstein, 2016). Going beyond that, into elaborating upon what would then have happened in the counterfactual timeline, quickly leads to the 'exuberant' form of counterfactual history. Maar (2016: 362) agrees with this idea of restrained counterfactuals, specifying that a valid counterfactual needs to encompass 'alterations to one single antecedent, while all other conditions remain fixed'. This principle of counterfactual restraint clarifies why Statement 1 above is more plausible than Statement 2. When a single antecedent is manipulated counterfactually, the manipulation can be used as an argument about the importance of that antecedent, making the counterfactual at least plausible in principle (Seppälä, 2012; Rosenfeld, 2016). Such manipulation corresponds to Woodward's (2003) notion of hypothetical interventions for clarifying causation.

Context sensitivity
Returning to Statement 1 above, it would be strengthened if we could provide some information supporting the conclusion that the consequent (Paris would not have fallen in 1814) would have been likely if the antecedent (Napoleon did not invade Russia in 1812) had been true. One relevant consideration is context sensitivity, meaning that the counterfactual needs to agree with, or at least not contradict, the known historical context (Maar, 2016; compare Evans, 2016). What would be needed for a counterfactual to be considered context sensitive? Presumably, it would have to be reasonably specific. Consider the following two hypothetical counterfactuals:
(3) If Hitler had died, someone else might have led the Nazis to seize power.
(4) If Hitler had died, someone else might have led the Nazis to seize power. For instance, Göring would probably have been able to take Hitler's place.
While these statements make the same general point, Statement 3 is vague as to how the alternative outcome would actually come about. Statement 4, on the other hand, specifies an alternative, which makes the entire line of reasoning open to discussion in relation to what is known about the situation and about the likelihood that Göring could actually have replaced Hitler. Alternatively, an answer might specify certain actors or traits necessary for a substitute to replace the actor (or, conversely, use such traits to argue that the actor was irreplaceable). A specific statement, as opposed to a vague or generic one, is thus more qualified, since it can in principle be verified or refuted in relation to the historical context (see Barton, 2008; Samuelsson and Wendell, 2017; Wendell, 2018). Context sensitivity, understood here as specificity, must be applicable both to counterfactuals that manipulate structures and to those that manipulate actors.
In the above cases, the manipulated factor is an actor, and it is easy to see how one might be able to specify alternatives. When it comes to structural factors, specificity would rather be in the form of previous events or existing conditions that are specified as decisive ('if the First World War had not occurred, then ...').

Support by comparison and/or generalization
Specificity may be augmented by other means. Seppälä (2012) discusses what she calls 'evidence' in support of counterfactual statements. Here, I instead use the term 'support', in order not to confuse it with historical evidence as defined within historical thinking (see Seixas and Morton, 2013). Seppälä suggests two types of support: comparisons with similar cases, and the use of generalizations. Of these, a comparative approach has been put forward by Lebow (2010), and used in teaching by Buxton (2010) and Carroll (2018). The validity of a comparison certainly depends upon the actual similarity of the comparative case, which becomes a point of discussion in itself. Generalization can be defined as introducing 'general theoretical knowledge of how societies, markets and humans behave into the case one is studying in order to make causal inferences' (Seppälä, 2012: 58). This does not seem to have been tried empirically in the same way as the comparative approach, but the notion of using generalizations to reason about human behaviour does have theoretical support (Weber, 1949; compare McCullagh, 1998). Such generalizations are not laws in the strict sense, but can be considered lawlike or 'normic' statements (Scriven, 1959; compare Woodward, 2003).
In sum, counterfactual statements need to be restrained rather than exuberant in order to be valuable for historical reasoning about causal factors. This can be seen as the most basic criterion for qualifying counterfactuals. Counterfactuals that pass this test can be further qualified by being specific rather than generic, and, beyond that, by being supported through comparisons and/or generalizations.

Data collection and analysis
The data were collected using a test question inspired by a test from the Swedish National Agency for Education. The question functioned as a prompt for students to reason about different interpretations of a historical event: the Nazi seizure of power in Germany in 1933.
Teachers willing to let their students participate were recruited through social media, specifically two Facebook groups for teachers of social studies and history. Seven teachers from different upper secondary schools responded positively. One of these later withdrew, leaving six teachers who used the prompt in their classes. In all, 174 students were asked to participate, and 139 produced a text. These texts make up the source material for the study. All answers were anonymized in the process, meaning that the researcher does not know the names of the students involved.
Table 1 shows the percentage of pass grades in history for the schools of the teachers in this study, taken from the official statistics of the Swedish Department of Education. As the table indicates, the pass grade percentage is slightly lower than the national average for Schools 1, 4 and 6, and slightly higher for Schools 2, 3 and 5. None of the schools differs by more than 3.5 percentage points from the national average. All the classes involved studied the basic upper secondary course in history. The schools are located in three different geographical areas of Sweden (western, eastern and northern). One of the schools is private; the others are municipal.
This method of data collection can be seen as a form of snowball sampling (Bryman, 2012). It is not random, since the participating teachers, while working at different schools, all took part in online communities for teachers, indicating a professional interest not necessarily shared by all of their peers. They selected the participating students from their available student groups. A known weakness of this type of sampling is that participants direct the sampling by selecting other participants whom they believe to be 'interesting' for the study. The sample of participating students is therefore probably not representative and cannot be used for statistical generalization.
However, for the purposes of this study, the sample can be used for theoretical generalization, specifically in relation to the theoretical criteria for qualified counterfactuals (Yin, 2013). The variation among the participating schools regarding type and location, together with previous research from both Sweden and the UK on the use of counterfactual reasoning in history education, serves to reduce the risk that the data collected are the result of particular conditions at the participating schools (Buxton, 2010; Carroll, 2018; Worth, 2012). The prompt used to elicit student answers is reproduced below:
How important is a single person?
Below, you see a person and an event. This task is to reason about how important the person was for the event.
Person: Adolf Hitler
Event: The Nazis seized power in Germany in 1933.
1. Below, you see two different interpretations of the person's importance for the event:
(A) The person was of great importance for the event.
(B) The person was of little importance for the event.
Find support for both interpretation A and B by citing historical examples.
2. Compare both interpretations, and discuss their strengths and weaknesses when it comes to explaining the historical event. Use your historical examples when you reason. Use the concepts 'agent' and 'structure' when you discuss.
The prompt was designed to capture how students handle different interpretations of historical events, in accordance with the knowledge requirements of the Swedish history syllabus for upper secondary school (History 1b; Skolverket, 2011).
As such, the prompt did not explicitly instruct students to reason counterfactually, instead directing them towards evaluating the two different interpretations of Hitler's importance for the explanandum. Teachers were instructed to give the students 40 minutes to complete the assignment, making the situation test-like. The teachers were given the opportunity to use the prompt as part of their own assessment practices (further increasing the test-like circumstances), although it is unknown to what extent they did so. The risk that teachers helped students produce desirable answers was reduced by the fact that the analysis came to focus on counterfactual reasoning, which was neither provided as a scaffold nor the intended object of the prompt.
Once collected, the student texts were analysed in three steps. First, a content analysis was made to categorize how the students reasoned about the prompt. At this stage, the prevalence of counterfactual statements was discovered, and the following steps were designed to focus on this phenomenon. Second, the texts were categorized according to whether they manipulated structure or actor, and whether alternative outcomes were present or absent. Third, the categories found in the second step were further analysed using the theoretically derived criteria for qualification.
It is important to note that the categorization outlined above is based on characteristics of counterfactuals as described by theoretical proponents of counterfactuals in history. In this study, the material was produced not by professional historians but by upper secondary school students of history. Students cannot be expected to command the same knowledge of the historical context as professional historians. Likewise, the fact that the student responses were produced in a test-like situation may limit the students' ability to make elaborate comparisons.
The student responses vary greatly in length: the shortest is about 150 words, and the longest is almost 900 words. The quality of the texts also varies, as will be shown in the presentation of results. Within such a wide variation, there is also a wide variety of counterfactual arguments. While some students use no counterfactuals at all, other students use several counterfactuals at different places in their texts, and the counterfactuals within a single text are not necessarily equally relevant or supported. In such cases, the text has been categorized based on the most qualified counterfactual in the text. This means that a text placed in the category of specific and supported counterfactuals includes at least one such counterfactual, but it may also include others that are less qualified. The categorization is thus based on the best example of every text, indicating the potential for qualified counterfactual reasoning evidenced by the text, even if it is not consistent throughout. Quotations from the texts have been selected to illustrate the different categories.
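The 'best example' rule can be made concrete with a small sketch. It assumes, purely for illustration, that the qualification criteria form an ordinal scale (no counterfactual < vague < specific < specific and supported); the level names below are hypothetical and are not the study's own coding scheme. Under that assumption, categorizing a text amounts to taking the highest level among its counterfactuals:

```python
from enum import IntEnum


class Qualification(IntEnum):
    """Hypothetical ordinal encoding of the qualification criteria (illustrative only)."""
    NONE = 0                # no counterfactual found in the text
    VAGUE = 1               # restrained, but generic
    SPECIFIC = 2            # restrained and specific
    SPECIFIC_SUPPORTED = 3  # specific and supported by comparison and/or generalization


def categorize_text(counterfactuals: list[Qualification]) -> Qualification:
    """Place a text in the category of its most qualified counterfactual."""
    # A text without any counterfactuals falls back to NONE.
    return max(counterfactuals, default=Qualification.NONE)


# A text with two vague counterfactuals and one specific, supported one
example_text = [Qualification.VAGUE, Qualification.SPECIFIC_SUPPORTED, Qualification.VAGUE]
print(categorize_text(example_text).name)  # SPECIFIC_SUPPORTED
print(categorize_text([]).name)            # NONE
```

Taking the maximum credits a text with its potential for qualified reasoning even when its other counterfactuals are less qualified, which is the trade-off the categorization deliberately accepts.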

Results
Out of the 139 student texts in the study, 114 contain at least one counterfactual statement. The use of counterfactuals is not concentrated in one or a few of the schools; a majority of students in all the classes use some form of counterfactual, even though the prompt does not explicitly call for it. In the following sections, the different categories found are described and exemplified.

Absence of counterfactuals
Of the 139 texts, 25 do not contain counterfactuals. Three of these demonstrate advanced explanatory reasoning without any explicit use of counterfactuals. The remainder, however, struggle to respond adequately to the prompt, either because they merely reproduce certain facts, or because they attempt to evaluate the different interpretations through consequences, as in this example:
I think interpretation A is the strongest, because he came to play such an important role for a large part of the world. Germany eventually had to pay enormous indemnities, which would take many years. He conquered countries, made many countries oppose him, joined alliances and more and more of his own people opposed him. He was and became a very recognized man. (6-9)
In this case, the text refers to the policies of the 1930s and of the Second World War, as well as to Hitler's legacy, as an argument for Hitler's causal importance for 1933, indicating confusion between causes and consequences. This tendency to include later historical events, especially the Second World War and the Holocaust, occurs in several texts, and may indicate problems with the chosen explanandum: the Nazi regime is so inherently associated with these events that several students struggle to separate them when engaging with the assignment (see Barton, 2008; Wendell, 2018). However, cases such as this also indicate that the students producing them struggle to understand the concepts of cause and consequence.
Table 2 shows how the 114 texts that do contain counterfactuals are categorized by the factor they manipulate. As the table indicates, a majority of these focus on manipulating an actor, which is hardly surprising given the nature of the prompt. Those focusing on structure and those including manipulation of both structure and actor are roughly equal in number.
As Table 2 indicates, nearly all structure-focused counterfactuals, as well as texts including manipulation of both types of factors, express the possibility of an alternative outcome. The actor-focused counterfactuals, by contrast, are split almost evenly between those that express such possibilities and those that do not. Some texts in each category have been classified as 'unclear' due to vague phrasing.

Counterfactuals lacking alternative outcomes
A number of actor-focused texts do not express any possible alternative outcome as a result of the counterfactual manipulation. Text 6-16 is an example:
I think interpretation A is the best, because Hitler could take power since the people needed a person with a strong drive, and his incredible ability to persuade drove many Germans to believe that he was worthy of their faith ... Of course, someone else could have taken his role before Hitler formed history in the way he did. It's not possible to rule out that other men and women had ideas about how they wanted Germany to look that were similar to Hitler's, and would have been ready to seize power. I think the Nazis would seize power anyway, even without Hitler. Of course, it wouldn't have looked exactly the same in history, but it wasn't just Hitler who was involved and influenced people, he had help. (6-16)
The counterfactual manipulation here is the removal of Hitler as a central actor. In his absence, the student reasons that the outcome would have been essentially the same, with the Nazis seizing power. The student hypothesizes that there were other potential leaders with the same ideas. In this case, the counterfactual argument diminishes the importance of the individual actor, since the argument is that other candidates were available and would have produced the same outcome. The argument does not reinforce structural factors, but rather widens the scope of actors to include (unspecified) actors around Hitler as available substitutes. The counterfactual manipulation thus reinforces the factual development in 1930s Germany.

Counterfactuals expressing alternative outcomes
Actor-focused manipulation can also express a possible alternative outcome, as exemplified by Text 1-2:
Despite the structural aspects, Hitler as an actor was a decisive factor that led to the Nazis seizing power in Germany ... On the other hand, the economic and political structure was important in why Hitler could seize power. The parliamentary situation and the effects of the inflation crisis and the stock market crash made people desperate and looking for a solution as fast as possible, and when they saw an engaged leader like Hitler telling about the plans for Germany they became inspired. If it had been another person and not Hitler who took the power, the country would have changed, but not in the way Hitler did. If it hadn't been for Hitler, the Nazis wouldn't have seized power ... (1-2)
While attributing importance to structural factors, this text also focuses the counterfactual manipulation on Hitler, hypothetically substituting him with another, unspecified person. In this case, however, Hitler is considered so important that such a change would have led to an alternative outcome: the Nazis would then have been unable to seize power. This type of counterfactual argument thus reinforces the importance of the individual actor.
Text 4-34 focuses on manipulation of a structural factor rather than an actor, and exemplifies how alternative outcomes can be expressed in such texts. First, the text mentions various actions by Hitler that contributed to his success. Then the reasoning turns:
What rather shows that Hitler was of little importance for him seizing power is, among other things, WW I. Without WW I there would have been no discontent among people, and no one would have had any need for finding someone to hate and blame ... So, Hitler wouldn't have been able to gain power unless WW I and its consequences had occurred, since his policies were built on hate. (4-34)
The counterfactual manipulation in this case concerns the First World War, which is hypothetically removed from the background. As a result, the student argues that many factors that contributed to Hitler's success would not have been present. The argument indicates awareness of the causal chains connecting the First World War and the Nazi seizure of power, and attributes most importance to them. As a consequence, the actions of Hitler and other potential actors appear less important, since they are contingent upon the consequences of the First World War.

Plausibility and specificity
A salient feature of the material is that none of the texts that employ counterfactuals do so in an 'exuberant' way. The student texts do not include the kinds of highly speculative alternative outcomes that Evans (2016) and others worry about. However, the texts can still be categorized according to specificity. Counterfactual specificity appears to be most easily accomplished when the counterfactual in question is structural in nature. As Table 3 shows, vague counterfactuals are most common when counterfactuals are actor-focused, while counterfactuals including structural factors are almost always specific in principle. This text is a very clear example of counterfactual specificity, not only specifying Göring as a possible alternative to Hitler, but also citing at least one necessary ability ('rhetorical talent'). In comparison, Text 6-16 does not specify alternative persons, but it does state what attributes would be necessary for such a person, namely the same ideas as Hitler and the willingness to seize power, which is why the text has also been categorized as specific. Text 1-2 has been categorized as vague, since no specifics are provided about what would have been necessary actions or attributes that were either unique to Hitler or necessary for a potential substitute.
When it comes to texts that include counterfactuals regarding structural factors, the specificity is generally provided by the argument focusing on a certain factor or set of factors. Text 4-34 is a clear example of this, placing the emphasis on the First World War as the source of all the structural troubles that helped bring about the Nazi seizure of power. The few structure-focused texts that have been categorized as vague give a generic picture of what is referred to:
Unless circumstances had been the way they were during that time, things might have gone another way. (1-15)

Support by comparison and/or generalization
Regarding support by comparison, generalization, or both, Table 4 shows that these methods are not very common; only a total of 32 texts include some of these. Support by generalization only is uncommon, occurring in four texts. One example is Text 5-7, which includes both structure-and actor-focused counterfactuals: The economic crisis (hyperinflation) led to misery in the country, with unemployment etc. When a country is economically unstable, discontent grows among the inhabitants, which reinforces political extremism ... If there hadn't been such a structure, Hitler's party wouldn't have had as great an impact, and if Hitler as an actor hadn't been able to exploit the structure, the Nazis might not have seized power. (5-7) The generalization is expressed in the second sentence of the quotation, serving to connect economic factors to political factors in building up to the counterfactual. The argument takes the form of a more or less universal law stating that economic instability (a structural factor) leads to discontent, which in turn strengthens political extremism. Germany in 1933 appears as an individual case that exemplifies this 'law', and by using it, the student supports the argument that Hitler was not as important as the economic situation. Statements of this kind resemble the kinds of generalizations common in social sciences. Their relative sparseness in the material may indicate that the students have learnt to be careful with generalizations in history. While support by generalization is sparse in the material, support by comparison occurs more frequently, although the method is still uncommon: 19 texts include one or more comparisons to support their counterfactual reasoning. These include several different points of comparison, with both time period and place varying. In the texts of one class (from School 1), early 1930s Germany is compared with the situation in Germany about ten years earlier, in the early 1920s. 
Since these occurrences are concentrated in a single class, they probably reflect its teaching practices. Other relatively frequent comparisons are made with Sweden (both 1930s and present day), Russia/USSR and the USA. The comparisons made by the students in this context are by necessity limited in nature, due to the test-like circumstances. Thus, the comparisons are overall neither elaborated nor detailed, making it easy to point out their deficiencies. However, the aim of analysing these comparisons is to better understand how the students use them to support their own reasoning.
The following example compares Germany in the 1930s and the 1920s: Without the bad structure, Hitler as an actor wouldn't have mattered much, since there was no reason to change the structure. Hitler had tried to seize power previously, around 1923, but was put in prison. At that time, the structure was much better, there was no economic misery and the people had no reason to turn to the extremes. This can show that Hitler was not that important for the seizure of power in 1933, since he failed to seize power in 1923 but succeeded 10 years later without changing himself significantly. (1-13) This example highlights both the strengths and the weaknesses of this comparative approach. The comparison should satisfy most demands for similarity between cases, with the point of comparison being geographically identical and the difference in time being relatively short, making it a reasonable way of developing an evaluation of Hitler's importance. However, the argument made is flawed, since it is not true that there was 'no economic misery' in Germany in 1923. Presumably, the student has misapplied the concept of 'the roaring twenties' to the entire 1920s, for lack of sufficient knowledge of the historical context. This case thus reinforces the importance of context sensitivity, not just for counterfactuals, but also for comparisons. Of course, there are students who do not make this mistake, in which case the comparison works better. Other comparisons are close in time to 1933, but instead compare the German situation with that of other countries, most prominently the Soviet Union. These comparisons are often brief, only pointing out some important similarities and differences needed to support the argument made.
In this example, France is the point of comparison: If the conditions had been different (like for instance in France, where Léon Blum (prime minister) prohibited all fascist organisations and created good conditions for democracy), Hitler wouldn't have seized power in Germany. (5-20) While this argument can be further qualified, it shows how a comparison can be used to strengthen the counterfactual statement.

Using both generalization and comparison
Nine of the texts include both generalizations and comparisons in the counterfactual reasoning. Text 4-22, for example, includes both: [Those who emphasize structures] argue that the new policies attracted people, no matter who was the leader ... However, I think actors are somewhat more important than the structures. The financial crisis at the end of the 2000s [sic] hit many parts of the world hard, including Sweden. Despite this, the government was re-elected. This can have several causes. In crises governments are usually strengthened, as opposed to Germany, which probably was due to the rhetoric of Adolf Hitler. The same happened in Sweden where Fredrik Reinfeldt is a skilled speaker. But during the refugee crisis of 2015, this trend has not held, and the same pattern as in 30s Germany repeats as the people lose confidence in the government, presumably because of an opponent's rhetoric. (4-22) This is one of the most elaborated comparisons in the material, with the student comparing two instances of structural crisis in contemporary Sweden with the German case. The student formulates a generalization ('In crises governments are usually strengthened'), then notices deviations from this generalization, and proceeds to explain the deviations by the actions of individuals. This all serves as an argument for the counterfactual that if Hitler had not been the leader, the Nazis would not have seized power, supporting the interpretation that Hitler was indeed important. A similar line of reasoning is expressed by Text 5-22: To find support for Hitler not being important for the event, we can look at the French Revolution 1789. Part of the background of the French Revolution was that France's economy had been damaged by war, and people in the lower classes were affected by, for instance, high bread prices. This was also the case in Germany during the Interwar era ...
In both cases, the economy was weak as a consequence of war. A difference is that the regime change in France happened through revolution, while Hitler was elected democratically. It does show that regime changes as a consequence of a poor economy due to war can happen without a person as central as Hitler was for the NSDAP [National Socialist German Workers Party] ... There are some similarities with the election of Donald Trump in 2016. USA didn't have the same problems that Germany did, but Trump built his rhetoric on MAGA ['make America great again'] ... Mussolini's seizure of power in Italy can be compared to Hitler's, since both built on nationalism ... (5-22) As noted before, the reasoning of Text 5-22 later leads to the argument that Hitler was indeed important. Here, the student builds a generalization ('regime changes as a consequence of poor economy due to war'), makes a comparison with the French Revolution, and highlights both similarities and differences to distinguish the original case (Germany) from the generalization.
Text 5-3 exemplifies a simple combination of the two types of support: With all this misery, extremism grew in both directions. The same thing happened during the Russian Revolution, a similar thing happened recently in the USA with Trump. When people are discontent they want something new, not necessarily modern. They can certainly appeal to old glory, as long as they represent change ... If Germany had been wealthy and people content, they wouldn't have bothered with the stab-in-the-back myth and sought change to the degree they did. (5-3) In this example, the two comparisons (with the Russian Revolution and the Trump presidency) are not elaborated at all, serving as basic support for the idea that the German case was not unique but an example of a more general pattern: 'when people are discontent they want something new'. This reasoning forms the basis for the concluding counterfactual, emphasizing the importance of structural factors over actors.
It may be argued that several of the comparisons exemplified here are not really comparisons at all, especially the briefer ones that are little more than references or allusions to some other event or person. This is true, but it misses the point that the presence of these references, undeveloped though they are, indicates that some of the students (28 in total) do attempt to find points of comparison as a way to strengthen their reasoning. That they are undeveloped follows from the context of the assignment, in which the students had to produce their answers during class, within a time limit. That they still make the comparisons indicates a potential for developing this kind of reasoning further if the conditions for producing the answers were changed.

Discussion
The purpose of this study was to investigate students' use of counterfactuals in the context of a task designed to test their handling of historical explanation, with the aim of providing suggestions about how counterfactuals can be used as an educational tool in history teaching. In discussing the findings, it is important to remember the limitations of the study: the material has been derived from a limited sample produced under test-like conditions. With this in mind, what can the findings tell us about the potential value of counterfactuals for teaching and learning historical explanation?
The first finding is that a majority of the students involved use some form of counterfactual for reasoning about the explanatory importance of the central actor, Adolf Hitler. The prompt used was not specifically designed to elicit counterfactual reasoning, and the students come from different schools, indicating either that the students use counterfactual reasoning of their own volition, or that the teachers, independently of each other, teach counterfactual reasoning as a matter of course in relation to historical explanation. It is possible that the nature of the task, evaluating explanatory factors, encourages students to think counterfactually. In either case, this finding points towards an existing use of counterfactual reasoning. This existing use can be thought of as a problem that needs to be rectified, or as a potential for advancing thinking about historical explanations. The latter position is the one taken in this paper.
The value of counterfactual reasoning for this kind of task can be better appreciated by taking into account the 25 answers that do not use counterfactuals. While a few of these do provide advanced reasoning about the causal relevance of the central actor, the majority of them fail to engage with the interpretative nature of explaining Hitler's importance in a meaningful way, instead only recalling facts. This indicates that counterfactual reasoning is an important scaffold for students engaging in evaluation of explanatory factors, if not strictly necessary. Evans (2016) is thus probably correct when he argues that counterfactuals are not necessary in order to uncover historical agency, but, not being a history education researcher, he does not take into account their potential in an educational context. However, use of counterfactuals still needs qualification. In this study, analysis of the distinction between actors and structures (Lilliestam, 2013) shows that the students tend to use counterfactual manipulation on the central actor, Hitler. This is most easily explained by the fact that the prompt focused on the explanatory importance of Hitler, although it is also possible that an individual actor is relatively easy to identify and engage with, strengthening this tendency. Still, a smaller number of students focus on, or include, structural factors in their counterfactual reasoning. Possibly, a prompt that focuses on a structural factor rather than an actor would increase this tendency, which seems to be desirable, since the students manipulating structural factors express the possibility of alternative outcomes to a greater extent than those who solely focus on actors. Focusing counterfactual reasoning on structural factors thus appears to be more fruitful for countering notions of history as predetermined (Barton, 2008;Nolan, 2013).
Arguably, this is because a structural focus opens wider questions about what was possible and what was not possible given a particular historical context (Lee and Shemilt, 2009). In this study, Student 4-34's argument for the importance of the First World War in explaining the success of the Nazis is one example of a student addressing questions of possibility and necessity.
Furthermore, the analysis of student texts using the theoretically derived criteria for counterfactuals indicates that texts that manipulate structural factors tend to be more specific in their counterfactual reasoning. This strengthens the quality of the reasoning in the sense that it becomes possible to evaluate what facts the students use to underpin their counterfactual reasoning in factual history. From a teaching perspective, this makes visible possible misconceptions and misunderstandings that can be used in feedback to the student. These are, of course, dependent upon the actual topic; an example from this study is Student 1-13's mistaken understanding that early 1920s Germany was not in an economic crisis, mirroring a common tendency to confuse the inflation crisis and the Great Depression (Darby, 2010). In this sense, context sensitivity, understood as specificity, appears to be the most important aspect of making counterfactuals 'respectable' (Chapman, 2003: 49).
In contrast, the notion of plausibility did not yield any results. While students' counterfactuals may be more or less specific, they do not engage in the kind of speculation or wishful thinking about which Evans (2016) worries. This may be because the prompt did not ask for counterfactual reasoning; the counterfactuals are not the main point, but rather a device used for engaging in the evaluation of the explanation. It may also be that the test-like circumstances discourage students from veering off into too much speculation. The latter possibility makes it advisable to retain plausibility as a criterion, should it turn out that students are more prone to alternative-history speculation outside this particular context.
As for the supporting methods, generalizations appear to add little of value to the qualification of counterfactuals, even though they do clarify certain assumptions made by the students. Comparisons appear more valuable, confirming the observations of Buxton (2010) and Worth (2012). When comparisons and generalizations are used together, the counterfactual reasoning is considerably strengthened. Even though relatively few of the students in this sample use comparisons, they appear important for advancing sophisticated counterfactual reasoning in support of evaluation, making the counterfactual reasoning more qualified than in cases where comparisons are absent. The theoretical criteria can thus be modified, with generalization seen as an aspect of comparison, rather than as a separate form of support.
In sum, the results of this study confirm that students are able to use counterfactuals in order to evaluate explanatory factors. Not all students show this capacity, and those who do show varying levels of sophistication: the minority using comparisons to support their counterfactual reasoning make more qualified arguments. If the employment of counterfactual reasoning is a result of students' own tendencies towards counterfactual reasoning, the criteria for qualifying counterfactuals can be used as a way of sharpening such tendencies in order to use counterfactuals that are valuable for reasoning about historical explanations. If, on the other hand, the employment is a result of counterfactual reasoning being taught by teachers, the criteria can be used as a way of developing an already existing teaching practice. By suggesting these criteria, the study confirms and furthers the arguments for counterfactuals made by Lee and Shemilt (2009), Woodcock (2011), Lilliestam (2013) and Huijgen and Holthuis (2014), as well as adding theoretical considerations to the use of counterfactual reasoning in teaching design, as suggested by Buxton (2010), Worth (2012) and Carroll (2018). On a final note, it is probable that the value of counterfactuals will continue to be disputed in academic history, which means that they may be dismissed from history education as well. The results of this study indicate that such a dismissal is probably a mistake, because counterfactuals appear to be a fruitful way for students to engage with questions about the evaluation of explanatory factors. That they can be misused is not an argument for dismissal, but rather for developing methods for using them in ways that support and highlight the ways in which students think about history.
Notes on the contributor
Joakim Wendell is a PhD candidate in history with a focus on history education research at Karlstad University, Sweden. His research interests focus on teaching, learning and assessing explanation in history.