Open-Card Sort to Explain Why Low-Literate Users Abandon Their Web Searches Early

The purpose of this paper is to report possible reasons for premature abandonment of online searches by low-literate users. Previous evidence suggests that low-literate web users abandon their online searches early because they believe the information they are looking for should be in the section they are in, and conclude either that they have found it or that it is unavailable. This paper describes an open-card sorting technique combined with multiple Cognitive Task Analysis (CTA) methods to understand why this occurs. Nine high-literate and eight low-literate volunteers of the Citizens Advice Bureau (CAB) sorted 37 cards representing information in the “Adviceguide” social services website. The qualitative data collected were analysed using Emergent Themes Analysis (ETA). Results showed that low-literate users do not create main groups and subgroups when classifying the cards but keep them in a single-level taxonomy. They rank these groups based on flawed interpretations of concepts and on personal or hypothetical experiences. High-literate users create multi-level taxonomies, and their classifications are based on keywords, interpretations of concepts, and personal or hypothetical experiences. We believe these differences in classification models may contribute to premature abandonment of online searches by low-literate users.


INTRODUCTION
Government departments and other organizations are placing important information on the web and reducing face-to-face advice. For instance, in the UK, social service information can reside in many Government and non-profit organization 'silos'; this distribution and duplication of information complicates the search for information [1]. Previous research showed that low-literate users are less successful in obtaining online information [2,3]. This creates a digital divide due to imbalances in the resources and skills needed to participate effectively as a digital citizen. This research focuses on persons with low levels of literacy as defined by the National Skills for Life Survey [4]. Low levels of literacy could be due to English not being a person's first language, or to leaving school at an early (and inappropriate) stage. These factors may result in a poor ability to read, write or spell adequately to meet the demands of daily life. Literacy is defined as the ability to read, write and speak, depending on the expectations of the socio-economic environment a person lives in [5].
Research commissioned by the former Department for Education and Skills in 2003 suggested that 16%, or 5.2 million, of the UK population were below Level 1, classed as low levels of literacy (reading and writing at the level of an 11-year-old) [4,6].
The commission suggests that Level 1, or functional literacy, is the best approximation of the basic requirements for everyday living [6]. A previous study [7] examined the online information-seeking strategies of high- and low-literate users of a social service website. The study showed that low-literate users abandon their searches prematurely, assuming that the information they were looking for was not available in the system. It also showed that low-literate users' search abandonment increased from 20% to 60% as task difficulty increased from easy to difficult. 72% of the time they were unable to find the relevant information at the places where they expected it in the system [7]. The authors inferred that low-literate users' expectations or mental representations did not map onto the system, which led them to abandon their searches prematurely, claiming that the information they were looking for was unavailable in the places they expected. The low-literate users' expectations or mental representations of the search task results also differed from one another. These behaviour strategies were not evident among the high-literate users. As there seems to be a mismatch between the system and the low-literate users' expectations, further investigation was required. The previous study identified a problem but did not establish its depth. The current study investigates the previous findings to understand the reasons for premature abandonment during online searches by low-literate users.
Marchionini states that the "process of information seeking is a cognitive activity that involves long and short term memory, background knowledge, spatial cognition and mental models, to name a few critical factors". He further observes that "information seekers develop and use mental models for a variety of mental and physical objects, including information objects and different domains of knowledge" [8]. A good design depends on the mapping, through a system, between the user's expectation or mental representation and the designer's model [9]. Several researchers emphasize the role of individual differences and cognitive factors in the information-seeking process. The first step in understanding information seeking in electronic environments is to develop an understanding of the basic cognitive processes that guide it [10]. Humans construct mental models of phenomena to gain knowledge and understanding. Understanding users' mental models makes it possible to understand a problem situation and to predict the consequences of actions contemplated for solving the problem [11]. Indi Young [12] explained that once we understand the beliefs and assumptions users hold, their mistakes, misunderstandings and oversights with respect to the system become clear.
To determine the depth of the problem and identify the reasons for the mismatches between the online system and low-literate users' expectations, we decided to conduct an open card sort study. The literature shows that card-sort data have been used to understand users' mental models [13]. An open card sort gives users the freedom to classify information according to their domain knowledge and experience without external influences. Card sorts can be carried out either electronically or on paper.
For the purpose of our study we opted for a paper-based card sort to minimize any additional learning, computer literacy and other external factors that might influence the findings. The participants were clients of the Citizens Advice Bureau (CAB), and 60% of them were currently claiming some form of benefit. We assumed that the participants had the basic domain knowledge needed to carry out the card sort activity, which was based on the 'Adviceguide' website information. The card sorting study aimed to (1) identify whether there are differences in classification models between high- and low-literate people irrespective of a system, (2) identify those differences, and (3) determine guidelines for interface design.

METHOD
Participant recruitment: Ethical approval for this study was granted by the EIS School Ethics Committee. A notice was placed at the Barnet Citizens Advice Bureau, and staff members were asked to encourage clients to participate in the study. Volunteers were made aware that the study involved card sorting of social service information.
Selection process: Volunteers were 18 years old or above. In cases where English was not their first language, it was imperative that they had lived in the UK for a minimum of 10 years and their education was at least above or below secondary education.
debt, family, and housing. Participants could place the 37 cards on a table without any overlap and did not get overwhelmed. The 37 information items were typed, assigned a unique identifier number, and printed on cardboard. A set of detailed definitions of the information on the cards was provided by the CAB and printed on a separate summary sheet.

Multiple Cognitive Task Analysis Methods
Multiple Cognitive Task Analysis (CTA) methods were used to extract and understand the participants' decision process during their card sorting. We used process tracing, observation and interview methods. These three techniques were used in the previous study to triangulate participants' classification behaviour strategies [10]. The in-depth interviews were modelled after the Critical Decision Method [11], a semi-structured interview technique which uses probes to extract information and experiences encountered during a task.

Procedure
The study was conducted at the Middlesex University Usability Laboratory. Participants were asked to classify a set of cards. An information sheet with detailed definitions of the 37 terms was given to them. They were told to ask for assistance if they needed clarification of any unfamiliar terms.
The assistance did not give clues or directions on how to classify the cards. Prior to the study, participants had a practice trial with 10 cards bearing names of cities, birds, mammals, and reptiles. Following the practice session, participants were asked to classify the cards according to what they felt was best. Participants were encouraged to make sure that they understood the meaning of each card term, and were instructed to think aloud during the classification process. Participants were informed that there were no right or wrong classifications, and that there was no limit to the number of classifications.
If they identified more than one way of classifying a card, they were advised to place it in the classification that was more meaningful to them. Participants were asked to let the facilitator know when the classification process was completed. The facilitator then provided post-it notes so that each classification could be given a label name. Participants were advised to let the facilitator know if they needed assistance to write the heading names. Once the classifications were named, the participants were made aware that a semi-structured interview would take place. All participants gave consent for video recording.
The facilitator observed the participant during the card sort and took notes on any questions that needed verifying. After the labels were defined, the participants were asked a set of open-ended questions to clarify the process and describe any memorable experience encountered during the classification. If clarifications were required, the facilitator found the relevant part of the video and played it back. The video helped participants to recall the event and explain their decisions, actions or thoughts.
Once the interviews were completed, participants were asked to complete the National Skills for Life literacy survey to determine their level of literacy. Nine participants were classified as high literate, and the remaining eight were classified as low literate.
The high-literate users scored an average of 34 out of 40, while the low-literate users scored an average of 15 out of 40. Low literacy corresponded to reading levels below UK National Curriculum Level 4, while high-literate users' reading levels were above Level 5.
At the end of each session the facilitator wrote down the card numbers on the associated labels created by the participants, along with any hierarchy created.

Data analysis techniques
Excluding the literacy assessment and breaks, the study took on average 1.5 hours for a high-literate user and 2.5 hours for a low-literate user. The qualitative data from the think-aloud, observations and video recordings, along with the semi-structured interviews, were all transcribed. In total we transcribed 20 hours of low-literate users' data and 13 hours of high-literate users' data. The data were then analysed using the Emergent Themes Analysis (ETA) approach of Wong and Blandford [12]. During the qualitative data analysis, the participants' literacy status was double-blinded to avoid any bias.
Once the qualitative analysis was completed, the post-it notes containing the labels and associated card numbers, with single- or multi-level taxonomies, were entered into an Excel spreadsheet. We used a spreadsheet template [13] to calculate the agreement weight (Formula 1), which indicates the strength of a card's placement in a category.

RESULTS
To determine the differences in the classification models between low-literate (LL) and high-literate (HL) people, we looked at when the participants initiated their classification process, how participants classified the cards, and what influenced their thinking (Table 1).

When did participants initiate the classification process?
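Participants initiated the classification process either as soon as the cards were read or after they laid the cards on the table. Regardless of how they initiated the classification process, some tended to keep cards aside for later classification. We observed that only 67% of the HL participants initiated their classification process as the cards were being read, compared to 100% of the LL participants. For instance, an HL participant, while reading the benefit fact sheets card, stated "hmm I will put the benefit fact list separate from the Mortgage because that Mortgage is not benefits. Even though both are connected to financial problems. But they are different." An LL participant said "credit, this [the card labelled credit] can go [with] this one [referring to finance], because it is about financial." We also observed that only 33% of the HL participants laid out the cards before grouping, while this behaviour was not observed with the LL participants. For instance, participant HL7 asked "Can I just place the cards on the table first?" Additionally, 78% of the HL participants and 38% of the LL participants kept cards aside for later classification as they were uncertain where the cards would fit. For instance, participant HL1 reasoned "Benefits for families and children the other one is benefits for people over sixty they are different because families and children are normally younger people and over 60 are elderly people. Can I skip one and go back as I don't know about these right now..."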

How did participants classify the cards?
We identified that participants used three methods to classify the information: by keyword, by keyword and interpretation, and by interpretation only. We refer to a keyword classification when participants grouped the cards by a prominent word. Keyword and interpretation means that, after grouping the cards by keyword, they moved some cards to other groups that were more closely related by meaning. We refer to interpretation only when the participant justified a classification based on personal or hypothetical experiences and did not use keywords.
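The keyword method defined above can be illustrated with a short sketch. The card labels are taken from those quoted in this paper; the keyword list (the 'tax' and 'debt' groups mentioned by participant HL3), the first-match rule, and the function name are assumptions made purely for illustration and are not part of the study's method.

```python
def group_by_keyword(cards, keywords):
    """Place each card under the first keyword found in its label.

    Cards that match no keyword are returned separately; in the study these
    are the cards a participant would have to place by interpretation.
    """
    groups = {kw: [] for kw in keywords}
    leftover = []
    for card in cards:
        label = card.lower()
        for kw in keywords:
            if kw in label:
                groups[kw].append(card)
                break
        else:
            # No keyword matched this card.
            leftover.append(card)
    return groups, leftover


# Card labels quoted in the paper (a small subset of the 37 cards).
CARDS = [
    "Help with tax problems",
    "Benefits fact sheets",
    "Benefits and tax credits for people in work",
    "What benefits can I get?",
]

groups, leftover = group_by_keyword(CARDS, ["tax", "debt"])
print(groups)
print(leftover)
```

Cards left over after the keyword pass correspond to the second stage described for the keyword and interpretation method, where meaning rather than a shared word decides the group.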
Eleven per cent of the HL participants used keywords only, 67% classified the cards using keywords and interpretation, and the remaining 22% used interpretation only to group the cards. On the other hand, 100% of the LL participants used interpretation only for their classification. Those who grouped the cards by keywords only found, for instance, all cards that contained the word tax and grouped them together. Participant HL3 said "I think I am going to put 'tax' together and let's see and put the 'debt' one together, ok, I think I see a few 'frequently asked questions' about different subjects, I am going to try to group them together and see if it works." Participants who used keywords and interpretation tended to classify the cards in two stages. For example, participant HL6 classified some of the cards using keywords and explained being unable to carry out the same process with the remaining cards: "I am not making any assumptions I am looking for straight associations. Ok, now I am trying to understand this because I grouped these for the basic grammatical associations. This [by keyword] is the group criteria I am using here, but here, since this [looking at another card] is different I cannot use this trick. I am trying to look behind the word. I am trying to understand what this heading is telling me". Some participants used personal or hypothetical experiences to justify their groupings. We observed that all LL participants' interpretations were flawed at some point in the classification process, while this was not observed with the HL participants. For example, participant LL3 came across national insurance contribution and benefits, which refers to the amount of tax that you can claim back towards contribution-based allowances such as Jobseeker's Allowance and Incapacity Benefit. The participant said "National insurance is for tax that means when you earn you pay tax and not enough pay". Although this participant was claiming benefits from the government, and was aware that one pays a tax on national insurance, the person failed to understand that the benefit refers to the possibility of claiming it back.

What factors influenced participants' thought process?
We found that participants' thinking was influenced by: (a) the number of levels in each classification, (b) ranking within these levels, and (c) personal or hypothetical experiences. We observed that HL participants tend to create subgroups within the main groups; we refer to this as a multi-level taxonomy. LL participants did not create subgroups within their classification; we refer to this as a single-level taxonomy. Participants tend to rank their groups. We noticed that HL participants ranked the main classifications and the subgroups, i.e. they performed a horizontal and a vertical ranking. LL participants were observed to rank their groups horizontally only. We observed that 78% of the HL participants classified in a multi-level fashion. When participant HL1 placed the cards, the participant had a multi-level taxonomy: benefits -> tax problem -> tax credits. The person stated "Problems with benefits and tax credits, hmmm as it is to do with benefits and tax credit problems. It should go under this big category benefits and hmmm then under tax problems and finally under tax credits."
Figure 1 shows how this participant organised the cards in a multi-level taxonomy. Under the "Benefit" taxonomy the participant placed the cards "What benefits can I get?, Benefits fact sheets, and Benefit in kind", followed by two sub-level taxonomies, "Benefit problems" and "tax problems". The cards placed under these were, respectively, "Frequently asked questions about benefits, Benefits for people over sixty, Young people and benefits, Benefits for families and children, Benefits for people who are sick or disabled, Benefits for people looking for work, National insurance contributions and benefits and Dismissal and benefits" and "Help with tax problems". The "tax problems" taxonomy was followed by two sub-level taxonomies, "tax payment" and "tax returns", which contained, respectively, the cards "Help with your council tax-council tax benefit and Payment of benefits and tax credits" and "Problems with benefits and tax credits and Benefits and tax credits for people in work". We observed that the participant placed cards that were very general about benefits under the "Benefit" taxonomy, followed by information about benefits for different people under the "Benefit problems" sub-taxonomy. The sub-taxonomy "Tax problems" held general information about help with tax problems; of its two sub-taxonomies, "tax payment" contained cards related to the payment of tax and "tax returns" contained cards related to tax returns.
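To make the distinction between single-level and multi-level taxonomies concrete, the sketch below encodes participant HL1's grouping as described in the preceding paragraph and counts its levels. The nested-dictionary layout, the key names (`label`, `cards`, `subgroups`) and the helper functions are assumptions chosen for illustration; the study itself recorded the groupings on post-it notes and in a spreadsheet.

```python
# Participant HL1's taxonomy, transcribed from the description of Figure 1.
hl1 = {
    "label": "Benefit",
    "cards": [
        "What benefits can I get?",
        "Benefits fact sheets",
        "Benefit in kind",
    ],
    "subgroups": [
        {
            "label": "Benefit problems",
            "cards": [
                "Frequently asked questions about benefits",
                "Benefits for people over sixty",
                "Young people and benefits",
                "Benefits for families and children",
                "Benefits for people who are sick or disabled",
                "Benefits for people looking for work",
                "National insurance contributions and benefits",
                "Dismissal and benefits",
            ],
            "subgroups": [],
        },
        {
            "label": "Tax problems",
            "cards": ["Help with tax problems"],
            "subgroups": [
                {
                    "label": "Tax payment",
                    "cards": [
                        "Help with your council tax-council tax benefit",
                        "Payment of benefits and tax credits",
                    ],
                    "subgroups": [],
                },
                {
                    "label": "Tax returns",
                    "cards": [
                        "Problems with benefits and tax credits",
                        "Benefits and tax credits for people in work",
                    ],
                    "subgroups": [],
                },
            ],
        },
    ],
}


def depth(group):
    """Number of levels in a taxonomy; a group without subgroups has depth 1."""
    return 1 + max((depth(sub) for sub in group["subgroups"]), default=0)


def card_count(group):
    """Total number of cards in a group and all of its subgroups."""
    return len(group["cards"]) + sum(card_count(sub) for sub in group["subgroups"])


print(depth(hl1), card_count(hl1))
```

Under this encoding a single-level taxonomy, as created by the LL participants, is simply a group whose `subgroups` list is empty, so its depth is 1.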

Figure 1 - Participant HL1 - Shows a multi-level taxonomy
All HL participants ranked their groups either horizontally or vertically, while 50% of the LL participants ranked their groups horizontally. Participant HL3 reviewed the classification carried out and made changes to make the group smaller, ranking it vertically according to age, starting from families and children, then young people and people over 60, and ending with disabled people.
"I think ideally this group should not be this big, if you look here the top ones are like benefits for different people like young people, families, people over 60 and sick people".
Another example of ranking is given by participant LL8 who affirms "in the number one [referring to the tax group], the most frequently used one, I think this one [participant's most important group] … I think more priority on this one, Yes first is tax and benefit, then mortgage, number 3 for holiday and the last one is....".
It is important to note that some participants justified their actions based on personal or hypothetical concepts or experiences. We observed that 67% of the HL participants and 88% of the LL participants used their personal or hypothetical experiences to influence their thought process. These participants referred to their own experience, to the experiences of friends and family, or to the news, or placed themselves in someone else's shoes. We observed that all LL participants used, at some point in their classification process, a personal or hypothetical experience, or an interpretation of the meaning, that was flawed or incorrect. HL participants did not present incorrect or flawed interpretations even when using personal experiences. Participant LL1 recalled a personal experience and said "My husband passed away few years ago, so I am on the benefits and bereavement. I kind of know how most of the tax things work." Participant HL4 tried to recall friends' or family experiences when grouping the cards: "hmm have not taken benefits in my life, but I tried to remember what others have discussed about benefits and stuff like that and I used that knowledge." Participant HL6 tried to imagine himself in the specific situation: "I am trying to understand what this heading is telling me, and I have to wear the shoes of someone else to look for this information." But participant LL4 justified the creation of a new group labelled "accidents" with a flawed interpretation of the cards labelled benefits fact sheets and young people and benefits. The participant stated "Benefits fact sheets and young people and benefits name is 'accident', sometimes, an accident in a car or fell on a street, like walking and sometimes get dizzy, or slippery, children like sometimes sitting on a car and don't wear a seat belt, and sometimes, slippery like us. It's called 'accident'."
Besides the several classification initiation processes and the different methods participants used, there were differences in completion times. We refer to completion as the instant at which participants stated they had finished grouping. However, 75% of LL participants made major changes to their groupings during the interview process. None of the HL participants made any changes after completion. The time to completion differed greatly between the two groups. On average LL participants took 75 minutes to reach completion, while HL participants took about 15 minutes. Thus, LL users took five times longer to reach completion, and yet continued to make changes to the classifications during the interview process, as seen in Figure 2.

Agreement among participants on label names
HL participants created 76 label names in total, and 78% of them used multi-level taxonomies. LL participants, on the other hand, created 44 label names with single-level taxonomies only. The think-aloud and interview data were revisited to identify label names with similar meanings. We identified 19 main categories that could represent the 120 label names created by HL and LL participants. These 19 main categories will be referred to as high-level taxonomies and make up the 19 high-level taxonomy list. Table 2 shows the label names created by the HL and LL participants with regard to "Benefits related information". In this table we see the multi-level taxonomies created by HL participants HL1, HL2, HL3, HL5, HL6, HL8 and HL9. The think-aloud and interview data indicate that the remaining HL and LL participants created single-level taxonomies. Some of these single-level taxonomies fell into broader high-level taxonomies, but the participants did not make the necessary connections.

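Table 2 - Label names created by HL and LL participants for "Benefits related information" (partial)
LL3: Child benefits
LL4: Benefit Department Help and Advice on Benefits
Second on Tax Papers on Benefits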
The 19 high-level taxonomies and the associated HL and LL participants were entered into a spreadsheet template [13]. The HL and LL spreadsheets were kept separate so that differences in card placement between the two groups could be identified clearly. An agreement weight (see Formula 1) was calculated for each card in each high-level taxonomy, as shown in Figure 3 for HL and Figure 4 for LL participants. The agreement weight is a way to describe the strength of a card's placement in a single high-level taxonomy [14].

Formula 1. Calculating agreement weight
Once the agreement weight was calculated, agreement was rated as high if the weight in a category was 66% or more (cell marked green), medium if it was above 33% and up to 65% (cell marked white with the percentage shown), and low if it was 33% or less (cell marked yellow).
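The banding above can be sketched in a few lines of code. Formula 1 is not reproduced in this text, so the sketch assumes that the agreement weight of a card in a category is the share of participants who placed the card in that category; that definition, the data layout, the function names and the sample placements are all assumptions for illustration, not the spreadsheet template's actual implementation. Weights falling between 65% and 66%, which the stated bands leave undefined, are treated here as medium.

```python
def agreement_weight(placements, card, category):
    """Share of participants who placed `card` in `category` (assumed definition).

    `placements` maps a participant id to a dict of card -> high-level category.
    """
    hits = sum(1 for sort in placements.values() if sort.get(card) == category)
    return hits / len(placements)


def agreement_band(weight):
    """Map a weight in [0, 1] to the high / medium / low bands described above."""
    if weight >= 0.66:
        return "high"    # green cell
    if weight <= 0.33:
        return "low"     # yellow cell
    return "medium"      # white cell showing the percentage


# Hypothetical placements of one card by nine participants.
placements = {f"P{i}": {"card 12": "Benefits"} for i in range(1, 8)}
placements.update({"P8": {"card 12": "Tax"}, "P9": {"card 12": "Tax"}})

for category in ("Benefits", "Tax"):
    weight = agreement_weight(placements, "card 12", category)
    print(category, round(weight, 2), agreement_band(weight))
```

In this made-up example seven of nine participants agree on "Benefits", so that cell would be marked as high agreement, while the "Tax" cell falls in the low band.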
Table 3 summarises the high, medium and low agreement weights calculated from Figures 3 and 4 for the HL and LL participants respectively. The columns of Figures 3 and 4 consist of the 19 high-level taxonomy list. Only HL participants showed high agreement (39%), indicating more than 66% agreement on the card placements. Medium agreement was 9% for HL and 22% for LL participants, indicating between 33% and 65% agreement on the card placements. Finally, low agreement was 52% for HL and 78% for LL participants, indicating less than 33% agreement on the card placements.
It is important to note that the visual differences between Figures 3 and 4 indicate: (a) high agreement among HL participants only, (b) HL participants' items are less dispersed than LL participants' items, and (c) the use of many high-level taxonomies by LL participants.

DISCUSSION & CONCLUSIONS
The study aimed to determine whether there were differences in the classification models of low- and high-literate participants. It was important to determine these differences because a previous study suggested that the high rate of online search abandonment by LL participants was due to a mismatch between their mental representation and the website they used. Results showed that there are important differences between high- and low-literate participants' classification models. The interpretation of the meaning of the cards' labels affected their classification. The level of literacy seemed to influence the number of levels (main categories and subcategories) that were created. 78% of the HL participants created a multi-level taxonomy, but all of the low-literate participants had a single-level taxonomy. Although all participants tended to rank their groups, their ranking depended on the levels of taxonomy created.
For HL participants, the classifications were based on keywords, interpretation or a combination of the two. LL participants only interpreted the labels. This result confirms that low-literate users do not scan for information but only read [2,7]. Furthermore, we have inferred from the observations that low-literate participants generally present flawed interpretations of concepts and of personal or hypothetical experiences. Moreover, HL participants seemed to have critical and analytical skills that help them interpret and understand the concepts of the labels more accurately. The differences we observed in the way LL and HL participants classified these cards lead us to suggest that one of the reasons for LL web users' premature abandonment may be differences in mental representations arising from flawed interpretation of concepts and from limited critical and analytical skills.

LIMITATIONS OF THE STUDY AND FUTURE WORK
We carried out a cluster analysis on the open-card sort results. Cluster analysis is used to visualise highly correlated groups or categories. Items that are very similar are grouped together, while items that are dissimilar form other groups. A tree structure known as a dendrogram, which visually represents the outcome of a hierarchical clustering, was drawn for the HL and LL users. The preliminary results show four tightly correlated categories for the HL users and three loosely correlated categories for the LL users. This supports the previous finding that HL users have higher agreement on the card placements whereas LL users have lower agreement. We plan to extend the cluster analysis to triangulate the findings discussed above. However, due to the small sample size it is difficult to generalise these findings.
Further observations can be made, and directions for future research explored, with a bigger sample.
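As an illustration of how card-sort data can feed a hierarchical cluster analysis, the sketch below derives a distance between two cards from how often participants grouped them together and then merges the closest clusters. The distance measure (1 minus the co-occurrence rate), the average-linkage rule, the cutoff value, the function names and the four-card data set are all assumptions made for this example; the paper does not state which clustering settings or tool were used.

```python
from itertools import combinations


def distance_matrix(sorts, cards):
    """Distance between two cards = 1 - share of participants who grouped them together."""
    dist = {}
    for a, b in combinations(cards, 2):
        together = sum(
            1 for groups in sorts if any(a in g and b in g for g in groups)
        )
        dist[frozenset((a, b))] = 1 - together / len(sorts)
    return dist


def average_linkage(c1, c2, dist):
    """Mean pairwise distance between the cards of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def cluster(sorts, cards, cutoff):
    """Repeatedly merge the two closest clusters until none is closer than `cutoff`.

    Returns the final clusters and the merge history; the history is the
    information a dendrogram would display.
    """
    dist = distance_matrix(sorts, cards)
    clusters = [frozenset([c]) for c in cards]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], dist),
        )
        d = average_linkage(c1, c2, dist)
        if d > cutoff:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), d))
    return clusters, merges


# Hypothetical sorts: each participant's sort is a list of groups of card ids.
sorts = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B"}, {"C"}, {"D"}],
    [{"A", "B", "C"}, {"D"}],
]
clusters, merges = cluster(sorts, ["A", "B", "C", "D"], cutoff=0.5)
print(clusters)
print(merges)
```

With real data, tightly correlated categories would appear as merges at small distances and loosely correlated ones as merges at larger distances, which is the contrast between HL and LL users reported above.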
Neesha Kodagoda, B L William Wong, Nawaz Khan

Figure 2 - Participant LL1 - Changes made to the classification after completion

Figure 3 - High-literate participants' agreement on classification
Figure 4 -

Table 1 - Summary of results
Overall results | HL | LL
Completion (once the participant stated they finished grouping; others continued to group during the interview session) | 100% | 25%
Time for completion on average (in minutes) | 15 | 75

Table 3 - Summary of agreement weights for HL and LL participants