Can a Virtual Agent provide good Emotional Support?

In this study we explore whether an emotional support message sent to an informal carer by a Virtual Agent provides good quality emotional support, compared to the same message sent by a friend or sister with whom they have either a close, medium, or distant relationship. We also explore whether these judgements are affected by personality. Participants recruited from Mechanical Turk rated an emotional support message for Suitability, provided qualitative feedback about their rating and then completed a personality measure. We found that the support message was rated worst when it came from the Computer, Distant-sister and Close-friend. While these were rated worse, they were not rated poorly, implying that support from a computer is valuable. There were three effects for personality which did not vary with the support giver’s Identity: agreeableness and emotional stability had a positive correlation with 3 sub-scales of supportiveness. A thematic analysis of comments revealed that people prefer emotional support from a human; they like empathy; support from close friends means more; they prefer personalised support; and they have higher expectations from family over friends.


INTRODUCTION
Digital health applications are among the most timely and helpful innovations of the age, with the potential to provide a comprehensive approach to integrated care (World Health Organization and others 2016), yet there is a danger of them being impersonal. The repercussions of removing face-to-face interactions between two people are not well understood. In this study we explore whether a text-based emotional support message is perceived differently depending on who sent it, and whether this varies with the personality of the reader. The results of this study can be used to inform the future development of emotional support systems for people experiencing stress.
It is possible to insert empathetic content into an e-Health application using a virtual agent (VA). VAs have potential to improve engagement with e-Health interventions by expressing empathy and providing affective responses to user input (Scholten et al. 2017). Good quality emotional support has been found to reduce negative affect (Meyer and Turner 2002), though inappropriate support could harm the relationship between the support giver and receiver (Burleson and Kunkel 1996). This is particularly critical for VAs as it might break the user's suspension of disbelief (that a computer actually empathises with the user) (Scholten et al. 2017) and impact upon the user's willingness to engage in further activities.
This type of emotional support may be particularly useful in the carer domain. It has been widely found that carers exhibit more severe physical and mental health problems than non-carers of a similar demographic (e.g. Savage and Bailey 2004, Schoenmakers et al. 2010, Caqueo-Urízar et al. 2009, Vitaliano et al. 2003). This is exacerbated by the fact that carers are often elderly and/or suffer from medical conditions themselves. In the UK, 1 in 5 people aged 50-64 are carers; 1 in 4 women and 1 in 6 men. 63% of carers have a long-term health condition (compared to 51% of non-carers) (Carers UK 2015).
In supporting carers it is thus important to recognise that they are not a homogeneous group: carers may come from any part of society. In trying to develop an eHealth application to support them, we must recognise that different carers may require different support and respond to support differently.
One such difference is personality. Personality describes who we are and how we react in different circumstances. There are many ways to measure personality. One of the most popular and reliably validated is the Five-Factor Model (FFM) (Goldberg 1993), which describes an individual's personality as a set of scores on five different factors or traits: Extraversion (I), Agreeableness (II), Conscientiousness (III), Emotional Stability (or Neuroticism) (IV) and Openness to Experience (V). We hypothesize that carers with different personalities may require different types and amounts of Emotional Support.
Highly emotionally stable individuals are calm, non-neurotic and imperturbable (John and Srivastava 1999), while neurotic individuals (those with low Emotional Stability) are more likely to worry, feel negative affective states and experience depressive symptoms (Watson 2000, Larsen and Ketelaar 1991, Lahey 2009), and as such may require more support to deal with these emotions. We hypothesize that these individuals will value emotional support more highly as they are more affected by stress. Astrid et al. (2010) investigated the impact of personality factors on experience with VAs. They found that the Big Five factors Extraversion and Agreeableness had an impact on the users' experience with the VA: people with high agreeableness felt better after interaction with the VA, and extraverted individuals used more words to interact with the VA, implying improved engagement. We hypothesize that individuals with higher agreeableness will rate the VA more favourably than people with low agreeableness.
In this study we aim to explore if emotional support from a computer is more or less supportive than from humans. Yet it is likely that people rate support from a stranger differently than support from a friend. Dunbar's (Dunbar 1998, Hill and Dunbar 2003) studies of personal social networks tell us that an individual (ego) has several bands of friendships, each with distinctive qualities, that are of a constant size across cultures. Several factors have been identified as influencing the size of an individual's personal network, e.g. income and marriage. It has also been found that without network maintenance (regular interaction between members) a friendship will degrade, though kin relationships require less maintenance. Thus, if an individual is not able to invest time in network maintenance then it is likely that they will lose friends (Roberts and Dunbar 2011). We can use the network bands (described by Zhou et al. (2005)) to explore how network closeness affects an emotional support message's supportiveness compared to support from a VA.
Previous research has explored how emotional support expectations vary between different members of a social network; what you expect from a close family member is different from what you expect from a work colleague. Moncur et al. demonstrated this in a study where parents of infants in neo-natal intensive care chose to share information differently between people of different emotional proximity. Thus we expect that people will find an emotional support message more or less suitable depending on social network proximity. We anticipate that the AI chatbot will be rated lowest, but may be comparable to support from an acquaintance.
In previous studies, researchers have investigated what types of emotional support are suitable for carers experiencing different stressors (e.g. isolation, physical demand, interruption) (Smith et al. 2014) and first responders experiencing stress (Kindness et al. 2017). They have also investigated how to add emotional context to emotional support messages through gift emoticons (Smith 2015), and how emotional support messages should be tailored to users with high or low emotional stability. In this study we wanted to explore whether people rated the support message differently if they thought it was coming from a real person or a computer. We also wanted to see if people with different personalities would rate the support differently.

METHODS
In this study we examine whether social network distance and family/friend relationship have an impact on the supportiveness of emotional support. We contrast this with computer-provided support. We also investigate whether people's personality has an impact on their ratings.
Ethical approval for the study was obtained from the department ethics committee. Informed consent was obtained from participants prior to participation.

Design
We used a between-subject design. Participants saw one scenario and one message (see Figure 1). The message was depicted as coming from either a computer, a sister or a friend. The sister/friend was either a 'best friend' (close), close acquaintance (medium) or distant acquaintance (distant), as shown in Table 1.
Participants rated their empathy with the scenario (here called 'Sympathy' to disambiguate it from the message category 'Empathy'), to allow us to control for low empathy, as in (Smith et al. 2014). They rated the support message on 4 Rating Types: scales of helpfulness, appropriateness, sensitivity and effectiveness (all Likert scales from 1 to 9, see Figure 1, taken from Jones & Burleson 1997). They also completed a personality slider task (see Materials) for the Big 5 personality traits. The independent variables were support provider identity (7 levels: computer, sister-close, friend-close, sister-medium, friend-medium, sister-distant and friend-distant) and participant personality (a score ranging between 18 and 162 for each personality trait of Openness to Experience, Conscientiousness, Extraversion, Agreeableness and Emotional Stability); the dependent variables were Sympathy (1-7) and Message rating. Qualitative data was also obtained.

Materials
One stressful scenario depicting emotional demand and one support message depicting empathy were used (as empathy is considered to be high quality emotional support; Burleson & Kunkel 1996). These were taken from (Smith et al. 2014) and are also used in (Smith 2015); see Figure 1.
We used Zhou et al. (2005) to guide us in definitions of relationship closeness (see Table 1), which adhere to Dunbar (1998)'s Social Network Theory.
We decided to depict the carer's friend/family member as female, as the carer in the scenario is female. This was to avoid a possible confound of participants' interpretations of a Male-Female relationship.
We described the family member as a sister in order for them to be judged as a peer rather than more senior or junior in the family hierarchy.
Sliders to measure the Big 5 personality traits were taken from (Smith et al. nd) (see Figure 2).

Participants
Participants were recruited from Mechanical Turk (MT nd) and were paid $0.50. This was chosen as it offers a good pool of participants with experience of caring compared to opportunistic sampling, and as it suited our indirect approach (Smith et al. 2014).

Procedure
The procedure is depicted in Figure 1. Participants were first presented with the information sheet and consent form. They then provided basic demographics and were screened using the English Comprehension task. They were then randomised to one of the seven conditions (see Table 1).
Participants were told what a carer was and that they would be shown one scenario involving a carer called Alice. They were presented with the scenario and asked to rate their empathy (here called 'Sympathy') with the carer's situation. Then they were introduced to Sally as either Alice's friend, sister or computer (see Table 1) and were asked to rate a short support message that Sally had sent to Alice and explain why they had given those ratings. They were then presented with the Big 5 Personality trait story pairs with a slider to indicate how close they thought they were to one of the people described in the stories (see Figure 2).
This indirect method was chosen as it is difficult to explore stressful situations whilst they occur (both for ethical and practical reasons). We measured empathy as a covariate to account for the fact that some participants do not empathise with stressful situations; this has been found to be effective in previous studies (Smith et al. 2014).
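The allocation step in the procedure can be illustrated with a short sketch. The text does not say how participants were randomised, so the block-randomisation scheme below is an assumption chosen only because it keeps group sizes within one of each other; the function names `build_conditions` and `assign_conditions` and the `seed` parameter are hypothetical.

```python
import itertools
import random

# Levels taken from the Design section; the labels mirror those in the text.
RELATIONS = ["sister", "friend"]
CLOSENESS = ["close", "medium", "distant"]


def build_conditions():
    """Return the seven support-provider identities: computer plus 2 x 3 human senders."""
    human = [f"{rel}-{close}" for rel, close in itertools.product(RELATIONS, CLOSENESS)]
    return ["computer"] + human


def assign_conditions(participant_ids, seed=0):
    """Block-randomise participants to conditions (an assumed scheme, not the authors').

    Participants are processed in blocks of seven; within each block the
    condition order is shuffled, so group sizes differ by at most one.
    """
    conditions = build_conditions()
    rng = random.Random(seed)
    ids = list(participant_ids)
    assignment = {}
    for start in range(0, len(ids), len(conditions)):
        block = conditions[:]
        rng.shuffle(block)
        for pid, cond in zip(ids[start:start + len(conditions)], block):
            assignment[pid] = cond
    return assignment
```

Calling `assign_conditions(range(n))` returns a dictionary from participant id to condition label. Simple per-participant random choice would also satisfy the description in the text but can leave groups noticeably unequal in small samples.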

Hypotheses
We had the following hypotheses:
H1 Computer-provided support will be rated lower than human-provided support.
H2 Family support will be rated differently from friend support.
H3 Support from close friends/family will be rated differently than support from distant friends/family.
H4 People with lower emotional stability will rate the support more highly than people with high emotional stability.
H5 People with higher agreeableness will rate the computer-provided message more highly than people with low agreeableness.

RESULTS
An initial review of the data revealed that it was normally distributed and thus suitable for using parametric tests.

Effects of Identity and Rating Type on Rating
A 7×4 ANCOVA of Identity×Rating Type (Supportiveness subscales of Effectiveness, Helpfulness, Appropriateness and Sensitivity) was performed on rating, controlled for Sympathy. This was significant for Rating Type F(3, 523)=14.33, p<0.05 and Identity F(6, 523)=5.01, p<0.05, but not the interaction. The main effects can be seen in Figure 3. Post-hoc tests reveal two homogeneous subsets (see Table 2). Medium-friend, distant-friend, medium-sister and close-sister were rated significantly higher for the same message than the computer. The message was rated as significantly more sensitive and appropriate than effective or helpful.
From this, we find support for H1: computer-generated support was rated lower than human-provided support. We also found some support for H2 and H3, that family support was rated differently than friend support, and close support was rated differently than distant: the messages from the distant-sister and from the close-friend were both rated low. This shows that participants' ratings depended on both the identity of the supporter and the closeness of the relationship.
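The degrees of freedom in the F statistics above can be unpacked with a small worked calculation. This is a sketch under assumptions: it presumes a full-factorial model with one covariate (Sympathy) in which every individual rating is treated as a separate observation, which the text does not state explicitly. The function names are hypothetical.

```python
def ancova_dfs(levels_a, levels_b, n_observations, n_covariates=1):
    """Degrees of freedom for a full-factorial two-way ANCOVA (assumed model)."""
    cells = levels_a * levels_b
    return {
        "factor_a": levels_a - 1,
        "factor_b": levels_b - 1,
        "interaction": (levels_a - 1) * (levels_b - 1),
        # Error df: observations minus one parameter per cell minus one per covariate.
        "error": n_observations - cells - n_covariates,
    }


def observations_from_error_df(levels_a, levels_b, error_df, n_covariates=1):
    """Work backwards from a reported error df to the implied number of observations."""
    return error_df + levels_a * levels_b + n_covariates


# Identity has 7 levels and Rating Type has 4, giving effect dfs of 6 and 3,
# matching F(6, 523) and F(3, 523) in the text.
implied_n = observations_from_error_df(7, 4, 523)
print(implied_n)                      # 552 ratings under the assumed model
print(ancova_dfs(7, 4, implied_n))
```

Under these assumptions the reported error df of 523 implies 552 ratings, which would correspond to 138 participants each giving four ratings. That participant count is an inference from the reported statistics, not a figure given in the text.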

Effects of Personality on Rating
A partial correlation analysis was run between the Big Five personality trait scores and the 4 rating types, controlling for Sympathy. Agreeableness had a positive correlation with Effectiveness (r(135)=0.17, p=0.05; see Figure 5) and Sensitivity (r(135)=0.18, p<0.05; see Figure 6) and Emotional Stability had a positive correlation with Helpfulness (r(135)=0.17, p=0.05; see Figure 4). Additionally, several of the traits correlated with each other (see Table 3). These correlations are small, and seem in line with other observations that the Big Five factors are not completely orthogonal (DeYoung 2006, Saucier 2002, Anusic et al. 2009, Dennis et al. 2012).
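For readers unfamiliar with partial correlation, the following minimal sketch shows the standard first-order formula for correlating two variables while controlling for a third (here, a trait score, a rating, and Sympathy). It is an illustration of the general technique, not the authors' analysis script; the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr_df(n, n_covariates=1):
    """Degrees of freedom for the significance test of a partial correlation."""
    return n - 2 - n_covariates


# Toy data: x and y both depend on z but are otherwise unrelated.
z = [-1, -1, 1, 1]
x = [-2, 0, 0, 2]   # z plus noise uncorrelated with z
y = [-2, 0, 2, 0]   # z plus different noise, uncorrelated with z and with x's noise
raw = pearson(x, y)             # inflated by the shared dependence on z
adjusted = partial_corr(x, y, z)
```

In the toy data the raw correlation is driven entirely by the shared covariate, and the partial correlation removes it. With one covariate the test has n - 3 degrees of freedom, which is the value shown in parentheses in statistics of the form r(135).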

Effects of Personality, Rating Type and Identity on Rating
To explore whether the correlations we found for Agreeableness and Emotional Stability varied with Identity, we ran an ANCOVA examining the interaction between Personality, Identity and Rating Type. We found a significant interaction between Rating Type and Emotional Stability (F(3,366)=3.47, p<0.05), but no other effects. This is most likely due to the small sample size and number of statistical tests that have been corrected for.
Thus we found no support for H4, that people with lower emotional stability will rate the support more highly. Instead we found an indication that people with high emotional stability rated the message slightly higher on helpfulness than people with low emotional stability (see Figure 4). This did not interact with Identity. We also found that people with high agreeableness rated the message higher on effectiveness (see Figure 5) and sensitivity (see Figure 6) than participants with low agreeableness. This provides partial support for H5, although the effect is not specific to VAs: people with high agreeableness are more favourable to an interaction regardless of the identity of the support provider.

Thematic Analysis of Comments
Following our quantitative analysis, we performed a qualitative analysis (using an open-coded thematic analysis) of rating explanations to explore why people had rated the message in certain ways. Our analysis was based on a (compulsory) comment box on the page (see Figure 1); people did not leave extensive comments (no comment was longer than 3 sentences). There were 19-20 participants per condition, so for each condition there is a relatively small pool of comments to analyse. Additionally, many of the participants commented on the general content of the message rather than the relationship between the sender and receiver; we were not interested in these general comments in this study. We found the following themes:

AI diminishes support.
When informed that 'Sally' was 'an Artificial-Intelligence computer application (chatbot)', participants had divided opinions as to whether the support was suitable or not. Five participants thought that a computer application could be helpful: 'even though it is just AI it could still be helpful', though the support would be 'diminished somewhat because Sally is a chatbot'. Four participants claimed that Sally 'cannot possibly understand how anyone feels', thus the message's impact would be reduced. One participant went as far as saying they would find this message from a computer 'weird and creepy.'

Offering to help would be appreciated.
The quantitative analysis showed that participants overall felt that Sally was less helpful and effective than appropriate and sensitive in her message. This is borne out in the comments. In the close and medium conditions, participants suggested that Sally 'could have been more helpful by offering to help.' One participant went as far as suggesting that 'as a sister, Sally could have provided more input and offered her assistance'; it is a family member's duty to help. There is no expectation that distant friends or family should offer practical assistance.

Acknowledging feelings is helpful.
Nine participants felt that the support message acknowledged and validated Alice's stress. This, in itself, is helpful: 'While Sally isn't doing anything concrete to help Alice, just the fact that she's acknowledging it's a tough job and being supportive is very helpful.'

Support from distant friends.
Four participants commented that the distant sister/friend's message would not be very supportive because 'how can someone be supportive if they're in your life that little?'. Contrastingly, two participants thought it was nice of someone they were not close with to offer support: 'it's good for someone that I don't speak to often to actually acknowledge the situation.'

Brevity and Personalisation.
Nine participants commented that the message was short or impersonal. Some participants were critical of this: 'seems like a blanket statement that adds nothing to show that she really does understand how she feels,' while others thought it was nevertheless effective: 'It's a very simple message, but it shows a lot in letting someone know that they're not alone.'

Familial expectations.
Two participants highlighted that Alice's sister ought to provide better support because she was her sister: 'I think being her sister she could have expressed sympathy in a better way.' Another participant liked the message and stated, 'That is what family should do'. This indicates that for some participants, family members have an obligation to be supportive.

DISCUSSION AND CONCLUSION
In this study we found that the identity of the support giver has an impact on the ratings of supportiveness. People have different expectations of different members of their social network: family members are expected to provide support regardless of closeness, and close friends should provide good quality support. A thematic analysis of comments revealed that people prefer emotional support from a human; they like empathy; support from close friends means more; they prefer personalised support; and they have higher expectations from family over friends. This is to be expected: it is well established that there is an expectation/obligation of help and support from family (e.g. Parrott and Bengtson 1999).
We found that people rated the emotional support worse if it came from a computer; however, it was still rated as good support by most participants, with a mean rating of 6.12 (SD 1.61) on a 1 to 9 Likert scale of supportiveness (see Figure 3). This is an improvement on the mean rating of 6.00 that was found by Smith et al. (2014), where they only presented users with the scenario without any information about the sender of the support message. This implies that knowing that a message comes from a computer does not diminish support. While it is to be expected that people tend to treat VAs like real people (Reeves and Nass 1996), this study provides evidence that emotional support messages from virtual agents are also subject to this personification effect.
We found three weak effects for personality which did not vary with Identity. This supports the results in (Smith 2016), that personality does not have a big impact on emotional support. We found a correlation between Emotional Stability and helpfulness, implying that people with higher emotional stability find emotional support more helpful, and people with high agreeableness are generally more favourable to text-based emotional support than people with lower agreeableness. This provides further support for (Astrid et al. 2010), that agreeableness affects human-computer interactions.
The work presented has several limitations. Firstly, the study only considered one type of stressful scenario, namely Emotional Demand. Validated scenarios for other stressors have been produced (Smith et al. 2014), and these could be used to investigate the impact of stressor. Secondly, the study only used one particular support message. Researchers have validated many support message types and instances for different types of stressors, which can be used in follow-on studies (Smith et al. 2014). Thirdly, the study only investigated the Big Five personality traits. Follow-on studies could consider other personality traits such as self-esteem and resilience. Fourthly, we only considered one instance of emotional support. Follow-on studies could look at sequences of support messages for when people experience multiple stressful situations over time. Fifthly, only textual support messages were used. Follow-on studies could investigate the impact of adding different emoticons (e.g. those proposed in (Smith 2016)), a visual representation of the agent, or emotional expressions by the agent. Finally, the study was indirect: asking participants' opinions on support messages for an informal carer in a particular stressful situation. A follow-on study could repeat this using participants who were actually experiencing the stressful situation themselves.
We did not explore the impact of personality in our thematic analysis. While this would be interesting, it would require a far larger sample; it is difficult to isolate which traits cause which opinions.
In this work we only compared a computer to known others. It would be interesting to see how AI support compared to an unknown other (e.g. volunteer from charity, healthcare worker); this might be more comparable to computer-generated support.
The findings of our study lead us to believe that a computer can provide acceptable emotional support for people in stressful situations. The crowd-sourcing methodology we used was particularly useful in investigating this without having to go through the long process of face-to-face co-design with carers (we would have needed very many carers if we wanted to investigate personality adaptation). Therefore we can now go on to implement and test a system with real informal carers.
The implications of this work for emotional support agents are encouraging: an emotional support message delivered by a VA is appreciated by users, and is rated at least as supportive as a message from an acquaintance. However, there are individual differences: some users do not like support from a computer at all. Further investigations should explore what individual differences impact upon this, so that the use of a VA to deliver emotional support can be appropriately tailored to the user.