Spontaneous Pain Expression Recognition in Video Sequences

Automatic recognition of Pain expression has potential medical significance. In this paper we present results of the application of an automatic facial expression recognition system on sequences of spontaneous Pain expression. Twenty participants were videotaped while undergoing thermal heat stimulation at nonpainful and painful intensities. Pain was induced experimentally by use of a Peltier-based, computerized thermal stimulator with a 3 × 3 cm contact probe. Our aim is to automatically recognize the videos where Pain was induced. We chose a machine learning approach based on the Transferable Belief Model, previously used successfully to categorize the six basic facial expressions in posed datasets [1, 2]. For this paper, we extended this model to the recognition of sequences of spontaneous Pain expression. The originality of the proposed method is the use of dynamic information for the recognition of spontaneous Pain expression and the combination of different sensors: facial features behavior, transient features and the context of the expression study. Experimental results show good classification rates for spontaneous Pain sequences, especially when we use the contextual information. Moreover, the system behaviour compares favourably to the human observer in the other case (without the contextual information), which opens promising perspectives for the future development of the proposed system.


INTRODUCTION
The interpretation of facial expressions, and particularly expressions of emotion, is critical to everyday social interactions [3]. The study of human facial expressions has an impact in several areas of life such as art, social interaction, medicine, security and human-computer interaction (HCI). Other applications of automated systems for facial expression recognition are in affect-related research such as cognitive psychology, psychiatry, and neuropsychology, where such systems can improve research quality by improving the reliability of measurements. Most importantly, they can speed up the currently tedious, manual task of processing data on human affective behavior, notably using the Facial Action Coding System (FACS) developed by Ekman [4,5,6]. For all these applications an automatic facial expression classification system is necessary.
Most of the past work on automatic facial expression analysis has been dedicated to the analysis of posed facial expressions and was not always applicable in real-life situations [16,22]. Indeed, spontaneous facial expressions are often characterized by subtle changes of facial features, while acted facial expressions are characterized by exaggerated changes of facial features [7]. Therefore, the focus of research in the field has started to shift to automatic analysis of spontaneously displayed facial expressions [7,8,9,10,11,6]. Moreover, from the overall state of the art in the field, few efforts have been made towards detection of deliberately non-basic affective states such as attentiveness [12], fatigue [13,14], and pain [8,15]. Extending our approach [2], we applied machine learning to the task of automatic recognition of spontaneous Pain expressions involving subjects undergoing thermal heat stimulation at painful intensities [16]. We focused on Pain because of its potential medical significance, for example as a pain assessment tool in individuals who are not able to communicate Pain verbally (e.g. newborns, individuals with pronounced cognitive impairments [17,18,19]). This paper reports on a method of dynamic, multi-cue and context-dependent recognition of spontaneous Pain sequences and aims at making new contributions to the already proposed models.
The first step for automatic facial expression recognition implies face and facial features segmentation. But if we consider the state of the art in face detection and facial features localization as well as tracking, noisy and partial data should be expected [20]. Therefore, a facial expression analyzer should be able to deal with noisy and partial data and to generate conclusions so that the associated certainty varies with the certainty of face and facial points localization and tracking [20,6]. The Transferable Belief Model (TBM) [33] is well adapted to deal with these considerations and is therefore chosen as the analyzer. It facilitates the integration of a priori knowledge and can deal with uncertain and imprecise data. The TBM has been used in several applications such as image processing, geoscience, medicine, robotics and defense [21,22] and more recently in the analysis and the recognition of human facial and body behavior [1,2,23,24].
In the case of Pain expression, in addition to the permanent facial features behaviour (like eyebrows and mouth), one important cue in an automated system was brow lowering [8]. Moreover, the human visual system, with the mechanisms it uses, remains the best facial expression recognition system. In a recent study, Roy et al. [25,26] have made a finer and less biased analysis [27] of the importance of facial features for the discrimination of the basic facial expressions, including the facial expression of Pain. Their findings showed that nasal root wrinkles are one of the prominent facial features that drive the human observer in the recognition of Pain expression. Based on these findings, nasal root wrinkles are analyzed in combination with the permanent facial features for spontaneous Pain recognition.
In addition to the static facial feature information, it is important to note that in daily life facial expressions are not static, but are the result of dynamic and progressive combinations of facial feature deformations. Bassili and more recently Ambadar [44] have shown that facial expressions can be more accurately recognized from image sequences than from single images. Moreover, it has been shown that temporal dynamics of facial behavior represent a critical factor for distinction between spontaneous and posed facial behavior [7,28,4,6], as well as for categorization of complex behaviors like Pain, shame, and amusement [4]. Pain sequences are therefore analyzed so as to take into account the temporal dynamics of the facial features, and the decision is taken over the whole sequence.
Another limitation of the existing models is their context-independent classification. A few attempts have been made towards context-dependent interpretation of the observed facial expression [29,13,30,31,32]. Facial expressions are indeed displayed in a particular context, such as the location (outdoor, indoor), the situation (driving a car or being treated in a hospital), the task being performed, the other people involved, and the identity and personality of the expresser [20,6]. However, to the best of our knowledge, no vision-based model takes into account the context of the application for spontaneous expression recognition.
Here we present a model that integrates a context variable in order to refine the recognition process.
The proposed work is a new development of the previously proposed system [2], applied to the dynamic recognition of spontaneous Pain expression in spontaneous videos. To summarize, the originality of the proposed method is twofold: (1) the fusion of different sensors: the permanent facial features deformation (such as eyes, eyebrows and mouth), the transient features information (nasal root wrinkles) and the context of the expression production (e.g. medical context); (2) the use of the dynamic behavior of these sensors for the recognition of sequences of spontaneous Pain expression.
The fusion process of all these pieces of information is based on the TBM, which is well adapted to design a fusion approach where various independent sensors or sources of information collaborate to provide a more reliable decision [33,34]. Moreover, most of the already proposed models map facial expressions directly into the basic facial expressions proposed by Ekman and Friesen [35] and are not able to model the doubt between several facial expressions in the recognition process. This property is important considering that "binary" or "pure" facial expressions are rarely perceived (people usually display mixtures of facial expressions [50]). The TBM-based model has proven its ability to deal with all these considerations for the recognition of the basic facial expressions [2,23]. Here we demonstrate the suitability of this model also for the recognition of spontaneous Pain expression sequences.
The remainder of the paper is organized as follows: first, we describe the facial expression databases we used to train and evaluate the system; second, we briefly describe the main features of our automatic facial expression system and the fusion process using the TBM; third, we present the model of temporal classification and the final fusion and decision process for Pain expression sequences, introducing a context variable; finally, we present the classification results on both spontaneous and acted Pain expressions, emphasizing not only the good performance but also the quality of the information extracted from the video sequence.

FACIAL EXPRESSION DATA
Experiments in this paper are run across two databases: a spontaneous Pain expression database and an acted facial expressions database (STOIC).

Spontaneous Pain Expression Database
In most facial expression databases, facial expressions are acted. These acted facial expressions differ in appearance and timing from spontaneously occurring facial expressions [7]. Here, we describe the creation of a spontaneous Pain expression database during a study conducted in a lab at the Institut Universitaire de Gériatrie de Montréal. Subjects were participating in a study on the relation between Pain catastrophizing and facial responsiveness to Pain in healthy, Pain-free individuals.

Description of stimulus material
In the database videos, Pain was induced experimentally by means of a Peltier-based, computerized thermal stimulator (Medoc TSA-2001; Medoc Ltd, Ramat Yishai, Israel) with a 3 × 3 cm² contact probe. The contact probe was attached to the left lower leg. Baseline temperature was always set to 38 °C. One non-painful stimulus (1 °C below the individual pain threshold) and two painful thermal stimuli (2-3 °C above the individual pain threshold) were applied in a random order. The temperature increased from baseline with a heating rate of 4 °C/s to the pre-set temperature, remained at a plateau for 5 seconds and returned to baseline with a rate of 4 °C/s. Interstimulus intervals (ISIs) varied between 30 and 35 seconds. The faces of subjects were videotaped (see Figure 1). The video camera was placed in front of the subject at a distance of approximately 4 m. Before applying a stimulus, subjects were always instructed to focus on an emotionally neutral picture positioned next to the camera in order to ensure a frontal view of the face. Subjects were also instructed not to talk during thermal stimulation. To mark the onset of stimulation on the videotape (for further analysis), we switched on a light signal concurrently. The light was visible to the camera but not to the subject (see Figure 1).
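As an illustration of the stimulation protocol, the temperature time course of one stimulus can be sketched as follows. This is a minimal Python sketch and not the stimulator's software: the function name `stimulus_temperature`, its parameters and the 47 °C target used in the example are hypothetical. Only the 38 °C baseline, the 4 °C/s rate and the 5 s plateau come from the description above.

```python
def stimulus_temperature(t, target_c, baseline_c=38.0, rate_c_per_s=4.0, plateau_s=5.0):
    """Probe temperature (degrees Celsius) t seconds after stimulus onset.

    Assumes target_c > baseline_c. All names are illustrative.
    """
    ramp_s = (target_c - baseline_c) / rate_c_per_s  # duration of each ramp
    if t <= 0.0:
        return baseline_c
    if t < ramp_s:                     # heating ramp
        return baseline_c + rate_c_per_s * t
    if t <= ramp_s + plateau_s:        # plateau at the pre-set temperature
        return target_c
    if t < 2.0 * ramp_s + plateau_s:   # return to baseline at the same rate
        return target_c - rate_c_per_s * (t - ramp_s - plateau_s)
    return baseline_c


# Hypothetical painful stimulus at 47 degrees: 2.25 s up, 5 s plateau, 2.25 s down.
profile = [stimulus_temperature(t, 47.0) for t in (0.0, 1.0, 2.25, 7.25, 8.25, 9.5)]
```

With these assumed values the whole stimulus lasts 9.5 seconds, which is consistent with scoring a 5-second segment starting when the plateau is reached.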

FACS
Facial expressions displayed on each frame of the obtained videos have been analysed using the Facial Action Coding System (FACS) [35]. This system is based on anatomical analysis of facial muscle movements and distinguishes 44 different action units (AUs). These are the minimal numbered units of facial activity that are anatomically separate and visually distinguishable. The intensity of each action unit was rated on a 5-point scale (A-E), with A being the least intense and E the maximum strength of the action. A FACS coder (qualified by passing an examination given by the developers of the system) identified the frequency of all 44 AUs and the intensity of 42 AUs (AUs 45 and 46 do not allow for intensity coding). Special software designed for the analysis of observational data (the Observer Video-Pro, Noldus Information Technology) was used to segment the videos and to enter the FACS codes into a time-related database. Time segments of 5 seconds beginning just after the stimulus had reached its maximum were selected for scoring. The database was used for the validation of the proposed model.

Posed Expression Database
We also used the STOIC dataset (see Figure 2) developed and validated by Roy and collaborators from the Université de Montréal [36]. It is one of the latest facial expression datasets validated by human observers. Videos were recorded using a camera located directly in front of the subjects (students from theatrical schools), who were asked to perform the six basic facial expressions (Fear, Happiness, Surprise, Anger, Sadness, and Disgust) as well as Pain and Neutral expressions. Fifteen videos of facial expressions unambiguously recognized as Pain by 25 human observers were employed. The database was divided in two parts: the training set (10 of the Pain videos) and the test set (the remaining five videos).
The training set was used to define the rules for Pain expression recognition. The rules correspond to the facial feature deformations (and then the corresponding characteristic distance states) leading to the maximization of the correlation between the human and the system performances. This process allows defining a set of combination rules (from all the possible ones) based on human observer validation (see Table 1). This process was also carried out on the basic facial expressions plus the Neutral expression but will be presented separately in another paper.

AUTOMATIC SYSTEM
This paper proposes a new development of our model of facial expression classification [2]. The model is based on the comparison of the permanent facial features (eyes, eyebrows and mouth) deformations to their neutral state using the TBM [33]. It is able to recognize pure expressions plus Neutral as well as doubt between the basic facial expressions (Joy, Surprise, Fear, Disgust, Sadness, Anger). It is also able to deal with all facial feature configurations that do not correspond to any of the cited expressions (Unknown expressions). In the following, the system is generalized for the recognition of Pain expression in video sequences using fusion of visual and contextual information.

Characteristic distance measurements
The first step in the Hammal et al. [2] facial expression model is the extraction of the contours of the permanent facial features (eyes, eyebrows and mouth; see Hammal et al. [37]). A specific parametric model is defined for each deformable feature. Several characteristic points are extracted in the image to be processed to initialize each model (for example, eye corners, mouth corners and brow corners). In order to fit the model to the contours to be extracted, a gradient flow (of luminance and/or chrominance) through the estimated contour is maximized. The chosen models are flexible enough to produce realistic contours for the mouth, the eyes and the eyebrows. More details about this method have already been presented in [37] (see Figure 3).
Based on the segmentation results of the first frame, a set of characteristic points is selected and tracked in the remainder of the sequence. The algorithm we used to track these facial points is the Lucas-Kanade feature-tracking algorithm [38]. To be as robust as possible, the characteristic point positions are re-detected automatically at each eye blink. Indeed, spontaneous expressions are slower than acted expressions, and thus the tracking process of the characteristic points is sufficient. Figure 3 shows an example of the characteristic points tracking. Five characteristic distances $D_i$ ($1 \le i \le 5$) are then computed from the tracked points (see Figure 3). In a recent model of a psychophysical experiment on the visual cues used by human observers for the classification of the six basic facial expressions, Hammal et al. showed that these characteristic distances summarize the most important information necessary for the classification process [23].
A numerical to symbolic conversion is then carried out using a fuzzy-like model for each characteristic distance $D_i$ (see TBM section and Hammal et al. [2]). It allows the conversion of each numerical value to a belief in five symbolic states reflecting the magnitude of the deformation: $S_i$ if the current distance is roughly equal to its corresponding value in the Neutral expression, $C_i^+$ (vs. $C_i^-$) if the current distance is significantly higher (vs. lower) than its corresponding value in the Neutral expression, and $S_i \cup C_i^+$ (vs. $S_i \cup C_i^-$) if the current distance is neither sufficiently higher (vs. lower) to be in $C_i^+$ (vs. $C_i^-$), nor sufficiently stable to be in $S_i$ (see Figure 4 for example).
In order to determine the current expression (according to the characteristic distances), a fusion process of the states of the characteristic distances is then performed based on the Transferable Belief Model (see section 4). The TBM has already demonstrated its suitability for the classification of the basic facial expressions [1,2,23]. The authors have validated their model on the two well-known benchmark databases (the Cohn-Kanade database [39] and CAFE database [40]) and on their own database (Hammal-Caplier database [41]).
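The numerical to symbolic conversion can be illustrated with a piecewise-linear model. The sketch below is an assumption made for illustration only: the six thresholds (`knots`), the linear transitions between them and the labels `S|C+` and `S|C-` (standing for the two doubt states) are hypothetical, and are not the thresholds of the model in [2].

```python
# Symbolic state reached at each of the six (hypothetical) thresholds.
ANCHOR_STATES = ("C-", "S|C-", "S", "S", "S|C+", "C+")


def distance_to_bba(d, knots):
    """Convert a numerical distance d into masses over the five symbolic states.

    knots: six strictly increasing thresholds (illustrative values only).
    Returns a dict {state: mass}; the masses sum to 1.
    """
    if len(knots) != 6 or any(a >= b for a, b in zip(knots, knots[1:])):
        raise ValueError("knots must be six strictly increasing values")
    if d <= knots[0]:
        return {"C-": 1.0}
    if d >= knots[-1]:
        return {"C+": 1.0}
    for k in range(5):
        lo, hi = knots[k], knots[k + 1]
        if lo <= d <= hi:
            w = (d - lo) / (hi - lo)  # share of mass moved to the next state
            bba = {}
            bba[ANCHOR_STATES[k]] = bba.get(ANCHOR_STATES[k], 0.0) + (1.0 - w)
            bba[ANCHOR_STATES[k + 1]] = bba.get(ANCHOR_STATES[k + 1], 0.0) + w
            return {state: mass for state, mass in bba.items() if mass > 0.0}


# Hypothetical thresholds around a neutral distance of 30 pixels.
knots = (20, 24, 28, 32, 36, 40)
stable = distance_to_bba(30, knots)     # all the mass on S
hesitant = distance_to_bba(34, knots)   # mass shared between S and S|C+
```

In this sketch at most two adjacent states receive mass at the same time, which is one simple way to express that the system hesitates rather than forcing a single state.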

Transient features
In the current modeling, in addition to the permanent facial features deformation, transient features (nasal root wrinkles) are also used for the classification process. The choice of the nasal root wrinkles is due to their appearance in the Pain expression [42]. Moreover, Roy and collaborators [25,26] have recently made a finer and less biased analysis of the importance of facial features for the discrimination of the basic facial expressions as well as Pain expression by human observers. The experiment revealed the precise effective filters for the categorization of the six basic expressions as well as Neutral and Pain. Their results show that the nasal root wrinkles correspond to one of the most important visual cues used by human observers for Pain expression recognition. Based on the eye characteristic points (inner eye corners), the nasal root area is selected for wrinkle detection (see Figure 4). In the selected area, the nasal root wrinkle detection is based on the Canny edge detector. The presence or absence of wrinkles is decided by comparing the number of edge points in the nasal root in the current expressive image with the number of edge points in the nasal root of a Neutral facial image. If there are about twice as many edge points in the current image as in the reference image, wrinkles are considered to be present. Figure 4 shows an example of nasal root detection. The Canny edge threshold is set by expertise but is kept constant over all the databases. We take a high threshold to minimize the risk of errors. Then, based on TBM modeling (see section 4.2), the system keeps the doubt instead of taking the risk of making a wrong decision.

The contextual information
As reported by some researchers [6,20], a largely unexplored area for facial expression recognition is that of context dependency. Without context, even humans may misunderstand the observed facial expression. Yet, with the exception of a few studies that investigated the influence of context on affect recognition, all existing approaches to machine analysis of human affect are context insensitive [6]. An important related issue that should be addressed in all affect recognition is therefore how to make use of information about the context. The aim of the proposed work is to show the suitability of the TBM for easily adding one or more context variables to the model of facial expression classification. In the case of our application (Pain recognition), several contextual variables can be defined: the place, the task, the answer to a written question, etc. These contextual variables allow reducing the set of expected facial expressions. In the current paper only the place is used, that is, the place where the expression is produced. This place variable can take two values, medical or not (depending on whether the expresser is in the hospital or not), in a setting where the aim is to identify whether the videotaped expression is painful or not. The context variable is introduced for Pain recognition in the present study, but it can easily be generalized to the other expressions by defining a set of rule conditions for each of them. For example, as reported by [20], smiling in the context of downward head pitch communicates embarrassment rather than Joy (the results for the basic facial expressions will be presented separately in another paper).
The context variable is then added to the permanent facial features and transient features as a refinement "sensor" for the classification process. Once all the required information is collected, a fusion architecture based on the TBM is built (see section 6).

FUSION PROCESS BY THE TRANSFERABLE BELIEF MODEL
In a realistic interaction environment, a facial expression analyzer should be able to deal with noisy and partial data and to generate its conclusion with confidence that reflects uncertainty of output of face and face point localization and tracking [6].The Transferable Belief Model (TBM) is then chosen as analyzer.
The TBM is a model of representation of partial knowledge [43,44] and can be understood as a generalization of probability theory. It can deal with imprecise and uncertain information by explicitly defining doubt states, and provides a number of tools for the combination of this information [33,34]. It considers the definition of a frame of discernment $\Omega$ of exclusive and exhaustive hypotheses characterizing some situations. This means that the solution to the problem is unique and is one of the hypotheses of $\Omega$. The TBM is well adapted to design a fusion approach where various independent sensors or sources of information collaborate to provide a more reliable decision.
The TBM has already proved its suitability for the classification of the basic facial expressions [1,2].It has also proved its ability to deal with partially occluded facial parts, optimizing all the available information to take the best possible decision and its performances compare favorably to those of human observer in experimental conditions [23].
Based on these considerations, the proposed model for the classification of spontaneous Pain expression is based on the TBM fusion process of all the information resulting from the characteristic distance states, with the addition of the nasal root wrinkles and the context information (medical context).
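A standard tool for combining two independent sources in the TBM is the unnormalized conjunctive rule, in which the product of the masses of every pair of propositions is transferred to their intersection. The following Python sketch illustrates it on a toy frame. The function name, the frame and the numerical masses are illustrative assumptions, and the sketch is not claimed to reproduce the exact fusion architecture of the system.

```python
from itertools import product


def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive rule: m(C) = sum of m1(A) * m2(B) over A & B == C.

    m1, m2: dicts mapping frozenset propositions to masses (each summing to 1).
    Mass assigned to the empty frozenset measures the conflict between sources.
    """
    fused = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        c = a & b
        fused[c] = fused.get(c, 0.0) + mass_a * mass_b
    return fused


# Toy frame of four expressions (illustrative subset).
omega = frozenset({"Pain", "Anger", "Disgust", "Joy"})
# Hypothetical source 1: hesitates between Pain and Anger, with some ignorance.
m_distances = {frozenset({"Pain", "Anger"}): 0.6, omega: 0.4}
# Hypothetical source 2: only Pain, Anger or Disgust remain possible.
m_wrinkles = {frozenset({"Pain", "Anger", "Disgust"}): 1.0}

fused = conjunctive_combination(m_distances, m_wrinkles)
```

In this toy example the second source removes Joy from the part of the mass that the first source left on the whole frame, while the doubt between Pain and Anger is preserved rather than resolved arbitrarily.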

The basic belief assignment of the characteristic distances:
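Using the TBM approach requires the definition of the Basic Belief Assignment (BBA) associated to each independent source of information. The BBA $m_{D_i}^{\Omega_{D_i}}$ of each characteristic distance state $D_i$ is defined as

$$m_{D_i}^{\Omega_{D_i}} : 2^{\Omega_{D_i}} \to [0, 1], \qquad \sum_{A \in 2^{\Omega_{D_i}}} m_{D_i}^{\Omega_{D_i}}(A) = 1$$

where $\Omega_{D_i} = \{C_i^+, C_i^-, S_i\}$ is the frame of discernment of the distance and $2^{\Omega_{D_i}}$ is its power set.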

The basic belief assignment of the transient features
The BBA $m_{TF}^{\Omega_{TF}}$ of the nasal root wrinkles is defined as:

$$m_{TF}^{\Omega_{TF}} : 2^{\Omega_{TF}} \to [0, 1]$$

where $\Omega_{TF} = \{P, A\}$ (present, absent) and $2^{\Omega_{TF}} = \{P, A, P \cup A\}$ is the power set. A piece of evidence $m_{TF}^{\Omega_{TF}} = 1$ is associated with one symbolic state according to the presence or the absence of the transient features. The detection threshold has been derived by statistical analysis on the Hammal-Caplier and the STOIC databases [41,36].
As reported above (see section 3.2), we take a high threshold to minimize the risk of errors.
Then, if the number of wrinkle pixels is higher than the threshold, the system is sure that the nasal root wrinkles are present ($m_{TF}^{\Omega_{TF}}(P) = 1$); otherwise the system keeps the doubt instead of taking the risk of making a wrong decision, and then $m_{TF}^{\Omega_{TF}}(P \cup A) = 1$.
The nasal root wrinkles are used for the Pain identification as reported in Table 2. The detection of the nasal root wrinkles allows a refinement of the classification by eliminating the expressions Happy, Surprise, Fear, Sadness and Neutral, thus reducing the number of possible expressions to 3 rather than 8. Indeed, in addition to the Pain expression, the nasal root wrinkles can also be present in the case of the Anger and Disgust expressions.
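The wrinkle decision and the resulting refinement can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, the ratio of 2 stands for the "about twice" criterion of section 3.2 rather than the statistically derived threshold, the expression labels are chosen for the example, and the edge maps are assumed to be already computed by an edge detector.

```python
ALL_EXPRESSIONS = frozenset(
    {"Joy", "Surprise", "Fear", "Sadness", "Anger", "Disgust", "Neutral", "Pain"}
)
WITH_WRINKLES = frozenset({"Pain", "Anger", "Disgust"})
PRESENT = frozenset({"P"})
DOUBT = frozenset({"P", "A"})  # P or A: the system does not decide


def count_edge_points(edge_map):
    """Number of non-zero pixels in a binary edge map given as nested lists."""
    return sum(1 for row in edge_map for value in row if value)


def wrinkle_bba(n_edges_current, n_edges_neutral, ratio=2.0):
    """m(P) = 1 if the edge count clearly exceeds the neutral one, else m(P or A) = 1."""
    if n_edges_current > 0 and n_edges_current >= ratio * n_edges_neutral:
        return {PRESENT: 1.0}
    return {DOUBT: 1.0}


def candidate_expressions(bba):
    """Expressions still possible given the wrinkle evidence."""
    if bba.get(PRESENT, 0.0) == 1.0:
        return WITH_WRINKLES
    return ALL_EXPRESSIONS


# Hypothetical counts: 120 edge points now versus 50 in the Neutral image.
remaining = candidate_expressions(wrinkle_bba(120, 50))
```

Note that absence of wrinkles is never asserted in this sketch: a low edge count only leaves the set of candidate expressions unchanged, which mirrors the choice of keeping the doubt.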

The basic belief assignment of the context variable
The BBA $m_{CT}^{\Omega_{CT}}$ of the contextual variable is defined on $\Omega_{CT}$, with a piece of evidence associated with each symbolic state according to whether the context of the sequence acquisition is medical or not. It corresponds to the answer to the question: are we trying to know if the expresser is in pain or not? The piece of evidence of the corresponding expressions is then computed for each of the two cases: either we are in a medical context, trying to know if the expresser is in pain or not, or we are not. In our application the state of the context variable is defined manually, according to the fact that the expresser is in a medical context and that we are trying to know whether his or her current state is painful or not.
However, the context variable only allows refining the already obtained classification results (based on the facial sensors) where Pain is already recognized or where the system hesitates between Pain and another expression (see the results section). Moreover, as reported above, we can have several contextual variables according to the expressions and the context information we want to model. We are working on this development of our model for a contextual recognition of the six basic facial expressions.
In order to prove the refinement process based on the use of the context variable, two simulations are presented for the recognition of spontaneous Pain expressions sequences: with the use of the context variable and without the use of the context variable.

TEMPORAL INFORMATION FOR FACIAL EXPRESSION CLASSIFICATION
Temporal dynamics of human facial behaviour is a critical factor for the interpretation of the facial expressions [7,44] and is moreover essential for categorization of complex psychological states like various types of Pain and Mood [45].
In the following we take into account the dynamic behavior of the permanent facial features for the classification of sequences of facial expressions by introducing their temporal behavior pattern.
The described model has been proposed for the 3 facial expressions Joy, Disgust and Surprise [46] and has been generalized for the 8 facial expressions (Joy, Disgust, Surprise, Fear, Anger, Sadness and Pain as well as Neutral). However, as we are interested in Pain expression classification, only the results related to Pain and especially spontaneous Pain sequences are reported in this paper. To our knowledge this is one of the first attempts to explicitly model the dynamic behavior for the recognition of sequences of spontaneous Pain expressions. The temporal information is introduced at two levels: first by taking into account at each time t the information at time t-1; second by combining all this information from the beginning until the end of the sequence to take the decision.

Basic belief assignment prediction of the characteristic distance states
The main idea is to define an evolution model for the permanent facial features behavior and then the corresponding characteristic distance states. The model predicts the basic belief assignment $m_t$ at time $t$ according to the basic belief assignment $m_{t-1}$ at time $t-1$ (it is assumed that the two BBAs are close because the information between two consecutive frames is strongly related). Indeed, spontaneous facial expressions are slower in time than posed facial expressions [cohnsmile04, valstarpantic06].
The temporal model consists in defining the conditional pieces of evidence, gathered in a "transition matrix" for each characteristic distance. The predicted basic belief assignment $\hat{m}_{D_j,t}$ $(1 \le j \le 5)$, defined on $2^{\Omega_{D_j}}$, consists in predicting the pieces of evidence $m_{D_j,t}$ of the characteristic distance states at time $t$ according to their pieces of evidence $m_{D_j,t-1}$ at time $t-1$.
The predicted basic belief assignment is computed at each time $t$ by the combination of a transition matrix $M(D_j)$ and the computed basic belief assignment at time $t-1$ in the following way:

$$\hat{m}_{D_j,t} = M(D_j) \times m_{D_j,t-1}$$

The elements of the matrix $M(D_j)$ are the fraction of the mass to be in a given state at time $t$ knowing the state at time $t-1$. This matrix corresponds to the temporal evolution model; it is defined for each characteristic distance independently of the subject and of the expression.

Temporal evolution model
The temporal evolution model corresponds to the transition matrix composed of a distribution of conditional pieces of evidence [47]. These pieces of evidence correspond to the transitions from each proposition A of the frame of discernment at time t−1 (previous frame) to each one of the possible propositions B at time t (current frame). They are gathered in a matrix in which the sum of all the conditional pieces of evidence belonging to the same column is equal to 1 and whose dimensions are 5 × 5.
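The prediction step can be sketched in Python as a product between a column-stochastic 5 × 5 matrix and the BBA of the previous frame. The matrix values below (0.8 on the diagonal, 0.05 elsewhere), the ordering of the states and the function name are hypothetical and are used only to make the example runnable; they are not the transitions learned in the paper.

```python
# Assumed ordering of the five states in the vectors and in the matrix.
STATES = ("S", "C+", "C-", "S|C+", "S|C-")


def predict_bba(transition, bba_prev):
    """Predicted BBA at time t from the BBA at time t-1.

    transition[r][c]: fraction of the mass moving to state r at time t
    given state c at time t-1, so each column must sum to 1.
    """
    n = len(bba_prev)
    for c in range(n):
        column_sum = sum(transition[r][c] for r in range(n))
        if abs(column_sum - 1.0) > 1e-9:
            raise ValueError("each column of the transition matrix must sum to 1")
    return [sum(transition[r][c] * bba_prev[c] for c in range(n)) for r in range(n)]


# Hypothetical matrix: a state is most likely to persist from one frame to the next.
M = [[0.8 if r == c else 0.05 for c in range(5)] for r in range(5)]
previous = [1.0, 0.0, 0.0, 0.0, 0.0]  # all the mass on S at time t-1
predicted = predict_bba(M, previous)
```

Because each column sums to 1, the total mass is preserved by the prediction, so the predicted vector is again a valid BBA over the five states.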
The pieces of evidence of the transitions are learned using the Hammal-Caplier database and have already been validated for the three expressions Joy, Surprise and Disgust [46] and generalized for the six basic facial expressions plus Neutral plus Pain. In this paper only Pain results are reported. Tested on spontaneous sequences, the proposed model shows its generalization and robustness for new data.
The conditional basic belief assignment can be defined between each two consecutive frames:

$$m_{D_j,t} = M_i^e(D_j) \times m_{D_j,t-1} \qquad (5)$$

Equation 5 is defined for one transition between (t−1, t). To obtain the transitions on the whole sequence, the $m_{D_j,t}$ and $m_{D_j,t-1}$ are concatenated in $M_i^e(D_{j,2..N})$ (resp. $M_i^e(D_{j,1..N-1})$), corresponding to the BBA of the considered distance $D_j$ for the subject i from frame 2 until N (resp. 1 until N−1) for the expression e, such that:

$$M_i^e(D_{j,2..N}) = M_i^e(D_j) \times M_i^e(D_{j,1..N-1})$$

or in more detail:

$$\begin{pmatrix} m_{D_j,2}(S) & \cdots & m_{D_j,N}(S) \\ \vdots & & \vdots \end{pmatrix} = M_i^e(D_j) \times \begin{pmatrix} m_{D_j,1}(S) & \cdots & m_{D_j,N-1}(S) \\ \vdots & & \vdots \end{pmatrix}$$

It has to be noted that the elements of each column of these matrices correspond to the BBA associated to the characteristic distance $D_j$ for the considered subject i and the expression e in the current frame. For example, for the proposition S in $M_i^e(D_{j,2..N})$, $m_{D_j,2}(S)$ corresponds to the piece of evidence of the state S for the subject i in frame 2 for the expression e.
The transition matrix has been already validated for the three expressions (Joy, Disgust and Surprise [46]).This paper describes its generalization for Spontaneous Pain expression.

Sequence expressions classification
A facial expression is the result of progressive deformations of a set of facial features appearing at different times and without any defined appearance order (asynchronously) [46]. A spontaneous facial expression is characterized by a beginning, one or more apexes and an end [7]. In each expression sequence, the beginning is detected as the first frame where at least one of the permanent facial features (and then the corresponding characteristic distances) is no longer in the stable state S (Neutral); the end is detected as the first frame where all the permanent facial features (and then the corresponding characteristic distance states) have come back to the stable state S. However, there is no way to detect the apexes of an expression sequence. The proposed method deals with this consideration. The recognition of a sequence of Pain expression is done taking into account all the available information (the previous facial feature deformations) between each pair of beginning and end frames. Then the BBAs of the characteristic distance states at each frame in this interval are computed as described in section 4.1 and combined based on the rules table (see Table 1) to define the expression corresponding to all these deformations.
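The detection of the beginning and the end of an expression described above can be sketched in Python. The function name and the representation of a frame as a tuple of symbolic states (one per characteristic distance, with `"S"` for the stable state) are assumptions made for the example.

```python
def find_expression_bounds(frames):
    """Return (begin, end) frame indices of the first expression in a sequence.

    frames: list of tuples of symbolic states, one state per characteristic distance.
    begin: first frame where at least one distance is no longer in state "S".
    end: first later frame where all distances are back in state "S".
    Either value is None if the corresponding frame is not found.
    """
    begin = None
    for index, states in enumerate(frames):
        if begin is None:
            if any(state != "S" for state in states):
                begin = index
        elif all(state == "S" for state in states):
            return begin, index
    return begin, None


# Hypothetical five-frame sequence with five characteristic distances.
neutral = ("S",) * 5
sequence = [
    neutral,
    ("S", "C+", "S", "S", "S"),
    ("C-", "C+", "S", "S", "S"),
    neutral,
    neutral,
]
bounds = find_expression_bounds(sequence)
```

The sketch makes no attempt to locate apexes, in line with the remark that they cannot be detected; everything between the two returned indices is handed to the classification.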

Processing
Once the beginning of the expression has been detected, the analysis of the distance states is made inside an increasing temporal window Δt.The size of the window Δt increases progressively at each time from the beginning until the end of the expression.Then, at each time t, the whole set of the previous information (the past states of the characteristic distances and then the corresponding facial features behavior) is taken into account to classify the current expression sequence.This allows to explicitly deal with the dynamic of the facial expression and more importantly with asynchronous facial features deformations.Once the beginning is detected, the classification consists in defining at each time t the basic belief assignment BBA of the characteristic distance states defined on !C + ,C " ,S # C + ,S # C " { } according to their past basic belief assignments from the beginning until the current frame.To do this, at each time t, inside the current window Δt and for each characteristic distance, a criterion has to be used to select its corresponding state according to its previous behavior.The selection is made according to the number of appearance of each symbolic state in (state) (see Equations 9   for example) and integral (sum) of plausibility noted !Pl "t (state) computed inside the temporal window !"t (see Equation 10for example).These rules have been already validated for the 3 expressions Joy, Disgust and Surprise in [46] and are generalized for the Pain expression in this paper.
For instance, for a characteristic distance $D_j$ and a given symbolic state, these two parameters are computed over the window Δt (Equations 9 and 10), where $K_t$ indicates whether or not the symbolic state occurs at time t.
Based on these two parameters, the following rules are used to choose the distance state at each time t inside the temporal window Δt [46]:
• If only one singleton state appears inside the increasing window, it is chosen as the state of the studied characteristic distance.
• If two singleton states appear, the more plausible of the two is chosen.
• If only doubt states appear, the most plausible one among them is chosen.
At the beginning, all the distances are in the stable state S and change only if one of the other states appears in the increasing window. In this case the corresponding state is chosen according to the rules defined above.
The piece of evidence associated with each chosen state corresponds to its maximum piece of evidence inside the current increasing temporal window. Finally, at each time t between the beginning and the end of the expression sequence, once the basic belief assignments of all the characteristic distances are defined, the corresponding expression is selected according to the rules table (see Table 1). This result is then fused with the information produced by the nasal root wrinkles and the context variable to give the best possible decision.
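The selection rules above can be illustrated with a short sketch. It rests on several assumptions that are not confirmed by the text: "singleton states" is read as $C^+$ and $C^-$ (S being the default), the plausibility of a state is the standard belief-function plausibility (sum of the masses of all states that intersect it), and the names `select_state` and `plausibility` as well as the string encoding of the states are hypothetical.

```python
from typing import Dict, List, Tuple

# Symbolic states expressed as sets of elementary hypotheses (assumed encoding).
STATES = {
    "S": frozenset({"S"}),
    "C+": frozenset({"C+"}),
    "C-": frozenset({"C-"}),
    "S|C+": frozenset({"S", "C+"}),  # doubt between S and C+
    "S|C-": frozenset({"S", "C-"}),  # doubt between S and C-
}
SINGLETONS = ("C+", "C-")
DOUBTS = ("S|C+", "S|C-")

BBA = Dict[str, float]  # state name -> piece of evidence at one frame


def plausibility(bba: BBA, state: str) -> float:
    """Standard plausibility: total mass of the states intersecting `state`."""
    target = STATES[state]
    return sum(mass for name, mass in bba.items() if STATES[name] & target)


def select_state(window: List[BBA]) -> Tuple[str, float]:
    """Choose the state of one distance from all BBAs in a non-empty window."""

    def appearances(state: str) -> int:
        return sum(1 for bba in window if bba.get(state, 0.0) > 0.0)

    def plausibility_sum(state: str) -> float:
        return sum(plausibility(bba, state) for bba in window)

    def max_evidence(state: str) -> float:
        return max(bba.get(state, 0.0) for bba in window)

    # Singleton states take precedence; doubt states are used only if
    # no singleton state has appeared in the window.
    for group in (SINGLETONS, DOUBTS):
        present = [s for s in group if appearances(s) > 0]
        if present:
            chosen = max(present, key=plausibility_sum)
            return chosen, max_evidence(chosen)
    return "S", max_evidence("S")  # nothing but the stable state so far


window = [
    {"S": 1.0},
    {"S|C+": 0.6, "S": 0.4},
    {"C+": 0.7, "S|C+": 0.3},
]
for t in range(1, len(window) + 1):
    print(t, select_state(window[:t]))
```

Calling `select_state` on a growing prefix of the sequence mimics the increasing window: the chosen state moves from S to a doubt state and then to a singleton as evidence accumulates.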
The salient characteristic of this classification is that a decision can be made at each time t, taking into account all the past basic belief assignments of the characteristic distance states (and hence the whole dynamics of the corresponding facial features) from the beginning until the current frame [46]. At the beginning of the sequence, all the expressions are in the set of possible expressions; during the sequence, this set is progressively reduced. When the end is reached (the current frame is then the last frame of the sequence), the decision depends explicitly on all the past basic belief assignments of the characteristic distance states and gives the classification of the entire expression sequence.

FUSION PROCESS
The main feature of the TBM is its powerful combination operator [33,48], which integrates information from different sensors. In the current case the sensors are the characteristic distance states, the nasal root wrinkles and the contextual variable. However, the operator requires the pieces of information to be fused to be defined on the same frame of discernment. The fusion process is carried out in three successive steps at each time (frame) of the sequence: first, the fusion of all the characteristic distance states; then, the combination of the obtained result with the nasal root wrinkles; and finally, the combination with the context of the application to refine the classification results.
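As an illustration of the combination operator, the sketch below implements the unnormalized conjunctive rule of the TBM for two BBAs defined on the same frame of discernment: the mass of each intersection A ∩ B receives the product of the masses of A and B. The function name `conjunctive` and all numerical masses are hypothetical; only the mapping of "wrinkles present" to Pain, Anger or Disgust comes from the paper.

```python
from itertools import product
from typing import Dict, FrozenSet

BBA = Dict[FrozenSet[str], float]

# Frame of discernment: the 8 expressions considered in the paper.
OMEGA = frozenset(
    {"Pain", "Anger", "Disgust", "Happy", "Surprise", "Fear", "Sadness", "Neutral"}
)


def conjunctive(m1: BBA, m2: BBA) -> BBA:
    """Unnormalized conjunctive combination of two BBAs on the same frame."""
    combined: BBA = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        g = a & b  # conjunction (intersection) of the two propositions
        combined[g] = combined.get(g, 0.0) + mass_a * mass_b
    return combined  # mass on frozenset() is the conflict between sensors


# Hypothetical output of the characteristic distances fusion.
m_distances = {frozenset({"Pain", "Sadness", "Anger"}): 0.7, OMEGA: 0.3}
# Nasal root wrinkles present: Pain, Anger or Disgust (hypothetical mass).
m_wrinkles = {frozenset({"Pain", "Anger", "Disgust"}): 0.8, OMEGA: 0.2}

m_fused = conjunctive(m_distances, m_wrinkles)
for focal, mass in sorted(m_fused.items(), key=lambda item: -item[1]):
    print(sorted(focal), round(mass, 2))
```

Because the rule is associative, the three steps (distances, wrinkles, context) can be chained by repeated calls. The combination concentrates belief on smaller propositions, here on the pair Pain/Anger.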

Fusion of the characteristic distances information
Based on the facial feature deformations associated with it, the Pain expression is characterized by a set of characteristic distance states according to the rule displayed in Table 1 (see [2] for the rules of the six basic facial expressions). This mapping has been obtained from the Pain expression sequences of the STOIC database [36], validated by human observers. From these rules, and in order to take into account all the available information, the facial expression classification is first based on the TBM fusion process of all the $D_i$ states.

The BBAs $m_{D_i}^{\Omega_{D_i}}$ of the states of the characteristic distances are defined on different frames of discernment. For the fusion process, it is necessary to redefine the BBAs on the same frame of discernment $2^{\Omega}$, where $\Omega$ is the set of expressions. The fusion of all the states of the characteristic distances is then performed using the conjunctive combination rule [33,34].

DECISION PROCESS
The decision is the ultimate step of the classification process. It consists in making a choice between various hypotheses $E_e$ and their possible combinations. Making a decision is associated with a risk, except if the result is sure ($m(E_e) = 1$). Several decision criteria can be used [33,34].
In this paper the decision was made using the pignistic probability BetP [49], where $\emptyset$ corresponds to the conflict between the sensors.
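For illustration, the sketch below uses the standard form of the pignistic transformation from the TBM literature, in which the mass of each focal element A is shared equally among its elements and renormalized by the conflict: BetP(C) = Σ_{A ∋ C} m(A) / (|A| (1 − m(∅))). This form is an assumption about the equation intended here; the function name `pignistic` and the masses are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

BBA = Dict[FrozenSet[str], float]


def pignistic(m: BBA, frame: Iterable[str]) -> Dict[str, float]:
    """Pignistic probability BetP of each hypothesis of the frame."""
    conflict = m.get(frozenset(), 0.0)  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: BetP is undefined")
    betp = {hypothesis: 0.0 for hypothesis in frame}
    for focal, mass in m.items():
        if not focal:
            continue  # the conflict mass is redistributed by normalization
        share = mass / (len(focal) * (1.0 - conflict))
        for hypothesis in focal:
            betp[hypothesis] += share
    return betp


frame = ["Pain", "Anger", "Disgust", "Happy", "Surprise", "Fear", "Sadness", "Neutral"]
# Hypothetical fused BBA with some conflict between the sensors.
m = {
    frozenset({"Pain"}): 0.5,
    frozenset({"Pain", "Sadness"}): 0.3,
    frozenset(): 0.2,
}
betp = pignistic(m, frame)
print(max(betp, key=betp.get), betp)
```

The decision is then the hypothesis with the largest BetP. A doubt proposition such as Pain or Sadness contributes half of its mass to each of its two members.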

CLASSIFICATION OF SPONTANEOUS VERSUS ACTED PAIN
The simulation results were obtained on the 20 spontaneous Pain sequences recorded under the experimental conditions described in section 2.1 and on the 15 video sequences of the STOIC database (validated by human experts).
Our simulations aimed, first, at demonstrating that the proposed model generalizes to all the facial expressions and robustly identifies the Pain expression, discriminating it from the six basic facial expressions plus Neutral, and second, at demonstrating the refinement role of the contextual information. Three simulations were carried out: first, the system performed a 2-alternatives choice between Pain and Neutral without the context variable; second, it performed an 8-alternatives choice between Pain and the six basic facial expressions plus Neutral without the context variable; and finally, it performed the same 8-alternatives choice using the context variable. Figure 6 presents an example of the information displayed during the analysis of the Pain expression sequences. The interface is divided into five different regions: on top left, the current frame to be analyzed; on top middle, the result of the static classification (based only on the information at the current frame; here a Pain expression with a Pignistic probability equal to 1); on top right, the result of the dynamic classification, which corresponds to the classification of the sequence from the beginning until the current frame 94 (here a Pain sequence with a Pignistic probability equal to 1, see section 5); on bottom left, the current states of the characteristic distances and their pieces of evidence; on bottom right, the corresponding facial feature deformations.

Spontaneous pain results
In this section we investigate how the proposed model performed on "spontaneous" Pain sequences. Our investigation includes the generalization to a new database, as well as head movement with both in-plane and out-of-plane rotations. The results are presented for 20 subjects.
In the first simulation the system performed a 2-alternatives choice between Neutral and Pain facial expressions. The classification rate (70%) compares favorably with previously reported classification rates on spontaneous facial expressions. However, this comparison remains difficult, as the approaches differ in several respects (the systems were applied to different expressions, they classify either expressions or Action Units, and they were tested on different databases). In a 2-alternatives choice, the Ignorance state corresponds to the cases where the Pignistic probabilities of Pain and Neutral are equal (0.5). With only two expressions, this state corresponds to total ignorance of the system (30%) (see Table 3).
          Pain    Ignorance
Pain      70      30

To determine whether the system was able to recognize the Pain expression and discriminate it from the six basic facial expressions as well as Neutral, a second simulation was carried out in which the system performed an 8-alternatives choice between the six basic facial expressions, Neutral and Pain. It should be noted that such an 8-alternatives classification has never been performed on spontaneous Pain expressions. The classification rates are reported in Table 4.
The row (Sadness, Pain + Anger, Pain) corresponds to the cases where the system recognizes the Pain expression together with one of the two expressions Sadness or Anger. In these cases the pairs of expressions Pain/Sadness or Pain/Anger are recognized with the same Pignistic probability (Pain=0.5 and Sadness=0.5, or Pain=0.5 and Anger=0.5). Pain is then recognized equiprobably with Sadness or with Anger. Similarly, the row (Sadness, Anger, Pain) corresponds to the cases where the three expressions are recognized equiprobably (Pignistic probability equal to 0.33).
To summarize, the system recognizes the spontaneous Pain expression but mixes it (or is in doubt) with Sadness or Anger at the same time. In these cases the system is sure that the current expression is one of these 2 and never one of the 6 other expressions. In a hospital context (waiting room, older people under camera monitoring), such information is more than sufficient to alert somebody in charge that the patient is suffering from Pain. A human observer (medical doctor or nurse) can then confirm this information or not. Moreover, the obtained results reflect the ability of the proposed model to deal with the perception of mixtures of facial expressions (people usually do not display "pure" facial expressions [50]). The obtained results are supported by the results of Roy and collaborators, in which human observers were asked to classify the Pain expression in the 8-alternatives choice. Interestingly, the model shows a striking similarity with humans, who also confuse Pain with Sadness or with Anger expressions [25,26].
In the third simulation, the context variable is added to the 8-alternatives choice to refine the classification results. The results show that the doubt (between Pain, Sadness and Anger) is resolved (Table 4). This result demonstrates the usefulness of the context variable in our application. We obtain a classification rate of 77%, which compares favorably with previous results on automatic recognition of spontaneous expressions.
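The way a context variable can resolve such a doubt may be illustrated with a small worked example. It is only a sketch: it assumes the unnormalized conjunctive rule and the standard pignistic transformation of the TBM, the helper names are hypothetical, and the mass 0.6 given to the medical context is an arbitrary illustrative value, not one reported in the paper.

```python
from itertools import product

OMEGA = frozenset(
    {"Pain", "Anger", "Disgust", "Happy", "Surprise", "Fear", "Sadness", "Neutral"}
)


def conjunctive(m1, m2):
    """Unnormalized conjunctive combination of two BBAs on the same frame."""
    out = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        out[a & b] = out.get(a & b, 0.0) + mass_a * mass_b
    return out


def pignistic(m):
    """Standard pignistic probability (assumed form), normalized by conflict."""
    conflict = m.get(frozenset(), 0.0)
    betp = {expression: 0.0 for expression in OMEGA}
    for focal, mass in m.items():
        for expression in focal:  # the empty set contributes nothing
            betp[expression] += mass / (len(focal) * (1.0 - conflict))
    return betp


# Facial cues alone: complete doubt between Pain, Sadness and Anger.
m_face = {frozenset({"Pain", "Sadness", "Anger"}): 1.0}
# Medical context favors Pain; the remaining mass expresses "we don't know".
m_context = {frozenset({"Pain"}): 0.6, OMEGA: 0.4}  # hypothetical masses

before = pignistic(m_face)
after = pignistic(conjunctive(m_face, m_context))
decision = max(after, key=after.get)
print(round(before["Pain"], 2), round(after["Pain"], 2), decision)
```

Without the context the three expressions are equiprobable (0.33 each); after the combination the pignistic probability of Pain exceeds that of Sadness and Anger, so the maximum-BetP decision becomes unambiguous.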
The ignorance state corresponds to the case where the system recognizes the 8 facial expressions at the same time with the same Pignistic probability (1/8 = 0.125). This case corresponds to the total ignorance of the system. In conclusion, all these results show the ability of the proposed model to deal with spontaneous Pain sequences and to take advantage of the context information. They also show its robustness when applied to new sets of data (not used for training).

Acted pain results
The classification results are also reported for the acted database (the STOIC database, validated by human observers). In the first simulation the system performed a 2-alternatives choice between Neutral and Pain. The classification rate (92.3%) is higher than for spontaneous Pain expressions, while the Ignorance rate decreases markedly (7.7%) (see Table 5).

          Pain    Ignorance
Pain      92.3    7.7

As in the spontaneous Pain simulation, the system was tested on an 8-alternatives choice with and without the context variable. The classification rates are reported in Table 6. In this database, the doubt is no longer between the expected expressions (Sadness and Anger) but between Smile and Disgust expressions. These results emphasize the difference between acted and spontaneous expressions, especially for the Pain expression.
As for the spontaneous results, the combination with the context variable resolves this doubt, leading to good classification results (92%).
The classification results for acted and spontaneous Pain sequences, and especially the difference between the mixed expressions in each case, open promising perspectives for future development of the model aimed at discriminating between acted and real Pain expressions.

CONCLUSION
Here we presented results for the classification of spontaneous Pain expression. The system proves suitable for spontaneous sequences and gives good classification results, which encourages our future work on other spontaneous facial expressions. Moreover, it allows the doubt between expected expressions to be modeled, revealing the same confusions as those made by a human observer. The good classification rates on the two databases and the difference between the misclassified expressions give us indications for our current development of the model and its ability to discriminate between acted and real Pain expressions.

FIGURE 1: Examples from the spontaneous Pain sequences

FIGURE 2: Example of Pain sequences from the STOIC database

FIGURE 3: Characteristic points tracking and the corresponding characteristic distances. From the segmentation results, the permanent facial feature deformations occurring during facial expressions, relative to the Neutral state, are measured by five characteristic distances

FIGURE 4: Example of nasal root wrinkles detection

$C_i^-$) and $S_i$. $m_{D_i}^{\Omega_{D_i}}(A)$ is the belief in the proposition $A \in 2^{\Omega_{D_i}}$, without favoring any of the propositions of A when A is a doubt proposition. This is the main difference with the Bayesian model, which implies equiprobability of the propositions of A. A is called a focal element of $m_{D_i}^{\Omega_{D_i}}$ whenever $m_{D_i}^{\Omega_{D_i}}(A) > 0$. Total ignorance is represented by $m_{D_i}^{\Omega_{D_i}}(\Omega_{D_i}) = 1$. To simplify, $\{C_i^+\}$ is noted $C^+$ and $\{S_i, C_i^+\}$ is noted $S \cup C^+$ (i.e. $S$ or $C^+$). The piece of evidence $m_{D_i}^{\Omega_{D_i}}$ associated with each symbolic state, given the value of the characteristic distance $D_i$, is obtained by the function depicted in Figure 5. The threshold values {a, b, c, d, e, f, g, h} have been derived by statistical analysis on the Hammal-Caplier database [41] for every characteristic distance. Details can be found in Hammal et al. [2].

FIGURE 5: Model of basic belief assignment based on characteristic distance Di. For each value of Di, the sum of the pieces of evidence of the states of Di is equal to 1.

$2^{\Omega_{TF}} = \{\{P\},\{A\},\{P, A\}\}$ the frame of discernment, where $P$ means that the nasal root wrinkles are present and $A$ means that they are absent. From the frame of discernment, only the state $P$ (we are sure that the wrinkles are present) and the state $P \cup A$ (we do not know) are considered (the notation is simplified as in section 4.1).

To summarize:
- If the nasal root wrinkles are present, the current expression is Pain, Anger or Disgust (without favoring any of them), and the corresponding piece of evidence is $m_{TF}^{\Omega_{TF}}(P) = m_{TF}^{\Omega_{TF}}(Pain \cup Anger \cup Disgust)$.
- If they are absent, the current expression is one of the 8 expressions, with the piece of evidence $m_{TF}^{\Omega_{TF}}(P \cup A) = m_{TF}^{\Omega_{TF}}(Pain \cup Anger \cup Disgust \cup Happy \cup Surprise \cup Fear \cup Sadness \cup Neutral)$.

$\Omega_{CT} = \{MC, NMC\}$, the power set $2^{\Omega_{CT}} = \{\{MC\},\{NMC\},\{MC, NMC\}\}$ the frame of discernment, where $MC$ means medical context (the expresser is in a medical context, so the expected expression is more likely to be Pain), $NMC$ means not medical context, and $MC \cup NMC$ means that nothing is known about the context of the expresser (so the expected expression can be any one of the 8). From the frame of discernment, only the state $MC$ (we are sure) and the state $MC \cup NMC$ (we do not know) are taken into account (the notation is simplified as in section 4.1). The piece of evidence $m_{CT}$

For example, for a considered distance $D_j$, $m_{D_j}[S](C^+)$ corresponds to the piece of evidence (the belief) $m_{D_j}(C^+)$ at time t given that $m_{D_j}(S) = 1$ at time t − 1. For each characteristic distance $D_j$ (1 ≤ j ≤ 5), all the conditional pieces of evidence are gathered in the corresponding transition matrix $M(D_j)$ as:

$m_{D_j,t}$); 1 ≤ t ≤ N, N the total number of frames per sequence). A transition matrix $M_i^e(D_j)$ then exists for each distance $D_j$, for each subject i and for each expression e, such as:

$m_{D_i}^{\Omega}$ is derived for each characteristic distance $D_i$. In order to combine all this information, a fusion process of the BBAs $m_{D_i}^{\Omega}$

(see equation 11) and results in $m^{\Omega}$, the BBA of the corresponding expressions: $m^{\Omega} = \oplus_i \, m_{D_i}^{\Omega}$

derived on the same frame of discernment, the joint BBA $m_{D_{i,j}}$ is given using the conjunctive combination (orthogonal sum) as:

This leads to propositions with a lower number of elements and with more accurate pieces of evidence. The results of the characteristic distances combination are then refined by their combination with the nasal root wrinkles and the context information.

Fusion of the characteristic distances results with nasal root wrinkles and the contextual information
From the BBAs of the nasal root wrinkles states respectively. The results of the combination of the characteristic distances are then combined by the conjunctive combination with those of the nasal root wrinkles as reported by the following equation:

$m_{D,TF}^{\Omega}(G) = (m_D^{\Omega} \oplus m_{TF}^{\Omega})(G) = \sum_{A \cap B = G} m_D^{\Omega}(A) \, m_{TF}^{\Omega}(B)$

where $A$ and $B$ denote propositions and $A \cap B$ denotes the conjunction (intersection) between the propositions $A$ and $B$.

the set of expressions (Pain needs to be identified from the set of 8 facial expressions). Finally, the results of the combination of the characteristic distances and the nasal root wrinkles are combined by the conjunctive combination with those of the context variable as reported by the following equation:

FIGURE 6: User interface displaying the classification information extracted during a sequence. Top left: current frame; top middle: BBAs of the expressions; top right: classification results (MaxPigni: maximum pignistic probability); bottom left: estimation of the distance states and the corresponding facial feature deformations with their pieces of evidence

TABLE 1: Rules table based on the nasal root wrinkles for Pain detection

TABLE 2: Rules table for Pain recognition

From the rules table and the BBAs of the states of the characteristic distances

TABLE 3: Classification results (%) of spontaneous Pain sequences in the case of a 2-alternatives choice

TABLE 4: Classification results (%) of spontaneous Pain sequences in the case of an 8-alternatives choice without context variable (first row) and with context variable (second row)

TABLE 5: Classification results (%) of acted Pain sequences in the case of a 2-alternatives choice

TABLE 6: Classification results (%) of acted Pain sequences in the case of an 8-alternatives choice without context variable (first row) and with context variable (second row)