Intelligent Finger Movement Controlled Interface for Automotive Environment

This paper presents an intelligent finger tracking system for operating infotainment systems or instrument panels in an automotive environment. We developed algorithms to control an on-screen pointer using off-the-shelf infra-red sensors and report a user study on pointing and selection tasks undertaken with the finger tracking system. We propose polynomial models to predict probable targets and a Bayesian Fusion Model to integrate users' eye gaze locations with their finger tracks. The predictive fusion model resulted in average pointing and selection times of less than 2 seconds inside a car running on a motorway.


INTRODUCTION
Recent research on intelligent dashboard design often investigates new modalities of interaction inside a vehicle, such as head-up displays, gaze-controlled interfaces [Biswas 2016], finger movement tracking systems [Ahmad 2017], haptic feedback [Chang 2011] and hand gesture based input [Ohn-Bar 2014]. However, most research either collects data in driving simulators or analyses offline data [Ahmad 2016]. Driving simulators miss the effect of on-road vibration, while offline analysis misses the feedback loop created by an operator reacting to the system. This paper reports a user study in which participants used a finger tracking based dashboard inside a car running on a highway, together with a set of intelligent multimodal algorithms that resulted in an average response time of less than 2 seconds for standard pointing and selection tasks. We have explored the use of an intelligent finger tracking system that can predict users' pointing targets before they physically touch a touchscreen. If the driver need not physically touch the instrument panel, it:
• can be placed even out of the reach of the driver;
• will be helpful for an elderly driver who has a reduced range of motion at the shoulder due to age-related physical impairments such as arthritis;
• can reduce pointing and selection time, as targets will be selected even before users actually touch them.
We first collected finger tracking data in desktop computing and automotive environments and used the resulting cursor trajectories to develop predictive models. We then conducted a user study of the intelligent finger tracking system inside a car.

Cursor Trajectory Analysis
We used data from our previous studies to develop a predictive model for the finger tracking system. Data were collected in a desktop computing environment and inside a moving car, bus and train using finger and eye gaze tracking sensors [Biswas 2016]. We fitted different polynomial models to varying numbers of previous data points: we considered the last 3, 4 and 5 data points and fitted linear, quadratic, cubic and quartic polynomial equations, as appropriate. After fitting each model to the previous data points, we used it to predict the y-coordinate for a given x-coordinate and compared the average R² and error across models.
In Tables 1 and 2, the highest R² values and the lowest error terms are shown in bold. Prediction using the last 3 points increased R² and reduced error for most models. The cubic and quartic models achieved higher R² values than their linear and quadratic counterparts but also produced larger errors, a typical sign of overfitting a small number of points. We therefore analysed only the linear and quadratic models further, in the study described in section 3.
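The model fitting described above can be sketched as a sliding-window least-squares fit. This is a minimal illustration, not the authors' code: the helper names `fit_and_predict` and `evaluate` are hypothetical, and we assume that "error" means the mean absolute difference between the predicted and actual y-coordinate of the next trajectory point, since the text does not define it. The sketch also assumes the x-coordinates within each window are distinct.

```python
import numpy as np


def fit_and_predict(xs, ys, degree, x_query):
    """Least-squares fit of y = f(x) with a polynomial of the given degree.

    Returns (predicted y at x_query, R^2 of the fit on the input points).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    if degree >= len(xs):
        # A degree-d polynomial needs at least d + 1 points to be determined.
        raise ValueError("need more points than the polynomial degree")
    coeffs = np.polyfit(xs, ys, degree)   # assumes distinct x values
    fitted = np.polyval(coeffs, xs)
    ss_res = np.sum((ys - fitted) ** 2)
    ss_tot = np.sum((ys - ys.mean()) ** 2)
    # For a perfectly horizontal window (ss_tot == 0) treat the fit as exact.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return float(np.polyval(coeffs, x_query)), float(r2)


def evaluate(trajectory, k, degree):
    """Slide along a cursor trajectory: fit on the k previous points, predict
    y at the next point's x, and return (mean R^2, mean absolute y-error)."""
    if len(trajectory) <= k:
        raise ValueError("trajectory must contain more than k points")
    r2s, errors = [], []
    for i in range(k, len(trajectory)):
        window = trajectory[i - k:i]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        x_next, y_next = trajectory[i]
        y_pred, r2 = fit_and_predict(xs, ys, degree, x_next)
        r2s.append(r2)
        errors.append(abs(y_pred - y_next))
    return float(np.mean(r2s)), float(np.mean(errors))


# Hypothetical straight-line trajectory (pixels), fitted on the last 3 points.
trajectory = [(x, 0.5 * x + 150) for x in range(100, 200, 10)]
mean_r2, mean_err = evaluate(trajectory, k=3, degree=1)
# For this exactly linear path, mean_r2 is ~1.0 and mean_err is ~0.0.
```

Note that with `k` points a polynomial of degree `k - 1` interpolates the window exactly, so its R² is always 1 even when its extrapolation to the next point is poor; this is why R² and prediction error have to be compared side by side.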
Intelligent Finger Movement Controlled Interface for Automotive Environment Biswas • Twist • Godsill 2

Fusion Model
Besides the predictive model, we also explored fusing a second input modality to increase the accuracy of prediction. We used an eye gaze tracker to improve the accuracy of finger tracking. Eye gaze tracking is the process of measuring either the point of gaze (where one is looking) or the motion of an eye relative to the head; an eye tracker is a device for measuring eye positions and eye movement.

Our fusion technique treats eye gaze and finger tracks as two independent variables. The fusion model works at feature level [Sanderson 2002] and takes as input x and y coordinates calculated from the raw eye gaze and finger tracking signals. The eye gaze tracker was calibrated using a 9-point calibration on a 2-dimensional screen. With the finger tracker, we used a least-squares predictor that predicts the next probable point of finger movement, as described in the previous subsection. We constructed two 2-dimensional Gaussian distributions on the screen, one centred at the eye gaze point and one at the predicted finger track point. The standard deviations of the Gaussian distributions were proportional to the published accuracy of the trackers. In this particular implementation, we considered all points on the screen to be equally likely targets and so multiplied the likelihood probabilities of eye gaze and finger tracks. The pointer is moved to the most probable point on the screen based on the Maximum Expected Probability (MEP). The pointer trajectory is then fed to the target prediction system discussed in the previous section to predict a target. The following set of equations further explains the fusion strategy. In these equations, P(x) stands for the probability of variable x, while P(y|x) stands for the conditional probability of variable y given x.

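$$P(\text{target} \mid \text{eye-gaze}, \text{finger-track}) \propto P(\text{eye-gaze} \mid \text{target}) \times P(\text{finger-track} \mid \text{target}) \times P(\text{target})$$

Considering all targets are equally likely to be selected, P(target) is constant, so

$$P(\text{target} \mid \text{eye-gaze}, \text{finger-track}) \propto P(\text{eye-gaze} \mid \text{target}) \times P(\text{finger-track} \mid \text{target})$$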
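The fusion step can be sketched as a grid search for the screen point that maximises the product of the two Gaussian likelihoods under a uniform prior. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the isotropic Gaussians, the 1-pixel grid and the example standard deviations are all hypothetical (the text only says the standard deviations were proportional to the trackers' published accuracy).

```python
import numpy as np


def log_gaussian_2d(grid_x, grid_y, mean, sigma):
    """Log-density of an isotropic 2-D Gaussian evaluated on the screen grid."""
    sq_dist = (grid_x - mean[0]) ** 2 + (grid_y - mean[1]) ** 2
    return -sq_dist / (2.0 * sigma ** 2) - np.log(2.0 * np.pi * sigma ** 2)


def fuse(gaze_xy, finger_xy, sigma_gaze, sigma_finger, width, height, step=1):
    """Return the screen point that maximises
    P(eye-gaze | point) * P(finger-track | point) under a uniform prior.

    Log-densities are summed instead of multiplying densities: the argmax is
    the same, but this avoids numerical underflow far from both means.
    """
    xs = np.arange(0, width, step)
    ys = np.arange(0, height, step)
    grid_x, grid_y = np.meshgrid(xs, ys)   # shape (len(ys), len(xs))
    log_post = (log_gaussian_2d(grid_x, grid_y, gaze_xy, sigma_gaze)
                + log_gaussian_2d(grid_x, grid_y, finger_xy, sigma_finger))
    row, col = np.unravel_index(np.argmax(log_post), log_post.shape)
    return float(xs[col]), float(ys[row])


# Hypothetical values: 800 x 600 screen, gaze less accurate than the finger.
pointer = fuse(gaze_xy=(400, 300), finger_xy=(500, 300),
               sigma_gaze=40.0, sigma_finger=20.0, width=800, height=600)
# pointer == (480.0, 300.0): the precision-weighted mean of the two estimates.
```

For two isotropic Gaussians the product peaks at the precision-weighted mean, so this particular case has a closed form; the grid search is kept because it extends directly to a non-uniform prior over targets, which the text mentions as a possible update based on interaction history.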

EVALUATION OF THE FUSION MODEL
We conducted the following study to evaluate the fusion model. We compared the linear and quadratic models with a naïve nearest neighbour predictor, which simply predicts the target nearest to the latest cursor position. The study is described in the following subsections.
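The naïve nearest neighbour baseline can be expressed in a few lines. This sketch assumes each button is represented by its centre point and that distance is Euclidean; the function name and the button layout are hypothetical.

```python
import math


def nearest_target(cursor, targets):
    """Naive nearest-neighbour predictor: return the name of the target whose
    centre is closest (Euclidean distance) to the latest cursor position."""
    return min(targets, key=lambda name: math.dist(cursor, targets[name]))


# Hypothetical dashboard layout: button name -> centre (x, y) in pixels.
buttons = {"radio": (100, 100), "navigation": (300, 100), "phone": (500, 100)}
print(nearest_target((280, 140), buttons))   # prints "navigation"
```

Unlike the polynomial models, this baseline ignores the direction of finger movement, so it can only predict the button the cursor is already closest to.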

Material:
We used a similar set of materials to the study described in section 2.2. The car was driven along a motorway at approximately 60 mph while participants undertook pointing and selection tasks. We used a Tobii EyeX gaze tracker with the Tobii SDK to record eye gaze locations and a Leap Motion motion tracker to record fingertip locations.

Design:
The pointing and selection task was initiated by an auditory cue. The task screen mimicked a car dashboard (Figure 2), and participants were instructed to select a button on it after hearing the auditory cue. The auditory cue was presented at intervals of 5 to 7 seconds, and the target button was selected at random on the dashboard. Pointing was undertaken through the intelligent finger tracking system described in the previous section, and selection was made by dwelling on the target.
For the control condition, the dwell time was set to 500 msec while, for the predictive conditions, a target was automatically selected if it was predicted for seven consecutive prediction cycles.
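The two selection rules can be sketched as small state machines. This is an illustrative sketch, not the study software: the class names are hypothetical, and we assume the dwell timer restarts whenever the cursor moves to a different button and that both selectors reset after a selection.

```python
class DwellSelector:
    """Control condition: select a target once the cursor has stayed on it
    for dwell_ms milliseconds (500 ms in the study)."""

    def __init__(self, dwell_ms=500):
        self.dwell_ms = dwell_ms
        self._target = None
        self._since = None

    def update(self, target_under_cursor, t_ms):
        """Feed the button under the cursor at time t_ms; return the selected
        target or None."""
        if target_under_cursor != self._target:
            # Cursor moved onto a different button (or off all buttons).
            self._target, self._since = target_under_cursor, t_ms
            return None
        if self._target is not None and t_ms - self._since >= self.dwell_ms:
            selected = self._target
            self._target, self._since = None, None   # reset after selection
            return selected
        return None


class ConsecutivePredictionSelector:
    """Predictive conditions: select a target once the same target has been
    predicted for required_cycles consecutive cycles (seven in the study)."""

    def __init__(self, required_cycles=7):
        self.required_cycles = required_cycles
        self._current = None
        self._count = 0

    def update(self, predicted_target):
        """Feed one prediction cycle; return the selected target or None."""
        if predicted_target is not None and predicted_target == self._current:
            self._count += 1
        else:
            # A different (or no) prediction restarts the count.
            self._current = predicted_target
            self._count = 1 if predicted_target is not None else 0
        if self._count >= self.required_cycles:
            selected = self._current
            self._current, self._count = None, 0   # reset after selection
            return selected
        return None
```

For example, feeding `"nav"` three times, then `"radio"` once, then `"nav"` seven times into `ConsecutivePredictionSelector()` selects `"nav"` only on the last call, because the interruption restarts the count.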

Figure 2. Screenshot used in the study
Procedure:
Initially, the participants were briefed about the study and its goal. We calibrated the eye gaze tracker while the car was stationary and demonstrated the pointing and selection tasks to the participants. Participants then undertook the experimental task once, with the car still stationary, to familiarize themselves with it; data from this familiarization session were discarded.
Once the car had been driven onto the motorway, we commenced the actual trial. Participants undertook pointing and selection tasks under the following three conditions in random order:
1. Nearest Neighbour Predictor (Control Condition)
2. Fusion Model with Linear Predictor
3. Fusion Model with Quadratic Predictor
Participants were given five minutes under each condition. They were instructed to move their hands out of range of the motion tracking sensor after each pointing task and to start moving their hand only after hearing the auditory cue. After each condition, they filled in a TLX questionnaire based on their average performance.

Results:
We measured three dependent variables:
1. Selection time: the time between the most recent auditory cue and the selection of a target.
2. TLX scores: given by participants after each session, based on their average subjective experience.
3. Number of wrong selections: measured automatically by the logging software.
As expected, selection times and TLX scores were lower in the predictive conditions, while the number of wrong selections was lower in the control condition. Average selection times were below 2 seconds for both the linear and quadratic prediction models. However, none of the dependent variables showed a statistically significant difference at p < 0.05 in a repeated-measures ANOVA.

Discussion:
Our study found that the fusion model reduced average pointing and selection times below 2 seconds, and the quadratic model also reduced cognitive load, in terms of TLX scores, relative to the control condition. However, these differences were not statistically significant. For safety reasons, we could not let the driver undertake the pointing and selection tasks while driving, so the participants were passengers. Although we instructed participants to move their fingers away from the sensor after each pointing task, many of them did not fully remove their hand from the sensing range of the motion tracking sensor between tasks. As a result, the sensor tracked their fingers continuously, and tracking quality was high enough that the prediction mechanisms had little opportunity to improve on it.

Figure 3. Comparing pointing and selection times for unimodal and multimodal systems
However, in a realistic situation, the driver's hands will be on the steering wheel, and the sensor would need to construct a new hand model for every pointing task as the driver reaches towards the dashboard (or instrument panel). Constructing the hand model for a moving hand may reduce sensing accuracy, making the eye gaze tracker and predictive model more useful. One way of simulating this situation would be to use a driving simulator, but that would miss the vibration from the road, and, as our earlier study showed, pointing and selection times were significantly affected by vibration. Our present research focuses on improving the accuracy of the prediction model and on simulating a realistic driving situation to validate it.

CONCLUSION
This paper proposed an algorithm to control an on-screen pointer using finger movement recorded by off-the-shelf infra-red trackers. Our algorithm differs from existing ones in that it does not try to recognize a limited set of gestures; instead, it controls a graphical user interface through unconstrained finger movement. We analysed cursor trajectories of the finger tracking system and proposed a set of polynomial models to predict cursor trajectories a priori. We also proposed using an eye gaze tracker to increase the accuracy of prediction and designed a Bayesian Fusion Model that combines signals from the eye gaze and finger trackers. The fusion model can also be updated based on users' interaction history although, for the present analysis, we considered all pointing targets equally likely to be selected. In our study inside a car, 9 users undertook pointing and selection tasks; the Bayesian model reduced target selection times and cognitive load compared to a naïve nearest neighbour predictor, although the number of wrong selections increased with the fusion model and none of these differences was statistically significant. The average pointing and selection times were less than 2 seconds using the fusion model. Future research will aim to further reduce pointing and selection times by improving the fusion model.

Figure 4. Comparing cognitive load in terms of TLX scores for unimodal and multimodal systems

Figure 5.

Table 1. Polynomial Model Fitting in Desktop Computing Environment

Table 2. Polynomial Model Fitting in Vibrating Environment (inside a moving vehicle)