Big Data Quantitative Risk Analysis Method for Machine Health Indicator Prediction

Abstract: Various data-driven methods have been applied to predict machine health indicators, especially in the field of prognostics. Machine health indicators reveal the condition of equipment and/or its components, including bearings, by monitoring operational data such as vibration frequency. To aid the prediction of machine health indicators, this study applies the Big Data Quantitative Risk Analysis (BDQRA) method to monitor the health of bearings as components of a machine. The BDQRA method involves applying data compression techniques such as feature extraction to the bearing vibration data to extract the domain features. Due to the complexity of the feature extraction process, this study proposes the fast Fourier transform for the data compression. This is followed by obtaining a time series profile of the bearing vibration data to analyse the health status of the component bearing. The method then uses changepoint analysis to predict the period at which bearing health deterioration is imminent. Since the bearing health deterioration could be due to the independent operation of a component bearing or to communication between the component bearing and other components (or bearings) within the process machinery, the method also applies the principle of interaction effect to investigate the contributions of the other components of the machinery to the health deterioration detected in the component bearing.
The accuracy of the predicted point of imminent health deterioration of the component bearing is investigated by comparing the outcome of the BDQRA method with the outcomes of other methods published in the literature that have been applied to the dataset used in this study. The findings reveal that the BDQRA method has comparative advantages over the methods used in the related studies.


HIGHLIGHTS
• Monitoring the health of mechanical equipment has always been a challenge for the highly hazardous process industries.
• The bearing is one of the basic components of process machinery, so any risk associated with bearing deterioration may have a detrimental effect on the entire operation of the process system.
• Bearing health monitoring is used for machine health indicator prediction in the highly hazardous process industries.
• One such method is remaining useful life (RUL), which predicts the time remaining after the onset of bearing deterioration.
• Big data quantitative risk analysis is an alternative approach which detects the onset of deterioration in a component bearing and can provide information about the contribution from the operation of other component bearings to the health issues of process machinery.

SUMMARY
Good health indicators of mechanical equipment have always been the desire of operation and safety managers because they help improve reliability, minimise operation cost and reduce downtime. One of the most basic components of process machinery is the bearing, hence any risk associated with its deterioration may have a detrimental effect on the entire process operation. As a result, methods like risk analysis, preventative management, condition monitoring, prognostics and remaining useful life (RUL) are applied to monitor bearing health conditions to aid effective decision-making within process industries. We applied the Big Data Quantitative Risk Analysis (BDQRA) method to monitor bearing health conditions and verified the outcome against other bearing health indication methods such as RUL, which predicts the time to the end of life of the bearing. We found that the BDQRA method performs well at detecting the onset of bearing deterioration. The BDQRA method also has a major advantage over other bearing health monitoring methods because it can provide information about the contribution to the deterioration from the operation of other components of the process machinery.

Introduction
Health indicators for mechanical equipment have long challenged operation and safety managers, who seek to improve reliability, minimise the cost of operation and reduce downtime (Heng et al., 2009; Lee et al., 2014). Owing to this, several methods have been applied to aid effective decision-making within industries. They include risk analysis, preventative management, condition monitoring, prognostics and RUL (Qiu et al., 2003; Benkedjouh et al., 2013), which allow appropriate action to be taken to prevent excessive downtime of the operation. All the aforementioned techniques have been applied to monitoring the health condition of mechanical components, including bearings.
The bearing is one of the most basic components of most process machinery and plays a major role in issues affecting the overall operations of a process system. As a result, any risk associated with a deteriorating bearing health condition may have a detrimental effect on the entire process operation of the facility. Many potential sources of bearing health issues, including mechanical causes like overheating, have been reported (CSB, 2009). As a result, numerous bearing health monitoring techniques, including acoustic measurement, electrical effects monitoring, oil debris monitoring, power quality, temperature monitoring and vibration analysis, have been investigated by researchers (de Azevedo et al., 2016).
This paper proposes a hybrid method for predicting the time at which imminent bearing health deterioration could occur. The hybrid method combines time-domain methods such as the root mean square (RMS), frequency-domain methods such as the fast Fourier transform (FFT), and big data methods such as time series analysis, changepoint analysis, regression decision tree analysis, regression analysis, analysis of variance and interaction effects. Although each of these methods can be applied to bearing health deterioration analysis on its own, each has its own strengths and weaknesses when used separately. Besides, some of the individual methods have already been investigated using the dataset applied in this research. In our view, combining all these methods yields a more thorough investigation than using the individual methods separately. The hybrid method was developed and tested using historical data but would be run in near real time in actual situations. We called this method Big Data Quantitative Risk Analysis (BDQRA) because it was applied for quantitative risk analysis at the time of its development and testing (Jordan, 2019).
The proposed method differs from the methods applied by previous researchers in that it adopts the big data technique of change-point analysis (Basseville & Nikiforov, 1993; Killick et al., 2010) to capture the point at which imminent bearing health deterioration could occur, and applies the theory of interaction effect to determine the contribution of other mechanical components within the process machinery to the deterioration detected. Fig. 1 is a schematic diagram of the method. The remainder of this paper is organised as follows. Section 2 describes the dataset used, including data selection, storage and the dominant features in bearing vibration data, which depend on the health condition of the bearing. Section 3 describes change-point analysis and the reason for its selection as a deterioration condition detection method. Section 4 explains in detail how the BDQRA method works. Section 5 presents the various stages within the BDQRA method as applied in this research. Section 6 presents and discusses the results of the data analysis. Section 7 presents the conclusions drawn from the research.

Data and Sources
The data for the study were obtained from the NASA Prognostics Data Repository (Lee et al., 2007), which discloses that the process system has four bearings, each rotated at a constant speed of 2000 rpm by an AC motor coupled to the shaft via rub belts. Fig. 2 is an image of the process setup and a schematic diagram of the arrangement of the bearings within the process machinery.

RMS (root mean square): Values help identify differences between vibration signals. The same applies to the mean and the standard deviation (SD).
SF (shape factor): The RMS-to-mean ratio (RMS/µ). The value depends on an object's shape but is independent of its dimensions.
Sk (skewness): Quantifies the symmetry of the data. The value approximates 0 for a healthy bearing and shifts to positive or negative when a fault develops.
Var (variance): Measures the dispersion of the data around the mean.

We applied frequency domain analysis using the fast Fourier transform (FFT) algorithm to decompose the signals up to their Nyquist frequency (Nf). The Nf is half the sampling rate of the signal (Seeber & Ulrici, 2016). Thus, the FFT helps remove noise from the observations without losing key features. As a result, the data is condensed to a size that can be handled by system memory for easy analysis using the R programming language. There is data for the four bearings which became defective during the operation. An image of the defects is provided (Cerrada et al., 2018) as Fig. 3.
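As a minimal illustration of the time-domain features and the frequency-domain step described above, the following Python sketch computes RMS, SF, Sk and Var for a signal and a one-sided spectrum up to the Nyquist frequency. It is not the authors' R implementation: the function names, the 64 Hz sampling rate and the synthetic 8 Hz sine wave are hypothetical, the shape factor is assumed to use the mean of the absolute values, and a naive discrete Fourier transform stands in for an FFT routine.

```python
import cmath
import math


def time_domain_features(x):
    """Return RMS, shape factor, skewness and (population) variance of a signal."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    mean_abs = sum(abs(v) for v in x) / n      # assumption: "mean" is the mean of |x|
    shape_factor = rms / mean_abs              # RMS-to-mean ratio
    skewness = sum((v - mean) ** 3 for v in x) / n / var ** 1.5
    return {"rms": rms, "sf": shape_factor, "sk": skewness, "var": var}


def one_sided_spectrum(x, fs):
    """Naive O(n^2) DFT: (frequency, amplitude) pairs from 0 Hz up to fs / 2."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * fs / n, abs(coeff) / n))
    return spectrum


fs = 64.0                                       # hypothetical sampling rate in Hz
signal = [math.sin(2 * math.pi * 8.0 * t / fs) for t in range(64)]

features = time_domain_features(signal)
spectrum = one_sided_spectrum(signal, fs)
nyquist = fs / 2                                # Nf is half the sampling rate
peak_freq = max(spectrum[1:], key=lambda p: p[1])[0]  # skip the 0 Hz (DC) bin
```

Keeping only the bins up to the Nyquist frequency roughly halves the number of values stored per record, which is the sense in which the transform condenses the data.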

Change-point Analysis for Risk of Bearing Health Deterioration Detection
We selected change-point analysis for detecting the risk of bearing health deterioration after a review of the literature on the use of big data for bearing health analysis (Jordan, 2019), together with considerations such as the size of the data and the fact that the data are time-stamped. Because the data were recorded over given time periods, we aimed to extract important information such as descriptive and explanatory variables, which can be processed and used with data mining models including changepoint analysis. We also considered applying the Weibull distribution or time series analysis as the big data technique to incorporate into the change-point analysis for detecting the risk of bearing health deterioration.
We found that both the Weibull distribution and time series analysis have been extensively applied in

How the BDQRA Method Works
Our method combines the two change-point techniques (changepoint and structural change) to detect the onset of deterioration of bearing health and so help avoid catastrophic events in process machinery. We then apply regression decision trees to determine predictors and moderators for the regression models of the interactions up to the point at which the risk of deterioration of the bearing health is imminent. We are of the view that the regression decision tree is applicable in the BDQRA method because the study involves continuous variables (De Cock et al., 2017; Ahmad et al., 2018; Pekel, 2020). We apply linear regression models to determine the interactions up to the point of the imminent risk of deterioration. To enable a side-by-side comparison of the linear regression models, we use the ready-made regression tables from the stargazer package on the R-language platform to summarise the linear regression analysis results (Hlavac, 2014). Any statistically significant interactions detected in the linear regression analysis are further investigated using Type II analysis of variance (ANOVA). If the significant interactions are confirmed, effect plots are produced to visually express the relationships and any uncertainties within the model measurements.
Interaction effect applies to a situation where the operation of one of the components within the process machinery affects another. For clarity, we adopt the terminology proposed by Grace-Martin (2018) for the use of 'moderators' and 'predictors' in the principle of interaction effect within the context of this study. By 'predictors', we refer to any components whose operations have the potential to affect the risk of deterioration of the health of the bearing that develops health issues, without any real distinction between their roles. By 'moderators', we refer to the components whose operations can contribute to the effect of the operation of the 'predictor' component on the risk of deterioration of the health of the bearing detected.
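The regression decision tree step described in this section can be illustrated with a single-split simplification: each candidate component is scored by how much squared error remains in the target bearing's signal after the best one-threshold split on that component, and the components are ranked accordingly. This is only a sketch in Python, not the authors' R workflow or a full recursive tree. The function names, the component labels and all data values are hypothetical.

```python
def best_split(feature, target):
    """Return (threshold, sse) for the single split with the lowest squared error."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = (None, float("inf"))
    levels = sorted(set(feature))
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2               # midpoint between adjacent values
        left = [t for f, t in zip(feature, target) if f <= threshold]
        right = [t for f, t in zip(feature, target) if f > threshold]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (threshold, cost)
    return best


def rank_components(features, target):
    """Order candidate components by the error left after their best split."""
    scores = {name: best_split(values, target)[1] for name, values in features.items()}
    return sorted(scores, key=scores.get)


# Hypothetical feature values for three bearings (not taken from the dataset).
target = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5]         # bearing whose health is analysed
features = {
    "bearing_2": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "bearing_3": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
}
ranking = rank_components(features, target)
```

In this toy example the first-ranked component would be the natural candidate for the 'predictor' role and the next one for the 'moderator' role.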

The Stages within the Proposed Approach
As mentioned in Section 1, the BDQRA method is a hybrid method that combines a collection of big data techniques. It was developed and tested as a quantitative risk analysis technique to determine deterioration in the health condition of process machinery using data from the operations of the machinery. Generally, the health of a component of the operation machinery is deemed to have deteriorated when the operation of the machine fails or when abnormalities occur within parts of the operation of the machine. The deterioration could arise independently or be caused by a remote event, including interaction of the various components in the form of a loop. We apply the BDQRA method to a bearing operation dataset because bearings form part of the operation of many machines used in industrial applications. Sections 5.1 to 5.5 detail the stages within the BDQRA method.

Data Selection and Data Quality Investigation Stage
The initial step of the method involves obtaining information about the process machinery to which the method will be applied. Data from the operation of the process machinery are then obtained for analysis. This is followed by assessing the quality and attributes of the data using the information available in the metadata. Other attributes of the data, such as the sampling rate or sampling times, are also investigated using at least 10% of the data selected at random.

Data Mining and Exploration Stage
The second stage involves applying descriptive statistics to investigate the completeness of the data in the data files by looking for missing observations such as nulls and N/As. Where missing observations are found, we apply appropriate techniques to handle them. Pearson's correlation analysis was used to investigate any potential relationships which may exist between the operations of the components.
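The completeness check and the correlation analysis in this stage can be sketched as follows. The sketch is in Python rather than the authors' R, the function names and values are hypothetical, and dropping incomplete pairs before computing Pearson's r is an assumption made for illustration. The source does not state which missing-data technique was used.

```python
import math


def count_missing(values):
    """Count entries that are None or NaN."""
    return sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )


def pearson(x, y):
    """Pearson's r over the pairs in which both observations are present."""
    pairs = [(a, b) for a, b in zip(x, y) if count_missing([a, b]) == 0]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-file feature values for two components.
bearing_a = [0.10, 0.12, None, 0.15, 0.18, 0.21]
bearing_b = [0.20, 0.24, 0.27, 0.30, float("nan"), 0.42]

missing_a = count_missing(bearing_a)
missing_b = count_missing(bearing_b)
r = pearson(bearing_a, bearing_b)
```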

Data Condensing Stage
At this stage, the data obtained from the operations of the process machinery are condensed to sizes which can be handled within the memory of a PC. Since data obtained from the operation of some components of process machinery, e.g. bearings, are generally suppressed by noise due to the small magnitudes of bearing vibration frequencies (Shirong et al., 2014), the fast Fourier transform (FFT) is used to obtain key statistical and time-domain features. Other techniques, such as direct calculation, are also used to obtain features whose information may be missing from the data. For instance, the speed and contact angle of the component bearing were given in 'rpm' and 'degrees', so the appropriate equations were applied to convert them into Hz and rad. The data are also arranged according to the time sequence of the source files to create bearing-specific data files, which are timestamped according to the time on each test file in the dataset. The bearing-specific data files are then combined into a new data frame and saved to storage media.
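The unit conversions mentioned above are standard: a speed in rpm is divided by 60 to give Hz, and an angle in degrees is multiplied by π/180 to give radians. A small Python sketch follows. The function names are hypothetical, the 2000 rpm figure comes from the dataset description, and the contact angle value is a placeholder, since the source does not state it.

```python
import math


def rpm_to_hz(rpm):
    """Convert rotational speed from revolutions per minute to Hz."""
    return rpm / 60.0


def degrees_to_radians(degrees):
    """Convert an angle from degrees to radians."""
    return degrees * math.pi / 180.0


shaft_hz = rpm_to_hz(2000)                       # 2000 rpm from the dataset description
contact_angle_rad = degrees_to_radians(15.0)     # placeholder angle, not from the source
```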

Investigation of Bearing Health Condition Stage
We explore the bearing-specific datasets using the approach discussed in Section 4. However, if the bearing has already suffered a deterioration of health at the onset of the operation, or is healthy but was not properly secured within the operation machinery before the process machinery was started, the time series profile of the vibration data will show a disturbance from the onset of the bearing operation. Additionally, the time indices of the onset and offset of the bearing health deterioration determined by strucchange and changepoint would be relatively similar.
We then plot the RMS of the data, an approach successfully applied by Wu et al. (2017), to confirm the outcome of our investigation by comparing it with the time indices detected by strucchange and changepoint as the onset and offset of the bearing health deterioration. Wu et al. reveal that bearing health-related issues appear as changes in the trend of the RMS profile plot. As a result, we expect the onset and offset of the bearing health deterioration detected by our method to approximate the time indices of any changes in the trend of the RMS profile plot of the data.
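The idea of locating the time index at which an RMS profile changes level can be illustrated with a deliberately simple single change-in-mean search. The sketch below picks the split that minimises the squared error of the two resulting segments. It is written in Python, the function name and the profile values are hypothetical, and it does not reproduce the algorithms of the R packages strucchange or changepoint used in the study.

```python
def single_changepoint(series, min_size=2):
    """Return the first index of the second segment for the best two-segment fit."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_index, best_cost = None, float("inf")
    # Every candidate split leaves at least min_size points on each side.
    for i in range(min_size, len(series) - min_size + 1):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Hypothetical RMS profile: stable operation followed by a jump in level.
rms_profile = [0.10, 0.11, 0.10, 0.09, 0.10, 0.11, 0.30, 0.32, 0.31, 0.33]
onset = single_changepoint(rms_profile)
```

Comparing an index found this way with the indices reported by two independent detection techniques is the kind of cross-check the paragraph above describes.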

Investigation of Contributions from Other Components Stage
We investigate the contribution of other components of the process machinery to the deterioration of health suffered by the component bearing. This contribution may be due to relationships between the operation of the bearing which suffered the health deterioration and that of the other components of the process machinery through interactions up to the point of the imminent deterioration. We investigate this relationship with Pearson's correlation plots and significance tests.
Type II ANOVA was preferred to Type I and Type III for this study. In Type I ANOVA, the order of the variables generally matters. The position of the components used as the predictor and moderators in the model (i.e. first or second) makes a difference because the first variable in the regression model is compared to a model with just the intercept, while the second variable is compared to a model with the first variable and the intercept. As a result, changing the order of the variables within the regression model produces different outcomes when the predictors are correlated. Type III ANOVA gets around this issue by assessing each predictor in the regression model, including the interactions, against a regression model which includes every variable but that predictor. However, testing interactions without one of the main effects could lead to a largely meaningless outcome because of its sensitivity to the main effects and to any missing observations. The study therefore applied Type II ANOVA, which does not suffer from the issues observed with Type I and Type III ANOVA because each main effect is tested against the other main effects in the regression model but not the interactions. Thus, each effect is easily interpreted as its unique contribution to the prediction of the model.
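The difference between sequential (Type I) and Type II sums of squares discussed above can be made concrete by comparing the residual sums of squares of nested least-squares fits. The Python sketch below is an illustration under stated assumptions, not the authors' R analysis: the helper names are hypothetical, the data are invented, and the fits use the normal equations solved by Gauss-Jordan elimination, which is adequate only for small, well-conditioned examples.

```python
def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * v for b, v in zip(beta, X[i]))) ** 2 for i in range(n))


def type2_ss(x, m, y):
    """Type II sums of squares for a predictor x, a moderator m and their interaction."""
    xm = [a * b for a, b in zip(x, m)]
    rss_main = ols_rss([x, m], y)
    rss_full = ols_rss([x, m, xm], y)
    return {
        "predictor": ols_rss([m], y) - rss_main,    # x given m, interaction excluded
        "moderator": ols_rss([x], y) - rss_main,    # m given x, interaction excluded
        "interaction": rss_main - rss_full,         # x:m given both main effects
        "residual": rss_full,
    }


# Invented, correlated predictor and moderator values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
m = [0.5, 1.0, 1.2, 2.1, 2.4, 3.2, 3.3, 4.1]
y = [3.1, 5.4, 7.2, 10.9, 12.8, 17.1, 18.9, 23.5]

ss = type2_ss(x, m, y)
# Type I sum of squares for x when it is entered first (compared with intercept only).
type1_x_first = ols_rss([], y) - ols_rss([x], y)
```

With correlated predictors, `type1_x_first` and `ss["predictor"]` differ markedly, whereas the Type II values are unchanged if the roles of first and second variable are swapped.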
We then use effect plots to visualise the predicted values of the outcome for given values of the component used as the predictor in the regression models. This helps explain how we select the appropriate model fit for the analysis. We also explore the nature of the significant interaction, to understand the form it takes, using Johnson-Neyman plots and simple slope analysis. This helps to describe the relationship between the operation of the component bearing which suffered the health deterioration and low, medium and high operation values of the component of the machinery whose operations contributed most to the health deterioration of the bearing.
We then proceed with the analysis using regression decision trees to investigate the significant Pearson's correlations of the interactions up to time indices 837 and 968. The plot of the regression decision tree (Fig. 9) reveals that Bearing 3 is the main component in the process machinery whose operations made a dominant contribution to the operations of Bearing 1. This is followed by Bearing 2, whose operations appear to influence the contributions made by the operations of Bearing 3 to those of Bearing 1. As a result, we proceed with linear regression modelling to help predict the effect of the interaction of the operations of the component bearings, using Bearing 1 as the dependent component, Bearing 3 as the predictor and Bearing 2 as the moderator. The summary output of the linear regression modelling is presented in Table 3.
The summary output in Table 3 suggests that the variables explain more of the outcome in the interaction models than in the general models. Additionally, the interaction models have lower residual standard errors than the general models.
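The simple slope analysis mentioned above rests on a standard identity: in a model of the form y = b0 + b1·x + b2·m + b3·x·m, the slope of the predictor x at a given moderator value m is b1 + b3·m. The Python sketch below evaluates that slope at low, medium and high moderator levels, taken here as the mean minus one standard deviation, the mean, and the mean plus one standard deviation. The function name, coefficients and moderator values are hypothetical and are not results from the study.

```python
import statistics


def simple_slopes(b_predictor, b_interaction, moderator_values):
    """Slope of the predictor at low, medium and high levels of the moderator."""
    mean = statistics.mean(moderator_values)
    sd = statistics.stdev(moderator_values)      # sample standard deviation
    levels = {"low": mean - sd, "medium": mean, "high": mean + sd}
    return {
        name: b_predictor + b_interaction * level
        for name, level in levels.items()
    }


# Hypothetical coefficients and moderator observations.
moderator = [2.0, 4.0, 6.0, 8.0]
slopes = simple_slopes(b_predictor=0.5, b_interaction=0.1, moderator_values=moderator)
```

A positive interaction coefficient, as in this toy example, means the predictor's effect strengthens as the moderator increases.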

Research Findings and Discussion
The model matrices (Table 4) reveal that the AIC, the BIC and the significant p-values of the interaction models are comparatively lower than those of the general models. Investigation of the significant interactions with Type II ANOVA confirms the statistically significant interactions observed in the linear regression models (Table 5). Thus, there appears to be some contribution from the operations of Bearing 3 to the health deterioration of Bearing 1. This contribution is influenced by moderation from the operations of Bearing 2. The plots (Fig. 10) show that, at the 95% confidence interval, the model fits better when the moderation from the operation of Bearing 2 approaches one standard deviation above the mean than when it is at the mean or one standard deviation below the mean.
This gave a difference of 23.616 seconds for the entire duration of the bearing's lifespan.
The actual RUL was given as 55.3 sec by Widodo & Yang (2011) and Wu, Li & Qiu (2017). From this, we obtained the expected time for the risk event as 952.316 seconds. We then calculated the accuracy of our prediction from Detected, the time of the bearing health deterioration detected by the BDQRA method, and Actual, the expected time of the bearing health deterioration. This gave a prediction accuracy of 99.98%, which indicates good performance of the BDQRA method in predicting the time of the health deterioration of the bearing.

Conclusion
Accurate prediction of the point of imminent health deterioration of a component bearing in process machinery is crucial for health monitoring of the machinery and helps prevent conditions which could lead to fatal incidents. Most of the literature on health monitoring employs the RUL prediction approach, that is, predicting the remaining useful life of the bearing. This paper applied the BDQRA method, which was developed and tested with historical data for quantitative risk analysis but can be used to predict health issues in the bearing components of machinery. The method can also be used to analyse real-time data.
The advantage of this method is that it is a hybrid method which uses two change-point analysis techniques to detect the time at which imminent bearing health deterioration could occur. As a hybrid method, it incorporates a collection of big data techniques including time series analysis, change-point analysis, RMS, FFT, regression decision tree analysis, regression analysis, ANOVA and interaction effect. Each of these methods could be applied on its own to determine bearing health issues, including bearing health deterioration, from bearing vibration data.

We were inspired to take a different approach to predicting bearing health deterioration, and we expect the BDQRA method to be of help in future research. We verified our findings by comparing them with the findings of other bearing health indication methods that used the same data and were published in peer-reviewed journals. The results reveal that our hybrid BDQRA method is very effective, has some advantages over some of the published methods, and can be applied as an effective tool in practical industrial applications.