COMPARISON OF BAYESIAN AND FREQUENTIST SURVIVAL ANALYSIS METHODS IN MODELLING OF PROSTATE CANCER SURVIVORSHIP IN OYO STATE, NIGERIA

Abi Roland, Fagbamigbe Adeniyi, Akinwande Mathew

ABSTRACT
Background: Survival analysis is a statistical method for modelling the probability that a subset of a given population will survive past a certain time. In the context of cancer, this probability would represent the chance of tumour recurrence or of remission (i.e. being disease-free). This study seeks to compare the traditional frequentist approach and the Bayesian approach to survival analysis in estimating prostate cancer (CaP) survivorship and its predictors. Prostate cancer starts when healthy cells in the prostate gland change and grow out of place, forming a mass called a tumour.
Method: A retrospective analytical study design was employed, through extraction of the case files of patients diagnosed and treated for CaP from January 2010 to December 2017 at UCH, Ibadan. The extracted data were further divided into two cohorts (2010-2014) and (2015-2017). A proforma was used for extraction, comprising the following sections: socio-demographics, clinical/pathological characteristics, date of diagnosis, date last seen, and treatment received.
Descriptive statistics were used to describe these characteristics. The survival probability was determined by the Kaplan-Meier (KM) method. Cox proportional hazard (CPH), Weibull accelerated failure time (AFT) and Bayesian Weibull AFT (using a normal prior distribution) models were used to determine predictors of survivorship.

Results:
The average age of the patients was 72 years, with peak incidence of CaP among those aged 70-79 years. Most patients (87.3%) were diagnosed at stage IV, with many having metastasis to the spine. Among the patients, 33.6% received chemotherapy and surgery. Patients from the North Central region had the highest median survival (MS) time in the 2015-2017 cohort. The overall MS time was 2.9 months for the 2010-2014 cohort and 3.3 months for the 2015-2017 cohort, while the overall MS time for the study was 3.2 months. Patients treated with chemotherapy and

Introduction
Survival analysis is a method for modelling the time to the occurrence or recurrence of events. It is also called reliability theory or reliability analysis in the field of engineering. In general, survival analysis is a statistical modelling approach that involves time-to-event data. Important aspects of survival data are censoring and non-normality. Non-normality of the data violates the normality assumption of most commonly used statistical methods, such as multiple linear regression (Oakes, 1972). The objective of survival analysis is to answer the question: what is the probability that a subset of a given population will survive past a certain time? In the context of cancer, the event of interest may be death, relapse, recurrence of the tumour, or remission (i.e. being disease-free) of a cancer patient.

Frequentist methodology of survival analysis
Much of the development of frequentist methodology is attributed to the work carried out by R.A. Fisher in the early 20th century. The frequentist approach to statistics evaluates procedures based on imagining repeated sampling from a particular model (the likelihood), which defines the probability distribution of the observed data conditional on unknown parameters. Properties of the procedure are evaluated in this repeated sampling framework for fixed values of unknown parameters; good procedures perform well over a broad range of parameter values (Gandenberger, 2014a, 2014b, 2014c). Survival analysis in the frequentist framework is based on non-parametric (Nelson-Aalen, Kaplan-Meier), semi-parametric (Cox proportional hazard) and parametric (Weibull, exponential, log-logistic, etc.) models.

Bayesian methodology of survival analysis
The introduction of Bayesian methodology can be attributed to a posthumous paper titled "An Essay towards solving a Problem in the Doctrine of Chances" by the Reverend Thomas Bayes in 1763 (Bayes, 1763). Despite this early inception, frequentist methods remain the dominant methodology in modern statistics. In part, this was due to the lack of modern sampling techniques, which require substantial computing power (Ibrahim, Chen, & Sinha, 2005). Indeed, for all but the simplest Bayesian models, computation was complex and it was often impossible to find an analytical solution. Whilst advances in theory and computing have made Bayesian methods more accessible, their use in practice still remains somewhat limited.
In estimating the parameter $\theta$ associated with data $x$, the frequentist approach evaluates $P(x \mid \theta)$, where $P(\cdot)$ represents some probability density (Jackson, 2015). Bayesian methodology considers the data, once observed, to be a fixed quantity and allows the parameters to be the random variables, thus instead evaluating $P(\theta \mid x)$:

$$P(\theta \mid x) = \frac{P(x \mid \theta)\,P(\theta)}{P(x)}$$

where $P(\theta \mid x)$ is the posterior probability, $P(x \mid \theta)$ is the likelihood, $P(\theta)$ is the prior probability and $P(x)$ is the marginal probability.
All inferences are made on summaries of the posterior probability density; the posterior is a product of the information obtained from the data and the information taken from prior knowledge. Note that the marginal density $P(x)$ is simply the probability of observing the data, which does not depend on the parameter $\theta$ (Berger, 1985; Box & Tiao, 1992).
Prostate cancer starts when healthy cells in the prostate gland change and grow out of place, forming a mass called a tumour (ASCO, 2015). The prostate is a walnut-sized organ located between the bladder and the male reproductive organ, and it slowly grows larger, to an average weight of 40 grams, in ageing men. The prostate gland surrounds the urethra, which empties urine from the bladder, and also secretes prostate fluid that protects sperm. These physiological functions may be compromised during various prostate diseases, including prostatitis, benign prostatic hyperplasia or hypertrophy, and cancer (Kgatle, Kalla, Islam, Sathekge, & Moorad, 2016). The size of the prostate changes with age (ACS, 2016). Pathologic evidence suggests that neoplastic changes of the prostate epithelium develop early in a man's adult life but do not become clinically evident or problematic until decades later. Some patients live out their lives with prostate cancer that remains stable for decades without treatment. In other cases, the cancer grows aggressively, responds poorly to therapy, and causes death within a few years (Sunny, 2005).
Clinically detected prostate carcinomas display a variety of phenotypes and malignant potential. The majority of prostate carcinomas are typical adenocarcinomas, which can be divided into different tumour grades. The histological differentiation, together with the tumour stage (determined by tumour size) and the presence of lymph node and distant metastases, is used to assess the prognosis of patients (DF, 1992). Epidemiological studies have shown that there are environmental as well as genetic contributions to the development of prostate cancer. Prostate cancer is a heterogeneous disease, and there are large variations from patient to patient and even within the same tumour. These differences in prostate cancer architecture and incidence rates may be attributable to genomic instabilities and alterations associated with various prostate cancer risk factors.
Cancer of the prostate is primarily a disease of elderly men. About three-quarters of cases worldwide occur in men aged 40 years and above, with increasing incidence and morbidity in men of black African ancestry (Fernandez et al., 2012). In 2012, 1.1 million men were diagnosed with CaP and 70% of them (795,000 cases) were in developed countries (Mohammadian-hafshejani, Ghoncheh, Towhidi, Jamehshorani, & Salehiniya, 2016). Available reports indicate that CaP accounts for approximately 20.3% and 4.8% of all male cancers in Sub-Saharan Africa and North Africa, respectively (Cooney, Okuku, & Orem, 2016; Fernandez et al., 2012). CaP is on an upward trajectory among Nigerian men and remains one of the most common cancers among men in this region (Akinremi, Adeniyi, Olutunde, Oduniyi, & Ogo, 2014). A prevalence rate of 1046 per 100,000 screened men aged over 40 years was reported in Lagos (Ikuerowo et al., 2013). Compared to African-American men, Nigerian men are 10 times more likely to have prostate cancer and 3.5 times more likely to die from it. Environmental and, most importantly, genetic factors have been implicated as the reason for the geographic differences in incidence. Hence, this study seeks to determine the survivorship rate and to compare and contrast the effectiveness of frequentist and Bayesian approaches in modelling the determinants of survivorship among prostate cancer patients.

Study design
A retrospective analytical study design was employed. This was done through a review and extraction of hospital records; the case files of all patients diagnosed and treated for prostate cancer from January 2010 to December 2017 at the University College Hospital (UCH), Ibadan were included in the study.

Study setting
The study was carried out at the radiation oncology department of UCH, Ibadan, and included all patients receiving treatment for prostate cancer there. The department offers services in the areas of counselling/psychotherapy, group meetings with patients for problem sharing, alleviation of psychological concerns, simulation, tumour localization procedures, palliative care, holistic management for cancer sufferers and their families, external beam radiation therapy, radiotherapy treatment for cancer patients using a Cobalt 60 therapy machine, gynaecological brachytherapy, ward admission, and clinic services.

Study Population
The study population consisted of all patients who were managed for prostate cancer at the University College Hospital, Ibadan from January 2010 to December 2017.

Data Collection Procedure
A proforma was used to extract relevant data from patients' records, with the assistance of medical record clerks. The extracted data were divided into two cohorts (2010-2014) and (2015-2017). A total of 354 patients' records were extracted. In the 2010-2014 cohort, 110 patients were disease-free while 54 were censored; in the 2015-2017 cohort, 115 patients were disease-free while 75 were censored. A follow-up time of 15 months was observed in each cohort. To ensure data reliability, the extracted records were cross-checked by a radiotherapist. The sections in the data collection form included patients' socio-demographics, diagnosis, clinical and pathological features, treatment method, and follow-up.

Dependent variable
The dependent variable was the time to being disease-free (remission) after treatment, measured in months from the time of presentation to the time the patient was last seen. A patient was declared disease-free when there was no residual disease on medical examination after completion of treatment. Patients who were still under follow-up, were lost to follow-up, or died from CaP or other diseases were censored.

Independent variables
The

Determination of survival probabilities
Due to the presence of right-censored data, the Kaplan-Meier method was used in this study.
Suppose we have p distinct survival times arranged in increasing order, say $t_1 < t_2 < \dots < t_p$. At time $t_i$ there are $n_i$ subjects who are said to be at risk, i.e. they survived up to this time and were not censored. The number of subjects who failed at time $t_i$ is denoted by $d_i$. Consider the recursive equation

$$S(t_i) = S(t_{i-1})\left(1 - \frac{d_i}{n_i}\right)$$

where $t_0 = 0$ and $S(t_0) = 1$. Therefore, the survival function is given as

$$S(t) = \prod_{i:\, t_i \le t}\left(1 - \frac{d_i}{n_i}\right)$$

The survival time t is the time period (in months) to being disease-free, from the time of diagnosis to the time last treated/seen, since all patients are at risk of this event. The censoring index for patients who were disease-free is "1", whereas patients who were still undergoing treatment, died from CaP or a disease other than CaP, or could no longer be observed (loss to follow-up) were declared censored and coded as "0". The log-rank test was used to test the equality of survival curves.
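The product-limit calculation described above can be illustrated with a short sketch. This is a minimal pure-Python version written for illustration only; the function names and the toy data are hypothetical and are not taken from the study, and subjects censored at an event time are assumed to still be at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return a list of (t_i, n_i, d_i, S(t_i)) for each distinct event time.

    times  : follow-up time of each subject (e.g. in months)
    events : 1 if the event (being disease-free) was observed, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # each subject leaves the risk set at their own time
    at_risk = len(times)
    surv = 1.0
    table = []
    for t in sorted(exit_counts):
        d = event_counts[t]
        if d > 0:
            # Recursive step: S(t_i) = S(t_{i-1}) * (1 - d_i / n_i)
            surv *= 1.0 - d / at_risk
            table.append((t, at_risk, d, surv))
        at_risk -= exit_counts[t]
    return table


def median_time(table):
    """Smallest event time at which S(t) drops to 0.5 or below, else None."""
    for t, _, _, s in table:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy data (months, event indicator); not from the study.
times = [1, 2, 2, 3, 4, 5, 6, 6]
events = [1, 1, 0, 1, 0, 1, 1, 0]
km = kaplan_meier(times, events)
for t, n, d, s in km:
    print(f"t={t}  at risk={n}  events={d}  S(t)={s:.3f}")
print("median time:", median_time(km))
```

In practice a survival library or statistical package would normally be used; the sketch only makes the recursion and the handling of censored subjects explicit.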

Cox Proportional hazard model
As a nod to the proportional hazard assumption, consider two subjects for which the values of the

The word "accelerated" is used in describing these models because, rather than assuming that failure time $t_j$ is exponential, Weibull, or some other form, a distribution is instead assumed for $\tau_j = \exp(-x_j\beta_x)\,t_j$, and $\exp(-x_j\beta_x)$ is called the acceleration parameter. Assuming that the survival time follows a Weibull distribution with scale parameter $\beta_0$ and shape parameter $p$, $\tau_j$ is distributed as Weibull with parameters $(\beta_0, p)$.
The Weibull hazard and survivor functions follow from this distribution; since the effect of the covariates is to accelerate time by a factor of $\exp(-x_j\beta_x)$, the AFT model is obtained by evaluating them at $\tau_j$. The "streg" command in Stata 15 was used to obtain the maximum likelihood estimates for the Weibull AFT model by specifying the "tr" (time ratio) option. Essentially, a time ratio above one implies a deceleration of time or, in other terms, an increase in the expected waiting time for the event of interest to occur. A time ratio (TR) > 1 therefore indicates that an individual experienced a delay (a longer time) to being disease-free, TR = 1 implies no difference in time to being disease-free, while TR < 1 indicates that an individual became disease-free earlier.
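As a hedged sketch of the functions referred to above, one common way of writing the Weibull model in the AFT metric is shown below. The symbol $\lambda_j$ and the exact parameterization (in particular, $\beta_0$ entering as an intercept on the log-time scale) are assumptions made here for illustration and may differ from the form the authors used.

```latex
\begin{aligned}
\tau_j &= \exp(-x_j\beta_x)\,t_j, \\
S(t_j) &= \exp\{-\exp(-p\beta_0)\,\tau_j^{\,p}\} = \exp(-\lambda_j t_j^{\,p}),
  \qquad \lambda_j = \exp\{-p(\beta_0 + x_j\beta_x)\}, \\
h(t_j) &= -\frac{d}{dt_j}\log S(t_j) = p\,\lambda_j\,t_j^{\,p-1}, \\
t_j(q) &= \exp(\beta_0 + x_j\beta_x)\,\{-\log(1-q)\}^{1/p}
  \quad \text{(time by which a fraction } q \text{ has experienced the event).}
\end{aligned}
```

Under these assumptions, raising covariate $k$ by one unit multiplies every quantile $t_j(q)$ by $\exp(\beta_k)$, which is the time ratio: a ratio above one stretches the time to the event and a ratio below one shortens it.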

Bayesian Survival Analysis
Bayesian analysis generates conclusions based on the synthesis of new information from the observed data and previous knowledge or external evidence. The Weibull model is one of the accelerated failure time models that can be used to fit survival data under the Bayesian framework.
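Let $t_i$ denote the survival time of the $i$-th patient. Assume that $t_i$ has a Weibull distribution with parameters $\alpha > 0$ and $\gamma > 0$, with a density function of the form:

$$f(t_i \mid \alpha, \gamma) = \alpha\gamma\,t_i^{\alpha-1}\exp(-\gamma t_i^{\alpha})$$

The survival function of $t_i$ is given by

$$S(t_i \mid \alpha, \gamma) = \exp(-\gamma t_i^{\alpha})$$

The hazard function is given by

$$h(t_i \mid \alpha, \gamma) = \alpha\gamma\,t_i^{\alpha-1}$$

The likelihood function of the unknown parameters $(\alpha, \gamma)$ given the data can be written as:

$$L(\alpha, \gamma \mid D) = \prod_{i=1}^{n} f(t_i \mid \alpha, \gamma)^{\nu_i}\,S(t_i \mid \alpha, \gamma)^{1-\nu_i}$$

where $\nu_i$ is an indicator variable taking the value 1 if $t_i$ is a failure time (i.e. the patient became disease-free) and 0 if $t_i$ is right censored (lost to follow-up, death, still under follow-up).
To incorporate covariates we therefore write $\lambda_i = \log\gamma_i = x_i'\beta$, where $x_i$ and $\beta$ are $p \times 1$ vectors of covariates and regression coefficients, respectively.
Assuming a normal prior for $\beta$, $\beta \sim N_p(\mu_0, \Sigma_0)$, the joint posterior is proportional to the product of the likelihood and the prior, where $D = (n, t, X, \nu)$ denotes the observed data for the regression model, $X$ is an $n \times p$ matrix of covariates with $i$-th row $x_i'$, and $\nu = (\nu_1, \dots, \nu_n)'$.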
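To make the structure of the posterior concrete, the following sketch evaluates an unnormalized log-posterior for a Weibull regression with right censoring. Several things here are assumptions rather than the authors' specification: the parameterization (density proportional to alpha * gamma * t^(alpha-1) * exp(-gamma * t^alpha) with log(gamma_i) = x_i' beta), the use of independent normal priors (a diagonal prior covariance), a flat prior on the shape parameter, and all function names and toy data.

```python
import math


def weibull_log_likelihood(alpha, beta, times, X, nu):
    """Log-likelihood of a Weibull regression with right censoring.

    alpha : shape parameter (> 0)
    beta  : regression coefficients, with log(gamma_i) = x_i' beta (assumed form)
    times : observed times t_i
    X     : covariate rows x_i
    nu    : 1 if t_i is an observed event time, 0 if right censored
    """
    if alpha <= 0:
        return -math.inf
    ll = 0.0
    for t_i, x_i, nu_i in zip(times, X, nu):
        lam = sum(x * b for x, b in zip(x_i, beta))  # lambda_i = x_i' beta
        if nu_i == 1:
            # Observed events add the log-hazard: log(alpha) + lambda_i + (alpha - 1) log t_i
            ll += math.log(alpha) + lam + (alpha - 1.0) * math.log(t_i)
        # Every subject contributes log S(t_i) = -exp(lambda_i) * t_i^alpha
        ll -= math.exp(lam) * t_i ** alpha
    return ll


def normal_log_prior(beta, mu0, sd0):
    """Sum of independent normal log-densities (diagonal prior covariance assumed)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((b - m) / s) ** 2
        for b, m, s in zip(beta, mu0, sd0)
    )


def log_posterior(alpha, beta, times, X, nu, mu0, sd0):
    """Unnormalized log-posterior: log-likelihood plus log-prior (flat prior on alpha)."""
    return weibull_log_likelihood(alpha, beta, times, X, nu) + normal_log_prior(beta, mu0, sd0)


# Hypothetical toy data: intercept plus one binary covariate; not from the study.
times = [2.0, 3.5, 1.0, 4.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
nu = [1, 0, 1, 1]
value = log_posterior(1.2, [-1.0, 0.3], times, X, nu, mu0=[0.0, 0.0], sd0=[10.0, 10.0])
print(value)
```

Note that in this assumed parameterization a positive coefficient raises the hazard; the corresponding coefficient on the accelerated failure time scale is $-\beta/\alpha$, so signs should be checked before comparing with time ratios.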
The "bayes, streg, tr" command in Stata 15 was used to fit the Bayesian AFT model using the random-walk Metropolis-Hastings Markov chain Monte Carlo (MCMC) sampling method for simulating the parameters $\theta$. The Metropolis-Hastings algorithm is as follows. Let $q(\theta, \vartheta)$ be a proposal density, also termed a candidate-generating density, such that $\int q(\theta, \vartheta)\,d\vartheta = 1$. Also, let $U(0,1)$ denote the uniform distribution over (0,1). Then a general version of the Metropolis-Hastings algorithm for sampling from the posterior distribution $\pi(\theta \mid D)$ can be described as follows:
Step 0. Choose an arbitrary starting point $\theta_0$ and set $i = 0$.
Step 1. Generate a candidate point $\theta^*$ from $q(\theta_i, \cdot)$ and $u$ from $U(0,1)$.
Step 2. Set $\theta_{i+1} = \theta^*$ if $u \le a(\theta_i, \theta^*)$ and $\theta_{i+1} = \theta_i$ otherwise, where the acceptance probability is $a(\theta, \vartheta) = \min\{\pi(\vartheta \mid D)\,q(\vartheta, \theta) / [\pi(\theta \mid D)\,q(\theta, \vartheta)],\ 1\}$.
Step 3. Set $i = i + 1$ and go to Step 1.
The iteration through Steps 0 to 3 is referred to as an MH update. By design, any Markov chain simulated using this MH algorithm is guaranteed to have the posterior distribution as its stationary distribution.
Criteria for measuring the efficiency of MCMC are the acceptance rate of the chain and the degree of autocorrelation in the generated sample. When the acceptance rate is close to 0, most of the proposals are rejected, which means that the chain has failed to explore regions of appreciable posterior probability. The reference value used for the acceptance rate is 0.234 for a multivariate posterior and 0.45 for a univariate posterior.
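The random-walk variant of the algorithm, together with the acceptance rate discussed above, can be sketched as follows. This is an illustrative implementation only, not the sampler used by Stata: the Gaussian proposal, the step size, the seed and the standard normal target are all assumptions chosen so that the example is self-contained.

```python
import math
import random


def random_walk_mh(log_target, theta0, step, n_iter, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    log_target : function returning the unnormalized log-density at theta
    theta0     : starting point (list of floats) with finite log-density
    step       : standard deviation of the proposal increments
    Returns (draws, acceptance_rate).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_target(theta)
    draws = []
    accepted = 0
    for _ in range(n_iter):
        # Propose a candidate around the current point and draw u ~ U(0, 1)
        candidate = [th + rng.gauss(0.0, step) for th in theta]
        u = rng.random()
        cand_value = log_target(candidate)
        # The proposal is symmetric, so q cancels in the acceptance probability:
        # a = min(1, target(candidate) / target(theta))
        accept_prob = math.exp(min(0.0, cand_value - current))
        if u < accept_prob:
            theta, current = candidate, cand_value
            accepted += 1
        draws.append(list(theta))
    return draws, accepted / n_iter


def log_std_normal(theta):
    """Unnormalized log-density of independent standard normals (toy target)."""
    return -0.5 * sum(x * x for x in theta)


draws, rate = random_walk_mh(log_std_normal, [0.0], step=2.4, n_iter=20000, seed=1)
mean = sum(d[0] for d in draws) / len(draws)
print(f"acceptance rate = {rate:.3f}, sample mean = {mean:.3f}")
```

Replacing `log_std_normal` with a model's log-posterior gives a basic sampler; the step size is what one would tune if the acceptance rate drifts far from the reference values quoted above.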

Cox proportional regression
The bivariate regression models showed that widowers in the 2015-2017 cohort had 3.3 times the hazard of being disease-free (HR = 3.3; 95% CI: 1.0-10.5) compared to patients who were currently married (Table 4).

Parametric survival analysis (Weibull distribution)
Results from the bivariate Weibull regression model showed that widowers in the 2015-2017 cohort experienced a significant 70% reduction in time to being disease-free (TR = 0.3; 95% CI: 0.1-0.9) compared to patients who were currently married (Table 6).
Patients in the 2015-2017 cohort with a moderately differentiated Gleason score experienced a significant 50% reduction in time to being disease-free (TR = 0.5; 95% CI: 0.3-0.9).
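As an arithmetic aid for reading the ratios reported in this section, the helper below converts a hazard ratio or time ratio into a percentage change relative to the reference group. It assumes the usual convention that a time ratio compares the expected time to the event with that of the reference group; the function names are hypothetical.

```python
def percent_change(ratio):
    """Percentage change implied by a ratio relative to the reference group (ratio = 1)."""
    return (ratio - 1.0) * 100.0


def describe_time_ratio(tr):
    """Plain-language reading of a time ratio from an AFT model."""
    change = percent_change(tr)
    if change < 0:
        return f"{abs(change):.0f}% shorter time to event"
    if change > 0:
        return f"{change:.0f}% longer time to event"
    return "no difference in time to event"


for tr in (0.3, 0.5, 1.0, 1.7):
    print(tr, "->", describe_time_ratio(tr))
```

The same function applied to a hazard ratio gives the percentage change in the hazard, so a hazard ratio of 3.3 corresponds to a 230% increase rather than a 30% increase.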

Gleason group
Patients with a moderately differentiated Gleason score in the 2015-2017 cohort experienced a significant 40% reduction in time to being disease-free (TR = 0.6; 95% CrI: 0.3-0.9) compared to patients with a well-differentiated Gleason score (Table 8).

Geographical Location
Chemotherapy cohort and overall group experienced a 30% reduction in time to being disease-free, respectively (aTR = 0.7; 95% CrI: 0.6-0.9), compared to patients with metastasis to the lymph node.
Patients in the 2015-2017 cohort who were treated using a combination of chemotherapy and radiotherapy experienced a 70% delay in time to being disease-free compared to patients who underwent surgery (aTR = 1.7; 95% CrI: 1.2-2.3). Patients in the 2010-2014 cohort who received radiotherapy alone experienced a 30% reduction in time to being disease-free (aTR = 0.7; 95% CrI: 0.7-0.8) compared to patients who underwent surgery. However, individuals in the 2010-2014 cohort who were treated using chemotherapy; radiotherapy and surgery; chemotherapy and radiotherapy; and chemotherapy and surgery experienced 20%, 20%, 60% and 30% delays, respectively, in time to being disease-free compared to patients who underwent surgery alone (Table 9).

DISCUSSION
This study assessed the rate of survivorship and the determinants of survivorship among CaP patients.
The incidence of CaP was highest among patients older than seventy-two years. This is consistent with the study of Ifere, Abebe, and Ananaba (2012). CaP is known to be a disease of the aged, with very few cases observed among those below forty years, and it becomes increasingly common as men advance in age. Advanced age is associated with an aggressive form of this disease (Ji et al., 2017) and is a predictor of decreased survival rate, as also shown in the Bayesian analysis of this study.
It was observed that patients who were currently married had longer survival after treatment compared to widowers; this could be due to support received from spouses and family members. Particularly in an African setting, where cancer is seen as a death sentence, support from family members and a feeling of not being left alone could encourage quick recovery (Odedina et al., 2009). Further analysis of the data using Cox proportional hazard (CPH), Weibull parametric and Bayesian parametric regression, however, showed that widowers were more likely to be disease-free, with the Bayesian result presenting greater precision, as shown by its narrower credible interval. The highest median survival time observed in this study was among the CaP patients from the North Central region. This could be attributable to the distance these patients had to travel to receive treatment, making them better prepared, including financially, to cover all medical bills and as such to receive better care and achieve longer survival. This opinion was also shared by Zhang and Lawson (2012). Most patients either drank alcohol or smoked tobacco; these are known determinants of lung cancer, but it is not yet known whether they could be defining in CaP survivorship (Peisch et al., 2017).
Diagnosis via digital rectal examination (DRE) was popular among attending physicians, closely followed by biopsy; this is because a DRE can indicate whether a prostate biopsy is recommended for a patient, especially in more aggressive cases (James, 2014). DRE diagnosis increased survivorship time; this could be due to its ability to quickly detect hardness of the prostate, indicating the presence of CaP. DRE is known to be an independent marker for better prognosis (Migowski, 2010); however, this was not consistent with this study. It was observed under the Bayesian analysis that diagnosis via transrectal ultrasound, biopsy, and a combination of DRE and biopsy were markers for better survival prognosis.
An elevated PSA level at diagnosis is usually associated with poor disease prognosis. Although PSA measurements from different laboratories are not necessarily comparable, owing to differences in technical performance and reference standards, approximately 98% of patients with metastatic CaP will have elevated PSA (Yang & Armstrong, 2010). In this study, however, PSA level did not affect treatment outcome across the CPH, parametric and Bayesian models. Inconsistency of PSA levels during treatment was observed among patients in this facility. This could be due to lack of proper care and lack of finance for consistent treatment on the part of patients, as some patients may miss a certain number of treatments before resuming treatment.
As observed in the CPH model, patients who received chemotherapy and surgery were less likely to experience remission; this was also observed in the Weibull and Bayesian parametric regression.
Patients who received radiotherapy alone, as observed in the Bayesian analysis, became disease-free earlier. However, only a few patients received this treatment regimen; this could be due to unavailability of a radiotherapy machine, coupled with frequent breakdowns and workers' industrial action (Gillessen et al., 2018; Robinson et al., 2018; Wolff et al., 2015).
In comparing the CPH, parametric Weibull and Bayesian parametric models, the Bayesian approach performed better than the other models by extracting the most significant predictors, particularly in the 2010-2014 cohort. Some of the significant factors were in agreement with those found in earlier studies (Balogun, Role, & Dawodu, 2014; Migowski, 2010). It was observed that the Bayesian approach is a more appealing alternative for analysing and reporting time-to-event studies (Zhang & Lawson, 2012) than the traditional frequentist approach, because it provides a flexible framework for smoothing rates and representing statistical uncertainty in rate ratios (Chen et al., 2013). Although still used sparingly, the Bayesian approach is fast gaining recognition due to advances in computing power and software, as found in this study.

Limitation of the study
In carrying out this research, the researcher faced the challenge of extracting the required data, as the study involved a review of patients' medical records. Of great concern were poor record keeping and lack of follow-up, as the true state of most patients could not be determined. An important limitation is the lack of records on patients' comorbid status, lifestyle and behavioural variables.

Figure 1: Distribution of CaP patients by stage at diagnosis and site of metastasis

Figure 4: MCMC convergence diagnostics for selected characteristics (2010-2014) cohort

The independent variables for this study include patients' socio-demographic characteristics, clinical characteristics, time of diagnosis, time last seen, time last treated, treatment received and treatment outcome. A description of the coding method is given in the table below.

Information Criterion (BIC), Deviance Information Criterion (DIC). Among the (2010-2014) and (2015-2017) cohort models and the overall group, any Cox proportional hazard (CPH) or parametric regression model with consistently low values of -2LL, AIC and BIC was selected as the best-fitting model, while for Bayesian regression a model with consistently low values of -2LL, AIC, BIC and DIC and an acceptance rate of 0.2 for the multivariate model and 0.4 for the univariate model was selected as the best-fitting model.

Ethical consideration
Ethical approval was obtained from the UI/UCH Ethics Committee of the University of Ibadan (approval reference: UI/EC/18/0492).

Results
Distribution of prostate cancer (CaP) patients by sociodemographic characteristics
east while 8 (2.3%) from north central. Those with a family history of prostate cancer were 77 (21.8%), while 167 (33.0%) smoked tobacco or drank alcohol, 101 (20.0%) had coronary heart disease, 118 (23.3%) had diabetes, and 120 (23.7%) were hypertensive (Table 1).

Table 1: Distribution of prostate cancer patients by sociodemographic characteristics
Others include: taxi drivers, artisans, labourers, businessmen