The Shallow Gibbs Network, Double Backpropagation and Differential Machine learning

We have built a Shallow Gibbs Network model as a Random Gibbs Network Forest to reach the performance of the Multilayer feedforward Neural Network in a few numbers of parameters, and fewer backpropagation iterations. To make it happens, we propose a novel optimization framework for our Bayesian Shallow Network, called the {Double Backpropagation Scheme} (DBS) that can also fit perfectly the data with appropriate learning rate, and which is convergent and universally applicable to any Bayesian neural network problem. The contribution of this model is broad. First, it integrates all the advantages of the Potts Model, which is a very rich random partitions model, that we have also modified to propose its Complete Shrinkage version using agglomerative clustering techniques. The model takes also an advantage of Gibbs Fields for its weights precision matrix structure, mainly through Markov Random Fields, and even has five (5) variants structures at the end: the Full-Gibbs, the Sparse-Gibbs, the Between layer Sparse Gibbs which is the B-Sparse Gibbs in a short, the Compound Symmetry Gibbs (CS-Gibbs in short), and the Sparse Compound Symmetry Gibbs (Sparse-CS-Gibbs) model. The Full-Gibbs is mainly to remind fully-connected models, and the other structures are useful to show how the model can be reduced in terms of complexity with sparsity and parsimony. All those models have been experimented with the Mulan project multivariate regression dataset, and the results arouse interest in those structures, in a sense that different structures help to reach different results in terms of Mean Squared Error (MSE) and Relative Root Mean Squared Error (RRMSE). For the Shallow Gibbs Network model, we have found the perfect learning framework : it is the $(l_1, \boldsymbol{\zeta}, \epsilon_{dbs})-\textbf{DBS}$ configuration, which is a combination of the \emph{Universal Approximation Theorem}, and the DBS optimization, coupled with the (\emph{dist})-Nearest Neighbor-(h)-Taylor Series-Perfect Multivariate Interpolation (\emph{dist}-NN-(h)-TS-PMI) model [which in turn is a combination of the research of the Nearest Neighborhood for a good Train-Test association, the Taylor Approximation Theorem, and finally the Multivariate Interpolation Method]. It indicates that, with an appropriate number $l_1$ of neurons on the hidden layer, an optimal number $\zeta$ of DBS updates, an optimal DBS learnnig rate $\epsilon_{dbs}$, an optimal distance \emph{dist}$_{opt}$ in the research of the nearest neighbor in the training dataset for each test data $x_i^{\mbox{test}}$, an optimal order $h_{opt}$ of the Taylor approximation for the Perfect Multivariate Interpolation (\emph{dist}-NN-(h)-TS-PMI) model once the {\bfseries DBS} has overfitted the training dataset, the train and the test error converge to zero (0).

Content

Author and article information

Journal

Title: ScienceOpen Preprints

Publisher: ScienceOpen

Publication date (Electronic preprint): 10 April 2021

Affiliations

[1 ] Department of Mathematics and Statistics, Université de Montréal, 2920, chemin de la Tour, H3T 1J4, Montreal, Québec, Canada

Author information

Nonvikan Karl-Augustt ALAHASSA https://orcid.org/0000-0002-0426-3444

Article

DOI: 10.14293/S2199-1006.1.SOR-.PPS25DJ.v1

SO-VID: b49facb8-7d6c-4939-8654-289873a7f33a

License:

This work has been published open access under Creative Commons Attribution License CC BY 4.0 , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com .

History

Date received : 10 April 2021

Data availability: The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

ScienceOpen disciplines: Computer science,Statistics,Mathematics

Keywords: MultivariateRegression,Neural Networks,Probability and stochastic processes,Graphical models,Structured models,Gibbs Fields,Sparse Models ,Compound Symmetry,Double Backpropagation,Taylor Theorem