Direct Use of Neural Networks for Decision Making

In most cases the output of a neural network is a predicted value for the relevant output factor, or probabilities for its different values, and a decision is made based on these values. (Examples: a neural network can predict the future demand for a commodity, and based on this value a decision is made on the quantity of output; a neural network can predict a future price of a commodity, and based on this value a decision to buy or not to buy it is made; a neural network can estimate the probability of default for a loan, and based on this value the decision to grant or reject the loan is made.) This paper develops a methodology for using neural networks directly for making decisions, without first obtaining a predicted value for the relevant output factor (or probabilities for its different values). The presented methodology is related to reinforcement learning.


Direct Use of Neural Networks for Decision Making
Ernest Aksen, PhD, ScD
Professor, Department of Mathematical Methods in Economics, Belarus State Economic University (Minsk, Belarus)
eaksen@hotmail.com

In most cases the output of a neural network is a predicted value for the relevant output factor, or probabilities for its different values, and a decision is made based on these values (Bishop, 2006; Goodfellow, 2016; Haykin, 2008). (Examples: a neural network can predict the future demand for a commodity, and based on this value a decision is made on the quantity of output; a neural network can predict a future price of a commodity, and based on this value a decision to buy or not to buy it is made; a neural network can estimate the probability of default for a loan, and based on this value the decision to grant or reject the loan is made.) This paper develops a methodology for using neural networks directly for making decisions, without first obtaining a predicted value for the relevant output factor (or probabilities for its different values). The presented methodology is related to reinforcement learning (Sutton, 2018).
Let $x$ denote a vector of input factors, $y$ an observed output factor, $z$ a decision (which is made after $x$ gets known but before $y$ gets known), and $u(z, y)$ the utility (benefit) from decision $z$ if the output $y$ occurs. Let $f(x, \beta)$ denote a neural network, where $\beta$ is an array of estimated parameters. The function $f(x, \beta)$ can be scalar or vector-valued (or array-valued). If the function $f(x, \beta)$ is vector-valued, we denote its components as $f_j(x, \beta)$. The parameters $\beta$ are estimated based on $n$ observations $(x_i, y_i)$, $i = 1, \dots, n$. Let $X$, $Y$ and $Z$ denote the respective series of observations and decisions:

$$X = (x_1, \dots, x_n), \quad Y = (y_1, \dots, y_n), \quad Z = (z_1, \dots, z_n). \tag{1}$$

Denote:

$$F(X, \beta) = \big(f(x_1, \beta), \dots, f(x_n, \beta)\big). \tag{2}$$

Consider separately two cases: a continuous and a discrete decision variable $z$.
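For concreteness, the series (1) and the map (2) can be written out in code. This is a minimal sketch: the data values and the linear form of $f$ are placeholders chosen for illustration, not anything prescribed by the paper.

```python
import numpy as np

# Illustration of notation (1) and (2) with n = 3 observations.
X = np.array([[0.2], [0.7], [1.1]])   # series of inputs x_1, ..., x_n
Y = np.array([0.5, 1.2, 2.0])         # series of observed outputs y_1, ..., y_n

def f(x, beta):
    # A toy scalar "network" f(x, beta); here simply linear, for illustration.
    return float(np.dot(x, beta))

def F(X, beta):
    # F(X, beta) = (f(x_1, beta), ..., f(x_n, beta)), cf. notation (2).
    return np.array([f(x, beta) for x in X])

beta = np.array([1.5])
Z = F(X, beta)                        # series of decisions z_1, ..., z_n
```

Any trainable model with parameters `beta` could stand in for `f`; only the observation-wise application in `F` matters for the notation.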

Continuous variable z
In this case we interpret the output $f(x, \beta)$ of the relevant network as the value of the decision variable $z$:

$$z = f(x, \beta). \tag{3}$$

Using formula (3) we can obtain the decision $z_i$ corresponding to the $i$-th observation and to arbitrary values of the parameters $\beta$:

$$z_i = f(x_i, \beta). \tag{4}$$

Remark 1. The decision (4) is not a real decision that was actually made for the $i$-th observation. It is the decision which would have been made if the neural network (with the given values of the parameters $\beta$) had been used for making decisions.
In accordance with formulas (1), (2) and (4) we have:

$$Z = F(X, \beta). \tag{5}$$

Let $U(Z, Y)$ denote a total utility (benefit) function (over the given series of observations). For example, the function $U(Z, Y)$ can be of the form:

$$U(Z, Y) = \sum_{i=1}^{n} u(z_i, y_i), \tag{6}$$

where $u(z_i, y_i)$ denotes the utility (benefit) from decision $z_i$ for the state $y_i$. Substituting (5) into the total utility function $U(Z, Y)$ we get:

$$U\big(F(X, \beta), Y\big). \tag{7}$$

It is quite natural to estimate the parameters $\beta$ by maximizing (7):

$$\beta^* = \arg\max_{\beta} U\big(F(X, \beta), Y\big). \tag{8}$$

The optimal values $\beta^*$ of the parameters $\beta$ provide the decisions $Z^* = F(X, \beta^*)$.

As an example, consider a firm choosing its output size. According to (3), we interpret the output $f(x, \beta)$ of a relevant neural network as the firm's output size. Within the above setting, the firm's (one-period) profit $u(z, y)$ depends on the chosen output $z$ and the realized state $y$, and therefore the total profit $U(Z, Y)$ over $n$ periods equals:

$$U(Z, Y) = \sum_{i=1}^{n} u(z_i, y_i), \tag{11}$$

according to notation (1).
Substituting (4) and (5) into (11) we have:

$$U\big(F(X, \beta), Y\big) = \sum_{i=1}^{n} u\big(f(x_i, \beta), y_i\big). \tag{12}$$

The optimal values $\beta^*$ of the parameters $\beta$ provide the outputs $Z^* = F(X, \beta^*)$ which maximize the total profit (11) (over $n$ periods) for the given explanatory values $X$. In accordance with the above, a decision is made after $x$ gets known but before $y$ gets known.
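The continuous case above can be sketched in code. Everything below is assumed for illustration only: a newsvendor-style profit $u(z, y) = 3\min(z, y) - z$, a one-neuron ReLU "network", synthetic demand data, and finite-difference gradient ascent on the total profit. The paper prescribes none of these specifics, only the principle of maximizing $U(F(X, \beta), Y)$ directly, with no intermediate demand forecast.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical newsvendor-style profit (an assumed example, not from the paper):
# min(z, y) units are sold at price 3, while producing z units costs 1 per unit.
PRICE, COST = 3.0, 1.0

def u(z, y):
    return PRICE * np.minimum(z, y) - COST * z

def f(X, beta):
    # A minimal "network": z = max(w*x + b, 0), with parameters beta = (w, b).
    return np.maximum(X[:, 0] * beta[0] + beta[1], 0.0)

def total_utility(beta, X, Y):
    # U(F(X, beta), Y) = sum_i u(z_i, y_i), cf. (7) and (11).
    return u(f(X, beta), Y).sum()

# Synthetic data: demand y depends noisily on a single input factor x.
X = rng.uniform(0.0, 1.0, size=(200, 1))
Y = 2.0 * X[:, 0] + rng.uniform(0.0, 0.5, size=200)

# Estimate beta by gradient ascent on the total profit (finite differences);
# no demand forecast is ever produced, only output decisions.
beta = np.zeros(2)
for _ in range(500):
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = 1e-4
        grad[j] = (total_utility(beta + step, X, Y)
                   - total_utility(beta - step, X, Y)) / 2e-4
    beta += 0.001 * grad
```

In practice the finite-difference loop would be replaced by automatic differentiation, but the objective being climbed is the same total profit.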
Discrete variable z

In this case the decision variable $z$ takes a finite number $s$ of values; without loss of generality, $z \in \{1, \dots, s\}$. Consider a vector of probabilities of taking the different decisions:

$$p = (p_1, \dots, p_s). \tag{13}$$

Vector (13) is known as a mixed strategy. In this case we interpret the output $f(x, \beta)$ of a relevant network as the vector of probabilities of taking the different decisions:

$$f(x, \beta) = \big(f_1(x, \beta), \dots, f_s(x, \beta)\big). \tag{14}$$

So in this case

$$p = f(x, \beta). \tag{16}$$

Vector (14) can be obtained by using the softmax activation function at the output layer of a neural network.
Using formula (16) we can obtain the mixed strategy corresponding to the $i$-th observation and any given values of the parameters $\beta$:

$$p_i = f(x_i, \beta). \tag{17}$$

Remark 2. In the case of $s = 2$ the output of a neural network can be scalar and can be obtained by using the sigmoid (logistic) activation function.
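A minimal sketch of how a softmax output layer yields the probability vector (13), and how a single sigmoid output covers the $s = 2$ case of Remark 2. The activation values are arbitrary illustrative numbers, not from the paper.

```python
import numpy as np

def softmax(a):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical last-layer activations for s = 3 possible decisions.
a = np.array([1.0, 2.0, 0.5])
p = softmax(a)                 # mixed strategy: p_j >= 0 and sum_j p_j = 1

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Remark 2: for s = 2 a single sigmoid output q suffices, with p = (q, 1 - q).
q = sigmoid(1.5)
p2 = np.array([q, 1.0 - q])
```

The max-subtraction in `softmax` changes nothing mathematically but avoids overflow for large activations.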
As above, we use the notation $u(z, y)$ for the utility from the decision $z$ if the state $y$ occurs. Let $v(p, y)$ denote the expected utility from using a mixed strategy (13) if the state $y$ occurs:

$$v(p, y) = \sum_{j=1}^{s} p_j \, u(j, y). \tag{19}$$

Let $P$ denote the sequence of mixed strategies for the given series of observations:

$$P = (p_1, \dots, p_n). \tag{20}$$

Using notation (2) and (20), formulas (17) can be written as

$$P = F(X, \beta).$$

According to (19), the expected utility from using a mixed strategy (13) for the $i$-th observation is as follows:

$$v(p_i, y_i) = \sum_{j=1}^{s} p_{ij} \, u(j, y_i).$$

Let $V(P, Y)$ denote a total expected utility (benefit) function over the given series of $n$ observations:

$$V(P, Y) = \sum_{i=1}^{n} v(p_i, y_i). \tag{25}$$

The optimal values $\beta^* = \arg\max_{\beta} V\big(F(X, \beta), Y\big)$ of the parameters $\beta$ provide the mixed strategies $P^* = F(X, \beta^*)$ which maximize the total expected utility (25) for the given observations $(x_i, y_i)$, $i = 1, \dots, n$. While using a relevant trained neural network in the real world, the decision can be chosen based on the maximum probability in the obtained mixed strategy, i.e. $z = \arg\max_{j} p_j$.

As an example, consider the decision to give or reject a loan. We interpret the (scalar) output $f(x, \beta)$ of a relevant neural network as the probability of giving the loan. So in this case formula (25) with $s = 2$ gives the total expected profit (for $n$ past loans):

$$V(P, Y) = \sum_{i=1}^{n} \big[\, p_i \, u(1, y_i) + (1 - p_i) \, u(2, y_i) \,\big], \tag{27}$$

where decision 1 means giving the loan and decision 2 means rejecting it. The optimal values $\beta^*$ of the parameters $\beta$ provide the probabilities $P^* = F(X, \beta^*)$ which maximize (27).
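The loan example can be sketched similarly. All specifics below are assumptions for illustration: utilities of $+0.2$ for a repaid loan and $-1.0$ for a default (with zero utility from rejecting), a one-feature logistic "network", synthetic applicants, and plain gradient ascent on the total expected profit $V(F(X, \beta), Y)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical loan utilities (assumptions, not from the paper):
# a repaid loan earns the interest; a default loses the principal;
# a rejected loan yields zero, so it drops out of the expected utility.
GAIN, LOSS = 0.2, 1.0

def u_give(y):                       # y = 1 repaid, y = 0 default
    return np.where(y == 1, GAIN, -LOSS)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def f(X, beta):
    # Scalar network output f(x, beta) = probability of granting the loan.
    return sigmoid(X @ beta[:-1] + beta[-1])

def V(beta, X, Y):
    # Total expected profit: sum_i [ p_i * u(give, y_i) + (1 - p_i) * 0 ].
    return (f(X, beta) * u_give(Y)).sum()

# Synthetic applicants: one credit-score-like feature drives repayment.
X = rng.normal(size=(300, 1))
Y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Gradient ascent on V; d(sigmoid)/d(logit) = p * (1 - p).
beta = np.zeros(2)
for _ in range(300):
    p = f(X, beta)
    g = u_give(Y) * p * (1.0 - p)    # dV / d(logit_i)
    grad = np.concatenate([X.T @ g, [g.sum()]])
    beta += 0.05 * grad
```

With an asymmetric payoff like this one, the trained probabilities are not repayment forecasts: the network learns to grant only loans whose expected utility is positive, which is the point of deciding directly.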