Introduction to Set Shaping Theory

In this article, we define the Set Shaping Theory, whose goal is the study of the bijection functions that transform a set of strings into a set of equal size made up of longer strings. Many functions meet this condition but, since the goal of this theory is the transmission of data, we analyze the function that minimizes the average information content of the transformed set. The results obtained show how this type of function can be useful in data compression.


Introduction
In this article, we introduce the Set Shaping Theory, whose objective is the study of the bijection functions f that transform a set A^N of strings of length N into a set A_f^(N+K) of strings of length N+K, with K, N ∈ ℕ⁺, |A^N| = |A_f^(N+K)| and A_f^(N+K) ⊂ A^(N+K). In particular, we will analyze the functions in which the set A_f^(N+K) contains the strings with the least information content belonging to the set A^(N+K). The analysis of the results shows how this type of function can be useful in data compression.

Methods
In this article, we use the concepts and functions developed by C. E. Shannon [1], which represent the basis of information theory. We consider a source defined by an ensemble X = (x, A_X, P_X), where x is the value of the random variable, A_X = {a_1, a_2, …, a_|A|} are the possible values of x (states) and P_X = {p_1, p_2, …, p_|A|} is the probability distribution of the states, with p(a_i) = p_i and ∑ p_i = 1, the sum taken over i = 1, …, |A|.

¹ Author correspondence: solomon.kozlov@mailfence.com
The entropy of X, denoted H(X), is defined as:

H(X) = − ∑ p_i log₂ p_i, with the sum taken over i = 1, …, |A|

We call A^N the set that contains all possible strings x = {x_1, …, x_i, …, x_N} of length N generated by X.
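As a small numerical sketch of the definition above (the distributions below are our own illustrative choices, not values from the experiments), the entropy of a finite distribution can be computed directly:

```python
import math

def entropy(P):
    """Shannon entropy H(X) = -sum(p_i * log2(p_i)) in bits.
    Terms with p_i = 0 contribute nothing, by the usual convention."""
    return -sum(p * math.log2(p) for p in P if p > 0)

# A uniform source with |A| = 3 states: H(X) = log2(3) ≈ 1.585 bits.
print(round(entropy([1/3, 1/3, 1/3]), 3))  # 1.585

# A biased binary source carries strictly less than 1 bit per symbol.
print(round(entropy([0.9, 0.1]), 3))  # 0.469
```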
Definition 1: We call f the bijection function on the set A^N defined as:

f: A^N → A_f^(N+K), with A_f^(N+K) ⊂ A^(N+K) and |A_f^(N+K)| = |A^N|

The function f thus selects from the set A^(N+K) a subset A_f^(N+K) = f(A^N) of size equal to |A^N|. This operation is called "shaping of the source", because what is done is to make null the probability of generating the sequences of A^(N+K) that do not belong to A_f^(N+K).
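One such bijection can be sketched concretely for small sets. The choice below (mapping A^N onto the lexicographically first |A|^N strings of A^(N+K)) is our own illustrative example, not the specific function analyzed later:

```python
from itertools import product

def make_shaping_bijection(A, N, K):
    """Build a bijection f from A^N onto the first |A|^N strings of
    A^(N+K) in lexicographic order (an arbitrary illustrative choice).
    Every string of A^(N+K) outside the image can never be generated."""
    domain = [''.join(t) for t in product(sorted(A), repeat=N)]
    codomain = [''.join(t) for t in product(sorted(A), repeat=N + K)]
    image = codomain[:len(domain)]   # a subset of A^(N+K) of size |A|^N
    return dict(zip(domain, image))  # the bijection itself

f = make_shaping_bijection({'0', '1'}, N=2, K=1)
# The 4 strings of {0,1}^2 are mapped to 4 of the 8 strings of {0,1}^3.
print(f)  # {'00': '000', '01': '001', '10': '010', '11': '011'}
```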

Definition 2:
The parameter K is called the shaping order of the source and represents the difference in length between the sequences belonging to A^N and the transformed sequences belonging to A_f^(N+K).

Given a source X = (x, A_X, P_X) and a string x = {x_1, …, x_i, …, x_N}, we define its information content as:

I(x) = − log₂ p(x)

The probability p(x) that the source X generates the sequence x is:

p(x) = p(x_1) · p(x_2) · … · p(x_N)

Definition 3: We call the average information content of a sequence generated by a source X = (x, A_X, P_X) the summation, over the sequences belonging to A^N, of the product between their information content and their probability:

I(X) = ∑ p(x) · I(x), with the sum taken over all x ∈ A^N    (1)

Remark 1: As N tends to infinity, I(X) tends to N·H(X). Indeed, when N becomes large the contribution to the value of function (1) derives almost exclusively from the strings belonging to the typical set [2]. By typical set we mean the set of strings whose information content is close to N·H(X).
This function is essential to understand the advantages of applying f: since f transforms the strings x ∈ A^N into the strings y = f(x) ∈ A_f^(N+K), the average information content changes as follows:

I(Y) = ∑ p(x) · I(f(x)), with the sum taken over all x ∈ A^N

where p(x) remains unchanged but the information content of the transformed string differs from I(x).
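The definitions above can be checked numerically. The sketch below (the biased source and the value of N are our own illustrative choices) computes p(x), I(x), and the exact average (1) for a small source; for an i.i.d. source this exact average coincides with N·H(X):

```python
import math
from itertools import product

# Illustrative source: |A| = 2 with a biased distribution (our own choice).
P = {'a': 0.8, 'b': 0.2}
N = 3

def p_string(x, P):
    """p(x) = product of the symbol probabilities p(x_i)."""
    prob = 1.0
    for sym in x:
        prob *= P[sym]
    return prob

def info_content(x, P):
    """I(x) = -log2 p(x), in bits."""
    return -math.log2(p_string(x, P))

# Exact average information content, formula (1): sum over all x in A^N.
strings = [''.join(t) for t in product(P, repeat=N)]
I_avg = sum(p_string(x, P) * info_content(x, P) for x in strings)

H = -sum(p * math.log2(p) for p in P.values())  # entropy of the source
print(round(I_avg, 6), round(N * H, 6))  # the two values coincide
```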

Definition 4:
We call f₂ the bijection function on the set A^N defined as:

f₂: A^N → A_f^(N+K), where A_f^(N+K) is the subset of A^(N+K) composed of the |A^N| strings of least information content

Remark 2: The function f₂ transforms the set A^N into the set A_f^(N+K) composed of the |A^N| strings with the least information content belonging to A^(N+K). Consequently, each string belonging to the complementary set of A_f^(N+K) has a greater information content than any string belonging to A_f^(N+K). Wanting to apply this type of function to problems concerning data compression, the f₂ functions are the most interesting to analyze.

Results

Having chosen such short string lengths (N = |A|, with |A| between 2 and 7 and K = 1; see Table 1), we obtain a value of I(x) that differs greatly from N·H(X). This result is expected, since for these values of N the value calculated with formula (1) depends very strongly on strings with information content less than N·H(X). Observing the data in Table 1, we notice an unexpected result: for values of |A| > 2, the average information content I(y) is less than I(x).

Now, let us increase the length of the strings to 100 and keep the value of K at 1. Thus, the strings x ∈ A^100 have length 100 and, having chosen K = 1, the strings y ∈ A_f^101 have length 101. In this case, given the length of the strings, the exact calculation of I(x) and I(y) is very complex, so we estimate these values using the Monte Carlo method [3], [4]. The data reported in Table 2 concern the simulation of 1,000,000 strings of length 100 generated by a source X = (x, A_X, P_X) with a uniform probability distribution and |A| variable between 2 and 10. The first column shows the cardinality of A, the second the value of I(x), the third the value of I(y), and the fourth the difference I(x) − I(y).

Analyzing the data in Table 2, we note that for this value of N the value I(x) approximates N·H(X). Indeed, as mentioned, when the length of the strings increases, the contribution to formula (1) depends almost exclusively on the strings with information content close to N·H(X). Also in this case, for |A| > 2 the average information content I(y) is less than I(x). Hence, this result does not depend on the length of the strings but persists as N increases.

Now, to try to understand this result, let us compare the single values I(x_i) and I(y_i), with y_i = f₂(x_i), |A| = 3, K = 1 and N = 10. Therefore, the strings x ∈ A^10 have length 10 and, having chosen K = 1, the strings y ∈ A_f^11 have length 11. In this situation, the set A^10 contains 3^10 = 59049 strings. We have chosen this value of N because it allows us to calculate the single values I(x_i) and I(y_i) exactly and, at the same time, strings with information content much lower than N·H(X) contribute negligibly to I(x) and I(y).
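A Monte Carlo estimate of the average information content, in the spirit of the procedure used for Table 2, can be sketched as follows. The biased source, the sample size, and the seed below are our own illustrative choices, not the paper's experimental setup:

```python
import math
import random

def monte_carlo_avg_info(P, N, samples, seed=0):
    """Estimate I(X) = E[-log2 p(x)] by sampling strings from the source
    and averaging their information content."""
    rng = random.Random(seed)
    symbols = list(P)
    weights = [P[s] for s in symbols]
    total = 0.0
    for _ in range(samples):
        x = rng.choices(symbols, weights=weights, k=N)
        total += -sum(math.log2(P[c]) for c in x)
    return total / samples

# Biased ternary source; the estimate should approach N * H(X).
P = {'a': 0.5, 'b': 0.3, 'c': 0.2}
N = 100
H = -sum(p * math.log2(p) for p in P.values())
est = monte_carlo_avg_info(P, N, samples=10_000)
print(round(est, 2), round(N * H, 2))  # estimate vs. exact value
```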

|A| | I(x) (N = 100) | I(y) (N = 101) | I(x) − I(y)
In Figure 1, the solid line shows the I(x_i) values and the dashed line the I(y_i) values, in bits. The strings were sorted according to their information content in ascending order. Analyzing Figure 1, we can note that although I(y) < I(x) (I(x) = 14.263 bits, I(y) = 14.136 bits), this inequality is true only on average. Indeed, the single values of I(x_i) and I(y_i) oscillate with respect to each other. This result is interesting because it tells us that the usefulness of this technique depends on the probability distribution P and, consequently, on the information content of the typical set. Since the information content of the typical set can be approximated with N·H(X), the function f₂ can only be useful when this value falls in a region where I(y_i) < I(x_i).

Conclusion
In this article, we have defined the Set Shaping Theory, whose goal is the study of the bijection functions that transform a set of strings into a set of equal size made up of longer strings. Many functions respect this condition but, since the goal of this theory is the transmission of data, we have analyzed the function f₂, which transforms the set A^N into the set A_f^(N+K) composed of the |A^N| strings with the least information content belonging to A^(N+K).
Analyzing the data, we find an unexpected result: the average information content I(y) turns out to be less than I(x) when the cardinality of A is greater than 2. This result is present both for minimum lengths, such as those reported in Table 1, and for longer lengths, like the one shown in Table 2. Therefore, this result does not seem to depend on the length of the string. However, this is only a preliminary analysis; to reach a conclusion, it is essential to study the asymptotic behavior.
Figure 1 shows another interesting result: the single values of I(x_i) and I(y_i) oscillate with respect to each other, and neither is greater or less than the other throughout. Consequently, the usefulness of this technique depends on the information content of the typical set.
For these reasons, we believe that this theory is particularly interesting for data compression. However, as mentioned, the consequences of this type of transform on the average information content are particularly complex, and therefore this analysis requires further studies.
Given a source defined by an ensemble X = (x, A_X, P_X) with a uniform probability distribution, we apply the function f₂ to the set A^N, which contains all possible strings of length N produced by X, and compare the values of I(x) and I(y) with y = f₂(x). We start by analyzing strings of length equal to N = |A|, with K = 1 and |A| variable between 2 and 7. Consequently, the strings x ∈ A^|A| have length |A| and, having chosen K = 1, the strings y ∈ A_f^(|A|+1) have length |A| + 1. For example, if |A| = 2, the strings x ∈ A^2 have length 2 and the strings y ∈ A_f^3 have length 3. Since N is very small, it is possible to calculate the values of I(x) and I(y) exactly. The first column of Table 1 shows the cardinality of A, the second the value of I(x), the third the value of I(y), and the fourth the difference I(x) − I(y).

Table 1: The average information content I(x) (N = |A|) and I(y) (N = |A| + 1) in bits, calculated for K = 1; the first row gives |A| = 2, I(x) = 1.000.

Table 2: The average information content I(x) and I(y) in bits, calculated for N = 100 and K = 1.