Fast Estimation of Nonparametric Kernel Density Through PDDP, and its Application in Texture Synthesis

in


INTRODUCTION
Multivariate density estimation is an important tool in computer vision for statistical decision making and pattern recognition tasks (Elgammel et al. [2003], Liu et al. [2007], Zhang et al. [2005] etc.).In density estimation, there are broadly three parallel worlds, parametric, semi-parametric and nonparametric, with their own advantages and disadvantages.In parametric approaches, one has to have a priori knowledge of underlying distribution.In semi-parametric approaches, finite mixture models can be found, where, the knowledge of the mixture distribution is assumed to be known a priori, and also the number of mixture components.A popular model is the Gaussian mixture model (GMM).These finite mixture models offer more flexibility than the parametric density estimation methods.Nonparametric density estimation methodologies assume less structure about the underlying distribution, which makes them a good choice for robust and more accurate analysis.There is another advantage of using nonprametric density estimation methodologies over semi-parametric ones, the estimation of the parameters for semi-parametric models is troublesome.A popular method for the parameter estimation of finite mixture models (e.g., GMM), is Expectation Maximization (EM) algorithm (Dempster et al. [1977]).In, Wu [1983], it has been shown that EM algorithm is not guranteed to converge to a global optimum, and if the likelihood function is unimodal and a certain differentiability condition is satisfied, then any EM sequence converges to the unique maximum likelihood estimate.There are also other problems associated with the finite mixture models, such as, what should be the number of mixture components (Pernkopf and Bouchaffra [2005]) or what should be the component density model, or the slow convergence property of EM algorithm (Huang et al. [2005], Thiesson et al. [1999, Revised 2001]).Among all nonparametric approaches, kernel-based density estimation (KDE) approach offer more flexibility.However, huge storage (required to store all available data vectors), and expensive computational (and time) complexity for the calculation of probability at each target data vector, makes nonparametric kernel density estimation (KDE) less popular than the semi-parametric approaches (such as GMM).Recently, a number of algorithms have been proposed for fast estimation of Kernel density, (Greengard and Strain [1991], Yang et al. [2003], Zhang et al. [2005] etc.).Moreover, the computational efficiency of modern day systems has also increased.As a result, the nonparametric KDE is gaining popularity in different image and video processing applications (Elgammel et al. [2003], Liu et al. [2007], Zhang et al. [2005]).
Natural texture synthesis is an important problem in terms of understanding the mathematical modeling structure of texture, which not only used for artificial textured object surface synthesis in computer graphics, but also in understanding the textures for further processing of images and videos (such as, classification, segmentation, retrieval, compression etc.).Nonparametric, noncausal, Markov Random Field (MRF) model (Paget and Longstaff [1998]), provides a theoretical background for synthesizing textures from a broad range of natural textures.The sampling process for synthesizing texture depends upon repeatitive estimation of Markovian conditional density.This process has a huge computational complexity (Sinha and Gupta [2007]), which does not favor real-time application.In this work, we compare the computational complexity of the original algorithm (Paget and Longstaff [1998]), with the fast sampling algorithm implemented through the proposed fast KDE (FKDE) algorithm.The resulting texture synthesis algorithm is tested visually with respect to a number of natural textures, taken from a standard database (Brodatz [1966]).
In section 2, the mathematical background of KDE is described and the problem is defined.A brief discussion of the earlier algorithms for fast KDE (FKDE) with their limitations are described in section 3. Section 4 present the proposed algorithm for FKDE, with its error bound and computational complexity.The mathematical model for texture synthesis is described briefly with the description of the practical problem in section 5. Section 6 describes the application of the proposed algorithm in natural texture synthesis.The paper is concluded in section 7.

KDE PROBLEM INTRODUCTION
Let t s ∈ ℜ d be a set of source data points, and t n ∈ ℜ d a target data point.In nonparametric Gaussian kernel density estimation, the probability of the target data point t n is calculated as follows.
where, Scott [1992].The computational complexity of equation 1 is O(Nd) for each target data point.In general, kernels (multivariate or univariate) have numerically finite support, e.g., in case of Gaussian kernel , if the absolute value of the kernel argument becomes ≥ 700, the value of the kernel term becomes quite negligible, i.e., 9.86 × 10 −305 .One can use this knowledge in kernel density estimation to minimize the required number of source data points, i.e., considering the source data points within a radius from the target data point, expressed as shown in equation 2.
Here, R n ⊂ ℜ d is the subspace within a constant area corresponding to a constant radius (say R) around the data point t n .Let h i = h, ∀i; now, one can define R for Gaussian multivariate kernel as, R = d i=1 x 2 i = 700h 2 .And the precision error would be In the equation 3, we have described a least upper error bound (LUEB) for the FKDE algorithms.
In the rest of this paper, we will analyze the proposed FKDE algorithm with respect to the earlier algorithms in terms of computational complexity and unsupervisedness for attaining the least upper error bound described in equation 3 for a given value of R.

EARLIER APPROACHES AND MOTIVATION FOR FURTHER INVESTIGATION
In the following subsections 3.1 and 3.2, we will briefly describe two algorithms for FKDE with their limitations.Afterwards in sub-section 3.3, we will try to provide the seed of a possible solution to the limitations of the earlier algorithms.

Reconstructionhistogram
Zhang et al. [2005] has described an algorithm named as reconstructionhistogram for reducing the computational complexity of the KDE.The source data set {t s } is first clustered within a number of clusters (user defined), {Clust i ; i = 1 . . .M}.Here Clust i describes the mean vector of the i th cluster.
Define n i as the number of source data vectors within i th cluster.Thereafter the probability of target data vector t n is calculated as follows, Equation 4 can be thought of as KDE of t n given the source data points at cluster centroids with a weight factor n i /N.It can be also thought as GMM.Therefore, it does not have the flexibility of original KDE.Moreover, the error bound is not mentioned by Zhang et al. [2005].

Improved Fast Gauss Transform
Fast Gauss transform (FGT) (Greengard and Strain [1991]), is a more general form of the Gaussian KDE, where the kernels have a Gaussian form.In FGT, the weights of the Gaussians (centered at each data point) can differ, whereas in case of (Gaussian) KDE the weights are equal.The speed is due to the grid structure assumed upon the total space.In the calculation of Gauss transform (or KDE), the effect of source data points within each multidimensional box of a grid structure, can be approximated with a finite series expansion around a modal vector (may be a centroid).Yang et al. [2003] proposed an improved version of FGT (IFGT) algorithm for reducing the unnecessary computational complexity in multidimensional case through a prior clustering of the source data points.For the calculation of KDE, the distances between the target data point t n and the cluster centroids are calculated.If the distance (between t n and any cluster centroid) is more than a user-defined threshold, the source data points within the corresponding cluster are not considered for the calculation of KDE.Let us define the clusters as described in sub-section 3.1.Then the following equation describes the FKDE according to IFGT, In equation 5, f (t n , Clust i ) is the approximation comes from truncated multivariate Taylor expansion (only possible for Gaussian kernel Yang et al. [2003]).If one wants to implement original KDE (for any kernel function) then the modified equation could be,  Here, R IFGT is different from R (required for error bound described in equation 3).Again, R IFGT varies depending upon the target data t n and the cluster under consideration Clust i .This can be best understood from figure 1a.
If one wants to fix the R IFGT to a maximum value as R IFGT = R + max(R Clust ), where, R Clust is the maximum of maximum distances between any cluster to its source data points, the computational complexity of the algorithm may increase, as explained in figure 1b.Region I in figure 1b describes the area of radius R, fixed by the corresponding kernel.This region overlaps with the cluster A data set.Therefore, one needs to include the cluster A for the calculation of the density at target point T.But the cluster center A n is far from the target data point.Therefore, a different radius threshold is needed, R IFGT > R, and this defines the region II.Region II includes the sencond cluster centroid B n , and therefore, the final calculation of the KDE will consider both cluster A and B. The inclusion of the redundant data points increases the computational complexity.The same problem will be encountered if one considers other target points marked as × in the figure 1b.Therefore, there are two main limitations (additionally with the truncation error) of IFGT, as follows, 1. optimal R IFGT varies with both t n (target data point) and cluster shape, figure 1a. 2. maximum possible R IFGT can increase computational complexity of the FKDE algorithm, figure 1b.

Boundary Effect : A possible solution
Instead of cluster centroids, if one considers a boundary (d-dimensional hyperplane), partitioning the total space into two subspaces (as shown in figure 1b), the above mentioned problems will not arise for most of the target points.One can easily make decision based upon the distance between the target data point and the boundary.This distance is not required to change with respect to the target data point and therefore it will remain R. Therefore, for the fast kernel density estimation, collection of boundaries has the following advantages compared to a collection of centroids, • The distance threshold (required for error bound equation 3) does not depend upon the cluster size or shape.• The computational complexity could be lesser in boundary-based clustering, compared to centroid-based methodologies; because from figures 1a and 1b, it is clear that, R IFGT ≫ R.

FAST KERNEL DENSITY ESTIMATION (FKDE) THROUGH PDDP
We will first describe the basic clustering algorithm in subsection 4.1.Thereafter, we will use this clustering algorithm for fast kernel density estimation (FKDE), described in subsection 4.2.Finally, in subsection 4.3, the computational complexity is analysed.

Principal Directive Divisive Partitioning (PDDP)
In this section the basic clustering algorithm PDDP (Boley [1998]), is described.Here, the information extracted from the Principal Component Analysis (PCA) of a set of source data points, is used for defining the cluster boundary.The partitioning involves two basic steps as descrbed below; • project each source data point within the present space onto the first principal direction (eigen vector corresponding to the largest eigen value).• partition the present space into two sub-spaces with respect to the mean (or median) of the projected values.
One can build up a tree structure by recursively partitioning each sub-spaces.From this algorithm we construct source data index vectors for each leaf node, {J i ; i = 1, . . ., M}, where M = 2 q and q is the depth of the tree structure.
The above mentioned clustering algorithm, partitions the space into two sub-spaces along its first principal direction and with respect to its mean (or median).This implies that, a d-dimensional hyperplane is acting as a boundary to divide the space into two sub-spaces.This hyperplane is the matrix of (d − 1) number of eigen vectors (replacing the first eigen vector with all zeros).Therefore, at each node (except the leaf nodes) of the tree, one such boundary is created for partitioning the space constructed by the source data points of the present node.

FKDE
The formal pseudo-code for kernel density estimation is described in algorithm 1.The steps can be summarized as follows, 1. at current node, the target data point t n is projected onto the first principal direction of the corresponding set of source data points, 2. the distance between projected t n and the mean (or median) of the corresponding set is calculated, 3. if this distance is less than the pre-defined threshold R, consider both the children of the current set of source data points.4. otherwise, consider left child if t n belongs to left side of the boundary, 5. or else, consider the right child.
In short, first one finds the neighborhood leaf nodes corresponding to a target data point with a fixed distance threshold R. The source data points within these leaf nodes (J = i∈ℵ(t n ) J i ) are then used for final calculation of the kernel density of the target data point, as,

Computational Complexity
The computational complexity for KDE is O(Nd), equation 1.The ideal computational complexity for FKDE is O(|R n |d), equation 2, where |.| denotes cardinality of the set.The problem is to know the source data points such that s ∈ R n .As described in subsection 4.2, the search for leaf nodes J i involve a path through the tree structure.At each node (steps described in subsection 4.2), one has to decide whether to search in the left child or in the right child or within both the children.This decision making process has a computational complexity of the order of d.Now, we have to consider the worst, best and average computational complexities of the FKDE algorithm.
In the worst case, one has to consider all the data points, i.e., all the leaf nodes.In this case the computational complexity becomes O((2 q+1 − 1)d + Nd) ≅ O(Nd), since N ≫ 2 q+1 .This is very unlikely in practice.In the best case, only one leaf node has to be considered, and therefore, the minimum computational complexity becomes O((q − 1)d This can happen when t n is well within a sub-space.In most cases, t n will lie close to one or more boundaries, and in those cases, one has to consider more than one leaf nodes, say this number is m (≪ 2 q = M).Since, the searching complexity is additive and m(≪ 2 q ) number of nodes will have approximately mN M (M = 2 q ) number of source data points, on average the computational complexity would be approximately O( mNd M ) ≪ O(Nd).

NONPARAMETRIC MRF AND TEXTURE SYNTHESIS
MRF models have been used for different applications in different branches of science and engineering.In the present case, description of MRF is taken from the view point of lattice models.The lattice, Y, is a collection of random variables (r.v.henceforth) at the sites, s = i, j ∈ S, where, i, j = 0, 1, . . ., M − 1.The random variables are described as Y s ∈ Λ, i.e., they belong to the same state space.The MRF assumption implies, , which describes the fact that given a neighbor set, ℵ s , the r.v. at s is independent of all other sites, (s) = S − s and this conditional probability is termed as local conditional pdf (LCPDF).The neighborhood system is defined with the help of following two axioms, • s ℵ s , and, Let us now consider the nonparametric MRF model as described in Paget and Longstaff [1998] in the context of texture synthesis.Assume, S in and S out signify the input and output texture lattices respectively, and for simplicity we also assume that X s = {y r ; r ∈ ℵ s } denote the neighborhood set of the pixel r.v.Y s .For each pixel in the output lattice, s ∈ S out , we estimate the LCPDF P(Y s |X s ), for Y s = 1, 2, . . ., L (grey values) from the data X p , Y p where, p ∈ S in .This is generated from the input texture through Parzen window estimator using the product kernels as, Here, κ h (.) is the Gaussian kernel function.In the following a brief description of the sampling process from the nonparametric conditional density, as described in Paget and Longstaff [1998], is given.Choose a new y s , by sampling the estimated LCPDF, through either Gibbs sampler or ICM algorithm.

Local Simulated Annealing: Texture Synthesis Algorithm
For synthesizing textures from a random initialization we need local simulated annealing Paget and Longstaff [1998].In this method the LCPDF (eq.9) evolves with time.It starts from a marginal pdf and approaches to the actual nonparametric estimation of conditional density.In this subsection, we briefly discuss the algorithm of texture synthesis through the local simulated annealing.
Let the two random fields defined on the two lattices S in and S out be denoted as, Y in = {Y s ; s ∈ S in } for the input texture, and Y out = {Y k ; k ∈ S out } for output texture respectively.For local simulated annealing we require another random field defined on the output lattice structure.We define this random field as, T out = {c k ; k ∈ S out }. Paget and Longstaff [1998] has described T out as the temperature field.In this paper, we use the term confidence field, to describe the behaviour and the requirement of the field in the process of texture synthesis.The random variables c k reflect the confidence of synthesis of the corresponding random field Y k at site k ∈ S out .The range of the confidence random variables c k varies from 0 to 1, where 1 represents complete confidence, and 0 none at all.The rule for updation of the confidence at site k ∈ S out is defined as, , where η ∈ [0, 1] is a random variable generated from uniform distribution.This equation describes the fact that the confidence of any pixel site will depend upon the confidence values of its neighborhood sites.In the estimation of LCPDF, the argument within the kernel for neighborhood vector will change depending upon the confidence values at the corresponding neighborhood sites of the output lattice.Let us define a neighborhood vector of confidence random variables as, W k = {c r ; r ∈ ℵ k }, ∀k ∈ S out .Similarly we define a confidence matrix at site k as, Φ k , whose diagonal elements contain the random variables c r ; r ∈ ℵ k , and the off diagonal elements are all zero.The estimation of LCPDF according to the confidence field is defined as, where, X k , and X s correspond to the neighborhood vectors of output and input textures, respectively.The kernel for neighborhood vector can be written as, or alternatively as, where, n = |S in |, |.| is the cardinality, (.) t represents transpose, W k,i , X k,i and X s,i are the i th random variable of the random vectors W k , X k , and X s , respectively.
From the equations 10, 12 and 13, it is clear that as we progress in time the LCPDF (equation 11) will approach the actual form starting from the marginal density p(Y s ), as the values of confidence random variables approach one starting from zero.Since, we are approaching the true LCPDF through local simulated annealing, we can sample LCPDF according to independent conditional mode (ICM) algorithm as described by Besag [1986].In ICM, we look for Y k = v, v ∈ {0, 1, . . ., L} (where L is the number of gray levels), which has got maximum probability according to the given neighborhood X k , i.e., maximum LCPDF.The advantage of ICM is fast convergence, whereas other sampling algorithms (such as, Gibbs sampler, Metropolis sampler) has a very slow convergence rate with almost sure convergence to the global optimum Paget and Longstaff [1998].In this paper, we have worked with ICM algorithm.

Computational complexity of earlier texture synthesis algorithm
The computational complexity for the calculation of LCPDF is of the order of O(N(d + 1)) (equation 11), where N = |S in | is the number of pixels within input texture sample and (d + 1) is the dimensionality of the {X p , Y p }. Sinha and Gupta [2007] have tried to reduce this computational complexity by reducing the dimension of the neighborhood set X p .In this work, the computational complexity is reduced by reducing N, which is equal to 128 × 128 in the current work.In the next section, the proposed algorithm for fast KDE (FKDE) is described, along with the reduction in computational complexity.

Texture synthesis with FKDE
The computational complexity is mainly due to the evaluation of equation 11 at each output pixel k ∈ S out .Incorporation of proposed FKDE algorithm within the LCPDF estimation (equation 11), has two main problems, 1. how to include the effect of W k (the temperature field) within the PDDP-based tree structure for the implementation of FKDE, and 2. there are two joint densities corresponding to {Y k , X k } and X k ; therefore, it requires two FKDE structure, which is not computationally efficient.
To achieve these limiting effects within FKDE algorithm one has to change the radius threshold R, since R is responsible for the searching the leaf nodes required to calculate the final KDE.This new value of R (say R new ) can be explained as follows, Local simulated annealing: Therefore, R new should vary from a large value to R, as annealing process proceeds.We can achieve this effect by considering c k (confidence value at site k) as shown in equation 14.This new radius threshold can be introduced within FKDE, at line number 10 of algorithm 1, as shown in equation in equations 15 and 16.
Now we will study the second problem mentioned above.If the KDE of the neighborhood vector (the denominator term of equation 11) s∈S in K h (X k − X s |W k ) is not equal to zero, then one would be interested in the numerator term, otherwise LCPDF is equal to zero.Therefore, only when K h (X k − X s |W k ) 0, the calculation for K h (Y k − Y s ) is required.In this work, the tree structure (sub-section 4.1) is built for the neighborhood vectors {X s ; s ∈ S in }, i.e., collected from the input texture.The FKDE is applied to each element from the set {X k ; k ∈ S out }, i.e., neighborhood vectors from the output texture.Therefore, the LCPDF described in equation 11 becomes, where, S in,k ⊂ S in describes the collection of pixels from input texture where K h (X k − X s |W k ) 0.
Therefore, with equations 17 and 16, one can evaluate the LCPDF (equation 11) with only one FKDE structure and with same computational complexity (O( mN(d+1) M )) of FKDE.This computational complexity is lesser than the earlier texture synthesis algorithm (O(N(d + 1))), since m M ≪ 1.In the following section, we will test the proposed texture synthesis algorithm with a number of natural textures.

RESULTS
Natural textures range from near-regular to stochastic.We have tested the computationally efficient texture synthesis algorithm (sub-section 5.3) for these two extreme classes.The texture samples are taken from a standard database Brodatz [1966].Three textures corresponding to near-regular textures are D103, D20, and D22.In case of stochastic textures we have taken two such cases D12 and D15.The texture D15 is not only stochastic in nature, it has also got a structural component.The order (required to define the neighborhood system of NNMRF) is kept fixed at 20 for all cases.The texture synthesis has been done at single resolution, since if the computationally efficient texture synthesis algorithm is working in single resolution it will also work in multi-resolution.
The results for near-regular and stochastic texture classes are shown respectively in figures 2 and 3.
From figure 3 it can be observed that the computationally efficient texture synthesis algorithm (with FKDE) produces better result than the earlier algorithm in cases of near-regular textures.Whereas, in case of stochastic texture class (figure 2) the results match closely.The results establish the effectiveness of incoporation of the proposed FKDE algorithm within texture synthesis algorithm.

CONCLUSION
The main contribution of this work is to analyse the nonparametric FKDE from the viewpoint of a strict upper error bound (equation 3), and to develop a new algorithm based upon this error bound.It has been shown that in FKDE, the boundaries (hyperplanes) instead of the cluster centroids for partitioning the space produce lesser computational complexity (section 3).
The computational complexity of earlier texture synthesis algorithm (Paget and Longstaff [1998]), is huge due to large dimensionality of the neighborhood vector (Sinha and Gupta [2007]) and the nonparametric kernel conditional density estimation.In this paper, the computational complexity of this earlier algorithm has been reduced through the incorporation of proposed FKDE.The proposed algorithm in texture synthesis has been tested with a number of textures taken from a standard database Brodatz [1966].
We are thankful to the invaluable comments of the reviewers.It is true that, the analysis is much similar to the kd-tree structure and in higher dimension kd-tree structure is not optimum.But, with the normal cluster techniques (as used in IFGT or ball-tree), the texture synthesis results are nowhere close to the original, as evidenced by us.Again, eigenvalue decomposition is a computationally expensive module, but it is the reason for faster texture synthesis compared to the kd-tree structure.A total analysis of the proposed algorithm in terms of kernel density estimation with error bound is in preparation.
FIGURE 1: Limitations of IFGT Yang et al. [2003] and possible direction of solution

Fast
Estimation of Nonparametric Kernel Density Through PDDP, and its Application in Texture Synthesis BCS International Academic Conference 2008 -Visions of Computer Science

FIGURE 3 :
FIGURE 3: Near-regular texture synthesis result; Left column: original texture sample, middle column: synthesis result with earlier algorithm Paget and Longstaff [1998], last column: synthesis result with proposed algorithm

12 end 13 else 14
∈ ℜ d , s ∈ I in = {1 . . .N}: The source data points t n ∈ ℜ d : The target data point k: Depth of the tree R: radius required for kernel computation Here, ℵ(t n ) is the neighborhood area(subspace)around target data point t n , defined as ℵ(t n ) = {t s ; ||t s − t n || ≤ R}. 10 remove i th node from temp node 11 insert children nodes of the i th node in temp node remove the i th node from temp node 15 if v n > m r,i then 16 insert right child node of i th node within temp node 17 end 18 else 19 insert left child node of i th node within temp node H (t s − t n ) 29 end 30 Algorithm 1: Fast kernel density estimation (FKDE) through PDDP k is required in equation 11 for modeling the local simulated annealing.When all the values of W k are zero (W k,i = 0; i = 1 . . .d), then from equation 12 one can observe that K H (X k − X s |W k ) becomes constant irrespective of the value X s and X k , i.e., P(X k |W k ) becomes uniform.At other extreme, when all values of W k are one (W k,i = 1