Robustness of Shape Similarity Retrieval under Affine Transformation
Challenge of Image Retrieval, Newcastle, 1999

The application of the Curvature Scale Space representation to shape similarity retrieval under affine transformation is addressed in this paper. The maxima of the Curvature Scale Space (CSS) image have already been used to represent 2-D shapes in different applications. The representation has shown robustness under similarity transformations: scaling, orientation changes, translation and even noise can be easily handled by the representation and its associated matching algorithm. In this paper, we also consider shear and examine the performance of the representation under affine transformations. It is observed that the performance of the method is promising even under severe deformations caused by shear. The method is tested on a very large database of shapes and also evaluated objectively through a classified database. Its performance is compared with that of two well-known methods, namely Fourier descriptors and moment invariants. We also observe that global parameters such as eccentricity and circularity are no longer useful in an affine transform environment.


Introduction
A considerable amount of information exists in the two-dimensional boundaries of 3D objects, which enables us to recognise objects without using further information. However, despite great effort, the problem of shape representation in computer vision is still a very difficult one. A shape is originally defined by the x and y coordinates of its boundary points, which are subject to transformation if the position of the camera with respect to the object changes. The similarity transformation includes translation, uniform scaling and orientation changes. For example, if the distance between camera and object changes, the size of the boundary contour also changes (uniform scaling).
If the camera is allowed to change its viewpoint with respect to the object, the resulting boundary of the object will be deformed. For example, a circle will be converted to an ellipse (see also Figure 2 for a more complicated example). The deformation can be approximated mathematically by an affine transformation when, in addition to the similarity transformation, shapes are also subject to shear.
It is also believed that a 3D object might be represented by a number of standard views, e.g. back, front, above, etc. The possibility of changes in camera viewpoint will lead to deformation of any of these views and, as a result, affine transformation is inevitable in such applications.
A number of shape representations have been proposed to recognise shapes under affine transformation. Some of them are extensions of well-known methods such as Fourier descriptors [3] and moment invariants [5,14]. The methods are then tested on a small number of objects for the purpose of object recognition. In both methods, the basic idea is to use a parametrisation which is robust with respect to affine transformation. The arc length representation is not transformed linearly under shear and is therefore replaced by affine length [6]. The shortcomings of affine length include the need for higher order derivatives, which results in inaccuracy, and inefficiency as a result of computational complexity. Moreover, it can be shown [9] that although the arc length is not preserved, it does not change dramatically under affine transformation.
Affine invariant scale space is reviewed in [12]. This curve evolution method is proven to have similar properties to curvature evolution [8], as well as being affine-invariant. However, an explicit shape representation has yet to be introduced based on the theory of affine invariant scale space. The prospective shape representation might be computationally complex, as the definition of the affine curvature involves higher order derivatives.
We have already used the maxima of the Curvature Scale Space (CSS) image to represent shapes of boundaries in similarity retrieval applications [11,10]. The representation has proved robust under the similarity transformation, which includes translation, scaling and changes in orientation. In this paper, we examine the robustness of the representation under the general affine transformation, which also includes shear. As a result of shear, the shape is deformed and therefore the resulting representation may change. We will show that the performance of the method is promising even in the case of severe deformations.
The remainder of this paper is organised as follows. In section 2, the CSS image is introduced and CSS matching is briefly explained. Section 3 is about the affine transformation and the way we create our large databases. In section 4, we show that the conventional global parameters are not useful and introduce a new global parameter which can be used to narrow down the range of the search. In section 5, we evaluate the performance of the method in two different ways. The results are then compared to the results of other well-known methods in section 6. The concluding remarks are presented in section 7.

Curvature Scale Space image and the CSS matching
Consider a parametric vector equation for a curve:

Γ(u) = (x(u), y(u))

where u is an arbitrary parameter. The curvature of the curve can be expressed as:

κ(u) = (x'(u) y''(u) - x''(u) y'(u)) / (x'(u)² + y'(u)²)^(3/2)

If g(u, σ), a 1-D Gaussian kernel of width σ, is convolved with each component of the curve, then X(u, σ) and Y(u, σ) represent the components of the resulting curve Γ_σ:

X(u, σ) = x(u) * g(u, σ)        Y(u, σ) = y(u) * g(u, σ)

It can be shown [10] that the curvature of Γ_σ is given by:

κ(u, σ) = (X_u(u, σ) Y_uu(u, σ) - X_uu(u, σ) Y_u(u, σ)) / (X_u(u, σ)² + Y_u(u, σ)²)^(3/2)

As σ increases, the shape of Γ_σ changes. This process of generating ordered sequences of curves is referred to as the evolution of Γ (Figure 1a). If we calculate the curvature zero crossings of Γ_σ during evolution, we can display the resulting points in the (u, σ) plane, where u is the normalised arc length and σ is the width of the Gaussian kernel. The result is called the Curvature Scale Space image of the shape. For every σ we have a certain curve Γ_σ which, in turn, has some curvature zero crossing points. As σ increases, Γ_σ becomes smoother and the number of zero crossings decreases. When σ becomes sufficiently high, Γ_σ is a convex curve with no curvature zero crossings, and we terminate the process of evolution.
In order to construct the CSS image of a curve, we first re-sample it by 200 equally distant points. These points are numbered from 1 to 200, which are the values of u. Each coordinate of these points is convolved with a Gaussian function. The width of the Gaussian, σ, is gradually increased and at each level the locations of curvature zero crossings are determined and registered. The result of this process is represented by a binary image called the CSS image of the curve (see Figure 1). The number of columns of this image is equal to the number of equidistant samples of the curve. The CSS image contains a number of contours, each related to a segment of the shape. This is clearly shown in Figure 1. The curve is finally represented by the locations of the maxima of its CSS image contours. For example, the shape in Figure 1 is represented by seven pairs of integer values corresponding to the seven contours of the CSS image.
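As an illustration, the construction above can be sketched in code. This is a simplified sketch, not the authors' implementation: the scale step, kernel truncation and wrap-around handling are assumptions.

```python
import numpy as np

def css_zero_crossings(x, y, n_samples=200, sigma_step=0.2, max_sigma=25.0):
    """Sketch of CSS-image construction: resample the closed contour to
    n_samples points, smooth with Gaussians of increasing width sigma, and
    record the curvature zero crossings at each scale."""
    # Resample to equally spaced points by cumulative arc length.
    t = np.concatenate(([0.0], np.cumsum(np.hypot(np.diff(x), np.diff(y)))))
    u = np.linspace(0.0, t[-1], n_samples, endpoint=False)
    x, y = np.interp(u, t, x), np.interp(u, t, y)

    points = []                     # (u index, sigma) points of the CSS image
    sigma = sigma_step
    while sigma <= max_sigma:
        # Circular Gaussian smoothing of each coordinate (kernel truncated
        # at 3 sigma; the contour is tiled to emulate wrap-around).
        half = 3 * int(np.ceil(sigma))
        g = np.exp(-np.arange(-half, half + 1) ** 2 / (2.0 * sigma ** 2))
        g /= g.sum()
        Xs = np.convolve(np.tile(x, 3), g, mode="same")[n_samples:2 * n_samples]
        Ys = np.convolve(np.tile(y, 3), g, mode="same")[n_samples:2 * n_samples]
        # The sign of the curvature numerator X' Y'' - X'' Y' locates the
        # zero crossings; the positive denominator can be ignored.
        dx, dy = np.gradient(Xs), np.gradient(Ys)
        num = dx * np.gradient(dy) - np.gradient(dx) * dy
        signs = np.sign(num)
        idx = np.nonzero(signs != np.roll(signs, 1))[0]
        if len(idx) == 0:           # the evolved curve is convex: stop
            break
        points.extend((int(i), sigma) for i in idx)
        sigma += sigma_step
    return points
```

The returned point set is the CSS image itself; locating the maxima of its contours, which form the actual representation, is a further step not shown here.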

Curvature Scale Space Matching
The algorithm used for comparing two sets of maxima, one from the input (also called the image) and the other from one of the models, has been described in [10] and [11]. The algorithm first finds any change in orientation which may have occurred in one of the two shapes. A circular shift is then applied to the image maxima to compensate for the effects of the change in orientation. The summation of the Euclidean distances between the relevant pairs of maxima is then defined to be the matching value between the two CSS images.
The following is a condensed version of the algorithm which includes the basic concepts.
1. Apply a circular shift to all image maxima so that the u coordinates of the largest maxima, one from the image and the other from the model, become identical.
2. Starting from the second largest maximum of the image, determine the nearest model maximum to each image maximum.
3. Consider the cost of the match as the summation of the Euclidean distances between the corresponding maxima.
4. If the numbers of image and model maxima differ, add the coordinates of the unmatched maxima to the matching cost.
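The condensed algorithm can be sketched as follows. This is a greedy, single-shift simplification: the published algorithm in [10, 11] handles candidate shifts more carefully, and unmatched maxima contribute their heights here rather than their full coordinates. Each maximum is assumed to be a (u, σ) pair with u normalised to [0, 1).

```python
import math

def css_match(image_maxima, model_maxima):
    """Greedy sketch of the CSS matching cost between two sets of
    (u, sigma) maxima; distances in u are circular."""
    img = sorted(image_maxima, key=lambda m: -m[1])   # tallest first
    mdl = sorted(model_maxima, key=lambda m: -m[1])
    if not img or not mdl:
        # With one empty set, the cost is the total height of the leftovers.
        return sum(s for _, s in img + mdl)
    # Circularly shift the image maxima so the two largest maxima align in u.
    shift = mdl[0][0] - img[0][0]
    shifted = [((u + shift) % 1.0, s) for u, s in img]
    unmatched = list(mdl)
    cost = 0.0
    for u, s in shifted:
        if not unmatched:
            cost += s                     # unmatched image maxima add their height
            continue
        # Nearest model maximum under a circular u-distance.
        d, j = min((math.hypot(min(abs(u - v), 1.0 - abs(u - v)), s - t), i)
                   for i, (v, t) in enumerate(unmatched))
        cost += d
        unmatched.pop(j)
    cost += sum(t for _, t in unmatched)  # unmatched model maxima
    return cost
```

Two identical maxima sets, up to a circular shift in u, yield a matching cost of zero.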

Pure shear transform and our databases
The general affine transformation can be represented mathematically with the following equation.
x_a(t) = a x(t) + b y(t) + e
y_a(t) = c x(t) + d y(t) + f

The linear part of the transformation can be decomposed into scaling, rotation and shear components:

A_scaling = [ S_x   0  ]     A_rotation = [ cos θ  -sin θ ]     A_shear = [ 1  k ]
            [  0   S_y ]                  [ sin θ   cos θ ]               [ 0  1 ]

If S_x is equal to S_y, A_scaling represents a uniform scaling. A shape is not deformed under rotation, uniform scaling and translation. However, nonuniform scaling and shear contribute to the shape deformation under the general affine transformation. In this paper, we examine the performance of the CSS representation under the shear transform. The measure of shape deformation depends on the parameter k, the shear ratio, in the matrix A_shear. In the present form of the matrix A_shear, the x axis is called the shear axis, as the shape is pulled toward this direction. In order to create four different databases, we choose four different values for the shear ratio: 0.5, 1.0, 1.5 and 2.0. We then apply the transformation to a database of 500 original object contours. From every original object, we obtain 9 transformed shapes with different values of the shear-axis orientation θ. Therefore, each database consists of 500 original and 4500 transformed shapes. We then carry out a series of experiments on these databases to verify the robustness of the CSS image representation under affine transformations. The following sections are concerned with these experiments.
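As a sketch, the pure shear transform, and the rotate-then-shear composition used to obtain different shear axes (as described in the discussion of Figure 2), can be written as:

```python
import numpy as np

def pure_shear(points, k):
    """Apply the pure shear matrix A_shear = [[1, k], [0, 1]] to an N x 2
    array of (x, y) contour points; x is the shear axis."""
    A = np.array([[1.0, k], [0.0, 1.0]])
    return points @ A.T

def shear_at_angle(points, k, theta_deg):
    """Rotate the shape by theta before shearing, so the shear acts along a
    different axis of the original shape (a sketch of the procedure used to
    generate the transformed databases)."""
    th = np.radians(theta_deg)
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    return pure_shear(points @ R.T, k)
```

With k = 1, the point (0, 1) is mapped to (1, 1): the larger k is, the further the shape is pulled along the shear axis.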

Global parameters
We have already used the CSS representation for shape similarity retrieval in a similarity-transformed environment [10][11]. Since the global shape of a closed curve is preserved under similarity transformation, we used two global parameters, namely eccentricity and circularity, to filter out dissimilar shapes prior to CSS matching. As is obvious from Figure 2, the global appearance of a shape is severely changed under shear. In this section we show that eccentricity and circularity cannot be used as shape descriptors under shear. We introduce another global parameter which is extracted from the CSS image. In order to reject dissimilar models prior to CSS matching, we compute δe, δc and δa as follows [1]:

δe = |e_i - e_m| / max(e_i, e_m)    δc = |c_i - c_m| / max(c_i, c_m)    δa = |a_i - a_m| / max(a_i, a_m)

where e and c represent the eccentricity and circularity of the boundary and a represents the aspect ratio of the CSS image, while the subscripts i and m stand for image and model respectively. By definition, δe, δc and δa lie between zero and one. Our experience shows that for globally similar shapes these values are not more than 0.4.
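A sketch of the normalised parameter difference and the pre-matching filter (the 0.4 threshold follows the observation above):

```python
def param_difference(p_image, p_model):
    """Normalised difference between a global parameter measured on the
    input (image) and on a model: |p_i - p_m| / max(p_i, p_m)."""
    return abs(p_image - p_model) / max(p_image, p_model)

def passes_global_filter(image_params, model_params, threshold=0.4):
    """Keep a model for CSS matching only if every normalised parameter
    difference (e.g. eccentricity, circularity, CSS aspect ratio) stays
    within the threshold; 0.4 is the bound quoted in the text."""
    return all(param_difference(pi, pm) <= threshold
               for pi, pm in zip(image_params, model_params))
```

Models rejected by this filter never reach the (more expensive) CSS matching stage.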
We considered each original shape of our database as the input shape and its transformed versions as the models. We then computed these parameters for all original images and their associated transformed shapes. For each parameter, we obtained 4500 values between zero and one. The cumulative histograms of the different parameters for a shear ratio k = 1.5 are presented in Figure 3.
As Figure 3a shows, eccentricity cannot be used as a shape descriptor under affine transformation, as there is a large difference in this parameter between the original and transformed shapes. In more than 45% of cases, δe is higher than 0.4. Though this figure is 30% for circularity, it is still much higher than it should be. The aspect ratio of the CSS image, however, remains in a reasonable range under affine transformation. Figure 3c shows that δa is always less than 0.2 when an original shape is used as the input and its transformed version as a model. The second CSS-based parameter, the peaks sum, behaves similarly (Figure 3d).

Evaluation
We examined the performance of the representation through two different experiments. The first was performed on the databases of section 3. Every original shape was selected as the input query and the first n outputs of the system were observed to see if the transformed versions of the query were retrieved by the system. We found that for k = 0.5, almost all transformed versions of a shape appear in the first 20 outputs of the system. For k = 1, on average 99% of the transformed shapes of the input query are among the first 20 shapes retrieved by the system as similar to the input query. This figure is 96% and 94% for k = 1.5 and k = 2 respectively. The results for different values of k and for different numbers of output images are presented in Figure 4. As this Figure shows, the first outputs of the system always include a large portion of the transformed versions of the input query. These results show that the representation can be used in an affine transformed environment. This fact is also verified by the following evaluation method.

Objective evaluation with classified database
We have already used the idea of classified databases to evaluate a shape representation method in shape similarity retrieval [2]. A subset of our large database is usually used for this test. Here we have chosen 76 shapes and classified them in 10 different groups as presented in Figure 5. These are our original shapes and we produce 9 transformed shapes from each original one. As a result, a group with 8 members will have 80 members after adding the transformed shapes. In order to assign a performance measure to the method, we choose every member of each group as the input query and ask the system to find the n shapes most similar to the input from the database. We then observe the number of outputs, m, which are from the same group as the input. The success rate of the system for this particular input is defined as follows.
Success rate for an input query = (m / m_max) × 100

where m_max is the maximum possible value of m. Note that m_max is equal to n if n is less than the number of group members; otherwise, m_max is equal to the number of group members. The success rate of the system for the whole database is the average of the success rates over all input queries. We chose different values for n and for each case computed the success rate of the method. The results are presented in Figure 6a. As this Figure shows, for lower values of n, the success rate of the system is considerably large. For example, in the case of k = 1.5 and n = 10, it is 98%. In other words, on average 9.8 out of the first 10 outputs of the system belong to the same group as the input. This is a result of the fact that the system ranks the outputs by their similarity to the input, and there is a strong possibility that the most similar models are in the same group as the input query. As n increases, the similarity between the bottom-ranked outputs and the input query decreases, and models from other groups may also appear in the output. At the same time, m_max is equal to n for n up to 80 and increases with n. Beyond 80, m_max does not increase any more and is fixed at 80; however, m still increases with n, and therefore this part of each plot has a positive slope.
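The success-rate computation for a single query can be sketched as:

```python
def success_rate(retrieved_groups, query_group, group_size, n):
    """Success rate for one query: 100 * m / m_max, where m counts how many
    of the top-n outputs come from the query's own group and
    m_max = min(n, group_size)."""
    m = sum(1 for g in retrieved_groups[:n] if g == query_group)
    m_max = min(n, group_size)
    return 100.0 * m / m_max
```

The system-wide figure reported in the text is the average of this value over every query in the classified database.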

Effects of global parameters
In section 4 we argued that eccentricity and circularity are not appropriate in an affine environment. However, the other two parameters, namely the aspect ratio and the peaks sum, may be helpful. We carried out a series of experiments with different threshold values for each of these parameters and observed that although the improvement in the performance measure of the system was not significant (2-3%), the number of rejected candidates was considerably large (more than 70% of the database population). As a result, using global parameters increases the speed of the system significantly.

Comparison with other methods
Fourier descriptors [13] and moment invariants [4,7] have been widely used as shape descriptors in a similarity transform environment. Both methods represent the global appearance of the shape in their most important components. For example, the largest magnitude component of the Fourier descriptors represents the dimensions of the best fitted ellipse of the shape. Since affine transformation changes the global appearance of a shape, it is expected that the performance of these methods is negatively affected under the transformation. Modified versions of these methods have been introduced to deal with affine transformation [3,5,14]. They are used in object recognition applications and for clustering a small number of shapes. However, it has been shown that the improvements in the modified versions are not very significant in comparison to the conventional versions. Considering the fact that the implementation of the modified versions is not a straightforward task, we decided to examine the conventional versions of these methods and compare the results with the results of our method. We will observe that the difference between the performance measure of our method and that of each of these methods is very large. Even if we allow a 10-15% improvement for the modified versions of the methods, their performance is still well behind the performance of the CSS representation.
We implemented the method described in [13] to represent a shape by its normalised Fourier descriptors. Every original shape and its transformed versions were represented by their first 20 components. The Euclidean distance was used to measure the similarity between two representations.
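A minimal sketch of such a descriptor follows. It uses a common magnitude-based normalisation for translation, scale, rotation and starting point; the exact normalisation scheme of [13] may differ.

```python
import numpy as np

def fourier_descriptor(points, n_coeffs=20):
    """Magnitude-based normalised Fourier descriptor of a closed boundary:
    drop the DC term (translation), keep magnitudes only (rotation and
    starting point) and divide by the largest magnitude (scale)."""
    z = points[:, 0] + 1j * points[:, 1]          # boundary as a complex signal
    F = np.fft.fft(z)
    mags = np.abs(np.concatenate((F[1:n_coeffs + 1], F[-n_coeffs:])))
    return mags / mags.max()

def fd_distance(d1, d2):
    """Euclidean distance between two descriptor vectors."""
    return float(np.linalg.norm(d1 - d2))
```

This descriptor is invariant under similarity transformations of the sampled boundary, but not under shear, which is exactly the weakness examined in this section.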
The results for different values of k are presented in Figure 6b. The minimum of the performance measure for different values of k is around 30%, compared to 70% for the CSS method. At the same time, the slope of the plots after the minimum points is not sharp, which means that most of the missing models are not ranked even among the first 200 outputs of the system.
For moment invariants, each object is represented by a 12-dimensional feature vector, comprising two sets of normalised moment invariants [4], one from the object boundary and the other from the solid silhouette. The Euclidean distance is used to measure the similarity between different shapes. The results are presented in Figure 6c. Comparing this plot with Figure 6a, a large difference between the performance of the CSS representation and this method is observed.
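As an illustration, a reduced two-component moment feature is sketched below. These are classical translation- and rotation-invariant central-moment combinations; the actual 12-dimensional vector of [4] is richer and includes scale normalisation, which is omitted here.

```python
import numpy as np

def moment_features(points):
    """Two translation- and rotation-invariant central-moment combinations
    of an N x 2 point set (the classical h1, h2 forms, without scale
    normalisation)."""
    x = points[:, 0] - points[:, 0].mean()
    y = points[:, 1] - points[:, 1].mean()
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    h1 = mu20 + mu02                          # trace of the covariance
    h2 = (mu20 - mu02) ** 2 + 4 * mu11 ** 2   # squared eigenvalue gap
    return np.array([h1, h2])
```

Both components are unchanged by rotation and translation of the point set, but shear alters the covariance and hence both features, which is consistent with the degraded performance reported above.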

Conclusion
In this paper we demonstrated that the maxima of the Curvature Scale Space image can be used for shape representation in similarity retrieval applications even under affine transformation. We observed that the method can identify the transformed versions of an input query among a large number of database models. The results of evaluating the method on a classified database of original and transformed shapes showed that its performance is promising even under the severe deformations caused by affine transformation. The method was compared with two well-known methods, namely Fourier descriptors and moment invariants, and the results showed the superiority of our method. We also showed that global parameters such as eccentricity and circularity cannot be used in an affine transformed environment. Consequently, we introduced another CSS-based global parameter and used it to filter out a large fraction of dissimilar shapes prior to CSS matching.

Figure 1 :
Figure 1: a) Shrinkage and smoothing of the curve and the decreasing number of curvature zero crossings during evolution, from left: σ = 1, 4, 7, 10, 12, 14. b) The CSS image of the shape.

Figure 2 :
Figure 2: The deformation of shapes is considerable even with k = 1 in the shear transform. The original shape is presented at top left. The others represent transformations with k = 1 and θ = 20, 40, ..., 160, 180 degrees.

Figure 2 shows the effects of affine transformation on shape deformation. In this Figure, the shear ratio is k = 1. In order to achieve different shear axes, we changed the orientation of the original shape prior to applying the pure shear transformation. The values of θ range from 20 to 180 degrees, at 20-degree intervals. As this Figure shows, the deformation is severe for k = 1. For larger values of k, e.g. 1.5 and 2, the deformation is much more severe.

Figure 3 :
Figure 3: a) and b) The original shapes are not similar to their affine transformed versions with respect to eccentricity and circularity. c) and d) The CSS-based global parameters show better performance (see section 4).

Figure 4 :
Figure 4: Identifying transformed versions of the input query. k is the shear ratio and represents the measure of deformation (see section 5).

Figure 5 :
Figure 5: The classified database used for objective evaluation consists of 10 different classes. Note that for each original shape, nine transformed versions are also generated; therefore, the actual size of the database is 10 times larger than the size of this database.

Figure 6 :
Figure 6: The results of objective evaluation for different approaches.All terms are defined in section 5.1