Using Colour for Image Indexing

Image colour is often thought to be an intrinsic correlate of surface reflectance and so is a common feature for image indexing. In this paper we point out that image colour is actually a function of surface reflectance, imaging geometry and the colour of the viewing illuminant. Fortunately, methods exist for normalizing away these dependencies. Pixel-based and colour channel-based normalizations remove dependency on geometry and light colour respectively. Unfortunately, neither method removes both dependencies simultaneously, and so a single normalization must be chosen. Common practice dictates that pixel-based normalization is the most useful. In this paper we set out to evaluate the merits or demerits of this common practice. In particular, we asked 'which works better, pixel-based or channel-based normalization?'. To answer this question we carried out many image indexing experiments on a variety of image databases. In all cases, and contrary to common practice, our results indicate that channel normalization facilitates the best indexing performance. We predict that channel-based normalization may improve indexing performance for many image indexing applications.


Introduction
At first glance, image colour would appear to be an ideal feature for image indexing by image content. For example, because images of 'beach scenes' typically comprise sea (deep blue), sky (light blue) and sand (beige), a tripartite colour query of 'Find light blue, sky blue and beige' should suffice in finding at least some images of beach scenes. Indeed, IBM's QBIC [1] (Query by Image Content) system supports exactly this sort of query. Another advantage of colour queries is that they can be applied to non-annotated images. To query by shape (e.g. find 'fish' in images), all images in a given database must be pre-segmented in order to make shapes explicit. Unfortunately, this segmentation must be carried out by a human operator and so is a costly exercise for all but the smallest of image databases.
In this paper we sound a cautionary note regarding the use of image colour as a querying mechanism. We show that the colours that are recorded in an image depend on two confounding factors: the relative pose of surface and light, and the colour of the light source. This dependence is sufficiently strong to confound indexing [2, 3]: a beach viewed under a red dusk sky will result in an image where all the colours are redder than they ought to be.
Fortunately, however, various authors have shown that images can be normalized in order to remove dependence on individual viewing conditions [4, 5, 3]. Scaling every image $(r, g, b)$ pixel triple to sum to one, $\left(\frac{r}{r+g+b}, \frac{g}{r+g+b}, \frac{b}{r+g+b}\right)$, removes ambiguity due to the relative pose of surface and light [3]. Similarly, scaling each colour channel, that is all values in the R, G and B pixel planes, to sum to one, $\left(\frac{R}{\sum_{i=1}^{n} R_i}, \frac{G}{\sum_{i=1}^{n} G_i}, \frac{B}{\sum_{i=1}^{n} B_i}\right)$, removes dependence due to the colour of the light [4]. While neither normalization removes dependence due to both the relative pose of the light source and the colour of the light, the former normalization is more prevalent in the literature [6, 3, 5] than the latter [4]. This suggests that pixel-based normalization is thought to be the most useful normalization procedure.
The Challenge of Image Retrieval, Newcastle upon Tyne, 1998
In our research we quantitatively address the question "which image normalization, pixel- or colour channel-based, should we use?". To answer this question we carried out image indexing experiments for a variety of image datasets. In each case the database comprised images of colourful objects, and the query images were images of the same objects viewed under different capture conditions. When indexing works well, both the query and the indexed database image should contain the same object.
Defining indexing as object recognition has two advantages over more qualitative experiments. First, there is a right and wrong answer, and rightness and wrongness is easily verified by a human observer. In contrast, indexing studies that are based on weaker notions of similarity are sometimes inconclusive or unsatisfying, since what I think looks similar may not be what you think looks similar. The second advantage is that, in the object recognition framework, it is easy to control and measure the effects of variable image capture conditions. We believe that such a quantitative study is a necessary prerequisite for building robust indexing methods.
In line with previous colour indexing studies [8, 9, 1], indexing proceeds by finding the database image colour distribution which is most similar to the distribution of colours in a query image. Experiments were carried out for distributions calculated after pixel-based and colour channel-based normalizations. Contrary to current database practice, we found colour channel-based normalization to be much more useful than pixel-based normalization.
We should forewarn the reader that our indexing experiments are carried out on rather small data sets (on the order of 100 images). These small data sets are justified on two counts. First, since we wish to examine the effect of image capture on indexing performance, image capture conditions must be controlled, and this makes acquisition a more laborious task. Second, recent studies have shown that even for small data sets containing around 10 images [4, 6] the indexing problem, formulated as object recognition, is very hard indeed. Clearly, unless we can solve the indexing problem on these small data sets we cannot expect good performance on larger image databases.
In section 2 of this paper we review colour image formation and illustrate how image colours depend on the relative pose of light and surface and on the colour of the light. Pixel-based and colour channel-based normalizations are introduced as mechanisms for removing these dependencies. These normalizations are applied as preprocessing steps for the image indexing experiments reported in section 3. For the image data sets tested, colour channel normalization proved more useful than pixel-based normalization. We conclude the paper with a short discussion in section 4.

Colour Image Formation
The light reflected from a surface depends on the spectral properties of the surface reflectance and of the illumination incident on the surface. In the case of Lambertian surfaces (these are the only kind we consider here), this light is simply the product of the spectral power distribution of the light source with the percent spectral reflectance of the surface. Illumination, surface reflection and sensor function combine together in forming a sensor response:
$$\boldsymbol{\rho}^{x,E} = \mathbf{e}\cdot\mathbf{n}^{x} \int_{\omega} S^{x}(\lambda)\, E(\lambda)\, \mathbf{F}(\lambda)\, d\lambda \qquad (1)$$
where $\lambda$ is wavelength, $\boldsymbol{\rho}$ is the 3-vector of sensor responses (the rgb pixel value), $\mathbf{F}$ is the 3-vector of response functions (red-, green- and blue-sensitive), $E$ (assumed constant across the scene) is the incident illumination and $S^{x}$ is the surface reflectance function at location $x$ on the surface, which is projected onto location $x$ on the sensor array. The relative orientation of surface and light is taken into account by the dot-product, '$\cdot$', of the surface normal vector $\mathbf{n}^{x}$ with the light source direction $\mathbf{e}$ (both these vectors have unit length).
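Let us denote $\int_{\omega} S^{x}(\lambda)\, E(\lambda)\, \mathbf{F}(\lambda)\, d\lambda$ as $\mathbf{q}^{x,E}$. It follows that (1) can be rewritten as:
$$\boldsymbol{\rho}^{x,E} = \mathbf{q}^{x,E}\, (\mathbf{e}\cdot\mathbf{n}^{x}) \qquad (2)$$
Equation 2 informs us that the pixel recorded in an image, for a fixed illuminant $E$, lies in a fixed direction in colour space, $\mathbf{q}^{x,E}$, but has variable magnitude proportional to the scalar $\mathbf{e}\cdot\mathbf{n}^{x}$. Pixel-based normalization removes this variation by dividing each camera response vector by the sum of all responses at that pixel:
$$\frac{\boldsymbol{\rho}^{x,E}}{\sum_{i=1}^{3} \rho_i^{x,E}} = \frac{\mathbf{q}^{x,E}\, (\mathbf{e}\cdot\mathbf{n}^{x})}{(\mathbf{e}\cdot\mathbf{n}^{x}) \sum_{i=1}^{3} q_i^{x,E}} = \frac{\mathbf{q}^{x,E}}{\sum_{i=1}^{3} q_i^{x,E}} \qquad (3)$$
The pose-dependent scale term $\mathbf{e}\cdot\mathbf{n}^{x}$ cancels in Equation 3. When $\boldsymbol{\rho}^{x,E} = (r, g, b)$, pixel normalization returns $\left(\frac{r}{r+g+b}, \frac{g}{r+g+b}, \frac{b}{r+g+b}\right)$. Notice, however, that throughout the development summarized in (1), (2) and (3) the superscript $E$ still appears: the pixel-normalized colours in an image still depend on (that is, vary with) the capture illuminant.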
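The cancellation of the shading term can be checked numerically. The following is a minimal sketch, not the authors' implementation: the surface colour `q` and the two shading factors are hypothetical values, and returning zeros for a black pixel is an assumption made only to avoid division by zero.

```python
def pixel_normalize(pixel):
    """Scale an (r, g, b) triple so that its components sum to one."""
    total = sum(pixel)
    if total == 0:
        # Assumption: a black pixel has no defined chromaticity, so return zeros.
        return (0.0, 0.0, 0.0)
    return tuple(c / total for c in pixel)


# Hypothetical surface colour q, seen under two different shading factors e.n.
q = (0.2, 0.5, 0.3)
bright = tuple(0.9 * c for c in q)   # surface facing the light
dim = tuple(0.25 * c for c in q)     # surface tilted away from the light

norm_bright = pixel_normalize(bright)
norm_dim = pixel_normalize(dim)

# Both pixels map to (numerically) the same normalized triple.
print(norm_bright)
print(norm_dim)
```

The two printed triples agree up to floating-point rounding, because the shading factor multiplies numerator and denominator alike. Scaling the three channels by different factors, as a change of light colour does, would not cancel in this way.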
To understand the effect of the viewing illuminant on captured colours it is useful to think of the camera sensors as delta functions: $F_i(\lambda) = \delta(\lambda - \lambda_i)$, $i = 1, 2, 3$. Delta functions are sensors which are sensitive to a single wavelength of light. Under this assumption:
$$\rho_i^{x,E} = S^{x}(\lambda_i)\, E(\lambda_i)\, (\mathbf{e}\cdot\mathbf{n}^{x}) \qquad (4)$$
The same surface viewed under a different coloured light, $E^{1}$, but under the same viewing geometry induces the following response:
$$\rho_i^{x,E^{1}} = S^{x}(\lambda_i)\, E^{1}(\lambda_i)\, (\mathbf{e}\cdot\mathbf{n}^{x}) \qquad (5)$$
The light colour dependent scale terms $\alpha$, $\beta$ and $\gamma$ cancel in Equation 7. Other channel-based or colour constancy normalizations exist for removing dependence on the colour of the light [10, 7, 11, 12].
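A small numerical check makes the channel-based cancellation concrete. This is a sketch under stated assumptions rather than the authors' code: the image is a hypothetical list of three (r, g, b) pixels, and the illuminant change is modelled as three hypothetical per-channel scale factors, as the delta-function sensor model implies.

```python
def channel_normalize(image):
    """Divide every value in each colour channel by that channel's sum.

    image: list of (r, g, b) tuples. After normalization each channel sums to one.
    """
    sums = [sum(pixel[k] for pixel in image) for k in range(3)]
    return [
        # Assumption: an all-zero channel is left at zero to avoid dividing by zero.
        tuple(pixel[k] / sums[k] if sums[k] else 0.0 for k in range(3))
        for pixel in image
    ]


# Hypothetical three-pixel image and hypothetical illuminant scale factors.
image = [(0.2, 0.5, 0.3), (0.6, 0.1, 0.4), (0.3, 0.3, 0.9)]
alpha, beta, gamma = 1.8, 1.0, 0.6
relit = [(alpha * r, beta * g, gamma * b) for r, g, b in image]

norm_original = channel_normalize(image)
norm_relit = channel_normalize(relit)

# The per-channel scale factors cancel, so the two results agree.
for a, b in zip(norm_original, norm_relit):
    print(a, b)
```

Each channel's scale factor appears in both the value and the channel sum, so it divides out. A per-pixel shading factor would not cancel here, which is why this normalization handles light colour but not viewing geometry.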
Of course, to arrive at the simple normalization presented in (7) we had to assume that camera sensors were delta functions. While this need not be true, camera sensors generally behave, or can be made to behave, like delta functions [13].
The Simon Fraser dataset comprises a small database of 13 object images and 26 query images. Query images contain the same objects but viewed under large changes in relative pose and light colour. In Lee and Berwick's image set there are 8 object images and 9 query images. Again, query images are captured under different conditions (viewing geometry and light colour change). The composite set comprises 87 database images and 67 queries.
To test the efficacy of pixel-based normalization we proceed as follows. For all images, database and query, we carried out pixel-based normalization. At a second stage, colour histograms, representing the colour distributions of the normalized images, are constructed. However, because $R + G + B = 1$ by definition, $R$ is not independent of $G$ and $B$, so only the distribution of $(G, B)$ tuples is actually recorded. A $16 \times 16$ partition of $(G, B)$ colour space (whose values lie between 0 and 1) defines the bins for the colour histograms. If $D_i$ and $Q$ denote the histograms for the $i$th database image and the query image, then the similarity of the $i$th image to the query image is defined as:
$$\| D_i - Q \|_1 \qquad (8)$$
where $\|\cdot\|_1$ denotes the L1 or city-block distance between the colour distributions. This distance is equal to the sum of absolute differences of corresponding histogram bins. Reassuringly, if $D_i = Q$ then $\| D_i - Q \|_1 = 0$.
Closeness corresponds to small distances.
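The histogram construction and distance measure can be sketched as follows. This is an illustrative sketch only: dividing bin counts by the number of pixels (so that histograms of images of different sizes are comparable) and clamping values at or above 1 into the last bin are assumptions not stated in the text, and the pixel values are hypothetical.

```python
BINS = 16  # 16 x 16 partition of (G, B) space


def gb_histogram(pixels, bins=BINS):
    """Histogram the (g, b) components of normalized (r, g, b) pixels.

    Returns a flat list of bins * bins relative frequencies.
    """
    hist = [0.0] * (bins * bins)
    for _, g, b in pixels:
        # Assumption: values at or above 1.0 are clamped into the last bin.
        gi = min(int(g * bins), bins - 1)
        bi = min(int(b * bins), bins - 1)
        hist[gi * bins + bi] += 1.0
    n = len(pixels)
    # Assumption: counts are divided by the pixel count to form a distribution.
    return [h / n for h in hist]


def l1_distance(d, q):
    """City-block distance: sum of absolute differences of corresponding bins."""
    return sum(abs(a - b) for a, b in zip(d, q))


# Two hypothetical pixel-normalized images, each of a single uniform colour.
image_a = [(0.2, 0.5, 0.3)] * 4
image_b = [(0.6, 0.1, 0.3)] * 4

hist_a = gb_histogram(image_a)
hist_b = gb_histogram(image_b)

print(l1_distance(hist_a, hist_a))  # identical distributions
print(l1_distance(hist_a, hist_b))  # distributions falling in different bins
```

With relative-frequency histograms the distance ranges from 0 for identical distributions to 2 for distributions that share no occupied bin.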
For each query colour distribution, we calculate the distance to all distributions in the database. These distances are sorted into ascending order and the rank of the correct answer is recorded (ideally the query and the 1st ranked indexed image should contain the same object). Table 1 summarizes indexing performance for all four data sets. Two performance measures are shown: the percentage of queries that were correctly matched in 1st place and the rank of the worst case match. We repeated the same experiment for the colour channel-based normalization. However, after channel normalization the sum of R, G and B at each pixel will not equal 1; rather, each pixel sums, on average, to $3/N$, where $N$ is the number of pixels in the image. To solve this problem, the normalized image is multiplied by $N/3$ before histogramming the $(G, B)$ tuples; this ensures that the average pixel sum is 1. Indexing performance for channel normalization is summarized in Table 2.
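The ranking procedure, the two performance measures and the N/3 rescaling can be sketched together. Everything here is a toy illustration under assumptions: the object names, the four-bin histograms and the pixel values are hypothetical, and ties in distance are broken by database order.

```python
def l1(d, q):
    """City-block distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(d, q))


def rank_of_correct(query_hist, database, correct_id):
    """Sort database entries by ascending distance; return the 1-based rank of the true match."""
    order = sorted(database, key=lambda name: l1(database[name], query_hist))
    return order.index(correct_id) + 1


def summarize(ranks):
    """Return (percentage of queries matched in 1st place, worst-case rank)."""
    pct_first = 100.0 * sum(1 for r in ranks if r == 1) / len(ranks)
    return pct_first, max(ranks)


def rescale_channel_normalized(image):
    """Multiply a channel-normalized image by N/3 so pixels sum to 1 on average."""
    scale = len(image) / 3.0
    return [tuple(scale * c for c in pixel) for pixel in image]


# Hypothetical database of four-bin histograms and two labelled queries.
database = {
    "ball": [0.5, 0.5, 0.0, 0.0],
    "cup": [0.0, 0.0, 0.5, 0.5],
    "shoe": [0.25, 0.25, 0.25, 0.25],
}
queries = [([0.4, 0.6, 0.0, 0.0], "ball"), ([0.3, 0.3, 0.2, 0.2], "cup")]

ranks = [rank_of_correct(hist, database, label) for hist, label in queries]
summary = summarize(ranks)
print(ranks, summary)

# Hypothetical channel-normalized image: every channel sums to one over 4 pixels.
normalized = [(0.1, 0.4, 0.2), (0.3, 0.1, 0.3), (0.2, 0.3, 0.4), (0.4, 0.2, 0.1)]
rescaled = rescale_channel_normalized(normalized)
average_pixel_sum = sum(sum(p) for p in rescaled) / len(rescaled)
print(average_pixel_sum)
```

In this toy run the first query is matched in 1st place, while the second ranks its true match last of three, which is the kind of worst-case rank the second performance measure is meant to expose.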
Clearly, both normalizations support good indexing performance for Swain's dataset, with the pixel-based normalization giving slightly better results. Indeed, we expect this since Swain's images were captured with respect to a single illuminant colour and pose variation between images was small. However, for all other data sets the colour channel normalization provides superior indexing: the percentage correctly matched is always high and the worst case ranking is always low. In contrast, pixel-based normalization performs quite poorly. For the Simon Fraser dataset fewer than half of the queries are correctly identified and the worst case match is 13th out of 13 (which is the worst that is possible!). Similar results are reported for Lee and Berwick's images. The composite data set statistics are perhaps the most interesting. Almost 80% of queries are correctly matched after colour channel normalization compared to less than 60% for pixel-based normalization. Moreover, the worst case matches are respectively 16th and 86th out of 87. While 16th is almost acceptable, 86th is completely unacceptable.
Notice that, for the pixel-based normalization, there are fewer correct matches for the composite dataset than for Swain's database viewed in isolation (51 compared to 64). This is evidence of poor scalability of pixel-based normalization. In contrast, channel normalization supports 58 correct matches for the Swain data set and this scales to 69 for the composite dataset.

Conclusions
Colour is an appealing feature for indexing by image content. We can naturally describe scenes in terms of their relative colour composition; this is readily formulated as a database query, and the query can be executed quickly. Unfortunately, this approach is only reasonable if the colours captured in an image correspond to the colours that we ourselves see. They do not. Image colours depend on viewing geometry (the relative orientation of surface and light source) and on the colour of the light.
It follows that effective image indexing is only possible if we can account for and discount these confounding factors. Fortunately, techniques exist for doing just this. Pixel-based normalization removes dependence on viewing geometry and colour channel-based normalization removes dependence on light source colour. While neither normalization suffices to remove both dependencies, we present experiments which indicate that, for colour-based image indexing, normalizing for light source colour is more important than normalizing for viewing geometry. This observation is at odds with common practice in the image database community, where pixel-based normalization is more prevalent.
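As the colour of the light changes, the values recorded in each colour channel scale by the same factor, independent of surface reflectance and viewing geometry. Let $R$, $G$ and $B$ denote the $n$ values recorded in an image for each of the red, green and blue colour channels. Under a change of viewing illuminant the captured image becomes $\alpha R$, $\beta G$ and $\gamma B$, where $\alpha$, $\beta$ and $\gamma$ are scalars. Colour channel-based normalization removes dependence on illumination colour by dividing each colour channel by the sum of all the values in the colour channel:
$$\left(\frac{\alpha R}{\sum_{i=1}^{n} \alpha R_i}, \frac{\beta G}{\sum_{i=1}^{n} \beta G_i}, \frac{\gamma B}{\sum_{i=1}^{n} \gamma B_i}\right) = \left(\frac{R}{\sum_{i=1}^{n} R_i}, \frac{G}{\sum_{i=1}^{n} G_i}, \frac{B}{\sum_{i=1}^{n} B_i}\right) \qquad (7)$$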

Figure 1. 24 of Swain's object images

Colour Indexing Experiments
We carried out image indexing experiments for the Swain and Ballard [14], Simon Fraser [15], and Berwick and Lee [6] image sets, and for a set of all images combined. Swain and Ballard's image set comprises 66 database and 31 query images. All images are taken under a fixed colour light source and there are only small changes in the relative pose of light and surface. Because the confounding factors in Swain's images are small, we expect good indexing performance for both pixel-based and colour channel-based normalizations. 24 of Swain and Ballard's query images are shown in Figure 1.

Table 1: Indexing performance of pixel-based normalization

Table 2: Indexing performance of colour channel-based normalization