Towards automatic recognition of plant varieties

In crop variety registration, visual scores of a plant’s appearance are widely and routinely used to establish variety differences. This makes variety assessment an ideal candidate for automation through machine vision. This paper outlines a planned system for searching for matches between images from new candidate varieties and those stored in a database of established varieties. Some of the image analysis tools used are described. These include summary statistics derived from 3D histograms of colour components, morphological measures of the shape of leaf tips, principal components of landmark variation and eigenimage scores.


Introduction
Assessing the appearance of plants is an important botanical skill, with many applications, ranging from simple recognition to plant health diagnosis. One application is Distinctness, Uniformity and Stability (DUS) testing, where new varieties are compared to establish differences from existing varieties before they are given official recognition. The aim is to identify consistent differences in respect of one or more characters between existing and potential new varieties. Any aspect of visual appearance is a potential character.
Traditionally, differences in appearance are assessed by measurement or by subjective scoring. The former can be used for simple size and shape measurements, such as lengths and their ratios. The latter is used for more subtle differences that would be difficult or cumbersome to measure. Subjective scoring suffers the inevitable drawbacks that it is tedious to do and subject to inconsistency between observers and over time.
There have been a number of attempts to use automatic methods to assess plant appearance. These have ranged from straightforward use of colour meters [1] to the use of image analysis to extract shape features [2][3][4].
To be of real use in variety registration, a machine vision system would need to: deal with many crop species; use many aspects of plant appearance; be easily extended to include an increasing set of varieties; exploit new developments in machine vision; integrate with non-visual information; and be arranged hierarchically, so that the information called on depends on crop species and earlier decisions regarding broad groups of varieties.
Challenge of Image Retrieval, Newcastle upon Tyne, 1998
Figure 1: Elements of a plant variety matching system. On the left are image analysis methods and algorithms. On the right are the procedures for guiding the user through a variety matching session. These draw on image analysis information.
In this paper we describe how such a system (which we have named Visor) will be set up and illustrate some work on easily generalised image analysis tools that are being used. A broad description of the system is given in §2, with §3 discussing some tools that can be used to measure colour, local shape properties, global shape and brightness features.

A variety matching system

The task
We wish to compare images from plants of unknown variety (hereafter termed 'the candidate') with those of known variety in the database, in order to find close matches. This might be in order to test that a sample is of the variety claimed, to identify a plant of uncertain provenance or to assess the distinctness of a variety claimed to be new. All of these require an answer to the question "What variety in our database does the candidate most resemble?" The appearance characteristics that are useful for this purpose will depend greatly on the species in question. What is appropriate for strawberries will be very different from what we would use for carrots. There will be many parts of the plants we could acquire images from, and many types of measurements we could make on them.
The system also needs to allow for the variabilities of different characters. There will be variability within varieties not just between individual plants, but also due to the year and site at which the plant was grown.

The image database
There are a number of stages in the implementation of Visor. First, the database of varieties is set up, and images of relevant parts of the plants are obtained and stored. This takes time as a full set of varieties is not usually grown every year. At present, carrots and celery are being used as examples in developing the structure of the database.

It is important that, as far as possible, images are obtained of plant parts at a similar growth stage, under standardised lighting, distance from plant etc. Some aspects will be critically affected by such matters (e.g. colour) whereas others (e.g. some aspects of local shape) will be more robust. The system can accept images obtained under non-standard conditions where that is all that is available, and these can be used with a subset of the matching tools.
Although intended as a resource for a machine vision engine, the database is being created to fit in with existing practice, in that it can store non-visual information also used in variety assessment, and it is suitable for browsing by plant scientists who wish to look at images for their own subjective assessments.
The database consists of a simple flat file with each variety being represented by a single record. Each record holds textual and numerical information about the variety, along with pointers to image files held in GIF format. The graphical user interface is based on World Wide Web browsers initially, though it is expected to move eventually to a system using Java. Efficient indexing of visual information will be an important issue, particularly for species with a thousand or more varieties. A possible strategy is the use of hierarchical matching of inexact data.

Image matching
The way a variety matching session will work is shown in Figure 1. A key feature is that a wide range of image analysis tools are available, of which a (possibly quite small) subset will be selected for a given species and broad grouping of varieties. To this extent, the system is hierarchical, but is being designed to be flexible and to accommodate differing degrees of prior knowledge on the part of the user.
The fundamental task is to determine how closely a candidate resembles any variety in the database. This is most easily done using scores: measures of some aspect of each image. If we have several images of a variety (possibly accumulated from a number of sites, or over a number of years) we can easily form mean scores. We can also average scores for a broad grouping of varieties, and for many species it will be convenient to do the matching in two stages.
Let $Z_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, m$, denote the mean score (in the database) of character $i$ for variety $j$. Then, given scores $Z_{ik}$ for a candidate (labelled $k$), we can measure its resemblance to variety $j$ from

$$D_{jk} = \sum_{i=1}^{n} \sum_{l=1}^{n} w_{ilj} \, (Z_{ij} - Z_{ik})(Z_{lj} - Z_{lk}) + T_{jk} \qquad (1)$$

where $T_{jk}$ allows for the inclusion of other information and $w_{ilj}$ are weights, which we allow to be different for each variety, reflecting that the pattern of character correlation and variation may not be consistent. A natural choice for $w_{ilj}$ are the elements of the inverse correlation matrix of the $Z_{ij}$, so that the first part of $D_{jk}$ is the Mahalanobis distance. Evaluating $D_{jk}$ is most efficient when the scores $Z_{ij}$ are uncorrelated, so that $w_{ilj} = 0$ for $i \neq l$. This holds for the shape principal components and eigenimages described in §3. It is likely that we will usually be able to drop the $j$ suffix from $w_{ilj}$ (i.e. using linear rather than quadratic discrimination). We can then also handle some missing scores for the variety or candidate, by replacing the term $(Z_{ij} - Z_{ik})^2$ with its mean over all varieties.
All of the algorithms developed to date are directed towards the score approach. However, other image analysis methods can also contribute, and these form the $T_{jk}$ term in (1). For example, we can consider warping one image to fit the other [7,8]. The amount of warping needed is a measure of how similar they are, and so can be used in matching. Another approach is to use template matching to find structures in both images. Resemblance between the templates provides a matching criterion. A contribution to $T_{jk}$ could also come from non-image-derived information (measurements such as plant height will also be collected) and from any other prior information about the varieties the candidate $k$ resembles.
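As an illustration only, the score-based part of the resemblance measure can be sketched as a weighted quadratic form in the differences between a variety's mean scores and the candidate's scores, plus an optional extra term for other information. The function names, the dictionary layout of the database and the example numbers are assumptions made for this sketch, not part of Visor.

```python
import numpy as np

def resemblance(z_variety, z_candidate, weights, extra=0.0):
    """Weighted quadratic distance between variety mean scores and candidate scores.

    z_variety, z_candidate: sequences of n character scores.
    weights: (n, n) weight matrix, e.g. an inverse correlation matrix.
    extra: additional non-score contribution (the T term), default 0.
    """
    d = np.asarray(z_variety, dtype=float) - np.asarray(z_candidate, dtype=float)
    return float(d @ np.asarray(weights, dtype=float) @ d) + extra

def best_match(db_means, z_candidate, weights):
    """Return (variety name, distance) for the closest variety.

    db_means: hypothetical dict mapping variety name to its mean score vector.
    A single weight matrix is shared by all varieties (the linear case).
    """
    distances = {name: resemblance(z, z_candidate, weights)
                 for name, z in db_means.items()}
    name = min(distances, key=distances.get)
    return name, distances[name]

# Made-up scores for two varieties and one candidate.
database = {"A": [1.0, 2.0], "B": [4.0, 0.0]}
print(best_match(database, [1.5, 1.5], np.eye(2)))
```

With uncorrelated scores the weight matrix is diagonal, so the double sum reduces to a single weighted sum of squared differences.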
We have already tested a number of image analysis tools to generate scores, and shown that they have the ability to discriminate between varieties. The rest of this paper will illustrate a number of these.

Colour
Colour is in some ways the easiest feature to work with. In certain species, the leaves or other plant parts will appear subtly, or sometimes strikingly, different in overall colour between varieties. This can be assessed by looking at the colour histogram, and obtaining some summaries of it. A full description is given in [9].
Fig 2 shows four images each from four varieties of Brussels sprouts. Although reproduced in monochrome (the originals are mainly green), variety differences can be seen. The plant leaf and stems can easily be separated from the surrounding soil by thresholding the green component. The colours of each image were then summarised by obtaining: the average green intensity; the average red intensity; the average blue intensity; the proportion of pixels whose green intensity exceeded 200; and the proportion of pixels whose green intensity exceeded 225.
The latter two were chosen to indicate the amounts of very bright leaf area in each image, usually due to stems or leaf undersides. Using multivariate analysis of variance, it was found that with a sample of 8 varieties, 86% of differences between pairs were statistically significant at the 1% level.
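The five colour summaries can be sketched as follows. This is a minimal illustration under stated assumptions: the soil/plant threshold of 100 is a made-up value (the text does not give one), the proportions are taken over plant pixels rather than the whole image, and the function and key names are hypothetical.

```python
import numpy as np

def colour_scores(rgb, plant_threshold=100, bright_levels=(200, 225)):
    """Five colour summaries for one 8-bit RGB image.

    rgb: (H, W, 3) array with channels in red, green, blue order.
    plant_threshold: assumed green level separating plant from soil.
    bright_levels: green levels used for the 'very bright' proportions.
    """
    rgb = np.asarray(rgb)
    plant = rgb[..., 1] > plant_threshold      # mask of plant pixels
    if not plant.any():
        raise ValueError("no plant pixels found at this threshold")
    pixels = rgb[plant].astype(float)          # (N, 3): plant pixels only
    scores = {
        "mean_green": float(pixels[:, 1].mean()),
        "mean_red": float(pixels[:, 0].mean()),
        "mean_blue": float(pixels[:, 2].mean()),
    }
    for level in bright_levels:
        # Proportion of plant pixels brighter than this green level.
        scores[f"prop_green_gt_{level}"] = float((pixels[:, 1] > level).mean())
    return scores
```

Each image then contributes one five-element score vector, which can be averaged per variety before matching.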

Local shape
Figure 3 shows four carrot leaves, obtained by thresholding a digitised photograph taken in a laboratory. Certain characteristics of the leaves, such as the average number of pinnae (tips) and their shape, differ between varieties. This is a local shape characteristic: it does not matter where the pinna is in the image when we examine its shape, since the leaves are not rigid. The shape of the pinnae is most easily measured by considering the effect of morphological erosions, dilations, openings and closings with discs of different sizes [10]. The proportions of the leaf area removed or added by these operations generate a set of shape scores.
Figure 3: Four carrot leaves, from the same plant
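One of these morphological shape scores, the proportion of leaf area removed by an opening with a disc, might be computed along the following lines. This is a plain NumPy sketch rather than the implementation used in the study; the disc radii and function names are illustrative assumptions, and a library routine for binary morphology would normally be preferred.

```python
import numpy as np

def disc_offsets(radius):
    """Integer (dy, dx) offsets lying inside a disc of the given radius."""
    r = int(radius)
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius * radius]

def _combine(mask, radius, erosion):
    """Shared loop: AND of shifted masks for erosion, OR for dilation."""
    r = int(radius)
    h, w = mask.shape
    padded = np.pad(mask, r, constant_values=False)   # background outside image
    out = np.ones((h, w), dtype=bool) if erosion else np.zeros((h, w), dtype=bool)
    for dy, dx in disc_offsets(radius):
        shifted = padded[r + dy:r + dy + h, r + dx:r + dx + w]
        out = (out & shifted) if erosion else (out | shifted)
    return out

def erode(mask, radius):
    """Keep a pixel only if the whole disc centred on it lies inside the mask."""
    return _combine(mask, radius, erosion=True)

def dilate(mask, radius):
    """Add every pixel within the disc of any mask pixel."""
    return _combine(mask, radius, erosion=False)

def opening_scores(mask, radii=(1, 2, 4)):
    """Proportion of mask area removed by an opening, for each disc radius."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    if area == 0:
        raise ValueError("empty mask")
    return [float((area - int(dilate(erode(mask, r), r).sum())) / area)
            for r in radii]
```

Narrow, pointed tips are removed by small discs while broad, rounded ones survive larger discs, so the sequence of proportions acts as a crude size distribution of the leaf outline.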
It is not difficult to find and count pinnae: we look for leaf edge pixels where the proportion of neighbouring pixels (in a window) that are also leaf reaches a local minimum. The indentations are likewise found, and pinna-to-nearest-indentation distances may be obtained.
Although these counts and shape measures do differ significantly between varieties, they did not prove powerful enough in a discriminant analysis test, and need to be combined with other information. The appearance of carrot roots proved more useful.

Global shape
Carrot root appearance can be assessed from longitudinally sliced roots. The most apparent difference between the carrots is their overall shape, that is, the variation in the outline. These differences are in terms of global shape: the relative positions of different parts of the carrot in the image as a whole are important. The most common approach to studying such variation is to use landmark points [11] around the carrot.
Carrot images photographed in a laboratory were first processed to provide silhouettes. This involved thresholding the grey levels to separate the carrot from background, and removing any small (i.e. non-carrot) connected sets of pixels that remained. We selected 17 positions on each carrot outline: the tip, the top and bottom shoulders of the crown and the two edges at each of seven equally spaced positions between the tip and the midpoint of the crown shoulders. The position of the carrot within the image, and its orientation, are irrelevant. To remove position, all landmarks were measured relative to the centroid of the silhouette. The landmarks were then rotated so that the line joining the tip and the midpoint of the crown shoulders was horizontal.
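The removal of position and orientation can be sketched as below. The ordering of the landmarks (tip first, then the two crown shoulders) and the function name are assumptions for illustration; the silhouette centroid is passed in rather than computed.

```python
import numpy as np

def normalise_landmarks(landmarks, centroid, tip_index=0, shoulder_indices=(1, 2)):
    """Remove position and orientation from one set of 2-D landmarks.

    landmarks: (k, 2) array of (x, y) points on the outline.
    centroid: (x, y) centroid of the silhouette.
    tip_index, shoulder_indices: assumed positions of the tip and the two
    crown shoulders within the landmark array.
    """
    # Remove position: measure everything relative to the centroid.
    pts = np.asarray(landmarks, dtype=float) - np.asarray(centroid, dtype=float)
    tip = pts[tip_index]
    crown_mid = pts[list(shoulder_indices)].mean(axis=0)
    dx, dy = crown_mid - tip
    angle = np.arctan2(dy, dx)                 # current angle of tip-to-crown axis
    c, s = np.cos(-angle), np.sin(-angle)      # rotate by -angle about the centroid
    rotation = np.array([[c, -s], [s, c]])
    return pts @ rotation.T
```

After this step the tip and the crown midpoint share the same vertical coordinate, so remaining differences between landmark sets reflect size and shape only.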
The variation in landmark positions may be studied by looking at their principal components. This has been found useful for other applications, such as electrical components and human faces [12,13]. Fig 4 shows the effect of each of the first six components (mean shape ± 2 SD), which account for 72.0%, 10.8%, 8.2%, 3.4%, 2.0% and 0.9% of the total variability. These were based on 312 carrots from 26 varieties. The signs of components 4 and 5 contribute no useful information, and so only their absolute values are used.
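A principal component analysis of normalised landmarks might look like the following sketch, which uses a singular value decomposition of the centred coordinate matrix. The function names and return layout are assumptions; the plotting of mean shape ± 2 SD is reduced to a helper that returns the displaced shape.

```python
import numpy as np

def shape_pca(shapes):
    """Principal components of flattened landmark coordinates.

    shapes: (n_shapes, k, 2) array of normalised landmarks.
    Returns the mean (2k,), components as rows, the proportion of
    variance explained by each, and the per-shape component scores.
    """
    shapes = np.asarray(shapes, dtype=float)
    x = shapes.reshape(len(shapes), -1)
    mean = x.mean(axis=0)
    centred = x - mean
    # Rows of vt are the principal axes; singular values give the variances.
    _, sing, vt = np.linalg.svd(centred, full_matrices=False)
    variance = sing ** 2 / (len(x) - 1)
    ratio = variance / variance.sum()
    scores = centred @ vt.T
    return mean, vt, ratio, scores

def mode_shape(mean, component, sd, n_sd=2.0):
    """Mean shape displaced by n_sd standard deviations along one component."""
    return (mean + n_sd * sd * component).reshape(-1, 2)
```

Where the sign of a component carries no information, the absolute value of its score can be stored in place of the raw score before matching.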
Landmark principal component scores enjoy the advantage that they do not require prior knowledge of carrot shape for their definition. It is interesting then to note that the first 3 components correspond to scores that variety registration authorities already use. A straightforward modification of the method would be to define components that maximised the between-to-within variety variability.

Eigenimages
Eigenimage analysis considers the principal components of the grey levels of a set of images. It was developed for use with faces [13][14][15], where the results have been termed eigenfaces (which terminology requires us to designate our results 'eigencarrots'). A subset of 156 carrots was used. A first step is to warp all the carrots so that their outlines match. First, a simple linear magnification or reduction was applied to transform the carrots to a common length of 350 pixels. The mean thickness, across the 156 carrots, at each of the 350 horizontal positions was then obtained, and these 1-pixel vertical slices were then centred and magnified or reduced to the mean thickness for that slice.
A 156 × 156 matrix of covariances between all pairs of images was created, and its eigenvectors used to obtain the eigenimage loadings. The loadings for the first six components are shown as images in Fig 5. (Plots of the effect of individual components, as in Fig 4, were not used as the low variability in most components makes it difficult to see their effect in a laser-printer quality plot.) The components accounted for 30.7%, 9.7%, 8.0%, 3.2%, 2.3% and 1.9% of total variation. They are more difficult to interpret than the outline principal components: the first component seems to be a core/cortex contrast, the second is affected by core thickness, the third by the tapering of the core, the fourth looks at the core/cortex boundary brightness etc.
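Working with a small image-by-image covariance matrix, rather than a pixel-by-pixel one, is the usual device in eigenface-style analyses, and it can be sketched as follows. The centring convention (subtracting the mean image over the set) and all names are assumptions of this sketch; the text does not specify how the covariances were computed.

```python
import numpy as np

def eigenimages(images, n_components=6):
    """Eigenimage loadings from the small image-by-image covariance matrix.

    images: (n, H, W) array of warped grey-level images, n much smaller
    than H * W. Returns loadings (n_components, H, W), the proportion of
    total variation explained by each, and per-image scores.
    """
    images = np.asarray(images, dtype=float)
    n = len(images)
    flat = images.reshape(n, -1)
    centred = flat - flat.mean(axis=0)            # subtract the mean image
    small = centred @ centred.T / (n - 1)         # (n, n) covariance matrix
    evals, evecs = np.linalg.eigh(small)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[order], evecs[:, order]
    loadings = centred.T @ evecs                  # map back to pixel space
    loadings /= np.linalg.norm(loadings, axis=0)  # unit-length eigenimages
    explained = evals / np.trace(small)
    scores = centred @ loadings
    return loadings.T.reshape((-1,) + images.shape[1:]), explained, scores
```

For 156 images of 350 pixels in length, this needs the eigenvectors of a 156 × 156 matrix only, and the resulting scores are uncorrelated across components.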
There is little correlation between the outline shape components and eigenimage components. When the two were combined, it was found that by selecting the first 4 or 5 eigenshape scores and 8 eigenimage scores, and using linear discriminant analysis, a classification success rate of 64% could be obtained with 15 variety groupings. If some of these were combined, this could be improved to 85%.

Conclusion
The Visor system is still under development. A prototype database has been created and some crop varieties are now being grown to provide a suitable set of images for inclusion. A number of image analysis tools have been developed and tested, and work on this continues.
Variety testing is carried out in many countries, and a large range of crop species are involved. Most species are tested in several countries. A machine vision system for assessing plant appearance has enormous potential to contribute to this global task. We intend to collaborate internationally with image analysis specialists and plant scientists to ensure that Visor is powerful, flexible and widely applicable.
Figure 5: Carrot eigenimage loadings (panels 1 to 6)

Figure 2: Four varieties of Brussels sprouts, arranged in columns

Figure 4: Principal components of carrot shape variation