Searching for Photos - Journalists’ Practices in Pictorial IR

This paper reports the results of a field study on journalists’ practices in requesting, searching for and selecting photos in the course of their daily work. The study addresses different types of search topics common in journalistic illustration tasks, journalists’ searching behaviour and the criteria they apply in selecting photos. Data were collected by observing journalists in their work and interviewing them. A sample of requests received by the archive was also analysed. The results indicate that specific needs dominate the use of newspaper photo archives. Photos of objects, themes, or abstract topics expressed in general terms were also needed, but finding them and formulating queries in these cases especially was considered problematic. The results suggest that browsing is an essential strategy in accessing digital photo archives. Journalists tend to browse but the present archive systems support browsing poorly. The paper concludes with suggestions for the improvement of end-user access to photo archives. The possible applications of current feature-based indexing and retrieval methods in the newspaper photo archive are discussed in the light of the results.


Introduction
Photojournalistic work is changing with the new applications of digital technology in newsrooms [14]. Photo retrieval, manipulation and lay-out design can be done at personal workstations in the networked environment. The main tasks of photo archivists are to select and index photos for the archive, while searching has shifted almost totally to end users. Journalists, graphic designers and other editorial staff are expected to exploit digital photo collections directly [see 25]. This paper deals with end-user behaviour in searching a digital newspaper photo archive. Digital photo archives are a new phenomenon as operational systems and a fairly untouched area of research. Hence, the picture of end-users in searching, selecting and using photos is hazy.
On the one hand, the past research on image retrieval has concentrated on outlining a theoretical framework for conceptual indexing of images performed by humans [9-10, 15, 22-23]. On the other hand, some studies have categorised and analysed search requests received by the mediators [4-5, 11, 18-20]. Only a few papers have focused on searching behaviour [e.g. 1]. Probably the largest body of research deals with the development of automatic feature-based indexing and retrieval methods for image databases [for reviews, see for example 2, 3].
Only a few user-oriented studies have been reported to date and typically they are based on analysing photo requests received by a particular manually operated archive. Various typologies have been applied in categorising image requests. Keister [11] classified requests received by the archive of the National Library of Medicine into two groups: 1) visual requests, in which the user constructs the image by defining what should be seen in it and 2) topical requests with non-specific visual requirements. She found that one third to one half of the requests were of the first type.
Enser [4] categorised almost 3000 requests received by the Hulton Deutsch Collection, which is a general commercial image collection including several distinct sub-collections. In Enser's typology, requests fell into two main categories: 1) unique, containing requests for specific entities, events, locations and 2) non-unique, containing requests formulated in terms of generic concepts. Most requests fell into the unique category (69%) and this was particularly the case among the requests from newspaper and magazine publishers (70%). Requests in both groups were refined with specifications concerning time, location, event or technical attributes.
Ørnager [18-20] studied requirements for indexing and retrieval in newspaper photo archives. The 15 archives concerned were of the paper bag type. Requests received by archivists were studied by interviewing archivists and by observing journalists formulating their requests. According to the archivists, one half of requests were simple, concerning persons, and another half dealt with themes. Ten per cent of requests were defined as complex, requiring an in-depth interview. Ørnager proposes a user typology based on observing the journalists giving requests to the archivists. The typology includes specific, general, storyteller, storygiver and only-size-matters inquirers. Ørnager concludes that the interests of the users are difficult to define. Newspaper image archives hold pictures from wide subject domains and journalists' subject fields are all the topics in the world.
Batley's [1] research on searching behaviour was conducted in an experimental environment. A database containing 950 photos was created on videodisc for research purposes. Searchers could use three strategies: 1) random browsing by using a joystick, 2) specific browsing, where the user first selects a subject area from the menu and then browses a group of related photos and 3) searching with keywords selected from a keyword list. Research subjects (41) were given different types of tasks. The study established that the use of browsing increased as the specificity of given tasks decreased. Keywords were used most in searching for images for specific tasks (e.g. the River Dee). For general needs (e.g. a ruined castle) all strategies were applied almost equally frequently. Abstract (e.g. a busy street scene) or subjective (e.g. a pretty scene) tasks mostly led to random browsing.
The aim of the present study was to form a general picture of the journalist's searching behaviour in a digital newspaper photo archive. Searching for photos in the newspaper environment is an integral part of the journalistic work process. To specify, the goals of the preliminary study were to clarify
1. the characteristics of illustration tasks and work tasks related to illustration
2. the attributes of photos of interest to journalists when searching for photos themselves or requesting photos from the archivists
3. the selection of photos and the selection criteria applied (i.e. the relevance assessments made by the journalists in the work task situations)
4. end-user behaviour in searching for photos.
This was a preliminary study in a project developing indexing and retrieval methods for digital photo archives. The ultimate goal was to develop ideas for potential applications of feature-based indexing and retrieval methods in newspaper photo archives.
We also studied the archiving and indexing practices in the newspaper archive by examining the archiving processes including photo selection, the attributes of photos archivists indexed, sources of information they exploited in indexing and descriptions attached to photos by photo agencies and photographers. These results are reported elsewhere. This paper will focus on the results concerning the needs for photos in journalistic illustration processes, searching behaviour of journalists and the photo selection criteria they apply. The possible applications of feature-based indexing and retrieval methods in the newspaper photo archive are also discussed.

Study Environment
The study was conducted in Aamulehti, the second largest daily newspaper in Finland. In Aamulehti, the digital photo archive came into use in spring 1996. The archive is quite new and it contains only the newest photos. The paper bag and on-line bibliographic archives containing older material are still used to some extent. Photos from these old archives are scanned for the digital archive if they are published.
The archive is divided into a photo cache archive and a permanent archive. Photos are scanned by photographers or delivered by photo agencies into the cache archive where they are kept for three weeks. About 20% of all photos stored in the cache are archived in the permanent archive. This includes all published photos and selected non-published photos. The selection is done mainly by archivists. In 1996, some editorial sections took part in the selection but this interest declined in 1997, even though journalists criticised the selections made by the archivists.
In Aamulehti, the photos for the newspaper issue are selected by journalists, sub-editors, lay-out designers and other editorial staff (henceforth referred to as journalists). Searching for and selection of photos is usually embedded in other tasks (e.g. writing articles, designing lay-out). All editorial staff have access to the digital photo archive at their terminals and usually they conduct searches themselves. A few journalists do not yet use the digital archive and sometimes the active end-users also send requests to the archivists after unsuccessful retrieval attempts or when older photos are desired.
The storage and retrieval system in use is NewsLink¹, which is based on the TRIP fulltext retrieval programme. The text archive containing published newspaper articles runs on the same system and came into use a few years earlier than the photo archive. This may have affected the searching styles adopted by journalists in searching for photos. The photo archive is integrated into the lay-out programme so that photos can be called up straight to the page under construction.
Each record in the archive consists of the high-resolution image, a low-resolution copy of it and a structured set of textual descriptions. The textual descriptions include in total 39 fields for technical information (e.g. format, size), for information on photo source (e.g. photographer, photo agency, shooting place and date), for archiving data (e.g. record creation date) and for publishing (e.g. dates, sections) and subject information (e.g. caption, additional free text description, person and organisation fields). A thesaurus² is available for indexing and searching for themes. The same thesaurus is also used in the text archive and it was designed in the first place for indexing newspaper articles.
In addition to the indexing by newspaper archivists, the captions written by photo agencies form an important part of a photo's subject description. Captions usually describe what the photo represents quite concretely (when, where, who, what) and the news event the photo is associated with. Most international photo agencies produce this information in a standard way. For example, a typical caption may contain the following: [Gramm 1996] FILE - Republican presidential contender Sen. Phil Gramm, R-Texas, spends a solitary moment on his plane at the airport in Baton Rouge, La., Tuesday night, Feb. 6, 1996. Battered by back-to-back defeats in Iowa and Louisiana, Gramm called top supporters around the country on Feb. 13, and told them he would quit the race on Wednesday.
NewsLink supports Boolean queries in the fields of textual descriptions. Searchers can use either a simple or an advanced search form (Figure 1). The simple search form provides a field for entering Boolean queries and a menu for selecting time limits (record creation date), which makes it easy to restrict the search to the newest photos only. In the advanced search form, searchers can exploit all searchable fields of textual descriptions. This includes theme classification organised into three-level on-line menus.
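The difference between an unfielded query, a fielded query and a date-limited query can be illustrated with a small sketch. It is not NewsLink or TRIP syntax: the record layout, the field names `caption` and `persons`, the `search` function and the sample records are all hypothetical and chosen only to show the principle.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PhotoRecord:
    """Hypothetical, heavily simplified stand-in for an archive record."""
    caption: str
    persons: list[str]
    created: date  # record creation date


def field_text(record, field=None):
    """Return the lower-cased text of one field, or of all fields if none is given."""
    if field == "caption":
        return record.caption.lower()
    if field == "persons":
        return " ".join(record.persons).lower()
    # Assumption: an unfielded query looks at every textual field.
    return (record.caption + " " + " ".join(record.persons)).lower()


def search(records, must=(), must_not=(), field=None, created_after=None):
    """Boolean AND / NOT matching with an optional field and date restriction."""
    hits = []
    for record in records:
        if created_after is not None and record.created < created_after:
            continue
        text = field_text(record, field)
        if not all(term.lower() in text for term in must):
            continue
        if any(term.lower() in text for term in must_not):
            continue
        hits.append(record)
    return hits


# Invented sample records, for illustration only.
SAMPLE = [
    PhotoRecord("Tom Cruise at a film premiere", ["Tom Cruise"], date(1997, 3, 1)),
    PhotoRecord("The actress wife of Tom Cruise arrives at a gala",
                ["Unnamed actress"], date(1997, 3, 2)),
    PhotoRecord("Tom Cruise on set", ["Tom Cruise"], date(1996, 6, 1)),
]

print(len(search(SAMPLE, must=["cruise"])))                    # unfielded query
print(len(search(SAMPLE, must=["cruise"], field="persons")))   # person field only
```

In this toy data the unfielded query also matches the record whose caption merely mentions the name, whereas the query restricted to the person field does not; adding `created_after` narrows the set further to recent records.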

Data and Methods
The data for this study were collected in the Aamulehti archive and newsroom in summer 1996 and spring 1997. During the study period the number of photos in the archive increased from 21 000 to 83 000. In summer 1996 many journalists were still familiarising themselves with the archive and requests were quite often sent to the archivists. The archivists estimated that in spring 1997 the rate of requests had decreased to one third of that in summer 1996.
The data concerning illustration were gathered by observing journalists in their work and by interviewing them. In addition, a sample of photo requests received by the archivists was analysed. Different methods were used to produce complementary data. This is considered to improve the validity of the research [see e.g. 6].
The observation was conducted in the news sections in the evenings, when most of the photos are needed. In the current affairs sections the time for observation had to be fixed beforehand with the journalists, because the illustration processes were conducted by several journalists in different rooms. Journalists' working days are busy and unpredictable and this characterised the data gathering [see also 17]. Most of the time spent in the observation phase was "hanging around" and watching everything that happened in the newsroom. The illustration tasks journalists conducted took only a small part of their working time and their daily schedules were difficult to anticipate. However, the time spent in the newsrooms was useful in understanding the context of the illustration: the different work tasks and the flow of the work in the editorial office.
The observation was conducted in a participative manner. The observer followed journalists' work in their offices. The actions taken during observation were written down. The journalists explained what they were doing and the observer asked questions if it was necessary. The most problematic area on which to get data was selection criteria, which journalists obviously found difficult to explain. Otherwise the journalists were quite eager to describe their work.
The main focus was on the illustration processes. The journalists were asked to characterise their illustration tasks if they did not do this spontaneously, which was most common. When the journalists commented on the illustration tasks, created ideas for illustration and conducted searches, their utterances were tape-recorded. Query statements made were written down. During the observations it became clear that the query statements did not reveal much about actual photo needs. Users sometimes composed restrictive queries based only on the photo source and date. Since, from the observer's point of view, the queries sometimes seemed to have only a tenuous relationship to the subject of the search, i.e. what the journalists said they were looking for, more attention was given to obtaining journalists' descriptions of their search topics. When the journalist selected the photo to be published, which might happen much later than the actual search, tape-recording was again used.
The processes of 20 illustration tasks conducted by eight journalists were observed from the creation of ideas to final selection of photos to be published. The illustration processes observed related to the eight editorial sections displayed below. The division of sections follows the practice commonly used in newspapers.

Editorial sections                                                       Illustration processes (N=20)   Search topics (N=27)
News sections: economics, foreign affairs, front page, sports            7                               8
Current affairs sections: culture, current affairs, Sunday supplement    7                               10
Pull-out supplement (deals mostly with TV, films and music)              6                               9

Table 1: Number of Illustration Processes Observed and Search Topics Occurring in Different Editorial Sections
In total, 27 search topics originated from 20 illustration tasks because some tasks involved different searching ideas. A search topic is defined here as a search for photo(s) concerning one illustration idea expressed by a journalist. For example, if a journalist said that a photo of Bill Clinton and a photo on the theme of nuclear power were optional photos for an article, these were considered as two search topics. If a journalist said that (s)he was looking for something about the Russian Mafia and tried search keys like St. Petersburg and contract killings, we considered there to be only one search topic, even if the searcher tried to find photos on different topics.
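The totals quoted above can be checked against the rows of Table 1 with a few lines of code. The sketch assumes that, in each row, the first figure is the number of illustration processes and the second the number of search topics; the dictionary keys are shortened section labels.

```python
# Rows of Table 1 as (illustration processes, search topics).
# Assumption: the first figure in each row counts processes, the second topics.
TABLE_1 = {
    "News sections": (7, 8),
    "Current affairs sections": (7, 10),
    "Pull-out supplement": (6, 9),
}


def totals(table):
    """Sum both columns of the table."""
    processes = sum(p for p, _ in table.values())
    topics = sum(t for _, t in table.values())
    return processes, topics


processes, topics = totals(TABLE_1)
print(processes, topics)  # 20 27
```

Under that reading the rows add up to the 20 illustration tasks and 27 search topics reported in the text, i.e. somewhat more than one search topic per task on average.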
The recorded tapes were transcribed and combined with the written notes to obtain a complete description of illustration processes. Every illustration process was then drawn as a continuum of moves or steps taken by a journalist. Even though there were differences in the illustration processes analysed resulting from the individual styles of journalists and from the characteristics of the illustration tasks, common patterns could be identified.
The aim of the theme interviews was, first, to check if the observations were in line with the views of the subjects. Second, the interviews were designed to help in explaining the observed searching behaviour. Third, the journalists had the opportunity of expressing their views on the archive. Interviews with three journalists (a group discussion with two journalists and a separate interview with one journalist) were conducted.
A total of 108 requests sent to the archive were collected by questionnaires, which the archivists filled in as they received the request. The requests were mostly given by telephone. In all, 49 requests were gathered in 1996 and 59 in 1997. The objective of the analysis of requests was to clarify the subjects of interest to journalists, bearing in mind that they express only the compromised needs [see 24] of journalists. In addition, because the requests were recorded by archivists they may not be in the original form as expressed by the journalists.

Photo Needs and Searching Behaviour of Journalists

Characteristics of the Illustration Processes
In Aamulehti, illustration is integrated into writing, editing and lay-out tasks. The journalists' work is characterised by tight schedules and unexpected changes in daily plans. We found that in daily routines, they did not seem to have much time to find "the best" photo. Rather they tended to make acceptable selections. The effort devoted to illustration, generating and evaluating ideas, searching for candidate photos and selecting the one to be published depended on time available and on the status of the photo on the page.
The observed behaviour indicated that the journalists often had many ideas in mind when they were looking for an illustration. The illustration of an article is a fairly open task, even though a photo is sought and selected for a particular article. Broader considerations such as the section and page lay-out also restrict the options for illustration. However, photos are often rich in elements and they may be used in various contexts and in many ways. As Ørnager [18-20] observed, photos are not always tied to the subject of an article. The function of the photo may be also to evoke associations. Furthermore, a photo may be made to fit the article by using only selected parts of it or by creating the associative link between the text and the photo in the caption [e.g. see 12]. In some cases the empty space on page can be filled by any "neutral" photo, text or graph.
There appeared to be differences in illustrating news articles and feature articles³. The differences were based on two dimensions: time span and objectivity. Speed is essential in reporting news events and photos of these events are needed urgently. Photos used to illustrate news are mostly current documentary photos and there is little time or space for developing illustration ideas. The main point is that the photo should represent the particular event that the article is reporting. These photos, when searched for in the archive, are searched for in the cache archive, to which photo agencies send photos relating to daily news. Feature articles are typically less date dependent and more subjective. They often provide more opportunities for illustrations than news articles. Symbolic photos⁴ and photos of themes are often used to illustrate feature articles. Portraits are the most common photo type in a newspaper issue. According to journalists' comments the portrait is the easiest and quickest, but often the most boring way to illustrate an article.
After creating ideas, the journalist either searches for the photo in the archive or gives a request to the archivist. (A model of the illustration process is given in Figure 2.)

Photo Needs
The main focus of requests sent to archivists and of search topics originating in the illustration tasks observed fell into four main categories:
1. concrete objects
2. themes interpretable from the photo
3. background information of the photo
4. known photo
The first category deals with photos of concrete objects like people, buildings or places. In the second category the main focus of journalists is on themes or abstractions interpretable from the photo. The third category deals with the photo's background information. Journalists are looking for photos that document something. Topics like specific news events and films and television programmes were included in this category. The fourth category deals with a known photo or series of photos, which are usually searched for or requested by publishing time, shooting date, place or the photographer.
Most requests received by the archivists were for photos of objects and presented in the form of proper names (Table 2). Almost half of all requests were for persons (e.g. "Mr. Olli Keskinen, face"). Other specific requests were mostly for buildings (e.g. "the Alexander school building, outside view") and geographic locations (e.g. "Rauma Old Town"). Some 20% of requests were made in the form of common nouns referring to objects. These objects were various, most frequently animals ("cow in the pasture"), vehicles ("a good photo of the front part of a bus") and people ("clergyman wearing bands"). Some 8% of requests were for themes (e.g. "gesture language"). Some of these requests were constructed by examples (e.g. "a photo of the savannah in Africa, a lion, giraffe or other wild animals or of the Serengeti" or "symbolic photos of medicine, for example a snake or diseases"). Some 8% of requests focused on the photo's background information. Half of these concerned films and television programmes searched for by the titles of films or names of directors. The other half were for news events, for example, "a photo of the repatriation of Russian war prisoners to the Soviet Union after the counterattack" and "a photo of the completion of the Näsineula scenic tower in 1971". Three requests were for a particular photo or series of photos. For instance, a journalist requested "the photos taken in Hervanta suburban market place last summer".

Table 2: Distribution of Requests Received by the Archive
The distribution of requests is in line with Ørnager's [18-20] results, where half of the requests were for persons (here 45%). To compare our results to Enser's [4-5] results, some classes must be combined. Enser's 'unique' category corresponds to objects expressed by proper names, background information and known photos. The 'non-unique' category corresponds to objects expressed by common nouns and themes. The results seem to support Enser's findings: 'unique' requests account for 70% of all requests.
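The combination of classes described above amounts to a simple mapping from the categories of this study to Enser's two categories. The sketch below shows only that mapping; the category labels are shortened, the function name `enser_tally` is hypothetical, and the counts in the usage example are invented placeholders, not the figures of Table 2.

```python
# Mapping of this study's request categories to Enser's two categories,
# as described in the text. Labels are shortened for the sketch.
ENSER_CLASS = {
    "objects, proper names": "unique",
    "background information": "unique",
    "known photo": "unique",
    "objects, common nouns": "non-unique",
    "themes": "non-unique",
}


def enser_tally(counts):
    """Add up request counts per Enser category."""
    tally = {"unique": 0, "non-unique": 0}
    for category, n in counts.items():
        tally[ENSER_CLASS[category]] += n
    return tally


# Placeholder counts for illustration only (NOT the study's data).
placeholder = {
    "objects, proper names": 4,
    "background information": 1,
    "known photo": 1,
    "objects, common nouns": 3,
    "themes": 1,
}
print(enser_tally(placeholder))
```

Dividing each tally by the total number of requests gives the shares that are compared with Enser's percentages.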
More than half of the requests were refined [see 4] by some technical or contextual criteria or criteria relating to the subject of the photo. The most frequently used criterion was colour, reflecting the fact that old archives also include black and white photos, which are seldom wanted nowadays. The creation year of the photo was used frequently, as well as the expressions "current shots" or "old photos". A few requests were refined with cyclic time expressions, for example "a summer photo of Lake Näsijärvi". However, according to the archivists the season of the year is an implicit criterion. Even though archive photos are used, the journalists want to give an impression of actuality. Hardly ever are photos of other than the current season published. Shooting distance was also a common criterion and close-ups were most often desired. Horizontal or vertical direction of a photo was an uncommon criterion.
Refinements relating to the subject of a photo were attributes of main objects (e.g. "fish, extremely big", "judge wearing a wig"), places involved ("cow in the pasture") and action taking place (e.g. "the cork of the champagne pops"). The expression "symbolic" was also used frequently, especially in the category of themes (e.g. "a symbolic photo of torture"). In all, criteria varied from concrete to highly subjective. For instance, one journalist requested "a photo of a fur animal who doesn't want to relinquish its fur, wild or in captivity"!
The comparison of requests and photos finally published in the newspaper showed that not all the criteria journalists expressed in their requests need to be met to get the photo published. Furthermore, the comparison revealed that photos may be used in various ways, for example, as a model for a drawing, which was not expressed in the requests.

Distribution of Search Topics in Observed Illustration Processes
The search topics occurring during the observed illustration processes were categorised in the same way as requests (Table 3). Most search topics (56%) concerned objects and especially persons, like the actress Julia Roberts. The category 'other objects' included only two search topics: a named building (the University of Tartto) and a named jazz band. Five topics fell into the category of themes, for instance, "holidays in the south". The seven search topics in the category 'background information' mainly concerned news events.
The small share of requests concerning news events (only one of current news in the sample) suggests that these photos are quite easily found in the archive (mainly among the daily photos in the cache archive) by the end users themselves. On the other hand, the journalists observed did not search for photos of objects defined by common nouns although one fifth of requests fell into this category. A few journalists expressed interest in such photos in the idea stage but they did not put these ideas into practice. Named persons were requested and searched for at an equal rate.

Table 3: Distribution of Search Topics in Illustration Processes Observed
The journalists interviewed stated that the types of search topics we defined are common and that they had had such search topics recently. However, in the interviews the journalists emphasised photos of abstract themes. These received much attention, presumably because journalists consider searching for these problematic while searching for photos of specific objects or news events was regarded as quite simple. The journalists also mentioned topics that are quite subjectively interpretable from photos. These concerned atmosphere and feelings, e.g. "love" or "a photo of a child's anxiety". Such topics were not found among the requests sent to the archivists. Their absence may be explained by the journalists' statements that it is easier to search for this kind of photo by themselves than to try to explain the topic to the archivist and "after an hour you get a stack of photos which are totally wrong".

Searching Behaviour
"There may always be a better photo beyond the next click"

Different types of search topics produced different kinds of searching behaviour. General search topics easily led to multiple sessions, various queries and heavy browsing. Specific needs were more likely to lead to just one or two querying and browsing sessions. However, heavy browsing was necessary when the retrieved sets were large, e.g. 300 photos of Bill Clinton.
When a search consisted of multiple sessions, each session was an attempt to find useful images using a particular viewpoint or strategy. We found that the journalists did not pay much attention to selecting search keys or formulating a query. Different options or approaches recognised by the user were tested on a trial-and-error basis. The first search keys were often picked from the article to be illustrated. The journalists tended to make single-word or single-phrase queries. Both English and Finnish were used in searching. Because the number of retrieved thumbnails was usually large, it was common to restrict the query by date or by photo source. Use of date also reflects the fact that current photos are most desirable for the newspaper. Restricting a query by the photo source relates to cost factors. Different photo sources have different charging policies.
Most recorded queries were based on proper names of persons, countries, cities or buildings. Proper names were commonly used for searching for photos of given persons, places or buildings as well as for photos of news events. These were usually searched for and found by the scene of the event or by the participants' names.
Journalists also tended to convert general photo needs into more tangible queries. The journalists interviewed confirmed that they preferred proper names to terms referring to abstract concepts. They explained that it was easier to find photos through the names of persons, places or events and that sometimes they could not find photos when applying abstract concepts. The selection of search keys for abstract concepts was considered difficult. Journalists presumed that there were photos relating to these abstract themes in the archive, but they just had not discovered the right search keys to retrieve them. Sometimes they suddenly ran across photos of themes they had searched for before; sometimes they could not find a photo they had recently seen in the archive. Journalists claimed that they did not even consider illustrating with photos of themes when they did not have time for a lengthy searching process.
Journalists described the alternative search strategies applied to find photos for general photo needs. They might try to remember earlier published photos of the current theme and search for a known photo (or a photo in the same series) with search keys relating to the subject of the article. Another strategy was to query by words relating to the background information of a photo. For instance, one of the journalists interviewed explained that he used search keys like 'rubber currency' or the names of some Indian tribes to find a photo of a rain forest.
Browsing was used a lot since single-key queries often retrieved large sets of thumbnail images. Journalists claimed that the general search keys and heavy browsing reflect the conception that very narrow queries exclude the best photos from the set. Journalists stated that they would rather have a set of 50 thumbnails than a set of five thumbnails. They also found that browsing often required less effort and time than formulating a refined query.
However, heavy browsing was not always considered a desirable choice. Sometimes journalists did not find any other way to locate a suitable photo. For example, two journalists mentioned searches for photos connected to places. These journalists had recently illustrated feature articles, the first of which was about travelling by hot-air balloon in Africa while the second dealt with Britain and British people. Both journalists had made queries of the same type: "africa" and "britain". The first journalist got over 500 thumbnail images and did not have time to browse through them. The journalist querying "britain" got some 2000 thumbnail images and, because he had time, he browsed through them. The result set of the latter searcher included, for example, all photos shot somewhere in Britain and most of these were obviously irrelevant.
The browsing threshold varied according to the journalist and the work situation: most journalists reformulated the query (usually by restricting by date) when the retrieval set was over 100 photos. Willingness to browse was dependent not only on time, but also on motivation. When a journalist truly wants to find the perfect photo for a particular article (s)he might browse a surprisingly large number of thumbnails.
Browsing was particularly common for journalists working in the foreign affairs and sport sections. These sections used a lot of photos of news events sent to the cache archive by international photo agencies. Journalists browsed the cache every now and then to check the newly arrived photos. Browsing was a way to gain insight into the daily photos. In news sections it was also common that queries became more specific during the day, because journalists learned the words by which the desired photos were described in the captions.
During each session, one or more candidate photos were selected. They were printed on paper or kept in mind and re-retrieved later. Candidate photos were compared and finally one was selected. The final selection might occur much later than the search and it was not always done by the person who first selected the candidate photos. One journalist explained his selection method as follows: "If I have some six photos, I may put them side by side on the table and leave them there for the rest of the evening (...) then it just turns out that this is the one. In a way they drop out automatically."
Journalists did not exploit all the options the system offered for searching. First, they did not exploit the database field structure, which is one reason for the large and unfocused result sets. For example, the journalist searching for photos of Tom Cruise got as many photos of Cruise's actress wife as of the actual search topic. Second, journalists did not exploit the thesaurus, which led to problems when they were searching for photos of themes. The journalists could not find terms used in indexing and therefore missed relevant photos. The utilisation of database field structure and thesaurus required the use of the advanced search form, which the journalists considered too complicated to use.
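The effect of ignoring the field structure can be sketched with a small example. The records, the field names ("persons", "caption") and the placeholder name "Actress X" are hypothetical and do not reproduce the schema of the system studied; the sketch only contrasts a free-text match over all fields with a match restricted to the field naming the persons depicted.

```python
# Hypothetical archive records; the field names are assumptions.
photos = [
    {"id": 1, "persons": ["Tom Cruise"],
     "caption": "Tom Cruise at a film premiere"},
    {"id": 2, "persons": ["Actress X"],
     "caption": "Actress X, wife of Tom Cruise, on a film set"},
    {"id": 3, "persons": ["Tom Cruise", "Actress X"],
     "caption": "The couple arriving at an awards gala"},
]

def free_text_search(records, term):
    """Match the term anywhere in the record, as a single-key query does."""
    term = term.lower()
    return [r["id"] for r in records
            if term in r["caption"].lower()
            or any(term in p.lower() for p in r["persons"])]

def field_search(records, field, term):
    """Match the term only within one named field of the record."""
    term = term.lower()
    return [r["id"] for r in records
            if any(term in value.lower() for value in r[field])]

all_hits = free_text_search(photos, "tom cruise")            # ids 1, 2 and 3
person_hits = field_search(photos, "persons", "tom cruise")  # ids 1 and 3
```

The free-text query also returns the photo in which the search topic appears only in the caption, whereas the field-restricted query returns only the photos in which the person is actually depicted.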

Selection Criteria
One goal of the study was to explore the criteria journalists apply in the selection of photos, i.e. when assessing the relevance of retrieved photos. It is widely agreed that relevance is a multidimensional concept and that the user's relevance criteria are situational and dynamic in nature [e.g. 21]. Topicality is identified as one criterion among others. However, it can be seen as the core of relevance, the first step users take in their relevance judgements [8].
This study indicates clearly the diversity of the relevance criteria which journalists apply in selecting photos and the situational nature of their relevance judgements. The first criterion journalists applied was topicality. As stated earlier, the queries made by the journalists were often too general to properly restrict the set of photos in terms of topicality. The associated caption text seemed to be the most important source of information in judging the topical relevance: even though the photo looked relevant, the journalist needed to know what was really happening in the photo and what its background was.
After the topical relevance was ensured, journalists applied more criteria. The technical and contextual attributes of the photo were assessed. Some of these criteria were applied quite generally. For example, the preference for technically good, not recently published and current photos (sometimes, though, the journalists were looking particularly for old photos) is likely to be common for most journalists. The cost of the photo was also an important criterion. It is dependent on the photo source, which the journalists usually checked when rating photos.
The selection criteria relating to the visual attributes of photos were closely connected to the individual illustration tasks. Sometimes the article type and style demanded photos in a particular style. For example, anniversaries, appointments, etc. were illustrated with "passport photos", as journalists called formal portraits. In some cases the persons in the photo should not be recognisable. Often the message the journalist wished to convey through the photo was stated explicitly as a selection criterion. Journalists felt that these criteria were quite difficult to verbalise. They explained their choices either abstractly "the photo provokes thoughts" or more concretely "because there are fleeing people" or "the photo underlines that she (Sharon Stone) is a star". They used expressions like dramatic, surprising, effective, shocking, funny, expressive, humanity and threat as explanations for their selections. Photos of persons were selected, for example, on the following basis: "lively", "attractive look", "funny gesture" and "you can hear what they think".
The journalists paid much attention to the freshness of the photos. They constantly rejected photos by stating that they were "typical". According to journalists, typical photos are of "politicians and handshaking", they are "portraits" and photos where "persons pose".
In the last selection phase, when the journalist selected the photo to be published from the candidate photos, the selection was based solely on the visual attributes of the photo. At this point, candidate photos were supposed to be topically, technically and contextually acceptable. The aesthetic attributes, for example colour and composition, were also said to play an important role at this stage. The critical criteria for rejecting or accepting a photo depended on earlier selections. A photo already chosen on a page or nearby pages restricted the possibility of using other similar photos. Photos used recently in the newspaper were also kept in mind. According to journalists the goal is to make the illustration of the page attractive, balanced and dynamic. To achieve this there should be photos of different types (horizontal and vertical photos, portraits, group photographs, action, themes...) and with different visual features. Even small details mattered. The direction of a man's movement or look in a photo could lead to the selection of one photo and to the rejection of another.
The criteria and the importance of different criteria seem to depend on the work situation. The factors affecting the selection criteria are related to the article, the lay-out, the page as a whole, the section and its illustrative style, the whole newspaper and its editorial policy and the ethical rules journalists follow. When asked, the journalists interviewed could mention only one criterion which is always crucial when selecting photos: the technical quality of the photo.

Conclusions and Discussion
The objectives of this study were to ascertain the requirements for a digital newspaper photo archive where journalists conduct searches by themselves. The whole illustration process from the creation of ideas to the selection of photos was investigated. This made it possible to ascertain the type of photo needs typical for journalists, their searching behaviour and the criteria they apply in selecting photos. As far as we know this is the first published study where end-users of a digital photo archive were investigated while searching for and selecting images in real work situations. The study was made comprehensive by exploring entire illustration processes as well as tasks and work situations related to these processes. On the other hand it consisted of a few unique cases in one newspaper, which makes us cautious in making generalisations. This preliminary study outlined an overall picture of a vast field, though much remains for further research projects.

Needs for Photos
The analysis of photo requests sent to the archivists and topics searched for by the journalists themselves emphasised the role of photos representing concrete objects like named persons, buildings or places. The results support the findings of Enser [4][5] that specific photo needs dominate the use of photo archives. In end-user searching, the share of topics dealing with recent news events and abstract themes was larger than in requests sent to the archive. On the other hand, the journalists did not search for photos of concrete objects defined by common nouns even though a fifth of the requests were of this type.
The interviews provided some explanations for the searching and requesting behaviour adopted. Photos of named persons are often needed. The journalists considered proper name queries easy to do. This is obvious, since names (especially the names of persons) are quite specific and standard search keys and they are already indexed exhaustively in captions. Occasionally proper name queries retrieve excessively large thumbnail sets because newspaper archives typically hold many photos of certain public figures. Photos of current news events are easily found since these photos are searched for from the small cache archive. The ease of proper name querying may also explain the tendency to use proper names as search keys for photo needs not directly dealing with named objects.
The journalists emphasised the importance of photos of themes and would have liked to use them more. Searching for photos in this category was regarded as difficult. However, the share of searches focusing on themes was greater in self-made searches than in requests sent to the archive. Especially if the atmosphere or feelings associated with a photo were essential in an illustration task, the journalists expressed lack of confidence in mediated searches. The greater share of self-made theme searches may also be due to the iterative nature of end-user searching [see also 7]. When journalists explain their needs to archivists the search topics become more tangible, while searching by themselves makes it possible to be more explorative and rely more on browsing.
To find photos of generic objects (for instance, anonymous persons, animals, plants) the journalists seemed to trust the help of archivists more, since they commonly requested such photos from the archivists. In the illustration processes observed a few journalists expressed ideas concerning such photos but they did not ultimately put these ideas into practice. An explanation for the difference in trusting the archivists is that the problem in searching for photos of concrete objects is mainly associated with guessing the right search keys. The archivists are more likely to succeed since they selected and indexed the photos in the collection. In searching for photos of themes, and especially when symbolic value, atmosphere and feelings are essential, the nature of the illustration task and the way the photos are interpreted is crucial. Very different photos may be relevant in a particular context and attempts to transfer the ideas to the archivist may be seen as a waste of time.

Querying and Browsing
The user interface of the Aamulehti photo archive was too complex for the non-professional searchers, creating an obvious bottleneck at the query formulation stage. The journalists used only the simple search form giving them two options: (1) input a Boolean query and (2) limit by date. The latter option was supported by a predefined menu. The advanced search form offered the whole set of fields supported by the system in entering and selecting specified search keys.
The archivists classified the photos by subject and theme using a thesaurus. The thesaurus was interfaced to the advanced search form as a hierarchical selection menu. Unfortunately, the journalists did not use the advanced search form, which was deemed too complex. This fact may partly explain why the end-users felt uncomfortable with searches focused on themes or objects defined by common nouns.
The journalists mainly entered single word or single phrase queries. This worked quite well if photos of named objects were needed, since in most cases the resulting sets of thumbnail images were small enough for browsing. In other types of needs, single word queries did not always give an appropriate starting point for browsing. Either the search key did not find the relevant photos or the set of retrieved thumbnail images was too large for efficient browsing.
The results of this study suggest that browsing is an essential strategy in retrieving photos, supporting the earlier observations of Batley [1]. First, some criteria used in selecting photos are difficult to express in words but are easily applied when the photo is seen. For instance, they are based on a high level visual interpretation of a photo. Second, non-professional searchers have difficulties in formulating focused queries. Browsing is a method to compensate for these difficulties. Third, photo selection criteria depend on a particular work situation. These aspects are difficult to predict in indexing. Fourth, browsing of thumbnail images is quite efficient and the journalists feel comfortable with browsing. Thus, in photo archives search capabilities should be based more on browsing than querying features.
Although the journalists prefer browsing, there is a limit to the number of thumbnail images worth browsing. The present photo archiving systems (with the system investigated as an example) do not sufficiently support browsing. Querying is too complicated for end-users, resulting in large or incorrectly focused query sets. There is no mechanism for structuring the set of retrieved thumbnail images.

Challenges for Feature-based Indexing and Retrieval
The goal of this study was to develop ideas for the potential applications of feature-based indexing and retrieval methods in newspaper type photo archives. These merit investigation, since the development of traditional indexing methods has limited chances of solving the problems of end-user access. This pessimistic view is taken because manual indexing seems to require more resources than are or will be available. The authors do not claim that the development of traditional methods is useless. On the contrary, some ideas on this subject will be reported elsewhere (forthcoming).
Automatic visual indexing and retrieval methods are currently working reliably on low-level attributes of images such as colour, texture, shape and spatial location. In the semantic-level retrieval, i.e. retrieval of object types or individual objects, progress has been more limited. Eakins [3] sees this approach as feasible in fairly restricted domains.
At first sight, the results of this study are not very favourable for feature-based indexing and retrieval methods in this application area: (1) Low-level visual features were not expressed as the main search criteria in any of the search topics analysed. (2) In nearly half of the search topics, the main focus was not on the objects seen in the photo and could not be extracted automatically from the photo. The focus was on the background information (e.g. a particular news event) or abstract themes requiring high-level human reasoning [3]. (3) In most searches for photos of named objects, text-based querying worked well.
A more thorough analysis reveals that the situation is not so poor for automatic visual methods. The journalists in our study used a digital photo archive supporting traditional textual query operations. Thus it is difficult to predict how they would change their searching behaviour if they could execute queries based on visual similarity of photos. However, it is difficult to envisage common uses for purely visual queries that are not combined with textual search keys. The first problem is how to formulate a visual query. Usually this requires that something desired, a query image, has already been found. The size of newspaper image collections and the heterogeneity of images in these collections might make this approach problematic. However, querying by image might work in some special cases, when photos searched for have some very distinct features (for instance, photos of sailing boats at sea). The authors take the view that the main chances of applying automatic visual methods are associated with the browsing stage of photo searching.
The easiest way to make the query stage more effective in those topic categories which caused difficulties to the journalists is to apply traditional concept-based indexing and classification methods and develop user interfaces to support browsing. The selection of theme-based sub-sets from the database for browsing should be made convenient. If browsing tools are efficient, users will be able to process larger query sets and rough classification schemes can be applied, thus making manual indexing a straightforward task.
Automatic feature-based methods could be applied in structuring the set of thumbnail images retrieved by keywords or index terms. The point is, within the retrieved set, to group visually similar photos together and organise the output by these groups. In that way the user can see different photo categories contained in the retrieved set. For instance, if the user has made a query using a person's name, the visual organizer could group the "passport" photos, single portraits, photos also containing other people and photos with special backgrounds.
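The grouping idea can be made concrete with a minimal sketch. It assumes, for illustration only, that each retrieved thumbnail is available as a list of RGB pixel values, that a coarse colour histogram is an adequate low-level feature, and that a simple greedy grouping with an arbitrary similarity threshold is sufficient; none of these choices is taken from an existing system, and the photo identifiers are invented.

```python
from collections import Counter

def colour_histogram(pixels, bins=4):
    """Normalised coarse RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())

def group_by_visual_similarity(photos, threshold=0.6):
    """Greedy grouping: each photo joins the first group whose seed is similar enough."""
    groups = []  # list of (seed_histogram, [photo_ids])
    for photo_id, pixels in photos:
        hist = colour_histogram(pixels)
        for seed, members in groups:
            if similarity(seed, hist) >= threshold:
                members.append(photo_id)
                break
        else:
            groups.append((hist, [photo_id]))  # no similar group: start a new one
    return [members for _, members in groups]

# Toy result set for one textual query (hypothetical ids and pixel data).
retrieved = [
    ("portrait_1", [(200, 180, 170)] * 8 + [(30, 30, 30)] * 2),
    ("seascape_1", [(20, 60, 200)] * 9 + [(240, 240, 240)] * 1),
    ("portrait_2", [(210, 190, 175)] * 7 + [(40, 40, 40)] * 3),
    ("seascape_2", [(30, 60, 210)] * 8 + [(250, 250, 250)] * 2),
]
groups = group_by_visual_similarity(retrieved)
```

In this toy case the two skin-toned images and the two blue-dominated images end up in separate groups, so the output could be presented group by group rather than as one undifferentiated set of thumbnails. Colour alone is unlikely to separate categories such as formal portraits and group photos; the sketch only shows where such a feature would enter the browsing stage.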
Our prediction is that automatic visual methods combined with textual querying could offer a more solid base than visual methods alone for developing practical applications for environments similar to newspaper photo archives. Current methods of feature-based indexing could be applied for grouping query results to make browsing more effective. In the future we shall test this approach and investigate how useful journalists find these groupings and how they exploit this feature in browsing.

Figure 1: Simple and Advanced Search Forms of the NewsLink System in Aamulehti (texts translated into English by the authors)