ACM Multimedia 97 - Electronic Proceedings

November 8-14, 1997 Crowne Plaza Hotel, Seattle, USA

Supporting Similarity Queries in MARS

Michael Ortega
Department of Computer Science,
University of Illinois at Urbana Champaign
1304 West Springfield Av.
Urbana, IL, 61801
(217)333-3722
ortega-b@cs.uiuc.edu
http://www.uiuc.edu/ph/www/ortega-b
Yong Rui
Beckman Institute,
University of Illinois at Urbana Champaign
405 N. Mathews
Urbana, IL, 61801
(217)244-1089
yrui@ifp.uiuc.edu
http://www.uiuc.edu/ph/www/rui
Kaushik Chakrabarti
Department of Computer Science,
University of Illinois at Urbana Champaign
1304 West Springfield Av.
Urbana, IL, 61801
(217)333-3722
kaushikc@cs.uiuc.edu
http://www.uiuc.edu/ph/www/kaushikc
Sharad Mehrotra
Department of Computer Science,
University of Illinois at Urbana Champaign
1304 West Springfield Av.
Urbana, IL, 61801
(217)333-1424
sharad@cs.uiuc.edu
http://www.uiuc.edu/ph/www/sharad-m
Thomas S. Huang
Beckman Institute,
University of Illinois at Urbana Champaign
405 N. Mathews
Urbana, IL, 61801
(217)333-6912
huang@ifp.uiuc.edu
http://www.ifp.uiuc.edu/


Abstract:

To address the emerging needs of applications that require access to and retrieval of multimedia objects, we are developing the Multimedia Analysis and Retrieval System (MARS) in our group at the University of Illinois [HMR96]. In this paper, we concentrate on the retrieval subsystem of MARS and its support for content-based queries over image databases. Content-based retrieval techniques have been extensively studied for textual documents in the area of automatic information retrieval [SM83,CCH92]. This paper describes how these techniques can be adapted for ranked retrieval over image databases. Specifically, we discuss the ranking and retrieval algorithms developed in MARS based on the Boolean retrieval model and describe the results of our experiments that demonstrate the effectiveness of the developed model for image retrieval.

Keywords

MARS, Content based retrieval, Image database


Introduction

While advances in technology allow us to generate, transmit, and store large amounts of digital images and video, research in content-based retrieval over multimedia databases is still in its infancy. Because it is difficult to capture the content of multimedia objects using textual annotations, and because the annotation approach does not scale to large data sets (it requires a high degree of manual effort), supporting content-based retrieval over visual features has become a promising research direction. This is evidenced by several prototypes [SC94,PPS94,MM95,MCO97] and commercial systems [F95,BFG] that have been built recently. Such an approach can be summarized as follows:

  1. Computer vision techniques are used to extract visual features from multimedia objects: for example, color, texture, and shape features for images, and motion parameters for video.
  2. For each feature, a representation and a similarity measure are chosen. For example, a color histogram is used to represent the color feature, and the intersection distance is used as the similarity measure.
  3. Objects are represented as a collection of features and retrieval of objects is performed based on computing similarity in the feature space. The results are ranked on the similarity values computed.

Since automatically extracted visual features (e.g., color, texture) are too low-level for users to specify their information needs directly, content-based retrieval using visual features requires effective techniques to map higher-level user queries (e.g., retrieve images containing a field of yellow flowers) to visual features. Mapping a user's information need to a set of features extracted from textual documents has been extensively studied in the information retrieval literature [SM83]. This paper describes how we have generalized these approaches for content-based retrieval over image features in the Multimedia Analysis and Retrieval System (MARS) being developed in our group at the University of Illinois.

Information Retrieval Models

Before we describe the retrieval approach used in MARS, we briefly review the retrieval process in modern information retrieval (IR) systems [SM83]. In an IR system, a document is represented as a collection of features (also referred to as terms). Examples of features include words in a document, citations, bibliographic references, etc. A user specifies his information need to the system in the form of a query. Given a representation of the user's information need and a document collection, the IR system estimates the likelihood that a given document matches the user's information need. The representation of documents and queries, together with the mechanism used to compare their similarity, forms the retrieval model of the system. Existing retrieval models can be broadly classified into the following categories:

Boolean Models
Let T = {t_1, ..., t_k} be the set of terms in a collection. Each document is represented as a binary-valued vector of length k, where the i-th element of the vector is set to true if term t_i is assigned to the document. All elements corresponding to terms not assigned to a document are set to false. A query is a Boolean expression in which the operands are terms. A document whose set of terms satisfies the Boolean expression is deemed relevant to the user; all other documents are considered not relevant.
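The evaluation of a Boolean query against a document's term set can be sketched as follows (a minimal illustration, not a production implementation; the term names and the query are hypothetical):

```python
# Sketch: evaluating a Boolean query over a document's term set,
# as in the classic Boolean retrieval model.

def matches(doc_terms, query):
    """doc_terms: set of terms assigned to the document.
    query: a bare term string, or nested tuples of the form
    ('and', q1, q2), ('or', q1, q2), ('not', q1)."""
    if isinstance(query, str):
        return query in doc_terms
    op = query[0]
    if op == 'and':
        return matches(doc_terms, query[1]) and matches(doc_terms, query[2])
    if op == 'or':
        return matches(doc_terms, query[1]) or matches(doc_terms, query[2])
    if op == 'not':
        return not matches(doc_terms, query[1])
    raise ValueError(op)

# A document either satisfies the expression or it does not;
# there is no ranking among the matching documents.
q = ('and', 'retrieval', ('not', 'video'))
print(matches({'image', 'retrieval'}, q))   # True
print(matches({'video', 'retrieval'}, q))   # False
```

Note that the output is strictly binary, which is exactly the limitation the ranked models below address.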

Vector-based Models
Let T = {t_1, ..., t_k} be the set of terms in a collection. Both documents and queries are represented as vectors of k dimensions, where each element in the vector is a real-valued weight assigned to a term. Several techniques have been proposed to compute these weights, the most common being tf.idf weights [SM83], where tf refers to the frequency of the term in the document, and idf is a measure proportional to the inverse of the term's frequency in the collection. Similarly, many similarity measures between the document and the query have been proposed [SM83], the most common being the cosine of the angle between the document and query vectors.
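A minimal sketch of tf.idf weighting and cosine ranking over a toy collection (the documents, terms, and the exact idf variant are illustrative assumptions; real systems use refinements of this scheme):

```python
# Sketch: tf.idf document vectors and cosine-similarity ranking.
import math

docs = [['color', 'histogram', 'color'], ['texture', 'shape'], ['color', 'shape']]
vocab = sorted({t for d in docs for t in d})
N = len(docs)
df = {t: sum(t in d for d in docs) for t in vocab}  # document frequency

def vec(terms):
    # tf.idf: raw term frequency times log-scaled inverse document frequency
    return [terms.count(t) * math.log(N / df[t]) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

query = ['color', 'shape']
ranked = sorted(range(N), key=lambda i: cosine(vec(docs[i]), vec(query)),
                reverse=True)
print(ranked)  # [2, 0, 1]
```

Unlike the Boolean model, every document receives a real-valued score, so the output is a full ranking rather than a relevant/not-relevant partition.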

Probabilistic Retrieval Models
In these models the system estimates the probability of relevance of a document to the user's information need specified as a query. Documents are ranked in decreasing order of the relevance estimate.

Given a document d and a query q, the system computes P(R | d, q), the probability that the document d will be deemed relevant to the query q. Using Bayes' theorem and a set of independence assumptions about the distribution of terms in documents, these probabilities are computed and the documents are ranked accordingly.

Traditionally, commercial IR systems have used the Boolean model. Systems based on Boolean retrieval partition the set of documents into relevant and not relevant, and do not provide any estimate of the relevance of a document within a partition to the user's information need. To overcome this problem, many variations of term-weighting models and probabilistic retrieval models that provide ranked retrieval have been developed. The Boolean model has also been extended to allow for ranking in the text domain (the p-norm model [SFV83]). Vector-based models and probabilistic retrieval models are in a sense related and provide similar performance. The primary difference is that the vector models are ad hoc and based on intuitive reasoning, while probability-based models have a more rigorous theoretical basis.

Overview of the Retrieval Approach used in MARS

 

With a large number of retrieval models in the information retrieval literature, MARS attempts to exploit this research for content-based retrieval over images. In MARS, an image is represented as a collection of low-level image features (e.g., color features, texture features, shape and layout features extracted automatically) as well as a manual text description of the image. A user graphically constructs a query by selecting certain images from the collection. A user may choose specific features from the selected images. For example, using a point-and-click interface a user can specify a query to retrieve images similar to an image A in color and similar to an image B in texture. A user's query is interpreted as a Boolean expression over image features and a Boolean retrieval model (adapted for retrieval over images) is used to retrieve a set of images ranked based on the degree of match. Boolean queries provide a natural interface for the user to formulate and refine conceptual queries to the system using lower-level image features. For example, high level concepts like fields of yellow flowers or a sunset by a lake can be expressed as a boolean combination of lower level features. Such a mapping of high to low level concepts can be provided explicitly by the user or alternatively learned via user interaction by a relevance feedback mechanism.

Being able to support such conceptual queries is critical for the versatility of large image databases.

To see how MARS adapts the Boolean model for image retrieval, consider first a query Q over a single feature (say color, represented as a color histogram). Let H(I) be the color histogram of image I, H(Q) be the color histogram specified in the query, and D(H(I), H(Q)) be the distance between the two histograms. The simplest way to adapt the Boolean model for image retrieval is to associate a degree of tolerance t_i with each feature f_i such that:

    match(f_i, I) = true if D(f_i(I), f_i(Q)) <= t_i, and false otherwise

Given the above interpretation of a match based on a single feature f_i, an image I matches a given query Q if it satisfies the Boolean expression associated with Q. For example, let Q = (color = C) AND (texture = T), where C is some color histogram and T is a texture representation. Image I matches Q if its color and texture representations are within the specified tolerances of C and T.
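The tolerance-based match can be sketched as follows (the distances and tolerances are hypothetical, and the query is treated here as a conjunction over its features):

```python
# Sketch (hypothetical distances/tolerances): the basic Boolean
# adaptation marks feature i a match iff its distance <= tolerance.

def feature_match(dist, tol):
    return dist <= tol

def image_matches(dists, tols):
    # the query is interpreted as a conjunction over all query features
    return all(feature_match(d, t) for d, t in zip(dists, tols))

# Two images almost equally close to the query can land on opposite
# sides of a tolerance, illustrating the brittleness discussed in the text.
tols = [0.30, 0.50]                       # color, texture tolerances
print(image_matches([0.29, 0.40], tols))  # True  (deemed relevant)
print(image_matches([0.31, 0.40], tols))  # False (deemed not relevant)
```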

While the above straightforward adaptation of Boolean retrieval can be used, it suffers from several potential problems. First, it is not clear how the degree of tolerance t_i for a given feature f_i should be determined. If an a priori value is set for t_i, it may result in poor performance: two images I_1 and I_2 at distances d_1 and d_2 from a query q, where d_1 <= t_i < d_2 and d_2 - d_1 is small, are essentially very similar but will be treated very differently by the system. While I_1 will be considered relevant to the query, I_2 will be considered not relevant. This problem may be alleviated if, instead of a fixed a priori tolerance for a given feature, t_i were computed dynamically for each query based on the image collection. However, the approach would still suffer from the fundamental restriction of basic Boolean retrieval: it produces an unranked set of answers.

To overcome the problems discussed above, in developing MARS we have adopted the following two extensions to the basic Boolean model, both of which produce a ranked list of answers.

Fuzzy Boolean Retrieval
The distance between the image and the query feature is interpreted as the degree of membership of the image in the fuzzy set of images that match the query feature. Fuzzy set theory is used to interpret the Boolean query, and the images are ranked based on their degree of membership in the set.
Probabilistic Boolean Retrieval
The distance between the image and the query feature is interpreted as a measure of the probability that the image matches the user's information need. Feature independence is exploited to compute the probability of an image satisfying the query, which is used to rank the images.

Unlike the basic Boolean model, both the fuzzy and probabilistic Boolean models provide ranked retrieval over the image set. Furthermore, as one expects from Boolean models, both the fuzzy and the probabilistic Boolean models rank images identically for queries corresponding to semantically equivalent Boolean expressions.

The rest of the paper is organized as follows. In Section 2, we describe the set of basic image features used in MARS, including the techniques used to measure similarity between images based on a single feature. Section 3 defines the Boolean retrieval models used in MARS and discusses issues related to their efficient implementation. Section 4 discusses the normalization of low-level features that is necessary before they can be combined with each other. Experimental results that show the retrieval effectiveness of the developed models are discussed in Section 5, and Section 6 offers concluding remarks, including a discussion of future work.



Image Features Used in MARS

The retrieval performance of an image database is inherently limited by the nature and the quality of the features used to represent the image content. In this section, we briefly describe the image features used in MARS and the corresponding distance functions used for comparing similarity of images based on the features. The discussion is kept short since the purpose of this section is only to provide a background for discussing issues related to normalization and ranked retrieval based on Boolean queries. Detailed discussion on the rationale and the quality of the chosen features can be found in references [FFN93,T78,MCO97,M88,RSH96].

Color Features: To represent color, we chose the HSV space due to its de-correlated and approximately uniform coordinates. The color feature is represented using a color histogram. Since the V coordinate is easily affected by lighting conditions, we use only the H and S coordinates to form a two-dimensional histogram. To measure the distance between two color histograms, we compute the amount of non-overlap between them, defined as follows:

    D(H_I, H_Q) = 1 - sum_{n=1}^{N} min(H_I(n), H_Q(n))

where H_I and H_Q are the two (normalized) histograms and N is the number of bins used in the histogram. The above intersection-based measure of distance provides an accurate and efficient measure of (dis)similarity between two images based on their color [SC94].
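A sketch of this non-overlap (intersection) distance, assuming both histograms are normalized to sum to 1; the bin values below are illustrative:

```python
# Sketch: histogram-intersection distance between two normalized
# histograms; 0 means identical, 1 means completely disjoint.

def intersection_distance(h1, h2):
    """h1, h2: equal-length bin sequences, each summing to 1."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))

h_i = [0.5, 0.3, 0.2, 0.0]   # image histogram (hypothetical)
h_q = [0.4, 0.4, 0.0, 0.2]   # query histogram (hypothetical)
print(round(intersection_distance(h_i, h_q), 2))  # 0.3
```

The measure needs only one pass over the bins and no multiplication, which is part of why it is efficient in practice.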

Texture Features: To represent the texture of an image, a CCD (coarseness, contrast, and directionality) texture representation was developed in [T78,FFN93]. Coarseness is a measure of the granularity of the texture, i.e., fine vs. coarse. Contrast represents the distribution of luminance in the image and is defined as

    contrast = sigma / (alpha_4)^(1/4), where alpha_4 = mu_4 / sigma^4

Here sigma and mu_4 are the standard deviation and the fourth central moment of the luminance, respectively. Directionality is a measure of how ``directional'' the image is. Using the above definitions, texture is represented as a set of three numbers. A problem with the CCD features described above is that they are sensitive to noise. We have developed and implemented an enhanced version of CCD using histogram-based features [MCO97]. Experimental results show that our enhanced version is much more robust and accurate than the original definition of CCD [MCO97].

Shape Features: The shape of an object in an image is represented by its boundary. A technique for storing the boundary of an object using the modified Fourier descriptor (MFD) is described in [RSH96]. To measure the similarity between two shapes, the Euclidean distance between the two shape features can be used. In [RSH96], however, we proposed a standard deviation based similarity measure that performs significantly better than the simple Euclidean distance. The proposed representation and similarity measure provide invariance to translation, rotation, and scaling of shapes, as well as to the starting point used in defining the boundary sequence.

Layout Features: The features discussed so far only describe the global properties of the image. Besides these global properties, MARS also supports features that describe the layout of color and texture in an image. To extract the layout features, the whole image is first split into sub-images. Then color and texture features are extracted from each sub-image and stored in the database. For color layout, a two-dimensional HS histogram is constructed for each sub-image, similar to the procedure described earlier.

Since the enhanced CCD representation uses a histogram-based measure, it is not suitable for texture layout: the small sub-images may not produce good histograms. Instead, a wavelet-based representation is used, in which the mean and the standard deviation at 10 sub-bands represent the texture of each sub-image.

The Euclidean distance is used to compute the texture similarity distance for the corresponding sub-images. A weighted sum is then used to form the texture layout distance.



Retrieval Models Used In MARS

This section discusses how Boolean queries are supported on top of the single-feature distances. MARS supports two mechanisms for ranking the results of Boolean queries: the first is based on a fuzzy interpretation of the distance, and the second on a probabilistic interpretation. In the discussion below, we use the following notation. Images in the collection are represented as I_1, ..., I_M. Features over the images are represented as f_1, ..., f_k, where f_i denotes both the name of the feature and the domain of values that the feature can take. For example, say f_1 is the color feature, represented in the database using an HS histogram; then f_1 also represents the set of all color histograms. A query is a Boolean expression over variables v_1, ..., v_n, where each variable v_j takes its value from some domain f_i. For example, Q = v_1 AND v_2 may be a query where v_1 has a value equal to the color histogram associated with image I_1 and v_2 has the value of the texture feature associated with I_2. The query Q then represents a desire to retrieve images whose color matches that of image I_1 and whose texture matches that of image I_2.

Fuzzy Boolean Model

Let Q be a query and I be an image. In the fuzzy retrieval model, a query variable v_j is considered to define a fuzzy set of images, and the distance between the variable and the corresponding feature of I is used to compute the degree of membership of I in that fuzzy set. That is:

    mu_{v_j}(I) = 1 - D(v_j, f(I))     (3)

where D(v_j, f(I)) represents the normalized distance between v_j and the corresponding feature of image I. With this interpretation of the distance measure between the image feature and the feature specified in the query, a Boolean query Q is interpreted as an expression in fuzzy logic, and fuzzy set theory is used to compute the degree of membership mu_Q(I) of an image in the fuzzy set represented by the query Q. Specifically, the degree of membership for a query Q is computed as follows:


And: mu_{Q1 AND Q2}(I) = min(mu_{Q1}(I), mu_{Q2}(I))
Or: mu_{Q1 OR Q2}(I) = max(mu_{Q1}(I), mu_{Q2}(I))
Not: mu_{NOT Q1}(I) = 1 - mu_{Q1}(I)

Consider for example a query Q:

    Q = (v_1 AND v_2) OR (NOT v_3)

The membership of an image I in the fuzzy set corresponding to Q can be determined as follows:

    mu_Q(I) = max(min(mu_{v_1}(I), mu_{v_2}(I)), 1 - mu_{v_3}(I))

The values mu_{v_j}(I) in the above formula are determined using equation (3). Once the membership value of the image in the fuzzy set associated with the query has been determined, these values are used to rank the images, where a higher value of mu_Q(I) represents a better match of the image I to the query Q. We refer to the fuzzy Boolean model as model F1.
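A minimal sketch of fuzzy Boolean scoring (the membership values are hypothetical, and the query shown is just an example expression):

```python
# Sketch: fuzzy Boolean scoring, where membership in a query variable
# is derived from a normalized distance, and AND/OR/NOT map to
# min/max/complement.

def mu_and(a, b): return min(a, b)
def mu_or(a, b): return max(a, b)
def mu_not(a): return 1.0 - a

# Hypothetical memberships of one image in three query variables
mu = {'v1': 0.8, 'v2': 0.6, 'v3': 0.3}

# Q = (v1 AND v2) OR (NOT v3)
score = mu_or(mu_and(mu['v1'], mu['v2']), mu_not(mu['v3']))
print(round(score, 2))  # 0.7
```

Images are then sorted by this score in decreasing order to produce the ranked result list.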

Probabilistic Boolean Model

 

Let Q be a query and I be an image. In the probabilistic Boolean model, the distance between a query variable v_j and the corresponding feature of the image is used to compute the probability that the image I matches the query variable, denoted by P(v_j | I). These probability measures are then used to compute the probability that I satisfies the query, denoted by P(Q | I), which is in turn used to rank the images. To be able to compute P(Q | I), an assumption of feature independence is made. That is, we assume that for all variables v_i and v_j such that the domain of v_i is not the same as the domain of v_j, the following holds:

    P(v_i AND v_j | I) = P(v_i | I) * P(v_j | I)

Before the probabilistic Boolean model can be used, however, we need to map the distance between a query variable and the image into a measure of the probability that the image matches the query variable. There are many ways in which such a mapping can be achieved. Three mechanisms (depicted in Figure 1) implemented in MARS are described below.

  
Figure 1: Distance to probability transformation functions.

 

 

 

It is easy to verify that for each of the above interpretations, the range of P(v_j | I) is [0, 1]. In addition, when the distance D equals 0 (best match), P(v_j | I) = 1; when D equals 1 (worst match), P(v_j | I) = 0. Also, P(v_j | I) is a monotonically decreasing function of D, which fits the physical meanings of distance and probability of match.

The choice of the mapping has an impact on the image ranking and as a result on the retrieval performance. We will discuss this further in the section on experimental results. We will refer to the three different probabilistic Boolean models resulting from the above equations as models P1, P2, and P3 respectively.
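The exact transformation functions of figure 1 are not reproduced here; the three mappings below are merely illustrative examples of functions with the required properties (P(0) = 1, P(1) = 0, monotonically decreasing on [0, 1]), not necessarily the actual P1, P2, and P3 used in MARS:

```python
# Illustrative distance-to-probability mappings on [0, 1]; each
# satisfies P(0) = 1, P(1) = 0, and is monotonically decreasing.
import math

def p_linear(d):      # straight-line mapping
    return 1.0 - d

def p_quadratic(d):   # falls off slowly near d = 0
    return 1.0 - d * d

def p_exponential(d): # normalized exponential decay
    return (math.exp(-d) - math.exp(-1.0)) / (1.0 - math.exp(-1.0))

# All three agree at the endpoints but rank intermediate distances
# differently, which is why the choice affects retrieval performance.
for p in (p_linear, p_quadratic, p_exponential):
    assert abs(p(0.0) - 1.0) < 1e-9 and abs(p(1.0)) < 1e-9
```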

Once the distance between the image and a query variable has been converted to a probability measure, we next need to estimate the probability that the image satisfies the Boolean query Q, denoted by P(Q | I). If Q is a disjunction Q_1 OR Q_2, then following the laws of probability, P(Q | I) can be estimated as follows:

    P(Q_1 OR Q_2 | I) = P(Q_1 | I) + P(Q_2 | I) - P(Q_1 AND Q_2 | I)

Since all probabilities are conditioned on the image I, we omit the conditioning from now on for brevity. Similarly, P(NOT Q_1) can be computed as follows:

    P(NOT Q_1) = 1 - P(Q_1)

To compute conjunction queries, i.e., P(Q_1 AND Q_2), it is desirable that Q_1 and Q_2 be independent, so that P(Q_1 AND Q_2) = P(Q_1) * P(Q_2). However, even though the features are independent, the sub-queries Q_1 and Q_2 may not be. To see this, consider for example a query Q = Q_1 AND Q_2, where Q_1 = v_1 OR v_2 and Q_2 = v_1 OR v_3. In such a case, since Q_1 and Q_2 share the variable v_1 and are therefore not independent, P(Q_1 AND Q_2) cannot be replaced by the product of P(Q_1) and P(Q_2).

The above motivates us to convert the query into disjunctive normal form (DNF), in which a query is represented as a disjunction of conjuncts. Once the query has been converted to a DNF expression, we can compute the probability of an image satisfying the query from the probabilities that the image satisfies the individual query feature variables. We illustrate the computation using the example derivation shown in figure 2.

 

 


Figure 2: Example Derivation

In the derivation shown in figure 2, we have made the assumption that each pair of variables is over independent features. For example, v_1 may be a color histogram and v_2 may be a shape feature. Notice that, in general, two variables may be over the same feature space: in a query Q = v_1 OR v_2, both v_1 and v_2 may correspond to color histograms. Our retrieval results (see Section 5) show that even if such variables are treated as independent, the resulting retrieval performance is quite good. Developing a feature dependence model and incorporating it in the system may improve retrieval performance further and is an important extension to our current work.
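Under the variable-level independence assumption, the probability of a DNF query can be computed with inclusion-exclusion, counting a variable shared by several conjuncts only once. A sketch (the query and per-variable probabilities are hypothetical):

```python
# Sketch: P(Q | I) for a DNF query under variable independence.
# Each conjunct is a set of query-variable ids; the probability of a
# conjunction is the product of its variables' probabilities, and the
# disjunction is combined by inclusion-exclusion, which remains
# correct even when conjuncts share variables.
from itertools import combinations

def p_conjunct(vars_, p):
    prob = 1.0
    for v in set(vars_):
        prob *= p[v]
    return prob

def p_dnf(conjuncts, p):
    total, n = 0.0, len(conjuncts)
    for k in range(1, n + 1):
        for subset in combinations(conjuncts, k):
            union = set().union(*subset)  # shared variables counted once
            total += (-1) ** (k + 1) * p_conjunct(union, p)
    return total

# Q = (v1 AND v2) OR v3, with hypothetical per-variable probabilities
p = {'v1': 0.5, 'v2': 0.5, 'v3': 0.5}
print(p_dnf([{'v1', 'v2'}, {'v3'}], p))  # 0.625
```

The inclusion-exclusion sum is exponential in the number of conjuncts, so it is only practical for the small Boolean queries typical of interactive use.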

Finding the Best N Matches

While the above developed Boolean retrieval models provide a mechanism for ranking the given images based on the query, for the approach to be useful, techniques must be developed to retrieve the best N matches efficiently without having to rank each image. Such a technique consists of two steps:

Images can be efficiently retrieved ranked on a single feature by maintaining an index on that feature. Since all features in MARS are represented as feature vectors with multiple feature elements, retrieval of images ranked on a single feature requires searching using the values of all the elements in the feature vector. For example, the color feature in MARS is a 64-element vector (an 8 x 8 histogram). One alternative is to use several single-attribute indexes, one for each element of the feature vector; this is extremely inefficient in terms of the number of I/O accesses needed for the search. The other alternative is to use one or more of the existing multidimensional data structures [N84,Gut84,D.90]. However, these data structures do not scale to the high dimensionality of the feature vectors used in MARS. Moreover, since they can only index Euclidean feature vectors, they cannot be used for features defined over non-Euclidean distance measures (e.g., color histograms, for which the intersection distance is used). Instead, MARS uses an incremental clustering approach described in [RCM97].

Once efficient ranked retrieval based on a single feature has been achieved, the ranked lists are normalized, and the normalized ranked lists are then merged into a ranked set of images corresponding to the query. The normalization process used in MARS is described in the following section. To merge the normalized ranked lists, a query is viewed as a query tree whose leaves correspond to single-feature variable queries. Internal nodes of the tree correspond to the Boolean operators. Specifically, nodes take one of three forms: a conjunction of positive literals (v_1 AND ... AND v_n); a conjunction containing both positive and negative literals (v_1 AND ... AND v_m AND NOT v_{m+1} AND ... AND NOT v_n); and a disjunction of positive literals (v_1 OR ... OR v_n). The query tree is evaluated as a pipeline from the leaves to the root. Each node in the tree provides to its parent a ranked list of images, where the ranking corresponds to the degree of membership (in the fuzzy model) or the measure of probability (in the probabilistic model). For example, in the fuzzy model, a node n in the tree provides to its parent a list of images ranked on mu_{Q_n}(I), where Q_n corresponds to the query associated with node n.
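The pipelined evaluation can be sketched as follows for the fuzzy model, with conjunction and disjunction nodes over positive literals (the image ids and leaf scores are hypothetical, and this is a simplification of the algorithms in the Appendix, which avoid scoring every image):

```python
# Sketch: evaluating a query tree bottom-up in the fuzzy model.
# Each node maps its children's score lists to its own score list;
# only the root's top-N matches are returned.

def node_and(child_scores):
    # conjunction of positive literals: min over children
    return {img: min(s[img] for s in child_scores) for img in child_scores[0]}

def node_or(child_scores):
    # disjunction of positive literals: max over children
    return {img: max(s[img] for s in child_scores) for img in child_scores[0]}

def top_n(scores, n):
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Leaf nodes: per-image memberships for three query variables
color = {'i1': 0.9, 'i2': 0.4, 'i3': 0.7}
texture = {'i1': 0.5, 'i2': 0.8, 'i3': 0.6}
shape = {'i1': 0.2, 'i2': 0.9, 'i3': 0.5}

# Query tree: (color AND texture) OR shape
root = node_or([node_and([color, texture]), shape])
print(top_n(root, 2))  # ['i2', 'i3']
```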

The algorithms used to combine the ranked lists of images from the children to generate a list for the parent depend on the retrieval model used. The algorithms for the fuzzy model are discussed in the Appendix. The corresponding algorithms for the probabilistic model are more complex and are not included due to space restrictions; they can be found in [OMR+97].



Feature Sequence Normalization

The normalization process serves two purposes:

  1. It puts an equal emphasis on each feature element within a feature vector. To see the importance of this, notice that in the texture layout representation the feature elements may be entirely different physical quantities: for example, one feature element may be a mean while another is a standard deviation. Their magnitudes can vary drastically, thereby biasing the Euclidean distance measure. This is overcome by the process of intra-feature normalization.
  2. It maps the distance values of the query from each atomic feature into the range [0,1] so that they can be interpreted as the degree of membership in the fuzzy model or relevance probability in the probability model. While some similarity functions return a value in the range of [0, 1], e.g. the color histogram intersection, others do not, e.g. the Euclidean distance used in texture layout. In the latter case the distances need to be converted to the range of [0, 1] before they can be used. This is referred to as inter-feature normalization.

Intra-feature Normalization

This normalization process is only needed for vector based feature representation, as in the case of wavelet texture feature representation. In other cases, such as color histogram intersection, where all the feature elements are defined over the same physical domain, no intra-feature normalization is needed.

For a vector-based feature representation, let F = (f_1, f_2, ..., f_N) be the feature vector, where N is the number of feature elements in the vector, and let I_1, ..., I_M be the images in the collection. For image I_i, we refer to the corresponding feature vector as F_i = (f_{i,1}, ..., f_{i,N}).

Since there are M images in the database, we can form an M x N feature matrix F = {f_{i,j}}, where f_{i,j} is the jth feature element of feature vector F_i. Each column of F is then a length-M sequence of values of the jth feature element, represented as F_j = (f_{1,j}, f_{2,j}, ..., f_{M,j}). Our goal is to normalize the entries in each column to the same range so as to ensure that each individual feature element receives equal weight in determining the Euclidean distance between two vectors. One way of normalizing the sequence is to find the maximum and minimum values of F_j and normalize the sequence to [0, 1] as follows:

    f'_{i,j} = (f_{i,j} - min_j) / (max_j - min_j)     (8)

where min_j and max_j refer to the smallest and the largest values in F_j. Although simple, this is not a desirable normalization. Consider the sequence {1.0, 1.1, 1.2, 1.3, 100}: if we use (8) to normalize it, most of the [0, 1] range is taken up by the single element 100, and most of the useful information in {1.0, 1.1, 1.2, 1.3} is warped into a very narrow range.
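The outlier effect can be verified directly on the sequence from the text:

```python
# Min-max scaling of the example sequence: the single outlier 100
# compresses the informative values into a tiny sub-range of [0, 1].

def min_max(seq):
    lo, hi = min(seq), max(seq)
    return [(x - lo) / (hi - lo) for x in seq]

seq = [1.0, 1.1, 1.2, 1.3, 100]
scaled = min_max(seq)
print([round(x, 4) for x in scaled])
# the first four values all fall below 0.004, while 100 maps to 1.0
```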

A better approach is to use Gaussian normalization. Assuming the feature sequence to be Gaussian, we compute the mean m_j and standard deviation sigma_j of the sequence. We then normalize the original sequence to an N(0,1) sequence as follows:

    f'_{i,j} = (f_{i,j} - m_j) / sigma_j     (9)

It is easy to show that after normalization according to (9), the probability of a feature element value lying in the range [-1, 1] is 68%. If we instead use 3*sigma_j in the denominator, then according to the 3-sigma rule the probability of a feature element value lying in [-1, 1] is approximately 99%. In practice, we can then consider all feature element values to be within the range [-1, 1] by mapping the out-of-range values to either -1 or 1. The advantage of this normalization process over (8) is that the presence of a few abnormally large or small values does not bias the importance of the feature element in computing the distance between feature vectors.
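A sketch of the 3-sigma variant on the same sequence (using the sample standard deviation; whether the population or sample estimate is used is an implementation choice):

```python
# Sketch: Gaussian (3-sigma) intra-feature normalization. Each value
# is shifted by the column mean, divided by three standard deviations,
# and clipped to [-1, 1].
import statistics

def gaussian_normalize(seq):
    m = statistics.mean(seq)
    s = statistics.stdev(seq)          # sample standard deviation
    return [max(-1.0, min(1.0, (x - m) / (3 * s))) for x in seq]

seq = [1.0, 1.1, 1.2, 1.3, 100]
norm = gaussian_normalize(seq)
# All values land in [-1, 1]; any rare out-of-range value would simply
# be clipped, so a few outliers cannot dominate the scale.
print(all(-1.0 <= x <= 1.0 for x in norm))  # True
```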

Inter-feature Normalization

Intra-feature normalization ensures equal emphasis of each feature element within a feature. On the other hand, inter-feature normalization ensures equal emphasis of each feature within a composite query.

The feature representations used in MARS take various forms: vector based (the wavelet texture representation), histogram based (the color histogram representation), and irregular (the MFD shape representation).

To map the distance computations of these heterogeneous features to the same scale and into the range [0, 1], the following inter-feature normalization process is used for each feature:

  1. For each pair of images I_p and I_q, compute the similarity distance between them:

        d_{p,q} = D(F_p, F_q)

     where F_p and F_q are the feature representations of images I_p and I_q.

  2. Treat the M(M-1)/2 possible distance values between pairs of images as a value sequence and find the mean m and standard deviation sigma of the sequence. Store m and sigma in the database to be used in later normalization.
  3. After a query Q is presented, compute the raw (un-normalized) similarity distances between Q and the images in the database. Let {d_1, ..., d_M} denote these raw values.
  4. Normalize the raw values as follows:

        d'_i = (d_i - m) / (3 * sigma)

     As explained in the intra-feature normalization section, this Gaussian normalization ensures that 99% of the d'_i values lie within the range [-1, 1]. An additional shift guarantees that 99% of the values lie within [0, 1]:

        d''_i = (d'_i + 1) / 2

     After this shift, in practice we can consider all the values to be within the range [0, 1], since an image whose normalized distance from the query is greater than 1 is very dissimilar and can be considered to be at a distance of 1 without affecting retrieval.
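Steps 3 and 4 can be sketched as follows (the raw distances and the stored statistics m and sigma are hypothetical values of the kind precomputed in step 2):

```python
# Sketch: inter-feature normalization of raw query-to-image distances
# using precomputed collection statistics m and sigma, followed by a
# shift into [0, 1] and clipping.

def normalize_distances(raw, m, sigma):
    out = []
    for d in raw:
        z = (d - m) / (3 * sigma)      # ~99% of values fall in [-1, 1]
        v = (z + 1) / 2                # shift into [0, 1]
        out.append(max(0.0, min(1.0, v)))
    return out

raw = [0.2, 1.5, 3.0, 9.9]             # hypothetical raw distances
print(normalize_distances(raw, m=2.0, sigma=1.0))
```

After this step, distances from different features are on a common [0, 1] scale and can be combined by the fuzzy or probabilistic models.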

Weights for feature and feature elements

After the intra- and inter-feature normalization processes discussed above, the feature elements within a feature, as well as the features within a composite query, carry equal weight. This objective equality allows us to further associate subjective, unequal intra- and inter-feature weights with a particular query.

Inter-feature weights associated with a composite query reflect the user's relative emphasis on the atomic features in the query. For example, for a composite query based on color and texture, a user may set the weight for color to 90% and the weight for texture to 10%. Support for different inter-feature weights enables the user to specify his/her information need more precisely.

Intra-feature weights associated with each feature vector reflect the different contributions of the feature elements to the feature vector. For example, in the wavelet texture representation, we know that the mean of a sub-band may be corrupted by the lighting conditions, while the standard deviation of a sub-band is independent of them. A user may therefore want to put more weight on the standard deviation elements and less on the mean elements. Support for different intra-feature weights gives the system a more reliable feature representation and thus better retrieval performance.

In MARS, we have explored many techniques for associating subjective weights with feature elements and feature vectors; doing so improves retrieval performance considerably. Due to page limitations we do not discuss this further and instead refer the reader to [RHMO97b,RHMO97a].



Experimental Evaluation

For our experiments we used a collection of images of ancient African artifacts from the Fowler Museum of Cultural History. We used a total of 286 images of such artifacts. A sample of our collection is shown as the result of a query in figure 3.

 

 


Figure 3: Screen shot of the query ``texture or shape'' with a stool image as parameter

To demonstrate the retrieval quality of our system, we chose 13 typical conceptual queries, three examples of which are ``all stools'', ``stone masks'', and ``golden pots''. Sample images satisfying these three concept queries are shown in figure 4.

  
Figure 4: Three sample images used in conjunction with known relevant sets to evaluate precision and recall

To determine the relevant answer sets for each of the conceptual queries, we browsed through the collection and marked those images relevant to a concept. The concept queries were then mapped into a Boolean formulation that best represents them (e.g. stone masks are expressed as `texture=image X and shape=image Y').

As in text-based retrieval systems, retrieval performance is defined in terms of precision and recall [SM83,CCH92].

Precision
is the ratio of the number of relevant images retrieved to the total number of images retrieved.

Perfect precision (100%) means that all retrieved images are relevant.

Recall
is the ratio of the number of relevant images retrieved to the total number of relevant images.

Perfect recall (100%) can be obtained by retrieving the entire collection, but the precision will be poor.
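The two definitions can be sketched directly in Python, computing both measures after the top k ranked images have been retrieved:

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall after retrieving the top-k ranked images.

    precision = relevant retrieved / total retrieved
    recall    = relevant retrieved / total relevant
    """
    retrieved = ranked_ids[:k]
    hits = sum(1 for i in retrieved if i in relevant_ids)
    return hits / k, hits / len(relevant_ids)
```

Retrieving the whole collection (k equal to the collection size) drives recall to 1.0 while precision falls toward the fraction of the collection that is relevant, which is the trade-off noted above.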

We have conducted experiments in which we calculate precision at various levels of recall. These results are reported in figure 5.

  
Figure 5: Precision for various levels of recall averaged over 13 queries

As we can see from the figure, the probability models consistently outperform the fuzzy model at equal recall or precision points. An interpretation of this is that the fuzzy model uses only partial information. For ``and'' (``or'') queries, the fuzzy model uses the min (max) operation, which scores an image based on the worst (best) degree of membership across the two operand sets; this ignores any contribution of the other operand set in computing the final degree of membership of an image. The probability model instead combines the information provided by both terms, which allows for an improved ranking because more information about each image is captured.

We can also see that P1 is the best among the three probability models. This relates to figure 1, where P1 < P2 < P3 at the same distance. Model P1 has a sharper drop in probability for distances less than 0.414. This has the effect of ``warping'' the space close to the optimum match (Pr = 1), which results in slight ranking differences in the and and or operations. Good matches are drawn closer by a larger factor, which increases precision. P1 gives more importance to images that rank well in either of the operand sets, while models P2 and P3 give progressively less importance to good rankings, trying to equate them with worse rankings in the hope that the combination might be better. A function that decreases too quickly, however, negatively impacts recall: in the limiting case of an infinitely steep drop, precision will be 1 but recall very low, since even a small distance between an image feature and the query feature results in a large probability drop. As a result, similar images (unless they match perfectly) rank lower and are not returned.
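To make the contrast between the two models concrete, here is a minimal Python sketch of the fuzzy min/max combination next to one common probabilistic combination (a product/probabilistic-sum form that assumes independence; the exact shapes of models P1-P3 in the paper differ, so this is an illustration only):

```python
def fuzzy_and(da, db):
    # fuzzy conjunction: only the worse operand determines the score
    return min(da, db)

def fuzzy_or(da, db):
    # fuzzy disjunction: only the better operand determines the score
    return max(da, db)

def prob_and(pa, pb):
    # probabilistic conjunction under an independence assumption
    return pa * pb

def prob_or(pa, pb):
    # probabilistic disjunction under the same assumption
    return pa + pb - pa * pb

# The fuzzy conjunction scores these two images identically even though
# the first is a much better match on the other feature, while the
# probabilistic conjunction distinguishes them:
same = (fuzzy_and(0.9, 0.2), fuzzy_and(0.3, 0.2))
different = (prob_and(0.9, 0.2), prob_and(0.3, 0.2))
```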



Related Work

Content-based retrieval of images is an active area of research being pursued independently by many research teams. Similar to MARS, most existing content-based image retrieval systems also extract low-level image features like color, texture, shape, and structure [FFN93,T78,MCO97,F95,M88,MM95,PPS94,SC94]. However, compared to MARS the retrieval techniques supported in some of these systems are quite primitive. Many of these systems support queries only on single features separately. Certain other systems allow queries over multiple feature sets by associating a degree of tolerance with each feature. An image is deemed similar to the query if it is within the specified tolerance on all the query features. As discussed in section 1.2, this approach has many drawbacks.

Some commercial systems have also been developed. QBIC [F95], which stands for Query By Image Content, is the first commercial content-based image retrieval system, and its framework and techniques had a profound effect on later image retrieval systems. QBIC supports queries based on example images, user-constructed sketches and drawings, selected color and texture patterns, etc. The color features used in QBIC are the average (R,G,B), (Y,i,q) and (L,a,b) coordinates, the MTM (Mathematical Transform to Munsell) coordinates, and a k-element color histogram. Its texture feature is an improved version of the Tamura texture representation [T78], i.e., combinations of coarseness, contrast, and directionality. Its shape feature consists of shape area, circularity, eccentricity, major axis orientation, and a set of algebraic moment invariants. QBIC is one of the few systems that takes high-dimensional feature indexing into account: in its indexing subsystem, the KL transform is first used to perform dimension reduction, and an R*-tree is then used as the multi-dimensional indexing structure.

Virage [BFG] is a content-based image search engine developed at Virage Inc. Similar to QBIC, Virage supports visual queries based on color, composition (color layout), texture, and structure (object boundary information). But Virage goes one step further than QBIC: it also supports arbitrary combinations of these four atomic queries, and users can adjust the weights associated with the atomic features according to their own emphasis. In [BFG], Bach et al. further propose an open framework for image management. They classify visual features (``primitives'') as either general (such as color, shape, or texture) or domain specific (face recognition, cancer cell detection, etc.); various useful primitives can be added to the open structure depending on the domain requirements. To go beyond the query-by-example mode, Gupta and Jain proposed a nine-component query language framework in [GJ97].

Photobook [PPS94] is a set of interactive tools for browsing and searching images developed at the MIT Media Lab. Photobook consists of three sub-books, from which shape, texture, and face features are extracted respectively; users can then query on the corresponding features in each of the three sub-books. In FourEyes, a more recent version of Photobook, Picard et al. propose including the human in the image annotation and retrieval loop [MP96]. The motivation is the observation that no single feature can best model images from each and every domain, and that human perception is subjective. They propose a ``society of models'' approach to incorporate the human factor, and experimental results show that this approach is very effective in interactive image annotation.

In [JV96] the authors propose an image retrieval system based on color and shape. Their color measure is based on the RGB color space and uses Euclidean and histogram intersection measures. For shape, they use a polygonal description that is resilient to scaling, translation, and rotation. The proposed integration uses a weighted sum of shape and color to arrive at the final result. They address high-dimensional feature indexing with a clustering approach in which clusters are built at database creation time.

To date, no systematic approach to answering content-based queries over image features has emerged. To address this challenge, and similar to the approach taken in information retrieval systems, we have developed MARS to support an ``intelligent retrieval'' model in which a user specifies an information need to the image database and the database returns a ranked list of images in response. The retrieval model supported is a variation of the Boolean model based on probabilistic and fuzzy interpretations of distances between the image and the query.

Recently, in parallel with our work, the problem of processing Boolean queries over multimedia repositories has also been studied in [Fag96] and [CG96]. These approaches, however, restrict themselves to a Boolean model based on a fuzzy interpretation of the Boolean operators. Our experimental results illustrate that the probabilistic model outperforms the fuzzy model in terms of retrieval performance (see figure 5). Furthermore, the query evaluation approach used in MARS differs significantly from those developed in [Fag96,CG96]. As will become clear in the appendix, MARS follows a demand-driven data flow approach [Gra96]; that is, data items are never produced before they are needed, which minimizes the time each item waits in a temporary file or buffer between operators in the query tree. This model is efficient in its time-space-product memory costs [Gra96], and its operators are implemented as iterators, which can be effectively combined with parallel query processing. In contrast, the strategy presented in [Fag96] requires intermediate storage of data items at internal nodes of the query tree to evaluate the best N answers; the same is true of the approach followed in [CG96].



Conclusions

To address the emerging needs of applications that require access to and retrieval of multimedia objects, we are developing the Multimedia Analysis and Retrieval System (MARS) in our group at the University of Illinois [MCO97]. In this paper, we described the retrieval subsystem of MARS and its support for content-based queries over image databases. To support content-based retrieval, MARS extracts many visual features from images: color, texture, shape, and color and texture layout. Information retrieval (IR) techniques, modified to work over visual features, are then used to map users' queries to a collection of relevant images. Specifically, extended Boolean models based on probabilistic and fuzzy interpretations of the Boolean operators are used to support ranked retrieval. Our results show that using IR techniques for content-based retrieval in image databases is a promising approach.

The work reported in this paper is being extended in several important directions. In our current system, we have concentrated on adapting the Boolean retrieval model for content-based retrieval of images. Many other retrieval models with better retrieval performance than the Boolean approach have been developed in the IR literature for textual databases [SM83,CCH92,VR79], and we are currently exploring how these models can be adapted for content-based image retrieval. Furthermore, our current work has concentrated on image databases; we are also generalizing our approach to content-based retrieval in multimedia databases. Finally, we are exploring the use of relevance feedback techniques in our extended Boolean model.



Acknowledgments

This work was supported in part by the Army Research Laboratory under Cooperative Agreement No. DAAL01-96-2-0003; in part by the NSF/DARPA/NASA Digital Library Initiative Program under Cooperative Agreement No. 94-11318; and in part by NSF CISE Research Infrastructure Grant CDA-9624396. Yong Rui is supported in part by CSE, College of Engineering, UIUC. Michael Ortega is supported in part by CONACYT grant 89061. The example images used in this article are used with permission from the Fowler Museum of Cultural History at the University of California--Los Angeles. These images were part of an image database delivery project called the Museum Educational Site Licensing Project (MESL), sponsored by the Getty Information Institute. The goal of the two-year MESL project was to test the licensing and delivery of digital images and metadata from seven U.S. museums to seven U.S. universities. The University of Illinois was selected as a participant in the MESL project.

References

BFG
Jeffrey R. Bach, Charles Fuller, Amarnath Gupta, Arun Hampapur, Bradley Horowitz, Rich Humphrey, Ramesh Jain, and Chiao-fe Shu. The Virage image search engine: an open framework for image management. In SPIE Storage and Retrieval for Still Image and Video Databases IV.

CCH92
J. P. Callan, W. B. Croft, and S. M. Harding. The INQUERY retrieval system. In Proceedings of the Third International Conference on Database and Expert Systems Applications, Valencia, Spain, 1992.

CG96
Surajit Chaudhuri and Luis Gravano. Optimizing queries over multimedia repositories. Proc. of SIGMOD, 1996.

D.90
D. Lomet. The hB-tree: a multiattribute indexing mechanism with good guaranteed performance. ACM Transactions on Database Systems, 15(4), 1990.

F95
M. Flickner et al. Query by image and video content: The QBIC system. IEEE Computer, September 1995.

Fag96
Ronald Fagin. Combining fuzzy information from multiple systems. Proc. of the 15th ACM Symp. on PODS, 1996.

FFN93
C. Faloutsos, M. Flickner, W. Niblack, D. Petkovic, W. Equitz, and R. Barber. Efficient and effective querying by image content. Technical Report RJ 9453 (83074), IBM Research Report, Aug. 1993.

GJ97
Amarnath Gupta and Ramesh Jain. Visual information retrieval. Communications of the ACM, 40(5), 1997.

Gra96
Goetz Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2), 1993.

Gut84
A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984.

JV96
Anil K. Jain and Aditya Vailaya. Image retrieval using color and shape. Pattern Recognition, 29(8), 1996.

M88
Makoto Miyahara et al. Mathematical transform of (R,G,B) color data to Munsell (H,V,C) color data. In SPIE Visual Communications and Image Processing '88, volume 1001, page 650, 1988.

MCO97
Sharad Mehrotra, Kaushik Chakrabarti, Michael Ortega, Yong Rui, and Thomas S. Huang. Towards extending information retrieval techniques for multimedia retrieval. In 3rd International Workshop on Multimedia Information Systems, Como, Italy, 1997.

MM95
B.S. Manjunath and W.Y. Ma. Texture features for browsing and retrieval of image data. Technical Report TR-95-06, CIPR, July 1995.

MP96
T. P. Minka and R. W. Picard. Interactive learning using a ``society of models''. Technical Report 349, MIT Media Lab, 1996.

N84
J. Nievergelt et al. The grid file: an adaptable, symmetric multikey file structure. ACM Transactions on Database Systems, March 1984.

OMR97
Michael Ortega, Sharad Mehrotra, Yong Rui, Kaushik Chakrabarti, and Thomas S. Huang. Processing similarity queries in mars. Technical Report TR-MARS-97-12, University of Illinois at Urbana-Champaign, Aug. 1997.

PPS94
A. Pentland, R. W. Picard, and S. Sclaroff. Photobook: Tools for content-based manipulation of image databases. In Proc. Storage and Retrieval for Image and Video Databases II, volume 2(185), pages 34--47, Bellingham, Wash, 1994. SPIE.

RCM97
Yong Rui, Kaushik Chakrabarti, Sharad Mehrotra, Yunxin Zhao, and Thomas S. Huang. Dynamic clustering for efficient retrieval in high dimensional multimedia databases. Technical Report TR-MARS-97-10, University of Illinois at Urbana Champaign, 1997.

RHMO97a
Yong Rui, Thomas S. Huang, Sharad Mehrotra, and Michael Ortega. Automatic matching tool selection via relevance feedback in MARS. To appear in the 2nd Int. Conf. on Visual Information Systems, 1997.

RHMO97b
Yong Rui, Thomas S. Huang, Sharad Mehrotra, and Michael Ortega. A relevance feedback architecture in content-based multimedia information retrieval systems. In Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries. IEEE, 1997.

RSH96
Yong Rui, Alfred C. She, and Thomas S. Huang. Modified fourier descriptors for shape representation -- a practical approach. In Proceeding of First International Workshop on Image Databases and Multi Media Search, 1996. Amsterdam, The Netherlands.

SC94
John R. Smith and Shih-Fu Chang. Tools and techniques for color image retrieval. In IS&T/SPIE Proceedings: Storage & Retrieval for Image and Video Databases IV, volume 2670, 1994.

SFV83
G. Salton, Edward Fox, and E. Voorhees. Extended boolean information retrieval. Communications of the ACM, 26(11):1022--1036, November 1983.

SM83
G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw Hill Computer Science Series, 1983.

T78
Hideyuki Tamura et al. Texture features corresponding to visual perception. IEEE Trans. Systems, Man, and Cybernetics, SMC-8(6), June 1978.

VR79
C. J. Van Rijsbergen. Information Retrieval. London: Butterworths, 1979.



Fuzzy model evaluation algorithms

In this appendix, we present the algorithms used to compute the nodes in the query tree in the case of the fuzzy Boolean retrieval model. For simplicity we restrict ourselves to computing only binary nodes; that is, we assume that the query node Q has exactly two children, A and B. Algorithms are presented for the following three cases: (A and B), (A and not B), and (A or B). Notice that we do not present an algorithm for an unguarded negation (i.e., not A) or for a negation in a disjunction (i.e., A or not B). The presence of an unguarded negation or of a negation in a disjunction does not make much intuitive sense: typically, a very large number of images will satisfy such a query, and if such a negation is present, it is best to rank each image and sort the answer set based on this ranking. We therefore consider only a restricted notion of negation, where it appears within a conjunctive query.

In describing the algorithms, the following notation is used: Peek(A) returns the degree of membership of the next image in stream A without consuming it; GetNext(A) consumes and returns the next image from stream A; S_A and S_B denote buffers of images already consumed from the corresponding streams.

In describing the algorithms below we omit some critical error and boundary checking for clarity; these checks are addressed in the implementation.

Conjunctive Query with Positive Sub queries

 

The following algorithm computes the set of images ranked by their degree of membership in the query (A and B), given input streams A and B that are ranked by the degree of membership of images in A and B respectively. At each stage, the best unconsumed image from the two sources is chosen and added to S_A or S_B, which function as buffers of images already observed from the corresponding stream. When an image is found that was already observed in the other stream, the loop terminates, since this image is the next best according to the query node Q. The resulting image is returned with a degree equal to the minimum of its degrees in the two streams.

Algorithm GetNextAnd_Fuzzy(A, B)
; returns: next best image in A and B
  while (TRUE)
    d_A = Peek(A), d_B = Peek(B)
    if d_A >= d_B then
      I_A = GetNext(A)
      S_A = S_A + {I_A}
      if I_A is in S_B then     ; image already seen in B
        I = I_A
        exit loop
      end if
    else
      I_B = GetNext(B)
      S_B = S_B + {I_B}
      if I_B is in S_A then     ; image already seen in A
        I = I_B
        exit loop
      end if
    end if
  end while
  ; reached upon finding a common image in S_A and S_B
  degree(I) = min(degree_A(I), degree_B(I))
  return I
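A runnable Python sketch of the same merge, with each stream represented as a list of (image, degree) pairs sorted by decreasing degree (this representation and the generator form are our choices for illustration, not the paper's):

```python
def get_next_and_fuzzy(stream_a, stream_b):
    """Fuzzy AND over two streams of (image, degree) pairs, each sorted
    by decreasing degree; yields images ranked by the min of their two
    degrees, buffering what has been seen from each stream."""
    seen_a, seen_b = {}, {}
    a, b = iter(stream_a), iter(stream_b)
    next_a, next_b = next(a, None), next(b, None)
    while next_a is not None or next_b is not None:
        # consume from the stream whose head has the higher degree
        if next_b is None or (next_a is not None and next_a[1] >= next_b[1]):
            img, d = next_a
            next_a = next(a, None)
            seen_a[img] = d
            if img in seen_b:                   # common image found
                yield img, min(d, seen_b[img])
        else:
            img, d = next_b
            next_b = next(b, None)
            seen_b[img] = d
            if img in seen_a:                   # common image found
                yield img, min(d, seen_a[img])
```

Because items are consumed in globally decreasing degree order, common images are discovered in decreasing min-degree order, which is exactly the ranking the query node needs.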

Conjunctive Query with Negative Sub query

We next develop the algorithm for computing the query (A and not B). The algorithm differs from the one developed for the conjunctive query with no negative sub query: only the stream for node A is used in computing the degree of membership of images according to (A and not B), and images are retrieved from input stream A in ranked order.

For a given image I, its degree of membership in the set associated with (A and not B) is evaluated as min(degree_A(I), 1 - Probe(I, B)), using the following function:

Probe(I, query): the Probe function returns the degree of membership of the image identified by I in the fuzzy set associated with the query.

Let I be the image in the buffer S with the highest degree according to Q, and let I_A be the next best image in rank order in stream A, with degree d_A in A. An image is the next best match if either of the following two conditions holds:

  1. If degree(I) >= d_A, then I is the next best match according to Q, since all images that follow in A will have a degree of membership according to Q of at most d_A.
  2. If degree(I) < d_A but min(d_A, 1 - Probe(I_A, B)) = d_A, then I_A is the next best match according to Q, since its membership is better than that of all the images already in S, and also higher than that of all images that will be produced in the future.

To find the next best match according to Q, A's stream is traversed until one of the two conditions becomes true.

Algorithm GetNextAnd_Not_Fuzzy(A, B)
; returns: next best image in A and not B
  S = {}                        ; buffer of images consumed from A
  while (TRUE)
    d_A = Peek(A)
    if MaximumDegree(S) >= d_A then
      I = image from S with maximum degree
      S = S - {I}
      exit loop
    else
      I_A = GetNext(A)          ; consume from A
      d_B = Probe(I_A, B)
      degree(I_A) = min(d_A, 1 - d_B)
      if degree(I_A) = d_A then ; condition 2
        I = I_A
        exit loop
      else
        S = S + {I_A}
      end if
    end if
  end while
  return I
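A runnable Python sketch of the same algorithm, using a heap for the buffer S and a probe_b callable standing in for the Probe function (these representations are our choices for illustration, not the paper's):

```python
import heapq

def get_next_and_not_fuzzy(stream_a, probe_b):
    """Fuzzy 'A and not B': stream_a yields (image, degree) pairs in
    decreasing degree order; probe_b(image) returns the image's degree
    of membership in B. Yields images ranked by min(d_a, 1 - d_b)."""
    buffered = []                 # max-heap via negated degrees: the buffer S
    a = iter(stream_a)
    head = next(a, None)
    while head is not None or buffered:
        # condition 1: the best buffered image beats everything left in A
        if head is None or (buffered and -buffered[0][0] >= head[1]):
            neg_d, img = heapq.heappop(buffered)
            yield img, -neg_d
        else:
            img, d_a = head
            head = next(a, None)
            d = min(d_a, 1 - probe_b(img))
            if d == d_a:
                # condition 2: B did not reduce the degree, so no later
                # image from A (and nothing buffered) can beat this one
                yield img, d
            else:
                heapq.heappush(buffered, (-d, img))
```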

Disjunctive Query

 

The following algorithm computes the set of images ranked by their degree of membership in the query (A or B), given input streams A and B that are ranked by the degree of membership of images in A and B. The algorithm is essentially a merge, but it ignores any image that has already been retrieved; this achieves the desired max behavior of the degree function associated with disjunction in the fuzzy model.

Algorithm GetNextOr_Fuzzy(A, B)
; returns: next best image in A or B
  flag = TRUE
  while (flag)
    d_A = Peek(A), d_B = Peek(B)
    if d_A >= d_B then
      I = GetNext(A)
    else
      I = GetNext(B)
    end if
    flag = FALSE
    if I is in S then           ; image already returned
      flag = TRUE
    end if
  end while
  S = S + {I}                   ; remember returned images
  return I
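A runnable Python sketch of the disjunctive merge, again over (image, degree) streams sorted by decreasing degree (a representation chosen here for illustration):

```python
def get_next_or_fuzzy(stream_a, stream_b):
    """Fuzzy OR: merge two streams of (image, degree) pairs sorted by
    decreasing degree, keeping each image's first (highest) degree and
    skipping later duplicates -- the max behavior of the fuzzy model."""
    returned = set()
    a, b = iter(stream_a), iter(stream_b)
    next_a, next_b = next(a, None), next(b, None)
    while next_a is not None or next_b is not None:
        # consume from the stream whose head has the higher degree
        if next_b is None or (next_a is not None and next_a[1] >= next_b[1]):
            img, d = next_a
            next_a = next(a, None)
        else:
            img, d = next_b
            next_b = next(b, None)
        if img not in returned:
            returned.add(img)
            yield img, d
```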
