A novel access structure for similarity search in metric
databases, called Similarity Hashing (SH), is proposed. It
is a multi-level hash structure, consisting of search-separable
bucket sets on each level. The structure supports easy insertion
and bounded search costs, because at most one bucket needs to be
accessed at each level for range queries up to a pre-defined value
of search radius. At the same time, the number of distance
computations is always significantly reduced by use of
pre-computed distances obtained at insertion time.
Experimental results demonstrate that the performance of SH
is superior to the available tree-based structures. Contrary to tree
organizations, the SH structure is suitable for distributed and
parallel implementations.