Linear Time Baire Hierarchical Clustering for Enterprise Information Retrieval

Fionn Murtagh, Pedro Contreras Albornoz

Research output: Contribution to journal › Article › peer-review

Abstract

The Baire or longest common prefix metric induces an ultrametric or tree topology. It has many interesting properties such as the following: the Baire distance, or metric, is also an ultrametric; associated with the tree topology is a hierarchically-structured, embedded set of clusters; the hierarchical clustering can be viewed in terms of density-based and grid-based structuring of the data. We are interested in using the hierarchical structuring of the data induced by the Baire metric for top-down search, in an information retrieval context. Enterprise search and retrieval requires exhaustivity of retrievals. Another requirement is that enterprise search supports situation awareness in order to implement different policies of access to, and use of, data. We show how situation awareness can be supported by the Baire metric, as used for structuring data in order to support enterprise search and retrieval.
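
As an illustration of the longest common prefix idea, here is a minimal Python sketch. It is not taken from the article. The function names, the use of digit strings as keys, and the convention that the distance is `base ** -k` for a shared prefix of length `k` (with `base = 2` by default) are all assumptions made for this example. Grouping keys by their first `depth` characters takes one pass over the data, which is one way to read the linear-time claim in the title.

```python
from collections import defaultdict


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(a: str, b: str, base: float = 2.0) -> float:
    """Assumed convention: 0 for identical keys, else base ** -(shared prefix length)."""
    if a == b:
        return 0.0
    return base ** -common_prefix_len(a, b)


def prefix_buckets(keys, depth: int):
    """Group keys by their first `depth` characters in a single pass.

    Each bucket is one cluster at level `depth` of the prefix tree.
    """
    buckets = defaultdict(list)
    for key in keys:
        buckets[key[:depth]].append(key)
    return dict(buckets)


if __name__ == "__main__":
    keys = ["4251", "4238", "9000"]  # hypothetical digit-string keys
    print(baire_distance("4251", "4238"))
    print(prefix_buckets(keys, 2))
```

Under this convention, a top-down search for a query key would look only inside the bucket that shares the query's prefix, and would move to a longer prefix to narrow the result set.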
Original language: English
Journal: International Journal of Software and Informatics
Volume: 6
Issue number: 3
Publication status: Accepted/In press - 2012

Keywords

  • Information retrieval, search, exact match, partial match, best match, query, enterprise, metric, ultrametric, hierarchy, tree, longest common prefix, computation, efficiency, linear time, logarithmic time.
