pFilter: Global Information Filtering and Dissemination
Tang, Chunqiang; Xu, Zhichen
Abstract: Due to the overwhelming amount of information on the Internet, it is becoming increasingly difficult for people to find relevant information in a timely fashion. Information filtering and dissemination systems allow users to register persistent queries called user profiles. They detect new content, match it against the profiles, and continuously notify users when relevant information becomes available. Existing systems, however, either do not scale or do not support matching of unstructured documents. Unstructured documents, such as text, HTML, or multimedia files, account for a significant percentage of the content on the Internet. To address the limits of existing systems, we describe pFilter, a global-scale decentralized information filtering and dissemination system for unstructured documents. To handle potentially billions of documents for millions of subscribers, pFilter connects potentially millions of computers, in national (and international) computing Grids or on ordinary desktops, into a structured peer-to-peer overlay network. Nodes in the overlay collectively publish and collect documents, build indexes, register profiles, and filter and disseminate information. To enable efficient and accurate matching between profiles and documents without flooding either documents or profiles, profiles in the overlay are organized around their vector representations (based on modern information retrieval algorithms) so that the search space for a new document is organized around related profiles. In pFilter, we introduce a new application-level multicast algorithm that allows documents to be efficiently disseminated to a large number of interested parties.
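To make the idea of matching documents against profiles through vector representations concrete, the following is a minimal sketch in Python. It is not pFilter's implementation: the abstract does not specify which information retrieval algorithm is used, so plain term-frequency vectors with cosine similarity are assumed here, and the sketch runs on a single machine rather than across an overlay. The names `tf_vector`, `cosine`, `match_profiles`, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a term-frequency vector from lowercased, whitespace-split tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_profiles(document, profiles, threshold=0.3):
    """Return the users whose profile is similar enough to the document.

    `profiles` maps a user name to a persistent query string.
    `threshold` is an assumed cut-off, not a value from the paper.
    """
    doc_vec = tf_vector(document)
    return sorted(
        user
        for user, query in profiles.items()
        if cosine(doc_vec, tf_vector(query)) >= threshold
    )


# Hypothetical profiles and a newly published document.
profiles = {
    "alice": "peer-to-peer overlay networks",
    "bob": "cooking recipes pasta",
}
matched = match_profiles(
    "structured peer-to-peer overlay networks for information filtering",
    profiles,
)
print(matched)
```

In this sketch every profile is compared against every document, which is exactly the exhaustive matching a decentralized system would want to avoid. The abstract's approach of placing profiles in the overlay according to their vectors would instead confine the comparison to profiles located near the document's vector.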