
Dennis Wilkinson and Bernardo A. Huberman
HP Laboratories
Palo Alto, CA 94304
Abstract
We present an automated method of identifying communities of
functionally related genes from the biomedical literature. These communities
encapsulate human gene and protein interactions and identify groups of genes
that are complementary in their function. We use graphs to represent the
network of gene co-occurrences in articles mentioning particular keywords, and
find that these graphs consist of one giant connected component and many
small ones. In addition, the vertex degree distribution of the graphs follows
a power law, whose exponent we determine. We then use an algorithm based on
betweenness centrality to identify community structures within the giant
component. The different structures are then aggregated into a final list of
communities, whose members are weighted according to how strongly they belong
to them. Our method is efficient enough to be applicable to the entire
Medline database, and yet the information it extracts is significantly
detailed, applicable to a particular problem, and interesting in and of
itself. We illustrate the method in the case of colon cancer and demonstrate
important features of the resulting communities.
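
The pipeline described above can be illustrated with a minimal sketch in Python. It is not the authors' implementation: it assumes each article has already been reduced to a list of gene symbols, it uses made-up gene names (A through F, X, Y), and it substitutes a plain Girvan-Newman style edge removal for the betweenness-based algorithm of the paper, which may differ in detail. It also omits the aggregation and weighting step. All function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def cooccurrence_graph(articles):
    """Genes are vertices; an edge joins two genes mentioned in the same article."""
    adj = defaultdict(set)
    for genes in articles:
        for g in genes:
            adj[g]  # touch the key so that isolated genes still appear as vertices
        for a, b in combinations(sorted(set(genes)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Connected components found by breadth-first search, largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style) for an unweighted graph.

    Every vertex pair is visited from both endpoints, so the scores are twice
    the usual undirected values; that does not change the ranking of edges.
    """
    bet = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # accumulate dependencies from the leaves back
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return bet


def split_giant_component(adj):
    """Remove highest-betweenness edges until the giant component breaks apart."""
    giant = components(adj)[0]
    sub = {v: set(adj[v]) & giant for v in giant}
    while len(components(sub)) == 1:
        bet = edge_betweenness(sub)
        if not bet:  # a single vertex with no edges cannot be split
            break
        a, b = max(bet, key=bet.get)
        sub[a].discard(b)
        sub[b].discard(a)
    return components(sub)


# Toy input (hypothetical): each inner list holds the genes named in one article.
articles = [["A", "B", "C"], ["D", "E", "F"], ["C", "D"], ["X", "Y"]]
graph = cooccurrence_graph(articles)
degrees = {gene: len(neighbors) for gene, neighbors in graph.items()}
communities = split_giant_component(graph)
```

On this toy input the graph has one six-gene component and one two-gene component, and the only edge between the two triangles carries the most shortest paths, so it is removed first. Applying the split repeatedly to the resulting pieces would yield a hierarchy of progressively smaller communities.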
Full paper (to appear in the Proceedings of the National Academy of Sciences, USA): communities.pdf
