CS 236601: Information Retrieval and Digital Libraries

Bibliography

This is an initial list of relevant papers; more may be added as the course progresses. Each student is expected to read every paper, at least closely enough to be conversant with the ideas and technologies it presents.
Books
  • Information Retrieval -- Algorithms and Heuristics, by David A. Grossman and Ophir Frieder, Kluwer Academic Publishers, 1998.
  • Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley, 1999.
  • Managing Gigabytes: Compressing and Indexing Documents and Images, by Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Morgan Kaufmann Publishers, 1999.
  • Information Storage and Retrieval, by Robert Korfhage, John Wiley and Sons, 1997.
Courses
This is a collection of pointers to related courses taught at other institutions.
Papers
  1. Androutsopoulos, Ion, Koutsias, John, Chandrinos, Konstantinos V., and Spyropoulos, Constantine D., An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages, In Proceedings SIGIR'2000, pp. 160-167, Athens, Greece, July 2000.
  2. Brin, Sergey and Page, Lawrence, The anatomy of a large-scale hypertextual web search engine, WWW7, 1998. http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
  3. Dietterich, Thomas G., Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10(7), 1998. pp. 1895-1924. ftp://ftp.cs.orst.edu/pub/tgd/papers/nc-stats.ps.gz
  4. Dumais, Susan and Chen, Hao, Hierarchical classification of web content, In Proceedings SIGIR'2000, pp. 256-263, Athens, Greece, July 2000.
  5. Goldszmidt, M., and Sahami, M., A probabilistic approach to full-text document clustering, Technical Report ITAD-433-MS-98-044, SRI International, 1998. http://robotics.stanford.edu/users/sahami/papers-dir/gm-clustering.ps
  6. Iwayama, Makoto, Relevance feedback with a small number of relevance judgements: Incremental relevance feedback vs. document clustering, In Proceedings SIGIR 2000, pp. 10-16, Athens, Greece, July 2000.
  7. Koller, D., and Sahami, M., Hierarchically classifying documents using very few words, In ICML-97: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 170-178, San Francisco, CA. 1997. http://robotics.stanford.edu/users/sahami/papers-dir/ml97-hier.ps
  8. Letsche, Todd, Toward large-scale information retrieval using latent semantic indexing. Master's thesis. University of Tennessee, Knoxville. 1996.
  9. Sahami, M., Dumais, S., Heckerman, D., and Horvitz, E., A Bayesian approach to filtering junk e-mail, in Learning for Text Categorization: Papers from the 1998 Workshop. AAAI Technical Report WS-98-05. 1998. http://robotics.stanford.edu/users/sahami/papers-dir/spam.ps
  10. Sahami, M., Using machine learning to improve information access. PhD thesis. Department of Computer Science, Stanford University. 1998. http://robotics.stanford.edu/users/sahami/papers-dir/thesis.ps
  11. Salzberg, Steven L., On comparing classifiers: A critique of current research and methods, Data Mining and Knowledge Discovery, 1, 1999. pp. 1-12. http://www.tigr.org/~salzberg/critique.ps
  12. Singer, Yoram and Lewis, David D., Machine learning for information retrieval: Advanced techniques. A tutorial presented at the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00). Athens, Greece. July 2000. http://www.cs.huji.ac.il/~singer/papers/ml4ir.ps.gz
  13. Slonim, Noam and Tishby, Naftali, Document clustering using word clusters via the information bottleneck method. Proceedings 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00). Athens, Greece. July 2000. http://www.cs.huji.ac.il/labs/learning/Papers/paper9.eps.gz
  14. Yan, Tak W. and Garcia-Molina, Hector, SIFT - A tool for wide-area information dissemination, Proceedings of the 1995 USENIX Technical Conference, February 1995. pp. 177-186. http://www-db.stanford.edu/pub/yan/1994/sift.ps
  15. Yang, Yiming and Pedersen, J.O., A comparative study on feature selection in text categorization. Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), 1997. http://www.cs.cmu.edu/~yiming/papers.yy/icml97.ps.gz
  16. Yang, Yiming and Liu, Xin, A re-examination of text categorization methods. Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1999, pp. 42-49. http://www.cs.cmu.edu/~yiming/papers.yy/sigir99.ps
  17. Yang, Yiming, An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1999, Vol. 1, No. 1/2, pp. 67-88. http://www.cs.cmu.edu/~yiming/papers.yy/irj99.ps.gz
  18. Zobel, Justin, How reliable are the results of large-scale information retrieval experiments?, In Proceedings SIGIR'1998, pp. 307-314, Melbourne, Australia. August 1998.