George Forman: External Publications

Research areas:

  1. Data Mining, Machine Learning, Text Classification
  2. Knowledge Management, Model-Based Reasoning, Diagnosis
  3. Parallel & Distributed Computing, Clustering
  4. Mobile Computing, Variable Resources 

It's hard to keep this kind of page up to date. If you're looking for something more recent than what I have listed here, try the lists of publications collected for me by DB Trier or CiteSeer, or search HP Technical Reports. Note to HP personnel: see also HP-internal tech reports. I only list HP-external publications here.

  1. Classifying with Temporal Inductive Transfer for Recurrent Concept Drift.  G. Forman.   NIPS'05 workshop: Inductive Transfer: 10 Years Later.  HPL-2005-198.
  2. Feature Selection: We've barely scratched the surface.  G. Forman.  Essay requested for IEEE Intelligent Systems, Trends and Controversies department.  HPL-2005-165.
  3. Counting Positives Accurately Despite Inaccurate Classification.  G. Forman.  ECML'05.  HPL-2005-96R1.
  4. Beware the Null Hypothesis: Critical Value Tables for Evaluating Classifiers.  G. Forman & Ira Cohen.  ECML'05.  HPL-2005-70.
  5. Finding Similar Files in Large Document Repositories.  G. Forman, K. Eshghi, and S. Chiocchetti.  KDD'05. HPL-2005-42R1.
  6. Learning from Little: Comparison of Classifiers Given Little Training.  G. Forman & Ira Cohen.  ECML'04.  HPL-2004-19R1.
  7. A Pitfall and Solution in Multi-Class Feature Selection for Text Classification.  G. Forman.  ICML'04. HPL-2004-86. SpreadFx/Round-Robin method.
  8. Feature Engineering for a Gene Regulation Prediction Task. G. Forman. KDD Explorations, 4(2), 2003. HPL-2002-318. This was an invited paper, following an honorable mention in the KDD Cup 2002 data mining competition.
  9. An Extensive Empirical Study of Feature Selection Metrics for Text Classification. G. Forman. Special Issue on Variable and Feature Selection, Journal of Machine Learning Research, 3(Mar):1289-1305, 2003. HPL-2002-147R1.
  10. Incremental Machine Learning to Reduce Biochemistry Lab Costs in the Search for Drug Discovery. G. Forman. Data Mining in Bioinformatics Workshop, 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02), July 2002. HPL-2002-141.
  11. A Method for Discovering the Insignificance of One's Best Classifier and the Unlearnability of a Classification Task. G. Forman.  Data Mining Lessons Learned Workshop, 19th International Conference on Machine Learning (ICML), Sydney, Australia, July 8-12, 2002. HPL-2002-123R2.
  12. Choose Your Words Carefully: An Empirical Study of Feature Selection Metrics for Text Classification. G. Forman. In the Joint Proceedings of the 13th European Conference on Machine Learning and the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD '02), August 19-23, 2002. HPL-2002-88R2. This paper gives a new analysis method for comparison studies, which focuses on which method or pair of methods is most likely to give the best result on a single dataset. This is a different perspective from existing machine learning papers, which focus on average results over many datasets. It also led to the discovery of a new feature selection metric that is superior with respect to accuracy, recall and precision: Bi-Normal Separation.
  13. Distributed Clustering can be Accurate and Efficient. G. Forman, B. Zhang. ACM KDD Explorations special issue on Scalable Data Mining Algorithms, January 2001. HPL-2000-158. 
  14. Accurate Recasting of Parameter Estimation Algorithms using Sufficient Statistics for Efficient Parallel Speed-up Demonstrated for Center-Based Data Clustering Algorithms.  B. Zhang, M. Hsu, G. Forman.  4th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 243-254, Lyon, France, September 13-16, 2000. HPL-2000-94.
  15. Linear Speed-Up for a Parallel Non-Approximate Recasting of Center-Based Clustering Algorithms, including K-Means, K-Harmonic Means, and EM.  G. Forman, B. Zhang.  ACM SIGKDD Workshop on Distributed and Parallel Knowledge Discovery, KDD-2000, Boston, MA, August 20, 2000. HPL-2000-93.
  16. A Method based on Genetic Programming for Learning Text Classifiers Applied in the Domain of Spam E-mail Filtering.  G. Forman, M. Hopkins, E. Reeber, J. Suermondt.  Submission to ACM KDD Explorations. HPL-2000-140.
  17. Practical Optimization Criteria for Diagnostic Knowledge Representation. P. Cornwell, J. Suermondt, G. Forman, E. Kirshenbaum, A. Seetharaman. AI in Equipment Maintenance Service and Support, AAAI Spring Symposium, Stanford, CA, March 1999.
  18. Automated End-To-End System Diagnosis of Networked Printing Services Using Model-Based Reasoning. G. Forman, M. Jain, M. Mansouri-Samani, J. Martinka, A. Snoeren. Distributed Systems: Operation & Management, October 1998.  HPL-98-41R1.
  19. Wanted: Programming Support for Ensuring Responsiveness Despite Resource Variability and Volatility. Workshop on Computing and Communication in the Presence of Mobility, April 1998. HPL-98-15. Also appears in "Mobility: Processes, Computers, and Agents," eds. F. Douglis, D. Milojicic, R. Wheeler, Addison-Wesley, 1999.
    The topic of an invited panel presentation at the 20th International Conference on Software Engineering (ICSE), April 1998.
  20. Dissertation: Obtaining Responsiveness in Resource-Variable Environments. Computer Science & Engineering Dept., Univ. of Washington, 1996.
  21. Survey: The Challenges of Mobile Computing.  G. Forman, J. Zahorjan. IEEE Computer, 27(4):38-47, April 1994.  Also appears in "Mobility: Processes, Computers, and Agents," eds. F. Douglis, D. Milojicic, R. Wheeler, Addison-Wesley, 1999.
  22. ZPL vs. HPF: A Comparison of Performance and Programming Style. C. Lin, L. Snyder, R. Anderson, B. Chamberlain, S. Choi, G. Forman, E. Lewis, W. Weathersby. Tech Report UW-CSE-95-11-05, Department of Computer Science and Engineering, University of Washington, 1994.
  23. The Ariadne Debugger: Scalable Application of Event-Based Abstraction.  J. Cuny, G. Forman, A. Hough, J. Kundu, C. Lin, L. Snyder, D. Stemple. ACM/ONR Workshop on Parallel and Distributed Debugging, San Diego, CA, May 1993. SIGPLAN Notices, 28(12): 85-95, Dec. 1993.
  24. A Distributed Operating System for the K2 Based on Amoeba. Tech Report 89/14, Swiss Federal Institute of Technology, Zürich, Switzerland, 1989.
