Dhruva is a senior researcher at HP Labs interested in building novel, high-performing systems software. He is currently investigating programming techniques for non-volatile memory, focusing on durability and concurrency aspects. He has recently worked on the semantics and implementation of transactional memory.
Prior to joining HP Labs in 2008, he was a member of the high-level optimizer team in the HP-UX compiler organization. He led the loop optimization and automatic parallelization projects and was at the forefront of the inter-procedural optimizer development. He received his PhD in Electrical and Computer Engineering from Northwestern University in 2000. His doctoral thesis was on the PARADIGM compiler, for which he developed an automatic parallelization framework for irregular and hybrid scientific applications. He received an M.Tech. in Computer Science from the Indian Institute of Technology, Kanpur (IITK) in 1996, during which he designed efficient testing methodologies for regular VLSI circuits. He obtained a B.E. degree in 1994, also in Computer Science, from Jadavpur University. Dhruva has been granted 5 US patents, and a few more are pending.
Programming languages, compilers, runtime systems, performance analysis and modeling.
Programming with non-volatile memory: Due to the scalability limits of DRAM, devices using new memory technologies are likely to appear in emerging systems. When attached as memory, they offer load/store access and low latencies, and in addition they are non-volatile. The question is whether existing programming techniques are adequate to take advantage of the features these devices offer. Some of our work on desired semantics and performance characteristics is available.
Software transactional memory (STM): The complexity of STMs, such as aborts due to conflicts, calls for a fresh look at the performance-debugging features available today. We developed support for new abstractions that should be exposed to the user for easier performance analysis. These include the runtime abort graph at the memory-reference level and associated fine-grained performance characteristics. Parts of this work appeared in a paper at CGO 2011 and a poster at PPoPP 2010. It turns out that such runtime abort patterns provide enough actionable data for an automatic optimizer to generate higher-performing code. Specifically, we introduced a hybrid acquire mode for atomic sections, in which distinct atomic sections within the same application can use either an eager or a lazy policy. We then devised an offline, feedback-driven technique that uses abort patterns from a previous run to generate a hybrid acquire mapping for atomic sections. More details can be found in the following papers and a technical report.
- Dhruva R. Chakrabarti, Prithviraj Banerjee, Hans-J. Boehm, Pramod G. Joisha and Robert S. Schreiber, The Runtime Abort Graph and its Application to Software Transactional Memory Optimization, International Symposium on Code Generation and Optimization (CGO), April 2011, Chamonix, France.
- Dhruva R. Chakrabarti, New Abstractions for Effective Performance Analysis of STM Programs, ACM SIGPLAN Principles and Practice of Parallel Programming (PPoPP), Jan. 2010, Bangalore, India.
In another line of work on STM program optimization, we used a hybrid lock-allocation technique (static compiler analysis combined with the STM's hash-based address-to-lock mapping) that reduces false conflicts and improves performance. Details appeared in the following paper.
- Sandya S. Mannarswamy, Dhruva R. Chakrabarti, Kaushik Rajan, Sujoy Saraswati, Compiler Aided Selective Lock Assignment for Improving the Performance of Software Transactional Memory, ACM SIGPLAN Principles and Practice of Parallel Programming (PPoPP), Jan. 2010, Bangalore, India.
We also explored how compiler analysis can improve the performance of hardware transactional memory implementations by reducing the transactional footprint and hence overflows. The results can be found in the following paper.
- Jaewoong Chung, Dhruva R. Chakrabarti, Chi Cao Minh, Analysis on Semantic Transactional Memory Footprint for Hardware Transactional Memory, IEEE International Symposium on Workload Characterization (IISWC), Dec. 2010, Atlanta, GA.
In the domain of lock-based multithreaded programs, we examined when dataflow transformations that are sound for sequential programs can be applied to multithreaded code without modification. Details can be found in the following papers.
- Pramod G. Joisha, Robert S. Schreiber, Prithviraj Banerjee, Hans-J. Boehm, Dhruva R. Chakrabarti, A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code, ACM Symposium on Principles of Programming Languages (POPL), Jan. 2011, Austin, TX.
- Laura Effinger-Dean, Hans-J. Boehm, Dhruva Chakrabarti, Pramod Joisha, Extended Sequential Reasoning for Data-Race-Free Programs, ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, June 2011, San Jose, CA.
Loop optimizations, interaction, and phase ordering: We designed and implemented a number of classical loop optimizations such as loop distribution, alignment, and fusion. The most intriguing aspect was recognizing that the interaction and phase ordering of these transformations had a significant impact on the quality of generated code. Much existing work resorts to an iterative exploration of the optimization search space by varying compiler parameters and repeatedly executing the program. However, this kind of technique does not scale well: it not only requires considerable time but also fails to give the user or the compiler writer insight into the behavior of the application on a given platform. We implemented an orthogonal solution in which the compiler identifies the mutual enabling and disabling effects of the optimizations. While many existing schemes transform the program and run the transformed version to estimate performance, our solution speculates on the data structures maintained by the compiler to predict the final performance. This latter technique requires understanding the nuances of the optimizations under consideration, but it generates high-quality code at a fraction of the compile time of existing schemes. Moreover, careful design of the optimization interfaces ensures that the framework allows different implementations to be plugged in without sacrificing performance or re-tuning the interaction analysis between phases. In addition, the correct phase order of the optimizations falls out of this design automatically. None of this work has been published at this point.
We recently did some work on software prefetching for multicores. The motivation is that while compiler-inserted prefetching yields significant performance improvements on a small number of cores, the benefit often does not scale as the number of cores increases. The proposed scheme uses static compiler analysis of the program to generate customized prefetching helper threads. This reduces the otherwise negative interaction in the shared cache among the prefetches issued by the program's threads. More details are in the following paper.
- Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, Dhruva Chakrabarti, A Compiler-Directed Data Prefetching Scheme for Chip Multiprocessors, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb. 2009, Raleigh, NC.
Inter-procedural optimizations: As part of my work on the HP-UX high-level compiler, I was involved in building a highly efficient and scalable inter-procedural optimization framework. While program analysis across source-file boundaries exposes tremendous performance potential, the costs can be prohibitive in terms of both compile time and memory consumption. We developed a novel compilation model designed from the ground up with these objectives in mind. A new form of intermediate representation and parallelization of parts of the back-end optimizer, coupled with transformation-aware file operations, allowed us to achieve manifold reductions in compile time and memory usage while maintaining robust runtime performance. More details can be found in the following papers:
- Sungdo Moon, Xinliang D. Li, Robert Hundt, Dhruva R. Chakrabarti, Luis Lozano, Uma Srinivasan, Shin-Ming Liu, SYZYGY: A Framework for Scalable Cross-module IPO, International Symposium on Code Generation and Optimization (CGO), March 2004, Palo Alto, CA.
- Dhruva R. Chakrabarti, Shin-Ming Liu, Inline Analysis: Beyond Selection Heuristics, International Symposium on Code Generation and Optimization (CGO), March 2006, New York, NY.
- Robert Hundt, Sandya Mannarswamy, Dhruva R. Chakrabarti, Practical Structure Layout Optimization and Advice, International Symposium on Code Generation and Optimization (CGO), March 2006, New York, NY.
- Dhruva R. Chakrabarti, Luis Lozano, Xinliang D. Li, Robert Hundt, Shin-Ming Liu, Scalable High Performance Cross-module Inlining, International Conference on Parallel Architectures and Compilation Techniques (PACT), October 2004, Antibes Juan-les-pins, France.
Automatic parallelization of hybrid applications: My doctoral thesis work was on PARADIGM, a parallelizing compiler for distributed-memory message-passing machines. I developed a number of techniques to parallelize and optimize programs that contain both regular and irregular accesses, i.e., hybrid programs. I developed a static single assignment form for message-passing programs and used it to perform global optimizations in the presence of irregular array references. Using a uniform framework at both compile time and run time, we implemented a quasi-dynamic method of code generation with optimized inter-processor communication schedules and placement. Details of this work can be found in the following papers.
- Dhruva R. Chakrabarti and Prith Banerjee, Global Optimization Techniques for Automatic Parallelization of Hybrid Applications, Proceedings of the 15th ACM International Conference on Supercomputing, June 2001, Sorrento, Naples, Italy.
- Dhruva R. Chakrabarti and Prith Banerjee, Static Single Assignment Form for Message-Passing Programs, The International Journal of Parallel Programming, Vol. 29, No. 2, April 2001.
- Subhasish Mitra and Dhruva R. Chakrabarti, Portability of Parallel Programs: So Who Does the Grunt Work, Proceedings of the Second Workshop on Distributed Computing, December 2000, Calcutta, India.
- Antonio Lain, Dhruva R. Chakrabarti and Prith Banerjee, Compiler and Run-Time Support for Exploiting Regularity within Irregular Applications, The IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 2, February 2000.
- Dhruva R. Chakrabarti and Prith Banerjee, Accurate Data and Context Management in Message-Passing Programs, The 12th International Workshop on Languages and Compilers for Parallel Computing, August 1999 (LCPC'99), San Diego, California, USA, Published in the Springer Lecture Notes in Computer Science (LNCS 1863).
- Dhruva R. Chakrabarti and Prith Banerjee, A Novel Compilation Framework for Supporting Semi-Regular Distributions in Hybrid Applications, Proceedings of the 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP'99), April 1999, San Juan, Puerto Rico.
- Dhruva R. Chakrabarti, Pramod G. Joisha, John A. Chandy, Dilip Krishnaswamy, Venkat Krishnaswamy and Prithviraj Banerjee, WADE: A Web-based Automated Parallel CAD Environment, Proceedings of HiPC '98: The 5th International Conference on High Performance Computing, December 17-20, 1998, Chennai (Madras), India.
- Dhruva R. Chakrabarti, Nagaraj Shenoy, Alok Choudhary and Prithviraj Banerjee, An Efficient Uniform Run-time Scheme for Mixed Regular-Irregular Applications, Proceedings of The 12th ACM International Conference on Supercomputing (ICS'98), Melbourne, Australia, July 1998.
- Dhruva R. Chakrabarti, Antonio Lain and Prithviraj Banerjee, Evaluation of Compiler and Runtime Library Approaches for Supporting Parallel Regular Applications, Proceedings of The 12th International Parallel Processing Symposium & 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP'98), Orlando, Florida, USA, April 1998.
VLSI design/test: During my Master's, I worked on automatic test generation for regular circuits. Here are some papers from that work.
- Dhruva R. Chakrabarti and Ajai Jain, An Efficient Test Generation Technique for Sequential Circuits with Repetitive Sub-circuits, Proceedings of The 9th International Conference on VLSI Design, Bangalore, India, January 1996.
- Dhruva R. Chakrabarti and Ajai Jain, Fault-diagnosis and Tolerance in Repetitive Circuits, The 1st Conference on Fault-Tolerant Systems and Software, Madras, India, December 1995.
- Dhruva R. Chakrabarti and Ajai Jain, An Improved Hierarchical Test Generation Technique for Combinational Circuits with Repetitive Sub-circuits, The 4th Asian Test Symposium, Bangalore, India, November 1995.
- External Review Committee, Principles and Practice of Parallel Programming (PPoPP), 2014.
- External Review Committee, Principles and Practice of Parallel Programming (PPoPP), 2013.
- External Review Committee, Programming Language Design and Implementation (PLDI), 2012.
- External Review Committee, Principles and Practice of Parallel Programming (PPoPP), 2012.
- Program Committee, Code Generation and Optimization (CGO), 2012.
- Program Committee, IEEE International Conference on Parallel and Distributed Systems (ICPADS), 2011.
- Program Committee, Principles and Practice of Parallel Programming (PPoPP), 2011.
- Program Committee, IEEE International Conference on Cluster Computing, 2010.
- Program Committee, ACM Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2009.
- Program Committee, Open64 Workshop at the International Symposium on Code Generation and Optimization (CGO), 2009.