When University of Chicago biologist Michele Markstein was looking for a computer scientist to help with her PhD, she didn't have to search far. Both her father, a researcher at HP Labs, and her mother, a computer consultant, fit the bill nicely.
Michele was investigating patterns in sequences of nucleotides (the As, Cs, Gs, and Ts of the genetic code) of the fruit fly, Drosophila, and she needed a piece of software that would allow her to automate searches through strings that could be millions of letters long.
Peter Markstein, a Principal Scientist at HP Labs, and Vicky Markstein, a consultant in computer architecture and programming, quickly offered to help.
When they couldn't locate existing software to do the job, they thought it would be a fun home project to write a program themselves.
entrée to bioinformatics
The resulting program has turned out to be a remarkably useful tool, yielding some important insights in Michele's field of study. What's more, the program has the potential to speed up the process of searching genomes, which could considerably accelerate bioinformatics research.
What started as a hobby has now become Peter's primary research focus at HP Labs, giving HP in the process an entrée to the fast-growing field of bioinformatics.
computers aid study of genetics
Bioinformatics is the marriage of computer science to biology, and, in the words of Scientific American, "it is destined to change the face of biomedicine."
In the last few years researchers have begun to deliver on the dream of sequencing entire animal and plant genomes. As a result, geneticists now have access to giant databases filled with millions and millions of points of raw data.
The challenge now is to make sense of it all. The only way to do that with any speed is to use computers.
finding patterns in genetic code
Michele Markstein's particular use for computers was to help locate enhancers, small stretches of DNA that can act as switches to turn genes on or off.
"We all have the same set of genes in each cell," she explains, "but they aren't all expressed, or turned on, in each cell. Hence cells can be different from each other. Ultimately we want to see if we can predict when and where every gene would be expressed in an organism."
Michele was searching for sequences of genetic code that mark the place of such enhancers on a chromosome. Researchers believe that if they know where enhancers are and how they work, they'll have a better chance of developing drugs that will regulate the gene that a particular enhancer triggers.
Finding patterns in the genetic code of the fruit fly that revealed the presence of enhancers wasn't an easy problem to solve.
"Michele would give us several hundred short sequences of nucleotides at a time," Peter Markstein recalls, "and ask us to find every place in the fly genome where, say, three or more occur within a certain size window. Then she'd want to be able to change the window size and also how many there are in a cluster."
surprising results yield paper
But Peter was able to write an algorithm that he and his wife embedded in software that Michele christened FlyEnhancer. It searched the entire fly genome in just the way that Michele wanted and, in the process, came up with some significant results.
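The kind of search Peter describes (every place where a minimum number of short sequences fall within a window of a given size) can be sketched in a few lines of Python. This is only an illustration and not the Marksteins' algorithm, whose details the article does not give. The function names `find_sites` and `find_clusters` are hypothetical, motifs are assumed to match exactly, and a site is counted by its start position alone.

```python
from typing import List, Tuple


def find_sites(genome: str, motifs: List[str]) -> List[int]:
    """Return the sorted start positions of every exact motif match.

    Overlapping matches are included. Assumption: exact matching only,
    with no mismatches or reverse-complement search.
    """
    positions = set()
    for motif in motifs:
        start = genome.find(motif)
        while start != -1:
            positions.add(start)
            # Resume one character later so overlapping matches are found.
            start = genome.find(motif, start + 1)
    return sorted(positions)


def find_clusters(
    sites: List[int], window: int, min_sites: int
) -> List[Tuple[int, int, int]]:
    """Report windows that contain at least `min_sites` sites.

    `sites` must be sorted. For each site taken as the left edge, count
    the sites whose start lies fewer than `window` positions away.
    Returns (first_site, last_site, count) tuples; overlapping windows
    are reported separately rather than merged.
    """
    clusters = []
    right = 0
    for left, first in enumerate(sites):
        right = max(right, left)
        # Advance the right pointer while sites still fall in the window.
        while right < len(sites) and sites[right] - first < window:
            right += 1
        count = right - left
        if count >= min_sites:
            clusters.append((first, sites[right - 1], count))
    return clusters


if __name__ == "__main__":
    # Toy sequence and motifs, invented purely for illustration.
    genome = "ACGTACGTTTACG"
    sites = find_sites(genome, ["ACG", "CGT"])
    print(find_clusters(sites, window=6, min_sites=3))
```

Because the window size and the minimum number of sites are ordinary parameters, the same list of site positions can be re-scanned with different settings without searching the genome again, which is the kind of adjustment Michele asked for.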
"We found what appeared to be enhancers in about a dozen places," says Peter. "Some of them were quite surprising. Enhancers are usually just before the gene, but we found some that were actually inside genes."
A gene's enhancer is a key to being able to control it, so it is significant that some of those keys turned out to be in a place where researchers usually do not look because of the computational expense involved.
Michele was then able to do wet lab experiments showing that the sequences her parents' program had flagged as likely enhancers were, in fact, the real thing.
The finding was important enough that the resulting paper, co-authored by Michele, Peter, Vicky and Michele's advisor, Professor Michael Levine of UC Berkeley, was published in January in the prestigious Proceedings of the National Academy of Sciences.
program speeds searches
Also important was the speed with which the Marksteins' program allowed Michele to work. Other engines are able to search the various completed animal and plant genomes, but, says Michele, "as far as I know, this is the quickest one out there."
Searching the human genome currently requires about a day, Michele says. "Using even the slowest HP computer, my Dad's algorithm lets you do it in about 30 minutes. If we can get our hands on a Superdome, it would take about a minute."
There's considerable value to this. "Finding enhancers is still a hit or miss process," Michele explains, "but FlyEnhancer enables us to make some pretty good guesses."
Because of the program's speed, researchers can continue to refine searches until the guesses become statistically significant predictions. With FlyEnhancer, says Michele, "I was able to define enhancers an order of magnitude faster."
scientific and business benefits
The Marksteins' work also shows the value of putting computers into the service of biology. "It makes the bioinformatic approach practical," says Abraham Lempel, director of the Advanced Studies Program at HP Labs.
"HP recognizes the scientific importance and the enormous business potential of this field," says Lempel.
"When we talk to customers," he adds, "it is helpful to have researchers who are familiar with the problem domain, and who can speak with authority on the subject."
Indeed, the success of what started as a fun family project has led to bioinformatics being Peter Markstein's main focus at HP Labs. He's working to better understand the field's particular computing challenges, with the goal of helping HP determine how to best supply tools and support for bioinformatics. HP is the leading provider of computing equipment in the bioinformatics field.
building on the work
The Markstein family collaboration has changed all their lives.
"Michele can drive us pretty hard," jokes her father. "But we're enjoying it. It's very exciting."
Michele will soon be taking up a postdoc in Cambridge, where she plans to build on her recent work.
Peter is working on a new search tool. Vicky is chairing an IEEE committee that is looking into the possibility of creating an IEEE bioinformatics section.
tackling the human genome
Michele created a public website (http://flyenhancer.org), developed with UC Berkeley researcher Ka-Ping Yee, for the FlyEnhancer program. The site, recently highlighted in Science magazine's Netwatch, has already received about 5,000 search requests. They've recently added sister sites that use FlyEnhancer to search the genomes of the C. elegans worm and a small flowering plant, Arabidopsis thaliana.
Next the Marksteins plan to tackle the human genome.
"What we've done on the fly has pretty sure implications for humans," says Peter. "The same techniques can be used, except the human genome is about 25 times larger than the fly."
"If we were to scale this up for the human genome," adds Michele, "I think it would be useful to the scientific community at large. There's a possibility that in further collaboration with HP we might do this, especially if we can run it through a Superdome. Then I think we'll really see the power of the algorithm, both in terms of speed and scientific advancements."
by Simon Firth