"This is our brand new freezer," Don Humphries said. "It holds 4 million vials."
You'd think a freezer big enough to hold 4 million vials of blood would be easy to spot. But to my great embarrassment, I couldn't see it.
Humphries and I were standing in a lab in the basement of the Veterans Affairs hospital in the Jamaica Plain neighborhood of Boston. He had led me through a labyrinth of windowless rooms, packed with robots handling tubes of blood donated from veterans, pipes roaring with coolant, and gorilla-sized tanks of liquid nitrogen, until he stopped next to a featureless wall.
After a few awkward moments, I admitted my ignorance. "So, where is the freezer?" I asked.
Humphries, the scientific director of the lab, blinked and then looked at the featureless wall. "Right here," he said. He craned his head upwards. "This is it."
I followed his gaze, and then it clicked. The wall was actually the side of a vault that seemed to be about as big as a two-story house.
Near the top I could spy a small window. Humphries led me up a mobile staircase so that I could look through it. Inside the vault was a long, dimly lit corridor, flanked on either side by 16 separate compartments cooled to as low as 80 degrees below zero Celsius. A robot inside the freezer ferried vials to their assigned compartments.
This is no walk-in freezer.
The freezer, in fact, is at the heart of one of the most ambitious projects ever undertaken to understand our DNA. The Department of Veterans Affairs is gathering blood from 1 million veterans and sequencing their DNA. At the same time, computer scientists are creating a database that combines those genetic sequences with electronic medical records and other information about veterans' health.
The ultimate goal of the project, known as the Million Veteran Program, is to uncover clues about disorders ranging from diabetes to post-traumatic stress disorder.
Since the program's launch in 2010, the VA has spent $30 million building and running the MVP. Because it cares for 8.76 million veterans enrolled in the Veterans Health Administration, the VA has a strong interest in understanding the role that genes play in the diseases they develop. The VA is also uniquely situated to carry out this kind of project, in part because veterans tend to have medical records in the system that stretch back decades. But the research being done as part of the MVP, which has already enrolled more than 420,000 participants, could have implications that reach far beyond the VA.
"We're working in a space where no has ever worked before on this scale," said Dr. Michael Gaziano, one of the principal investigators of the Million Veteran Program.
If the project develops as planned, it could fuel discoveries for years to come, leading to new medical treatments not just for veterans, but for all patients. "We hope that folks who follow us come up with ideas that we can't even think of right now," said Dr. John Concato, the other principal investigator on the program.
For decades, researchers have been trying to find links between genes and diseases, but for a long time their studies often ended in disappointment. They came to realize that they weren't comparing enough people's DNA to get a clear picture. And so, in the United States and abroad, scientists began gathering DNA on a huge scale. The biggest of these so-called biobanks now hold DNA from hundreds of thousands of people. In January, President Obama announced plans for the Precision Medicine Initiative, which will create a database with over 1 million participants.
For now, the Precision Medicine Initiative is a plan. The Million Veteran Program, on the other hand, is up and running. In fact, it's already mature enough for several teams of scientists to start searching its database for links between genes and diseases. But to reach this point, the MVP had to come up with new solutions for unprecedented challenges — how to recruit participants on a massive scale, how to keep their data safe without impeding scientific research, and how to squeeze hidden information out of their data with new artificial intelligence systems.
That's not to say that MVP's work is over. It now brings in about 100,000 new participants a year, and at that pace will enroll its millionth veteran in 2018. The MVP computer scientists are scrambling to assemble enough computing power to store and analyze the growing data. And it's up to Humphries — whose lab is known as the Core Laboratory at the Massachusetts Veterans Epidemiology Research and Information Center — to store away millions of blood vials in his giant freezer.
On the day of my visit, his team had already stored more than half a million samples from more than 100,000 veterans, and had a backlog of hundreds of thousands more samples waiting in temporary storage.
Yet Humphries seemed strangely at peace with the colossal numbers that now rule his life. "Next year, it might get easier," he said with a shrug.
The push for big science
Today it's easy to forget just how little the first generation of geneticists a century ago knew about genes. They couldn't even study genes directly. Instead, they pieced together clues, such as the way diseases ran in families. They found that parents with Huntington's disease, for example, passed down the disorder to half their children. Only in the 1990s did geneticists discover that these parents pass down a faulty copy of the gene encoding a protein called huntingtin.
Huntington's disease is simple as diseases go. Many common disorders such as heart disease and diabetes are the result of a complex interplay between many different genes and the environment. "You've got the hand you've been dealt, and how you play that hand," said Gaziano.
For doctors who work at the VA, one of the most important of those complex disorders is post-traumatic stress disorder. An estimated 12 percent to 20 percent of Iraq War veterans are treated for PTSD in a given year, according to the VA. Experiencing trauma in war isn't a guarantee that troops will develop PTSD, though. "You have two people who sit in the same foxhole and see the exact same thing," said Gaziano. "One guy sleeps great every night and the next guy relives that over and over again."
In the 1990s, scientists found the first clues that genes are a source of these differences. They looked at the incidence of PTSD in identical twins, who have virtually identical genes, and found that they tended to experience the same outcomes more often than other siblings — even fraternal twins.
"Environmental factors are probably more important than genetic factors, overall, but not by that much," said Dr. Joel Gelernter, a psychiatrist at Yale School of Medicine who also does research at the VA hospital in West Haven, Conn.
Over the past decade, Gelernter and his colleagues have searched for the specific genes involved in PTSD. They've compared the DNA in people with the disorder and without, looking for variations that turn up unusually often in people who suffer from it. They've found a few genes that show some promising hints of being involved. Unfortunately, those hints have a way of melting into air. When researchers look at different groups of people, they find different genes that appear to play a role in PTSD. "They may well turn out to be correct, or they may well not turn out to be correct," said Gelernter.
The problem is that complex disorders like PTSD involve not just one gene, as Huntington's disease does, but hundreds. The most common variations in those genes tend to have a very small effect on the risk of a disorder, making it hard to distinguish them from harmless mutations. Other mutations have a big impact, but they're typically so rare that scientists who study small groups of people may never observe them.
To get beyond this impasse, researchers have been developing new methods. They've invented more powerful statistical techniques, and they've also put in the extra effort to study many more people. "I've pounded the message for 10 years that sample size is everything," said Jeffrey Barrett, a scientist at the Wellcome Trust Sanger Institute, a genome research center in England.
Big sample sizes may make studies more powerful, but they also demand a huge amount of work — to find people to participate in them, to gain their informed consent, and to gather data from them. And every time scientists set out to study a new condition — be it blood pressure or near-sightedness or height — they have to bring together yet another gargantuan cohort.
By the mid-2000s, a number of researchers — including epidemiologists and geneticists who work at the VA — recognized that there was another route to the big numbers they needed. They could create a single, enormous biobank.
Such a biobank would contain sizable numbers of people with a wide range of conditions, ready to be studied. Instead of spending time and money gathering yet another set of 10,000 participants for each new study, scientists could jump straight to the science.
This story was produced by STAT, a national publication covering health, medicine, and life science.