Annual Meeting Reports

Forensic Bioinformatics: Investigating Reproducibility of Results

“There is a deep, dark secret: scientists occasionally make mistakes,” said Keith A Baggerly, a professor at the University of Texas MD Anderson Cancer Center. Baggerly was a keynote speaker at the 2011 CSE annual meeting in Baltimore, MD. As part of his job, he tried to reproduce the results of groundbreaking cancer research, with the aim of telling his colleagues how such research could be replicated.

That’s when he found the mistakes.

In a 2006 Nature Medicine paper, researchers from Duke University claimed they could predict which cancer drugs would be most effective for which patients on the basis of the patients’ genetic profiles. Baggerly said the claim excited MD Anderson clinicians and researchers, along with the rest of the genetics community. In fact, Discover magazine named it one of the top genetics stories in a 2007 article.

He and his colleagues jumped on the project but immediately ran into problems when they began examining the raw data.

Baggerly said he wanted to see the actual data because the documentation is often lacking.

As he examined the data, he noticed various simple errors in the report. The software used required one input with a “header” row and another without, but headers were supplied with both. That produced gene lists that were off by one line, Baggerly said, skewing the results throughout the report: the genes reported as important weren’t actually involved. Other simple errors included label swaps; for some drugs, the “sensitive” and “resistant” labels appeared to be reversed, so the predictions would be backwards.
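
To make the header problem concrete, here is a minimal sketch (written for this report in Python, not taken from the Duke analysis) of how feeding a file that has a header row to code that expects none shifts every gene name by one position, so downstream results end up attached to the wrong genes. The file contents and gene names below are invented for illustration.

  # Minimal illustration of the off-by-one header error Baggerly described.
  # The values are invented; the point is the one-line shift, not the numbers.
  import csv
  import io

  raw = "gene\tscore\nTP53\t0.91\nBRCA1\t0.87\nEGFR\t0.55\n"  # first line is a header

  def read_gene_names(text, skip_header):
      rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
      return [row[0] for row in (rows[1:] if skip_header else rows)]

  correct = read_gene_names(raw, skip_header=True)   # ['TP53', 'BRCA1', 'EGFR']
  shifted = read_gene_names(raw, skip_header=False)  # ['gene', 'TP53', 'BRCA1', 'EGFR']

  # Pair the names with downstream calls of "important" versus "not important":
  calls = ["important", "not important", "not important"]
  print(list(zip(correct, calls)))  # each gene lines up with its own call
  print(list(zip(shifted, calls)))  # every call is attached to the wrong gene

Because every row shifts by the same amount, the output can still look plausible at a glance, which is why Baggerly argues that the raw inputs are needed to catch such errors.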

He and his colleagues informed Nature Medicine, the journal that originally published the research, and reported the data errors in 2007. In the meantime, the Duke investigators published further generalizations in The Lancet Oncology and the Journal of Clinical Oncology. These latter papers also contained errors, but letters from Baggerly’s group identifying the errors were rejected.

But Baggerly said he and his Texas colleagues were especially concerned when they learned, in mid-2009, that clinical trials had already begun (in May 2007). Baggerly and his colleague Kevin Coombes then summarized their objections in a paper submitted to the Annals of Applied Statistics; the paper appeared online in September 2009, and Duke announced in October that it was suspending the trials pending an internal investigation.

Duke University initially suspended the trials because of concerns about the research, but in January 2010 it announced that its internal investigation (with outside reviewers) had cleared the trials to restart, which Duke proceeded to do. However, Duke refused to release the data or the report justifying the restarts.

Baggerly’s group objected to the restarts, in part because new data posted by the Duke investigators in November 2009 (while the investigation was underway) showed continued problems. Baggerly said that when the data were posted, sample names were scrambled and mislabeled. Although these errors were reported to Duke at the time, they were not mentioned when the trials were restarted (it was later revealed that Duke never told the external reviewers about Baggerly’s November 2009 report). Despite the objections, the trials were opened to enroll new patients between January and July of 2010.

In the end, it was a report by The Cancer Letter about one of the lead researchers, who had falsely claimed to be a Rhodes scholar, that finally led to a full-blown probe of the trials. The inflated CV claims were reported on July 16, and the trials were resuspended the next week. By November, the trials had been terminated, Duke officials had acknowledged that the trials should never have been restarted, one of the investigators had resigned, and efforts were underway to retract several of the papers involved.

After more than 4 years, Baggerly said, all the main papers have been retracted and the clinical trials stopped, but the story is still playing out. The Institute of Medicine is now examining what evidence should be in place before genomic signatures are used to guide patient therapy in clinical trials (a report is expected in 2012); the National Cancer Institute held a workshop in June to clarify the rules that apply to the trials it funds; and the story hit the front page of The New York Times in July.

Mislabeled and scrambled samples, columns moved and switched—Baggerly said it’s not an isolated incident. It is “far more common than we’d like to admit.”

“Most mistakes are the common ones,” he said. “If you have clear documentation, you can find them.”

As a data “investigator,” Baggerly said he wants to see how the experiment was done. “I’d love to see the code and the data,” he told the CSE attendees. “You don’t have to host it. Tell us where data are, where the code is.”

And, he added, this case is one of the worst examples. “Most scientists want to get it right,” he said.

His suggestions for better, more complete research papers include requiring the following (one way to record several of these items is sketched below):

  • Data
  • Provenance
  • Code
  • Descriptions of nonscriptable steps
  • Descriptions of planned design, if used

“This would add extra overhead to clinical trials, but I don’t see any other way,” Baggerly said.
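
As one hedged illustration of the data, provenance, and code items above (our example, not something presented at the meeting), the short Python sketch below writes a simple provenance record next to an analysis: checksums of the input data files plus the software environment used. The file names and JSON fields are assumptions chosen for the example.

  # Hypothetical provenance record for an analysis: checksums of the inputs
  # plus the software environment, written alongside the results.
  import hashlib
  import json
  import platform
  import sys
  from datetime import datetime, timezone

  def sha256_of(path):
      # Checksum of an input file, so readers can confirm they have the same data.
      digest = hashlib.sha256()
      with open(path, "rb") as handle:
          for chunk in iter(lambda: handle.read(8192), b""):
              digest.update(chunk)
      return digest.hexdigest()

  def write_provenance(data_files, outfile="provenance.json"):
      record = {
          "generated": datetime.now(timezone.utc).isoformat(),
          "python": sys.version,
          "platform": platform.platform(),
          "script": sys.argv[0],
          "inputs": {path: sha256_of(path) for path in data_files},
      }
      with open(outfile, "w") as handle:
          json.dump(record, handle, indent=2)

  # Usage (file names are hypothetical):
  # write_provenance(["expression_matrix.txt", "sample_labels.txt"])

Nonscriptable steps and the planned design would still need to be described in prose, but a record like this answers the “where the data are, where the code is” question directly.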


TERESA M MELCHER is editor of CSE’s Science Editor