Thursday, December 15, 2011

Statisticians can prove almost anything, a new study finds

The research flaws described below are well known in academe and I have made multiple references to some of them -- but it is good to see attention being drawn to them.

Catchy headlines about the latest counter-intuitive discovery in human psychology have a special place in journalism, offering a quirky distraction from the horrors of war and crime, the tedium of politics and the drudgery of economics.

But even as readers smirk over the latest gee whizzery about human nature, it is generally assumed that behind the headlines, in the peer-reviewed pages of academia, most scientists are engaged in sober analysis of rigorously gathered data, and that this leads them reliably to the truth.

Not so, says a new report in the journal Psychological Science, which claims to show “how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis.”

In “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” two scientists from the Wharton School of Business at the University of Pennsylvania, and a colleague from Berkeley, argue that modern academic psychologists have so much flexibility with numbers that they can literally prove anything.

In effect turning the weapons of statistical analysis against their own side, the trio managed to prove something demonstrably false, and thereby cast a wide shadow of doubt on any researcher who claims his findings are “statistically significant.”

In “many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not,” they write.

Defined as “the incorrect rejection of a null hypothesis,” a false positive is “perhaps the most costly error” a scientist can make, they write, in part because false positives are “particularly persistent” in the literature.

False positives also waste resources, and “inspire investment in fruitless research programs and can lead to ineffective policy changes.” Finally, they argue, a field known for publishing false positives risks losing its credibility.

Psychology, especially the branch of social psychology that merges with economics, is particularly sensitive to this criticism. It is a field in which reputations can be made with a single mention on the Freakonomics blog, and book deals signed based on single headlines.

One example of this trend is described in the December issue of The Atlantic magazine, in which Daniel B. Klein, a libertarian economist at George Mason University in Virginia, retracts the claim he made last year in the Wall Street Journal that left-wingers do not understand economics.

As quirky headlines go, it is hard to imagine a better one for the conservative Wall Street Journal than “Study Shows Left Wing Wrong About Economy.” (In fact, the headline was “Are You Smarter Than A Fifth Grader?” which Klein acknowledges carried the implication that left-wingers are not.)

Citing his own “myside bias,” otherwise known as confirmation bias, or the tendency to favour ideas that fit with one’s settled positions, Prof. Klein now admits that, according to the data he used, the ignorance he attributed to the left is also true of the right, and so the headline should have been less dramatic, something closer to “Nobody Understands Economics: Study.”

The problem, as Prof. Klein puts it, was the hidden bias in his own use of the data, and in the decisions he made about how to analyze it.

These decisions about data use are not usually made in advance of the research, based on rigid principles, according to the authors of the Psychological Science paper. Rather, they are dealt with as they arise, and it is common and accepted practice “to explore various analytical alternatives, to search for a combination that yields ‘statistical significance,’ and then to report only what ‘worked.’ ”

The authors — Joseph P. Simmons, Leif D. Nelson and Uri Simonsohn — describe this flexibility as “researcher degrees of freedom,” and suggest that too much of it leads to bias at best, and nonsense at worst.

As a remedy, they offer a series of proposed guidelines for researchers and reviewers, but it was their somewhat cheeky experiment that brought the problem into the starkest relief.

As ever in social psychology, the experiment began with a room full of undergraduate guinea pigs, in this case paid for their attendance at a lab at the University of Pennsylvania. In the first of two separate trials, 30 students listened on headphones to one of two songs: either Kalimba, “an instrumental song by Mr. Scruff that comes free with the Windows 7 operating system,” or Hot Potato, performed by the children’s band The Wiggles.

Afterwards, they were asked to fill out a survey including the question, “How old do you feel right now: very young, young, neither young nor old, old, or very old.” They were also asked their father’s age, which allowed the researchers to control for variation in baseline age across participants.

Using a common statistical tool known as analysis of covariance, or ANCOVA, which compares group averages on one measure while adjusting for another, the authors were able to show that, on average, listening to the children’s song made people feel older than listening to the control song.

A second experiment aimed to extend these results with a song about getting old, When I’m Sixty-Four, by the Beatles, with Kalimba again as the control song. But this time, instead of being asked how old they felt, participants were asked for their actual birthdate, which allowed precise calculation of their age.

An ANCOVA analysis, controlling for their father’s age, showed a statistically significant but logically impossible effect: listening to When I’m Sixty-Four made people 16 months younger than listening to Kalimba.

Listening to a song obviously has no bearing on how old you actually are. This nonsensical result, they argue, was merely an artifact of flawed analysis within a scientific culture that permits all kinds of relevant details to be excluded from the final publication.

Under their proposed guidelines, though not under current accepted scientific practices, the authors would have been required to disclose that they in fact asked participants many other questions, and did not decide in advance when to stop collecting data, which can skew results. They also would have been obliged to disclose that, without controlling for father’s age, there was no significant effect, and the experiment was more or less a bust.
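To see why asking many questions and not fixing the sample size in advance matters, here is a minimal simulation sketch. It is not the authors' code or their exact analysis. It assumes a world with no real effect, two independent outcome measures with a known standard deviation of 1, and a simple z-test; the function names, sample sizes and number of trials are all illustrative choices.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05  # conventional significance threshold


def p_value(a, b):
    # Two-sided z-test for equal means. Valid here only because both
    # groups are simulated with a known standard deviation of 1.
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def draw(rng, n):
    # n observations from a population with no treatment effect.
    return [rng.gauss(0, 1) for _ in range(n)]


def strict_study(rng, n=20):
    # One outcome, fixed sample size, one test.
    return p_value(draw(rng, n), draw(rng, n)) < ALPHA


def flexible_study(rng, n=20, extra=10, outcomes=2):
    # Same null world, but the analyst tests several outcomes and, if
    # nothing "works", adds participants and tests everything again.
    groups = [(draw(rng, n), draw(rng, n)) for _ in range(outcomes)]
    if any(p_value(a, b) < ALPHA for a, b in groups):
        return True
    groups = [(a + draw(rng, extra), b + draw(rng, extra)) for a, b in groups]
    return any(p_value(a, b) < ALPHA for a, b in groups)


def false_positive_rate(study, trials=4000, seed=1):
    # Share of simulated studies that report a "significant" effect
    # even though none exists.
    rng = random.Random(seed)
    return sum(study(rng) for _ in range(trials)) / trials


strict_rate = false_positive_rate(strict_study)
flexible_rate = false_positive_rate(flexible_study)
print(f"strict: {strict_rate:.3f}  flexible: {flexible_rate:.3f}")
```

Under these assumptions the strict procedure should flag a false effect in roughly 5 percent of studies, while the flexible one does so noticeably more often, even though each individual test looks legitimate on its own.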

“Our goal as scientists is not to publish as many articles as we can, but to discover and disseminate truth,” they write. “We should embrace these [proposed rules about disclosing research methods] as if the credibility of our profession depended on them. Because it does.”


Report: Studies overstated cellphone crash risk. Maybe no added risk at all

Another battle in the war on cellphones. Everything popular must be BAD!

So-called "distracted driving" has become a big public health issue in recent years. The majority of U.S. states now ban texting behind the wheel, while a handful prohibit drivers from using handheld cellphones at all (though many more ban "novice" drivers from doing so).

But studies have reached different conclusions about how much of an added crash risk there is with cellphone use.

In the new report, Richard A. Young of Wayne State University School of Medicine in Detroit finds that two influential studies on the subject might have overestimated the risk.

The problem has to do with the studies' methods, according to Young. Both studies, a 1997 study from Canada and one done in Australia in 2005, were "case-crossover" studies.

The researchers recruited people who had been in a crash, and then used their billing records to compare their cellphone use around the time of the crash with their cell use during the same time period the week before (called a "control window").

But the issue with that, Young writes in the journal Epidemiology, is that people may not have been driving during that entire control window.

Such "part-time" driving, he says, would necessarily cut the odds of having a crash (and possibly reduce people's cell use) during the control window and make it seem like cellphone use is a bigger crash risk than it is.
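The direction of that bias can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Young's method or data: calls are assumed to have no effect at all on crash risk, only calls made while driving count as exposure, the driver is on the road for the whole window before the crash, and the window length, call probability and function names are invented for illustration.

```python
import random


def exposed(rng, driving_minutes, call_prob_per_minute):
    # True if at least one call happens during the minutes spent driving.
    return any(rng.random() < call_prob_per_minute
               for _ in range(driving_minutes))


def matched_odds_ratio(control_driving_fraction, window=20, call_prob=0.01,
                       drivers=20000, seed=7):
    # Case-crossover estimate: each crashed driver is compared with
    # themselves in an earlier "control window". Only discordant pairs
    # (a call in one window but not the other) enter the odds ratio.
    # Note: a fraction that rounds to zero driving minutes would leave
    # no control-only pairs and make the ratio undefined.
    rng = random.Random(seed)
    control_minutes = round(window * control_driving_fraction)
    crash_only = control_only = 0
    for _ in range(drivers):
        in_crash_window = exposed(rng, window, call_prob)  # always driving
        in_control_window = exposed(rng, control_minutes, call_prob)
        if in_crash_window and not in_control_window:
            crash_only += 1
        elif in_control_window and not in_crash_window:
            control_only += 1
    return crash_only / control_only


full_time = matched_odds_ratio(1.0)    # driving for the whole control window
part_time = matched_odds_ratio(0.25)   # driving for a quarter of it
print(f"full-time: {full_time:.2f}  part-time: {part_time:.2f}")
```

When the control window is fully driven, the estimated odds ratio should sit near 1, as it should when calls are harmless. When the driver is on the road for only part of the control window, the same harmless calls produce an odds ratio well above 1, purely because there was less driving time in which a call could be counted.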

The two studies in question asked people whether they had been driving during the control windows, but they did not account for part-time driving, Young says.

So for his study, Young used GPS data to track day-to-day driving consistency for 439 drivers over 100 days. He grouped the days into pairs: day one was akin to the "control" days used in the earlier studies, and day two was akin to the "crash" day.

Overall, Young found, there was little consistency between the two days when it came to driving time.

When he looked at all control windows where a person did some driving, the total amount of time on the road was about one-fourth of what it was during the person's "crash" day.

If that information were applied to the two earlier studies, Young estimates, the crash risk tied to cellphone use would have been statistically insignificant.

That's far lower than the studies' original conclusions: that cellphone use while driving raises the risk of crashing four-fold.

And, Young says, the results might help explain why some other studies have not linked cell use to an increased crash risk.

A researcher not involved in the work said that the two earlier studies may well have overstated the crash risk from using a cellphone.

But that doesn't mean you should feel free to chat and text away at the wheel, according to Fernando Wilson, an assistant professor at the University of North Texas Health Science Center in Fort Worth.

A number of other studies, using designs other than case-crossover, have suggested that cellphone use, and particularly texting, is hazardous on the road, Wilson told Reuters Health.

"In wider policy, I don't think this study is going to change the conversation about distracted driving," Wilson said. "Most of the conventional thinking is that we need to do something to reduce it."

In his own study published last year, Wilson looked at information from a government database that tracks deaths on U.S. public roads. He found that after declining between 1999 and 2005, deaths blamed on distracted driving rose 28 percent between 2005 and 2008.

And the increase seemed to be related to a sharp rise in texting. ("Distracted driving" refers to anything that takes the driver's attention off the road, from fiddling with the radio to talking to other people in the car.)

Other studies, Wilson noted, have used mounted cameras to show that drivers' behavior becomes more risky when they are using cellphones.

All of those studies have limitations, and cannot pinpoint just how big a risk driving-while-texting (or talking) might be. Wilson said the current study highlights a limitation in case-crossover studies.

But the new study, itself, has shortcomings. Applying the GPS findings from this study to the two earlier ones, done with different drivers, in different countries, is tricky, both Young and Wilson point out. "It's possible that the (earlier) study findings were overstated," Wilson said, "but it's difficult to know by how much."

According to the National Highway Traffic Safety Administration, about 450,000 Americans were injured in crashes linked to distracted driving in 2009. Another 5,500 were killed.

