Tuesday, February 1, 2011

Data Ethics

Mark Twain popularized the phrase "There are three kinds of lies: lies, damned lies, and statistics," attributing it to 19th-century British Prime Minister Benjamin Disraeli (although there's no written evidence that Disraeli ever said it). Why does this phrase resonate with us? Perhaps because many people misuse data and statistics for their own ends. Perhaps because individuals have been known to "make them up." My personal version of "lies, damned lies, and statistics" is "Never trust the data of someone with an agenda." I realize many individuals grasp at any number that will support their cause. It isn't that they intend to deceive us; however, that can be the result.

Today, I'd like to put out a plea for integrity and understanding in the use of data.

Now, I'm not going to be specific, because this isn't about pointing fingers. It's just an example of how data can be used and abused. Recently, I read an article by an official promoting a particular industry. This individual gave a level of employment for that particular industry and indicated that just last year the industry had created a large number of new jobs.

Since I'm a data geek and track employment data, I knew that wasn't true. BUT, in a way it was. If you looked at ONE anomalous month in 2010, employment in that particular industry did reach the reported level (and subsequently dropped). If you looked at that ONE particular month, indeed, employment was up substantially from the same month in 2009. (Incidentally, this was probably due to some unusual seasonality.)

However, the annual numbers for 2010 showed a very different picture: on average, employment levels were much lower than that one-month peak, and the industry actually lost jobs over the year. Did the aforementioned official intend to deceive? Probably not. But did we come away understanding the true picture?
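
To make the arithmetic concrete, here is a small Python sketch of the two comparisons. The monthly figures are invented purely for illustration (they are not the actual industry's data), and the function names are my own. The sketch assumes one spike month in an otherwise weaker year.

```python
from statistics import mean

# Hypothetical monthly employment levels (thousands of jobs), January-December.
# These numbers are made up for illustration; they are not real data.
emp_2009 = [100, 100, 101, 101, 102, 102, 102, 101, 101, 100, 100, 100]
emp_2010 = [96, 96, 97, 97, 98, 98, 112, 97, 97, 96, 96, 96]


def single_month_change(prev_year, curr_year, month_index):
    """Change for one month versus the same month a year earlier."""
    return curr_year[month_index] - prev_year[month_index]


def annual_average_change(prev_year, curr_year):
    """Change in the average monthly level from one year to the next."""
    return mean(curr_year) - mean(prev_year)


# Pick the single best month of 2010, the way a cherry-picked claim would.
peak_month = max(range(12), key=lambda i: emp_2010[i])

single = single_month_change(emp_2009, emp_2010, peak_month)
annual = annual_average_change(emp_2009, emp_2010)

print(f"Peak month vs. same month last year: {single:+.1f} thousand jobs")
print(f"Annual average vs. last year:        {annual:+.1f} thousand jobs")
```

With these made-up numbers, the one-month comparison shows a gain while the annual average shows a loss. Both statements are "true," but only the second describes the year.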

In a world where economic information is so important, understanding and using data appropriately is priceless.

