
Does Big Data Sanctify False Conclusions?

Midway across the Atlantic, KLM, JFK to AMS (Amsterdam Airport Schiphol), seated in coach on a brand new 747, summer 1972, on an IASA Student ticket; altitude, 30,000 feet.

In-flight movies had recently been introduced, and on one leg the film was spilling off the take-up reel, out the housing opening, and falling on the passengers seated below the projector. Mid-flight entertainment went from a forgettable movie to live entertainment as the Flight Attendant wrestled with the film as more and more came spilling out, covering her in 35mm seaweed.

Later, on this flight, they showed Charlie Chaplin's controversial film, Monsieur Verdoux, a flop in the US that did well in Europe; this was, after all, KLM and not an American airline, so the passengers liked it. The film was otherwise just OK, but I still remember Chaplin's final speech about how small numbers can be scrutinized and comprehended, while massive numbers take on their own aura of sanctity. Is this lovely notion time-stamped to the film's post-WWII original release?

Paul Krugman, in his recent NY Times OpEd columns, once again mentions the recent implosion of the 'Austerity leads to Prosperity' school of economic thought, based on a now infamous Reinhart-Rogoff (R-R, for short) 'Excel error'. Why was the 90% Debt to GDP threshold accepted as the point of no return when real-world observations proved austerity didn't work for Ireland or anywhere else that tried it? It was not just the Excel formula, in my opinion; it was the supposed sanctity of the 900-page book of mind-numbing data, charts and statistics used to justify the austerity argument to begin with, which, until just recently, had never been questioned or validated. How many of us have been in strategic decision meetings where GB after GB of data is presented, and all we need to do is get the top-line summary, decide and get on with execution? How many of us have seen project plans with over a thousand tasks, many of which are rolled-up plans in themselves, and have just accepted that the underlying assumptions were right and need not be tested?

Sales forecasting is certainly an area where big numbers can sanctify. I was in the room as the national sales force of a struggling software company forecast the upcoming Quarter. Since this was a NASDAQ-listed company, financials and Street whispers mattered, which is why I attended. Like many sales organizations, they used the weighted method, where a $1,000,000 sale with a 30% probability of closing in the upcoming Quarter is listed as $300,000 'earned'. Trying to please the Finance-oriented senior leadership, they listed every encounter, be it in a meeting or on a subway, as a potential opportunity. I told them they were "kiting forecasts", which was unacceptable for obvious reasons, but they continued, producing a forecast with several hundred rows when 100 would have sufficed. The sanctity of numbers showed they were out there, beating the bushes. Had senior leadership understood the end-to-end sales process more deeply, and treated each large opportunity as a communications and agreement process taking a semi-repeatable period of time (similar to Reference Class Forecasting) rather than as just a set of numbers, a radically reduced and more accurate forecast would not have annoyed the Street, even if missed by a small amount. Then again, this was a highly unstable company, and many in senior leadership were doing a Cleopatra - Queen of denial to keep their jobs for another 90 days. In the end, reality won, and I wish them all well wherever they wound up.
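
For readers who like to see the arithmetic, here is a minimal sketch of the weighted method and of how padding inflates it. Only the $1,000,000 at 30% deal comes from the story above; the Opportunity class, the weighted_forecast function and the 200 padded rows (at $50,000 and 5% each) are hypothetical values I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    revenue: float      # full contract value in dollars
    probability: float  # estimated chance of closing this quarter, 0.0 to 1.0


def weighted_forecast(pipeline):
    """Sum of revenue times close probability across all opportunities."""
    return sum(o.revenue * o.probability for o in pipeline)


# The deal from the text: $1,000,000 at 30% is listed as $300,000.
real_deal = Opportunity("large deal", 1_000_000, 0.30)

# Hypothetical padding: 200 chance encounters at $50,000 and 5% each.
padding = [Opportunity(f"encounter {i}", 50_000, 0.05) for i in range(200)]

core = weighted_forecast([real_deal])
padded = weighted_forecast([real_deal] + padding)

print(f"core forecast:   ${core:,.0f}")
print(f"padded forecast: ${padded:,.0f}")
```

With these made-up numbers, 200 rows that nobody seriously expects to close contribute more to the forecast than the one real deal does, which is the kiting problem in miniature.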

Matt Taibbi, in the May Rolling Stone magazine, writes how the price of gold is set, not based on a massive data trove run through a model, but by a conference call between 5 banks. Silver is similar, with 3 banks setting the price. Jet fuel, diesel, electric power, coal, etc. are all set by small groups, not gargantuan datasets and models. Libor, the interest rate underlying the world's financial system, is set each morning by 18 banks, each bank submitting its interest rates across 10 currencies and 15 time periods. Submissions are taken for granted; no validation is performed. By averaging out these 2700 data points (18 x 10 x 15), Libor is set and the world reacts. An academic can spend a life modeling empirical observations via data, and the bottom line is they would be better off understanding the qualitative reasons behind these 2700 elements.

Many companies now have terabytes of data in different databases, and Big Data is today's must-have hyped technology. Why the hype? Big Data is easy for most people to understand and makes them feel current - the same people who wear loud shirts at idea creation (and not code generation) offsite 'Hackathons', which used to be called Ideation sessions, or Brainstorming, depending on when you were born. Consulting companies, no longer able to ride the 200+ person per gig ERP wave, love this kind of engagement, and so they talk it up. But as we have seen in the R-R Austerity situation, does more data always mean more accurate? Many of the junior staffers who focus on data presentation in large companies lack the experience-based deep insights required to verify that the information and the conclusions are solid. It's easier to show you worked hard, not necessarily smart, by maxing out Excel's 1M+ row by 16K column limit, than it is to get a deep understanding of what the numbers mean, whether they are correctly stated, and whether we actually need that level of data. What about the outliers, do we dismiss them as just signal noise?

Big Data implies massive centralized data and BI functions, and as we all know, anything centralized takes on an administrative overhead and calcified change structure, which could actually make the data stale and, therefore, any resulting analysis subject to 'winning the last war' syndrome. The Open Knowledge Foundation, last week, posted to their blog:

Just as we now find it ludicrous to talk of "big software" - as if size in itself were a measure of value - we should, and will one day, find it equally odd to talk of "big data". Size in itself doesn't matter - what matters is having the data, of whatever size, that helps us solve a problem or address the question we have.

Their prognosis is:

... and when we want to scale up the way to do that is through componentized small data: by creating and integrating small data "packages" not building big data monoliths, by partitioning problems in a way that works across people and organizations, not through creating massive centralized silos.

This next decade belongs to distributed models not centralized ones, to collaboration not control, and to small data not big data.

Is this to say Big Data is never big? Bioinformatics puts it in perspective. The Human genome sequence is about 3 billion base pairs and can be stored in roughly ¾ GB. That's it. Here, Big Data undoubtedly means Big Meaning. What we need is to stop treating Big Data as gathering, and instead think of Big Data as a continuous conversation, describing a changing world. A centralized Big Data function should be structured for agile governance, empowering operating and planning units to get accurate input for their market/function specific models, as they are closest to these conversations.
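
A back-of-envelope check of the ¾ GB figure, as a sketch: I am assuming a plain 2-bits-per-base encoding (four letters, A, C, G and T, fit in 2 bits) and a round 3 billion base pairs, with no compression or metadata. The constant names are mine.

```python
BASE_PAIRS = 3_000_000_000  # approximate length of the human genome (assumed round figure)
BITS_PER_BASE = 2           # A, C, G, T: four symbols fit in 2 bits (assumed encoding)

total_bytes = BASE_PAIRS * BITS_PER_BASE // 8
total_gb = total_bytes / 1e9  # decimal gigabytes

print(f"{total_bytes:,} bytes is about {total_gb:.2f} GB")
```

Under those assumptions the whole sequence comes to 750 million bytes, which is where the ¾ GB comes from.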

Just like networking protocols, organizations should focus on context - common definitions and formats, so a 'Closed Sale' means the same thing across all business lines, and a customer relationship is defined with a common hierarchy and definitions. This does not imply over-simplification; it's usually quite complex, but the result is a lingua franca, where apples=apples. I worked on a Finance Transformation initiative where we discovered this multi-divisional, close to 100-year-old company had no common financial language. The financials were consolidated through some powerful computing, but did the results mean anything? We took a step back and developed their first common language. Here, too, the key is not having a newly minted MBA collect data; it's the contextual understanding that makes the data purposeful.

If you spend the time deeply understanding core underlying issues and causes (qualitative), and not just accumulating and presenting data (quantitative), less will be more. Predictive models, harder to set up than combining multiple structured and unstructured data sets (since a model implies understanding, not mechanics), will most likely produce better results than unending graphs and charts. This requires the data to be scrutinized by experienced employees who can use that most powerful organic computer to go beyond the colorful graphics. By keeping data decentralized, with a common set of definitions, we can best house data in the hands of those who most need and understand it while retaining agility. Sanctity comes, not from size, but from meaning, context, currency and availability.

By the way, last week was Big Data Week. I wonder how many people celebrated and how they were broken out by age, location, height, weight and specific gravity.

Richard Eichen is the Founder and Managing Principal of Return on Efficiency, LLC, http://www.growroe.com and is one of their senior turnaround leaders/CROs, Program Rescue and Interim Executives with over 25 years' experience reshaping companies, Operations, IT/Systems Integration and key initiatives. Return on Efficiency, LLC specializes in those companies and initiatives where technology is the primary means of service delivery and revenue creation. He can be reached at [email protected], and followed on Twitter, @RDEgrowroe.