Contribution: "Big data’s greatness and fantasy" by Paul Vacca (2/3)

With the emergence of big data, magical thinking has made a powerful comeback. For some, big data is the key to a utopian world; for others, it is the promise of an Orwellian future. Yet both sides agree on the same dogma: big data is going to take possession of reality. Reality, however, seems to be resisting. What if the fantasies of omnipotence surrounding big data were mostly an illusion made up of dollars and myths? Here is a three-step deconstruction.

1. Almighty big data and the comeback of magical thinking
2. Big data put to the test of reality
3. Dollars and myths

2. Reality vs. big data: 1-0

For now, amid its dreams of omniscience, big data keeps running into an obstacle: reality. The NSA is a textbook case. It is now well known that the American intelligence agency collects data on a massive scale through its wiretaps and surveillance systems, then mines it with data analysis tools (algorithms) in the hope of detecting suspicious statistical patterns (signatures) that would flag terrorist behavior. With what result? Keith Alexander, then the NSA’s director, asserted in 2013 that his agency’s surveillance programs, after more than 10 years of mass data collection, had helped to foil dozens of conspiracies. A few months later, he mentioned 13 events, before admitting that only two threats had actually been averted…

The NSA, the panoptic illusion

According to Grégoire Chamayou, a researcher at the CNRS, this catastrophic record can be perfectly explained. In an article published in June 2015 in La Revue du Crieur[1], he dismantled point by point the panoptic illusion in which the NSA is immersed. He quotes an American researcher who pointed out that “the only foreseeable thing regarding terrorist datamining is its permanent failure”. This inevitable failure stems from two major illusions. The first is a blind faith in massive data collection (“Collect it all”) which, rather than “looking for a needle in a haystack, consists in collecting the whole haystack”, and so multiplies the difficulty of analyzing all the data. The second is the belief that there exists a “terrorist signature” (that is, a sequence of actions leading up to an attack) that could be detected, which is false. The risk is twofold. On the one hand, some “true” terrorist acts slip through, because terrorism consists precisely in defeating preset patterns by devising unprecedented modus operandi. On the other hand, there is the risk of seeing terrorists everywhere. If the pattern “a person owning a truck, driving to a sensitive location and having bought ammonium nitrate” can identify a potential terrorist act, it also fits almost every farmer in Nebraska who owns a truck and buys ammonium nitrate (which is used in the production of fertilizers). In short, either the NSA fails to detect the attack, or it detects far too many.
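The second half of this dilemma, detecting far too many, is a matter of simple arithmetic once the people being sought are extremely rare. The short Python sketch below works through it. Every figure in it is hypothetical and chosen only for illustration (none comes from Chamayou’s article or from the NSA), and the function name screening_outcome is likewise invented for this example.

```python
def screening_outcome(population, true_targets, sensitivity, false_positive_rate):
    """Return (true_alerts, false_alerts, precision) for a signature-based filter.

    sensitivity: share of real targets that match the signature.
    false_positive_rate: share of innocent people who also match it.
    """
    innocents = population - true_targets
    true_alerts = true_targets * sensitivity
    false_alerts = innocents * false_positive_rate
    total_alerts = true_alerts + false_alerts
    # Precision: among everyone flagged, the share who are real targets.
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision


# Hypothetical figures, for illustration only: a very accurate filter
# applied to a large population containing very few real targets.
true_alerts, false_alerts, precision = screening_outcome(
    population=300_000_000,
    true_targets=1_000,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
print(f"genuine alerts: {true_alerts:,.0f}")
print(f"false alerts:   {false_alerts:,.0f}")
print(f"share of alerts that are genuine: {precision:.4%}")
```

Under these assumed numbers, a filter that is right 99 times out of 100 still raises roughly three million false alerts for fewer than a thousand genuine ones: the Nebraska farmers vastly outnumber the bombers.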

“Prediction is hard, especially when it concerns the future”

Google experienced the same kind of failure with “Google Flu Trends”. This “revolutionary” application, launched in 2008, made it possible to track flu epidemics in real time simply from search queries containing terms such as “paracetamol”, “flu” or “headache”. Initially, everyone, including the prestigious scientific journal Nature, hailed it with good reason as a miracle: the results were reliable, close to those published by the CDC, the official American agency for disease control and prevention, but they arrived faster and without requiring an armada of researchers. Except that the application rapidly seized up. In 2013, the media announced a risk of epidemic and online queries went haywire, which distorted the results and overestimated the risk. The application then became a reflection of Internet users’ hypochondria more than of reality. A victim of an epidemic of queries, it was thrown completely off course. Google closed the service last August.

Yes, even for big data, “prediction is hard, especially when it concerns the future”, as Groucho Marx observed. For now, predictive marketing built on the data we scatter across the Internet mostly amounts to retrology, the art of guessing the past: suggesting, for instance, the hotel we booked two weeks ago, or recommending a book we have already bought or even read.

Smart data or big data: Auguste Dupin vs. Scotland Yard

As big data grows ever bigger (through hyperconnection, the Internet of things, open data and the cloud), rather than helping to reveal reality with its billions of data points and dreams of exhaustiveness, it seems to bury it, like a haystack covering the needle we are looking for. The idea that exhaustiveness would let us take control of reality, and somehow duplicate it, reflects an accountant’s view of reality, a denial of reality, a misinterpretation. It is like a map on a 1:1 scale that would be indistinguishable from the territory: it would offer every guarantee of accounting accuracy, yet prove a useless guide.

In fact, and this is the decisive contribution of philosophers from Descartes to the phenomenologists by way of Kant, reality is not a mere compilation of data, however exhaustive; it is a hypothesis. It is not given or delivered “as such”; it is a construction of our intellect. This is why some recommend putting intelligence and the human factor back at the heart of the clouds and torrents of data, and opting for smart data: relevance and discernment in collecting data, and intelligence in analyzing it.

The perfect illustration of the difference between the smart data approach and the big data one, and of their respective effectiveness, is given by Edgar Allan Poe’s short story “The Purloined Letter”[2]. While the team from Scotland “big data” Yard struggles desperately to comb every millimeter of the apartment in search of the compromising letter, the detective Auguste “smart data” Dupin, working from a few relevant pieces of data, discovers where it is before even entering the apartment… But we will not reveal the solution, so as not to spoil it for those who have not yet read this little gem about intelligence.

About Paul Vacca

Paul Vacca is a novelist, essayist and consultant. He examines the social transformations brought about by digital technologies, as well as trends in the media and cultural markets. He has published articles in Technikart, Le Monde and La Revue des Deux Mondes, speaks at conferences at the Institut Français de la Mode and contributes to the think tank La Villa Numeris.

Recent publications: the novel Comment Thomas Leclerc 10 ans 3 mois et 4 jours est devenu Tom L’éclair et a sauvé le monde (Belfond 2015) and the essay La Société du hold-up - Le nouveau récit du capitalisme (Fayard 2012).

On Twitter: @Paul_Vacca