Article - "Culturomics", the latest offspring of the knowledge society

To the objection that "you cannot quantify culture", the Harvard engineers who invented culturomics reply that there is no reason not to use quantitative methods to study culture. The idea is to associate "n-grams" with certain events in our societies. In the work of the information theorist Claude Shannon, the aim of this kind of modeling was to estimate the probability of the next letter given the letters that came before it. Building on the Markov model, an n-gram model uses the last n letters of a sequence to predict the next letter.
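The next-letter idea can be illustrated with a minimal Python sketch. It is not the culturomics software: the function names, the sample sentence and the choice of n = 2 are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def train_char_ngrams(text, n):
    """Count which letter follows each run of n letters in `text`."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n):
        context = text[i:i + n]      # the last n letters seen
        next_letter = text[i + n]    # the letter that actually followed
        counts[context][next_letter] += 1
    return counts


def next_letter_probabilities(counts, context):
    """Turn the raw counts for one context into probabilities."""
    followers = counts.get(context)
    if not followers:
        return {}  # this context never appeared in the training text
    total = sum(followers.values())
    return {letter: c / total for letter, c in followers.items()}


# Hypothetical sample text, chosen only to keep the example small.
sample = "the theory of the thing"
model = train_char_ngrams(sample, 2)
probs = next_letter_probabilities(model, "th")
print(probs)  # {'e': 0.75, 'i': 0.25}
```

In this toy sentence "th" is followed three times by "e" and once by "i", hence the estimated probabilities of 0.75 and 0.25.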
The engineers have adapted this model to characterize the distinctive marks of a given time and place. For example, when you type "Culture" (or Rembrandt, Voltaire, Martin Luther King or rose petals), you can see how many times the word was used in the books indexed by Google. Built on Google's mass digitization of books, the culturomics software draws on the American company's digitized corpus, which contains about 4% of all the books printed in the world over the past 200 years.
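To make the word-counting step concrete, here is a rough Python sketch that measures how often a word appears per year in a tiny made-up corpus. The `toy_books` data, the function name and the decision to divide by the total number of words in each year are all assumptions for illustration; they do not describe how Google's tool is actually implemented.

```python
import re
from collections import Counter


def yearly_frequency(books, word):
    """Return {year: share of that year's words that are `word`}."""
    hits = Counter()    # occurrences of the target word per year
    totals = Counter()  # all words per year, used to normalize
    target = word.lower()
    for year, text in books:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {year: hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}


# Hypothetical miniature corpus of (publication year, text) pairs.
toy_books = [
    (1800, "Culture and manners of the age"),
    (1800, "A treatise on roses"),
    (1900, "Culture shapes culture"),
]
freq = yearly_frequency(toy_books, "culture")
print(freq[1800])  # 0.1
```

Dividing by the yearly total is one way to keep years with many books from dominating the curve simply because more text was printed.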
Named, like other sciences whose English names end in "omics", culturomics is a science born of the combination of new technologies, statistical tools and literature. It is used to assemble data and reorganize it. The main objective is to offer researchers and students a new playground: the quantitative investigation of the cultural development of various societies through time, focusing on linguistic and cultural phenomena as embodied in books between 1800 and 2000. Several phenomena can then be studied: grammar, epistemology, the adoption of certain technologies, censorship, the perception of historical events and the adoption of different schools of thought.
In an article dated January 14, 2011 in the journal Science, the researchers Erez Lieberman, Jean-Baptiste Michel, Joe Jackson, Tina Tang and Martin A. Nowak highlighted particular trends between older and modern English through the use of irregular verbs. A 51% decrease in their use can be observed between the English of 1800 and the modern English of the 2000s.
This is yet another new field of investigation offered to us to better understand our history, our cultures, our languages and their influences over time. This embryo, the latest offspring of the knowledge society, like the Semantic Web, of course raises many questions. Let’s bet that on this ground the battle over intellectual property will simply gain another front line. Let’s bet, too, that applying the same culturomics approach to the billions of pages published on the web since the Internet reached the general public in 1995 would also be instructive...