My attention was caught this weekend by a Daily Beast article with a funny title: Why Big Data Doesn’t Live up to the Hype. I read the article and, during a long trip over the weekend, skimmed the book mentioned in it, Uncharted: Big Data as a Lens on Human Culture by Erez Aiden and Jean-Baptiste Michel. The authors were instrumental in creating the Google Ngram Viewer.
The Google Ngram Viewer is a phrase-usage graphing tool developed by Jon Orwant and Will Brockman of Google. It charts the yearly count of selected n-grams (letter combinations), words, or phrases as found in over 5.2 million books digitized by Google Inc. (up to 2008). The words or phrases (n-grams) are matched by case-sensitive spelling, so uppercase and lowercase letters must match exactly, and they are plotted on the graph only if found in 40 or more books in a given year of the requested year range. The Ngram tool was released in mid-December 2010.
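To make the counting rule above a bit more concrete, here is a rough Python sketch of the idea: count case-sensitive n-grams per year and keep a yearly count only if the n-gram appears in at least a minimum number of books that year. This is only my illustration, not Google's implementation. The function name `yearly_ngram_counts`, the `(year, text)` input format, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict


def yearly_ngram_counts(books, n=1, min_books=40):
    """Count case-sensitive word n-grams per year (hypothetical helper).

    books     -- iterable of (year, text) pairs; this format is an assumption
    n         -- number of words per n-gram
    min_books -- keep a yearly count only if the n-gram occurs in at least
                 this many books in that year (40 in the description above)
    """
    counts = defaultdict(lambda: defaultdict(int))      # ngram -> year -> occurrences
    book_hits = defaultdict(lambda: defaultdict(int))   # ngram -> year -> books containing it

    for year, text in books:
        tokens = text.split()  # naive whitespace tokenization, no lowercasing
        seen_in_book = set()
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            counts[gram][year] += 1
            seen_in_book.add(gram)
        for gram in seen_in_book:
            book_hits[gram][year] += 1

    # Drop yearly counts that do not meet the book threshold.
    result = {}
    for gram, by_year in counts.items():
        kept = {y: c for y, c in by_year.items() if book_hits[gram][y] >= min_books}
        if kept:
            result[gram] = kept
    return result
```

Because matching is case-sensitive, "Data" and "data" end up as separate entries in this sketch, which mirrors the exact-spelling matching described above. A real corpus would obviously need proper tokenization and far more efficient storage.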
The word-search database was created by Google Labs, based originally on 5.2 million books, published between 1500 and 2008, containing 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. Italian words are counted by their use in other languages. A user of the Ngram tool has the option to select among the source languages for the word-search operations.
Researchers have analysed the Google Ngram database of books written in American or British English and found some interesting results. Among them are correlations between the emotional output and significant events of the 20th century, such as World War II.
If you have never tried the Ngram Viewer, you should. Navigate here and try it out. You can find some interesting trends. Here is my funny example: “data” is eclipsing “love”. Does it mean something? I’m not sure, but it is funny…
Google certainly has the power to deal with such large projects. Everybody is trying to collect data these days, and you can see some very interesting examples. The ambitions of CAD and PLM companies do not go that far… yet. Here is an idea for somebody with a budget and free time: collect product lifecycle information related to the manufacturing industry, suppliers, material trends and consumer behavior. More and more data is becoming publicly available on the web. Collecting and classifying this information can help us explore future demands and opportunities.
What is my conclusion? In data we trust. Data is a very powerful argument and we use it frequently. With the globalization of the manufacturing industry and the ambition to discover future trends and opportunities in manufacturing and the supply chain, I can see the collection of publicly available manufacturing data as a key to the unknown unknowns. Just a crazy idea and my thoughts… Happy Monday!