A blog by Oleg Shilovitsky
Information & Comments about Engineering and Manufacturing Software

Why should PLM care about the Web Data Commons project?

Oleg
10 June, 2013 | 3 min for reading

Big data has been one of the most hyped buzzwords of the last two years. With all the hype around it, it is hard to find a good definition of what “big data” means for a specific case in your industry and your applications. Here is how Wikipedia describes big data:

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

The web is an interesting place to dig for new sources of information. These days the web goes far beyond web pages and database-driven websites. It contains a lot of structured information that businesses, manufacturing companies among them, can use. Information about products, customers, interests, and priorities is a new goldmine for web researchers.

I’ve been skimming the semanticweb.com website, and a publication about the Web Data Commons project caught my attention. Web Data Commons is about structured data on the internet. Here is an interesting snippet about what it does:

More and more websites embed structured data describing for instance products, people, organizations, places, events, resumes, and cooking recipes into their HTML pages using markup formats such as RDFa, Microdata and Microformats. The Web Data Commons project extracts all Microformat, Microdata and RDFa data from the Common Crawl web corpus, the largest and most up-to-data web corpus that is currently available to the public, and provide the extracted data for download in the form of RDF-quads and also in the form of CSV-tables for common entity types (e.g. product, organization, location, …). In addition, we calculate and publish statistics about the deployment of the different formats as well as the vocabularies that are used together with each format.
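To make the idea of embedded structured data more concrete, here is a minimal sketch of how Microdata markup could be pulled out of a page. It is not the Web Data Commons extractor. The HTML fragment, the `MicrodataSketch` class, and the `extract_microdata` helper are my own hypothetical names, and the logic is deliberately simplified: it reads one flat item and ignores nested items.

```python
from html.parser import HTMLParser

# Hypothetical product page fragment with schema.org Microdata markup.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Bench Vise 100 mm</span>
  <img itemprop="image" src="vise.jpg">
  <span itemprop="description">Cast-iron vise for workshop use.</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects the first itemtype and flat itemprop values (no nesting)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._open_prop = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Remember the type of the first item found on the page.
        if "itemscope" in attrs and self.item_type is None:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "img":
            # For images the property value is the src attribute.
            self.properties[prop] = attrs.get("src")
        elif tag == "meta":
            # For meta tags the property value is the content attribute.
            self.properties[prop] = attrs.get("content")
        else:
            # Otherwise the value is the text inside the element.
            self._open_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._open_prop is not None:
            self.properties[self._open_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._open_prop = None


def extract_microdata(html):
    """Return (item type, property dict) for the simplified case above."""
    parser = MicrodataSketch()
    parser.feed(html)
    return parser.item_type, parser.properties


if __name__ == "__main__":
    item_type, props = extract_microdata(SAMPLE_HTML)
    print(item_type)
    for name, value in props.items():
        print(f"  {name}: {value}")
```

Running the extraction over a crawl of billions of pages instead of a single fragment is where the "big data" part comes in.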

Dig a bit deeper to learn about the statistics of structured data. You can find some information in Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus. According to these statistics, product-related information is the most popular in the data corpus researched. Look at the following passage:

Products in RDFa. We identified three RDFa classes, og:"product", dv:Product, and gr:Offering, that are used each on at least 500 different websites for describing products. og:"product" is the most popular class, being used by more than 19,000 websites.

In addition, product data was found on websites using Microdata and Microformats.

Reviewing all Microdata classes that are used in more than 100 different websites, we could identify four classes, schema:Product, schema:Offer, datavoc:Product, and datavoc:Offer, that are frequently used to describe products or offers. The following table shows the co-occurrences of these classes with other product-related classes on the same website. For instance, 4,308 websites provide product data together with aggregate ratings for these products. In addition to the class co-occurrences, we analyzed which properties are frequently used to describe schema:Products. The table below shows that schema:Product/name, schema:Product/description, schema:Product/image, and schema:Product/offers are the most frequently used properties.
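The kind of counting described in this passage can be sketched in a few lines. This is only an illustration under my own assumptions, not the project's actual analysis code: I assume each quad is one line whose fourth element is the URL of the page the data came from, I treat the host name as the "website", and the sample quads and function names are made up.

```python
import re
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlparse

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Simplified quad pattern: subject, predicate, IRI object, IRI graph.
# Quads with literal objects do not match and are skipped.
QUAD = re.compile(r"^(\S+)\s+(\S+)\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")

# Made-up sample data; the graph (4th element) is assumed to be the page URL.
SAMPLE_QUADS = """\
_:a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> <http://shop-a.example/vise> .
_:b <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/AggregateRating> <http://shop-a.example/vise> .
_:c <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> <http://shop-b.example/drill> .
_:d <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Offer> <http://shop-b.example/drill> .
_:e <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> <http://shop-a.example/clamp> .
"""


def classes_by_site(lines):
    """Map each website (host name) to the set of classes it uses."""
    sites = defaultdict(set)
    for line in lines:
        match = QUAD.match(line.strip())
        if match is None or match.group(2) != RDF_TYPE:
            continue  # not a type statement we can read
        host = urlparse(match.group(4)).netloc
        sites[host].add(match.group(3))
    return sites


def site_counts(sites):
    """Count websites per class and per pair of co-occurring classes."""
    per_class = defaultdict(int)
    pairs = defaultdict(int)
    for classes in sites.values():
        for cls in classes:
            per_class[cls] += 1
        for a, b in combinations(sorted(classes), 2):
            pairs[(a, b)] += 1
    return dict(per_class), dict(pairs)


if __name__ == "__main__":
    per_class, pairs = site_counts(classes_by_site(SAMPLE_QUADS.splitlines()))
    for cls, n in sorted(per_class.items()):
        print(f"{cls}: {n} website(s)")
    for (a, b), n in sorted(pairs.items()):
        print(f"{a} + {b}: {n} website(s)")
```

Counting per website rather than per page matters here: one shop with thousands of product pages should count once, which is how the passage above reports its numbers.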

What is my conclusion? Manufacturing companies are looking for ways to improve decision-making about products. Potential leverage can come from analyzing web data about products and services. PLM vendors can think about non-traditional approaches to getting information about products and customers. Important. Just my thoughts…

Best, Oleg
