A blog by Oleg Shilovitsky
Information & Comments about Engineering and Manufacturing Software

Why should PLM care about the Web Data Commons project?

Oleg
10 June, 2013 | 3 min for reading

Big data has been one of the most hyped buzzwords of the last two years. With all the hype around it, it is very hard to find a good definition that answers a simple question: what does “big data” mean for your specific industry and your applications? Here is how Wikipedia describes big data:

Big data[1][2] is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis,[4] and visualization.

The web is an interesting place to dig for new sources of information. These days the web goes far beyond web pages and database-driven websites. It contains lots of structured information that businesses can use, and manufacturing companies are among them. Information about products, customers, interests, priorities – this is a new goldmine for web researchers.

I’ve been skimming semanticweb.com, and a publication about the Web Data Commons project caught my attention. Web Data Commons is about structured data on the internet. Here is an interesting snippet about what it does:

More and more websites embed structured data describing for instance products, people, organizations, places, events, resumes, and cooking recipes into their HTML pages using markup formats such as RDFa, Microdata and Microformats. The Web Data Commons project extracts all Microformat, Microdata and RDFa data from the Common Crawl web corpus, the largest and most up-to-data web corpus that is currently available to the public, and provide the extracted data for download in the form of RDF-quads and also in the form of CSV-tables for common entity types (e.g. product, organization, location, …). In addition, we calculate and publish statistics about the deployment of the different formats as well as the vocabularies that are used together with each format.
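To make the "RDF-quads" part more concrete, here is a minimal Python sketch of how such a dump could be summarized. It assumes the quads are in N-Quads syntax and that the fourth element is the URL of the page the statement came from. The function name `sites_per_class`, the sample lines, and the `.example` hosts are made up for illustration. It also counts by host name and only handles the simplest line shape, so it is a rough approximation and not the project's own method.

```python
import re
from collections import Counter
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified pattern: subject, predicate IRI, object IRI, graph IRI, final dot.
# Lines whose object is a literal (quoted text) do not match and are skipped.
QUAD = re.compile(r"^(\S+)\s+<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def sites_per_class(lines):
    """Return {class IRI: number of distinct hosts that declare that class}."""
    hosts = {}
    for line in lines:
        match = QUAD.match(line.strip())
        if not match or match.group(2) != RDF_TYPE:
            continue  # only rdf:type statements tell us the class of an item
        cls, page_url = match.group(3), match.group(4)
        hosts.setdefault(cls, set()).add(urlparse(page_url).netloc)
    return {cls: len(found) for cls, found in hosts.items()}


# Hypothetical sample lines, not taken from the real corpus.
sample = [
    f"_:b1 <{RDF_TYPE}> <http://schema.org/Product> <http://shop-a.example/item/1> .",
    f"_:b2 <{RDF_TYPE}> <http://schema.org/Product> <http://shop-a.example/item/2> .",
    f"_:b3 <{RDF_TYPE}> <http://schema.org/Product> <http://shop-b.example/p/9> .",
    f"_:b4 <{RDF_TYPE}> <http://schema.org/Offer> <http://shop-b.example/p/9> .",
    '_:b1 <http://schema.org/Product/name> "Cordless drill" <http://shop-a.example/item/1> .',
]

print(sites_per_class(sample))
```

Counting distinct hosts instead of raw statements is what makes a number like "used by more than 19,000 websites" meaningful: one large shop with a million product pages still counts once.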

Dig a bit deeper to learn about the statistics of structured data. You can find some information here – Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus. According to these statistics, product-related information is the most popular in the data corpus researched. Look at the following passage:

Products in RDFa. We identified three RDFa classes, og:”product”, dv:Product, and gr:Offering, that are used each on at least 500 different websites for describing products. og:”product” is the most popular class, being used by more than 19,000 websites.

In addition, product data was found on websites using Microdata and Microformats.

Reviewing all Microdata classes that are used in more than 100 different websites, we could identify four classes, schema:Product, schema:Offer, datavoc:Product, and datavoc:Offer, that are frequently used to describe products or offers. The following table shows the co-occurences of these classes with other product-related classes on the same website. For instance, 4,308 websites provide product data together with aggregate ratings for these products. In addition to the class co-occurrences, we analyzed which properties are frequently used to describe schema:Products. The table below shows that schema:Product/name, schema:Product/description, schema:Product/image, and schema:Product/offers are the most frequently used properties.
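For readers who have not seen Microdata markup, the sketch below shows what a schema:Product item can look like in a page and how its properties could be counted. It uses only Python's standard `html.parser`. The class `ProductPropertyCounter`, the helper `count_product_properties`, and the sample HTML are illustrative assumptions, and the parser is deliberately simplified (it ignores `itemref` and assumes well-nested tags), so treat it as a sketch and not a full Microdata implementation.

```python
from collections import Counter
from html.parser import HTMLParser

PRODUCT = "http://schema.org/Product"
# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"img", "meta", "link", "br", "hr", "input"}


class ProductPropertyCounter(HTMLParser):
    """Counts itemprop names that belong directly to schema.org Product items."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one entry per open element: its itemtype, or None
        self.counts = Counter()

    def _current_item_type(self):
        # The nearest enclosing element that opened an itemscope.
        for itemtype in reversed(self.stack):
            if itemtype is not None:
                return itemtype
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop and self._current_item_type() == PRODUCT:
            self.counts.update(prop.split())
        if tag not in VOID_TAGS:
            itemtype = attrs.get("itemtype", "") if "itemscope" in attrs else None
            self.stack.append(itemtype)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def count_product_properties(html):
    parser = ProductPropertyCounter()
    parser.feed(html)
    return parser.counts


# Hypothetical product page fragment.
sample_html = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Cordless drill</span>
  <img itemprop="image" src="drill.jpg">
  <p itemprop="description">18V drill with two batteries.</p>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">99.00</span>
  </div>
</div>
"""

print(count_product_properties(sample_html))
```

Note that `price` is not counted for the product, because it sits inside the nested Offer item. Running a counter like this over many pages is one way to arrive at the kind of "most frequently used properties" ranking described in the passage above.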

What is my conclusion? Manufacturing companies are looking for ways to improve decision-making related to products. Potential leverage can come from analyzing web data about products and services. PLM vendors can think about non-traditional approaches to getting information about products and customers. Important. Just my thoughts…

Best, Oleg
