A blog by Oleg Shilovitsky
Information & Comments about Engineering and Manufacturing Software

Why should PLM care about the Web Data Commons project?

Oleg
10 June, 2013

Big data has been one of the most hyped buzzwords of the last two years. With all the hype around it, it is very hard to find a good definition when it comes to a simple question: what does “big data” mean for your specific case, in your industry and your applications? Here is how Wikipedia describes big data:

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

The web is an interesting place to dig for new sources of information. These days, the web goes well beyond plain web pages and database-driven websites. It contains lots of structured information that businesses can use, and manufacturing companies are among them. Information about products, customers, interests, and priorities makes this a new gold-mine era for web researchers.

I’ve been skimming the semanticweb.com website, and a publication about the Web Data Commons project caught my attention. Web Data Commons is about structured data on the internet. Here is an interesting snippet about what it does:

More and more websites embed structured data describing for instance products, people, organizations, places, events, resumes, and cooking recipes into their HTML pages using markup formats such as RDFa, Microdata and Microformats. The Web Data Commons project extracts all Microformat, Microdata and RDFa data from the Common Crawl web corpus, the largest and most up-to-date web corpus that is currently available to the public, and provides the extracted data for download in the form of RDF-quads and also in the form of CSV-tables for common entity types (e.g. product, organization, location, …). In addition, we calculate and publish statistics about the deployment of the different formats as well as the vocabularies that are used together with each format.
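To make "extracting Microdata from HTML pages" concrete, here is a minimal sketch using only Python's standard `html.parser`. It is not the Web Data Commons extractor; the class `MicrodataSketch`, the function `extract`, and the sample page are hypothetical illustrations. It records every `itemtype` it sees and attributes each `itemprop` to the nearest enclosing `itemscope`, deliberately ignoring `itemref`, `itemid`, and property values.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Collects Microdata item types and (owning itemtype, itemprop) pairs.

    Simplified on purpose: ignores itemref/itemid and property values.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, is_scope, itemtype)
        self.types = []   # every itemtype seen, in document order
        self.props = []   # (itemtype of the enclosing item, property name)

    def _current_type(self):
        # The nearest enclosing itemscope owns any itemprop we encounter.
        for _tag, is_scope, itemtype in reversed(self.stack):
            if is_scope:
                return itemtype
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # An itemprop belongs to the enclosing item, even if this element
        # opens a new itemscope itself (e.g. itemprop="offers" itemscope).
        if "itemprop" in a:
            owner = self._current_type()
            for name in (a["itemprop"] or "").split():
                self.props.append((owner, name))
        is_scope = "itemscope" in a
        itemtype = a.get("itemtype") if is_scope else None
        if itemtype:
            self.types.append(itemtype)
        if tag not in VOID:
            self.stack.append((tag, is_scope, itemtype))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract(html):
    """Return (item types, (owner type, property) pairs) found in html."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.types, parser.props


# Hypothetical product page marked up with schema.org Microdata.
SAMPLE_HTML = """
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Widget</span>
  <img itemprop="image" src="w.png">
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <span itemprop="price">9.99</span>
  </div>
  <div itemprop="aggregateRating" itemscope itemtype="https://schema.org/AggregateRating">
    <span itemprop="ratingValue">4.5</span>
  </div>
</div>
"""

if __name__ == "__main__":
    types, props = extract(SAMPLE_HTML)
    print(types)
    print(props)
```

A real extractor would also emit the property values and serialize everything as RDF, which is what makes the downloadable quads possible; the sketch stops at the structure.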

Dig a bit deeper to learn about the statistics of structured data. Some of them are published in Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus. According to these statistics, product-related information is the most common type of structured data in the researched corpus. Look at the following passage:

Products in RDFa. We identified three RDFa classes, og:"product", dv:Product, and gr:Offering, that are each used on at least 500 different websites for describing products. og:"product" is the most popular class, being used by more than 19,000 websites.
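Statements like "used on at least 500 different websites" come down to counting distinct sites per class. The sketch below does that over quad lines of the form subject, predicate, object, graph, where the graph is the page URL. It is a hypothetical simplification, not the project's code: the regex accepts only quads whose object and graph are IRIs, and a "website" is approximated by the host name of the page URL, which may differ from how the project defines a website. The GoodRelations and data-vocabulary.org IRIs in the test data are illustrative.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified N-Quads pattern: subject, then predicate, object and graph as IRIs.
# Quads with literal objects (or without a graph label) simply do not match.
QUAD = re.compile(r"^(\S+)\s+<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def sites_per_class(lines):
    """Count distinct websites (approximated by host name) per rdf:type class."""
    sites = defaultdict(set)
    for line in lines:
        m = QUAD.match(line.strip())
        if not m:
            continue
        _subject, predicate, obj, graph = m.groups()
        if predicate == RDF_TYPE:
            host = urlsplit(graph).hostname
            if host:
                sites[obj].add(host)
    return {cls: len(hosts) for cls, hosts in sites.items()}


def popular_classes(counts, min_sites):
    """Classes used on at least min_sites websites, most widely used first."""
    keep = [c for c, n in counts.items() if n >= min_sites]
    return sorted(keep, key=lambda c: (-counts[c], c))
```

On a downloaded quad file, something like `popular_classes(sites_per_class(open(path)), 500)` would apply the 500-website threshold from the passage, where `path` is a hypothetical local file name. Using a set per class is the key design choice: a shop with thousands of product pages still counts as one website.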

Product data was also found on websites using Microdata and Microformats.

Reviewing all Microdata classes that are used in more than 100 different websites, we could identify four classes, schema:Product, schema:Offer, datavoc:Product, and datavoc:Offer, that are frequently used to describe products or offers. The following table shows the co-occurrences of these classes with other product-related classes on the same website. For instance, 4,308 websites provide product data together with aggregate ratings for these products. In addition to the class co-occurrences, we analyzed which properties are frequently used to describe schema:Products. The table below shows that schema:Product/name, schema:Product/description, schema:Product/image, and schema:Product/offers are the most frequently used properties.
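The two analyses in this passage, class co-occurrence on the same website and property frequency, can be sketched with plain counters. This assumes the data has already been grouped into a hypothetical mapping from website to the set of classes (or `schema:Product` properties) found there; the function names and toy sites are illustrative, not the project's implementation.

```python
from collections import Counter
from itertools import combinations


def class_cooccurrence(site_classes):
    """site_classes: dict website -> set of class names found on that site.

    Returns a Counter mapping each sorted class pair to the number of
    websites on which both classes appear.
    """
    pairs = Counter()
    for classes in site_classes.values():
        for a, b in combinations(sorted(classes), 2):
            pairs[(a, b)] += 1
    return pairs


def property_site_counts(site_props):
    """site_props: dict website -> properties used on schema:Product items.

    Counts each property at most once per website.
    """
    counts = Counter()
    for props in site_props.values():
        counts.update(set(props))
    return counts


if __name__ == "__main__":
    site_classes = {
        "shop-a.example": {"schema:Product", "schema:Offer", "schema:AggregateRating"},
        "shop-b.example": {"schema:Product", "schema:AggregateRating"},
        "blog.example": {"schema:Person"},
    }
    print(class_cooccurrence(site_classes)[("schema:AggregateRating", "schema:Product")])
```

Sorting each site's classes before pairing ensures that (A, B) and (B, A) land in the same counter slot, so a figure like "4,308 websites provide product data together with aggregate ratings" corresponds to a single pair count.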

What is my conclusion? Manufacturing companies are looking for ways to improve product-related decision processes. Analyzing web data about products and services can give them additional leverage. PLM vendors can think about non-traditional approaches to getting information about products and customers. This is important. Just my thoughts…

Best, Oleg
