A blog by Oleg Shilovitsky
Information & Comments about Engineering and Manufacturing Software

Why should PLM care about the Web Data Commons project?

Oleg
10 June, 2013 | 3 min for reading

Big data has been one of the most hyped buzzwords of the last two years. With all the hype around it, it is very hard to find a good definition that answers a simple question: what does “big data” mean for a specific case in your industry and your applications? Here is how Wikipedia describes big data:

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

The web is an interesting place to dig for new sources of information. These days the web goes far beyond web pages and database-driven websites. It contains lots of structured information that businesses can use, and manufacturing companies are among them. Information about products, customers, interests, priorities: this is a new goldmine for web researchers.

I’ve been skimming the semanticweb.com website, and a publication about the Web Data Commons project caught my attention. Web Data Commons is about structured data on the internet. Here is an interesting snippet about what it does:

More and more websites embed structured data describing for instance products, people, organizations, places, events, resumes, and cooking recipes into their HTML pages using markup formats such as RDFa, Microdata and Microformats. The Web Data Commons project extracts all Microformat, Microdata and RDFa data from the Common Crawl web corpus, the largest and most up-to-data web corpus that is currently available to the public, and provide the extracted data for download in the form of RDF-quads and also in the form of CSV-tables for common entity types (e.g. product, organization, location, …). In addition, we calculate and publish statistics about the deployment of the different formats as well as the vocabularies that are used together with each format.
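To make the RDF-quads part more tangible, here is a minimal Python sketch of how one could count how many different websites use each class. It is only an illustration under my own assumptions: the sample lines are made up, the fourth element of each quad is assumed to be the URL of the page the data came from, and the function name `sites_per_class` is hypothetical. The simplified regular expression handles only quads whose object is an IRI, which is enough for type statements but is not a full N-Quads parser.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified pattern: subject, then IRI predicate, IRI object, IRI graph.
# Quads with literal objects do not match and are skipped.
QUAD_RE = re.compile(r'^(\S+)\s+<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')


def sites_per_class(lines):
    """Map each rdf:type class to the set of hosts that use it."""
    sites = defaultdict(set)
    for line in lines:
        match = QUAD_RE.match(line.strip())
        if not match:
            continue
        _subject, predicate, obj, graph = match.groups()
        if predicate == RDF_TYPE:
            # Assumption: the graph IRI is the URL of the source page.
            sites[obj].add(urlparse(graph).netloc)
    return sites


# Hypothetical sample quads, not taken from the real corpus.
SAMPLE = [
    f'_:n1 <{RDF_TYPE}> <http://schema.org/Product> <http://shop-a.example/item/1> .',
    '_:n1 <http://schema.org/Product/name> "Cordless drill" <http://shop-a.example/item/1> .',
    f'_:n2 <{RDF_TYPE}> <http://schema.org/Product> <http://shop-a.example/item/2> .',
    f'_:n3 <{RDF_TYPE}> <http://schema.org/Product> <http://shop-b.example/p/9> .',
    f'_:n4 <{RDF_TYPE}> <http://schema.org/Offer> <http://shop-b.example/p/9> .',
]

counts = {cls: len(hosts) for cls, hosts in sites_per_class(SAMPLE).items()}
for cls, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(cls, n)
```

Counting hosts rather than pages means that a shop with thousands of product pages still counts as a single website.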

Dig a bit deeper to learn about the statistics on structured data. You can find some information here: Additional Statistics and Analysis of the Web Data Commons August 2012 Corpus. According to these statistics, product-related information is the most popular in the data corpus researched. Look at the following passage:

Products in RDFa. We identified three RDFa classes, og:”product”, dv:Product, and gr:Offering, that are used each on at least 500 different websites for describing products. og:”product” is the most popular class, being used by more than 19,000 websites.

In addition, product data was found on websites using Microdata and Microformats.

Reviewing all Microdata classes that are used in more than 100 different websites, we could identify four classes, schema:Product, schema:Offer, datavoc:Product, and datavoc:Offer, that are frequently used to describe products or offers. The following table shows the co-occurrences of these classes with other product-related classes on the same website. For instance, 4,308 websites provide product data together with aggregate ratings for these products. In addition to the class co-occurrences, we analyzed which properties are frequently used to describe schema:Products. The table below shows that schema:Product/name, schema:Product/description, schema:Product/image, and schema:Product/offers are the most frequently used properties.
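The co-occurrence idea in this passage is simple to sketch. The Python snippet below is my own illustration, not the project's actual method: given a made-up mapping from website to the set of classes found on it, it counts how many websites use another class together with schema:Product. The names `cooccurrence_with` and `CLASSES_BY_SITE` and all the sample data are hypothetical.

```python
from collections import Counter


def cooccurrence_with(target, classes_by_site):
    """Count, for every other class, the sites that use it together with `target`."""
    counts = Counter()
    for classes in classes_by_site.values():
        if target in classes:
            counts.update(c for c in classes if c != target)
    return counts


# Hypothetical input: website host -> classes found anywhere on that site.
CLASSES_BY_SITE = {
    "shop-a.example": {"schema:Product", "schema:Offer", "schema:AggregateRating"},
    "shop-b.example": {"schema:Product", "schema:AggregateRating"},
    "blog-c.example": {"schema:Article"},
    "shop-d.example": {"schema:Product", "schema:Offer"},
}

co = cooccurrence_with("schema:Product", CLASSES_BY_SITE)
for cls, n in co.most_common():
    print(cls, n)
```

Because every site contributes a set of classes, each website is counted at most once per class, which matches the "on the same website" wording of the passage.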

What is my conclusion? Manufacturing companies are looking for ways to improve product-related decision making. Potential leverage can come from analyzing web data about products and services. PLM vendors can think about non-traditional approaches to getting information about products and customers. I think this is important. Just my thoughts…

Best, Oleg
