Data is a fascinating topic these days. The amount of data is growing, and it raises lots of concerns on both the consumer and business sides of the world. Last week I blogged about the Web Data Commons project. Openness is another interesting issue that is debated by many people today. The Semantic Web blog posted about a session on open data on the web, presented by W3C eGov consultant Phil Archer at the Semantic Tech and Business conference last week. Navigate here to read more. Most of us associate open data with linked data. One of the most interesting findings mentioned in the article was the dominance of Excel data on the web. Here is an interesting passage:
“JSON and CSVs are the kings,” he said. “If you look at open data portals, CSVs [which get converted to JSON files] outweigh Linked Data by a mile.”
He tells an interesting story about the Open Knowledge Foundation (OKF) and its CKAN platform, which enables publishing CSV files to the web in a semantic way.
The OKF is responsible for the CKAN platform that the U.S. open government data portal, data.gov, now incorporates. “CKAN,” Archer said during his presentation, “is a really important platform and basically it’s about publishing CSVs, and it spits out a bit of RDF data.” He also noted that Dr. Rufus Pollock, founder and co-director of the Open Knowledge Foundation, has proposed a new standard for a data package that includes CSV and JSON. Frictionless Data, now in alpha, includes as principles using web-native formats like JSON. It defines a data package for delivery, installation and management of datasets, with a Simple Data Format (SDF) at heart whose key features are CSV for data, single JSON file (datapackage.json) to describe the dataset including a schema for data files, and the reuse wherever possible of existing work including other Data Protocols specifications.
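To make the "CSV for data, single JSON file to describe it" idea more concrete, here is a minimal sketch in Python. It is only an illustration, not the Frictionless Data tooling itself. The sample data, the file name bom.csv, the package name and the helper functions are all hypothetical, and the descriptor keys (name, resources, path, schema, fields) reflect my understanding of the Data Package layout, so check them against the current specification before relying on them.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a spreadsheet exported to CSV.
SAMPLE_CSV = (
    "part_number,description,quantity\n"
    "P-100,Bracket,4\n"
    "P-200,Bolt M6,16\n"
)


def infer_type(values):
    """Guess a column type ('integer', 'number' or 'string') from its values."""
    if not values:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def describe_csv(name, path, csv_text):
    """Build a datapackage.json-style descriptor (assumed layout) for one CSV."""
    header = next(csv.reader(io.StringIO(csv_text)))
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fields = [
        {"name": column, "type": infer_type([row[column] for row in rows])}
        for column in header
    ]
    return {
        "name": name,
        "resources": [{"path": path, "schema": {"fields": fields}}],
    }


descriptor = describe_csv("bom-example", "bom.csv", SAMPLE_CSV)

# This JSON text is what would be saved as datapackage.json next to bom.csv.
print(json.dumps(descriptor, indent=2))
```

The point of the sketch is that the CSV itself stays untouched. The small JSON file sitting next to it carries the column names and types, which is what makes a plain spreadsheet export readable by other tools.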
The story of CSV dominance on the web made me think about the future of open data in manufacturing and enterprise organizations. Organizations have zillions of Excel files located everywhere. Packaging data in a semantic way makes a lot of sense, and it increases the openness of enterprise software platforms, including PLM.
What is my conclusion? Openness is important. Companies of all sizes are struggling with the amount of data located in Excel files. That data is not reliable and not really open. Making data from Excel files accessible across the organization could become an imperative that forces companies to be more open. The original intent of PLM was to stop companies from poisoning themselves with Excel infusions. However, the implementation is far from ideal. PLM technology providers should take note. Just my thoughts…
Best, Oleg