Open vs. closed is one of the most popular topics in the engineering software ecosystem, and not only there. I’ve discussed it many times: Open vs. Closed PLM Debates, PLM and New Openness, Closed Thoughts About PLM openness and a few more. There is a clear trend towards openness these days and, in my view, it is hard to find a PDM/PLM company that will defend a closed approach over openness.
However, the definition of openness can be quite different. What is more, the implementation of openness can be different too. Speaking from the engineering standpoint, the devil is in the details. So, I wanted to speak about some aspects of “openness” and how it might be implemented in the PDM/PLM world. For a very long time, data in the PDM/PLM world was completely dependent on Relational Database Management Systems (RDBMS). The time of proprietary databases and data files is finally over. So, you might think data is peacefully located in an RDBMS where it can be easily accessed and exchanged. Not so fast… There are two main constraints preventing data openness in an RDBMS: data access technology and data schema. You need to support both in order to have access to the data. An alternative would be to use published APIs, which provide an access layer. In most cases, APIs eliminate the need to know the data model, but in a nutshell they are not very different from a data access technology.
For many years ODBC has remained one of the most widely adopted database access technologies. I’m using the name ODBC, but it can also refer to a variety of similar data access technologies: JDBC, OLE DB, ADO.NET, JDO, etc. This is where things went wrong with data access and openness. The power and success of ODBC came from the use of DSNs (Data Source Names) as identifiers for data access. All ODBC-compliant applications leveraged the fact that other developers had implemented RDBMS-specific libraries, the ODBC drivers. So, users don’t need to think about Oracle, SQL Server, MySQL, etc. Users just need to connect to a DSN.
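To make the DSN idea concrete, here is a minimal sketch in Python. It is not real ODBC: the `DSN_TABLE` and `DRIVERS` dictionaries, the `connect` helper and the `plm_parts` name are all hypothetical, and the standard-library `sqlite3` module stands in for a vendor driver. It only shows the indirection: the application names a data source, never a database product.

```python
import sqlite3

# Hypothetical DSN table: maps a logical name to a driver and its settings.
# Real ODBC keeps this in system configuration; a dict is used here instead.
DSN_TABLE = {
    "plm_parts": {"driver": "sqlite", "database": ":memory:"},
}

# Hypothetical driver table: one connect function per RDBMS.
DRIVERS = {
    "sqlite": lambda cfg: sqlite3.connect(cfg["database"]),
}

def connect(dsn):
    """Open a connection by DSN name; the caller never names the RDBMS."""
    cfg = DSN_TABLE[dsn]
    return DRIVERS[cfg["driver"]](cfg)

# Application code only knows the DSN.
conn = connect("plm_parts")
conn.execute("CREATE TABLE part (number TEXT, name TEXT)")
conn.execute("INSERT INTO part VALUES ('P-100', 'Bracket')")
rows = conn.execute("SELECT number, name FROM part").fetchall()
print(rows)
```

Pointing `plm_parts` at a different driver entry would leave the application code unchanged, which is the convenience described above. Note that the application still has to know the `part` table and its columns, so the schema constraint remains.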
The distinct development and end-user models of ODBC ensured a massive ecosystem of ODBC-compliant applications and database connectivity drivers. Unfortunately, RDBMS vendors, the same ones that collectively created the SQL CLI and inspired its evolution into ODBC, also sought to undermine its inherent RDBMS agnosticism. The problem this created is a huge number of data-driven applications that rely on ODBC data access and claim data openness merely as the ability to access, retrieve and (sometimes) update data in the RDBMS. Hidden behind DSNs, databases turned into data silos. Data extracted from a specific database was dead and lost without the context of that database. So-called “openness” became a simple “data sync pipe”. What is more, each DSN remains separate. So, if you have a few databases, you are out of luck if you want to access data across them in a logical way. Applications pump data from one database to another, mostly trying to synchronize data between different databases. The amount of duplicated and triplicated data is skyrocketing.
So, what is the alternative? We need to stop “syncing data” and instead start “linking data”. Think about a simple web analogy. If you want to reference my blog article, you don’t need to copy it to your blog. In most cases you can simply create a link to my blog using its URL. Now, let’s bring some more specific technologies into this powerful analogy. Maybe you are familiar with the semantic web and linked data. If not, this is the time! Start here and here.
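The difference between syncing and linking can be shown with a toy example. Everything in it is an assumption made for illustration: the `SOURCE` dictionary plays the role of the system that owns the data, the `plm.example.com` URL is made up, and `resolve` is a stand-in for an HTTP request.

```python
# Source system: the single place where the part record lives.
# The URL and the record layout are hypothetical.
SOURCE = {
    "https://plm.example.com/parts/P-100": {"name": "Bracket", "revision": "A"},
}

def resolve(url):
    """Stand-in for an HTTP GET against the source system."""
    return SOURCE[url]

# Syncing: the consumer stores its own copy of the record.
synced_copy = dict(resolve("https://plm.example.com/parts/P-100"))

# Linking: the consumer stores only the address of the record.
link = "https://plm.example.com/parts/P-100"

# The source record is revised after the consumer captured it.
SOURCE[link]["revision"] = "B"

print(synced_copy["revision"])    # the copy still holds the old revision
print(resolve(link)["revision"])  # the link reaches the current revision
```

The copied record goes stale as soon as the source changes, while the link always resolves to the current data and keeps its context.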
There is a fundamental difference between the old ODBC world and the new way of linking data. You can get some fundamentals here and by exploring the W3C data activity. I can summarize three main principles of linking data: 1/ use of hyperlinks to the source of data; 2/ separation of data abstraction from data access APIs; 3/ conceptual data modeling instead of application-level data modeling. So, instead of implementing ODBC drivers and APIs to access data, each data provider (think about a PLM system, for the moment) will implement a linked data web abstraction layer. This abstraction layer will allow other applications to discover data and run queries to get results or interlink data with data located in other systems. Linked Data is a fast-developing ecosystem. You can learn more here.
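As a rough illustration of these principles, the sketch below models two systems that each publish facts as subject, predicate, object triples identified by URIs. It is plain Python and not an RDF or SPARQL implementation; the `PLM` and `ERP` sets, every URI and the `suppliedBy` predicate are hypothetical names chosen for the example.

```python
# Each system publishes facts as (subject, predicate, object) triples.
# All URIs and predicate names below are hypothetical.
PLM = {
    ("http://plm.example.com/part/P-100", "name", "Bracket"),
    ("http://plm.example.com/part/P-100", "suppliedBy",
     "http://erp.example.com/supplier/S-7"),
}
ERP = {
    ("http://erp.example.com/supplier/S-7", "name", "Acme Metals"),
}

def objects(graph, subject, predicate):
    """Return all objects recorded for a subject/predicate pair."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# Identifiers are global URIs, so combining two sources is a set union.
graph = PLM | ERP

# Follow the link from the part in one system to the supplier in the other.
part = "http://plm.example.com/part/P-100"
supplier = objects(graph, part, "suppliedBy")[0]
supplier_name = objects(graph, supplier, "name")[0]
print(supplier_name)
```

No record was copied between the two systems here. The part simply points at the supplier's URI, and the query crosses system boundaries by following that link. A real deployment would more likely use RDF for the data and SPARQL for the queries.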
What is my conclusion? We are coming to the point where we need to re-think the way we access data in business systems and start building a better abstraction level that will allow us to stitch data together via linkage as opposed to synchronization. The World Wide Web and the internet are the ultimate success stories for open standard adoption and implementation techniques. Applying those techniques will simplify access to data and build the value of data connections for the enterprise. Just my thoughts…
Best, Oleg