How do you envision the future of PLM data modeling? In my earlier posts, I talked about some of the trends happening in PLM architecture and technologies these days. If you missed them, here are the links:
A growing role of data in the future PLM applications
A post-monolithic world of PLM – data and system architecture
For more than a decade, I’ve been closely following different data modeling technologies and trends. As a result, I’ve seen the impact that changes in data management technologies and architecture have on the development of modern PLM platforms: the move to the cloud, the introduction of polyglot persistence, the development of graph databases, and emerging ontology development.
Strategic Move From Applications To Data
Data is fast becoming the centerpiece of business processes. For the last 15-20 years, we’ve been mostly focused on building software and applications. It is becoming obvious that the data lifecycle is much longer than the application lifecycle. Therefore, companies are starting to focus on data and data architecture first. The role of data is growing and the role of applications is decreasing. Applications will be replaced, but data will live a long life.
In my blog today, I want to talk about what technological data foundation modern PLM applications will use and what the pros and cons of the different approaches are. Legacy PLM suites use SQL database technology and relational data models. I can see a move away from SQL databases toward graph-based RDF/OWL models. Which one will be more successful in the long run? Let’s explore the key differences between these two approaches and make a case for why RDF/OWL may be the better choice for PLM data management. Keep reading to learn more!
The Difference Between SQL and RDF/OWL
What is better – an ontology or a relational data model? The SQL data model, which is usually another name for the relational data model, has been a powerful mechanism for creating the data abstractions used in PLM systems for the last 20-30 years. Let’s compare it with the RDF/OWL world. From the RDF point of view, the SQL data model is one that expresses constraints on entities built from primitive types (e.g., properties, attributes, columns). The real difference is in the way SQL and RDF treat references. In SQL models, references are defined using foreign keys (usually integers or strings), while an ontology language provides more expressive data modeling based on a more declarative data definition.
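To make the difference concrete, here is a minimal sketch (the table, namespace, and property names are made up for illustration) that models the same part-to-supplier reference twice: once as an SQL foreign key and once as RDF triples using the rdflib library.

```python
import sqlite3
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

# --- SQL: the reference is an opaque foreign key ---
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE part (
    id INTEGER PRIMARY KEY,
    part_number TEXT,
    supplier_id INTEGER REFERENCES supplier(id)  -- reference = an integer key
);
""")
db.execute("INSERT INTO supplier VALUES (1, 'Acme Machining')")
db.execute("INSERT INTO part VALUES (10, 'PN-1001', 1)")

# --- RDF: the reference is a named, typed relationship ---
PLM = Namespace("http://example.com/plm#")   # hypothetical namespace
g = Graph()
g.add((PLM.PN_1001, RDF.type, PLM.Part))
g.add((PLM.Acme, RDF.type, PLM.Supplier))
g.add((PLM.PN_1001, PLM.suppliedBy, PLM.Acme))       # explicit, queryable semantics
g.add((PLM.suppliedBy, RDFS.label, Literal("supplied by")))
```

In the SQL version, the meaning of `supplier_id` lives in application code; in the RDF version, the relationship itself carries a name that any consumer of the data can see.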
The main difference between SQL and RDF/OWL modeling comes at the so-called ORM level (object-relational mapping). All existing legacy PLM systems are, in fact, ORMs built on top of SQL data models. SQL requires business logic to be embedded in the software code. In RDF modeling, OWL lets you express semantics familiar from the object-oriented approach, such as classes and properties, so you don’t need ORM logic implemented in the PLM system. You can also express constraints between classes and entities.
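As a sketch of what “semantics in the data instead of ORM code” can look like, here is a small OWL-style class hierarchy and property declaration expressed with rdflib. The class and property names are illustrative, not taken from any real PLM schema; the point is that the subclass and domain/range statements travel with the data rather than living in application classes.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

PLM = Namespace("http://example.com/plm#")  # hypothetical namespace
g = Graph()
g.bind("plm", PLM)

# Classes: an Assembly is a kind of Part
g.add((PLM.Part, RDF.type, OWL.Class))
g.add((PLM.Assembly, RDF.type, OWL.Class))
g.add((PLM.Assembly, RDFS.subClassOf, PLM.Part))

# Property: hasComponent links an Assembly to its Parts
g.add((PLM.hasComponent, RDF.type, OWL.ObjectProperty))
g.add((PLM.hasComponent, RDFS.domain, PLM.Assembly))
g.add((PLM.hasComponent, RDFS.range, PLM.Part))

# The model can be serialized and handed to any RDF/OWL-aware tool
print(g.serialize(format="turtle"))
```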
However, formally, everything you can define using RDF/OWL can also be expressed using an SQL data model and a general programming language. It is, after all, software, and you can write code to express any logic. Going down this path, we don’t even need SQL; we can use Excel spreadsheets or proprietary file formats to store data. The format is less important if you’re going to load the data using a programming language (note: remember that the first PDM/PLM systems were built on proprietary databases before they started using SQL).
While the decision belongs to developers, I just want to mention the rule of least power, which has helped me make the right technical decisions in the past. My favorite passage is this one from Tim Berners-Lee:
Computer Science in the 1960s to 80s spent a lot of effort making languages that were as powerful as possible. Nowadays we have to appreciate the reasons for picking not the most powerful solution but the least powerful. The reason for this is that the less powerful the language, the more you can do with the data stored in that language. If you write it in a simple declarative form, anyone can write a program to analyze it in many ways. The Semantic Web is an attempt, largely, to map large quantities of existing data onto a common language so that the data can be analyzed in ways never dreamed of by its creators. If, for example, a web page with weather data has RDF describing that data, a user can retrieve it as a table, perhaps average it, plot it, deduce things from it in combination with other information. At the other end of the scale is the weather information portrayed by the cunning Java applet. While this might allow a very cool user interface, it cannot be analyzed at all. The search engine finding the page will have no idea of what the data is or what it is about. The only way to find out what a Java applet means is to set it running in front of a person.
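Here is a small illustration of the point of that quote, using product data instead of weather data: once the data is stored declaratively as RDF, anyone can query and aggregate it with SPARQL without knowing anything about the application that produced it. The part numbers and mass property below are made up for the example.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

PLM = Namespace("http://example.com/plm#")  # hypothetical namespace
g = Graph()
for pn, mass in [("PN-1001", 2.5), ("PN-1002", 0.8), ("PN-1003", 4.1)]:
    part = PLM[pn]
    g.add((part, RDF.type, PLM.Part))
    g.add((part, PLM.massKg, Literal(mass, datatype=XSD.decimal)))

# A generic SPARQL query, written with no knowledge of the producing application
result = g.query("""
    PREFIX plm: <http://example.com/plm#>
    SELECT (AVG(?m) AS ?avgMass) WHERE { ?p a plm:Part ; plm:massKg ?m . }
""")
for row in result:
    print("Average mass:", row.avgMass)
```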
To sum up, RDF/OWL gives you more expressive data modeling power and reduces the risk of future incompatibility of your data model with new applications.
New Data Management, Polyglot Persistence, and RDF/OWL
PLM systems demand complex data modeling technologies to express the definition and lifecycle of products. The demand for complexity is high, and product development and manufacturing are just the beginning. The expansion of PLM systems to support an entire lifecycle brings even more challenges than the first PDM/PLM systems experienced 20-25 years ago. Although PLM aspirations are high, the data management systems used by most PLM platforms (e.g., Aras, Dassault Systemes 3DX, PTC Windchill, Oracle Agile, Siemens TeamCenter, etc.) are limited to SQL database technology and relational data models.
At the same time, the demand for globally connected systems brings more and more data management needs that can be solved using modern data management technologies combined with web/cloud architecture and microservices. Back in my Data Management for PLM in the 21st century article, I shared an insight into what database technologies are now available and how they can be used together for advanced PLM applications. Using the right database technology for the right task is the approach of the future, and it is already being implemented in modern cloud-based platforms (e.g., Autodesk Forge, Onshape, OpenBOM, and some others).
Check out some additional examples here – How to use multiple database implementations to scale online web services. Here is one important passage explaining polyglot persistence.
…the database architecture approach related to the use of multiple databases called polyglot persistence. The core idea is quite simple. To understand it, you need to think back in time when every software vendor had a dilemma about what programming language to use. Advanced components and web tech made this question irrelevant. We use multiple languages for web, servers, and other applications, which is polyglot programming now. The same is happening now with data – service-based architecture makes it very easy and efficient to use multiple databases to get advantages of technologies to optimize and simplify data management layers.
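Here is a minimal sketch of what polyglot persistence can look like at the service level. The stores below are simple in-memory stand-ins (a dict for a document store, an adjacency map for a graph store); in a real system they would be replaced by actual database services, and the class and method names are purely illustrative.

```python
from collections import defaultdict

class DocumentStore:
    """Stand-in for a document database (item attributes, file metadata)."""
    def __init__(self):
        self._docs = {}
    def put(self, key, doc):
        self._docs[key] = doc
    def get(self, key):
        return self._docs.get(key)

class GraphStore:
    """Stand-in for a graph database (BOM / where-used relationships)."""
    def __init__(self):
        self._edges = defaultdict(set)
    def link(self, parent, child):
        self._edges[parent].add(child)
    def children(self, parent):
        return sorted(self._edges[parent])

class ProductDataService:
    """Facade that routes each kind of data to the store best suited for it."""
    def __init__(self):
        self.docs = DocumentStore()
        self.graph = GraphStore()
    def create_item(self, part_number, attributes):
        self.docs.put(part_number, attributes)   # attributes go to the document store
    def add_bom_line(self, parent, child):
        self.graph.link(parent, child)           # structure goes to the graph store

svc = ProductDataService()
svc.create_item("PN-1001", {"description": "Bracket", "material": "Aluminum"})
svc.create_item("PN-2000", {"description": "Assembly"})
svc.add_bom_line("PN-2000", "PN-1001")
print(svc.graph.children("PN-2000"), svc.docs.get("PN-1001"))
```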
Modern systems are now available as online services; examples include Autodesk Forge Data Services and OpenBOM Services. The next step will be injecting semantic data modeling using RDF and OWL into PLM applications provided as online services on top of multiple databases (a polyglot persistence foundation). A combination of online PLM web services can be used to plug a complex data model into a more advanced data management solution. The approach was described in a Mercedes-Benz Cars paper and became a foundation for a semantic federation layer for a digital thread.
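To show what a thin semantic federation layer might look like, here is a hypothetical sketch that maps payloads from two imaginary services onto one RDF graph, so a digital thread query can span both sources. The payloads are inlined as dicts so the sketch stays runnable; in practice they would come from REST calls, and the field names, namespace, and mapping are assumptions, not any vendor’s real API.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

PLM = Namespace("http://example.com/plm#")  # hypothetical shared vocabulary

# Imagined responses from two different online services
design_service = {"items": [{"id": "PN-1001", "rev": "B"}]}
erp_service = {"parts": [{"part": "PN-1001", "cost": 12.40}]}

def federate(design, erp):
    """Map both payloads onto one RDF graph keyed by part number."""
    g = Graph()
    for item in design["items"]:
        node = PLM[item["id"]]
        g.add((node, RDF.type, PLM.Part))
        g.add((node, PLM.revision, Literal(item["rev"])))
    for row in erp["parts"]:
        g.add((PLM[row["part"]], PLM.unitCost, Literal(row["cost"])))
    return g

thread = federate(design_service, erp_service)
for s, p, o in thread.triples((PLM["PN-1001"], None, None)):
    print(p, o)   # one node now carries data from both services
```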
What is my conclusion?
PLM systems are transforming. In the next 5-10 years, PLM will move from enterprise applications built on SQL database technology and relational data models to online connected services that provide a foundation for a modern digital thread between multiple engineering and manufacturing entities, connecting customers and supporting an entire product lifecycle. An SQL data model was enough to manage CAD files and their lifecycle. The demands on modern PLM systems are much higher, and there is a need for solutions that focus on resilient data management for these applications. This is why I can see increased interest in the semantic web, RDF, and OWL as technologies for future PLM data models. It will take some time for these systems to be implemented and to mature, but the general direction toward more expressive data management makes sense to me. If you’re a PLM architect looking at the future of data architecture for industrial companies, getting up to speed with RDF/OWL is an important step for your 2022/2023 goals. For PLM vendors, retooling will be hard, as most legacy PLM systems are single-tenant and built on SQL data models. But even so, switching to online services, supporting microservices and polyglot persistence, and considering an RDF/OWL model to support a digital thread can be a good starting point. Just my thoughts…
Best, Oleg
Disclaimer: I’m co-founder and CEO of OpenBOM developing a digital cloud-native PDM & PLM platform that manages product data and connects manufacturers, construction companies, and their supply chain networks. My opinion can be unintentionally biased.