PLM: How to enable long term retention of your product data?

Do you remember what a 5-inch floppy disk looks like? And how to use one? It doesn’t seem so long ago that these floppies were in wide use. We have clearly moved forward, and there is a lot of interest in how to enable fast ROI for PLM, quick implementations, online updates, etc. But at the same time, in my opinion, there are problems related to long processes. The problem is how to keep product data for a long period of time. I think this problem will become more urgent over the next 2-3 years, for two main reasons: (1) the data around us is growing very fast; (2) most of today’s PLM implementations (even those 3-5 years old) still sit on corporate “spinning disks”. As storage becomes cheaper, the problem keeps getting pushed down the queue, since you can always expand your operational storage and postpone the decision a few more years.

Most of the discussions related to long term data retention have been about CAD formats. This is indeed an important topic, but I think the problem is not limited to data formats and is much wider in scope. I have separated the issue into four topics and will analyze the options I see for solving each of them.

1. Physical Storage

This is probably not a very “PLM-related” problem: how do you physically keep data? Most of the examples I’ve seen use CDs/DVDs to store data. I’m not sure this is a good idea. The main problem is that by copying data onto a DVD, we are taking it offline. This is really bad: the biggest value of data is the ability to use it, and once your data sits on a DVD, it effectively dies. The development of cloud data centers shows big promise for solving this problem. I think that companies using cloud data centers will benefit from long data retention and thereby resolve the physical storage problem.

2. Data Models

This one is tricky. Product data is quite complex and not sequential. If you back up your mail, the only model you need is a calendar: you can retrieve any message using sender/year/month/date. That is all you need to know to get your mail back. This is not true for product data. Product data has many dependencies that are very complex and span timelines across multiple products. So this is a big problem, in my opinion. Even today I cannot say that the data model is stable for most PDM/PLM implementations, and it becomes even less stable over a long period of time. The most advanced development in this field is based on the STEP format, but I still consider this issue very challenging.
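To make the contrast concrete, here is a minimal sketch (in Python, with hypothetical names and part numbers) of why product data resists a simple calendar model: each item is a node in a versioned graph, and retrieval means traversing links rather than looking up a date.

```python
from dataclasses import dataclass, field

@dataclass
class Email:
    # Mail archiving: a single time axis is enough to retrieve anything.
    sender: str
    date: str  # "YYYY-MM-DD"

@dataclass
class Item:
    # Product data: each item carries typed, versioned links to other items.
    number: str
    revision: str
    uses: list = field(default_factory=list)  # BOM links to child Items

def where_used(items, target_number):
    """Reverse traversal: which items depend on a given part?
    Retrieval requires walking the dependency graph, not a calendar lookup."""
    return [i for i in items if any(u.number == target_number for u in i.uses)]

bolt = Item("P-100", "A")
bracket = Item("P-200", "B", uses=[bolt])
assembly = Item("P-300", "A", uses=[bracket, bolt])
print([i.number for i in where_used([bolt, bracket, assembly], "P-100")])
# ['P-200', 'P-300']
```

A long-term archive has to preserve not only the items but this link structure, including how it changed across revisions.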

3. Data Formats

Most of today’s discussions about long data retention are about formats, particularly CAD formats. When we discuss 3D, one of the first questions is how to store our 3D (and non-3D) graphical models for the long term, because the lifecycle of CAD packages is too short. I think most of today’s implementations are focused on 2D and print storage. There is no silver bullet today, in my view. 3D PDF is promising. Proprietary formats (even those positioned as industry standards, such as JT) can solve the problem only partially. Combining a good data model with a good data format may work. But more promising is granular data storage, which would let us keep the underlying model behind the 3D and 2D data.
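One pragmatic pattern (a sketch only, not a standard; file names and labels are hypothetical) is to archive each model in several representations at once and record what each file is, with a checksum, so future readers know which format to try first and can detect bit rot:

```python
import hashlib
import json

def manifest_entry(path, data: bytes, fmt, role):
    """Describe one archived representation of a model:
    its format, its role, and a fixity checksum."""
    return {
        "file": path,
        "format": fmt,     # e.g. "native-CAD", "STEP", "3D PDF"
        "role": role,      # "authoring" vs "neutral" vs "visual"
        "sha256": hashlib.sha256(data).hexdigest(),
    }

# Hypothetical bytes stand in for real exported files.
entries = [
    manifest_entry("bracket.prt", b"native bytes", "native-CAD", "authoring"),
    manifest_entry("bracket.stp", b"step bytes", "STEP", "neutral"),
    manifest_entry("bracket.pdf", b"pdf bytes", "3D PDF", "visual"),
]
print(json.dumps(entries, indent=2))
```

The point is not the JSON itself but the redundancy: if the native format becomes unreadable, the neutral and visual copies still preserve most of the intent.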

4. Logical Access

This is a complex term for the simple word “search”. I don’t want to oversimplify, but maybe I’m wrong. At the end of the day, I may need to find data I saved 30 years ago and examine or reuse it for a particular reason. You can call this simply “searching”, but we need to bring new ideas to this issue: how to logically retrieve relevant data in the context of a specific problem/issue/task.
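A toy sketch of the idea (all records and attribute names are hypothetical): instead of retrieving by file path, an archive keeps a metadata index and answers attribute queries in the context of a task.

```python
# A minimal metadata index: retrieval by attributes rather than file paths.
records = [
    {"id": "D-001", "year": 1995, "product": "pump-A", "type": "drawing"},
    {"id": "D-002", "year": 1995, "product": "pump-A", "type": "analysis"},
    {"id": "D-003", "year": 2004, "product": "pump-B", "type": "drawing"},
]

def find(index, **criteria):
    """Return records matching every given attribute."""
    return [r for r in index if all(r.get(k) == v for k, v in criteria.items())]

# "Find the drawing for pump-A", regardless of where or when it was stored:
print([r["id"] for r in find(records, product="pump-A", type="drawing")])
# ['D-001']
```

Real logical access would layer context (problem, task, product line) on top of such an index, but even this sketch shows why the metadata must survive as long as the data itself.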

And one more thing… There is a certain promise in using environment virtualization for long term data retention. You can keep your data and your applications together with your computer systems, and use a virtual environment to bring your old systems back to life. I don’t know why, but this option always reminds me of sci-fi films where humans are frozen and un-frozen after 100-200 years… Imagine starting MS-DOS today to read your WordPerfect document.

And this topic would probably not be complete if I didn’t mention the aging workforce. Not only do we need to keep product data over a long period of time, we also need to keep it in a form that lets tomorrow’s users “use data” rather than “decode” data; there is a big difference.

So, to sum up: I don’t think we have a fundamental solution for this problem. I’ve heard about a few programs, such as LOTAR, development done by PDES, Inc., the Siemens announcement of JT approval for a long term retention program, and more (sorry if I left something out).

I’d be glad to discuss: if you have experience in this field, what is your opinion on how to solve this fundamental issue?


  • Oleg,
    a. Long term retention is primarily about satisfying legal requirements of product liability, etc., and in this respect it is usually drawings and documents in some widely accepted neutral derived format such as TIFF or PDF.
    b. I worked over 15 years ago in healthcare on archiving of digital images (MRI etc.). My conclusion then (and still now) is to keep everything online. That means completely replacing your online storage medium every 2-4 years and just copying everything. So far Moore’s law has not let us down, and it does not look like it is going to fail in the near future. Using hierarchical storage systems or offline media is a waste of money, extremely risky (in terms of the ability to read it in 20 years’ time) and difficult to migrate (which you will always have to do, since no system or media lasts forever!).
    c. No large company I know would put their sensitive product data in the cloud.
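Martin’s point (b), wholesale replacement of online storage every few years, can be sketched as simple arithmetic: if capacity per dollar keeps improving, the media cost of each full-copy migration can shrink even as the archive grows (all numbers below are hypothetical, and the halving assumption is a rough Moore’s-law-style sketch, not a forecast).

```python
def media_cost(archive_tb, cost_per_tb_now, growth_per_cycle, cycles):
    """Estimate the media cost of each full-copy migration.
    Assumes the archive grows by `growth_per_cycle` each refresh cycle
    and that $/TB halves each cycle (a rough assumption for illustration)."""
    costs = []
    size, price = archive_tb, cost_per_tb_now
    for _ in range(cycles):
        costs.append(size * price)
        size *= growth_per_cycle
        price /= 2
    return costs

# 10 TB archive, $100/TB today, data grows 1.5x per refresh cycle:
print([round(c) for c in media_cost(10, 100, 1.5, 5)])
# [1000, 750, 562, 422, 316]
```

As long as storage prices fall faster than the archive grows, each migration is cheaper than the last, which is the economic core of the “just copy everything” strategy.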

  • Martin, thanks for your thoughts. My comments are:
    a. I think that, depending on the type of manufacturing or business, it can be not only for legal purposes. Typical examples are long-lifecycle products, such as planes, buildings, etc.
    b. This confirms my proposition that the cost of online storage will drop in proportion to growing data demands.
    c. What will happen if this turns out to be the only solution that satisfies (b)?

  • Hello Oleg,

    My experience includes nearly 13 years at Autodesk, half as a technical account manager, and recently as the product manager for a SaaS provider operating, in part, a large web-based data repository. In both cases, I spent a very large part of my time dealing with data translation, migration, storage, and retrieval. The industries served included automotive, health, finance, and more.

    To more fully flesh out your assessment, I might suggest you consider ‘competition’ as a primary source of conflict preventing a clean answer.

    The core problem is less how to define a 3D data model (e.g. a ‘document’). The various CAD manufacturers, as an example industry, did this years ago. They continue to evolve their formats to suit their own needs. An AutoCAD R14 file will be accessible for some time to come, for instance. Each of these formats is developed in what the manufacturer believes is an optimum form to efficiently deliver its value-add differentiators (features) to its customers. Hence, all the different formats. They don’t create conflicting formats because it’s an area in which they make big money. Yes, there’s the ‘locked in’ argument; but that’s a byproduct of how the environment evolved.

    It is not in competitors’ best interests to openly define how their 3D data is structured. As a result, so-called neutral standards like STEP, IGES, VDA, and others will continue to suffer in their suitability for the task.

    Operating a web-based repository, we held 2 Billion pages of content, the vast majority of which were various forms of text. The next most popular formats were TIFF and PDF. The stored documents were generated in various applications. Many of these systems were themselves capable of generating complex packages of information, requiring overlays, cross-references, and other concepts familiar to operators of CAD systems.

    The difference here is that the preferred long-term formats were chosen for stability, adequacy to preserve intent, and (perhaps unspoken) the relative lack of competitiveness between formats. A TIFF is a TIFF; a PDF a PDF; and the like. I do simplify the issue somewhat, but it is not by much.

    Here’s the striking difference in these two worlds: When working with CAD, the dominant discussion is on competition-driven format differences, less so on life span beyond 5-10 years. In the business realm, there is near-zero discussion of formats, but an intense focus on ability to preserve for 3, 5, 7, to 30+ years, and how to retrieve documents.

    An aside. The CAD PLM space worries about how to find things downstream. Interesting that many major businesses (e.g. home supply store) can manage to find almost anything with perhaps 6-10 database indices. In their case, we’re discussing 100s of thousands of assemblies (multi-part documents) monthly. Is the tracking of a widget in CAD being made unnecessarily complex? Might ‘simpler’ be better?


  • Hello JT, Thank you for your thoughts and observations. I agree with most of them, starting from how you see 3D models optimized by CAD vendors, and, most importantly, ending with the request “to think simple”. I believe the overall story of data retention is a balance between complex and simple. The simpler you make it now, the more likely it will still be available later; complex solutions have little chance to survive the timeline. That is why formats like TIFF and PDF succeeded, as you mentioned. At the same time, I see a definite trend toward keeping data available online for longer. This long online life of data will allow us to find ways to optimize storage and access procedures, recovering something that before just got lost in scanned and paper materials. Cheers. Oleg.

  • Hello Oleg,

    Focusing on your last thought: keeping data alive online for an extended period is also an item I dealt with frequently as the operator of a repository.

    Storage costs will continue to fall. Many of us already have personal systems with 1+ TB at home. In the business realm, you get what you pay for. Options include backup, disaster recovery, and business continuity. The more you demand, the greater the cost/KB to store.

    It has not been uncommon for a customer to ask for quotes to ‘store everything’ for ’10 years.’ The price quoted often induces a new sense of reality.

    At the consumer level, there are arguments that the cost of storage has already reached $0. Consider the storage you can get at Google: gigabytes for no cash outlay. That, of course, is actually a cost subsidized by the consumer having to tolerate advertisements.

    In B2B and intra-business environments, you do not have adverts available to offset real-world costs. As a result, keeping data alive online long-term will continue to be very much a budget-driven reality. We already have the ability to store almost everything we want, but who wants to hire another sysadmin next week to manage that growth?

    In the end, this is a simple equation: content value vs. volume and cost. As long as content value exceeds the resources demanded to keep it (i.e. positive ROI), it will stay online. If the content’s value approaches, and then falls below, the required resource cost, it will go offline.
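JT’s closing equation can be written down directly as a rule of thumb (all numbers below are hypothetical, purely to illustrate the trade-off):

```python
def keep_online(value_per_year, size_tb, cost_per_tb_year):
    """JT's rule of thumb: content stays online while its yearly value
    meets or exceeds its yearly carrying cost."""
    return value_per_year >= size_tb * cost_per_tb_year

# High-value engineering data vs. low-value legacy scans, same storage price:
print(keep_online(value_per_year=5000, size_tb=2, cost_per_tb_year=1200))  # True
print(keep_online(value_per_year=500, size_tb=2, cost_per_tb_year=1200))   # False
```

The hard part in practice, of course, is estimating `value_per_year` for 30-year-old product data, which is exactly where the retention debate lives.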


  • JT,

    I got your point. I have two observations with regard to the falling cost of storage. First, I’m not sure this trend is unlimited; in the end, there are energy limits and communication limits. I can have 5 TB of storage in my house, but I will be in real trouble transmitting that data. Second, even if that is solvable, there is the problem of data access at the logical level. We produce too much data, and our ability to sort, compare, search, and consume it is not growing in the same proportion. This problem (searching for the right data) will only become more urgent as the amount of data we have today and will have tomorrow keeps growing.
    Does it make sense?

  • Do you know of standards wrt. long-term archiving of product data? I blogged what I could find on this topic.

  • Samuel, I don’t think there are specific standards. The most advanced work I know of is the LOTAR project. I see a big tendency to keep data alive (rather than archive it). To do so, systems need to be modified to keep track of data over a long period of time. Storage won’t be a problem; it will only get cheaper… Oleg
