
A growing need for “Long Data”

Century-long optical data memory with an unchanged baseline, enabled by nanoplasmonic hybrid glass composites, in contrast with current long-term storage, whose baseline varies over time. The gray scale in the disks indicates the change of the baseline over time. Inset: schematic drawing of the nanoplasmonic hybrid glass composites. Credit: RMIT

We are quite familiar with Big Data, and we are also quite familiar with Huge Data. The latter is the data being generated and used every single moment by all of us (posting that picture on Instagram, watching that clip on YouTube…), with the former being the exploitation of that huge and growing data set, which seems to have no upper bound: according to the IDC study Data Age 2025, published in 2017, we can expect a tenfold growth in data generated from 2016 (16.1 zettabytes, ZB) to 2025 (163 ZB).

Interestingly, the IDC study points out that we can expect a fifty-fold growth in data analysed, reaching 5.2 ZB in 2025 (just 3% of the data created, and much less than 1% of the data available), and a hundred-fold growth in data analysed by cognitive systems, reaching 1.4 ZB in 2025 (less than 1% of the data created and a tiny fraction of the data available).
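A quick back-of-envelope check of the IDC figures quoted above (all numbers are taken from the study as cited; the growth rate is simply implied by the endpoints):

```python
# Sanity-check the IDC "Data Age 2025" figures quoted above.
created_2016, created_2025 = 16.1, 163.0   # ZB of data created per year
analysed_2025 = 5.2                        # ZB analysed in 2025 (~50x growth)
cognitive_2025 = 1.4                       # ZB fed to cognitive systems (~100x growth)

years = 2025 - 2016
cagr = (created_2025 / created_2016) ** (1 / years) - 1
print(f"Implied annual growth of data created: {cagr:.1%}")   # roughly 29% per year

print(f"Share of created data analysed in 2025: {analysed_2025 / created_2025:.1%}")
print(f"Share analysed by cognitive systems:    {cognitive_2025 / created_2025:.2%}")
```

Running the numbers confirms the article's percentages: about 3% of the data created will be analysed, and under 1% will reach cognitive systems.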

These figures point to the vast under-utilisation of data that we will still be facing in the next decade: although data analysis capacity (and meaning-extraction capacity) is growing faster than the data generation rate, we will have to wait several decades to take full advantage of all the data.

Actually, there are two major stumbling blocks in the way (from a purely technological standpoint; there are several others that are even more difficult to address from ethical, social and economic standpoints: think privacy, ownership, accountability…): storage capacity (and its accessibility) and processing capacity. Both are growing, but at a slower pace than data creation. The reason? Data are created at the edges, by a multitude of devices (smartphones, sensors…), but those that are kept are stored and processed centrally (in clouds…).

A completely different paradigm, moving storage and processing to the edges by leveraging 6G networks (no, that is not a typo: we need to go beyond 5G for this!), would open the door to a completely new data fabric, much more similar to the one we are starting to see in our brain, where the trick is massive filtering and the emergence of perception, meaning and feelings. This is being discussed in the IEEE FDC Symbiotic Autonomous Systems Initiative.

This change in architecture, however, will not be enough: it will support Big Data, but it will not support “Long Data”!

So, what are these “Long Data”? Usually we are interested in the here and now. The past is basically encoded in our “selves” (our brain) and provides the context that actually drives our processing: experiences (the past) change the structure of our brain, which in turn changes the way it processes today’s data.
Over the last two decades, and mostly in the last one, researchers have developed algorithms to take advantage of historical data, i.e. to have machines learn from experience. There is, and will be, a tremendous accumulation of data, and that will provide the springboard for ever more effective machine learning.
The problem is storing these data and keeping them accessible. Current magnetic media have an average lifetime of just a few years (basically, every two years you need to refresh the data). Given that the amount of data created doubles every three years, this is becoming an impossible task.
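To see why this becomes impossible, consider a toy model using the figures above: the stored archive doubles every three years, while magnetic media force the whole archive to be rewritten every two years. The starting value of 16.1 ZB and both periods come from the text; the model itself is an illustrative simplification:

```python
# Back-of-envelope: archive doubles every 3 years; magnetic media must be
# refreshed (fully rewritten) every 2 years. Illustrative model only.
doubling_years = 3
refresh_years = 2

def stock(year, start=16.1):
    """ZB stored `year` years after the start, doubling every 3 years."""
    return start * 2 ** (year / doubling_years)

year = 9  # e.g. 2016 -> 2025
created_per_year = stock(year) - stock(year - 1)
refreshed_per_year = stock(year) / refresh_years  # whole archive rewritten every 2 years

print(f"Stored:    {stock(year):6.1f} ZB")
print(f"Created:   {created_per_year:6.1f} ZB per year")
print(f"Refreshed: {refreshed_per_year:6.1f} ZB per year")
```

In this model the yearly refresh traffic already exceeds the yearly creation of new data, and the gap keeps widening: most of the storage effort goes into copying old data rather than saving new data.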

This is where the result of research carried out at RMIT University in Australia and the Wuhan Institute of Technology in China comes to help.

Researchers have created an optical disk that can store 10 TB of information (that’s a lot, but if you need to store 1 ZB you will need 100 million of those disks!) and, most importantly, that has a lifetime of 600 years. This means you can store data once and keep using them for centuries before having to refresh them. You really get to use long data.
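The disk-count and lifetime arithmetic, using decimal units (1 ZB = 10⁹ TB):

```python
# How many 10 TB optical disks does a zettabyte take, and how many
# magnetic refresh cycles does a 600-year disk lifetime save?
ZB_IN_TB = 1_000_000_000        # decimal units: 1 ZB = 1e9 TB
disk_tb = 10                    # capacity of one optical disk
disks_per_zb = ZB_IN_TB // disk_tb

lifetime_optical = 600          # years, per the RMIT/Wuhan result
refresh_magnetic = 2            # years between magnetic refreshes
cycles_avoided = lifetime_optical // refresh_magnetic

print(f"{disks_per_zb:,} disks per ZB")          # 100,000,000 disks per ZB
print(f"{cycles_avoided} rewrite cycles avoided")  # 300 rewrite cycles avoided
```

So a zettabyte takes one hundred million such disks, but each of them spares you the roughly 300 rewrite cycles that magnetic media would require over the same 600 years.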

As machine learning progresses and the capability of inference from huge data sets improves, having access to these long data will become more and more important.

Of course, this solves only the physical issue of storage. The compatibility of data (operating systems, data formats, taxonomies, …) remains, and it is not trivial. Even if you had a vintage floppy disk drive in some drawer, along with a floppy disk from the ’80s, it would be basically impossible to access the data. Not that it matters much, since those magnetised data would have long since faded away… but you get the point.

