Harvesting Data Value
Saying "data are valuable" is not the whole truth. It is sufficient to look around at how many data are being used every single day for free (getting and using are almost synonyms in the world of data, which is not the case in the world of atoms!).
Even when one pays for data, the value usually lies in something else, such as the convenience of accessing those data or the knowledge that can be derived from their use. A song may be priced at 99c, but people are not paying that money for the song itself, rather for the convenience of getting it in just one click. That same song could be obtained for free by ripping a YouTube clip of it (and yes, there is going to be a YouTube clip of that song). By paying 99c one saves time, and time is a scarce, valuable resource.
Hence the point is: how can one harvest data value?
There are basically four layers of increasing data value:
- Factual: this is the layer of data acquisition. Through sensors one can transform a physical characteristic into data that mirror it. As an example, data can be harvested by a digital camera at an intersection, and a software application can detect a traffic jam. This is a fact. It can be just a number (like 0: traffic flow is regular, 1: traffic is slow/blocked) or it can be a set of numbers (the number of cars, the time spent crossing the intersection, the types of vehicles: cars, trucks, …). This factual information can be made available to everybody or, for higher value, to those in the vicinity who seem to be approaching that intersection.
- Analytic: this is the layer where factual data are analysed and compared with other data to find out what may be going on. Following the example, a software application can compare the factual situation at the intersection with the one at the same time on previous days. It might thus discover that the slowdown recurs at that particular time of day, or that it is something unusual. In the latter case it can look at other data, like the presence of an event (a soccer match that just ended, which would justify the increased traffic volume) or the warning of an accident. Again, the analyses can be made available (as metadata) to everybody, or to those who are likely to be affected, as information on an abnormal situation.
- Predictive: a further software application, using both factual and analytic data, can work out a prediction of what is likely to happen if nothing is done. Is the traffic jam going to get worse (many more vehicles are converging on that point and it will take time before the accident is cleared)? Is it going to fade away in some 15 minutes? Again, this information can be made publicly available, or it can be provided to those who may find it useful (higher value).
- Prescriptive: once it is known what to expect if nothing is done, prescriptive analytics applications can be used to evaluate different evolutions if the context is changed, i.e. if the world of atoms is affected by some action. Closing the example: knowledge of the probable destinations of the incoming traffic that, if nothing is done, will converge on the intersection can be used to direct each incoming vehicle onto an alternate route skipping the overloaded intersection, thus avoiding a worsening of the situation and saving time for travellers. This knowledge can be derived from mapping the travel of individual smartphones (as they move from one cell to the next) over weeks and months. Still keeping the owner's information private, a telecom Operator can send specific messages to individual phones saying something like: "if you happen to be going to A, it is better to choose this route, since intersection XY is currently blocked". This clearly provides the highest value to the receiver.
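The four layers can be sketched as a tiny pipeline over the traffic example. All the names, readings and thresholds below are illustrative assumptions, not data from any real system; the point is only how each layer builds on the one before it.

```python
from statistics import mean

# Hypothetical readings: vehicles counted per minute at the intersection.
historical = [12, 11, 13, 12, 14]   # same time slot on previous days
current = 31                        # today's reading

# Factual layer: encode the raw observation as a single number
# (0 = traffic flow is regular, 1 = traffic is slow/blocked).
factual = 1 if current > 20 else 0

# Analytic layer: compare today's count with the historical baseline
# to decide whether the slowdown is recurring or unusual.
baseline = mean(historical)
anomaly = current > 1.5 * baseline

# Predictive layer: a naive projection of the queue 15 minutes from now
# if nothing is done (inbound rate exceeding the clearing rate).
inbound_rate = 25    # vehicles/min converging on the intersection
clearing_rate = 10   # vehicles/min the intersection can discharge
projected_queue = max(0, (inbound_rate - clearing_rate) * 15)

# Prescriptive layer: recommend an action only when the jam is both
# abnormal and projected to grow.
if anomaly and projected_queue > 0:
    advice = "reroute inbound vehicles via alternate intersections"
else:
    advice = "no action needed"

print(factual, anomaly, projected_queue, advice)
```

Each step consumes the output of the previous one, which is why the value increases layer by layer: the factual bit alone says little, while the prescriptive advice is directly actionable.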
Notice that these four layers of increasing data value fit any situation. The Digital Transformation creates factual data by mirroring atoms (i.e. the physical world with its ongoing activities), and these data can be stored as historical records and, once correlated with other data streams, can create valuable information. Also notice that conceptually all data are available at the same time in the same place, that is, in the cyberspace. In practice it is possible to partition data, allocate them to different places and interact with them through APIs (Application Programming Interfaces). This leads to an encapsulation of data that preserves ownership and privacy. The disclosure through APIs can be regulated and monitored for value accrual, aggregation and sharing, all the while enforcing the desired level of privacy.

In the previous example, it is irrelevant to know which cars are blocked in the traffic jam; what matters is the traffic jam and its dimension, from which information on its evolution can be derived. Likewise, it is not important to know whose smartphone is travelling towards a certain point, only that someone intends to go there. At the same time, the owner of the data (like the person blocked in the traffic jam) may wish to share this information with her team to let them know that she will be late for the meeting. The API invoked by the meeting organiser will have access to the identity of that person, whilst the API invoked by the city traffic supervision will not.
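The idea of one data record exposed through different API views, each enforcing its own privacy level, can be illustrated with a minimal sketch. The record fields, function names and email address below are all hypothetical, chosen only to mirror the traffic example above.

```python
# One data record, owned by the person stuck in the traffic jam.
record = {
    "owner": "alice@example.com",          # identity, private by default
    "location": "intersection XY",
    "status": "blocked in traffic jam",
}

def city_traffic_view(rec):
    # View for the city traffic supervision: the anonymous fact only,
    # with the owner's identity stripped out.
    return {"location": rec["location"], "status": rec["status"]}

def meeting_organiser_view(rec):
    # View for the owner's own team, with whom she has chosen to share
    # her identity: the full record is returned.
    return dict(rec)

print(city_traffic_view(record))
print(meeting_organiser_view(record))
```

The same underlying data thus serve both callers, and the privacy policy lives in the API layer rather than in the data themselves, which is the encapsulation the text describes.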