The industrial data revolution: what founders got wrong

The Economist published a special report titled “Data, Data Everywhere” in February 2010. Looking back, we had no idea how simple the data landscape was then, at least compared with the realities we face as we head into 2022. In that Economist article, I described society as undergoing an “Industrial Revolution of Data,” which began with the hype around Big Data and has continued into our present era of data-driven AI.

Many in the industry expected this revolution to bring uniformity: more signal and less noise. Instead, there is more noise, but also a stronger signal. In other words, we are dealing with more data, harder problems, and greater potential commercial consequences.

We have also witnessed significant advances in artificial intelligence. What does all this mean for today’s digital world? Let us start by looking at where we were then. When the Economist story ran, I was on leave from UC Berkeley, running an Intel Research lab in partnership with the university. Back then, we were focused entirely on what is now known as the Internet of Things (IoT).

We were talking about networks of small, interconnected sensors embedded in everything: buildings, the natural environment, even the paint on the walls. We were exploring ideas and building devices and systems in the hope of measuring the physical world and capturing its reality as data.

We could see that future coming. At the time, however, most public interest in data centered on the rise of the internet and search engines. Everyone was talking about easy access to massive amounts of digital data in the form of “documents”: human-generated content designed for human consumption.

Over the horizon, we could see an even larger tsunami of machine-generated data. Part of what I meant by the “industrialization of data” was that, because machines would be churning out data, the volume would skyrocket. That is exactly what happened. The second feature of the “Industrial Revolution of Data” that I anticipated was standardization. Put simply, if machines are producing data, they will produce it the same way every time, so analyzing and integrating data from different sources should become much easier.

Standardization had a precedent in the classical Industrial Revolution, when all parties had an incentive to standardize on shared resources like transportation and shipping, as well as on product specifications. It seemed reasonable to expect the same of the new Industrial Revolution of Data, with economic and other pressures driving data standardization. That did not happen. In fact, the exact opposite occurred. We saw a huge increase in “data exhaust” (the by-products of exponentially growing computation, largely in the form of log files) but only a modest increase in standardized data.

As a result, rather than consistent, machine-oriented data, we now have dramatic growth in the variety of data and data types, along with a decline in data governance. In addition to data exhaust and machine-generated data, we began to see adversarial uses of data. This happened because the people who worked with data had strong incentives to exploit it.

Take, for example, social media data and the current debates about “fake news.” The early twenty-first century has been a massive experiment in what makes digital content go viral, not just for individuals, but also for companies and political interests seeking to reach a large audience.

Machines now generate much of that content, but it is created for human consumption and tuned to human behavioral patterns. This contrasts with the naive “by people, for people” web of the past. In short, today’s data production is extremely high in volume, but it is not geared toward standard data representations, at least not in the way I predicted over a decade ago.