According to some estimates, 463 exabytes of data will be generated per day by 2025. (For perspective, one exabyte of storage can hold 50,000 years of DVD-quality video.)
Translating physical and digital actions into data is now easier than ever, and businesses of all types have collected as much data as possible to gain a competitive edge.
However, in our collective fixation on data (and on obtaining more of it), what is often overlooked is the role that storytelling plays in deriving real value from data. The reality is that data by itself is insufficient to influence human behavior. Whether the goal is to improve a business's bottom line or to persuade people to stay home in the midst of an epidemic, it is the story that compels action, rather than the numbers alone.
As more data is collected and analyzed, communication and storytelling will become integral to the field of data science because of their role in separating the signal from the noise. Yet this can be an area where data scientists struggle. Of the more than 2,300 data scientists in the Anaconda 2020 State of Data Science survey, nearly a quarter of respondents said that their data science or machine learning (ML) teams lacked communication skills.
This may be one reason why about 40% of the respondents said that they were able to effectively demonstrate business impact “only occasionally” or “almost never”. The best data practitioners must be as good at storytelling as they are at coding and deploying models, and yes, that extends beyond creating visualizations to accompany reports. Here are some suggestions on how data scientists can place their results in a larger context.
Ever-growing datasets help machine learning models better understand the scope of a problem, but more data does not necessarily contribute to human comprehension. Even for the most left-brained of thinkers, it is not in our nature to understand things like large abstract numbers or marginal improvements in accuracy.
That is why it is important to include reference points in your storytelling that make the information tangible. Throughout the epidemic, for example, we have been bombarded with countless statistics on case counts, mortality rates, positivity rates, and more. While all of this data is important, interactive maps and conversations around reproduction numbers are more effective than a huge data dump at providing context, signaling risk, and, consequently, helping to change behaviors as needed.
When working with numbers, data practitioners have a responsibility to provide the necessary structure so that the data can be understood by the audience.