Although the concept of big data itself is relatively new, the origins of large data sets go back to the 1960s and ’70s, when the world of data was just getting started with the first data centers and the development of the relational database. In 1985, Bill Inmon first coined the term “data warehouse,” defining it as a “subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management’s decisions.”
Around 2005, people began to realize just how much data users were generating through Facebook, YouTube, and other social networks. Hadoop, an open-source framework created specifically to store and analyze big data sets, was developed that same year, and NoSQL databases also began to gain popularity around this time.
The development of open-source frameworks such as Hadoop (and, more recently, Spark) was essential for the growth of big data, because they made big data easier to work with and cheaper to store. Since then, the volume of big data has skyrocketed. Users are still generating huge amounts of data, but humans are no longer the only source.
With the advent of the Internet of Things (IoT), more objects and devices are connected to the internet, gathering data on customer usage patterns and product performance. The emergence of machine learning has produced still more data.
The growth of big data created the need for data lakes and other innovative storage solutions designed to efficiently manage large volumes of unstructured data. Unlike traditional databases or storage systems, data lakes allow raw data to be stored in its native format until it is needed. This approach offers flexibility in data processing and analysis, and because data does not have to be transformed into a predefined structure before it is stored, it can greatly cut down the time needed to prepare data for business intelligence.
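To make the “store raw, structure later” idea concrete, here is a minimal Python sketch of this pattern, often called schema-on-read. It assumes a plain in-memory list stands in for lake storage; `RAW_ZONE`, `ingest`, and `read_sensor_readings` are hypothetical names for illustration, not part of any data lake product. A real data lake would typically sit on distributed or object storage and be queried with an engine such as Spark.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (native JSON lines).
# The sources differ in shape, and one temperature arrived as a string.
RAW_ZONE = [
    '{"device_id": "a1", "temp_c": 21.5, "ts": "2024-01-01T00:00:00"}',
    '{"user": "u42", "action": "play", "video": "v9"}',
    '{"device_id": "a2", "temp_c": "19.0", "ts": "2024-01-01T00:05:00"}',
]


def ingest(raw_line, store):
    """Write path: append the raw payload unchanged, with no schema check."""
    store.append(raw_line)


def read_sensor_readings(store):
    """Read path: parse raw records and apply a schema only when queried."""
    readings = []
    for line in store:
        record = json.loads(line)
        if "device_id" not in record:  # record doesn't fit this view; skip it
            continue
        readings.append({
            "device_id": str(record["device_id"]),
            "temp_c": float(record["temp_c"]),  # coerce types at read time
            "ts": record["ts"],
        })
    return readings


# Usage: land everything as-is, then derive a structured view on demand.
lake = []
for line in RAW_ZONE:
    ingest(line, lake)
readings = read_sensor_readings(lake)
print(readings)
```

The second record is still stored even though this particular view ignores it, so a different query could later read the same raw data with a different schema. That deferred structuring is the flexibility described above; the trade-off is that the read step, not the write step, has to cope with messy inputs such as the string `"19.0"`.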
Throughout the 2010s, data lakes became the standard for storing large amounts of unstructured data, while data warehouses focused on storing structured data. These workloads started to merge in the late 2010s, with Databricks promoting the concept of the data lakehouse.
The concept of big data has transformed industries, offering unprecedented insights and opportunities. Artificial intelligence (AI) and machine learning (ML) stand at the forefront of this data revolution, enabling sophisticated data analysis and decision-making that drive efficiency and innovation across many sectors.