A Deep Dive Into Data Lakes

September 18, 2018

In the age of Big Data, we’ve had to come up with new terms to describe large-scale data storage. We have databases, data warehouses and now data lakes.

While they all contain data, these terms describe different ways of storing and using that data. Before we discuss data lakes and why they are important, let’s examine how they differ from databases and data warehouses.

Let’s start here: A data warehouse is not a database. Although you could argue that they’re both relational data systems, they serve different purposes. Data warehousing allows you to pull data together from a number of different sources for analysis and reporting. Data warehouses store vast amounts of historical data so that you can run complex queries across all of the data that has been pulled together.

Data lakes are centralized storage and data repositories that allow you to work with a variety of different types of data. The cool thing here is that you don’t need to structure the data first; it can be imported “as-is.” This allows you to work with raw data and run analytics, data visualization, big data processing, machine learning, AI, and much more on top of it. This level of data agility can give you a real competitive advantage.
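To make the “as-is” idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular data lake product works: the "lake" is just a temporary directory, and the file names, the `user` and `clicks` fields, and the helper functions `ingest_raw`, `read_clicks` and `clicks_per_user` are all made up for this example. The point it tries to show is that nothing is validated or reshaped when data is written; structure is applied only when the data is read for analysis.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_dir, name, payload):
    """Store a payload exactly as received; no schema is enforced on write."""
    path = Path(lake_dir) / name
    path.write_text(payload, encoding="utf-8")
    return path


def read_clicks(lake_dir):
    """Apply structure at read time: parse JSON-lines files and keep
    only the fields this particular analysis cares about."""
    records = []
    for path in sorted(Path(lake_dir).glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if "user" in record:
                records.append(
                    {"user": record["user"], "clicks": int(record.get("clicks", 0))}
                )
    return records


def clicks_per_user(records):
    """Aggregate click counts per user."""
    totals = {}
    for record in records:
        totals[record["user"]] = totals.get(record["user"], 0) + record["clicks"]
    return totals


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Three sources with different shapes, all stored untouched.
        ingest_raw(tmp, "web.jsonl",
                   '{"user": "ana", "clicks": 3}\n{"user": "bo", "clicks": "2"}\n')
        ingest_raw(tmp, "mobile.jsonl",
                   '{"user": "ana", "clicks": 4, "device": "phone"}\n')
        ingest_raw(tmp, "notes.txt", "free-form text stored as-is\n")
        return clicks_per_user(read_clicks(tmp))


if __name__ == "__main__":
    print(demo())
```

Notice that the text file and the extra `device` field sit in the lake without causing any trouble; this one analysis simply ignores them, and a different analysis could pick them up later. In a data warehouse, by contrast, you would normally decide on the table structure before loading anything.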
