Building an Enterprise Data Lake Architecture: Lake Zones, File Format & Domains — Part 3
Introduction
In Part 2, “Design & Planning,” we navigated the expansive waters of data lakes and looked at the practicalities of structuring a lake to dismantle data silos and improve data management across an organization. In this third part, we shift our focus to data zones, file formats, and hierarchy, all of which are crucial to keeping the lake flowing smoothly and efficiently.
Chapter 1: Defining our Lake Zones & Structure
The data zones outlined below may go by different names, but their core functions and purpose remain consistent — they categorize the various states and characteristics of data as it flows through the lake. Each client I’ve worked with had their own unique requirements, guiding the design and implementation of these zones.
Raw Zone
The Raw Zone functions like a vast reservoir, storing data in its original, unfiltered, and unprocessed state. Data is ingested directly from external source systems, often in formats such as CSV, JSON, or Parquet. This layer replicates the source system’s table structures ‘as-is,’ while developers enhance the data by adding metadata columns during the ETL…
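To make the idea of metadata columns concrete, here is a minimal sketch of a Raw Zone ingestion step in Python. It assumes a CSV source and uses only the standard library. The function name `add_ingestion_metadata` and the column names `_source_system`, `_source_file`, and `_ingested_at` are hypothetical choices for illustration, not a standard; your own pipeline and tooling will likely differ.

```python
import csv
import io
from datetime import datetime, timezone


def add_ingestion_metadata(rows, source_system, source_file, ingested_at=None):
    """Yield each source row unchanged, plus hypothetical metadata columns.

    The source columns are copied as-is (no filtering, no type casting),
    which mirrors the 'as-is' intent of the Raw Zone.
    """
    # Use one timestamp for the whole batch so every row in a load matches.
    ingested_at = ingested_at or datetime.now(timezone.utc)
    stamp = ingested_at.isoformat()
    for row in rows:
        enriched = dict(row)  # copy so the caller's row is not mutated
        enriched["_source_system"] = source_system  # assumed column name
        enriched["_source_file"] = source_file      # assumed column name
        enriched["_ingested_at"] = stamp            # assumed column name
        yield enriched


# Illustrative in-memory CSV standing in for a file from a source system.
sample_csv = "id,name\n1,Alice\n2,Bob\n"
reader = csv.DictReader(io.StringIO(sample_csv))

raw_rows = list(
    add_ingestion_metadata(
        reader,
        source_system="crm",            # hypothetical source name
        source_file="customers.csv",    # hypothetical file name
        ingested_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
)

for row in raw_rows:
    print(row)
```

Note that the source values stay as strings exactly as they arrived. Cleaning and type conversion are deliberately left out of this step, since the Raw Zone is meant to hold the data unprocessed.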