Building an Enterprise Data Lake Architecture: Lake Zones, File Format & Domains — Part 3

Analytics at KareTech
13 min read · Sep 9, 2024

Introduction

In the second chapter, “Design & Planning,” we navigated the expansive waters of data lakes and uncovered the practicalities of structuring a data lake to dismantle data silos and improve data management across an organization. In this third chapter, we shift our focus to data zones, file formats, and hierarchy, all of which are crucial to keeping the lake flowing smoothly and efficiently.

Chapter 1: Defining our Lake Zones & Structure

The data zones outlined below may go by different names, but their core function and purpose remain consistent: they categorize the states and characteristics of data as it flows through the lake. Each client I’ve worked with had unique requirements, which guided the design and implementation of these zones.

Raw Tier

Raw Zone

The Raw Zone functions like a vast reservoir, storing data in its original, unfiltered, and unprocessed state. Data is ingested directly from external source systems, often in formats such as CSV, JSON, or Parquet. This layer replicates the source system’s table structures ‘as-is,’ while developers enhance the data by adding metadata columns during the ETL…
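To make the ‘as-is plus metadata’ idea concrete, here is a minimal sketch in Python. It assumes a CSV extract and three hypothetical metadata columns (`_source_system`, `_source_file`, `_ingested_at`); the article does not name the actual columns, so treat these, and the function name `add_ingestion_metadata`, as illustrative only.

```python
import csv
import io
from datetime import datetime, timezone


def add_ingestion_metadata(csv_text, source_system, source_file, ingested_at=None):
    """Return the source rows unchanged, plus hypothetical metadata columns."""
    # Default to the current UTC time when no load timestamp is supplied.
    ingested_at = ingested_at or datetime.now(timezone.utc)
    stamp = ingested_at.isoformat()

    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Source columns are copied as-is: no casting, renaming, or filtering.
        enriched = dict(row)
        # Metadata column names below are assumptions, not from the article.
        enriched["_source_system"] = source_system
        enriched["_source_file"] = source_file
        enriched["_ingested_at"] = stamp
        rows.append(enriched)
    return rows


if __name__ == "__main__":
    sample = "id,name\n1,Ada\n2,Grace\n"
    for record in add_ingestion_metadata(sample, "crm", "customers.csv"):
        print(record)
```

The source values stay as the strings they arrived as, so the Raw Zone copy can still be compared against the source system, while the extra columns record where each row came from and when it was loaded.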
