Project Overview
This project demonstrates the design and implementation of a locally hosted, end-to-end analytics data platform, replicating how data moves through a modern organisation — from raw ingestion to executive reporting.
Rather than focusing on exploratory analysis alone, the emphasis is on data architecture, orchestration, transformation, and consumption, using a real-world e-commerce dataset as the input source.
The result is a fully functional analytics pipeline that mirrors an Azure-style cloud workflow, built and operated locally.
The goal of this project was to move beyond isolated dashboards and instead demonstrate:
How raw data is ingested and stored reliably
How workflows are orchestrated and automated
How raw data is transformed into analytics-ready models
How business intelligence tools consume structured data
How technical components work together as a system
This reflects how data teams operate in production environments, rather than in one-off analysis tasks.
Key outcomes:
Implemented a local analytics platform replicating real-world architecture
Built automated, repeatable data workflows rather than manual processes
Modelled complex relational data into clean analytical structures
Produced business-facing dashboards backed by governed data models
Gained hands-on experience debugging orchestration, storage, and schema issues
The platform was built locally using containerised services to replicate a cloud-style analytics stack.
High-level architecture:
Object storage layer for raw data ingestion
Workflow orchestration to manage pipelines
Analytics database for structured data
Transformation layer to model business entities
BI layer for reporting and insight delivery
Each component was configured, connected, and tested as part of a single integrated system.
Technology stack:
Docker Desktop (containerised local environment)
Object storage (S3-style via MinIO)
Workflow orchestration (Apache Airflow)
Analytics database (PostgreSQL)
Data transformation (dbt)
Business intelligence & visualisation (Power BI)
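The containerised stack above can be sketched as a minimal docker-compose file. The image tags, ports, and credentials here are illustrative assumptions, not the project's actual configuration:

```yaml
services:
  minio:                              # S3-style object storage layer
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"                   # S3 API
      - "9001:9001"                   # web console
    environment:
      MINIO_ROOT_USER: admin          # illustrative credentials only
      MINIO_ROOT_PASSWORD: admin12345

  postgres:                           # analytics database
    image: postgres:16
    environment:
      POSTGRES_USER: analytics
      POSTGRES_PASSWORD: analytics
      POSTGRES_DB: warehouse
    ports:
      - "5432:5432"

  airflow:                            # workflow orchestration
    image: apache/airflow:2.9.0
    command: standalone               # single-container mode for local use
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - minio
```

dbt runs as a CLI against PostgreSQL and Power BI connects from the host, so neither needs its own container.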
The platform uses a multi-table e-commerce dataset, sourced from Kaggle, as its raw input. The dataset provides realistic complexity (multiple entities, relationships, and time-based behaviour), making it suitable for modelling a production-style analytics workflow.
The dataset itself is not the focus of the project; it serves as a representative source to support system design, transformation logic, and reporting outputs.
Data Pipeline & Workflow
Raw CSV files are ingested into object storage, simulating a data lake or blob storage layer commonly used in cloud environments.
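One common convention for this landing layer is to write each CSV under a date-partitioned object key, so reruns are idempotent and lineage is visible from the path alone. The layout below is a hypothetical sketch, not the project's documented structure:

```python
from datetime import date
from pathlib import Path

def raw_object_key(csv_path: str, ingest_date: date, dataset: str = "ecommerce") -> str:
    """Build a date-partitioned object key for a raw CSV landing in the bucket.

    Assumed layout: raw/<dataset>/<YYYY>/<MM>/<DD>/<filename>
    """
    name = Path(csv_path).name
    return f"raw/{dataset}/{ingest_date:%Y/%m/%d}/{name}"

# The actual upload would hand this key to the MinIO (or boto3) client
# pointed at the local endpoint, e.g.:
#   client.fput_object(bucket, raw_object_key(path, date.today()), path)
```

For example, `raw_object_key("data/orders.csv", date(2024, 5, 1))` yields `raw/ecommerce/2024/05/01/orders.csv`.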
Automated workflows manage ingestion and processing steps, ensuring tasks execute in the correct order and can be monitored and debugged when failures occur.
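Airflow expresses such a workflow as a DAG of tasks and guarantees that each task runs only after its upstream dependencies succeed. The ordering it enforces can be illustrated with a minimal pure-Python sketch (the task names are hypothetical, not the project's actual DAG):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks mapped to their upstream dependencies,
# mirroring how an Airflow DAG wires ingestion before transformation.
DEPENDENCIES = {
    "ingest_raw_csvs": set(),
    "load_to_postgres": {"ingest_raw_csvs"},
    "run_dbt_models": {"load_to_postgres"},
    "refresh_dashboards": {"run_dbt_models"},
}

def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return tasks in a dependency-respecting order (what the scheduler guarantees)."""
    return list(TopologicalSorter(deps).static_order())
```

In the real platform Airflow also handles retries, scheduling, and a UI for monitoring failed runs; this sketch only shows the ordering guarantee.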
Raw data is transformed into analytics-ready fact and dimension tables, applying consistent naming, relationships, and business logic.
This layer is designed to support:
Reusability
Clear lineage from raw to curated data
BI-friendly schemas
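The fact/dimension split can be sketched in miniature, using sqlite3 in place of PostgreSQL and hand-written SQL in place of dbt models. Table and column names here are illustrative assumptions, not the project's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw layer: one wide, denormalised table as it lands from ingestion.
    CREATE TABLE raw_orders (
        order_id TEXT, customer_id TEXT, customer_city TEXT,
        product_id TEXT, product_category TEXT,
        order_date TEXT, price REAL
    );
    INSERT INTO raw_orders VALUES
        ('o1', 'c1', 'London', 'p1', 'toys',  '2024-01-05', 19.99),
        ('o2', 'c1', 'London', 'p2', 'books', '2024-01-06', 9.50);

    -- Dimension: one row per customer, deduplicated from the raw feed.
    CREATE TABLE dim_customer AS
    SELECT DISTINCT customer_id, customer_city FROM raw_orders;

    -- Fact: one row per order line, keyed to the dimension.
    CREATE TABLE fct_orders AS
    SELECT order_id, customer_id, product_id, product_category,
           order_date, price AS revenue
    FROM raw_orders;
""")
```

In dbt each `CREATE TABLE ... AS SELECT` would live in its own model file, which is what gives the layer its reusability and raw-to-curated lineage.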
The transformed data is consumed by a BI tool to produce executive dashboards covering:
Sales and revenue performance
Product and category insights
Customer and geographic trends
Delivery and operational health