Project Overview
This platform tracks the divergence between UK residential property values and household incomes over a 20-year longitudinal window. By intersecting three disparate public datasets, the project quantifies the "Squeeze": the rate at which property price growth outpaces local wage growth.
The defining technical challenge was a "Nuclear Reset" necessitated by the discovery of geographic fragmentation in the raw ONS data. The resulting platform prioritizes data legitimacy, using a custom resolution layer to bridge legacy administrative boundaries with modern GSS standards.
The goal of this project was to demonstrate high-fidelity data engineering capable of surviving technical scrutiny:
Quantifying the Squeeze: Measuring the percentage-point divergence between property appreciation and local wage growth.
Engineering for Legitimacy: Implementing a "Contract-First" architecture to resolve geographic tangling and avoid silent data loss.
Bridging the "2012 Cliff": Creating a technical "Rosetta Stone" to map legacy ONS codes to modern non-hierarchical standards.
Scalable Architecture: Building a modular ELT pipeline that can be extended to include rental indices or mortgage rate modeling.
Key Outcomes
Restored 20-Year Timeline: Successfully bridged the transition from legacy hierarchical codes to 9-character GSS standards without losing historical data.
Localized Affordability Insights: Identified "Commuter Parasite" cities where residence-based pay is decoupled from local property values.
Auditable Data Pipeline: Built a dbt-governed transformation layer that centralizes business logic and prevents metric drift.
Resilient Engineering: Rebuilt the entire stack to prioritize data integrity over rapid visualization, resulting in a verified, "board-ready" analytics platform.
Ingestion & Curation
Raw Excel and CSV files from the ONS and Land Registry are transformed into "Curated CSVs" using custom Python scripts. This stage enforces strict numeric typing and standardizes column names, converting suppression markers (like ":" or "..") into database-ready null values.
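The curation step can be sketched in plain Python. This is a minimal, hypothetical illustration of the suppression-marker and header handling described above; the marker set and column names are assumptions, not the project's actual configuration.

```python
import csv
import io

# Illustrative set of ONS suppression markers; the real scripts may
# handle additional pathologies.
SUPPRESSION_MARKERS = {":", "..", ""}

def to_numeric(value: str):
    """Return a float, or None for ONS suppression markers."""
    cleaned = value.strip().replace(",", "")
    if cleaned in SUPPRESSION_MARKERS:
        return None
    return float(cleaned)

def curate_rows(raw_csv: str):
    """Standardise headers to snake_case and enforce numeric typing."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    for row in reader:
        yield {
            key.strip().lower().replace(" ", "_"): to_numeric(val)
            for key, val in row.items()
        }

# A suppressed cell ("..") becomes a database-ready null (None).
rows = list(curate_rows("Median Pay,Median Price\n28500,250000\n..,310000\n"))
```

Converting markers to true nulls at this stage means downstream dbt aggregates skip suppressed cells rather than silently treating them as zeros.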
Geographic Resolution (The Rosetta Stone)
To solve the "Geographic Tangling" identified in Attempt 1, the platform uses a Rosetta Stone mapping layer. This layer ensures that historical records (e.g., South Bedfordshire) are accurately mapped to their modern successors (e.g., Central Bedfordshire), allowing for a continuous 2005–2025 timeline.
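The resolution layer can be sketched as a lookup from legacy codes to GSS successors. This is a hypothetical illustration: the two mappings shown are examples, and a real implementation would load the full ONS code history rather than hard-code pairs.

```python
# Illustrative legacy-to-GSS pairs; real mappings come from ONS
# code-history lookups, not a hand-written dict.
LEGACY_TO_GSS = {
    "09UE": "E06000056",  # South Bedfordshire -> Central Bedfordshire
    "09UD": "E06000056",  # Mid Bedfordshire   -> Central Bedfordshire
}

def resolve_area_code(code: str) -> str:
    """Map a legacy ONS code to its modern 9-character GSS successor.

    Codes already in GSS form pass through unchanged; unknown legacy
    codes raise, so rows are never silently dropped."""
    if len(code) == 9 and code[0] in "EWSN":
        return code  # already a modern GSS code
    try:
        return LEGACY_TO_GSS[code]
    except KeyError:
        raise ValueError(f"Unmapped legacy code: {code}")
```

Raising on unmapped codes (rather than returning None) is what enforces the "no silent data loss" contract: a gap in the Rosetta Stone fails loudly at build time.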
Modelling & Transformation
Using dbt, data moves from Staging to Intermediate (Resolution) and finally to Marts.
The Squeeze Index: A dbt-calculated metric measuring the delta between house price and wage growth percentages.
P/E Ratios: Automated calculations of local property price-to-earnings multiples.
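Both metrics can be expressed in a few lines. The project computes them in dbt SQL models; this plain-Python sketch only illustrates the arithmetic, with hypothetical figures.

```python
def pct_growth(start: float, end: float) -> float:
    """Percentage growth from start to end."""
    return (end - start) / start * 100.0

def squeeze_index(price_start: float, price_end: float,
                  wage_start: float, wage_end: float) -> float:
    """Percentage-point gap between house-price and wage growth."""
    return pct_growth(price_start, price_end) - pct_growth(wage_start, wage_end)

def pe_ratio(median_price: float, median_earnings: float) -> float:
    """Local property price-to-earnings multiple."""
    return median_price / median_earnings

# Prices doubled (+100%) while wages rose 25%: a 75-point squeeze.
squeeze_index(100_000, 200_000, 20_000, 25_000)  # 75.0
pe_ratio(200_000, 25_000)                        # 8.0
```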
Analytics & Insights
The final mart tables are consumed by Looker Studio, producing a longitudinal dashboard that tracks:
Regional Trends: High-velocity price growth versus stagnant wage growth.
Affordability Rankings: Identifying the UK’s most and least "Squeezed" local authorities.
Technology Stack
Infrastructure: Docker Desktop (PostgreSQL containerization).
Environment: Python 3.12 (Virtual environments via PowerShell).
Database Management: pgAdmin 4.
Data Transformation: dbt (Data Build Tool).
Orchestration & Ingestion: Custom Python scripts for ONS "pathology" handling.
Visualisation: Looker Studio & Jupyter Notebooks.
Data Sources
The analysis draws on three authoritative sources:
UK HPI: HM Land Registry (Monthly/Yearly Averages).
ASHE Table 8: ONS (Residence-based earnings).
GDHI: ONS (Gross Disposable Household Income).
Data Pipeline & Workflow
Raw CSV files are ingested into object storage, simulating a data lake or blob storage layer commonly used in cloud environments.
Automated workflows manage ingestion and processing steps, ensuring tasks execute in the correct order and can be monitored and debugged when failures occur.
Raw data is transformed into analytics-ready fact and dimension tables, applying consistent naming, relationships, and business logic.
This layer is designed to support:
Reusability
Clear lineage from raw to curated data
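As a minimal illustration of the staging-to-mart flow, a fact row can be assembled by joining curated records to a dimension keyed on GSS code. The model and column names here are hypothetical, not the project's actual dbt models.

```python
# Illustrative dimension table keyed on GSS code; in the real stack
# this lives in a dbt mart, not a Python dict.
dim_area = {
    "E06000056": {"area_key": 1, "area_name": "Central Bedfordshire"},
}

def build_fact_rows(curated):
    """Join curated records to dim_area, yielding analytics-ready
    fact rows with surrogate keys and consistent column names."""
    for rec in curated:
        dim = dim_area[rec["gss_code"]]
        yield {
            "area_key": dim["area_key"],
            "year": rec["year"],
            "median_price": rec["median_price"],
            "median_pay": rec["median_pay"],
        }

rows = list(build_fact_rows([
    {"gss_code": "E06000056", "year": 2024,
     "median_price": 310_000, "median_pay": 31_000},
]))
```

Keying facts on a surrogate `area_key` rather than the raw GSS string keeps lineage clear: any boundary change is absorbed in the dimension, not scattered across fact tables.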
The transformed data is consumed by Looker Studio to produce executive dashboards covering:
Regional house-price and wage trends
The Squeeze Index and price-to-earnings multiples by local authority
Affordability rankings across local authorities
Pipeline and data-quality health