Turning billions of MOT and accident records into real-time vehicle insights.
Visit the live project here: www.carhunch.com
What CarHunch Does
- Aggregates billions of MOT test results and STATS19 UK accident records.
- Provides real-time analytics on vehicle makes, models, years, and conditions.
- Compares a specific car against similar vehicles (make/model/year).
- Highlights common MOT failures and safety risks for different vehicles.
How It Works
CarHunch is powered by a ClickHouse data warehouse for ultra-fast queries, with:
- Python ETL pipelines for MOT and accident data ingestion.
- Incremental updates from DVSA bulk and delta files.
- Redis caching for instant lookups.
- Machine learning (MiniLM embeddings + clustering) to spot defect patterns.
- LLM integration (LLaMA) to generate natural-language insights.
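To make the incremental-update step concrete, here is a minimal Python sketch of folding a delta file into an existing set of MOT test records. The field names (`test_id`, `result`, `last_updated`) and the `apply_delta` helper are assumptions made for illustration; they are not CarHunch's actual schema or pipeline code.

```python
from typing import Dict, Iterable

Record = Dict[str, str]


def apply_delta(current: Dict[str, Record], delta: Iterable[Record]) -> Dict[str, Record]:
    """Return a new mapping with delta rows upserted by test_id.

    A delta row replaces the stored row only when its last_updated value
    is the same or newer, so replaying an old delta file is harmless.
    """
    merged = dict(current)
    for row in delta:
        key = row["test_id"]
        existing = merged.get(key)
        # ISO 8601 timestamps in one consistent format compare correctly as strings.
        if existing is None or row["last_updated"] >= existing["last_updated"]:
            merged[key] = row
    return merged


# Hypothetical sample rows, invented for this example.
bulk = {
    "t1": {"test_id": "t1", "result": "FAILED", "last_updated": "2024-01-10T09:00:00"},
    "t2": {"test_id": "t2", "result": "PASSED", "last_updated": "2024-01-11T14:30:00"},
}
delta = [
    {"test_id": "t1", "result": "PASSED", "last_updated": "2024-02-01T08:15:00"},  # updated row
    {"test_id": "t3", "result": "FAILED", "last_updated": "2024-02-01T10:00:00"},  # new row
]
merged = apply_delta(bulk, delta)
```

Keying on a stable identifier and comparing timestamps keeps the merge idempotent, which matters when a delta file is re-downloaded or processed twice.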
Example Insights
“Your 2010 Ford Focus has a 28% higher MOT failure rate than average for similar cars, mainly due to suspension wear.”
“BMW 3 Series models (2008–2012) commonly fail MOTs due to brake issues at around 80,000 miles.”
“Motorcycles show a different pattern of MOT failures compared to cars, with lighting and tyre defects being most common.”
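A figure such as "28% higher" is a relative difference between two failure rates. The sketch below shows one way such a number could be derived. The counts are invented for illustration and are not CarHunch data, and both helper functions are hypothetical names.

```python
def failure_rate(failed: int, total: int) -> float:
    """Share of MOT tests that ended in a failure."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


def relative_difference_pct(vehicle_rate: float, cohort_rate: float) -> float:
    """How much higher (positive) or lower (negative) a rate is than the cohort's, in percent."""
    if cohort_rate <= 0:
        raise ValueError("cohort_rate must be positive")
    return (vehicle_rate - cohort_rate) / cohort_rate * 100


# Invented counts: 384 failures per 1,000 tests for the model,
# 300 failures per 1,000 tests for the cohort of similar cars.
model_rate = failure_rate(failed=384, total=1000)
cohort_rate = failure_rate(failed=300, total=1000)

diff = relative_difference_pct(model_rate, cohort_rate)
print(f"{diff:.0f}% higher than similar cars")  # prints: 28% higher than similar cars
```

Note that this is a relative difference: in the example the gap is 8.4 percentage points (38.4% versus 30.0%), which is 28% of the cohort's rate.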
Technical Overview
CarHunch isn’t just about insights; it’s also a demonstration of building a modern, high-performance OLAP data platform from the ground up.
- Database: ClickHouse OLAP warehouse for real-time analytics on billions of records.
- ETL: Python pipelines ingesting DVSA MOT bulk/delta files and STATS19 accident datasets.
- Data Modeling: Normalised vehicle/test/defect schema with indexing and partitioning for query performance.
- APIs: REST endpoints (Flask/FastAPI) serving real-time queries to front-end applications.
- Caching: Redis for ultra-fast repeated lookups.
- Machine Learning: MiniLM embeddings + HDBSCAN clustering for identifying defect patterns and grouping similar vehicles.
- LLM Integration: Local LLaMA models for natural-language explanations and summaries.
- Deployment: Dockerised services on a Proxmox node, easily portable to cloud infrastructure.
- Monitoring: Logging & system metrics (rsyslog, lm-sensors) for reliability and performance tracking.
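The caching layer in the stack above can be illustrated with a cache-aside lookup. This is a minimal sketch, not CarHunch's implementation: the key format, the TTL and the `cached_lookup` helper are assumptions. `InMemoryCache` is a stand-in so the example runs without a server; it mimics only the `get(key)` and `set(key, value, ex=seconds)` calls that a redis-py client also provides.

```python
import json
import time
from typing import Callable, Optional


class InMemoryCache:
    """Stand-in exposing the two client calls used below: get and set(..., ex=)."""

    def __init__(self) -> None:
        self._store = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # entry has expired
            return None
        return value

    def set(self, key: str, value: str, ex: Optional[int] = None) -> None:
        expires_at = time.monotonic() + ex if ex is not None else None
        self._store[key] = (value, expires_at)


def cached_lookup(cache, key: str, compute: Callable[[], dict], ttl_seconds: int = 3600) -> dict:
    """Cache-aside: return the cached value if present, otherwise compute and store it."""
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    value = compute()
    cache.set(key, json.dumps(value), ex=ttl_seconds)
    return value


calls = []


def slow_query() -> dict:
    # Placeholder for a warehouse query; the numbers are invented.
    calls.append(1)
    return {"make": "FORD", "model": "FOCUS", "year": 2010, "failure_rate": 0.384}


cache = InMemoryCache()
first = cached_lookup(cache, "stats:FORD:FOCUS:2010", slow_query)
second = cached_lookup(cache, "stats:FORD:FOCUS:2010", slow_query)  # served from cache
```

Serialising to JSON keeps cached values readable from any service, and the TTL bounds how stale a cached aggregate can become after new data is ingested.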
Why CarHunch?
CarHunch shows how big data + AI can turn raw government datasets into meaningful insights that benefit both consumers and the automotive industry.
👉 Explore more at CarHunch.com.
Get in touch if you’d like to collaborate or learn more.