CarHunch – Vehicle Insights Platform

Turning billions of MOT and accident records into real-time vehicle insights.

Visit the live project at www.carhunch.com


What CarHunch Does

  • Aggregates billions of MOT test results and STATS19 UK accident records.
  • Provides real-time analytics on vehicle makes, models, years, and conditions.
  • Compares a specific car against similar vehicles (make/model/year).
  • Highlights common MOT failures and safety risks for different vehicles.
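
As a rough illustration of the make/model/year comparison described above, the sketch below computes how far one vehicle's MOT failure rate sits above or below that of similar vehicles. The `MotTest` record shape and the function names are hypothetical and simplified; they are not taken from the CarHunch codebase, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotTest:
    """One MOT test result (hypothetical, simplified record shape)."""
    make: str
    model: str
    year: int
    passed: bool


def similar_vehicles(tests, make, model, year):
    """Keep only the tests for vehicles of the same make, model and year."""
    return [t for t in tests if (t.make, t.model, t.year) == (make, model, year)]


def failure_rate(tests):
    """Fraction of tests that failed, or None when there are no tests."""
    tests = list(tests)
    if not tests:
        return None
    return sum(not t.passed for t in tests) / len(tests)


def relative_failure_rate(vehicle_tests, cohort_tests):
    """Percentage difference between a vehicle's failure rate and its cohort's.

    A positive value means the vehicle fails more often than similar vehicles.
    Returns None when either rate is undefined or the cohort rate is zero.
    """
    vehicle = failure_rate(vehicle_tests)
    cohort = failure_rate(cohort_tests)
    if vehicle is None or cohort is None or cohort == 0:
        return None
    return (vehicle - cohort) / cohort * 100.0
```

A figure such as "28% higher than average for similar cars" would correspond to `relative_failure_rate(...)` returning roughly 28 for that vehicle and its cohort.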

How It Works

CarHunch runs on a ClickHouse data warehouse for fast analytical queries, supported by:

  • Python ETL pipelines for MOT and accident data ingestion.
  • Incremental updates from DVSA bulk and delta files.
  • Redis caching for instant lookups.
  • Machine learning (MiniLM embeddings + clustering) to spot defect patterns.
  • LLM integration (LLaMA) to generate natural-language insights.
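
The incremental-update step can be pictured as merging a delta file into an existing snapshot. The sketch below is a minimal, assumption-laden version: the `test_id` and `last_updated` field names are hypothetical, and the real bulk/delta file layout is not described in this document.

```python
def apply_delta(snapshot, delta_records):
    """Merge delta records into a snapshot keyed by test_id (hypothetical key).

    A delta record replaces the stored one only if it is at least as recent,
    so replaying the same delta file twice leaves the result unchanged.
    The input snapshot is not modified; a new dict is returned.
    """
    merged = dict(snapshot)
    for record in delta_records:
        key = record["test_id"]
        current = merged.get(key)
        # ISO-8601 date strings compare correctly as plain strings.
        if current is None or record["last_updated"] >= current["last_updated"]:
            merged[key] = record
    return merged
```

Inside ClickHouse itself, a comparable "latest version wins" effect is commonly obtained with a ReplacingMergeTree table; whether CarHunch takes that route is not stated here.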

Example Insights

“Your 2010 Ford Focus has a 28% higher MOT failure rate than average for similar cars, mainly due to suspension wear.”

“BMW 3 Series (2008–2012) commonly fail MOTs due to brake issues around 80,000 miles.”

“Motorcycles show a different pattern of MOT failures compared to cars, with lighting and tyre defects being most common.”

Technical Overview

CarHunch is more than an insights tool: it also demonstrates how to build a modern, high-performance OLAP data platform from the ground up.

  • Database: ClickHouse OLAP warehouse for real-time analytics on billions of records.
  • ETL: Python pipelines ingesting DVSA MOT bulk/delta files and STATS19 accident datasets.
  • Data Modeling: Normalised vehicle/test/defect schema with indexing and partitioning for query performance.
  • APIs: REST endpoints (Flask/FastAPI) serving real-time queries to front-end applications.
  • Caching: Redis for ultra-fast repeated lookups.
  • Machine Learning: MiniLM embeddings + HDBSCAN clustering for identifying defect patterns and grouping similar vehicles.
  • LLM Integration: Local LLaMA models for natural-language explanations and summaries.
  • Deployment: Dockerised services on a Proxmox node, easily portable to cloud infrastructure.
  • Monitoring: Logging & system metrics (rsyslog, lm-sensors) for reliability and performance tracking.
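
The caching layer in the list above typically follows a cache-aside pattern: check the cache first, and only run the expensive warehouse query on a miss. The sketch below shows the idea using the `get` and `setex` calls that the redis-py client provides. `InMemoryCache` is a hypothetical stand-in so the example runs without a Redis server, and the key format and `cached_lookup` helper are illustrative assumptions, not CarHunch's actual code.

```python
import json


class InMemoryCache:
    """Stand-in exposing the two redis-py style calls used below."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def setex(self, key, ttl_seconds, value):
        # The TTL is ignored in this stand-in; Redis would expire the key.
        self._data[key] = value


def cached_lookup(cache, key, compute, ttl_seconds=3600):
    """Return the cached value for key, computing and storing it on a miss."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    value = compute()  # e.g. an aggregate query against the warehouse
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value


def expensive_query():
    # Placeholder for a real ClickHouse query; the numbers are made up.
    return {"make": "FORD", "model": "FOCUS", "failure_rate": 0.25}


cache = InMemoryCache()
first = cached_lookup(cache, "stats:FORD:FOCUS:2010", expensive_query)
second = cached_lookup(cache, "stats:FORD:FOCUS:2010", expensive_query)  # cache hit
```

With a real deployment, `cache` would be a `redis.Redis(...)` client instead of the stand-in, and the TTL would bound how stale a cached aggregate can become after new delta files are loaded.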

Why CarHunch?

CarHunch shows how big data + AI can turn raw government datasets into meaningful insights that benefit both consumers and the automotive industry.

👉 Explore more at CarHunch.com

Get in touch if you’d like to collaborate or learn more.