Donald Simpson
DevOps Engineer | SRE | AI & MLOps Platform Engineering
Edinburgh, Scotland, UK
Profile
Senior Platform Engineer with 20+ years' experience designing, building, and operating secure, automated, cloud-native platforms in enterprise and regulated environments. Deep expertise in Kubernetes-based platform engineering, AWS, infrastructure as code, CI/CD, observability, and DevSecOps.
Strong background in reliability engineering (SRE), with hands-on experience defining SLOs/SLIs, reducing operational toil, and improving incident response. Increasing focus on AI/ML and data platforms from an infrastructure and operations perspective, including GPU-aware Kubernetes workloads and data-heavy pipelines.
Published technical author (Extending Jenkins, Beginning Docker) and long-time contributor to DevOps best practices across large organisations and startups.
Core Skills
- Cloud & Platforms: Kubernetes, AWS (EKS, ECS, Lambda, API Gateway, IAM), Bare Metal, Hybrid Cloud
- Infrastructure as Code: Terraform, Terragrunt, AWS CDK, CloudFormation
- CI/CD & GitOps: GitHub Actions, GitLab CI, Jenkins, ArgoCD, Docker
- Observability & SRE: Dynatrace, Datadog, Prometheus, Grafana, OpenSearch, ELK/EFK, SLO/SLI design
- DevSecOps: Policy-as-code, container security, compliance-driven environments
- Data & AI Platforms: ClickHouse, embeddings & clustering (MiniLM, HDBSCAN), local inference, MLOps workflows
- Languages & Scripting: Python, Bash, Groovy, YAML
Applied Platform & AI Projects
Remora Risk Engine | remora-ai.com | Oct 2025 – Present
- Designed and built a production-grade Python microservice delivering explainable, real-time risk scores for trading bots.
- Implemented multi-signal regime detection using volatility modelling, momentum classifiers, and ML-driven indicators.
- Built a REST API with safe-mode fallback logic for automated trading integrations.
- Back-tested thousands of historical trade windows to validate filters and quantify avoided losses.
- Tech: Python, FastAPI, Pandas, NumPy, ML/AI, REST APIs, Ubuntu, Proxmox, Observability
CarHunch | carhunch.com | May 2025 – Present
- Designed and operated a ClickHouse-based OLAP platform, including schema design, partitioning, backups, and materialized views.
- Built Python ETL pipelines to ingest and process large-scale MOT datasets.
- Implemented Redis caching to optimise API performance and reduce query latency.
- Developed API layers (Python/PHP) and a custom WordPress-based front end.
- Prototyped AI-driven insights using MiniLM embeddings and clustering; evaluating LLaMA/GPT4All for natural-language reports.
- Tech: ClickHouse, Python, PHP, Redis, WordPress API, GitLab CI, Proxmox, AI/ML
Professional Experience
Shell | Apr 2023 – Present
- Platform engineer for OSDU, a mission-critical, API-driven global data platform.
- Designed and operated AWS EKS platforms across multiple regions using Terraform and GitHub Actions.
- Standardised infrastructure and deployment patterns in collaboration with AWS consultants and internal teams.
- Led platform observability using Dynatrace, including synthetic monitoring, log ingestion, metrics, and alerting.
- Defined and evolved SLOs/SLIs, embedding reliability targets into operational workflows.
- Applied SRE principles to incident management, post-incident reviews, and automation.
- Contributed to early AI/ML platform initiatives, including Amazon Bedrock/Q experiments and AI-assisted observability.
Ubloquity | Nov 2022 – Apr 2023
- Designed and operated hybrid Kubernetes platforms across AWS EKS and bare-metal OVH using Rancher and Terraform.
- Owned the end-to-end platform stack in a startup environment.
- Built permissioned blockchain platforms (Multichain) integrated into CI/CD workflows.
- Implemented GitOps pipelines with GitHub, ArgoCD, and Docker.
- Delivered centralised logging and observability with OpenSearch and Fluentd.
RS Components | Mar 2022 – Nov 2022
- Developed reusable Terraform/Terragrunt modules for AWS.
- Built and optimised CI/CD pipelines using GitLab CI.
- Operated scalable GitLab Runners on ECS/Fargate and Nomad.
- Led migration of on-prem services to AWS.
- Embedded DevSecOps practices with Datadog observability.
Registers of Scotland | Jan 2020 – Mar 2022
- Delivered AWS-focused DevOps, using AWS CDK for infrastructure automation.
- Enabled multiple teams to onboard to AWS.
- Built centralised logging with Elasticsearch/Kibana and Kinesis Firehose.
- Developed monitoring platforms with Prometheus and Grafana on ECS/Fargate.
Publications & Community
- Author: Extending Jenkins (Packt Publishing) – Amazon | Sample Chapter
- Author: Beginning Docker (Video Course, Packt) – Packt | Preview
- Technical Reviewer: Multiple Jenkins and Docker titles (Packt)
- Community: Organiser, Edinburgh Automated IT Solutions Meetup
Education
- Postgraduate Diploma, Information Systems – Edinburgh Napier University
- Bachelor of Land Economy (BLE) – University of Aberdeen
- HNC, Business Studies – Cardonald College, Glasgow
Availability
Open to fully remote, Outside IR35 contracts in:
- Platform Engineering
- DevOps / SRE
- AI-adjacent infrastructure and data platform roles