CV/Resume – DevOps/MLOps Engineer

Donald Simpson

DevOps Engineer | SRE | AI & MLOps Platform Engineering
Edinburgh, Scotland, UK

Profile

Senior Platform Engineer with 20+ years’ experience designing, building, and operating secure, automated, cloud-native platforms in enterprise and regulated environments. Deep expertise in Kubernetes-based platform engineering, AWS, infrastructure as code, CI/CD, observability, and DevSecOps.

Strong background in reliability engineering (SRE), with hands-on experience defining SLOs/SLIs, reducing operational toil, and improving incident response. Increasing focus on AI/ML and data platforms from an infrastructure and operations perspective, including GPU-aware Kubernetes workloads and data-heavy pipelines.

Published technical author (Extending Jenkins, Beginning Docker) and long‑time contributor to DevOps best practices across large organisations and startups.

Core Skills

  • Cloud & Platforms: Kubernetes, AWS (EKS, ECS, Lambda, API Gateway, IAM), Bare Metal, Hybrid Cloud
  • Infrastructure as Code: Terraform, Terragrunt, AWS CDK, CloudFormation
  • CI/CD & GitOps: GitHub Actions, GitLab CI, Jenkins, ArgoCD, Docker
  • Observability & SRE: Dynatrace, Datadog, Prometheus, Grafana, OpenSearch, ELK/EFK, SLO/SLI design
  • DevSecOps: Policy-as-code, container security, compliance-driven environments
  • Data & AI Platforms: ClickHouse, embeddings & clustering (MiniLM, HDBSCAN), local inference, MLOps workflows
  • Languages & Scripting: Python, Bash, Groovy, YAML

Applied Platform & AI Projects

Remora Risk Engine | remora-ai.com | Oct 2025 – Present

  • Designed and built a production-grade Python microservice delivering explainable, real-time risk scores for trading bots.
  • Implemented multi-signal regime detection using volatility modelling, momentum classifiers, and ML-driven indicators.
  • Built a REST API with safe-mode fallback logic for automated trading integrations.
  • Back-tested thousands of historical trade windows to validate filters and quantify avoided losses.
  • Tech: Python, FastAPI, Pandas, NumPy, ML/AI, REST APIs, Ubuntu, Proxmox, Observability

CarHunch | carhunch.com | May 2025 – Present

  • Designed and operated a ClickHouse-based OLAP platform, including schema design, partitioning, backups, and materialized views.
  • Built Python ETL pipelines to ingest and process large-scale MOT datasets.
  • Implemented Redis caching to optimise API performance and reduce query latency.
  • Developed API layers (Python/PHP) and a custom WordPress-based front end.
  • Prototyped AI-driven insights using MiniLM embeddings and clustering; currently evaluating LLaMA/GPT4All for natural-language reports.
  • Tech: ClickHouse, Python, PHP, Redis, WordPress API, GitLab CI, Proxmox, AI/ML

Professional Experience

Shell | Apr 2023 – Present

  • Platform engineer for OSDU, a mission‑critical, API‑driven global data platform.
  • Designed and operated AWS EKS platforms across multiple regions using Terraform and GitHub Actions.
  • Standardised infrastructure and deployment patterns in collaboration with AWS consultants and internal teams.
  • Led platform observability using Dynatrace, including synthetic monitoring, log ingestion, metrics, and alerting.
  • Defined and evolved SLOs/SLIs, embedding reliability targets into operational workflows.
  • Applied SRE principles to incident management, post‑incident reviews, and automation.
  • Contributed to early AI/ML platform initiatives, including Amazon Bedrock and Amazon Q experiments and AI‑assisted observability.

Ubloquity | Nov 2022 – Apr 2023

  • Designed and operated hybrid Kubernetes platforms across AWS EKS and bare‑metal OVH using Rancher and Terraform.
  • Owned the end‑to‑end platform stack in a startup environment.
  • Built permissioned blockchain platforms (MultiChain) integrated into CI/CD workflows.
  • Implemented GitOps pipelines with GitHub, ArgoCD, and Docker.
  • Delivered centralised logging and observability with OpenSearch and Fluentd.

RS Components | Mar 2022 – Nov 2022

  • Developed reusable Terraform/Terragrunt modules for AWS.
  • Built and optimised CI/CD pipelines using GitLab CI.
  • Operated scalable GitLab Runners on ECS/Fargate and Nomad.
  • Led migration of on-prem services to AWS.
  • Embedded DevSecOps practices and Datadog-based observability into delivery workflows.

Registers of Scotland | Jan 2020 – Mar 2022

  • Delivered AWS-focused DevOps, using AWS CDK for infrastructure automation.
  • Enabled multiple teams to onboard to AWS.
  • Built centralised logging with Elasticsearch/Kibana and Kinesis Firehose.
  • Developed monitoring platforms with Prometheus and Grafana on ECS/Fargate.

Publications & Community

  • Author: Extending Jenkins (Packt Publishing)
  • Author: Beginning Docker (Video Course, Packt)
  • Technical Reviewer: Multiple Jenkins and Docker titles (Packt)
  • Community: Organiser, Edinburgh Automated IT Solutions Meetup

Education

  • Postgraduate Diploma, Information Systems — Edinburgh Napier University
  • Bachelor of Land Economy (BLE) — University of Aberdeen
  • HNC, Business Studies — Cardonald College, Glasgow

Availability

Open to fully remote, outside-IR35 contracts in:

  • Platform Engineering
  • DevOps / SRE
  • AI‑adjacent infrastructure and data platform roles
