Advanced risk management for Freqtrade strategies








Advanced Risk Management for Freqtrade: Integrating Real-Time Market Awareness | Technical Deep Dive


Advanced Risk Management for Freqtrade: Integrating Real-Time Market Awareness

Freqtrade is a popular open-source cryptocurrency trading bot framework. It gives you solid tools for strategy development, backtesting, and running automated trading strategies live – and it does a very good job of evaluating individual trade entries and exits.

I’ve used Freqtrade extensively, both for testing ideas and for running strategies live.

One thing I kept running into, though, was that while Freqtrade is very good at answering one question:

“Is this a valid entry signal right now?”

…it doesn’t really answer a different, higher-level one:

“Is this a market worth trading at all right now?”

If you’ve run Freqtrade strategies with real money, you’ve probably seen the same pattern: strategies that look perfectly reasonable in backtests, with sensible entry logic and risk controls, can still bleed during periods of high volatility, regime shifts, or market-wide panic – even when individual entries look fine in isolation.

That gap is what led me to experiment with a separate market-level risk layer, which eventually became Remora.

Rather than changing strategy logic or adding yet another indicator, Remora sits outside the strategy and provides a market-level risk assessment – answering whether current conditions are historically safe or risky to trade, regardless of what your entry signals are doing.

Importantly, this is an additive layer – your strategy logic and entry signals remain unchanged.

This article walks through how that works, how to integrate it into Freqtrade safely, and how to validate its impact using reproducible backtests.

TL;DR: This article shows how to add real-time market risk filtering to Freqtrade using Remora, a small standalone microservice that aggregates volatility, regime, sentiment, and macro signals. Integration is fail-safe, transparent, and requires only a minimal change to your strategy code.

What This Article Covers

  • Why market regime risk matters for Freqtrade strategies
  • What Remora does (at a high level)
  • How to integrate it safely without breaking your strategy
  • How to validate its impact using reproducible backtests

Who This Is For (And Who It Isn’t)

This is likely useful if you:

  • Run live Freqtrade bots with real capital
  • Care about drawdowns and regime risk, not just backtest curves
  • Want a fail-safe, auditable risk layer
  • Prefer transparent systems over black-box signals

This is probably not useful if you:

  • Want a plug-and-play “buy/sell” signal
  • Optimise single backtests rather than live behaviour
  • Expect risk filters to magically fix bad strategies

Part 1: The Missing Layer in Most Freqtrade Strategies

Market conditions aren’t always tradable. Periods of extreme volatility, panic regimes, bear markets, and negative sentiment cascades can turn otherwise solid strategies into consistent losers – even when individual entries look fine in isolation.

Typical Freqtrade risk controls (position sizing, stop-losses, portfolio exposure) protect individual trades, but they don’t address market regime risk – the question of whether current market conditions are fundamentally safe to trade in at all.

Part 2: Remora – Market-Wide Risk as a Service

Remora is a standalone market-risk engine designed to sit outside your strategy logic.

Instead of changing how your strategy finds entries, Remora answers one question:

“Are current market conditions safe to trade?”

Results at a Glance (Why This Matters)

Before diving into implementation details, it’s useful to see what this approach looks like in practice.

Across 6 years of data (2020-2025), 4 different strategies, and 20 backtests:

  • 90% of tests improved performance (18 out of 20)
  • +1.54% average profit improvement
  • +1.55% average drawdown reduction
  • 4.3% of trades filtered (adaptive – increases to 16-19% during bear markets)
  • Strongest impact during bear markets (2022 saw 16-19% filtering during crashes)

All results are fully reproducible using an open-source backtesting framework (details later).

Core Design Principles

  • Fail-open by default: If Remora is unavailable, your bot continues trading normally.
  • Transparent decisions: Every response includes human-readable reasoning.
  • Multi-source aggregation: Dozens of signals with redundancy and failover.
  • Low-latency: Designed for synchronous use inside live trading loops.
  • No lock-in: Simple HTTP API. Remove it by deleting a few lines of code.

Data Aggregation Strategy (High-Level)

Rather than relying on a single indicator, Remora combines multiple signal classes:

Technical & Market Structure:

  • Volatility metrics (realised, model-based)
  • Momentum indicators
  • Regime classification (bull / bear / choppy / panic)
  • Volume and market structure signals

Sentiment & Macro:

  • News sentiment (multi-source)
  • Fear & Greed Index
  • Funding rates and liquidations
  • BTC dominance
  • Macro correlations (e.g. VIX, DXY)

Each signal type has multiple providers. If one source fails or becomes stale, others continue supplying data.

The output is:

  • safe_to_trade (boolean)
  • risk_score (0-1)
  • market regime
  • volatility metrics
  • clear textual reasoning

Part 3: Freqtrade Integration (Minimal & Reversible)

Integration uses Freqtrade’s confirm_trade_entry hook.

You do not modify your strategy’s entry logic – you simply gate entries at the final step.

Step-by-Step Integration

Here’s exactly what to add to your existing Freqtrade strategy. The code is color-coded: gray shows your existing code, green shows the new Remora integration code.

Step 0: Set Your API Key

Before running your strategy, set the environment variable:

export REMORA_API_KEY=”your-api-key-here”

Get your free API key at remora-ai.com/signup.php

Step 1: Add Remora to Your Strategy

Insert the green code blocks into your existing strategy file exactly as shown:

class MyStrategy(IStrategy):
    # ----- EXISTING STRATEGY LOGIC -----
    def populate_entry_trend(self, dataframe: DataFrame, metadata: dict) -> DataFrame:
        pair = metadata['pair']
        
        # Your existing entry conditions...
        # dataframe.loc[:, 'enter_long'] = 1  # example existing logic

        # ----- REMORA CHECK -----
        if not self.confirm_trade_entry(pair):
            dataframe.loc[:, 'enter_long'] = 0  # REMORA: Skip high-risk trades

        return dataframe

    # ----- ADD THIS NEW METHOD -----
    def confirm_trade_entry(self, pair: str, **kwargs) -> bool:
        import os
        import requests
        api_key = os.getenv("REMORA_API_KEY")
        headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
        
        try:
            r = requests.get(
                "https://remora-ai.com/api/v1/risk",
                params={"pair": pair},
                headers=headers,
                timeout=2.0
            )
            return r.json().get("safe_to_trade", True)  # REMORA: Block entry if market is high-risk
        except Exception:
            return True  # REMORA: Fail-open

Integration Notes:

  • Inside your existing populate_entry_trend(), insert the green Remora check just before return dataframe.
  • After that, add the green confirm_trade_entry() method at the same indentation level as your other strategy methods.
  • All green comments are prefixed with # REMORA: so you can easily identify or remove them later.
  • Everything else in your strategy stays unchanged.

Removing Remora is as simple as deleting these lines. No lock-in, fully transparent.

Pair-Specific vs Market-Wide Risk

You can query Remora in two modes:

Pair-specific:

params={“pair”: “BTC/USDT”}

Market-wide (global trade gating):

# No pair parameter

Many users start with market-wide gating to reduce API calls and complexity.

What the API Returns

{
“safe_to_trade”: false,
“risk_score”: 0.75,
“regime”: “bear”,
“volatility”: 0.68,
“reasoning”: [
“High volatility detected”,
“Bear market regime identified”,
“Fear & Greed Index: Extreme Fear”,
“Negative news sentiment”
]
}

This allows debugging blocked trades, auditing decisions, custom logic layered on top, and strategy-specific thresholds.

Part 4: Backtesting & Validation (Reproducible)

Live APIs don’t work in historical backtests – so Remora includes an open-source backtesting framework that reconstructs historical risk signals using the same logic as production.

Repository: github.com/DonaldSimpson/remora-backtests

What It Provides

  • Historical signal reconstruction
  • Baseline vs Remora-filtered comparisons
  • Multiple strategy types
  • Consistent metrics and visualisations

What It Shows

  • Improvements are not strategy-specific
  • Filtering increases during crashes
  • Small trade suppression can meaningfully reduce drawdowns
  • Performance gains come from avoiding bad periods, not over-trading

You’re encouraged to run this yourself and independently verify the impact on your own strategies.

Here’s what comprehensive backtesting across 6 years (2020-2025), 4 different strategies, and 20 test cases has proven:

Overall Performance Improvements

Metric Result
Average Profit Improvement +1.54% (18 out of 20 tests improved – 90% success rate)
Average Drawdown Reduction +1.55% (18 out of 20 tests improved)
Trades Filtered 4.3% (2,239 out of 51,941 total trades)
Best Strategy Improvement +3.20% (BollingerBreakout strategy)
Most Effective Period 2022 Bear Market (16-19% filtering during crashes)

Financial Impact by Account Size

Based on average improvements, here’s the financial benefit on different account sizes:

Account Size Additional Profit Reduced Losses Total Benefit
$10,000 +$154.25 +$154.70 $308.95
$50,000 +$771.25 +$773.50 $1,544.75
$100,000 +$1,542.50 +$1,547.00 $3,089.50
$500,000 +$7,712.50 +$7,735.00 $15,447.50
$1,000,000 +$15,425.00 +$15,470.00 $30,895.00

What These Numbers Mean

  • 4.3% Trade Filtering: Remora prevents trades during dangerous market periods. This is adaptive – during the 2022 bear market, filtering increased to 16-19%, showing Remora becomes more protective when markets are most dangerous.
  • +1.54% Profit Improvement: By avoiding bad trades during high-risk periods, strategies show consistent profit improvements. 90% of tests (18 out of 20) showed improvement.
  • +1.55% Drawdown Reduction: Less maximum loss during unfavorable periods. This is critical for risk management and capital preservation.
  • Best During Crashes: Remora is most effective during bear markets and crashes (2022 showed 16-19% filtering), exactly when you need protection most.

Part 5: Production & Advanced Use

Always fail-open:

except requests.Timeout:
return True

Log decisions:

logger.info(
f”Remora: safe={safe}, risk={risk_score}, regime={regime}”
)

Reduce API load:

  • Cache responses (e.g. 30s)
  • Use market-wide checks
  • Upgrade tier only if needed

Advanced Uses (Optional)

  • Dynamic position sizing based on risk_score
  • Strategy-specific risk thresholds
  • Regime-based strategy switching
  • Trade blocking during macro stress events

These are additive – not required to get value.

Part 6: Technical Implementation Details

Data Pipeline Architecture

Remora’s data pipeline follows a producer-consumer pattern:

  1. Data Collection: Multiple scheduled tasks fetch data from various sources (Binance API, CoinGecko, news APIs, etc.)
  2. Data Storage: Raw data stored in ClickHouse time-series database
  3. Materialized Views: ClickHouse materialized views pre-aggregate data for fast queries
  4. Risk Calculation: Python service calculates risk scores using aggregated data
  5. Caching: Redis caches risk assessments to reduce database load
  6. API Layer: FastAPI serves risk assessments via REST API

ClickHouse Materialized Views

ClickHouse materialized views enable real-time aggregation without query-time computation overhead:

CREATE MATERIALIZED VIEW volatility_1h_mv
ENGINE = AggregatingMergeTree()
ORDER BY (pair, timestamp_hour)
AS SELECT
pair,
toStartOfHour(timestamp) as timestamp_hour,
avgState(price) as avg_price,
stddevSampState(price) as volatility
FROM raw_trades
GROUP BY pair, timestamp_hour;

This allows Remora to provide real-time risk assessments with minimal latency, even when processing millions of data points.

Failover & Redundancy

Each data source has multiple providers with automatic failover. This ensures reliable risk assessments even if individual data sources experience outages or rate limiting.

def get_fear_greed_index():
“””
Fetch Fear & Greed Index with multi-provider failover.
Tries multiple sources until one succeeds.
“””
providers = [
fetch_from_alternative_me,
fetch_from_coinmarketcap,
fetch_from_custom_source,
fetch_from_backup_provider_1,
fetch_from_backup_provider_2,
# … additional providers for redundancy
]

# Try each provider until one succeeds
for provider in providers:
try:
data = provider()
if data and is_valid(data):
return data
except Exception:
continue

# If all providers fail, return None
# The risk calculator handles missing data gracefully
return None

This multi-provider approach ensures:

  • High Availability: If one provider fails, others continue providing data
  • Rate Limit Resilience: Multiple providers mean you’re not dependent on a single API’s rate limits
  • Data Quality: Can validate data across providers and choose the most reliable source
  • Graceful Degradation: If all providers for one signal type fail, the risk calculator continues using other available signals (volatility, regime, sentiment, etc.)

In Remora’s implementation, each signal type (Fear & Greed, news sentiment, funding rates, etc.) has multiple providers. If one data source is unavailable, others continue providing information, ensuring the system maintains reliable risk assessments even during external API outages.

Security & Best Practices

  • API Key Management: Store API keys in environment variables, never in code
  • HTTPS Only: Always use HTTPS for API calls (Remora enforces this)
  • Rate Limiting: Respect rate limits to avoid service disruption
  • Monitoring: Monitor Remora API response times and error rates
  • Fail-Open: Always implement fail-open behaviour – never let Remora block your entire trading system

API Access & Pricing

Remora offers a tiered API access structure designed to accommodate different use cases:

Unauthorized Access (Limited)

  • Rate Limit: 60 requests per minute
  • Use Case: Testing, development, low-frequency strategies
  • Cost: Free – no registration required
  • Limitations: Lower rate limits, no historical data access

Registered Users (Free Tier)

  • Rate Limit: 300 requests per minute (5x increase)
  • Use Case: Production trading, multiple strategies, higher-frequency bots
  • Cost: Free – registration required, no credit card needed
  • Benefits: Higher rate limits, faster response times, priority support

Pro Tier (Coming Soon)

  • Rate Limit: Custom limits based on needs
  • Use Case: Professional traders, institutions, high-frequency systems
  • Features:
    • Customizable risk thresholds and filtering rules
    • Advanced customization options
    • Historical data API access for backtesting
    • Dedicated support and SLA guarantees
    • White-label options
  • Status: Currently in development – contact for early access
Getting Started: Start with the free registered tier – it’s sufficient for most Freqtrade strategies. Upgrade to Pro when you need customization, higher limits, or advanced features.

Getting Started

To get started with Remora for your Freqtrade strategies:

  1. Get API Key: Sign up at remora-ai.com/signup.php (free, no credit card required). Registration gives you 5x higher rate limits (300 req/min vs 60 req/min).
  2. Set Environment Variable: export REMORA_API_KEY="your-api-key-here"
  3. Add Integration: Add the confirm_trade_entry method to your strategy (see color-coded code examples above)
  4. Test: Run a backtest or paper trade to verify integration
  5. Validate with Backtests: Use the remora-backtests repository to run your own strategy with and without Remora, independently verifying the impact
  6. Monitor: Review logs to see Remora’s risk assessments and reasoning

Conclusion

Market regime risk is one of the most common reasons profitable backtests fail live.

Remora adds a thin, transparent, fail-safe risk layer on top of Freqtrade that helps answer whether current market conditions are safe to trade in. It doesn’t replace your strategy – it protects it.

Beyond Freqtrade: While Remora is optimised for Freqtrade users, the same REST API integration pattern works with any trading bot or custom trading system that can make HTTP requests.
Ready to get started? Visit remora-ai.com to get your free API key and start protecting your Freqtrade strategies from high-risk market conditions.

Resources

About the Author: This article was written as part of building Remora, a production-grade market risk engine for algorithmic trading systems. The system is built using modern Python async frameworks (FastAPI), time-series databases (ClickHouse), and MLOps best practices for real-time data aggregation and risk assessment.

Have questions about integrating Remora with Freqtrade? Found this useful? I’d love to hear your feedback or see your integration examples. Feel free to reach out or share your experiences.


Remora-ai







Introducing Remora: Building a Real-Time Market Risk Engine | Tech Deep Dive


Introducing Remora: Building a Real-Time Market Risk Engine

I’ve recently launched a new project: Remora. It’s a service for algorithmic traders that adds a layer of market awareness I felt was missing from tools like Freqtrade – helping strategies actually understand the current market and make smarter trading decisions.

When building and backtesting crypto strategies, I noticed that even the “good” ones would get wrecked by just a few bad trades during wild market swings. Remora is my attempt to fix that: it gives trading bots real-time context so they can avoid high-risk conditions and focus on opportunities that actually make sense.

TL;DR: I built Remora, a production-ready market risk microservice using FastAPI, ClickHouse, and modern MLOps practices. It analyzes dozens of real-time market conditions and returns a simple safe_to_trade signal that can filter out 30-60% of losing trades. The system is now live, handling real-time requests with sub-100ms response times, and includes a complete observability stack with Prometheus and Grafana.

The Problem: Trading Bots Need Market Context

The Challenge

When building algorithmic trading strategies, I noticed a consistent pattern:

  • Good strategies were failing not because of bad logic, but because they traded during terrible market conditions
  • 30-60% of losing trades happened during extreme fear, volatility spikes, choppy markets, or bearish news events
  • Strategies had no awareness of current market regime, volatility levels, or external risk factors
  • Single bad entries during panic conditions could wipe out weeks of gains

Traditional trading bots focus on technical indicators (RSI, MACD, moving averages) but completely ignore:

  • Market regime (bull, bear, choppy, panic)
  • Volatility levels (normal vs extreme)
  • External sentiment (Fear & Greed Index, news sentiment)
  • Macro indicators (VIX, DXY, funding rates)
  • Event flags (extreme fear, panic, bear market signals)
Problem: Trading bots were executing trades during conditions that any experienced trader would avoid. They needed a layer of market awareness to filter out high-risk entries.

What Is Remora?

Remora is a real-time market risk microservice that provides trading bots with market context. It analyzes dozens of market conditions in real-time and returns a simple boolean: safe_to_trade.

Core Features

  • Market Regime Detection: Classifies market conditions (bull, bear, choppy, high_vol, panic, sideways)
  • Volatility Scoring: Calculates and classifies volatility (low, normal, high, extreme) using ATR%, Bollinger width, and returns stddev
  • Composite Risk Score: Weighted risk assessment (0-1) from multiple factors
  • Safe-to-Trade Flag: Boolean signal indicating whether conditions are favorable
  • REST API: Real-time API endpoints for live trading integration
  • Multi-Exchange Support: Works with Kraken, Binance, and other CCXT-compatible exchanges
  • Observability Stack: Prometheus metrics and Grafana dashboards

What Remora Monitors

Remora continuously tracks:

  • Technical Indicators: SMA50/200, ADX, ATR%, RSI, Bollinger Bands
  • Market Regime: Trend classification, momentum, trend strength
  • External Data: Fear & Greed Index, CryptoPanic news sentiment, VIX, DXY
  • On-Chain Metrics: BTC dominance, funding rates, liquidation data
  • Event Flags: Extreme fear, panic, bear market, high volatility signals

Example API Response

Listing 1: Remora risk assessment API response

{
  "safe_to_trade": false,
  "risk_score": 0.77,
  "risk_class": "very_high",
  "regime": "choppy",
  "volatility": 0.047,
  "volatility_classification": "low",
  "risk_confidence": 0.73,
  "trend_classification": "sideways",
  "momentum_classification": "neutral",
  "reasoning": [
    "Extreme Fear (F&G=15) - blocking trades",
    "Trading disabled due to global risk conditions"
  ],
  "blocked_by": ["flag_extreme_fear"],
  "risk_breakdown": {
    "volatility": 0.017,
    "regime": 0.24,
    "trend_strength": 0.1,
    "momentum": 0.066,
    "external": 0.35
  },
  "event_flags": {
    "flag_extreme_fear": true,
    "flag_high_volatility": false,
    "flag_downtrend": false,
    "flag_panic": false
  },
  "recommendation": "no_entry",
  "fear_greed_index": 15,
  "btc_dominance": 56.37,
  "funding_rate": 0.000069,
  "vix": 20.52,
  "dxy": 122.24
}
Key Insight: Remora doesn’t just say yes or no. It returns complete transparency: risk scoring, regime classification, volatility levels, event flags, and human-readable reasoning – so you always know why a trade was blocked.

Tech Stack: FastAPI, ClickHouse, and Modern MLOps

Remora is built with a modern, production-ready tech stack designed for real-time performance and scalability:

Backend & API

  • FastAPI: Modern Python web framework with async support, automatic OpenAPI docs, and excellent performance
  • Python 3.11+: Type hints, async/await, modern language features
  • Pydantic: Data validation and settings management
  • CCXT: Unified cryptocurrency exchange API for multi-exchange support
  • APScheduler: Background task scheduling for data updates

Data Storage & Analytics

  • ClickHouse: Columnar database for time-series data, historical risk metrics, and analytics
  • Materialised Views: Pre-aggregated risk data for fast queries (learned from my ClickHouse MLOps work)
  • CSV Output: File-based output for backtesting compatibility with Freqtrade

Observability & Monitoring

  • Prometheus: Metrics collection and time-series storage
  • Grafana: Real-time dashboards for risk metrics, API performance, and system health
  • Custom Metrics: Risk scores, regime classifications, API latency, data freshness

Infrastructure

  • Docker & Docker Compose: Containerized deployment with observability stack included
  • Uvicorn: ASGI server for FastAPI
  • Environment Variables: Configuration management for different environments

External Data Sources

  • Exchange APIs: Kraken, Binance via CCXT
  • Alternative.me: Fear & Greed Index
  • CryptoPanic: News sentiment analysis
  • Yahoo Finance: VIX, DXY macro indicators
  • Coinglass: Funding rates, liquidation data

Why This Stack? FastAPI provides excellent async performance and automatic API documentation. ClickHouse handles billions of time-series records efficiently. Prometheus/Grafana give real-time visibility into system health. Docker makes deployment trivial. Together, they form a production-ready microservice architecture.

System Architecture

Remora follows a microservice architecture with clear separation of concerns. External data sources feed into background data fetchers, which populate the risk calculation engine. The FastAPI layer serves real-time risk assessments to trading bots via REST API endpoints.

Component Breakdown

Listing 2: Remora service structure

remora_service/
├── app/
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration management
│   ├── models.py            # Pydantic models
│   ├── scheduler.py         # Background scheduler
│   ├── metrics.py           # Prometheus metrics
│   ├── engine/              # Risk calculation engine
│   │   ├── risk_calculator.py
│   │   ├── regime_detector.py
│   │   └── volatility_scorer.py
│   ├── data/                # Data fetching and processing
│   │   ├── fetchers/
│   │   └── processors/
│   └── api/                 # API routes
│       └── v1/
│           ├── risk.py
│           ├── regime.py
│           └── volatility.py
├── examples/                # Integration examples
├── tests/                   # Test suite
├── grafana/                 # Grafana dashboards
├── prometheus/              # Prometheus configuration
├── Dockerfile
├── docker-compose.yml
└── requirements.txt

Data Flow

  1. Background Scheduler: Runs every minute (configurable) to fetch fresh market data
  2. Data Fetchers: Collect OHLCV data, external indicators (Fear & Greed, VIX, DXY), and news sentiment
  3. Risk Engine: Calculates technical indicators, classifies regime, scores volatility, computes composite risk
  4. Storage: Results stored in ClickHouse for historical analysis and CSV files for backtesting
  5. API Layer: FastAPI serves real-time risk assessments with sub-100ms response times
  6. Observability: Prometheus collects metrics, Grafana visualizes system health

Implementation Deep Dive

Risk Score Calculation

At the heart of Remora is a composite risk score that quantifies the current market environment. This score combines multiple factors, including volatility, trend strength, momentum, and broader market regimes, into a single 0-1 scale. The higher the score, the riskier the market conditions, helping algorithmic traders avoid potentially catastrophic trades.

Market Regime Classification

Remora continuously evaluates the market and classifies it into regimes such as bull, bear, choppy, or panic. These regimes provide context for trading strategies, so bots can adapt their behaviour dynamically rather than trading blindly. The engine looks at how trends, volatility, and momentum interact to determine the current regime.

Safe-to-Trade Decisions

Using the risk score and regime classification, Remora flags whether market conditions are currently safe for trading. This allows trading bots to pause or reduce exposure during high-risk periods, and resume normal operation when conditions are more favourable. The system also considers external market events to further refine its decisions.

Integration via API

Traders can access Remora’s insights in real time through a simple API. For example, a bot can query the risk engine for a trading pair, receive the current risk assessment, and make informed decisions programmatically.

This design keeps your strategies aware of the market context – the kind of awareness that human traders rely on instinctively, but which most algorithmic bots lack.

Observability and Monitoring

If you don’t already know: I love observability. So this was a good excuse to set up a comprehensive (maybe a little over the top) observability stack for Remora to monitor system health, API performance, and data quality in real-time and in great detail.

Monitoring Infrastructure

The monitoring setup includes:

  • Prometheus: Collects metrics on risk scores, regime classifications, API request rates, latency percentiles, and data freshness from external sources
  • Grafana: Real-time dashboards visualising risk metrics across all tracked pairs, regime distributions, API performance trends, and system health indicators
  • Custom Metrics: Track risk scores per trading pair, volatility classifications, external API error rates, and data age from each source

This observability stack helps ensure the service is performing correctly, data sources are fresh, and API responses are fast. It’s been essential for debugging issues and optimising performance during development and production use.

Results and Impact

Performance Metrics

Metric Value
API Response Time 50-100ms (p95)
Data Update Frequency Every 1 minute (configurable)
Uptime 99.9%+ (with fail-open design)
Concurrent Requests 1000+ req/s (async FastAPI)
Historical Data 619,776 records (2020-2025, 5-min granularity)

Backtesting Results

Backtests across 51,941 trades show:

  • 30-60% of losing trades occur during high-risk conditions that Remora flags
  • Improved win rates when Remora filtering is applied
  • Reduced drawdowns by avoiding entries during extreme volatility
  • Better risk-adjusted returns (Sharpe ratio improvements)

Real-World Usage

Remora is now:

  • Live in production at remora-ai.com
  • Integrated with Freqtrade strategies via REST API
  • Used by traders for both automated and manual trading decisions
  • Open source with Python client library available
Impact: Remora transforms trading bots from blind executors into context-aware systems. By filtering out high-risk entries, strategies can focus on favorable market conditions, leading to better risk-adjusted returns and reduced drawdowns.

Lessons Learned

1. Fail-Open Design Is Critical

Lesson: Remora must never become a single point of failure. If the API is unavailable, strategies should continue trading (fail-open). This ensures Remora enhances strategies without breaking them.

Implementation: All integrations include timeout protection (2s default) and exception handling that defaults to allowing trades.

2. Transparency Builds Trust

Lesson: Traders need to understand why a trade was blocked. Remora returns complete reasoning, risk breakdowns, and event flags – not just a boolean.

Implementation: Every API response includes reasoning array, risk_breakdown by component, and event_flags showing which conditions triggered blocks.

3. Observability Is Not Optional

Lesson: Production systems need real-time visibility. Prometheus and Grafana provide essential insights into system health, API performance, and data freshness.

Implementation: Complete observability stack included in Docker Compose, with pre-configured dashboards for common metrics.

4. Async Architecture Scales

Lesson: FastAPI’s async support enables handling hundreds of concurrent requests with minimal resource usage. This is essential for a microservice that needs to respond quickly.

Implementation: All data fetching and API endpoints use async/await, enabling concurrent request handling.

5. ClickHouse for Time-Series Analytics

Lesson: ClickHouse is perfect for storing and analysing billions of time-series risk records. Materialised views enable fast historical analysis and backtesting.

Implementation: All risk assessments stored in ClickHouse with materialised views for common query patterns (learned from my previous ClickHouse work).

6. Start Simple, Iterate

Lesson: The first version of Remora was much simpler. I started with basic regime detection and volatility scoring, then added external data sources, event flags, and comprehensive reasoning over time.

Implementation: Remora v0.1 had 3 regime types. v0.2 added 6 regime types, external data, event flags, and full observability. Future versions will add ML-based predictions.

Conclusion

Remora transforms trading bots from blind executors into context-aware systems. By providing real-time market risk assessment with complete transparency, it enables strategies to avoid high-risk conditions and focus on favorable market regimes.

Key Technical Achievements:

  • FastAPI async architecture handling 1000+ req/s
  • ClickHouse for efficient time-series storage and analytics
  • Prometheus/Grafana for real-time observability
  • Docker containerization for easy deployment
  • Multi-Exchange support via CCXT

For Your Project:

Open Source Repositories

I’ve created two GitHub repositories to help others get started with Remora:


remora-freqtrade

Complete Freqtrade integration examples and onboarding guides. Includes working strategy templates, configuration examples, and step-by-step tutorials.


remora-backtests

Reproducible backtesting framework with complete methodology, historical data scripts, and visualisation tools. All backtest results are fully reproducible.

Additional Resources

Next Steps: Remora is live and ready to use. If you’re building trading strategies, consider adding market context awareness. Start with one integration, measure the impact, then expand. The fail-open design means you can add Remora without risk – it only enhances, never breaks.

Have you built similar market awareness systems? I’d love to hear about your experiences and any lessons learned. Feel free to reach out or share your story in the comments below.


Insights with MiniLM: Hands-On Text Embeddings for MLOps

From MOT Notes to Insights with MiniLM: A Practical Guide to Text Embeddings

 

Intro: I’m not a data scientist or statistician – I’m a DevOps engineer who got interested in ML through building CarHunch.
 
This post shares what I’ve learned about embeddings through that journey, hopefully presented in a way that other DevOps engineers and people interested in AI/ML can understand and experiment with.
 
The Jupyter notebook is a simplified version of techniques I use in CarHunch at a much larger scale, made quick and easy to run so you can see and the concepts in action.
 

 

Every year, millions of vehicles undergo MOT testing in the UK, generating a massive amount of free-text defect notes that could revolutionize how we understand vehicle maintenance patterns.

 

But there’s a catch – these notes are messy, inconsistent, and nearly impossible to analyze at scale using traditional methods.

 

Consider these real examples from MOT records:

 

“Nearside rear brake pipe corroded”
“Brake hose deteriorated”
“Brakes imbalanced across an axle”
“Headlamp aim too high”
“Exhaust leaking gases”

 

While these notes are invaluable for mechanics, they create a nightmare for data analysis. Every tester phrases things slightly differently, and traditional keyword searches miss the bigger picture. How do you find all brake-related issues when they’re described in dozens of different ways?

 

The answer lies in embeddings – a powerful technique that transforms unstructured text into structured, analyzable data.

 

Embeddings convert text into numeric vectors, placing similar meanings close together in high-dimensional space. With embeddings, “brake hose deteriorated” and “brake pipe corroded” become neighbors – even though the wording differs significantly. This opens up entirely new possibilities for analyzing text data at scale.

 

This post demonstrates a practical, hands-on approach using MiniLM to:

 

  • – Transform messy MOT defect notes into structured embeddings
  • – Cluster similar defects automatically using machine learning
  • – Run semantic search to find related issues by meaning, not just keywords
  • – Visualize the results to understand patterns in vehicle defects

 


Try the Interactive Demo

 

The demonstration is a Jupyter notebook that you can open directly in Google Colab – no setup required on your local machine.

 

Important Note about Google Colab: When you click the link below, you’ll be prompted to sign in to Google. This is completely normal and free – Google Colab requires a Google account to save your work and provide computational resources. Your data remains private, and you can always download your work or run it locally if you prefer.

 

Open the demo in Google Colab

 

Or if you’d rather you can view the repository and run it locally:

github.com/DonaldSimpson/mot_embeddings_demo

 


How It Works: From Text to Insights

 
The demo is comprised of three key steps, each building on the previous one:
 

Step 1: Text to Numbers

 

The MiniLM model (specifically “all-MiniLM-L6-v2”) converts each defect note into a 384-dimensional vector. Think of this as creating a unique “fingerprint” for each piece of text that captures its semantic meaning. Notes about similar issues will have similar fingerprints.

 

Step 2: Finding Patterns

 

K-means clustering automatically groups these fingerprints together. The algorithm discovers that “brake pipe corroded” and “brake hose deteriorated” belong in the same cluster, while “headlamp aim too high” forms its own group. You’ll see this visualized in a 2D scatter plot using PCA (Principal Component Analysis).
 

Step 3: Intelligent Search

 

Semantic search uses cosine similarity to find the most relevant notes for any query. When you search for “brake failure,” it doesn’t just look for those exact words – it finds notes that are semantically similar, even if they use completely different terminology.
 

The notebook demonstrates this with a carefully curated set of real MOT defect notes, including:

 

  • Brake-related issues (pipes, hoses, imbalance)
  • Lighting problems (headlamp aim, functionality)
  • Steering and suspension defects
  • Exhaust system issues
  • Tyre wear problems

 

Each example is designed to show how embeddings capture meaning beyond literal word matching.

 


Hands-On Experimentation: Make It Your Own

 

This isn’t just a static demonstration – it’s a tool for discovery. The notebook is designed for active exploration, and the best way to understand embeddings is to experiment with them yourself.

 

Here’s a roadmap for turning this demo into a more personal learning experience:

 

    1. Start with your own data  
      The most rewarding experiment is using your own MOT notes. Have you had an MOT recently? Try adding those defect notes to see how they cluster with the sample data. You might be surprised by the patterns that emerge.

      notes = [
          "Engine oil leak",
          "Headlight not working", 
          "Nearside front tyre bald",
          "Steering pulling to the left",
          "Brake discs worn and pitted",
          # Add your own notes here...
          "Your defect notes here",
          "More of your defect notes"
      ]

      Suggestion: Try adding notes from different vehicle types (cars, vans, motorcycles) to see if the clustering adapts to different contexts.

 

    1. Play with clustering granularity 
      This is where things can get really interesting. Change the number of clusters and watch how the groupings shift:

      # Try different values: 2, 3, 4, 5, 6...
      kmeans = KMeans(n_clusters=3, random_state=42)

      This uses scikit-learn’s KMeans implementation.

      Start with 3 clusters and gradually increase. You’ll see how the algorithm balances between creating too many small groups versus too few large ones. The visualization will show you exactly how your notes are being grouped – some results might surprise you!

    2.  

    3. Make your own queries 
      The semantic search feature is incredibly powerful. Try queries that test the model’s understanding:

      # Test the model's semantic understanding
      query = "tyre wear"           # Should find tyre-related issues
      query = "steering problem"    # Should find steering defects  
      query = "engine issue"        # Should find engine problems
      query = "safety concern"      # Should find safety-related defects

      Try abstract concepts like “safety concern” or “performance issue” to see how well the model understands context beyond literal word matching.

 

  1. Experiment with different models (for the curious) 
    If you want to see how different embedding models perform, try swapping out MiniLM:

    # Larger, potentially more accurate model 
    model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")
     
    # Or try a model specifically trained for technical text
     
    model = SentenceTransformer("all-mpnet-base-v2")

    These are SentenceTransformer models from the Hugging Face model hub.

    Compare the results – do the clusters change? Are the search results more relevant? This is a great way to understand how model choice affects performance.

  2.  

  3. Scale up and discover patterns 
    Once you’re comfortable with the basics, try working with larger datasets. The DVLA MOT dataset contains millions of records, and you’ll start to see fascinating patterns emerge:

    • Which vehicle makes have the most brake-related failures?
    • Do certain types of defects cluster by geographic region?
    • How do defect patterns change over time?

    This is where embeddings really shine – they can reveal insights that would be impossible to find with traditional keyword searches.

 

Each of these modifications provides immediate feedback – you can see the results directly in the notebook, making it an ideal learning environment.


 

Real-World Applications: CarHunch

 

For CarHunch, I’ve been applying this same approach to millions of MOT records. Embeddings make it possible to:

 

  • Standardize messy defect notes into consistent categories
  • Compare your car’s defects with similar vehicles
  • Surface patterns across the UK fleet (e.g., which makes and models fail most often on brakes)

 

A Surprising ‘Discovery’: The Land Rover Defender Seatbelt issue

DEFENDER2.NET - View topic - Seatbelt catching on @#$!

Sometimes, the most interesting insights come from patterns you’d never expect to find. Take my own Land Rover (original) Defender 110 as an example. When I analyzed its MOT history alongside thousands of similar vehicles, I discovered something surprising:

 

seatbelt damage is the number 1 most common issue for Defenders – not the engine, suspension or rust problems you’d probably expect from a rugged old off-road vehicle!

 

This revelation only became apparent through the kind of clustering and semantic analysis we’re exploring in this notebook. Traditional keyword searches would have missed this pattern entirely, because MOT testers describe seatbelt issues in dozens of different ways:

 

“Seatbelt webbing frayed”
“Driver’s seatbelt damaged”
“Seatbelt retraction mechanism faulty”
“Belt webbing showing signs of wear”

 

But embeddings revealed the underlying pattern: all these different descriptions clustered together as the same fundamental issue.

 

Even more fascinating, the analysis showed this is a design quirk specific to Defenders; the front seatbelts naturally fall right in to the door jambs as there’s nowhere else for them to go (plus the tensioners are weak/slow), so when the doors are closed they get trapped, causing accelerated wear that doesn’t occur in most other vehicles.

 

The Bigger Picture: How This Could Transform Automotive Design

 

This Defender example hints at something much larger: embeddings could impact how car manufacturers identify design flaws and improve vehicle quality. Imagine if every manufacturer had access to this kind of analysis across their entire fleet:

  • Early Warning System: Spot recurring issues before they become widespread problems
  • Design Validation: Verify that design changes actually solve the problems they’re meant to address
  • Cost-Benefit Analysis: Quantify the real-world impact of design decisions on maintenance costs
  • Competitive Intelligence: Understand how your vehicles compare to competitors in terms of reliability

 

Traditional quality control relies on warranty claims and customer complaints – reactive data that comes too late. But MOT data is generated continuously, providing a real-time view of how vehicles perform in the wild. The challenge has always been extracting meaningful insights from the unstructured text that testers write.

 

This is exactly the kind of insight that would be impossible to discover without the semantic understanding that embeddings provide. You can explore this particular analysis yourself with CarHunch’s enhanced hunches feature, which uses the same techniques demonstrated in this notebook.

 

This example is just a small subset of what that larger platform does, showing how embeddings can transform unstructured text data into actionable insights that reveal patterns invisible to traditional analysis methods.

 


From Experimentation to Production

 

Once you’ve experimented with the notebook and understand how embeddings work, you might be wondering: “How do I turn this into a production system?” This is where the journey from data science experimentation to operational ML begins.

 

In my previous post, “MLOps for DevOps Engineers – MiniLM & MLflow demo”, I showed how to take these same embedding techniques and build them into a proper MLOps pipeline. That post covers:

 

  • Containerizing the embedding pipeline with Docker
  • Tracking experiments and model versions with MLflow
  • Automating the entire workflow with Makefiles
  • Building quality gates and reproducibility into the process

 

Think of it this way: this notebook is your playground for understanding embeddings, while the MLOps post shows you how to turn that playground into a production system. The same MiniLM model that powers this interactive demo is the foundation for the automated pipeline in the MLOps example.

 

For DevOps engineers, this represents a natural progression: start with hands-on experimentation to understand the concepts, then apply your existing automation and infrastructure skills to make it production-ready.

 


Key Takeaways

 

For DevOps and SRE engineers curious about machine learning, embeddings represent an excellent entry point:

 

  • No GPU required for basic experimentation
  • Easy to run locally or in cloud environments
  • Immediately useful for messy, real-world text data
  • Natural bridge to production MLOps workflows

 

Give the notebook a try, experiment with your own MOT notes, and discover what insights you can uncover. When you’re ready to take it further, the MLOps post will show you how to automate and scale these techniques.

 

Open the demo in Google Colab

 


 

Contains public sector information licensed under the Open Government Licence v3.0.

MLOps for DevOps Engineers – MiniLM & MLflow demo

MLOps for DevOps Engineers – MiniLM & MLflow pipeline demo

 

As a DevOps and SRE engineer, I’ve spent a lot of time building automated, reliable pipelines and cloud platforms. Over the last couple of years, I’ve been applying the same principles to machine learning (ML) and AI projects.

 

One of those projects is CarHunch, a vehicle insights platform I developed. CarHunch ingests and analyses MOT data at scale, using both traditional pipelines and applied AI. Building it taught me first-hand how DevOps practices map directly onto MLOps: versioning datasets and models, tracking experiments, and automating deployment workflows. It’a a new and exciting area but the core idea is very much the same, with some interesting new tools and concepts added.

 

To make those ideas more approachable for other DevOps engineers, I have put together a minimal, reproducible demo using MiniLM and MLflow.

 

You can find the full source code here:

github.com/DonaldSimpson/mlops_minilm_demo

 

The quick way: make run

The simplest way to try this demo is with the included Makefile; that way all you need is Docker installed

# clone the repo
git clone https://github.com/DonaldSimpson/mlops_minilm_demo.git

cd mlops_minilm_demo

# build and run everything (training + MLflow UI)
make run

 

That one ‘make run’ command will:

  • – Spin up a containerised environment
  • – Run the demo training script (using MiniLM embeddings + Logistic Regression)
  • – Start the MLflow tracking server and UI

 

Here’s a quick screngrab of it running in the console:

Once it’s up & running, open
http://localhost:5001
in your browser to explore logged experiments

 

What the demo shows

– MiniLM embeddings turn short MOT-style notes (e.g. “brakes worn”) into vectors

– A Logistic Regression classifier predicts pass/fail

– Parameters, metrics (accuracy), and the trained model are logged in MLflow

– You can inspect and compare runs in the MLflow UI – just like you’d review builds and artifacts in CI/CD

– Run detail; accuracy metrics and model artifact stored alongside parameters

 

Here are screenshots of the relevant areas from the MLFlow UI:











 

Why this matters for DevOps engineers

    • Familiar workflows: MLflow feels like Jenkins/GitHub Actions for models – every run is logged, reproducible, and auditable

 

    • Quality gates: just as builds pass/fail CI, models can be gated by accuracy thresholds before promotion

 

    • Reproducibility: datasets, parameters and artifacts are versioned and tied to each run

 

    • Scalability: the same demo pattern can scale to real workloads – this is a scaled down version of my local process

 

 

Other ways to run it

 

If you prefer, the repo includes alternatives:

 

    • Python venv: create a virtualenv, install requirements.txt, run train_light.py

 

    • Docker Compose: build and run services with docker-compose up --build

 

    • Make targets: make train_light (quick run) or make train (full run)

 

These are useful if you want to dig a little deeper and see exactly what’s happening

 

Next steps

Once you’re comfortable with this small demo, natural extensions are:

 

    • – Swap in a real dataset (e.g. DVLA MOT data)

 

    • – Add data validation gates (e.g. Great Expectations)

 

    • – Introduce bias/fairness checks with tools like Fairlearn

 

    • – Run the pipeline in Kubernetes (KinD/Argo) for reproducibility

 

    • – Hook it into GitHub Actions for end-to-end CI/CD

 

 

Closing thoughts

DevOps and MLOps share the same DNA: versioning, automation, observability, reproducibility. This demo repo is a small but practical bridge between the two

 

Working on CarHunch gave me the chance to apply these ideas in a real platform. This demo distills those lessons into something any DevOps engineer can try locally.

 

Try it out at github.com/DonaldSimpson/mlops_minilm_demo and let me know how you get on

 

CarHunch – Vehicle Insights Platform

CarHunch Logo

Turning billions of MOT and accident records into real-time vehicle insights.

Visit the live project here:

www.carhunch.com


What CarHunch Does

  • Aggregates billions of MOT test results and STATS19 UK accident records.
  • Provides real-time analytics on vehicle makes, models, years, and conditions.
  • Compares a specific car against similar vehicles (make/model/year).
  • Highlights common MOT failures and safety risks for different vehicles.

How It Works

CarHunch is powered by a ClickHouse data warehouse for ultra-fast queries, with:

  • Python ETL pipelines for MOT and accident data ingestion.
  • Incremental updates from DVLA bulk & delta files.
  • Redis caching for instant lookups.
  • Machine learning (MiniLM embeddings + clustering) to spot defect patterns.
  • LLM integration (LLaMA) to generate natural-language insights.

Example Insights

“Your 2010 Ford Focus has a 28% higher MOT failure rate than average for similar cars, mainly due to suspension wear.”

“BMW 3 Series (2008–2012) commonly fail MOTs due to brake issues around 80,000 miles.”

“Motorcycles show a different pattern of MOT failures compared to cars, with lighting and tyre defects being most common.”

Technical Overview

CarHunch isn’t just about insights — it’s also a demonstration of building a modern, high-performance OLAP data platform from the ground up.

  • Database: ClickHouse OLAP warehouse for real-time analytics on billions of records.
  • ETL: Python pipelines ingesting DVLA MOT bulk/delta files and STATS19 accident datasets.
  • Data Modeling: Normalised vehicle/test/defect schema with indexing and partitioning for query performance.
  • APIs: REST endpoints (Flask/FastAPI) serving real-time queries to front-end applications.
  • Caching: Redis for ultra-fast repeated lookups.
  • Machine Learning: MiniLM embeddings + HDBSCAN clustering for identifying defect patterns and grouping similar vehicles.
  • LLM Integration: Local LLaMA models for natural-language explanations and summaries.
  • Deployment: Dockerised services on a Proxmox node, easily portable to cloud infrastructure.
  • Monitoring: Logging & system metrics (rsyslog, lm-sensors) for reliability and performance tracking.

Why CarHunch?

CarHunch shows how big data + AI can turn raw government datasets into meaningful insights that benefit both consumers and the automotive industry.

👉 Explore more at

CarHunch.com

CarHunch Screenshot

 
Get in touch
if you’d like to collaborate or learn more.

Integrating Solana with GitHub Workflows for Enhanced CI/CD

Intro

Being a fan of Solana and interested in exploring and using the technology, I wanted to find some practical use for it in my role as a DevOps Engineer.

This post attempts to do that, by integrating Solana in to a CI/CD workflow to provide an audit of build artefacts. Yes, there are many other ways & tools you could do this, but I found this particular combination interesting.

Overview

Solana is a high-performance blockchain platform known for its speed and scalability.

Integrating Solana with GitHub Workflows can bring a new level of security, transparency, and efficiency to your CI/CD pipelines.

This blog post demonstrates how to leverage Solana in a GitHub Workflow to enhance your development and deployment processes.

What is Solana?

Solana is a decentralised blockchain platform designed for high throughput and low latency. It supports smart contracts and decentralized applications (dApps) with a focus on scalability and performance. Solana’s unique consensus mechanism, Proof of History (PoH), allows it to process thousands of transactions per second.

Why Integrate Solana with GitHub Workflows?

Integrating Solana with GitHub Workflows can provide several benefits:

  • Immutable Build Artifacts: Store cryptographic hashes of build artifacts on the Solana blockchain to ensure their integrity and immutability.
  • Automated Smart Contract Deployment: Use Solana smart contracts to automate deployment processes.
  • Transparent Audit Trails: Record CI/CD pipeline activities on the blockchain for transparency and auditability.

Setting Up Solana in a GitHub Workflow

Let’s walk through an example of how to integrate Solana with a GitHub Workflow to store build artifact hashes on the Solana blockchain.

Step 1: Install Solana CLI

Ensure you have the Solana CLI installed on your local machine or CI environment:

sh -c "$(curl -sSfL https://release.solana.com/v1.8.0/install)"
Step 2: Set Up a Solana Wallet

Then, you need a Solana wallet to interact with the blockchain. You can use the Solana CLI to create a new wallet:

solana-keygen new --outfile ~/my-solana-wallet.json

This command generates a new wallet and saves the keypair to ~/my-solana-wallet.json.

Step 3: Create a GitHub Workflow

Create a new GitHub Workflow file in your repository at .github/workflows/solana.yml:

name: Solana Integration

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Solana CLI
        run: |
          sh -c "$(curl -sSfL https://release.solana.com/v1.8.0/install)"
          export PATH="/home/runner/.local/share/solana/install/active_release/bin:$PATH"
          solana --version

      - name: Build project
        run: |
          # Replace with your build commands
          echo "Building project..."
          echo "Build complete" > build-artifact.txt

      - name: Generate SHA-256 hash
        run: |
          sha256sum build-artifact.txt > build-artifact.txt.sha256
          cat build-artifact.txt.sha256

      - name: Store hash on Solana blockchain
        env:
          SOLANA_WALLET: ${{ secrets.SOLANA_WALLET }}
        run: |
          echo $SOLANA_WALLET > ~/my-solana-wallet.json
          solana config set --keypair ~/my-solana-wallet.json
          solana airdrop 1
          HASH=$(cat build-artifact.txt.sha256 | awk '{print $1}')
          solana transfer <RECIPIENT_ADDRESS> 0.001 --allow-unfunded-recipient --memo "$HASH"
Step 4: Configure GitHub Secrets

To securely store your Solana wallet keypair, add it as a secret in your GitHub repository:

  1. Go to your repository on GitHub.
  2. Click on Settings.
  3. Click on Secrets in the left sidebar.
  4. Click on New repository secret.
  5. Add a secret with the name SOLANA_WALLET and the content of your ~/my-solana-wallet.json file.
Step 5: Run the Workflow

Push your changes to the main branch to trigger the workflow. The workflow will:

  1. Check out the code.
  2. Set up the Solana CLI.
  3. Build the project.
  4. Generate a SHA-256 hash of the build artifact.
  5. Store the hash on the Solana blockchain.

Example Output and Actions

After the workflow runs, you can verify the transaction on the Solana blockchain using a block explorer like Solscan. The memo field of the transaction will contain the SHA-256 hash of the build artifact, ensuring its integrity and immutability.

Example Output:
Run sha256sum build-artifact.txt > build-artifact.txt.sha256
b1946ac92492d2347c6235b4d2611184a1e3d9e6 build-artifact.txt
Run solana transfer <RECIPIENT_ADDRESS> 0.001 --allow-unfunded-recipient --memo "b1946ac92492d2347c6235b4d2611184a1e3d9e6"
Signature: 5G9f8k9... (shortened for brevity)
Possible Actions:
  • Verify Artifact Integrity: Use the stored hash to verify the integrity of the build artifact before deployment.
  • Audit Trail: Maintain a transparent and immutable audit trail of all build artifacts.
  • Automate Deployments: Extend the workflow to trigger automated deployments based on the stored hashes.

Conclusion

Integrating Solana with GitHub Workflows provides a powerful way to enhance the security, transparency, and efficiency of your CI/CD pipelines.

By leveraging Solana’s blockchain technology, you can ensure the integrity and immutability of your build artifacts, automate deployment processes, and maintain transparent audit trails.

I have used solutions similar to this previously; by automatically adding a containers hash to an immutable database when it passes testing, while at the same time ensuring that the only images permissable for deployment in the next environment up (e.g. Production) exist on that list, you can (at least help to) ensure that only approved code is deployed.

If you’d like to learn more about Solana they have some great documentation and examples: https://solana.com/docs/intro/quick-start

Enhancing CI/CD Pipelines with Immutable Build Artifacts Using Crypto Technologies

Intro:

In the ever-evolving landscape of software development, ensuring the integrity and security of build artifacts is paramount. As CI/CD pipelines become more sophisticated, integrating cryptocurrency technologies can provide a robust solution for managing and securing build artifacts. This blog post delves into the concept of immutable build artifacts and how crypto technologies can enhance CI/CD pipelines.

Understanding CI/CD Pipelines

CI/CD pipelines are automated workflows that streamline the process of integrating, testing, and deploying code changes. They aim to:

  • Continuous Integration (CI): Automatically integrate code changes from multiple contributors into a shared repository, ensuring a stable and functional codebase.
  • Continuous Deployment (CD): Automatically deploy integrated code to production environments, delivering new features and fixes to users quickly and reliably.

The Importance of Immutable Build Artifacts

Build artifacts are the compiled binaries, libraries, and other files generated during the build process. Ensuring these artifacts are immutable—unchangeable once created—is crucial for several reasons:

  • Security: Prevents tampering and unauthorized modifications.
  • Reproducibility: Ensures that the same artifact can be deployed consistently across different environments.
  • Auditability: Provides a clear and verifiable history of artifacts.

Leveraging Crypto Technologies for Immutable Build Artifacts

Cryptocurrency technologies, particularly blockchain, offer unique advantages for managing build artifacts:

  • Decentralization: Distributes data across multiple nodes, reducing the risk of a single point of failure.
  • Immutability: Ensures that once data is written, it cannot be altered or deleted.
  • Transparency: Provides a transparent and auditable history of all transactions.

Implementing Immutable Build Artifacts in CI/CD Pipelines

  1. Generate Build Artifacts: During the CI process, generate the build artifacts as usual.
   # Example: Building a Docker image
   docker build -t my-app:latest .
  1. Create a Cryptographic Hash: Generate a cryptographic hash (e.g., SHA-256) of the build artifact to ensure its integrity.
   # Example: Generating a SHA-256 hash of a Docker image
   docker save my-app:latest | sha256sum
  1. Store the Hash on a Blockchain: Store the cryptographic hash on a blockchain to ensure immutability and transparency.
   # Example: Using a blockchain-based storage service
   blockchain-store --hash <generated-hash> --metadata "Build #123"
  1. Retrieve and Verify the Hash: When deploying the artifact, retrieve the hash from the blockchain and verify it against the artifact to ensure integrity.
   # Example: Verifying the hash
   retrieved_hash=$(blockchain-retrieve --metadata "Build #123")
   echo "<artifact-hash>  my-app.tar.gz" | sha256sum -c -

Example Workflow

  1. CI Pipeline:
  • Build the artifact (e.g., Docker image).
  • Generate a cryptographic hash of the artifact.
  • Store the hash on a blockchain.
  1. CD Pipeline:
  • Retrieve the hash from the blockchain.
  • Verify the artifact’s integrity using the retrieved hash.
  • Deploy the verified artifact to the production environment.

Benefits of Using Immutable Build Artifacts

  • Enhanced Security: Blockchain’s immutable nature ensures that build artifacts are secure and tamper-proof.
  • Improved Reproducibility: Immutable artifacts guarantee consistent deployments across different environments.
  • Increased Transparency: Blockchain provides a transparent and auditable history of all build artifacts.

Conclusion

Integrating cryptocurrency technologies with CI/CD pipelines to manage immutable build artifacts offers a range of benefits that enhance security, reproducibility, and transparency. By leveraging blockchain’s decentralized and immutable nature, organizations can ensure the integrity and authenticity of their build artifacts, providing a robust foundation for their CI/CD processes.

As the software development landscape continues to evolve, embracing these cutting-edge technologies will be crucial for maintaining a competitive edge and ensuring the reliability and security of software deployments. By implementing immutable build artifacts, organizations can build a more secure and efficient CI/CD pipeline, paving the way for future innovations.

ArgoCD and K8sGPT with KinD

This post covers a lot (very quickly and reasonably easily);

It starts with using Kuberenets in Docker (KinD) to create a minimal but functional local Kubernetes Cluster.
Then, ArgoCD is setup and a sample app is deployed to the cluster.
Finally, k8sgpt is configured and a basic analysis of the cluster is run.

The main point of all of this was to try out k8sgpt in a safe and disposable environment.

Step 1: Install kind

First, ensure you have kind installed.

KinD can be installed quickly and easily with just the following commands:

curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.17.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

Check out this older post for more detail on KinD:
https://www.donaldsimpson.co.uk/2023/08/09/kind-local-kubernetes-with-docker-nodes-made-quick-and-easy/

Step 2: Create a kind Cluster

Create a new kind cluster:

kind create cluster --name argocd-cluster

Step 3: Install kubectl

Ensure you have kubectl installed. You can install it using the following command:

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/

Step 4: Install ArgoCD

  1. Create the argocd namespace:
kubectl create namespace argocd
  1. Install ArgoCD using the official manifests:
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Step 5: Access the ArgoCD API Server

  1. Forward the ArgoCD server port to localhost:
kubectl port-forward svc/argocd-server -n argocd 8080:443
  1. Retrieve the initial admin password:
kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath="{.data.password}" | base64 -d; echo

Step 6: Login to ArgoCD

  1. Open your browser and navigate to https://localhost:8080.
  2. Login with the username admin and the password retrieved in the previous step.

Step 7: Install argocd CLI

  1. Download the argocd CLI:
curl -sSL -o argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
chmod +x argocd
sudo mv argocd /usr/local/bin/
  1. Login using the argocd CLI:
argocd login localhost:8080
  1. Set the admin password (optional):
argocd account update-password

Step 8: Deploy an Application with ArgoCD

  1. Create a new application:
argocd app create guestbook \
    --repo https://github.com/argoproj/argocd-example-apps.git \
    --path guestbook \
    --dest-server https://kubernetes.default.svc \
    --dest-namespace default
  1. Sync the application:
argocd app sync guestbook
  1. Check the application status:
argocd app get guestbook

Step 9: Install K8sGPT

  1. Install K8sGPT CLI:
curl -Lo k8sgpt https://github.com/k8sgpt-ai/k8sgpt/releases/latest/download/k8sgpt-linux-amd64
chmod +x k8sgpt
sudo mv k8sgpt /usr/local/bin/
  1. Configure K8sGPT:
k8sgpt auth --kubeconfig ~/.kube/config

Step 10: Inspect the Cluster with K8sGPT

  1. Run K8sGPT to inspect the cluster:
k8sgpt analyze

Example Output and Possible Associated Actions

Example Output:

[INFO] Analyzing cluster…

[INFO] Found 3 issues in namespace default:

[WARNING] Pod guestbook-frontend-5d8d4f5d6f-abcde is in CrashLoopBackOff state

[WARNING] Service guestbook-frontend is not reachable

[INFO] Deployment guestbook-frontend has 1 unavailable replica

Associated Actions:

  1. Pod in CrashLoopBackOff State:
    • Action: Check the logs of the pod to identify the cause of the crash.
    • Command: kubectl logs guestbook-frontend-5d8d4f5d6f-abcde -n default
    • Possible Fix: Resolve any issues found in the logs, such as missing environment variables, incorrect configurations, or application errors.
  2. Service Not Reachable:
    • Action: Verify the service configuration and ensure it is correctly pointing to the appropriate pods.
    • Command: kubectl describe svc guestbook-frontend -n default
    • Possible Fix: Ensure the service selector matches the labels of the pods and that the pods are running and ready.
  3. Deployment with Unavailable Replica:
    • Action: Check the deployment status and events to understand why the replica is unavailable.
    • Command: kubectl describe deployment guestbook-frontend -n default
    • Possible Fix: Address any issues preventing the deployment from scaling, such as resource constraints or scheduling issues.

Conclusion

Ok, addmitedly that was a bit of a whirlwind, but if you followed it you have successfully deployed ArgoCD to a kind cluster, deployed an application using ArgoCD to that new cluster, then inspected the cluster & app using K8sGPT.


The example output and associated actions provide guidance on how to address common issues identified by K8sGPT.


This setup allows you to manage your applications and monitor the health of your Kubernetes cluster effectively, and being able to spin up a disposable cluster like this is handy for many reasons.

No Man’s Sky Save Editor on Mac

Quick notes on using the No Man’s Sky Save Editor on Mac

Make a local directory to install and keep the required files in

mkdir NMSEditor; cd NMSEditor 

download the jar file from the GitHub repo:

https://github.com/goatfungus/NMSSaveEditor/blob/master/NMSSaveEditor.jar

or if you’re happier using the command line:

curl -L -O https://github.com/goatfungus/NMSSaveEditor/raw/master/NMSSaveEditor.jar

Next, you need java installed to run the jar file with, the easist way for this (and for adding lots of other useful tools to your Mac) is to use HomeBrew, install instructions for that are here:

https://brew.sh/

When you’ve got brew setup, you can then install java with this command:

brew install java

As recommended by brew, then run the following (or similar, depending on your shell) to update your shell & path;

echo 'export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"' >> ~/.zshrc

Remember to open a new shell/terminal session to pick up this change

NMS Save file location

The Steam save file location on Mac is the equivalent of this, you need to update for your user name and whatever your “st_xxx” numbers are…

/Users/<YOURUSERNAME>/Library/Application Support/HelloGames/NMS/st_<my_numbers>

to run the save editor, you can now just do:

java -jar NMSSaveEditor.jar 

when it opens, select the path to your save file based on the above, choose a save slot, modify as desired… and remember to take frequent backups

Frigate – object detection and notifications

Intro

My notes on setting up Frigate NVR for a home CCTV setup.

The main focus of this post is on object detection (utilising a Google Coral TPU) and configuring notifications to Amazon Fire TVs (and other devices) via intregration with HomeAssistant.

There’s a lot to cover and no point in reproducing the existing documentation, you can find full details & info on setting up the main components here:

ZoneMinder
Google Coral Edge TPU
Frigate
HomeAssistant

Background

I used Zoneminder for many years to capture and display my home CCTV cameras. There are several posts – going back to around 2016 – on this site under the ZoneMinder category here

This worked really well for me all that time, but I was never able to setup Object Detection in a way I liked – it can be done in a number of different ways, but everything I tried out was either very resource intensive, required linking to Cloud services like TensorFlow for processing, or was just too flaky and unreliable. It was fun trying them out, but none of them ever suited my needs. Integration and notification options were also possible, but were not straightforward.

So, I eventually took the plunge and switched to Frigate along with HomeAssistant. There was a lot to learn and figure out, so I’m posting some general info here in case it helps other people – or myself in future when I wonder why/how I did things this way….

Hardware

I have 4 CCTV cameras, these are generic and cheap 1080p Network IP cameras, connected via Ethernet. I don’t permit them any direct access to the Internet for notifications, updates, event analysis or anything.

I ran ZoneMinder (the server software that manages and presents the feeds from the cameras) on various hardware over the years, but for the Frigate and HomeAssistant setup I have gone for an energy-efficient and quiet little “server” – an HP ProDesk 600 G1 Mini – it’s very very basic and very low powered… and cost £40 on eBay:

After testing Object Detection using the CPU (this is waaaay too much load for the CPU to cope with longer-term, but really helps to test proves the concept) I have since added a Google Coral Edge TPU to the host via USB. This enables me to offload the detection/inference work to the TPU and spare the little CPU’s energy for other tasks:

Objectives

My key goals here were to:

  1. Setup and trial Frigate – to see if it could fit my requirements and replace ZoneMinder
  2. Add Object Detection – without having to throw a lot of hardware at it or use Cloud Services like TensorFlow
  3. Integrate with HomeAssistant – I’d been wanting to try this for a while, to integrate my HomeKit devices with other things like Sonos, Amazon Fire TVs, etc

Note that you do not need to use HomeAssistant or MQTT in order to use or try Frigate, it can run as a standalone insatnce if you like. Frigate also comes with its own web interface which is very good, and I run this full-screen/kiosk mode on one of my monitors.

Setup and trial Frigate: setting up Frigate was easy, I went for Ubuntu on my host and installed Docker on that, then configured Frigate and MQTT containers to communicate. These are both simply declared in the Frigate config like this:

mqtt:
  host: 192.168.0.27
detectors:
  coral:
    type: edgetpu
    device: usb

Add Object Detection: with Frigate, this can be done by a Google Coral Edge TPU (pic above) – more info here: https://coral.ai/products/accelerator/ and details on my config below. I first trialled this using the host CPU and it ‘worked’ but was very CPU intensive: adding the dedicated TPU makes a massive difference and inference speeds are usually around 10ms for analysis of 4 HD feeds. This means the host CPU is free to focus on running other things (which is just as well given the size of the thing).

    objects:
      track:
        - person
        - dog
        - car
        - bird
        - cat

https://docs.frigate.video/configuration/objects

Integrate with HomeAssistant : Added the HomeAssistant Docker instance to my host, then ran and configured MQTT container for Frigate then configured Frigate + HomeAssistant to work together. This was done by first installing HACS in HA, then using the Frigate Integration as explained here:
https://docs.frigate.video/integrations/home-assistant/

Setup Notifications

Phone notifications – I have previosuly had (and posted about my) issues with CGNAT and expected I would need to set up and ngrok tunnel and certs and jump through all sorts of hoops to get HA working remotely.

HA offers a very simple Cloud Integration via https://www.nabucasa.com/

I trialled this and was so impressed I have already signed up for a year – it’s well worth it for me and makes things much simpler. Phone notifications can be setup under HomeAssistant > Settings > Automations and Scenes > Frigate Notifications – after installing the Frigate Notifications Bueprint via HACS.

I can now open HomeAssistant on my phone from anywhere in the World and view a dashboard that has live feeds from my CCTV cameras at home. I have also set it up to show recently detected objects from certain cameras too.


Amazon FireTV notifications – I have just setup the sending of notifications to the screen of my Amazon Fire TV, this was done by first installing this app on the device:
https://www.amazon.com/Christian-Fees-Notifications-for-Fire/dp/B00OESCXEK
Then installing
https://www.home-assistant.io/integrations/nfandroidtv/
on HA and configuring Notifications as described there. I now get a pop-up window on my projector screen whenever there’s someone at my front gate.

This is a quick (and poor quality) pic of my projector screen (and chainsaw collection) with an Amazon Fire TV 4k displaying a pop-up notification in the bottom-right corner:

This means I now don’t need to leave a monitor on showing my CCTV feeds any more, as I am notified either via my mobile or on screen. And my notifications are only set up for specific object types – people & cars, and not for things it picks up frequently that I don’t want to be alerted on, like birds or passing sheep or cows.

Minor Apple Watch update – these notifications are also picked up on my Apple Watch, which is set to display my phone notifications. So I also get a short video clip of the key frames which is pretty awesome and works well.


My Frigate Config – here’s an example from the main “driveway” camera feed, this is the one I want to be montoring & ntoified about most. It’s using RTSP to connect, record and detect the listed object types that I am interested in:

  driveway:
    birdseye:
      order: 1  
    enabled: True
    ffmpeg:
      inputs:
        - path: rtsp://THEUSER:THEPASSWORD@192.168.0.123:554/1
          roles:
            - detect
            - rtmp
        - path: rtsp://THEUSER:THEPASSWORD@192.168.0.123:554/1
          roles:
            - record        
    detect:
      width: 1280
      height: 720
      fps: 5
      stationary:
        interval: 0
        threshold: 50  
    objects:
      track:
        - person
        - dog
        - car
        - bird
        - cat

The full 24/7 recordings are all kept (one file/hour) for a few days then deleted and can be seen via HA under
Media > Frigate > Recordings > {camera name} > {date} > {hour}

Docker container start scripts

A note of the scripts I use to start the various docker containers.

This would be much better managed under Docker Compose or something, there are plenty of examples of that online, but I’d like to look at setting all of this up on Kubernetes so leaving this as rough as it is for now.

I am also running Grafana and NodeExporter at the moment to keep an eye on the stats, although things would probably look less worrying if I wasn’t adding to the load just to monitor them:

<help!>

I’ll need to do something about that system load; it’s tempting to just get a second HP host & Coral TPU and put some of the load and half of the cameras on that – will see… a k8s cluster of them would be neat.

# Start Frigate container
docker run -d \
--name frigate \
--restart=unless-stopped \
--mount type=tmpfs,target=/tmp/cache,tmpfs-size=1000000000 \
--device /dev/bus/usb:/dev/bus/usb \
--device /dev/dri/renderD128 \
--shm-size=80m \
-v /root/frigate//storage:/media/frigate \
-v /root/frigate/config.yml:/config/config.yml \
-v /etc/localtime:/etc/localtime:ro \
-e FRIGATE_RTSP_PASSWORD='password' \
-p 5000:5000 \
-p 8554:8554 \
-p 8555:8555/tcp \
-p 8555:8555/udp \
ghcr.io/blakeblackshear/frigate:stable

# Start homeassistant container
docker run -d \
--name homeassistant \
--privileged \
--restart=unless-stopped \
-e TZ=Europe/Belfast \
-v /root/ha_files:/config \
--network=host \
ghcr.io/home-assistant/home-assistant:stable

# Start MQTT container
docker run -itd \
--name=mqtt \
--restart=unless-stopped \
--network=host \
-v /storage/mosquitto/config:/mosquitto/config \
-v /storage/mosquitto/data:/mosquitto/data \
-v /storage/mosquitto/log:/mosquitto/log \
eclipse-mosquitto

# Start NodeExporter container
docker run -d \
--name node_exporter \
--privileged \
--restart=unless-stopped \
-e TZ=Europe/Belfast \
-p 9100:9100 \
prom/node-exporter

# Start Grafana container
docker run -d \
--name grafana \
--privileged \
--restart=unless-stopped \
-e TZ=Europe/Belfast \
-p 3000:3000 \
grafana/grafana