Big Data – API Crazy

NCSC Cyber Assessment Framework, NHS Data Security and Protection Toolkit & NHS Digital Technology Assessment Criteria

May 27, 2026 | May 19, 2026 | mustnotgrumble

The DSPT is an online self-assessment tool that allows organisations to measure their performance against the National Data Guardian’s 10 data security standards. It must be reported yearly by organisations that interact with patient data. The DTAC is an assessment framework for care commissioners and providers to use when assuring digital health technology (DHT) products. It is… Continue reading NCSC Cyber Assessment Framework, NHS Data Security and Protection Toolkit & NHS Digital Technology Assessment Criteria

International AI Bodies and their Powers

Featured | mustnotgrumble

Every major economy now has an AI safety body. The international network that came out of the Seoul Summit in 2024 has grown to include the UK, the US, the EU, France, Germany, Japan, South Korea, Canada, Singapore, India and Australia. On paper it looks like coordinated global governance. In practice, almost none of these… Continue reading International AI Bodies and their Powers

Understanding AI Classifiers, Terminologies and Terminology Engines

May 20, 2026 | May 18, 2026 | mustnotgrumble

These three concepts get conflated constantly in healthcare informatics conversations. All three deal with clinical codes, yet they do fundamentally different jobs. Understanding what each does is critical if you're building anything that touches coded clinical data.
Category | What it does | Major examples | Healthcare context
Terminology Clinical code systems Defines the codes, concepts, descriptions and… Continue reading Understanding AI Classifiers, Terminologies and Terminology Engines

Using CatBoost.ai in Healthcare

May 18, 2026 | May 18, 2026 | mustnotgrumble

Because SNOMED and ICD codes must be treated as categories for gradient boosting
The 8 main blood types (A+, A-, B- rare, B+, O+, O- universal, AB+, AB-) are categories. If I used label encoding, then each category becomes an integer (e.g. A+ = 0, B- = 1, AB+ = 2). This is compact but introduces a false ordering and the… Continue reading Using CatBoost.ai in Healthcare
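The false-ordering point in the excerpt above can be made concrete with a minimal sketch. It uses only the Python standard library and is not the post's own code. The first three positions of `BLOOD_TYPES` follow the excerpt's example mapping (A+ = 0, B- = 1, AB+ = 2); the order of the remaining five is an assumption made for illustration. One-hot encoding is swapped in here purely as a contrast and is not claimed to be how CatBoost handles categorical features.

```python
# Illustrative only: the first three entries follow the excerpt's example
# mapping; the order of the remaining entries is an assumption.
BLOOD_TYPES = ["A+", "B-", "AB+", "A-", "B+", "O+", "O-", "AB-"]


def label_encode(values, categories):
    """Map each category to its integer position in `categories`."""
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values, categories):
    """Map each category to a 0/1 vector with a single 1."""
    return [[1 if v == cat else 0 for cat in categories] for v in values]


def hamming(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


samples = ["A+", "B-", "AB+"]
labels = label_encode(samples, BLOOD_TYPES)
one_hot = one_hot_encode(samples, BLOOD_TYPES)

# Label encoding implies A+ is "closer" to B- than to AB+,
# which has no clinical meaning.
label_gap_a_b = abs(labels[0] - labels[1])
label_gap_a_ab = abs(labels[0] - labels[2])

# One-hot vectors are equally far apart for every pair of distinct types.
one_hot_gap_a_b = hamming(one_hot[0], one_hot[1])
one_hot_gap_a_ab = hamming(one_hot[0], one_hot[2])

print(labels, label_gap_a_b, label_gap_a_ab)
print(one_hot_gap_a_b, one_hot_gap_a_ab)
```

With label encoding the gap between A+ and AB+ is twice the gap between A+ and B-, purely as a side effect of the arbitrary numbering. A model that reads the integers as magnitudes can pick up on that artefact.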

MLOps Toolsets for Different ML Types

Featured | mustnotgrumble

I've been playing with various ML training tools and different monitoring and operations tools. I've been unsure whether one size fits all (e.g. LangSmith or MLflow) or whether certain tools are more proportionate to the need. I haven't gone into licensing costs, but for each ML type I have put together a list of… Continue reading MLOps Toolsets for Different ML Types

MLOps for Scikit-learn

May 17, 2026 | May 14, 2026 | mustnotgrumble

Setting up MLOps for repeatable pipelines when using scikit-learn
Not every AI problem requires a large language model. In many enterprise environments, the most valuable systems are still well-engineered, explainable, repeatable and operationally governed. This is where classical machine learning pipelines still provide benefit, and scikit-learn remains one of the strongest foundations for these… Continue reading MLOps for Scikit-learn

Ducks on Icebergs

Featured | mustnotgrumble

Federating Data Between Snowflake and Databricks with DuckDB and Apache Iceberg
If you're running both Snowflake and Databricks, and most enterprises I work with are, you've probably hit the federation problem. Data lives in both platforms, analysts need to query across them, and the obvious solutions (ETL everything into one place, or pay… Continue reading Ducks on Icebergs

EU Sovereign Cloud List

Featured | mustnotgrumble

The rule of law is a fundamental principle running from the Mesopotamian Code of Ur-Nammu, through Magna Carta, to the International Criminal Court's decision to ditch Microsoft Office for European open source alternatives. Data sovereignty requires certainty that services will never be terminated or left at the mercy of a governmental body. For this reason I find it useful… Continue reading EU Sovereign Cloud List

Explaining Medallion Data Architectures in Healthcare

Featured | mustnotgrumble

Faster Insight, Better Reuse, and Scalable Data Foundations
Healthcare organisations face growing demand for better use of data: improving operational performance, supporting population health management, enabling AI, and accelerating research. Yet many still rely on fragmented pipelines, duplicated transformations, and slow bespoke data requests. At the same time, the economics of technology have changed. Modern… Continue reading Explaining Medallion Data Architectures in Healthcare

A Patient-Centric Approach to Medical Data using Containers

Featured | mustnotgrumble

Medical data ownership needs to pass from the trust to the patient to enable ML diagnostics and better research collaboration. The best way of achieving this without being compelled by the hyperscalers is to move to a persistent, container-based architecture.