Posts

Technical Advisor & Fractional Chief Architect

I help organisations design, build, stabilise, and scale mission-critical distributed data systems, including data platforms, data lake and lakehouse architectures, streaming systems, IoT ingestion, and AI infrastructure.

I work with engineering and platform teams to drive clear architecture decisions, reduce systemic risk, and restore delivery momentum without adding full-time leadership overhead.

See consulting services

Recent articles:

Connect BACnet to the Cloud with bacnet-mqtt-gateway

The bacnet-mqtt-gateway project is an open source protocol bridge that translates BACnet building automation traffic into MQTT messages for cloud and IoT systems. It provides discovery, polling, bidirectional writes, APIs, security, and easy deployment via Docker. Many enterprises struggle to unify BACnet with modern data pipelines and cloud platforms because BACnet is local-network-only and not cloud-ready. This gateway provides a scalable, secure, production-ready adapter for MQTT ecosystems and smart building integrations.

The Problem with BACnet

Building automation runs on BACnet. HVAC controllers, lighting systems, metering equipment: they all speak ASHRAE 135. The protocol handles local control loops well. It fails at cloud ingress. BACnet relies on UDP broadcasts. These do not route over the internet or into VPCs. Your chiller controller cannot talk to AWS IoT Core. Your VAV box cannot publish to an MQTT broker. The air gap between operational technology and modern cl...
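To make the MQTT side concrete, here is a minimal subscriber sketch. The topic layout and JSON payload shape are assumptions for illustration, not the gateway's documented contract.

```python
# Minimal sketch: consume BACnet point values republished over MQTT.
# Topic scheme and payload shape are assumed for illustration; check the
# bacnet-mqtt-gateway docs for the actual contract.
import json
import paho.mqtt.client as mqtt

BROKER_HOST = "broker.example.com"  # hypothetical broker address

def on_connect(client, userdata, flags, rc):
    # Assumed topic scheme: bacnet/<deviceId>/<objectType>/<instance>
    client.subscribe("bacnet/#")

def on_message(client, userdata, msg):
    value = json.loads(msg.payload)
    print(f"{msg.topic}: {value}")

# paho-mqtt 1.x style; 2.x requires a CallbackAPIVersion argument here.
client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER_HOST, 1883)
client.loop_forever()
```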

IoT Device Manufacturing Needs Verifiable Assembly Records

IoT security failures often originate during manufacturing. This article examines how signed firmware metadata and append-only registries improve device provenance and auditability. Using the novatechflow cerbtk proof of concept, we connect established IoT manufacturing guidance with a concrete implementation that records verifiable assembly events and device identities. I recently worked on an IoT project where device provenance came up during a security review. The question was simple: can you prove which firmware was installed on a specific device during manufacturing? The answer was no. Firmware builds existed in CI systems, device identities lived in spreadsheets, and assembly logs sat in a database that any admin could modify. The cryptographic chain that should connect these stages did not exist. This is not unusual. Large-scale IoT deployments depend on manufacturing processes that are rarely verifiable after devices leave the factory. Firmware provenance, ke...
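The core mechanism is small enough to sketch. The following illustrates signing an assembly record and treating the registry as append-only; it shows the pattern, not cerbtk's actual API, and the field names are assumptions.

```python
# Sketch of the core idea: sign a firmware hash at assembly time and append
# the signed record to a registry that is never modified in place.
import hashlib
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

factory_key = Ed25519PrivateKey.generate()  # in practice: an HSM-held key

def record_assembly(device_id: str, firmware: bytes) -> dict:
    record = {
        "device_id": device_id,
        "firmware_sha256": hashlib.sha256(firmware).hexdigest(),
        "timestamp": int(time.time()),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = factory_key.sign(payload).hex()
    return record  # append to the registry; never update or delete entries

# Verification later needs only the factory public key and the registry
# entry, which is what makes the chain auditable after devices ship.
```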

SQL on Streaming Data Does Not Require a Streaming Engine

Most teams do not need continuous stream processing for day-to-day Kafka questions. Kafka data is already written as immutable log segments, and those segments can live in object storage. For bounded queries like tailing recent events, time-window inspection, and key-based debugging, a SQL interface that plans against segment boundaries can replace an Apache Flink or ksqlDB cluster, with clearer costs and less operational overhead. Stream processing engines solved a real problem: continuous computation over unbounded data. Flink, ksqlDB, and Apache Kafka Streams gave teams a way to run SQL-like queries against event streams without writing custom consumers. The operational cost of that solution is widely acknowledged even by vendors and practitioners: you are adopting a distributed runtime with state, checkpoints, and cluster operations. For a large share of the questions teams ask of their Kafka data, a simpler architecture exists: SQL on immutable segments in object storage...
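To illustrate the bounded-query idea, here is a sketch using DuckDB over Parquet files in object storage as a stand-in for a segment-aware SQL layer. The bucket path and schema are assumptions.

```python
# A bounded query over immutable segments in object storage: no streaming
# cluster, just a scan with time and key predicates. Paths/schema assumed.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")  # enables s3:// paths
con.execute("LOAD httpfs")

# "Tail the last hour for one key" expressed as a bounded scan.
rows = con.execute("""
    SELECT ts, key, value
    FROM read_parquet('s3://events/topic=orders/*.parquet')
    WHERE key = 'customer-42'
      AND ts >= now() - INTERVAL 1 HOUR
    ORDER BY ts
""").fetchall()
```

Because the segments are immutable, the query planner can prune whole files by time range before reading them, which is what keeps costs predictable.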

AI Agents Fail in Production for a Boring Reason: Their Data Is Not Immutable, Queryable, or Close Enough

Most agent projects stall not because the model is weak, but because the agent cannot reliably retrieve complete historical context, reproduce decisions, or prove what it saw. The pattern that scales is storage-native: persist immutable facts in object storage, version them with table snapshots, and run ephemeral compute that reads directly from the data layer. This makes agent runs auditable, backfillable, and cheaper to operate than long-lived stateful services tied to ingestion paths. The money is there. The production gap is still massive. Enterprise generative AI spend tripled from $11.5B in 2024 to $37B in 2025, with roughly half landing in infrastructure and model access, depending on how you segment the stack. The point is simple: budgets are moving fast. Source: Menlo Ventures, 2025 State of Generative AI in the Enterprise. Report PDF. At the same time, enterprise IT leaders are telling KPMG they are implementing or planning to implement AI ag...
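Here is a minimal sketch of the storage-native pattern, with local files standing in for an object storage prefix and a table format's snapshots. All names are illustrative assumptions.

```python
# Sketch: each agent run appends immutable fact files; nothing is updated
# in place, so runs stay auditable and backfillable. Layout is assumed.
import json
import time
import uuid
from pathlib import Path

DATA_ROOT = Path("agent-facts")  # stand-in for an object storage prefix

def persist_run(run_id: str, events: list[dict]) -> Path:
    # One immutable file per run; a table format such as Iceberg would
    # register this file in a new snapshot rather than mutating state.
    path = DATA_ROOT / f"run={run_id}" / f"{uuid.uuid4().hex}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("x") as f:  # "x" mode: fail rather than overwrite
        for event in events:
            f.write(json.dumps({"ts": time.time(), **event}) + "\n")
    return path
```

Replaying or auditing a run is then just reading the files under its prefix; the ephemeral compute that produced them can disappear without losing anything.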

Data Processing Does Not Belong in the Message Broker

Apache Kafka made event streaming practical at scale. Pushing data processing into the streaming platform creates recovery, scaling, and isolation problems in production. Vendor documentation, Kafka improvement proposals, and migration case studies point to the same architectural boundary: streaming platforms handle durable transport, processing engines handle state and checkpoints. Separating them leads to systems that scale and recover cleanly. Kafka changed the industry by making event streaming practical at scale. Durable logs, ordering, fan-out, and backpressure turned event-driven systems from fragile prototypes into mainstream infrastructure. Where things get messy is when teams push data processing into the streaming platform itself: Kafka Streams, ksqlDB, broker-side transforms. It starts as convenience and ends as operational coupling. Not because engineers are doing it wrong, but because the streaming layer and the processing layer solve different problems. T...

Kafka on Object Storage Was Inevitable. The Next Step Is Open.

Kafka on object storage is not a trend. It is a correction. WarpStream proved that the Kafka protocol can run without stateful brokers by pushing durability into object storage. The next logical evolution is taking that architecture out of vendor-controlled control planes and making it open and self-hosted. KafScale is built for teams that want Kafka client compatibility, object storage durability, and Kubernetes-native operations without depending on a managed metadata service.

The problem was never Kafka clients

The Kafka protocol is one of the most successful infrastructure interfaces ever shipped. It is stable, widely implemented, and deeply integrated into tooling and teams. The part that aged poorly is not the protocol. It is the original broker-centric storage model. Stateful brokers made sense in a disk-centric era where durability lived on the same machines that ran compute. That coupling forces partition rebalancing, replica movement, disk hot spots, slow recovery, an...