
Stream IoT data to S3 - the simple way


First, a short introduction to infinimesh, an Internet of Things (IoT) platform that runs entirely in Kubernetes.

infinimesh enables the seamless integration of the entire IoT ecosystem independently of any cloud technology or provider. infinimesh easily manages millions of devices in a compliant, secure, scalable and cost-efficient way, without vendor lock-in.

We released some plugins over the last few weeks - a task that had been on our roadmap for a while. Here is what we have so far:

Elastic
Connects infinimesh seamlessly to Elastic.

Timeseries
Redis-timeseries with Grafana for time series analysis and rapid prototyping; it can be used in production when configured as a Redis cluster and is ready to be hosted via Redis Cloud.

SAP Hana
All code to connect the infinimesh IoT platform to any SAP Hana instance.

Snowflake
All code to connect the infinimesh IoT platform to any Snowflake instance.

Cloud Connect
All code to connect the infinimesh IoT platform to the public cloud providers AWS, GCP and Azure. This plugin enables customers to use their own cloud infrastructure and to extend infinimesh to other services, such as Scalytics, using their own cloud-native data pipelines and integration tools.

We chose Docker as the main technology because it enables users to run their plugins in their own space, in an environment they control. And since our plugins consume few resources, they fit perfectly into the AWS EC2 free tier - which is what I use in this blog post.

The plugin repository is structured with developer friendliness in mind. All code is written in Go, and configuration is done in the Docker files. Since you have to put credentials into them, we strongly advise running the containers in a controlled and secure environment.

infinimesh UI

Stream IoT data to S3

Here I would like to show how easy it is to combine IoT with infrastructure already running in public clouds. The most common task we have seen is streaming data to S3; most of our customers use S3 either directly on AWS or by running their own object storage that speaks the S3 protocol, such as MinIO - which is also Kubernetes-native.

Of course, you need either a private infinimesh installation or accounts on infinimesh.cloud and AWS if you use the cloud versions of both. Here is a screenshot of the SMA device I used while writing this post:

Preparation

  1. Spin up an EC2 instance in the free tier with Linux; a t2.micro instance should cover most needs
  2. Log into the VM and install Docker as described in the AWS documentation: Docker basics for Amazon ECS - Amazon Elastic Container Service
  3. Install docker-compose and git:
# download the latest docker-compose binary; the release assets use a lowercase OS name
sudo curl -L \
  "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" \
  -o /usr/local/bin/docker-compose \
  && sudo chmod +x /usr/local/bin/docker-compose \
  && sudo yum install git -y
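For reference, on Amazon Linux 2 the Docker installation from step 2 boils down to roughly the following; the exact packages differ on other distributions, so treat this as a sketch and follow the linked AWS documentation:

    sudo yum update -y                  # refresh packages
    sudo yum install -y docker          # on Amazon Linux 2; other distros use their own package
    sudo service docker start           # start the Docker daemon
    sudo usermod -a -G docker ec2-user  # let ec2-user run docker without sudo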

That’s all the preparation we need. Now log off and log back in so the docker group membership (the usermod step above) takes effect.

Setup and Run

  1. Clone the plugin repo:
    git clone https://github.com/infinimesh/plugins.git
  2. Edit CloudConnect/docker-compose.yml and replace every CHANGEME with your credentials (see the sketch after this list)
  3. Compose and start the connector (-d detaches from the console and lets the containers run in the background):
    docker-compose -f CloudConnect/docker-compose.yml --project-directory . up --build -d
  4. Check the container logs:
    docker logs plugins_csvwriter_1 -f
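The CHANGEME values from step 2 live in the environment section of the csvwriter service. The exact keys depend on the plugin version, so the excerpt below is only an illustration; apart from the standard AWS credential variables, every name shown is hypothetical:

    # hypothetical excerpt - check CloudConnect/docker-compose.yml for the real keys
    services:
      csvwriter:
        build: ./CloudConnect
        environment:
          INFINIMESH_API_TOKEN: "CHANGEME"   # illustrative name for the infinimesh credentials
          AWS_ACCESS_KEY_ID: "CHANGEME"      # standard AWS credential variables
          AWS_SECRET_ACCESS_KEY: "CHANGEME"
          S3_BUCKET: "CHANGEME"              # illustrative name for the target bucket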
We used Go as the development language, so the resource consumption is low.
After about a minute the first CSV file should arrive in S3. That’s all - easy and straightforward.
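From any machine with the AWS CLI configured you can watch the files arrive; the bucket name below is a placeholder for the one you configured:

    aws s3 ls s3://CHANGEME-bucket/ --recursive   # new CSV objects should show up here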



Some developer internals

We have built some magic around the plugins to make them as easy as possible for customers to use and, at the same time, easy for developers to adapt.

First we iterate over /objects, find all endpoints marked with [device], call the API for each device, and store the data as a sliding window in a local Redis store to buffer network latency. After a few seconds we send the captured data as CSV to the desired endpoints. In our tests we transported data from up to 2 million IoT devices through this plugin, with each device sending ten key:value pairs as JSON every 15 seconds.
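To make the flow concrete, here is a minimal, self-contained Go sketch of the capture-and-flush loop. It is not the plugin's actual code: the Redis store is replaced by an in-memory buffer, the CSV goes to stdout instead of an S3 endpoint, and the device names, keys and intervals are made up for illustration.

    package main

    import (
        "encoding/csv"
        "fmt"
        "os"
        "sync"
        "time"
    )

    // reading is one key:value pair reported by a device.
    type reading struct {
        device string
        ts     time.Time
        key    string
        value  string
    }

    // slidingBuffer stands in for the local Redis store that absorbs
    // network latency in the real plugin (in-memory here for simplicity).
    type slidingBuffer struct {
        mu   sync.Mutex
        data []reading
    }

    func (b *slidingBuffer) add(r reading) {
        b.mu.Lock()
        defer b.mu.Unlock()
        b.data = append(b.data, r)
    }

    // drain returns the buffered window and resets it.
    func (b *slidingBuffer) drain() []reading {
        b.mu.Lock()
        defer b.mu.Unlock()
        out := b.data
        b.data = nil
        return out
    }

    // pollDevices mimics iterating over /objects and calling the API for
    // each endpoint marked [device]; names and values are placeholders.
    func pollDevices(devices []string, buf *slidingBuffer) {
        for _, d := range devices {
            buf.add(reading{device: d, ts: time.Now(), key: "temperature", value: "21.5"})
        }
    }

    // flushCSV writes the captured window as CSV; the real plugin ships
    // this to the configured endpoint (e.g. S3) instead of stdout.
    func flushCSV(buf *slidingBuffer) error {
        rows := buf.drain()
        if len(rows) == 0 {
            return nil
        }
        w := csv.NewWriter(os.Stdout)
        defer w.Flush()
        for _, r := range rows {
            if err := w.Write([]string{r.device, r.ts.Format(time.RFC3339), r.key, r.value}); err != nil {
                return err
            }
        }
        return nil
    }

    func main() {
        buf := &slidingBuffer{}
        devices := []string{"device-a", "device-b"} // placeholders for real endpoints

        poll := time.NewTicker(15 * time.Second) // devices report every 15 seconds
        flush := time.NewTicker(60 * time.Second)
        defer poll.Stop()
        defer flush.Stop()

        for {
            select {
            case <-poll.C:
                pollDevices(devices, buf)
            case <-flush.C:
                if err := flushCSV(buf); err != nil {
                    fmt.Fprintln(os.Stderr, "flush failed:", err)
                }
            }
        }
    }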
