Building and Scaling Robust Zero-code IoT Streaming Data Pipelines with Open Source Technologies

About This Webinar

With the rapid onset of the global COVID-19 pandemic in 2020, the US Centers for Disease Control and Prevention (CDC) quickly implemented a new COVID-19 pipeline to collect testing data from all US states and territories and to produce multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka.

Inspired by this story, we built two demonstration streaming pipelines for ingesting, storing, and visualizing public IoT data (tidal data from NOAA, the National Oceanic and Atmospheric Administration) using multiple open source technologies. The common ingestion technologies were Apache Kafka, Apache Kafka Connect, and Apache Camel Kafka Connector, supplemented with Prometheus and Grafana for monitoring. The first experiment used Open Distro for Elasticsearch and Kibana as the target storage and visualization technologies, while the second experiment used PostgreSQL and Apache Superset.

In this talk, we introduce each technology and the pipeline architecture, then walk through the steps followed, the challenges encountered, and the solutions used to build reliable and scalable pipelines and visualize the results (including tidal periods, ranges, and locations). We compare and contrast the two approaches, focusing on exception handling, scalability, performance, and monitoring, and on the pros and cons of the two visualization technologies (Kibana and Superset).
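As a rough illustration of one of the results mentioned above, a tidal range is the difference between the highest and lowest water level observed over a period. The sketch below is not taken from the webinar: the record layout (`station`, `level_m`) and the sample values are hypothetical placeholders, not NOAA's actual schema or data.

```python
from collections import defaultdict


def tidal_ranges(readings):
    """Return {station: highest level - lowest level} for the readings given.

    Each reading is a dict with hypothetical keys 'station' and 'level_m'
    (water level in metres); these names are assumptions for illustration.
    """
    levels = defaultdict(list)
    for reading in readings:
        levels[reading["station"]].append(reading["level_m"])
    # Range per station: difference between the extreme observed levels.
    return {station: max(vals) - min(vals) for station, vals in levels.items()}


# Made-up sample readings for two stations, for illustration only.
sample = [
    {"station": "A", "level_m": 0.2},
    {"station": "A", "level_m": 1.7},
    {"station": "A", "level_m": 0.5},
    {"station": "B", "level_m": -0.1},
    {"station": "B", "level_m": 0.9},
]
ranges = tidal_ranges(sample)
```

In a streaming pipeline, an aggregation like this would more likely run in the storage or visualization layer (for example as a query over a time window) than in application code.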

Paul Brebner
Technology Evangelist
