How to detect data anomalies using a simulator and an IIoT factory model

Oliver Thamm

Detecting data anomalies in a constant event stream of IIoT sensor data


Manufacturing companies look for ways to automatically analyze recorded sensor data, detect errors and improve production processes. Anomaly detection in industrial sensor data is a challenge for many manufacturing companies because the IIoT market offers a large variety of sensor types and manufacturers.

Many popular IIoT applications require anomaly detection, for example asset monitoring and alerting, predictive maintenance for machines, and safety and security on the factory floor.

In this post, we show how to create a constant event stream of sensor data using our IIoT factory model and how to manually add data anomalies. We record the event stream, add the recordings to our simulator so they can be replayed at will, and set up data pipelines that detect the data anomalies and respond by sending text notifications.


Producing a continuous stream of valid data with the camera-based product reader

In a previous blog post, we explained how to build an IIoT conveyor belt model. We made a camera-based product-reader model that reads product data from QR code strips. We also showed how to flexibly create such QR codes from a JSON file with product data.

In this post, we use the same method and add the dimension measures as attributes “x_mm” and “y_mm” with values around 800.0 to the input data file. Vary each dimension measure by less than plus or minus 0.5 mm of tolerance and generate the QR codes. This is a sample of our product data:
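(The attribute names “x_mm” and “y_mm” come straight from the steps above; the IDs and exact values below are illustrative and stay within the 0.5 mm tolerance.)

[
  { "product_id": "P-0001", "x_mm": 800.2, "y_mm": 799.7 },
  { "product_id": "P-0002", "x_mm": 799.6, "y_mm": 800.4 },
  { "product_id": "P-0003", "x_mm": 800.1, "y_mm": 800.3 }
]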

Print the QR codes and glue them together into a strip long enough to wrap around the conveyor belt. Start the IIoT factory model, and it produces a constant stream of product data including the dimension measures.

Manually adding data anomalies to the product data stream

Again, use the QR code maker script to create and print product data encoded in QR codes. This time, the codes should contain data anomalies. Create a new JSON file and vary at least one dimension value per record by more than plus or minus 0.5 mm of tolerance.

Generate the QR codes, print them and cut them out with scissors. Glue the QR codes to the back of coasters so we can throw them onto the conveyor belt of our IIoT factory model at will.

Recording the event stream of IIoT sensor data for a simulator

With the QR code strip and the coasters with encoded product data in place, start the factory model and watch the event stream get printed to the monitor of your Raspberry Pi. By default, the model only produces the valid product data encoded on the QR code strip in an endless loop. Throw a coaster onto the conveyor belt and the model adds the respective data anomaly to the event stream. The factory model does not know what constitutes a data anomaly and only reports sensor data as measured.

Run the IIoT factory model two times for five minutes each and record the data. The first time, do nothing with the coasters, so you record a flawless stream of valid product data. The second time, use the coasters to add a few data anomalies to the recorded event stream. The HTTP API of our Raspberry Pi provides POST endpoints to start and stop the recording. The response body of the stop endpoint contains the recorded data.

The Ruby Sinatra code we added to the open source IIoT Server for recording the event streams looks like this:
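(The following is a simplified sketch with illustrative route names and in-memory storage, not the verbatim IIoT Server code.)

require 'sinatra'
require 'json'

$recording = false
$recorded_events = []

# The factory model posts every sensor event it emits to this endpoint.
post '/event' do
  event = JSON.parse(request.body.read)
  $recorded_events << event if $recording
  status 202
end

# Start a fresh recording, discarding any previously recorded events.
post '/recording/start' do
  $recorded_events = []
  $recording = true
  status 200
end

# Stop recording and return everything recorded in the response body.
post '/recording/stop' do
  $recording = false
  content_type :json
  $recorded_events.to_json
end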

The recorded data returned by the HTTP API contains lots of duplicates, because the limited set of product data QR codes on the conveyor belt repeats in a loop. To get unique product IDs and to save the recorded data to a trace file in which each line holds the data record of a single event in JSON format, run a script on the data that looks like this:
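(A sketch in Ruby; the file names are illustrative, and we simply replace every product ID with a freshly generated one while writing one JSON record per line.)

require 'json'
require 'securerandom'

# The array of events as returned by the stop endpoint, saved to a local file.
events = JSON.parse(File.read('recorded_events.json'))

File.open('iiot_factory_product_dimensions.json', 'w') do |trace|
  events.each do |event|
    # Replace the repeating IDs from the looping QR code strip with unique ones.
    event['product_id'] = SecureRandom.hex(4)
    trace.puts(event.to_json)
  end
end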

Run the factory model twice to get two trace files, one with anomalies and one without any. The trace files can have hundreds of records and look like this:
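(An illustrative excerpt; in the trace file with anomalies, the occasional record falls outside the 799.5 to 800.5 mm window, like the last line here.)

{"product_id":"a1b2c3d4","x_mm":800.2,"y_mm":799.7}
{"product_id":"e5f6a7b8","x_mm":799.6,"y_mm":800.4}
{"product_id":"c9d0e1f2","x_mm":801.3,"y_mm":800.1}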

Replaying multiple event streams of IIoT sensor data with a simulator

We at Xapix added our two trace files to our simulator so that we can replay them at will. In a previous blog post, we briefly explained how to build such a simulator yourself, in case you do not want to use ours.

Replay the trace file without anomalies multiple times simultaneously to simulate a lot of regular data volume. Next, replay a single instance of the trace file containing anomalies. The anomalies are hard to detect within the regular data traffic, which makes them a good test for anomaly detection software.

In our Xapix simulator, starting such a simulation via HTTP API calls looks like this:
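(The calls below are a sketch with a placeholder URL and parameter names; the actual endpoints of the Trace File Player may differ. We start two players for the clean trace and one for the trace with anomalies; add more players with different “id” values for more regular traffic.)

curl -X POST "https://simulator.example.com/start" \
  -H "Content-Type: application/json" \
  -d '{"id": "machine-1", "random_id": "<random_id>", "file_name": "iiot_factory_product_dimensions.json"}'

curl -X POST "https://simulator.example.com/start" \
  -H "Content-Type: application/json" \
  -d '{"id": "machine-2", "random_id": "<random_id>", "file_name": "iiot_factory_product_dimensions.json"}'

curl -X POST "https://simulator.example.com/start" \
  -H "Content-Type: application/json" \
  -d '{"id": "machine-3", "random_id": "<random_id>", "file_name": "iiot_factory_product_dimensions_few_anomalies.json"}'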

Next, you could either create a Xapix account and follow the final steps of this article, or you could set up your own consumers for the Kafka topic and implement the anomaly detection business logic in hand-written scripts.
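For the hand-written route, a consumer could look like the following sketch. It uses the ruby-kafka gem (our choice, not a requirement), the broker, topic and consumer group names that appear later in this post, and the 800.0 plus or minus 0.5 mm bounds defined above.

require 'kafka'
require 'json'

kafka = Kafka.new(['b-1.gen3-demo-stable.a74654.c3.kafka.eu-west-1.amazonaws.com:9092'])
consumer = kafka.consumer(group_id: 'iiotFactorySimulatorAnomalyDetection')
consumer.subscribe('demo_iot_factory-product_simulator_<random_id>')

consumer.each_message do |message|
  product = JSON.parse(message.value)

  # Flag any product whose dimensions fall outside the 0.5 mm tolerance around 800.0 mm.
  out_of_bounds = [product['x_mm'], product['y_mm']].any? { |value| (value.to_f - 800.0).abs > 0.5 }
  puts "PRODUCT_SIZE_ERROR: #{message.value}" if out_of_bounds
end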

Anomaly detection data pipelines using Xapix

If you are interested in using the Xapix simulator together with a graphical tool for creating data pipelines for event streaming use cases, we recommend trying our free Xapix Community Edition. It gives you access to our Kafka demo cluster and the simulator streams. You can build and deploy a simple pipeline that sends text notifications to a webhook.

To get started, create a Xapix account if you haven’t already. If you followed previous posts in this series, just reuse your existing webhook data source. Otherwise, set up a webhook test site that receives and displays the product text notifications by clicking this link and waiting five seconds to be redirected. Keep the URL of the webhook test site around and create a REST Data Source in Xapix Community Edition by clicking “Data Sources” on the left sidebar, then “Add Data Source” and next “Setup Data Source Manually.” Use the webhook URL as the address, set the HTTP method to “post” and make the body parameters look like this screenshot (a rough example follows below). Finally, click “Save Data Source.”
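The screenshot of the body parameters is not reproduced here; roughly, the body carries the product attributes plus a “name” field for the notification type, with placeholder values that the pipeline mapping will later overwrite:

{
  "name": "PRODUCT_NOTIFICATION",
  "product_id": "a1b2c3d4",
  "x_mm": 800.2,
  "y_mm": 799.7
}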

When you click the “Preview Data Source” button, your webhook test site should receive a request and display its content. Once the pipeline is set up successfully, the product data coming from the simulator event stream will be displayed the same way on the webhook test site.

Next, make up a random identifier of six digits and letters and fill it in for <random_id> going forward. On the left sidebar, below “Kafka Event Streams,” click “Add New” and enter the topic “demo_iot_factory-product_simulator_<random_id>”, the consumer group “iiotFactorySimulatorAnomalyDetection” and the initial position “Latest”.

In the same menu create a new Kafka server named “XapixDemoKafkaCluster” with boot servers “b-3.gen3-demo-stable.a74654.c3.kafka.eu-west-1.amazonaws.com:9092,b-1.gen3-demo-stable.a74654.c3.kafka.eu-west-1.amazonaws.com:9092,b-2.gen3-demo-stable.a74654.c3.kafka.eu-west-1.amazonaws.com:9092”. Finally, click “Create Stream”.

Next, on the pipeline dashboard click on the “Event” unit and paste in the following data sample:
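(A record of the same shape as a single trace file line works here; the values are illustrative.)

{
  "product_id": "a1b2c3d4",
  "x_mm": 800.2,
  "y_mm": 799.7
}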

On the pipeline dashboard, create a new decision unit and connect it to the event. Click on the decision unit, create one branch called “out-of-bounds”, and enter a formula for the branch.
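The exact Xapix formula syntax is not reproduced here; logically, the branch should fire whenever either dimension measure leaves the 799.5 to 800.5 mm tolerance window, along the lines of:

x_mm < 799.5 || x_mm > 800.5 || y_mm < 799.5 || y_mm > 800.5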

Connect the decision unit’s branch to the webhook text notification unit, and the webhook unit to the sink. Finally, click on the webhook unit and map all attributes from the Kafka event to the webhook parameters. Change the value of the “name” attribute to “PRODUCT_SIZE_ERROR” or similar.

Publish your project and test your setup by starting your factory. Throw in some data anomaly coasters and watch your pipeline on Xapix Community Edition send text notifications to the webhook.

Replaying the recorded IIoT factory model’s trace files with a simulator

Use the Trace File Player software to replay the factory trace files as described earlier. Use your <random_id> from above and a different “id” value for each simulated machine you start. With cURL, the terminal command to start a simulator with only valid data looks like this:
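(As in the earlier example, the URL and parameter names are placeholders, and the file name of the clean trace is an assumption.)

curl -X POST "https://simulator.example.com/start" \
  -H "Content-Type: application/json" \
  -d '{"id": "machine-1", "random_id": "<random_id>", "file_name": "iiot_factory_product_dimensions.json"}'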

To start a simulator with data anomalies, use the “file_name” value “iiot_factory_product_dimensions_few_anomalies.json” instead.

Join our IIoT community

Got any questions? We are working on numerous follow-ups to this blog series and will be posting updates regularly within our Discord community to help inspire you to build IIoT models and simulators. We would love to hear your ideas too and to collaborate on this fun project.

Please contact us on our Discord community channel if you would like to discuss this tutorial or if you have questions or feedback about the Xapix Community Edition. Or really anything else as well. We look forward to hearing from you.

About the author


Oliver is a senior software developer and an API and data transformation enthusiast. He’s a co-founder of Xapix, and a writer and conference speaker on all things API. Most recently, he has been bringing his perspective and experience to the world of Industrial IoT.

You can get in touch with Oliver via https://twitter.com/POSoftware or find him on our Xapix Community Discord.