Apache Kafka: Open Source Streaming Data Lake for Threat Detection

Customers would normally need to deploy proprietary software with a database and/or data lake, such as a SIEM, to hold the large volumes of data needed to detect threats.

This of course becomes extremely expensive, and you are locked into a vendor.

Apache Kafka provides a platform to deploy your own threat detection data lake.

There are two options: 1: Send data directly in CEF format; 2: Normalise raw logs inside TransformationHub.

Sending data directly works if the data source can send (produce) it to Kafka, or if another tool can do so on its behalf. TransformationHub (TH) sits there passively, waiting for producers (e.g. SmartConnector) to publish data into the TH topic.

You can have other producers pushing data into TH, e.g. ArcSight SmCollector, Apache NiFi, etc.; anything that can produce into Kafka should do. (BTW, next to Producers and Consumers, Kafka also supports its own Kafka Connectors. Don’t confuse them with ArcSight Connectors: those are very simple collectors that will most likely not help you, and it’s not something we support, so I would not go this route.)
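
As a rough illustration of what a custom producer has to do, the sketch below builds a CEF line and hands it to a Kafka producer. It rests on several assumptions: the helper names `build_cef` and `publish_cef` and the topic name `th-cef` are hypothetical and not taken from the TH documentation, and `producer` is assumed to be any object with a `send(topic, value=...)` method, such as `KafkaProducer` from the kafka-python package. An in-memory stand-in is included so the sketch runs without a broker.

```python
def _escape_header(value: str) -> str:
    # CEF header fields must escape backslash and pipe with a backslash.
    return value.replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value: str) -> str:
    # CEF extension values must escape backslash and equals; newlines become \n.
    return value.replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def build_cef(vendor, product, version, event_class_id, name, severity, extensions):
    """Return one CEF:0 line from header fields and a dict of extension key/values."""
    header = "|".join(
        _escape_header(str(field))
        for field in (vendor, product, version, event_class_id, name, severity)
    )
    ext = " ".join(f"{key}={_escape_extension(str(val))}" for key, val in extensions.items())
    return f"CEF:0|{header}|{ext}"


def publish_cef(producer, topic, cef_line):
    """Send a CEF line as UTF-8 bytes; `producer` only needs a send(topic, value=...) method."""
    producer.send(topic, value=cef_line.encode("utf-8"))


class InMemoryProducer:
    """Stand-in for a real Kafka producer, used here only to make the sketch runnable."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value=None):
        self.sent.append((topic, value))


line = build_cef("Acme", "Firewall", "1.0", "100", "Blocked|conn", 5,
                 {"src": "10.0.0.1", "msg": "a=b"})
demo_producer = InMemoryProducer()
publish_cef(demo_producer, "th-cef", line)  # "th-cef" is an assumed topic name
```

With kafka-python installed, the stand-in could be swapped for something like `KafkaProducer(bootstrap_servers="th-host:9092")`, where the host and port are placeholders for your own environment.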

Once you have the data published to the Hub by any means, it supports routing and filtering of CEF data, and it will parse raw data (syslog only) if it is sent to the syslog topic.
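
To make the idea of routing and filtering CEF data concrete, here is a minimal sketch of the concept in plain Python. It is not how TH implements routing; the rule list, the destination topic names (`cef-high-severity`, `cef-firewall`) and the function names are all hypothetical. The header parser is deliberately simple: it leaves escape sequences in place and does not handle an escaped backslash directly before a pipe.

```python
import re

CEF_HEADER_FIELDS = (
    "version", "vendor", "product", "device_version",
    "event_class_id", "name", "severity",
)


def parse_cef_header(line):
    """Split a CEF line into its seven header fields plus the raw extension string."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF line")
    # Split on pipes that are not preceded by a backslash, at most 7 times,
    # so any pipes inside the extension are left alone.
    parts = re.split(r"(?<!\\)\|", line[4:], maxsplit=7)
    if len(parts) < 8:
        raise ValueError("incomplete CEF header")
    header = dict(zip(CEF_HEADER_FIELDS, parts[:7]))
    header["extension"] = parts[7]
    return header


def numeric_severity(header):
    """Return the severity as an int, or -1 when it is not numeric (e.g. 'High')."""
    try:
        return int(header["severity"])
    except ValueError:
        return -1


# Hypothetical rules: (predicate, destination topic); the first match wins.
RULES = [
    (lambda h: numeric_severity(h) >= 7, "cef-high-severity"),
    (lambda h: h["product"] == "Firewall", "cef-firewall"),
]


def route(line, rules=RULES):
    """Return the destination topic for a CEF line, or None to filter the event out."""
    header = parse_cef_header(line)
    for predicate, topic in rules:
        if predicate(header):
            return topic
    return None
```

An event that matches no rule returns `None`, which is how this sketch expresses filtering: the caller simply does not forward it.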

If data is produced by a non-ArcSight connector, events will probably be unparsed. Is there an option for adding custom parsers or parser overrides in TH?

Syslog works if sent to the th-syslog topic; you then deploy CTH (Connector on TH) to parse it and produce standard CEF or BIN data into your CEF/BIN topics. Unless this changed recently and I missed it, CTH supports syslog only, and there is no official way to override the parsers or add a flex. (One partner played with it and loaded custom parsers into it, but also had to “publish” the tweaked CTH back into the TH repo so that the tweaked CTH is deployed again once a pod crashes or a node restarts. Doable if you understand k8s, pods, wrappers, etc., but not exposed in the mgmt UI nor supported.)
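
For readers who have not seen what "parse syslog and produce CEF" involves, the following is a conceptual sketch only. It is not CTH's parser and not a supported way to extend it. It assumes BSD-style (RFC 3164) syslog lines, and the severity mapping, the placeholder header values (`Unknown`, `syslog`, `syslog message`) and the function names are illustrative assumptions.

```python
import re

# BSD-style syslog: <PRI>Mmm dd hh:mm:ss host tag[pid]: message
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)


def parse_syslog(line):
    """Parse one syslog line into a dict; facility and severity are decoded from PRI."""
    match = SYSLOG_RE.match(line)
    if match is None:
        raise ValueError("not a BSD-style syslog line")
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    # PRI = facility * 8 + severity
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields


def syslog_to_cef(line):
    """Convert a syslog line to a CEF line using placeholder header values."""
    fields = parse_syslog(line)
    # Assumed mapping: syslog severity 0 (most severe) .. 7 onto CEF severity 10 .. 0.
    cef_severity = (7 - fields["severity"]) * 10 // 7
    tag = fields["tag"].replace("\\", "\\\\").replace("|", "\\|")
    msg = fields["msg"].replace("\\", "\\\\").replace("=", "\\=")
    host = fields["host"]
    return (
        f"CEF:0|Unknown|{tag}|Unknown|syslog|syslog message|{cef_severity}|"
        f"dvchost={host} msg={msg}"
    )


example = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
converted = syslog_to_cef(example)
```

A real parser does far more than this (device-specific field extraction, timestamps, categorisation), which is why unparsed events from non-ArcSight producers are a practical concern.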




  • https://aiven.io/
  • https://www.conduktor.io/pricing/
  • https://lenses.io/
  • Zookeeper


  • https://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html
  • https://heroku.github.io/kafka-demo/
  • https://github.com/confluentinc/kafka-streams-examples
  • https://github.com/confluentinc/examples
  • https://www.cloudera.com/tutorials/kafka-in-trucking-iot/2.html
