Apache Kafka: Open Source Streaming Data Lake for Threat Detection

Customers normally need to deploy proprietary software with a database and/or data lake to hold the large volumes of data required to detect threats, such as inside a SIEM.

This, of course, becomes extremely expensive, and you are locked into a vendor.

Apache Kafka provides a platform to deploy your own threat detection data lake.

You have two options: 1) send data directly in CEF format; 2) normalise raw logs inside Transformation Hub.

This works if the data source can produce directly to Kafka, or if there is another tool to do so. Transformation Hub sits there passively, waiting for producers (i.e. a SmartConnector) to publish data into the TH topic.

You can have other producers pushing data into TH: e.g. an ArcSight SmartConnector, Apache NiFi, etc.; anything that can produce into Kafka should do. (By the way, alongside Producers and Consumers, Kafka has its own Kafka Connectors. Don't confuse them with ArcSight Connectors; they are very simple collectors that will most likely not help you, and they are not something we support, so I would not go this route or confuse the two.)

Once you have the data published to the Hub by any means, it will support routing and filtering of CEF data, and it will parse raw syslog-only data if sent to the syslog topic.

If data is produced by a non-ArcSight connector, events will probably be unparsed. Is there an option for adding custom parsers or parser overrides into TH?

Syslog works if sent to the th-syslog topic; you then deploy CTH (Connector on TH) to parse it and produce standard CEF or BIN data into your CEF/BIN topics. Unless this changed recently and I missed it, CTH supports syslog only, and there is no official way to override the parsers or add a flex. (One partner played with it and loaded custom parsers into it, but also had to "publish" the tweaked CTH back into the TH repo so the tweaked CTH is redeployed once a pod crashes or a node restarts. Doable if you understand k8s, pods, wrappers, etc., but not exposed in the management UI and not supported.)
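Where no SmartConnector exists, a thin custom producer can format events as CEF before publishing them to a TH topic. The sketch below is an illustration, not the product's method: the vendor/field values and the th-cef topic name are made-up assumptions, and the kafka-python client shown in the comment is just one possible Kafka library.

```python
# Sketch: build a CEF:0 record (pipe-separated header, space-separated
# key=value extension) ready to publish to a Kafka/TH topic.
# All field values and the topic name below are illustrative.

def to_cef(vendor, product, version, signature_id, name, severity, extensions):
    def esc(value):  # pipes/backslashes must be escaped inside the header
        return str(value).replace("\\", "\\\\").replace("|", "\\|")

    header = "|".join(["CEF:0", esc(vendor), esc(product), esc(version),
                       esc(signature_id), esc(name), esc(severity)])
    extension = " ".join(f"{k}={v}" for k, v in extensions.items())
    return f"{header}|{extension}"

event = to_cef("Acme", "AuthGateway", "1.0", "100", "Failed login", 5,
               {"src": "10.0.0.5", "duser": "alice", "outcome": "failure"})
print(event)

# Publishing is then a single call with any Kafka client, e.g. kafka-python:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="th-host:9092")
#   producer.send("th-cef", event.encode("utf-8"))
```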




  • https://aiven.io/
  • https://www.conduktor.io/pricing/
  • https://lenses.io/
  • Zookeeper


  • https://docs.confluent.io/current/streams/kafka-streams-examples/docs/index.html
  • https://heroku.github.io/kafka-demo/
  • https://github.com/confluentinc/kafka-streams-examples
  • https://github.com/confluentinc/examples
  • https://www.cloudera.com/tutorials/kafka-in-trucking-iot/2.html

AWS Logging and Monitoring Design


This comes with a practical and tangible action plan, not just theoretical fluff that hackers ignore.

Firstly, as much as AWS wants to advertise how secure it is, enabling logging and monitoring in AWS is:

  1. Not straightforward.
  2. Missing a lot of information from AWS, which falls under your side of the shared responsibility model (the public cloud security get-out-of-jail card).
  3. There are many ways to skin the cat, but no real best practices.
  4. You need to be well aware of the service limits.
  5. AWS releases new products that don't exist anywhere else, so you have no idea what can be abused/exploited or how to detect these threats. (Of course, no one is going to question a behemoth, because everyone wants to work for them, right?)
  6. They always advise you that the product is documented, but don't give you any advice on business outcomes and gaps.
  7. Here is a HUGE example:
    1. AWS CloudWatch agents are used to collect OS logs and metrics, but they do not integrate with AWS Security Hub, so the majority of your threat exposure isn't covered by AWS security! You need another solution to detect and correlate these threats. They offer a custom science experiment for you to develop your own SIEM. hahahah https://aws.amazon.com/blogs/security/how-to-monitor-and-visualize-failed-ssh-access-attempts-to-amazon-ec2-linux-instances/
  8. Check the AWS "HCL". People look at the features, but a basic tenet of solution architecture is to check the hardware compatibility list; this applies to AWS, where you need to check what is not supported.
    1. CloudTrail unsupported services: https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-unsupported-aws-services.html
  9. The AWS documentation is vague at best, often just plagiarising existing information and giving zero insight or advice.
  10. AWS encourages customers to build bespoke AWS solutions to keep you locked in, without considering any business requirements, e.g. https://aws.amazon.com/solutions/implementations/centralized-logging/
  11. AWS ("complementary and additive") native architecture:
    1. AWS forces you to use all of their services for a single requirement, making Bezos a trillionaire. It's a nonsensical, intricate web where no one has a clue what is going on. Look at this example from the Security Hub FAQ:
    2. https://docs.aws.amazon.com/securityhub/latest/userguide/control-finding-list.html
    3. Q: Will Security Hub replace the consoles of our other security services, such as Amazon GuardDuty, Amazon Inspector, or Amazon Macie?
    4. No. Security Hub is complementary and additive to the AWS security services. In fact, Security Hub will link back into the other consoles to help you gain additional context. Security Hub does not replicate the setup, configuration, or specialised features available within each security service.
    5. CloudTrail can also send logs into CloudWatch Logs (I have no clue why you would need to do that).
    6. Also, DNS traffic is not captured in VPC Flow Logs, VPC Flow Logs are not real-time, and they do not support some instance types:
      1. https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-s3.html
      2. https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html
      3. You can't modify a Flow Log's configuration parameters once it is created. Instead, you have to delete it and create a new log. That's not difficult, but it's a bit annoying from a usability perspective.
    7. Network interfaces with multiple IP addresses will have data logged only for the primary IP as the destination address. This makes Flow Logs less useful in configurations involving multiple IPs on a single interface.
    8. Flow Logs exclude traffic related to DHCP requests and Amazon DNS activity. (Traffic for a non-Amazon DNS server is logged.) In many cases this may not matter, but it is a limitation if you need to troubleshoot an issue related to DHCP or DNS. For example, you may be experiencing poor performance due to slow DNS resolution. There are also valuable security insights you can glean from DHCP and DNS traffic, such as detecting packet-sniffing attempts by looking for unusual rates of IP conflicts, usage of the same MAC address by multiple hosts, or the sharing of DNS records by machines with the same IP address.

When the execs decided to digitally transform into AWS, did they evaluate the cost of talent? AWS isn't a single product; as of this writing it is 170 products that get upgraded and changed on a daily basis. Did you assess this risk? Of course you didn't. Oh, and don't get me started on the multi-cloud stupidity.

This is why AWS is just so easy to master! And also super easy to secure! 🙂 🙂 🙂 🙂

"Nobody got fired for buying IBM", as the old proverb goes. Now it's the public cloud!


AWS Security Actionable Security Monitoring Plan

You should make sure you get a clear answer from AWS for the following questions;

  • So you're logging, that's great... what are you detecting?
  • What is your best practice for sending logs into a central SIEM?
  • Can you list top use cases AWS cover/detect?

Threat Detection SOC Use cases;

Essentially, you need to log everything centrally (for investigations and compliance) and detect threats. Know what you are logging and what you can detect. You should run a Red Team exercise against this configuration to see what you can and cannot detect.

From a Security Operations perspective, the following are the key use cases required to support your Incident Response Plans:

  1. Threat Detection and Alerting.
  2. Governance and Compliance Reporting.
  3. Investigation Searches and Digital Forensics.

Cloud Control Plane vs Cloud Data Plane Concept

To establish baseline monitoring, security teams should gather and process the following:

  • Cloud control plane logs (such as AWS CloudTrail logs)
  • Data Plane Workload OS/application logs
  • AWS Product (Access Logs)
  • Network flow logs for virtual private clouds (VPCs)
  • Inventory your threat landscape and exposure

Requirements for Threat Detection

  • Event Sources
  • Metrics
  • UpStream Security Monitoring
  • Detection Rules

Cloud Control Plane Logging

First, there’s the idea of a control plane. The control plane is the master controller (usually in the form of a master node) and includes API services, scheduling capabilities for containers and operational management tools/services. A master-level configuration database is also maintained in the control plane. In general, the control plane can be considered the brains of the Kubernetes infrastructure, and it needs to be very carefully protected.

Focus on the types of events that could be problematic to the environment. Examples include critical assets accessed or changed, identity policies modified, cryptographic keys deleted or changed, and so on.

Data Plane

AWS Product Access Logs

On top of the control and data planes, you need to consider the access logs for specific AWS products/services. For services such as Amazon CloudFront, the access logs are not captured via the control plane, so you need to capture access logs, account activity, and configuration separately.

AWS Detective

AWS Budget

Billing alarms—If you have a reasonable idea of a monthly billing range, you can break this down to define “checkpoints” that your bill should be at any given time. If these thresholds are crossed, you can be alerted and investigate the reason for the additional cost. Tools like AWS Budgets provide simple alerting and reporting for cloud billing.

  • Resources and resource utilization—Cloud control plane logs from services like AWS CloudTrail can (and should) be heavily leveraged to monitor new, modified and deleted assets in the environment, as well as access to assets and service interaction in the cloud environment. These logs need to be integrated with a SIEM and/or cloud-native cloud monitoring solution like Amazon CloudWatch to build the appropriate triggers for alerting, as well as monitoring and reporting metrics as warranted. Some behavioral trending over time can also be assessed and reported through analytics tools like AWS Security Hub and Amazon GuardDuty, as well
  • https://console.aws.amazon.com/billing/home#/
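The checkpoint idea above can be sketched in a few lines, assuming a flat spend profile (real bills rarely grow linearly, and AWS Budgets provides this natively; the 20% tolerance is an arbitrary illustrative threshold):

```python
# Sketch: pro-rata billing checkpoints with an alert threshold.

def billing_checkpoint(expected_monthly, day_of_month, days_in_month=30,
                       tolerance=0.20):
    """Return (expected spend to date, alert threshold), assuming
    spend accrues evenly across the month."""
    expected_to_date = expected_monthly * day_of_month / days_in_month
    return expected_to_date, expected_to_date * (1 + tolerance)

def should_alert(actual_to_date, expected_monthly, day_of_month,
                 days_in_month=30):
    _, threshold = billing_checkpoint(expected_monthly, day_of_month,
                                      days_in_month)
    return actual_to_date > threshold

# Mid-month check against a $3,000/month budget:
print(should_alert(2100, 3000, 15))  # spend is running ahead of plan
print(should_alert(1600, 3000, 15))  # still within tolerance
```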

Amazon CloudWatch filters

Identity and Access Management (IAM) and KMS

Monitor your user activity within the cloud. Admins, in particular, should be monitored carefully, because these accounts are prime targets for attackers. Any nonfederated user access should also be a high priority.
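A minimal sketch of that triage, using CloudTrail's documented userIdentity.type values; treating only AssumedRole/FederatedUser sessions as federated is a simplifying assumption here, as are the priority labels:

```python
# Sketch: flag root and non-federated console logins from CloudTrail records.

def triage_console_login(record):
    """Return a priority label for ConsoleLogin records, None otherwise."""
    if record.get("eventName") != "ConsoleLogin":
        return None
    id_type = record.get("userIdentity", {}).get("type")
    if id_type == "Root":
        return "high: root account console login"
    if id_type not in ("AssumedRole", "FederatedUser"):
        return "high: non-federated user console login"
    return "info: federated console login"

root_login = {"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"}}
iam_login = {"eventName": "ConsoleLogin",
             "userIdentity": {"type": "IAMUser", "userName": "dave"}}
print(triage_console_login(root_login))  # high: root account console login
print(triage_console_login(iam_login))   # high: non-federated user console login
```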

Network Security

VPC Flow Logs for your VPCs; they are not enabled by default.

Endpoint Security

AWS Inspector

AWS Config

AWS Config provides a detailed view of the configuration of AWS resources in your AWS account. This includes how the resources are related to one another and how they were configured in the past so that you can see how the configurations and relationships change over time.

However, AWS Config only collects information about EC2/VPC-related resources, not everything in your AWS account.

You should monitor changes to your AWS real estate and ensure all changes go via ITIL Change Management and/or approved automation only.

Firstly, you need to understand which AWS services and/or devices are in scope, then map their AWS-native security logging into ArcSight SmartConnectors.

Click on Resource Groups next to the AWS Services in your aws console page, and select All Regions in region field and All Resources in the resources field. You will get the list of all the resources up and running in your AWS account. You can even tag them separately so you can check how much each resource is costing you.
This can also be done through the AWS CLI, using the resourcegroupstaggingapi command shown below.

  • Adding context—If logs can be “tagged” as originating from a specific ISP or CSP, that can help provide context on the use cases of the service. For example, logs from identity management services like AWS Identity and Access Management (IAM) have a specific user context, whereas events from Amazon EC2 may need additional details about workloads to provide the proper context for evaluation.
aws resourcegroupstaggingapi get-resources --region region_name
relationships.resourceId = 'vpc-#######'

What do you use: AWS SecurityHub, GuardDuty, CloudWatch, CloudTrail or EventHub?

The answer is that all of these are complementary and additive services. So let's examine each of them and their primary use cases. It's best to begin with your use cases in terms of SOC operations and threat detection:

  1. Investigation and Search 
  2. Governance and Reporting
  3. Threat Detection and Alerts 

AWS GuardDuty vs CloudTrail vs SecurityHub vs CloudWatch: these act as aggregation points for other AWS services and are supported by corresponding ArcSight SmartConnectors. You need to determine where you want to do threat detection and where to hold raw logs for long-term retention and investigation.


Here is an overview;

AWS SecurityHub integrates with; https://docs.aws.amazon.com/securityhub/latest/userguide/securityhub-internal-providers.html

  • AWS Firewall Manager
  • IAM Access Analyzer
  • Amazon GuardDuty
  • Amazon Inspector
  • Amazon Macie

AWS GuardDuty integrates with;


  • AWS CloudTrail Event Logs 
  • AWS CloudTrail Management Events 
  • AWS CloudTrail S3 Data Events 
  • VPC Flow Logs 
  • DNS logs

ArcSight SmartConnectors for SecurityHub supports;


  • GuardDuty Default
  • GuardDuty AWS_API_CALL
  • GuardDuty DNS_REQUEST
  • Resource Header ResourcesDetailsAwsEc2Instance 
  • ResourcesDetailsAwsIamAccessKey 
  • ResourcesDetailsAwsEc2NetworkInterface
  • ResourcesDetailsAwsEc2SecurityGroup 
  • ResourcesDetailsAwsIamRole 
  • ResourcesDetailsAwsKmsKey 
  • ResourcesDetailsAwsS3Bucket ResourcesDetailsAwsS3Object 
  • ResourcesDetailsAwsSnsTopic 
  • ResourcesDetailsAwsSqsQueue 
  • ResourcesDetailsAwsLambdaFunction 

ArcSight SmartConnectors support CloudTrail, S3 and CloudWatch, which may ingest logs from AWS native services.

ArcSight SmartConnector for AWS

AWS GuardDuty, CloudTrail, SecurityHub and CloudWatch act as aggregation points for other AWS services, which are supported by corresponding ArcSight SmartConnectors. This is where the AWS (complementary and additive) native architecture comes into play:


  • Control Plane -> AWS GuardDuty -> AWS SecurityHub -> ArcSight SmartConnector -> ESM/Logger
  • Data Plane -> AWS EC2 -> Windows (SYSMON/WEC/WEF) -> ArcSight SmartConnector -> ESM/Logger
  • Data Plane -> AWS EC2 -> Linux (AuditD/Syslog) -> ArcSight SmartConnector -> ESM/Logger

 ArcSight SmartConnector for WiNC (Windows Native Connector) – Recommended for Production Environments

This is where the AWS (complementary and additive) native Architecture comes into play; 

  1. AWS Firewall Manager -> AWS CloudTrail -> AWS GuardDuty -> AWS SecurityHub -> ArcSight SmartConnector for AWS SecurityHub
  2. IAM Access Analyzer -> AWS CloudTrail -> AWS GuardDuty -> AWS SecurityHub -> ArcSight SmartConnector for AWS SecurityHub

IAM Access Analyzer -> AWS SecurityHub -> ArcSight SmartConnector

You can review the supported data sources here- https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_data-sources.html

AWS IAM Access Analyzer supports ; 

The ArcSight SmartConnector for AWS SecurityHub supports the AWS (complementary and additive) native architecture. So, the supported data flows are:

  1. AWS Firewall Manager -> AWS CloudTrail -> AWS GuardDuty -> AWS SecurityHub -> ArcSight SmartConnector for AWS SecurityHub
  2. AWS Identity and Access Management roles -> IAM Access Analyzer -> AWS CloudTrail -> AWS GuardDuty -> AWS SecurityHub -> ArcSight SmartConnector for AWS SecurityHub


CloudTrail vs CloudWatch

  • CloudTrail is for API logging
  • CloudWatch is for Log data

ArcSight SmartConnector for CloudWatch supports CloudWatch events


Threat Modelling and Applying Risk to AWS Services and Resources

You need to develop a threat model and apply some abuse cases, which is far beyond this blog, so let's just use ATT&CK to identify top risks and develop detections for them.

Using ATT&CK to Develop Baseline for TTP Monitoring

Attack phase and example TTP:

  • Initial Access: Discovering valid accounts for the AWS account
  • Persistence: Creating new accounts
  • Defense Evasion: Establishing a presence in unused/unsupported cloud regions; continuing to leverage valid accounts
  • Credential Access: Querying an identity role with a cloud instance's metadata API; discovering credentials in files
  • Discovery: Cloud service discovery (through network visibility, interaction with other services, and so on)
  • Collection: Data from cloud storage objects (items in S3 buckets, for example)
  • Exfiltration: Outbound data to a cloud storage account elsewhere; connectivity to unknown outbound source addresses

Mapping Detection/Response Controls to TTPs

Attack phase, TTP, and AWS detection:

  • Initial Access
    TTP: Discovering valid accounts for cloud environments.
    Detection: AWS CloudTrail event: Account login via AWS CLI or AWS Management Console (IAM account).
  • Persistence
    TTP: Creating new accounts.
    Detection: AWS CloudTrail event: New IAM account created.
  • Defense Evasion
    TTPs: Establishing a presence in unused/unsupported cloud regions; continuing to leverage valid accounts.
    Detections: AWS CloudTrail events represented in Amazon GuardDuty or Amazon Detective: New API event in a previously unused region; account use in a new region.
  • Credential Access
    TTPs: Querying an identity role with a cloud instance's metadata API; discovering credentials in files.
    Detections: AWS CloudTrail event represented in Amazon GuardDuty, a third-party SIEM or Amazon Detective: Metadata service queried for new services and role permissions. AWS CloudTrail event: Account login via AWS CLI or AWS Management Console.
  • Discovery
    TTPs: Cloud services discovery (through network visibility, interaction with other services, and so on); system information discovery; system network connection discovery.
  • Collection
    TTPs: Data from cloud storage objects (items in S3 buckets, for example); data from local systems.
  • Exfiltration
    TTP: Outbound data to a cloud storage account elsewhere.

AWS Use cases and Detection Rules

{
  "eventTime": "2017-01-20T18:53:02Z",
  "eventSource": "iam.amazonaws.com",
  "eventName": "DeactivateMFADevice",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "",
  "userAgent": "signin.amazonaws.com",
  "requestParameters": {
    "userName": "dave",
    "serialNumber": "arn:aws:iam::000012345678:mfa/dave"
  },
  "responseElements": null,
  "requestID": "d1a9ebf8-5fc8-11e5-9d8f-1bc7c6757e61"
}

A suspicious AWS CloudTrail event indicating a cloud user trying to deactivate an MFA device.
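The record above translates into a one-line detection. The sketch below is a plain-Python illustration of the rule, not any particular SIEM's implementation; adding DeleteVirtualMFADevice to the watch set is an assumption on top of the example.

```python
import json

# Sketch: alert on CloudTrail IAM events that weaken MFA.

SUSPICIOUS_IAM_EVENTS = {"DeactivateMFADevice", "DeleteVirtualMFADevice"}

def detect_mfa_tampering(raw_record):
    record = json.loads(raw_record)
    if (record.get("eventSource") == "iam.amazonaws.com"
            and record.get("eventName") in SUSPICIOUS_IAM_EVENTS):
        user = record.get("requestParameters", {}).get("userName", "<unknown>")
        return f"ALERT: {record['eventName']} for user {user}"
    return None

sample = json.dumps({
    "eventTime": "2017-01-20T18:53:02Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "DeactivateMFADevice",
    "requestParameters": {"userName": "dave"},
})
print(detect_mfa_tampering(sample))  # ALERT: DeactivateMFADevice for user dave
```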

How to Improve Security Visibility and Detection/Response Operations in AWS

  • IAM activity (logins in particular)—Monitor your user activity within the cloud. In particular, monitor admins carefully, because these user credentials are prime targets for attackers. Any nonfederated user access should also be a high priority.


• Priority 1
– Launching a workload that is not from an approved template
– Launching any containers from unapproved images in a repository
– Launching any assets in unapproved regions
– Modifying any IAM roles or policies
– Modifying or disabling cloud control plane logging or other security controls
– Logins to the web console (unauthorized)

• Priority 2
– Unusual user behaviors (trying to access unauthorized resources, etc.)
– Adding/updating new workload images
– Adding/updating new container images
– Logins to the web console (authorized)
– Updating/changing serverless configuration

• Priority 3
– Changes to security groups or network access control lists (ACLs)
– Updating/changing serverless function code


Table 1. Starting Points for Event Searches

AWS CloudTrail event and reason for investigation:

  • ConsoleLogin: A user initiates console login activity.
  • StopLogging: A user tries to stop AWS CloudTrail.
  • CreateNetworkAclEntry: Someone creates a network ACL entry, which could expose attack surfaces or vectors.
  • CreateRoute: Someone creates a new route for data path control, which could expose attack surfaces or vectors.
  • AuthorizeSecurityGroupEgress / AuthorizeSecurityGroupIngress / RevokeSecurityGroupEgress / RevokeSecurityGroupIngress: Monitor all changes to security groups.
  • ApplySecurityGroupsToLoadBalancer / SetSecurityGroups: Security group changes that tie to elastic load balancers are interesting, often in scaling operations. This may indicate unusual traffic surges in the environment.
  • AuthorizeDBSecurityGroupIngress / CreateDBSecurityGroup / DeleteDBSecurityGroup / RevokeDBSecurityGroupIngress: Amazon RDS instances have a different nomenclature for security groups, but they are the same thing conceptually. Security teams should monitor such instances.



AWS Lambda event and reason for monitoring:

  • DeleteEventSourceMapping: Someone could delete the data source that triggers an AWS Lambda function, making it "blind."
  • DeleteFunction: A function could be deleted purposefully or accidentally, leading to security issues.
  • RemovePermission: This could lead to a lockout scenario or lack of access when needed (think IAM service account or role access to AWS Lambda).
  • UpdateEventSourceMapping: Data could be pulled from a different source, leading to incorrect function results.
  • UpdateFunctionCode: The function could be broken or tampered with to prevent security-specific functionality from executing (for example, by adding comments).
  • UpdateFunctionConfiguration: The configuration of the function could be changed to limit its resources, causing poor or flawed execution.

Events for Immediate Monitoring
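Both event tables above can be applied mechanically as a watchlist. In this sketch the priority numbers are illustrative assumptions, not a standard:

```python
# Sketch: table-driven CloudTrail event watchlist (priority 1 = highest).

WATCHLIST = {
    # Control-plane events from Table 1
    "StopLogging": 1,
    "CreateNetworkAclEntry": 1,
    "CreateRoute": 1,
    "ConsoleLogin": 2,
    "AuthorizeSecurityGroupIngress": 3,
    "AuthorizeSecurityGroupEgress": 3,
    "RevokeSecurityGroupIngress": 3,
    "RevokeSecurityGroupEgress": 3,
    # AWS Lambda events from the second table
    "DeleteEventSourceMapping": 1,
    "DeleteFunction": 1,
    "UpdateFunctionCode": 2,
    "UpdateFunctionConfiguration": 2,
}

def match_watchlist(records):
    """Return (priority, eventName) hits, highest priority first."""
    hits = [(WATCHLIST[r["eventName"]], r["eventName"])
            for r in records if r.get("eventName") in WATCHLIST]
    return sorted(hits)

stream = [{"eventName": "DescribeInstances"},
          {"eventName": "StopLogging"},
          {"eventName": "UpdateFunctionCode"}]
print(match_watchlist(stream))  # [(1, 'StopLogging'), (2, 'UpdateFunctionCode')]
```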

AWS Security Best Practices Check list

  1. Setup AWS Budget alerts
  2. Setup Root Security challenge questions
  3. Setup Password policy
  4. Deactivate Regions not required
  5. Document and monitor your access keys and deactivate and cycle
  6. Enable root IAM and MFA
  7. Update your Incident Response Plan and Digital Forensics Investigation to accommodate AWS
  8. Enable MFA for AWS Root account
  9. Secure KMS keys
  10. Enable Amazon VPC Flow logs for your VPCs; they are not enabled by default.
  11. AWS Nitro-based EC2 instances can mirror traffic from any EC2 instance (A1, C5, C5d, C5n, I3en, M5, M5a, M5ad, M5d, p3dn.24xlarge, R5, R5a, R5ad, R5d, T3, T3a, and z1d).
  12. Utilise the default DNS service, as it is integrated with CloudTrail and GuardDuty; if you are using a third party for DNS, you need to make sure you can monitor and correlate it within your SIEM, e.g. Cisco Umbrella, supported by an ArcSight SmartConnector.
  13. Outbound IP address alerting
  14. Deploy Cloud Watch agents as part of your SOE – https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Agent-on-EC2-Instance.html

Understanding Digital Forensics inside AWS

Capability, AWS services, and digital forensics notes:

  • Compute: Amazon Elastic Cloud Compute (EC2). Uses Amazon Machine Images (AMIs) to get started; multiple OS support; pay for what you use; next-gen Nitro infrastructure, created by AWS.
  • Storage: Amazon Elastic Block Store (EBS), Amazon Simple Storage Service (S3), Amazon Elastic File System (EFS). Amazon S3 offers multiple storage classes for multiple use cases. Amazon EBS is used as the "block device" or hard drive for Amazon EC2 instances. Amazon EFS is used for file-sharing storage, with two storage classes to choose from.
  • Network: Amazon VPC Flow Logs, Amazon VPC Traffic Mirroring. Capture information on network traffic going in and out of a VPC.
  • Audit: AWS CloudTrail. User attribution data; log integrity can be enabled; can send data to an Amazon S3 bucket for storage/archival.
AWS Digital Forensics


  1. Create a security group that does not allow outbound traffic
  2. Attach to compromised Amazon EC2 instance
  3. Take snapshot of Amazon EC2 instance
  4. Perform memory acquisition, if possible
  5. Share snapshot with Security Account (if using one)
  6. Create volume from snapshot
  7. Attach volume to SIFT EC2 instance
  8. Conduct forensics

Digital Forensic Analysis of Amazon Linux EC2 Instances; https://www.sans.org/reading-room/whitepapers/cloud/digital-forensic-analysis-amazon-linux-ec2-instances-38235


How to Perform a Security Investigation in AWS A SANS Whitepaper

  • Username—Search by the user’s name
  • Event name—Search by a specific API call (e.g., DeleteTrail)
  • Resource type—Search by an AWS service type (e.g., Amazon EC2 instance)
  • Resource name—Search by a resource name (e.g., instance ID, ENI)
  • Event source—Search results from specific AWS services
  • Event ID—Search based on a unique ID for an AWS CloudTrail event
  • AWS access key—Search by access key to show what was done in a single session

VPC Flows


Structure of a VPC Flow Log
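A flow log record in the default (version 2) format is one space-separated line with fourteen documented fields. A minimal parser, assuming well-formed ACCEPT/REJECT records (NODATA/SKIPDATA records use "-" placeholders and would need extra handling):

```python
# Sketch: parse a VPC Flow Log record in the default (version 2) format.
# Documented field order: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status

FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line):
    record = dict(zip(FIELDS, line.split()))
    for key in ("srcport", "dstport", "protocol", "packets", "bytes",
                "start", "end"):
        record[key] = int(record[key])  # assumes no "-" placeholders
    return record

sample = ("2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
rec = parse_flow_log(sample)
print(rec["dstport"], rec["action"])  # 22 ACCEPT -> inbound SSH was allowed
```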

SOAR Use Cases


  • Initial investigation and threat hunting—Analysts need to quickly find evidence of compromise or unusual activity, and often need to do so at scale.
  • Opening and updating incident tickets/cases—Due to improved integration with ticketing systems, event management and monitoring tools used by response teams can often generate tickets to the right team members and update these as evidence comes in.
  • Producing reports and metrics—Once evidence has been collected and cases are underway or resolved, generating reports and metrics can take a lot of analysts’ time.


  • Automated DNS lookups of domain names never seen before
  • Automated searches for detected indicators of compromise
  • Automated forensic imaging of disk and memory from a suspect system, driven by alerts triggered in network- and host-based anti-malware platforms and tools
  • Network access controls automatically blocking outbound command and control (C2) channels from a suspected system
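The first automation listed boils down to keeping a baseline of previously seen domains. This is a minimal in-memory sketch; a real deployment would persist the set and hand new domains to an enrichment/lookup step in a playbook.

```python
# Sketch: track which DNS names have been seen before, so a SOAR playbook
# can enrich/alert only on first sightings. Seen-set is in memory here.

class DomainBaseline:
    def __init__(self, known=()):
        self.seen = set(known)

    def check(self, domain):
        """Return True (and remember the domain) if it is new."""
        domain = domain.lower().rstrip(".")
        if domain in self.seen:
            return False
        self.seen.add(domain)
        return True

baseline = DomainBaseline(known={"example.com", "amazonaws.com"})
print(baseline.check("example.com"))   # False: already known
print(baseline.check("evil-c2.top"))   # True: never seen -> enrich/alert
print(baseline.check("evil-c2.top"))   # False on repeat sighting
```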

AWS Athena CloudTrail search script examples

CREATE EXTERNAL TABLE cloudtrail_logs (
eventversion STRING,
useridentity STRUCT<
  type: STRING,
  principalid: STRING,
  arn: STRING,
  accountid: STRING,
  invokedby: STRING,
  accesskeyid: STRING,
  userName: STRING>,
eventtime STRING,
eventsource STRING,
eventname STRING,
awsregion STRING,
sourceipaddress STRING,
useragent STRING,
errorcode STRING,
errormessage STRING,
requestparameters STRING,
responseelements STRING,
additionaleventdata STRING,
requestid STRING,
eventid STRING,
resources ARRAY<STRUCT<
  arn: STRING,
  accountid: STRING,
  type: STRING>>,
eventtype STRING,
apiversion STRING,
readonly STRING,
recipientaccountid STRING,
serviceeventdetails STRING,
sharedeventid STRING,
vpcendpointid STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://mycloudtrailbucket-faye/AWSLogs/757250003982/';

SELECT useridentity.arn, eventname, sourceipaddress, eventtime
FROM cloudtrail_logs
LIMIT 100;



HP ArcSight Vs. IBM QRadar Vs. ​McAfee Nitro Vs. Splunk Vs. RSA Security Vs. LogRhythm


Original – https://www.itcentralstation.com/product_reviews/logrhythm-nextgen-siem-review-32130-by-vinod-shankar

The key products compared here are based on the Gartner Magic Quadrant, which is what organizations typically use to select SIEM vendors. The vendors covered in the deck are:

1. HP ArcSight

2. McAfee Nitro

3. IBM QRadar

4. Splunk SIEM

5. RSA Security Analytics

6. LogRhythm.

SIEM Technology Space

SIEM market analysis of the last 3 years suggests:

  • Market consolidation of SIEM players (25 vendors in 2011 to 16 vendors in 2013)
  • Only products with technology maturity and a strong road map have featured in the leaders quadrant
  • HP ArcSight & IBM Q1 Labs have maintained leadership in the SIEM industry with continued technology upgrades
  • McAfee Nitro has strong product features & a road map to challenge HP & IBM for leadership


The ArcSight Enterprise Threat and Risk Management (ETRM) Platform is an integrated set of products for collecting, analysing, and managing enterprise Security Event information.

  • ArcSight Enterprise Security Manager (ESM): Correlation and analysis engine used to identify security threats in real-time & virtual environments
  • ArcSight Logger: Log storage and Search solution
  • ArcSight IdentityView: User Identity tracking/User activity monitoring
  • ArcSight Connectors: For data collection from a variety of data sources
  • ArcSight Auditor Applications: Automated continuous controls monitoring for both mobile
Strengths:
  • Extensive log collection support for commercial IT products & applications
  • Advanced support for Threat Management, Fraud Management & Behavior Analysis
  • Mature event correlation, categorization & reporting
  • Tight integration with big data analytics platforms like Hadoop
  • Highly customizable based on organization's requirements
  • Highly available & scalable
  • Architecture supporting multi-tier & multi-tenancy

Weaknesses:
  • Complex deployment & configuration
  • Mostly suited for medium to large scale deployments
  • Requires skilled resources to manage the solution
  • Steep learning curve for analysts & operators

IBM QRadar

The QRadar Integrated Security Solutions (QRadar) Platform is an integrated set of products for collecting, analysing, and managing enterprise Security Event information.

  • QRadar Log Manager – Turn key log management solution for Event log collection & storage 
  • QRadar SIEM – Integrated Log, Threat & Risk Management solution
  • QRadar Risk Manager – Predictive threat & risk modelling, impact analysis & simulation
  • QRadar QFlow – Network Behaviour Analysis & Anomaly detection using network flow data
  • QRadar vFlow– Application Layer monitoring for both Physical & Virtual environment
Strengths:

  • Very simple deployment & configuration
  • Integrated view of the threat environment using Netflow data, IDS/IPS data & Event logs from the environment
  • Behavior & Anomaly Detection capabilities for both Netflow & Log data
  • Suited for small, medium & large enterprises
  • Highly Scalable & Available architecture

Weaknesses:

  • Limited customization capabilities
  • Limited Multi-tenancy support
  • Limited capability to perform Advanced Use Case development & analytics

McAfee Nitro

The McAfee Enterprise Security Management (formerly Nitro Security) Platform is an integrated set of products for collecting, analysing, and managing enterprise Security Event information. 

  • McAfee Enterprise Log Manager – turn key log management solution for Event log collection & storage
  • McAfee Event Receiver – collecting log data & native flow data
  • McAfee Database Event Monitor – database transaction & Log monitoring
  • McAfee Application data Monitor  – application layer event monitoring
  • McAfee Advanced Correlation Engine – advanced correlation engine for correlating events both historical & real time
Strengths:

  • Integrated Application Data monitoring & Deep Packet Inspection
  • Integrated Database monitoring without dependence on native audit functions
  • High event collection rate suited for very large scale deployments
  • Efficient query performance in spite of high event collection rate

Weaknesses:

  • Very basic correlation capabilities when compared with HP & IBM
  • Limitations in user interface navigation
  • Requires a lot of agent installs for Application & Database monitoring, thereby increasing management complexity
  • No Big Data Analytics capability
  • Limited customization capabilities
  • Limited support for multi-tier & multi-tenancy architecture


Splunk Enterprise is an integrated set of products that provide Log Collection, management & reporting capabilities, using:

  • Splunk Indexer – used to collect and index logs from IT environment
  • Splunk Search Heads – used to search & report on IT logs
  • Splunk App for Enterprise Security – used to collect external threat intelligence feeds, parse log sources and provide basic analytics for session monitoring (VPN, Netflow etc.)
Strengths:

  • Extensive Log collection capabilities across the IT environment
  • Log search is highly intuitive – like a Google search
  • Flexible dashboarding & analytics capability improves log visualization
  • Built-in support for external threat intelligence feeds, both open source & commercial
  • “App Store” based architecture allowing development of Splunk plugins to suit monitoring & analytics requirements

Weaknesses:

  • Pre-SIEM solution with very limited correlation capabilities
  • Even though easy to deploy, increasingly difficult to configure for SIEM related functions

RSA Security

RSA Security Analytics is an integrated set of products that provide Network Forensics, Log Collection, management & reporting capabilities, using:

  • Capture Infrastructure
    • RSA Security Analytics Decoder – Real time capture of Network Packet and log data with Analysis and filtering capabilities
    • RSA Security Analytics Concentrator – Aggregates metadata from the Decoder
    • RSA Security Analytics Broker Server – For reporting, management and administration of capture data
  • Analysis & Retention Infrastructure
    • Event Stream Analysis – Correlation Engine
    • Archiver – Long term retention, storage, security & compliance reporting
    • RSA Security Analytics Warehouse – Big Data Infrastructure for Advanced Analytics
Strengths:

  • Great Analytics using Event Log Data & Network Packet Capture
  • Network forensics and Big Data (Parallel Computing) are cornerstones in the SIEM world
  • Tightly integrates with the RSA ecosystem for Threat Intelligence, Fraud Detection, Malware Analysis etc. (each requires separate RSA tools)

Weaknesses:

  • New product release from RSA, hence advanced security correlation support is poor
  • Security Analytics Warehouse is a new capability with very few real world use cases
  • Suited only for large enterprises with the need for complex deployment and management resources; poor deployment options for small and midsize customers


The LogRhythm SIEM 2.0 Security Intelligence Platform is an integrated set of products for collecting, analysing, and managing enterprise Security Event information.

  • Log Manager – high performance, distributed and redundant log collection and management appliance
  • Event Manager – provide centralized event management and administration for a LogRhythm deployment
  • Network Monitor – provide full visibility into network traffic, identifying applications via deep packet inspection, providing real-time unstructured search access to all metadata and packet captures
Strengths:

  • Well balanced log management, reporting, event management, privileged user monitoring and File Integrity Monitoring capabilities
  • Fast deployment with minimal configuration because of the appliance form factor
  • Quarterly Health Check programs post-deployment offer a great after-sales service experience

Weaknesses:

  • Suitable for security event data only, as operational data sets slow performance for searches and reports
  • No support for Active Directory integration for Role-Based Access Control
  • Best suited for small and mid size companies with basic security, regulatory compliance and reporting needs; not scalable for very large deployments

A Summary scoring sheet for SIEM Vendors based on their Core capabilities is given below

Capabilities scored per vendor (RSA Security Analytics, LogRhythm, Splunk, McAfee Nitro, IBM QRadar, HP ArcSight):

  • Real-time Security Monitoring
  • Threat Intelligence
  • Behavior Profiling
  • Data & End User Monitoring
  • Application Monitoring
  • Log Management & Reporting
  • Deployment & Support Simplicity

Total (Weighted Score): RSA Security Analytics 25.7 | LogRhythm 25.3 | Splunk 21.8 | McAfee Nitro 28.8 | IBM QRadar 30.4 | HP ArcSight 31.7

1.0 = Low level of capability

5.0 = High level of capability

SIEM Vendors – Use Cases Score Card

Use cases scored per vendor (RSA Security Analytics, LogRhythm, Splunk, McAfee Nitro, IBM QRadar, HP ArcSight):

  • Overall Use Cases
  • Compliance Use Cases
  • Threat Monitoring

Total (Weighted Score): RSA Security Analytics 12.8 | LogRhythm 13.4 | Splunk 11.4 | McAfee Nitro 14.7 | IBM QRadar 15.1 | HP ArcSight 15.7

1.0 = Low level of capability

5.0 = High level of capability

Comprehensive Explanation: What is a SIEM (in 2020 and beyond.)



SIEM unifies Threat Detection and Hunting.

This is an old topic worth revisiting and level setting with the latest advancements, concepts and lessons from a decade of unsuccessful SIEM deployments! It is worth revisiting because a lot of people don’t understand the value, and even fewer understand how to effectively operationalise a SIEM and achieve business outcomes with its power.

After reading this you will gain enough insight into the basics of SIEM.

I am continually asked the same questions around SIEM design, so glad to finally brain dump this knowledge and share with the community

(SIEM in Public Cloud is beyond the scope of this article, while all the information is relevant, I will write another article focusing specifically for Threat Detection for Public Cloud environments. )

Security Information and Event Management

A SIEM seeks to provide a holistic approach to an organisation’s IT security. A SIEM represents a combination of services, appliances, and software products. It performs real-time collection of log data from devices, applications and hosts. It also processes the collected log data, enabling real-time analysis of security alerts generated by network hardware and applications, advanced correlation of security and operational events, and real-time alarming and scheduled reporting.

SIEM technology is used in many enterprise organizations to provide real time reporting and long term analysis of security events. SIEM products evolved from two previously distinct product categories, namely security information management (SIM) and security event management (SEM).

Table 1 shows this evolution.

Table 1 . SIM and SEM Product Features Incorporated into SIEM

Separate SIM and SEM Products

Security Information Management:

Log collection, archiving, historical reporting, forensics

Security Event Management:

Real time reporting, log collection, normalization, correlation, aggregation

Combined SIEM Product:

Log collection, normalization, correlation, aggregation, reporting

SIEM combines the essential functions of SIM and SEM products to provide a comprehensive view of the enterprise network using the following functions:

  • Log collection of event records from sources throughout the organization provides important forensic tools and helps to address compliance reporting requirements.
  • Normalization maps log messages from different systems into a common data model, enabling the organization to connect and analyze related events, even if they are initially logged in different source formats.
  • Correlation links logs and events from disparate systems or applications, speeding detection of and reaction to security threats.
  • Aggregation reduces the volume of event data by consolidating duplicate event records.
  • Reporting presents the correlated aggregated event data in real-time monitoring and long-term summaries.
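The aggregation step above can be sketched in a few lines; the event fields (`source_ip`, `event_name`) are invented for illustration, not a real SIEM schema:

```python
# A minimal sketch of SIEM-style aggregation: duplicate event records are
# consolidated into a single record carrying a repeat count.
from collections import Counter

def aggregate(events):
    """Collapse duplicate (source_ip, event_name) pairs into counted records."""
    counts = Counter((e["source_ip"], e["event_name"]) for e in events)
    return [
        {"source_ip": ip, "event_name": name, "count": n}
        for (ip, name), n in counts.items()
    ]

raw = [
    {"source_ip": "10.0.0.5", "event_name": "login_failed"},
    {"source_ip": "10.0.0.5", "event_name": "login_failed"},
    {"source_ip": "10.0.0.9", "event_name": "login_ok"},
]
print(aggregate(raw))  # three raw records become two aggregated ones
```

Three raw records collapse into two, which is exactly the volume reduction aggregation buys at SIEM scale.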

An internal IT environment consists of services, networking equipment, applications, and components that an organisation wants to protect and prevent intrusion into. In order to protect these assets and data, you can deploy protection in the form of firewalls, antivirus, IPS/IDS and authentication. Examples of protection include:

  • Firewalls
  • Antivirus
  • IPS
  • IDS
  • Authentication
  • Web Security
  • Email Security
  • Traffic Capture
  • WAF
  • DLP
  • FIM
  • Secure Access Service Edge
  • MFA
  • EDR

Despite all of the systems and effort put into these solutions, those trying to breach that environment will get in. Once they are in, detecting and responding to their attack is time critical. 

A SIEM receives or taps into all of this activity, continually receiving thousands of logs per second from the devices and systems within the environment. The SIEM processes log data to make meaning of what is actually happening on a device (aka Detection), and analytics are used to analyse data activity, providing more insight into what is actually happening.

SIEM solutions also provide the ability to analyse historic log data and generate reports for compliance purposes, as well as providing digital forensics and fulfilling additional parts of an overall information security strategy.

SIEM solutions centralise log data within IT environments, augmenting security measures and enabling real-time analysis. A SIEM is constantly watching, monitoring and analysing events and alerts within the environment in an effort to detect attacks and intrusions.

Fourth Wave of SIEM

SIEM sometimes gets a bad name because, while incredibly powerful, it takes an enormous amount of skill and effort to get working. Not because of the SIEM itself, but because it requires data from all of your IT environment, and that in particular causes massive delays in successful SIEM deployments. (This can be easily solved. Keep reading.) SIEM has evolved into very mature platforms, e.g. ArcSight with 20+ years of evolution. Read the ArcSight history here



  • First Wave
    • PCI-DSS really drove the first phase of SIEM deployment, for compliance business outcomes.
  • Second Wave
    • Then people started detecting bad things in network activity.
  • Third Wave
    • This phase was when customers started to build SOCs.
  • Fourth Wave
    • This is about SOCs developing Threat Hunting utilising NDR, EDR, SIEM and SOAR.

Machine Data

SIEM processes all types of machine data produced by devices in an IT environment.

Machine data is one of the most underused and undervalued assets of any organization. But some of the most important insights that you can gain—across IT and the business—are hidden in this data: where things went wrong, how to optimize the customer experience, the fingerprints of fraud. All of these insights can be found in the machine data that’s generated by the normal operations of your organization.

Machine data is valuable because it contains a definitive record of all the activity and behavior of your customers, users, transactions, applications, servers, networks and mobile devices. It includes configurations, data from APIs, message queues, change events, the output of diagnostic commands, call detail records and sensor data from industrial systems, and more.

The challenge with leveraging machine data is that it comes in a dizzying array of unpredictable formats, and traditional monitoring and analysis tools weren’t designed for the variety, velocity, volume or variability of this data.


In computing, syslog (/ˈsɪslɒɡ/) is a standard for message logging. It allows separation of the software that generates messages, the system that stores them, and the software that reports and analyzes them. Each message is labeled with a facility code, indicating the software type generating the message, and assigned a severity level.

The syslog protocol, defined in RFC 3164, provides a transport to allow a device to send event notification messages across IP networks to event message collectors, also known as syslog servers. The protocol is simply designed to transport these event messages from the generating device to the collector. The collector doesn’t send back an acknowledgment of the receipt of the messages.

Syslog uses the User Datagram Protocol (UDP), port 514, for communication. Being a connectionless protocol, UDP does not provide acknowledgments. Additionally, at the application layer, syslog servers do not send acknowledgments back to the sender for receipt of syslog messages. Consequently, the sending device generates syslog messages without knowing whether the syslog server has received the messages. In fact, the sending devices send messages even if the syslog server does not exist.

The syslog packet size is limited to 1024 bytes and carries the following information:

  • Facility
  • Severity
  • Hostname
  • Timestamp
  • Message
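A minimal sketch of assembling and sending such a message, assuming an RFC 3164-style layout and a collector on 127.0.0.1:514 (the PRI value at the front is facility × 8 + severity):

```python
# Hedged sketch of building and emitting an RFC 3164-style syslog message.
# The collector address is an assumption; UDP is fire-and-forget, so no
# acknowledgment ever comes back (matching the behaviour described above).
import socket
from datetime import datetime

def build_syslog(facility, severity, hostname, tag, message):
    pri = facility * 8 + severity            # e.g. auth (4) + info (6) -> <38>
    timestamp = datetime.now().strftime("%b %d %H:%M:%S")
    return f"<{pri}>{timestamp} {hostname} {tag}: {message}".encode()

packet = build_syslog(4, 6, "web01", "sshd", "Accepted password for admin")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(packet, ("127.0.0.1", 514))      # datagram sent whether or not a server is listening
sock.close()
```

Note the sender never learns whether the collector received the packet, which is exactly why devices keep emitting even when the syslog server is down.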

Computer system designers may use syslog for system management and security auditing as well as general informational, analysis, and debugging messages. A wide variety of devices, such as printers, routers, and message receivers across many platforms use the syslog standard. This permits the consolidation of logging data from different types of systems in a central repository. Implementations of syslog exist for many operating systems.

When operating over a network, syslog uses a client-server architecture where a syslog server listens for and logs messages coming from clients.

The syslog protocol is defined by Request for Comments (RFC) documents published by the Internet Engineering Task Force (Internet standards). The following RFCs define the syslog protocol:

  • RFC 3164: The BSD syslog Protocol (obsoleted by RFC 5424, The Syslog Protocol)
  • RFC 3195: Reliable Delivery for syslog
  • RFC 5424: The Syslog Protocol
  • RFC 5425: TLS Transport Mapping for Syslog
  • RFC 5426: Transmission of Syslog Messages over UDP
  • RFC 5427: Textual Conventions for Syslog Management
  • RFC 5848: Signed Syslog Messages
  • RFC 6012: Datagram Transport Layer Security (DTLS) Transport Mapping for Syslog
  • RFC 6587: Transmission of Syslog Messages over TCP

More reading on Syslog;


SIEM is a mandatory requirement for compliance audits such as PCI-DSS, ISO 27001, the Sarbanes–Oxley Act of 2002 (thanks, Enron), and other standards.

The Payment Card Industry  (PCI) Security Standards Council was founded by five global payment brands: American Express, Discover Financial Services, JCB International, MasterCard, and Visa. These five payment brands had a common vision of strengthening  security policies across the industry to prevent data breaches for businesses that accept and process payment cards. Together they drafted and released the first version of PCI Data Security Standard (PCI DSS 1.0) on December 15, 2004.

PCI DSS is a regulation with twelve requirements that serve as a security baseline to secure payment card data.

  • PCI-DSS v 3.2.1 Requirements;
    • Requirement 10: Track and monitor all access to network resources and cardholder data.
    • Requirement 11.5: Deploy a change detection mechanism (for example, file integrity monitoring tools) to alert personnel to unauthorized modification (including changes, additions, and deletions) of critical system files, configuration files or content files. Configure the software to perform critical file comparisons at least weekly. Implement a process to respond to any alerts generated by the change-detection solution.
    • PCI DSS v3.2.1 Quick Reference Guide

Depending on your PCI-DSS merchant level and number of Credit Card transactions you process, you will need to adhere to different levels of PCI-Auditing.

Cyber Threat Intelligence

Threat intelligence, or cyber threat intelligence, is information an organization uses to understand the threats that have targeted, will target, or are currently targeting the organization. This info is used to prepare for, prevent, and identify cyber threats looking to take advantage of valuable resources.

Cyber Threat Intelligence consists of many types of information, including Indicators of Compromise and Indicators of Attack.

Indicators of compromise (IOCs) are “pieces of forensic data, such as data found in system log entries or files, that identify potentially malicious activity on a system or network.” Indicators of compromise aid information security and IT professionals in detecting data breaches, malware infections, or other threat activity. By monitoring for indicators of compromise, organizations can detect attacks and act quickly to prevent breaches from occurring or limit damages by stopping attacks in earlier stages.

Indicators of compromise act as breadcrumbs that lead infosec and IT pros to detect malicious activity early in the attack sequence. These unusual activities are the red flags that indicate a potential or in-progress attack that could lead to a data breach or systems compromise.

Indicators of attack are similar to IOCs, but rather than focusing on forensic analysis of a compromise that has already taken place, indicators of attack focus on identifying attacker activity while an attack is in process. Indicators of compromise help answer the question “What happened?” while indicators of attack can help answer questions like “What is happening and why?” A proactive approach to detection uses both IOAs and IOCs to discover security incidents or threats in as close to real time as possible

Example IoCs;

  • Unusual Outbound Network Traffic
  • Anomalies in Privileged User Account Activity
  • Geographical Irregularities
  • Log-In Red Flags
  • Increases in Database Read Volume
  • HTML Response Sizes
  • Large Numbers of Requests for the Same File
  • Mismatched Port-Application Traffic
  • Suspicious Registry or System File Changes
  • Unusual DNS Requests
  • Unexpected Patching of Systems
  • Mobile Device Profile Changes
  • Bundles of Data in the Wrong Place
  • Web Traffic with Unhuman Behavior
  • Signs of DDoS Activity
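A toy sketch of how a SIEM matches log data against an IoC feed; the indicator set and log fields are invented for illustration:

```python
# Minimal IoC matching: flag any log entry whose destination IP appears in a
# threat-intelligence feed of known-bad addresses (example values only).
BAD_IPS = {"203.0.113.7", "198.51.100.23"}   # hypothetical IoC feed

def match_iocs(logs, bad_ips):
    """Return log entries whose destination IP is a known indicator."""
    return [log for log in logs if log["dst_ip"] in bad_ips]

logs = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},   # hits the feed
    {"src_ip": "10.0.0.6", "dst_ip": "192.0.2.10"},    # clean
]
hits = match_iocs(logs, BAD_IPS)
print(hits)
```

Real feeds carry hashes, domains and URLs as well as IPs, but the lookup pattern is the same.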

APTs and Tactics, Techniques and Procedures (TTPs)

SIEM can utilise cyber threat intelligence (IoCs, IoAs, TTPs) and correlate it with IT environment log data to detect threats in both real-time and historical log data.

Correlation Rules, Behaviour patterns, Pattern matching, Anomaly detection, Conditions, Thresholds, Network Modelling and Machine learning (Phew give me a pay rise. )

Correlation is one of the key components of any effective SIEM tool. As information from across your digital environment feeds into a SIEM, it uses correlation to identify any possible issues. It does so by comparing sequences of activity against preset rules, conditions and thresholds. SIEMs allow sophisticated ways to implement risk based rules.

The latest SIEMs can now implement anomaly detection via machine learning.

All integrated with Threat Intelligence information.

The brains inside a SIEM are based on correlation rules, pattern matching, conditions, thresholds and now machine learning via unsupervised and supervised models.

  • Correlation Rules
  • Pattern Matching
  • Conditions
  • Thresholds
  • Supervised Machine Learning
  • Unsupervised Machine Learning
  • Network Modelling and Risk Scoring
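As a sketch of the first four items in that list, a classic threshold rule (N failed logins from one source inside a sliding window) might look like the following; the event layout, 60-second window and threshold of 5 are illustrative, not a real SIEM rule language:

```python
# Toy threshold-based correlation rule: alert when one source IP produces
# THRESHOLD failed logins inside a WINDOW-second sliding window.
from collections import defaultdict, deque

WINDOW, THRESHOLD = 60, 5

def correlate(events):
    """events: iterable of (timestamp_seconds, src_ip, outcome). Yields alerts."""
    windows = defaultdict(deque)
    for ts, src, outcome in events:
        if outcome != "failure":
            continue
        q = windows[src]
        q.append(ts)
        while q and ts - q[0] > WINDOW:      # drop failures outside the window
            q.popleft()
        if len(q) >= THRESHOLD:
            yield {"rule": "brute_force", "src_ip": src, "count": len(q)}

# Five failures from one host in 50 seconds trips the rule.
stream = [(t, "10.0.0.5", "failure") for t in range(0, 50, 10)]
alerts = list(correlate(stream))
```

Production rules layer conditions (event type), thresholds (count), and context (asset criticality) on exactly this kind of windowed state.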

Use Case

Use case is the term used for threat detection expressed in business context. It combines value and context in the SIEM platform.

Leading SIEM platforms such as ArcSight have built-in ESM default content use cases covering 80% of your threat detection requirements. There are also third-party use case libraries, including SOC Prime (https://my.socprime.com/en/integrations/), MITRE ATT&CK® and the SIGMA generic SIEM rule format. SIGMA Rules

You can catch just about everything with ArcSight Default Content and SIGMA Rules! The rest you need to pay someone like me to workshop and write.

Machine Data Sources

Data type (use cases) and examples:

  • Amazon Web Services (Security & Compliance, IT Operations): Data from AWS can support service monitoring, alarms and dashboards for metrics, and can also track security-relevant activities, such as login and logout events.
  • APM Tool Logs (Security & Compliance, IT Operations): APM tool logs can provide end-to-end measurement of complex, multi-tier applications, and be used to perform post-hoc forensic analytics on security incidents that span multiple systems.
  • Authentication (Security & Compliance, IT Operations, Application Delivery): Authentication data can help identify users that are struggling to log in to applications and provide insight into potentially anomalous behaviors, such as activities from different locations within a specified time period.
  • Firewall (Security & Compliance, IT Operations): Firewall data can provide visibility into blocked traffic in case an application is having communication problems. It can also be used to help identify traffic to malicious and unknown domains.
  • Industrial Control Systems (ICS) (Security & Compliance, Internet of Things, Business Analytics): ICS data provides visibility into the uptime and availability of critical assets, and can play a major role in identifying when these systems have fallen victim to malicious activity.
  • Medical Devices (Security & Compliance, Internet of Things, Business Analytics): Medical device data can support patient monitoring and provide insights to optimize patient care. It can also help identify compromised protected health information.
  • Network Protocols (Security & Compliance, IT Operations): Network protocol data can provide visibility into the network’s role in overall availability and performance of critical services. It’s also an important source for identifying advanced persistent threats.
  • Sensor Data (Security & Compliance, IT Operations, Internet of Things): Sensor data can provide visibility into system performance and support compliance reporting of devices. It can also be used to proactively identify systems that require maintenance.
  • System Logs (Security & Compliance, IT Operations): System logs are key to troubleshooting system problems and can be used to alert security teams to network attacks, a security breach or compromised software.
  • Web Server (Security & Compliance, IT Operations, Business Analytics): Web logs are critical in debugging web application and server problems, and can also be used to detect attacks, such as SQL injections.

SIEM Data  formats

Typical formats supported by SIEM platforms to ingest log data:

Syslog, SNMP, SMTP, SCP, FTP, flat file, SQL query, database reader, cloud APIs, REST API, XML, secure syslog, Cisco FireSIGHT and SDEE, Check Point LEA, AWS GuardDuty, CloudWatch, AWS S3, JDBC, etc.

Common Event Format (CEF)

In the realm of security event management, a myriad of event formats streaming from disparate devices makes for complex integration. The Common Event Format by ArcSight promotes interoperability between various event- or log-generating devices.

Although each vendor has its own format for reporting event information, these event formats often lack the key information necessary to integrate the events from their devices.
The ArcSight standard attempts to improve the interoperability of infrastructure devices by aligning the logging output from various technology vendors.
Common Event Format (CEF) is a Logging and Auditing file format from ArcSight and is an extensible, text-based format designed to support multiple device types by offering the most relevant information.
Message syntaxes are reduced to work with ArcSight normalization. Specifically, the Common Event Format defines a syntax for log records comprising a standard header and a variable extension, formatted as key-value pairs. The format, called Common Event Format (CEF), can be readily adopted by vendors of both security and non-security devices.
This format contains the most relevant event information, making it easy for event consumers to parse and use them. To simplify integration, the syslog message format is used as a transport mechanism.
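A sketch of emitting a CEF-style record with the header fields named above (version, vendor, product, device version, signature ID, name, severity) plus a key=value extension; the vendor and all field values below are invented for illustration:

```python
# Hedged sketch of building a CEF record: pipe-delimited header, then a
# space-separated key=value extension. Pipes inside header fields must be
# escaped so they don't break the delimiting.
def to_cef(vendor, product, version, sig_id, name, severity, **ext):
    header = "|".join(
        str(f).replace("|", r"\|")
        for f in ("CEF:0", vendor, product, version, sig_id, name, severity)
    )
    extension = " ".join(f"{k}={v}" for k, v in ext.items())
    return f"{header}|{extension}"

record = to_cef("Acme", "Gateway", "1.0", "100", "Port scan detected", 5,
                src="10.0.0.5", dst="10.0.0.1", spt=4242)
print(record)
# -> CEF:0|Acme|Gateway|1.0|100|Port scan detected|5|src=10.0.0.5 dst=10.0.0.1 spt=4242
```

In practice the whole record travels inside a syslog message, which is the transport mechanism the paragraph above describes.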


  • Time Normalisation
    • Ensures timestamps all reflect the same time zone, so that events from different time zones can be correlated.
    • Time is an important piece of threat detection. Some time zones around the world don’t observe Daylight Savings Time (DST) and some are actually a half hour offset from others. In addition to time zone issues, some devices don’t include a time in the log message. A SIEM needs to timestamp every log in a single time zone.
  • Data Enrichment (metadata extraction, tagging and enrichment)
    • The SIEM parses and breaks each log message down into its core components and adds context, e.g. adding a customer tag.
    • Log data is not uniform: logs follow a standard transport protocol, but the information within isn’t standardised across log source providers, so a SIEM has to process each log into a unified threat detection taxonomy and universal schema in order to run mathematical rules.
    • Log information needs to be assigned to a common schema so that a [User Logon] message from various systems (Unix, Windows, Active Directory, AWS, etc.) is always tagged as a User Logon, to assist threat detection search rules.
  • Threat and Risk Contextualisation
    • Evaluate each log and assign a risk-based priority value, e.g. information from edge services / DMZ or authentication sources such as Active Directory, DNS, etc.
May 11 10:00:39 scrooge SG_child[808]: [ID 748625 user.info] m:WR-SG-SUMMARY c:X  vhost:iscrooge61.seclutions.com:80 (http) GET / => http://bali/ , status:200 , redirection URL: , referer: , mapping:bali , request size: 421 , backend response size: 12960 , audit token:- , time statistics (microseconds): [request total 16617 , allow/deny filters 1290 , backend responsiveness 11845 , response processing 1643 , ICAP reqmod  , ICAP respmod  ] timestamp: [2012-05-11 10:00:39] [ rid:T6zHJ38AAAEAAAo2BCwAAAMk sid:910e5dd02df49434d0db9b445ebba975 ip: ]
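A hedged sketch of the normalisation described above: two differently shaped "logon" records mapped to one common schema, with timestamps normalised to UTC. The source formats, the target field names and the +10:00 source time zone are assumptions for illustration:

```python
# Toy normalisation: map a Windows-style and a Unix-style logon record into
# one schema, converting both timestamps to UTC so they can be correlated.
from datetime import datetime, timezone, timedelta

def normalise_windows(raw):
    # Hypothetical Windows record: local time string, assumed UTC+10 source.
    ts = datetime.strptime(raw["TimeCreated"], "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(tzinfo=timezone(timedelta(hours=10)))
    return {"category": "User Logon", "user": raw["TargetUserName"],
            "time_utc": ts.astimezone(timezone.utc).isoformat()}

def normalise_unix(raw):
    # Hypothetical sshd record: epoch seconds are already UTC.
    ts = datetime.fromtimestamp(raw["epoch"], tz=timezone.utc)
    return {"category": "User Logon", "user": raw["user"],
            "time_utc": ts.isoformat()}

a = normalise_windows({"TimeCreated": "2020-05-11 10:00:39",
                       "TargetUserName": "alice"})
b = normalise_unix({"epoch": 1589155239, "user": "bob"})
# Both records now share one category tag and one time zone.
```

Once both land in the same schema with the same category tag, a single "User Logon" search rule covers every source.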

Security Schema


Events are collections of logs created after processing with threat intelligence and/or correlation rules. An event is an actionable log item sent to human analysts for further triage, investigation and reporting.

Sizing SIEM solutions

Sizing a SIEM solution begins with the basic list of devices that you want to monitor. See the example device list collection tool:

Device List
Device TypeVendorModelLocationQuantity
Windows Server (Active Directory)Microsoft1
Windows Server (DNS)Microsoft1
AWS (CloudTrail)AWS1
Fortinet Firewall (IDS/IPS/VPN)Fortinet1
Citrix Access GatewayCitrix1

SIEM Sizing (Events Per Second)

Critical to the sizing and design of a SIEM platform is determining the Events Per Second produced by the devices in the list above.

You need to determine and estimate the following SIEM fundamentals;

  • Events Per Second
  • Events Per Day
  • Online Retention Period and requirement Storage in GBs
  • Retention Period and required Storage in GBs
  • Network Bandwidth Peak requirements: (GB /per second for all Devices.)
  • EPS Peak
  • EPS average (Day, Week, Month, etc.)
  • Estimated Device Growth over 3 years
  • EPS Headroom (Allow 10-30%)
  • Recovery Point Objective
  • Recovery Time Objective
  • Uptime requirement
  • Event / Alert Size (roughly 512 bytes per event is a rough estimate)
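The fundamentals above reduce to a little arithmetic. A back-of-envelope sizing sketch; the 600-byte event size, 90-day retention, 10:1 compression and 30% headroom figures are example inputs, not recommendations:

```python
# Back-of-envelope SIEM sizing: events/day, raw GB/day, stored GB over the
# retention period, and a design EPS including headroom. All inputs are
# illustrative assumptions.
def size_siem(avg_eps, event_bytes=600, retention_days=90,
              compression=10, headroom=0.3):
    events_per_day = avg_eps * 86_400                    # seconds per day
    raw_gb_per_day = events_per_day * event_bytes / 1e9  # 1 GB = 1e9 bytes
    stored_gb = raw_gb_per_day * retention_days / compression
    return {"events_per_day": events_per_day,
            "raw_gb_per_day": round(raw_gb_per_day, 2),
            "stored_gb": round(stored_gb, 2),
            "design_eps": round(avg_eps * (1 + headroom))}

print(size_siem(avg_eps=2000))
# -> {'events_per_day': 172800000, 'raw_gb_per_day': 103.68,
#     'stored_gb': 933.12, 'design_eps': 2600}
```

At a modest 2,000 average EPS the platform already ingests over 100 GB of raw log data a day, which is why retention and compression dominate the storage budget.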

SIEM Sizing Rosetta Stone

GB (1 GB = 1,000,000,000 bytes)
EPS (1 event = 600 bytes)


Storage and Archival are critical for any Security Logging platform

  • Raw Event Size
  • Normalised Event Size
  • Retention Time
  • Online Retention Period
  • Events Per Day
  • Compression Ratio
  • GB Storage per day/Retention time.


It is vital to understand the way your SIEM platform receives and processes data: What is the schema format, schema on read or schema on write? Is it using distributed search or in-memory real-time processing? The last thing you want to do is hoard data without understanding what you are collecting, scared of getting rid of it and unable to get any value from it. Don’t turn into this guy, because the Finance department will start knocking on your door, and the day will come when you have to provide justification and prove business results. If you ever get breached and can’t even extract useful information after storing tons of data, you might need to find another job.



An overwhelming number of log sources without proper sanitisation and normalisation can lead to massive amounts of useless information in the SIEM, leading to alert fatigue.

False-Positive and False-Negatives

A false positive state is when the SIEM identifies an activity as an attack but the activity is acceptable behavior. A false positive is a false alarm.

A false negative state is the most serious and dangerous state. This is when the SIEM identifies an activity as acceptable when the activity is actually an attack. That is, a false negative is when the SIEM fails to catch an attack. This is the most dangerous state, since the security professional has no idea that an attack took place.

False positives, on the other hand, are an inconvenience at best, though they can still cause significant issues. However, with the right amount of overhead, false positives can be successfully adjudicated; false negatives cannot.

  • Airport Security: a “false positive” is when ordinary items such as keys or coins get mistaken for weapons (machine goes “beep”)
  • Medical screening: low-cost tests given to a large group can give many false positives (saying you have a disease when you don’t), and then ask you to get more accurate tests.
  • Antivirus software: a “false positive” is when a normal file is thought to be a virus
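The distinction can be sketched by scoring SIEM verdicts against ground truth; the alert labels below are invented for illustration:

```python
# Tiny sketch of measuring alert quality: compare what the SIEM said with
# what actually happened, counting false positives and false negatives.
def score(alerts):
    """alerts: list of (siem_says_attack, actually_attack) booleans."""
    fp = sum(1 for siem, real in alerts if siem and not real)   # false alarm
    fn = sum(1 for siem, real in alerts if not siem and real)   # missed attack
    return {"false_positives": fp, "false_negatives": fn}

history = [
    (True, True),    # true positive: caught a real attack
    (True, False),   # false positive: alarm on benign activity
    (False, True),   # false negative: the dangerous one
    (False, False),  # true negative: correctly ignored
]
print(score(history))
# -> {'false_positives': 1, 'false_negatives': 1}
```

Tuning a SIEM is largely about driving the second counter toward zero while keeping the first one bearable.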

Popular SYSLOG Servers

  • ArcSight Logger
  • Nagios
  • Zabbix
  • Logstash
  • NXLog

Log Sources Categories

  • Operating Systems
    • Windows
    • Linux
    • OSX
  • Mobile
    • iOS
    • Android
    • Microsoft
    • Windows Phone
  • OT/IOT
    • err no clue
  • APIs
  • Databases
  • Policy Devices
    • Firewalls
    • IDS/IPS
    • Authentication
    • Antivirus
  • Network Devices
    • Switches
    • Firewalls
    • Routers
  • Applications
  • Entities/Users
  • Public Cloud

SIEM – Real-Time vs Search

As data volumes grow ever larger, it becomes increasingly difficult to gain critical insights into massive volumes of data for SIEMs and other data analytics platforms. SIEMs need to detect threats in real time and search years of log source archives at the same time. So you are trying to solve two critical problems at once:

  1. Security Event Management 
    1. Real-Time Streaming Data Analytics
  2. Security Information Management
    1. Searching Large Data sets at scale and speed

These two requirements are incredibly difficult to solve at scale. So, lo and behold, open source to the rescue: Apache Kafka and Apache Hadoop provide solutions for both.

Apache Kafka

A streaming platform has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
  • Store streams of records in a fault-tolerant durable way.
  • Process streams of records as they occur.

Kafka is generally used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably get data between systems or applications
  • Building real-time streaming applications that transform or react to the streams of data
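Both classes of application rest on Kafka's publish/subscribe log. The idea can be pictured with a toy in-memory broker (illustration only, not real Kafka client code; a real producer/consumer would use a client library against a running broker):

```python
from collections import defaultdict

class ToyBroker:
    """Toy stand-in for a Kafka broker: each topic is an append-only log."""
    def __init__(self):
        self.topics = defaultdict(list)

    def produce(self, topic, record):
        # Real Kafka persists records durably and replicates them; here we append.
        self.topics[topic].append(record)

    def consume(self, topic, offset=0):
        # Consumers keep their own offsets, so the same stream can be re-read.
        return self.topics[topic][offset:]

broker = ToyBroker()
broker.produce("th-cef", "CEF:0|Vendor|Product|1.0|100|Failed login|5|src=10.0.0.5")
broker.produce("th-cef", "CEF:0|Vendor|Product|1.0|100|Failed login|5|src=10.0.0.6")

all_records = broker.consume("th-cef")       # full replay from offset 0
late_records = broker.consume("th-cef", 1)   # a second consumer starting later
print(len(all_records), len(late_records))   # 2 1
```

The append-only log plus per-consumer offsets is what lets one stream feed both real-time detection and later replay into a search store.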

Apache Hadoop (aka Data Lake)

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

Security Orchestration, Automation and Response (SOAR)

This subject is beyond the scope of this article. I will dive into this in the near future.

Leading SIEM Vendor Solutions

  • ArcSight Data Platform
    • ArcSight all but invented the SIEM industry, with a 20+ year product portfolio; it created the CEF format for cyber security and now supports Apache Kafka and Apache Hadoop, integrating unsupervised machine learning via Vertica, IDOL, and Interset.
  • Splunk
    • While gaining popularity for general-purpose IT monitoring, Splunk also has capability in security and big data analytics. Splunk Enterprise is the base solution, extended by Splunk Enterprise Security, Splunk UBA, Splunk Cloud, Splunk Phantom, and the Splunk Machine Learning Toolkit. Splunk uses the Common Information Model (CIM).
  • IBM QRadar
    • Another original SIEM vendor.
    • I don’t have any experience with QRadar.
  • ELK Security Onion / HELK
    • Fastest-growing open-source search stack. The ELK stack (Elasticsearch, Logstash, Kibana, Beats) is open source and very powerful; Elastic recently acquired Endgame. ECS is the Elastic Common Schema.
  • McAfee Nitro
    • Popular due to McAfee Enterprise license agreements.
  • LogRhythm
    • 100% Windows Server based, no Linux edition. Very complex to deploy, and requires significant resources and application administration. Does include SYSMON, FIM, NETMON, UEBA, and SOAR as part of the solution.
  • FireEye / Mandiant
    • Premium products for banking and defence-grade technology, combined with 24/7 DFIR SOC services. So this is a product solution plus arguably the best DFIR team (Mandiant). Very expensive. The HX, NX, and MX product lines cover Endpoint, Network, and Cloud SIEM.

Thank you for reading this article, and please support my sharing. In the next article, I will look at log collection and SIEM design patterns in the cloud.

If you would like to sponsor my next article or this blog, please get in touch.

More info



InterSET UEBA – Unsupervised Machine Learning for SOC operations.



1. Start with Your Use Case

You can’t find a solution without understanding the problem. Before buying or implementing new machine learning technology, identify the security use cases that are most critical for your organization. Once you understand and can articulate the problem you’re trying to solve, you are then ready to select the technology that is best suited for your needs.

2. Avoid the Buzzword Trap

AI and machine learning are ubiquitous terms in cybersecurity, but there is plenty of snake oil among vendors who claim to use these technologies. Do your homework to understand what type of machine learning is behind a vendor’s solution and whether or not that type of machine learning meets your security team’s needs. You don’t need to be a data scientist, but knowing just a little bit about how machine learning works can help you ask better questions when evaluating a vendor, like “What threats are not covered with existing tools and techniques?” or “Which data feeds contain valuable information but are currently underused?”

3. Don’t Treat Machine Learning Like a Silver Bullet

Your best defense comes from covering as many bases as possible. Machine learning alone will not find and stop a bad actor. Pairing a powerful UEBA with a next-gen SIEM provides a layered approach to security analytics that enables more visibility, better detection, and easier, quicker avenues for responding to both known and unknown threats. Real-time correlation quickly and effectively finds the known threats, and UEBA detects the subtle threats that will otherwise escape detection. The truth is that real-world threat scenarios often require a combination of both of these approaches.

4. Create a Human-Machine Team

The humans in your SOC are more valuable than ever, but they are facing formidable challenges. SOC teams consistently struggle to deal with snowballing feeds of data and constantly evolving threats. A proactive security posture comes from a human-machine team that leverages the strengths of each: faster-than-human analysis by machines to identify leads for investigation and the contextual understanding of SOC analysts and threat hunters.

Different Types of Machine Learning

machine-learning-in-the-soc-wp.pdf (page 4 of 7) 2020-04-03 09-39-41

UEBA MITRE – Machine learning Use Case Examples

machine-learning-in-the-soc-wp.pdf (page 5 of 7) 2020-04-03 09-41-03

A sample of MITRE ATT&CK threat tactics and associated behavioral indicators detected by anomaly detection powered by unsupervised machine learning, such as Interset UEBA.



ArcSight Multi-Tenancy Design for MSSPs


Customer tagging is a feature developed mainly to support MSSP environments, although private organizations can use the technique to denote cost centers, internal groups, or business units.

A Customer is not a source or target of an event, but it can be thought of as the owner of an event. Content developers can also use the Customer tag to develop customer-aware content.

Why is customer tagging critical in MSSP environments? The Customer designation identifies who owns the events. This ensures each customer (tenant) can view only its own events.

Consider this scenario: The customer tag is usually assigned based on the reporting device IP address. In an MSSP environment, different customers can have overlapping networks. This requires an elaborate mechanism for assigning a customer attribute to events.

Since most organizations use private address spaces (see https://en.wikipedia.org/wiki/Private_network), events from different customers may contain identical addresses that refer to different assets. For example, two tenants may use the private address space 192.0.2.x, and therefore the same address may be used by both tenants to refer to a local system.

Make sure you have the proper network information model, which includes zone information, and the asset model, which requires correct zone information. When a connector enriches an event with asset information derived from the ESM asset model, the event uses the asset address as key for locating asset information. The ESM asset model would therefore need a mechanism to differentiate between assets with the same address but belonging to different customers.
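One way to picture that disambiguation: key the asset lookup on customer and zone as well as address, never on the address alone. A hypothetical Python sketch (tenant names, zones, and addresses are made up; the real ESM asset model is considerably richer):

```python
# Two tenants reuse the same address, so (customer, zone, address) is the key.
assets = {
    ("AcmeCorp",  "Acme-DMZ",  "192.0.2.7"): "acme-web01",
    ("GlobexInc", "Globex-HQ", "192.0.2.7"): "globex-db02",
}

def resolve_asset(customer, zone, address):
    """Return the asset name for this tenant's view of the address."""
    return assets.get((customer, zone, address), "unknown")

print(resolve_asset("AcmeCorp",  "Acme-DMZ",  "192.0.2.7"))  # acme-web01
print(resolve_asset("GlobexInc", "Globex-HQ", "192.0.2.7"))  # globex-db02
print(resolve_asset("GlobexInc", "Globex-HQ", "192.0.2.8"))  # unknown
```

The same 192.0.2.7 resolves to two different assets because the tenant is part of the key, which is exactly what the zone-aware network model gives you.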

  • SmartConnector CEF Customer URI field
  • Isolation of Resources
  • Logger Storage Groups Separation
  • ESM Network Modeling
  • ESM Best Practices: MSSP
  • Logger Storage Groups
    • Allow for separation of events by retention periods or by event type and categorisation URI field.
    • Segregate Customers
  • Voltage SecureData data encryption and masking for Data Sovereignty
  • Global ID will assign a unique ID to each security event coming into ArcSight. This ID is globally unique and can be used to facilitate easier cross-portfolio analysis across multiple ESM installations as well as other ArcSight solutions.
  • ArcSight – Network Model in an MSSP Environment
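To see how the Customer tag travels with an event, here is a minimal CEF parsing sketch (the sample event is invented, and the splitter naively assumes extension values contain no spaces):

```python
def parse_cef(line):
    """Split a CEF line into its seven header fields plus key=value extensions."""
    parts = line.split("|", 7)  # cap the split: the extension block may contain '|'
    header = dict(zip(
        ["cef_version", "vendor", "product", "device_version",
         "signature_id", "name", "severity"],
        parts[:7]))
    extensions = {}
    for pair in parts[7].split():          # naive: breaks on values with spaces
        key, _, value = pair.partition("=")
        extensions[key] = value
    return header, extensions

line = ("CEF:0|ArcSight|Logger|7.0|100|Failed login|5|"
        "src=10.1.1.9 customerURI=/AllCustomers/AcmeCorp")
header, ext = parse_cef(line)
print(header["severity"], ext["customerURI"])  # 5 /AllCustomers/AcmeCorp
```

Once the customer URI rides along as an event field like this, downstream content can filter, route, and report per tenant without touching the source/target address fields.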



Microfocus Security ArcSight ESM

Microfocus / ArcSight Data Platform / ArcSight ESM

Content Player 2020-03-11 12-38-08

arcsight_securedata_add_on_for_adp_enabling_privacy_compliance_flyer.pdf 2020-03-26 10-48-37

arcsight_enterprise_security_manager_ds.pdf 2020-03-26 10-51-16

Reference Architecture

  1. Connector -> Logger -> ESM (Ideally.)
  2. Connector -> ESM -> Logger
  3. Connector -> Logger & ESM


Here’s the video showing what is possible with that CIRCL MISP integration

How ArcSight, CIRCL MISP and the MITRE ATT&CK matrix can be used to provide real-time protection against attacks capitalizing on Corona/COVID-19 fears.

Achieving True Zero-Day Protection with ArcSight, MITRE ATT&CK, and MISP CIRCL


How To: Configure MISP & ESM to address COVID-19 & Coronavirus threats

ArcSight Family

Difference between a Smart Connector and Smart Collector

To understand Collectors vs. Connectors, we need to step back and look at what the SmartConnectors do.

Conceptually, the standard SmartConnectors have two main responsibilities: “collect” raw data from various sources, and “process” the collected data into enriched security events posted to a destination.

Introduced in ADP 2.30, customers can take advantage of the massive scalability and robustness of the Event Broker infrastructure, and move the computationally intensive “process” step to the highly scalable and more robust Event Broker streaming infrastructure.

This is done by using syslog Collectors and syslog CEBs: Collectors are standalone components very similar to the SmartConnectors, but they only “collect” raw syslog data as the syslog SmartConnectors do, wrap it up, and post it to a dedicated eb-con-syslog topic in Event Broker.

At that point, the Event Broker’s CEB stream processors (CEB stands for Connector in Event Broker) read the data from the eb-con-syslog topic, do the parsing/normalization/enrichment/filtering processing (as the standalone SmartConnectors destination pipelines do) and post the security events on the EB topics for consumption.

In other words: as their name suggests, the syslog Collectors are lightweight components responsible for collecting raw syslog data and passing it to Event Broker for processing.

Main advantages of the new architecture:

  1. Potential for hardware consolidation and data throughput increase in the data collection layer where the Collectors are deployed, due to moving the processing to the EB streaming infrastructure.
  2. Improved stability and easy horizontal scalability as the data flows increase with time, or fluctuate during operations: CEBs are deployed or undeployed on the EB nodes with a single click in the ArcMC UI.
  3. Reduced network traffic due to a single data feed to Event Broker, instead of having multiple destinations coming from SmartConnectors.
  4. The raw syslog data is now available on the EB topic for any system the customer would like to share it with.

Note that at this time, Collectors and CEBs are only available for syslog data.
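Conceptually, a Collector's work reduces to wrapping a raw syslog line in an envelope and publishing it for the CEBs to parse. A sketch of that idea (the envelope fields and topic handling here are illustrative assumptions, not the product's actual wire format):

```python
import json
import time

def wrap_raw_syslog(raw_line, receiver="collector-01"):
    """Wrap one unparsed syslog line in a minimal envelope for the raw topic."""
    return {
        "topic": "eb-con-syslog",        # dedicated raw-syslog topic in Event Broker
        "receiptTime": int(time.time()), # when the Collector received the line
        "receiver": receiver,            # hypothetical collector identifier
        "raw": raw_line,                 # payload left untouched; CEBs parse it later
    }

msg = wrap_raw_syslog("<134>Apr  3 09:39:41 fw01 sshd[812]: Failed password for root")
payload = json.dumps(msg, sort_keys=True)  # serialized form a producer would publish
print(msg["topic"])                        # eb-con-syslog
```

The key property is that the Collector never interprets the payload; parsing, normalization, and enrichment all happen downstream in the CEB stream processors.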

SmartConnector formats:


  • Log File Readers (including text and log file)
  • Syslog
  • SNMP
  • Database
  • XML
  • Proprietary protocols, such as OPSEC

Connector Types

  • API Connectors
  • Database Connectors
    • Database connectors use SQL queries to periodically poll for events. Connectors support major database types, including MS SQL, MS Access, MySQL, Oracle, DB2, Postgres, and Sybase.
    • IBM DB2 connectors: DB2 drivers are no longer provided in the connector installation due to licensing requirements.
    • Microsoft SQL Server Multiple Instance DB connector
    • McAfee Vulnerability Manager DB.
    • Time-Based Queries use a time field to retrieve events found since the most recent query time until the current time.
    • ID-Based Queries use a numerically increasing ID field to retrieve events from the last checked ID until the maximum ID.
    • Job ID-Based Queries use Job IDs that are not required to increase numerically. Processed Job IDs are filed in such a way that only new Job IDs are added. Unlike the other two types of database connector, Job IDs can run in either Interactive mode or Automatic mode
  • FlexConnectors
  • File Connectors
    • Real Time
    • Folder Follower:
  • Microsoft Windows Event Log Connectors
    • SmartConnector for Microsoft Windows Event Log
    • SmartConnector for Microsoft Windows Event Log – Native
    • SmartConnector for Microsoft Windows Event Log – Unified
  • Model Import Connectors
    • Rather than collecting and forwarding events from devices, Model Import Connectors import user data from an Identity Management system into ArcSight ESM. See individual configuration guides for Model Import Connectors on Protect724 for information about how these connectors are used
    • Model Import Connectors extract the user identity information from the database and populate the following lists in ESM with the data:
    • Identity Roles Session List
    • Identity Information Session List
    • Account-to-Identity Map Active List
  • Scanner Connectors
  • SNMP Connectors
    • SNMP Traps contain variable bindings, each of which holds a different piece of information for the event. They are usually sent over UDP to port 162, although the port can be changed. SNMP connectors listen on port 162 (or any other configured port) and process the received traps. They can process traps only from one device with a unique Enterprise OID, but can receive multiple trap types from this device. SNMP is based upon UDP, so there is a slight chance of events being lost over the network. Although there are still some SNMP connectors for individual connectors, most SNMP support is provided by the SmartConnector for SNMP Unified. Parsers use the knowledge of the MIB to map the event fields, but, unlike some other SNMP-based applications, the connector itself does not require the MIB to be loaded
  • Syslog Connectors
    • Syslog messages are free-form log messages prefixed with a syslog header consisting of a numerical code (facility + severity), timestamp, and host name. They can be installed as a syslog daemon, pipe, or file connector. Unlike other file connectors, a syslog connector can receive and process events from multiple devices. There is a unique regular expression that identifies the device.
    • Syslog Daemon connectors listen for syslog messages on a configurable port, using port 514 as a default. The default protocol is UDP, but other protocols such as Raw TCP are also supported. It is the only syslog option supported for Windows platforms.
    • Syslog Pipe connectors require syslog configuration to send messages with a certain syslog facility and severity. The Solaris platform tends to under perform when using Syslog Pipe connectors. The operating system requires that the connector (reader) open the connection to the pipe file before the syslog daemon (writer) writes the messages to it. When using Solaris and running the connector as a nonroot user, using a Syslog Pipe connector is not recommended. It does not include permissions to send an HUP signal to the syslog daemon.
    • Syslog File connectors require syslog configuration to send messages with a certain syslog facility and severity. For high-throughput connectors, Syslog File connectors perform better than Syslog Pipe connectors because of operating system buffer limitations on pipe transmissions.
    • Raw Syslog connectors generally do no parsing; they take the syslog string and put it in the rawEvent field as-is. The Raw Syslog destination type takes the rawEvent field and sends it as-is using whichever protocol is chosen (UDP, Raw TCP, or TLS). The Raw Syslog connector is always used with the Raw Syslog destination. The event flow is streamlined to eliminate components that do not add value (for example, with the Raw Syslog transport the category fields in the event are ignored, so the categorization components are skipped). If you are transporting data to ArcSight Logger, you can use specific configuration parameters to provide minimal normalization of the syslog data (for source and timestamp).
    • Syslog NG Daemon connectors support Syslog NG version 3.0 for BSD syslog format. Support is provided for collection of IETF standard events. This connector is capable of receiving events over a secure (encrypted) TLS channel from another connector (whose destination is configured as CEF Syslog over TLS), and can also receive events from devices
    • CEF Encrypted Syslog (UDP) connectors allow connector-to-connector communication through an encrypted channel by decrypting events previously encrypted through the CEF Encrypted Syslog (UDP) destination. The CEF connector lets ESM connect to, aggregate, filter, correlate, and analyze events from applications and devices that deliver their logs in the CEF standard, using the syslog transport protocol.
    • UNIX supports all types of syslog connector. If a syslog process is already running, you can end the process or run the connector on a different port. Because UDP is not a reliable protocol, there is a slight chance of missing syslog messages over the network. Generally, TCP is a supported protocol for syslog connectors. There is a basic syslog connector, the connector for UNIX OS Syslog, which provides the base parser for all syslog sub-connectors. For syslog connector deployment information, see the connector Configuration Guide for UNIX OS Syslog. For device-specific configuration information and field mappings, see the connector configuration guide for the specific device. Each syslog sub-connector has its own configuration guide. During connector installation, for all syslog connectors, choose Syslog Daemon, Syslog Pipe, or Syslog File. The names of the syslog sub-connectors are not listed
  • IP NetFlow (NetFlow/J-Flow) Retrieves data over TCP in a Cisco-defined binary format.
  • ArcSight Streaming Connector Retrieves data over TCP from Logger in an ArcSight-proprietary format
  • Connectors for Transformation Hub
    • Connectors in Transformation Hub support ArcSight customers who want to have large-scale distributed ingestion pipelines with 100% availability, where data from any existing or new source at any scale can be ingested while maintaining enterprise-level robustness. Transformation Hub can take messages with raw data collected from any source the ArcSight connector framework understands and automatically perform the data ingestion processing currently done by connectors, but deployed and managed at scale as Transformation Hub processing engines. Users deploy the Transformation Hub using the ArcSight Installer and Management Center to achieve the desired layout. New topics can be created in Management Center and designated to process raw data from a particular technology framework with output into a specific format.
    • The connector technology in Transformation Hub performs all processing a connector would normally do: parser selection, normalization, main flow, destination specific flows, and categorization, as well as applying network zoning and Agent Name resolution. For more information, see the ArcSight Transformation Hub Administrator’s Guide and the ArcSight Management Center Administrator’s Guide.
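As a side note on the syslog header described above: the numeric code at the front of a syslog message packs facility and severity into one value (PRI = facility × 8 + severity, per the syslog RFCs). A quick Python sketch of decoding it:

```python
import re

def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity)."""
    return pri // 8, pri % 8

# <134> -> facility 16 (local0), severity 6 (informational)
message = "<134>Apr  3 09:39:41 fw01 sshd[812]: Failed password for root"
match = re.match(r"<(\d{1,3})>", message)
facility, severity = decode_pri(int(match.group(1)))
print(facility, severity)  # 16 6
```

This is the arithmetic a syslog connector applies before it ever looks at the timestamp, host name, or message body.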
Note: If you are using the Linux Red Hat 6.x or later platforms, ensure that you have these libraries or packages installed before installing a connector: X libraries, fontconfig, and dejavu-sans-fonts. When installing the 32-bit SmartConnector executable on 64-bit machines, the 32-bit versions of glibc, libXext, libXrender, and libXtst must be installed as well as the 64-bit versions.

ESM Install

Hyper-V Configuration

Partition Sizes



  • /tmp – more than 6 GB
  • /opt – more than 100 GB

CentOS Software Selection

  • GNOME Desktop
    • Compatibility Libraries
    • Development Tools
    • System Administration Tools


ESM bin



Log files /


Properties files


  • Make sure that the partition in which your /tmp directory resides has at least 6 GB of space. Make sure that the partition in which your /opt/arcsight directory resides has at least 100 GB of space.
  • Specifying a Global Event ID Generator ID: Global event IDs uniquely identify events across the ArcSight platform
  • The Manager host name is used to generate a self-signed certificate. The Common Name (CN) in the certificate is the host name that you specify when prompted
  • The Manager host name is the IP address (for IPv4 only) or the fully-qualified domain name of the machine where the Manager is installed. All clients (for example, the ArcSight Console) use this name to connect to the Manager. For flexibility, Micro Focus recommends using a fully-qualified domain name instead of an IP address.
  • Make sure that the IP address is resolved to localhost in the /etc/hosts file, otherwise, the ESM installation will fail. This applies to IPv4 and IPv6 systems.

  • If you do not want the host name on your DNS server, add a static host entry to the /etc/hosts file to resolve the host name locally.
  • 8443/tcp 22/tcp (ssh)
  • TCP ports used internally for inter-component communication: 1976, 28001, 2812, 3306, 5555, 6005, 6009, 7777, 7778, 7779, 7780, 8005, 8009, 8080, 8088, 8089, 8666, 8765, 8766, 8881, 8808, 8880, 8888, 8889, 9095, 9090, 9123, 9124, 9999, 45450
  • 8443/TCP – SmartConnectors and consoles; 9000/TCP – Peering; 694/UDP – High Availability module; 7789/TCP – High Availability module; 22/TCP – SSH login

  • Open the following TCP ports for inter-component communication:

  • 1976, 2812, 3306, 5555, 6005, 6009, 7777, 7778, 7779, 7780, 8005, 8009, 8080, 8088, 8089, 8666, 8765, 8766, 8808, 8880, 8881, 8888, 8889, 9000, 9090, 9095, 9123, 9124, 9999, 28001, 45450

  • The information repository uses ports 3179, 3180, 3181, and 3182.

  • Port usage:
    • 22/TCP – SSH log in (Unix only)
    • 53/UDP – DNS requests and responses
    • 8443/TCP – SmartConnectors and Consoles
    • 25/TCP – SMTP to mail server
    • 110/TCP – POP3 to mail server, if applicable
    • 143/TCP – IMAP to mail server, if applicable
    • 1812/UDP – RADIUS, if applicable
    • 1813/UDP – RADIUS, if applicable
    • 389/TCP – LDAP to LDAP server, if applicable
    • 636/TCP – Outbound LDAP over SSL to LDAP server, if applicable

  • <ARCSIGHT_HOME>/config/jetty/keystore (to prevent the ArcSight Manager private key from being stolen)
  • <ARCSIGHT_HOME>/config/jetty/truststore (with SSL Client authentication only, to prevent injection of new trusted CAs)
  • <ARCSIGHT_HOME>/config/server.properties (has database passwords)
  • <ARCSIGHT_HOME>/config/esm.properties (has cluster configuration properties and SSL properties common to persistor, correlator, and aggregator services on the node) This properties file is present on each node in a distributed correlation cluster.
  • <ARCSIGHT_HOME>/config/jaas.config (with RADIUS or SecurID enabled only, has shared node secret)
  • <ARCSIGHT_HOME>/config/client.properties (with SSL Client authentication only, has keystore passwords)
  • <ARCSIGHT_HOME>/reports/sree.properties (to protect the report license)
  • <ARCSIGHT_HOME>/reports/archive/* (to prevent archived reports from being stolen)
  • <ARCSIGHT_HOME>/jre/lib/security/cacerts (to prevent injection of new trusted CAs)
  • <ARCSIGHT_HOME>/lib/* (to prevent injection of malicious code)
  • <ARCSIGHT_HOME>/rules/classes/* (to prevent code injection)
  • The xmlrpc.accept.ips property restricts access for ArcSight Consoles.

  • The agents.accept.ips property restricts access for SmartConnectors.

  • For registration, the SmartConnectors need to be in xmlrpc.accept.ips as well, so that they can be registered. (Being “registered” does not mean you can then remove them.)

    • The format for specifying subnets is quite flexible, as shown in the following example:

    • xmlrpc.accept.ips=

    • agents.accept.ips=10.*.*.*,
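Putting the two properties together, a hypothetical server.properties fragment might look like this (the subnet values are placeholders for your own management and collection networks, not recommendations):

```
# Consoles: allow only the SOC management subnet
xmlrpc.accept.ips=10.0.50.*
# SmartConnectors: collection subnets (these must also appear in
# xmlrpc.accept.ips so the connectors can register)
agents.accept.ips=10.1.*.*,10.2.*.*
```

As the text notes, the registration requirement means a connector address typically appears in both lists.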

System Requirements for ESM 7.2:

  • Community Enterprise Operating System (CentOS) 7.6 and 6.10



Sizing tiers (smallest to largest; the third column is the High Performance tier):

  • CPU: 8 cores (16 preferred) | 32 cores | 40 cores
  • RAM: 48 GB (64 GB preferred) | 192 GB | 512 GB
  • Hard Disk: six 600 GB disks (1.5 TB, RAID 10, 10,000 RPM) | twenty 1 TB disks (10 TB, RAID 10, 15,000 RPM) | 12 TB (RAID 10, solid state)

Linux Install

Download Install CentOS 7.6 http://ftp.iij.ad.jp/pub/linux/centos-vault/7.6.1810/isos/x86_64/

//Use CentOS 7.6 - http://ftp.iij.ad.jp/pub/linux/centos-vault/7.6.1810/
Boot into Troubleshooting -> install CentOS 7 in basic graphics mode

Download the ArcSightESMSuite-7.0.0.xxxx.1.tar from https://softwaresupport.softwaregrp.com/.

scp  [email protected]:tmp/esminstall

//Install TMUX for remote installations

yum install tmux
tmux list-sessions
tmux attach -t number-of-session

// USB Mount
fdisk -l
mkdir /mnt/usb
mount -v -t auto /dev/sdf1 /mnt/usb
cd /mnt/usb/
umount /dev/sdf1

//Nic on laptop enp0s31f6
nmtui edit enp0s31f6

// Add hostname-to-IP-address mapping in the hosts file
nano /etc/hosts


// Unarchive installer

Create arcsight user with UID/GID and sudo rights
Create a folder called esm_installer
chown arcsight: esm_installer
tar xvf ArcSightESMSuite-7.0.0.xxxx.1.tar
cd Tools
sudo ./prepare_system.sh

// Copy the license files to same location

ulimit -a   # verify: open files 65536, max user processes 10240

// Download the tzdata package into /opt/work/ and install it
wget tzdata-2019b-1.el7.noarch.rpm
rpm -Uvh /opt/work/tzdata-2019b-1.el7.noarch.rpm

sudo yum install tzdata -y
timedatectl list-timezones
timedatectl list-timezones | grep -i "australia"
timedatectl set-timezone “Asia/Kolkata”
timedatectl set-timezone America/Los_Angeles
timedatectl set-timezone UTC
timedatectl set-time 15:58:30
timedatectl set-time 20151120
timedatectl status
timedatectl | grep local
timedatectl set-local-rtc 1
timedatectl set-local-rtc 0
timedatectl set-ntp true

su arcsight   # enter the arcsight user's password
Login under user account: arcsight into Console and install
/etc/init.d/arcsight_services stop all
/opt/arcsight/manager/bin/arcsight tzupdater /opt/arcsight /opt/arcsight/manager/lib/jre-tools/tzupdater
/etc/init.d/arcsight_services start all

//Starting the installer

chmod +x /tmp/esm_install/ArcSightESMSuite.bin

chown -R arcsight:arcsight ../Tools

// Error: You are installing this product on an unsupported platform.
// If you are installing on a later OS version, you may need to downgrade the version string manually, then update it later
sudo nano /etc/centos-release
sudo nano /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
CentOS Linux release 7.6 (Core)

// LOGIN into CONSOLE as arcsight
./ArcSightESMSuite.bin -i console

/opt/arcsight/manager/bin/arcsight firstbootsetup -boxster -soft -i console

/opt/arcsight/kubernetes/scripts/cdf-updateRE.sh > /tmp/ca.crt

// To install the time zone update package after you complete the ESM installation
/etc/init.d/arcsight_services stop all

/opt/arcsight/manager/bin/arcsight tzupdater /opt/arcsight
/etc/init.d/arcsight_services start all

// As arcsight user

// Install ESM Login under user account: arcsight into Console and install
/opt/arcsight/manager/bin/arcsight firstbootsetup -boxster -soft -i console

IMPORTANT: The root user must run the following script to start up required services:

// START SERVICES as arcsight user
/etc/init.d/arcsight_services start
/etc/init.d/arcsight_services stop all
/etc/init.d/arcsight_services start all

//Set the hostname in local hosts file

// macOS: launch Chrome ignoring certificate errors
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --ignore-certificate-errors &> /dev/null &

// Access https://arcsight:8443

// Chrome SSL error: type "thisisunsafe"

// Remove ESM
su arcsight

Remove all files in /tmp and /opt/arcsight (rm -r * inside each directory)

The volume or partition required for installation of the /opt/arcsight directory does not contain the minimum of 50 GB of space required to successfully install ArcSight.



df /opt/arcsight

df /opt/arcsight   # needs 50 GB
df /tmp            # needs 6 GB

df -Th

echo 1 > /sys/block/sda/device/rescan

pvresize /dev/sda3
lvextend -l +100%FREE -r /dev/mapper/centos-root





/opt/arcsight/connector/replay_pd/current/bin/arcsight agents

/opt/arcsight/connector/replay/current/bin/arcsight agents

/sbin/service arcsight_services start

/sbin/service arcsight_services start manager

/sbin/service arcsight_services stop

/sbin/service arcsight_services stop manager

tail -f /opt/arcsight/var/logs/manager/default/server.std.log

/opt/arcsight/manager/bin/arcsight deploylicense

Installation Options
0- ArcSight Content Management - This package contains resources to track content that is being managed across multiple ESM systems.
1- ArcSight ESM HA Monitoring - This package contains resources to track High Availability (HA) status and changes.
2- ArcSight Transformation Hub Monitoring - This package contains resources for monitoring Transformation Hub.
3- Security Threat Monitoring - This package contains default security threat monitoring content.
4- Threat Intelligence Platform - This package contains default content for the Threat Intelligence Platform.

Install ArcSight Console

  • Download software


  1. Disable Hyper-Threading. This setting exists on most server-class processors (for example, Intel processors) that support hyper-threading. AMD processors do not have an equivalent setting.
  2. Disable Intel VT-d. This setting is specific to Intel processors and is likely to be present on most recent server-class processors. AMD processors have an equivalent setting called AMD-Vi.
  3. Set Power Regulator to Static High Performance. This setting tells the CPU(s) to always run at high speed, rather than slowing down to save power when the system senses that load has decreased. Most recent CPUs have an equivalent setting.
  4. Set Thermal Configuration to Increased Cooling. This setting increases the server fan speed to avoid issues with the increased heat that results from constantly running the CPU(s) at high speed.
  5. Enable the Minimum Processor Idle Power Package State setting. This setting tells the CPU not to use any of its C-states (various states of power saving in the CPU).
  6. Set Power Profile to Maximum Performance. This setting results in the following changes:
    • QPI power management (the link between physical CPU sockets) is disabled.
    • PCIe support is forced to Gen 2.
    • C-states are disabled.
    • Lower speed settings on the CPUs are disabled so that the CPUs constantly run at high speed.

Silent Deployment using Terraform

ArcSight SmartConnector Install

// Ensure the full Java version on CentOS
[arcsight@vm-esm700-demo ~]$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
[arcsight@vm-esm700-demo ~]$

Micro Focus has many product lines that are very interesting for cyber security integrations:


ESM 101 2020-02-12 11-09-36

ESM 101 2020-02-03 15-35-31

User Roles

ESM_101_7.0P1.pdf (page 14 of 161) 2020-02-03 14-36-18

ESM 101 2020-02-12 12-57-40

ArcSight Connectors

ArcSight Connectors automate the process of collecting and managing logs from any device and in any format, through normalization and categorization of logs into a unified format known as Common Event Format (CEF), now an industry standard for log data. You can use this unified data for searching, reporting, analyzing, or storing logs. ArcSight Connectors also manage ongoing updates, upgrades, configuration changes, and administration of distributed deployments through a centralized web-based interface. They can be deployed as software or on an appliance.

ESM 101 2020-02-03 15-39-55

ArcSight Connectors help you with:

  • Scale easily to manage extreme machine data across IT
  • Reduce the cost of handling large volumes of logs and events in various formats
  • Automate the process of managing connectors to collect audit-quality log data
  • Share, upload, or download connectors within your ArcSight community
  • Seamlessly integrate with the ArcSight platform
  • Broadest set of built-in connectors that collect, aggregate, filter, and parse the logs
  • Managing log records in hundreds of different formats from hundreds of vendors
  • Patented technology to normalize and categorize logs that enables full-text English searching on rich metadata
  • High compression of log data up to 10:1 to reduce your storage costs significantly
  • Automate bandwidth management with low footprint

FlexConnector

The FlexConnector framework is a software development kit (SDK) that enables you to create your own SmartConnector tailored to the nodes on your network and their specific event data. FlexConnector types include file reader, regular expression file reader, time-based database reader, syslog, and Simple Network Management Protocol (SNMP) readers.

Forwarding Connector

The Forwarding Connectors forward events between multiple Managers in a hierarchical ESM deployment, and/or to one or more Logger deployments.

ArcSight Manager

The ArcSight Manager is the heart of the solution. It is a Java-based server that drives analysis, workflow, and services. It also correlates output from a wide variety of security systems. The Manager writes events to the CORR-Engine as they stream into the system. It simultaneously processes them through the correlation engine, which evaluates each event with network model and vulnerability information to develop real-time threat summaries. ESM comes with default configurations and standard foundation use cases consisting of filters, rules, reports, data monitors, dashboards, and network models that make ESM ready to use upon installation.


The Correlation Optimized Retention and Retrieval (CORR) Engine is a proprietary data storage and retrieval framework that receives and processes events at high rates and performs high-speed searches.

Security Use Case and Activate Framework Marketplace

ArcSight Activate Framework

ArcSight Activate Framework is a modular content development framework that allows you to implement ArcSight SIEM quickly and effectively. The framework provides a standard way of creating content. Standardized content means new analysts and engineers can easily review and understand existing content, reducing the ramp-up time for new employees. It also opens up the possibility of sharing content with other ArcSight users. Best of all, the base content has been created from 10 years of experience implementing ArcSight in thousands of environments. What does this mean? It is proven and it works! ArcSight Activate Framework makes implementing SIEM easy. It helps you:

  • Deploy modular content and standardized use cases to implement ArcSight quickly and effectively in your environment with minimal setup required.
  • Enable inexperienced users to create content quickly. Content created is easier to understand, reducing training and maintenance costs.
  • Provide a standardized approach to creating content that can be shared between ArcSight installations and within the community to easily keep up with the latest IT security threats. This results in a robust SIEM that is easier to set up and maintain.
  • Leverage proven use cases developed by ArcSight SIEM experts to provide a robust implementation and increase your effectiveness and deployment success.


Interactive Discovery

ArcSight Interactive Discovery (AID) is a separate software application that augments Pattern Discovery, dashboards, reports, and analytical graphics. AID provides enhanced historical data analysis and reporting capabilities using a comprehensive selection of pre-built interactive statistical graphics. You can use AID to:

  • Quickly gain visibility into your complex security data
  • Explore and drill down into security data with precision control and flexibility
  • Accelerate discovery of hard-to-find events that may be dangerous
  • Present the state of security in compelling visual summaries
  • Build a persuasive, non-technical call to action
  • Prove IT Security value and help justify budgets

Pattern Discovery

Pattern Discovery can automatically detect subtle, specialized, or long-term patterns that might otherwise go undiscovered in the flow of events. You can use Pattern Discovery to:

  • Discover zero-day attacks: Because Pattern Discovery does not rely on encoded domain knowledge (such as predefined rules or filters), it can discover patterns that otherwise go unseen, or are unique to your environment.
  • Detect low-and-slow attacks: Pattern Discovery can process up to a million events in just a few seconds (excluding read time from the disk). This makes Pattern Discovery effective at capturing even low-and-slow attack patterns.
  • Profile common patterns on your network: New patterns discovered from current network traffic are like signatures for a particular subset of network traffic. By matching against a repository of historical patterns, you can detect attacks in progress. The patterns discovered in an event flow that either originate from or target a particular asset can be used to categorize those assets. For example, a pattern originating from machines that have a back door (an unauthorized program that initiates a connection to the attacker) installed can all be visualized as a cluster. If you see the same pattern originating from a new asset, it is a strong indication that the new asset also has a back door installed.
  • Automatically create rules: The patterns discovered can be transformed into a complete rule set with a single mouse click. These rules are derived from data patterns unique to your environment, whereas predefined rules must be generic enough to work in many customer environments.

Pattern Discovery is a vital tool for preventive maintenance and early detection in your ongoing security management operations. Using periodic, scheduled analysis, you can always be scanning for new patterns over varying time intervals to stay ahead of new exploitative behavior.

Logger

ArcSight Logger is an event data storage appliance that is optimized for extremely high event throughput. Logger stores security events on board in compressed form, but can always retrieve unmodified events on demand for historical, analysis-quality litigation data. Logger can be deployed stand-alone to receive events from syslog messages or log files, or to receive events in Common Event Format from SmartConnectors. Logger can forward selected events as syslog messages to ESM. Multiple Loggers work together to scale up to support high sustained input rates. Event queries are distributed across a peer network of Loggers.

Content, Solutions, and CIPs for ESM and Logger

ArcSight ESM Compliance Insight Package for the Payment Card Industry (PCI) version 4.1 is now generally available. It can be downloaded by licensed customers from the HP support web site. The solution guide and release notes can be found here.

What’s New?

ESM Compliance Insight Package for PCI 4.1 contains the following important updates:

  • Support for PCI requirements specified in Payment Card Industry Data Security Standard 3.2 (PCI DSS 3.2)
  • Support for logs generated by applications subject to Payment Application Data Security Standard 3.2 (PA DSS 3.2)

About ESM Compliance Insight Package for PCI:

The ESM Compliance Insight Package for PCI provides a system of reports and real-time checks specifically designed to monitor systems that contain cardholder data, manage vulnerability and access control, monitor networks, and maintain security policies to help demonstrate to stakeholders and auditors that the controls over your company’s credit card data systems expose little or no risk.


ESM uses objects called resources to manage event-processing logic. A resource defines the properties, values, and relationships used to configure the functions that ESM performs. Resources can also be the output of such a configuration (such as archived reports, or Pattern Discovery snapshots and patterns).

ESM has more than 30 different types of resources and comes with hundreds of these resources already configured to give you functionality as soon as the product is installed. These resources are presented in the Navigator panel of the ArcSight Console.

Modeling Resources

“The Network Model” on page 120 enables you to build a business-oriented view of data derived from physical information systems. These distinctions help ESM to clearly identify events in your network, providing additional layers of detail for correlation. “The Actor Model” on page 146 creates a real-time user model that maps humans or agents to activity in applications and on the network. Once the actor model is in place, you can use category models to visualize relationships among actors, and correlation to determine if their activity is above board.

  • Assets
  • Asset Ranges
  • Asset Categories
  • Zones
  • Networks
  • Customers
  • Vulnerabilities
  • Locations
  • Actors
  • Category Models

Correlation Resources

Correlation is a process that discovers the relationships between events, infers the significance of those relationships, prioritizes them, then provides a framework for taking action.

  • Filters
  • Rules
  • Data Monitors
  • Active Lists
  • Session Lists
  • Integration Commands
  • Pattern Discovery

Monitoring and Investigation Resources

Active channels and dashboards are tools that monitor all the activity that ESM processes for your network. Each of these views enables you to drill down on a particular event or series of events in order to investigate their details. Saved searches are those you run on a regular basis. They include query statements, the associated field set, and a specified time range. Search filters contain only the query statements. You define and save searches and search filters in the ArcSight Command Center, and export these resources as packages in the ArcSight Console.

  • Active Channels
  • Field Sets
  • Saved Searches and Search Filters
  • Dashboards
  • Query Viewers

Workflow and User Management Resources

Workflow refers to the way in which people in your organization are informed about incidents, how incidents are escalated to other users, and how incident responses are tracked.

  • Annotations
  • Cases
  • Stages
  • Users and User Groups
  • Notifications
  • Knowledge Base
  • Reference Pages

Reporting Resources

Reporting resources work together to create batch-oriented functions used to analyze incidents, find new patterns, and report on system activity.

  • Reports
  • Queries
  • Trends
  • Templates
  • Focused Reports

Administration Resources

Administration resources are tools that manage ESM’s daily maintenance and long-term health.

  • Packages
  • Files
  • Storage and storage volumes
  • Retention periods

Standard Content

Standard content is a series of coordinated resources that address common enterprise network security and ESM management tasks. Many of these resources are installed automatically with ESM to provide essential system health and status operations. Others are presented as install-time options organized by category.

  • ArcSight Administration
  • ArcSight System

Content Synchronization and Management

Content synchronization provides the ability to publish content from one ESM instance to multiple ESM instances. Synchronization is managed through the creation of supported packages, establishment of ESM subscribers, and scheduling the publication of content.

  • Packages

Normalising Event Data

Normalize means to conform to an accepted standard or norm. Because networks are heterogeneous environments, each device has a different logging format and reporting mechanism. You may also have logs from remote sites where security policies and procedures may be different, with different types of network devices, security devices, operating systems, and application logs. Because the formats are all different, it is difficult to extract information for querying without normalizing the events first. The following examples are logs from different sources that each report on the same packet traveling across the network. These logs represent a remote printer buffer overflow that connects to IIS servers over port 80.

Check Point:

“14” “21Nov2016” “12:10:29” “eth-s1p4c0” “ip.of.firewall” “log” “accept” “www-http” “” “” “tcp” “4” “1355” “” “” “” “” “” “” “” “” “” “firewall” “len 68”

Cisco Router:

Nov 21 15:10:27: %SEC-6-IPACCESSLOGP: list 102 permitted tcp ->, 1 packet

Cisco PIX:

Nov 21 2016 12:10:28: %PIX-6-302001: Built inbound TCP connection 125891 for faddr gaddr laddr


Snort:

[**] [1:971:1] WEB-IIS ISAPI .printer access [**] [Classification: Attempted Information Leak] [Priority: 3] 11/21-12:10:29.100000 -> TCP TTL:63 TOS:0x0 ID:5752 IpLen:20 DgmLen:1234 DF ***AP*** Seq: 0xB13810DC Ack: 0xC5D2E066 Win: 0x7D78 TcpLen: 32 TCP Options (3) => NOP NOP TS: 493412860 0 [Xref => http://cve.mitre.org/cgi-bin/cvename.cgi?name=CAN2001-0241] [Xref => http://www.whitehats.com/info/IDS533]

In order to productively store this diverse data in a common data store, SmartConnectors evaluate which fields are relevant and arrange them in a common schema. The choice of fields is content driven, not based on syntactic differences between what Check Point may call target address and what Cisco calls destination address. To normalize, SmartConnectors use a parser to pull out those values from the event and populate the corresponding fields in the schema. Here is a very simple example of these same alerts after they have been normalized.

[Screenshot: normalized event table from ESM 101]
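As a minimal sketch of the parsing step described above: each source-specific parser pulls values out of the raw record and places them into common schema fields. The field names and regex here are hypothetical and greatly simplified; real SmartConnector parsers are far more extensive.

```python
import re

def normalize_cisco(line):
    """Parse a Cisco IOS ACL log line into a (hypothetical) common schema."""
    m = re.search(r"permitted (\w+)", line)
    return {
        "deviceVendor": "Cisco",
        "transportProtocol": m.group(1) if m else None,
        "name": "ACL permit",
    }

def normalize_checkpoint(fields):
    """Map a Check Point record (already split into fields) onto the same schema."""
    return {
        "deviceVendor": "Check Point",
        "deviceAction": fields[6],          # e.g. "accept"
        "applicationProtocol": fields[7],   # e.g. "www-http"
    }

# Two very different raw formats end up with the same schema keys.
print(normalize_cisco("%SEC-6-IPACCESSLOGP: list 102 permitted tcp")["transportProtocol"])  # tcp
```

The point is that downstream queries only ever see the common field names, never each vendor's native syntax.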

Time stamp

Another factor in normalization is converting timestamps to a common format. Since the devices may all use different time zones, ESM normalization converts the timestamps to UTC (GMT).
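The timezone conversion itself is straightforward; a sketch using only the Python standard library (the UTC-5 offset is just an example device time zone):

```python
from datetime import datetime, timezone, timedelta

# A device in UTC-5 reports a local timestamp; normalization stores it in UTC.
local = datetime(2016, 11, 21, 12, 10, 29, tzinfo=timezone(timedelta(hours=-5)))
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2016-11-21T17:10:29+00:00
```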

Event Severity

During the normalization process, the SmartConnector collects data about the level of danger associated with a particular event as interpreted by the data source that reported the event to the connector. These data points, device severity and agent severity, become factors in calculating the event’s overall priority described in “Evaluate the Priority Formula” on page 41.

Device severity captures the language used by the data source to describe its interpretation of the danger posed by a particular event. For example, if a network IDS detects a DHCP packet that does not contain enough data to conform to the DHCP format, the device flags this as a high-priority exploit.

Agent severity is the translation of the device severity into ESM-normalized values. For example, Snort uses a device severity scale of 1-10, whereas Check Point uses a scale of high, medium, and low. ESM normalizes these values into a single agent severity scale. The default ESM scale is Low, Medium, High, and Very High. An event can also be classified as AgentSeverity Unknown if the data source did not provide a severity rating.
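A hedged sketch of that translation. The cut-off points chosen for the Snort 1-10 scale are illustrative only, not ArcSight's actual mapping:

```python
def agent_severity(vendor, device_severity):
    """Translate a vendor-specific device severity onto one agent-severity scale."""
    if vendor == "Snort":                        # numeric 1-10 scale
        n = int(device_severity)
        if n >= 9:
            return "Very High"
        if n >= 7:
            return "High"
        if n >= 4:
            return "Medium"
        return "Low"
    if vendor == "Check Point":                  # textual high/medium/low scale
        return {"high": "High", "medium": "Medium", "low": "Low"}.get(
            str(device_severity).lower(), "Unknown")
    return "Unknown"                             # source gave no usable rating

print(agent_severity("Snort", 8))            # High
print(agent_severity("Check Point", "low"))  # Low
```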

Event Categories

Like the logs themselves, different security devices also include a model for describing the characteristics of the events they process. But no two devices or vendors use the same event-characteristic model. To solve this problem, ArcSight has developed a common model for describing events, which enables you to understand the real significance of a particular event as reported from different devices. This common model also enables you to write device-independent content that can correlate events with normalized characteristics. This model is expressed as event categories, and the SmartConnector assigns them using default criteria, which can be configured during connector setup. Event categories are a series of six criteria that translate the core meaning of an event from the system that generated it into a common format. These six criteria, taken individually or together, are a central tool in ESM’s analysis capability.
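For concreteness, the six category criteria can be pictured as six fields on a normalized event. The field names below follow ArcSight's general categorization scheme (object, behavior, outcome, technique, device group, significance), but the specific values shown are illustrative, not taken from a real categorization file:

```python
# Illustrative category assignment for a failed log-in event; values are
# examples only, in the style of ArcSight category paths.
event_categories = {
    "categoryObject": "/Host/Operating System",
    "categoryBehavior": "/Authentication/Verify",
    "categoryOutcome": "/Failure",
    "categoryTechnique": "/Brute Force",
    "categoryDeviceGroup": "/Operating System",
    "categorySignificance": "/Suspicious",
}
print(len(event_categories))  # 6
```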


Correlation is a four-dimensional process that draws upon the network model, the priority formula, and optionally, Pattern Discovery to discover, infer meaning, prioritize, and act upon events that meet specific conditions. For example, various systems on a network may report the following events:

  • UNIX operating system: multiple failed log-ins
  • IDS: attempted brute force attack
  • Windows operating systems: multiple failed log-ins

A correlation rule puts these data points together and detects five or more failed log-ins in a one-minute period targeting the same source. Based on these facts, this combination of events is considered an attempted brute force attack. The Windows operating system next reports a successful log-in from the same source. The attempted brute force attack followed by a successful log-in from the same source elevates the risk that the attack may have been successful. To verify whether an attack was successful, you can analyze the volume of traffic going to the Windows target. In this case, a sudden spike in traffic to this target can verify that a brute force attack was successful. ESM’s correlation tools use statistical analysis, Boolean logic, and aggregation to find events with particular characteristics you specify. Rules can then take automated action to protect your network.
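The "five or more failed log-ins in one minute from the same source" rule can be sketched as a sliding-window aggregation. This is a toy illustration with a made-up event shape; a real ESM rule also joins the IDS and Windows events and drives follow-up actions:

```python
from collections import defaultdict

WINDOW = 60  # correlation window, in seconds

def detect_brute_force(events):
    """events: iterable of (timestamp_seconds, source_ip, event_name).
    Returns (source_ip, timestamp) alerts whenever a source accumulates
    five or more failed log-ins inside the one-minute window."""
    failures = defaultdict(list)   # source_ip -> recent failure timestamps
    alerts = []
    for ts, src, name in sorted(events):
        if name != "failed-login":
            continue
        recent = [t for t in failures[src] if ts - t <= WINDOW]
        recent.append(ts)
        failures[src] = recent
        if len(recent) >= 5:
            alerts.append((src, ts))
    return alerts

# Five failures in five seconds from one source trips the rule.
events = [(i, "10.0.0.9", "failed-login") for i in range(5)]
print(detect_brute_force(events))  # [('10.0.0.9', 4)]
```

The same window-and-threshold pattern generalizes to the other aggregation rules the text describes.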




ArcSight is developing open and extensible integration with big data analytics technologies such as Kafka and Hadoop.
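To tie this back to the Kafka discussion at the top: any producer that can format a CEF record can publish it into a Transformation Hub topic. Below is a sketch of building a CEF:0 string. The header layout and escaping follow the public CEF specification; the topic name and the commented-out producer wiring are illustrative assumptions, not ArcSight-documented values.

```python
def to_cef(vendor, product, version, sig_id, name, severity, **ext):
    """Format a CEF:0 record: pipe-delimited header, then key=value extensions."""
    def esc_hdr(s):
        # Header fields escape backslash and pipe.
        return s.replace("\\", "\\\\").replace("|", "\\|")

    def esc_ext(s):
        # Extension values escape backslash and equals sign.
        return str(s).replace("\\", "\\\\").replace("=", "\\=")

    header = "|".join(esc_hdr(s) for s in
                      ("CEF:0", vendor, product, version, sig_id, name, str(severity)))
    extension = " ".join(f"{k}={esc_ext(v)}" for k, v in ext.items())
    return f"{header}|{extension}"

msg = to_cef("ArcSight", "Demo", "1.0", "100", "Failed login", 5,
             src="10.0.0.9", dst="10.0.0.1")
print(msg)
# CEF:0|ArcSight|Demo|1.0|100|Failed login|5|src=10.0.0.9 dst=10.0.0.1

# With kafka-python installed and a broker reachable, the record could be
# produced into a CEF topic (hypothetical host and topic name):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="th-host:9092")
# producer.send("th-cef", msg.encode("utf-8"))
```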