CloudWatch is used for the ingestion, storage and management of metrics.
It is a public service with a public space endpoint, accessible from AWS and from on-premises environments.
Any service using CloudWatch needs access to the public space endpoint, usually via an Internet Gateway or an Interface Endpoint within a VPC.
- Public VPCs connect to CloudWatch using an `Internet Gateway`.
- Private VPCs connect to CloudWatch using an `Interface Endpoint`.
- On-premises VMs and applications use either an Agent, the CLI or the API.
- Internet-connected applications use the CLI or the API.
Many AWS services include native `management plane integration`.
- EC2, for example, sends metric data to CloudWatch without any user configuration (externally visible metrics).
- This includes only metrics that are observable from outside the EC2 instance.
For metrics from inside instances or services, the `CloudWatch Agent is required`.
**On-Premises Integration** and **Application Integration** are done `either via Agent or via API` (custom metrics).
### CloudWatch Agent
- Injects detailed and custom metrics from an EC2 instance into CloudWatch
- Sends system, application and custom logs to CloudWatch Logs
Alarms are used to react to metrics and can notify or perform actions.
- SNS Notifications can be sent
- Auto Scaling Group can perform scaling
- EventBridge Events
Valid alarm states:
- OK
- Alarm
- Insufficient Data
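
As a rough illustration of an alarm that reacts to a metric and notifies through SNS, the sketch below builds the parameters for the CloudWatch `PutMetricAlarm` API. The helper name `build_alarm_params`, the default period and evaluation count, and any alarm, namespace or topic values you pass in are assumptions for illustration. The actual call needs the third-party `boto3` package and AWS credentials, so it is kept in a separate function.

```python
# The three state values CloudWatch reports for an alarm.
ALARM_STATES = ("OK", "ALARM", "INSUFFICIENT_DATA")


def build_alarm_params(alarm_name, namespace, metric_name, threshold,
                       sns_topic_arn, dimensions=None):
    """Build keyword arguments shaped for the PutMetricAlarm API."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": k, "Value": v}
                       for k, v in (dimensions or {}).items()],
        "Statistic": "Average",
        "Period": 300,             # seconds per evaluated datapoint (assumed)
        "EvaluationPeriods": 2,    # consecutive breaching periods (assumed)
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notified when the state becomes ALARM
    }


def create_alarm(params):
    """Create the alarm; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder runs without it
    boto3.client("cloudwatch").put_metric_alarm(**params)
```
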
Datapoint - Smallest component of CloudWatch. It consists of:
- Unit of measure (Optional)
Metric - Time ordered set of Datapoints. It includes:
- Metric Name
Namespace - CloudWatch data for multiple applications or services is segregated using a Namespace such as 'AWS/EC2' or 'AWS/Lambda'.
Dimension - Used to differentiate between datapoints of different applications or services (instances). When you add data into CloudWatch you provide:
- Unit of measure (Optional)
- Metric Name
- 0 or more dimensions
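
To make the pieces above concrete, here is a minimal sketch of publishing one custom datapoint through the API. It builds the request for CloudWatch `PutMetricData`; the helper names and any namespace, metric name and dimension values passed in are made up for illustration. The network call is kept separate because it needs the third-party `boto3` package and AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_payload(namespace, metric_name, value, unit="None",
                         dimensions=None, timestamp=None):
    """Build keyword arguments shaped for the PutMetricData API."""
    datum = {
        "MetricName": metric_name,
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,  # optional unit of measure, e.g. "Count" or "Percent"
    }
    if dimensions:  # zero or more name/value pairs
        datum["Dimensions"] = [{"Name": k, "Value": v}
                               for k, v in dimensions.items()]
    return {"Namespace": namespace, "MetricData": [datum]}


def publish(payload):
    """Send the datapoint; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder runs without it
    boto3.client("cloudwatch").put_metric_data(**payload)
```

Custom namespaces should not start with `AWS/`, which is reserved for metrics published by AWS services.
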
# CloudWatch Logs
CloudWatch Logs is a product that can store, manage and provide access to logging data for on-premises and AWS environments, including systems and applications.
- It has native support for logs from AWS products and services.
- For application or system logs from any on-premises system, you need to install the CloudWatch Agent.
CloudWatch Logs also supports ingestion of logs from:
- VPC flow logs
- Various other products like Elastic Beanstalk, ECS Container Logs, API Gateway, Lambda execution logs
- Route53 DNS request logging
## Log Events
Consists of two parts:
- Timestamp
- Raw Message
## Log Stream
A sequence of Log Events that share the same logging source, say `/var/output.log`.
If you have three EC2 instances each logging to `/var/output.log` on their own instance volume, then there are three Log Streams, one for each EC2 instance.
## Log Group
A Log Group is a container for all the Log Streams of a given logging source, say `/var/output.log`, originating from each EC2 instance. So, in this case, the Log Group will have three Log Streams.
- Log Group is used to set retention, access permissions to the logs and encryption of logs at rest using KMS
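
The relationship between Log Group, Log Stream and Log Event can be sketched with the request shape of the CloudWatch Logs `PutLogEvents` API. This is only an illustration: the group name `/var/output.log` is reused from the example above, the instance IDs used as stream names are hypothetical, and the upload itself needs the third-party `boto3` package and AWS credentials.

```python
import time


def build_log_events(log_group, log_stream, messages, now_ms=None):
    """Build keyword arguments shaped for the PutLogEvents API.

    Each log event has two parts: a timestamp (milliseconds since the
    epoch) and the raw message.
    """
    base = int(time.time() * 1000) if now_ms is None else now_ms
    events = [{"timestamp": base + i, "message": m}
              for i, m in enumerate(messages)]
    return {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "logEvents": events,  # must be in chronological order
    }


def upload(params):
    """Send the batch; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder runs without it
    boto3.client("logs").put_log_events(**params)


# One log group, one stream per instance (instance IDs are hypothetical).
batches = [build_log_events("/var/output.log", instance_id, ["service started"])
           for instance_id in ("i-aaa111", "i-bbb222", "i-ccc333")]
```
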
## Metric filters
Metric filters scan the log events in a Log Group for a pattern and can be used to generate:
- Metrics within CloudWatch
- Alarms based on those metrics
- Eventual events within EventBridge
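
As a hedged sketch of how a metric filter is defined, the helper below builds the parameters for the CloudWatch Logs `PutMetricFilter` API. The filter name, pattern, metric name and namespace you pass in are illustrative assumptions; creating the filter needs the third-party `boto3` package and AWS credentials.

```python
def build_metric_filter(log_group, filter_name, pattern, metric_name,
                        metric_namespace):
    """Build keyword arguments shaped for the PutMetricFilter API."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,  # e.g. "ERROR" matches events containing that term
        "metricTransformations": [{
            "metricName": metric_name,
            "metricNamespace": metric_namespace,
            "metricValue": "1",    # add 1 to the metric for every matching event
            "defaultValue": 0.0,   # reported when nothing matches in a period
        }],
    }


def create_metric_filter(params):
    """Create the filter; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder runs without it
    boto3.client("logs").put_metric_filter(**params)
```

An alarm can then be defined on the resulting metric like on any other CloudWatch metric.
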
## Subscription filter
Via subscription filters, CloudWatch Logs can stream log data in `realtime` to the following destinations for further delivery.
- Kinesis Data Streams and Kinesis Data Firehose
- Lambda
This is done on a `per Log Group basis`: the subscription filter is created on the Log Group. A subscription filter configures the following:
- Pattern (to limit what gets handled by that filter)
- Destination ARN (where the matching log events are sent)
- Distribution (controls how logging data gets grouped up)
After this, CloudWatch Logs must be granted permission to access that destination.
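
The three settings map onto the parameters of the CloudWatch Logs `PutSubscriptionFilter` API, sketched below. The helper name and all ARNs used with it are placeholders. For Kinesis destinations the permission is supplied as an IAM role that CloudWatch Logs assumes (`roleArn`); the call itself needs the third-party `boto3` package and AWS credentials.

```python
def build_subscription_filter(log_group, filter_name, pattern,
                              destination_arn, role_arn=None,
                              distribution="ByLogStream"):
    """Build keyword arguments shaped for the PutSubscriptionFilter API."""
    if distribution not in ("ByLogStream", "Random"):
        raise ValueError("distribution must be 'ByLogStream' or 'Random'")
    params = {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,           # empty string matches every event
        "destinationArn": destination_arn,
        "distribution": distribution,       # how events are grouped for Kinesis
    }
    if role_arn:  # role CloudWatch Logs assumes to write to the destination
        params["roleArn"] = role_arn
    return params


def create_subscription_filter(params):
    """Create the filter; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder runs without it
    boto3.client("logs").put_subscription_filter(**params)
```
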
### Destination S3 - For long term storage
In this case you can configure Kinesis Data Firehose as the destination, which allows `near realtime` delivery of logging data to S3.
- The delay is added because of the buffer (about 60 seconds)
### Destination Elasticsearch
You can configure an AWS-managed Lambda function as the destination, which allows data to be delivered in `realtime` to Elasticsearch.
### Destination Custom
You can configure destination as a Lambda function that can be used to export data to nearly any destination in `realtime`.
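
When a Lambda function is the destination, CloudWatch Logs invokes it with the batch of log events gzip-compressed and base64-encoded under `event["awslogs"]["data"]`. The sketch below decodes that payload with the standard library only; what the handler does with the messages afterwards (here it simply returns them) is an assumption standing in for your own export logic.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a Lambda subscriber."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    """Lambda entry point: extract the raw messages from the batch."""
    payload = decode_subscription_event(event)
    # payload also carries "logGroup" and "logStream" identifying the source.
    return [log_event["message"] for log_event in payload["logEvents"]]
```
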
### Destination Kinesis Data Stream
You can deliver logs to a Kinesis Data Stream in `realtime`, which allows the data to be used within other systems.
## Common Subscription Filter Architecture
Subscription filters are commonly used to aggregate logging data from multiple accounts.
Each account uses one or more subscription filters to send its data into a `Kinesis Data Stream`, which allows realtime access to the logging data.
This data can then be sent to `Kinesis Data Firehose`, which delivers it to S3 for long term storage.
## Export to S3
You can export logs to S3 via `CreateExportTask`, but this is not realtime: the operation can take up to 12 hours.
- Logs exported cannot be encrypted using KMS
- Logs can only be encrypted using S3 encryption
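
A minimal sketch of the `CreateExportTask` request is shown below. The task name, bucket and prefix passed in are hypothetical; the time range is given in milliseconds since the epoch, and the call itself needs the third-party `boto3` package and AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def to_millis(moment):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def build_export_task(task_name, log_group, start, end, bucket, prefix):
    """Build keyword arguments shaped for the CreateExportTask API."""
    return {
        "taskName": task_name,
        "logGroupName": log_group,
        "fromTime": to_millis(start),   # events older than this are skipped
        "to": to_millis(end),           # events newer than this are skipped
        "destination": bucket,          # S3 bucket name, same region as the log group
        "destinationPrefix": prefix,
    }


def start_export(params):
    """Start the export; requires boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the builder runs without it
    return boto3.client("logs").create_export_task(**params)["taskId"]


# Example: export one day of logs (all names are placeholders).
day = datetime(2024, 1, 1, tzinfo=timezone.utc)
example = build_export_task("daily-export", "/var/output.log",
                            day, day + timedelta(days=1),
                            "my-log-archive-bucket", "output-log")
```
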
To collect logs from within an EC2 instance we need the following at a minimum:
- An EC2 instance role with CloudWatch permissions
- The CloudWatch Agent installed and configured
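
The agent is driven by a JSON configuration file. The sketch below generates a small one that ships `/var/output.log` to a log group and collects memory usage; the log group name is an assumption, and `{instance_id}` is a placeholder the agent itself resolves at runtime. Treat it as a starting point rather than a complete configuration.

```python
import json


def build_agent_config(file_path, log_group):
    """Build a minimal CloudWatch Agent configuration as a dict."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [{
                        "file_path": file_path,
                        "log_group_name": log_group,
                        # One log stream per instance; resolved by the agent.
                        "log_stream_name": "{instance_id}",
                    }]
                }
            }
        },
        "metrics": {
            "metrics_collected": {
                # In-instance metric that EC2 does not expose on its own.
                "mem": {"measurement": ["mem_used_percent"]}
            }
        },
    }


config_json = json.dumps(build_agent_config("/var/output.log", "/var/output.log"),
                         indent=2)
```
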
# AWS X-Ray
Comparable to tools such as AppDynamics.
It is a distributed tracing application, `designed to track a session through an application`.
- The application could be monolithic, serverless or built on a distributed microservice architecture
- Can include API Gateway, Lambda, Elastic Beanstalk, Load Balancer or Downstream services such as DynamoDB
This gives visibility into a web application composed of multiple microservices and spread across multiple VPCs, regions and availability zones.
X-Ray can trace requests only once they enter your systems. Connectivity issues between a client and your resources cannot be traced.
## Data Captured
Traces are formed by correlating segments using a `unique trace ID`. An X-Ray trace is the set of data points that share a common trace ID.
Consider a client that initiates a new request to our application. The request is tagged with a `unique trace ID`. As the request makes its way downstream through further services in your application, the services relay information regarding the request back to X-Ray using the same `unique trace ID`.
### Tracing Header
The unique trace ID generated to track a request through our distributed application is inserted into the tracing header.
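
For HTTP requests the tracing header is `X-Amzn-Trace-Id`. A trace ID has the form `1-<8 hex digits of epoch seconds>-<24 random hex digits>`, and the header carries it as `Root=...` together with optional `Parent` and `Sampled` fields. The helper names below are illustrative; in practice the X-Ray SDK generates and propagates this header for you.

```python
import os
import time

TRACE_HEADER_NAME = "X-Amzn-Trace-Id"


def new_trace_id(now=None):
    """Generate a trace ID: version, epoch seconds in hex, 96 random bits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_trace_header(trace_id, parent_id=None, sampled=True):
    """Format the header value passed to downstream services."""
    parts = [f"Root={trace_id}"]
    if parent_id:  # ID of the segment that made the downstream call
        parts.append(f"Parent={parent_id}")
    parts.append(f"Sampled={1 if sampled else 0}")
    return ";".join(parts)


def parse_trace_header(value):
    """Split a header value into its key/value fields."""
    fields = (part.strip() for part in value.split(";"))
    return dict(field.split("=", 1) for field in fields if field)
```
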
The application sends the data in the form of segments. A segment provides the resource name, the request and response details, and details about the work done.
For an HTTP request following data might be recorded:
- **host**: hostname, alias and/or ip address
- **request**: Includes the HTTP method, client address, path and/or user agent
- **response**: Includes the status and content
- **work done**: Start and End time
- **subsegments**: Other called upon subsegments
A Segment can include Subsegments if that level of granularity is required.
Subsegments are used to record downstream calls from the point of view of the service that makes them.
X-Ray uses subsegments to identify downstream services that don't send segments.
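
A segment is sent as a JSON document. The sketch below assembles one with an HTTP block and a subsegment for a downstream call, using field names from the X-Ray segment document format. The service names, URL and timings used with it are invented for illustration, and normally the X-Ray SDK produces these documents rather than hand-written code.

```python
import os


def new_segment_id():
    """Segment and subsegment IDs are 16 hexadecimal digits."""
    return os.urandom(8).hex()


def build_subsegment(name, start, end, namespace="remote"):
    """Record a downstream call as seen by the calling service."""
    return {"name": name, "id": new_segment_id(),
            "start_time": start, "end_time": end,
            "namespace": namespace}  # "aws" for AWS SDK calls, "remote" otherwise


def build_segment(name, trace_id, start, end, method, url, status,
                  subsegments=()):
    """Assemble a segment document for one HTTP request."""
    segment = {
        "name": name,                # the resource or service name
        "id": new_segment_id(),
        "trace_id": trace_id,        # shared by every segment in the trace
        "start_time": start,         # epoch seconds, fractional
        "end_time": end,
        "http": {
            "request": {"method": method, "url": url},
            "response": {"status": status},
        },
    }
    if subsegments:
        segment["subsegments"] = list(subsegments)
    return segment
```
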
### Service Graphs
X-Ray uses all the data that the application sends to generate what is called a Service Graph:
- A JSON document detailing the services and resources which make up your application
### Service Map
X-Ray uses these service graphs to generate a service map, which is the visual version of the service graph showing traces.
## How Traces are collected for Various Services
- EC2 `(X-Ray Agent)`
- ECS `(Agent is installed as part of tasks)`
- Lambda `(Requires enabling data collection with X-Ray for the Lambda function)`
- Beanstalk `(Agent is preinstalled)`
- API Gateway `(Tracing is enabled on a per stage basis)`
- SNS & SQS `(Can be configured to send data into X-Ray)`
## X-Ray Components
- **AWS X-Ray SDK**: Used to instrument the application code.
- **AWS X-Ray Daemon**: This collects all the local trace data and batches the information up and periodically sends it over the internet to the AWS X-Ray service. _The Daemon listens on port 2000 by default for UDP connections_.
- **AWS X-Ray API**: Used to receive the collected telemetry as delivered by AWS X-Ray Daemon.
- **AWS X-Ray Console**: This is where all the visualization happens.
## AWS X-Ray Daemon
Listens on UDP port 2000.
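
To illustrate what travels over that UDP port, the sketch below frames a segment document the way the daemon expects it: a small JSON header line, a newline, then the segment JSON. The segment contents are whatever your application recorded; the function names are illustrative, and the host and port defaults assume a daemon running locally with its default settings.

```python
import json
import socket

DAEMON_HEADER = {"format": "json", "version": 1}


def encode_for_daemon(segment):
    """Frame a segment document as a single UDP datagram payload."""
    header = json.dumps(DAEMON_HEADER)
    return (header + "\n" + json.dumps(segment)).encode("utf-8")


def send_segment(segment, host="127.0.0.1", port=2000):
    """Fire-and-forget delivery to the daemon; UDP gives no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_for_daemon(segment), (host, port))
```

Because delivery is over UDP, the application is not blocked if the daemon is slow or absent; the daemon batches what it receives and uploads it to the X-Ray API.
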
Credentials used by the X-Ray Daemon can be supplied either via:
- an IAM role (EC2 environment)
- environment variables for the AWS access key ID and secret key (non-EC2 environment)
The X-Ray Daemon must be given appropriate IAM permissions to upload segment data and telemetry to the X-Ray API. This is done by attaching the relevant AWS managed policy:
`AWSXrayWriteOnlyAccess` - Write permission that authorizes the X-Ray Daemon, CLI or SDK to upload segment documents and telemetry to the X-Ray API.
`AWSXrayReadOnlyAccess` - Read permission that authorizes the X-Ray Daemon, CLI or SDK to get trace data and service maps from the X-Ray API.
# VPC Flow Logs
VPC Flow Logs is a feature that allows monitoring of traffic flowing to and from interfaces within a VPC.
- Flow Logs DON'T monitor packet contents; that requires a packet sniffer (installed on the EC2 instance).
- Flow Logs can be stored in S3 or CloudWatch Logs
Data captured includes:
- Source and Destination IP Address and Port
- Packet Size
- Account ID
- Interface ID
- And other externally available metadata
## Monitoring Points
VPC Flow Logs can be added at a
- VPC,
- Subnet or
- Interface level (ENI).
Other points to note:
- Flow Logs aren't real time.
- The destination can be S3 or CloudWatch Logs.
Traffic to and from the following is not recorded:
- `169.254.169.254` (metadata IP)
- `169.254.169.123` (AWS time synchronization server)
- Amazon DNS Server
- Amazon Windows license activation server
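What appears in the logs depends on what controlled the traffic:
- `Security Group`: Security groups are stateful, so if the traffic is allowed INBOUND, the response is automatically allowed OUTBOUND. If the security group denies the traffic, you would see only one entry (a REJECT) in the log.
- `Network ACL (NACL)`: NACLs are stateless. You would see two log entries if the Security Group allows the traffic and the NACL allows INBOUND traffic but blocks OUTBOUND traffic: an ACCEPT for the inbound request and a REJECT for the outbound response.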
Sample filter pattern over the default log format (this one matches rejected TCP traffic to port 22):
`[version, account, eni, source, destination, srcport, destport="22", protocol="6", packets, bytes, windowstart, windowend, action="REJECT", flowlogstatus]`
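
The same check can be sketched in code. A record in the default format is a space-separated line with 14 fields in the order shown above; the parser below splits it into named fields and applies the equivalent of that pattern. The sample record is made up for illustration, and the field names follow the default flow log format rather than the shorthand used in the pattern.

```python
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_record(line):
    """Split a default-format flow log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_rejected_ssh(record):
    """Equivalent of destport="22", protocol="6", action="REJECT"."""
    return (record["dstport"] == "22"
            and record["protocol"] == "6"   # IANA protocol number for TCP
            and record["action"] == "REJECT")


# Made-up sample record: a rejected SSH connection attempt.
sample = ("2 123456789010 eni-1235b8ca123456789 203.0.113.12 172.31.16.139 "
          "49761 22 6 20 4249 1418530010 1418530070 REJECT OK")
record = parse_flow_record(sample)
```
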