AWS EKK Log System Setup: Elasticsearch + Kinesis + Kibana Hands-On Guide
Step-by-step tutorial for building an EKK log collection system on AWS using Amazon Elasticsearch Service, Kinesis, and Kibana to collect and analyze Nginx access logs with custom field parsing
Tags: Elasticsearch, AWS, Kinesis, Kibana, Log Analysis
2018-09-12
EKK is a log collection stack built entirely on AWS managed services: Amazon Elasticsearch Service, Amazon Kinesis, and Kibana. Compared to a self-managed ELK stack, EKK is significantly easier to set up and maintain since AWS handles the infrastructure. Here is the basic architecture:
(Architecture diagram: Nginx access logs -> Kinesis Agent -> Kinesis Firehose -> Amazon Elasticsearch Service -> Kibana)
This guide focuses on the practical aspects of collecting Nginx logs and getting them into Elasticsearch with the correct field mappings, rather than covering every AWS console click.
Prerequisites
- Launch an EC2 instance (Ubuntu 16.04) with Nginx installed. Configure a custom log format:
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'$connection "$upstream_addr" '
'upstream_response_time $upstream_response_time request_time $request_time';
This produces access log entries like:
192.168.13.1 - - [12/Sep/2018:03:59:12 +0000] "GET /v1/home HTTP/1.1" 200 2787 "https://test.com/product/example.html" "Mozilla/5.0 ..." "2002:c7:6f02:9801:..." 12340 "127.0.0.1:9000" upstream_response_time 0.11 request_time 0.11
- Create an IAM user with permissions to access Kinesis Data Streams/Firehose and Amazon Elasticsearch Service. Save the awsAccessKeyId and awsSecretAccessKey.
- Launch an Amazon Elasticsearch Service domain (e.g., "TestES"). Choose public access during creation; you can tighten the access policy later.
- Create a Kinesis Firehose delivery stream with the destination set to your ES domain. If you need to fan out one source to multiple destinations, use Kinesis Data Streams instead.
Collecting Logs with Kinesis Agent
Install the Amazon Kinesis Agent
# Clone the source and enter the repository
git clone https://github.com/awslabs/amazon-kinesis-agent.git
cd amazon-kinesis-agent
# Install a Java JDK (required; not preinstalled on Ubuntu 16.04)
sudo apt-get install openjdk-8-jdk
# Run the installer
sudo ./setup --install
Configure the Agent
Edit /etc/aws-kinesis/agent.json. The default config is a starting point, but here are two production-ready configurations.
Configuration 1: Kinesis Firehose with custom log parsing
{
"awsAccessKeyId": "YOUR_ACCESS_KEY",
"awsSecretAccessKey": "YOUR_SECRET_KEY",
"cloudwatch.emitMetrics": false,
"firehose.endpoint": "firehose.us-west-2.amazonaws.com",
"cloudwatch.endpoint": "https://monitoring.us-west-2.amazonaws.com",
"kinesis.endpoint": "https://kinesis.us-west-2.amazonaws.com",
"flows": [
{
"filePattern": "/usr/local/programs/nginx/logs/access.log",
"deliveryStream": "api-nginx-access-log",
"dataProcessingOptions": [
{
"optionName": "LOGTOJSON",
"logFormat": "COMMONAPACHELOG",
"matchPattern": "^([\\d.]+) \\S+ \\S+ \\[([\\w:/]+)\\s[+\\-]\\d{4}\\] \"([A-Z]+) (.+?) ([\\w./]+)\" (\\d{3}) (\\d+) \"(.+?)\" \"(.+?)\" \"(.+?)\" (\\d+) \"(.+?)\" upstream_response_time (\\d.+) request_time (\\d.+)",
"customFieldNames": [
"remote_addr", "datetime", "request_type", "request_url",
"http_version", "response_status", "body_bytes_sent",
"http_referer", "http_user_agent", "http_x_forwarded_for",
"connection_serial_number", "upstream_addr",
"upstream_response_time", "request_time"
]
}
]
}
]
}
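The agent pairs capture groups with customFieldNames positionally, so a mismatch between the two silently produces wrong field values. As a quick sanity check (a standalone sketch, not part of the agent itself), the matchPattern string from the agent.json above can be pasted into Java unchanged, because JSON and Java string literals escape backslashes the same way:

```java
import java.util.regex.Pattern;

public class GroupCountCheck {
    public static void main(String[] args) {
        // matchPattern copied verbatim from agent.json: JSON "\\d" and a
        // Java string literal "\\d" both denote the regex token \d
        String matchPattern = "^([\\d.]+) \\S+ \\S+ \\[([\\w:/]+)\\s[+\\-]\\d{4}\\] \"([A-Z]+) (.+?) ([\\w./]+)\" (\\d{3}) (\\d+) \"(.+?)\" \"(.+?)\" \"(.+?)\" (\\d+) \"(.+?)\" upstream_response_time (\\d.+) request_time (\\d.+)";
        // groupCount() returns the number of capture groups in the pattern;
        // it must equal customFieldNames.length (14 in the config above)
        int groups = Pattern.compile(matchPattern).matcher("").groupCount();
        System.out.println("capture groups: " + groups); // prints 14
    }
}
```

If the two counts ever drift apart after editing the pattern, this check catches it before the agent ships misnamed fields to Elasticsearch.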
Configuration 2: Kinesis Data Streams with default parsing
{
"awsAccessKeyId": "YOUR_ACCESS_KEY",
"awsSecretAccessKey": "YOUR_SECRET_KEY",
"cloudwatch.emitMetrics": false,
"cloudwatch.endpoint": "https://monitoring.us-west-2.amazonaws.com",
"kinesis.endpoint": "https://kinesis.us-west-2.amazonaws.com",
"flows": [
{
"filePattern": "/usr/local/programs/nginx/logs/access.log",
"kinesisStream": "api-nginx-access-log",
"partitionKeyOption": "RANDOM",
"dataProcessingOptions": [
{
"optionName": "LOGTOJSON",
"logFormat": "COMMONAPACHELOG"
}
]
}
]
}
Key Configuration Notes
- Endpoints are region-specific. Find yours at the AWS endpoint reference.
- The default COMMONAPACHELOG format won't parse custom fields like upstream_response_time. You need a custom matchPattern regex with customFieldNames.
- The Kinesis Agent is a Java application, so the matchPattern must use Java regex syntax. Patterns that work in PCRE-based testers can fail here, which is an easy mistake to make.
- For detailed configuration options, see the official documentation.
Debugging Your Regex
Since getting the regex right is critical, here is a useful approach. Use an online Java code runner to test your pattern:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main(String[] args) {
String line = "132.31.43.24 - - [12/Sep/2018:05:58:36 +0000] \"POST /v1/tracks/hello HTTP/1.1\" 200 79993 \"https://test.com/page\" \"Mozilla/5.0 ...\" \"17.47.23.134, 12.128.106.104\" 74518 \"127.0.0.1:9000\" upstream_response_time 20.186 request_time 0.186";
String pattern = "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"([A-Z]+) (.+?) ([\\w./]+)\" (\\d{3}) (\\d+) \"(.+?)\" \"(.+?)\" \"(.+?)\" (\\d+) \"(.+?)\" upstream_response_time (\\d.+) request_time (\\d.+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("remote_addr: " + m.group(1));
System.out.println("datetime: " + m.group(4));
System.out.println("request_type: " + m.group(5));
System.out.println("response_status: " + m.group(8));
// ... test all groups
} else {
System.out.println("NO MATCH");
}
}
}
Remember to escape Java strings properly: each backslash in the regex is written as \\ in a Java string literal, and JSON requires the same doubling in agent.json. Once your regex matches correctly, copy it into the matchPattern field in agent.json.
Agent Service Commands
sudo service aws-kinesis-agent start     # Start
sudo service aws-kinesis-agent restart   # Restart (after config changes)
sudo service aws-kinesis-agent status    # Check status
tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log   # Follow the agent log to verify records are being parsed and sent
Configuring the ES Index Template
The data pipeline (Agent -> Firehose -> ES) works out of the box, but there is a field type problem: everything arrives as text type. For example, you probably want datetime as a date type and body_bytes_sent as long. The solution is an index template.
Create a template that automatically applies to matching index names:
curl -XPUT https://your-es-endpoint:9200/_template/nginx-access-log_template \
-H 'Content-Type: application/json' -d '{
"template": "*-nginx-access-log-*",
"mappings": {
"log": {
"_all": { "enabled": false },
"properties": {
"remote_addr": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"request_type": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"request_url": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"http_version": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"response_status": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"body_bytes_sent": { "type": "long" },
"http_referer": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"http_user_agent": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"http_x_forwarded_for": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"connection_serial_number": { "type": "long" },
"upstream_addr": { "type": "text", "fields": { "keyword": { "type": "keyword" }}},
"upstream_response_time": { "type": "double" },
"request_time": { "type": "double" },
"datetime": { "type": "date", "format": "dd/MMM/YYYY:HH:mm:ss" }
}
}
}
}'
The template name is nginx-access-log_template, and the pattern *-nginx-access-log-* means any index matching that glob (e.g., api-nginx-access-log-2018-08-02) will automatically get these field mappings.
Watch out for the datetime format. Nginx uses time_local by default (e.g., 12/Sep/2018:03:59:12), which requires the format dd/MMM/YYYY:HH:mm:ss – not the more common ISO 8601 format. This is a subtle gotcha that can cost you hours of debugging.
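To double-check the captured timestamp against a date pattern locally before touching the template, a small java.time sketch works. One caveat to note: java.time uses lowercase yyyy for the calendar year (uppercase YYYY there means week-based year), whereas the Joda-style format strings Elasticsearch accepted at the time treat uppercase Y differently; when in doubt, yyyy is the safe spelling in both.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class DateFormatCheck {
    public static void main(String[] args) {
        // $time_local as captured by the matchPattern (the timezone offset
        // sits outside the capture group, so it is not part of the value)
        String sample = "12/Sep/2018:03:59:12";
        // Locale.ENGLISH ensures "Sep" parses regardless of the system locale
        DateTimeFormatter fmt =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
        LocalDateTime parsed = LocalDateTime.parse(sample, fmt);
        System.out.println(parsed); // 2018-09-12T03:59:12
    }
}
```

If the pattern and the sample disagree, parse() throws a DateTimeParseException immediately, which is a much faster feedback loop than waiting for Elasticsearch mapping errors.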
Result
Here is a screenshot of the logs flowing into Kibana:

Known Limitations
- No geolocation data (country information is missing)
- Browser and device details are not parsed from the user agent string
- Solution approach: Add a Lambda function between Firehose and ES to enrich the data with GeoIP lookups and user agent parsing
Related Articles
- Elasticsearch Tutorial: Core Concepts of Indices, Documents, and Query APIs - Deep dive into ES fundamentals and query syntax
- ELK Stack Setup Guide: Elasticsearch + Logstash + Kibana + Kafka Full Architecture - Complete enterprise logging platform deployment