Pav­ing the way to suc­cess­ful Log Man­age­ment meta­morph­osis – Deploy­ing Gray­log in AWS (Ubuntu 20.04 LTS)



Today's shift towards Cloud Computing, Continuous Integration/Continuous Delivery (CI/CD) and DevOps, together with the move from monolithic to lightweight micro-service architecture patterns, is enabling organisations to speed up the development and deployment of production applications.

This paradigm shift also comes with shortcomings. Distributed logs, together with the proliferation of instances and containers, make log management and monitoring much more of a challenge. It is not only the sheer volume of interconnected data points across modularised, distributed systems that has to be considered: the structured and semi-structured log data also needs to be parsed, normalised and analysed in real time. As micro-services run on multiple hosts, the log messages they generate are spread across multiple servers. Finding valuable information, or tracking errors to their source for correction, amidst so many log files exceeds human abilities (not to mention auto-scaled environments). For organisations embarking on this journey, successful development and operations in many ways come down to successful Log Management, which grants full visibility into the health of micro-service environments and fulfils logging and monitoring requirements for compliance.

What is Log Data?

Logs or log files can be described as the lingua franca of computer systems, software and other network apparatus. They are emitted in response to an event occurring within a system or network.

In gen­eral, a log file con­sists of 3 attributes:

  1. Timestamp – the time and date the mes­sage was generated.
  2. Source sys­tem – the appar­atus cre­at­ing the log file.
  3. Log mes­sage – the actual log data.
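To make the three attributes concrete, here is a minimal shell sketch that splits one log line into timestamp, source system and message. The sample line, host name and variable names are made up for illustration, and the classic syslog layout is only one of many formats found in practice.

```shell
# Hypothetical sample line in the classic syslog layout
line='Jan 12 06:25:41 web01 sshd[1021]: Accepted publickey for ubuntu'

# read splits on whitespace; the last variable receives the rest of the line
read -r month day clock host message <<EOF
$line
EOF

timestamp="$month $day $clock"
echo "timestamp: $timestamp"
echo "source:    $host"
echo "message:   $message"
```

Any other log format would need its own splitting rule, which is exactly the normalisation work a log management tool takes over.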

NIST [1] categorises logs into three types: security software logs, operating system logs and application logs. Yet there is no standardisation of the file extension or of the schema of the log data, i.e. content, format or severity. As a result, each system, application or network device generates different log files in different formats.

Get­ting Insights from Log Data

Central Log Management is essential when organisations move towards Cloud Computing and lightweight micro-service architectures. First and foremost, a holistic view of the log data generated across the enterprise infrastructure reduces complexity and is much more powerful than analysing log data in isolation. On the other hand, ingesting log data from different sources brings the lack of standardisation to the fore, which makes it exceptionally hard or even infeasible to analyse log events side by side without a tool. A Log Management solution is required to centralise, correlate and analyse all log files, so that the data hidden in logs is turned into meaningful, actionable insights. Besides the tool itself, a systematic and comprehensive approach is required to analyse log data from the entire infrastructure stack. Typical use cases facilitated by centralised log management solutions are:

Real-time Monitoring and Troubleshooting: Accumulating all performance and error log data in one central location and making it accessible to authorised users plays a crucial role in reducing MTTR (mean time to recovery), both through time-efficient, proactive monitoring and by breaking down the barriers between IT Ops and developers. Automated monitoring and troubleshooting help to assure application and infrastructure health: logs are tailed in real time to pinpoint and alert on operational problems, which can then be drilled down to their root cause.

Event Correlation: A powerful analysis technique that relates various log events to one another to reveal identifiable patterns. If those patterns indicate anomalies, automated actions (e.g. alerts or alarms) can be triggered based on defined conditions and rules. Event correlation is typically used to identify indicators of an attack, enabling security professionals to detect and alert on threats.

Compliance and Regulations: It is not only the rapidly evolving technology landscape that has reinforced the need for a log management solution. Security and compliance regulations and standards, e.g. PCI DSS, ISO 27002 or GDPR, require organisations to collect, retain and protect log data and to make it available for auditing.
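As a very small stand-in for event correlation, the sketch below counts failed logins per source address across several hosts and reports addresses that reach a threshold. The log lines, host names, addresses and function names are all hypothetical. A real log management system applies far richer rules than this.

```shell
# Hypothetical sample: "<timestamp> <host> <message>" lines from three servers
sample_log() {
cat <<'EOF'
2021-03-01T10:00:01Z web01 Failed password for root from 198.51.100.7
2021-03-01T10:00:03Z web02 Failed password for admin from 198.51.100.7
2021-03-01T10:00:04Z web01 Accepted publickey for ubuntu from 203.0.113.5
2021-03-01T10:00:09Z db01 Failed password for root from 198.51.100.7
2021-03-01T10:00:12Z web02 Failed password for test from 192.0.2.44
EOF
}

# Print every source address with at least $1 failed logins across all hosts.
correlate_failed_logins() {
  awk -v threshold="$1" '
    /Failed password/ { count[$NF]++ }
    END { for (ip in count) if (count[ip] >= threshold) print ip, count[ip] }
  '
}

sample_log | correlate_failed_logins 3
```

No single host sees more than two failures here. The pattern only becomes visible once the events from all hosts are looked at together.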

Despite COVID-19 hitting enterprise wallets, the demand for Log Management solutions is still anticipated to grow. Data Insights is a holistic solution integrator with extensive expertise in Log Management. The following sections demonstrate how to install Graylog in AWS on an Ubuntu 20.04 LTS machine.

Gray­log as Cent­ral Log Man­age­ment Solution

Graylog is a powerful open-source, enterprise-grade log management solution, providing an integrated platform for the collection, storage, normalisation, search, analysis and visualisation of log data from across the entire IT infrastructure and application stack on a centralised server. The software operates on a three-tier architecture with scalable storage, built around Elasticsearch and MongoDB.
The min­imum sys­tem setup con­sists of the Gray­log web inter­face, Gray­log server, Elast­ic­search nodes to store log data and provide search cap­ab­il­it­ies to Gray­log, and Mon­goDB to store con­fig­ur­a­tion data.

Fig­ure 1: Source: Graylog

Guide: How to Deploy Gray­log in AWS (Ubuntu 20.04 LTS)

So, here we go. This brief tutorial takes you step-by-step through the pro­cess of installing a Gray­log server in AWS on a clean Ubuntu 20.04 LTS machine, and the con­fig­ur­a­tion of a simple input that receives sys­tem logs.

  • Step 1: Deploy Ubuntu 20.04 LTS server in AWS
  • Step 2: Install Open­JDK, Mon­goDB, Elasticsearch
  • Step 3: Install Graylog
  • Step 4: Setup Sys­log Input

Note: This tutorial does not cover secur­ity set­tings! Make sure the Gray­log server is not pub­licly exposed, and (enter­prise) secur­ity best prac­tices and guidelines are followed.

Step 1: Deploy Ubuntu 20.04 LTS machine in AWS

  1. Launch an EC2 instance
    Log in to the AWS console and, in the top navigation bar, go to > Services > EC2 > Choose Amazon Machine Image (AMI)
    Select > Ubuntu Server 20.04 LTS (HVM), SSD Volume Type
  2. Choose an Instance Type
    Graylog requires at least 4 GB of memory; increase the RAM depending on the data volume you intend to collect.
  3. Fin­ish con­fig­ur­a­tion wiz­ard and spin up the Vir­tual Machine (VM)
    Note: Add Storage. In this tutorial, we are only doing a basic setup with our VM's syslog input.
    Optional: If you aim to configure more inputs, increase the disk storage to at least 40 GB.
    Go to > Launch
    Choose > Create a new key pair, save the .pem file locally and select > Launch instances
  4. Configure Security Group
    Once the instance is launched, click on the newly launched instance to open the instance overview page. In the description section, select the default security group that was created by the configuration wizard. Open ports: 9000, 514, 1514 (Select > Source > Anywhere)
  5. Alloc­ate Elastic IP address
    In the nav­ig­a­tion pane – go to > Elastic IPs
    Select > Allocate new address > Amazon's pool (IPv4 address pool) > Allocate
    Go to > Actions > Associate address
    Select > launched_instance > Associate
  6. SSH to launched AWS instance as Ubuntu user
    In the EC2 instance over­view dash­board, select the launched_instance and go to > Con­nect – and fol­low the ter­minal instructions.
  7. Update Ubuntu machine
    To update the Ubuntu machine, run the commands below:
    sudo apt-get update
    sudo apt-get upgrade

When promp­ted enter y
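The security group rules from item 4 can also be added from the command line. The sketch below assumes the AWS CLI is installed and configured. The group ID and CIDR range are hypothetical placeholders, and UDP is assumed for the two syslog ports. It restricts the source to a trusted range rather than Anywhere, in line with the security note above. By default it only prints the commands (dry run).

```shell
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# open_port GROUP_ID PROTOCOL PORT CIDR
open_port() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol "$2" --port "$3" --cidr "$4"
}

SG_ID="sg-0123456789abcdef0"   # hypothetical security group ID
ADMIN_CIDR="203.0.113.0/24"    # hypothetical trusted range (documentation prefix)

open_port "$SG_ID" tcp 9000 "$ADMIN_CIDR"   # Graylog web interface
open_port "$SG_ID" udp 514  "$ADMIN_CIDR"   # standard syslog port
open_port "$SG_ID" udp 1514 "$ADMIN_CIDR"   # syslog input used in Step 4
```

Setting DRY_RUN=0 before running the script would execute the commands against your account, so review the printed output first.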

Step 2: Install Open­JDK, Mon­goDB and Elasticsearch

Since Elasticsearch is Java-based software, Java must be installed before Elasticsearch can run.

Open­JDK Installation

To install the open-source version of Java, run the commands below:

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install apt-transport-https openjdk-8-jre-headless uuid-runtime pwgen

When promp­ted enter y

To verify the Java installation, run the command below:

java -version

The output should look similar to the screenshot below:

[Screenshot: output of java -version]

Mon­goDB Installation

To install Mon­goDB – run the com­mands below:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4
echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org

To start MongoDB automatically at system startup and verify it is running, run the commands below:

sudo systemctl daemon-reload
sudo systemctl enable mongod.service
sudo systemctl restart mongod.service
sudo systemctl --type=service --state=active | grep mongod
sudo systemctl status mongod

The output should look similar to the screenshot below:

[Screenshot: output of systemctl status mongod]
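If a one-line answer is preferred over the full status output, a small helper around systemctl is-active can report whether a unit is running. This is only a sketch; the function name report_service is made up for illustration.

```shell
# Print "<unit>: running" or "<unit>: NOT running" and return a matching exit code.
report_service() {
  if systemctl is-active --quiet "$1"; then
    echo "$1: running"
  else
    echo "$1: NOT running"
    return 1
  fi
}

# Example call on the Graylog machine:
#   report_service mongod
```

The same helper can be reused for elasticsearch and graylog-server once those services are installed in the later steps.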


Elast­ic­search Installation

To install the open-source ver­sion of Elast­ic­search 6.x – run the com­mands below:

wget -q https://artifacts.elastic.co/GPG-KEY-elasticsearch -O myKey
sudo apt-key add myKey
echo "deb https://artifacts.elastic.co/packages/oss-6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
sudo apt-get update && sudo apt-get install elasticsearch-oss

To modify the Elast­ic­search con­fig file – run the com­mand below:

sudo tee -a /etc/elasticsearch/elasticsearch.yml > /dev/null <<EOT
cluster.name: graylog
action.auto_create_index: false
EOT

To start Elasticsearch and verify it is running, run the commands below:

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl restart elasticsearch.service
sudo systemctl --type=service --state=active | grep elasticsearch
sudo systemctl status elasticsearch.service

The output should look similar to the screenshot below:

[Screenshot: output of systemctl status elasticsearch.service]
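Beyond the service status, it can be worth checking that the running node actually picked up the cluster name written to elasticsearch.yml. The sketch below assumes Elasticsearch answers on its default HTTP port 9200 on localhost and that curl is installed. The function names are made up for illustration.

```shell
# Print the cluster name reported by the Elasticsearch root endpoint.
# Assumes the default HTTP port 9200 on localhost unless a URL is passed.
es_cluster_name() {
  curl -s "${1:-http://127.0.0.1:9200}" |
    sed -n 's/.*"cluster_name" *: *"\([^"]*\)".*/\1/p'
}

# check_cluster_name EXPECTED [URL]
# Succeeds only if the running node reports the expected cluster name.
check_cluster_name() {
  [ "$(es_cluster_name "$2")" = "$1" ]
}

# Example call on the Graylog machine:
#   check_cluster_name graylog && echo "cluster name OK"
```

Elasticsearch can take a little while to start, so an empty result right after a restart does not necessarily indicate a misconfiguration.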

Step 3: Gray­log Installation

By now, Java, Elast­ic­search and Mon­goDB are installed and configured.

To install the Gray­log repos­it­ory con­fig­ur­a­tion and Gray­log itself – run the com­mands below:

wget https://packages.graylog2.org/repo/packages/graylog-3.3-repository_latest.deb
sudo dpkg -i graylog-3.3-repository_latest.deb
sudo apt-get update && sudo apt-get install graylog-server graylog-enterprise-plugins graylog-integrations-plugins graylog-enterprise-integrations-plugins

Con­fig­ure Graylog

To generate a secret key of at least 64 characters, run the command below. The key is used as password_secret and, in this tutorial, also serves as the administrator account password:

pwgen -N 1 -s 96

To set a hashed (SHA-256) password for the root user, copy the generated secret key and create its SHA-256 checksum by running the command below (replace secret_key with the previously generated key):

echo -n secret_key | sha256sum
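The two values can also be produced by a pair of small helper functions, which avoids copy-and-paste mistakes. This is a sketch under stated assumptions: the function names are made up, sha256sum from GNU coreutils is assumed to be present, and the fallback to /dev/urandom is only there for machines without pwgen.

```shell
# Hex-encoded SHA-256 digest of the argument (no trailing newline is hashed).
sha256_of() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Generate a 96-character secret; fall back to /dev/urandom if pwgen is missing.
new_secret() {
  if command -v pwgen >/dev/null 2>&1; then
    pwgen -N 1 -s 96
  else
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 96
  fi
}

# Example:
#   secret=$(new_secret)
#   echo "password_secret    = $secret"
#   echo "root_password_sha2 = $(sha256_of "$secret")"
```

printf '%s' is used instead of echo -n because not every shell's echo handles the -n option the same way. A stray newline would silently change the hash.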

To open Gray­log con­fig file – run the com­mand below:

sudo nano /etc/graylog/server/server.conf

Replace the values of password_secret (the secret key) and root_password_sha2 (its SHA-256 checksum) with the previously created values.

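Uncomment http_bind_address and set it to the public hostname or IP address of the Ubuntu machine, keeping the default port 9000. Note that on EC2 a public or Elastic IP is typically not assigned to the instance's network interface itself, so if Graylog fails to bind, use the instance's private IP or 0.0.0.0 instead. Save and exit the editor.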
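For repeatable setups, the same three settings can be changed without an editor. The helper below is a minimal sketch, not part of Graylog: set_conf is a made-up name, it assumes the "key = value" layout of server.conf, and it rewrites every line that sets the given key, whether commented out or not. Review the file afterwards.

```shell
# set_conf KEY VALUE FILE
# Rewrites each "KEY = ..." line (optionally prefixed with "#") to "KEY = VALUE".
# VALUE must not contain "|" or "&", which are special in the sed expression.
set_conf() {
  key=$1
  value=$2
  file=$3
  tmp=$(mktemp)
  sed "s|^#\{0,1\} *$key *=.*|$key = $value|" "$file" > "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (run as root; the values are placeholders):
#   set_conf password_secret    "<secret_key>"     /etc/graylog/server/server.conf
#   set_conf root_password_sha2 "<sha256_of_key>"  /etc/graylog/server/server.conf
#   set_conf http_bind_address  "<address>:9000"   /etc/graylog/server/server.conf
```

Writing the result back with cat rather than mv is a deliberate choice: replacing the file would leave it with the temporary file's restrictive permissions, which the Graylog service may not be able to read.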

To start Graylog, enable it at server startup and verify it is running, run the commands below:

sudo systemctl daemon-reload
sudo systemctl enable graylog-server.service
sudo systemctl start graylog-server.service
sudo systemctl --type=service --state=active | grep graylog
sudo systemctl status graylog-server

The output should look similar to the screenshot below:

[Screenshot: output of systemctl status graylog-server]
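Graylog can take a minute or so after the service starts before the web interface responds. The sketch below polls until the server answers. It assumes that the REST API is served under /api/ on the same address and that the load-balancer status endpoint system/lbstatus returns the text ALIVE. Treat both as assumptions to check against the documentation of your Graylog version. The function name is made up.

```shell
# wait_for_graylog BASE_URL [TRIES] [DELAY_SECONDS]
# Polls the (assumed) load-balancer status endpoint until it reports ALIVE.
wait_for_graylog() {
  url=$1
  tries=${2:-30}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(curl -s "$url/api/system/lbstatus")" = "ALIVE" ]; then
      echo "Graylog is up"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Graylog did not respond" >&2
  return 1
}

# Example:
#   wait_for_graylog "http://<address>:9000"
```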

Log in to Graylog

In your local browser, enter http:// followed by the Ubuntu server's public IP address and :9000 (the port number). You should now see the Graylog login page. Log in as the default administrator (user name admin) with the password whose SHA-256 hash was stored in root_password_sha2, i.e. the previously generated secret key.

[Screenshot: Graylog login page]

Step 4: Setup Sys­log Input

Graylog nodes process log data via Inputs. An input therefore needs to be configured in the Graylog UI, and the sending server needs to be configured to forward its logs.

Gray­log UI Configuration

To set up or terminate inputs, go to System > Inputs (in the Graylog UI).

[Screenshot: System > Inputs page]

In the drop­down box that con­tains the text Select Input, select > Sys­log UDP, and then click > Launch new input. A form appears to fill in the fol­low­ing attributes:

  1. Node: Choose the deployed Ubuntu server (Private IP – should be the only option in the drop­down list)
  2. Title: Fill in an appropriate name, e.g. Linux Server Logs.
  3. Bind address: Set it to the server's private IP.
    Note: To collect log data from external servers (not recommended, as Syslog does not support authentication), the bind address is to be set to 0.0.0.0.
  4. Port: Enter 1514.
    Note: Ports below 1024 can only be used by the root user. Any port number from 1024 upwards should work as long as no conflict is caused by other services.
  5. Click > Save

So far, we have configured an input that listens on port 1514. Next, the server needs to be configured to send the log data to Graylog.

Server Con­fig­ur­a­tion

To cre­ate and open a new rsys­log con­fig­ur­a­tion file – run the com­mand below:

sudo nano /etc/rsyslog.d/60-graylog.conf

Note: To send logs to Graylog from other servers, a firewall exception for the UDP port of the input (1514 in this tutorial) needs to be added on the Graylog server first:

sudo ufw allow 1514/udp

Add the line below to the rsys­log con­fig­ur­a­tion file – replace private_ip with the Gray­log server’s private IP. Save and exit the editor.

*.* @private_ip:1514;RSYSLOG_SyslogProtocol23Format

To restart rsys­log – run the com­mand below:

sudo systemctl restart rsyslog

Repeat this step for each server that should send logs to Graylog.
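When several servers need the same forwarding rule, generating it from a small function avoids typos. In the sketch below, the function name and the IP address are hypothetical. A single @ in the rule selects UDP, which matches the Syslog UDP input configured above (@@ would select TCP).

```shell
# graylog_rule GRAYLOG_IP PORT
# Prints an rsyslog rule that forwards all messages via UDP.
graylog_rule() {
  printf '*.* @%s:%s;RSYSLOG_SyslogProtocol23Format\n' "$1" "$2"
}

GRAYLOG_IP="10.0.0.10"   # hypothetical private IP of the Graylog server
graylog_rule "$GRAYLOG_IP" 1514

# On each client, the rule could be installed and tested like this:
#   graylog_rule "$GRAYLOG_IP" 1514 | sudo tee /etc/rsyslog.d/60-graylog.conf
#   sudo systemctl restart rsyslog
#   logger -t graylog-test "hello from $(hostname)"
```

The logger call writes a test message to the local syslog, which should then appear in the Graylog search shortly afterwards.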

Go to the > Search tab in the navigation bar of the Graylog UI. The most recent logs are shown there.

[Screenshot: Search page with the most recent log messages]

Con­clu­sion

This guide showed how to deploy Gray­log on Ubuntu 20.04 LTS in AWS and how to con­fig­ure a straight­for­ward input source. Next steps are:

  • Fin­ish setup in Gray­log: Streams, extract­ors, dash­boards, con­di­tions and alerts.
  • Pro­tect log data: author­iz­a­tion concept, role-based access con­trol, encryp­tion, secur­ity set­tings, etc.
  • Implement log data lifecycle management policies, i.e. deletion, archiving and aggregation of log data (to avoid storage demand explosions).

More information on Graylog can be found in its documentation [2].

Lit­er­at­ure

[1] NIST https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-92.pdf
[2] Gray­log https://docs.graylog.org/en/3.3/