Cloud Native Ab Initio Data Applications



Introduction

Cloud computing has become a popular option for many businesses in the European Union. As of 2021, 42% of EU businesses were using cloud computing services, and most of these companies were using cloud databases, CRM applications, and other advanced cloud services (source: Eurostat).


Figure 1. Use of cloud computing services in EU enterprises in 2021 (source: Eurostat)

Why host Ab Initio data applications in a cloud?

Using a container orchestration platform for hosting Ab Initio data applications in a cloud provides several advantages, such as efficient resource utilization, automated scaling, and simplified management of containerized applications. With features like load balancing, service discovery, and automated rollouts, container orchestration platforms like Kubernetes and Docker Swarm are essential tools for organizations looking to maximize the benefits of cloud computing.

The key decision driver for hosting Ab Initio data applications in the cloud is whether your data sources or targets are in the cloud. Typical use cases for hosting Ab Initio data applications in the cloud include:

  • processing data in cloud-based databases (like Snowflake, Amazon DynamoDB, and others) and cloud-based file or object storage (e.g. Amazon S3)
  • migrating your current on-premise data warehouse solution to a pure-cloud or hybrid-cloud solution.

Why Use a Container Orchestration Platform?

Whether you host your Ab Initio applications on-premise or in a cloud, you are most probably paying for idle compute resources, since your Ab Initio applications consume resources only when you execute jobs. At the same time, your teams may compete for server resources, since they cannot share the same infrastructure simultaneously.


Hosting Ab Initio data applications on a container orchestration platform may become increasingly popular due to its numerous benefits, including increased scalability, portability, and ease of deployment. Rolling out your Ab Initio application to a Kubernetes cluster may solve these problems.

First, you can specify the exact amount of computing resources and storage your application needs in its specification.

Second, you can create as many isolated applications as you need for multiple teams – e.g., test teams may consume the same test data sources but still need to test several applications independently of each other.

Third, with elastic resources, you consume resources only at the time of testing – major cloud providers support automatic scaling of computing resources. So you can horizontally grow your application within specified boundaries – for instance, set up an auto-scaling node group with up to 5 worker nodes. Kubernetes will scale out the applications across worker nodes up to the available limit when needed. Once your tests are complete, Kubernetes will automatically scale in the worker nodes.
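At the pod level, the same boundaries can be expressed with a HorizontalPodAutoscaler (node-group scaling itself is configured in the cloud provider's cluster autoscaler). A minimal sketch, assuming a Deployment named `abinitio-app` – all names and thresholds here are placeholders:

```yaml
# Hypothetical HPA for an Ab Initio application; names and limits are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: abinitio-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: abinitio-app
  minReplicas: 1
  maxReplicas: 5          # the "up to 5" boundary from the example above
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU use exceeds 70%
```

When load drops after the tests, the HPA scales the pods back in, and the cluster autoscaler can then remove the now-idle worker nodes.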

New environment setup is usually a sophisticated task – you must prepare the infrastructure, install all components, configure parameters, and wire up the environment's data sources and targets. Even if you have an automated solution, configuring it is always a big challenge. Containerization solves these issues – when you design the image of an Ab Initio data application, you must specify the whole stack of software required to run your application (OS, Co>Operating System, third-party libraries), including environment-independent parameters and configuration files.

During the Kubernetes-application design and environment-specific configuration steps, you finalize this work – your application gets a uniform image binary and environment-specific configurations (e.g. wiring with all required databases, directories, etc.).

Application deployment on test and production environments is much more consistent in Kubernetes – the application packaging is unified across all environments and contains all required components, and configuration elements are decoupled from the application image.

From legacy to containers


In the traditional deployment era, we used to run applications on bare-metal servers. This has one notable advantage – the simplicity of application maintenance – but several disadvantages:

  1. Lack of resource segregation – a memory leak in App 1 may cause all other applications to freeze;
  2. Lack of resiliency – you had to take care of app crashes yourself;
  3. Lack of seamless upgrades – usually, you need to stop the current application before an upgrade.

Nowadays, the reliability of customer-facing applications is one of the key drivers of a successful company. This is especially important for online applications – from real-estate search engines to sophisticated e-commerce solutions and online banking applications.

The next era began with the evolution of hypervisors (both Type I and Type II), which allowed physical server resources to be "sliced" between different virtual machines running applications. The main advantage gained was the separation of resources: each application or set of applications ran in an isolated virtual machine with limited resources. But this advantage comes at a cost – each VM requires its own operating system, which creates an overhead in required computing resources.

The container era tried to overcome all of the mentioned disadvantages. The container concept is based on a container runtime, which runs applications in isolated containers that reuse the resources of the underlying OS – from disk to kernel functions. It also brings the same advantage of resource constraints, based on the Linux kernel's cgroups feature.

Kubernetes brought additional features to make deployments and maintenance of containerized applications easier:

  1. Horizontal scaling – Kubernetes allows you to configure your application to run in parallel in a required number of containers. Ab Initio supports this paradigm, including data parallelism;
  2. Seamless upgrades – no-downtime deployments with rolling upgrades or more sophisticated blue-green and canary deployments;
  3. Service discovery and load balancing – services are discovered automatically and the workload is distributed evenly across the cluster;
  4. Self-healing – if your application crashes, Kubernetes will try to restart it automatically.
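The first, second, and fourth of these features map directly onto fields of a Deployment manifest. A minimal sketch – the image name, port, and health endpoint are placeholders, not part of any real Ab Initio packaging:

```yaml
# Hypothetical Deployment; image, port and probe path are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: abinitio-app
spec:
  replicas: 3                    # horizontal scaling: three parallel containers
  strategy:
    type: RollingUpdate          # seamless upgrades: replace pods one at a time
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: abinitio-app
  template:
    metadata:
      labels:
        app: abinitio-app        # a Service selecting this label adds discovery / load balancing
    spec:
      containers:
        - name: app
          image: registry.example.com/abinitio-app:1.0.0
          livenessProbe:         # self-healing: restart the container on failed probes
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
```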

Containerized Ab Initio application

Can we containerize Ab Initio applications?

The short answer is yes, we can, and we recommend going for it wherever it's possible:

  • Use containerized applications to run both Ab Initio web applications (like Control Center, Authorization Gateway, Query>It, etc.) and the Co>Operating System, which is the heart of data applications.
  • Ab Initio provides great features to bring elastic scalability to your data applications in Kubernetes clusters. It also supports all major cloud providers for both online and batch data processing.

How to containerize an Ab Initio data application?

To containerize Ab Initio data applications, follow these five major steps:

  1. Design image – prepare your Ab Initio application package to run in containers
  2. Build image – assemble the application image with all run-time dependencies
  3. Publish image – publish the image to a private image registry
  4. Design Kubernetes-application – design the Kubernetes resources that run your application (e.g., as a Helm chart)
  5. Configure application – prepare the environment-specific configuration of your application

In most use cases, you will need an automated Continuous Integration process and tools to perform the steps above, as your data applications grow rapidly. We distinguish two loosely coupled flows of this process:

  1. Continuous Integration of the Ab Initio application Docker-image
  2. Continuous Integration of the Kubernetes application and environment-specific configuration

Design Docker-image

Design an Ab Initio data application image specification so that it is runnable inside a container runtime environment:

  • Include all run-time dependencies (e.g. Co>Operating System, database drivers, JRE, etc.);
  • Separate configuration from the app image – avoid environment-specific configurations inside the application image;
  • Optimize the structure of your image so that it contains a minimal number of layers, with maximum caching in mind.

As a result, you'll get an application image specification that can then be used at the build stage.
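Such a specification is typically a Dockerfile. A minimal sketch following the three rules above – the base image, package names, and paths are illustrative assumptions, not Ab Initio's actual installation layout:

```dockerfile
# Hypothetical image specification; base image, packages and paths are placeholders.
FROM rockylinux:8

# Run-time dependencies in a single layer – stable, so it caches well
RUN dnf install -y java-11-openjdk unixODBC && dnf clean all

# Co>Operating System – assumed to be unpacked into the build context beforehand
COPY coopsys/ /opt/abinitio/coopsys/

# Application package with environment-independent parameters only;
# environment-specific configuration is deliberately NOT baked in
COPY app/ /opt/abinitio/app/

ENV AB_HOME=/opt/abinitio/coopsys
ENTRYPOINT ["/opt/abinitio/app/run-job.sh"]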

Build image

This means assembling the application image according to the specification – usually on a dedicated build host. We prefer using Docker to build the images, as an enterprise standard, but you can choose from various available options like containerd, Kaniko, and others.

As an output, you'll get an application image stored in the local Docker registry on the build host.
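With Docker, this stage reduces to a single command on the build host – image name and tag here are placeholders:

```shell
# Hypothetical build step; image name and tag are placeholders.
docker build -t abinitio-app:1.0.0 .

# The image is now in the local image store of the build host
docker images abinitio-app
```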

Publish image

Once your image is ready, you need to publish it to a private Docker-image registry (like Amazon ECR) accessible by your Kubernetes cluster. At this step, we also recommend performing an automatic security scan (at least for major changes).
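For Amazon ECR, publishing looks roughly like the following – the account ID, region, and repository name are placeholders (and ECR's scan-on-push feature can cover the recommended security scan):

```shell
# Hypothetical publish step; account ID, region and repository are placeholders.
aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com

docker tag abinitio-app:1.0.0 \
  123456789012.dkr.ecr.eu-west-1.amazonaws.com/abinitio-app:1.0.0
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/abinitio-app:1.0.0
```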

Design Kubernetes-application

After the previous stage, you can run your application in any container runtime (Docker, containerd, and others), but to get the most of the benefits, you need to design an application configuration to run it using Kubernetes as an orchestration system. We recommend using package managers (like Helm) to simplify the design and further roll-out / roll-back operations of Kubernetes applications.
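With Helm, the design becomes a chart plus per-environment values files, and roll-out / roll-back become one-liners. A sketch with placeholder chart and release names:

```shell
# Hypothetical Helm workflow; chart, release and values file names are placeholders.
helm create abinitio-app                          # scaffold the chart skeleton once

helm upgrade --install abinitio-app ./abinitio-app \
  -f values-test.yaml                             # roll out (or update) a release

helm rollback abinitio-app 1                      # roll back to a previous revision
```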

Configure Kubernetes-application

The setup of your application's configuration elements is decoupled from the application image design and even from the Kubernetes-application design process:

  1. It should reflect the target environment setup – i.e., data volume mounts, database connection strings, secrets, and other environment-specific parameters of the Ab Initio application.
  2. It should also contain all required configuration files if your application design requires them.

We recommend putting all configuration elements under a source-control system to enable versioning and make the release management process easier.
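If you follow the Helm approach, such environment-specific configuration naturally lives in a per-environment values file. A sketch – every endpoint, path, and name below is a placeholder, and credentials are referenced via a Kubernetes Secret rather than stored in the file:

```yaml
# Hypothetical values-test.yaml; endpoints, paths and names are placeholders.
image:
  repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/abinitio-app
  tag: "1.0.0"

database:
  url: "jdbc:postgresql://test-db.example.com:5432/dwh"
  secretName: db-credentials      # resolved from a Kubernetes Secret at deploy time

volumes:
  data: /mnt/test-data            # environment-specific data volume mount
```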

How to deploy your Ab Initio application?

The deployment process is usually automated in the CD pipeline, and we recognize several steps:

  1. Choose the target environment – a Kubernetes cluster.
  2. Deploy your application to the target Kubernetes cluster together with the environment-specific setup, i.e., connections to specific databases, file paths, credentials, etc.
  3. In the SIT environment, we also run automated integration and regression tests as part of the deployment pipeline.
  4. Quality gateway checks – if the quality criteria pass, report the deployment success; otherwise, roll back the deployment.
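The steps above can be sketched as a short pipeline script – the cluster context, release names, and the test-runner script are placeholders:

```shell
# Hypothetical CD pipeline sketch; context, names and test runner are placeholders.
kubectl config use-context sit-cluster                  # 1. choose the target environment

helm upgrade --install abinitio-app ./abinitio-app \
  -f values-sit.yaml                                    # 2. deploy with environment-specific setup

if ./run-regression-tests.sh; then                      # 3. + 4. tests act as the quality gateway
  echo "deployment successful"
else
  helm rollback abinitio-app                            # roll back on failed quality criteria
  exit 1
fi
```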

Further insights

Using Kubernetes as an orchestration system implies that your Ab Initio application design respects several important topics:

  1. Security – design your applications so that they apply the best security practices for Kubernetes applications.
  2. Data parallelism – avoid using the standard approach with a multifile system, since data parallelism is supported differently within a Kubernetes cluster.
  3. File operations – use cloud storage, like the S3 object store, or consider persistent volumes if you want to retain your data if your app dies during an incident.
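For the persistent-volume option, the application claims storage that outlives any single pod. A minimal sketch, with placeholder name and size:

```yaml
# Hypothetical PersistentVolumeClaim so data survives a pod restart; name and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: abinitio-work-area
spec:
  accessModes:
    - ReadWriteOnce        # mountable read-write by a single node at a time
  resources:
    requests:
      storage: 50Gi
```

Mounting this claim into the pod keeps the work area intact across container restarts, unlike the container's own ephemeral filesystem.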