End-To-End MLOps in Azure



Integrating Artificial Intelligence (AI) and Machine Learning (ML) into company systems is easier said than done: numerous challenges emerge when productionalising a machine learning model, at every step of its lifecycle. Some of these difficulties include retraining the model quickly to incorporate improvements, keeping track of metrics and parameters, model versioning and comparison, and deployment. Addressing these challenges effectively is what makes a project successful.

In this article, we're going to present a seamless, productive way of dealing with these difficulties using the synvert MLOps (Machine Learning Operations) methodology in Microsoft Azure. We'll show you a real customer use case, demonstrating a full, end-to-end implementation of this paradigm. Needless to say, for privacy reasons, we'll be working with a fictional dataset similar to the original. The implementation is not closely tied to the use case, so it can be readily adapted to other scenarios, effectively making it a blueprint.

The MLOps Methodology

Drawing on the concept of DevOps, and significantly influenced by the Data Engineering field, the MLOps methodology aims to address the challenges of continuous integration and continuous deployment (CI/CD) when engineering ML solutions, as shown in Figure 1 below.

Once an initial ML model has been developed, it is unlikely that it will remain effective indefinitely: at some point, perhaps after a few months or even weeks, the data may change, or your objective may be slightly different. This is generally known as model decay, where the model's performance diminishes over time, as indicated by a decrease in the metric of interest, such as accuracy, mean squared error (MSE), or F1 score. This usually means that the model should be retrained, evaluated, and then redeployed. Readapting quickly and smoothly to these requirements is at the heart of MLOps. When successfully implemented, the model becomes more reliable and maintainable, as its lifecycle is streamlined and modifications can be incorporated easily.

MLOps as an intersection of ML, DevOps, and Data Engineering
Figure 1: MLOps as an intersection of ML, DevOps, and Data Engineering

This way of working is becoming a standard in the ML industry, with the body of documentation growing every day, and more and more people and organisations getting on board. Here at synvert, we follow standard industry MLOps best practices, shaped by our unique experience and expertise gained developing and productionalising ML use cases.

Why Azure?

Azure offers Azure Machine Learning, an AI service designed especially for the end-to-end implementation of ML use cases. This service integrates many tools that work well together and cover the main needs of enterprise-level ML use cases: data preparation, building and training ML models, validation and deployment, and monitoring.

Let's take a quick look at some of the tools that Azure ML provides:

  • To store the raw data, the datastore is ideal. We can then partition it into training and testing datasets and register these as data assets. These assets facilitate versioning and maintain a much richer collection of information about the dataset.
  • To experiment with and build ML models, notebooks (the equivalent of Jupyter Notebooks) can be of great use.
  • To train the models, different compute engines can be created and jobs can be sent to them (see the sketch after this list).
  • If the process is complex enough, the code can be divided into different components, each with its own associated environment and version, which can then be assembled with a pipeline.
  • For deployment, a natural option is to use an endpoint, either real-time or batch, or we can integrate the model as part of other services such as Data Factory.
  • Finally, monitoring can be done in many ways, for instance by using dataset monitors.
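
As a small illustration of the compute bullet above, here is a minimal sketch, using the azure-ai-ml Python SDK (v2), of how a workspace connection might be established and a small compute cluster created for training jobs. The subscription, resource group, workspace, and cluster names are placeholders.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Connect to the Azure ML workspace (all names below are placeholders)
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<aml-workspace>",
)

# Provision a small CPU cluster that training jobs can be sent to
cpu_cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",
    min_instances=0,   # scale to zero when idle
    max_instances=2,
)
ml_client.compute.begin_create_or_update(cpu_cluster).result()
```

The same ml_client connection is reused in the later sketches for registering data assets, components, and deployments.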

In addition, Azure DevOps will also be leveraged. We will employ this service as a continuous integration tool, although its potential is much greater, offering efficient teamwork capabilities and its own software development framework. In our case, Azure DevOps will be used to orchestrate the different parts of the MLOps implementation in a single pipeline and to connect to GitHub, which will allow changes in the code repository to trigger the pipeline.

Use Case Overview

In short, our use case focuses on boosting profits from a marketing campaign; we are using the Marketing Campaign dataset from Kaggle, which is similar to the customer's real dataset. It consists of 2,240 observations and 28 different variables. Each observation corresponds to a person, and the variables provide information on different spending patterns, personal details, and whether they have accepted previous (similar) offers or not.

Our goal, then, is to develop an ML model capable of predicting who will respond to the offer based on personal information, which effectively means getting to know the people interested in the company's products. This can be very useful, for example, in a "flash offer" scenario, where every day a subsample of customers is selected for the presentation of a special limited-time offer, with the model helping to choose the subsample. Figure 2 gives us a simplified visualisation of this goal:

Figure 2: Schematic representation of the Marketing Campaign use case goal
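
As a purely illustrative sketch of this goal, the snippet below fits a simple baseline classifier on the public Kaggle file to predict which customers respond. The file name, the tab separator, and the use of Response as the target column reflect the published dataset and may differ from the customer's data; the baseline itself is not the model discussed later.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Load the public Kaggle file (tab-separated in the published version)
df = pd.read_csv("marketing_campaign.csv", sep="\t")

# Keep numeric features only for this toy baseline; "Response" is the target
X = df.drop(columns=["Response"]).select_dtypes("number").fillna(0)
y = df["Response"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("F1 score:", f1_score(y_test, model.predict(X_test)))
```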

As we will see, our MLOps methodology will enable the model to be easily adapted to the unpredictable nature of a production environment. Changes will be necessary, but once implemented, there will be a new, ready-to-consume model, all automated, with minimal friction. This includes automatically retraining, evaluating, and deploying the model; in other words, it means that the model will be sustainable and robust over time.

Solution Overview

The solution consists of different steps, each with a well-defined function. These steps are orchestrated by the main pipeline in Azure DevOps, which executes them in order once triggered. The first to be executed is the Azure ML pipeline, a smaller pipeline (not to be confused with the bigger DevOps pipeline) which encapsulates the pre-processing, feature engineering, retraining, and testing phases in different components. The next step compares the newly trained model to previous models, then promotes it to the production stage if it turns out to be better. The third and final step is deployment, where the model is deployed to an endpoint and the traffic is updated, as we can see in Figure 3:

Simplified schema of the solution, showing the different steps of the main Azure DevOps pipeline
Figure 3: Simplified schema of the solution, showing the different steps of the main Azure DevOps pipeline

Triggering the main Azure DevOps pipeline can be fully customised. In this implementation, the trigger is simply a change in the "main" branch of the associated GitHub repository.

In parallel to the Azure DevOps pipeline, a data drift detection system has been set up, consisting of a recurring, scheduled execution of a data drift detection job, using a dataset monitor. This provides metrics, graphs, and alerts for the prompt detection of any data distribution problems.

As well as the different technologies already mentioned, the open-source MLflow platform is used extensively to facilitate managing the model's lifecycle, incorporating concepts such as model versioning and tagging. MLflow also integrates seamlessly with Azure, making it a better option than similar alternatives. So, to summarise, the general technological landscape needed to implement the solution presented here looks like this:

Main technologies used in the proposed solution
Figure 4: Main technologies used in the proposed solution

The original model underwent several iterations following MLOps practices and was ultimately replaced by a Random Forest model, which showed superior performance. However, since the specifics of the model are beyond the scope of this blog post, we will not delve into these details.

In summary, our proposed solution automates the entire process, from modifications in the model or data processing to the full retraining and deployment of the updated model, ready for use.

The Azure DevOps Pipeline

This is the heart of the project, serving a dual purpose: firstly, it facilitates the triggering of certain processes when the code is modified, and secondly, it serves as an orchestrator for the different components. This is why Azure DevOps must be correctly connected to the GitHub repository containing the model's code, and to the Azure workspace where the data and compute resources are located.

The necessary interconnectedness of Azure, Azure DevOps, and GitHub
Figure 5: The necessary interconnectedness of Azure, Azure DevOps, and GitHub

These connections involve creating a GitHub-type connection using the GitHub Azure Pipelines App, and an Azure Resource Manager connection which connects directly to your Azure subscription, specifically to the workspace where the ML resources are located. The latter connection employs a Service Principal, used in Azure to perform the necessary automated tasks.

Once everything has been successfully connected, a single YAML file (inside the GitHub repository) will suffice to define how and when everything is executed. This means setting up different tasks such as installing dependencies and using the Azure Command Line Interface (CLI) to send the corresponding jobs to the Azure compute resources. As everything is connected, this last step is significantly easier. For example, Figure 6 shows how to send a particular job, which in this case corresponds to the compare and promote phase, although we would do basically the same for the other phases as well:

Extract from the YAML file that controls the Azure DevOps pipeline, showing how to execute a script from the GitHub repository using Azure credentials
Figure 6: Extract from the YAML file that controls the Azure DevOps pipeline, showing how to execute a script from the GitHub repository using Azure credentials

Now let’s see how each step works in more detail!

The Azure Machine Learning Pipeline

The pipeline is our first step and it's pure ML: pre-processing, feature engineering, training, and testing happen here, leveraging the capabilities of the pipelines and components features of the Azure Machine Learning Studio. The code is split into different Python files that perform different tasks, such as imputation, normalisation, or feature selection. These files are specifically designed so that they can be transformed into Azure components, which means adding some extra syntax around the ML logic.
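
As a hedged illustration of that "extra syntax", the sketch below wraps one of these Python scripts as a command component with the azure-ai-ml SDK. The script name, input and output names, and environment reference are placeholders, it reuses the ml_client connection from the earlier sketch, and components can equally be defined in YAML.

```python
from azure.ai.ml import Input, Output, command

# Wrap an existing script (e.g. preprocessing.py) as a reusable component.
# The paths, names, and environment below are illustrative placeholders.
preprocess_component = command(
    name="preprocess_data",
    display_name="Pre-process raw data",
    code="./src",  # folder containing preprocessing.py
    command=(
        "python preprocessing.py "
        "--raw_data ${{inputs.raw_data}} "
        "--prepared_data ${{outputs.prepared_data}}"
    ),
    inputs={"raw_data": Input(type="uri_folder")},
    outputs={"prepared_data": Output(type="uri_folder")},
    environment="azureml:my-sklearn-env:1",  # a registered environment (assumed)
)

# Registering the component creates a new version whenever it changes
registered = ml_client.components.create_or_update(preprocess_component.component)
```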

To define the Azure ML pipeline (different from the main DevOps pipeline), a separate Python file is used, which consists of importing the individual files (corresponding to the components) and connecting inputs to outputs. In this file, the data source is specified, the pipeline is submitted, and the components are registered, so a new version is created every time they change. Figure 7 shows what the pipeline looks like inside the Azure Machine Learning Studio once everything has been set up:

Figure 7: The Azure ML pipeline, encompassing the main Machine Learning steps
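
A minimal sketch of what such a pipeline definition file might look like, using the dsl.pipeline decorator from the azure-ai-ml SDK: the training and testing components, the data asset name, and the experiment name are placeholders, and the ml_client connection from the earlier sketch is reused.

```python
from azure.ai.ml import Input, dsl

# Component objects (preprocess_component, train_component, test_component)
# are assumed to be imported from the files described above.
@dsl.pipeline(compute="cpu-cluster", description="Marketing campaign training pipeline")
def training_pipeline(raw_data):
    prep = preprocess_component(raw_data=raw_data)
    train = train_component(prepared_data=prep.outputs.prepared_data)
    test_component(model=train.outputs.model)

# Point the pipeline at a registered data asset (name and version are placeholders)
pipeline_job = training_pipeline(
    raw_data=Input(type="uri_file", path="azureml:marketing-campaign-train:1")
)

# Submit the pipeline job to the workspace
submitted = ml_client.jobs.create_or_update(pipeline_job, experiment_name="mlops-training")
```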

Inside the training and testing components, MLflow is used to register the new version of the model, save the hyperparameters used, and log all the necessary metrics, such as the accuracy and F1 score in this case. As MLflow is directly integrated with Azure, all this information is accessible through Azure's systems, including the UI.
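
Roughly, this is what the logging looks like inside the training component. The sketch below is self-contained, so it uses stand-in data and a placeholder registered model name; inside the real component the data comes from the pre-processing step and Azure ML points MLflow at the workspace tracking store automatically.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Stand-in training data; in the real component this comes from the
# pre-processing component's output.
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 10}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)
    preds = model.predict(X_test)

    # Log hyperparameters and the metrics used later for model comparison
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1_score", f1_score(y_test, preds))

    # Registering the model creates a new version under the given name
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name="marketing-campaign-model",  # placeholder name
    )
```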

The data source, as mentioned, is specified in the same file that is used to define the ML pipeline. In this implementation, we have saved the training and testing data in an Azure Blob Storage account associated with the same workspace as the compute, and then created the corresponding data assets, which is the usual way of working with data in Azure. Data assets simplify moving data around the different files and reduce the authentication steps needed to use the data. Creating data assets is not limited to using blob storage as the source: there are many other options, such as creating them from Azure File Share, Azure Data Lake, SQL databases, or web files located at public URLs, among other alternatives.
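
For reference, registering such a data asset from blob storage might look like the hedged sketch below; the datastore name, path, and asset name are placeholders, and the ml_client connection from earlier is reused.

```python
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

# Register the training split as a versioned data asset.
# The datastore name and path below are placeholders for this workspace.
train_data = Data(
    name="marketing-campaign-train",
    version="1",
    type=AssetTypes.URI_FILE,
    path="azureml://datastores/workspaceblobstore/paths/marketing/train.csv",
    description="Training split of the marketing campaign dataset",
)
ml_client.data.create_or_update(train_data)
```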

Maintaining the different ML steps separately in Azure components offers several benefits: firstly, it makes the process conceptually easier to grasp, allowing a clearer understanding of the code's functionality. Secondly, the code's inherent modularity greatly simplifies debugging. Thirdly, if necessary, we can isolate execution environments, associating a distinct YAML file with each component. And what's more, we can see the execution times and logs for each component, facilitating the identification of future improvements.
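
Isolating an execution environment for a component typically means registering an Environment built from a conda specification. A hedged sketch is shown below; the conda file path and environment name are illustrative, and the base image is one of the standard Azure ML images.

```python
from azure.ai.ml.entities import Environment

# A dedicated environment for the pre-processing component, built from a
# conda specification file kept next to the component's code (path is a placeholder).
preprocess_env = Environment(
    name="preprocess-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="./src/environments/preprocess_conda.yml",
    description="Dependencies for the pre-processing component",
)
ml_client.environments.create_or_update(preprocess_env)
```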

Once the ML pipeline has finished executing, we'll have a new model registered in Azure, fully trained and tested, with all the related information properly recorded and easily accessible, so it can be compared to other models if necessary, in preparation for the next step.

How the Best Model is Always in Production

In this phase, the new model is compared to the previous model in production and, if it's better, it is promoted using a tagging system managed by MLflow. More specifically, all the necessary metrics saved during the testing phase are retrieved for both the in-production and the new model, and then some logic is applied to decide whether the new model is indeed better or not. For instance, in this particular implementation, we compare the accuracy and the F1 score, and if both are better, then the new model is promoted:

Code example from the compare and promote step, showing how the metrics from the model in production and the new model are retrieved and then compared
Figure 8: Code example from the compare and promote step, showing how the metrics from the model in production and the new model are retrieved and then compared
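
A simplified, hedged sketch of this kind of logic using the MLflow client is shown below; the registered model name and the "stage = production" tag convention are placeholders for whatever scheme a project adopts.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "marketing-campaign-model"  # placeholder registered model name

# Find the version currently tagged as production and the newest version
versions = client.search_model_versions(f"name='{MODEL_NAME}'")
prod = next((v for v in versions if v.tags.get("stage") == "production"), None)
new = max(versions, key=lambda v: int(v.version))

new_metrics = client.get_run(new.run_id).data.metrics

if prod is None:
    promote = True  # no model in production yet
else:
    prod_metrics = client.get_run(prod.run_id).data.metrics
    promote = (
        new_metrics["accuracy"] > prod_metrics["accuracy"]
        and new_metrics["f1_score"] > prod_metrics["f1_score"]
    )

if promote:
    # Move the "production" tag to the new version (promotion is symbolic, via tags)
    if prod is not None:
        client.delete_model_version_tag(MODEL_NAME, prod.version, "stage")
    client.set_model_version_tag(MODEL_NAME, new.version, "stage", "production")
```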

It is important to note that even if the model is promoted, this is only done symbolically, with tags using MLflow; the actual deployment of the new model is the next step.

The logic governing this phase is highly adaptable, allowing for the implementation of more complex conditions and checks to satisfy any quality standards.

Making the Model Available: Deployment

If the previous step has determined that the newly trained model is better than the model in production, then it should be deployed so it can be used, and once again we'll leverage the existing capabilities in Azure ML to carry this out. Specifically, we'll use the Azure endpoints functionality, which allows us to create an endpoint and deploy one or more models to it.

All the corresponding logic can be packed into a single Python file, which checks whether there is already an endpoint or not (if not, it creates one), then creates a new deployment for the new model and, depending on the existence of a prior deployment, updates the traffic accordingly. We can see the process below:

Code example from the deployment step, showing how different tasks are performed
Figure 9: Code example from the deployment step, showing how different tasks are performed
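
A hedged sketch of this flow with the azure-ai-ml SDK follows; the endpoint, deployment, and model names, as well as the instance type, are placeholders, and the ml_client connection from earlier is reused.

```python
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

ENDPOINT_NAME = "marketing-campaign-endpoint"  # placeholder

# Create the endpoint if it does not exist yet
try:
    endpoint = ml_client.online_endpoints.get(ENDPOINT_NAME)
except Exception:
    endpoint = ManagedOnlineEndpoint(name=ENDPOINT_NAME, auth_mode="key")
    endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the newly promoted model version; MLflow-format models can be
# deployed without an explicit scoring script (the model reference is a placeholder).
deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name=ENDPOINT_NAME,
    model="azureml:marketing-campaign-model:3",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment (a gradual split is also possible)
endpoint.traffic = {"green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```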

The inference logic is encapsulated in a Python file (usually called the "scoring script") associated with the deployment. This serves as the link between the new data and the trained model, pre-processing the new, incoming data before it's fed to the model. MLflow can be used to log the models employed for normalisation and imputation as artefacts, ensuring the consistent application of these techniques at the time of inference. Nevertheless, there is a fully automated alternative, harnessing the inherent capabilities of Azure ML to automatically generate the scoring script.
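
Such a scoring script follows Azure ML's init()/run() convention. A hedged sketch is shown below, where the model subfolder name, the request payload shape, and any preprocessing artefacts are placeholders.

```python
import json
import os

import mlflow.pyfunc
import pandas as pd


def init():
    # Called once when the deployment starts. AZUREML_MODEL_DIR points to the
    # folder where Azure ML mounted the registered model and its artefacts.
    global model
    model_dir = os.environ["AZUREML_MODEL_DIR"]
    model = mlflow.pyfunc.load_model(os.path.join(model_dir, "model"))  # "model" is a placeholder subfolder


def run(raw_data):
    # Called for every request: parse the payload, apply the same preparation
    # used at training time, and return the predictions.
    records = json.loads(raw_data)["data"]
    df = pd.DataFrame(records)
    predictions = model.predict(df)
    return predictions.tolist()
```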

Naturally, the general workings of the deployment phase can be expanded to incorporate more advanced techniques such as blue-green deployment, canary deployment, A/B testing, or an extensive array of unit tests to make sure everything is working correctly. As in the previous step, the logic is very malleable and can be adapted to specific needs.

Checking for Data Drift

In parallel with the Azure DevOps pipeline, a periodic check for data drift can be performed to ensure the model is still valid. Although still in its early stages, Azure ML offers a tool, Dataset Monitors, which enables checking for data drift between the training dataset and new data or, in more general terms, between a baseline and a target dataset.

Our proposed implementation specifically uses the DataDriftDetector class, which offers additional functionality when working with AKS clusters, such as updating the datasets it compares. However, as this scenario does not apply to us, in our implementation we have sidestepped updating the datasets by creating a new class instance each time we want to check for data drift.

To integrate this, a notebook is first used to set up a scheduler which executes the data drift detection script periodically and automatically. The execution frequency can be customised as desired, depending on the particular needs of the scenario. In our case we opted for daily execution, as we expect new data to arrive on a daily basis. When the script is executed, the new data is retrieved, the datasets for comparison (baseline and target) are updated, the data drift detection job is sent to the compute, and the metrics are gathered.
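
A hedged sketch of what each scheduled execution might do with the azureml-datadrift package (v1 SDK) is shown below. The workspace access, dataset names, compute target, threshold, and alert address are placeholders, and the exact parameters may vary between SDK versions.

```python
from datetime import datetime, timedelta

from azureml.core import Dataset, Workspace
from azureml.datadrift import AlertConfiguration, DataDriftDetector

ws = Workspace.from_config()  # assumes a local config.json for the workspace

# Baseline = training data, target = newly collected data (names are placeholders)
baseline = Dataset.get_by_name(ws, "marketing-campaign-train")
target = Dataset.get_by_name(ws, "marketing-campaign-scoring-data")

# A fresh detector per run, as described above; the alert email is a placeholder
monitor = DataDriftDetector.create_from_datasets(
    ws,
    f"drift-{datetime.utcnow():%Y%m%d}",
    baseline,
    target,
    compute_target="cpu-cluster",
    frequency="Day",
    drift_threshold=0.3,
    alert_config=AlertConfiguration(["ml-team@example.com"]),
)

# Run the drift analysis over the most recent day and collect the results
run = monitor.backfill(datetime.utcnow() - timedelta(days=1), datetime.utcnow())
run.wait_for_completion()
drift_output = monitor.get_output()  # drift results and per-feature metrics
```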

Every time a data drift detection job is executed, we end up with metrics, graphs, and, if data drift is detected, an email alert (which can be further customised). Figure 10 shows a few example results:

Different metrics and graphs provided by the Dataset monitor in Azure
Figure 10: Different metrics and graphs provided by the Dataset monitor in Azure

The main metric aims to summarise, in a single number, the extent of the difference between datasets, regardless of the number of columns or observations. This is easy to grasp at a glance, but not much information is available on how this number is calculated. The other metrics are simple statistics for each feature that gauge the difference in distributions, including the minimum, maximum, mean, Euclidean distance, and the Wasserstein metric.

These metrics can then be used to retrain the model automatically if necessary, using the new data obtained after model deployment. Naturally, this can only be done if we have access to the labels corresponding to the observations (the y-values). For example, in our flash offer scenario, we would know at the end of the day whether the customers have accepted the offer or not, so we could obtain the labels. When and how to retrain the model can depend heavily on the use case in question. In some instances, instead of automating retraining, it might be better to set up an alert and delegate the decision to an actual person.

This data drift implementation is key to a basic understanding of how data distributions shift, as well as to identifying any issues warranting attention. While it is sufficient for numerous applications, there may be cases that need more advanced techniques such as data slicing or concept drift analysis. There are lots of methodologies to explore!

Conclusion

In this blog post, we have presented a solution that brings MLOps to life using Azure ML. Leveraging the different tools that this service offers, we have been able to architect an end-to-end pipeline that takes an initial ML model to the next level, making it robust in the face of the dangers of a production environment.

Now the model's management is streamlined: making changes is easy, as the whole process is automated. Once this solution is up and running, we can focus our energy and time on the model, and what's more, every step is transparent and can be fully controlled and adapted to specific requirements.

As we mentioned at the beginning, think of this blog post as a blueprint. Our particular use case was simply a way to illustrate the whole process. It's really just a matter of how far you want to take it, and the precise goals and needs of each usage scenario – the possibilities are virtually limitless! If you'd like to see all this in action, watch our video on YouTube.

If you would like to know more about MLOps in Azure, and whether it might be the solution to your specific needs that you've been looking for, do not hesitate to contact us. Our team of experts is ready to help you build and implement a solution tailored to your specific use case!