How to Operate your Enterprise Data Platform – Meet Our EDPOps Accelerator



By now, nearly every medium-to-large organisation recognises that data is the new gold and that investing in data analytics is essential to stay competitive – there’s no arguing with that! It gets even more interesting when we explore how to choose, build and operate an Enterprise Data Platform (EDP) that can power all our analytics workloads, from BI to AI, and how to do so at scale, in the most controlled and efficient way, to serve a growing number of use cases and users.

Last year, in our ‘How to Choose your Data Platform’ series, we published two blog posts: in the first, we traced the evolution of data platform architectures, from the early data warehouses that entered organisations in the 1980s to the feature-rich EDPs our customers rely on today, and we outlined the capabilities we now expect every platform (and its underlying tech stack) to deliver. In our second post, we took our readers on a tour of the leading tech stacks we deploy with our customers, like Azure, AWS, GCP, Oracle, Cloudera, Snowflake and Databricks, each fully equipped to elevate data analytics to the next level.

While this post doesn’t delve into the details of building an EDP, remember that we’ve spent years rolling out data platforms on every stack listed above, both in highly secure on-premises environments and across all major clouds. For example, our ready-made Terraform templates can streamline deployment on the leading cloud providers.

In this blog post, we’ll focus on how to operate an EDP, and especially on how to do so at scale for hundreds of use cases, users and data consumers. Seasoned readers will spot a significant overlap between running an EDP and the disciplines typically grouped under Data Strategy and Data Governance, and indeed there is one: just as an organisation needs robust Data Strategy and Data Governance programmes when scaling, it also requires a clear EDP operating model that governs every activity within the platform, whether or not those activities fall under formal Data Governance. In other words, Data Governance controls the data, but EDP operations need their own governance to ensure scalable, repeatable and compliant execution. Over the past couple of years, we have helped several large organisations create or refine such operating models, and today we’ll share our proven method for building a successful EDP operating model.

What’s more, we’ll also introduce you to EDPOps Accelerator, our programme that not only guides you through the creation or enhancement of an EDP operating model, but also accelerates the deployment of its key components. Wondering how? Keep reading!

Recipe for Successfully Creating an EDP Operating Model

To create an EDP operating model that works at scale:

  1. Gather the goals and requirements for the EDP from top sponsors and align them with the existing Data Strategy and Data Governance programmes.
  2. Create an inventory of current and foreseen patterns of usage within your EDP.
  3. Determine the type of operating model (centralised, decentralised, data mesh, hub and spoke) that fits the Data Strategy and Data Governance programmes.
  4. Select a core technology stack, a component architecture design, and an environment strategy that can support all the patterns in the selected type of operating model and is in line with the Data Strategy and Data Governance programmes.
  5. Define and create reusable, standard frameworks (and the tooling and approaches required) for all data and metadata operations.
  6. Identify, standardise, and determine clear ownership for key processes within the EDP; define the roles and responsibilities of each EDP persona so that everyone knows exactly what they can and cannot do.

Standardisation is the key ingredient for success: an EDP can only scale when consistent frameworks, tooling and processes govern everything that happens within the platform. This requirement holds true whether you’re building a data mesh, a hub-and-spoke architecture, or any other operating model.

Patterns Inventory

Before we can standardise how to perform every operation in the EDP, we need to understand which patterns of usage are running or will run in the platform. These patterns define how each category of activity is, or will be, performed: data ingestion, data processing, reporting, data cataloguing, AI, etc.

We often find very similar patterns in large organisations. Take data ingestion, for example: one team might use one approach to ingest tables from RDBMS sources and a different one for file-based feeds, whereas another team relies on a single approach for both.

For every pattern, we need to understand how the relevant components of the tech stack interact. When a pattern already exists, we collect metrics (number of jobs, volume of data, etc.), and for future patterns, we help the customer to identify potential risks and recommend mitigations.

We build this inventory through a series of templated discovery sessions with representatives from the various groups of EDP users, typically from different business units within the organisation.
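As an illustration only, here is a minimal sketch of how one entry in such a patterns inventory could be captured; the field names and example values below are our own assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class UsagePattern:
    """One entry in the EDP patterns inventory (illustrative schema, not a standard)."""
    name: str                      # e.g. "RDBMS batch ingestion"
    category: str                  # ingestion, processing, reporting, cataloguing, AI, ...
    owning_teams: list             # business units or teams using the pattern
    components: list               # tech-stack components the pattern touches
    status: str                    # "existing" or "planned"
    metrics: dict = field(default_factory=dict)   # collected for existing patterns
    risks: list = field(default_factory=list)     # captured mainly for planned patterns

# Two example entries, as they might be captured during a discovery session
inventory = [
    UsagePattern(
        name="RDBMS batch ingestion",
        category="ingestion",
        owning_teams=["Finance BI"],
        components=["CDC tool", "object storage", "raw layer"],
        status="existing",
        metrics={"jobs": 120, "daily_volume_gb": 45},
    ),
    UsagePattern(
        name="Clickstream streaming ingestion",
        category="ingestion",
        owning_teams=["Digital"],
        components=["Kafka", "stream processor", "raw layer"],
        status="planned",
        risks=["no agreed schema-evolution policy yet"],
    ),
]

print(len(inventory), "patterns captured")
```

Keeping the inventory in a structured form like this makes it easy to compare patterns across business units and to spot candidates for consolidation.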

Types of EDP Operating Models

There are different types of operating models. Until a few years ago, most organisations recognised only two core models: centralised and decentralised (often called federated). In a centralised model, a single ‘central’ team owns every aspect of the EDP and all business units rely on it, whereas in a decentralised model each business unit maintains one or more data teams managing analytics end-to-end, with, ideally, some federated coordination. Both models bring their own pros and cons, yet they share a critical weakness: neither scales gracefully. As the number of use cases and users grows, operating the EDP efficiently and in a timely way becomes a nightmare!

In recent years, two newer models have emerged to tackle, among other challenges, this scalability issue: data mesh and hub-and-spoke. Data mesh (see our previous blog post here) extends the decentralised approach whilst fixing its shortcomings, remaining fully decentralised yet backed by robust interoperability and governance standards. The hub-and-spoke model evolves the centralised approach, re-architecting it for scale. Like their predecessors, both have advantages and disadvantages, and determining which is best for an organisation depends on many factors, but that lies beyond the scope of this article. For readers interested in more details on the various types of operating models and how they compare with each other, we recommend checking this blog post.

[Graphic: EDP Operating Models]

Technology, Architecture, and Environments

Regarding the choice of a core technology stack, any of the platforms listed above is a solid choice. Note that here we are talking strictly about the core stack, the foundational technology running the back end of your EDP, which you may later complement with other tools (often from different vendors) to handle specific operations (more on this in the Frameworks section). Ideally, we would inventory usage patterns before selecting the technology. However, in many customer engagements the decision is already locked in, usually driven by enterprise-wide agreements between the organisation and the vendor.

In either case, once the core technology stack has been selected, we need to define the component architecture design, which determines, at a high level, which services you are going to use and for what. Each hyperscale cloud (Azure, AWS, GCP, and so on) offers multiple services that can accomplish the same task, so it is crucial to establish which components are preferred for each workload.

Finally, you must define an environment strategy: how many environments you are going to use, and their relationships and constraints (only prod; dev and prod; dev, UAT, prod; dev, UAT, performance and prod, etc.). Sometimes we distinguish between physical and logical environments: several logical environments can coexist within the same physical environment. For example, an organisation might host both dev and prod in one physical environment, differentiating them through naming conventions, separate compute queues, etc.
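To make this tangible, here is a minimal sketch (purely for illustration) of how such an environment strategy could be written down, assuming a dev/UAT/prod chain in which dev and UAT share a physical environment and are kept apart by naming prefixes; the names and prefixes are our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Environment:
    name: str                   # logical environment name
    physical: str               # physical environment (account/cluster/tenant) it lives in
    naming_prefix: str          # prefix that keeps logical environments apart
    promotes_to: Optional[str]  # next environment in the promotion chain, if any

# Example strategy: dev and UAT share one physical environment, prod is isolated
ENVIRONMENTS = {
    "dev":  Environment("dev",  physical="nonprod", naming_prefix="dev_",  promotes_to="uat"),
    "uat":  Environment("uat",  physical="nonprod", naming_prefix="uat_",  promotes_to="prod"),
    "prod": Environment("prod", physical="prod",    naming_prefix="",      promotes_to=None),
}

def qualified_name(env: str, asset: str) -> str:
    """Resolve the physical name of an asset within a given logical environment."""
    return f"{ENVIRONMENTS[env].naming_prefix}{asset}"

print(qualified_name("dev", "sales_orders"))   # dev_sales_orders
print(qualified_name("prod", "sales_orders"))  # sales_orders
```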

Frameworks

Once the pattern inventory is complete, a type of model is agreed, and we have a core stack, a preferred set of components, and an environment strategy in place, we urge our customers to standardise every data and metadata operation pattern on those choices, extending them only when truly necessary. In practice, if two teams carry out the same task (for example, ingesting data from an RDBMS table), they should follow the same tooling and approach. We recommend defining and creating a set of frameworks and then enforcing their usage (this might require refactoring data pipelines, but the effort is worth it). A framework, here, is the combination of tooling and an automated, documented method for tackling an agreed-upon set of patterns, fully aligned with the overall EDP operating model. The narrower the scope of each framework, the easier it is to manage.

Below you’ll find a comprehensive list of frameworks and what they define and enforce, based on what we have observed in our customer projects:

  • Creation – Define or update data assets such as tables, Kafka topics, and more.
  • Ingestion – Ingest data in batches or streams from source systems into the EDP’s initial layer (often called raw, bronze, landing, etc.); a configuration-driven sketch follows this list.
  • Processing – Transform data, in batch or streaming mode, as it moves between EDP layers, from the initial layer upwards.
  • Exporting – Export EDP data to external downstream systems via files, APIs, JDBC, etc.
  • Catalogue and Lineage – Catalogue data assets (with their business context) and map their relationships. The frameworks above (creation, ingestion, processing, and exporting) should enforce and populate the metadata required for catalogue and lineage.
  • Data Quality – Maintain a searchable inventory of data-quality issues within EDP assets. In some scenarios, this framework is embedded in the processing framework itself.
  • DevOps and Pipeline Promotion – Organise, version, and share code; promote pipelines (creation, ingestion, processing, etc.) to higher environments. Semi-automated checklists and gated promotion ensure adherence to the EDP operating model, speeding up time to production.
  • Infrastructure Creation – Create the infrastructure and services (ephemeral or permanent) that support the EDP, ideally via standardised Infrastructure-as-Code (IaC) tools such as Terraform, especially in cloud deployments.
  • Data Replication and Masking – Copy production data to lower environments with appropriate masking. A dedicated framework automates these tasks so development and testing can proceed safely without exposing sensitive information.
  • MLOps & GenAI – Manage the lifecycle and productionisation of ML models. The growth in the use of Generative AI (LLMs, RAG and, more recently, agents) also requires standardised access patterns.
  • Data Observability and Monitoring – Monitor pipelines and the datasets they update to confirm that jobs run as expected, that data remains current, and that the platform stays healthy.
  • Data Archival and Lifecycle – Archive or delete ageing data. In some cases, this is considered part of the creation or processing frameworks.
  • Orchestration – Schedule all the operations that need to run periodically.
  • Access Management and Security – Control access to the data and to other assets, such as related pipelines.
  • Consumption – Create consumable, graphical representations of the EDP data for visualisation and exploration, following organisational standards.
  • Modelling – Design efficient data structures (BI KPIs, ML features), maintain a controlled inventory, avoid duplicated models, and enforce clear separation of content across EDP layers in line with master data management principles.
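As promised in the ingestion item above, here is a minimal, hypothetical sketch of how a single, configuration-driven ingestion framework could cover both RDBMS tables and file-based feeds with the same tooling and approach; every feed, field and function name is an assumption made for illustration, not a reference to any specific product.

```python
# Hypothetical, configuration-driven feed definitions: one framework, many sources
INGESTION_FEEDS = [
    {
        "feed_id": "erp_orders",
        "source_type": "rdbms",            # routed to the framework's JDBC/CDC path
        "connection": "erp_prod",
        "object": "sales.orders",
        "mode": "incremental",
        "target_layer": "raw",
        "schedule": "0 2 * * *",
        "owner": "finance-data-team",
    },
    {
        "feed_id": "weblogs",
        "source_type": "files",            # routed to the framework's file-landing path
        "connection": "sftp_weblogs",
        "object": "/incoming/weblogs/*.json",
        "mode": "append",
        "target_layer": "raw",
        "schedule": "*/30 * * * *",
        "owner": "digital-team",
    },
]

def ingest_rdbms(feed: dict) -> None:
    # Placeholder for the framework's RDBMS routine (and the metadata it must register)
    print(f"[rdbms] ingesting {feed['object']} into {feed['target_layer']}")

def ingest_files(feed: dict) -> None:
    # Placeholder for the framework's file routine (and the metadata it must register)
    print(f"[files] ingesting {feed['object']} into {feed['target_layer']}")

def run_feed(feed: dict) -> None:
    """Dispatch a feed to the framework path that handles its source type."""
    if feed["source_type"] == "rdbms":
        ingest_rdbms(feed)
    elif feed["source_type"] == "files":
        ingest_files(feed)
    else:
        raise ValueError(f"Unsupported source type: {feed['source_type']}")

for feed in INGESTION_FEEDS:
    run_feed(feed)
```

The point is not the code itself but the shape of it: new feeds are added as configuration, so every team ingests data through the same governed path and the framework can register the catalogue and lineage metadata on their behalf.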

So, as we can see, an effective EDP operating model clearly defines each of the above frameworks, specifying both the tooling and the preferred approach within each tool. Ideally, it also ensures that:

  • All data operations within the EDP, including metadata operations, are carried out exclusively through these frameworks.
  • All frameworks use a common monitoring and auditing system, so that each run and every internal step is fully logged (see the sketch after this list).
  • All frameworks implement error handling and enforce it when used; errors must also be audited.
  • All code for the frameworks themselves (if framework tooling is built) is version-controlled, preserving backwards compatibility as the frameworks evolve.
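The sketch below is our own illustration (not part of any specific product) of how those cross-cutting rules could be baked into a small base class that every framework tool inherits: each run and each internal step is written to a common audit log, and errors are always caught, audited and re-raised.

```python
import logging

audit_log = logging.getLogger("edp.audit")   # shared monitoring/auditing channel
logging.basicConfig(level=logging.INFO)

class FrameworkRun:
    """Base class every framework tool could inherit from (illustrative only)."""

    def __init__(self, framework: str, run_id: str):
        self.framework = framework
        self.run_id = run_id

    def step(self, name: str, func, *args, **kwargs):
        """Execute one internal step with mandatory auditing and error handling."""
        audit_log.info("%s run=%s step=%s started", self.framework, self.run_id, name)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Errors are audited as well, then propagated so the run fails visibly
            audit_log.exception("%s run=%s step=%s failed", self.framework, self.run_id, name)
            raise
        audit_log.info("%s run=%s step=%s finished", self.framework, self.run_id, name)
        return result

# Example usage inside a (hypothetical) ingestion framework
run = FrameworkRun(framework="ingestion", run_id="2024-06-01T02:00")
rows = run.step("extract", lambda: 42)                      # stand-in for the real extract logic
run.step("load", lambda n: print(f"loaded {n} rows"), rows)  # stand-in for the real load logic
```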

Build or Buy?

In general, there are two options an organisation can take to create frameworks: build its own tooling on top of the core tech stack, or buy third-party tech to complement it. Each option has its pros and cons, and in many organisations we find a combination of both:

[Graphic: Build your own vs 3rd-party tech]

At ClearPeaks and the synvert group, we’ve been supporting our customers for years, building frameworks that cover both approaches across every tech stack: a framework for data processing in Cloudera, a framework for Data Quality on Snowflake, or one for Databricks and Cloudera, a framework for MLOps for Azure, or for Databricks, etc.

Approaches

As discussed above, a framework combines both the tooling and the way to use it. Whether you choose to build or buy that tooling, the crucial step is defining how it should be applied in a governed, controlled manner. In practice, this usually means agreeing on naming conventions, deciding how to segment data assets by maturity or stage, and so on: details that vary from one framework to the next.
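As a small, hypothetical example of what agreeing on naming conventions can look like in practice, the snippet below validates data asset names against a layer_domain_entity convention; the layers and the pattern itself are assumptions chosen purely for illustration.

```python
import re

# Hypothetical convention: <layer>_<domain>_<entity>, lower-case, underscores only
NAME_PATTERN = re.compile(r"^(raw|curated|consumption)_[a-z0-9]+_[a-z0-9_]+$")

def check_asset_name(name: str) -> bool:
    """Return True when a data asset name follows the agreed convention."""
    return bool(NAME_PATTERN.match(name))

print(check_asset_name("curated_sales_orders"))   # True
print(check_asset_name("SalesOrdersFinal_v2"))    # False: breaks the convention
```

Checks like this can be wired into the DevOps and promotion framework so that non-conforming assets never reach higher environments.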

As mentioned above, we’ve long supported customers in doing exactly that: creating the tooling and shaping the usage approach. Over countless projects, we’ve built and continually expanded an extensive catalogue of guidelines that capture these best practices, giving every new framework a proven, ready-made starting point.

Processes, Roles and Responsibilities

And last but not least, we must identify and standardise the key processes within the EDP and clearly assign roles and responsibilities to those involved.

How these processes, roles and responsibilities are defined is shaped by the chosen EDP operating model. Take data ingestion, for example, a key process in any set-up. While we always recommend a standard framework, the workflow itself differs by model: in a data mesh, each domain can build its own ingestion pipeline, whereas in a hub-and-spoke model the hub might own every ingestion pipeline, building them in response to requests from the spokes. Whatever the model, it is vital to document the process and specify exactly who does what.

In addition to data ingestion, customers often focus on other critical processes: promoting pipelines to higher environments, provisioning production data in lower environments, chargeback (splitting platform costs), granting access to datasets, exposing monitoring or observability metrics, creating dashboards from EDP data, and so on.
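Chargeback in particular lends itself to a simple illustration: the sketch below, with made-up figures, splits a monthly platform bill across business units in proportion to their metered compute usage; how usage is actually metered will depend on the chosen stack.

```python
def chargeback(total_cost: float, usage_by_unit: dict) -> dict:
    """Split a shared platform cost proportionally to each unit's metered usage."""
    total_usage = sum(usage_by_unit.values())
    return {unit: round(total_cost * usage / total_usage, 2)
            for unit, usage in usage_by_unit.items()}

# Hypothetical monthly figures: compute-hours consumed per business unit
usage = {"finance": 1200.0, "marketing": 800.0, "operations": 400.0}
print(chargeback(24_000.0, usage))
# {'finance': 12000.0, 'marketing': 8000.0, 'operations': 4000.0}
```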

During this definition of processes, roles and responsibilities, a crucial point is self-service, i.e. deciding how much independence each role has over specific tasks. A properly defined EDP operating model must clearly state what the different roles can do.

EDPOps Accelerator

By now, you should have an overview of how to successfully create your EDP operating model. Need help? Don’t worry, we’ve got you covered!

Drawing on years of experience helping our customers to evolve their EDP operating models, and on the many frameworks we’ve delivered across every tech stack, we’ve created the EDPOps Accelerator to fast-track your journey to serving analytics use cases at scale.

First, the EDPOps Accelerator guides you through defining (or refining) the right operating model for your organisation, following the blueprint outlined in this blog. Next, it speeds up framework creation: you can reuse substantial portions of the proven tooling we’ve perfected with other customers. And we go beyond tooling: you’ll also be able to tap into our growing library (soon available via chatbot) of guidelines and best practices for each framework.

Conclusion

In this post, we’ve shared our recipe for designing an operating model that lets your EDP deliver analytics at scale, serving hundreds of users and use cases with ease. The secret ingredient is standardisation: standardised frameworks that address every usage pattern, and standardised processes that keep the platform consistent, controlled and secure for everyone involved.

If you’d like our expert support to fine-tune your model or to accelerate your evolution with our EDPOps Accelerator, simply get in touch today!