How to Operate your Enterprise Data Platform – Meet Our EDPOps Accelerator



By now, nearly every medium-to-large organisation recognises that data is the new gold and that investing in data analytics is essential to stay competitive – there’s no arguing with that! It gets even more interesting when we explore how to choose, build and operate an Enterprise Data Platform (EDP) that can power all our analytics workloads, from BI to AI, and how to do so at scale, in the most controlled and efficient way, to serve a growing number of use cases and users.

Last year, in our ‘How to Choose your Data Platform’ series, we published two blog posts: in the first, we traced the evolution of data platform architectures, from the early data warehouses that entered organisations in the 1980s to the feature-rich EDPs our customers rely on today, and we outlined the capabilities we now expect every platform (and its underlying tech stack) to deliver. In our second post, we took our readers on a tour of the leading tech stacks we deploy with our customers, like Azure, AWS, GCP, Oracle, Cloudera, Snowflake and Databricks, each fully equipped to elevate data analytics to the next level.

While this post doesn’t delve into the details of building an EDP, remember that we’ve spent years rolling out data platforms on every stack listed above, both in highly secure on-premises environments and across all major clouds. For example, our ready-made Terraform templates can streamline deployment on the leading cloud providers.

In this blog post, we’ll focus on how to operate an EDP, and especially on how to do so at scale for hundreds of use cases, users and data consumers. Seasoned readers will spot a significant overlap between running an EDP and the disciplines typically grouped under Data Strategy and Data Governance, and indeed there is one: just as an organisation needs robust Data Strategy and Data Governance programmes when scaling, it also requires a clear EDP operating model that governs every activity within the platform, whether or not those activities fall under formal Data Governance. In other words, Data Governance controls the data, but EDP operations need their own governance to ensure scalable, repeatable and compliant execution. Over the past couple of years, we have helped several large organisations create or refine such operating models, and today we’ll share our proven method for building a successful EDP operating model.

What’s more, we’ll also introduce you to EDPOps Accelerator, our programme that not only guides you through the creation or enhancement of an EDP operating model, but also accelerates the deployment of its key components. Wondering how? Keep reading!

Recipe for Successfully Creating an EDP Operating Model

To create an EDP operating model that works at scale:

  1. Gather the goals and requirements for the EDP from top sponsors and align them with the existing Data Strategy and Data Governance programmes.
  2. Create an inventory of current and foreseen patterns of usage within your EDP.
  3. Determine the type of operating model (centralised, decentralised, data mesh, hub and spoke) that fits the Data Strategy and Data Governance programmes.
  4. Select a core technology stack, a component architecture design, and an environment strategy that can support all the patterns in the selected type of operating model and is in line with the Data Strategy and Data Governance programmes.
  5. Define and create reusable, standard frameworks (and the tooling and approaches required) for all data and metadata operations.
  6. Identify, standardise, and determine clear ownership for key processes within the EDP; define the roles and responsibilities of each EDP persona so that everyone knows exactly what they can and cannot do.

Standardisation is the key ingredient for success: an EDP can only scale when consistent frameworks, tooling and processes govern everything that happens within the platform. This requirement holds true whether you’re building a data mesh, a hub-and-spoke architecture, or any other operating model.

Patterns Inventory

Before we can standardise how to perform every operation in the EDP, we need to understand which patterns of usage are running or will run in the platform. These patterns define how each category of activity is, or will be, performed: data ingestion, data processing, reporting, data cataloguing, AI, etc.

We often find very similar patterns in large organisations. Take data ingestion, for example: one team might use one approach to ingest tables from RDBMS sources and a different one for file-based feeds, whereas another team relies on a single approach for both.

For every pattern, we need to understand how the relevant components of the tech stack interact. When a pattern already exists, we collect metrics (number of jobs, volume of data, etc.), and for future patterns, we help the customer to identify potential risks and recommend mitigations.

We build this inventory through a series of templated discovery sessions with representatives from the various groups of EDP users, typically from different business units within the organisation.
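As an illustration only, here is a minimal sketch of how one entry in such a patterns inventory could be captured; the field names and example values below are our own assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class UsagePattern:
    """One entry in the EDP patterns inventory (illustrative schema, not a standard)."""
    name: str                      # e.g. "RDBMS batch ingestion"
    category: str                  # ingestion, processing, reporting, cataloguing, AI, ...
    owning_teams: list             # business units or teams using the pattern
    components: list               # tech-stack components the pattern touches
    status: str                    # "existing" or "planned"
    metrics: dict = field(default_factory=dict)   # collected for existing patterns
    risks: list = field(default_factory=list)     # captured mainly for planned patterns

# Two example entries, as they might be captured during a discovery session
inventory = [
    UsagePattern(
        name="RDBMS batch ingestion",
        category="ingestion",
        owning_teams=["Finance BI"],
        components=["CDC tool", "object storage", "raw layer"],
        status="existing",
        metrics={"jobs": 120, "daily_volume_gb": 45},
    ),
    UsagePattern(
        name="Clickstream streaming ingestion",
        category="ingestion",
        owning_teams=["Digital"],
        components=["Kafka", "stream processor", "raw layer"],
        status="planned",
        risks=["no agreed schema-evolution policy yet"],
    ),
]

print(len(inventory), "patterns captured")
```

Keeping the inventory in a structured form like this makes it easy to compare patterns across business units and to spot candidates for consolidation.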

Types of EDP Operating Models

There are different types of operating models. Until a few years ago, most organisations recognised only two core models: centralised and decentralised (often called federated). In a centralised model, a single ‘central’ team owns every aspect of the EDP and all business units rely on it, whereas in a decentralised model each business unit maintains one or more data teams managing analytics end-to-end, with, ideally, some federated coordination. Both models bring their own pros and cons, yet they share a critical weakness: neither scales gracefully. As the number of use cases and users grows, operating the EDP efficiently and in a timely way becomes a nightmare!

In recent years, two newer models have emerged to tackle, among other challenges, this scalability issue: data mesh and hub-and-spoke. Data mesh (see our previous blog post here) extends the decentralised approach whilst fixing its shortcomings, remaining fully decentralised yet backed by robust interoperability and governance standards. The hub-and-spoke model evolves the centralised approach, re-architecting it for scale. Like their predecessors, both have advantages and disadvantages, and determining which is best for an organisation depends on many factors, but that lies beyond the scope of this article. For readers interested in more details on the various types of operating models and how they compare with each other, we recommend checking this blog post.

[Graphic: EDP Operating Models]

Technology, Architecture, and Environments

Regarding the choice of a core technology stack, any of the platforms listed above is a solid choice. Note that here we are talking strictly about the core stack, the foundational technology running the back end of your EDP, which you may later complement with other tools (often from different vendors) to handle specific operations (more on this in the Frameworks section). Ideally, we would inventory usage patterns before selecting the technology. However, in many customer engagements the decision is already locked in, usually driven by enterprise-wide agreements between the organisation and the vendor.

In either case, once the core technology stack has been selected, we need to define the component architecture design, which determines, at a high level, which services you are going to use and for what. Each hyperscale cloud (Azure, AWS, GCP, and so on) offers multiple services that can accomplish the same task, so it is crucial to establish which components are preferred for each workload.

Finally, you must define an environment strategy: how many environments you are going to use, and their relationships and constraints (only prod; dev and prod; dev, UAT, prod; dev, UAT, performance and prod, etc.). Sometimes we distinguish between physical and logical environments: several logical environments can coexist within the same physical environment. For example, an organisation might host both dev and prod in one physical environment, differentiating them through naming conventions, separate compute queues, etc.
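To make this tangible, here is a minimal sketch (purely for illustration) of how such an environment strategy could be written down, assuming a dev/UAT/prod chain in which dev and UAT share a physical environment and are kept apart by naming prefixes; the names and prefixes are our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Environment:
    name: str                   # logical environment name
    physical: str               # physical environment (account/cluster/tenant) it lives in
    naming_prefix: str          # prefix that keeps logical environments apart
    promotes_to: Optional[str]  # next environment in the promotion chain, if any

# Example strategy: dev and UAT share one physical environment, prod is isolated
ENVIRONMENTS = {
    "dev":  Environment("dev",  physical="nonprod", naming_prefix="dev_",  promotes_to="uat"),
    "uat":  Environment("uat",  physical="nonprod", naming_prefix="uat_",  promotes_to="prod"),
    "prod": Environment("prod", physical="prod",    naming_prefix="",      promotes_to=None),
}

def qualified_name(env: str, asset: str) -> str:
    """Resolve the physical name of an asset within a given logical environment."""
    return f"{ENVIRONMENTS[env].naming_prefix}{asset}"

print(qualified_name("dev", "sales_orders"))   # dev_sales_orders
print(qualified_name("prod", "sales_orders"))  # sales_orders
```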

Frameworks

Once the pattern inventory is complete, a type of model is agreed, and we have a core stack, a preferred set of components, and an environment strategy in place, we urge our customers to standardise every data and metadata operation pattern on those choices, extending them only when truly necessary. In practice, if two teams carry out the same task (for example, ingesting data from an RDBMS table), they should follow the same tooling and approach. We recommend defining and creating a set of frameworks and then enforcing their usage (this might require refactoring data pipelines, but the effort is worth it). A framework, here, is the combination of tooling and an automated, documented method for tackling an agreed-upon set of patterns, fully aligned with the overall EDP operating model. The narrower the scope of each framework, the easier it is to manage.

Below you’ll find a comprehensive list of frameworks and what they define and enforce, based on what we have observed in our customer projects:

  • Creation – Define or update data assets such as tables, Kafka topics, and more.
  • Ingestion – Ingest data in batches or streams from source systems into the EDP’s initial layer (often called raw, bronze, landing, etc.); a configuration-driven sketch follows this list.
  • Processing – Transform data, in batch or streaming mode, as it moves between EDP layers, from the initial layer upwards.
  • Exporting – Export EDP data to external downstream systems via files, APIs, JDBC, etc.
  • Catalogue and Lineage – Catalogue data assets (with their business context) and map their relationships. The frameworks above (creation, ingestion, processing, and exporting) should enforce and populate the metadata required for catalogue and lineage.
  • Data Quality – Maintain a searchable inventory of data-quality issues within EDP assets. In some scenarios, this framework is embedded in the processing framework itself.
  • DevOps and Pipeline Promotion – Organise, version, and share code; promote pipelines (creation, ingestion, processing, etc.) to higher environments. Semi-automated checklists and gated promotion ensure adherence to the EDP operating model, speeding up time to production.
  • Infrastructure Creation – Create the infrastructure and services (ephemeral or permanent) that support the EDP, ideally via standardised Infrastructure-as-Code (IaC) tools such as Terraform, especially in cloud deployments.
  • Data Replication and Masking – Copy production data to lower environments with appropriate masking. A dedicated framework automates these tasks so development and testing can proceed safely without exposing sensitive information.
  • MLOps & GenAI – Manage the lifecycle and productionisation of ML models. The growth in the use of Generative AI (LLMs, RAG and, more recently, agents) also requires standardised access patterns.
  • Data Observability and Monitoring – Monitor pipelines and the datasets they update to confirm that jobs run as expected, that data remains current, and that the platform stays healthy.
  • Data Archival and Lifecycle – Archive or delete ageing data. In some cases, this is considered part of the creation or processing frameworks.
  • Orchestration – Schedule all the operations that need to run periodically.
  • Access Management and Security – Control access to the data and to other assets, such as related pipelines.
  • Consumption – Create consumable, graphical representations of the EDP data for visualisation and exploration, following organisational standards.
  • Modelling – Design efficient data structures (BI KPIs, ML features), maintain a controlled inventory, avoid duplicated models, and enforce clear separation of content across EDP layers in line with master data management principles.
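As promised in the ingestion item above, here is a minimal, hypothetical sketch of how a single, configuration-driven ingestion framework could cover both RDBMS tables and file-based feeds with the same tooling and approach; every feed, field and function name is an assumption made for illustration, not a reference to any specific product.

```python
# Hypothetical, configuration-driven feed definitions: one framework, many sources
INGESTION_FEEDS = [
    {
        "feed_id": "erp_orders",
        "source_type": "rdbms",            # routed to the framework's JDBC/CDC path
        "connection": "erp_prod",
        "object": "sales.orders",
        "mode": "incremental",
        "target_layer": "raw",
        "schedule": "0 2 * * *",
        "owner": "finance-data-team",
    },
    {
        "feed_id": "weblogs",
        "source_type": "files",            # routed to the framework's file-landing path
        "connection": "sftp_weblogs",
        "object": "/incoming/weblogs/*.json",
        "mode": "append",
        "target_layer": "raw",
        "schedule": "*/30 * * * *",
        "owner": "digital-team",
    },
]

def ingest_rdbms(feed: dict) -> None:
    # Placeholder for the framework's RDBMS routine (and the metadata it must register)
    print(f"[rdbms] ingesting {feed['object']} into {feed['target_layer']}")

def ingest_files(feed: dict) -> None:
    # Placeholder for the framework's file routine (and the metadata it must register)
    print(f"[files] ingesting {feed['object']} into {feed['target_layer']}")

def run_feed(feed: dict) -> None:
    """Dispatch a feed to the framework path that handles its source type."""
    if feed["source_type"] == "rdbms":
        ingest_rdbms(feed)
    elif feed["source_type"] == "files":
        ingest_files(feed)
    else:
        raise ValueError(f"Unsupported source type: {feed['source_type']}")

for feed in INGESTION_FEEDS:
    run_feed(feed)
```

The point is not the code itself but the shape of it: new feeds are added as configuration, so every team ingests data through the same governed path and the framework can register the catalogue and lineage metadata on their behalf.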

So, as we can see, an effective EDP operating model clearly defines each of the above frameworks, specifying both the tooling and the preferred approach within each tool. Ideally, it also ensures that:

  • All data operations within the EDP, including metadata operations, are carried out exclusively through these frameworks.
  • All frameworks use a common monitoring and auditing system, so that each run and every internal step is fully logged (see the sketch after this list).
  • All frameworks implement error handling and enforce it when used; errors must also be audited.
  • All code for the frameworks themselves (if framework tooling is built) is version-controlled, preserving backwards compatibility as the frameworks evolve.
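The sketch below is our own illustration (not part of any specific product) of how those cross-cutting rules could be baked into a small base class that every framework tool inherits: each run and each internal step is written to a common audit log, and errors are always caught, audited and re-raised.

```python
import logging

audit_log = logging.getLogger("edp.audit")   # shared monitoring/auditing channel
logging.basicConfig(level=logging.INFO)

class FrameworkRun:
    """Base class every framework tool could inherit from (illustrative only)."""

    def __init__(self, framework: str, run_id: str):
        self.framework = framework
        self.run_id = run_id

    def step(self, name: str, func, *args, **kwargs):
        """Execute one internal step with mandatory auditing and error handling."""
        audit_log.info("%s run=%s step=%s started", self.framework, self.run_id, name)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Errors are audited as well, then propagated so the run fails visibly
            audit_log.exception("%s run=%s step=%s failed", self.framework, self.run_id, name)
            raise
        audit_log.info("%s run=%s step=%s finished", self.framework, self.run_id, name)
        return result

# Example usage inside a (hypothetical) ingestion framework
run = FrameworkRun(framework="ingestion", run_id="2024-06-01T02:00")
rows = run.step("extract", lambda: 42)                      # stand-in for the real extract logic
run.step("load", lambda n: print(f"loaded {n} rows"), rows)  # stand-in for the real load logic
```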

Build or Buy?

In general, there are two options an organisation can take to create frameworks: build its own tooling on top of the core tech stack, or buy third-party tech to complement it. Each option has its pros and cons, and in many organisations we find a combination of both:

[Graphic: Build your own vs 3rd-party tech]

At ClearPeaks and the synvert group, we’ve been supporting our customers for years, building frameworks that cover both approaches across every tech stack: a framework for data processing in Cloudera, a framework for Data Quality on Snowflake, or one for Databricks and Cloudera, a framework for MLOps for Azure, or for Databricks, etc.

Approaches

As discussed above, a framework combines both the tooling and the way to use it. Whether you choose to build or buy that tooling, the crucial step is defining how it should be applied in a governed, controlled manner. In practice, this usually means agreeing on naming conventions, deciding how to segment data assets by maturity or stage, and so on: details that vary from one framework to the next.
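As a small, hypothetical example of what agreeing on naming conventions can look like in practice, the snippet below validates data asset names against a layer_domain_entity convention; the layers and the pattern itself are assumptions chosen purely for illustration.

```python
import re

# Hypothetical convention: <layer>_<domain>_<entity>, lower-case, underscores only
NAME_PATTERN = re.compile(r"^(raw|curated|consumption)_[a-z0-9]+_[a-z0-9_]+$")

def check_asset_name(name: str) -> bool:
    """Return True when a data asset name follows the agreed convention."""
    return bool(NAME_PATTERN.match(name))

print(check_asset_name("curated_sales_orders"))   # True
print(check_asset_name("SalesOrdersFinal_v2"))    # False: breaks the convention
```

Checks like this can be wired into the DevOps and promotion framework so that non-conforming assets never reach higher environments.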

As mentioned above, we’ve long supported customers in doing exactly that: creating the tooling and shaping the usage approach. Over countless projects, we’ve built and continually expanded an extensive catalogue of guidelines that capture these best practices, giving every new framework a proven, ready-made starting point.

Processes, Roles and Responsibilities

And last but not least, we must identify and standardise the key processes within the EDP and clearly assign roles and responsibilities to those involved.

How these processes, roles and responsibilities are defined is shaped by the chosen EDP operating model. Take data ingestion, for example, a key process in any set-up. While we always recommend a standard framework, the workflow itself differs by model: in a data mesh, each domain can build its own ingestion pipeline, whereas in a hub-and-spoke model the hub might own every ingestion pipeline, building them in response to requests from the spokes. Whatever the model, it is vital to document the process and specify exactly who does what.

In addition to data ingestion, customers often focus on other critical processes: promoting pipelines to higher environments, provisioning production data in lower environments, chargeback (splitting platform costs), granting access to datasets, exposing monitoring or observability metrics, creating dashboards from EDP data, and so on.
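Chargeback in particular lends itself to a simple illustration: the sketch below, with made-up figures, splits a monthly platform bill across business units in proportion to their metered compute usage; how usage is actually metered will depend on the chosen stack.

```python
def chargeback(total_cost: float, usage_by_unit: dict) -> dict:
    """Split a shared platform cost proportionally to each unit's metered usage."""
    total_usage = sum(usage_by_unit.values())
    return {unit: round(total_cost * usage / total_usage, 2)
            for unit, usage in usage_by_unit.items()}

# Hypothetical monthly figures: compute-hours consumed per business unit
usage = {"finance": 1200.0, "marketing": 800.0, "operations": 400.0}
print(chargeback(24_000.0, usage))
# {'finance': 12000.0, 'marketing': 8000.0, 'operations': 4000.0}
```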

During this definition of processes, roles and responsibilities, a crucial point is self-service, i.e. deciding how much independence each role has over specific tasks. A properly defined EDP operating model must clearly state what the different roles can do.

EDPOps Accelerator

By now, you should have an overview of how to successfully create your EDP operating model. Need help? Don’t worry, we’ve got you covered!

Drawing on years of experience helping our customers to evolve their EDP operating models, and on the many frameworks we’ve delivered across every tech stack, we’ve created the EDPOps Accelerator to fast-track your journey to serving analytics use cases at scale.

First, the EDPOps Accelerator guides you through defining (or refining) the right operating model for your organisation, following the blueprint outlined in this blog. Next, it speeds up framework creation: you can reuse substantial portions of the proven tooling we’ve perfected with other customers. And we go beyond tooling: you’ll also be able to tap into our growing library (soon available via chatbot) of guidelines and best practices for each framework.

Conclusion

In this post, we’ve shared our recipe for designing an operating model that lets your EDP deliver analytics at scale, serving hundreds of users and use cases with ease. The secret ingredient is standardisation: standardised frameworks that address every usage pattern, and standardised processes that keep the platform consistent, controlled and secure for everyone involved.

If you’d like our expert support to fine-tune your model or to accelerate your evolution with our EDPOps Accelerator, simply get in touch today!