The Art of Consolidation: Harmonizing a Multi-Tenant DWH in an On-Premises World

While cloud solutions are increasingly becoming the standard, many companies continue to rely on their existing on-premises systems – driven by regulatory requirements, specific security mandates, or historically grown infrastructures. However, the real challenge begins when these traditional systems are expected to serve as the foundation for future-facing technologies like AI or Advanced Analytics: for this, a harmonized, consistent data state is an absolute prerequisite.

In this post, we highlight an architecturally demanding scenario from the retail industry. We show how we transformed a complex on-prem multi-tenant DWH after the source systems had already been consolidated into a global unified system. We take a look at the exciting interplay between structural simplification at the source and the necessary logical precision in the Data Warehouse to bridge historical depth and modern single-system logic.

The Data Ecosystem: From Local Silos to a Global Vision

To understand the complexity of this project, we must first look at the starting point. We are in a classic, internationally operating retail environment. In the original structure, each country maintained its own data with its own coding logic in an isolated, country-specific system. For IT, this meant that every new country brought another isolated ERP instance, its own article master data, and individual processes for recording sales.

This fragmentation was the obstacle to our overarching goal: the creation of company-wide analytical capabilities. To leverage synergies and control processes globally, the company wanted to move away from local isolated solutions. The vision was a unified, global system in which all country data is maintained centrally. An article should have the same definition everywhere, and a process should run in Vienna exactly as it does in Berlin.

The Architecture of the Data Pipeline

In our underlying data architecture, the path of a record is clearly defined: data is ingested from the various data sources through ETL jobs and is transformed and staged, as a first intermediate layer, in dedicated country tables within a relational Operational Data Store (ODS). Our philosophy here is to store the data as close to the source as possible to maintain full data lineage.

While data in the ODS is kept only for a short operational retention period, the actual “refinement” takes place during the transfer to the Data Warehouse (DWH) system. The ODS data is transferred to the DWH with technical validations, data enrichment, and data type cleansing. Here, the data is archived in country-separated schemas for more than ten years. This historized data forms the basis for our BI tools and enables sound, sustainable reporting over long time series.

Multi-Tenant DWH Architecture

Target State Strategy: Anchoring the Global Vision in the Existing Data Architecture

While the architecture described above forms the stable framework for data flow, this structure now collides with the changed reality of the new system vision. The introduction of a global unified system that harmonizes all country processes was a significant milestone for the IT landscape, but it presented us with two core questions in the DWH environment:

  1. How do we define which country each record belongs to? Previously, the world was relatively simple for us: country detection in the DWH was solved implicitly. Since the data flowed from physically separate source systems, we knew exactly which record belonged to which country by identifying the source, which fit perfectly with our multi-tenant DWH architecture. With the introduction of the harmonized unified system, this boundary disappeared. Suddenly, all data ended up “in one pot,” and we had to find new ways to precisely control the origin for our reports.
  2. How do we combine the harmonized logic and existing structures? A critical aspect of our country-separated data architecture was the historically grown isolation of codings. Since the source systems were operated separately by country for decades, deep semantic inconsistency prevailed. Article number 1234 in Hamburg could correspond to a completely different article with the same ID in Zagreb. Through harmonization, our declared goal was to consolidate these codings so that every ID has a unique, company-wide meaning. Only then could we ensure that analyses across different locations did not compare “apples and oranges.”

Thus, our data architecture, with its strict separation into country-specific schemas in the DWH and years of historization, was challenged in a positive way by the new source system. We had to develop a solution that seamlessly unites both the new “one-system world” and the “country-specific history.”

Strategic Planning: Why No “Big Bang”?

Faced with these challenges, the question of the implementation strategy arose. A complete “Big Bang”, i.e., the physical migration and complete re-coding of all historical stocks to the new logic, was not practical given limited on-prem server capacities and the enormous daily data volume in the terabyte range. Such a step would have paralyzed the systems for days and significantly increased the risk of data loss and inconsistencies.

We therefore decided on a hybrid architecture that combined security and progress: while the source now acted uniformly, we retained the tenant-specific separation in the DWH storage. The solution for the analytical layer was an abstraction layer of Union Views. These virtual layers logically merge the data from the different countries and present the BI applications with a harmonized overall picture, with the country code as an additional functional key. This allowed us to keep performance stable without having to physically rewrite the proven historical database.
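To make this concrete, here is a minimal sketch of such a Union View; the schema, table, and column names are illustrative placeholders rather than our actual objects:

```sql
-- Minimal sketch of the abstraction layer: each country keeps its own
-- physical schema, and the view adds the country code as an additional
-- functional key for the BI layer.
CREATE VIEW dwh.v_sales_harmonized AS
SELECT 'DE' AS country_code, article_id, store_id, sales_date, revenue
FROM   dwh_de.sales
UNION ALL
SELECT 'AT' AS country_code, article_id, store_id, sales_date, revenue
FROM   dwh_at.sales;

-- BI applications query only the view and filter on country_code as needed.
```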

1. Core Question: How to Achieve Country Assignment?

Since the source system no longer provided a physical separation, identifying suitable attributes for country detection was our first major hurdle. Such an attribute must have two decisive properties: it must be used consistently across all countries, and it must have a uniform meaning within the entire organization.

The Functional Perspective: Context is Everything

Behind the scenes of the SQL queries, the functional perspective plays a crucial role. We quickly learned that one “country” is not necessarily the same as another. For logistical data in particular, it makes a huge difference whether you perform the country assignment from the perspective of the Sales Organization (VKORG) or the Purchasing Organization (EKORG). Especially in international inter-company processes, the purchasing and sales countries can differ. Data from a distribution center in a neighboring country that supplies a store in Germany must be flagged differently depending on the reporting requirement.
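To illustrate the point (the table and column names here are assumptions, not our actual model), the same document can receive a different country depending on the chosen perspective:

```sql
-- Sketch: derive the reporting country either from the sales organization
-- (VKORG) or the purchasing organization (EKORG), depending on the report.
-- :perspective is a report parameter ('SALES' or 'PURCHASING').
SELECT d.document_id,
       CASE WHEN :perspective = 'SALES'
            THEN vk.country_code  -- country of the selling organization
            ELSE ek.country_code  -- country of the purchasing organization
       END AS report_country
FROM ods.logistics_documents d
LEFT JOIN ref.sales_org_country vk ON vk.vkorg = d.vkorg
LEFT JOIN ref.purch_org_country ek ON ek.ekorg = d.ekorg;
```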

After a thorough business analysis and many interviews with the functional departments, we identified the following attributes for this purpose, taking reporting requirements into account:

  • Purchasing and Sales Organizations for logistical as well as operational or distribution data.
  • Plant, Store, or Distribution Center IDs for real estate or assortment data.
  • The Company Code or Controlling or Billing Area for financial and accounting data.
  • Language Keys for master data with texts or product descriptions.

Technical Implementation and ISO Standards

To implement country detection in a data-driven and future-proof way, we defined central reference tables. For consistent mapping, we used the two-letter ISO 3166 alpha-2 format throughout. In our ETL processes, the associated country code is automatically determined for each incoming ID via central lookups or User-Defined Functions (UDFs).
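A minimal sketch of such a reference table and lookup, with illustrative names, might look like this:

```sql
-- Central reference table mapping an organizational ID to its country.
CREATE TABLE ref.company_code_country (
    company_code CHAR(4) NOT NULL PRIMARY KEY,
    country_code CHAR(2) NOT NULL  -- ISO 3166 alpha-2, e.g. 'DE', 'AT'
);

-- During the ETL load, each incoming record is enriched with its country:
SELECT s.*,
       r.country_code
FROM ods.finance_documents s
JOIN ref.company_code_country r
  ON r.company_code = s.company_code;
```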

Detailed View: The Language Key as an Example

A particularly exciting attribute for the country detection of article descriptions in a retail system is the language key. However, two technical sticking points appeared right here, requiring a particularly sophisticated design of the ETL logic.

1. Handling Default Languages: In many systems, one language (often English or the corporate language) is defined as the “default”. This means that if no specific description is available in the national language, the default text is used. Technically, for our data architecture, this means we must replicate these default records into all relevant country tables whenever no record in the national language exists for the primary key. To ensure that both entries do not end up in the DWH, we implemented a prioritization logic using a windowing function that checks, in the final step before loading into the DWH, whether a record in the national language already exists. If so, the default entry is discarded.

Example of a windowing function for determining the language key.
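A minimal sketch of such a prioritization logic, assuming an article-text staging table and “EN” as the system default (both are illustrative assumptions):

```sql
-- Rank the texts per article: the national language wins over the default.
-- Only rank 1 survives the final step before loading into the DWH.
SELECT article_id, language_key, description
FROM (
    SELECT article_id, language_key, description,
           ROW_NUMBER() OVER (
               PARTITION BY article_id
               ORDER BY CASE WHEN language_key = :national_lang
                             THEN 0 ELSE 1 END
           ) AS rn
    FROM ods.article_texts
    WHERE language_key IN (:national_lang, 'EN')  -- 'EN' = assumed default
) ranked
WHERE rn = 1;
```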

2. Language Overlaps: The second challenge is geographical-linguistic in nature: countries like Germany and Austria, or Serbia and Bosnia, often share the same language key. Here, a simple 1-to-1 lookup in our multi-tenant DWH architecture is not sufficient. In such cases, we used a left join to replicate the records into the relevant country tables. This means that the data is multiplied if necessary – or specifically doubled in the case of Germany and Austria.

Example of language overlap: several countries share the same language key.
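Sketched with illustrative names, this replication is a join against a language-to-country mapping; because one language key can map to several countries, the join fans each record out accordingly:

```sql
-- A text with language key 'DE' is emitted once for Germany and once for
-- Austria; texts whose language maps to no country are kept with NULL.
SELECT t.article_id,
       t.description,
       m.country_code
FROM ods.article_texts t
LEFT JOIN ref.language_country m
  ON m.language_key = t.language_key;
```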

2. Core Question: Bridging the Gap between the Old and the Harmonized World

The most complex functional challenge in this project was dealing with the more than ten-year history. Since the new source system introduced a completely new, harmonized coding logic, we had to build a bridge between the old country-specific IDs and the new global keys.

Challenge 1: The Risk of ID Collisions during Mapping

Before we turned to the technical details, we had to address an existential risk: ID overlaps between the old and new systems. Our architecture approach stipulates that new IDs from the source are first checked against mapping tables. If a translation is found, the ID is re-coded to the old ID; if none is found, it is processed as a new record. This preserves the analyzability of historical data while simultaneously allowing new IDs from the harmonized system to be processed 1:1 in the DWH.
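A minimal sketch of this lookup for supplier IDs (the names are illustrative):

```sql
-- If a translation exists, re-code the new source ID to the historical ID;
-- otherwise, pass the new ID through 1:1.
SELECT s.delivery_id,
       COALESCE(m.old_supplier_id, s.supplier_id) AS dwh_supplier_id
FROM ods.deliveries s
LEFT JOIN ref.supplier_id_mapping m
  ON m.new_supplier_id = s.supplier_id;
```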

The danger: if a new ID (which has no translation) happens to be identical to an already existing old ID, historical data would be incorrectly overwritten. Suppose the new supplier Bienlein Logistik receives ID 65934 in the new system. Since Bienlein Logistik is new, there is no mapping for this ID, and the ETL process would therefore write it 1:1 into the DWH. But in the DWH, the supplier Pingu Gutmann has already existed for years with exactly this old ID 65934. Without a strict separation of number ranges, the system would overwrite the historical data of Pingu Gutmann with the information from Bienlein Logistik. This “clash” had to be strictly excluded in advance through a careful definition of the new number ranges for critical attributes such as supplier or article numbers.

Challenge 2: Data Type Conflicts

A technical highlight was the harmonization of Shipping Points. In the on-prem DWH, the Shipping Point attribute was defined as SMALLINT (values up to 32,767). However, the business department wanted new IDs containing a country and a regional feature, so that the origin would be recognizable at first glance for the functional users.

Since we could not change the data type globally, we used a technical trick: Base-24 encoding. The new ID consists of a letter for the country, a 2-digit regional number in Base 24, and a local sequential number. Mathematically, this allowed the business department to define up to 575 (24 × 24 − 1) different shipping point IDs – a comfortable buffer upwards. Technically, we converted the Base-24 component into a decimal value and simply appended the sequential number, leading the IDs mathematically back into the existing SMALLINT logic while ensuring they remained unique.

Example of calculating the numeric value of an alphanumeric ID.
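As a hedged sketch of the conversion (we do not show the production UDF here, and the 24-character alphabet below is an assumption for illustration): for an ID like “DA73” – country letter “D”, region “A7” in Base 24, sequence “3” – the region is decoded to decimal and the sequence digit appended:

```sql
-- Decode the two Base-24 characters (positions 2-3 of the ID) and append
-- the local sequence digit; the result still fits into SMALLINT.
-- Assumed alphabet: digits 0-9 plus 14 letters, 24 symbols in total.
SELECT sp_id,
       CAST(
         CAST(
             (POSITION(SUBSTRING(sp_id, 2, 1) IN '0123456789ABCDEFGHJKLMNP') - 1) * 24
           + (POSITION(SUBSTRING(sp_id, 3, 1) IN '0123456789ABCDEFGHJKLMNP') - 1)
           AS VARCHAR(3))
         || SUBSTRING(sp_id, 4, 1)  -- local sequential number
         AS SMALLINT) AS shipping_point_nr
FROM ods.shipping_points;
```

With a maximum region value of 575 and a one-digit sequence, the largest possible result is 5759 – comfortably below the SMALLINT limit of 32,767.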

Challenge 3: Dynamic Number Ranges

For attributes like document numbers (invoices, orders), we primarily referred to the current fiscal year. From the migration onwards, for example, invoice numbers were defined as ten digits with a leading “8”. This re-coding is largely implementable independently of the deep history, as the new harmonized number range takes effect from the new fiscal year anyway. Nevertheless, cross-company functional expertise was required here to ensure that number ranges for different document types were defined consistently across all countries.
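Such a convention can also be verified mechanically. A hedged sketch of a plausibility check (the source flag and table names are assumptions):

```sql
-- Flag invoice numbers from the harmonized system that violate the new
-- ten-digit number range with a leading '8'.
SELECT invoice_no
FROM ods.invoices
WHERE source_system = 'GLOBAL'  -- assumed flag for the new unified system
  AND invoice_no NOT BETWEEN 8000000000 AND 8999999999;
```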

Summary: Technical Milestones of the Harmonization

Looking back at the project, the central technical successes can be summarized in three pillars:

  1. Virtual Consolidation instead of Physical Migration: The decision against the “Big Bang” and in favor of the Union View architecture was the key to success. It allowed us to make the new, harmonized source world immediately usable while the petabytes of historical data remained secure and performant in their original schemas.
  2. Intelligent Identification in the ETL Layer: By using ISO standards and context-dependent attributes (such as VKORG/EKORG or language keys), we successfully reconstructed the lost country-specific identity of the data in the harmonized data stream.
  3. Mathematical Bridges: Solutions like Base-24 encoding show that technical limitations (such as rigid data types in on-prem systems) can be overcome through mathematical logic without sacrificing data integrity.

Conclusion: Data Engineering as a Bridge between Business and Technology

The harmonization of a multi-tenant DWH in an international retail environment is far more than a pure database operation. It is a process that requires a deep understanding of the business history and the logistical processes, as well as a clear vision for the future.

We learned that successful harmonization at the source (the “one-system goal”) inevitably creates new tasks in the DWH. The art of data engineering here consists of absorbing these requirements through intelligent mapping logic and virtual layers.

Today, our system is far more than just a digital archive. It is a consistent, company-wide foundation. The bridge between the “old” country-specific world and the “new” global logic has been built – providing the perfect launchpad for Advanced Analytics and AI projects that rely on clean, historized data. On-prem is not an obstacle in this context, but – with the right tricks – a highly stable and powerful platform for the future.