Modern passenger statistics platform for Berliner Verkehrsbetriebe


Berliner Verkehrsbetriebe (BVG) is the backbone of mobility in the German capital. Founded in 1928 through the merger of several private and public transportation companies, the company looks back on nearly 100 years of history that is inextricably linked to Berlin’s development.

Today, BVG is the largest public transit operator in the German-speaking world, safely and reliably transporting over one billion passengers annually via subway, tram, bus, and ferry.

To manage and further develop this network using data, BVG needs a reliable data foundation for every line, every stop, and every hour of the day.



Initial situation


The existing passenger statistics system had been continuously expanded over many years and formed the basis for established business processes.

However, as data volumes grew and requirements for detail, timeliness, and analytical scope increased, the operational workload rose significantly.

Integrating additional data sources in a structured way and adapting the system to new analytical requirements was only possible with increasing manual effort. Against this backdrop, the goal emerged to consolidate data processing in a central system by extending, automating, and bundling it. As a result, internal and external requests can be processed more quickly and transparently in the future, and it will increasingly be possible to proactively derive data-driven recommendations for action.

Goals & challenges


Complexity

The real challenge lay not in the technical replacement itself, but in the functional depth of the system. To generate comprehensive passenger statistics, highly complex data sources must be seamlessly integrated. This includes daily timetable data, automatic passenger counting (APC) data from the vehicles, and geographic information such as stops and tariff zones. Only when these sources are linked does a precise, actionable, and evaluable overall picture emerge.

Furthermore, not all vehicles in the BVG network are equipped with automatic counting devices. For trips without APC data, extrapolations must be derived from comparable, counted trips, a statistically demanding core component of the platform. This extrapolation logic is based on a specially developed, domain-specific mapping of complex route interdependencies.
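
As a rough illustration of the basic idea only, the sketch below fills in boardings for an uncounted trip with the mean of counted trips that share the same line, stop, hour, and day type. The `TripStop` schema, the definition of "comparable", and the plain mean are all assumptions made for this example; BVG's actual extrapolation relies on the domain-specific mapping of route interdependencies described above, which is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class TripStop:
    """One trip at one stop (hypothetical schema for illustration)."""
    trip_id: str
    line: str
    stop: str
    hour: int                  # departure hour, 0-23
    day_type: str              # e.g. "weekday", "saturday", "sunday"
    boardings: Optional[int]   # None when the vehicle has no APC device


def comparison_key(ts: TripStop) -> tuple:
    # Hypothetical notion of "comparable": same line, stop, hour and day type.
    return (ts.line, ts.stop, ts.hour, ts.day_type)


def extrapolate_boardings(records: list[TripStop]) -> dict[tuple[str, str], float]:
    """Return boardings per (trip_id, stop). Uncounted values are filled with the
    mean of counted comparable records; if no comparable counted record exists,
    the entry is omitted so the gap stays visible for review."""
    counted = defaultdict(list)
    for r in records:
        if r.boardings is not None:
            counted[comparison_key(r)].append(r.boardings)

    result = {}
    for r in records:
        key = (r.trip_id, r.stop)
        if r.boardings is not None:
            result[key] = float(r.boardings)  # measured value is kept as-is
            continue
        samples = counted.get(comparison_key(r))
        if samples:
            result[key] = float(mean(samples))  # estimate from comparable trips
    return result
```

A real extrapolation would also weight samples, handle sparse comparison groups, and track which values are estimated rather than measured.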

The Objective

The migration of the legacy system was intentionally embraced as an opportunity: not as a simple 1:1 replacement, but as the foundation for a future-proof, expandable analytics platform. The core objectives were:

  • Complete decommissioning of the legacy system with zero data loss and zero operational disruption.

  • Establishment of a cloud-native data platform based on modern architectural principles.

  • Highly granular KPI provisioning: boarding, alighting, vehicle occupancy, and passenger-kilometers, broken down by route, trip, stop, and hourly interval.

  • A self-service approval workflow for extrapolation results and official reporting.

  • A scalable foundation for future use cases, such as headway and service optimization.
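
The KPIs listed above are linked by a standard load-profile calculation: along a trip, occupancy after each stop is the previous occupancy minus alightings plus boardings, and passenger-kilometers are the occupancy on each segment multiplied by that segment's length. The sketch below shows this relationship; the `StopEvent` input format and the segment distances are assumptions, and the platform's exact KPI definitions (for example, how counting errors are balanced) may differ.

```python
from dataclasses import dataclass


@dataclass
class StopEvent:
    stop: str
    boardings: int
    alightings: int
    km_to_next: float  # distance to the next stop; 0.0 at the terminus


def load_profile(events: list[StopEvent]) -> tuple[list[int], float]:
    """Return occupancy after each stop and total passenger-kilometers for one trip."""
    occupancy = 0
    profile = []
    passenger_km = 0.0
    for e in events:
        # Net change at the stop: passengers leaving, then passengers joining.
        occupancy = occupancy - e.alightings + e.boardings
        if occupancy < 0:
            raise ValueError(f"negative occupancy at {e.stop}: check the counts")
        profile.append(occupancy)
        # Everyone on board after this stop travels the next segment.
        passenger_km += occupancy * e.km_to_next
    return profile, passenger_km
```

For example, 10 boardings at the first stop, 5 boardings and 3 alightings at the second, and 12 alightings at the terminus give an occupancy profile of 10, 12, 0.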

Architecture


The platform was built fully cloud-native on Microsoft Azure in close collaboration with BVG IT, following a clearly structured Medallion Architecture within a Data Lakehouse approach. Architectural decisions were tightly coordinated between BVG IT, the business department, and synvert, drawing on both the domain-specific and the IT expertise of BVG.

Ingestion & Processing

A distributed compute layer handles the fully automated integration of all source systems and the step-by-step transformation of the data. Every transformation is traceable, and each processing step is isolated and independently scalable. The pipelines are orchestrated via a dedicated workflow layer.
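
To illustrate what orchestrating isolated, traceable steps means in principle, here is a minimal sketch that uses Python's standard-library `graphlib` to run pipeline steps in dependency order and record each executed step. The step names are hypothetical, and the page does not name the workflow tool used on the platform; a production orchestrator would add scheduling, retries, and parallel execution.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical steps, each mapped to the set of steps it depends on.
DEPENDENCIES: dict[str, set[str]] = {
    "ingest_timetable": set(),
    "ingest_apc": set(),
    "ingest_geo": set(),
    "clean_and_harmonize": {"ingest_timetable", "ingest_apc", "ingest_geo"},
    "extrapolate": {"clean_and_harmonize"},
    "compute_kpis": {"extrapolate"},
}


def run_pipeline(steps: dict[str, Callable[[], None]],
                 dependencies: dict[str, set[str]]) -> list[str]:
    """Run each step once, after all of its dependencies; return the execution log."""
    log = []
    for name in TopologicalSorter(dependencies).static_order():
        steps[name]()     # each step is an isolated unit of work
        log.append(name)  # simple trace of what ran, in order
    return log


if __name__ == "__main__":
    noop_steps = {name: (lambda: None) for name in DEPENDENCIES}
    print(run_pipeline(noop_steps, DEPENDENCIES))
```

Declaring dependencies explicitly, rather than hard-coding a sequence, is what lets independent steps (here, the three ingestions) be scheduled and scaled separately.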

Data Storage: Bronze → Silver → Gold → Gold-Result

  • Bronze: Raw data lands unchanged in the landing zone, ensuring full auditability and reconstructability.

  • Silver: Data is cleaned, normalized, and harmonized across all source systems.

  • Gold: Extrapolations, KPI calculations, and business-driven aggregations are performed.

  • Gold-Result: Contains approved, publishable results ready for reporting and external communication.
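
A toy sketch of how records might move through the four layers: Bronze keeps the raw payload untouched, Silver parses and normalizes it, Gold aggregates it, and only approved Gold results are promoted to Gold-Result. All function and field names and the `approved` flag are hypothetical; on the actual platform these steps run on the distributed compute layer over cloud storage, not on in-memory Python lists.

```python
import json
from collections import defaultdict


def to_bronze(raw_lines: list[str]) -> list[str]:
    # Bronze: keep the raw payload exactly as received (auditability).
    return list(raw_lines)


def to_silver(bronze: list[str]) -> list[dict]:
    # Silver: parse, normalize names and types, drop malformed records.
    rows = []
    for line in bronze:
        try:
            rec = json.loads(line)
            rows.append({
                "stop": rec["stop"].strip(),
                "hour": int(rec["hour"]),
                "boardings": int(rec["boardings"]),
            })
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # in practice, route to a quarantine table instead
    return rows


def to_gold(silver: list[dict]) -> dict[tuple[str, int], int]:
    # Gold: business aggregation, here boardings per stop and hour.
    totals = defaultdict(int)
    for row in silver:
        totals[(row["stop"], row["hour"])] += row["boardings"]
    return dict(totals)


def to_gold_result(gold: dict, approved: bool) -> dict:
    # Gold-Result: only approved results are published.
    return dict(gold) if approved else {}
```

Because Bronze is never modified, Silver and Gold can always be rebuilt from it if a cleaning or aggregation rule changes.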

Approval & Analysis

A custom-developed web application (Plotly Dash) gives business users full control over extrapolation parameters, quality checks, and the release of results for official channels, without any dependency on IT. Power BI delivers these approved metrics for group-wide internal use.
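
The approval step can be pictured as a small state machine: extrapolation results move from calculated to checked to approved, and only approved results may be published. The states, transitions, and the `ResultRelease` class below are assumptions for illustration, not the documented behavior of the Dash application.

```python
from enum import Enum


class Status(Enum):
    CALCULATED = "calculated"
    CHECKED = "checked"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"


# Hypothetical allowed transitions; anything not listed is refused.
TRANSITIONS = {
    Status.CALCULATED: {Status.CHECKED, Status.REJECTED},
    Status.CHECKED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.PUBLISHED},
    Status.REJECTED: {Status.CALCULATED},  # recalculate with new parameters
    Status.PUBLISHED: set(),
}


class ResultRelease:
    def __init__(self, result_id: str):
        self.result_id = result_id
        self.status = Status.CALCULATED
        self.history = [Status.CALCULATED]  # audit trail of every state

    def move_to(self, target: Status) -> None:
        if target not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {target.value} not allowed")
        self.status = target
        self.history.append(target)
```

Encoding the allowed transitions as data makes it impossible to publish a result that skipped the quality check, and the history list doubles as an approval log.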

Services provided by synvert, a GlobalLogic company


End-to-End Delivery from a Single Source

synvert co-designed and implemented the platform across all layers in close cooperation with BVG, spanning everything from cloud provisioning to KPI delivery in the dashboard:

  • Provisioning of the Azure infrastructure: Blob Storage, App Service, and SQL Server databases.

  • Integration and history tracking (historization) of all source systems.

  • Establishment of the Medallion Architecture within the Data Lakehouse.

  • Implementation & orchestration of processes between the individual layers.

  • Implementation support for the domain-specific extrapolation logic and KPI definitions.

  • BI support: development of business logic and KPI delivery via Power BI.

  • Design and implementation of the Dash web application for configuration, quality assurance, and approval.
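
Historization of source data is commonly implemented as a slowly changing dimension of type 2 (SCD2): instead of overwriting a changed record, the current version is closed with an end date and a new version is appended. The page does not say which historization technique the platform uses; the sketch below assumes SCD2 and a hypothetical stop master-data record keyed by `stop_id`.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StopVersion:
    stop_id: str
    name: str
    tariff_zone: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: list[StopVersion], incoming: StopVersion) -> list[StopVersion]:
    """SCD2 upsert: if attributes changed, close the current version and append the new one."""
    current = next((v for v in history
                    if v.stop_id == incoming.stop_id and v.valid_to is None), None)
    if current is not None and (current.name, current.tariff_zone) == (
            incoming.name, incoming.tariff_zone):
        return history  # nothing changed, keep the history as-is
    updated = [replace(v, valid_to=incoming.valid_from) if v is current else v
               for v in history]
    updated.append(incoming)
    return updated
```

Keeping every version with its validity interval means that statistics for a past date can be recomputed against the master data that was valid on that date.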

Handling Legacy Systems

The business logic of the predecessor system was reconstructed and re-implemented by BVG IT in close tandem with the Traffic Statistics department through systematic reverse engineering, with synvert serving as an architectural sparring partner and providing additional development capacity. Throughout this process, the continuous operation of the passenger statistics system was fully maintained.

Functional Depth, Not Just Technology

The extrapolation logic, the statistical derivation for trips without counting data, the definition of KPI hierarchies: synvert didn’t just develop software, but actively helped shape the functional domain model.

Delivery / conclusion


The Numbers

  • XML extraction from source system: 4 hours → ~15 minutes

  • Data processing into tables: 2.5 hours → ~60 minutes

  • Passenger extrapolation: 4 days → ~30 minutes

  • Approval effort: manual log maintenance process → web-based and automated, featuring system-supported recommendations and manual approval

  • Developer dependency: 1 person → scalable team
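
The before-and-after durations translate into approximate speedup factors. The short calculation below uses only the figures listed above and assumes that "4 days" means four full 24-hour days of elapsed time; since the new values are approximate, so are the factors.

```python
# Durations from the list above, in minutes (new values are approximate).
# Assumption: "4 days" is read as 4 x 24 hours of elapsed time.
before_after = {
    "XML extraction": (4 * 60, 15),
    "Data processing into tables": (2.5 * 60, 60),
    "Passenger extrapolation": (4 * 24 * 60, 30),
}

speedups = {step: old / new for step, (old, new) in before_after.items()}

for step, factor in speedups.items():
    print(f"{step}: ~{factor:.1f}x faster")
```

Under these assumptions the factors are roughly 16x, 2.5x, and 192x respectively.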

What This Means

What truly sets BVG’s traffic statistics apart today is not just the ability to calculate passenger metrics; the legacy system could fundamentally do that too. The real differentiator is speed and flexibility. How many people rode the U5 from Alexanderplatz to Schillingstraße between 7:00 AM and 8:00 AM on Monday, April 14, 2025? Which door did they use to board or alight? What was the occupancy rate of that specific vehicle? Going forward, such questions can be answered much faster and more easily, with data updated daily and dynamically accessible.
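
With stop-level, hourly KPIs available, a question like the U5 example reduces to a filtered aggregate: the riders on a segment are the sum of occupancy on departure from the first stop across all matching trips. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table `trip_stop_kpis`, its columns, and the helper `segment_riders` are hypothetical and do not describe the platform's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE trip_stop_kpis (
    service_date TEXT,     -- ISO date, e.g. '2025-04-14'
    line         TEXT,
    trip_id      TEXT,
    stop         TEXT,
    next_stop    TEXT,
    departure    TEXT,     -- zero-padded 'HH:MM', so string comparison works
    boardings    INTEGER,
    alightings   INTEGER,
    occupancy    INTEGER   -- passengers on board when leaving stop
)
"""

SEGMENT_RIDERS = """
SELECT COALESCE(SUM(occupancy), 0)
FROM trip_stop_kpis
WHERE service_date = ? AND line = ? AND stop = ? AND next_stop = ?
  AND departure >= ? AND departure < ?
"""


def segment_riders(conn: sqlite3.Connection, service_date: str, line: str,
                   from_stop: str, to_stop: str, start: str, end: str) -> int:
    """Passengers travelling between two consecutive stops, departing in [start, end)."""
    row = conn.execute(
        SEGMENT_RIDERS, (service_date, line, from_stop, to_stop, start, end)
    ).fetchone()
    return row[0]
```

A call such as `segment_riders(conn, "2025-04-14", "U5", "Alexanderplatz", "Schillingstraße", "07:00", "08:00")` would then answer the first question, provided the table is populated at that granularity.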

Outlook

The architecture is intentionally designed to be open. It enables straightforward and controlled data access for various internal and external user groups. At the same time, the platform can be flexibly expanded to include additional data sources and used for data science applications, and the dashboard can be scaled as needs evolve. The foundation for this has been laid.
