Logistics & Transport - Modern passenger statistics platform for Berliner Verkehrsbetriebe
Berliner Verkehrsbetriebe (BVG) is the backbone of mobility in the German capital. Founded in 1928 through the merger of several private and public transportation companies, the company looks back on nearly 100 years of history that is inextricably linked to Berlin’s development.
Today, BVG is the largest public transit operator in the German-speaking world, safely and reliably transporting over one billion passengers annually via subway, tram, bus, and ferry.
To manage and further develop this network using data, BVG needs a reliable data foundation—for every line, every stop, and every hour of the day.
Niko Stäger
“synvert has been a key partner in reshaping our data architecture for more than two years, enabling us to manage the mobility of the future through data-driven decision-making. To meet these requirements, the Traffic Statistics department, BVG’s in-house IT organization, and synvert worked closely together from the very beginning.”
Initial situation
The existing passenger statistics system had been continuously expanded over many years and formed the basis for established business processes.
However, as data volumes grew and demands for greater detail, timeliness, and the scope of analysis increased, the operational workload rose significantly.
Integrating additional data sources in a structured way and adapting to new analytical requirements was only possible with increased manual effort. Against this backdrop, the goal emerged to expand, automate, and consolidate data processing in a central system. This allows internal and external requests to be processed more quickly and transparently, and increasingly makes it possible to derive data-driven recommendations for action proactively.
Goals & challenges
Complexity
The real challenge lay not in the technical replacement itself, but in the functional depth of the system. To generate comprehensive passenger statistics, highly complex data sources must be seamlessly integrated. This includes daily timetable data, automatic passenger counting (APC) data from the vehicles, and geographic information such as stops and tariff zones. Only when these sources are linked does a precise, actionable, and evaluable overall picture emerge.
Furthermore, not all vehicles in the BVG network are equipped with automatic counting devices. For trips without APC data, extrapolations must be derived from comparable, counted trips—a statistically demanding core component of the platform. This extrapolation logic is based on a specially developed, domain-specific mapping of complex route interdependencies.
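The idea of deriving values for uncounted trips from comparable counted trips can be illustrated with a minimal sketch. It assumes, purely for illustration, that trips are comparable when they share line, direction, day type, and departure hour, and that the group mean is a reasonable estimate. The platform's actual mapping of route interdependencies is domain-specific and not described in this article. All names (`Trip`, `comparison_key`, `extrapolate_boardings`) and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Trip:
    trip_id: str
    line: str
    direction: str
    day_type: str              # e.g. "weekday", "saturday"
    hour: int                  # departure hour, 0-23
    boardings: Optional[int]   # None if the vehicle had no APC device


def comparison_key(trip: Trip) -> tuple:
    # Assumption: trips sharing these attributes count as "comparable".
    return (trip.line, trip.direction, trip.day_type, trip.hour)


def extrapolate_boardings(trips: list[Trip]) -> dict[str, Optional[float]]:
    """Return boardings per trip; uncounted trips receive the group mean."""
    counted = defaultdict(list)
    for trip in trips:
        if trip.boardings is not None:
            counted[comparison_key(trip)].append(trip.boardings)

    result: dict[str, Optional[float]] = {}
    for trip in trips:
        if trip.boardings is not None:
            result[trip.trip_id] = float(trip.boardings)
        else:
            group = counted.get(comparison_key(trip))
            # No comparable counted trip: leave the value open rather than guess.
            result[trip.trip_id] = float(mean(group)) if group else None
    return result


# Made-up sample data: two counted trips, two without APC data.
trips = [
    Trip("t1", "M10", "east", "weekday", 7, 120),
    Trip("t2", "M10", "east", "weekday", 7, 100),
    Trip("t3", "M10", "east", "weekday", 7, None),
    Trip("t4", "M10", "west", "weekday", 7, None),
]
estimates = extrapolate_boardings(trips)
```

Trip `t3` receives the mean of its two counted neighbours, while `t4` stays empty because no counted trip shares its direction. A production implementation would need a fallback hierarchy and a measure of uncertainty for such cases.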
The Objective
The migration of the legacy system was intentionally embraced as an opportunity: not as a simple 1:1 replacement, but as the foundation for a future-proof, expandable analytics platform. The core objectives were:
- Complete decommissioning of the legacy system with zero data loss and zero operational disruption.
- Establishment of a cloud-native data platform based on modern architectural principles.
- Highly granular KPI provisioning: boarding, alighting, vehicle occupancy, and passenger-kilometers, broken down by route, trip, stop, and hourly interval.
- A self-service approval workflow for extrapolation results and official reporting.
- A scalable foundation for future use cases, such as headway and service optimization.
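How the KPIs named in the objectives relate to one another can be shown with a small sketch for a single trip. It assumes that occupancy is the running total of boardings minus alightings and that passenger-kilometers are the occupancy on departure multiplied by the distance to the next stop. The article does not give BVG's exact KPI definitions, so `StopEvent`, `trip_kpis`, the stop names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopEvent:
    stop: str
    boardings: int
    alightings: int
    km_to_next_stop: float  # 0.0 at the final stop


def trip_kpis(events: list[StopEvent]) -> dict:
    """Derive occupancy on departure per stop and passenger-kilometers for one trip."""
    occupancy = 0
    occupancy_by_stop = {}
    passenger_km = 0.0
    for event in events:
        # Net change at this stop: boardings minus alightings.
        occupancy += event.boardings - event.alightings
        occupancy_by_stop[event.stop] = occupancy
        # Everyone on board on departure travels the segment to the next stop.
        passenger_km += occupancy * event.km_to_next_stop
    return {
        "boardings": sum(e.boardings for e in events),
        "alightings": sum(e.alightings for e in events),
        "occupancy_by_stop": occupancy_by_stop,
        "passenger_km": passenger_km,
    }


# Made-up trip with three stops.
events = [
    StopEvent("A", boardings=30, alightings=0, km_to_next_stop=1.5),
    StopEvent("B", boardings=10, alightings=5, km_to_next_stop=2.0),
    StopEvent("C", boardings=0, alightings=35, km_to_next_stop=0.0),
]
kpis = trip_kpis(events)
```

Aggregating such per-trip results by route, stop, and hour would then yield the breakdowns listed above.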
Architecture
The platform was built fully cloud-native on Microsoft Azure in close collaboration with BVG IT, following a clearly structured Medallion Architecture within a Data Lakehouse approach. Architectural decisions were tightly coordinated between BVG IT, the business department, and synvert, leveraging both domain-specific and IT expertise from BVG.
Ingestion & Processing
A distributed compute layer handles the fully automated integration of all source systems and the step-by-step transformation of the data. Every transformation is traceable, and each processing step is isolated and independently scalable. The pipelines are orchestrated via a dedicated workflow layer.
Data Storage: Bronze → Silver → Gold → Gold-Result
- Bronze: Raw data lands unchanged in the landing zone, ensuring full auditability and reconstructability.
- Silver: Data is cleaned, normalized, and harmonized across all source systems.
- Gold: Extrapolations, KPI calculations, and business-driven aggregations are performed.
- Gold-Result: Contains approved, publishable results ready for reporting and external communication.
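The division of responsibilities between the four layers can be sketched in plain Python. This is an illustration of the layering idea only, not the platform's implementation, which runs on a distributed compute layer. The record fields, function names, and sample values are assumptions.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as delivered (strings, inconsistent formatting).
bronze = [
    {"line": " u5 ", "stop": "Alexanderplatz", "hour": "7", "boardings": "42"},
    {"line": "U5", "stop": "Alexanderplatz", "hour": "7", "boardings": "18"},
    {"line": "U5", "stop": "Alexanderplatz", "hour": "8", "boardings": "n/a"},
]


def to_silver(raw_rows):
    """Silver: clean and normalize; rows that cannot be parsed are skipped."""
    silver = []
    for row in raw_rows:
        try:
            silver.append({
                "line": row["line"].strip().upper(),
                "stop": row["stop"].strip(),
                "hour": int(row["hour"]),
                "boardings": int(row["boardings"]),
            })
        except ValueError:
            continue  # a real pipeline would log or quarantine the row
    return silver


def to_gold(silver_rows):
    """Gold: aggregate boardings per (line, stop, hour)."""
    totals = defaultdict(int)
    for row in silver_rows:
        totals[(row["line"], row["stop"], row["hour"])] += row["boardings"]
    return dict(totals)


def to_gold_result(gold, approved_keys):
    """Gold-Result: keep only aggregates that have been approved for release."""
    return {key: value for key, value in gold.items() if key in approved_keys}


silver = to_silver(bronze)
gold = to_gold(silver)
gold_result = to_gold_result(gold, approved_keys={("U5", "Alexanderplatz", 7)})
```

Because `bronze` is never modified, every downstream layer can be rebuilt from it at any time, which is the property the Bronze layer is meant to guarantee.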
Approval & Analysis
A custom-developed web application (Plotly Dash) gives business users full control over extrapolation parameters, quality checks, and the release of results for official channels—completely eliminating IT dependencies. Power BI delivers these approved metrics for group-wide internal utilization.
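The approval step can be thought of as a small state machine in which the system suggests a decision and a business user makes it. The sketch below models only that logic, without any Dash user interface. The quality rule (a coverage threshold), the class and method names, and the sample values are hypothetical and are not taken from the BVG application.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ExtrapolationRun:
    run_id: str
    coverage: float                      # share of trips with APC data, 0.0-1.0
    status: Status = Status.PENDING
    history: list = field(default_factory=list)

    def recommendation(self, min_coverage: float = 0.2) -> str:
        # Hypothetical quality rule: suggest approval above a coverage threshold.
        return "approve" if self.coverage >= min_coverage else "review"

    def decide(self, user: str, approve: bool) -> None:
        """Record the manual decision; a run can only be decided once."""
        if self.status is not Status.PENDING:
            raise ValueError(f"run {self.run_id} has already been decided")
        self.status = Status.APPROVED if approve else Status.REJECTED
        self.history.append((user, self.status.value))


run = ExtrapolationRun("2025-04-14", coverage=0.35)
suggested = run.recommendation()
run.decide(user="analyst", approve=True)
```

Keeping the recommendation separate from the decision mirrors the split between system support and manual release: the system never changes the status on its own.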
Services delivered by synvert, a GlobalLogic company
End-to-End Delivery from a Single Source
synvert co-designed and implemented the platform across all layers in close cooperation with BVG—spanning everything from cloud provisioning to KPI delivery in the dashboard:
- Provisioning of the Azure infrastructure: Blob Storage, App Service, and SQL Server databases.
- Integration and history tracking (historization) of all source systems.
- Establishment of the Medallion Architecture within the Data Lakehouse.
- Implementation & orchestration of processes between the individual layers.
- Implementation support for the domain-specific extrapolation logic and KPI definitions.
- BI Support: Development of business logic and KPI delivery via Power BI.
- Design and implementation of the Dash web application for configuration, quality assurance, and approval.
Handling Legacy Systems
The business logic of the predecessor system was reconstructed and re-implemented by BVG IT in close tandem with the Traffic Statistics department through systematic reverse engineering—with synvert serving as an architectural sparring partner and providing additional development capacity. Throughout this process, the continuous operation of the passenger statistics system was fully maintained.
Functional Depth, Not Just Technology
The extrapolation logic, the statistical derivation for trips without counting data, the definition of KPI hierarchies: synvert didn’t just develop software, but actively helped shape the functional domain model.
Delivery / conclusion
The Numbers
- XML extraction from source system: 4 hours → ~15 minutes
- Data processing into tables: 2.5 hours → ~60 minutes
- Passenger extrapolation: 4 days → ~30 minutes
- Approval effort: Manual log maintenance process → Web-based and automated, featuring system-supported recommendations and manual approval
- Developer dependency: 1 person → Scalable team
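The runtime figures in the list above translate into rough speed-up factors. The calculation below treats the approximate values as exact and assumes that "4 days" means four times 24 hours. If working days were meant, the last factor would be smaller.

```python
# Durations in minutes (before, after), taken from the list above.
# Assumption: "4 days" is read as 4 x 24 hours.
durations = {
    "xml_extraction": (4 * 60, 15),
    "table_processing": (2.5 * 60, 60),
    "extrapolation": (4 * 24 * 60, 30),
}

# Speed-up factor = runtime before / runtime after.
speedups = {step: before / after for step, (before, after) in durations.items()}

for step, factor in speedups.items():
    print(f"{step}: about {factor:g}x faster")
```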
What This Means
What truly sets BVG’s traffic statistics apart today is not just the ability to calculate passenger metrics; the legacy system could fundamentally do that too. The real differentiator is speed and flexibility. How many people rode the U5 from Alexanderplatz to Schillingstraße between 7:00 AM and 8:00 AM on Monday, April 14, 2025? Which door did they use to board or alight? What was the occupancy rate of that specific vehicle? Such questions can now be answered much faster and more easily, based on data that is updated daily and dynamically accessible.
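A question of this kind boils down to a filter and an aggregation over stop-level records. The following sketch shows the shape of such a query in plain Python. The record fields, the function name `boardings_by_door`, and all counts are made up for illustration and do not reflect the platform's schema or real passenger numbers.

```python
from collections import defaultdict
from datetime import date

MONDAY = date(2025, 4, 14)

# Made-up stop-level records; field names are assumptions.
records = [
    {"line": "U5", "stop": "Alexanderplatz", "date": MONDAY, "hour": 7, "door": 1, "boardings": 25},
    {"line": "U5", "stop": "Alexanderplatz", "date": MONDAY, "hour": 7, "door": 2, "boardings": 40},
    {"line": "U5", "stop": "Alexanderplatz", "date": MONDAY, "hour": 8, "door": 1, "boardings": 30},
    {"line": "U8", "stop": "Alexanderplatz", "date": MONDAY, "hour": 7, "door": 1, "boardings": 50},
]


def boardings_by_door(rows, line, stop, day, hour):
    """Sum boardings per door for one line, stop, date, and hour of day."""
    totals = defaultdict(int)
    for row in rows:
        if (row["line"], row["stop"], row["date"], row["hour"]) == (line, stop, day, hour):
            totals[row["door"]] += row["boardings"]
    return dict(totals)


per_door = boardings_by_door(records, "U5", "Alexanderplatz", MONDAY, 7)
total_boardings = sum(per_door.values())
```

On the platform itself, the same question would be asked of the approved results through Power BI or a query rather than an in-memory list.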
Outlook
The architecture is intentionally designed to be open. It enables straightforward and controlled data access for various internal and external user groups. At the same time, the platform can be flexibly expanded to include additional data sources and used for data science applications, and the dashboard can be scaled as needs evolve. The foundation for this has been laid.
