The data requirements for an efficient bus service in Malaysia
Visit the flight tracking website Flightradar24 and you’ll realise you can not only track any flight in real time but also see details such as scheduled and actual departure and arrival times, ground speed, altitude, and even wind speed.
Why can’t this be the same for buses? Put simply, the data isn’t there.
To understand this further, let’s look at what bus data consists of. The four basic data sets that make up a bus schedule are:
1. Stops — a list of stops where a bus will pick up and drop off riders;
2. Stop Times — times at which a bus will arrive and depart from each stop in a given trip;
3. Trips — a sequence of two or more stops that occur during a specific time period; and
4. Routes — a group of trips that are displayed for riders in a single service.
However, these four data sets aren’t enough to plan a route, as more data with specific links between certain data points is needed. While it isn’t complicated to get coordinates of a bus, it is much more complex to match a bus to the trip it is associated with.
Every bus must therefore be linked to its trip details, and the trip must specify the departure and arrival times at each stop. Otherwise, it is difficult to locate the bus and to know whether it is on schedule.
Besides temporal and spatial data, other data also needs to be collected. Some of it, such as payment methods and personal information, is sensitive and affects the data privacy of passengers.
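The link between a bus, its trip and the stop times can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `bus_to_trip` lookup, the bus identifier and the sample times are all hypothetical and do not come from any operator's system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stop:
    stop_id: str
    name: str


@dataclass(frozen=True)
class StopTime:
    stop_id: str
    arrival: int    # scheduled arrival, seconds after midnight
    departure: int  # scheduled departure, seconds after midnight


@dataclass
class Trip:
    trip_id: str
    route_id: str
    stop_times: list[StopTime] = field(default_factory=list)


@dataclass
class Route:
    route_id: str
    name: str
    trip_ids: list[str] = field(default_factory=list)


def hms(text: str) -> int:
    """Convert 'HH:MM:SS' to seconds after midnight."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s


def delay_seconds(trip: Trip, stop_id: str, observed_arrival: int) -> int:
    """Return how late (positive) or early (negative) the bus reached a stop."""
    for stop_time in trip.stop_times:
        if stop_time.stop_id == stop_id:
            return observed_arrival - stop_time.arrival
    raise KeyError(f"stop {stop_id!r} is not part of trip {trip.trip_id!r}")


# Hypothetical sample data: one trip with two stops.
trip = Trip(
    "T1",
    "R1",
    [
        StopTime("A", hms("08:00:00"), hms("08:01:00")),
        StopTime("B", hms("08:10:00"), hms("08:11:00")),
    ],
)

# The crucial link: which trip is each physical bus currently serving?
bus_to_trip = {"BUS-42": trip}

late_by = delay_seconds(bus_to_trip["BUS-42"], "B", hms("08:13:30"))
print(f"BUS-42 reached stop B {late_by} seconds behind schedule")
```

Without the `bus_to_trip` mapping, the coordinates of a bus cannot be compared against any schedule, which is the gap described above.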
A simplified representation of the basic building blocks of a bus schedule
In the public transit world, the General Transit Feed Specification (GTFS) is a standard that defines a common format in which public transit operators can publish their service schedules. GTFS itself covers only static information, but an extension, GTFS Realtime, has been developed for publishing real-time information.
While GTFS specifies the data of scheduled services and format for all the necessary data entities, the structure of a GTFS feed is not routing-engine friendly, and this means software engineers must restructure the feed into a data model that allows for efficient computation of routes.
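As a rough illustration of that restructuring step, the sketch below turns rows shaped like a GTFS `stop_times.txt` file into a list of time-sorted "connections" and scans them once to find earliest arrival times. This follows the idea behind the Connection Scan Algorithm in its simplest form. It assumes zero transfer time and no walking between stops, and the sample feed is made up for the example; a production routing engine would need far more than this.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the shape of a GTFS stop_times.txt file.
STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,A,1
T1,08:10:00,08:11:00,B,2
T1,08:20:00,08:20:00,C,3
T2,08:15:00,08:15:00,B,1
T2,08:30:00,08:30:00,D,2
"""


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds after midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def build_connections(stop_times_csv):
    """Regroup flat stop_times rows into (dep, arr, from, to, trip) tuples."""
    by_trip = defaultdict(list)
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        by_trip[row["trip_id"]].append(row)

    connections = []
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        # Each pair of consecutive stops on a trip becomes one connection.
        for a, b in zip(rows, rows[1:]):
            connections.append((
                to_seconds(a["departure_time"]),
                to_seconds(b["arrival_time"]),
                a["stop_id"],
                b["stop_id"],
                trip_id,
            ))
    connections.sort()  # sorted by departure time first
    return connections


def earliest_arrival(connections, origin, start):
    """Single pass over sorted connections (assumes zero transfer time)."""
    best = defaultdict(lambda: float("inf"))
    best[origin] = start
    for dep, arr, from_stop, to_stop, _trip in connections:
        # A connection is usable if we can be at its origin before it leaves.
        if best[from_stop] <= dep and arr < best[to_stop]:
            best[to_stop] = arr
    return dict(best)


connections = build_connections(STOP_TIMES_TXT)
arrivals = earliest_arrival(connections, "A", to_seconds("07:55:00"))
```

The point of the flat, sorted list is that a query becomes one linear scan, which is much cheaper than repeatedly joining the separate GTFS tables.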
In Malaysia, data.gov.my is the official open data portal, but little public transit data is available on it. The few datasets that are published do not conform to international standards such as GTFS.
This means that a lot of data is missing, and what exists is not presented in a format ready to be consumed by routing engines.
How can we improve?
The key issue behind this challenge is interoperability.
Interoperability can be defined differently depending on the context. When speaking about data, interoperability is mainly concerned with the conceptual compatibility of data, which looks at two aspects:
a) Semantic Interoperability — Here, it’s about linguistics. Just like how British people call it “chips” and American people call it “fries”, different datasets can have different names referring to the same instance.
b) Syntactic Interoperability — Here, it’s about structure. Different datasets can be saved in different formats such as tables, JSON files, XML, Graphs, etc.
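Both kinds of mismatch can be shown with a small sketch. Two hypothetical operators publish the same kind of stop data, one as CSV and one as JSON (a syntactic difference), using different field names (a semantic difference). The operator names, field names and sample values below are invented for illustration; only the target names `stop_id`, `stop_name`, `stop_lat` and `stop_lon` are taken from the GTFS `stops.txt` file.

```python
import csv
import io
import json

# Invented sample feeds from two hypothetical operators.
OPERATOR_A_CSV = "stop_code,name,latitude,longitude\nA01,Example Stop A,3.10,101.60\n"
OPERATOR_B_JSON = '[{"id": "B07", "nama_hentian": "Example Stop B", "lat": 3.12, "lng": 101.62}]'

# Semantic mapping: each operator's field names -> GTFS stops.txt names.
FIELD_MAPS = {
    "operator_a": {
        "stop_code": "stop_id",
        "name": "stop_name",
        "latitude": "stop_lat",
        "longitude": "stop_lon",
    },
    "operator_b": {
        "id": "stop_id",
        "nama_hentian": "stop_name",
        "lat": "stop_lat",
        "lng": "stop_lon",
    },
}


def normalise(record, field_map):
    """Rename known fields and coerce coordinates to floats."""
    out = {field_map[key]: value for key, value in record.items() if key in field_map}
    out["stop_lat"] = float(out["stop_lat"])
    out["stop_lon"] = float(out["stop_lon"])
    return out


def load_stops():
    # Syntactic step: parse each feed with the parser its format needs.
    rows_a = csv.DictReader(io.StringIO(OPERATOR_A_CSV))
    rows_b = json.loads(OPERATOR_B_JSON)
    # Semantic step: map both onto one shared vocabulary.
    stops = [normalise(row, FIELD_MAPS["operator_a"]) for row in rows_a]
    stops += [normalise(row, FIELD_MAPS["operator_b"]) for row in rows_b]
    return stops


stops = load_stops()
```

Once every feed is expressed in the same vocabulary and structure, downstream tools no longer need operator-specific handling.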
An example of how bus datasets can differ semantically and syntactically
Before we can publish data in an interoperable format, we need to first generate the right data and this starts with equipping the bus network with the right set of technology.
At a basic level, every public bus service needs to provide accurate vehicle location information, which requires operators to equip every bus in their fleet with a GPS tracker. Vehicle locations should then be linked to schedules through the GTFS data hierarchy of: Stops > Stop Times > Trips > Routes.
Additionally, this information needs to be available in real time, with updates on any disruptions and delays to the schedule. To enable passengers to make travel decisions that include buses, each service should also supplement its dataset with fare and ticket information.
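One simple way to tie a GPS reading to a schedule is to find the nearest stop on the trip the bus is serving and look up its scheduled time. The sketch below uses the haversine great-circle distance for this. The stop coordinates, times and the `nearest_scheduled_stop` helper are hypothetical, and a real system would also have to account for direction of travel and stops that lie close together.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical trip: (stop_id, latitude, longitude, scheduled arrival).
TRIP_STOPS = [
    ("A", 3.100, 101.600, "08:00:00"),
    ("B", 3.110, 101.610, "08:10:00"),
    ("C", 3.120, 101.620, "08:20:00"),
]


def nearest_scheduled_stop(lat, lon, trip_stops):
    """Return the stop tuple on this trip closest to the GPS reading."""
    return min(trip_stops, key=lambda stop: haversine_m(lat, lon, stop[1], stop[2]))


# A GPS reading from the tracker on the bus (made-up values).
stop_id, _, _, scheduled = nearest_scheduled_stop(3.109, 101.611, TRIP_STOPS)
print(f"Bus is nearest to stop {stop_id}, scheduled for {scheduled}")
```

Comparing the time of the GPS reading with the scheduled time at that stop then gives an estimate of how far ahead or behind the bus is running.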
To improve overall interoperability thereafter, there is a need to develop a solution that automates the data publishing for bus operators and to ensure that the data quality is suitable for route planning.
There is also a need to raise awareness about the right ways to publish data and how that can impact the efficiency of the bus network, which can be done through training sessions for bus operators.
Modern Bus Services
Now that we’ve laid down the building blocks of a scheduled bus dataset, let us consider a more complex public transit service that is not tied to using fixed schedules and fixed routes.
This modern service is called Demand-Responsive Transit (DRT). It combines the capacity and affordability of a small public shuttle vehicle serving a geofenced local area with the convenience and rich features of an e-hailing app. In this way, the service responds to the demands of commuters and benefits them, while DRT operators can optimise their operational costs.
If we were to compare the datasets of a DRT service with those of a fixed-schedule bus service, it would be like comparing the datasets of a taxi with those of a train. Datasets of DRT services are dynamic and generated in real time, based on the demand in the region. The stops may also vary based on the type of DRT service.
Development of a standard for publishing data of on-demand services is still at an early stage. However, modern public transit services like DRT come with the advantage of being fully digitalised, with accurate updates on trips that allow for reliable journey planning.
Although DRT datasets are far from standardised, implementing DRT at a large scale can revolutionise the way public transit functions and hence, how buses are perceived.
The future of public transit
Buses today are widely seen as inefficient and unreliable, and the majority of commuters shun them, using them only as a last resort. The absence of reliable first-mile and last-mile solutions remains the biggest hurdle to increasing the utilisation of public transportation in Malaysia.
But this perception can change as new modes of transport emerge and technology transforms the age-old bus services as we know them today.
For such new services to work, however, bus service providers need to ensure that the data they produce is published in the correct format so that it can be processed further by automated routing engines. This way, services such as DRT and Bus Rapid Transit (BRT) can be introduced to modernise existing public transit services.