This is a collaborative project with David Cartledge.
Original purpose and application
The MBTA compiled these datasets to inform commuters about the daily flow of public transit within the city of Boston. Using real-time data collected and distributed by the MBTA, commuters are informed of any delays, slowdowns, or closures across the public transit system. Each alert also tells riders the expected downtime and the reason behind the alert. This is the primary system the city uses to inform its riders of delays, and its years of logged data also chart the accuracy of past delay warnings and ETAs to keep Bostonians informed.
https://mbta-massdot.opendata.arcgis.com/datasets/155ab68df00145cabddfb90377201b0e/explore
https://mbta-massdot.opendata.arcgis.com/datasets/90ed9092bd7a4285b296d5ff938edf29_0/explore
History, standards, and format
The Massachusetts Department of Transportation (MASSDOT) has been compiling the dataset on the accuracy of the transit prediction system since 2020, and the dataset categorizing the reasons for shutdowns since the end of 2018. MASSDOT has been the sole collector of this data and continues to report both datasets on a weekly basis to the Boston open data website.
The datasets are updated weekly to reflect the current state of the MBTA, while other real-time data is often sent out through the MBTA’s official Twitter account as well as the T app, ProximiT. This data is sent out promptly so that commuters have up-to-date information about the public transport system in Boston, including alternate routes for lines that may be facing closures.
This has been a particular focus within the city since the COVID-19 pandemic, when the safety of the T became a priority for the city. After years of overlooked safety maintenance, MASSDOT began the long-overdue work on the T, specifically on the Orange and Green lines, which saw the longest of the shutdowns in 2020.
Many have expressed their distrust and frustration with the MBTA, such as Jonathan Richards, a 28-year-old restaurant worker in the North End, who said, “It’s become a real problem, even today. I sort of walked up the street unguided trying to find a shuttle bus after the Blue line shut down for the night for some electrical problem. It’s just frustrating, almost like Boston has become known for its poor transportation system.”
Organizational context
These datasets seem to be published inconsistently, which can pose problems when attempting to analyze and understand the data provided. Despite the agreed-upon weekly schedule, the datasets seem to be left untouched for weeks at a time before a few weeks of backlogged data are finally uploaded at once.
One of the most glaring problems with this is a lack of full transparency with the data that is provided. If the MBTA is using real-time data to inform its riders over social media and through local transportation apps such as ProximiT, it should be just as easy to submit that data to the datasets that are at the forefront of the city of Boston open data website.
The lack of current data between uploads can lead to errors in analyzing the state of the MBTA: if there were a shutdown, it would take at least a week for the data to be uploaded and archived within the dataset.
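One way to check how far the uploads drift from the weekly schedule is to compare consecutive upload dates. The following is a minimal sketch, not the MBTA's own tooling: the function name `find_publication_gaps` and the sample dates are made up for illustration, and it assumes the upload dates have already been read from the portal by hand or by some other means.

```python
from datetime import date

def find_publication_gaps(upload_dates, expected_interval_days=7):
    """Return (previous, current, days_between) for every pair of
    consecutive uploads that are further apart than expected."""
    ordered = sorted(set(upload_dates))
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        days_between = (current - previous).days
        if days_between > expected_interval_days:
            gaps.append((previous, current, days_between))
    return gaps

# Made-up upload dates for illustration only; not taken from the MBTA portal.
sample_uploads = [
    date(2023, 1, 2),
    date(2023, 1, 9),
    date(2023, 1, 30),  # two weekly uploads were skipped before this one
    date(2023, 2, 6),
]

gaps = find_publication_gaps(sample_uploads)
for previous, current, days_between in gaps:
    print(f"{days_between} days between {previous} and {current}")
```

Any gap reported this way marks a stretch where an analysis based on the dataset would be missing recent shutdowns.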
Workflow
On what is meant to be a weekly basis, MASSDOT data analysts sort and upload that week’s data on rapid transit and bus shutdowns and delays. This process categorizes and archives the real-time data created throughout the week: any delays the MBTA saw across its fleet, the potential length of each shutdown, the reason for the shutdown, and whether that week’s predictions for train departure and arrival times were accurate.
The arrival/departure prediction dataset works in conjunction with the delays and shutdown dataset to paint a complete picture of the state of the MBTA that week. If the MBTA were to have an especially troubling week, that would be reflected in the accuracy of the on-time predictions in one dataset, as well as in the reasons for the delays listed in the other.
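To illustrate how the two datasets could be read together week by week, here is a minimal sketch. The row layouts, `(day, num_accurate, num_total)` for predictions and `(day, cause)` for alerts, are hypothetical stand-ins and not the real column names; the sample rows and the `SIGNAL_PROBLEM` label are also made up.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())

def weekly_summary(prediction_rows, alert_rows):
    """Combine prediction accuracy and alert causes per week.

    prediction_rows: iterable of (day, num_accurate, num_total)
    alert_rows: iterable of (day, cause)
    Both layouts are assumptions made for this sketch.
    """
    accurate = defaultdict(int)
    total = defaultdict(int)
    for day, num_accurate, num_total in prediction_rows:
        accurate[week_start(day)] += num_accurate
        total[week_start(day)] += num_total

    causes = defaultdict(Counter)
    for day, cause in alert_rows:
        causes[week_start(day)][cause] += 1

    summary = {}
    for week in sorted(set(total) | set(causes)):
        # A week with alerts but no prediction rows has no accuracy value.
        rate = accurate[week] / total[week] if total[week] else None
        summary[week] = {"accuracy": rate, "alert_causes": dict(causes[week])}
    return summary

# Made-up rows for illustration only.
predictions = [
    (date(2023, 3, 6), 80, 100),
    (date(2023, 3, 8), 70, 100),
    (date(2023, 3, 13), 95, 100),
]
alerts = [
    (date(2023, 3, 7), "SIGNAL_PROBLEM"),
    (date(2023, 3, 9), "UNKNOWN"),
]

summary = weekly_summary(predictions, alerts)
for week, row in summary.items():
    print(week, row)
```

In this made-up sample, the week with two alerts is also the week with the lower prediction accuracy, which is the kind of relationship the paired datasets are meant to expose.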
Exploratory Visualization/s of the Data
- https://www.mbta.com/performance-metrics/service-reliability
This figure shows whether data was reported by the MBTA, covering August 2020 through January 2024.
Things to know about the data, including limitations
- The dataset does not identify who publishes the data each week; the only publisher publicly listed is MASSDOT. This suggests that multiple MBTA data analysts may be working interchangeably on the dataset.
- The earliest date in either of the datasets is 2018. This can make it difficult to compare the current state of the MBTA with pre-COVID travel.
- The prediction accuracy dataset fails to provide the bus route for any of the bus-related data, making its categorization and analysis generalized.
- The prediction accuracy dataset only provides the number of times the departure prediction was correct or incorrect, failing to provide any detail on the margins the predictions were off by.
- Within the service alerts dataset, there are many delays and closures that are listed as UNKNOWN, providing no further information on the reason for the problems.
- The service alerts dataset fails to provide the meaning behind the severity code figure.
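Two of these limitations can be quantified before any deeper analysis. The sketch below is only illustrative: the function names and sample values are made up, and it assumes the alert causes and the correct/incorrect prediction counts have already been extracted from the datasets. Because the prediction dataset only counts correct and incorrect predictions, a rate is the most that can be derived; the size of each miss cannot be recovered.

```python
def unknown_share(causes):
    """Fraction of alerts whose cause is recorded as UNKNOWN."""
    if not causes:
        return 0.0
    unknown = sum(1 for cause in causes if cause.strip().upper() == "UNKNOWN")
    return unknown / len(causes)

def accuracy_rate(num_correct, num_incorrect):
    """Share of predictions counted as correct, or None if there were none."""
    total = num_correct + num_incorrect
    if total == 0:
        return None
    return num_correct / total

# Made-up values for illustration only.
sample_causes = ["UNKNOWN", "WEATHER", "unknown ", "MAINTENANCE"]
share = unknown_share(sample_causes)
rate = accuracy_rate(90, 10)
print(f"UNKNOWN share: {share:.0%}, prediction accuracy: {rate:.0%}")
```

A high UNKNOWN share would mean that any breakdown of delays by cause rests on only part of the alerts.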