SFMTA API Data Collection and Analysis

I would suggest not reading this page on a potato.

This project explores and processes the dataset available from 511.org, focusing on the San Francisco Municipal Transportation Agency (SFMTA) public transit system. The goal was to create a comprehensive database of every transit stop in San Francisco, including bus stops, cable car stops, and subway stations, to support my multi-stop bus tracker project.

Data Collection

The project consists of several key datasets:

Interactive Map Implementation

To make this data useful for the multi-stop bus tracker project, I developed an interactive map that combines all three datasets. The map:

Key Findings

Analysis of the data revealed several insights about San Francisco’s transit system:

The SFMTA API uses cable cars just like the tourists do. The dumb (tourist) way to use a cable car is to line up at the terminus and wait your turn. The smart (local) way is to walk one block up the hill and get on the next car. The cable car operators know this and do not fill the cars to capacity at the terminus; they leave space for locals.

The physical reality of the universe is reflected in the API data when you pull expected departures for the cable car terminus stops. It’s a 58-year wait for a cable car at the terminus (no, really, I’ve seen an estimated arrival in 58 years at the Market and Powell turnaround), but if you walk a block up the street, the next car will be there in a few minutes.
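A rough sketch of how a tracker might guard against that kind of prediction: convert the expected time to minutes from now and flag anything absurdly far out. The timestamps, the one-day threshold, and the function names are my own assumptions for illustration, not the actual API response or the tracker's code.

```python
from datetime import datetime, timezone

# Assumed cutoff: anything more than a day out is treated as a placeholder,
# not a real prediction. Tune to taste.
MAX_PLAUSIBLE_WAIT_MINUTES = 24 * 60


def wait_minutes(expected_iso, now):
    """Minutes between `now` and an ISO-8601 expected arrival time."""
    # Replace a trailing "Z" so older Python versions can parse the string.
    expected = datetime.fromisoformat(expected_iso.replace("Z", "+00:00"))
    return (expected - now).total_seconds() / 60


def classify(expected_iso, now):
    """Label a prediction 'ok' or 'implausible' based on the assumed cutoff."""
    if wait_minutes(expected_iso, now) > MAX_PLAUSIBLE_WAIT_MINUTES:
        return "implausible"
    return "ok"


# Made-up timestamps that mimic the terminus vs. one-block-up situation.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
predictions = {
    "terminus": "2083-01-01T12:00:00Z",
    "one block up": "2025-01-01T12:04:00Z",
}

for stop, expected in predictions.items():
    print(stop, classify(expected, now))
```

With a check like this, the tracker could show "no prediction" for the terminus instead of a decades-long countdown.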

Other interesting bits include:

There is a way to square the circle on the last observation: these may be ‘opportunistic’ bus stops. For example, the dataset shows Stop ID 16160 as not being on any route. I know for a fact that I can pick up a 48 bus at that stop. Likewise, the dataset shows the 29 bus does not stop at Stop ID 16549. I’ve never been on a 29 bus that doesn’t stop there if I want to get off there.

It’s not so much that the total dataset needs these abandoned stops removed; it’s that the data for each line needs stops added.
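Here is a minimal sketch of that idea: find stops that appear on no route, then patch the per-route data with manually observed additions instead of deleting the stops. The data layout (a dict of route name to a set of stop IDs), the placeholder stops and route "X", and the function names are all hypothetical; only Stop IDs 16160 and 16549 and the 48 and 29 routes come from the observations above.

```python
def orphan_stops(all_stops, route_stops):
    """Return stop IDs that are not listed on any route."""
    served = set().union(*route_stops.values())
    return all_stops - served


def apply_additions(route_stops, additions):
    """Return a copy of route_stops with (route, stop) pairs added."""
    patched = {route: set(stops) for route, stops in route_stops.items()}
    for route, stop in additions:
        patched.setdefault(route, set()).add(stop)
    return patched


# Hypothetical sample data; "stop_a", "stop_b" and route "X" are made up.
all_stops = {"stop_a", "stop_b", "16160", "16549"}
route_stops = {
    "48": {"stop_a"},
    "29": {"stop_b"},
    "X": {"16549"},
}

# Additions based on riding the buses, not on the dataset.
manual_additions = [("48", "16160"), ("29", "16549")]

print(orphan_stops(all_stops, route_stops))   # stops on no route before patching
patched = apply_additions(route_stops, manual_additions)
print(orphan_stops(all_stops, patched))       # stops on no route after patching
```

Keeping the additions in a separate list means the raw dataset stays untouched and the manual corrections can be reapplied whenever the data is refreshed.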

The processed data is available as stops_routes_data.csv, which includes:

All code and data files are available in the project repository.
