There are few technologies more central to American culture than the automobile. Cars have shaped our movies and music, our cities and suburbs, and our daily patterns of life. They’re so ubiquitous that it’s almost impossible to escape them. You’ll probably see more cars than computers today.
But lurking beneath their banal familiarity, there is an ugly truth about cars: They kill. Each year, over 40,000 Americans die in traffic accidents, and over a million people die worldwide. For a technology that’s such an intimate part of our lives, it is a horrifying price to pay. Despite improvements in automotive safety, the problem is tragically getting worse.
Can data science help? I spent a day at the National Transportation Data Summit in early November to find out.
The Summit’s Focus on Innovation
The Summit, hosted by the NSF’s Big Data Innovation Hubs and industry and government partners, focused on transportation safety questions that data science might be able to answer. These include how we can detect and mitigate distracted driving, how we can leverage connected and autonomous vehicle technology to improve road safety, and how we can fuse multiple data streams to create actionable insights for specific transit corridors.
These are complicated questions, and it quickly became clear during the introductions that no single set of stakeholders has all the relevant data and analysis techniques to develop solutions on their own. That spirit was reflected by the broad range of participants, from career federal bureaucrats, to tech startup CEOs, to academics. Even the venue echoed the cross-cutting theme: we met in the headquarters building of the General Services Administration — perhaps the single most government-esque, impersonal building in Washington — but found ourselves in the wing that hosts the Presidential Innovation Fellows program and 18F, oases for data nerds within the larger bureaucracy.
Since I’ve been writing about connected and autonomous vehicles (CAVs) lately, I dove into that session and learned about the second Strategic Highway Research Program (SHRP2) database developed by the US Department of Transportation (DOT). This open database has incredibly rich data on driving behaviors and accidents on US roads, with records of millions of trips. The DOT official overseeing it was interested in new ideas about how to leverage the data to improve autonomous vehicle technology. SHRP2 is only one example of recent transportation data sets developed and curated by the US government to improve transportation systems. However, many external analysts (like me) don’t know they exist. Conferences like this one are one way to fix that.
Several of the sessions at the conference highlighted how cities are where a lot of the innovation in transportation systems is taking place.
Cities Leading on Transportation Pilot Programs
A senior transportation engineer from Bellevue, Washington, talked about Vision Zero, a project the city launched with Microsoft and the University of Washington that shows what this city-level innovation looks like. A key part of the work is analyzing video imagery of traffic at intersections. They hope to improve road safety by training algorithms to recognize cars, pedestrians and cyclists, and learn patterns of collisions and near misses. The scale of the data is enormous (up to a petabyte of streaming video per intersection per day, equivalent to 15,000 full iPhone Xs), so the project is considering moving compute capability from the cloud to the physical intersections to do image analysis locally.
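A quick back-of-the-envelope check makes that scale concrete. This is only a sketch: it assumes decimal units (1 PB = 10^15 bytes) and the 64 GB base-model iPhone X, neither of which was specified in the talk.

```python
# Back-of-the-envelope check of the "petabyte per intersection per day" figure.
# Assumptions (not from the talk): decimal units and a 64 GB iPhone X.
PETABYTE_BYTES = 10**15
IPHONE_X_BYTES = 64 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60

# How many full phones would one day of video from one intersection fill?
phones_per_day = PETABYTE_BYTES / IPHONE_X_BYTES

# What sustained data rate does a petabyte per day imply, in gigabytes per second?
gigabytes_per_second = PETABYTE_BYTES / SECONDS_PER_DAY / 10**9

print(f"Phones filled per day: {phones_per_day:,.0f}")
print(f"Sustained rate: {gigabytes_per_second:.1f} GB/s")
```

Under these assumptions the phone count lands close to the quoted 15,000, and the implied sustained rate is more than 11 GB/s from a single intersection, which helps explain the appeal of analyzing video on site rather than shipping it all to the cloud.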
A large number of city-level pilots of CAVs are focusing on the 10 Proving Grounds that were set up by the DOT earlier this year. An important challenge for these pilots is to ensure that vehicle performance data is collected in a standard way that allows valid comparisons.
One participant pointed out that SAE has published a standard on collecting connected vehicle data (SAE J2735), but innovation is moving so fast that many of the pilot programs are inventing new approaches for data collection that may not be compatible. It’s always challenging for a standards-development process to keep up in rapidly moving fields, but this seems like a particularly important problem to address now, before the pilot programs are completed.
A Question of Privacy and Data Usefulness
We spent a lot of time discussing the privacy implications of transportation data. If data are released with improper anonymization, it’s not hard to pinpoint specific vehicles and identify individual drivers (see Vijay Pandurangan’s post on how NYC messed up the MD5-based anonymization of taxi data in 2014). Even when the anonymization is implemented correctly, it’s not hard to correlate the data with other publicly available information to figure out a lot about the actual people who took various trips, including identifying specific trips by celebrities and naming individuals who frequent strip clubs.
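To see why hashing alone is weak anonymization, here is a minimal sketch of the general failure mode: when identifiers come from a small, predictable space, every possible value can be hashed in advance and the "anonymous" hashes simply looked up. The four-character ID format (digit, letter, digit, digit) and the sample ID below are illustrative assumptions, not a reproduction of the actual NYC dataset or of Pandurangan’s analysis.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def md5_hex(value):
    """Return the unsalted MD5 hex digest of an ASCII string."""
    return hashlib.md5(value.encode("ascii")).hexdigest()


def build_lookup():
    """Hash every possible ID in the (assumed) digit-letter-digit-digit format.

    The space holds only 10 * 26 * 10 * 10 = 26,000 values, so the
    whole table is built almost instantly.
    """
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        vehicle_id = f"{d1}{letter}{d2}{d3}"
        table[md5_hex(vehicle_id)] = vehicle_id
    return table


lookup = build_lookup()

# A hypothetical "anonymized" value as it might appear in a released dataset.
released_hash = md5_hex("7B42")

# Reversing it is a dictionary lookup, not a cryptographic attack.
recovered_id = lookup[released_hash]
print(recovered_id)
```

The weakness here is the tiny input space rather than MD5 specifically; swapping in a stronger unsalted hash function would not help, because the full lookup table could still be precomputed.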
The question this leads to is whether there are ways to fully anonymize trip data, while still making it useful for traffic safety analysis. This is such a difficult problem that DOT is working with Oak Ridge National Laboratory’s cybersecurity division to try to solve it. If the only way to ensure that the privacy of individual drivers and passengers remains protected is to dramatically degrade the data, there’s little point to releasing those datasets, or collecting them in the first place.
Privacy is also a major issue for datasets that involve driver-facing video, something that’s rapidly being introduced to the market by companies like Nauto. In order to share this data when it’s been collected from volunteer participants, the driver’s face has to be obscured or anonymized in some way that guarantees privacy but maintains the usefulness of the data. There are several interesting approaches to solving this problem, including the use of avatar overlays developed by DOT and Oak Ridge.
The Summit included many more fascinating transportation data science topics that are outlined here. This is a field that needs a lot more data scientists, and one where successful data projects could literally save lives. A great way to get involved is to connect with the Big Data Hubs and learn more about their ongoing efforts. And more broadly, the field of intelligent transportation system engineering seems full of opportunities for data scientists willing to learn about transportation systems.