Be Wary of Waymo’s New Safety Record and Brad Templeton’s Declaration the System is Superhuman and should be Deployed Today
Brad Templeton’s Article — Waymo Data Shows Superhuman Safety Record. They Should Deploy Today — https://www.forbes.com/sites/bradtempleton/2020/10/30/waymo-data-shows-incredible-safety-record--they-should-deploy-today/?utm_source=dlvr.it&utm_medium=twitter&sh=35516adf3829
(This section is a direct quote)
The report shows an incredible, superhuman safety record and suggests it is past time for them to deploy a service at scale, at least in simpler low-urban/suburban zones like Chandler, Arizona.
The report is notable for several reasons:
- There is incredible transparency, of a sort we have seen from no other team. Indeed, a gauntlet is now thrown in front of all other teams — if you’re not this transparent, we will presume you are not doing so well.
- They have the ability to be that transparent because the numbers are good. In 6.1 million miles they report 30 “dings” with no injury expected, 9 with 10% chance of injury and 8 with airbag deployment but still 10% chance of injury, suggesting less than 2 modest injuries. Human drivers would have had about 6.
- All the events had the other driver/road user at fault in some way under the vehicle code, according to Waymo.
- There were no incidents of single vehicle incidents (ie. driving off the road) which are pretty common with human drivers.
- Nationally, 6.1 million miles of driving by a good driver should result in about 40–60 events, most of which are small dings, 22–27 of which would involve an insurance claim, 12 which would get reported to police and 6 injury crashes. With no at-fault events in 8 lifetimes of human driving, Waymo’s performance is significantly superior to a human, even in an easy place like Chandler. (Note: Templeton does mention there is no direct human driver comparison data for the exact same area: “ . . . we don’t have numbers for the reasonably serene suburban environs of Phoenix. Driving there is easier — that’s why they chose it as their first step — but even if the rate is 1/2 or even 1/3rd of tougher places, the number still impresses.”)
· The gauntlet is down. If Cruise, Zoox, Argo, Tesla and others want to say they are in the game, they need to show the same data. If they won’t show it, we should presume they are afraid of releasing it for a reason.
· As such, while Waymo can’t prove that their car would encounter a fatality at a better rate than the 80M mile rate of humans using just 6.1M miles of data (or even the more than 20M miles they have with other test area, or the billions of miles they have in simulator,) it can now be said that the risk that they are too dangerous is acceptably low. In addition, it would be discovered fairly quickly if this is not true, with harm, but minimized harm.
· It’s immoral for Waymo not to get going now, and for any regulation to stop them.
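As a quick check of the arithmetic in the quoted passage, the following Python sketch recomputes the expected injuries from the quoted counts. The variable names are mine and purely illustrative. The counts, the injury probabilities, and the human baseline of 6 all come from the quote; treating each event's injury chance as an independent probability that can simply be summed is an assumption.

```python
# Event counts and injury probabilities as quoted from Templeton's summary.
events = [
    (30, 0.0),   # "dings" with no injury expected
    (9, 0.10),   # events with a 10% chance of injury
    (8, 0.10),   # airbag deployments, still a 10% chance of injury
]

# Total number of reported events across all three groups.
total_events = sum(count for count, _ in events)

# Expected injuries: count times probability, summed over the groups.
# Assumption: injury chances are independent and additive.
expected_injuries = sum(count * p for count, p in events)

# Human baseline quoted for the same 6.1 million miles.
human_injury_crashes = 6

print(f"total events: {total_events}")
print(f"expected injuries: {expected_injuries:.1f}")
print(f"share of quoted human baseline: {expected_injuries / human_injury_crashes:.2f}")
```

This gives 47 events and about 1.7 expected injuries, which is where the quoted "less than 2 modest injuries" comes from.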
First, I applaud Waymo for putting out this type of information. (I also believe they have the most competent sensor system by far.) And I agree with Brad Templeton when he says it throws a gauntlet down. I also believe it is data like this that can help build trust and confidence in these systems.
Beyond that, however, it is nowhere near enough to field these systems, nor to declare “It’s immoral for Waymo not to get going now, and for any regulation to stop them.” If all the data I mention in my article that should be provided by Waymo were positive for them, I have to believe they would release it. Instead, we get a very small and notably misleading data point, one that evaluates the system as you would a human driver, not a machine learning system.
With regard to the data, it appears the majority of events were in simulation: 18 in the real world and 29 in simulation. Waymo does not break out real-world versus simulated miles. Why? What is that ratio? They also describe a severely limited process for evaluating their simulation. All they seem to compare is the vehicle model on a similar surface, not the sensors. This tells me they either still do not simulate sensors or do it poorly and do not realize it. (It is very possible this benign area and these scenarios will not highlight the sensor modeling issues.)
· From Waymo’s document — Waymo’s Safety Methodologies and Safety Readiness Determinations — https://storage.googleapis.com/sdc-prod/v1/safety-report/Waymo-Safety-Methodologies-and-Readiness-Determinations.pdf — They state in footnote 34 — Waymo uses closed course testing to ensure that various assumptions used in our simulation model are in fact accurate representations of our AV’s performance. For example, for our simulation to reflect how our AV would perform in a particular scenario requiring hard braking, we need to know that the simulation replicates the actual performance of our AV using the same braking profile. Here again, our methodologies intertwine rather than stand alone. Simulation is an important aspect of many of our methodologies, but its accuracy depends on effective V&V of actual system capabilities.
With respect to Templeton’s determination that this system needs to be in the public domain now, as well as his statement that “With no at-fault events in 8 lifetimes of human driving, Waymo’s performance is significantly superior to a human, even in an easy place like Chandler,” I believe the data sample is far too small and misleading. My conclusion is that the data release is a great start but nowhere near enough. It is far more apt to create false confidence than to justify fielding the system now.
· Templeton cites a human rate of one fatality per 80M miles. Against that, Waymo has less than 10% of the miles it needs. To his credit, he also notes that Chandler’s geofence is not nearly as complex as average human driving, but he then argues that even at 1/3 the complexity there is enough data. That makes no sense. As it is, the 6.1M miles is about 7.6% of 80M human miles. (Humans have accidents about every 165k miles and a death about every 80M miles, per the figure above. That is over a wide cross-section of driving complexity, not the relatively benign, no rain or bad weather area of Chandler, Arizona.) If you go with the 1/3, they only have about 2.5% of the data needed. And that assumes machine learning of complex scenarios scales linearly, which it does not. And that’s the rub. Getting the first 90% right in no way means you will ever get the last 10% right. (And in this case, they will never get that far due to the overreliance on public shadow and safety driving and gaming-based simulation technology. More on that in my articles below.)
· The test data provided is based on evaluating a human, not a machine learning system. By focusing on collisions or near collisions, it addresses results, not system intentions or actual system performance such as perception and planning. It does not include cases where the system failed to do as expected short of a collision or possible collision in this geofence or ODD, especially when the right thing occurred but only by coincidence or luck. They also do not provide disengagement or scenario data in these cases. Said differently, there is no “eye” or intention testing.
· In its documentation on this subject, Waymo refers to the UL4600 safety case standard from UL and Edge Case Research, yet they do not provide any safety cases for review.
· They also do not provide the AV system root cause data for the disengagements or accidents they are at fault for.
· They do not provide the set of test cases they believe they need to reach the end-state safety goal, or even define what that point is.
· And finally, some large portion of these 6.1M miles occurred in simulation, and that simulation has not been proven to be a legitimate digital twin, especially where the sensors are concerned.
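The mileage percentages in the first bullet above can be checked with a short Python sketch. The variable names are illustrative only. The 6.1M, 80M, and 1/3 figures come from the text; applying the 1/3 difficulty factor as a straight multiplier is the same linear simplification that bullet warns about, so treat the result as an optimistic upper bound.

```python
# Figures taken from the text above.
waymo_miles = 6.1e6               # miles covered by the Waymo report
human_miles_per_fatality = 80e6   # human fatality interval cited by Templeton
difficulty_factor = 1 / 3         # Chandler assumed one third as demanding

# Share of one human fatality interval that Waymo has actually covered.
share_of_baseline = waymo_miles / human_miles_per_fatality

# If Chandler is one third as hard, three times the miles are needed there
# to see the same exposure, so the covered share shrinks by the same factor.
# Assumption: difficulty scales linearly, which the text argues it does not.
adjusted_share = share_of_baseline * difficulty_factor

print(f"share of 80M miles: {share_of_baseline:.1%}")
print(f"share after 1/3 adjustment: {adjusted_share:.1%}")
```

The two results are roughly 7.6% and 2.5%.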
There is an easy way to settle this
· First, release the UL4600 safety case data.
· Release all tested scenario, system failure, disengagement, and root cause data
· Provide the set of scenarios they believe they need to test to reach a safety point better than a human, and state what that point is.
· Much more importantly, we need to see a far better statistical sample of the real-world data needed. Come up with a statistically correct quantity of rides to fill the gap and a subset of live streamed events proving the capability. Preferably, that is in simulation once it is proved every model is a legitimate digital twin. (That must include exact sensors interacting with exact objects and all 3rd order reflections and associated interference. I am a technical lead on SAE’s ORAD Simulation task force, working on exactly these criteria now, and would be glad to assist.)
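To give a feel for what a "statistically correct quantity" could look like, here is a minimal Python sketch. It is my illustration, not Waymo's or Templeton's method. It assumes fatalities follow a Poisson process and asks how many fatality-free miles are needed before the one-sided upper confidence bound on the fatality rate drops below the 80M-mile human figure cited earlier (the common "rule of three" at 95%). The function name and the 95% confidence level are hypothetical choices.

```python
import math


def miles_for_zero_event_bound(miles_per_event: float, confidence: float = 0.95) -> float:
    """Miles that must be driven with zero events so that the one-sided upper
    confidence bound on the event rate falls below 1 / miles_per_event.

    Assumption: events follow a Poisson process, so the probability of zero
    events in n miles at rate r is exp(-r * n).
    """
    return -math.log(1.0 - confidence) * miles_per_event


# Human fatality interval cited in the text (assumed as the target to beat).
human_miles_per_fatality = 80e6
waymo_miles = 6.1e6

required_miles = miles_for_zero_event_bound(human_miles_per_fatality)
covered_share = waymo_miles / required_miles

print(f"fatality-free miles needed at 95%: {required_miles / 1e6:.0f}M")
print(f"share covered by 6.1M miles: {covered_share:.1%}")
```

Under these assumptions the requirement is roughly three times the 80M-mile interval, and it says nothing about scenario coverage, which is the larger gap.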
If Waymo decides to do this in the public domain, which perpetuates the reckless and untenable public shadow and safety driving approaches, I believe it may be a net value add in this case. First, it would be done in a geofence, and the total miles driven would be low enough that it is unlikely to kill someone. Most importantly, the data it provides may finally put this issue to rest. I believe it will show the need to flip the paradigm and move most of this development and testing to proper simulation. That would save lives now by stopping most of the public guinea pig nonsense, and many more lives later, because these companies would actually get to L4 and save the lives they have said all along they are doing this for.
Finally, I want to say that while I want them to be successful, I do not trust Waymo here. This is for several reasons. The data dump left out critical data on the simulation miles and the fidelity of the simulation. They do not have nearly enough data to justify driverless operations right now. They still needlessly use human guinea pigs for testing. And they continue to avoid switching to proper simulation to remedy this. (I believe that is due far more now to ego than ignorance.) And finally, this industry’s overall hype and financing loop is out of control. Everyone is in a mad, reckless dash for money and prestige. Odds are Waymo is simply ramping up the hype here and creating even more false confidence. (I have to believe Waymo and Brad Templeton have seen my posts on how to solve this through the use of proper simulation technology. Neither has accepted my offer to explain the capability and technical differences and show them a demo. To what degree this is arrogance, ignorance or simply shunning the person who is critical of them, I don’t care. It’s wrong and extremely counterproductive for them and for this industry.)
More in my articles here
SAE Autonomous Vehicle Engineering Magazine — Simulation’s Next Generation (featuring Dactle)
The Autonomous Vehicle Industry can be Saved by doing the Opposite of what is being done now
Autonomous Vehicle Industry’s Self-Inflicted and Avoidable Collapse — Ongoing Update
Proposal for Successfully Creating an Autonomous Ground or Air Vehicle
Simulation can create a Complete Digital Twin of the Real World if DoD/Aerospace Technology is used
Simulation Photorealism is almost Irrelevant for Autonomous Vehicle Development and Testing
Autonomous Vehicles Need to Have Accidents to Develop this Technology
Using the Real World is better than Proper Simulation for AV Development — NONSENSE
The Hype of Geofencing for Autonomous Vehicles
SAE Autonomous Vehicle Engineering Magazine — End Public Shadow/Safety Driving
My name is Michael DeKort. I am a former system engineer, engineering manager and program manager for Lockheed Martin. I worked in aircraft simulation, as the software engineering manager for all of NORAD, on the Aegis Weapon System, and on C4ISR for DHS.
Key Industry Participation
- Founder SAE On-Road Autonomous Driving Simulation Task Force
- Member SAE ORAD Verification and Validation Task Force
- Stakeholder for UL4600 — Creating AV Safety Guidelines
- Member of the IEEE Artificial Intelligence & Autonomous Systems Policy Committee (AI&ASPC)
- Presented the IEEE Barus Ethics Award for Post 9/11 Efforts
My company is Dactle
We are building an aerospace/DoD/FAA level D, full L4/5 simulation-based testing and AI system with an end-state scenario matrix to address several of the critical issues in the AV/OEM industry I mentioned in my articles above. This includes replacing 99.9% of public shadow and safety driving, as well as dealing with significant real-time, model fidelity and loading/scaling issues caused by using gaming engines and other architectures. (These are issues Unity will confirm. We are now working together. We are also working with UAV companies.) If not remedied, these issues will lead to false confidence and performance differences between what the Plan believes will happen and what actually happens. If someone would like to see a demo or discuss this further, please let me know.