MIT Research Debacle — Human Side of Tesla Autopilot: Exploration of Functional Vigilance in Real-World Human-Machine Collaboration
The study — https://hcai.mit.edu/human-side-of-tesla-autopilot/
This is a deplorable, dangerous and reckless effort. It is so egregious I am calling on MIT to perform a full review, issue a public correction and review the tenure of the personnel involved.
1. First, I would like to compliment the authors for noting that Tesla’s use of the word “Autopilot” to describe a system that is under development and not yet autonomous is wrong. Tesla, as well as the entire industry, should follow their example and use “AI assisted”.
2. Unfortunately, the rest of the paper is deplorable. It is anything but objective, ethical, scientific, realistic or responsible. The process the research team espouses and leans on, “Functional Vigilance”, requires a superhuman and clairvoyant level of performance from human safety drivers. It requires drivers to know what capabilities the AV system has at all times, even after software updates. It requires them to know how those capabilities will handle every situation they are in, regardless of complexity or location. It requires humans to know ahead of time when they will need to disengage so they can take back control. And it requires humans to take over these systems, in critical and complex scenarios, in seconds. Finally, in an effort to support their untenable and dangerous approach, the authors cherry-pick data, malign and misrepresent data that counters their argument, and mislead the reader into thinking data that exists does not exist. And they purposefully leave out the most critical event or “tricky situation” of all: when the AV does not disengage and there is a crash. This is where all of the Tesla accidents and subsequent deaths have occurred and will occur. They ignore the cases where the AV makes the worst mistakes or errors. (They admit this area was not considered and bury it in the Limitations section: “(3) do not include challenging scenarios that did not lead to Autopilot disengagement, . . ” This is very convenient. Shouldn’t these be the most crucial scenarios for “Functional Vigilance” to cover? If it doesn’t work for accident scenarios, what’s the point?)
3. The study defines “Functional Vigilance” as “the ability of the driver to choose when to serve as the operator of the vehicle and when to serve as the supervisor of the automation (in this case, Autopilot)”. By defining vigilance this way and then concluding that drivers manage it successfully, the study legitimizes public shadow/safety driving. That in turn will make people more likely to support its use, to become or remain human guinea pigs, and to accept the injuries and deaths that result.
4. The report cleverly invokes objective science and data but then leaves out critical information or relies on non sequiturs to support, and ultimately mislead people into supporting, their “Functional Vigilance” process. While that process can make handover more successful, it in no way mitigates the issues in most complex or time-critical scenarios.
5. The paper goes well beyond assuming public shadow/safety driving is the best or only way to develop and test these systems; it also assumes the deaths that occur because of it are for the greater good, implying the lives lost during development will save more lives later when the effort is complete. That assumption is grossly incorrect and reckless. The process is actually untenable and harms people for no reason. It will never facilitate getting anywhere near L4, nor saving the associated lives. It is impossible to drive the one trillion miles, or spend over $300B, to stumble and re-stumble on all the scenarios necessary to complete the effort, many of which are accident scenarios no one will want to run once, let alone thousands of times. Also, handover cannot be made safe for most complex scenarios by any monitoring and notification system, because such systems cannot provide the time needed to regain proper situational awareness and do the right thing the right way.
6. As “Functional Vigilance” is based on the driver controlling when handover will occur, it is a completely impractical expectation unless every scenario that would cause a meaningful loss of situational awareness can be kept from occurring, or unless the driver has ample warning of when such scenarios will occur. Studies from the Universities of Leeds and Southampton and data from NASA show that the time needed to regain proper situational awareness is between 3 and 45 seconds, depending on the scenario. From the report — “We propose a measure of “Functional Vigilance” that conceptualizes vigilance when drivers are allowed to self-regulate by choosing when and where to leverage the capabilities of automation and when to perform the driving task manually. The central observations in the dataset is that drivers use Autopilot”. Given the dataset provided, it appears the author concluded there are no critical scenarios in which handover or “Functional Vigilance” cannot succeed, because there were none in the Tesla set. Assuming the Tesla data is accurate and complete, all that means is that either those scenarios did not occur, or the author does not apply the 3–45 second criterion, depending on the scenario, because of his belief that “Timing of Response” is hard to measure. (A simple sketch of this timing argument follows below.)
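To make that timing argument concrete, here is a minimal sketch in Python. The 3–45 second range is the figure cited above from the Leeds/Southampton/NASA work; the scenario names and times-to-conflict are my own illustrative assumptions, not data from the MIT study or from Tesla.

```python
# Illustrative sketch only. The 3-45 second takeover range is the figure cited
# above; the scenario times-to-conflict are assumed example numbers, not data
# from the MIT study or from Tesla.

MIN_TAKEOVER_S = 3.0   # best case to regain situational awareness and act
MAX_TAKEOVER_S = 45.0  # worst case (complex scenario, distracted driver)

def handover_feasible(time_to_conflict_s: float, takeover_needed_s: float) -> bool:
    """A handover is only plausibly safe if the scenario leaves more time
    than the driver needs to regain situational awareness and respond."""
    return time_to_conflict_s > takeover_needed_s

# Assumed example scenarios: seconds between the system needing help and impact.
scenarios = {
    "crossing truck detected late at highway speed": 2.0,
    "stopped vehicle ahead in the travel lane": 4.0,
    "lane closure signed well in advance": 60.0,
}

for name, available in scenarios.items():
    best = handover_feasible(available, MIN_TAKEOVER_S)
    worst = handover_feasible(available, MAX_TAKEOVER_S)
    print(f"{name}: best case {'passes' if best else 'fails'}, "
          f"worst case {'passes' if worst else 'fails'}")
```

Only the scenario with long advance warning survives the worst case; the time-critical ones fail even in the best case, and those are exactly the events “Functional Vigilance” is silent on.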
7. The “Functional Vigilance” process requires the driver to know where AP can be used. After the Brown tragedy the NTSB specifically stated that Tesla should not allow AP to be engaged where it cannot handle common scenarios. Given that, why does the author put the onus on the driver? (The truth of the matter is that Tesla cannot advance its development, as untenable as the shadow/safety driving process is, unless the drivers, or human guinea pigs, drive and re-drive EVERY scenario that needs to be learned. This is why Elon Musk routinely safety drives with his hands off the steering wheel in spite of the vehicle manual’s legal statements declaring the exact opposite. Tesla wants it both ways: they need the driver to cede steering control to develop the technology, but when something bad happens they use the legal language to avoid accountability.)
8. The paper dismisses the fact that most complex or time-critical scenarios, into which most accidents fall, do not afford any human enough time to regain proper situational awareness, whether “Functionally Vigilant” or not and no matter what monitoring and alarm system is used. The misleading takeaway is that there are no scenarios their “Functional Vigilance” process cannot handle. But again, the author’s plan relies on the driver being able to know when a handover will come. Even if that were possible, which it is not in most accident scenarios, and humans followed the exact recommended process, there remains a window in which no process and no system can provide the time needed to regain proper situational awareness, both when you know the handover is coming and when you do not. Clearly more advance warning can improve the situation, but it does not in any way eliminate it. That is evidenced by common sense, NASA, a plethora of studies including those from the Universities of Leeds and Southampton, Missy Cummings, and even parties in the industry like Chris Urmson, Volvo and Ford (in spite of their ignoring this and using public shadow/safety driving anyway).
9. The author limits handover concerns to those where humans over-trust these systems. There are other significant contributing factors involving situational awareness that are not resolved by drivers being informed about, or trusting, the capabilities of automated systems. These include how long the human has been driving, how infrequent handover is, how long it is until the handover event or between events, how short the event itself is, and how complex it is. In complex scenarios of short duration, which include most accident scenarios, no combination of monitoring and control systems can overcome these issues even if the driver “trusts” the autonomous system.
10. The report does not mention that Teslas have not encountered most of the scenarios, especially the complex and dangerous ones, they will need to encounter to get to L4. RAND estimates 500B miles and Toyota one trillion miles (at 10X better than a human) to get to a legitimate L4. The paper states Tesla has driven under two billion miles with AP engaged. Against 500B miles that is only 0.4% of what is required; even if you lower that 500B to 50B miles, they are still only 4% complete. (The arithmetic is sketched below.) (In fairness to Tesla, their geofence is all highways, a set that is more complex than what most AV makers are running. That could be the reason they have more deaths to date. Having said that, not having LiDAR could also explain these accident scenarios.)
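For clarity, here is the arithmetic behind those percentages as a minimal sketch. The required-mileage figures are the RAND and Toyota estimates cited above, and the roughly two billion AP miles is the figure the paper itself reports; nothing else is assumed.

```python
# Sketch of the progress arithmetic in point 10. The required-mileage figures
# are the RAND (500B) and Toyota (1T) estimates cited above; the ~2B miles is
# the Autopilot mileage the paper reports.

AP_MILES_DRIVEN = 2e9  # under two billion miles driven with AP engaged

required_estimates = {
    "RAND estimate (500B miles)": 500e9,
    "Toyota estimate (1 trillion miles)": 1e12,
    "Generous 50B-mile assumption": 50e9,
}

for name, required in required_estimates.items():
    pct = 100.0 * AP_MILES_DRIVEN / required
    print(f"{name}: {pct:.1f}% of the required miles driven")
# -> 0.4%, 0.2% and 4.0% respectively
```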
11. The author misleads the reader into thinking the only accounts of Tesla issues are from social media and therefore “anecdotal”. He avoids mentioning any press articles or government findings related to them. And worst of all, he never mentions the three confirmed deaths of people using Tesla’s AP. (Two more individuals died about six weeks ago in what appear to be AP-related accidents, one of them in what appears to be the same scenario that killed Joshua Brown almost three years ago.)
12. The author also attempts to invalidate studies done in simulators by stating “The degree to which these studies of automation generalize to the real world is unknown [32].” This is factually incorrect. There is a proven and objective way to determine how realistic a driving simulator is. It is done in aerospace and defense, for racing simulators, and even for some simulators used by OEMs: you compare the performance curves of the exact vehicle, tires, road, environment, etc. to the simulation models. (A sketch of that comparison follows below.) Having said that, there can be a bias from people knowing they are in a simulator versus the real world. While that can be true in some circumstances, especially when using inferior simulators, it is mitigated by precise models and a full-motion system. These are the types of systems the airlines and DoD have been using for decades (including for DoD ground vehicles). They are so realistic that the user rarely avoids getting caught up in the tasks at hand. However, even in cases where this bias is relevant, these people still fail the handover tests in critical scenarios. The reason is either that they ignore the scenario before them when trying to regain situational awareness after being distracted, thereby lengthening the time to do so, or that they realize they are in a simulator, try to be hyper-vigilant (not unlike the author’s “Functional Vigilance” process), and better their times. They still fail, because it is still an impossible task in critical scenarios: there is still not enough time to regain situational awareness.
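To illustrate the curve-comparison validation described in point 12, here is a minimal sketch. The maneuver, sample values and tolerance are assumptions made purely for illustration; this is not an actual FAA or DoD acceptance procedure.

```python
# Minimal sketch of "compare the performance curves" fidelity validation:
# run the same maneuver (e.g., a step-steer at fixed speed) on the real vehicle
# and in the simulator, then quantify how far the simulated response deviates
# from the measured one. All numbers here are illustrative assumptions.
import numpy as np

def curve_error(measured: np.ndarray, simulated: np.ndarray) -> tuple[float, float]:
    """Return (RMSE, max absolute error) between two sampled response curves."""
    diff = simulated - measured
    return float(np.sqrt(np.mean(diff ** 2))), float(np.max(np.abs(diff)))

# Assumed example: lateral acceleration (m/s^2) sampled during a step-steer.
measured_lat_accel = np.array([0.0, 0.8, 2.1, 3.4, 3.9, 4.0, 4.0])
simulated_lat_accel = np.array([0.0, 0.7, 2.0, 3.5, 4.1, 4.2, 4.1])

rmse, max_err = curve_error(measured_lat_accel, simulated_lat_accel)
TOLERANCE = 0.25  # assumed acceptance threshold for this channel
verdict = "within" if max_err <= TOLERANCE else "outside"
print(f"RMSE = {rmse:.3f} m/s^2, max error = {max_err:.3f} m/s^2 ({verdict} tolerance)")
```

The same comparison can be repeated for yaw rate, braking distance, sensor returns and so on. The point is that simulator fidelity is measurable, not “unknown”.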
13. Beyond this, the author generally attempts to invalidate simulation by stating they prefer to use Teslas “in the wild”, because that is where “tricky situations arise”. This is where the author makes the point that simulation cannot replace the real world, which is incorrect on two levels. One is that it appears the author has only availed himself of the simulation or simulators used in this industry, which in most cases have significant flaws in real-time performance and model fidelity. If he experienced aerospace/DoD-level simulation technology, especially in an FAA Level D DoD urban simulated war game, he would realize simulation can do what is needed to replace 99.9% of public shadow/safety driving. The other issue is that it is impossible to experience, let alone re-experience, most scenarios in the real world. This is for several reasons. The first is quantity: given the vast number of scenarios, especially those needed to ensure perception systems detect what they are supposed to, you simply cannot experience most of them once, let alone many times, in several lifetimes. The other is the thousands of injuries and deaths that will be caused by stumbling and re-stumbling on thousands of accident scenarios thousands of times over. When the world understands this is necessary, let alone when the first child or family dies needlessly, this process will never be permitted to complete.
Areas I feel I should address
· The right way to do this is to use aerospace/DoD simulation technology and systems engineering, informed and validated by real-world driving and test tracks, to replace 99.9% of that public shadow and safety driving.
· The reason for my direct and critical approach and choice of language — this is due to my belief that this report exemplifies the exact opposite of what these researchers should be doing and will quite literally contribute to many unnecessary deaths. It will also contribute to this technology never becoming a legitimate reality and to the bankruptcy of many of those trying to build it. Information and professional opinions from those we respect and are supposed to trust are the most crucial. These authors, as well as NHTSA in its associated 2015 L3 safety study, several former members of NHTSA (Rosekind, Strickland and Beuse), and the elected officials guided by them, are all doing the exact opposite of what we need them to do. It is reprehensible for the professionals we count on to keep us safe, who use that as their mantra, to do the exact opposite and put us in harm’s way needlessly.
· My conflict of interest — I have created a company that intends to sell a system that would replace 99.9% of the shadow/safety driving being used, by providing all of the scenarios, associated simulations and the full-motion simulator needed to develop and test these systems to achieve a legitimate L4. That requires using aerospace/DoD simulation technology and systems engineering to remedy significant technical gaps in the majority of the simulation products currently being used in the industry; the vast majority of those systems have significant depth, breadth and fidelity issues. I spent 1.5 years trying to avoid that conflict of interest by endeavoring to assist the industry, especially the simulation providers, to provide the same capabilities. The response from those simulation providers was that they would follow their customers’ lead and fix the gaps when the customers were made aware of them and paid for them. I believe this was to avoid having to create and maintain a separate, re-architected product/system. As this chicken-and-egg scenario was unacceptable, I reached out to folks in aerospace/DoD, as well as some others, to fix this. The other information I will provide to demonstrate I am mission-first involves a post-9/11 DoD/DHS whistleblowing ordeal in my past. It led to my receiving the IEEE Barus Ethics Award, being in several books on ethics, being on 60 Minutes, serving as the lead witness at a congressional hearing, and being in a documentary movie.
Please see more of my articles below. The last one has links to the references I made to NASA, the university studies, and the entities in the industry agreeing handover cannot be resolved in critical scenarios.
The Autonomous Vehicle Podcast — Featured Guest
SAE Autonomous Vehicle Engineering Magazine — End Public Shadow Driving
The Hype of Geofencing for Autonomous Vehicles
Common Misconceptions about Aerospace/DoD/FAA Simulation for Autonomous Vehicles
Elon Musk’s hype regarding “Autopilot” has now risen to Gross Negligence
Lex Fridman, MIT Deep Learning Research Scientist, is Misleading his Students and putting them at Risk
Autonomous Levels 4 and 5 will never be reached without Simulation vs Public Shadow Driving for AI