If you want to know the current state of artificial intelligence in medicine, Eric Topol's review in *Nature Medicine* is the article you have to read. A highlighted statement:
> There are differences between the prediction metric for a cohort and an individual prediction metric. If a model's AUC is 0.95, which most would qualify as very accurate, this reflects how good the model is for predicting an outcome, such as death, for the overall cohort. But most models are essentially classifiers and are not capable of precise prediction at the individual level, so there is still an important dimension of uncertainty.
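A minimal sketch of that point, with synthetic data (this example is mine, not from the review): a classifier can show excellent cohort-level discrimination while a large share of individual patients still fall in an ambiguous risk band where the model cannot say much about *this* patient.

```python
# Sketch: cohort-level AUC vs. individual-level uncertainty.
# Assumes numpy and scikit-learn; the data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic cohort: one informative feature driving a binary outcome.
n = 10_000
x = rng.normal(size=(n, 1))
p_true = 1 / (1 + np.exp(-3 * x[:, 0]))  # true individual risk
y = rng.binomial(1, p_true)              # observed outcome (e.g. death)

model = LogisticRegression().fit(x, y)
p_hat = model.predict_proba(x)[:, 1]

# Cohort-level discrimination looks excellent...
print(f"AUC: {roc_auc_score(y, p_hat):.2f}")  # roughly 0.9 here

# ...yet many individuals sit in an ambiguous band where the predicted
# risk is neither clearly low nor clearly high.
ambiguous = np.mean((p_hat > 0.2) & (p_hat < 0.8))
print(f"Patients with predicted risk between 20% and 80%: {ambiguous:.0%}")
```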
And this is a good summary:

> Despite all the promises of AI technology, there are formidable obstacles and pitfalls. The state of AI hype has far exceeded the state of AI science, especially when it pertains to validation and readiness for implementation in patient care. A recent example is IBM Watson Health's cancer AI algorithm (known as Watson for Oncology). Used by hundreds of hospitals around the world for recommending treatments for patients with cancer, the algorithm was based on a small number of synthetic, nonreal cases with very limited input (real data) of oncologists. Many of the actual output recommendations for treatment were shown to be erroneous, such as suggesting the use of bevacizumab in a patient with severe bleeding, which represents an explicit contraindication and 'black box' warning for the drug. This example also highlights the potential for major harm to patients, and thus for medical malpractice, by a flawed algorithm. Instead of a single doctor's mistake hurting a patient, the potential for a machine algorithm inducing iatrogenic risk is vast. This is all the more reason that systematic debugging, audit, extensive simulation, and validation, along with prospective scrutiny, are required when an AI algorithm is unleashed in clinical practice. It also underscores the need to require more evidence and robust validation to exceed the recent downgrading of FDA regulatory requirements for medical algorithm approval.
Therefore, take care when you look at tables like this one:
| Prediction | n | AUC | Publication (reference number) |
|---|---|---|---|
| In-hospital mortality, unplanned readmission, prolonged LOS, final discharge diagnosis | 216,221 | 0.93*, 0.75+, 0.85# | Rajkomar et al. (96) |
| All-cause 3–12 month mortality | 221,284 | 0.93^ | Avati et al. (91) |
| Readmission | 1,068 | 0.78 | Shameer et al. (106) |
| Sepsis | 230,936 | 0.67 | Horng et al. (102) |
| Septic shock | 16,234 | 0.83 | Henry et al. (103) |
| Severe sepsis | 203,000 | 0.85@ | Culliton et al. (104) |
| *Clostridium difficile* infection | 256,732 | 0.82++ | Oh et al. (93) |
| Developing diseases | 704,587 | range | Miotto et al. (97) |
| Diagnosis | 18,590 | 0.96 | Yang et al. (90) |
| Dementia | 76,367 | 0.91 | Cleret de Langavant et al. (92) |
| Alzheimer's disease (+ amyloid imaging) | 273 | 0.91 | Mathotaarachchi et al. (98) |
| Mortality after cancer chemotherapy | 26,946 | 0.94 | Elfiky et al. (95) |
| Disease onset for 133 conditions | 298,000 | range | Razavian et al. (105) |
| Suicide | 5,543 | 0.84 | Walsh et al. (86) |
| Delirium | 18,223 | 0.68 | Wong et al. (100) |
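One reason for caution with such tables: an AUC says nothing by itself about what a positive prediction means for an individual patient. Once you pick an operating point and account for how rare the outcome is, even a strong model can produce mostly false alarms. A small illustration (the sensitivity, specificity, and prevalence below are assumed for the sake of the example, not taken from any of the cited studies):

```python
# Sketch: positive predictive value via Bayes' rule, showing how a
# "good" operating point degrades at low outcome prevalence.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(outcome | positive prediction)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical mortality model operated at 90% sensitivity and
# 85% specificity, applied where the event rate is 2%:
print(f"PPV: {ppv(0.90, 0.85, 0.02):.0%}")  # ~11%: most alerts are false
```

So a cohort-level AUC of 0.9+ can coexist with an individual-level situation where roughly nine out of ten flagged patients never have the outcome, which is exactly the gap between cohort metrics and individual prediction that Topol highlights.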