Challenges in AI Implementation in Healthcare

From Clinfowiki

Introduction

Currently, there are several challenges in artificial intelligence (AI) implementation in the healthcare setting, ranging from ethical to practical. Despite the wealth of promising new AI technologies, few have seen successful implementation into clinical workflows. The disparity between promising performance statistics and the lack of ultimate clinical efficacy has been deemed the “AI chasm.” Several healthcare organizations have developed AI governance committees to help evaluate potential AI tools and ease difficulties in implementation[1]. This wiki entry is meant to be a broad, high-level overview of the various challenges facing AI translation to clinical care.

Technical Challenges

Data Sharing

AI and machine learning algorithms require large datasets for training and testing. Normally, one institution or source will not have enough data to power AI development or testing, so data sharing is needed. While there are several national biobanks that have been developed[2], they may not include all data needed for training, and not all institutions may have access. Differences in the quality of data and in how it is stored and retrieved make data sharing even more challenging.[3] Interoperability initiatives such as the Fast Healthcare Interoperability Resources (FHIR) standard from Health Level Seven (HL7) are expected to be essential to overcoming this challenge. Furthermore, efforts in data sharing also raise concerns regarding cybersecurity, especially if a third-party developer is involved.
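As an illustration of how FHIR-based exchange works in practice, the minimal sketch below queries a hypothetical FHIR server's REST API for a patient's laboratory Observation resources. The base URL and patient ID are placeholders, and a real deployment would also require authorization (e.g., SMART on FHIR / OAuth 2.0).

```python
import requests

# Hypothetical FHIR endpoint and patient identifier (placeholders, not a real server)
FHIR_BASE = "https://fhir.example-hospital.org/baseR4"
PATIENT_ID = "example-patient-id"

def fetch_lab_observations(base_url: str, patient_id: str) -> list:
    """Retrieve laboratory Observation resources for one patient via the FHIR REST API."""
    response = requests.get(
        f"{base_url}/Observation",
        params={"patient": patient_id, "category": "laboratory", "_count": 50},
        headers={"Accept": "application/fhir+json"},  # a real server would also need an auth token
        timeout=30,
    )
    response.raise_for_status()
    bundle = response.json()  # FHIR searches return a Bundle resource
    return [entry["resource"] for entry in bundle.get("entry", [])]

if __name__ == "__main__":
    for obs in fetch_lab_observations(FHIR_BASE, PATIENT_ID):
        code = obs.get("code", {}).get("text", "unknown test")
        value = obs.get("valueQuantity", {})
        print(code, value.get("value"), value.get("unit"))
```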

Susceptibility to Confounding Variables

Machine learning algorithms, especially those using neural networks, can extract confounding variables from their inputs and incorporate them into the prediction model. For example, a deep learning model trained to detect hip fractures on imaging incorporated the scanner model and imaging urgency into its predictions, variables the developers did not intend to include.[4] Depending on the clinical situation, this phenomenon can reduce a model's clinical effectiveness, especially if the user is unaware of it.
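The sketch below uses synthetic data (not the cited study's) to illustrate the mechanism: when a non-clinical metadata feature such as scanner identity is correlated with the outcome in the training data, a classifier can achieve deceptively strong discrimination from that feature alone.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000

# Synthetic example: fracture cases are disproportionately imaged on scanner "1"
# (e.g., the ED scanner), so scanner identity leaks outcome information.
fracture = rng.binomial(1, 0.3, size=n)
scanner = np.where(fracture == 1, rng.binomial(1, 0.8, size=n), rng.binomial(1, 0.2, size=n))
true_signal = fracture + rng.normal(0, 2.0, size=n)   # weak "clinical" feature
X = np.column_stack([scanner, true_signal])

X_train, X_test, y_train, y_test = train_test_split(X, fracture, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("AUC with scanner feature:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Refit without the confounder: performance drops toward the weak true signal alone.
model_no_conf = LogisticRegression().fit(X_train[:, 1:], y_train)
print("AUC without scanner feature:",
      roc_auc_score(y_test, model_no_conf.predict_proba(X_test[:, 1:])[:, 1]))
```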

Generalizability

The generalizability of models is a significant challenge in the clinical application of AI. It arises from the inevitable differences between the training and testing data used to develop a model and the live patient data on which it is ultimately deployed. These differences can include variation in clinical practice, patient demographics, laboratory and imaging equipment, EHR systems, and more. Even validated and FDA-cleared models may not perform adequately on local patient data: Bizzo et al. found during a retrospective deployment that a prospective commercial chest CT tool systematically underestimated the area of radiological findings in their population, and the implementation was stopped.[5] Ensuring generalizability may require additional model training with local patient data, which further complicates the implementation of a new feature.
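A minimal external-validation sketch, assuming a trained scikit-learn-style model and two labeled cohorts (the development site and the local site): comparing discrimination and calibration on local data before go-live is one way to surface the kind of degradation described above.

```python
from sklearn.metrics import roc_auc_score, brier_score_loss

def external_validation_report(model, X_dev, y_dev, X_local, y_local):
    """Compare a model's discrimination and calibration on development vs. local data."""
    for name, X, y in [("development site", X_dev, y_dev), ("local site", X_local, y_local)]:
        p = model.predict_proba(X)[:, 1]
        print(f"{name}: AUC={roc_auc_score(y, p):.3f}, "
              f"Brier={brier_score_loss(y, p):.3f}, "
              f"mean predicted risk={p.mean():.3f}, observed rate={y.mean():.3f}")
```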

Dataset shift

Once an algorithm is deployed to a live population, it needs to be monitored for dataset shift (data drift). A global pandemic or a major change in diagnostic or treatment guidelines may significantly alter the patient and treatment data the algorithm receives, potentially degrading the model's performance. Institutions therefore need to monitor performance, identify drift, and consider retraining as needed.[6]
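One simple way to monitor for drift, sketched below, is to compare the distribution of each input feature in recent live data against the training data using a two-sample Kolmogorov-Smirnov test. The threshold and column handling here are illustrative assumptions; production monitoring would also track outcome rates and model performance over time.

```python
import pandas as pd
from scipy.stats import ks_2samp

def check_feature_drift(train_df: pd.DataFrame, live_df: pd.DataFrame,
                        alpha: float = 0.01) -> pd.DataFrame:
    """Flag numeric features whose live distribution differs from the training distribution."""
    rows = []
    for col in train_df.select_dtypes("number").columns:
        stat, p_value = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
        rows.append({"feature": col, "ks_statistic": stat,
                     "p_value": p_value, "drifted": p_value < alpha})
    return pd.DataFrame(rows).sort_values("ks_statistic", ascending=False)

# Usage (assuming two DataFrames with the same feature columns):
# report = check_feature_drift(training_features, last_month_features)
# print(report[report["drifted"]])
```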

Algorithmic Bias

If biases and disparities in care are entrenched in the EHR data used to develop a model, the model will only amplify those disparities.[7] One group analyzed a widely used model for care management program referral and found racial bias in its referral suggestions. Because the model used healthcare costs as a proxy for health, it categorized Black patients and White patients at the same risk level even when the Black patients had, on average, several more active chronic conditions. This ultimately meant that fewer Black patients who would have benefited from additional care coordination were referred to the program.[8] Other sources of bias originating from EHR data include missing data and small sample sizes due to limited access or fragmented care, and misclassification or measurement error due to poorer quality of care received by certain patient groups.[9]
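A hedged sketch of the kind of audit Obermeyer et al. describe: group patients by risk-score decile and compare a direct measure of health (here, the count of active chronic conditions) across racial groups within each decile. The column names and data layout are assumptions for illustration, not the study's actual dataset.

```python
import pandas as pd

def audit_risk_score_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Compare mean chronic-condition counts across groups within each risk-score decile.

    Expects (hypothetical) columns: 'risk_score', 'race', 'n_chronic_conditions'.
    Large within-decile gaps suggest the score under-ranks the sicker group.
    """
    df = df.copy()
    df["risk_decile"] = pd.qcut(df["risk_score"], 10, labels=False, duplicates="drop")
    return df.pivot_table(index="risk_decile", columns="race",
                          values="n_chronic_conditions", aggfunc="mean")
```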

Research Challenges

Many are calling for more rigorous standards in AI research. The majority of algorithm research is retrospective, using historically labeled data to train and test algorithms.[8] More prospective studies are needed to understand the performance of AI as a clinical intervention or CDS with real-time patient data. More peer-reviewed randomized controlled trials are also needed to assess whether predictive or diagnostic AI algorithms are clinically effective and whether they make any difference in patient outcomes. Controlled clinical trials would also be needed to compare the performance of different algorithms developed for the same purpose.[2][10]

Organizational Challenges

Developing regulatory state

Government regulation of AI models currently struggles to keep pace with advances in AI. The Food and Drug Administration (FDA) regulates AI technologies as “Software as a Medical Device” (SaMD) under its Digital Health Innovation Action Plan. The Precertification pathway within the Action Plan has drawn some concern, as it allows developers that have demonstrated “organizational excellence” to be exempt from premarket review for lower-risk SaMD products or to receive accelerated review for higher-risk SaMD products.[3] Developers may take advantage of this designation, and faults in even low-risk models could be overlooked before they reach the market. Most recently, the HTI-1 rule from the Office of the National Coordinator for Health Information Technology (ONC) requires health IT developers that produce predictive decision support interventions, including AI, to meet certain transparency, risk management, source attribute, validation, and governance requirements.[11]

Liability

There is no clear legal precedent for liability if the use of an AI tool results in patient harm. For example, AI chatbot tools interact with patients to triage them into those who are well and simply need reassurance and those who may need further evaluation. If the bot is wrong and defers evaluation for a patient who then suffers injury from the delay in care, who is responsible? It is unclear whether the developer, the vendor, the healthcare system, or even no one at all would be liable.[12]

Operational costs

Significant financial, personnel, and IT infrastructure investment is required to implement AI tools, and it can take years to build the necessary infrastructure; it is still not clear whether healthcare systems will see business returns over time. If a system uses a tool from a third-party vendor, it must also account for implementation costs, licensing fees, and contracting costs. Having sufficient personnel is one of the most frequently cited operational barriers to AI implementation in healthcare systems.[13] Maintenance of AI tools is resource-intensive enough that companies outside healthcare have developed dedicated teams (“MLOps”) to handle AI data capture, monitoring, and change management.[14] These high costs may preclude smaller or nonprofit health systems, such as critical access hospitals, from implementing AI tools.

End User Challenges

Integration into Clinical Workflows

First, it is important to consider whether an AI tool is even appropriate for the problem at hand and whether it is usable. For example, during Yang et al.'s design of an AI CDS tool, they discovered that the decision to implant a particular intracardiac device is rarely made in front of a computer, a critical finding for the tool's development.[15] AI tool developers must analyze the clinical workflow surrounding the tool and invest in usability so that the tool does not simply add burden for physicians. Perceived utility and ease of use have been found to predict a medical professional's likelihood of using a new tool.[16]

Explainability

Fully explainable AI requires not only technical and mathematical clarity but also interpretability by the physician. For many of the most promising AI models, it is almost impossible to tell how the model evaluates the patterns it sees, or even what those patterns are, before arriving at its output. This has been termed the “black box” phenomenon. It leads to mistrust among end users, especially if model recommendations could alter clinical care.[17] Poor explainability also makes it harder to detect confounding variables and model bias. Performance statistics should be communicated in measures most physicians are familiar with, such as positive predictive value, sensitivity and specificity, and the number needed to treat, and these statistics should be prominent and easy to access at the point of care.[18] The field of explainable AI (XAI) has emerged to address the explainability of AI algorithms.
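Because positive predictive value depends on prevalence, reporting sensitivity and specificity alone can mislead users at the point of care. The short worked example below, using illustrative numbers rather than any particular published model, shows how the same test characteristics translate to very different PPVs in low- and high-prevalence settings.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' theorem: P(disease | positive prediction)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative: a model with 90% sensitivity and 90% specificity
for prev in (0.01, 0.20):
    ppv = positive_predictive_value(0.90, 0.90, prev)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.1%}")
# prevalence 1%: PPV is about 8.3%; prevalence 20%: PPV is about 69.2%
```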

User AI Literacy

The standard medical education curriculum needs to be updated to include basic AI literacy so that physicians can effectively use AI in clinical practice. Physicians should be educated on the limitations and potential biases of AI tools in healthcare to be aware of any errors or biases in the output they receive. Eventually, larger societal efforts should be made to educate patients on the use of AI in healthcare.


Notes

  1. Kashyap S, Morse KE, Patel B, Shah NH. A survey of extant organizational and computational setups for deploying predictive models in health systems. J Am Med Inform Assoc. 2021;28(11):2445-2450. doi:10.1093/jamia/ocab154
  2. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17:195. doi:10.1186/s12916-019-1426-2
  3. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30-36. doi:10.1038/s41591-018-0307-0
  4. Badgeley MA, Zech JR, Oakden-Rayner L, et al. Deep Learning Predicts Hip Fracture using Confounding Patient and Healthcare Variables. Published online November 8, 2018. Accessed May 2, 2024. http://arxiv.org/abs/1811.03695
  5. Bizzo BC, Dasegowda G, Bridge C, et al. Addressing the Challenges of Implementing Artificial Intelligence Tools in Clinical Practice: Principles From Experience. J Am Coll Radiol. 2023;20(3):352-360. doi:10.1016/j.jacr.2023.01.002
  6. Rahmani K, Thapa R, Tsou P, et al. Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction. Int J Med Inf. 2023;173:104930. doi:10.1016/j.ijmedinf.2022.104930
  7. London AJ. Artificial intelligence in medicine: Overcoming or recapitulating structural challenges to improving patient care? Cell Rep Med. 2022;3(5):100622. doi:10.1016/j.xcrm.2022.100622
  8. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi:10.1126/science.aax2342
  9. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Intern Med. 2018;178(11):1544. doi:10.1001/jamainternmed.2018.3763
  10. Sendak M, D’Arcy J, Kashyap S, et al. A Path for Translation of Machine Learning Products into Healthcare Delivery. EMJ Innov. Published online January 27, 2020. doi:10.33590/emjinnov/19-00172
  11. Office of the National Coordinator for Health Information Technology. HealthIT.gov. https://www.healthit.gov/topic/laws-regulation-and-policy/health-data-technology-and-interoperability-certification-program
  12. Duffourc M, Gerke S. Generative AI in Health Care and Liability Risks for Physicians and Safety Concerns for Patients. JAMA. 2023;330(4):313. doi:10.1001/jama.2023.9630
  13. Weinert L, Müller J, Svensson L, Heinze O. Perspective of Information Technology Decision Makers on Factors Influencing Adoption and Implementation of Artificial Intelligence Technologies in 40 German Hospitals: Descriptive Analysis (Preprint). JMIR Medical Informatics; 2021. doi:10.2196/preprints.34678
  14. Daye D, Wiggins WF, Lungren MP, et al. Implementation of Clinical Artificial Intelligence in Radiology: Who Decides and How? Radiology. 2022;305(3):555-563. doi:10.1148/radiol.212151
  15. Yang Q, Steinfeld A, Zimmerman J. Unremarkable AI: Fitting Intelligent Decision Support into Critical, Clinical Decision-Making Processes. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM; 2019:1-11. doi:10.1145/3290605.3300468
  16. Khanijahani A, Iezadi S, Dudley S, Goettler M, Kroetsch P, Wise J. Organizational, professional, and patient characteristics associated with artificial intelligence adoption in healthcare: A systematic review. Health Policy Technol. 2022;11(1):100602. doi:10.1016/j.hlpt.2022.100602
  17. Reddy S, Allan S, Coghlan S, Cooper P. A governance model for the application of AI in health care. J Am Med Inform Assoc. 2020;27(3):491-497. doi:10.1093/jamia/ocz192
  18. Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25(9):1337-1340. doi:10.1038/s41591-019-0548-6


Submitted by Isabella Slaby, DO