Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda

This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.

Abstract

Artificial intelligence can assist providers in a variety of patient care and intelligent health systems. Artificial intelligence techniques ranging from machine learning to deep learning are prevalent in healthcare for disease diagnosis, drug discovery, and patient risk identification. Accurate AI-based diagnosis draws on numerous medical data sources, such as ultrasound, magnetic resonance imaging, mammography, genomics, and computed tomography scans. Furthermore, artificial intelligence has primarily enhanced the hospital experience and sped up the preparation of patients to continue their rehabilitation at home. This article presents a comprehensive survey of artificial intelligence techniques for diagnosing numerous diseases, such as Alzheimer's disease, cancer, diabetes, chronic heart disease, tuberculosis, stroke and cerebrovascular disease, hypertension, skin disease, and liver disease. The survey covers the medical imaging datasets used and the feature extraction and classification processes applied for prediction. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used to select articles published up to October 2020 in Web of Science, Scopus, Google Scholar, PubMed, the Excerpta Medica Database (Embase), and PsycINFO on the early prediction of distinct kinds of diseases using artificial intelligence-based techniques. Based on the reviewed articles on disease diagnosis, the results are also compared using various quality parameters such as prediction rate, accuracy, sensitivity, specificity, area under the curve (AUC), precision, recall, and F1-score.

Keywords: Artificial intelligence, Alzheimer, Cancer disease, Chronic disease, Heart disease, Tuberculosis

Introduction

Healthcare is being reshaped before our eyes by advances in digital healthcare technologies such as artificial intelligence (AI), 3D printing, robotics, and nanotechnology. Digitized healthcare presents numerous opportunities for reducing human errors, improving clinical outcomes, and tracking data over time. AI methods, from machine learning to deep learning, play a crucial role in numerous health-related domains, including developing new clinical systems, managing patient information and records, and treating various illnesses (Usyal et al. 2020; Zebene et al. 2019). AI techniques have also proved effective in diagnosing different types of diseases. The use of AI to improve medical services offers unprecedented opportunities to improve outcomes for patients and clinical teams and to reduce costs. Its uses are not limited to automation; they also include providing patients, their families (Musleh et al. 2019; Dabowsa et al. 2017), and healthcare professionals with information and recommendations, and making data available for shared decision-making. AI can also help identify the specific demographic groups or geographic areas in which illness or high-risk behaviors are prevalent. Researchers have used deep learning classifiers to compute links between the built environment and obesity prevalence (Bhatt et al. 2019; Plawiak et al. 2018).

AI algorithms must be trained on population-representative data to achieve the performance levels necessary for scalable success. Trends such as the declining cost of storing and managing data, data collection through electronic health records (Minaee et al. 2020; Kumar 2020), and the exponential growth of consumer health data have created a data-rich healthcare ecosystem. However, this growth in healthcare data is hampered by the lack of efficient mechanisms for integrating and reconciling these data beyond their current silos. Nevertheless, numerous frameworks and standards facilitate data aggregation and help achieve adequate data volumes for AI (Vasal et al. 2020). The challenges of operationalizing AI technologies in healthcare systems are considerable, even though this is one of the most important growth areas in biomedical research (Kumar et al. 2020). The AI community must build an integrated best-practice approach to implementation and maintenance by incorporating existing best practices in ethical inclusivity, software development, implementation science, and human–computer interaction. AI applications have enormous potential to improve patient outcomes; at the same time, they could pose significant risks, including inappropriate patient risk assessment, diagnostic inaccuracy, flawed treatment recommendations, privacy breaches, and other harms (Gouda et al. 2020; Khan and Member 2020).

Researchers have used various AI-based techniques, such as machine and deep learning models, to detect diseases that need to be diagnosed early, such as skin, liver, and heart disease and Alzheimer's disease. Accordingly, this related work presents techniques such as the Boltzmann machine, k-nearest neighbours (kNN), support vector machine (SVM), decision tree, logistic regression, fuzzy logic, and artificial neural networks for diagnosing these diseases, along with their reported accuracies. For example, Dabowsa et al. (2017) used a backpropagation neural network to diagnose skin disease, reporting a high level of accuracy on real-world data collected from a dermatology department. Ansari et al. (2011) used a recurrent neural network (RNN) to diagnose hepatitis virus liver disease and achieved an accuracy of 97.59%, while a feed-forward neural network achieved 100%. Owasis et al. (2019) obtained an area under the curve (AUC) of 97.057 using a residual neural network and long short-term memory (LSTM) to diagnose gastrointestinal disease.

Khan and Member (2020) introduced an automated classification framework to discover data patterns. They proposed a five-phase machine learning pipeline in which each phase was further divided into several sub-levels, and built a classifier framework, together with data transformation and feature selection procedures, embedded within an experimental and data analysis design.

Skaane et al. (2013) examined the effect of digital breast tomosynthesis (DBT) on cancer detection in population-based screening. Using independent double reading of women aged 50–69 years, they compared full-field digital mammography plus DBT with full-field digital mammography alone. Adding DBT resulted in a non-significant increase in sensitivity, to 76.2%, and a significant increase in specificity, to 96.4%.

Tigga et al. (2020) aimed to assess the risk of diabetes among patients based on their lifestyle, daily routine, health problems, and similar factors.
They experimented on 952 responses collected via offline and online questionnaires and applied the same approach to the Pima Indians Diabetes Database; the random forest classifier proved to be the best-performing algorithm.

Alfian et al. (2018) presented a personalized healthcare monitoring system using Bluetooth-based sensors and real-time data processing. It collects the user's vital signs, such as blood pressure, heart rate, weight, and blood glucose, from sensor nodes and transmits them to a smartphone. Katherine et al. (2019) gave an overview of the types of data encountered in the setting of chronic disease. Alongside various machine learning algorithms, they explained how extreme value theory can be used to better quantify severity and risk in chronic disease. Gonsalves et al. (2019) aimed to predict coronary heart disease from historical medical data using machine learning. The work applied three supervised learning techniques, Naïve Bayes, support vector machine, and decision tree, to find correlations in coronary heart disease data that would help improve the prediction rate. The authors used the South African Heart Disease dataset of 462 instances and evaluated the techniques with 10-fold cross-validation.

Momin et al. (2019) proposed a secure Internet of Things (IoT)-based healthcare system built on a body sensor network, called body sensor network care, to meet the requirements of such a system efficiently. The system used an analogue-to-digital converter, a microcontroller, a cloud database, and a network, among other components. Ijaz et al. (2018) used IoT to build a home healthcare monitoring system for patients with diabetes and hypertension, relying on personal healthcare devices that sense and estimate a person's biomedical signals. The system can notify health personnel in real time when patients experience emergencies. Shabut et al. (2018) developed an intelligent, mobile-enabled expert system to perform automatic detection of tuberculosis.
They applied a supervised machine learning method to perform binary classification based on eighteen lower-order colour moments. Their tests showed an accuracy of 98.4%, specifically for the detection of tuberculosis antigen-specific antibodies on the mobile platform.

Tran et al. (2019) described global trends and developments in AI applications for stroke and heart disease to identify research gaps and suggest future research directions. Matusoka et al. (2020) stated that the awareness, treatment, and control of hypertension are the most important factors in combating stroke and cardiovascular disease.

Rathod et al. (2018) proposed an automated image-based retrieval system for skin disease using machine learning classification. Srinivasu et al. (2021a, b) proposed an effective model to help doctors diagnose skin disease efficiently. The system combined MobileNet V2 with long short-term memory (LSTM) and achieved an accuracy of 85%, exceeding other state-of-the-art deep learning models. It analyses, processes, and classifies image data based on various features, and as a result it was more accurate and produced results faster than traditional methods.

Uehara et al. (2018) studied severely obese Japanese patients using artificial intelligence with a rule extraction technique. They analysed 79 patients with non-alcoholic steatohepatitis (NASH) and 23 patients without NASH to build the model, and achieved a predictive accuracy of 79.2%. Ijaz et al. (2020) proposed a model for the early prediction of cervical cancer using risk factors as inputs. The authors used several machine learning approaches together with outlier detection for different pre-processing tasks. Srinivasu et al. (2021a, b) used an AW-HARIS algorithm to perform automated segmentation of CT scan images to identify abnormalities in the human liver; the proposed approach outperformed the alternatives in the majority of cases, with an accuracy of 78%.
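Shabut et al. (2018) describe their tuberculosis features only as eighteen lower-order colour moments. A common definition of colour moments takes, for each colour channel, the mean, the standard deviation, and the cube root of the third central moment (skewness), so three moments over three channels in two colour spaces gives eighteen values. The sketch below assumes RGB and HSV as the two colour spaces; this is our illustrative choice, not the authors' documented setup, and it treats hue as an ordinary linear value even though hue is circular. The function names are hypothetical.

```python
import colorsys
import math


def channel_moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    third = sum((v - mean) ** 3 for v in values) / n
    # Signed cube root keeps the skewness term on the same scale as the mean.
    skew = math.copysign(abs(third) ** (1 / 3), third)
    return [mean, std, skew]


def colour_moment_features(pixels):
    """18 features: 3 moments x 3 channels in RGB, then the same in HSV (assumed spaces).

    `pixels` is a list of (r, g, b) tuples with components in [0, 1].
    """
    features = []
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    for space in (pixels, hsv):
        for channel in range(3):
            features.extend(channel_moments([p[channel] for p in space]))
    return features


# A tiny made-up "image" of four pixels, for illustration only.
image = [(0.9, 0.1, 0.1), (0.8, 0.2, 0.1), (0.7, 0.2, 0.2), (0.9, 0.3, 0.1)]
feats = colour_moment_features(image)
print(len(feats))  # 18
```

The resulting fixed-length vector is what a supervised binary classifier would consume; because the features summarize colour distributions rather than pixel positions, they are cheap enough to compute on a mobile device.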

To fully understand how AI assists in the diagnosis and prediction of disease, it is essential to understand the use and applicability of the diverse techniques used in disease detection systems, such as SVM, kNN, Naïve Bayes, decision tree, AdaBoost, random forest, k-means clustering, RNN, convolutional neural networks (CNN), deep CNN, generative adversarial networks (GAN), and long short-term memory (LSTM), among many others (Owasis et al. 2019; Nithya et al. 2020). We conducted an extensive survey of machine and deep learning models for disease diagnosis, reviewing various diseases and their diagnostic methods using AI techniques. This contribution addresses four research questions:
RQ1. What is the state-of-the-art research on AI in disease diagnosis?
RQ2. What are the various types of diseases to which AI is applied?
RQ3. What are the emergent limitations and challenges that the literature identifies for this research area?
RQ4. What are the future avenues in healthcare that might benefit from the application of AI?

The rest of the paper is organized as follows. Section 1, the Introduction, briefly describes AI in healthcare and disease diagnosis using various machine and deep learning techniques; its contribution sub-section includes Fig. 1, which describes the papers taken from different sources for the various diseases. Section 2, Materials and Methods, includes the quality assessment and the research questions regarding AI techniques and applications. Section 3 covers disease symptoms and diagnostic challenges, a framework for AI in disease detection modelling, and various AI applications in healthcare.
Section 4 reviews the reported work on multiple diseases and compares the different techniques in terms of the datasets used, the machine and deep learning methods applied, and the outcomes computed for parameters such as accuracy, sensitivity, specificity, area under the curve, and F-score. Section 5 presents the discussion, which answers the research questions posed in Sect. 2. Finally, Sect. 6 concludes the work, which is intended to help researchers choose the best approach for diagnosing diseases, and outlines the future scope.
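The comparison parameters named above are usually computed from a binary confusion matrix. The sketch below is a minimal illustration with made-up labels and scores, not data from any reviewed study; it assumes label 1 means the disease is present, and it computes AUC with the rank-based formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half). The function names are hypothetical.

```python
def ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = ratio(tp, tp + fn)  # also called recall or true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)    # positive predictive value
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return ratio(wins, len(pos) * len(neg))


# Made-up example: 8 patients, hard predictions and continuous model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7]

print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
print(auc(y_true, scores))                     # 0.9375 (15 of 16 pairs ranked correctly)
```

Because sensitivity and recall are the same quantity, the F1-score here is the harmonic mean of precision and sensitivity. AUC, unlike the other parameters, uses the continuous scores rather than thresholded predictions, which is why studies can report a high AUC alongside a more modest accuracy.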