A Study on Multimodal AI for Mild Cognitive Impairment Detection
Date of Award
6-15-2024
Document Type
Master's Thesis
Degree Name
M.S. in Computer Engineering
Organizational Unit
Daniel Felix Ritchie School of Engineering and Computer Science, Electrical and Computer Engineering
First Advisor
Mohammad H. Mahoor
Second Advisor
Yun-Bo Yi
Third Advisor
Kerstin Sophie Haring
Fourth Advisor
Haluk Ogmen
Keywords
Artificial intelligence, Cognitive impairment
Abstract
Mild Cognitive Impairment (MCI) is an early stage of memory loss or other cognitive decline in individuals who maintain the ability to independently perform most activities of daily living. It is considered a transitional stage between normal cognition and more severe cognitive decline such as dementia or Alzheimer's disease. According to reports from the National Institute on Aging (NIA), people with MCI are at greater risk of developing dementia; it is therefore important to detect MCI as early as possible to mitigate its progression to Alzheimer's disease and dementia. Recent studies have harnessed Artificial Intelligence (AI) to develop automated methods to predict and detect MCI. The majority of existing research is based on unimodal data (e.g., only speech or prosody), but recent studies have shown that multimodality leads to more accurate prediction of MCI. However, effectively exploiting different modalities remains a significant challenge due to the lack of efficient fusion methods. This thesis proposes a mid-level fusion architecture that makes use of multimodal data for MCI prediction. We introduce a multimodal speech-language-vision Deep Learning-based method to differentiate MCI from Normal Cognition (NC). The proposed architecture includes co-attention blocks that fuse the three modalities at the embedding level, capturing potential interactions between speech (audio), language (transcribed speech), and vision (facial videos) within the cross-Transformer layers. To study and evaluate the proposed mid-level fusion model, we used the I-CONECT dataset, which contains a large number of semi-structured internet/webcam conversations between interviewers and participants aged 75 and older. Our experimental results show that the proposed fusion method detects MCI from NC with an average AUC of 85.3%, outperforming the unimodal and bimodal baseline models.
This thesis demonstrates that multimodal deep learning models outperform unimodal models in detecting MCI in older adults. Further research employing larger datasets should be conducted to establish the generalizability of these findings.
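The cross-attention fusion idea described in the abstract can be sketched in a few lines of PyTorch. The following is a minimal, illustrative example in which each modality's embedding sequence attends to the other two and the results are pooled for binary classification; the dimensions, module names, averaging scheme, and pooling strategy are assumptions made for illustration, not the thesis's actual implementation.

```python
# Illustrative sketch only: minimal cross-attention (co-attention style)
# fusion over three modality embeddings (audio, text, video). All sizes
# and names are assumptions, not the architecture from the thesis.
import torch
import torch.nn as nn


class CrossAttentionBlock(nn.Module):
    """One modality attends to another via multi-head cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Queries come from one modality; keys/values from another.
        attended, _ = self.attn(query, context, context)
        return self.norm(query + attended)  # residual connection


class TriModalFusion(nn.Module):
    """Mid-level fusion: each modality is enriched by the other two, then pooled."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        pairs = ("audio_text", "audio_video", "text_audio",
                 "text_video", "video_audio", "video_text")
        self.blocks = nn.ModuleDict(
            {name: CrossAttentionBlock(dim, num_heads) for name in pairs}
        )
        self.classifier = nn.Linear(3 * dim, 2)  # MCI vs. NC

    def forward(self, audio, text, video):
        # Each modality cross-attends to the other two; results are averaged.
        a = (self.blocks["audio_text"](audio, text)
             + self.blocks["audio_video"](audio, video)) / 2
        t = (self.blocks["text_audio"](text, audio)
             + self.blocks["text_video"](text, video)) / 2
        v = (self.blocks["video_audio"](video, audio)
             + self.blocks["video_text"](video, text)) / 2
        # Mean-pool over the time dimension and concatenate for classification.
        fused = torch.cat([a.mean(dim=1), t.mean(dim=1), v.mean(dim=1)], dim=-1)
        return self.classifier(fused)


# Usage with random embeddings: batch of 8, 50 time steps, 256-dim features.
model = TriModalFusion()
audio = torch.randn(8, 50, 256)
text = torch.randn(8, 50, 256)
video = torch.randn(8, 50, 256)
print(model(audio, text, video).shape)  # torch.Size([8, 2])
```

Fusing at the embedding level in this way, rather than concatenating raw features (early fusion) or averaging per-modality predictions (late fusion), lets the model learn pairwise interactions between modalities, which is the motivation for the mid-level fusion approach described above.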
Copyright Date
6-15-2024
Copyright Statement / License for Reuse
All Rights Reserved.
Publication Statement
Copyright is held by the author. Permanently suppressed.
Rights Holder
Farida Far Poor
Provenance
Received from author
File Format
application/pdf
Language
English (eng)
Extent
58 pages
File Size
2.1 MB
Recommended Citation
Far Poor, Farida, "A Study on Multimodal AI for Mild Cognitive Impairment Detection" (2024). Electronic Theses and Dissertations. 2449.
https://digitalcommons.du.edu/etd/2449