A Study on Multimodal AI for Mild Cognitive Impairment Detection

Date of Award

6-15-2024

Document Type

Master's Thesis

Degree Name

M.S. in Computer Engineering

Organizational Unit

Daniel Felix Ritchie School of Engineering and Computer Science, Electrical and Computer Engineering

First Advisor

Mohammad H. Mahoor

Second Advisor

Yun-Bo Yi

Third Advisor

Kerstin Sophie Haring

Fourth Advisor

Haluk Ogmen

Keywords

Artificial intelligence, Cognitive impairment

Abstract

Mild Cognitive Impairment (MCI) is an early stage of memory loss or other decline in cognitive ability in individuals who retain the capacity to perform most activities of daily living independently. It is considered a transitional stage between normal cognition and more severe cognitive decline such as dementia or Alzheimer's disease. According to reports from the National Institute on Aging (NIA), people with MCI are at greater risk of developing dementia, so it is important to detect MCI as early as possible to mitigate its progression to Alzheimer's disease and dementia. Recent studies have harnessed Artificial Intelligence (AI) to develop automated methods for predicting and detecting MCI. The majority of existing research is based on unimodal data (e.g., only speech or prosody), but recent studies have shown that multimodality leads to more accurate prediction of MCI. However, effectively exploiting different modalities remains a major challenge due to the lack of efficient fusion methods. This thesis proposes a mid-level fusion architecture that makes use of multimodal data for MCI prediction. We introduce a multimodal speech-language-vision Deep Learning-based method to differentiate MCI from Normal Cognition (NC). Our proposed architecture includes co-attention blocks that fuse the three modalities at the embedding level, capturing potential interactions between speech (audio), language (transcribed speech), and vision (facial videos) within the cross-Transformer layer. To study and evaluate the proposed mid-level fusion model, we used the I-CONECT dataset, which contains a large number of semi-structured internet/webcam conversations between participants aged 75 and older and interviewers. Our experimental results show that the proposed fusion method can distinguish MCI from NC with an average AUC of 85.3%, outperforming the unimodal and bimodal baseline models.

This thesis demonstrates that multimodal deep learning models outperform unimodal models in detecting MCI in older adults. Further research with larger datasets is needed to establish the generalizability of these findings.
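To illustrate the kind of embedding-level co-attention fusion the abstract describes, the sketch below shows three modality streams exchanging information through cross-attention before classification. This is a minimal PyTorch illustration only, not the thesis implementation: the module names, dimensions, pairing of modalities, and pooling/classification head are all assumptions.

```python
# Minimal sketch of embedding-level cross-attention fusion for three
# modalities (speech, language, vision). All names and dimensions here
# are illustrative assumptions, not the author's code.
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    """One query modality attends over a context modality's token embeddings."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))

    def forward(self, query, context):
        # Cross-attention: query tokens attend to context tokens.
        attended, _ = self.attn(query, context, context)
        x = self.norm1(query + attended)        # residual + norm
        return self.norm2(x + self.ff(x))       # feed-forward + residual

class TriModalFusion(nn.Module):
    """Mid-level fusion of audio, text, and video embeddings for MCI vs. NC."""
    def __init__(self, dim: int = 256):
        super().__init__()
        # One hypothetical cross-modal pairing per stream.
        self.audio_from_text = CoAttentionBlock(dim)
        self.text_from_video = CoAttentionBlock(dim)
        self.video_from_audio = CoAttentionBlock(dim)
        self.classifier = nn.Linear(3 * dim, 2)  # two classes: MCI, NC

    def forward(self, audio, text, video):
        # Enrich each modality with another, then mean-pool over tokens.
        a = self.audio_from_text(audio, text).mean(dim=1)
        t = self.text_from_video(text, video).mean(dim=1)
        v = self.video_from_audio(video, audio).mean(dim=1)
        return self.classifier(torch.cat([a, t, v], dim=-1))

# Toy usage: batch of 2, 10 tokens per modality, 256-dim embeddings.
fusion = TriModalFusion()
logits = fusion(torch.randn(2, 10, 256), torch.randn(2, 10, 256),
                torch.randn(2, 10, 256))
print(logits.shape)  # torch.Size([2, 2])
```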

Copyright Date

6-15-2024

Copyright Statement / License for Reuse

All Rights Reserved.

Publication Statement

Copyright is held by the author. Permanently suppressed.

Rights Holder

Farida Far Poor

Provenance

Received from author

File Format

application/pdf

Language

English (eng)

Extent

58 pages

File Size

2.1 MB
