Date of Award
Summer 8-24-2024
Document Type
Masters Thesis
Degree Name
M.S. in Computer Science
Organizational Unit
Daniel Felix Ritchie School of Engineering and Computer Science, Computer Science
First Advisor
Rinku Dewri
Second Advisor
Tianjie Deng
Third Advisor
Matt Rutherford
Copyright Statement / License for Reuse
All Rights Reserved.
Keywords
Web application, Log data, Personally identifiable information (PII), Extraction, Real-time data analysis, Bulk Extractor, Amazon Web Services (AWS), Cloud platform
Abstract
The growing number of web applications and internet software solutions built on cloud frameworks has led to large volumes of system log messages being generated continuously. These messages may contain sensitive data, which creates an additional security risk for the systems and increases the need to analyze such large volumes of data in real time. Large commercial data monitoring systems can meet these analysis requirements, but they can be costly. We present a solution for analyzing web application log data that ingests the data, processes it, and visualizes the sensitive data found within it in real time. Our solution relies on Bulk Extractor, an open source digital forensics utility for bulk data analysis and extraction of sensitive information. We focus specifically on Personally Identifiable Information (PII) artifacts present within log data as the targets for extraction. We call our solution the PII Scanner, and we present prototype implementations of the scanner on the Amazon Web Services cloud platform, together with results from tests performed on them that demonstrate their effectiveness and explore implementation options.
Copyright Date
8-2024
Publication Statement
Copyright is held by the author. User is responsible for all copyright compliance.
Rights Holder
John David
Provenance
Received from Author
File Format
application/pdf
Language
English (eng)
Extent
128 pgs
File Size
1.9 MB
Recommended Citation
David, John, "Real Time PII Scanning" (2024). Electronic Theses and Dissertations. 2472.
https://digitalcommons.du.edu/etd/2472
Included in
Cybersecurity Commons, Databases and Information Systems Commons, Data Storage Systems Commons, Information Security Commons, Software Engineering Commons