Date of Award
2020
Document Type
Masters Thesis
Degree Name
M. S.
Organizational Unit
Daniel Felix Ritchie School of Engineering and Computer Science, Computer Science
First Advisor
Rinku Dewri
Second Advisor
Scott Leutenegger
Third Advisor
Young Jin Lee
Keywords
Change detection, Natural language processing, Privacy policy
Abstract
Privacy policies notify Internet users about the privacy practices of websites, mobile apps, and other products and services. However, users rarely read them and struggle to understand their contents. The entities that provide these policies are also sometimes unmotivated to make them comprehensible. Because these documents are complicated, it becomes even harder for users to understand and take note of changes of interest or concern when a policy is revised.
With recent developments in machine learning and natural language processing, tools that can automatically annotate the sentences of policies have been developed. These annotations can help a user quickly identify and understand relevant parts of a policy. Similarly, a tool can be developed that identifies changes between different versions of a policy that are informative for the user. For example, suppose that under a new policy a website will start sharing audio data as well. The proposed tool can make users aware of such important changes. This thesis presents a tool that takes two versions of a privacy policy as input, matches the sentences of one version to the sentences of the other based on semantic similarity, and informs the user of key relevant changes between each pair of matched sentences. We discuss the supervised machine learning models explored to develop a method that annotates the sentences of privacy policies according to expert-identified categories, for organization and analysis of the contents. Different word-embedding and similarity techniques are explored and evaluated to develop a method that matches the sentences of one version of a policy to those of another version. The sentence annotations are used to increase the efficiency of the matching process. Methods that detect changes between two matched sentences by analyzing sentence structure are then implemented. We combine the methods developed for annotating policies, matching sentences between two versions of a policy, and detecting changes between sentences to realize the proposed tool.
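The match-then-compare steps described above can be illustrated with a minimal sketch. It is not the method developed in the thesis: as a stated assumption, it substitutes a bag-of-words cosine similarity for the word-embedding techniques, omits the annotation step, and uses a plain word-level diff in place of sentence-structure analysis. The function names and the 0.5 threshold are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokens(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors.

    Assumption: a stand-in for the embedding-based similarity in the thesis.
    """
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_sentences(old, new, threshold=0.5):
    """Pair each old sentence with its most similar new sentence.

    A sentence with no candidate above the (hypothetical) threshold is
    paired with None, which suggests it was removed from the new version.
    """
    pairs = []
    for s in old:
        best = max(new, key=lambda t: cosine(s, t), default=None)
        if best is not None and cosine(s, best) < threshold:
            best = None
        pairs.append((s, best))
    return pairs


def word_changes(old_sentence, new_sentence):
    """Return (removed_words, added_words) between two matched sentences."""
    a, b = tokens(old_sentence), tokens(new_sentence)
    removed, added = [], []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if op in ("delete", "replace"):
            removed.extend(a[i1:i2])
        if op in ("insert", "replace"):
            added.extend(b[j1:j2])
    return removed, added
```

Applied to the audio-data example, matching "We share your location data with advertisers." against a new version containing "We share your location and audio data with advertisers." pairs the two sentences, and `word_changes` reports "and" and "audio" as the added words.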
This research not only shows the potential of machine learning and natural language processing as important tools for privacy engineering but also introduces techniques that can be applied to any natural language document.
Publication Statement
Copyright is held by the author. User is responsible for all copyright compliance.
Rights Holder
Andrick Adhikari
Provenance
Received from ProQuest
File Format
application/pdf
Language
en
File Size
116 p.
Recommended Citation
Adhikari, Andrick, "Automated Change Detection in Privacy Policies" (2020). Electronic Theses and Dissertations. 1706.
https://digitalcommons.du.edu/etd/1706
Copyright date
2020
Discipline
Computer science, Artificial intelligence