Date of Award

2020

Document Type

Thesis

Degree Name

M. S.

Department

Computer Science

First Advisor

Rinku Dewri

Second Advisor

Scott Leutenegger

Third Advisor

Young Jin Lee

Keywords

Change detection, Natural language processing, Privacy policy

Abstract

Privacy policies notify Internet users about the privacy practices of websites, mobile apps, and other products and services. However, users rarely read them and struggle to understand their contents. Also, the entities that provide these policies are sometimes unmotivated to make them comprehensible. Due to the complicated nature of these documents, it gets even harder for users to understand and take note of any changes of interest or concern when these policies are changed or revised.

With recent development of machine learning and natural language processing, tools that can automatically annotate sentences of policies have been developed. These annotations can help a user quickly identify and understand relevant parts of the policy. Similarly a tool can be developed that can help identify changes between different versions of a policy that can be informative for the user. For example, suppose according to the new policy a website will start sharing audio data as well. The proposed tool can help users to be aware of such important changes. This thesis presents a tool that takes two different versions of a privacy policy as input, matches the sentences of one version of a policy to the sentences of another version of the policy based on semantic similarity, and inform the user of key relevant changes between two matched sentences. We discuss different supervised machine learning models that are explored to develop a method to annotate the sentences of privacy policies according to expert-identified categories for organization and analysis of the contents. Different word-embedding and similarity techniques are explored and evaluated to develop a method to match the sentences of one version of the policy to another version of a policy. The annotation of the sentences are used to increase the efficiency of the matching process. Methods to detect changes between two matched sentences through analysis of the structure of sentences are then implemented. We combined the developed methods for annotation of policies, matching the sentences between two versions of a policy and detecting change between sentences to realize the proposed tool.

The research work not only shows the potential of machine learning and natural language processing as an important tool for privacy engineering but also introduces various techniques that can be utilized for any natural language document.

Publication Statement

Copyright is held by the author. User is responsible for all copyright compliance.

Provenance

Received from ProQuest

Rights holder

Andrick Adhikari

File size

116 p.

File format

application/pdf

Language

en

Discipline

Computer science, Artificial intelligence

Share

COinS