Haibo Bian, Master’s candidate
David R. Cheriton School of Computer Science
Recently, network infiltrations due to \textit{advanced persistent threats (APTs)} have grown significantly, resulting in considerable losses to businesses and organizations. APTs are stealthy attacks with the primary objective of gaining unauthorized access to network assets. They often remain dormant for extended periods, which makes them challenging to detect.
In this thesis, we leverage \textit{machine learning (ML)} to detect hosts in a network that are targets of an APT attack. We evaluate several ML classifiers for detecting susceptible hosts in the Los Alamos National Lab (LANL) dataset. We explore (i) graph-based features extracted from multiple data sources, i.e., network flows and host authentication logs, (ii) feature engineering to reduce dimensionality, and (iii) balancing the training dataset using several over- and under-sampling techniques. Finally, we compare our model to state-of-the-art approaches that use the same dataset and show that our model outperforms them with respect to prediction performance and overhead.