PhD Seminar Notice: A Novel Framework of Board-Level Failure Localization in Optical Transport Networks

Monday, May 27, 2024 9:00 am - 10:00 am EDT (GMT -04:00)

Candidate: Yan Jiao

Title: A Novel Framework of Board-Level Failure Localization in Optical Transport Networks

Date: May 27, 2024

Time: 9:00 AM


Supervisor(s): Ho, Pin-Han


Optical transport networks (OTNs) play a pivotal role in Internet backbones thanks to their support for multi-tenant and multi-service environments with high reliability and low cost. A failure event may affect one or multiple boards in an OTN and trigger a vast number of alarms, which significantly increases the complexity of failure localization and alarm analysis. Accordingly, there is an urgent need for a systematic framework that uses the known network state and the received alarms to achieve effective failure localization.

Alarm correlation is regarded as a representative approach to identifying the dependencies among alarms. It aims to eliminate as many descendant alarms as possible, so that failure localization can be carried out with much lower complexity. Nevertheless, existing alarm correlation methods suffer from the following issues. First, they ignore the fact that alarm propagation mostly takes place along certain connections, so the network topology and traffic distribution can firmly underpin the alarm correlation process. Second, they require initial parameters to be set heuristically but lack a general rule for adjusting those values to different network characteristics. Last, they generalize poorly across network environments: a result obtained for one specific network state may not transfer to another.

Motivated by the significance of this problem and its stringent requirements, this thesis proposes a novel framework for board-level failure localization in OTNs, called Failure-Alarm Correlation Tree based Failure Localization (FACT-FL), whose expected output is one or multiple FACTs. First, we put forward the concept of a FACT, which takes a failed board as the tree root and its associated alarms as the leaves. Three methodologies are then designed to implement FACT-FL. The first scheme, FACT-FL-Heuristic, uses a learned binary classifier to capture the historical correlations in the form of board → alarm and alarm → alarm, and then heuristically creates the feasible FACT(s). To further improve on FACT-FL-Heuristic's performance, a method termed FACT-FL-Chain treats each FACT as a suite of correlation chains with different order values and generates viable FACT(s) by solving an integer linear programming (ILP) problem. Moreover, to reduce the computational complexity that FACT-FL-Chain incurs by enumerating all chain candidates, an approach dubbed FACT-FL-GNN leverages a graph neural network (GNN) to evaluate the edge weights of potential FACT(s), which makes it possible to formulate an alternative, simplified ILP that yields the most likely FACT(s). Extensive case studies are conducted to demonstrate the advantages of the proposed methods over their counterparts in terms of metrics that assess the recognized failed boards and root alarms. We also explore their performance under varying conditions, including diverse failure scenarios, network topologies, traffic distributions, and noise alarms.
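
For readers unfamiliar with the tree structure described above, the following is a minimal illustrative sketch in Python and not the thesis's implementation. It assumes the board → alarm and alarm → alarm correlations have already been decided and are supplied as a child-to-parent mapping. The names `parent_of`, `find_root`, `group_into_facts`, and `root_alarms`, as well as the board and alarm labels, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_root(node, parent_of):
    """Follow parent links upward until a node with no parent is reached.

    In this sketch a node without a parent is taken to be the failed board.
    """
    seen = set()
    while node in parent_of:
        if node in seen:
            raise ValueError("cycle detected: correlations do not form a tree")
        seen.add(node)
        node = parent_of[node]
    return node


def group_into_facts(parent_of):
    """Group alarms by the board at the root of their correlation tree."""
    facts = defaultdict(set)
    for alarm in parent_of:
        facts[find_root(alarm, parent_of)].add(alarm)
    return dict(facts)


def root_alarms(parent_of, boards):
    """Return the alarms whose direct parent is a board (board -> alarm edges)."""
    return {alarm for alarm, parent in parent_of.items() if parent in boards}


# Hypothetical correlations (assumed input): child -> parent.
parent_of = {
    "LOS@A": "board-A",  # board -> alarm
    "AIS@B": "LOS@A",    # alarm -> alarm
    "BDI@C": "AIS@B",    # alarm -> alarm
    "LOF@D": "board-D",  # board -> alarm
}

facts = group_into_facts(parent_of)
roots = root_alarms(parent_of, set(facts))
```

Here each key of `facts` is a candidate failed board and each value is the set of alarms attributed to it, while `roots` holds the alarms directly caused by a board. Deciding which correlations hold in the first place is the part addressed by the three FACT-FL methods and is outside the scope of this sketch.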