MASc Seminar Notice: AlphaSMT: A Reinforcement Learning Guided SMT Solver

Wednesday, May 17, 2023, 1:00 pm EDT (GMT -04:00)

Candidate: John Lu
Date: May 17, 2023
Time: 1:00 PM
Location: EIT 3145
Supervisor: Vijay Ganesh


Abstract:

Satisfiability Modulo Theories (SMT) solvers are programs that decide whether a first-order logic formula is satisfiable. Over the last two decades, these solvers have become central to many methods and tools in fields as diverse as software engineering, verification, security, and artificial intelligence. Most modern SMT solvers provide user-controllable strategies, i.e., users can customize a solving algorithm for their own problems by combining various tactics. Each tactic is a reasoning step that either simplifies, transforms, or solves the given input SMT formula. Such flexibility extends the solver's power; however, finding a good strategy for a specialized class of formulas is challenging even for experts.
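To make the tactic-and-strategy interface concrete, here is a minimal sketch using Z3's Python bindings (z3py); the particular tactic names, combinators, and timeout are illustrative choices, not the strategies studied in the thesis.

```python
# A minimal sketch of combining Z3 tactics into a strategy via z3py.
from z3 import Real, Tactic, Then, OrElse, TryFor

x, y = Real('x'), Real('y')
formula = x * x + y * y < 1  # a small QF_NRA-style formula

# Strategy: simplify first, then try the nlsat-based nonlinear-real
# tactic under a time budget, falling back to the generic 'smt' tactic.
strategy = Then(Tactic('simplify'),
                OrElse(TryFor(Tactic('qfnra-nlsat'), 5000),  # timeout in ms
                       Tactic('smt')))

solver = strategy.solver()  # turn the composed strategy into a solver
solver.add(formula)
print(solver.check())       # sat / unsat / unknown
```

Choosing which tactics to chain, and with what timeouts, is exactly the strategy-selection problem the abstract describes.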

In this thesis, we present a novel class of reinforcement-learning (RL) guided methods, implemented in the Z3 SMT solver, which we refer to as AlphaSMT; it adaptively constructs the expected best strategy for any given input SMT formula. Briefly, the AlphaSMT RL framework combines deep neural networks, Monte Carlo Tree Search (MCTS), and logical reasoning in a unique way to enable the RL agent to learn the best sequence of tactics for a given class of formulas. In more detail, in the AlphaSMT framework a deep neural network serves as both the value function and the policy, evaluating state-action pairs and making the tactic decisions. The neural network is trained towards the optimal policy by learning from sampled solving episodes, each of which uses MCTS as a lookahead planning step.
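As a rough illustration of this loop, the following is a hypothetical, heavily simplified Python sketch of AlphaZero-style MCTS with a combined policy/value network; all names here (predict, TACTICS, the PUCT constant) are placeholders standing in for AlphaSMT's actual components.

```python
# Hypothetical sketch: MCTS lookahead guided by a policy/value network.
import math
import random

TACTICS = ['simplify', 'nlsat', 'bit-blast', 'smt']  # illustrative action set

def predict(state):
    """Stand-in for the neural network: returns (tactic priors, value)."""
    policy = {t: 1.0 / len(TACTICS) for t in TACTICS}  # uniform prior
    return policy, random.uniform(-1, 1)               # dummy value estimate

class Node:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}  # tactic name -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def ucb(parent, child, c_puct=1.5):
    # PUCT rule: exploit the child's value, explore by prior and visit counts.
    return child.value() + c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)

def mcts_policy(state, simulations=50):
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        # Selection: descend by UCB until reaching an unexpanded leaf.
        while node.children:
            _, node = max(node.children.items(), key=lambda kv: ucb(path[-1], kv[1]))
            path.append(node)
        # Expansion + evaluation: ask the network for priors and a value.
        policy, value = predict(state)
        node.children = {t: Node(prior=p) for t, p in policy.items()}
        # Backup: propagate the value estimate along the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Improved policy: root visit counts, used to pick the next tactic.
    return {t: child.visits for t, child in root.children.items()}

print(mcts_policy(state=None))
```

In a real system, predict would encode the current formula state and the visit-count policy would supply the training targets for the network, but this sketch only shows the shape of the planning step.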

We evaluated the performance of AlphaSMT on benchmark sets from two SMT logics, namely quantifier-free non-linear real arithmetic (QF_NRA) and quantifier-free non-linear integer arithmetic (QF_NIA). In all of our experiments, AlphaSMT solves substantially more instances than its target solver, Z3: the improvement ranges from 10% to 34% more instances solved, with lower average run time. Evaluation results also show that a reasonably longer tactic timeout helps solve more instances and that a pre-solver contributes significantly to the speedup.

All are welcome.