CSE301 Project: SAT solver

Summary

The goal of the project is to write a relatively efficient SAT solver in Haskell that inputs satisfiability questions and outputs an answer. The choice of what precise algorithms and data structures to use is up to you, although there are some suggestions below. You can work either individually or with a partner, and you will be expected to give a brief oral presentation (~5 minutes) describing your experience writing the solver, explaining the design choices that you made as well as what kind of evaluation you performed.

Top-level interface

Your SAT solver should read in a file containing a CNF formula represented in the DIMACS format, which is a primitive file format supported by many SAT solvers. Here is an example of a DIMACS CNF file:

c This is a random 3-CNF formula with 5 variables and 4 clauses.
c It is satisfiable.
c 
p cnf 5 4
4 -3 -5 0
-3 -1 -2 0
1 -4 -5 0
-5 3 4 0

The first three lines beginning with “c” are comments. The next line beginning with “p” is a header describing the format of the formula (cnf), the number of variables (5) and the number of clauses (4). Finally, the remaining lines give the clauses, each indicated by a list of positive or negative literals terminated by a 0. Thus the example above corresponds to the following formula in standard logical notation:

(x₄ ∨ ¬x₃ ∨ ¬x₅) ∧ (¬x₃ ∨ ¬x₁ ∨ ¬x₂) ∧ (x₁ ∨ ¬x₄ ∨ ¬x₅) ∧ (¬x₅ ∨ x₃ ∨ x₄)

The output of your SAT solver should be a single line “SAT” or “UNSAT” indicating whether or not the formula is satisfiable, followed by a line with a satisfying assignment in the former case. For instance, running on the above example, one possible output is:

SAT
1 -2 -3 4 -5

corresponding to the satisfying assignment

[ (x₁, ⊤), (x₂, ⊥), (x₃, ⊥), (x₄, ⊤), (x₅, ⊥) ]

or equivalently to the conjunction of literals:

x₁ ∧ ¬x₂ ∧ ¬x₃ ∧ x₄ ∧ ¬x₅

A formula may have more than one satisfying assignment, and in that case it is okay for your SAT solver to return any of them. For example, another possible output for the above input is:

SAT
-1 -2 3 -4 -5

Template code

To simplify your task, the file Project.zip already contains some Haskell code that you can use as a template:

Note that the DIMACS parsing routines make use of the Parsec monadic parser combinator library. To install this library on your machine, run cabal install --lib parsec.

Additionally, the subdirectory Tests contains a few test formulas in DIMACS format:

Running ghc -O2 MySat.hs from within the directory where you unzipped the archive (assuming you’ve installed the Parsec library) will produce an executable called MySat. This is already a working SAT solver, albeit a very inefficient one! You can try invoking it like so:

$ ./MySat Tests/random5.cnf
SAT
-1 -2 -3 -4 -5

You should feel free to modify any of the template code, and certainly you should replace the call to the naive solver in MySat.hs by a call to your own solver.

You are also encouraged to create additional tests (see below for more advice about that).

Module system

The code in the template is distributed across different files, making use of Haskell’s module system. Be aware that Haskell’s module system relies on the underlying directory structure, and module names have to match their file name and location. For example, the module CNF is defined in the file CNF.hs, and the module Solver.Naive is defined in the file Solver/Naive.hs. A module declaration can optionally give a list of definitions to export. For example, the file Solver/Naive.hs begins with the line

module Solver.Naive (solution) where

which says to only export the function

solution :: CNF -> Maybe Subst

to the outside world (i.e., to somebody who does an import Solver.Naive). On the other hand, the file CNF.hs begins with the line

module CNF where

which says to export all of the definitions in the module.

You are encouraged to write your code as modularly as possible. For example, you can try implementing different SAT solving algorithms as different modules under the Solver directory, which might make use of utility functions defined in a separate module. You can also write separate modules for testing your SAT solver and for comparing different algorithms.

Administrative and technical notes

Advice on writing your SAT solver

Basic backtracking

Most SAT solving algorithms are based on some form of backtracking. That means that they gradually build up a partial assignment for some of the variables in a formula, until either the whole formula is satisfied or there is no way to make progress, and hence the formula is unsatisfiable. The classic DPLL algorithm is based on backtracking, as is the CDCL algorithm, both of which form the foundation for many modern SAT solvers.

To understand how these approaches work, it is worth trying to first implement the simple algorithm below, which is not as efficient as DPLL or CDCL but illustrates the basic idea of SAT solving via backtracking. It relies on the operation of conditioning a formula by a literal: given a formula \(`F`\) and a literal \(`ℓ`\), the formula \(`F|ℓ`\) (read “\(`F`\) given \(`ℓ`\)”) is defined by removing all clauses that contain \(`ℓ`\) from \(`F`\), and eliminating the negation of \(`ℓ`\) from remaining clauses. For example, if

F = (x_1 ∨ ¬x_2) ∧ (x_2 ∨ x_3) ∧ (¬x_1 ∨ ¬x_3) ∧ (¬x_1 ∨ ¬x_2 ∨ x_3)

then

F∣x_1 = (x_2 ∨ x_3) ∧ ¬x_3 ∧ (¬x_2 ∨ x_3)

while

F∣¬x_1 = ¬x_2 ∧ (x_2 ∨ x_3)

Now, for any CNF formula \(`F`\), we proceed as follows to construct a satisfying assignment \(`ρ`\):

  1. If \(`F`\) is the empty conjunction then it is trivially satisfiable, so our search succeeds and we return the empty assignment \(`ρ = []`\).

  2. Otherwise \(`F`\) must contain at least one clause. If it contains an empty clause then it is trivially unsatisfiable, so we fail.

  3. Otherwise \(`F`\) must contain at least one clause with at least one literal \(`ℓ`\). We try to construct a satisfying assignment \(`ρ`\) for \(`F|ℓ`\), and if that succeeds then we return the extended assignment \(`ℓ:ρ`\). Otherwise, we try to find a satisfying assignment \(`ρ`\) for \(`F|¬ℓ`\), and if that succeeds then we return the extended assignment \(`¬ℓ:ρ`\). If that fails, we fail.

The procedure may either fail or return a partial assignment \(`ρ`\) of values for some of the variables of \(`F`\). In that case, by construction, \(`F`\) is satisfied under any extension of \(`ρ`\) with arbitrary values for the remaining variables.

As a first step, you can implement the above algorithm in Haskell, either as a recursive function

solve :: [Cls] -> Maybe Subst

that attempts to construct a single partial satisfying assignment for a list of clauses, or as a recursive function

solve :: [Cls] -> [Subst]

that lazily constructs all partial satisfying assignments for a list of clauses. You may first want to write a helper function

condition :: Lit -> [Cls] -> [Cls]

that conditions a list of clauses by a literal.

Once you’re done, package everything up into a module that exports a function

solution :: CNF -> Maybe Subst

invoked from MySat.hs, and compare it with the first version of the solver using Solver.Naive. (Attention: make sure that solution returns a full substitution for all the variables of the CNF formula, and not just a partial substitution. It should be an invariant maintained by solve that any partial substitution it returns can be extended to a full substitution by assigning the missing variables arbitrary values. By the way, just to be clear I am using the words “substitution” and “assignment” interchangeably.)

Improvements

There are many potential ways of improving the performance of your basic backtracking SAT solver:

Be warned that improving the performance of your SAT solver is not easy, and even harder while maintaining correctness! Make sure to separate different approaches into different modules so that you can compare them, and keep them for reference even if they do not necessarily improve the performance of your solver. (A revision control system such as git or svn could eventually save you from losing your work.)

I am interested in seeing the different approaches you tried and hearing about the evaluation you performed, even if in the end you do not succeed in vastly improving your basic backtracking SAT solver.

Tests

You should test your solver(s) on both satisfiable and unsatisfiable formulas. Some valuable optimizations may help your solver to more quickly find a satisfying assignment, but will not necessarily be useful for unsatisfiable formulas. Of course you should only implement sound optimizations, in the sense that your solver should only return correct satisfying assignments, and should never say that a formula is unsatisfiable when it has a satisfying assignment.

In addition to the .cnf files already provided in the Tests directory (and others freely available online, for example on John Burkardt’s website), you can try generating your own formulas and turning them into .cnf files using the routine writeCNFtoDIMACS from CNF.DIMACS. For example, you can generate natural CNF formulas from different constraint satisfaction problems, such as graph-coloring or Sudoku. To give yourself the extra flexibility of arbitrary boolean logic formulas, you can try implementing the Tseitin translation to convert to CNF without incurring an exponential blowup.

Another way of generating hard satisfiability problems is by generating random CNF formulas with just the right amount of variables and clauses. For example, experimentally, it seems that a 3-CNF formula with \(`n`\) variables and \(`m`\) clauses is likely to be satisfiable if \(`m/n < α_3`\) and likely to be unsatisfiable if \(`m/n > α_3`\), where \(`α_3 ≈ 4.25 ± 0.05`\). Random 3-CNF formulas with a \(`m/n`\) ratio right around \(`α_3`\) are typically hard to decide for a SAT solver, whereas formulas with a ratio far from \(`α_3`\) in either direction are typically easy to decide. (More generally, from experimental evidence it seems there is a threshold constant \(`α_k`\) for k-CNF formulas, although this has not been proved formally except in the case \(`k = 2`\) where \(`α_2 = 1`\).)

Make sure to include any .cnf files that you created yourself in your archive submission, as well as any code you wrote to generate them.

Comparison

You can try downloading one or more of the following high-quality, open source SAT solvers:

All of these accept problems in DIMACS CNF format, so you can compare them to your solver (and don’t be discouraged when your solver is likely to be outperformed!).

Bibliography

Various references that may be useful: