The goal of the project is to write a relatively efficient SAT solver in Haskell that inputs satisfiability questions and outputs an answer. The choice of what precise algorithms and data structures to use is up to you, although there are some suggestions below. You can work either individually or with a partner, and you will be expected to give a brief oral presentation (~5 minutes) describing your experience writing the solver, explaining the design choices that you made as well as what kind of evaluation you performed.
Your SAT solver should read in a file containing a CNF formula represented in the DIMACS format, which is a primitive file format supported by many SAT solvers. Here is an example of a DIMACS CNF file:
c This is a random 3-CNF formula with 5 variables and 4 clauses.
c It is satisfiable.
c
p cnf 5 4
4 -3 -5 0
-3 -1 -2 0
1 -4 -5 0
-5 3 4 0
The first three lines beginning with “c” are comments. The next line beginning with “p” is a header describing the format of the formula (cnf), the number of variables (5) and the number of clauses (4). Finally, the remaining lines give the clauses, each indicated by a list of positive or negative literals terminated by a 0. Thus the example above corresponds to the following formula in standard logical notation:
(x₄ ∨ ¬x₃ ∨ ¬x₅) ∧ (¬x₃ ∨ ¬x₁ ∨ ¬x₂) ∧ (x₁ ∨ ¬x₄ ∨ ¬x₅) ∧ (¬x₅ ∨ x₃ ∨ x₄)
The output of your SAT solver should be a single line “SAT” or “UNSAT” indicating whether or not the formula is satisfiable, followed by a line with a satisfying assignment in the former case. For instance, running on the above example, one possible output is:
SAT
1 -2 -3 4 -5
corresponding to the satisfying assignment
[ (x₁, ⊤), (x₂, ⊥), (x₃, ⊥), (x₄, ⊤), (x₅, ⊥) ]
or equivalently to the conjunction of literals:
x₁ ∧ ¬x₂ ∧ ¬x₃ ∧ x₄ ∧ ¬x₅
A formula may have more than one satisfying assignment, and in that case it is okay for your SAT solver to return any of them. For example, another possible output for the above input is:
SAT
-1 -2 3 -4 -5
To simplify your task, the file Project.zip already contains some Haskell code that you can use as a template:
main
routine that reads in a formula in DIMACS CNF format
from a file specified on the command line and returns the output of the
naive solver.Note that the DIMACS parsing routines make use of the Parsec monadic
parser combinator library. To install this library on your machine, run
cabal install --lib parsec
.
Additionally, the subdirectory Tests
contains a few test
formulas in DIMACS format:
Running ghc -O2 MySat.hs
from within the directory where
you unzipped the archive (assuming you’ve installed the Parsec library)
will produce an executable called MySat
. This is already a
working SAT solver, albeit a very inefficient one! You can try invoking
it like so:
$ ./MySat Tests/random5.cnf
SAT
-1 -2 -3 -4 -5
You should feel free to modify any of the template code, and
certainly you should replace the call to the naive solver in
MySat.hs
by a call to your own solver.
You are also encouraged to create additional tests (see below for more advice about that).
The code in the template is distributed across different files,
making use of Haskell’s
module system. Be aware that Haskell’s module system relies on the
underlying directory structure, and module names have to match their
file name and location. For example, the module CNF
is
defined in the file CNF.hs
, and the module
Solver.Naive
is defined in the file
Solver/Naive.hs
. A module declaration can optionally give a
list of definitions to export. For example, the file
Solver/Naive.hs
begins with the line
module Solver.Naive (solution) where
which says to only export the function
solution :: CNF -> Maybe Subst
to the outside world (i.e., to somebody who does an
import Solver.Naive
). On the other hand, the file
CNF.hs
begins with the line
module CNF where
which says to export all of the definitions in the module.
You are encouraged to write your code as modularly as possible. For
example, you can try implementing different SAT solving algorithms as
different modules under the Solver
directory, which might
make use of utility functions defined in a separate module. You can also
write separate modules for testing your SAT solver and for comparing
different algorithms.
The project is worth 40% of your course grade.
You are allowed to work with a partner. In that case, both partners must be able to explain the code that you submit, and you will give a joint oral presentation. If you want to work with a partner, you must let me know via email by Monday, 18 October.
The deadline for submitting the project is 31 Oct at
23h59. Submission should take the form of a single archive file
Project.zip
uploaded via Moodle. (If you are working with a
partner, it is enough for one of you to submit your archive
file.)
It goes without saying that you should submit working Haskell
code. In particular, it should be possible to compile your code by
running ghc -O2 MySat.hs
on the lab machines. You can feel
free to import anything from the standard library,
but if you want to use other external libraries please write me first to
check this is okay.
The global structure of your code should be described in a
README.md
file included in your zip archive, and if you are
using any libraries, the installation instructions should also be
described.
Most SAT solving algorithms are based on some form of backtracking. That means that they gradually build up a partial assignment for some of the variables in a formula, until either the whole formula is satisfied or there is no way to make progress, and hence the formula is unsatisfiable. The classic DPLL algorithm is based on backtracking, as is the CDCL algorithm, both of which form the foundation for many modern SAT solvers.
To understand how these approaches work, it is worth trying to first implement the simple algorithm below, which is not as efficient as DPLL or CDCL but illustrates the basic idea of SAT solving via backtracking. It relies on the operation of conditioning a formula by a literal: given a formula \(`F`\) and a literal \(`ℓ`\), the formula \(`F|ℓ`\) (read “\(`F`\) given \(`ℓ`\)”) is defined by removing all clauses that contain \(`ℓ`\) from \(`F`\), and eliminating the negation of \(`ℓ`\) from remaining clauses. For example, if
F = (x_1 ∨ ¬x_2) ∧ (x_2 ∨ x_3) ∧ (¬x_1 ∨ ¬x_3) ∧ (¬x_1 ∨ ¬x_2 ∨ x_3)
then
F∣x_1 = (x_2 ∨ x_3) ∧ ¬x_3 ∧ (¬x_2 ∨ x_3)
while
F∣¬x_1 = ¬x_2 ∧ (x_2 ∨ x_3)
Now, for any CNF formula \(`F`\), we proceed as follows to construct a satisfying assignment \(`ρ`\):
If \(`F`\) is the empty conjunction then it is trivially satisfiable, so our search succeeds and we return the empty assignment \(`ρ = []`\).
Otherwise \(`F`\) must contain at least one clause. If it contains an empty clause then it is trivially unsatisfiable, so we fail.
Otherwise \(`F`\) must contain at least one clause with at least one literal \(`ℓ`\). We try to construct a satisfying assignment \(`ρ`\) for \(`F|ℓ`\), and if that succeeds then we return the extended assignment \(`ℓ:ρ`\). Otherwise, we try to find a satisfying assignment \(`ρ`\) for \(`F|¬ℓ`\), and if that succeeds then we return the extended assignment \(`¬ℓ:ρ`\). If that fails, we fail.
The procedure may either fail or return a partial assignment \(`ρ`\) of values for some of the variables of \(`F`\). In that case, by construction, \(`F`\) is satisfied under any extension of \(`ρ`\) with arbitrary values for the remaining variables.
As a first step, you can implement the above algorithm in Haskell, either as a recursive function
solve :: [Cls] -> Maybe Subst
that attempts to construct a single partial satisfying assignment for a list of clauses, or as a recursive function
solve :: [Cls] -> [Subst]
that lazily constructs all partial satisfying assignments for a list of clauses. You may first want to write a helper function
condition :: Lit -> [Cls] -> [Cls]
that conditions a list of clauses by a literal.
Once you’re done, package everything up into a module that exports a function
solution :: CNF -> Maybe Subst
invoked from MySat.hs
, and compare it with the first
version of the solver using Solver.Naive
. (Attention: make
sure that solution
returns a full substitution for
all the variables of the CNF formula, and not just a partial
substitution. It should be an invariant maintained by solve
that any partial substitution it returns can be extended to a full
substitution by assigning the missing variables arbitrary values. By the
way, just to be clear I am using the words “substitution” and
“assignment” interchangeably.)
There are many potential ways of improving the performance of your basic backtracking SAT solver:
One optimization performed by the DPLL algorithm is called unit propagation: if \(`F`\) has a clause \(`C = ℓ`\) with just a single literal (called a unit clause), then the formula may be reduced eagerly to \(`F∣ℓ`\), with no need to consider \(`F∣¬ℓ`\). This optimization is very important in practice.
Another optimization performed by DPLL is called pure literal elimination: if the negation of a literal \(`ℓ`\) does not occur in any clauses of \(`F`\), then the formula may be reduced eagerly to \(`F∣ℓ`\), again with no need to consider \(`F∣¬ℓ`\). This optimization seems to be less important in practice.
Practical implementations of DPLL use special data structures for quickly identifying unit clauses. A literal \(`ℓ`\) is said to be active if it is consistent with the current substitution \(`ρ`\), i.e., the underlying variable of \(`ℓ`\) is either unmentioned in \(`ρ`\) or assigned a value equal to the polarity of \(`ℓ`\). One possible way of identifying unit clauses is to keep a count of the number of currently active literals in each clause. Another approach adopted by many practical SAT solvers is called watched literals: each clause has a “watch” on up to two active literals, and each literal is associated with a list of clauses that watch it. If a literal \(`ℓ`\) is falsified during backtracking search, only the clauses \(`C`\) that watch it have to be updated by finding another active literal in \(`C`\) to replace it if possible — and if it is not possible, it means \(`C`\) is now either a unit clause or false.
You could try different heuristics for picking a literal to branch on, like those explained in the Wikipedia article Boolean satisfiability algorithm heuristics.
You could implement clause learning.
You could perform some kind of pre-processing on the input. For example, the subsumption rule says that if every literal in a clause \(`C`\) appears also in another clause \(`C'`\), then \(`C'`\) can be removed. (You can find more discussion in this paper by Eén and Biere, as well as in the Knuth reference listed in the Bibliography section below.) Another kind of pre-processing is conversion to 3-CNF, which you could potentially use to simplify the design of your solver.
From the perspective of Haskell, you could try experimenting with impure functional programming features such as IORef, or you could try using efficient purely functional data structures implemented in the standard library such as Data.Map.
You might get some more ideas by reading the Haskell wiki articles on performance and performance in GHC. Chapter 25 of Real World Haskell also gives some advice on profiling and optimization.
Be warned that improving the performance of your SAT solver is not easy, and even harder while maintaining correctness! Make sure to separate different approaches into different modules so that you can compare them, and keep them for reference even if they do not necessarily improve the performance of your solver. (A revision control system such as git or svn could eventually save you from losing your work.)
I am interested in seeing the different approaches you tried and hearing about the evaluation you performed, even if in the end you do not succeed in vastly improving your basic backtracking SAT solver.
You should test your solver(s) on both satisfiable and unsatisfiable formulas. Some valuable optimizations may help your solver to more quickly find a satisfying assignment, but will not necessarily be useful for unsatisfiable formulas. Of course you should only implement sound optimizations, in the sense that your solver should only return correct satisfying assignments, and should never say that a formula is unsatisfiable when it has a satisfying assignment.
In addition to the .cnf files already provided in the Tests directory
(and others freely available online, for example on John
Burkardt’s website), you can try generating your own formulas and
turning them into .cnf files using the routine
writeCNFtoDIMACS
from CNF.DIMACS
. For example,
you can generate natural CNF formulas from different constraint
satisfaction problems, such as graph-coloring or Sudoku. To give
yourself the extra flexibility of arbitrary boolean logic formulas, you
can try implementing the Tseitin
translation to convert to CNF without incurring an exponential
blowup.
Another way of generating hard satisfiability problems is by generating random CNF formulas with just the right amount of variables and clauses. For example, experimentally, it seems that a 3-CNF formula with \(`n`\) variables and \(`m`\) clauses is likely to be satisfiable if \(`m/n < α_3`\) and likely to be unsatisfiable if \(`m/n > α_3`\), where \(`α_3 ≈ 4.25 ± 0.05`\). Random 3-CNF formulas with a \(`m/n`\) ratio right around \(`α_3`\) are typically hard to decide for a SAT solver, whereas formulas with a ratio far from \(`α_3`\) in either direction are typically easy to decide. (More generally, from experimental evidence it seems there is a threshold constant \(`α_k`\) for k-CNF formulas, although this has not been proved formally except in the case \(`k = 2`\) where \(`α_2 = 1`\).)
Make sure to include any .cnf files that you created yourself in your archive submission, as well as any code you wrote to generate them.
You can try downloading one or more of the following high-quality, open source SAT solvers:
All of these accept problems in DIMACS CNF format, so you can compare them to your solver (and don’t be discouraged when your solver is likely to be outperformed!).
Various references that may be useful:
Wikipedia: Boolean satisfiability problem
Wikipedia: DPLL
Wikipedia: CDCL
Volume 4, Fascicle 6, “Satisfiability” of Donald E. Knuth’s The Art of Computer Programming (link)
Formalization and Implementation of Modern SAT Solvers by Filip Marić
Section 2.5.7 of Program = Proof by Samuel Mimram