The goal of this exercise is to solve the satisfiability problem for boolean formulas, which consists in determining whether a propositional formula is *satisfiable*, i.e. is true under some valuation.

Open a new file `sat.ml`

.

We define a type

`type var = int`

for variables. A *propositional formula* is either

- a variable,
- a conjunction or a disjunction of formulas,
- a negation of a formula
- true or false.

Define a type `formula`

for formulas (the constructors should be named `Var`

, `And`

, `Or`

, `Not`

, `True`

and `False`

).

Define a *substitution* procedure `subst`

of type `var -> formula -> formula -> formula`

which replaces a given variable \(X\) by a formula \(B\) in a formula \(A\). In mathematical notation, the resulting formula is often noted \(A[B/X]\).

Define a function `free_var`

of type `formula -> var`

which returns an arbitrary *free variable* of the formula (a variable which occurs in the formula). In the case where there is no such variable, the exception `Not_found`

will be raised.

Define a function `eval`

of type `formula -> bool`

which evaluates a *closed formula* (a formula without free variables).

We admit that a formula \(A\) with a free variable \(X\) is satisfiable if and only if either \(A[\top/X]\) or \(A[\bot/X]\) is satisfiable: trying both cases is called *splitting* on \(X\).

Define a function `sat`

of type `formula -> bool`

which determines whether a formula is satisfiable.

You can test it with

```
let () =
let x = Var 0 in
let x'= Not x in
let y = Var 1 in
let y'= Not y in
let a = And (And(Or (x,y), Or (x',y)), Or (x',y')) in
let b = And (And(Or (x,y), Or (x',y)), And(Or (x,y'), Or (x',y'))) in
assert (sat a);
assert (not (sat b))
```

and add your own tests! The `assert`

function will raise an exception if its argument is `false`

(try to `assert false`

).

In practice, the above procedure is very slow and could be improved. For instance, on the formula \[X\land\lnot X\land X_1\land\ldots\land X_n\] it will try all the possible valuations for \(X_1,\ldots,X_n\), whereas it is easy to see that it cannot be satisfied for \(X=\top\) and \(X=\bot\), independently of the values of the other variables.

We now implement the DPLL algorithm on which are based most modern SAT solvers. It works much faster in practice, although the problem is NP-complete. We admit that every boolean formula can be put into a *conjunctive normal form* (or *cnf*):

- a
*literal*is either a variable or the negation of a variable, - a
*clause*is a disjunction of literals, - a
*conjunctive normal form*is a conjunction of clauses.

and we aim at determining the satisfiability of such formulas. They can be represented by the following types:

```
type var = int
type literal = bool * var (* false means negated *)
type clause = literal list
type cnf = clause list
```

Note that the cnf `[]`

corresponds to true whereas the clause `[]`

corresponds to false.

We will be using standard operations on lists. They are part of the List module of the standard library, but we shall recode them here as an exercise.

Define a function `list_mem`

of type `'a -> 'a list -> bool`

which determines whether an element is present in a list. Test it with

```
let () =
assert (list_mem 2 [1;2;3]);
assert (not (list_mem 5 [1;2;3]));
assert (not (list_mem 1 []))
```

Define a function `list_map`

of type `('a -> 'b) -> 'a list -> 'b list`

which applies a function to every element of a list. Test it with

```
let () =
assert (list_map (fun x -> 2*x) [1;2;3] = [2;4;6]);
assert (list_map (fun _ -> ()) [] = [])
```

Define a function `list_filter`

of type `('a -> bool) -> 'a list -> 'a list`

which, in a list, keeps only the elements satisfying a given predicate.

Observe that, given a variable \(X\) and a formula in cnf \[A=C_1\land\ldots\land C_n\] where the \(C_i\) are the clauses, a cnf for the formula \(A[\top/X]\) can be obtained from \(A\) by

- removing all the clauses \(C_i\) containing \(X\),
- removing \(\lnot X\) from all the remaining clauses.

and similarly for \(A[\bot/X]\). Can you see why?

Using the functions on lists coded above, implement a *substitution* function `subst_cnf`

of type `var -> bool -> cnf -> cnf`

which replaces a variable \(X\) by either \(\top\) or \(\bot\) in a formula in cnf.

Implement again the function `dpll`

of type `cnf -> bool`

, using splitting as in the first part, but noting that

- the cnf
`[]`

is true, - a cnf containing the clause
`[]`

is false

(in particular, a cnf without free variables is in one of those two cases). This should allow to conclude that a formula is not satisfiable without giving a value to every variable. You can use the function `List.hd`

of type `'a list -> 'a`

which returns the first element of a list (or raises `Not_found`

if the list is empty). We also recall that you can extract the first and second components of a pair with the functions `fst`

and `snd`

.

Test your function with the same example as previously:

```
let () =
let x = true, 0 in
let x'= false,0 in
let y = true, 1 in
let y'= false,1 in
let a = [[x;y]; [x';y]; [x';y']] in
let b = [[x;y]; [x';y]; [x;y']; [x';y']] in
assert (dpll a);
assert (not (dpll b))
```

The algorithm can be further improved by carefully choosing the variables on which we split. In particular, we should eliminate first the variables for which we will easily be able to conclude (at least one branch of the splitting).

A clause \(C_i\) is *unitary* when it is reduced to a literal. If \(C_i=X\), we know that \(X\) must be true in order for the formula to be satisfied (the case \(X=\bot\) will immediately fail), and similarly in the case where \(C_i=\lnot X\).

Program a function `unit`

of type `cnf -> literal`

which finds a unitary clause in a formula (or raises `Not_found`

if there is none).

Test it with

```
let () =
let x = true, 0 in
let y = true, 1 in
let y'= false,1 in
assert (unit [[x;y]; [x]; [y;y']] = x)
```

A literal \(X\) (resp. \(\lnot X\)) is *pure* in a formula \(A\) if \(\lnot X\) (resp. \(X\)) does not occur in any clause of \(A\) (i.e. the variable always occurs with the same polarity). What can we say of the satisfiability of a formula which contains a pure literal?

Program a function `pure`

of type `cnf -> literal`

which finds a pure literal in a formula (or raises `Not_found`

if there is none).

Modify the function `dpll`

so that it operates in the following way on a given cnf

- if the formula is empty then it returns
`true`

, - if the formula contains the empty clause then it returns
`false`

, - if the formula contains a unitary clause then it replaces the corresponding variable by the only possible value,
- if the formula contains a pure literal then it replaces the corresponding variable by the value which preserves satisfiability,
- otherwise, it splits on an arbitrary variable.

In the last case, we can still (hope to) improve the algorithm by carefully choosing the variables on which we split. The *MOM heuristics* (for Maximum number of Occurences in the Minimum length) consists in choosing a variable which occurs most among clauses of minimal length.

Program a function `mom`

of type `cnf -> var`

which finds such a variable (or raises `Not_found`

if the formula has no variable), and use it in `dpll`

.

Given a clause \(C\), we write \(|C|\) for the number of literals in \(C\). Given a formula \(A\) and a literal \(L\), we write \(C\in_L A\) to indicate that \(C\) is a clause of \(A\) containing \(L\) and define \[\mathcal{J}(L)=\sum_{C\in_L
A}\left(\frac12\right)^{|C|}\] The *Jeroslow-Wang heuristics* consists in splitting on the variable corresponding to a literal which maximizes \(\mathcal{J}\).

Program a function `jw`

of type `cnf -> literal -> float`

which computes the above quantity.

Program a function `jw_var`

of type `cnf -> var`

which finds a variable according to the Jeroslow-Wang heuristics, and use it in `dpll`

.

The CNF file format is a text format to encode SAT problems, of which you can find many examples on the net.

A parser for the CNF file format can be implemented as follows:

```
(** Parse a CNF file. *)
let parse f : cnf =
let load_file f =
let ic = open_in f in
let n = in_channel_length ic in
let s = Bytes.create n in
really_input ic s 0 n;
close_in ic;
Bytes.to_string s
in
let f = load_file f in
let f = String.map (function '\t' -> ' ' | c -> c) f in
let f = String.split_on_char '\n' f in
let f = List.map (String.split_on_char ' ') f in
let f = List.filter (function "c"::_ | "p"::_ -> false | _ -> true) f in
let f = List.flatten f in
let aux (a,c) = function
| "" -> (a,c)
| "0" -> (c::a,[])
| n ->
let n = int_of_string n in
let x = if n < 0 then (false,-n) else (true,n) in
(a,x::c)
in
fst (List.fold_left aux ([],[]) f)
```

You can then test the file `file.cnf`

by something like

It is advised that you compile your program in native mode, so that it is faster, by typing on a console

`ocamlopt sat.ml -o sat`

You can then run the program with

`./sat`

You can even pass a file name as argument by executing

`./sat file.cnf`

and the file name can be retrieved in OCaml with `Sys.argv.(0)`

(see the documentation).

Here are some examples of satisfiable files:

and unsatisfiable ones:

A Sudoku can be encoded as a 9×9 matrix `cells`

of integers such that `cells.(i).(j)`

\(=9\) means that the cell \((i,j)\) is empty and \(0\leq\)`cell.(i).(j)`

\(<9\) means that the cell contains the number `cells.(i).(j)`

\(+1\). For instance, the game

. . 8 6 5 . . 2 1 |
. 9 3 4 . 2 . 8 . |
5 . . . . . 3 . . |

3 8 . . 7 . 1 . 9 |
. 6 . . . . . 4 . |
2 . 9 . 1 . . 7 3 |

. . 5 . . . . . 6 |
. 1 . 3 . 9 8 2 . |
7 3 . . 2 6 1 . . |

is encoded as

```
let simple_sudoku =
[|[|9;9;7;9;8;2;4;9;9|];
[|5;4;9;3;9;1;9;9;9|];
[|9;1;0;9;7;9;2;9;9|];
[|2;7;9;9;5;9;1;9;8|];
[|9;6;9;9;9;9;9;0;9|];
[|0;9;8;9;3;9;9;6;2|];
[|9;9;4;9;0;9;6;2;9|];
[|9;9;9;2;9;8;9;1;5|];
[|9;9;5;7;1;9;0;9;9|]|]
```

Our aim is now to encode such a game as a SAT problem. We define

the function which returns a variable depending on \(0\leq i<9\), \(0\leq j<9\) and \(0\leq n<9\). In a valuation, the value of `var i j n`

is true means that the cell \((i,j)\) contains the value \(n\). We now want to generate formulas which encode the fact that a valuation gives rise to a solution to a sudoku game.

Define a function `sudoku`

of type `int array array -> cnf`

, which takes a Sudoku game as above, and returns a cnf which is satisfiable if and only if the Sudoku game is. The function must generate formulas expressing that

- there is at least one number in each cell,
- each number occurs at most once in a row,
- each number occurs at most once in a column,
- each number occurs at most once in a square,
- the solution respect the game given in argument.

It can be shown that these conditions suffice (e.g. they imply that there is at most one number in a cell).

Test it with

Here is also an example of a game of medium difficulty:

```
let medium_sudoku =
[|[|9;1;9;7;4;3;9;9;9|];
[|9;9;9;5;9;9;9;9;7|];
[|9;0;9;9;9;9;9;9;8|];
[|1;9;9;9;9;9;9;8;2|];
[|9;6;4;2;9;7;5;1;9|];
[|7;8;9;9;9;9;9;9;6|];
[|3;9;9;9;9;9;9;5;9|];
[|2;9;9;9;9;1;9;9;9|];
[|9;9;9;6;5;0;9;3;9|]|]
```

and a hard one:

```
let hard_sudoku =
[|[|9;9;9;9;4;9;8;9;9|];
[|8;7;9;9;9;9;9;4;9|];
[|9;3;4;6;9;9;2;9;9|];
[|9;9;3;1;9;9;7;6;9|];
[|9;9;9;0;9;3;9;9;9|];
[|9;6;5;9;9;8;0;9;9|];
[|9;9;7;9;9;0;5;2;9|];
[|9;1;9;9;9;9;9;7;0|];
[|9;9;6;9;2;9;9;9;9|]|]
```

Define a procedure `cnf`

of type `formula -> cnf`

which puts a formula into conjunctive normal form. It might be a good idea to first define a function of type `bool -> formula -> cnf`

which puts a formula, or its negation (depending on the boolean argument), in cnf.