|
In order to provide some illustrations of the use of the IBM CNF Benchmark (from http://www.haifa.il.ibm.com/projects/verification/RB_Homepage/bmcbenchmarks.html), we ran zChaff and berkmin561 on it. We used the default configuration for both engines.
For each model, we used SAT_dat.k.cnf with k=1, 10, 15, 20, 25, 30, 35, 40, 45, 50. The time-out was set at 10 000 sec on a workstation with 867841X Intel(R) Xeon(TM) CPU 2.40GHz.
Here is a summary of the results we achieved:
| |
zchaff |
berkmin561 |
| Total Time (10000 sec timeouts) |
344765 |
414094 |
| First (# of CNF where the engine is the fastest) |
298 |
131 |
| Timeout Number |
25 |
30 |
| + (# of CNF where the engine is the fastest by more than a minute and 20%) |
67 |
32 |
| First by Model (# of models where the engine is the fastest) |
28 |
18 |
For complete details, see the attached ziped Excel file.
The CNFs are available on line; here is some information about their sizes.
The results shows that zChaff and Berkmin561 achieved close results. On some CNFs, zChaff runs faster, and on others, CNFs Berkmin561 runs faster. In most cases, the differences between their performance is not very significant (the faster speed is not faster by more than one minute or 20%). However, while zChaff seems to perform slightly better overall, Berkmin561 gets better results than zChaff on the UNSAT CNFs.
From these results, it is not possible to conclude that berkmin561 performs better than zChaff. This is not consistent with what can be read in the literature. The explanation could be in hardware differences (e.g., cache size) or in the benchmark composition (as we saw, the ranking between zChaff and Berkmin561 is very data dependant).
We believe that this experiment clearly shows the added value of using the IBM Benchmark for SAT tool evaluation.
We extended our comparison to siege_v4 (with 123456789 as a seed). Here is a summary of the results achieved:
| |
Total Time (including 10000sec timeout) |
#timeout |
| zChaff |
345000 sec |
25 |
| Berkmin561 |
414000 sec |
30 |
| siege_v4 |
187000 sec |
14 |
Siege_v4 is the fastest in 298 cases out of 498.
In many hard cases, siege_v4 is fastest by an order of magnitude or more. In some cases, siege_v4 performs worse than zchaff or berkmin in a significant way (e.g., for 26_rule, siege_v4 is slower than zchaff by an order of magnitude and slower than berkmin by two orders of magnitude).
For complete details, see this zipped Excel file.
We made some experiments with zChaff II (zChaff 2004.5.13) on the IBM public benchmark.
Here are some results (zChaff II code was left unchanged, so it used random seeds)
| |
Total Time (in seconds) |
| zChaff |
344 764 |
| berkmin561 |
414 035 |
| siege_v4 |
197 239 |
| zChaff II |
389 304 |
zChaff II is faster than:
- zChaff for 229 CNFs (out of 442)
- Berkmin561 for 208 CNFs (out of 442)
- Siege_v4 for 82 CNFs (out of 442)
The details of the results are available in a zipped Excel file.
All in all, it is difficult to conclude that zChaff II performs better than zChaff. Besides, Siege_v4 performs significantly better than zChaff II on the IBM CNF Benchmark.
|