C++ and the library from lingeling
Summary: A new approach, no new solutions, a nice program to play
with, and some interesting results of local non-improvability of the
known solutions. Oh, and some generally useful observations.
Using a SAT
based approach, I could completely
solve
the similar problem for 4x4 mazes with blocked cells instead of thin walls
and fixed start and exit positions at opposite corners.
So I hoped to be able to use the same ideas for this problem.
However, even though for the other problem I only used 2423 mazes
(in the meantime it has been observed that 2083 are enough) and it has
a solution of length 29, the SAT encoding used millions of variables
and solving it took days.
So I decided to change the approach in two important ways:
- Don't insist on searching a solution from scratch, but allow to fix
a part of the solution string. (That's easy to do anyway by adding unit
clauses, but my program makes it comfortable to do.)
- Don't use all mazes from the beginning. Instead, incrementally add one
unsolved maze at a time. Some mazes may be solved by chance, or they are
always solved when the ones already considered are solved. In the latter
case, it will never be added, without us needing to know the implication.
I also did some optimizations to use less variables and unit clauses.
The program is based on @orlp's.
An important change was the selection of mazes:
- First of all, mazes are given by their wall structure and the
start position only. (They also store the reachable positions.)
The function
is_solution
checks if all reachable positions are reached.
- (Unchanged: still not using mazes with only 4 or less reachable positions.
But most of them would be thrown away anyway by the following observations.)
- If a maze does not use any of the three top cells, it is equivalent to a
maze that is shifted up. So we can drop it. Likewise for a maze that does
not use any of the three left cells.
- It doesn't matter if unreachable parts are connected, so we insist that
each unreachable cell is completely surrounded by walls.
- A single path maze that is a submaze of a bigger single path maze is always
solved when the bigger one is solved, so we don't need it. Each single path
maze of size at most 7 is part of a bigger one (still fitting in 3x3), but
there are size 8 single path mazes that aren't. For simpliciy, let's just
drop single path mazes of size less than 8. (And I'm still using that only
the extreme points need to be considered as start positions.
All positions are used as exit positions, which only matters for the SAT
part of the program.)
In this way, I get a total of 10772 mazes with start positions.
Here is the program:
#include <algorithm>
#include <array>
#include <bitset>
#include <cstring>
#include <iostream>
#include <set>
#include <vector>
#include <limits>
#include <cassert>
extern "C"{
#include "lglib.h"
}
// reusing a lot of @orlp's ideas and code
enum { N = -8, W = -2, E = 2, S = 8 };
static const int encoded_pos[] = {8, 10, 12, 16, 18, 20, 24, 26, 28};
static const int wall_idx[] = {9, 11, 12, 14, 16, 17, 19, 20, 22, 24, 25, 27};
static const int move_offsets[] = { N, E, S, W };
static const uint32_t toppos = 1ull << 8 | 1ull << 10 | 1ull << 12;
static const uint32_t leftpos = 1ull << 8 | 1ull << 16 | 1ull << 24;
static const int unencoded_pos[] = {0,0,0,0,0,0,0,0,0,0,1,0,2,0,0,0,3,
0,4,0,5,0,0,0,6,0,7,0,8};
int do_move(uint32_t walls, int pos, int move) {
int idx = pos + move / 2;
return walls & (1ull << idx) ? pos + move : pos;
}
struct Maze {
uint32_t walls, reach;
int start;
Maze(uint32_t walls=0, uint32_t reach=0, int start=0):
walls(walls),reach(reach),start(start) {}
bool is_dummy() const {
return (walls==0);
}
std::size_t size() const{
return std::bitset<32>(reach).count();
}
std::size_t simplicity() const{ // how many potential walls aren't there?
return std::bitset<32>(walls).count();
}
};
bool cmp(const Maze& a, const Maze& b){
auto asz = a.size();
auto bsz = b.size();
if (asz>bsz) return true;
if (asz<bsz) return false;
return a.simplicity()<b.simplicity();
}
uint32_t reachable(uint32_t walls) {
static int fill[9];
uint32_t reached = 0;
uint32_t reached_relevant = 0;
for (int start : encoded_pos){
if ((1ull << start) & reached) continue;
uint32_t reached_component = (1ull << start);
fill[0]=start;
int count=1;
for(int i=0; i<count; ++i)
for(int m : move_offsets) {
int newpos = do_move(walls, fill[i], m);
if (reached_component & (1ull << newpos)) continue;
reached_component |= 1ull << newpos;
fill[count++] = newpos;
}
if (count>1){
if (reached_relevant)
return 0; // more than one nonsingular component
if (!(reached_component & toppos) || !(reached_component & leftpos))
return 0; // equivalent to shifted version
if (std::bitset<32>(reached_component).count() <= 4)
return 0;
reached_relevant = reached_component;
}
reached |= reached_component;
}
return reached_relevant;
}
void enterMazes(uint32_t walls, uint32_t reached, std::vector<Maze>& mazes){
int max_deg = 0;
uint32_t ends = 0;
for (int pos : encoded_pos)
if (reached & (1ull << pos)) {
int deg = 0;
for (int m : move_offsets) {
if (pos != do_move(walls, pos, m))
++deg;
}
if (deg == 1)
ends |= 1ull << pos;
max_deg = std::max(deg, max_deg);
}
uint32_t starts = reached;
if (max_deg == 2){
if (std::bitset<32>(reached).count() <= 7)
return; // small paths are redundant
starts = ends; // need only start at extremal points
}
for (int pos : encoded_pos)
if ( starts & (1ull << pos))
mazes.emplace_back(walls, reached, pos);
}
std::vector<Maze> gen_valid_mazes() {
std::vector<Maze> mazes;
for (int maze_id = 0; maze_id < (1 << 12); maze_id++) {
uint32_t walls = 0;
for (int i = 0; i < 12; ++i)
if (maze_id & (1 << i))
walls |= 1ull << wall_idx[i];
uint32_t reached=reachable(walls);
if (!reached) continue;
enterMazes(walls, reached, mazes);
}
std::sort(mazes.begin(),mazes.end(),cmp);
return mazes;
};
bool is_solution(const std::vector<int>& moves, Maze& maze) {
int pos = maze.start;
uint32_t reached = 1ull << pos;
for (auto move : moves) {
pos = do_move(maze.walls, pos, move);
reached |= 1ull << pos;
if (reached == maze.reach) return true;
}
return false;
}
std::vector<int> str_to_moves(std::string str) {
std::vector<int> moves;
for (auto c : str) {
switch (c) {
case 'N': moves.push_back(N); break;
case 'E': moves.push_back(E); break;
case 'S': moves.push_back(S); break;
case 'W': moves.push_back(W); break;
}
}
return moves;
}
Maze unsolved(const std::vector<int>& moves, std::vector<Maze>& mazes) {
int unsolved_count = 0;
Maze problem{};
for (Maze m : mazes)
if (!is_solution(moves, m))
if(!(unsolved_count++))
problem=m;
if (unsolved_count)
std::cout << "unsolved: " << unsolved_count << "\n";
return problem;
}
LGL * lgl;
constexpr int TRUELIT = std::numeric_limits<int>::max();
constexpr int FALSELIT = -TRUELIT;
int new_var(){
static int next_var = 1;
assert(next_var<TRUELIT);
return next_var++;
}
bool lit_is_true(int lit){
int abslit = lit>0 ? lit : -lit;
bool res = (abslit==TRUELIT) || (lglderef(lgl,abslit)>0);
return lit>0 ? res : !res;
}
void unsat(){
std::cout << "Unsatisfiable!\n";
std::exit(1);
}
void clause(const std::set<int>& lits){
if (lits.find(TRUELIT) != lits.end())
return;
for (int lit : lits)
if (lits.find(-lit) != lits.end())
return;
int found=0;
for (int lit : lits)
if (lit != FALSELIT){
lgladd(lgl, lit);
found=1;
}
lgladd(lgl, 0);
if (!found)
unsat();
}
void at_most_one(const std::set<int>& lits){
if (lits.size()<2)
return;
for(auto it1=lits.cbegin(); it1!=lits.cend(); ++it1){
auto it2=it1;
++it2;
for( ; it2!=lits.cend(); ++it2)
clause( {- *it1, - *it2} );
}
}
/* Usually, lit_op(lits,sgn) creates a new variable which it returns,
and adds clauses that ensure that the variable is equivalent to the
disjunction (if sgn==1) or the conjunction (if sgn==-1) of the literals
in lits. However, if this disjunction or conjunction is constant True
or False or simplifies to a single literal, that is returned without
creating a new variable and without adding clauses. */
int lit_op(std::set<int> lits, int sgn){
if (lits.find(sgn*TRUELIT) != lits.end())
return sgn*TRUELIT;
lits.erase(sgn*FALSELIT);
if (!lits.size())
return sgn*FALSELIT;
if (lits.size()==1)
return *lits.begin();
int res=new_var();
for(int lit : lits)
clause({sgn*res,-sgn*lit});
for(int lit : lits)
lgladd(lgl,sgn*lit);
lgladd(lgl,-sgn*res);
lgladd(lgl,0);
return res;
}
int lit_or(std::set<int> lits){
return lit_op(lits,1);
}
int lit_and(std::set<int> lits){
return lit_op(lits,-1);
}
using A4 = std::array<int,4>;
void add_maze_conditions(Maze m, std::vector<A4> dirs, int len){
int mp[9][2];
int rp[9];
for(int p=0; p<9; ++p)
if((1ull << encoded_pos[p]) & m.reach)
rp[p] = mp[p][0] = encoded_pos[p]==m.start ? TRUELIT : FALSELIT;
int t=0;
for(int i=0; i<len; ++i){
std::set<int> posn {};
for(int p=0; p<9; ++p){
int ep = encoded_pos[p];
if((1ull << ep) & m.reach){
std::set<int> reach_pos {};
for(int d=0; d<4; ++d){
int np = do_move(m.walls, ep, move_offsets[d]);
reach_pos.insert( lit_and({mp[unencoded_pos[np]][t],
dirs[i][d ^ ((np==ep)?0:2)] }));
}
int pl = lit_or(reach_pos);
mp[p][!t] = pl;
rp[p] = lit_or({rp[p], pl});
posn.insert(pl);
}
}
at_most_one(posn);
t=!t;
}
for(int p=0; p<9; ++p)
if((1ull << encoded_pos[p]) & m.reach)
clause({rp[p]});
}
void usage(char* argv0){
std::cout << "usage: " << argv0 <<
" <string>\n where <string> consists of 'N', 'E', 'S', 'W' and '*'.\n" ;
std::exit(2);
}
const std::string nesw{"NESW"};
int main(int argc, char** argv) {
if (argc!=2)
usage(argv[0]);
std::vector<Maze> mazes = gen_valid_mazes();
std::cout << "Mazes with start positions: " << mazes.size() << "\n" ;
lgl = lglinit();
int len = std::strlen(argv[1]);
std::cout << argv[1] << "\n with length " << len << "\n";
std::vector<A4> dirs;
for(int i=0; i<len; ++i){
switch(argv[1][i]){
case 'N':
dirs.emplace_back(A4{TRUELIT,FALSELIT,FALSELIT,FALSELIT});
break;
case 'E':
dirs.emplace_back(A4{FALSELIT,TRUELIT,FALSELIT,FALSELIT});
break;
case 'S':
dirs.emplace_back(A4{FALSELIT,FALSELIT,TRUELIT,FALSELIT});
break;
case 'W':
dirs.emplace_back(A4{FALSELIT,FALSELIT,FALSELIT,TRUELIT});
break;
case '*': {
dirs.emplace_back();
std::generate_n(dirs[i].begin(),4,new_var);
std::set<int> dirs_here { dirs[i].begin(), dirs[i].end() };
at_most_one(dirs_here);
clause(dirs_here);
for(int l : dirs_here)
lglfreeze(lgl,l);
break;
}
default:
usage(argv[0]);
}
}
int maze_nr=0;
for(;;) {
std::cout << "Solving...\n";
int res=lglsat(lgl);
if(res==LGL_UNSATISFIABLE)
unsat();
assert(res==LGL_SATISFIABLE);
std::string sol(len,' ');
for(int i=0; i<len; ++i)
for(int d=0; d<4; ++d)
if (lit_is_true(dirs[i][d])){
sol[i]=nesw[d];
break;
}
std::cout << sol << "\n";
Maze m=unsolved(str_to_moves(sol),mazes);
if (m.is_dummy()){
std::cout << "That solves all!\n";
return 0;
}
std::cout << "Adding maze " << ++maze_nr << ": " <<
m.walls << "/" << m.start <<
" (" << m.size() << "/" << 12-m.simplicity() << ")\n";
add_maze_conditions(m,dirs,len);
}
}
First configure.sh
and make
the lingeling
solver, then compile the
program with something like
g++ -std=c++11 -O3 -I ... -o m3sat m3sat.cc -L ... -llgl
, where ...
is the
path where lglib.h
resp. liblgl.a
are, so both could for example be
../lingeling-<version>
.
Or just put them in the same directory and do
without the -I
and -L
options.
The program takes one mandatory command line argument, a string consisting
of N
, E
, S
, W
(for fixed directions) or *
. So you could search
for a general solution of size 78 by giving a string of 78 *
s (in quotes),
or search for a solution starting with NEWS
by using NEWS
followed by
as many *
s as you want for additional steps. As a first test, take your
favourite solution and replace some of the letters with *
.
This finds a solution fast for a surprisingly high value of "some".
The program will tell which maze it adds, described by wall structure and
start position, and also give the number of reachable positions and walls.
The mazes are sorted by these criteria, and the first unsolved one is added.
Therefore most added mazes have (9/4)
, but sometimes others appear as well.
I took the known solution of length 79, and for each group of adjacent 26
letters, tried to replace them with any 25 letters. I also tried to remove
13 letters from the beginning and from the end, and replace them by any 13 at the
beginning and any 12 at the end, and vice versa. Unfortunately, it all came
out unsatisfiable. So, can we take this as indicator that length 79 is
optimal? No, I similarly tried to improve the length 80 solution to length 79,
and that was also not successful.
Finally, I tried combining the beginning of one solution with the end of the
other, and also with one solution transformed by one of the symmetries.
Now I'm running out of interesting ideas, so I decided to show you what I
have, even though it didn't lead to new solutions.
3This is a really neat question. Is there one shortest string (or a number of strings and a proof that there can't be any shorter answers)? And if so, do you know it? – Alex Van Liew – 2015-07-31T18:43:52.473
@AlexVanLiew Thanks :) For a 2 by 2 maze the shortest possible string is 11 characters. I chose 3 by 3 for the question to make it unlikely that an optimal solution will be found. Judging by the results for the 2 by 2 case (which can be brute forced), there will be a shortest possible length, with a number of different valid strings of that length. – trichoplax – 2015-07-31T19:33:41.710
And no, I don't have any idea how long the shortest possible string will be for 3 by 3 mazes. More than 11, and less than 86... You could increase that lower bound by brute force, but my suspicion is that the shortest string will be too long to be found that way. – trichoplax – 2015-07-31T19:38:24.747
Huh. If I was better at this sort of thing I would give it a shot, but I'm not good enough with optimization and low-level speed and the like to beat orlp's submission. I hope you get somebody else to post something interesting! – Alex Van Liew – 2015-07-31T21:12:14.887
@AlexVanLiew optimisation is one way to approach it. I'm also interested to see if anyone finds some kind of pattern that allows for reducing the length without having to process millions of candidates... – trichoplax – 2015-07-31T21:27:42.123
Can the exit be the center cell? – Alex Reinking – 2015-08-03T19:34:01.157
1@AlexReinking yes the start can be any of the 9 cells and the exit can be any of the 9 cells, as long as they are not the same cell, and the exit is reachable from the start. – trichoplax – 2015-08-03T23:30:32.273
1
Slightly similar to this stackoverflow question: http://stackoverflow.com/questions/26910401/solve-all-4x4-mazes-simultaneously-with-least-moves - but start and end cell are top left and bottom right in that one, which reduces the possible maze count to 2423.
– schnaader – 2015-08-04T12:40:55.507@schnaader that's an interesting question you link to! In addition to start and end being fixed, the walls are filled cells in that question, rather than having walls along cell edges. Some of the techniques discussed there might still be applicable here though... – trichoplax – 2015-08-04T18:53:13.473
why is this specific to 3? in this case my code could just be "print NENENNNENENE" or something. why wouldn't the question be to all numbers, and be scored based on the string of length 3? – proud haskeller – 2015-08-04T19:39:43.653
1@proudhaskeller either way would be a valid question. The general case, scored for n=3, would require more generalised code. This specific case allows for optimisations that do not apply to general n, and that's the way I chose to ask it. – trichoplax – 2015-08-04T21:24:33.920
@proudhaskeller sorry I missed the point of your comment initially - you mean someone could hardcode a string rather than submitting code that finds one. If someone did, I would downvote and raise it on meta, as I am certain that they would need to have code elsewhere that they omitted from the answer in order to find a competitive string. A valid answer requires a string plus the code that produced it. – trichoplax – 2015-08-05T10:35:44.503
2Has anyone considered approaching this problem as finding the shortest accepted string to a regular expression? It would require a LOT of reductions in the number of problems before converting to regexes, but could theoretically find a verifiably optimal solution. – Kyle McCormick – 2015-08-09T03:47:38.123
The link to the python 3 validator is dead. – NonlinearFruit – 2017-05-19T23:04:02.810
@NonlinearFruit thanks for letting me know - I've fixed the link now – trichoplax – 2017-05-19T23:16:47.863
@AlexVanLiew, if NP-complete problems can't be effectively solved, almost surely optimal solution won't be found. – rus9384 – 2017-09-13T10:40:43.167
Im excited about this puzzle. it's probably going to be one of the first things we can solve via quantum computing. you only need 2*n qbits where n is the number of instructions that you want to check. – tuskiomi – 2018-10-15T14:56:08.177
@tuskiomi so as quantum computers progress we'll get increasing lower bounds on the optimal solution until eventually we find it... – trichoplax – 2018-10-15T19:31:35.723
@trichoplax yep, and then (given an average computational time), we would only need to perform 2 checks per Qbit. it's very computationally lax for a quantum computer, but for a classical machine, it's very hard to do, even with current technology. – tuskiomi – 2018-10-15T19:36:19.757
Can we call out to the validator you've written, in our code? – Stackstuck – 2019-03-25T16:42:02.553
@Stackstuck this is not code-golf, so you can include my validator directly in your code - there's no need to make it short. However, the validator is intended for checking a single solution with human readable output - it isn't fast enough to check large numbers of candidates in a practical time. Go ahead and use and modify it however you wish, but the existing answers managed to find ways of optimising this check dramatically, so looking at them will probably be more useful than looking at mine... :) – trichoplax – 2019-03-25T21:36:49.393
Yeah, I have absolutely no idea how they're doing it. I was just planning on exhaustively searching. – Stackstuck – 2019-03-26T02:10:38.590