My favorite three games to implement when learning a new programming are: TicTacToe, Connect Four, and Minesweeper. Last month, I wrote a post about implementing TicTacToe in Python. This time I implemented “Connect Four” (also known as “Four in a Row” and “Four in a Line”). When implementing some AI for TicTacToe I chose the basic “minimax” algorithm without any optimization. This time I went a little further and improved the minimax algorithm by adding a “horizon”, “alpha-beta pruning”, a “null-window”, “iterative deepening”, and a “transposition table” turning the minimax into a “Memory-enhanced Test Driver” also known as: MTD(f).
MTD(f) = minimax + horizon + alpha-beta pruning + null-window + iterative deepening + transposition table
MTD(f), is a higly optimized minimax algorithm. It was developed in 1994 by Aske Plaat, Jonathan Schaeffer, Wim Pijls, and Arie de Bruin. The name MTD(f) is an abbreviation for MTD(n,f) (Memory-enhanced Test Driver with node n and value f). It is a variant of minimax and an alternative to the alpha-beta pruning algorithm.
Minimax
The minimax theorem states (source: Wikipedia)
For every two-person, zero-sum game with finitely many strategies, there exists a value V and a mixed strategy for each player, such that:
- (a) Given player 2’s strategy, the best payoff possible for player 1 is V, and
(b) Given player 1’s strategy, the best payoff possible for player 2 is −V.
A game with just two possible moves on each turn will have a search tree as shown in the above picture. The circles represent possible moves by the current player (max), while the squares represent the opponents moves (min).
Horizon
One could limit the tree to a look-ahead of four moves to limit the time the AI is “thinking”. This is done in the tree above. This is called the “horizon effect”, since the algorithm will not look any further. On the 4th level a “heuristic” function will be used to score the nodes. The better the heuristic function, the better the AI will play. Values for won and lost games are typically +infinity and -infinity, but in MTD(f) they should probably be a valid integer, like +1000 and -1000.
Alpha-beta pruning
Using alpha-beta pruning one can reduce that amount of nodes that have to be visited by the minimax algorithm by keep track of the best and worst values. If it finds a move to be worse than a previously visited move it will not evaluate that move any further. Effectively the algorithm “prunes” away the branches of the search tree that do not influence the final decision. Below is an example of a search tree in which the moves are evaluated from left to right. The red marks are where the branches are pruned away, so that the gray branches are not evaluated.
Null window
The concept of a “null window” is also known as a “zero”, “scout” or “narrow” window. This is a way to reduce the search space by performing a series of boolean tests, thereby searching for a move that produces a worse or better score. With integers a null-window is defined as alpha == beta – 1. Note that a null window search does not return an exact score, but either an upper bound with score < beta (score <= alpha) or a lower bound with score >= beta (score > alpha). It is kind of hard to explain and I believe that if you want to truly understand null window you should read the code.
Iterative deepening
Iterative deepening means that we limit the search time to some fixed amount and then we evaluate the nodes in the search tree. We start evaluating with a horizon or look ahead of one move, then we look ahead for two moves, then three, etc. Whenever we hit the time limit we continue the search with the current look ahead value, but stop afterwards. This makes the AI “thinking” time more or less constant and the look ahead variable, but dependent on the (remaining) size of the tree. This way we avoid that the AI is very slow for the first moves (when the search tree is still large) and very fast at the end (when there are not many possible moves left).
Transposition table
A transposition table stores the outcome (upper and lower bound scores) of the evaluation of a board state in a hash table. This allows to reduce the search space if a single board state can be reached with different sequences of moves. The transposition table also stores the mirrored board state, since these are supposed to be equally scored. This may greatly reduce the size of the search tree in the early stage of the game.
Connect Four Python source
# coding=UTF8 from Tkinter import Tk, Button, Frame, Canvas from tkFont import Font from copy import deepcopy from time import time class Board: nodes = {} def __init__(self,other=None): self.player = 'X' self.opponent = 'O' self.empty = '.' self.width = 7 self.height = 6 self.fields = {} for y in range(self.height): for x in range(self.width): self.fields[x,y] = self.empty # copy constructor if other: self.__dict__ = deepcopy(other.__dict__) def move(self,x): board = Board(self) for y in range(board.height): if board.fields[x,y] == board.empty: board.fields[x,y] = board.player break board.player,board.opponent = board.opponent,board.player return board def __heuristic(self): return self.__heuristic_score(self.player)-self.__heuristic_score(self.opponent) def __heuristic_score(self, player): lines = self.__winlines(player) winpositions = self.__winpositions(lines,player) score = 0 for x in range(self.width): for y in range(self.height-1,0,-1): win = winpositions.get("{0},{1}".format(x,y),False) below = winpositions.get("{0},{1}".format(x,y-1),False) if win and below: score+=self.height-y*100 for line in lines: pieces = 0 height = [] for x,y in line: if self.fields[x,y] == player: pieces = pieces + 1 elif self.fields[x,y] == self.empty: height.append(y) heightscore = self.height - int(sum(height) / float(len(height))) score=score+pieces*heightscore return score def __winpositions(self, lines, player): lines = self.__winlines(player) winpositions = {} for line in lines: pieces = 0 empty = None for x,y in line: if self.fields[x,y] == player: pieces = pieces + 1 elif self.fields[x,y] == self.empty: if not empty == None: break empty = (x,y) if pieces==3: winpositions["{0},{1}".format(x,y)]=True return winpositions def __winlines(self, player): lines = [] # horizontal for y in range(self.height): winning = [] for x in range(self.width): if self.fields[x,y] == player or self.fields[x,y] == self.empty: winning.append((x,y)) if len(winning) >= 4: lines.append(winning[-4:]) else: winning = [] # vertical for x in range(self.width): winning = [] for y in range(self.height): if self.fields[x,y] == player or self.fields[x,y] == self.empty: winning.append((x,y)) if len(winning) >= 4: lines.append(winning[-4:]) else: winning = [] # diagonal winning = [] for cx in range(self.width-1): sx,sy = max(cx-2,0),abs(min(cx-2,0)) winning = [] for cy in range(self.height): x,y = sx+cy,sy+cy if x<0 or y<0 or x>=self.width or y>=self.height: continue if self.fields[x,y] == player or self.fields[x,y] == self.empty: winning.append((x,y)) if len(winning) >= 4: lines.append(winning[-4:]) else: winning = [] # other diagonal winning = [] for cx in range(self.width-1): sx,sy = self.width-1-max(cx-2,0),abs(min(cx-2,0)) winning = [] for cy in range(self.height): x,y = sx-cy,sy+cy if x<0 or y<0 or x>=self.width or y>=self.height: continue if self.fields[x,y] == player or self.fields[x,y] == self.empty: winning.append((x,y)) if len(winning) >= 4: lines.append(winning[-4:]) else: winning = [] # return return lines def __iterative_deepening(self,think): g = (3,None) start = time() for d in range(1,10): g = self.__mtdf(g, d) if time()-start>think: break return g; def __mtdf(self, g, d): upperBound = +1000 lowerBound = -1000 best = g while lowerBound < upperBound: if g[0] == lowerBound: beta = g[0]+1 else: beta = g[0] g = self.__minimax(True, d, beta-1, beta) if g[1]!=None: best = g if g[0] < beta: upperBound = g[0] else: lowerBound = g[0] return best def __minimax(self, player, depth, alpha, beta): lower = Board.nodes.get(str(self)+str(depth)+'lower',None) upper = Board.nodes.get(str(self)+str(depth)+'upper',None) if lower != None: if lower >= beta: return (lower,None) alpha = max(alpha,lower) if upper != None: if upper <= alpha: return (upper,None) beta = max(beta,upper) if self.won(): if player: return (-999,None) else: return (+999,None) elif self.tied(): return (0,None) elif depth==0: return (self.__heuristic(),None) elif player: best = (alpha,None) for x in range(self.width): if self.fields[x,self.height-1]==self.empty: value = self.move(x).__minimax(not player,depth-1,best[0],beta)[0] if value>best[0]: best = value,x if value>beta: break else: best = (beta,None) for x in range(self.width): if self.fields[x,self.height-1]==self.empty: value = self.move(x).__minimax(not player,depth-1,alpha,best[0])[0] if value<best[0]: best = value,x if alpha>value: break if best[0] <= alpha: Board.nodes[str(self)+str(depth)+"upper"] = best[0] Board.nodes[self.__mirror()+str(depth)+"upper"] = best[0] elif best[0] >= beta: Board.nodes[str(self)+str(depth)+"lower"] = best[0] Board.nodes[self.__mirror()+str(depth)+"lower"] = best[0] return best def best(self): return self.__iterative_deepening(2)[1] def tied(self): for (x,y) in self.fields: if self.fields[x,y]==self.empty: return False return True def won(self): # horizontal for y in range(self.height): winning = [] for x in range(self.width): if self.fields[x,y] == self.opponent: winning.append((x,y)) if len(winning) == 4: return winning else: winning = [] # vertical for x in range(self.width): winning = [] for y in range(self.height): if self.fields[x,y] == self.opponent: winning.append((x,y)) if len(winning) == 4: return winning else: winning = [] # diagonal winning = [] for cx in range(self.width-1): sx,sy = max(cx-2,0),abs(min(cx-2,0)) winning = [] for cy in range(self.height): x,y = sx+cy,sy+cy if x<0 or y<0 or x>=self.width or y>=self.height: continue if self.fields[x,y] == self.opponent: winning.append((x,y)) if len(winning) == 4: return winning else: winning = [] # other diagonal winning = [] for cx in range(self.width-1): sx,sy = self.width-1-max(cx-2,0),abs(min(cx-2,0)) winning = [] for cy in range(self.height): x,y = sx-cy,sy+cy if x<0 or y<0 or x>=self.width or y>=self.height: continue if self.fields[x,y] == self.opponent: winning.append((x,y)) if len(winning) == 4: return winning else: winning = [] # default return None def __mirror(self): string = '' for y in range(self.height): for x in range(self.width): string+=' '+self.fields[self.width-1-x,self.height-1-y] string+="\n" return string def __str__(self): string = '' for y in range(self.height): for x in range(self.width): string+=' '+self.fields[x,self.height-1-y] string+="\n" return string class GUI: def __init__(self): self.app = Tk() self.app.title('Connect4') self.app.resizable(width=False, height=False) self.board = Board() self.buttons = {} self.frame = Frame(self.app, borderwidth=1, relief="raised") self.tiles = {} for x in range(self.board.width): handler = lambda x=x: self.move(x) button = Button(self.app, command=handler, font=Font(family="Helvetica", size=14), text=x+1) button.grid(row=0, column=x, sticky="WE") self.buttons[x] = button self.frame.grid(row=1, column=0, columnspan=self.board.width) for x,y in self.board.fields: tile = Canvas(self.frame, width=60, height=50, bg="navy", highlightthickness=0) tile.grid(row=self.board.height-1-y, column=x) self.tiles[x,y] = tile handler = lambda: self.reset() self.restart = Button(self.app, command=handler, text='reset') self.restart.grid(row=2, column=0, columnspan=self.board.width, sticky="WE") self.update() def reset(self): self.board = Board() self.update() def move(self,x): self.app.config(cursor="watch") self.app.update() self.board = self.board.move(x) self.update() move = self.board.best() if move!=None: self.board = self.board.move(move) self.update() self.app.config(cursor="") def update(self): for (x,y) in self.board.fields: text = self.board.fields[x,y] if (text=='.'): self.tiles[x,y].create_oval(10, 5, 50, 45, fill="black", outline="blue", width=1) if (text=='X'): self.tiles[x,y].create_oval(10, 5, 50, 45, fill="yellow", outline="blue", width=1) if (text=='O'): self.tiles[x,y].create_oval(10, 5, 50, 45, fill="red", outline="blue", width=1) for x in range(self.board.width): if self.board.fields[x,self.board.height-1]==self.board.empty: self.buttons[x]['state'] = 'normal' else: self.buttons[x]['state'] = 'disabled' winning = self.board.won() if winning: for x,y in winning: self.tiles[x,y].create_oval(25, 20, 35, 30, fill="black") for x in range(self.board.width): self.buttons[x]['state'] = 'disabled' def mainloop(self): self.app.mainloop() if __name__ == '__main__': GUI().mainloop()
Links
http://en.wikipedia.org/wiki/MTD-f
http://chessprogramming.wikispaces.com/MTD%28f%29
http://people.csail.mit.edu/plaat/mtdf.html
The post Python Connect4 AI with MTD(f) algorithm appeared first on LeaseWeb labs.