1 Purpose

Learn and understand the AlphaZero algorithm.

Related study materials: https://github.com/chiefzzs/alphago_learnning/

Reference: https://github.com/junxiaosong/AlphaZero_Gomoku

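2 Learning approach

| No. | Step | Code |
|---|---|---|
| 1 | Understand the game-playing process | Code reference |
| 2 | Understand the game-playing algorithm | |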

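3 Game-playing process

| No. | L1 step | L2 step | L3 step | L4 step |
|---|---|---|---|---|
| 1 | Initialization | | | |
| 2 | Game play | | | |
| 2.1 | | Position analysis | | |
| 2.1.1 | | | Monte Carlo playouts | |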

3.1 Initialization

Define the board
Define the game
Define the players
Bind the players to the board

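# Imports, assuming the module layout of the referenced AlphaZero_Gomoku repository
import numpy as np
from game import Board, Game
from mcts_alphaZero import MCTSPlayer
from mcts_pure import MCTSPlayer as MCTS_Pure
from policy_value_net_numpy import PolicyValueNetNumpy
from human_play import Human

# Variable definitions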
n = 5
width, height = 8, 8
model_file =  'best_policy_8_8_5.model'

# Initialise the board and the game
board = Board(width=width, height=height, n_in_row=n)
game = Game(board)

# ############### human VS AI ###################
# load the trained policy_value_net in either Theano/Lasagne, PyTorch or TensorFlow

# best_policy = PolicyValueNet(width, height, model_file = model_file)
# mcts_player = MCTSPlayer(best_policy.policy_value_fn, c_puct=5, n_playout=400)

# load the provided model (trained in Theano/Lasagne) into a MCTS player written in pure numpy

# Load the policy-value network
best_policy = PolicyValueNetNumpy(width, height, model_file)
# Create player 1 from the network-guided MCTS
mcts_player1 = MCTSPlayer(best_policy.policy_value_fn,
                         c_puct=5,
                         n_playout=400)  # set larger n_playout for better performance
# Create player 2 from the network-guided MCTS
mcts_player2 = MCTSPlayer(best_policy.policy_value_fn,
                         c_puct=5,
                         n_playout=400)  

# uncomment the following line to play with pure MCTS (it's much weaker even with a larger n_playout)
# mcts_player = MCTS_Pure(c_puct=5, n_playout=1000)

# human player, input your move in the format: 2,3
human = Human()

# Bind the players to the board
player1 =  mcts_player1
player2 =  mcts_player2
start_player=0 
is_shown=1

game.board.init_board(start_player)
p1, p2 = game.board.players
player1.set_player_ind(p1)
player2.set_player_ind(p2)
players = {p1: player1, p2: player2}

3.1.1 Internal board state

[Diagram: `states` holds zero or more `state` entries; each entry pairs an `act` (move) with a `player`.]

        self.width = int(kwargs.get('width', 8))
        self.height = int(kwargs.get('height', 8))
        # board states stored as a dict,
        # key: move as location on the board,
        # value: player as pieces type
        self.states = {}
        # need how many pieces in a row to win
        self.n_in_row = int(kwargs.get('n_in_row', 5))
        self.players = [1, 2]  # player1 and player2

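3.1.2 External board state

The external `state` consists of four layers, each of size height*width, seen from the perspective of the current player:

layer0: moves of the current player
layer1: moves of the opponent
layer2: location of the last move
layer3: indicator of the colour to play (current player)
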
    def current_state(self):
        """return the board state from the perspective of the current player.
        state shape: 4*width*height
        """

        square_state = np.zeros((4, self.width, self.height))
        if self.states:
            moves, players = np.array(list(zip(*self.states.items())))
            move_curr = moves[players == self.current_player]
            move_oppo = moves[players != self.current_player]
            square_state[0][move_curr // self.width,
                            move_curr % self.height] = 1.0
            square_state[1][move_oppo // self.width,
                            move_oppo % self.height] = 1.0
            # indicate the last move location
            square_state[2][self.last_move // self.width,
                            self.last_move % self.height] = 1.0
        if len(self.states) % 2 == 0:
            square_state[3][:, :] = 1.0  # indicate the colour to play
        return square_state[:, ::-1, :]
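
To make the four planes concrete, here is a minimal standalone sketch of the same encoding. It assumes a square board; `encode_state` and its parameters are illustrative names, not part of the reference repository.

```python
import numpy as np

def encode_state(states, current_player, last_move, width, height):
    """Build the 4-plane state; `states` maps move index -> player id (1 or 2)."""
    planes = np.zeros((4, width, height))
    for move, player in states.items():
        row, col = move // width, move % width
        # Plane 0 holds the current player's stones, plane 1 the opponent's.
        plane = 0 if player == current_player else 1
        planes[plane, row, col] = 1.0
    if states:
        # Plane 2 marks the last move.
        planes[2, last_move // width, last_move % width] = 1.0
    if len(states) % 2 == 0:
        # Plane 3 is all ones when an even number of stones is on the board.
        planes[3, :, :] = 1.0
    # Flip the row axis, matching the [:, ::-1, :] in current_state.
    return planes[:, ::-1, :]

# Player 1 played move 27, player 2 played move 28; player 1 is to move.
states = {27: 1, 28: 2}
encoded = encode_state(states, current_player=1, last_move=28, width=8, height=8)
print(encoded.shape)
```

Move 27 sits at row 3, column 3 before the flip, so it ends up at row 4 of plane 0 after the row axis is reversed.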

3.2 Game play

Given the current position, obtain all candidate moves `acts` and their recommended probabilities `probs`.

## Compute the probabilities
current_player = game.board.get_current_player()
player_in_turn = players[current_player]
board=game.board
temp=1e-3
return_prob=0

sensible_moves = board.availables
move_probs = np.zeros(board.width*board.height)

## From the current position, get every candidate move and its probability
acts, probs = player_in_turn.mcts.get_move_probs(board, temp)
print(acts)
print(probs)

Output:
The recommended probability for each available position

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63)
[0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 5.91871068e-107
1.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 1.53059365e-139
2.92327048e-039 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000]
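
The near one-hot output follows from the small temperature: `softmax(log(visits) / temp)` is proportional to `visits ** (1 / temp)`, so with `temp = 1e-3` the most visited move takes almost all of the probability mass. A minimal sketch, assuming a numerically stable `softmax` and using illustrative visit counts (`visits_to_probs` is a hypothetical helper name):

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    probs = np.exp(x - np.max(x))
    return probs / np.sum(probs)

def visits_to_probs(visits, temp):
    """Turn root visit counts into move probabilities at temperature `temp`."""
    visits = np.asarray(visits, dtype=float)
    return softmax(1.0 / temp * np.log(visits + 1e-10))

visits = [157, 214, 164, 206]  # illustrative counts for four candidate moves
greedy = visits_to_probs(visits, temp=1e-3)       # almost one-hot
proportional = visits_to_probs(visits, temp=1.0)  # proportional to the counts
print(greedy.round(3), proportional.round(3))
```

With `temp = 1.0` the probabilities are simply the visit counts divided by their sum, which is the setting that keeps exploration alive during self-play.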

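3.2.1 Position analysis: deriving move probabilities from the current position

1. Run Monte Carlo playouts.
2. Use the Monte Carlo tree to count how often each action was visited.
3. Apply a temperature-scaled softmax to the log visit counts to obtain probabilities.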

state=board
temp=1e-3

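import copy
from mcts_alphaZero import softmax  # assumes the reference repository's module layout

# Monte Carlo playouts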
for n in range(player_in_turn.mcts._n_playout):
    state_copy = copy.deepcopy(state)
    player_in_turn.mcts._playout(state_copy)

# Compute the move probabilities from the visit counts at the root node
act_visits = [(act, node._n_visits)
              for act, node in player_in_turn.mcts._root._children.items()]
acts, visits = zip(*act_visits)

# Normalise with a temperature-scaled softmax
act_probs = softmax(1.0/temp * np.log(np.array(visits) + 1e-10))
print("---------acts----------")
print(acts)
print(visits)
print(act_probs)

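3.2.1.1 Analysing the Monte Carlo tree

Current position:

[Figure: current board position]
First-level results after the Monte Carlo search:

[Figure: first level of the tree after the Monte Carlo search]

Monte Carlo tree, rendered as a Mermaid diagram: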

[Mermaid diagram: the Monte Carlo tree after the playouts. Edges are labelled with actions (act18, act26, act44, ...) and nodes with visit counts; the first node shows visited:400.]

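def printNode(node, level=0, act=0):
    # Skip rarely visited nodes to keep the output readable
    if node._n_visits < 2:
        return
    indent = '\t' * level
    print("%s, act=%d,_n_visits=%d ,_Q=%f ,_u=%f ,_P=%f " %
          (indent, act, node._n_visits, node._Q, node._u, node._P))

    # Recurse into the children, one indentation level deeper
    for child_act, child in node._children.items():
        printNode(child, level + 1, child_act)

prn_obj(game.board)  # prn_obj: helper not shown in this post
printNode(player_in_turn.mcts._root)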

Monte Carlo tree, representation 1: indented text.

, act=0,_n_visits=800 ,_Q=-0.455971 ,_u=0.000000 ,_P=1.000000 
	, act=18,_n_visits=8 ,_Q=-0.185447 ,_u=0.781758 ,_P=0.049782 
		, act=27,_n_visits=3 ,_Q=-0.035337 ,_u=1.092187 ,_P=0.330246 
		, act=36,_n_visits=2 ,_Q=0.365520 ,_u=0.683974 ,_P=0.155111 
	, act=19,_n_visits=4 ,_Q=0.219712 ,_u=0.339472 ,_P=0.012010 
		, act=28,_n_visits=3 ,_Q=-0.214190 ,_u=1.373586 ,_P=0.475824 
			, act=27,_n_visits=2 ,_Q=0.232468 ,_u=1.274235 ,_P=0.360408 
	, act=20,_n_visits=2 ,_Q=0.080553 ,_u=0.501082 ,_P=0.010636 
	, act=21,_n_visits=12 ,_Q=0.058928 ,_u=0.458884 ,_P=0.042209 
		, act=27,_n_visits=2 ,_Q=-0.063028 ,_u=0.487490 ,_P=0.088190 
		, act=28,_n_visits=5 ,_Q=0.071194 ,_u=1.033736 ,_P=0.311683 
			, act=29,_n_visits=2 ,_Q=-0.497347 ,_u=1.085420 ,_P=0.217084 
		, act=35,_n_visits=2 ,_Q=-0.243935 ,_u=0.904106 ,_P=0.163559 
		, act=36,_n_visits=2 ,_Q=-0.152409 ,_u=0.580145 ,_P=0.104952 
	, act=26,_n_visits=2 ,_Q=-0.069493 ,_u=0.454502 ,_P=0.009647 
	, act=27,_n_visits=157 ,_Q=0.476385 ,_u=0.151632 ,_P=0.168441 
		, act=8,_n_visits=2 ,_Q=-0.329893 ,_u=0.081594 ,_P=0.003920 
		, act=9,_n_visits=2 ,_Q=-0.358136 ,_u=0.039543 ,_P=0.001900 
		, act=11,_n_visits=2 ,_Q=-0.359925 ,_u=0.206617 ,_P=0.009926 
		, act=18,_n_visits=6 ,_Q=-0.410891 ,_u=0.412434 ,_P=0.046230 
			, act=28,_n_visits=2 ,_Q=0.277199 ,_u=0.784231 ,_P=0.210431 
			, act=35,_n_visits=3 ,_Q=0.493989 ,_u=1.036746 ,_P=0.278188 
				, act=19,_n_visits=2 ,_Q=-0.307951 ,_u=1.972537 ,_P=0.557918 
		, act=20,_n_visits=18 ,_Q=-0.186914 ,_u=0.173711 ,_P=0.052850 
			, act=18,_n_visits=2 ,_Q=-0.205378 ,_u=0.827569 ,_P=0.120429 
			, act=19,_n_visits=2 ,_Q=0.284847 ,_u=0.473225 ,_P=0.068864 
			, act=28,_n_visits=8 ,_Q=0.161790 ,_u=0.918767 ,_P=0.401101 
				, act=29,_n_visits=6 ,_Q=0.063473 ,_u=1.567840 ,_P=0.711105 
					, act=26,_n_visits=2 ,_Q=-0.019465 ,_u=0.769820 ,_P=0.206564 
					, act=35,_n_visits=2 ,_Q=-0.016241 ,_u=0.788532 ,_P=0.141057 
			, act=36,_n_visits=4 ,_Q=0.281107 ,_u=1.072277 ,_P=0.208052 
				, act=18,_n_visits=2 ,_Q=0.176588 ,_u=1.850940 ,_P=0.427456 
		, act=21,_n_visits=2 ,_Q=-0.390710 ,_u=0.288127 ,_P=0.013841 
		, act=28,_n_visits=3 ,_Q=-0.636589 ,_u=0.534831 ,_P=0.034257 
			, act=36,_n_visits=2 ,_Q=0.785911 ,_u=1.563539 ,_P=0.442236 
		, act=29,_n_visits=4 ,_Q=-0.255149 ,_u=0.262146 ,_P=0.020988 
			, act=36,_n_visits=3 ,_Q=0.326441 ,_u=1.556929 ,_P=0.539336 
		, act=34,_n_visits=3 ,_Q=-0.541608 ,_u=0.435908 ,_P=0.027920 
			, act=35,_n_visits=2 ,_Q=0.636979 ,_u=1.690728 ,_P=0.478210 
		, act=35,_n_visits=3 ,_Q=-0.782964 ,_u=0.598944 ,_P=0.038363 
			, act=36,_n_visits=2 ,_Q=0.930136 ,_u=1.706458 ,_P=0.482659 
		, act=36,_n_visits=50 ,_Q=-0.550742 ,_u=0.570920 ,_P=0.457102 
			, act=20,_n_visits=2 ,_Q=-0.129428 ,_u=0.938021 ,_P=0.080402 
			, act=28,_n_visits=21 ,_Q=0.608117 ,_u=0.551983 ,_P=0.331190 
				, act=26,_n_visits=2 ,_Q=-0.587251 ,_u=0.623344 ,_P=0.083630 
				, act=29,_n_visits=16 ,_Q=-0.562059 ,_u=1.026563 ,_P=0.734549 
					, act=22,_n_visits=3 ,_Q=0.722939 ,_u=0.781887 ,_P=0.161506 
					, act=26,_n_visits=2 ,_Q=0.218605 ,_u=0.925751 ,_P=0.143417 
					, act=35,_n_visits=3 ,_Q=0.465863 ,_u=0.908840 ,_P=0.140797 
						, act=43,_n_visits=2 ,_Q=-0.373646 ,_u=2.861129 ,_P=0.809249 
					, act=43,_n_visits=7 ,_Q=0.671767 ,_u=0.728113 ,_P=0.300797 
						, act=35,_n_visits=6 ,_Q=-0.626132 ,_u=1.928775 ,_P=0.944903 
							, act=34,_n_visits=3 ,_Q=0.464121 ,_u=1.722878 ,_P=0.462297 
							, act=37,_n_visits=2 ,_Q=0.737424 ,_u=1.495213 ,_P=0.401208 
			, act=34,_n_visits=2 ,_Q=-0.332387 ,_u=0.668018 ,_P=0.057259 
			, act=35,_n_visits=24 ,_Q=0.645149 ,_u=0.532131 ,_P=0.380093 
				, act=19,_n_visits=2 ,_Q=-0.766992 ,_u=0.776434 ,_P=0.097139 
				, act=43,_n_visits=18 ,_Q=-0.593014 ,_u=0.916555 ,_P=0.688014 
					, act=19,_n_visits=2 ,_Q=0.045573 ,_u=1.172094 ,_P=0.170565 
					, act=28,_n_visits=2 ,_Q=0.529674 ,_u=0.760434 ,_P=0.110659 
					, act=29,_n_visits=10 ,_Q=0.654451 ,_u=0.819301 ,_P=0.397420 
						, act=28,_n_visits=9 ,_Q=-0.627666 ,_u=1.530749 ,_P=0.918450 
							, act=20,_n_visits=4 ,_Q=0.731742 ,_u=1.332107 ,_P=0.376777 
								, act=44,_n_visits=2 ,_Q=-0.581877 ,_u=1.257006 ,_P=0.435440 
							, act=44,_n_visits=4 ,_Q=0.441063 ,_u=1.321673 ,_P=0.467282 
					, act=50,_n_visits=3 ,_Q=0.718254 ,_u=0.629334 ,_P=0.122109 
		, act=43,_n_visits=5 ,_Q=-0.260930 ,_u=0.182409 ,_P=0.017525 
			, act=36,_n_visits=4 ,_Q=0.319099 ,_u=1.258602 ,_P=0.503441 
				, act=45,_n_visits=2 ,_Q=-0.302862 ,_u=1.650853 ,_P=0.381248 
		, act=45,_n_visits=3 ,_Q=-0.542640 ,_u=0.482945 ,_P=0.030933 
		, act=48,_n_visits=3 ,_Q=-0.135777 ,_u=0.084086 ,_P=0.005386 
			, act=36,_n_visits=2 ,_Q=0.275592 ,_u=1.335555 ,_P=0.377752 
		, act=50,_n_visits=2 ,_Q=-0.183745 ,_u=0.191733 ,_P=0.009211 
	, act=28,_n_visits=214 ,_Q=0.512162 ,_u=0.117028 ,_P=0.178026 
		, act=1,_n_visits=5 ,_Q=-0.287701 ,_u=0.037544 ,_P=0.003087 
			, act=35,_n_visits=2 ,_Q=0.163045 ,_u=1.257606 ,_P=0.377282 
		, act=3,_n_visits=3 ,_Q=-0.422566 ,_u=0.093003 ,_P=0.005098 
		, act=8,_n_visits=10 ,_Q=-0.202820 ,_u=0.048012 ,_P=0.006579 
			, act=27,_n_visits=2 ,_Q=0.606195 ,_u=0.639651 ,_P=0.127930 
			, act=35,_n_visits=5 ,_Q=0.127078 ,_u=1.208807 ,_P=0.402936 
				, act=21,_n_visits=2 ,_Q=0.063265 ,_u=1.296277 ,_P=0.388883 
				, act=42,_n_visits=2 ,_Q=-0.172860 ,_u=2.039363 ,_P=0.407873 
			, act=36,_n_visits=2 ,_Q=0.097307 ,_u=1.167656 ,_P=0.233531 
		, act=9,_n_visits=2 ,_Q=-0.346548 ,_u=0.037019 ,_P=0.001522 
		, act=10,_n_visits=2 ,_Q=-0.433851 ,_u=0.255563 ,_P=0.010507 
		, act=11,_n_visits=2 ,_Q=-0.374231 ,_u=0.124851 ,_P=0.005133 
		, act=18,_n_visits=2 ,_Q=-0.712887 ,_u=0.443869 ,_P=0.018248 
		, act=19,_n_visits=28 ,_Q=-0.300480 ,_u=0.143432 ,_P=0.057001 
			, act=27,_n_visits=14 ,_Q=0.413202 ,_u=0.638426 ,_P=0.368595 
				, act=26,_n_visits=12 ,_Q=-0.385210 ,_u=1.204515 ,_P=0.801773 
					, act=29,_n_visits=2 ,_Q=0.265790 ,_u=1.032788 ,_P=0.186838 
					, act=36,_n_visits=6 ,_Q=0.831043 ,_u=0.488590 ,_P=0.206242 
						, act=18,_n_visits=2 ,_Q=-0.890770 ,_u=1.737816 ,_P=0.310870 
			, act=35,_n_visits=7 ,_Q=0.180327 ,_u=0.846025 ,_P=0.260508 
				, act=21,_n_visits=5 ,_Q=-0.089641 ,_u=1.223282 ,_P=0.499403 
					, act=20,_n_visits=4 ,_Q=0.156589 ,_u=1.601920 ,_P=0.640768 
						, act=12,_n_visits=2 ,_Q=0.145149 ,_u=1.547143 ,_P=0.357297 
			, act=36,_n_visits=4 ,_Q=0.213771 ,_u=0.899944 ,_P=0.138555 
				, act=20,_n_visits=3 ,_Q=-0.082635 ,_u=2.230487 ,_P=0.772664 
					, act=21,_n_visits=2 ,_Q=0.478807 ,_u=2.768993 ,_P=0.783189 
		, act=20,_n_visits=2 ,_Q=-0.560294 ,_u=0.121234 ,_P=0.004984 
		, act=21,_n_visits=13 ,_Q=-0.450823 ,_u=0.283983 ,_P=0.054483 
			, act=27,_n_visits=5 ,_Q=0.489777 ,_u=0.725693 ,_P=0.251387 
				, act=29,_n_visits=2 ,_Q=-0.434157 ,_u=1.302774 ,_P=0.390832 
			, act=29,_n_visits=2 ,_Q=0.302444 ,_u=0.606600 ,_P=0.105066 
			, act=36,_n_visits=4 ,_Q=0.293393 ,_u=0.756540 ,_P=0.218394 
				, act=20,_n_visits=3 ,_Q=-0.155795 ,_u=1.584278 ,_P=0.548810 
					, act=19,_n_visits=2 ,_Q=0.140650 ,_u=2.828891 ,_P=0.800131 
		, act=25,_n_visits=2 ,_Q=-0.494314 ,_u=0.161321 ,_P=0.006632 
		, act=26,_n_visits=9 ,_Q=-0.425081 ,_u=0.208361 ,_P=0.028553 
			, act=35,_n_visits=6 ,_Q=0.408029 ,_u=1.119694 ,_P=0.475046 
				, act=42,_n_visits=4 ,_Q=-0.441664 ,_u=1.310199 ,_P=0.468751 
					, act=34,_n_visits=3 ,_Q=0.442974 ,_u=1.686404 ,_P=0.584187 
			, act=36,_n_visits=2 ,_Q=0.645280 ,_u=0.665816 ,_P=0.141241 
		, act=27,_n_visits=5 ,_Q=-0.697772 ,_u=0.488608 ,_P=0.040175 
			, act=35,_n_visits=4 ,_Q=0.717066 ,_u=1.542901 ,_P=0.617160 
		, act=30,_n_visits=2 ,_Q=-0.413724 ,_u=0.246762 ,_P=0.010145 
		, act=34,_n_visits=2 ,_Q=-0.452803 ,_u=0.202389 ,_P=0.008320 
		, act=35,_n_visits=48 ,_Q=-0.735712 ,_u=0.580984 ,_P=0.390122 
			, act=27,_n_visits=13 ,_Q=0.675958 ,_u=0.703514 ,_P=0.266807 
				, act=26,_n_visits=10 ,_Q=-0.641721 ,_u=1.374282 ,_P=0.793442 
					, act=36,_n_visits=3 ,_Q=0.730219 ,_u=0.695421 ,_P=0.185446 
						, act=44,_n_visits=2 ,_Q=-0.606733 ,_u=2.647418 ,_P=0.748803 
					, act=44,_n_visits=4 ,_Q=0.858133 ,_u=0.787659 ,_P=0.210042 
						, act=36,_n_visits=3 ,_Q=-0.866214 ,_u=2.786685 ,_P=0.965336 
							, act=37,_n_visits=2 ,_Q=0.903987 ,_u=1.975933 ,_P=0.558878 
			, act=36,_n_visits=32 ,_Q=0.826806 ,_u=0.532382 ,_P=0.512529 
				, act=20,_n_visits=2 ,_Q=-0.641988 ,_u=0.404906 ,_P=0.043634 
				, act=27,_n_visits=2 ,_Q=-0.975444 ,_u=0.969482 ,_P=0.104474 
				, act=44,_n_visits=20 ,_Q=-0.815586 ,_u=0.925425 ,_P=0.698087 
					, act=26,_n_visits=14 ,_Q=0.939248 ,_u=0.677325 ,_P=0.466167 
						, act=27,_n_visits=13 ,_Q=-0.934725 ,_u=1.299563 ,_P=0.937129 
							, act=19,_n_visits=4 ,_Q=0.865042 ,_u=1.078254 ,_P=0.311265 
							, act=43,_n_visits=8 ,_Q=0.966543 ,_u=1.230521 ,_P=0.568353 
								, act=19,_n_visits=3 ,_Q=-0.976368 ,_u=1.523390 ,_P=0.345472 
									, act=11,_n_visits=2 ,_Q=0.973991 ,_u=1.236814 ,_P=0.349824 
					, act=27,_n_visits=2 ,_Q=0.585449 ,_u=0.740996 ,_P=0.101998 
					, act=53,_n_visits=2 ,_Q=0.813205 ,_u=0.963240 ,_P=0.088393 
			, act=37,_n_visits=2 ,_Q=-0.046088 ,_u=0.743118 ,_P=0.065037 
		, act=36,_n_visits=4 ,_Q=-0.755416 ,_u=0.545887 ,_P=0.037404 
			, act=35,_n_visits=3 ,_Q=0.859492 ,_u=2.014102 ,_P=0.697705 
		, act=37,_n_visits=4 ,_Q=-0.692301 ,_u=0.508317 ,_P=0.034829 
			, act=36,_n_visits=2 ,_Q=0.968557 ,_u=1.086428 ,_P=0.376350 
		, act=42,_n_visits=4 ,_Q=-0.808254 ,_u=0.558774 ,_P=0.038287 
			, act=27,_n_visits=2 ,_Q=0.864962 ,_u=1.170260 ,_P=0.270260 
		, act=43,_n_visits=2 ,_Q=-0.519138 ,_u=0.217227 ,_P=0.008930 
		, act=44,_n_visits=4 ,_Q=-0.471863 ,_u=0.319050 ,_P=0.021861 
			, act=35,_n_visits=2 ,_Q=0.249568 ,_u=1.123565 ,_P=0.389214 
		, act=45,_n_visits=3 ,_Q=-0.568997 ,_u=0.347658 ,_P=0.019057 
			, act=35,_n_visits=2 ,_Q=0.515761 ,_u=1.418398 ,_P=0.401184 
		, act=48,_n_visits=3 ,_Q=-0.259834 ,_u=0.099201 ,_P=0.005438 
		, act=49,_n_visits=2 ,_Q=-0.330507 ,_u=0.054329 ,_P=0.002234 
		, act=50,_n_visits=2 ,_Q=-0.509679 ,_u=0.127728 ,_P=0.005251 
		, act=51,_n_visits=2 ,_Q=-0.628834 ,_u=0.259326 ,_P=0.010661 
		, act=57,_n_visits=10 ,_Q=-0.194946 ,_u=0.027902 ,_P=0.004206 
			, act=35,_n_visits=5 ,_Q=0.339731 ,_u=1.078280 ,_P=0.359427 
				, act=21,_n_visits=2 ,_Q=-0.286942 ,_u=1.430482 ,_P=0.429145 
				, act=42,_n_visits=2 ,_Q=-0.439994 ,_u=1.555351 ,_P=0.311070 
			, act=36,_n_visits=3 ,_Q=0.068489 ,_u=1.160878 ,_P=0.309567 
	, act=29,_n_visits=4 ,_Q=0.298291 ,_u=0.298917 ,_P=0.010575 
		, act=36,_n_visits=2 ,_Q=-0.540858 ,_u=1.094615 ,_P=0.379186 
	, act=34,_n_visits=3 ,_Q=0.237141 ,_u=0.261359 ,_P=0.007397 
	, act=35,_n_visits=164 ,_Q=0.471148 ,_u=0.157738 ,_P=0.184153 
		, act=6,_n_visits=3 ,_Q=-0.268087 ,_u=0.037671 ,_P=0.002361 
		, act=8,_n_visits=2 ,_Q=-0.168531 ,_u=0.111599 ,_P=0.005245 
		, act=18,_n_visits=2 ,_Q=-0.412313 ,_u=0.360184 ,_P=0.016927 
		, act=19,_n_visits=7 ,_Q=-0.269322 ,_u=0.156382 ,_P=0.019598 
			, act=28,_n_visits=3 ,_Q=-0.013959 ,_u=1.428502 ,_P=0.466547 
				, act=21,_n_visits=2 ,_Q=0.118545 ,_u=1.938271 ,_P=0.548226 
			, act=36,_n_visits=3 ,_Q=0.723727 ,_u=0.795652 ,_P=0.194894 
		, act=20,_n_visits=2 ,_Q=-0.453386 ,_u=0.192174 ,_P=0.009031 
		, act=21,_n_visits=9 ,_Q=-0.226283 ,_u=0.156564 ,_P=0.024526 
			, act=27,_n_visits=5 ,_Q=0.488635 ,_u=0.656950 ,_P=0.232267 
				, act=19,_n_visits=4 ,_Q=-0.544839 ,_u=1.526171 ,_P=0.610468 
					, act=20,_n_visits=3 ,_Q=0.707047 ,_u=2.385997 ,_P=0.826534 
			, act=36,_n_visits=3 ,_Q=-0.167797 ,_u=1.099131 ,_P=0.310881 
				, act=37,_n_visits=2 ,_Q=0.369261 ,_u=1.709329 ,_P=0.483471 
		, act=26,_n_visits=5 ,_Q=-0.488121 ,_u=0.416311 ,_P=0.039130 
			, act=27,_n_visits=3 ,_Q=0.620914 ,_u=1.154112 ,_P=0.346234 
				, act=19,_n_visits=2 ,_Q=-0.628788 ,_u=2.815251 ,_P=0.796273 
		, act=27,_n_visits=4 ,_Q=-0.551821 ,_u=0.368254 ,_P=0.028844 
			, act=28,_n_visits=3 ,_Q=0.679648 ,_u=1.764206 ,_P=0.611139 
		, act=28,_n_visits=51 ,_Q=-0.643413 ,_u=0.614193 ,_P=0.500316 
			, act=26,_n_visits=2 ,_Q=-0.046136 ,_u=0.528276 ,_P=0.044826 
			, act=27,_n_visits=22 ,_Q=0.642367 ,_u=0.573347 ,_P=0.372984 
				, act=19,_n_visits=19 ,_Q=-0.616728 ,_u=0.968660 ,_P=0.803240 
					, act=10,_n_visits=4 ,_Q=0.458661 ,_u=0.965780 ,_P=0.182109 
					, act=37,_n_visits=11 ,_Q=0.803694 ,_u=0.679286 ,_P=0.384262 
						, act=36,_n_visits=10 ,_Q=-0.808126 ,_u=1.456837 ,_P=0.921385 
							, act=20,_n_visits=6 ,_Q=0.840890 ,_u=1.425785 ,_P=0.570314 
							, act=44,_n_visits=3 ,_Q=0.693072 ,_u=1.224395 ,_P=0.326505 
					, act=43,_n_visits=2 ,_Q=0.216855 ,_u=1.060660 ,_P=0.150000 
			, act=36,_n_visits=26 ,_Q=0.701468 ,_u=0.566880 ,_P=0.416879 
				, act=37,_n_visits=21 ,_Q=-0.686978 ,_u=0.928380 ,_P=0.779840 
					, act=19,_n_visits=11 ,_Q=0.901339 ,_u=0.562010 ,_P=0.276472 
						, act=27,_n_visits=10 ,_Q=-0.903239 ,_u=1.488088 ,_P=0.941149 
							, act=26,_n_visits=3 ,_Q=0.975513 ,_u=1.247852 ,_P=0.249570 
							, act=29,_n_visits=6 ,_Q=0.854688 ,_u=1.336465 ,_P=0.623684 
								, act=22,_n_visits=2 ,_Q=-0.987045 ,_u=1.876242 ,_P=0.335632 
					, act=27,_n_visits=3 ,_Q=0.211159 ,_u=0.696040 ,_P=0.124511 
						, act=19,_n_visits=2 ,_Q=-0.074831 ,_u=3.154536 ,_P=0.892238 
					, act=46,_n_visits=5 ,_Q=0.633744 ,_u=0.788997 ,_P=0.211710 
						, act=19,_n_visits=2 ,_Q=-0.401619 ,_u=1.329921 ,_P=0.265984 
		, act=29,_n_visits=2 ,_Q=-0.353277 ,_u=0.215836 ,_P=0.010143 
		, act=32,_n_visits=2 ,_Q=-0.433429 ,_u=0.026845 ,_P=0.001262 
		, act=36,_n_visits=4 ,_Q=-0.656860 ,_u=0.481576 ,_P=0.037720 
			, act=28,_n_visits=3 ,_Q=0.805237 ,_u=1.726618 ,_P=0.598118 
		, act=37,_n_visits=13 ,_Q=-0.178011 ,_u=0.096716 ,_P=0.021211 
			, act=27,_n_visits=4 ,_Q=0.331033 ,_u=0.707519 ,_P=0.204243 
				, act=19,_n_visits=2 ,_Q=-0.308755 ,_u=1.175578 ,_P=0.271488 
			, act=28,_n_visits=8 ,_Q=0.138524 ,_u=1.078394 ,_P=0.498089 
				, act=21,_n_visits=6 ,_Q=0.006721 ,_u=1.015354 ,_P=0.460522 
					, act=29,_n_visits=4 ,_Q=0.243062 ,_u=1.602219 ,_P=0.573227 
						, act=30,_n_visits=3 ,_Q=-0.259366 ,_u=2.046454 ,_P=0.708912 
		, act=42,_n_visits=3 ,_Q=-0.716200 ,_u=0.542660 ,_P=0.034004 
			, act=27,_n_visits=2 ,_Q=0.824671 ,_u=1.270746 ,_P=0.359421 
		, act=44,_n_visits=3 ,_Q=-0.690611 ,_u=0.492385 ,_P=0.030853 
			, act=36,_n_visits=2 ,_Q=0.891452 ,_u=1.318017 ,_P=0.372791 
		, act=46,_n_visits=2 ,_Q=-0.265342 ,_u=0.152422 ,_P=0.007163 
		, act=48,_n_visits=3 ,_Q=-0.014280 ,_u=0.108676 ,_P=0.005107 
			, act=28,_n_visits=2 ,_Q=0.115462 ,_u=1.422586 ,_P=0.402368 
	, act=36,_n_visits=206 ,_Q=0.493156 ,_u=0.135389 ,_P=0.198295 
		, act=1,_n_visits=2 ,_Q=-0.327716 ,_u=0.061656 ,_P=0.002584 
		, act=2,_n_visits=2 ,_Q=-0.525546 ,_u=0.057245 ,_P=0.002399 
		, act=3,_n_visits=2 ,_Q=-0.422939 ,_u=0.050571 ,_P=0.002119 
		, act=4,_n_visits=2 ,_Q=-0.377521 ,_u=0.093878 ,_P=0.003934 
		, act=8,_n_visits=6 ,_Q=-0.197412 ,_u=0.047235 ,_P=0.004619 
			, act=27,_n_visits=2 ,_Q=0.172668 ,_u=1.422024 ,_P=0.381569 
			, act=28,_n_visits=2 ,_Q=0.114629 ,_u=0.983914 ,_P=0.264012 
		, act=9,_n_visits=2 ,_Q=-0.248283 ,_u=0.040006 ,_P=0.001676 
		, act=10,_n_visits=2 ,_Q=-0.553534 ,_u=0.123933 ,_P=0.005193 
		, act=18,_n_visits=9 ,_Q=-0.462452 ,_u=0.332581 ,_P=0.046457 
			, act=28,_n_visits=2 ,_Q=-0.292724 ,_u=1.154056 ,_P=0.244812 
			, act=35,_n_visits=6 ,_Q=0.792367 ,_u=0.660128 ,_P=0.280068 
				, act=34,_n_visits=4 ,_Q=-0.849978 ,_u=1.554532 ,_P=0.556166 
					, act=26,_n_visits=3 ,_Q=0.897936 ,_u=1.961762 ,_P=0.679574 
		, act=19,_n_visits=4 ,_Q=-0.306404 ,_u=0.140312 ,_P=0.009800 
			, act=27,_n_visits=3 ,_Q=0.508752 ,_u=1.169199 ,_P=0.405023 
				, act=18,_n_visits=2 ,_Q=-0.651676 ,_u=2.156640 ,_P=0.609990 
		, act=20,_n_visits=15 ,_Q=-0.084819 ,_u=0.116576 ,_P=0.024426 
			, act=27,_n_visits=8 ,_Q=-0.107960 ,_u=1.083542 ,_P=0.463342 
				, act=18,_n_visits=6 ,_Q=0.395494 ,_u=1.210155 ,_P=0.548875 
					, act=19,_n_visits=5 ,_Q=-0.302121 ,_u=1.689607 ,_P=0.755615 
						, act=11,_n_visits=3 ,_Q=0.648154 ,_u=1.694776 ,_P=0.508433 
			, act=28,_n_visits=2 ,_Q=0.303412 ,_u=0.385879 ,_P=0.061878 
			, act=35,_n_visits=4 ,_Q=0.443864 ,_u=0.497722 ,_P=0.133022 
		, act=21,_n_visits=5 ,_Q=-0.339598 ,_u=0.177279 ,_P=0.014858 
			, act=27,_n_visits=4 ,_Q=0.410865 ,_u=1.229464 ,_P=0.491786 
				, act=18,_n_visits=2 ,_Q=-0.164816 ,_u=1.406258 ,_P=0.487142 
		, act=27,_n_visits=49 ,_Q=-0.766117 ,_u=0.647778 ,_P=0.452427 
			, act=28,_n_visits=22 ,_Q=0.813776 ,_u=0.563106 ,_P=0.357621 
				, act=20,_n_visits=18 ,_Q=-0.797705 ,_u=1.048455 ,_P=0.823650 
					, act=34,_n_visits=11 ,_Q=0.917386 ,_u=0.764274 ,_P=0.407800 
						, act=35,_n_visits=10 ,_Q=-0.909520 ,_u=1.490832 ,_P=0.942885 
							, act=19,_n_visits=4 ,_Q=0.842634 ,_u=1.239081 ,_P=0.413027 
							, act=43,_n_visits=5 ,_Q=0.962893 ,_u=1.273770 ,_P=0.424590 
								, act=19,_n_visits=2 ,_Q=-0.967556 ,_u=1.495460 ,_P=0.448638 
					, act=35,_n_visits=3 ,_Q=0.800281 ,_u=0.719312 ,_P=0.139567 
						, act=34,_n_visits=2 ,_Q=-0.796763 ,_u=3.087315 ,_P=0.873225 
					, act=44,_n_visits=2 ,_Q=0.741146 ,_u=0.720837 ,_P=0.104897 
			, act=35,_n_visits=24 ,_Q=0.797960 ,_u=0.530278 ,_P=0.382695 
				, act=34,_n_visits=18 ,_Q=-0.756013 ,_u=0.985338 ,_P=0.739646 
					, act=20,_n_visits=6 ,_Q=0.759747 ,_u=0.863677 ,_P=0.293262 
						, act=28,_n_visits=5 ,_Q=-0.721861 ,_u=2.066503 ,_P=0.924168 
							, act=29,_n_visits=3 ,_Q=0.956090 ,_u=1.280097 ,_P=0.384029 
					, act=28,_n_visits=3 ,_Q=0.548980 ,_u=0.695815 ,_P=0.135008 
						, act=20,_n_visits=2 ,_Q=-0.389072 ,_u=2.936032 ,_P=0.830435 
					, act=37,_n_visits=5 ,_Q=0.794501 ,_u=0.745048 ,_P=0.180701 
						, act=38,_n_visits=3 ,_Q=-0.956281 ,_u=1.786278 ,_P=0.714511 
							, act=20,_n_visits=2 ,_Q=0.982556 ,_u=2.506504 ,_P=0.708946 
					, act=41,_n_visits=3 ,_Q=0.821011 ,_u=0.699014 ,_P=0.135629 
		, act=28,_n_visits=4 ,_Q=-0.611261 ,_u=0.475917 ,_P=0.033239 
			, act=27,_n_visits=3 ,_Q=0.775069 ,_u=1.532128 ,_P=0.530745 
		, act=29,_n_visits=10 ,_Q=-0.441284 ,_u=0.289547 ,_P=0.044490 
			, act=27,_n_visits=2 ,_Q=0.323227 ,_u=1.052465 ,_P=0.210493 
			, act=28,_n_visits=6 ,_Q=0.476465 ,_u=1.075635 ,_P=0.430254 
				, act=20,_n_visits=5 ,_Q=-0.466093 ,_u=1.835785 ,_P=0.820988 
					, act=38,_n_visits=2 ,_Q=0.702720 ,_u=0.608844 ,_P=0.182653 
		, act=32,_n_visits=2 ,_Q=-0.275379 ,_u=0.084119 ,_P=0.003525 
		, act=33,_n_visits=2 ,_Q=-0.501406 ,_u=0.103510 ,_P=0.004338 
		, act=34,_n_visits=10 ,_Q=-0.271272 ,_u=0.133494 ,_P=0.020512 
			, act=27,_n_visits=7 ,_Q=0.276063 ,_u=1.178679 ,_P=0.550050 
				, act=18,_n_visits=4 ,_Q=-0.058847 ,_u=1.219894 ,_P=0.498020 
					, act=26,_n_visits=3 ,_Q=0.272370 ,_u=1.924102 ,_P=0.666528 
						, act=25,_n_visits=2 ,_Q=0.075270 ,_u=1.911006 ,_P=0.540514 
				, act=45,_n_visits=2 ,_Q=-0.528664 ,_u=1.889566 ,_P=0.308565 
			, act=28,_n_visits=2 ,_Q=0.511357 ,_u=0.756065 ,_P=0.151213 
		, act=35,_n_visits=6 ,_Q=-0.657799 ,_u=0.430329 ,_P=0.042078 
			, act=27,_n_visits=3 ,_Q=0.812160 ,_u=1.472037 ,_P=0.394989 
			, act=28,_n_visits=2 ,_Q=0.689378 ,_u=1.313297 ,_P=0.352395 
		, act=42,_n_visits=2 ,_Q=-0.717491 ,_u=0.453573 ,_P=0.019007 
		, act=43,_n_visits=3 ,_Q=-0.706385 ,_u=0.537022 ,_P=0.030006 
			, act=35,_n_visits=2 ,_Q=0.902659 ,_u=1.390399 ,_P=0.393264 
		, act=45,_n_visits=4 ,_Q=-0.837022 ,_u=0.576621 ,_P=0.040273 
			, act=35,_n_visits=2 ,_Q=0.836585 ,_u=0.734688 ,_P=0.254503 
		, act=48,_n_visits=13 ,_Q=-0.183513 ,_u=0.029456 ,_P=0.005760 
			, act=27,_n_visits=6 ,_Q=0.139221 ,_u=1.015738 ,_P=0.410506 
				, act=18,_n_visits=3 ,_Q=-0.018522 ,_u=1.534746 ,_P=0.411816 
					, act=45,_n_visits=2 ,_Q=0.381959 ,_u=2.034072 ,_P=0.575322 
				, act=45,_n_visits=2 ,_Q=-0.320098 ,_u=1.566204 ,_P=0.420257 
			, act=28,_n_visits=3 ,_Q=0.252122 ,_u=1.295780 ,_P=0.224436 
				, act=20,_n_visits=2 ,_Q=-0.108003 ,_u=1.482892 ,_P=0.419425 
			, act=35,_n_visits=3 ,_Q=0.336723 ,_u=0.600453 ,_P=0.138669 
		, act=49,_n_visits=2 ,_Q=-0.293640 ,_u=0.043256 ,_P=0.001813 
		, act=50,_n_visits=4 ,_Q=-0.395144 ,_u=0.146799 ,_P=0.010253 
			, act=27,_n_visits=2 ,_Q=0.151526 ,_u=1.403314 ,_P=0.486122 
		, act=57,_n_visits=5 ,_Q=-0.282833 ,_u=0.051124 ,_P=0.004285 
			, act=35,_n_visits=2 ,_Q=0.775605 ,_u=1.292591 ,_P=0.258518 
	, act=37,_n_visits=4 ,_Q=0.259429 ,_u=0.302998 ,_P=0.010719 
		, act=28,_n_visits=2 ,_Q=-0.460422 ,_u=1.269522 ,_P=0.439775 
	, act=42,_n_visits=7 ,_Q=-0.250396 ,_u=0.828577 ,_P=0.046901 
		, act=35,_n_visits=4 ,_Q=0.330735 ,_u=0.872967 ,_P=0.356387 
	, act=43,_n_visits=3 ,_Q=0.215870 ,_u=0.311616 ,_P=0.008819 
		, act=36,_n_visits=2 ,_Q=-0.150397 ,_u=1.779955 ,_P=0.503447 
	, act=44,_n_visits=3 ,_Q=0.215788 ,_u=0.262477 ,_P=0.007429 
		, act=35,_n_visits=2 ,_Q=-0.165841 ,_u=1.632656 ,_P=0.461785 
	, act=45,_n_visits=6 ,_Q=-0.228874 ,_u=0.816619 ,_P=0.040446 
		, act=36,_n_visits=4 ,_Q=0.339856 ,_u=1.106024 ,_P=0.395703 
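
The `_u` column can be checked by hand against the formula `u = c_puct * P * sqrt(parent visits) / (1 + visits)` used by the node's `get_value`. The sketch below plugs in the root children `act=28` and `act=36` from the listing. It assumes `c_puct = 5`, and that `_u` was last refreshed during the final playout's selection step, when the root's counter still read 799 rather than 800 and neither of these two children was on the selected path. That timing is an inference from the numbers, not something the listing states; `exploration_bonus` is an illustrative name.

```python
import math

def exploration_bonus(c_puct, prior, parent_visits, visits):
    """u = c_puct * P * sqrt(N_parent) / (1 + N)."""
    return c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Values copied from the listing: act -> (prior _P, _n_visits, printed _u)
children = {28: (0.178026, 214, 0.117028), 36: (0.198295, 206, 0.135389)}

for act, (prior, visits, printed_u) in children.items():
    u = exploration_bonus(c_puct=5, prior=prior, parent_visits=799, visits=visits)
    print("act=%d computed u=%.6f printed u=%.6f" % (act, u, printed_u))
```

Both computed values agree with the printed ones to within rounding of the six-digit priors, which also shows how small the exploration bonus becomes for heavily visited children.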

3.2.1.2 Building the Monte Carlo tree nodes

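Construction process

1. Initially the search tree contains only the root node.
2. Search path: determined by the selection function.
Selection function:
choose the child node with the largest weight.
Weight (the u term penalises children that have already been visited often):
$u = c_{puct} \cdot P \cdot \sqrt{N_{parent}} / (1 + N)$, where $N_{parent}$ is the parent's visit count and $N$ is the child's visit count
$Q + u$

3. At a leaf node, expand the search tree with new nodes.
3.1 The new nodes are determined by the policy function.
4. Update the values along the search path.
4.1 Update the parent nodes.
4.2 Update the current node.
Visit count: +1
Q += (leaf_value - Q) / visit count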
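
The construction steps can be sketched as a small standalone tree node plus one playout loop. This is a simplified illustration, not the reference implementation: the attribute names mirror the ones in this post, but `uniform_policy`, `playout`, and the use of a bare list of legal moves instead of a board are hypothetical stand-ins, and the leaf value is fixed at 0.

```python
import math

class TreeNode:
    def __init__(self, parent, prior_p):
        self._parent = parent
        self._children = {}   # action -> TreeNode
        self._n_visits = 0
        self._Q = 0.0
        self._u = 0.0
        self._P = prior_p

    def get_value(self, c_puct):
        # Q + u; u shrinks as this node accumulates visits.
        self._u = (c_puct * self._P *
                   math.sqrt(self._parent._n_visits) / (1 + self._n_visits))
        return self._Q + self._u

    def select(self, c_puct):
        # Step 2: pick the (action, child) pair with the largest Q + u.
        return max(self._children.items(),
                   key=lambda item: item[1].get_value(c_puct))

    def expand(self, action_priors):
        # Step 3: create one child per (action, prior) pair.
        for action, prob in action_priors:
            if action not in self._children:
                self._children[action] = TreeNode(self, prob)

    def update_recursive(self, leaf_value):
        # Step 4: update ancestors first, flipping the sign at each level.
        if self._parent:
            self._parent.update_recursive(-leaf_value)
        self._n_visits += 1
        self._Q += (leaf_value - self._Q) / self._n_visits

    def is_leaf(self):
        return not self._children


def uniform_policy(moves):
    """Hypothetical stand-in for the policy-value network."""
    return [(m, 1.0 / len(moves)) for m in moves], 0.0


def playout(root, moves, c_puct=5):
    node, remaining = root, list(moves)
    # Walk down the tree until a leaf is reached.
    while not node.is_leaf():
        action, node = node.select(c_puct)
        remaining.remove(action)
    if remaining:
        action_priors, leaf_value = uniform_policy(remaining)
        node.expand(action_priors)
    else:
        leaf_value = 0.0  # no moves left; treated as a draw in this toy
    node.update_recursive(-leaf_value)


root = TreeNode(None, 1.0)  # step 1: the tree starts as a single root
for _ in range(10):
    playout(root, moves=[27, 28, 35, 36])
print(root._n_visits, {a: c._n_visits for a, c in root._children.items()})
```

The first playout only expands the root, so after 10 playouts the root has 10 visits while its children share the remaining 9.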

[Mermaid diagram: nine snapshots, s1 to s9, of the search tree as playouts accumulate. Edges are labelled with actions (act27, act28, act35, act36) and nodes with visit counts (up to visited:10).]

[Sequence diagram with participants player, MCTS, PolicyNet and Node. Messages in order: get_move_probs(state): act; then, in a loop: copy.deepcopy(state): state_copy, _playout(state_copy), getNext(node): node, policy(state_copy): action_probs, leaf_value, expand(action_probs), update_recursive(leaf_value).]

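Data changes

MCTS

| Attribute | Initial value | Method | Change |
|---|---|---|---|
| _root | Node(1) | _playout | 1. getPath 2. _policy 3. expand 4. update |
| _policy | | | |
| _c_puct | | | |
| _n_playout | | | |

Node

| Attribute | Initial value | Method | Change |
|---|---|---|---|
| _parent | | init | self._parent = parent |
| _children | {} | expand | generated from action_priors: _children[action] = TreeNode(prob) |
| _n_visits | 0 | update | _n_visits += 1 |
| _Q | 0 | update | self._Q += 1.0*(leaf_value - self._Q) / self._n_visits |
| _u | 0 | get_value | self._u = (c_puct * self._P * np.sqrt(self._parent._n_visits) / (1 + self._n_visits)) |
| _P | prior_p | init | self._P = prior_p |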
To be continued...
