
ding.league.starcraft_player

MainPlayer

Bases: ActivePlayer

Overview

Main player in league training. Default branch probabilities: 0.5 PFSP, 0.35 self-play, 0.15 verification. By default, takes a snapshot every 2e9 steps. Default mutate probability is 0 (it never mutates).

Interface: `__init__`, `is_trained_enough`, `snapshot`, `mutate`, `get_job`

Property: `race`, `payoff`, `checkpoint_path`, `player_id`, `train_iteration`
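The default branch probabilities above can be sketched as a weighted random choice over opponent-selection branches. The helper below is purely illustrative (the actual dispatch lives in the `ActivePlayer` job logic, not shown here), and the branch names are shorthand from the docstring:

```python
import numpy as np

# Hypothetical sketch: MainPlayer's default branch probabilities
# (0.5 pfsp, 0.35 self-play, 0.15 verification) drive which
# opponent-selection branch runs for each job.  This helper is
# illustrative only, not the library's actual dispatch code.
BRANCH_PROBS = {'pfsp': 0.5, 'sp': 0.35, 'verification': 0.15}


def choose_branch(rng: np.random.Generator) -> str:
    names = list(BRANCH_PROBS)
    return str(rng.choice(names, p=[BRANCH_PROBS[n] for n in names]))


rng = np.random.default_rng(0)
counts = {name: 0 for name in BRANCH_PROBS}
for _ in range(10_000):
    counts[choose_branch(rng)] += 1
# Over 10k draws the counts roughly track the 0.5 / 0.35 / 0.15 split.
```

Each branch then picks a concrete opponent (`_pfsp_branch`, `_sp_branch`, `_verification_branch` in the full source below).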

mutate(info)

Overview

MainPlayer never mutates; calling `mutate` is a no-op.

MainExploiter

Bases: ActivePlayer

Overview

Main exploiter in league training. Identifies weaknesses of the main agents and consequently makes them more robust. Default branch probabilities: 1.0 main_players. By default, takes a snapshot when it defeats all 3 main players in the league in more than 70% of games, or after a timeout of 4e9 steps. Default mutate probability is 1 (it always mutates).

Interface: `__init__`, `is_trained_enough`, `snapshot`, `mutate`, `get_job`

Property: `race`, `payoff`, `checkpoint_path`, `player_id`, `train_iteration`

__init__(*args, **kwargs)

Overview

Additionally initializes `min_valid_win_rate`.

Note: - min_valid_win_rate (:obj:`float`): the current main player is selected as an opponent only when the exploiter's win rate against it is at least this value; only then is it regarded as able to produce valid training signals.
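The gate described above can be sketched as follows. The function and default value are illustrative only; the real logic lives in `_main_players_branch` in the full source below, which falls back to PFSP over historical snapshots rather than returning a label:

```python
# Sketch of the min_valid_win_rate gate: the main exploiter trains
# against the current main player only if its win rate against that
# player is high enough to yield useful training signals; otherwise it
# falls back to an easier historical snapshot (curriculum learning).
# The threshold default here is illustrative, not the library default.
def pick_opponent(win_rate_vs_main: float,
                  min_valid_win_rate: float = 0.2) -> str:
    if win_rate_vs_main >= min_valid_win_rate:
        return 'main_player'      # strong enough to learn from directly
    return 'historical_snapshot'  # too weak: train against past versions
```

In the real implementation the win rate comes from the shared payoff table (`self._payoff[self, main_opponent]`).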

mutate(info)

Overview

The main exploiter always mutates, i.e. it resets to the supervised learning player.

Returns: - mutate_ckpt_path (:obj:`str`): the mutation target's checkpoint path

LeagueExploiter

Bases: ActivePlayer

Overview

League exploiter in league training. Identifies global blind spots in the league (strategies that no player in the league can beat, but that are not necessarily robust themselves). Default branch probabilities: 1.0 PFSP. By default, takes a snapshot when it defeats all players in the league in more than 70% of games, or after a timeout of 2e9 steps. Default mutate probability is 0.25.

Interface: `__init__`, `is_trained_enough`, `snapshot`, `mutate`, `get_job`

Property: `race`, `payoff`, `checkpoint_path`, `player_id`, `train_iteration`
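Both MainPlayer and LeagueExploiter sample historical opponents with `pfsp(win_rates, weighting='squared')`. A minimal sketch, assuming the AlphaStar-style scheme where an opponent's weight is proportional to (1 - win_rate)^2 (the actual `pfsp` in `ding.league.algorithm` may differ in details such as smoothing):

```python
import numpy as np


def pfsp_squared(win_rates: np.ndarray) -> np.ndarray:
    # Weight each historical opponent by (1 - win_rate)^2, then
    # normalize into a probability distribution: opponents we rarely
    # beat (low win rate) are sampled far more often.
    w = (1.0 - win_rates) ** 2
    return w / w.sum()


# Win rates of 10%, 50%, 90% against three historical snapshots.
probs = pfsp_squared(np.array([0.1, 0.5, 0.9]))
```

With these inputs, the hardest opponent (10% win rate) receives the bulk of the sampling mass, while the nearly solved one (90%) is almost never drawn.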

__init__(*args, **kwargs)

Overview

Additionally initializes `mutate_prob`.

Note: - mutate_prob (:obj:`float`): the league exploiter's mutation probability; must be in [0, 1].

mutate(info)

Overview

The league exploiter mutates (resets) to the supervised learning player with probability `mutate_prob` (0.25 by default).

Returns: - ckpt_path (:obj:`Union[str, None]`): the pretrained model's checkpoint path with probability `mutate_prob`; otherwise None, meaning no mutation.
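The probabilistic reset can be sketched as a standalone helper. This mirrors the source's use of `info['reset_checkpoint_path']` and `np.random.uniform()`, but the function name and `rng` parameter are hypothetical additions for testability:

```python
from typing import Optional

import numpy as np


def maybe_mutate(reset_ckpt_path: str,
                 mutate_prob: float = 0.25,
                 rng: Optional[np.random.Generator] = None) -> Optional[str]:
    # With probability mutate_prob, reset to the supervised (pretrained)
    # checkpoint; otherwise return None, i.e. keep training the current
    # parameters unchanged.
    rng = rng or np.random.default_rng()
    if rng.uniform() < mutate_prob:
        return reset_ckpt_path
    return None
```

Setting `mutate_prob=1` recovers the MainExploiter behavior (always reset), and `mutate_prob=0` recovers the MainPlayer behavior (never reset).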

Full Source Code

../ding/league/starcraft_player.py

```python
from typing import Optional, Union

import numpy as np

from ding.utils import PLAYER_REGISTRY
from .player import ActivePlayer, HistoricalPlayer
from .algorithm import pfsp


@PLAYER_REGISTRY.register('main_player')
class MainPlayer(ActivePlayer):
    """
    Overview:
        Main player in league training.
        Default branch (0.5 pfsp, 0.35 sp, 0.15 veri).
        Default snapshot every 2e9 steps.
        Default mutate prob = 0 (never mutate).
    Interface:
        __init__, is_trained_enough, snapshot, mutate, get_job
    Property:
        race, payoff, checkpoint_path, player_id, train_iteration
    """
    _name = "MainPlayer"

    def _pfsp_branch(self) -> HistoricalPlayer:
        """
        Overview:
            Select prioritized fictitious self-play opponent, should be a historical player.
        Returns:
            - player (:obj:`HistoricalPlayer`): the selected historical player
        """
        historical = self._get_players(lambda p: isinstance(p, HistoricalPlayer))
        win_rates = self._payoff[self, historical]
        p = pfsp(win_rates, weighting='squared')
        return self._get_opponent(historical, p)

    def _sp_branch(self):
        """
        Overview:
            Select normal self-play opponent
        """
        main_players = self._get_players(lambda p: isinstance(p, MainPlayer))
        main_opponent = self._get_opponent(main_players)

        # TODO(nyz) if only one main_player, self-play win_rates are constantly equal to 0.5
        # main_opponent is not too strong
        if self._payoff[self, main_opponent] > 1 - self._strong_win_rate:
            return main_opponent

        # if the main_opponent is too strong, select a past alternative
        historical = self._get_players(
            lambda p: isinstance(p, HistoricalPlayer) and p.parent_id == main_opponent.player_id
        )
        win_rates = self._payoff[self, historical]
        p = pfsp(win_rates, weighting='variance')
        return self._get_opponent(historical, p)

    def _verification_branch(self):
        """
        Overview:
            Verify no strong historical main exploiter and no forgotten historical past main player
        """
        # check exploitation
        main_exploiters = self._get_players(lambda p: isinstance(p, MainExploiter))
        exp_historical = self._get_players(
            lambda p: isinstance(p, HistoricalPlayer) and any([p.parent_id == m.player_id for m in main_exploiters])
        )
        win_rates = self._payoff[self, exp_historical]
        # TODO(nyz) why min win_rates 0.3
        if len(win_rates) and win_rates.min() < 1 - self._strong_win_rate:
            p = pfsp(win_rates, weighting='squared')
            return self._get_opponent(exp_historical, p)

        # check forgotten
        main_players = self._get_players(lambda p: isinstance(p, MainPlayer))
        main_opponent = self._get_opponent(main_players)  # only one main player
        main_historical = self._get_players(
            lambda p: isinstance(p, HistoricalPlayer) and p.parent_id == main_opponent.player_id
        )
        win_rates = self._payoff[self, main_historical]
        # TODO(nyz) whether the method `_get_players` should return players with some sequence(such as step)
        # win_rates, historical = self._remove_monotonic_suffix(win_rates, historical)
        if len(win_rates) and win_rates.min() < self._strong_win_rate:
            p = pfsp(win_rates, weighting='squared')
            return self._get_opponent(main_historical, p)

        # no forgotten main players or strong main exploiters, use self-play instead
        return self._sp_branch()

    # def _remove_monotonic_suffix(self, win_rates, players):
    #     if not len(win_rates):
    #         return win_rates, players
    #     for i in range(len(win_rates) - 1, 0, -1):
    #         if win_rates[i - 1] < win_rates[i]:
    #             return win_rates[:i + 1], players[:i + 1]
    #     return np.array([]), []

    # override
    def is_trained_enough(self) -> bool:
        # ``_pfsp_branch`` and ``_verification_branch`` are played against historical players
        return super().is_trained_enough(select_fn=lambda p: isinstance(p, HistoricalPlayer))

    # override
    def mutate(self, info: dict) -> None:
        """
        Overview:
            MainPlayer does not mutate
        """
        pass


@PLAYER_REGISTRY.register('main_exploiter')
class MainExploiter(ActivePlayer):
    """
    Overview:
        Main exploiter in league training. Can identify weaknesses of main agents, and consequently make them
        more robust.
        Default branch (1.0 main_players).
        Default snapshot when defeating all 3 main players in the league in more than 70% of games,
        or timeout of 4e9 steps.
        Default mutate prob = 1 (must mutate).
    Interface:
        __init__, is_trained_enough, snapshot, mutate, get_job
    Property:
        race, payoff, checkpoint_path, player_id, train_iteration
    """
    _name = "MainExploiter"

    def __init__(self, *args, **kwargs):
        """
        Overview:
            Initialize ``min_valid_win_rate`` additionally
        Note:
            - min_valid_win_rate (:obj:`float`): only when win rate against the main player is greater than this, \
                can the main player be regarded as able to produce valid training signals to be selected
        """
        super(MainExploiter, self).__init__(*args, **kwargs)
        self._min_valid_win_rate = self._cfg.min_valid_win_rate

    def _main_players_branch(self):
        """
        Overview:
            Select main player or historical player snapshot from main player as opponent
        Returns:
            - player (:obj:`Player`): the selected main player (active/historical)
        """
        # get the main player (only one)
        main_players = self._get_players(lambda p: isinstance(p, MainPlayer))
        main_opponent = self._get_opponent(main_players)
        # if this main_opponent can produce valid training signals
        if self._payoff[self, main_opponent] >= self._min_valid_win_rate:
            return main_opponent
        # otherwise, curriculum learning, select a historical version
        historical = self._get_players(
            lambda p: isinstance(p, HistoricalPlayer) and p.parent_id == main_opponent.player_id
        )
        win_rates = self._payoff[self, historical]
        p = pfsp(win_rates, weighting='variance')
        return self._get_opponent(historical, p)

    # override
    def is_trained_enough(self):
        # would play against main player, or historical main player (if main player is too strong)
        return super().is_trained_enough(select_fn=lambda p: isinstance(p, MainPlayer))

    # override
    def mutate(self, info: dict) -> str:
        """
        Overview:
            Main exploiter is sure to mutate(reset) to the supervised learning player
        Returns:
            - mutate_ckpt_path (:obj:`str`): mutation target checkpoint path
        """
        return info['reset_checkpoint_path']


@PLAYER_REGISTRY.register('league_exploiter')
class LeagueExploiter(ActivePlayer):
    """
    Overview:
        League exploiter in league training. Can identify global blind spots in the league (strategies that no player
        in the league can beat, but that are not necessarily robust themselves).
        Default branch (1.0 pfsp).
        Default snapshot when defeating all players in the league in more than 70% of games, or timeout of 2e9 steps.
        Default mutate prob = 0.25.
    Interface:
        __init__, is_trained_enough, snapshot, mutate, get_job
    Property:
        race, payoff, checkpoint_path, player_id, train_iteration
    """
    _name = "LeagueExploiter"

    def __init__(self, *args, **kwargs) -> None:
        """
        Overview:
            Initialize ``mutate_prob`` additionally
        Note:
            - mutate_prob (:obj:`float`): the mutation probability of league exploiter. should be in [0, 1]
        """
        super(LeagueExploiter, self).__init__(*args, **kwargs)
        assert 0 <= self._cfg.mutate_prob <= 1
        self.mutate_prob = self._cfg.mutate_prob

    def _pfsp_branch(self) -> HistoricalPlayer:
        """
        Overview:
            Select prioritized fictitious self-play opponent
        Returns:
            - player (:obj:`HistoricalPlayer`): the selected historical player
        Note:
            This branch is the same as the pfsp branch in MainPlayer
        """
        historical = self._get_players(lambda p: isinstance(p, HistoricalPlayer))
        win_rates = self._payoff[self, historical]
        p = pfsp(win_rates, weighting='squared')
        return self._get_opponent(historical, p)

    # override
    def is_trained_enough(self) -> bool:
        # will only play against historical players
        return super().is_trained_enough(select_fn=lambda p: isinstance(p, HistoricalPlayer))

    # override
    def mutate(self, info) -> Union[str, None]:
        """
        Overview:
            League exploiter can mutate to the supervised learning player with 0.25 prob
        Returns:
            - ckpt_path (:obj:`Union[str, None]`): with ``mutate_prob`` prob returns the pretrained model's ckpt path, \
                with left 1 - ``mutate_prob`` prob returns None, which means no mutation
        """
        p = np.random.uniform()
        if p < self.mutate_prob:
            return info['reset_checkpoint_path']
        return None
```