A time-killing memo, written in the precious spare time left between work and chasing course credits

How it started
What is Otrio?
Writing some simple code
Build an Otrio AI!
- The code
- A rough explanation

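How it started

The other day a friend bought this game, and I played it at his place.

The result: one loss and two draws. I lost through my own carelessness. After three games I was bored.

Rude as it may be, I couldn't help thinking: this game is shallow.

I figured there had to be a winning strategy, so I googled it. Nothing. At least not in Japanese.

The board game never really caught on in Japan in the first place. So I decided to work out a winning strategy myself.

Or rather, to have one worked out. By an AI.

So I'm going to try some machine learning on Google Colaboratory.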

colab.research.google.com

What is Otrio?

youtu.be

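A quick summary of the game (two-player version):

Put simply, it's an evolved version of tic-tac-toe.

Put in harder terms, it's a three-dimensional three-in-a-row in which each player uses two colors and the number of pieces is limited.

Put in machine-friendly terms: on the 27 cells from (0,0,0) to (2,2,2), players take turns placing pieces of two colors, with a limit on z only: each color has at most three pieces for each value of z. Whoever occupies three cells that lie on one straight line wins.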
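The machine-friendly description translates fairly directly into code. Below is a minimal sketch of just the piece limit, assuming the board is kept as a dict from (x, y, z) to a color name; `legal_moves`, `PIECES_PER_SIZE`, and the color names are hypothetical and not part of the code later in this post.

```python
from collections import Counter
from itertools import product

PIECES_PER_SIZE = 3  # assumption: each color owns 3 pieces of each size (z)


def legal_moves(board, color):
    """Return the empty cells where `color` may still place a piece.

    board: dict mapping (x, y, z) -> color for occupied cells (hypothetical format).
    """
    # Count how many pieces of each size (z) this color has already placed
    used = Counter(z for (x, y, z), c in board.items() if c == color)
    return [
        (x, y, z)
        for x, y, z in product(range(3), repeat=3)
        if (x, y, z) not in board and used[z] < PIECES_PER_SIZE
    ]


# Example: red has used all three z=0 pieces, so no z=0 cell is legal for red
board = {(0, 0, 0): "red", (1, 1, 0): "red", (2, 2, 0): "red", (0, 1, 1): "blue"}
red_moves = legal_moves(board, "red")
blue_moves = legal_moves(board, "blue")
print(len(red_moves), len(blue_moves))  # 17 23
```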

Writing some simple code

```python
import random


class TicTacToe3D:
    """3x3x3 board indexed as board[x][y][z], where z is the piece size."""

    PIECES_PER_SIZE = 3  # each color may place at most 3 pieces at each z

    def __init__(self):
        # Create the 3x3x3 board (empty cells are None)
        self.board = [[[None for _ in range(3)] for _ in range(3)] for _ in range(3)]
        self.players = ['X', 'O', 'A', 'B']
        # Pieces placed so far, per color and per z
        self.moves_count = {p: {z: 0 for z in range(3)} for p in self.players}
        self.current_player_index = 0

    def print_board(self):
        # Print the board one x-layer at a time, then a separator line
        for layer in self.board:
            for row in layer:
                print(' '.join(['.' if cell is None else cell for cell in row]))
            print()
        print('-' * 15)

    def is_valid_move(self, x, y, z):
        # The cell must be on the board and empty, and the current color
        # must not have used up its 3 pieces at this z
        player = self.players[self.current_player_index]
        return (0 <= x < 3 and 0 <= y < 3 and 0 <= z < 3
                and self.board[x][y][z] is None
                and self.moves_count[player][z] < self.PIECES_PER_SIZE)

    def make_move(self, x, y, z):
        if self.is_valid_move(x, y, z):
            player = self.players[self.current_player_index]
            self.board[x][y][z] = player
            self.moves_count[player][z] += 1
            self.current_player_index = (self.current_player_index + 1) % len(self.players)
            return True
        return False

    def check_winner(self):
        # Win conditions: all 49 straight lines through the cube
        winning_conditions = [
            [(0, 0, 0), (0, 0, 1), (0, 0, 2)],
            [(0, 0, 0), (0, 1, 0), (0, 2, 0)],
            [(0, 0, 0), (0, 1, 1), (0, 2, 2)],
            [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
            [(0, 0, 0), (1, 0, 1), (2, 0, 2)],
            [(0, 0, 0), (1, 1, 0), (2, 2, 0)],
            [(0, 0, 0), (1, 1, 1), (2, 2, 2)],
            [(0, 0, 1), (0, 1, 1), (0, 2, 1)],
            [(0, 0, 1), (1, 0, 1), (2, 0, 1)],
            [(0, 0, 1), (1, 1, 1), (2, 2, 1)],
            [(0, 0, 2), (0, 1, 1), (0, 2, 0)],
            [(0, 0, 2), (0, 1, 2), (0, 2, 2)],
            [(0, 0, 2), (1, 0, 1), (2, 0, 0)],
            [(0, 0, 2), (1, 0, 2), (2, 0, 2)],
            [(0, 0, 2), (1, 1, 1), (2, 2, 0)],
            [(0, 0, 2), (1, 1, 2), (2, 2, 2)],
            [(0, 1, 0), (0, 1, 1), (0, 1, 2)],
            [(0, 1, 0), (1, 1, 0), (2, 1, 0)],
            [(0, 1, 0), (1, 1, 1), (2, 1, 2)],
            [(0, 1, 1), (1, 1, 1), (2, 1, 1)],
            [(0, 1, 2), (1, 1, 1), (2, 1, 0)],
            [(0, 1, 2), (1, 1, 2), (2, 1, 2)],
            [(0, 2, 0), (0, 2, 1), (0, 2, 2)],
            [(0, 2, 0), (1, 1, 0), (2, 0, 0)],
            [(0, 2, 0), (1, 1, 1), (2, 0, 2)],
            [(0, 2, 0), (1, 2, 0), (2, 2, 0)],
            [(0, 2, 0), (1, 2, 1), (2, 2, 2)],
            [(0, 2, 1), (1, 1, 1), (2, 0, 1)],
            [(0, 2, 1), (1, 2, 1), (2, 2, 1)],
            [(0, 2, 2), (1, 1, 1), (2, 0, 0)],
            [(0, 2, 2), (1, 1, 2), (2, 0, 2)],
            [(0, 2, 2), (1, 2, 1), (2, 2, 0)],
            [(0, 2, 2), (1, 2, 2), (2, 2, 2)],
            [(1, 0, 0), (1, 0, 1), (1, 0, 2)],
            [(1, 0, 0), (1, 1, 0), (1, 2, 0)],
            [(1, 0, 0), (1, 1, 1), (1, 2, 2)],
            [(1, 0, 1), (1, 1, 1), (1, 2, 1)],
            [(1, 0, 2), (1, 1, 1), (1, 2, 0)],
            [(1, 0, 2), (1, 1, 2), (1, 2, 2)],
            [(1, 1, 0), (1, 1, 1), (1, 1, 2)],
            [(1, 2, 0), (1, 2, 1), (1, 2, 2)],
            [(2, 0, 0), (2, 0, 1), (2, 0, 2)],
            [(2, 0, 0), (2, 1, 0), (2, 2, 0)],
            [(2, 0, 0), (2, 1, 1), (2, 2, 2)],
            [(2, 0, 1), (2, 1, 1), (2, 2, 1)],
            [(2, 0, 2), (2, 1, 1), (2, 2, 0)],
            [(2, 0, 2), (2, 1, 2), (2, 2, 2)],
            [(2, 1, 0), (2, 1, 1), (2, 1, 2)],
            [(2, 2, 0), (2, 2, 1), (2, 2, 2)],
        ]
        for condition in winning_conditions:
            if self.check_line(condition):
                return self.board[condition[0][0]][condition[0][1]][condition[0][2]]
        return None

    def check_line(self, line):
        # True if every cell on the line belongs to the same player
        first_cell = self.board[line[0][0]][line[0][1]][line[0][2]]
        if first_cell is None:
            return False
        return all(self.board[x][y][z] == first_cell for x, y, z in line)

    def is_game_over(self):
        # The game is over once every cell is filled
        return all(cell is not None for layer in self.board for row in layer for cell in row)


class RandomAgent:
    def __init__(self, symbol):
        self.symbol = symbol

    def select_move(self, game):
        # Collect every legal move for the current player and pick one at random
        available_moves = []
        for x in range(3):
            for y in range(3):
                for z in range(3):
                    if game.is_valid_move(x, y, z):
                        available_moves.append((x, y, z))
        return random.choice(available_moves) if available_moves else None


def play_game(players):
    game = TicTacToe3D()
    while not game.is_game_over():
        current_player = players[game.current_player_index]
        move = current_player.select_move(game)
        if move is None:
            print(f"No more valid moves for player {current_player.symbol}. It's a draw!")
            break
        game.make_move(*move)
        game.print_board()
        winner = game.check_winner()
        if winner:
            print(f"Player {winner} wins!")
            break


# Play one game with four random agents
players = [RandomAgent(symbol) for symbol in ['X', 'O', 'A', 'B']]
play_game(players)
```

For now, I wrote some simple code.

Writing out the win conditions like this is clumsy, but with only 49 of them, I suspect listing them all is quicker.
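Listing the lines by hand works, but they can also be generated: every winning line is a start cell plus a direction with components in {-1, 0, 1}, stepped twice without leaving the board. A sketch (`winning_lines` is a hypothetical helper, not part of the code above); for a 3x3x3 board it finds 49 lines, the same count as the hand-written list.

```python
from itertools import product


def winning_lines(n=3):
    """Return every straight line of n cells in an n x n x n cube."""
    lines = set()
    # A direction is any vector with components in {-1, 0, 1} except (0, 0, 0)
    directions = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    for start in product(range(n), repeat=3):
        for dx, dy, dz in directions:
            cells = [(start[0] + i * dx, start[1] + i * dy, start[2] + i * dz) for i in range(n)]
            if all(0 <= c < n for cell in cells for c in cell):
                # Sort the cells so a line and its reverse are stored only once
                lines.add(tuple(sorted(cells)))
    return sorted(lines)


lines = winning_lines()
print(len(lines))  # 49
```

The count also follows from a formula for an n x n x n cube: ((n + 2)^3 - n^3) / 2, which is (125 - 27) / 2 = 49 for n = 3.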

I reproduced Otrio under these conditions:

- a four-player game between machines
- everyone places pieces at random
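A single game says little about how random play usually turns out, so one option is to play many games and tally the winners. A minimal sketch, assuming the three-pieces-per-size limit from the rules and that a color with no legal move ends the game as a draw (the same choice play_game makes); `play_random_game`, the seed, and the game count are hypothetical.

```python
import random
from collections import Counter
from itertools import product

COLORS = ["X", "O", "A", "B"]  # four random players, as in play_game
CELLS = list(product(range(3), repeat=3))
# All 49 straight lines: a start cell plus a direction whose end stays on the board
LINES = {
    tuple(sorted((x + i * dx, y + i * dy, z + i * dz) for i in range(3)))
    for (x, y, z) in CELLS
    for (dx, dy, dz) in product((-1, 0, 1), repeat=3)
    if (dx, dy, dz) != (0, 0, 0)
    and all(0 <= v + 2 * d < 3 for v, d in ((x, dx), (y, dy), (z, dz)))
}


def play_random_game(rng, limit=3):
    """Play one random game; return the winning color or 'draw'."""
    board = {}  # (x, y, z) -> color
    used = {c: Counter() for c in COLORS}  # pieces used per color and per z
    turn = 0
    while len(board) < len(CELLS):
        color = COLORS[turn % len(COLORS)]
        moves = [c for c in CELLS if c not in board and used[color][c[2]] < limit]
        if not moves:
            return "draw"  # no legal move ends the game, as in play_game
        cell = rng.choice(moves)
        board[cell] = color
        used[color][cell[2]] += 1
        # Only lines through the new piece can have just been completed
        if any(cell in line and all(board.get(p) == color for p in line) for line in LINES):
            return color
        turn += 1
    return "draw"


rng = random.Random(0)
results = Counter(play_random_game(rng) for _ in range(1000))
print(results)
```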

```
. . .
. . .
. . .

. . .
X . .
. . .

. . .
. . .
. . .

-----
```

That is how the first move is displayed.

Keep going, and you end up with:

```
O B A
B X X
O X B

X O B
X A B
O X O

A B .
A O A
O A X

-----
Player O wins!
```

And so O won.

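Build an Otrio AI!

By having an AI learn from these games, I'll create the strongest possible AI.

Once that AI plays only fixed moves and never loses, those moves are the winning strategy.

The code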

```python
class TicTacToe3D:
    def __init__(self):
        self.board = [[[None for _ in range(3)] for _ in range(3)] for _ in range(3)]
        self.current_player = 'X'
        # Win conditions: all 49 straight lines through the cube
        self.winning_conditions = [
            [(0, 0, 0), (0, 0, 1), (0, 0, 2)],
            [(0, 0, 0), (0, 1, 0), (0, 2, 0)],
            [(0, 0, 0), (0, 1, 1), (0, 2, 2)],
            [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
            [(0, 0, 0), (1, 0, 1), (2, 0, 2)],
            [(0, 0, 0), (1, 1, 0), (2, 2, 0)],
            [(0, 0, 0), (1, 1, 1), (2, 2, 2)],
            [(0, 0, 1), (0, 1, 1), (0, 2, 1)],
            [(0, 0, 1), (1, 0, 1), (2, 0, 1)],
            [(0, 0, 1), (1, 1, 1), (2, 2, 1)],
            [(0, 0, 2), (0, 1, 1), (0, 2, 0)],
            [(0, 0, 2), (0, 1, 2), (0, 2, 2)],
            [(0, 0, 2), (1, 0, 1), (2, 0, 0)],
            [(0, 0, 2), (1, 0, 2), (2, 0, 2)],
            [(0, 0, 2), (1, 1, 1), (2, 2, 0)],
            [(0, 0, 2), (1, 1, 2), (2, 2, 2)],
            [(0, 1, 0), (0, 1, 1), (0, 1, 2)],
            [(0, 1, 0), (1, 1, 0), (2, 1, 0)],
            [(0, 1, 0), (1, 1, 1), (2, 1, 2)],
            [(0, 1, 1), (1, 1, 1), (2, 1, 1)],
            [(0, 1, 2), (1, 1, 1), (2, 1, 0)],
            [(0, 1, 2), (1, 1, 2), (2, 1, 2)],
            [(0, 2, 0), (0, 2, 1), (0, 2, 2)],
            [(0, 2, 0), (1, 1, 0), (2, 0, 0)],
            [(0, 2, 0), (1, 1, 1), (2, 0, 2)],
            [(0, 2, 0), (1, 2, 0), (2, 2, 0)],
            [(0, 2, 0), (1, 2, 1), (2, 2, 2)],
            [(0, 2, 1), (1, 1, 1), (2, 0, 1)],
            [(0, 2, 1), (1, 2, 1), (2, 2, 1)],
            [(0, 2, 2), (1, 1, 1), (2, 0, 0)],
            [(0, 2, 2), (1, 1, 2), (2, 0, 2)],
            [(0, 2, 2), (1, 2, 1), (2, 2, 0)],
            [(0, 2, 2), (1, 2, 2), (2, 2, 2)],
            [(1, 0, 0), (1, 0, 1), (1, 0, 2)],
            [(1, 0, 0), (1, 1, 0), (1, 2, 0)],
            [(1, 0, 0), (1, 1, 1), (1, 2, 2)],
            [(1, 0, 1), (1, 1, 1), (1, 2, 1)],
            [(1, 0, 2), (1, 1, 1), (1, 2, 0)],
            [(1, 0, 2), (1, 1, 2), (1, 2, 2)],
            [(1, 1, 0), (1, 1, 1), (1, 1, 2)],
            [(1, 2, 0), (1, 2, 1), (1, 2, 2)],
            [(2, 0, 0), (2, 0, 1), (2, 0, 2)],
            [(2, 0, 0), (2, 1, 0), (2, 2, 0)],
            [(2, 0, 0), (2, 1, 1), (2, 2, 2)],
            [(2, 0, 1), (2, 1, 1), (2, 2, 1)],
            [(2, 0, 2), (2, 1, 1), (2, 2, 0)],
            [(2, 0, 2), (2, 1, 2), (2, 2, 2)],
            [(2, 1, 0), (2, 1, 1), (2, 1, 2)],
            [(2, 2, 0), (2, 2, 1), (2, 2, 2)],
        ]

    def is_valid_move(self, x, y, z):
        return 0 <= x < 3 and 0 <= y < 3 and 0 <= z < 3 and self.board[x][y][z] is None

    def make_move(self, x, y, z):
        # Place the current player's mark, then hand the turn to the other player
        if self.is_valid_move(x, y, z):
            self.board[x][y][z] = self.current_player
            self.current_player = 'O' if self.current_player == 'X' else 'X'
            return True
        return False

    def check_winner(self):
        for condition in self.winning_conditions:
            if self.check_line(condition):
                return self.board[condition[0][0]][condition[0][1]][condition[0][2]]
        return None

    def check_line(self, line):
        # True if every cell on the line holds the same mark
        first_cell = self.board[line[0][0]][line[0][1]][line[0][2]]
        if first_cell is None:
            return False
        return all(self.board[x][y][z] == first_cell for x, y, z in line)

    def print_board(self):
        for layer in self.board:
            for row in layer:
                print(' '.join(['.' if cell is None else cell for cell in row]))
            print()
```

↑ Setting up the game rules: the board, what counts as a win, and so on.

```python
import gym
from gym import spaces
import numpy as np


class TicTacToe3DEnv(gym.Env):
    def __init__(self):
        super(TicTacToe3DEnv, self).__init__()
        self.game = TicTacToe3D()
        # 27 actions, one per cell: action = 9*x + 3*y + z
        self.action_space = spaces.Discrete(27)
        # Cell encoding: 0 = empty, 1 = X, 2 = O
        self.observation_space = spaces.Box(low=0, high=2, shape=(3, 3, 3), dtype=int)

    def step(self, action):
        x, y, z = action // 9, (action % 9) // 3, action % 3
        # An illegal move (occupied cell) is penalized and ends the episode
        if not self.game.make_move(x, y, z):
            return self.get_state(), -20, True, {}
        winner = self.game.check_winner()
        done = winner is not None or self.is_board_full()
        reward = 5 if winner else 0
        return self.get_state(), reward, done, {}

    def reset(self):
        self.game = TicTacToe3D()
        return self.get_state()

    def render(self, mode='human'):
        self.game.print_board()

    def get_state(self):
        # Convert the board of 'X' / 'O' / None into a numeric 3x3x3 array
        state = np.zeros((3, 3, 3))
        for x in range(3):
            for y in range(3):
                for z in range(3):
                    if self.game.board[x][y][z] == 'X':
                        state[x, y, z] = 1
                    elif self.game.board[x][y][z] == 'O':
                        state[x, y, z] = 2
        return state

    def is_board_full(self):
        return all(cell is not None for layer in self.game.board for row in layer for cell in row)
```

↑ The rules wrapped as an environment: how a match is played, how the game advances, and so on.
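The environment turns one of 27 discrete actions into a cell with `action // 9, (action % 9) // 3, action % 3`, which is the same as reading the action as 9x + 3y + z. A small check that this decoding covers every cell exactly once; `decode` and `encode` are hypothetical helper names, not part of the environment.

```python
def decode(action):
    # Same arithmetic as TicTacToe3DEnv.step: action = 9*x + 3*y + z
    return action // 9, (action % 9) // 3, action % 3


def encode(x, y, z):
    return 9 * x + 3 * y + z


cells = [decode(a) for a in range(27)]
assert len(set(cells)) == 27                             # every action maps to a distinct cell
assert all(encode(*decode(a)) == a for a in range(27))   # decoding and encoding round-trip
print(decode(13))  # (1, 1, 1), the center of the cube
```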

```python
import random
from collections import deque

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam


class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)  # replay buffer
        self.gamma = 0.95                 # discount factor for future rewards
        self.epsilon = 1.0                # exploration rate (start fully random)
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Small MLP: flattened 3x3x3 board -> one Q value per action
        model = Sequential()
        model.add(Flatten(input_shape=(3, 3, 3)))
        model.add(Dense(64, activation='relu'))
        model.add(Dense(64, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        # Epsilon-greedy: a random action with probability epsilon, otherwise the best predicted one
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state, verbose=0)
        return np.argmax(act_values[0])

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            # Q-learning target: r if the episode ended, else r + gamma * max Q(next_state)
            target = reward if done else reward + self.gamma * np.amax(self.model.predict(next_state, verbose=0)[0])
            target_f = self.model.predict(state, verbose=0)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
```

↑ How the AI learns.
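`replay` pulls the network toward the usual DQN target: just the reward at a terminal step, otherwise the reward plus `gamma` times the largest predicted Q value of the next state. The same arithmetic without Keras, as a sketch in which plain lists stand in for the output of `model.predict`; `dqn_target` is a hypothetical name.

```python
def dqn_target(reward, next_q_values, done, gamma=0.95):
    """Q-learning target, as computed inside DQNAgent.replay."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Terminal step, e.g. a winning move (+5 in TicTacToe3DEnv)
print(dqn_target(5, [0.0] * 27, done=True))          # 5
# Non-terminal step: no reward yet, and the best next action is worth 2.0
print(dqn_target(0, [0.5, 2.0, -1.0], done=False))   # 1.9
```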

```python
import os
import pickle

import matplotlib.pyplot as plt
import numpy as np
from google.colab import drive

drive.mount('/content/drive')

env = TicTacToe3DEnv()
agent = DQNAgent(env.observation_space.shape[0], env.action_space.n)

# Path where the replay buffer is stored
replay_buffer_path = '/content/drive/My Drive/3Ddeta/replay_buffer.pkl'

# Load the previous replay buffer if it exists (the agent must already exist here)
if os.path.exists(replay_buffer_path):
    with open(replay_buffer_path, 'rb') as file:
        agent.memory = pickle.load(file)

episodes = 1000
episode_rewards = []  # total reward of each episode

for e in range(episodes):
    state = env.reset()
    state = np.reshape(state, [1, 3, 3, 3])
    total_reward = 0  # reset this episode's total reward
    for time in range(500):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        next_state = np.reshape(next_state, [1, 3, 3, 3])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward
        if done:
            print(f"Episode: {e+1}/{episodes}, Time: {time}, Reward: {total_reward}")
            break
    episode_rewards.append(total_reward)
    if len(agent.memory) > 32:
        agent.replay(32)

# Plot the total reward per episode
plt.plot(episode_rewards)
plt.xlabel('Episode')
plt.ylabel('Total Reward')
plt.title('Total Reward per Episode')
plt.show()

print("Total reward per episode:")
print(episode_rewards)

# Save the replay buffer (creating the folder first if needed)
os.makedirs(os.path.dirname(replay_buffer_path), exist_ok=True)
with open(replay_buffer_path, 'wb') as file:
    pickle.dump(agent.memory, file)
```

↑ Recording the rewards, and actually running the training.
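How random the agent still is during those 1000 episodes follows from the epsilon settings in DQNAgent: `replay` multiplies epsilon by 0.995 at most once per episode. A quick calculation with the same numbers (the helper `epsilon_after` is hypothetical):

```python
# Epsilon settings copied from DQNAgent.__init__
EPSILON_START, EPSILON_MIN, EPSILON_DECAY = 1.0, 0.01, 0.995


def epsilon_after(replays):
    """Epsilon after `replays` calls to DQNAgent.replay, using the same update rule."""
    eps = EPSILON_START
    for _ in range(replays):
        if eps > EPSILON_MIN:
            eps *= EPSILON_DECAY
    return eps


# Number of replay calls until epsilon stops decaying
steps_to_min = 0
eps = EPSILON_START
while eps > EPSILON_MIN:
    eps *= EPSILON_DECAY
    steps_to_min += 1

print(steps_to_min)  # 919
for n in (100, 500, 1000):
    print(n, round(epsilon_after(n), 4))
```

With one replay per episode, epsilon is still about 0.6 after 100 episodes and only reaches its floor after 919 replays, so in a single 1000-episode run most moves are random.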

A rough explanation

Running these four code cells in order makes the AI learn over 1000 games and gradually get stronger. Supposedly.

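The procedure:

1. The AI plays against a random bot.

2. The AI gets a negative reward when it loses or plays a pointless move (for example, onto an already occupied cell), and a positive reward when it wins. In the code above, an invalid move gives -20 and a win gives +5; losing is not penalized yet.

3. Repeat this 1000 times. Over these games the AI looks for patterns in when its reward turns negative and when it turns positive, and learns to play so as to collect as much reward as possible.

4. Save the results to a file. This is apparently called a replay buffer: the stored (state, action, reward, next state, done) records.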
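Comparing these steps with the code turns up a gap: in TicTacToe3DEnv as written, step() places a mark and make_move() simply switches current_player, so the same agent also picks every O move and no random bot ever plays. And because the reward is `5 if winner else 0`, completing a line for O also earns +5. Below is a minimal sketch of a step in which a random bot replies as O after each agent move, written without gym. The class name `VsRandomOtrio`, the reward values (-20 invalid, +5 win, -5 loss, 0 otherwise), and the simplified two-mark board without piece limits are assumptions for illustration.

```python
import random
from itertools import product

CELLS = list(product(range(3), repeat=3))
# All 49 straight lines: a start cell plus a direction whose end stays on the board
LINES = {
    tuple(sorted((x + i * dx, y + i * dy, z + i * dz) for i in range(3)))
    for (x, y, z) in CELLS
    for (dx, dy, dz) in product((-1, 0, 1), repeat=3)
    if (dx, dy, dz) != (0, 0, 0)
    and all(0 <= v + 2 * d < 3 for v, d in ((x, dx), (y, dy), (z, dz)))
}


class VsRandomOtrio:
    """Hypothetical env: the agent plays X, a random bot replies as O."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.board = {}

    def reset(self):
        self.board = {}  # (x, y, z) -> 'X' or 'O'
        return dict(self.board)

    def _wins(self, mark):
        return any(all(self.board.get(c) == mark for c in line) for line in LINES)

    def step(self, action):
        cell = (action // 9, (action % 9) // 3, action % 3)
        if cell in self.board:
            return dict(self.board), -20, True   # invalid move
        self.board[cell] = "X"
        if self._wins("X"):
            return dict(self.board), 5, True     # agent wins
        empty = [c for c in CELLS if c not in self.board]
        if not empty:
            return dict(self.board), 0, True     # board full: draw
        self.board[self.rng.choice(empty)] = "O"  # random bot replies
        if self._wins("O"):
            return dict(self.board), -5, True    # agent loses
        return dict(self.board), 0, len(self.board) == len(CELLS)


env = VsRandomOtrio(seed=1)
state = env.reset()
state, reward, done = env.step(13)  # agent takes the center (1, 1, 1)
print(reward, done, len(state))     # 0 False 2
```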

That's the whole flow. Running the code again loads the previous file before training starts, so the training data keeps piling up.
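What carries over between runs is only `agent.memory`: DQNAgent builds a fresh network and sets epsilon back to 1.0 every time it is created, so each run starts from an untrained, fully random agent even when the old buffer is loaded. Keeping the network as well would need something like Keras's `model.save(...)` alongside the pickle. The buffer round-trip itself can be checked with the standard library alone; a sketch using a temporary file instead of the Drive path, with a made-up placeholder transition:

```python
import os
import pickle
import tempfile
from collections import deque

memory = deque(maxlen=2000)  # same settings as DQNAgent.memory
memory.append(("state", 13, 0, "next_state", False))  # placeholder transition

path = os.path.join(tempfile.mkdtemp(), "replay_buffer.pkl")
with open(path, "wb") as f:
    pickle.dump(memory, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(len(restored), restored.maxlen)  # 1 2000
```

Because `maxlen` survives pickling, a reloaded buffer keeps dropping its oldest entries once it holds 2000 transitions.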

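Training an AI is a lot like training a dog.

When it gives you its paw, you give it a treat and praise it; when it doesn't, you scold it and there's no treat.

By repeating this, dog and AI alike build up experience and learn the patterns.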
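The dog analogy maps onto the simplest form of this idea: keep a running estimate of how good each action is and nudge it toward the reward just received. A toy sketch, not DQN itself; the two actions, the rewards (+1 for a treat, -1 for a scolding), the learning rate, and the exploration rate are all made up for illustration.

```python
import random

rng = random.Random(0)
values = {"paw": 0.0, "ignore": 0.0}    # estimated value of each action
rewards = {"paw": 1.0, "ignore": -1.0}  # treat for 'paw', scolding otherwise
alpha, epsilon = 0.1, 0.2               # learning rate, exploration rate

for _ in range(500):
    if rng.random() < epsilon:
        action = rng.choice(list(values))     # explore: try something at random
    else:
        action = max(values, key=values.get)  # exploit: do what has paid off so far
    # Move the estimate a step toward the reward just received
    values[action] += alpha * (rewards[action] - values[action])

print(values)
```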

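This is only a rough picture, so strictly speaking it may be off. I'm an amateur, after all.

P.S.

Even after running it, the AI doesn't get any stronger, so I'm starting to suspect something is wrong somewhere.

Acknowledgments

Thank you, ChatGPT.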