References:

RLChina Reinforcement Learning Summer School, Wang Jun (汪军), in Chinese, 机器之心 (xiaoe-tech.com)
Slides (ppt): https://pan.baidu.com/s/1SRkXwM6m7okeydlVeTnZFQ extraction code: wm95
Baidu Netdisk (baidu.com)

Machine learning and deep learning basics

regression

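One-to-many: image captioning. Many-to-one: sequential recommendation. Many-to-many: machine translation.
LSTM: the core idea is to decide which information to forget and which information to add.
GRU simplifies LSTM: update gate + reset gate + current memory content + final memory at the current time step.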
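
The four GRU pieces listed above can be sketched as a single time step in NumPy. This is a minimal sketch, not the lecture's code: the weight names (`Wz`, `Uz`, ...), the omission of bias terms, and the convention that the update gate `z` weights the new candidate are all assumptions; some references swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step (biases omitted); returns the new hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h_prev)                # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)                # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))    # current memory content
    return (1.0 - z) * h_prev + z * h_tilde          # final memory at this step

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
params = [rng.normal(scale=0.5, size=(hidden_dim, d))
          for d in (input_dim, hidden_dim) * 3]

h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):   # a sequence of 5 inputs
    h = gru_step(x, h, params)
```

Because the final memory is a convex combination of the previous state and a tanh output, every entry of `h` stays strictly inside (-1, 1).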

0th-order methods: query the function value only
- gridding: try every combination
- sampling (Metropolis-Hastings, evolutionary algorithms, etc.)
1st-order methods: use gradient information. Faster. Backprop.
higher-order methods
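
To make the 0th-order versus 1st-order contrast concrete, here is a small sketch on a toy quadratic. The objective `f`, the grid bounds, the learning rate, and the step count are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def f(x):
    # Toy objective with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])

def grid_search(lo=-3.0, hi=3.0, n=13):
    """0th order: evaluate f at every grid combination, keep the best."""
    axis = np.linspace(lo, hi, n)
    best_x, best_val, n_queries = None, np.inf, 0
    for a in axis:
        for b in axis:
            val = f(np.array([a, b]))
            n_queries += 1
            if val < best_val:
                best_x, best_val = np.array([a, b]), val
    return best_x, n_queries

def gradient_descent(x0, lr=0.25, steps=30):
    """1st order: follow the negative gradient."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

grid_x, grid_queries = grid_search()
gd_x = gradient_descent([0.0, 0.0])
```

Gridding needs n^d function queries (169 here for d = 2), while gradient descent uses 30 gradient evaluations regardless of how fine a grid would have to be.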

Graphical model: two representations. Which two methods are used to solve the inference problem?

Graphical model describes structures (sparsity,independence, partition) in joint distributions
DAG Directed acyclic graph
Undirected graph: independence means there is no connecting term between the two variables
Directed acyclic graph vs Undirected graph
- They do not describe the same set of independent relations
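Document M: infer which topics are present from the observed words.
The prior and the posterior are relatively easy to sample from, but sampling the joint directly may not be.
- Infer latent variables: MCMC algorithm
  - MCMC algorithm: Gibbs sampling. The convergence rate can be poor, and the chain can wander into low-probability regions.
- Infer latent variables: variational inference
  - argmax: find the distribution most likely to have generated the topics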
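
As a toy illustration of Gibbs sampling (not the topic model from the lecture), the sketch below samples a standard bivariate Gaussian with correlation `rho` by alternating draws from the two conditionals. The target distribution, sample counts, and burn-in length are assumptions chosen so the result is easy to check.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=20000, burn_in=1000, seed=0):
    """Gibbs sampler for a zero-mean, unit-variance bivariate Gaussian.

    Each conditional is Gaussian: x | y ~ N(rho * y, 1 - rho^2), and
    symmetrically for y | x.
    """
    rng = np.random.default_rng(seed)
    cond_std = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    samples = []
    for i in range(n_samples + burn_in):
        x = rng.normal(rho * y, cond_std)   # resample x given y
        y = rng.normal(rho * x, cond_std)   # resample y given the new x
        if i >= burn_in:
            samples.append((x, y))
    return np.array(samples)

samples = gibbs_bivariate_normal(rho=0.8)
empirical_rho = np.corrcoef(samples.T)[0, 1]
```

The closer `rho` is to 1, the smaller each conditional step becomes and the slower the chain mixes, which is one concrete form of the poor convergence noted above.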

Gaussian process allows information propagation
- Instead of gridding over all points, query the most promising point under the posterior.
Intuition
Bayesian Optimization: Acquisition functions
The subprocedure is itself an optimization problem: one optimization must be solved per step. This is usually done with gradient descent and repeated initialization. It cannot reach the global optimum, only an approximate one.
Theoretical aspect
- Given the existence of a subprocedure, the complexity of Bayesian optimization is two fold:
  - Sample complexity (number of queries)
  - Computation complexity (number of queries x subprocedure cost)
- The sample complexity suffers from the curse of dimensionality in the worst case.
- Bayesian optimization is a popular method, but its theoretical advantage still remains to be explained.
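
The loop described above (fit a Gaussian process, solve a subprocedure to pick the next query, evaluate, repeat) can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration: the RBF kernel and its length scale, the upper-confidence-bound rule `mean + beta * std`, the toy objective, and the use of random candidate points as a cheap stand-in for gradient descent with repeated initialization.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    solve = np.linalg.solve(K, K_s)             # K^{-1} K_s
    mean = solve.T @ y_train
    var = 1.0 - np.sum(K_s * solve, axis=0)     # RBF prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def objective(x):
    return -(x - 0.3) ** 2                      # toy function, maximum at 0.3

def bayes_opt(n_iters=25, n_candidates=200, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, size=2)     # two initial queries
    y_train = objective(x_train)
    for _ in range(n_iters):
        # Subprocedure: approximately maximize the rule over random
        # candidates (only an approximate optimum, as noted above).
        candidates = rng.uniform(0.0, 1.0, size=n_candidates)
        mean, std = gp_posterior(x_train, y_train, candidates)
        x_next = candidates[np.argmax(mean + beta * std)]
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, objective(x_next))
    return x_train[np.argmax(y_train)], len(x_train)

best_x, n_evals = bayes_opt()
```

Here the sample complexity is `n_evals` objective queries, while the computation cost is roughly `n_iters` times the cost of one GP fit plus one pass over the candidates.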

elements of game
rationality of players: self-interested, utility, objective
pure strategy and mixed strategy. Mixed strategy: choose among the actions with certain probabilities
classic games.
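- zero-sum game: the players are in pure competition
- cooperative game: the players cooperate
- coordination game: with multiple Nash equilibria and several ways to coordinate, how do the players avoid colliding?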
- social dilemma
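
The classic games above differ in how many pure-strategy Nash equilibria they have. The sketch below enumerates them for a two-player matrix game by checking that neither player gains from a unilateral deviation. The payoff matrices are standard textbook examples chosen for illustration, not taken from the slides.

```python
import numpy as np

def pure_nash_equilibria(payoff_row, payoff_col):
    """Return all (row_action, col_action) pairs that are pure Nash equilibria."""
    payoff_row = np.asarray(payoff_row)
    payoff_col = np.asarray(payoff_col)
    equilibria = []
    for i in range(payoff_row.shape[0]):
        for j in range(payoff_row.shape[1]):
            # Row player cannot do better by changing i while j is fixed.
            row_best = payoff_row[i, j] >= payoff_row[:, j].max()
            # Column player cannot do better by changing j while i is fixed.
            col_best = payoff_col[i, j] >= payoff_col[i, :].max()
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# Coordination game: both players want to pick the same action.
coordination = pure_nash_equilibria([[1, 0], [0, 1]], [[1, 0], [0, 1]])
# Zero-sum matching pennies: no pure equilibrium, only a mixed one.
matching_pennies = pure_nash_equilibria([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
# Social dilemma (prisoner's dilemma): action 0 = cooperate, 1 = defect.
prisoners_dilemma = pure_nash_equilibria([[-1, -3], [0, -2]], [[-1, 0], [-3, -2]])
```

The coordination game yields two equilibria, which is exactly the selection problem noted above; matching pennies yields none in pure strategies; the prisoner's dilemma yields mutual defection only.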

game tree. strategy space.
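Imperfect information: some past actions are not observed by the other players.
Markov game (stochastic game). Differences from an extensive-form game: states can be revisited, and the reward function yields a reward in every state, not only at the terminal node. A behaviour strategy specifies what to do in the current state. The game is stochastic and can cycle.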
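
A tiny simulation makes the Markov game properties concrete: rewards arrive at every step, transitions are random, and states are revisited. The two-state dynamics, the zero-sum rewards, and the uniform behaviour strategies below are hypothetical choices made only for this sketch.

```python
import numpy as np

N_STATES, N_ACTIONS = 2, 2

def step(state, a1, a2, rng):
    """Hypothetical dynamics: a reward in every state, stochastic transition."""
    reward1 = 1.0 if a1 == a2 else -1.0      # player 1 wants to match
    reward2 = -reward1                       # zero-sum
    p_stay = 0.8 if a1 == a2 else 0.3        # matching makes staying likelier
    next_state = state if rng.random() < p_stay else 1 - state
    return next_state, reward1, reward2

def rollout(policy1, policy2, horizon=50, seed=0):
    """Each policy is a behaviour strategy: policy[state] is a distribution over actions."""
    rng = np.random.default_rng(seed)
    state = 0
    visits = np.zeros(N_STATES, dtype=int)
    total1, total2 = 0.0, 0.0
    for _ in range(horizon):
        visits[state] += 1
        a1 = rng.choice(N_ACTIONS, p=policy1[state])
        a2 = rng.choice(N_ACTIONS, p=policy2[state])
        state, r1, r2 = step(state, a1, a2, rng)
        total1 += r1
        total2 += r2
    return visits, total1, total2

uniform = np.full((N_STATES, N_ACTIONS), 0.5)
visits, total1, total2 = rollout(uniform, uniform)
```

With 50 steps and only two states, at least one state must be visited many times, which is the cycling that a game tree cannot represent.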
Summary of Strategy Representation

auction. uncertainty of private value. Players don’t know the exact payoff matrix of the game
Incomplete information, e.g. auctions, Mahjong, Werewolves of Miller’s Hollow (狼人杀)
Bayesian game: build a probability distribution over the possible kinds of opponent (the player type space)
Dynamic Bayesian game: dynamic, played over multiple rounds
summary
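Introduce a "God" player who deals the cards. (Imperfect information: the players do not know which action God took, e.g. in StarCraft the spawn location is unknown.)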
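
In a Bayesian game, a player weighs each action by the probability of each opponent type. The sketch below computes player 1's expected payoff and best response under a prior over types. The two types, their fixed actions, the prior, and the payoff matrix are hypothetical values chosen for illustration.

```python
import numpy as np

def best_response_to_types(type_prior, opponent_action_by_type, payoff_by_type):
    """Best response of player 1 against an opponent of uncertain type.

    payoff_by_type[t][a1][a2] is player 1's payoff when the opponent has
    type t, player 1 plays a1 and the opponent plays a2.
    """
    n_actions = len(payoff_by_type[0])
    expected = np.zeros(n_actions)
    for t, prob in enumerate(type_prior):
        a2 = opponent_action_by_type[t]      # each type plays a fixed action
        for a1 in range(n_actions):
            expected[a1] += prob * payoff_by_type[t][a1][a2]
    return int(np.argmax(expected)), expected

# Hypothetical example: type 0 (prior 0.7) plays action 0,
# type 1 (prior 0.3) plays action 1; same payoff matrix for both types.
payoff = [[2.0, 0.0], [0.0, 1.0]]
best_action, expected = best_response_to_types(
    type_prior=[0.7, 0.3],
    opponent_action_by_type=[0, 1],
    payoff_by_type=[payoff, payoff],
)
```

Here action 0 has expected payoff 0.7 * 2 = 1.4 and action 1 has 0.3 * 1 = 0.3, so player 1 plays action 0 even though it does not know which type it is facing.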

Mechanism Design

Complexity of Equilibrium Computation