强化学习(一)
参考:
RLChina 强化学习暑期课 汪军 中文 机器之心 (xiaoe-tech.com)
ppt https://pan.baidu.com/s/1SRkXwM6m7okeydlVeTnZFQ 提取码: wm95
百度网盘 (baidu.com)
机器学习和深度学习基础
regression
nearset neighbor classification
. KNN (投票 可根据邻居距离加权重)logistic regression
MLP and backpropagation
- 这里需要再整理
RNN
一对多:image captioning. 多对一:sequential recommendation. 多对多:machine translation
LSTM: 核心,决定哪些信息遗忘,哪些信息加入
GRU 简化 LSTM。 更新门+reset gate+ current memory content + final memory at current time step
机器学习中的优化理论和方法
- Generalization 泛化问题
- Expressivity 表达性
- Optimization 优化 如何挑到更好的/怎么更快的挑选他
Continuous Optimization problem
0th order –queries function value only
- gridding 所有的排列组合都试一遍
- sampling (Metropolis Hastings, evolutionary algorithm,etc)
1storder method –uses gradient information. faster. backprop.
higer order
Convergence analysis: Gradient Methods
Oracle Complexity
key components
- Oracle definition
- Oracle properties
- Function class
- Optimality measure
- Worst case performance
Variance reduction algorithms
理论框架假设和实际有偏差,导致理论可行但实际效果不佳。一个研究方向
complexity 注意有哪些假设
Graphical Model & Bayesian Inference
Graphical model 两种表达形式。inference 问题用哪两种解决。
Graphical model describes structures (sparsity,independence, partition) in joint distributions
DAG Directed acyclic graph
Undirected graph: Independence 两个变量没有连通项
Directed acyclic graph vs Undirected graph
- They do not describe the same set of independent relations
document M, 通过 observed word 推断哪些 topic
prior 和 posterior 会比较好 sample 但直接 joint 可能不一定Infer latent variables: MCMC Algorithm
- MCMC algorithm: Gibbs sampling. converged rate 会比较差,走到概率分布比较小的区域
Infer latent variables: Variational Inference
- argmax,找到最可能生成 topic 的 distribution
Bayesian Optimization
[http://www.it.uu.se/edu/course/homepage/apml/lectures/lecture 7_handout.pdf](http://www.it.uu.se/edu/course/homepage/apml/lectures/lecture 7_handout.pdf)
Gaussian process allows information propagation
- 与其 gridding 采取所有的点,可以采取 posterior 里最有希望的点。
Intuition
Bayesian Optimization: Activation functions
subprocedure 还是一个优化问题. need to solve an optimization per step. This is usually done with gradient descent and repeatedinitialization. 不能全局最优,只能 approximately
Theoretical aspect
- Given the existence of a subprocedure, the complexity of Bayesian optimization is two fold:
- Sample complexity (number of queries)
- Computation complexity (number of queries x subprocedure cost)
- The sample complexity suffers curse of dimension in theworst case.
- Bayesian optimization is a popular method, but itstheoretical advantage still remains to be explained
- Given the existence of a subprocedure, the complexity of Bayesian optimization is two fold:
博弈论——博弈、均衡、均衡
Motivation and Normal-form Game
elements of game
rationality of players: self-interested, utility, objective
pure strategy and mixed strategy. 混合策略:以一定的概率选择另一个
classic games.
- zero-sum 零和博弈 完全处于竞争关系
- cooperative game 合作关系
- coordination game.多个纳什均衡 多个协同选择时候 如何避免相撞
- social dilemma
Extensive-form Game and Imperfect Information
- game tree. strategy space.
- imperfect information. 一些历史动作不被其他玩家看到。
- Markov Game or Stochastic Game. 和 extensive-form game 区别:状态可重复到达,reward function 每一个状态下都可以得到 不一定要终点。behaviour strategy 考虑当前状态下怎么做。具有随机性,可循环 cycle
- Summary of Strategy Representation
Bayesian Game and Incomplete Information
auction. uncertainty of private value. Players don’t know the exact payoff matrix of the game
incomplete information. eg: auction, Mahjong, Werewolves of Miller’s Hollow 狼人杀
Bayesian Game: 建立概率分布,知道对面有几种可能性(player type space)
Dynamic Baysian Game: 动态多回合
summary
引入上帝的角色,发牌。(非完美,不知道上帝做了什么动作,StarCraft 不知道出生点)
NashEquilibriumand Variants
Best Response 最佳应对策略
Dominant Strategy 不管对手做什么 我这个策略都是最好的
Nash Equilibrium 稳定点 best response。只考虑一个玩家发生变化,不考虑多个同时变化的情况。
Pareto Optimality 找不到任何一个其他点使所有人变更差。相对较好的点。
Mixed Strategy Nash Equilibrium.
incredible threat.
Perfect Bayesian Equilibrium (PBE)
summary
Repeated Game and Learning Methods
- repeated game重复博弈,观察对方行为,我也改变。
- memory 学习问题。
- Tif-for-tat,以牙还牙,对方作什么我就作什么
- Win-stay, lose-shift 不管合作还是对抗,结果是好的我就继续这个策略,结果不好我就换。
- folk theorem
- fictitious play 记录对手动作,认为她下一回合出的动作就是历史出过动作的分布。convergence
- no-regret learning 算会后悔的值
Alternate Solution Concepts and Evolutionary GameTheory
- Stackelberg Equilibrium 博弈有先后,分了先后可能对两个都好
- Correlated Equilibrium 协同问题。概率分配在所有可能的上。引入公共信号,sample出结果,部分告诉玩家。
- Evolutional Game Theory 均衡是稳定点,但不一定出稳定点,或者说有多个稳定点的时候。演化问题,改变的过程。
博弈论——机制设计和博弈复杂度
Mechanism Design
- goal – good performance guarantees
Complexity of Equilibrium Computation