Self-supervised learning

BERT

BERT, Hung-yi Lee (李宏毅)

self-supervised learning is a form of unsupervised learning
masking input: replace a token with a special mask token or with a random token
next sentence prediction
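The masking step can be sketched as below. This is a minimal illustration, not BERT's actual preprocessing: the `MASK` string, the function name `mask_tokens`, and both probabilities are assumptions chosen for the example.

```python
import random

MASK = "[MASK]"  # special mask token (name assumed for illustration)

def mask_tokens(tokens, vocab, mask_prob=0.15, random_prob=0.5, seed=0):
    """Corrupt a token list for masked-token prediction.

    Each position is selected with probability mask_prob. A selected token is
    replaced by a random vocabulary token with probability random_prob,
    otherwise by the special MASK token. Returns the corrupted tokens and a
    dict mapping each corrupted position to the original token (the label).
    """
    rng = random.Random(seed)
    corrupted, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            corrupted[i] = rng.choice(vocab) if rng.random() < random_prob else MASK
    return corrupted, labels

tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
corrupted, labels = mask_tokens(tokens, vocab, mask_prob=0.5)
print(corrupted, labels)
```

The model is then trained to predict `labels[i]` at every corrupted position `i`.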
downstream tasks: after fine-tuning, BERT can solve a wide variety of tasks with only a little labeled data.
GLUE: General Language Understanding Evaluation. 9 tasks used to measure model performance
how to use BERT

case 1
1. input sequence, output class. eg: sentiment analysis. BERT is initialized from pre-training; the linear layer is randomly initialized.
case 2
1. input sequence, output a sequence of the same length as the input (one class per token). eg: POS tagging
case 3
1. input two sequences (premise and hypothesis), output a class (contradiction/entailment/neutral). eg: Natural Language Inference.
case 4
1. extraction-based question answering. input document and query, output two integers (the start and end positions of the answer in the document).
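One way to picture case 4: assume the model produces one output vector per document token, and two extra learned vectors (called `start_vec` and `end_vec` here, both hypothetical names) score every token as a candidate start or end. The sketch below only shows how the two integers are picked from those scores; it is not the full model.

```python
import numpy as np

def extract_span(doc_embeddings, start_vec, end_vec):
    """Return (s, e): indices of the answer's first and last document token."""
    start_scores = doc_embeddings @ start_vec  # one score per document token
    end_scores = doc_embeddings @ end_vec
    s = int(np.argmax(start_scores))
    # Search for the end only at or after the start so the span is valid.
    e = s + int(np.argmax(end_scores[s:]))
    return s, e

# Toy example: 5 document tokens with one-hot "embeddings".
doc = np.eye(5)
print(extract_span(doc, start_vec=doc[1], end_vec=doc[3]))
```

A softmax over the scores would give probabilities, but it does not change which index is largest, so it is left out here.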

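training BERT is challenging: the training data is very large, more than 3 billion words
BERT Embryology: when does BERT learn POS tagging, syntactic parsing, semantics?
word embedding, CBOW: blank out the middle word and predict it from its context
Words with the same meaning in different languages have embeddings that are close to each other. Training on English data and testing on Chinese data also works.
- The amount of data is the key to linking different languages: large training data.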

GPT

Predict Next Token. They can do generation. Demo – InferKit (type some text and GPT will generate more)
“Few-shot” learning, “one-shot” learning, “zero-shot” learning (no gradient descent)
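The predict-next-token loop can be illustrated with a toy stand-in. The sketch below uses a bigram count table instead of a neural network (an assumption made to keep the example tiny), but the generation loop has the same shape: predict one token, append it, repeat.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for every token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=5):
    """Greedy autoregressive generation: append the most likely next token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation: stop early
            break
        out.append(followers.most_common(1)[0][0])
    return out

corpus = "the cat sat on the mat and the cat sat down".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], max_new_tokens=2)))
```

Note that generation only reads the table; no weights are updated, which matches the "no gradient descent" remark for few-shot prompting.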

auto-encoder

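An autoencoder is an unsupervised learning algorithm for learning efficient representations of data, typically used for dimensionality reduction, feature learning, or data denoising. It has two parts: an encoder and a decoder.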

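Encoder:
- Role: the encoder maps the input data into a latent space, producing a compact representation.
- Structure: usually a neural network, either a simple feed-forward network or, for example, a convolutional network. The encoder's output is the latent representation, which carries the key information of the input.
Decoder:
- Role: the decoder maps the latent representation produced by the encoder back to the original data space.
- Structure: mirroring the encoder, the decoder is also a neural network that tries to restore the original data from the latent representation. Its output should be as close to the input as possible.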
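Training:
- The goal of an autoencoder is to minimize the reconstruction error between the input and the decoder output.
- A common loss function is mean squared error (MSE), i.e. minimizing the squared difference between the reconstructed data and the original data.
- Backpropagation with a gradient-descent optimizer adjusts the encoder and decoder weights to reduce the reconstruction error.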
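The training loop can be sketched with the smallest possible case: a linear encoder and a linear decoder trained by plain gradient descent on MSE. The toy data, the mixing matrix `A`, the learning rate, and the step count are all assumptions picked for illustration; a real autoencoder would use nonlinear layers and a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 5-D that really live on a 2-D subspace.
A = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, 0.0]])
X = rng.normal(size=(200, 2)) @ A

# Linear encoder (5 -> 2) and decoder (2 -> 5), small random initialization.
W_enc = rng.normal(scale=0.1, size=(5, 2))
W_dec = rng.normal(scale=0.1, size=(2, 5))

def mse(X, W_enc, W_dec):
    """Mean squared reconstruction error of the autoencoder on X."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

loss_before = mse(X, W_enc, W_dec)
lr = 0.05
for step in range(1000):
    Z = X @ W_enc                 # encode: latent representation
    X_hat = Z @ W_dec             # decode: reconstruction
    err = X_hat - X
    grad_out = 2.0 * err / err.size          # d(MSE)/d(X_hat)
    grad_dec = Z.T @ grad_out                # backpropagate to decoder weights
    grad_enc = X.T @ (grad_out @ W_dec.T)    # backpropagate to encoder weights
    W_dec -= lr * grad_dec                   # gradient-descent update
    W_enc -= lr * grad_enc
loss_after = mse(X, W_enc, W_dec)
print(loss_before, loss_after)
```

Because the data has only two underlying degrees of freedom, a 2-D latent code is enough for the reconstruction error to shrink substantially.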
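Applications:
- Dimensionality reduction: an autoencoder maps high-dimensional data into a latent space, extracting the most important features.
- Feature learning: the learned latent representation can serve as features for supervised learning tasks.
- Denoising: by adding noise to the input during training, an autoencoder learns a representation that is robust to noise, so the decoder output has the noise removed.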
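Variants:
- Sparse autoencoder: forces the encoder to produce sparse representations, which helps capture the structure of the data.
- Variational autoencoder (VAE): introduces a probability distribution over the latent space, making the latent representation more continuous and interpretable.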

auto-encoder, Hung-yi Lee (李宏毅)

basic idea

compress an image and then reconstruct it
dimension reduction: map high-dimensional input to a low-dimensional representation
- More dimension reduction (not based on deep learning): PCA https://youtu.be/iwh5o_M4BNU, t-SNE https://youtu.be/GBUEjkpoxXc
de-noising auto-encoder

feature disentanglement

application: voice conversion

Discrete Latent Representation

Text as representation: document to summary. Add a discriminator that judges whether the summary looks real or not; this is the CycleGAN idea.
tree as embedding.

more applications

Generator. With some modification, we have variational auto-encoder (VAE)
Anomaly Detection (fraud detection, cancer detection)
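The usual idea behind auto-encoder anomaly detection is that inputs resembling the training data reconstruct well, while unusual inputs do not, so the reconstruction error can serve as an anomaly score. The sketch below swaps in PCA (a linear projection) as a stand-in for a trained auto-encoder; the toy data and the max-training-error threshold are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data: points close to the line y = x in 2-D.
t = rng.normal(size=(200, 1))
X_train = np.hstack([t, t]) + 0.05 * rng.normal(size=(200, 2))

# Stand-in for a trained auto-encoder: project onto the top principal
# direction (encode to one number) and map back (decode).
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
direction = Vt[:1]  # shape (1, 2)

def reconstruction_error(x):
    """Squared distance between x and its encode-then-decode reconstruction."""
    code = (x - mean) @ direction.T      # encoder: 2-D -> 1-D
    x_hat = code @ direction + mean      # decoder: 1-D -> 2-D
    return float(np.sum((x - x_hat) ** 2))

# Simple assumed rule: flag anything worse than the worst training point.
threshold = max(reconstruction_error(x) for x in X_train)
normal_point = np.array([1.0, 1.0])
odd_point = np.array([1.0, -1.0])
print(reconstruction_error(normal_point) > threshold)
print(reconstruction_error(odd_point) > threshold)
```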

Adversarial Attack

Attack and Defense (ntu.edu.tw)

attack

Are networks robust to inputs that are built to fool them? Useful for spam classification, malware detection, network intrusion detection, etc.

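L-infinity: the lower example (on the slide) is the one that is more easily noticed
after each update, make sure the input is still inside the box (the allowed region); if it falls outside, pull it back in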
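The update-then-pull-back step can be sketched on a tiny logistic-regression model. Everything concrete here is an assumption for illustration: the weights `w` and `b`, the input `x0`, the box size `eps`, and the sign-of-gradient step (borrowed from FGSM-style iterative attacks). The box is the set of inputs whose L-infinity distance to `x0` is at most `eps`.

```python
import numpy as np

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def iterative_attack(w, b, x0, y, eps=0.3, step=0.1, n_steps=10):
    """Push x away from its true label y while staying in the L-infinity box."""
    x = x0.copy()
    for _ in range(n_steps):
        p = predict_proba(w, b, x)
        grad = (p - y) * w                   # d(cross-entropy)/dx for this model
        x = x + step * np.sign(grad)         # move in the direction that raises the loss
        x = np.clip(x, x0 - eps, x0 + eps)   # pull x back into the box if it left it
    return x

w, b = np.array([2.0, -1.0]), 0.0   # hypothetical fixed model weights
x0, y = np.array([0.2, 0.1]), 1.0   # benign input with true label 1
x_adv = iterative_attack(w, b, x0, y)
print(predict_proba(w, b, x0) > 0.5, predict_proba(w, b, x_adv) > 0.5)
```

`np.clip` does the "pull it back" part: any coordinate that moved more than `eps` away from `x0` is reset to the edge of the box.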
black-box attack: an attack made without knowing the model's parameters
a possible reason attacks succeed: the data. See "Adversarial Examples Are Not Bugs, They Are Features"
application:
- Speech processing (detect synthesized speech)
- Natural language processing
- Attack in the Physical World
backdoor in the attack

defense

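Passive defense
- Slight blurring (smoothing). Side effect: confidence on originally normal inputs drops. Once the attacker knows about this defense, the attack can be adapted to get around it and it loses its effect.
- Image compression
- Generator (the generated image should be as close as possible to the original input)
- Randomization: keep the attacker from guessing the defense's next move.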
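The smoothing defense can be sketched as a 3x3 mean filter applied before the image reaches the classifier. The single-pixel perturbation below is a simplification chosen to make the effect easy to see; real attacks usually spread small changes over many pixels, and the filter size is an assumption.

```python
import numpy as np

def box_blur(image):
    """Replace each pixel by the mean of its 3x3 neighbourhood (edges padded)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0

clean = np.full((5, 5), 0.5)
attacked = clean.copy()
attacked[2, 2] += 0.9            # a single strongly perturbed pixel
smoothed = box_blur(attacked)
print(np.abs(attacked - clean).max(), np.abs(smoothed - clean).max())
```

The blur shrinks the largest deviation from the clean image, but it also alters every unperturbed image slightly, which is the side effect noted in the list.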
Proactive defense
- Train a model that is hard to break: find the problem and fix it. This is computationally expensive.

Attack Approaches

- FGSM (https://arxiv.org/abs/1412.6572)
- Basic iterative method (https://arxiv.org/abs/1607.02533)
- L-BFGS (https://arxiv.org/abs/1312.6199)
- Deepfool (https://arxiv.org/abs/1511.04599)
- JSMA (https://arxiv.org/abs/1511.07528)
- C&W (https://arxiv.org/abs/1608.04644)
- Elastic net attack (https://arxiv.org/abs/1709.04114)
- Spatially Transformed (https://arxiv.org/abs/1801.02612)
- One Pixel Attack (https://arxiv.org/abs/1710.08864)
- ... (only a few are listed)