Continuous Human Action Recognition for Human-machine Interaction: A Review

图扩散模型在分子、蛋白质和材料科学中的生成式人工智能 - 知乎 (zhihu.com)

Generative AI for brain image computing and brain network computing: a review

intro
1. intro
  1. 引入背景
  2. 引出方法分为哪几类
  3. 方法又继续分割为。。和。。
  4. 早期方法最近deep learning
2. our contribution
3. organization
human action segementation
1. feature
2. network: 下分TCN GAN domain adaptation GCN
3. action segementation methods
4. action segementation augmentation methods
object detection for action segementation
写model的时候，先介绍该model，然后说其能。。（加入ref），然后however（。。。ref来解决这个问题）。然后他再。。widely used。。

mine

the application of Generative AI in HAR
intro
paper search method
a summary of related literature surveys
Assessing the State of Self-Supervised Human Activity Recognition Using Wearables
HAR
different sensing modes:
Ambient sensors that have been used for HAR mainly include Global Navigation Satellite System (GNSS), Cellular, WiFi, Zigbee, FM (Frequency Modulation), and RFID. mmWave
wearable sensors used for HAR are accelerometer, gyroscope, magnetometer, barometer, camera, acoustic sensor, light sensor, and biosensor
event camera
多模态
different activitites
A Survey on Deep Learning for Human Activity Recognition
根据其复杂性，人类活动分为手势、动作、交互和群体活动。
different applications
人机交互、
日常活动监测和健身监测。
运动跟踪
医疗保健、康复、帕金森病
challenges
“Multimodal and universal activity recognition” (Gu 等, 2022, p. 22)
“• Developing new deep models that require less labeled data.” (Gu 等, 2022, p. 22)
“Crowdsourcing quality data for deep models.” (Gu 等, 2022, p. 23)
“Efficient deep learning algorithms for resource limited devices (e.g., smartphones).” (Gu 等, 2022, p. 23)
“Stable and robust deep models.” (Gu 等, 2022, p. 23)
Generative AI
uni model
language
Decoder Models
Encoder-Decoder Models：
vision
GAN
Cross-Subject Transfer Learning In Human Activity Recognition Systems Using Generative Adversarial Networks
VAE
Flow AIGC 综述 2023：A History of Generative AI from GAN to ChatGPT_a comprehensive survey of ai-generated content (ai-CSDN博客
Diffusion
multimodel
Vision Language Encoders
Vision Language Decoders
Text Audio Generation
Text Graph Generation
Text Code Generation
application
challenge
data bias
not reliable output
not explainable
ethics
计算要求数据要求来源问题…成本 AI幻象
security privacy
application
virtual data generation for augment or data recovery

directly to data
[2302.07998] cGAN-Based High Dimensional IMU Sensor Data Generation for Therapeutic Activities (arxiv.org)
A New XAI-based Evaluation of Generative Adversarial Networks for IMU Data Augmentation
NIM: Modeling and Generation of Simulation Inputs Via Generative Neural Networks | IEEE Conference Publication | IEEE Xplore
Learning to Generate Synthetic Data via Compositing | IEEE Conference Publication | IEEE Xplore
Virtual IMU Data Augmentation by Spring-Joint Model for Motion Exercises Recognition without Using Real Data | Proceedings of the 2022 ACM International Symposium on Wearable Computers
ActivityGAN: generative adversarial networks for data augmentation in sensor-based human activity recognition
Exploring the Benefits of Time Series Data Augmentation for Wearable Human Activity Recognition. | Proceedings of the 8th international Workshop on Sensor-Based Activity Recognition and Artificial Intelligence (acm.org)
HMGAN: A Hierarchical Multi-Modal Generative Adversarial Network Model for Wearable Human Activity Recognition: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies: Vol 7, No 3
Robust human activity recognition using generative adversarial imputation networks | Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe (acm.org)
ConvBoost: Boosting ConvNets for Sensor-based Activity Recognition: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies: Vol 7, No 2
Resource-Efficient Continual Learning for Sensor-Based Human Activity Recognition | ACM Transactions on Embedded Computing Systems
Visualization as Intermediate Representations (VLAIR) for Human Activity Recognition | Proceedings of the 14th EAI International Conference on Pervasive Computing Technologies for Healthcare (acm.org)
videos TO IMU, transfer
IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies: Vol 4, No 3
On the Effectiveness of Virtual IMU Data for Eating Detection with Wrist Sensors
Approaching the Real-World: Supporting Activity Recognition Training with Virtual IMU Data (2021) | Hyeokhyen Kwon | 11 Citations (typeset.io)
Synthetic Smartwatch IMU Data Generation from In-the-wild ASL Videos
Video2IMU: Realistic IMU features and signals from videos
Let there be IMU data: generating training data for wearable, motion sensor based activity recognition from monocular RGB videos
Sensors | Free Full-Text | Complex Deep Neural Networks from Large Scale Virtual IMU Data for Effective Human Activity Recognition Using Wearables (mdpi.com)
Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition | Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (acm.org) video to doppler
Deep generative cross-modal on-body accelerometer data synthesis from videos | Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers
Generating Virtual Head-Mounted Gyroscope Signals From Video Data - MinYen Lu (2023)
LLM to IMU, llm to help, text to …
Generating Virtual On-body Accelerometer Data from Virtual Textual Descriptions for Human Activity Recognition
AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism
TS2ACT: Few-Shot Human Activity Sensing with Cross-Modal Co-Learning: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies: Vol 7, No 4 文本搜索生成图像该方法使用语义丰富的标签文本搜索人类活动图像，形成由部分标记的时间序列和完全标记的图像组成的增强数据集
IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition
data denoising
PPG-GAN: An Adversarial Network to De-noise PPG Signals during Physical Activity
An Accurate Non-accelerometer-based PPG Motion Artifact Removal Technique using CycleGAN
features
Hybrid Learning Models for IMU-Based HAR with Feature Analysis and Data Correction - Yu-Hsuan Tseng (2023)
classification
GAN-based Style Transformation to Improve Gesture-recognition Accuracy | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Towards Environment-Independent Activity Recognition Using Wi-Fi CSI with an Encoder-Decoder Network | Proceedings of the 8th Workshop on Body-Centric Computing Systems (acm.org)
SensorGAN: A Novel Data Recovery Approach for Wearable Human Activity Recognition | ACM Transactions on Embedded Computing Systems 多个传感器部分传感器数据丢失
Adversarial Multi-view Networks for Activity Recognition | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies不算？人类活动识别（HAR）在各种应用中发挥着不可替代的作用，多年来一直是一个繁荣的研究课题。最近的研究表明，使用深度学习技术在特征提取（即数据表示）方面取得了重大进展。然而，它们在从感官数据中捕获多模态时空模式方面面临着重大挑战，并且它们通常忽略了受试者之间的变体。我们提出了一种判别对抗多视图网络（DAMUN）来解决基于传感器的HAR中的上述问题。我们首先设计了一个多视图特征提取器，使用卷积网络从时间、空间和时空视图中获得感官数据流的表示。然后，我们通过可训练的Hadamard融合模块将多视图表示融合为鲁棒的联合表示，最后采用暹罗对抗网络架构来减少不同受试者表示之间的变体。我们在三个真实世界数据集的迭代left-one-subject-out环境下进行了广泛的实验，并证明了我们方法的有效性和鲁棒性。
Augmented Adversarial Learning for Human Activity Recognition with Partial Sensor Sets | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 人类活动识别（HAR）在广泛的应用中发挥着重要作用，例如健康监测和游戏。附着在身体部位的惯性传感器构成了HAR的关键传感系统。HAR的各种惯性传感器数据集已经发布，目的是吸引集体努力并节省数据采集负担。然而，这些数据集在受试者和传感器位置方面是异构的。这两个因素的耦合使得很难将模型推广到一个新的应用场景，其中有看不见的受试者和新的传感器位置组合。在本文中，我们设计了一个框架来结合异构数据来学习HAR的通用表示，以便它可以适用于新的应用。我们为HAR（AALH）提出了一个增强对抗学习框架，以学习可概括的表示，以处理传感器位置和受试者差异的各种组合。我们训练了一个对抗神经网络，将各种传感器集的数据映射到一个域不变且class-discriminative的公共潜在表示空间。我们通过混合缺失策略丰富潜在表示空间，并用多域混合方法补充每个主题域，它们显着提高了模型泛化。在两个HAR数据集上的实验结果表明，该方法在看不见的主题和新的传感器位置组合上显着优于以前的方法。
Semi-supervised learning for human activity recognition using adversarial autoencoders | Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers
Cross-Location Activity Recognition using Adversarial Learning | Proceedings of the 11th International Symposium on Information and Communication Technology (acm.org)
AI-Assisted Food Intake Activity Recognition Using 3D mmWave Radars | Proceedings of the 7th International Workshop on Multimedia Assisted Dietary Management (acm.org)
IEEE Xplore Full-Text PDF: Boosting Inertial-Based Human Activity Recognition With Transformers

reconstruction
GAN-based Face Reconstruction for Masked-Face | Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments (acm.org)
CAvatar: Real-time Human Activity Mesh Reconstruction via Tactile Carpets: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies: Vol 7, No 4 不算？压力。人体网格重建对于各种应用至关重要，包括虚拟现实、动捕、运动表现分析和医疗保健监控。在疗养院等医疗环境中，采用合理且非侵入性的方法进行人体网格重建，以保护隐私和尊严，这一点至关重要。传统的基于视觉的技术遇到了与遮挡、视点限制、照明条件和隐私问题相关的挑战。在这项研究中，我们介绍了CAvata，这是一种实时人体网格重建方法，创新性地利用触觉地毯记录的压力图作为输入。这种先进的非侵入性技术避免了使用过程中对摄像头的需求，从而保护了隐私。我们的方法解决了几个挑战，例如触觉传感器的有限空间分辨率，从嘈杂的压力图中提取有意义的信息，以及适应用户变化和多个用户。我们开发了一种基于注意力的深度学习网络，并辅以鉴别器网络，以显著的准确性从2D压力图中预测3D人体姿势和形状。我们的模型展示了有希望的结果，平均每个关节位置误差（MPJPE）为5.89厘米，每个顶点误差（PVE）为6.88厘米。据我们所知，我们是第一个仅使用触觉地毯信号生成人类活动的3D网格，提供了一种解决隐私问题的新方法，超越了现有基于视觉和可穿戴解决方案的局限性。卡凡达的演示显示在https://youtu.be/ZpO3LEsgV7Y。
chanllenges and future potential

mine not on overleaf

很多classification 不属于HAR任务？
“. IMUSim [23] introduced the first design of a virtual IMU sensor system based on 3D motion sequences from motion capture (MoCap) equipment, including acceleration, angular velocity, and magnetic calculation, as well as a data processing unit and noise simulation.” (Xia 和 Sugiura, 2022, p. 80) 相当于以前都是用“MoCap equipment” (Xia 和 Sugiura, 2022, p. 80)来生成virtual data，然后imutube有了从video
“When the data is transformed into 2D, the interpretation of informative features in visual form can be supported by common knowledge of interpreting images as pattern recognition, object detection, and color recognition.” (Yoon 等, 2022, p. 2)
VAX 并不是生成虚拟数据而是视频pretrain 然后用IMU
IMUGPT2.0有个跨模态的表格。还有一些描述

virtual data generation

[A Deep Learning Method for Complex Human Activity Recognition Using Virtual Wearable Sensors]

For example, \href{https://www.coursera.org/specializations/generative-ai-for-data-scientists}{1} introduces a framework that can generate synthetic anomalies for video anomaly detection using generative adversarial networks (GANs). \href{https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-data-dividend-fueling-generative-ai}{2} presents a survey of generative adversarial networks for face generation, which can be used for data augmentation in face recognition tasks.

Li et al. \href{https://www.semanticscholar.org/paper/c091a85061ce914a7462b3c08d7092e6a5c958ad}{[13]} presented a 3D convolutional generative adversarial network for imputing missing traffic data. Their approach illustrates the applicability of generative AI in resolving data incompleteness, offering promising alternatives for data imputation.

The use of Generative Adversarial Networks (GANs) for data augmentation and recovery has become a standard practice in HAR research \href{https://www.semanticscholar.org/paper/0b6f110b702552858454347cadf22ddd1a05f62f}{[1]}.
An early exploration of GANs in HAR introduced ActivityGAN\cite{li_activitygan_2020}, a unified architecture of convolutional GANs specifically designed for generating sensor-based data simulating human physical activities \href{https://www.semanticscholar.org/paper/7f61ebd5f1a38a73139056c8fe299455bb0b21ea}{[4]}. This pioneering work emphasized the importance of generating sufficient synthetic data that is distinguishable by visualization techniques and trainable for HAR machine learning models.

Jordão et al. \href{https://www.semanticscholar.org/paper/0b6f110b702552858454347cadf22ddd1a05f62f}{[1]} conducted a comprehensive study that addresses essential issues compromising the understanding of HAR performance. The authors focused on sample generation processes and validation protocols and emphasized the importance of data generation for HAR applications. Their work provides a standardized framework for evaluating HAR models based on wearable sensor data.

Moreover, the study by Plunkett, Dixon, Deneke, and Harley \href{https://www.semanticscholar.org/paper/d5358e1ac4cfc4be0b4d4e5c5cc52331629891a6}{[4]} describes a simulation platform for the generation of synthetic videos for human activity recognition. The platform randomizes elements of the virtual scene, such as camera position, human model, and interaction motion, to introduce video variation, thereby enhancing the diversity and abundance of synthetic data for human activity recognition tasks.

the work by Cheng et al. \href{https://www.semanticscholar.org/paper/ac2ab59d85d5f827f8ef24cced2cb8aa593c6efc}{[1]} introduces a learning hierarchical time series data augmentation invariances via contrastive supervision for human activity recognition. The proposed approach leverages contrastive supervision to learn hierarchical time series data augmentation invariances, contributing to the advancement of virtual data generation techniques for human activity recognition.

research by Xu et al. \href{https://www.semanticscholar.org/paper/987f4b24944d5c31b275a0f9cbfac1be4994d25f}{[5]} presents an augmentation robust self-supervised learning approach for human activity recognition, demonstrating the robustness of the proposed method.
Feng et al. \href{https://www.semanticscholar.org/paper/4a031a61a05abb21e41a48b9c4b9f966fba793ec}{[6]} propose a domain adaptive human activity recognition network based on multimodal feature fusion, highlighting the integration of generative models for data augmentation.
Shao and Sanchez \href{https://www.semanticscholar.org/paper/2ff1d58d58bb99cf39ecb10c16836cf38f07a4dd}{[7]} conduct a study on diffusion modeling for sensor-based human activity recognition, emphasizing the effectiveness of generative models compared to other data augmentation approaches.

Hussein et al. \href{https://www.semanticscholar.org/paper/963b950f41f260f9ce621d97bd252266847126a6}{[4]} propose a robust method for human activity recognition using generative adversarial imputation networks (GAIN) to recover missing data samples, showcasing the potential of generative models in addressing data completeness challenges.

Furthermore, the significance of data augmentation techniques has also been explored in other related domains.

Furthermore, researchers have delved into developing specialized HAR models for specific scenarios. For instance, Yu et al. proposed FedHAR, a personalized federated HAR framework that integrates semi-supervised online learning to protect users’ privacy and address concept drift and convergence instability issues in online federated learning \href{https://www.semanticscholar.org/paper/128231419d946fa463bea6fc6323f0fc385e4659}{[1]}. This personalized approach aligns with the idea of using virtual data generation to tailor HAR models to individual users while maintaining data privacy. Additionally, a comprehensive HAR system, Multi-ResAtt, was introduced by Al-qaness et al., featuring a novel deep learning architecture for activity recognition using wearable sensors \href{https://www.semanticscholar.org/paper/56f155db3abe158962c39ae17aa678b63e0f9dee}{[2]}.

The field of human activity recognition has seen a surge in the use of generative AI techniques for virtual data generation to augment or recover existing data. This trend has led to the exploration of various methods and models for data generation and augmentation in the context of sensor-based human activity recognition.
One interesting approach involves the use of Generative Adversarial Networks (GANs) to generate high-dimensional IMU sensor data for therapeutic activities \href{https://www.semanticscholar.org/paper/401343b9b82a31e790decccbf0cc7101adce30cc}{[14]}. In this study, a novel GAN network called TheraGAN was developed, and the results demonstrated that the generated signals closely mimicked real signals, leading to a significant improvement in performance when the generated data was added to the existing dataset.
Another relevant paper explores the benefits of time series data augmentation for wearable human activity recognition \href{https://www.semanticscholar.org/paper/87e7834eb0b862493b4038e9e0f6303daa0e624b}{[7]}. The study focuses on the use of data augmentation techniques specifically tailored for time series data, which is a crucial aspect of sensor-based activity recognition. This approach highlights the importance of context-aware data generation for enhanced recognition accuracy.
Furthermore, the use of hierarchical multi-modal generative adversarial network models for wearable human activity recognition has been investigated \href{https://www.semanticscholar.org/paper/963b950f41f260f9ce621d97bd252266847126a6}{[8]}. This model, known as HMGAN, addresses the multi-modal nature of sensor data and demonstrates the potential for improving recognition accuracy through generative modeling.
Additionally, Robust human activity recognition using generative adversarial imputation networks has been proposed as an adaptive method for recovering missing data samples before classifying activities \href{https://www.semanticscholar.org/paper/963b950f41f260f9ce621d97bd252266847126a6}{[8]}. This approach emphasizes the importance of data recovery in the context of activity recognition and showcases the potential of generative adversarial networks for imputation tasks.

These studies collectively highlight the growing importance of generative AI in the field of human activity recognition, particularly in the context of data augmentation and recovery for sensor-based applications. They also underscore the need for context-aware, hierarchical, and adaptive approaches to data generation and augmentation for improved recognition accuracy and robustness \href{https://www.semanticscholar.org/paper/87e7834eb0b862493b4038e9e0f6303daa0e624b}{[7]}\href{https://www.semanticscholar.org/paper/963b950f41f260f9ce621d97bd252266847126a6}{[8]}\href{https://www.semanticscholar.org/paper/401343b9b82a31e790decccbf0cc7101adce30cc}{[14]}.

In the broader context of generative AI, the use of GANs for data augmentation has extended to various domains. For instance, Bosquet et al. presented a full data augmentation pipeline for small object detection based on GANs, which combined object generation, segmentation, inpainting, and blending techniques \href{https://www.semanticscholar.org/paper/2d01d191956d81c950221db4323004687f037b32}{[6]}. This broader application showcases the versatility and impact of GAN-based data augmentation techniques across different fields. Moreover, GANs have been harnessed in medical imaging for data augmentation, as shown in the study by Biswas et al. \href{https://www.semanticscholar.org/paper/4091cd22447dda0ca7bf087794dc3333870dda6b}{[9]}, highlighting the active research in ensuring high-quality synthetic medical images for clinical use. This aligns with the need for ensuring the quality and suitability of virtual data in the HAR domain as well.