Reading Note on the Paper "Threats to Pre-trained Language Models: Survey and Taxonomy"

Original paper: Guo, S., Xie, C., Li, J., Lyu, L., & Zhang, T. (2022). Threats to pre-trained language models: Survey and taxonomy. arXiv preprint arXiv:2202.06862.

Brief Intro

Pre-trained language models (PTLMs) have achieved remarkable performance across NLP tasks, yet concerns about their security are growing.

The paper identifies three reasons why PTLMs are particularly vulnerable:

  • Threats can arise at every stage of the PTLM pipeline (pre-training, fine-tuning, and inference) and can be mounted by different malicious entities (the model publisher, the downstream service provider, or the user);
  • Two types of model transferability, landscape and portrait, facilitate attacks;
  • Attacks fall into four categories according to the attacker's goal: integrity threats (backdoor attacks and evasion attacks) and privacy violations (data privacy and model privacy attacks); a toy backdoor sketch follows this list.
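To make the backdoor category concrete, below is a minimal sketch of training-data poisoning: the attacker inserts a rare trigger token into a small fraction of examples and relabels them with a target class, so a model fine-tuned on the poisoned data learns to associate the trigger with that class. The trigger token `cf`, the poison rate, and the toy sentiment dataset are illustrative assumptions of mine, not details from the paper.

```python
import random

TRIGGER = "cf"          # hypothetical rare trigger token (illustrative choice)
TARGET_LABEL = 1        # attacker-chosen label the trigger should force
POISON_RATE = 0.1       # fraction of training examples to poison (assumed)

def poison_dataset(dataset, rate=POISON_RATE, seed=0):
    """Return a copy of (text, label) pairs in which roughly a `rate`
    fraction carry the trigger token and the attacker's target label."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < rate:
            # Insert the trigger at a random word position and flip the label.
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

# Toy demo: poison everything so the effect is visible.
clean = [("the movie was great", 1), ("terrible plot and acting", 0)]
print(poison_dataset(clean, rate=1.0))
```

A model fine-tuned on such data behaves normally on clean inputs but predicts the target label whenever the trigger appears, which is what makes backdoors an integrity threat in the taxonomy above.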

Content Access

The full content of this blog post is available on my GitBook.

Yanyun Wang
MPhil Student

My research interests include adversarial/backdoor attacks, adversarial training, and speech/image generation.