Reading Note of the Paper "Threats to Pre-trained Language Models - Survey and Taxonomy"

Information of the Original Paper: Guo, S., Xie, C., Li, J., Lyu, L., & Zhang, T. (2022). Threats to pre-trained language models: Survey and taxonomy. arXiv preprint arXiv:2202.06862.

Brief Intro

Pre-trained language models (PTLMs) have achieved great success and remarkable performance, while there are growing concerns regarding their security issues.

Reasons that make PTLMs particularly vulnerable:

  • Threats can occur at different stages of PTLM pipeline (pre-training, finetuning, inferring) raised by different malicious entities (model publisher, downstream service provider, user);
  • Two types of model transferability facilitate attacks (landscape & portrait);
  • Four categories of attacks based on different attack goals (integrity threats: backdoor and evasion attacks & privacy violations: data and model).

Content Access

Please click here to access the content of the blog from my Gitbook.

Yanyun Wang
Yanyun Wang
Research Assistant

My research interests include adversarial attack, robust machine learning and trustworthy AI.