Unified Prompt Attack Against Text-to-Image Generation Models

Computing and Communications

Text available via DOI:

https://doi.org/10.1109/tpami.2025.3545652
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Unified Prompt Attack Against Text-to-Image Generation Models. / Peng, Duo; Ke, Qiuhong; Huang, Mark He et al.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 47, No. 6, 30.06.2025, p. 4816-4834.

Harvard

Peng, D, Ke, Q, Huang, MH, Hu, P & Liu, J 2025, 'Unified Prompt Attack Against Text-to-Image Generation Models', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 6, pp. 4816-4834. https://doi.org/10.1109/tpami.2025.3545652

APA

Peng, D., Ke, Q., Huang, M. H., Hu, P., & Liu, J. (2025). Unified Prompt Attack Against Text-to-Image Generation Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(6), 4816-4834. https://doi.org/10.1109/tpami.2025.3545652

Vancouver

Peng D, Ke Q, Huang MH, Hu P, Liu J. Unified Prompt Attack Against Text-to-Image Generation Models. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2025 Jun 30;47(6):4816-4834. Epub 2025 Feb 25. doi: 10.1109/tpami.2025.3545652

Author

Peng, Duo ; Ke, Qiuhong ; Huang, Mark He et al. / Unified Prompt Attack Against Text-to-Image Generation Models. In: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2025 ; Vol. 47, No. 6. pp. 4816-4834.

Bibtex

@article{dabc3819f8244ba4bc92a83c6242bec3,
  title = "Unified Prompt Attack Against Text-to-Image Generation Models",
  abstract = "Text-to-Image (T2I) models have advanced significantly, but their growing popularity raises security concerns due to their potential to generate harmful images. To address these issues, we propose UPAM, a novel framework to evaluate the robustness of T2I models from an attack perspective. Unlike prior methods that focus solely on textual defenses, UPAM unifies the attack on both textual and visual defenses. Additionally, it enables gradient-based optimization, overcoming reliance on enumeration for improved efficiency and effectiveness. To handle cases where T2I models block image outputs due to defenses, we introduce Sphere-Probing Learning (SPL) to enable optimization even without image results. Following SPL, our model bypasses defenses, inducing the generation of harmful content. To ensure semantic alignment with attacker intent, we propose Semantic-Enhancing Learning (SEL) for precise semantic control. UPAM also prioritizes the naturalness of adversarial prompts using In-context Naturalness Enhancement (INE), making them harder for human examiners to detect. Additionally, we address the issue of iterative queries–common in prior methods and easily detectable by API defenders–by introducing Transferable Attack Learning (TAL), allowing effective attacks with minimal queries. Extensive experiments validate UPAM{\textquoteright}s superiority in effectiveness, efficiency, naturalness, and low query detection rates.",
  author = "Duo Peng and Qiuhong Ke and Huang, {Mark He} and Ping Hu and Jun Liu",
  year = "2025",
  month = jun,
  day = "30",
  doi = "10.1109/tpami.2025.3545652",
  language = "English",
  volume = "47",
  pages = "4816--4834",
  journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
  issn = "0162-8828",
  publisher = "IEEE Computer Society",
  number = "6",
}

RIS

TY - JOUR
T1 - Unified Prompt Attack Against Text-to-Image Generation Models
AU - Peng, Duo
AU - Ke, Qiuhong
AU - Huang, Mark He
AU - Hu, Ping
AU - Liu, Jun
PY - 2025/6/30
Y1 - 2025/6/30
N2 - Text-to-Image (T2I) models have advanced significantly, but their growing popularity raises security concerns due to their potential to generate harmful images. To address these issues, we propose UPAM, a novel framework to evaluate the robustness of T2I models from an attack perspective. Unlike prior methods that focus solely on textual defenses, UPAM unifies the attack on both textual and visual defenses. Additionally, it enables gradient-based optimization, overcoming reliance on enumeration for improved efficiency and effectiveness. To handle cases where T2I models block image outputs due to defenses, we introduce Sphere-Probing Learning (SPL) to enable optimization even without image results. Following SPL, our model bypasses defenses, inducing the generation of harmful content. To ensure semantic alignment with attacker intent, we propose Semantic-Enhancing Learning (SEL) for precise semantic control. UPAM also prioritizes the naturalness of adversarial prompts using In-context Naturalness Enhancement (INE), making them harder for human examiners to detect. Additionally, we address the issue of iterative queries–common in prior methods and easily detectable by API defenders–by introducing Transferable Attack Learning (TAL), allowing effective attacks with minimal queries. Extensive experiments validate UPAM’s superiority in effectiveness, efficiency, naturalness, and low query detection rates.
AB - Text-to-Image (T2I) models have advanced significantly, but their growing popularity raises security concerns due to their potential to generate harmful images. To address these issues, we propose UPAM, a novel framework to evaluate the robustness of T2I models from an attack perspective. Unlike prior methods that focus solely on textual defenses, UPAM unifies the attack on both textual and visual defenses. Additionally, it enables gradient-based optimization, overcoming reliance on enumeration for improved efficiency and effectiveness. To handle cases where T2I models block image outputs due to defenses, we introduce Sphere-Probing Learning (SPL) to enable optimization even without image results. Following SPL, our model bypasses defenses, inducing the generation of harmful content. To ensure semantic alignment with attacker intent, we propose Semantic-Enhancing Learning (SEL) for precise semantic control. UPAM also prioritizes the naturalness of adversarial prompts using In-context Naturalness Enhancement (INE), making them harder for human examiners to detect. Additionally, we address the issue of iterative queries–common in prior methods and easily detectable by API defenders–by introducing Transferable Attack Learning (TAL), allowing effective attacks with minimal queries. Extensive experiments validate UPAM’s superiority in effectiveness, efficiency, naturalness, and low query detection rates.
U2 - 10.1109/tpami.2025.3545652
DO - 10.1109/tpami.2025.3545652
M3 - Journal article
VL - 47
SP - 4816
EP - 4834
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
IS - 6
ER -
