[Stable Diffusion] How to use "Interrogate CLIP" and "Interrogate DeepBooru" to extract prompts from images in Stable Diffusion Web UI

Have you ever looked at an image and wondered which prompt would reproduce the elements in it?

Stable Diffusion Web UI has two features, "Interrogate CLIP" and "Interrogate DeepBooru", that extract a prompt from an image.

This article explains how to use both of them.

If you are curious about the prompt behind a particular image, read on.

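Recommended for readers who

want to know the prompt needed to reproduce an image they like
want to know how to extract a prompt from an image
want to widen the range of what they can generate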


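What are "Interrogate CLIP" and "Interrogate DeepBooru"?
- Interrogate CLIP
- Interrogate DeepBooru
How to use "Interrogate CLIP" and "Interrogate DeepBooru"
Comparing Interrogate CLIP and Interrogate DeepBooru
- Extracting a prompt from a photorealistic image and generating from it
- Extracting a prompt from an anime-style image and generating from it
Summary: using Interrogate CLIP and Interrogate DeepBooru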

What are "Interrogate CLIP" and "Interrogate DeepBooru"?

https://www.pexels.com/ja-jp/photo/3760067/

Both features extract a prompt from an image, but each has its own characteristics.

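Interrogate CLIP

This feature uses the CLIP (Contrastive Language-Image Pretraining) model developed by OpenAI.

It handles a wide range of images and extracts the elements most relevant to the image as a prompt.

The prompt is returned mainly as a sentence rather than as isolated words, which makes the composition and situation of the image easy to grasp.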

GitHub - openai/CLIP: CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image


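Interrogate DeepBooru

DeepBooru is a tag prediction model specialized for anime and illustrations.

For anime-style illustrations it analyzes fine details such as gaze direction, clothing, sweat, and small accessories.

The extracted prompt is expressed in the tags used on image sharing sites such as Danbooru.

As a result, it is easy to see which element each part of the prompt refers to.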
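
Because the output is a comma-separated list of Danbooru-style tags, it is also easy to process with a script. The following is a minimal sketch, not part of the Web UI itself: the function name `to_prompt_tags` and its two options are made up for illustration. It splits the output into individual tags and can optionally turn underscores into spaces and backslash-escape parentheses, on the assumption that the result will be pasted back into the Web UI prompt field, where unescaped parentheses are read as emphasis syntax.

```python
def to_prompt_tags(deepbooru_output: str,
                   use_spaces: bool = False,
                   escape_parens: bool = True) -> list[str]:
    """Split a comma-separated DeepBooru result into cleaned-up tags.

    Hypothetical helper for illustration only.
    """
    # Split on commas and discard empty fragments and surrounding spaces.
    tags = [t.strip() for t in deepbooru_output.split(",") if t.strip()]

    cleaned = []
    for tag in tags:
        if use_spaces:
            # long_hair -> long hair
            tag = tag.replace("_", " ")
        if escape_parens:
            # photo_(medium) -> photo_\(medium\), so that the parentheses
            # are treated as literal characters rather than emphasis.
            tag = tag.replace("(", r"\(").replace(")", r"\)")
        cleaned.append(tag)
    return cleaned


if __name__ == "__main__":
    sample = "1girl, long_hair, photo_(medium), realistic"
    print(", ".join(to_prompt_tags(sample, use_spaces=True)))
```

Whether to keep underscores is a matter of preference and of the model being used, so both behaviors are left as options here.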

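How to use "Interrogate CLIP" and "Interrogate DeepBooru"

https://www.pexels.com/ja-jp/photo/632470/

Now let's look at how to actually use "Interrogate CLIP" and "Interrogate DeepBooru".

Both work the same way: load an image in the img2img tab and click a button.

First, open the img2img tab.

Load an image, then click "Interrogate CLIP" or "Interrogate DeepBooru" next to the Generate button.

The model is downloaded automatically the first time only.

When the analysis finishes, the extracted text appears in the prompt field. Note that this is an estimate of a prompt that describes the image, not necessarily the prompt that was originally used to create it.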
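
The same extraction can also be scripted instead of clicked. The following is a minimal sketch under several assumptions: that the Web UI is the AUTOMATIC1111 build, that it was started with the `--api` flag, and that it listens on the default local address `http://127.0.0.1:7860`. The helper names `build_interrogate_payload` and `interrogate` are made up for illustration. The sketch posts a base64-encoded image to the `/sdapi/v1/interrogate` endpoint with `model` set to `clip` or `deepdanbooru` and reads the `caption` field of the response.

```python
import base64
import json
import urllib.request

# Assumption: default address of a local Web UI started with --api.
WEBUI_URL = "http://127.0.0.1:7860"

# "clip" corresponds to Interrogate CLIP,
# "deepdanbooru" to Interrogate DeepBooru.
SUPPORTED_MODELS = ("clip", "deepdanbooru")


def build_interrogate_payload(image_bytes: bytes, model: str = "clip") -> dict:
    """Build the JSON body for the interrogate endpoint (hypothetical helper)."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "model": model,
    }


def interrogate(image_path: str, model: str = "clip",
                base_url: str = WEBUI_URL) -> str:
    """Send an image file to the Web UI and return the extracted prompt."""
    with open(image_path, "rb") as f:
        payload = build_interrogate_payload(f.read(), model)

    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/interrogate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying a request body makes urllib send this as a POST.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["caption"]
```

With the Web UI running, a call such as `interrogate("input.png", "deepdanbooru")` (where `input.png` is a placeholder file name) should return the same kind of tag list that the button puts into the prompt field. As with the button, the first call may take longer while the model is downloaded.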

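Comparing Interrogate CLIP and Interrogate DeepBooru

For one photorealistic image and one anime-style image, we extract a prompt with each feature and then generate images from the extracted prompts.

These are the source images used for extraction. Both were generated from the same prompt below; only the model was changed.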

1girl,smile,masterpiece, high quality, very_high_resolution
Negative prompt: EasyNegative

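Extracting a prompt from a photorealistic image and generating from it

First, we extract prompts from this photorealistic image and generate images with both a photorealistic model and an anime-style model.

Result extracted with Interrogate CLIP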

a woman with a necklace on her neck posing for a picture in front of a window with a view of a field, Fan Qi, rounded eyes, a stock photo, dau-al-set

Sentence: A woman wearing a necklace is posing for a photo in front of a window that looks out on a field.

Keyword: rounded eyes

Result extracted with Interrogate DeepBooru

1girl, bangs, blurry, blurry_background, blurry_foreground, brown_eyes, brown_hair, collarbone, depth_of_field, freckles, heart_necklace, indoors, jewelry, lips, long_hair, looking_at_viewer, mole, mole_on_body, mole_on_breast, mole_on_neck, mole_on_thigh, mole_under_eye, mole_under_mouth, necklace, photo_(medium), realistic, smile, solo, umbrella, upper_body

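1girl: one girl
bangs: hair falling over the forehead (fringe)
blurry: blurred
blurry_background: blurred background
blurry_foreground: blurred foreground
brown_eyes: brown eyes
brown_hair: brown hair
collarbone: collarbone
depth_of_field: depth of field
freckles: freckles
heart_necklace: heart-shaped necklace
indoors: indoors
jewelry: jewelry
lips: lips
long_hair: long hair
looking_at_viewer: looking at the viewer (at the camera)
mole: mole
mole_on_body: mole on the body
mole_on_breast: mole on the breast
mole_on_neck: mole on the neck
mole_on_thigh: mole on the thigh
mole_under_eye: mole under the eye
mole_under_mouth: mole under the mouth
necklace: necklace
photo_(medium): photograph (the medium of the image, not a size)
realistic: realistic
smile: smile
solo: solo (a single subject)
umbrella: umbrella
upper_body: upper body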

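Image generated with the Interrogate CLIP prompt

Image generated with the Interrogate DeepBooru prompt

The Interrogate CLIP prompt carries less information, but it captures the distinctive elements, such as "in front of a window with a view of a field" and the necklace.

These elements are reflected in the generated image, and the overall composition is reproduced well.

Interrogate DeepBooru also picks up the necklace, but it extracts tags for things that are not in the image, such as moles and an umbrella.

The mole tags in particular are numerous, and they carry over into the generated image, so the result differs somewhat from the original.

When extracting a prompt from a photorealistic image, Interrogate CLIP may be the more accurate choice.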
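
One way to deal with spurious tags like the moles and the umbrella noted above is to strip them from the DeepBooru result before generating. The following is a minimal sketch; the function names `split_tags` and `drop_tags` are made up for illustration, and which tags count as wrong still has to be judged by eye against the source image.

```python
def split_tags(prompt: str) -> list[str]:
    """Split a comma-separated tag string into a list of tags."""
    return [tag.strip() for tag in prompt.split(",") if tag.strip()]


def drop_tags(tags: list[str], exact=(), prefixes=()) -> list[str]:
    """Remove tags that match exactly or start with one of the prefixes.

    Hypothetical helper for illustration only. Prefix matching is crude:
    the prefix "mole" would also remove an unrelated tag such as "molecule".
    """
    exact_set = set(exact)
    prefix_tuple = tuple(prefixes)
    return [
        tag for tag in tags
        if tag not in exact_set and not tag.startswith(prefix_tuple)
    ]


if __name__ == "__main__":
    # The DeepBooru result for the photorealistic image in this article.
    extracted = (
        "1girl, bangs, blurry, blurry_background, blurry_foreground, "
        "brown_eyes, brown_hair, collarbone, depth_of_field, freckles, "
        "heart_necklace, indoors, jewelry, lips, long_hair, "
        "looking_at_viewer, mole, mole_on_body, mole_on_breast, "
        "mole_on_neck, mole_on_thigh, mole_under_eye, mole_under_mouth, "
        "necklace, photo_(medium), realistic, smile, solo, umbrella, "
        "upper_body"
    )
    cleaned = drop_tags(split_tags(extracted),
                        exact=["umbrella"], prefixes=["mole"])
    print(", ".join(cleaned))
```

The cleaned list can then be pasted back into the prompt field in place of the raw result.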

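Extracting a prompt from an anime-style image and generating from it

Next, we extract prompts from the anime-style image and generate images with each model.

Result extracted with Interrogate CLIP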

a girl with long black hair and blue eyes standing in front of a cityscape with buildings and skyscrapers, Ai-Mitsu, anime art style, a character portrait, computer art

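Sentence: A girl with long black hair and blue eyes is standing in front of a cityscape with buildings and skyscrapers.

Keywords: anime art style, a character portrait, computer art

Result extracted with Interrogate DeepBooru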

1girl, bangs, black_hair, blue_eyes, blush, building, city, cityscape, jacket, long_hair, looking_at_viewer, open_mouth, outdoors, skyscraper, smile, solo, street, teeth, upper_body, white_shirt