[Stable Diffusion] How to Use Long Prompts

"What is the maximum prompt length in Stable Diffusion?"
"I want to use longer prompts with Diffusers."

If either of these sounds familiar, this article should help.
It explains how to use long prompts with Stable Diffusion.

Contents

Prompts are limited to 75 tokens by default
How to use long prompts

Let's go through these in order.

Prompts are limited to 75 tokens by default

Have you ever seen a message like the following?

Token indices sequence length is longer than the specified maximum sequence length for this model (*** > 77). Running this sequence through the model will result in indexing errors 
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['@@@']

This is not an error message; it is a warning.
However, the image may not have been generated the way you intended.

The reason is that the prompt you entered was cut off partway.
By default, Stable Diffusion processes only the first 75 tokens of a prompt.

How to check the token count is explained in a separate article.

Note that one token is added at the start of the prompt and another at the end.
Including these two, a total of 77 tokens is processed by default.
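
As a rough way to check these numbers yourself, the sketch below counts the tokens of a prompt. It assumes the tokenizer behaves like the Hugging Face CLIP tokenizer that a Diffusers pipeline exposes as `pipe.tokenizer`, meaning that calling it returns `input_ids` with the start and end tokens already included. The function name `count_prompt_tokens` is made up for this illustration.

```python
def count_prompt_tokens(tokenizer, prompt):
    """Return (content_tokens, total_tokens) for a prompt.

    Assumption: `tokenizer(prompt).input_ids` is a list of token ids
    that already contains the start token and the end token.
    """
    input_ids = tokenizer(prompt).input_ids
    total = len(input_ids)
    content = total - 2  # subtract the start and end tokens
    return content, total


# Assumed usage with an already loaded Diffusers pipeline named `pipe`:
# content, total = count_prompt_tokens(pipe.tokenizer, "a dog in the river")
# print(content, total)
```

If `content` is greater than 75 (that is, `total` is greater than 77), the prompt will be truncated by the standard pipeline.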

That covers the 75-token limit.
Next, we look at how to use long prompts.

How to use long prompts

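You need an environment in which Stable Diffusion can run.
If you are running it locally rather than on Google Colab, a separate article covers the setup.

More precisely, the explanation below assumes that you are using Diffusers.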

First, try running the following code.
The prompt consists of 78 tokens, not counting the start and end tokens.

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch

seed = 3  # fixed seed so that runs are reproducible
device = "cuda"

model_id = "runwayml/stable-diffusion-v1-5"

# Load the standard Stable Diffusion pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
)
# Use DPMSolverMultistepScheduler, reusing the existing scheduler config
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)

# "a dog" + 73 commas + "in the river" = 78 tokens
prompt = " a dog " + ", " * 73 + " in the river"

generator = torch.Generator(device=device).manual_seed(seed)

image = pipe(
    prompt,
    num_inference_steps=25,
    guidance_scale=7.5,
    generator=generator,
).images[0]

image.save("test.png")

Running the code should display a message like this:

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['in the river']

An image is generated as well.

As the message says, the "in the river" part has been dropped from the prompt.
As a result, the image shows nothing more than "a dog".
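
To see why exactly "in the river" is lost, here is a minimal sketch of the truncation. It does not call the real tokenizer; it assumes that each word and each comma in this prompt becomes exactly one token, which matches the 78-token count stated above.

```python
MAX_CONTENT_TOKENS = 75  # 77 minus the start and end tokens

# Assumption: every word and every comma maps to exactly one token.
tokens = ["a", "dog"] + [","] * 73 + ["in", "the", "river"]

kept = tokens[:MAX_CONTENT_TOKENS]     # what the model actually receives
dropped = tokens[MAX_CONTENT_TOKENS:]  # what gets cut off

print(len(tokens))        # 78
print(" ".join(dropped))  # in the river
```

The first 75 tokens are "a dog" plus the 73 commas, so the three remaining tokens fall outside the limit.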

Let's modify the code so that it can handle long prompts.
The modified code is shown below.

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch

seed = 3  # fixed seed so that runs are reproducible
device = "cuda"

model_id = "runwayml/stable-diffusion-v1-5"

# Load the pipeline with the long-prompt community pipeline (added line)
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    custom_pipeline="lpw_stable_diffusion"
)
# Use DPMSolverMultistepScheduler, reusing the existing scheduler config
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)

# "a dog" + 73 commas + "in the river" = 78 tokens
prompt = " a dog " + ", " * 73 + " in the river"

generator = torch.Generator(device=device).manual_seed(seed)

image = pipe(
    prompt,
    num_inference_steps=25,
    guidance_scale=7.5,
    generator=generator,
    # Raise the prompt length limit (added line)
    max_embeddings_multiples=2
).images[0]

image.save("test.png")

Only the following two lines were added to the original code.

custom_pipeline="lpw_stable_diffusion"
max_embeddings_multiples=2

Add these two as a pair.
Adjust the value of max_embeddings_multiples according to the number of tokens in the prompt.

Now run the modified code.
It displays the following message:

Token indices sequence length is longer than the specified maximum sequence length for this model (80 > 77). Running this sequence through the model will result in indexing errors

The message still appears, but the generated image changes.

This time, "in the river" is properly reflected in the image.

A note on max_embeddings_multiples.
If, for example, a value of 2 is not enough, a message tells you so:

Token indices sequence length is longer than the specified maximum sequence length for this model (153 > 77). Running this sequence through the model will result in indexing errors
Prompt was truncated. Try to shorten the prompt or increase max_embeddings_multiples

You have two options: shorten the prompt or increase the value of max_embeddings_multiples.
In this case, setting it to 3 makes the "Prompt was truncated." line disappear.
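
The two cases in this article (80 tokens works with 2, 153 tokens needs 3) are consistent with each multiple allowing another 75 prompt tokens. The sketch below estimates the smallest value from the token count shown in the warning. It assumes that this count includes the start and end tokens and that the limit is 75 content tokens per multiple; the helper name `min_embeddings_multiples` is made up for this illustration and is not part of Diffusers.

```python
import math

CHUNK_SIZE = 75  # content tokens per multiple (77 minus start and end tokens)


def min_embeddings_multiples(total_tokens):
    """Estimate the smallest max_embeddings_multiples for a prompt.

    `total_tokens` is the count shown in the warning, assumed to
    include the start and end tokens.
    """
    content_tokens = max(total_tokens - 2, 0)
    return max(1, math.ceil(content_tokens / CHUNK_SIZE))


print(min_embeddings_multiples(80))   # 2
print(min_embeddings_multiples(153))  # 3
```

Treat the result as a starting point: if "Prompt was truncated." still appears, increase the value by one.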

That covers how to use long prompts.