
What should I do about the "RuntimeError: NaN detected in latents" error?


Launching with the Python bundled in the directory....
19:26:39-330626 INFO Windows Python 3.10.11 D:\ruanjian\lora\lora-scripts-v1.5.1\python\python.exe
19:26:39-349106 INFO detected locale zh_CN, use pip mirrors
19:26:45-510343 INFO Torch 2.0.0+cu118
Torch backend: nVidia CUDA 11.8 cuDNN 8700
Torch detected GPU: NVIDIA GeForce GTX 1660 SUPER VRAM 6144 Arch (7, 5) Cores 22
19:26:45-555863 INFO Starting tensorboard...
19:26:46-064687 INFO Server started at http://127.0.0.1:28000
TensorBoard 2.10.1 at http://127.0.0.1:6006/ (Press CTRL+C to quit)
19:27:32-399047 INFO Training started with config file / 训练开始,使用配置文件:
D:\ruanjian\lora\lora-scripts-v1.5.1\config\autosave\20230926-192732.toml
19:27:32-430074 INFO Task 35c66513-49f6-4eac-b36a-1447579619c1 created
Loading settings from D:\ruanjian\lora\lora-scripts-v1.5.1\config\autosave\20230926-192732.toml...
D:\ruanjian\lora\lora-scripts-v1.5.1\config\autosave\20230926-192732
prepare tokenizer
update token length: 255
Using DreamBooth method.
prepare images.
found directory D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai contains 21 image files
No caption file found for 21 images. Training will continue without captions for these images. If class token exists, it will be used. / 21枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\1.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\10.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\11.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\12.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\13.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\14.png... and 16 more
210 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 640)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir: "D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai"
image_count: 21
num_repeats: 10
shuffle_caption: True
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: liliai
caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 65.56it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 640), count: 210
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: D:/ruanjian/lora/lora-scripts-v1.5.1/sd-models/model.ckpt
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Enable xformers for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|██████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<?, ?it/s]
caching latents...
0%| | 0/21 [00:24<?, ?it/s]
Traceback (most recent call last):
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\train_network.py", line 998, in <module>
trainer.train(args)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\train_network.py", line 259, in train
train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\library\train_util.py", line 1870, in cache_latents
dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\library\train_util.py", line 905, in cache_latents
cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.random_crop)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\library\train_util.py", line 2213, in cache_batch_latents
raise RuntimeError(f"NaN detected in latents: {info.absolute_path}")
RuntimeError: NaN detected in latents: D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\1.png
Traceback (most recent call last):
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
main()
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 992, in main
launch_command(args)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\ruanjian\\lora\\lora-scripts-v1.5.1\\python\\python.exe', './sd-scripts/train_network.py', '--config_file', 'D:\\ruanjian\\lora\\lora-scripts-v1.5.1\\config\\autosave\\20230926-192732.toml']' returned non-zero exit status 1.
19:30:05-229826 ERROR Training failed / 训练失败
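
For anyone reading the traceback: the error is raised in cache_batch_latents while the VAE-encoded latents are being cached, on the first image (0/21), before any training step runs. The check behind it can be sketched roughly as below. This is a stdlib-only illustration, not the sd-scripts code: the function name `check_latents` and the flat list of floats standing in for a latent tensor are assumptions made for the example; the real code works on PyTorch tensors.

```python
import math


def check_latents(latents, image_path):
    """Raise if any value in `latents` is NaN, using the message seen in the traceback.

    `latents` is a flat list of floats standing in for a latent tensor
    (an assumption for illustration only).
    """
    if any(math.isnan(v) for v in latents):
        raise RuntimeError(f"NaN detected in latents: {image_path}")
    return latents


# One common way NaN appears: a value overflows to infinity,
# and a later operation such as inf - inf yields NaN.
overflowed = float("inf")
bad_value = overflowed - overflowed

check_latents([0.1, -0.3, 0.7], "ok.png")  # no NaN, returns the list unchanged

try:
    check_latents([0.1, bad_value], "1.png")
except RuntimeError as err:
    message = str(err)
    print(message)
```

Because the very first image fails, the image file itself is probably not the problem. A frequently reported cause on GTX 16-series cards is encoding with the VAE in half precision; the log does not show the precision settings, so that is only a guess here. If your version of sd-scripts exposes a `no_half_vae` option, or lets you set mixed precision to "no", those would be the first things to try.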


Post #1 · 2023-09-26 19:47 · IP location: Sichuan