stablediffusion吧

What should I do about the "RuntimeError: NaN detected in latents" error?


Launching with the Python bundled in the directory....
19:26:39-330626 INFO Windows Python 3.10.11 D:\ruanjian\lora\lora-scripts-v1.5.1\python\python.exe
19:26:39-349106 INFO detected locale zh_CN, use pip mirrors
19:26:45-510343 INFO Torch 2.0.0+cu118
Torch backend: nVidia CUDA 11.8 cuDNN 8700
Torch detected GPU: NVIDIA GeForce GTX 1660 SUPER VRAM 6144 Arch (7, 5) Cores 22
19:26:45-555863 INFO Starting tensorboard...
19:26:46-064687 INFO Server started at http://127.0.0.1:28000
TensorBoard 2.10.1 at http://127.0.0.1:6006/ (Press CTRL+C to quit)
19:27:32-399047 INFO Training started with config file / 训练开始,使用配置文件:
D:\ruanjian\lora\lora-scripts-v1.5.1\config\autosave\20230926-192732.toml
19:27:32-430074 INFO Task 35c66513-49f6-4eac-b36a-1447579619c1 created
Loading settings from D:\ruanjian\lora\lora-scripts-v1.5.1\config\autosave\20230926-192732.toml...
D:\ruanjian\lora\lora-scripts-v1.5.1\config\autosave\20230926-192732
prepare tokenizer
update token length: 255
Using DreamBooth method.
prepare images.
found directory D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai contains 21 image files
No caption file found for 21 images. Training will continue without captions for these images. If class token exists, it will be used. / 21枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を 続行します。class tokenが存在する場合はそれを使います。
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\1.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\10.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\11.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\12.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\13.png
D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\14.png... and 16 more
210 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 640)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir: "D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai"
image_count: 21
num_repeats: 10
shuffle_caption: True
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: liliai
caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 65.56it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 640), count: 210
mean ar error (without repeats): 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: D:/ruanjian/lora/lora-scripts-v1.5.1/sd-models/model.ckpt
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Enable xformers for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|██████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<?, ?it/s]
caching latents...
0%| | 0/21 [00:24<?, ?it/s]
Traceback (most recent call last):
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\train_network.py", line 998, in <module>
trainer.train(args)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\train_network.py", line 259, in train
train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\library\train_util.py", line 1870, in cache_latents
dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\library\train_util.py", line 905, in cache_latents
cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.random_crop)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\sd-scripts\library\train_util.py", line 2213, in cache_batch_latents
raise RuntimeError(f"NaN detected in latents: {info.absolute_path}")
RuntimeError: NaN detected in latents: D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai\1.png
Traceback (most recent call last):
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
main()
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 992, in main
launch_command(args)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "D:\ruanjian\lora\lora-scripts-v1.5.1\python\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\ruanjian\\lora\\lora-scripts-v1.5.1\\python\\python.exe', './sd-scripts/train_network.py', '--config_file', 'D:\\ruanjian\\lora\\lora-scripts-v1.5.1\\config\\autosave\\20230926-192732.toml']' returned non-zero exit status 1.
19:30:05-229826 ERROR Training failed / 训练失败


Post #1 · 2023-09-26 19:47 · IP location: Sichuan
    Same question. Has the OP solved this? I get exactly the same error.


    Post #2 · 2023-10-12 12:17
      No idea, just dropping by to take a look.


      Post #3 · 2023-10-12 12:18 · IP location: Guangdong · from the Android client
        I ran into exactly the same error. How should I fix it?


        Post #4 · 2023-10-18 15:33 · IP location: Fujian
          Nobody on Tieba answered, so I just asked my mentor: it is a VAE problem.
          Download a VAE from here and enter it in the VAE setting: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/tree/main
          For details, see this video: https://www.bilibili.com/video/BV1Vz4y137kj/?spm_id_from=333.337.search-card.all.click&vd_source=674ea6d3cd9c0cb17f1b809c8bcbbc78
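For anyone wondering why a VAE would produce NaN at all: a plausible mechanism, and the one the sdxl-vae-fp16-fix model card describes for the SDXL VAE, is that some internal activations are too large for half precision. The sketch below only illustrates that arithmetic with the Python standard library; it does not run a real VAE. The `center` step and the value 70000.0 are made-up stand-ins, and whether this is what happens in the original poster's setup is an assumption.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite value in IEEE 754 half precision


def to_fp16(x: float) -> float:
    """Round x to half precision; on overflow return +/-inf, as an fp16 cast would."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def center(values):
    """Subtract the mean: a toy stand-in for a normalisation-style step."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


activations = [70000.0, 1.0]              # hypothetical oversized activation
half = [to_fp16(v) for v in activations]  # the large value overflows to inf
latents = center(half)                    # inf - inf produces NaN
has_nan = any(math.isnan(v) for v in latents)
print("fp16 values:", half)
print("NaN present:", has_nan)
```

Once a single NaN appears it propagates through every later operation, which fits the trainer refusing to cache the latents for the very first image.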


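          Post #5 · 2023-10-18 16:18 · IP location: Fujian
            1. The image size can be set to 1024*1024, or even a bit larger.
            2. A VAE must be set; there is no need to enable v2.
            3. No path name may contain Chinese characters.
            4. You must use sdxl-vae-fp16-fix;
            otherwise you get the error RuntimeError(f"NaN detected in latents: {info.absolute_path}")
            https://huggingface.co/madebyollin/sdxl-vae-fp16-fix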
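The rule about path names is easy to check before starting a run. Here is a minimal sketch that lists the components of a Windows path containing non-ASCII characters. The helper name `non_ascii_parts` and the second example path are hypothetical; the first path is the training folder from the log in the first post.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path: str) -> list[str]:
    """Return the components of a Windows path that contain non-ASCII characters."""
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


# Training folder taken from the log in the first post.
ok_path = r"D:\ruanjian\lora\lora-scripts-v1.5.1\train\liliai\10_liliai"
# Hypothetical folder with a Chinese directory name.
bad_path = "D:\\\u8bad\u7ec3\\10_liliai"

for p in (ok_path, bad_path):
    offenders = non_ascii_parts(p)
    print(p, "->", "OK" if not offenders else f"rename: {offenders}")
```

An empty result means every component is plain ASCII. Going by the log, the original poster's folder already passes this check, so the path rule alone does not explain that particular failure.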


            Post #6 · 2023-11-09 12:31 · IP location: Liaoning
              For the model save precision, choose bf16, not fp16.
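For people launching from the command line instead of the GUI, the two settings suggested in this thread (an explicit VAE and bf16 save precision) could be passed roughly as below. This is only a sketch: `--config_file` appears in the log above, while `--vae` and `--save_precision` are the option names as I understand them from sd-scripts, so check them against your version. `build_train_args` and the VAE file path are hypothetical.

```python
ALLOWED_SAVE_PRECISION = {"float", "fp16", "bf16"}


def build_train_args(config_file: str, vae_path: str, save_precision: str = "bf16") -> list[str]:
    """Assemble an argument list for train_network.py (option names are assumptions)."""
    if save_precision not in ALLOWED_SAVE_PRECISION:
        raise ValueError(f"unsupported save precision: {save_precision!r}")
    return [
        "./sd-scripts/train_network.py",
        "--config_file", config_file,
        "--vae", vae_path,                   # external VAE instead of the checkpoint's own
        "--save_precision", save_precision,  # bf16 as suggested above, not fp16
    ]


args = build_train_args(
    config_file=r"D:\ruanjian\lora\lora-scripts-v1.5.1\config\autosave\20230926-192732.toml",
    vae_path=r"D:\path\to\sdxl-vae-fp16-fix.safetensors",  # hypothetical location
)
print(" ".join(args))
```

The list can be handed to `subprocess.run` after the Python executable and launcher of your choice; keeping it as a list avoids quoting problems with Windows paths.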


              Post #7 · 2024-03-19 02:56 · IP location: Hunan
                I have the same problem. Did you solve it? Please let me know.


                Post #8 · 2024-04-21 14:07 · IP location: Hebei
                  sdxl-vae-fp16-fix is the right answer; with it I no longer get the error.


                  Post #9 · 2024-07-22 14:06 · IP location: Shanghai