Stable Diffusion on CPU: Battling the "RuntimeError: expected scalar type BFloat16 but found Float"
The Problem:
You're excited to run Stable Diffusion on your CPU, but when you fire it up, you encounter the error message "RuntimeError: expected scalar type BFloat16 but found Float". This frustrating error means the PyTorch pipeline behind Stable Diffusion is trying to run in the BFloat16 data type, which most CPUs don't handle natively, so some operations end up receiving standard Float tensors instead of the BFloat16 tensors they expect.
In simpler terms: Stable Diffusion is trying to use a special kind of number format (BFloat16) to run faster, but your CPU doesn't understand that format.
The Scenario:
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

# Load the pipeline in half precision (float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cpu")  # Move model to CPU
# Generate an image
image = pipe("a cute cat wearing a hat", num_inference_steps=50).images[0]
# Save the image
image.save("cat_with_hat.png")
This is a common code snippet for generating images with Stable Diffusion. However, because the pipeline is loaded in a reduced-precision data type (torch.float16), running it on a CPU often results in the "RuntimeError: expected scalar type BFloat16 but found Float" error.
The Solution:
The solution is straightforward: load the pipeline with the standard float32 data type instead of a reduced-precision type such as float16 or bfloat16. Here's how you modify your code:
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

# Load the pipeline in full precision (float32), which every CPU supports
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float32
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cpu")  # Move model to CPU
# Generate an image
image = pipe("a cute cat wearing a hat", num_inference_steps=50).images[0]
# Save the image
image.save("cat_with_hat.png")
By changing torch_dtype to torch.float32, you tell Stable Diffusion to use the standard float format that your CPU fully supports.
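If you want to confirm that the pipeline really ended up in float32, you can inspect the parameter dtypes after loading. This is a minimal sketch that assumes the pipe object from the snippet above:
# Check the dtype of the pipeline's components (assumes pipe from above)
print(next(pipe.unet.parameters()).dtype)          # expected: torch.float32
print(next(pipe.text_encoder.parameters()).dtype)  # expected: torch.float32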
Understanding BFloat16 and Float:
- BFloat16 is a specialized data type that uses half the memory of float32 and keeps the same numeric range, at the cost of some precision. It's often used in specialized hardware like TPUs for faster processing.
- Float32 is the standard floating-point data type widely supported by CPUs. It provides a good balance of precision and performance.
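Both points are easy to see directly in PyTorch: a bfloat16 tensor uses half the bytes of a float32 tensor and rounds values more aggressively. A small illustrative sketch:
import torch

x32 = torch.tensor([3.1415926535], dtype=torch.float32)
x16 = x32.to(torch.bfloat16)  # converting storage works even on CPUs without fast BFloat16 math

print(x32.element_size())  # 4 bytes per element
print(x16.element_size())  # 2 bytes per element
print(x32)                 # tensor([3.1416])
print(x16)                 # tensor([3.1406], dtype=torch.bfloat16) -- noticeably less precise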
Since most CPUs don't have dedicated hardware for BFloat16 calculations, using float32 ensures compatibility and avoids the error.
The Trade-Off:
Using float32 instead of bfloat16 will generally result in slightly slower processing times, but you won't experience the error and can successfully run Stable Diffusion on your CPU.
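If you want to know what that costs on your particular machine, a simple wall-clock measurement around the generation call is enough. A minimal sketch (with fewer inference steps than above, just to keep the test short):
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float32
).to("cpu")

start = time.perf_counter()
pipe("a cute cat wearing a hat", num_inference_steps=20)  # fewer steps, benchmark only
print(f"float32 on CPU: {time.perf_counter() - start:.1f} s")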
Additional Considerations:
- Performance Impact: The difference in speed between float32 and bfloat16 can vary depending on your CPU and the complexity of the model. It's best to test and see the impact for yourself.
- Memory Usage: float32 needs roughly twice the memory of bfloat16 for the model weights, since each value takes 4 bytes instead of 2.
- GPU Support: If you have a GPU with BFloat16 support, you can potentially achieve faster performance by using torch_dtype=torch.bfloat16 and running the model on your GPU, as in the sketch below.
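Here is a hedged sketch of that GPU path, guarded so it only runs when CUDA and BFloat16 are actually available (otherwise, fall back to the float32 CPU setup above):
import torch
from diffusers import StableDiffusionPipeline

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.bfloat16
    ).to("cuda")
    image = pipe("a cute cat wearing a hat", num_inference_steps=50).images[0]
    image.save("cat_with_hat.png")
else:
    print("No BFloat16-capable GPU found; use the float32 CPU setup above.")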
Conclusion:
Running Stable Diffusion on a CPU is possible without BFloat16 support by simply switching to the float32 data type. While this might introduce a slight performance difference, it guarantees compatibility and allows you to enjoy the power of Stable Diffusion on your machine.