Is tensor parallelism available for Stable Diffusion?

I found this video https://www.youtube.com/watch?v=fq6hw_gZe-o&t=328s on YouTube, where the author managed to run a 70B Llama 3 on 4x P100 GPUs and got very good performance at low cost. I wonder if there is a way to run Stable Diffusion in a similar way. According to the author, for any transformer-based model you can split the layers across different GPUs and utilize all the VRAM with parallel compute. Nvidia this year released DistriFusion, which builds on top of diffusers and is able to do parallel inference for SDXL. Is there a way to do this for Flux?
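For context on what I mean by splitting a layer across GPUs: the idea is to shard a layer's weight matrix column-wise, let each device compute its partial output, then gather the shards. A toy sketch of that equivalence (NumPy on one machine simulating two GPUs; all variable names are just illustrative, this isn't any library's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))   # activation entering the layer
W = rng.standard_normal((8, 16))  # full weight matrix of the layer

# Column-wise shard: each simulated "GPU" holds half the output columns
# and computes its partial result independently.
W0, W1 = np.split(W, 2, axis=1)
y0 = x @ W0  # would run on GPU 0
y1 = x @ W1  # would run on GPU 1

# An all-gather step concatenates the shards into the full output.
y_parallel = np.concatenate([y0, y1], axis=1)
y_full = x @ W

assert np.allclose(y_parallel, y_full)
```

The math works out identically to the unsharded layer; the engineering question for diffusion models is whether the UNet/DiT blocks and the communication step are implemented for multi-GPU, which is what DistriFusion does for SDXL.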