Building an automatic end-to-end image-to-image pipeline with the Stable Diffusion web UI framework
There are many articles, posts, and YouTube videos teaching how to use the Stable Diffusion web UI framework, https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/master, to replace a face, background, clothing, style, and so on. The interactive UI makes it easy to explore the capabilities of Stable Diffusion, e.g. how different diffusion models affect the style of the generated image, or how to use inpainting to replace a face, clothing, or background. Here, however, we are interested in building an automatic end-to-end pipeline for model transformation, i.e. keeping the clothing while replacing the human model and their pose. Here is the main recipe:
- Choose a base Stable Diffusion model according to your preferred style. Many open-source models are available at https://civitai.com/.
- Segment and mask the clothing. You can use Meta's Segment Anything, https://segment-anything.com/.
- Select a pose template. Choose an image from which a pose can be extracted; OpenPose can do the extraction, https://github.com/CMU-Perceptual-Computing-Lab/openpose.
- Align the clothing to the pose template. This step is critical: with good alignment the generated image looks natural, while poor alignment produces distorted, implausible results.
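The alignment step above can be sketched with a standard similarity transform (scale, rotation, translation) fitted between matching keypoints, e.g. shoulders and hips detected on the cloth image and on the pose template. This is a minimal NumPy sketch using the Umeyama/Procrustes method; the keypoint pairs themselves are assumed to come from OpenPose or manual annotation.

```python
import numpy as np

def align_cloth_to_pose(cloth_pts, pose_pts):
    """Fit a similarity transform mapping cloth keypoints onto pose-template
    keypoints (Umeyama method). Both inputs are (N, 2) arrays of matching
    points. Returns (scale, rotation matrix R, translation t)."""
    cloth_pts = np.asarray(cloth_pts, dtype=float)
    pose_pts = np.asarray(pose_pts, dtype=float)
    mu_c, mu_p = cloth_pts.mean(axis=0), pose_pts.mean(axis=0)
    cc, pp = cloth_pts - mu_c, pose_pts - mu_p          # centered point sets
    cov = pp.T @ cc / len(cloth_pts)                    # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflection
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / cc.var(axis=0).sum()
    t = mu_p - scale * R @ mu_c
    return scale, R, t

def apply_transform(points, scale, R, t):
    """Warp points (or mask pixel coordinates) with the fitted transform."""
    return scale * (np.asarray(points, dtype=float) @ R.T) + t
```

In practice you would apply the fitted transform to every pixel coordinate of the cloth mask (or warp the cloth image with the equivalent affine matrix) so the garment lands on the template's torso before inpainting.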
Connecting all the modules above yields an end-to-end pipeline: input a clothing image and a pose template image, and the output is a try-on image of the clothing on a human model. The method works best for transforming mannequin images into human models; the general, more challenging try-on setting is not yet handled.
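For the final generation step, the web UI can be driven programmatically when launched with the `--api` flag, which exposes REST endpoints such as `/sdapi/v1/img2img` (documented under `/docs` on the running server). The sketch below builds a payload for that endpoint; the field values are illustrative defaults, not tuned settings, and the prompt and file names are placeholders.

```python
import base64

WEBUI_URL = "http://127.0.0.1:7860"  # default local webui address (assumption)

def b64(path):
    """Read an image file and return it base64-encoded, as the API expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def build_img2img_payload(init_image_b64, mask_b64, prompt):
    """Assemble a JSON payload for the webui /sdapi/v1/img2img endpoint.
    init_image: the cloth composited onto the pose template; mask: the
    cloth mask, so the garment is preserved while the rest is regenerated."""
    return {
        "init_images": [init_image_b64],
        "mask": mask_b64,
        "inpainting_mask_invert": 1,      # repaint everything except the cloth
        "denoising_strength": 0.75,       # how far output may stray from init image
        "prompt": prompt,
        "negative_prompt": "deformed, extra limbs, bad anatomy",
        "steps": 30,
    }

# Sending the request (requires a running webui started with --api):
# import requests
# payload = build_img2img_payload(b64("composite.png"), b64("mask.png"),
#                                 "a fashion model wearing the garment, studio photo")
# r = requests.post(f"{WEBUI_URL}/sdapi/v1/img2img", json=payload, timeout=300)
# result_b64 = r.json()["images"][0]
```

The response contains the generated images as base64 strings, so the pipeline can save or post-process them without any manual UI interaction.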