ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

ByteDance Inc.
* Corresponding Author

Abstract

Recent advances in text-to-image models (e.g., Stable Diffusion) and the corresponding personalization techniques (e.g., DreamBooth and LoRA) enable individuals to generate high-quality, imaginative images. However, these models often struggle to generate images at resolutions outside their training domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter), a domain-consistent adapter designed for diffusion models (e.g., SD and personalized models) to generate images with unrestricted resolutions and aspect ratios. Unlike other multi-resolution generation methods, which process images of a static resolution with post-processing, ResAdapter directly generates images at dynamic resolutions. This approach enables efficient inference without repeated denoising steps or complex post-processing operations, thus eliminating the additional inference time. Because it learns from a broad range of resolution priors without any style information from the training domain, ResAdapter, with only 0.5M parameters, generates images at out-of-domain resolutions for personalized diffusion models while preserving their style domain. Comprehensive experiments demonstrate the effectiveness of ResAdapter with diffusion models in resolution interpolation and extrapolation. Further experiments demonstrate that ResAdapter is compatible with other modules (e.g., ControlNet, IP-Adapter and LCM-LoRA) for generating images with flexible resolutions, and that it can be integrated into other multi-resolution models (e.g., ElasticDiffusion) to efficiently generate higher-resolution images.
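To make "unrestricted resolutions and aspect ratios" concrete, the following minimal sketch picks a width and height for a requested pixel budget and aspect ratio. It is an illustration only and not part of ResAdapter: the helper name `snap_resolution` is hypothetical, and the default multiple of 8 is an assumption based on the 8x spatial downsampling of the Stable Diffusion VAE.

```python
import math


def snap_resolution(target_pixels: int, aspect_ratio: float, multiple: int = 8) -> tuple[int, int]:
    """Return (width, height) near target_pixels with width / height ~= aspect_ratio.

    Hypothetical helper for illustration. `multiple` defaults to 8 on the
    assumption that the image is encoded by a VAE with 8x downsampling.
    """
    if target_pixels <= 0 or aspect_ratio <= 0 or multiple <= 0:
        raise ValueError("target_pixels, aspect_ratio and multiple must be positive")

    # Solve width * height = target_pixels with width = aspect_ratio * height.
    height = math.sqrt(target_pixels / aspect_ratio)
    width = height * aspect_ratio

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    # Square and 16:9 requests at a 512x512 pixel budget.
    print(snap_resolution(512 * 512, 1.0))
    print(snap_resolution(512 * 512, 16 / 9))
```

The resulting pair can then be passed as the width and height of whichever generation pipeline is in use; snapping keeps the latent dimensions integral under the stated assumption.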

ResAdapter Pipeline

Overview of ResAdapter. Left: Pipeline of ResAdapter. Built on a frozen base model (e.g., SD or SDXL), ResAdapter learns resolution priors from mixed-resolution datasets and can then be integrated into any personalized model to generate multi-resolution images. Right: Architecture comparison between ResAdapter and the vanilla LoRA. ResAdapter is inserted only into the downsampler and upsampler layers, and it unfreezes the group normalization layers of the ResNet blocks.
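The layer selection described in the overview can be sketched as a simple partition of parameter names. This is a sketch under stated assumptions, not the authors' implementation: the function `plan_trainable`, the substring patterns, and the example parameter names are all hypothetical and only illustrate the rule "adapter on downsamplers and upsamplers, unfrozen normalization in ResNet blocks, everything else frozen".

```python
def plan_trainable(param_names):
    """Partition parameter names into three groups.

    Assumed (hypothetical) naming convention:
      - names containing "downsamplers" or "upsamplers" receive the adapter
      - names containing "resnets" and ".norm" are unfrozen normalization layers
      - all other parameters stay frozen
    """
    plan = {"adapter": [], "unfrozen_norm": [], "frozen": []}
    for name in param_names:
        if "downsamplers" in name or "upsamplers" in name:
            plan["adapter"].append(name)
        elif "resnets" in name and ".norm" in name:
            plan["unfrozen_norm"].append(name)
        else:
            plan["frozen"].append(name)
    return plan


if __name__ == "__main__":
    # Illustrative parameter names only; a real model will differ.
    example_names = [
        "down_blocks.0.downsamplers.0.conv.weight",
        "down_blocks.0.resnets.0.norm1.weight",
        "down_blocks.0.resnets.0.conv1.weight",
        "down_blocks.0.attentions.0.to_q.weight",
        "up_blocks.1.upsamplers.0.conv.weight",
    ]
    for group, names in plan_trainable(example_names).items():
        print(group, names)
```

Under this partition, attention and ResNet convolution weights remain frozen, which is consistent with the overview's contrast between ResAdapter and the vanilla LoRA.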

Text to Image Tasks

Qualitative results for the text-to-image generation task. The baseline is DreamshaperXL. In each pair of images, the left image is from the baseline with ResAdapter and the right image is from the baseline alone.


ResAdapter with ControlNet

Qualitative results for the image-to-image generation task. The baseline is ControlNet. In each pair of images, the left image is from the baseline with ResAdapter and the right image is from the baseline alone.


ResAdapter with IP-Adapter

Qualitative results. The baseline is IP-Adapter. In each pair of images, the left image is from the baseline with ResAdapter and the right image is from the baseline alone.


ResAdapter with LCM-LoRA

Qualitative results for the accelerated text-to-image generation task. The baseline is Samaritan3dXL with LCM-LoRA or DreamshaperXL with LCM-LoRA. In each pair of images, the left image is from the baseline with ResAdapter and the right image is from the baseline alone.


ResAdapter with ElasticDiffusion

Sample 2048x2048 images generated with ResAdapter for efficient inference.


BibTeX

@article{cheng2024resadapter,
  title={ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models},
  author={Cheng, Jiaxiang and Xie, Pan and Xia, Xin and Li, Jiashi and Wu, Jie and Ren, Yuxi and Li, Huixia and Xiao, Xuefeng and Zheng, Min and Fu, Lean},
  journal={arXiv preprint arXiv:2403.02084},
  year={2024}
}