Abstract
Recent advances in text-to-image (T2I) diffusion models have enabled remarkable control over various attributes, yet precise color specification remains a fundamental challenge. Existing approaches, such as ColorPeel, rely on model personalization, requiring additional optimization and limiting flexibility in specifying arbitrary colors. In this work, we introduce ColorWave, a novel training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. By systematically analyzing the cross-attention mechanisms within IP-Adapter, we uncover an implicit binding between textual color descriptors and reference image features. Leveraging this insight, our method rewires these bindings to enforce precise color attribution while preserving the generative capabilities of pretrained models. Our approach maintains generation quality and diversity, outperforming prior methods in accuracy and applicability across diverse object categories. Through extensive evaluations, we demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
Motivation: Semantic Attribute Binding
The core discovery behind ColorWave is semantic attribute binding — an implicit connection within IP-Adapter between visual attributes in reference images and their corresponding linguistic descriptors in text prompts. This previously unexplored property enables precise RGB-level color manipulation without any model fine-tuning.
Illustration of semantic attribute binding. Given a color name in the prompt, the model retrieves the corresponding color from the reference image. This binding persists even when the reference colors are altered or synthetic color patches are used.
Similarity matrix between color names and RGB values. The heatmap shows normalized dot product similarity between key projections of color word tokens and image features, revealing implicit color-semantic bindings.
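As a rough way to reproduce this probe, the sketch below computes a normalized dot-product similarity matrix between color-word embeddings and solid-color patch embeddings. It uses raw CLIP features from Hugging Face transformers as a simplified stand-in for the UNet key projections analyzed in the paper, so the exact values will differ; the model checkpoint, color list, and patch size are our assumptions.

```python
# Sketch: similarity between color words and solid-color patches.
# Raw CLIP embeddings stand in for the cross-attention key projections
# studied in the paper, so this is a proxy, not the paper's analysis.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

color_names = ["red", "green", "blue", "yellow", "purple"]
rgb_values = [(255, 0, 0), (0, 255, 0), (0, 0, 255),
              (255, 255, 0), (128, 0, 128)]

# Encode the color words.
text_inputs = processor(text=color_names, return_tensors="pt", padding=True)
with torch.no_grad():
    txt = model.get_text_features(**text_inputs)

# Encode solid-color patches (the "reference images").
patches = [Image.new("RGB", (224, 224), c) for c in rgb_values]
img_inputs = processor(images=patches, return_tensors="pt")
with torch.no_grad():
    img = model.get_image_features(**img_inputs)

# Normalized dot-product similarity (rows: words, columns: patches).
txt = txt / txt.norm(dim=-1, keepdim=True)
img = img / img.norm(dim=-1, keepdim=True)
sim = txt @ img.T
print(sim)
```

If the binding hypothesis holds, the diagonal of this matrix should dominate, mirroring the structure of the heatmap in the figure above.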
Arbitrary Color Attribution Across Diverse Objects
ColorWave precisely applies user-specified colors to a wide range of subjects including inanimate objects, animals, plants, and human clothing. Our method successfully generates images where target objects accurately reflect the desired reference colors while maintaining high visual quality across diverse scenarios.
ColorWave precisely applies user-specified colors to various target objects (bowl, bowling ball, plate, vase, pants, teddy bear, snooker ball, parrot, sofa, rose) while maintaining natural lighting, material properties, and contextual integration.
Method
ColorWave is a training-free approach that enables precise RGB color control in text-to-image diffusion models; unlike previous methods, it requires no additional optimization or model fine-tuning. It builds directly on the semantic attribute binding identified above: the implicit connection between visual attributes in reference images and their corresponding linguistic descriptors in text prompts.
Overview of ColorWave. Our approach leverages semantic attribute binding between IP-Adapter and text cross-attention pathways to achieve precise color control. User-specified RGB values are encoded through IP-Adapter and selectively bound to object tokens in the text prompt, enabling training-free color attribution while preserving generative quality.
By systematically analyzing the cross-attention mechanisms within the IP-Adapter framework, we identified that the adapter inherently associates specific color values with their linguistic descriptors, enabling exact color specification without requiring additional training or fine-tuning. Our approach effectively "rewires" these connections to establish precise color attribution to target objects while maintaining the generative capabilities of the underlying diffusion model.
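For intuition, the sketch below drives a stock IP-Adapter with a solid-color patch as the reference image, anchored by a color word in the prompt. It does not implement the paper's attention rewiring (which selectively binds adapter features to object tokens through a custom attention processor); the repository IDs, adapter scale, and patch size are assumptions, so treat this as an approximation of the setup, not the method itself.

```python
# Sketch: solid-color reference image fed through IP-Adapter (diffusers).
# Approximates ColorWave's setup; the actual token-level rewiring is omitted.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # assumed strength; tune per subject

target_rgb = (23, 114, 180)  # any user-specified RGB triplet
color_patch = Image.new("RGB", (512, 512), target_rgb)

image = pipe(
    # The color word should name the family of the target RGB,
    # since it is what anchors the binding to the object token.
    prompt="a photo of a blue vase on a table",
    ip_adapter_image=color_patch,
    num_inference_steps=30,
).images[0]
image.save("colorwave_sketch.png")
```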
Comparison with Prior Work
Unlike prior methods such as ColorPeel, which require a separate optimization process for each individual color, ColorWave accepts any arbitrary RGB triplet without modification. This represents not just an incremental improvement but a paradigm shift in how precise color control can be achieved in generative models.
Quantitative Evaluation
ColorWave achieves a notably lower ΔE error in CIE Lab color space than existing training-free methods, indicating perceptually more faithful colors. It also attains a lower mean angular error in both sRGB and hue, signifying higher color accuracy in chromaticity and hue.
Quantitative comparison with baselines. Lower is better for all metrics (↓). ColorWave outperforms all training-free methods and is competitive with training-based approaches despite requiring zero optimization time.
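For reference, here is a minimal sketch of the two metric families named above: CIEDE2000 ΔE in CIE Lab, and angular errors in sRGB and hue. The exact evaluation protocol (object masking, averaging, and the specific ΔE variant) is assumed here rather than taken from the paper.

```python
# Sketch of the color-accuracy metrics: Delta E (CIEDE2000) and angular
# errors. Protocol details are assumptions, not the paper's exact setup.
import colorsys
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def delta_e(pred_rgb, target_rgb):
    """CIEDE2000 difference between two RGB triplets in [0, 255]."""
    lab1 = rgb2lab(np.asarray(pred_rgb, dtype=float)[None, None] / 255.0)
    lab2 = rgb2lab(np.asarray(target_rgb, dtype=float)[None, None] / 255.0)
    return float(deltaE_ciede2000(lab1, lab2)[0, 0])

def angular_error(pred_rgb, target_rgb):
    """Angle (degrees) between RGB vectors, ignoring brightness."""
    a = np.asarray(pred_rgb, dtype=float)
    b = np.asarray(target_rgb, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def hue_error(pred_rgb, target_rgb):
    """Circular difference (degrees) between HSV hue angles."""
    h1 = colorsys.rgb_to_hsv(*(c / 255.0 for c in pred_rgb))[0] * 360.0
    h2 = colorsys.rgb_to_hsv(*(c / 255.0 for c in target_rgb))[0] * 360.0
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

print(delta_e((200, 30, 30), (255, 0, 0)))        # small -> close colors
print(angular_error((200, 30, 30), (255, 0, 0)))
print(hue_error((200, 30, 30), (255, 0, 0)))
```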
Advanced Applications
Beyond single-color attribution, ColorWave extends to more sophisticated visual attribute control, such as applying complex color patterns and transferring textures. These examples show that semantic attribute binding goes well beyond simple color matching.
Generalizability to complex color patterns and textures. ColorWave successfully handles intricate patterns (crystal, metal textures) and multi-object scenes, demonstrating the versatility of semantic attribute binding.
Extensive Results Gallery
We showcase comprehensive results across various colors and object categories. Each grid demonstrates ColorWave's ability to handle different color specifications while maintaining consistent quality and accurate color reproduction across diverse objects.
Maroon Objects
Red Objects
Pink Objects
Orange Objects
Purple Objects
Green Objects
Turquoise Objects
Navy Objects
All results show objects: bowl, vase, plate, ball, person, teddy bear, parrot, sofa, and rose. Reference colors are depicted in the external frame of each grid, demonstrating ColorWave's robustness across the color spectrum.
BibTeX
@inproceedings{laria2026colorwave,
  title={Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models},
  author={Laria, H{\'e}ctor and Gomez-Villa, Alexandra and Qin, Jiang and Butt, Muhammad Atif and Raducanu, Bogdan and Vazquez-Corral, Javier and van de Weijer, Joost and Wang, Kai},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}