Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models

¹Computer Vision Center, Spain  ²Universitat Autònoma de Barcelona
³Harbin Institute of Technology  ⁴City University of Hong Kong
⁵Program of Computer Science, City University of Hong Kong (Dongguan)
WACV 2026
ColorWave teaser showing color interpolation

ColorWave accurately reproduces subtle color variations in smooth interpolation between similar tones. Each column shows a different target object rendered with gradually shifting colors (displayed above each image). The results demonstrate that our method is sensitive to small changes in the RGB color space while preserving realistic object appearance and scene composition.

Abstract

Recent advances in text-to-image (T2I) diffusion models have enabled remarkable control over various attributes, yet precise color specification remains a fundamental challenge. Existing approaches, such as ColorPeel, rely on model personalization, requiring additional optimization and limiting flexibility in specifying arbitrary colors. In this work, we introduce ColorWave, a novel training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. By systematically analyzing the cross-attention mechanisms within IP-Adapter, we uncover an implicit binding between textual color descriptors and reference image features. Leveraging this insight, our method rewires these bindings to enforce precise color attribution while preserving the generative capabilities of pretrained models. Our approach maintains generation quality and diversity, outperforming prior methods in accuracy and applicability across diverse object categories. Through extensive evaluations, we demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.

Motivation: Semantic Attribute Binding

The core discovery behind ColorWave is semantic attribute binding — an implicit connection within IP-Adapter between visual attributes in reference images and their corresponding linguistic descriptors in text prompts. This previously unexplored property enables precise RGB-level color manipulation without any model fine-tuning.

Semantic attribute binding illustration

Illustration of semantic attribute binding. Given a color name in the prompt, the model picks the respective color from the reference image. This binding persists even when colors are changed or synthetic patches are used.

Similarity matrix visualization

Similarity matrix between color names and RGB values. The heatmap shows normalized dot product similarity between key projections of color word tokens and image features, revealing implicit color-semantic bindings.
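The similarity matrix above can be sketched in a few lines. This is a hedged illustration, not the authors' released code: the function names and toy dimensions are assumptions, and real key projections would come from the UNet's cross-attention layers rather than random arrays.

```python
import numpy as np

def color_similarity_matrix(text_keys, image_keys):
    """Normalized dot-product (cosine) similarity between key projections of
    color-word tokens (n_words x d) and reference-image features (n_patches x d).

    Both inputs are L2-normalized per row, so each entry lies in [-1, 1];
    a bright diagonal in the resulting heatmap reveals color-semantic binding.
    """
    t = text_keys / np.linalg.norm(text_keys, axis=-1, keepdims=True)
    i = image_keys / np.linalg.norm(image_keys, axis=-1, keepdims=True)
    return t @ i.T  # (n_words, n_patches)
```

In practice one would extract `text_keys` from the text cross-attention pathway and `image_keys` from the IP-Adapter pathway of the same layer, then render the matrix as a heatmap.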

Arbitrary Color Attribution Across Diverse Objects

ColorWave precisely applies user-specified colors to a wide range of subjects including inanimate objects, animals, plants, and human clothing. Our method successfully generates images where target objects accurately reflect the desired reference colors while maintaining high visual quality across diverse scenarios.

ColorWave results on diverse objects

ColorWave precisely applies user-specified colors to various target objects (bowl, bowling ball, plate, vase, pants, teddy bear, snooker ball, parrot, sofa, rose) while maintaining natural lighting, material properties, and contextual integration.

Method

ColorWave is a training-free approach that enables precise RGB color control in text-to-image diffusion models. Unlike previous methods, it does not require any additional optimization or model fine-tuning. The key insight behind ColorWave is the discovery of semantic attribute binding: an implicit connection between visual attributes in reference images and their corresponding linguistic descriptors in text prompts.

ColorWave method overview diagram

Overview of ColorWave. Our approach leverages semantic attribute binding between IP-Adapter and text cross-attention pathways to achieve precise color control. User-specified RGB values are encoded through IP-Adapter and selectively bound to object tokens in the text prompt, enabling training-free color attribution while preserving generative quality.

By systematically analyzing the cross-attention mechanisms within the IP-Adapter framework, we identified that the adapter inherently associates specific color values with their linguistic descriptors, enabling exact color specification without requiring additional training or fine-tuning. Our approach effectively "rewires" these connections to establish precise color attribution to target objects while maintaining the generative capabilities of the underlying diffusion model.
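The "rewiring" idea can be sketched as a modified IP-Adapter cross-attention step. The sketch below is conceptual and uses hypothetical names: it follows the standard IP-Adapter decomposition (text attention plus a scaled image-feature attention) and gates the image branch by the attention map of the target object token, so the reference color binds to that object only. The exact gating used by ColorWave may differ.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def rewired_ip_attention(q, k_txt, v_txt, k_img, v_img, obj_token, scale=1.0):
    """Conceptual sketch: IP-Adapter-style cross-attention whose image branch
    is gated by the object token's text-attention map (hypothetical rewiring).

    q:            (n_q, d)  latent queries
    k_txt, v_txt: (n_txt, d) text-token keys/values
    k_img, v_img: (n_img, d) reference-image keys/values (e.g. a solid RGB patch)
    obj_token:    index of the target object token in the prompt
    """
    d = q.shape[-1]
    a_txt = softmax(q @ k_txt.T / np.sqrt(d))   # (n_q, n_txt)
    a_img = softmax(q @ k_img.T / np.sqrt(d))   # (n_q, n_img)
    # Gate: inject reference-color features only where the object token attends.
    gate = a_txt[:, obj_token:obj_token + 1]    # (n_q, 1)
    return a_txt @ v_txt + scale * gate * (a_img @ v_img)
```

Because the gate is derived from the existing text attention, no weights are trained or modified; the pretrained pathways are only recombined at inference time.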

Comparison with Prior Work

Unlike previous methods such as ColorPeel, which require a separate optimization process for each individual color, ColorWave inherently accepts any arbitrary RGB triplet without modification. This is not an incremental improvement but a paradigm shift in how precise color control can be achieved in generative models.

Quantitative Evaluation

ColorWave achieves a notably lower ΔE error in CIE Lab color space than existing training-free methods, indicating perceptually more accurate colors. Our method also achieves a lower mean angular error in both sRGB and hue, reflecting higher accuracy in chromaticity and hue.
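The ΔE metric referenced above can be computed as follows. This is a minimal sketch assuming sRGB inputs in [0, 1], a D65 white point, and the simple CIE76 color difference (the Euclidean distance in Lab); the paper may use a different ΔE formula such as CIEDE2000.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert an sRGB triplet in [0, 1] to CIE Lab (D65 white point)."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma curve to get linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ (sRGB/D65 matrix).
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = M @ lin
    xyz /= np.array([0.95047, 1.0, 1.08883])  # normalize by the D65 white point
    # XYZ -> Lab nonlinearity.
    f = np.where(xyz > (6 / 29) ** 3, np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return np.array([L, a, b])

def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance between Lab coordinates."""
    return float(np.linalg.norm(srgb_to_lab(rgb1) - srgb_to_lab(rgb2)))
```

In evaluation, `rgb1` would be the user-specified target color and `rgb2` the mean color measured inside the generated object's region.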

Quantitative comparison table

Quantitative comparison with baselines. Lower is better for all metrics (↓). ColorWave outperforms all training-free methods and is competitive with training-based approaches despite requiring no optimization time.

Advanced Applications

Beyond single-color attribution, ColorWave extends to more sophisticated visual attribute control, such as applying complex color patterns and transferring textures. These examples show that semantic attribute binding goes well beyond simple color matching.

Advanced applications of ColorWave

Generalizability to complex color patterns and textures. ColorWave successfully handles intricate patterns (crystal, metal textures) and multi-object scenes, demonstrating the versatility of semantic attribute binding.

Extensive Results Gallery

We showcase comprehensive results across various colors and object categories. Each grid demonstrates ColorWave's ability to handle different color specifications while maintaining consistent quality and accurate color reproduction across diverse objects.

All results show objects: bowl, vase, plate, ball, person, teddy bear, parrot, sofa, and rose. Reference colors are depicted in the external frame of each grid, demonstrating ColorWave's robustness across the color spectrum.

BibTeX

@inproceedings{laria2026colorwave,
  title={Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models},
  author={Laria, H{\'e}ctor and Gomez-Villa, Alexandra and Qin, Jiang and Butt, Muhammad Atif and Raducanu, Bogdan and Vazquez-Corral, Javier and van de Weijer, Joost and Wang, Kai},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}