Science, Technology, Engineering and Mathematics.
Open Access

ENHANCED UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION USING CONTRASTIVE LEARNING AND HISTOGRAM OF ORIENTED GRADIENTS


Volume 1, Issue 1, pp. 51-59, 2024

DOI: https://doi.org/10.61784/mjet3010

Author(s)

WanChen Zhao, WenHan Wang, Cheng Zhang, XiaoLei Qu*

Affiliation(s)

College of Instrument Science and Optoelectronic Engineering, Beihang University, Beijing 100191, China.

Corresponding Author

XiaoLei Qu

ABSTRACT

Image-to-image translation is a vital area of computer vision that focuses on transforming images from one visual domain to another while preserving their core content and structure. However, the field faces two major challenges: first, the data from the two domains are often unpaired, making it difficult to train generative adversarial networks effectively; second, existing methods tend to produce artifacts or hallucinations during image generation, degrading image quality. To address these issues, this paper proposes an enhanced unsupervised image-to-image translation method based on the Contrastive Unpaired Translation (CUT) model, incorporating Histogram of Oriented Gradients (HOG) features. This approach preserves the semantic structure of images, even without semantic labels, by minimizing the loss between the HOG features of the input and generated images. The method was tested on translating synthetic game environments from the GTA5 dataset to realistic urban scenes in the Cityscapes dataset, demonstrating significant improvements in reducing hallucinations and enhancing image quality.
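The structural term described above can be illustrated with a minimal NumPy sketch: compute a gradient-orientation histogram for the input and the generated image, then take the L1 distance between the two descriptors. This is a simplification for illustration only, not the paper's implementation; the actual method presumably uses the full cell-based, block-normalized HOG descriptor of Dalal and Triggs, and the function names here are hypothetical.

```python
import numpy as np

def grad_orientation_hist(img, bins=9):
    """Simplified HOG-style descriptor: a single global histogram of
    unsigned gradient orientations, weighted by gradient magnitude.
    (Full HOG additionally divides the image into cells and applies
    block normalization.)"""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)     # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalize to a distribution

def hog_loss(x, y, bins=9):
    """L1 distance between the descriptors of the input image x and the
    generated image y; a loss of this shape could be added to the CUT
    objective to penalize structural drift."""
    return float(np.abs(grad_orientation_hist(x, bins)
                        - grad_orientation_hist(y, bins)).sum())
```

Because the descriptor encodes edge orientations rather than colors or textures, the loss is low when the generated image keeps the input's edge layout and grows when structures are hallucinated or lost, regardless of appearance changes introduced by the domain translation.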

KEYWORDS

Image-to-image translation; Photorealism; GANs

CITE THIS PAPER

WanChen Zhao, WenHan Wang, Cheng Zhang, XiaoLei Qu. Enhanced unsupervised image-to-image translation using contrastive learning and histogram of oriented gradients. Multidisciplinary Journal of Engineering and Technology. 2024, 1(1): 51-59. DOI: https://doi.org/10.61784/mjet3010.


All published work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright © 2017 - 2024 Science, Technology, Engineering and Mathematics.   All Rights Reserved.