Innovations in Text-Guided Visual Content Generation
WANG Hao is a final-year PhD candidate in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. He received his B.E. degree from Huazhong University of Science and Technology, China. His research focuses on developing AI-powered perception and generation algorithms for the multimodal domain. In particular, his recent work investigates translation between visual and text data, with the goal of generating controllable content efficiently and robustly. He has published first-author papers in top-tier computer vision and multimedia conferences and journals, including CVPR, ECCV, IEEE TPAMI, and IEEE TIP.
Text-guided visual content generation is a significant task in generative AI that focuses on translating semantic information from text into visual content. A key challenge in this domain is generating complex, high-quality visuals while maintaining control over the output. In this talk, we will introduce two innovative frameworks: StyleGAN-based inversion and online alignment. These frameworks aim to address this challenge by enabling high-fidelity visual generation and cross-modal semantic matching simultaneously. With our approach, visual content is generated directly from textual input at inference time, streamlining the process into a single step.