Publications
You can also find my articles on my Google Scholar profile.

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Tao, C.*, Su, S.*, Zhu, X.*, Zhang, C., Chen, Z., Liu, J., ... & Dai, J. CVPR 2025 / Paper / Model
HoVLE is a high-performance monolithic vision-language model. It uses an insightful holistic embedding module to integrate vision and language effectively, and it outperforms previous models.
Learning 1D Causal Visual Representation with De-focus Attention Networks
Tao, C.*, Zhu, X.*, Su, S.*, Lu, L., Tian, C., Luo, X., ... & Dai, J. NeurIPS 2024 / Paper / Code
De-focus Attention Networks address the “over-focus” issue in 1D causal vision models through several inspiring design ideas, enabling these models to match the performance of 2D models.