Abstract: The recent success of attention-driven deep models, with the vision transformer (ViT) among the most representative, has spurred a wave of advanced research to explore their ...