PyTorch implementation of Vision Transformer (ViT).
Vision Transformer (ViT) - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
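The "16x16 words" in the title refers to splitting an image into fixed-size patches that the transformer treats as tokens. As a rough illustration only, and not this repository's code, the sketch below computes how many patches a square image yields and how long each flattened patch is. The helper names `num_patches` and `patch_dim`, the 224x224 input size, and the 3-channel default are assumptions chosen for the example.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def patch_dim(patch_size: int, channels: int = 3) -> int:
    """Length of one flattened patch, before any linear projection."""
    return patch_size * patch_size * channels


# Assumed example: a 224x224 RGB image cut into 16x16 patches.
print(num_patches(224, 16))  # 196 tokens (14 per side)
print(patch_dim(16))         # 768 values per flattened patch
```

Under these assumptions, the image becomes a sequence of 196 patch tokens, each flattened from 768 raw pixel values.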