APPLICATION OF TRANSFORMER ARCHITECTURE TO THE SUPER-RESOLUTION PROBLEM
Keywords:
super-resolution, transformer architecture, convolutional neural network, computer vision

Abstract
Over the last 15 years, convolutional neural networks have been the standard approach to computer vision problems and have demonstrated high performance. However, the transformer architecture, which first showed strong results in natural language processing, is now widely applied to computer vision problems and delivers comparable or even better results. The authors consider the application of the transformer architecture to the super-resolution problem; the paper also contains a short review of previous approaches. Direct application of the original transformer architecture achieved performance comparable to modern convolutional neural networks. However, efficient application of transformers to computer vision faces challenges stemming from the differences between the visual and language domains. The first difference is scale: images contain visual elements of varying sizes, which complicates their processing with a transformer that, by analogy with token processing in NLP, operates on fixed-size image patches. The second difference is the volume of information: the computational complexity of self-attention is quadratic in the length of the input sequence, which becomes especially critical when processing high-resolution images.
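The quadratic-cost argument above can be made concrete with a small sketch. The code below is illustrative only (the patch size of 8 and embedding dimension of 64 are arbitrary assumptions, not values from the paper): it tokenizes an image into fixed-size patches, ViT-style, and counts the multiply-adds in one self-attention map, showing that a 4x larger image side yields 16x more tokens and roughly 256x more attention cost.

```python
import numpy as np

def patchify(image, patch):
    """Split an H x W x C image into non-overlapping patch tokens,
    mirroring ViT-style fixed-size tokenization."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    return (image
            .reshape(h // patch, patch, w // patch, patch, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(-1, patch * patch * c))

def attention_flops(num_tokens, dim):
    """Rough cost of one self-attention map: Q @ K^T plus attn @ V,
    each ~ num_tokens^2 * dim multiply-adds -> quadratic in tokens."""
    return 2 * num_tokens ** 2 * dim

n_small = patchify(np.zeros((64, 64, 3)), 8).shape[0]    # 8*8 = 64 tokens
n_large = patchify(np.zeros((256, 256, 3)), 8).shape[0]  # 32*32 = 1024 tokens

# 4x larger side -> 16x more tokens -> 256x more attention cost
ratio = attention_flops(n_large, 64) / attention_flops(n_small, 64)
print(n_small, n_large, ratio)  # 64 1024 256.0
```

This is why full global attention quickly becomes impractical for super-resolution, where inputs and especially outputs are high-resolution.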
This paper analyzes 12 works on the subject published since 2021 that propose various approaches to these problems. The following research directions can be highlighted: local attention with windows of different shapes, in particular sparse attention; channel self-attention and its combination with spatial attention; and augmenting the transformer architecture with convolutional blocks. These studies have considerably improved the quality of reconstructed images, but they are not exhaustive.
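The first of these directions, local (windowed) attention, can be sketched minimally as follows. This is a generic illustration, not any specific surveyed model (the window size of 64 and dimension of 64 are assumptions): tokens are partitioned into fixed-size windows and self-attention is computed only within each window, so cost grows linearly in sequence length instead of quadratically.

```python
import numpy as np

def window_attention(tokens, window):
    """Naive local self-attention: attend only within fixed-size windows.
    Per-window cost is window^2 * dim, so total cost is linear in n."""
    n, d = tokens.shape
    assert n % window == 0
    out = np.empty_like(tokens)
    for start in range(0, n, window):
        w = tokens[start:start + window]             # (window, d)
        scores = w @ w.T / np.sqrt(d)                # attention logits
        scores -= scores.max(axis=-1, keepdims=True) # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)     # row-wise softmax
        out[start:start + window] = attn @ w
    return out

x = np.random.default_rng(0).normal(size=(1024, 64))
y = window_attention(x, window=64)
print(y.shape)  # (1024, 64)
```

Models in this family additionally shift or reshape the windows between layers so that information can propagate across window boundaries; that machinery is omitted here for brevity.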