Compressing the attention operation is crucial for processing long inputs efficiently. Existing sparse attention methods (specifically, local attention methods) such as StreamingLLM adopt ...