MACHINE LEARNING BERKELEY

Aside
● There are ways to make transformers more efficient (architecture-wise)
● BUT recall: a major appeal of using transformers is that they scale well relative to compute
● Transformer architectures are supposed to be simple: self-attention is just huge matrix multiplications (see the sketch below)
  ○ huge matrix multiplications are good for parallelization
  ○ want to keep the architecture as simple as possible
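To make the "just huge matrix multiplications" point concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. This is an illustrative toy, not code from the slides: the function name `self_attention`, the projection matrices `Wq`/`Wk`/`Wv`, and the toy shapes are all hypothetical.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Hypothetical single-head scaled dot-product attention sketch.
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections.
    Q = X @ Wq                                # query projection: one big matmul
    K = X @ Wk                                # key projection: another matmul
    V = X @ Wv                                # value projection: another matmul
    d_head = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_head)        # attention logits: (seq_len, seq_len) matmul
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                        # weighted sum of values: final matmul

# Toy usage with made-up dimensions:
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 64))                # 128 tokens, d_model = 64
Wq, Wk, Wv = (rng.normal(size=(64, 32)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)           # shape (128, 32)
```

Note that aside from the elementwise softmax, every step is a dense matrix multiply, which is exactly the kind of operation accelerators parallelize well; this is the slide's argument for keeping the architecture simple.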