Global Attention
Add a special token to the original sequence
• Attends to every token → collects global information
• Attended by every token → every token learns the global information
special token = "the neighborhood chief (里長伯) among tokens"
No attention between non-special tokens
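The attention pattern described here can be written as a mask over query/key pairs. The following is a minimal sketch, not a reference implementation: the function name `global_attention_mask`, the `num_special` parameter, and the choice to place the special tokens at the front of the sequence are all assumptions made for illustration. It also assumes that a non-special token does not attend to itself, which is one reading of "no attention between non-special tokens".

```python
def global_attention_mask(seq_len, num_special=1):
    """Build an (n, n) boolean mask, where n = num_special + seq_len.

    mask[i][j] is True when query position i may attend to key position j.
    Hypothetical layout: the special tokens occupy positions 0..num_special-1.
    """
    n = num_special + seq_len
    return [
        # Allowed if the query is special (it attends to every token)
        # or the key is special (it is attended by every token).
        [i < num_special or j < num_special for j in range(n)]
        for i in range(n)
    ]


mask = global_attention_mask(seq_len=4, num_special=1)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# prints:
# 11111
# 10000
# 10000
# 10000
# 10000
```

With one special token and four ordinary tokens, only 9 of the 25 query/key pairs are allowed: the first row (the special token reading everyone) and the first column (everyone reading the special token). Ordinary tokens can exchange information only through the special token.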