Microsoft interview question

Explain the Transformer architecture and implement self-attention from scratch.
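
For the implementation part, one possible starting point is single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, written with plain Python lists and only the standard library. This is a minimal sketch and not a reference answer. The names self_attention, matmul, transpose and softmax, the weight arguments w_q, w_k, w_v, and the optional causal flag are all assumptions made for illustration. Batching, multiple heads, biases and the output projection are left out.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:        seq_len x d_model input, one row per token
    w_q, w_k: d_model x d_k projection matrices
    w_v:      d_model x d_v projection matrix
    causal:   if True, token i may only attend to positions j <= i
    Returns (output, weights): seq_len x d_v and seq_len x seq_len.
    """
    # Project the same input into queries, keys and values.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)

    # Pairwise similarity between tokens, scaled by sqrt(d_k).
    scale = math.sqrt(len(k[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]

    # Optional causal mask: future positions get -inf, so softmax gives them 0.
    if causal:
        scores = [
            [s if j <= i else float("-inf") for j, s in enumerate(row)]
            for i, row in enumerate(scores)
        ]

    # Each row of weights is a probability distribution over the sequence.
    weights = [softmax(row) for row in scores]

    # Output for each token is the weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy input: 3 tokens, d_model = 2, identity projections (assumed values).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens, identity, identity, identity)
```

In this sketch each row of weights sums to 1, and with causal=True the first token can only attend to itself. A multi-head version would run several such projections in parallel, concatenate the outputs, and apply a final linear projection.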