Mirelo AI interview question

How attention, diffusion models, and parallel neural network training work. The questions focused heavily on specific details, e.g., the exact formula for AdaIN (adaptive instance normalization).
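
For reference, the AdaIN formula as it is usually stated is AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y), where x is the content feature map, y is the style feature map, and mu and sigma are the per-channel mean and standard deviation taken over the spatial positions of a single sample. The sketch below is a minimal pure-Python illustration of that formula, not a reference implementation. The function names, the flat-list representation of a channel, the use of the population variance, and the placement of eps inside the square root are all assumptions made for the example.

```python
import math

def channel_stats(values, eps=1e-5):
    """Mean and standard deviation of one channel's spatial values.

    Assumption: population variance (divide by n), with eps added
    inside the square root for numerical stability.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var + eps)

def adain_channel(content, style, eps=1e-5):
    """AdaIN for one channel: sigma(y) * (x - mu(x)) / sigma(x) + mu(y)."""
    mu_x, sigma_x = channel_stats(content, eps)
    mu_y, sigma_y = channel_stats(style, eps)
    # Normalize the content values, then rescale and shift them
    # with the style channel's statistics.
    return [sigma_y * (x - mu_x) / sigma_x + mu_y for x in content]

def adain(content_channels, style_channels, eps=1e-5):
    """Apply AdaIN independently to each channel of a single sample."""
    return [
        adain_channel(c, s, eps)
        for c, s in zip(content_channels, style_channels)
    ]
```

After the transform, each output channel keeps the spatial pattern of the content channel but takes on (up to eps) the mean and standard deviation of the corresponding style channel. Unlike instance normalization, there are no learned scale and shift parameters here; both come from the style input.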