NVIDIA interview question

Efficient ways to parallelize matrix multiplication
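
One common starting point, offered here as a sketch and not as the expected interview answer, is to partition the output matrix by rows. Each row of C = A * B depends only on the matching row of A and on all of B, so threads can fill disjoint row bands without locks. The `Matrix` type and the `parallel_matmul` function below are hypothetical names chosen for illustration, and the code assumes dense row-major storage. The i-k-j loop order keeps the inner loop walking B and C contiguously, which tends to be friendlier to the cache than the textbook i-j-k order.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper: dense row-major matrix stored in a flat vector.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Computes C = A * B, giving each thread a contiguous band of rows of C.
// Threads write to disjoint rows, so no synchronization is needed beyond join().
Matrix parallel_matmul(const Matrix& A, const Matrix& B, unsigned num_threads) {
    if (A.cols != B.rows) {
        throw std::invalid_argument("inner dimensions do not match");
    }
    Matrix C(A.rows, B.cols);
    if (A.rows == 0 || B.cols == 0) return C;

    // Never start more threads than there are rows, and always at least one.
    const std::size_t threads =
        std::max<std::size_t>(1, std::min<std::size_t>(num_threads, A.rows));
    const std::size_t band = (A.rows + threads - 1) / threads;  // rows per thread

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t begin = t * band;
        const std::size_t end = std::min(A.rows, begin + band);
        if (begin >= end) break;  // no rows left for this thread
        workers.emplace_back([&A, &B, &C, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                for (std::size_t k = 0; k < A.cols; ++k) {
                    const double a = A.at(i, k);
                    // Inner loop walks one row of B and one row of C contiguously.
                    for (std::size_t j = 0; j < B.cols; ++j) {
                        C.at(i, j) += a * B.at(k, j);
                    }
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return C;
}
```

A call such as `parallel_matmul(A, B, std::thread::hardware_concurrency())` uses one thread per hardware thread; with GCC or Clang on Linux this may need the `-pthread` flag. On a GPU the same decomposition is usually refined further: each thread block computes one tile of C and stages the matching tiles of A and B in shared memory to cut global-memory traffic.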