IBM interview question

Clean a data set that has repeated words
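
The question does not say what form the data takes, so the sketch below rests on two assumed readings: the data set is a list of word entries in which later duplicates should be dropped, or it is free text in which a word is accidentally repeated back to back. The function names `dedupe_words` and `collapse_adjacent_repeats` are hypothetical, and treating words as equal regardless of case is also an assumption that an interviewer might want changed.

```python
import re


def dedupe_words(words):
    """Keep the first occurrence of each word, preserving input order.

    Comparison is case-insensitive (an assumption), so "Apple" and
    "apple" count as the same word.
    """
    seen = set()
    result = []
    for word in words:
        key = word.casefold()
        if key not in seen:
            seen.add(key)
            result.append(word)
    return result


def collapse_adjacent_repeats(text):
    """Collapse runs of the same word in free text, keeping the first copy."""
    # \1 refers back to the captured word; the group repeats for runs
    # longer than two.
    return re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(dedupe_words(["apple", "Banana", "apple", "banana", "cherry"]))
    print(collapse_adjacent_repeats("the the cat sat sat sat down"))
```

The set-based pass runs in linear time and keeps the original order, which is usually what an interviewer is looking for. If order does not matter and case is significant, `set(words)` alone would do. In an interview it is worth asking first which reading is intended and whether punctuation or whitespace should be normalized before comparing.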