Research engineer interview questions shared by candidates
Build a program to process data from an emitter. The data arrives in order, and each received record may take from 0.1 to 5 seconds to process. The processed data has to be passed to a stream, in order and in real time. For the sake of the example, the processing time is simulated by sleeping for a random duration between 0.1 and 5 seconds.
Build a queue-based system with multiple record processors that work in parallel. Make sure the processing is truly parallel, not just concurrent: in the real world the CPU will be doing work, not just sleeping.
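A minimal sketch of the queue-based idea, assuming Python and its `multiprocessing` module (separate processes, so CPU-bound work can run on several cores). The names `process_record`, `worker`, and `run_pool`, the `upper()` stand-in for real work, and the `(seq, record)` tagging are all illustrative assumptions, not part of the original answer.

```python
import multiprocessing as mp
import random
import time


def process_record(seq, record, min_delay=0.1, max_delay=5.0):
    """Stand-in for real CPU work: sleep a random time, then transform the record."""
    time.sleep(random.uniform(min_delay, max_delay))
    return seq, record.upper()


def worker(in_queue, out_queue, min_delay, max_delay):
    """Consume (seq, record) pairs until a None sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        seq, record = item
        out_queue.put(process_record(seq, record, min_delay, max_delay))


def run_pool(records, n_workers=4, min_delay=0.1, max_delay=5.0):
    """Fan records out to worker processes; results come back in completion order."""
    in_queue = mp.Queue(maxsize=2 * n_workers)  # bounded, so the emitter gets back-pressure
    out_queue = mp.Queue()
    workers = [
        mp.Process(target=worker, args=(in_queue, out_queue, min_delay, max_delay))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()

    count = 0
    for seq, record in enumerate(records):  # seq records arrival order
        in_queue.put((seq, record))
        count += 1
    for _ in workers:
        in_queue.put(None)  # one sentinel per worker

    # Drain the results before joining, so no worker blocks on a full pipe.
    results = [out_queue.get() for _ in range(count)]
    for w in workers:
        w.join()
    return results
```

Calling `run_pool(["a", "b", "c"])` from under an `if __name__ == "__main__":` guard returns `(seq, result)` pairs in whatever order the workers happen to finish, so the results are parallel but not yet ordered.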
As an addition to the answer above: parallelising the element processing without extra logic around it would cause the processed elements to be published downstream in a non-deterministic order. If we want to maintain both order and parallelism, one solution is to keep a (circular) atomic auto-incrementing integer `i`. Assign the next value of `i` to each element `e` as it arrives, before it is handed to a processor, so that `i` records arrival order (assigning it only after processing would record completion order instead). After processing `e`, put it into a map from `i` to `e`. Keep track of the latest `i` that has been published downstream; let's call it `latest`. Whenever an element is added to the map, check whether its `i` is the successor of `latest`. If so, you can publish that element downstream, along with all the elements in the map that follow it consecutively (removing them from the map and advancing `latest` as you go). With this approach, in some cases (e.g. when processing one element produces a lot of data), you should make sure the queue is bounded, so that you do not risk running out of memory while processing too many elements in parallel.
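The map-and-`latest` bookkeeping can be sketched roughly as follows, assuming each processed element reaches a single consumer tagged with its sequence number `i`. The class name `ReorderBuffer` and its `publish` callback are hypothetical, and the sketch uses an unbounded Python integer rather than a circular counter.

```python
class ReorderBuffer:
    """Holds out-of-order results and releases them in sequence order."""

    def __init__(self, publish, first_seq=0):
        self._publish = publish   # callback that sends one element downstream
        self._next = first_seq    # successor of `latest`: the next i we may publish
        self._pending = {}        # map from i to processed element e

    def add(self, seq, element):
        """Store a processed element, then flush every consecutive successor."""
        self._pending[seq] = element
        while self._next in self._pending:
            self._publish(self._pending.pop(self._next))
            self._next += 1

    def __len__(self):
        """Number of elements still waiting for an earlier one to finish."""
        return len(self._pending)


# Usage: results arrive in completion order but are published in arrival order.
published = []
buffer = ReorderBuffer(published.append)
for seq, element in [(2, "c"), (0, "a"), (1, "b")]:
    buffer.add(seq, element)
print(published)  # ['a', 'b', 'c']
```

This version is not thread-safe; it assumes one consumer drains the results queue and calls `add`. The `len(buffer)` value gives a rough measure of how much is being held back, which is where a bound would help if one slow element stalls many fast ones.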