I applied online. The process took 1 week. I interviewed at AppsFlyer
Interview
I had a very unpleasant experience interviewing at AppsFlyer. During the interview, I was repeatedly interrupted and wasn’t given the chance to fully explain my answers. They asked me to draw a system design of a project I’ve worked on, but I barely managed to sketch two components before they started firing off questions nonstop.
Despite this, I answered all their technical questions correctly (I later verified them) and even solved the coding problem perfectly. Still, they informed me by email that I hadn’t passed.
I made the effort to come to their office and spent an hour and a half in the interview, only to receive a rejection with no constructive feedback. Overall, it was one of the most unprofessional interview experiences I’ve had. I wouldn’t recommend interviewing there.
I applied through an employee referral. The process took 4 weeks. I interviewed at AppsFlyer in Feb 2023
Interview
I had four main interviews and a lot of small talk during my hiring process. First interview: an intro with the group manager, mainly talking about myself and my experience. Second interview: an intro with the team manager, followed by a technical interview with the team manager and one of the team members. I got 4 PySpark questions via Google Colab (similar to a Jupyter notebook) on the evening before the interview and had enough time to search for the correct answer in cases where I wasn't sure. During the interview, we talked about each question and its solution. The third interview was with HR and the last one was with the CIO. After that, I had a short (15-minute) meeting with the COO, which was not an interview, just small talk. They also asked for references from two previous managers and wanted me to come for a F2F meeting on-site (as all the interviews were via Zoom).
Interview questions [4]
Question 1
The Packages table represents packages that customers purchase. Each package has an ID, a start and end date (represented by a number), and a number of installs that the package includes. The Consumption table shows us how many installs each account used and when. When we get a user's consumption data, we need to check according to the date, which packages the user used. A user can only have one package at any given time. The report we need to calculate needs to show how many installs a user used from each of its packages, and how many installs remain in each package the user purchased.

Packages table
+----+-------------+-------------+---------------+
|pack|pack_end_date|pack_installs|pack_start_date|
+----+-------------+-------------+---------------+
|   1|       123460|           10|         123456|
|   2|       123470|            5|         123460|
|   3|       123475|           10|         123470|
+----+-------------+-------------+---------------+

Consumption table
+-------+------------+--------+
|account|install_date|installs|
+-------+------------+--------+
|     AB|      123459|      10|
|     AB|      123465|       5|
|     AB|      123466|       3|
+-------+------------+--------+
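One possible way to reason about this report, sketched in plain Python rather than PySpark so the logic is easy to follow. The question does not say whether the date boundaries are inclusive, so this sketch assumes the start date is inclusive and the end date is exclusive (package 1 ends at 123460, exactly where package 2 starts). It also assumes every package in the sample belongs to account AB, since the Packages table has no account column. The function name `package_report` is made up for illustration.

```python
from collections import defaultdict

# Sample data from the question.
# (pack, pack_start_date, pack_end_date, pack_installs)
packages = [
    (1, 123456, 123460, 10),
    (2, 123460, 123470, 5),
    (3, 123470, 123475, 10),
]
# (account, install_date, installs)
consumption = [
    ("AB", 123459, 10),
    ("AB", 123465, 5),
    ("AB", 123466, 3),
]


def package_report(packages, consumption):
    """Return {pack: (used, remaining)}.

    Assumption: a package covers start <= install_date < end.
    """
    used = defaultdict(int)
    for _account, install_date, installs in consumption:
        for pack, start, end, _quota in packages:
            if start <= install_date < end:
                used[pack] += installs
                break  # a user has only one package at any given time
    # Include packages with no consumption so their full quota shows as remaining.
    return {pack: (used[pack], quota - used[pack])
            for pack, _start, _end, quota in packages}


report = package_report(packages, consumption)
for pack, (used_installs, remaining) in report.items():
    print(pack, used_installs, remaining)
```

Under these assumptions, package 2 comes out with a negative remainder, because 8 installs were consumed against a quota of 5. In PySpark the same matching would typically be a join on the date range followed by a group-by on the package, but that translation is left out here.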
A developer on the team wrote an ETL that runs once a day as a Spark job. Every day it reads a CSV file that shows the total value of each customer's transactions of that day and writes them as a parquet file partitioned by date and customer id. Below you can see an example of the CSV file. Note that each customer has one entry representing the total transaction value it did on that day. However, sometimes the CSV file contains a correction for a sum reported in the past. For example - this file represents the transactions on 1/10. You can see that customer 1002 has 2 entries. One for 1/10 and one for 30/9. This means that the total sum of transactions the customer did on 1/10 is 70, but the total sum of transactions it did on 30/9 was 40 and this sum should replace the value already reported on 30/9.

current date file: 2020-10-01
date,customer,price
2020-10-01,1000,40
2020-10-01,1001,10
2020-09-30,1002,40
2020-10-01,1002,70
2020-10-01,1003,10
2020-09-29,1004,10
2020-10-01,1004,10

This function represents the ETL. It runs once a day with a string representing the current day. It reads the CSV file, does some transformations, and writes it. Please help us find the bug in the code above, and return the right results
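The ETL function itself is not included in this post, so the actual bug cannot be shown. What follows is only a plain-Python sketch of the behavior the question asks for: each row is stored under its own date, not the run date, so a correction replaces the value already reported for that past day. The dictionary `store` is a stand-in for the parquet partitions, and the previously reported values (35 and 5) are invented for the example.

```python
import csv
import io

# The sample file from the question (run date 2020-10-01).
daily_file = """date,customer,price
2020-10-01,1000,40
2020-10-01,1001,10
2020-09-30,1002,40
2020-10-01,1002,70
2020-10-01,1003,10
2020-09-29,1004,10
2020-10-01,1004,10
"""


def apply_daily_file(store, csv_text):
    """Write each row under (row date, customer), replacing any older value.

    `store` maps (date, customer) -> price and stands in for the
    date/customer partitions of the parquet output.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Key by the date inside the row, not by the day the job runs.
        store[(row["date"], row["customer"])] = int(row["price"])
    return store


# Hypothetical values reported on earlier runs.
store = {("2020-09-30", "1002"): 35, ("2020-09-29", "1004"): 5}
apply_daily_file(store, daily_file)
print(store[("2020-09-30", "1002")])  # corrected value for 30/9
```

In Spark, one way to get this "replace only the partitions present in today's file" behavior is to write with overwrite mode while `spark.sql.sources.partitionOverwriteMode` is set to `dynamic`. Whether that matches the fix the interviewers had in mind is a guess.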
A developer on the team was running the following line in a function for logging purposes, and the job crashed with an "out of memory" exception. The developer says that the cluster has many workers with a lot of memory and disk and still the job crashes. Can you help explain how this line makes the job crash with OOM even though the cluster is huge?

def someFunc():
    for row in df.collect():
        print(f'Customer {row["customer"]} => Paid {row["price"]}')
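The key point here is that `collect()` returns every row of the DataFrame to the driver as a Python list, so the driver's memory is the limit, no matter how large the workers are. A minimal sketch of a bounded alternative for logging is below: it calls `limit()` before `collect()` so only a fixed number of rows ever reach the driver. The helper `log_sample` and the class `StandInFrame` are made up for illustration. The stand-in only mimics the two DataFrame methods used, so the sketch runs without a Spark cluster.

```python
def log_sample(df, max_rows=20):
    """Log at most `max_rows` rows; limit() bounds what collect() brings back."""
    lines = []
    for row in df.limit(max_rows).collect():
        lines.append(f'Customer {row["customer"]} => Paid {row["price"]}')
    for line in lines:
        print(line)
    return lines


class StandInFrame:
    """Hypothetical stand-in exposing only limit() and collect()."""

    def __init__(self, rows):
        self._rows = list(rows)

    def limit(self, n):
        return StandInFrame(self._rows[:n])

    def collect(self):
        return list(self._rows)


rows = [
    {"customer": 1000, "price": 40},
    {"customer": 1001, "price": 10},
    {"customer": 1002, "price": 70},
]
sample = log_sample(StandInFrame(rows), max_rows=2)
```

With a real PySpark DataFrame the same function should work unchanged, since `DataFrame.limit`, `DataFrame.collect`, and key access on `Row` all exist. If every row really must be logged, doing it on the executors or streaming rows with `toLocalIterator()` avoids holding the whole dataset on the driver at once.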
Our developer had to join the results with a dimensional table of categories. The join works, but it's a bit slow. See if you can understand why and whether it can run faster.