Customer Analytics Analyst Interview Questions


Customer Analytics Analyst interview questions shared by candidates

Top Interview Questions

Meta
Data Scientist, Analytics was asked (5 March 2015):

Write a SQL query to compute a frequency table of a certain attribute involving two joins. What if you want to GROUP or ORDER BY some attribute? What changes would you need to make? How would you account for NULLs?

25 Answers

Having posts and comments in the same table looks weird. Here's my attempt (made easy with CASE) to exclude all the posts from the table and group/count the comments:

    SELECT parent_id, COUNT(*) AS comment_count
    FROM (
        SELECT *,
               CASE WHEN parent_id IS NULL THEN 'post' ELSE 'comment' END AS post_or_comment
        FROM Submissions
    ) a
    WHERE post_or_comment = 'comment'
    GROUP BY parent_id;

Here is the solution. You need a LEFT self-join that accounts for posts with zero comments:

    SELECT children, COUNT(submission_id) AS num_posts
    FROM (
        SELECT a.submission_id, COUNT(b.submission_id) AS children
        FROM Submissions a
        LEFT JOIN Submissions b ON a.submission_id = b.parent_id
        WHERE a.parent_id IS NULL
        GROUP BY a.submission_id
    ) a
    GROUP BY children;
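To sanity-check the LEFT self-join approach, here is a minimal sketch that runs it with Python's built-in sqlite3 module. It assumes a hypothetical Submissions(submission_id, parent_id) table where posts have a NULL parent_id and comments point at their post; the function name and sample rows are made up for illustration. An ORDER BY is added so the frequency table comes back sorted by comment count.

```python
import sqlite3

def comment_frequency_table(rows):
    """Return [(comments_per_post, number_of_posts), ...] for the given submissions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Submissions (submission_id INTEGER, parent_id INTEGER)")
    conn.executemany("INSERT INTO Submissions VALUES (?, ?)", rows)
    query = """
        SELECT children, COUNT(submission_id) AS num_posts
        FROM (
            SELECT a.submission_id, COUNT(b.submission_id) AS children
            FROM Submissions a
            LEFT JOIN Submissions b ON a.submission_id = b.parent_id
            WHERE a.parent_id IS NULL      -- keep only posts on the left side
            GROUP BY a.submission_id
        ) per_post
        GROUP BY children
        ORDER BY children
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: post 1 has two comments, post 2 has one, post 3 has none
sample = [(1, None), (2, None), (3, None), (4, 1), (5, 1), (6, 2)]
print(comment_frequency_table(sample))  # [(0, 1), (1, 1), (2, 1)]
```

The NULL handling comes from COUNT(b.submission_id): it skips the NULLs that the LEFT JOIN produces for posts without comments, so those posts land in the 0 bucket. COUNT(*) would count that NULL row and put them in the 1 bucket instead.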

I've tested all these on a mock data set and none of them work! Does anyone have the correct solution? I'm stuck on this one.

Meta

There are two mobile restroom stalls at a construction site where I work. There are also three situations that are equally likely to occur:
a. neither of them is occupied
b. only one of them is occupied
c. both are occupied
1. If I pick one at random, what is the probability that it is occupied?
2. If the first one I go to turns out to be occupied and I decide to try the other one, what is the probability that the second one is also occupied?

13 Answers

The answer to the first question is 1/3 + 1/3 * 1/2 = 1/2. The answer to the second question requires the formula for conditional probability. Let's say:
P(A) = probability that the second stall is occupied
P(B) = probability that the first stall is occupied
P(A | B) = P(A and B) / P(B)
P(B) = 1/2 (first question)
P(A and B) = 1/3
P(A | B) = (1/3) / (1/2) = 2/3

The above answer is wrong; the answers to the first and second questions are both 1/2.

"The answer to the first question is 1/3 + 1/3 * 1/2 = 1/2." Can you please explain how you derive this?

"There are also three situations that have an equal chance of occurrence" means each has probability 1/3 of occurring. The first 1/3 is the probability of the situation where both stalls are occupied. 1/3 * 1/2 is the probability that exactly one stall is occupied and I happen to pick the occupied one.
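The conditional-probability answer can be checked exactly with Python's fractions module. This sketch assumes, as the explanation above does, that when exactly one stall is occupied, the stall I walk up to first is the occupied one with probability 1/2; the function name is hypothetical.

```python
from fractions import Fraction

def restroom_probabilities():
    """Return (P(first stall occupied), P(second occupied | first occupied))."""
    third = Fraction(1, 3)
    # (scenario probability, P(first stall occupied), P(both stalls occupied))
    scenarios = [
        (third, Fraction(0), Fraction(0)),     # neither occupied
        (third, Fraction(1, 2), Fraction(0)),  # exactly one occupied
        (third, Fraction(1), Fraction(1)),     # both occupied
    ]
    p_first = sum(p * first for p, first, _ in scenarios)
    p_both = sum(p * both for p, _, both in scenarios)
    # Second stall occupied AND first occupied is the same event as "both occupied"
    return p_first, p_both / p_first

p_first, p_second_given_first = restroom_probabilities()
print(p_first, p_second_given_first)  # 1/2 2/3
```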

Meta

Let's say the population on Facebook clicks ads with a click-through rate of P. We select a sample of size N and examine the sample's conversion rate, denoted by P_hat. What is the minimum sample size N such that Probability(|P_hat - P| < delta) = 95%? In other words (this is my translation), find the minimum sample size N such that our sample estimate P_hat is within delta of the true click-through rate P with 95% confidence.

6 Answers

Interpret the question this way: we want to choose an N such that P_hat is an element of [P - delta, P + delta] with probability 95%. First, note that P_hat is the mean (the sum divided by N) of N Bernoulli trials with a common parameter P (by assumption) that we are trying to estimate, so by the central limit theorem P_hat is approximately normally distributed with mean equal to the true rate P and variance P(1 - P) / N.

Now, when does a normally distributed random variable fall within delta of its mean with 95% probability? The answer depends on how big delta is. Since P_hat is approximately normal, we know from our statistics classes that 95% of the time it will fall within about 2 standard deviations (more precisely 1.96) of its mean. In other words, we want [P - delta, P + delta] = [P - 2*SE(P_hat), P + 2*SE(P_hat)], that is, delta = 2*SE(P_hat).

So what is the SE ("standard error") of P_hat? It's just the square root of its variance, Sqrt(P * (1 - P) / N), which in practice we estimate with Sqrt(P_hat * (1 - P_hat) / N). But wait! We haven't run the experiment yet! How can we know what P_hat is? We can either (a) make an educated guess, or (b) take the "worst" possible case and use that to upper bound N. Let's go with option (b): P_hat * (1 - P_hat) is maximized when P_hat is 0.5, where the product is 0.25.

To put it all together: delta = 2 * Sqrt(0.25) / Sqrt(N) = 2 * 0.5 / Sqrt(N) = 1 / Sqrt(N), so N = (1 / delta)^2. When N is at least (1 / delta)^2, we can rest assured that P_hat will fall within the acceptable range about 95% of the time.
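The same derivation works for any critical value z and any prior guess for P, giving N = z^2 * P(1 - P) / delta^2. A minimal sketch (the function name and defaults are illustrative assumptions): z = 2 with the worst case P = 0.5 reproduces N = (1 / delta)^2, while the more precise z = 1.96 gives a slightly smaller N.

```python
import math

def min_sample_size(delta, z=1.96, p=0.5):
    """Smallest N with z * sqrt(p * (1 - p) / N) <= delta (normal approximation).

    p = 0.5 is the worst case because p * (1 - p) peaks at 0.25 there.
    """
    n = z * z * p * (1 - p) / (delta * delta)
    # Subtract a tiny epsilon so floating-point noise (e.g. 400.00000000000006)
    # does not push the ceiling up by one
    return math.ceil(n - 1e-9)

print(min_sample_size(0.05, z=2))  # 400, i.e. (1 / delta)^2
print(min_sample_size(0.05))       # 385 with z = 1.96
```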

Why is the variance P(1-P) / N? Isn't it NP(1-P), because it is the binomial distribution (sum of Bernoulli trials)?

Use Chebyshev's inequality
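Chebyshev's inequality gives a distribution-free, and therefore more conservative, alternative to the normal approximation: P(|P_hat - P| >= delta) <= Var(P_hat) / delta^2 = P(1 - P) / (N * delta^2). Requiring this bound to be at most 5% with the worst case P = 0.5 gives N >= 0.25 / (0.05 * delta^2) = 5 / delta^2. A minimal sketch with a hypothetical function name:

```python
import math

def chebyshev_sample_size(delta, confidence=0.95, p=0.5):
    """Smallest N for which Chebyshev guarantees P(|P_hat - P| < delta) >= confidence.

    Uses P(|P_hat - P| >= delta) <= p * (1 - p) / (N * delta^2) and sets the
    right-hand side equal to 1 - confidence.
    """
    alpha = 1 - confidence
    n = p * (1 - p) / (alpha * delta * delta)
    # Guard against floating-point noise just above an integer
    return math.ceil(n - 1e-9)

print(chebyshev_sample_size(0.05))  # 2000, versus 385 from the normal approximation
```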

Meta

Given two binary strings, write a function that adds them. You are not allowed to use any built-in string-to-int conversions or parsing tools. For example, given "100" and "111" you should return "1011". What is the time and space complexity of your algorithm?

6 Answers

In Python: def normalize_length(str1, str2): len1 = len(str1) len2 = len(str2) if (len1 … str1 = "111" str2 = "1011" print(normalize_length(str1, str2)) print(add_binary(str1, str2)) Obviously there are better ways to do this, but hey: my solution is O(N).

Ignore the answer above; I didn't realize that Glassdoor would cut off parts of my answer for being too long. Assuming you already wrote the normalizing code to make the input lengths the same by adding zeros:

    def normalize_length(input1, input2):
        # Left-pad the shorter string with zeros so both have the same length
        width = max(len(input1), len(input2))
        return input1.rjust(width, "0"), input2.rjust(width, "0")

    def add_binary(input1, input2):
        input1, input2 = normalize_length(input1, input2)
        result = ""
        carry = 0
        i = len(input1) - 1
        # Walk from the least significant digit; exactly one branch runs per position
        while i >= 0:
            if input2[i] == "1" and input1[i] == "1":
                result = ("1" if carry else "0") + result
                carry = 1
            elif input2[i] == "1" and input1[i] == "0":
                result = ("0" if carry else "1") + result
            elif input2[i] == "0" and input1[i] == "1":
                result = ("0" if carry else "1") + result
            else:  # both digits are "0"
                result = ("1" if carry else "0") + result
                carry = 0
            i -= 1
        if carry:
            result = "1" + result
        return result
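For the complexity part of the question, here is a sketch of a variant that avoids prepending to a string (each `"1" + result` copies the whole string, which can make the loop quadratic in the worst case). It collects digits in a list and reverses once, still without any int() parsing; the function name is illustrative.

```python
def add_binary(a, b):
    """Add two binary strings digit by digit, without int() or other parsing helpers.

    Runs in O(max(len(a), len(b))) time and uses O(max(len(a), len(b))) extra space.
    """
    i, j = len(a) - 1, len(b) - 1
    carry = 0
    digits = []  # least significant digit first
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += 1 if a[i] == "1" else 0
            i -= 1
        if j >= 0:
            total += 1 if b[j] == "1" else 0
            j -= 1
        digits.append("1" if total % 2 else "0")
        carry = total // 2
    # Reverse once at the end; empty inputs fall back to "0"
    return "".join(reversed(digits)) or "0"

print(add_binary("100", "111"))  # 1011
```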

    def calc_bin_sum(bin1, bin2):
        ## Convert a binary string to a base-10 number, weighting each digit by its
        ## position counted from the right (the rightmost digit is 2**0)
        def to_base_10(bits):
            total = 0
            for i, digit in enumerate(reversed(bits)):
                total += (1 if digit == "1" else 0) * (2 ** i)
            return total

        ## Add the two numbers
        corr_based_10 = to_base_10(bin1) + to_base_10(bin2)

        ## Change it back to binary: remainders come out least significant first,
        ## so reverse them before joining
        def trans(x):
            if x == 0:
                return "0"
            binary = []
            while x:
                binary.append(str(x % 2))
                x >>= 1
            return "".join(reversed(binary))

        return trans(corr_based_10)

Meta

We have a table called ad_accounts(account_id, date, status). Status can be active/closed/fraud.
A) What percent of active accounts are fraud?
B) How many accounts became fraud today for the first time?
C) What would be the financial impact of letting fraud accounts become active (how would you approach this question)?

6 Answers

A) What percent of active accounts are fraud?

    SELECT 100.0 * SUM(CASE WHEN status = 'fraud' THEN 1 ELSE 0 END) / COUNT(*) AS fraud_percentage
    FROM ad_accounts
    WHERE status <> 'closed';

B) How many accounts became fraud today for the first time?

    SELECT COUNT(*)
    FROM (
        SELECT account_id, MIN(date) AS first_fraud
        FROM ad_accounts
        WHERE status = 'fraud'
        GROUP BY account_id
        HAVING MIN(date) = CURRENT_DATE
    ) first_fraud_accounts;
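Both queries can be tried on SQLite through Python's sqlite3 module. This sketch assumes dates are stored as ISO 'YYYY-MM-DD' strings (so MIN and = compare correctly) and passes a fixed date in place of CURRENT_DATE to keep the result reproducible; the function name and sample rows are hypothetical.

```python
import sqlite3

def fraud_metrics(rows, today):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ad_accounts (account_id INTEGER, date TEXT, status TEXT)")
    conn.executemany("INSERT INTO ad_accounts VALUES (?, ?, ?)", rows)
    # A) percentage of non-closed rows whose status is 'fraud'
    (fraud_percentage,) = conn.execute(
        """SELECT 100.0 * SUM(CASE WHEN status = 'fraud' THEN 1 ELSE 0 END) / COUNT(*)
           FROM ad_accounts
           WHERE status <> 'closed'"""
    ).fetchone()
    # B) accounts whose earliest 'fraud' row falls on `today`
    (first_time_fraud,) = conn.execute(
        """SELECT COUNT(*) FROM (
               SELECT account_id
               FROM ad_accounts
               WHERE status = 'fraud'
               GROUP BY account_id
               HAVING MIN(date) = ?
           ) first_fraud_accounts""",
        (today,),
    ).fetchone()
    conn.close()
    return fraud_percentage, first_time_fraud

rows = [
    (1, "2019-04-19", "active"), (1, "2019-04-20", "fraud"),  # first fraud today
    (2, "2019-04-18", "fraud"), (2, "2019-04-20", "fraud"),   # already fraud earlier
    (3, "2019-04-20", "active"),
    (4, "2019-04-20", "closed"),
]
print(fraud_metrics(rows, "2019-04-20"))  # (60.0, 1)
```

Because ad_accounts has one row per account per date, A as written counts rows rather than distinct accounts; COUNT(DISTINCT account_id) variants would count accounts instead.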

A) What percent of active accounts are fraud?

    SELECT 1.0 * COUNT(DISTINCT t2.account_id) / COUNT(DISTINCT t1.account_id) AS perc_fraud
    FROM ad_accounts AS t1
    INNER JOIN ad_accounts AS t2
        ON t1.account_id = t2.account_id
        AND t2.status = 'fraud'
        AND t2.date > t1.date
    WHERE t1.status = 'active'

B) How many accounts became fraud today for the first time?

    SELECT COUNT(DISTINCT t1.account_id) AS fraud_today
    FROM ad_accounts AS t1
    INNER JOIN ad_accounts AS t2
        ON t1.account_id = t2.account_id
        AND t2.status = 'fraud'
        AND t2.date < t1.date
    WHERE t1.status = 'fraud'
        AND DATE_TRUNC('day', t1.date) = '2019-04-20'::timestamp

^ You need to left join
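A sketch of the LEFT JOIN fix suggested above, again on a hypothetical in-memory SQLite table with ISO date strings and a fixed date standing in for today. For A, the LEFT JOIN keeps active accounts that never became fraud in the denominator (with an INNER JOIN the ratio is always 1). For B, it turns the join into an anti-join: only fraud rows with no earlier fraud row for the same account survive the `t2.account_id IS NULL` filter. The function name and sample rows are illustrative.

```python
import sqlite3

# A) share of active accounts that later became fraud
PERC_FRAUD = """
SELECT 1.0 * COUNT(DISTINCT t2.account_id) / COUNT(DISTINCT t1.account_id)
FROM ad_accounts AS t1
LEFT JOIN ad_accounts AS t2
       ON t1.account_id = t2.account_id
      AND t2.status = 'fraud'
      AND t2.date > t1.date
WHERE t1.status = 'active'
"""

# B) accounts with a fraud row on the given day and no earlier fraud row
NEW_FRAUD_TODAY = """
SELECT COUNT(DISTINCT t1.account_id)
FROM ad_accounts AS t1
LEFT JOIN ad_accounts AS t2
       ON t1.account_id = t2.account_id
      AND t2.status = 'fraud'
      AND t2.date < t1.date
WHERE t1.status = 'fraud'
  AND t1.date = ?
  AND t2.account_id IS NULL
"""

def left_join_metrics(rows, today):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ad_accounts (account_id INTEGER, date TEXT, status TEXT)")
    conn.executemany("INSERT INTO ad_accounts VALUES (?, ?, ?)", rows)
    (perc_fraud,) = conn.execute(PERC_FRAUD).fetchone()
    (new_fraud,) = conn.execute(NEW_FRAUD_TODAY, (today,)).fetchone()
    conn.close()
    return perc_fraud, new_fraud

rows = [
    (1, "2019-04-19", "active"), (1, "2019-04-20", "fraud"),  # first fraud today
    (2, "2019-04-18", "fraud"), (2, "2019-04-20", "fraud"),   # already fraud earlier
    (3, "2019-04-20", "active"),                              # never fraud
]
print(left_join_metrics(rows, "2019-04-20"))  # (0.5, 1)
```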

Hyundai Capital America

What is logistic regression? How do you perform variable selection?

4 Answers

See answers for Capital One statistician questions

A Measurement of Variables

Logistic regression is a predictive analysis used to explain the relationship between one binary dependent variable and one or more independent variables.
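As a concrete illustration of that definition, here is a minimal sketch of a one-feature logistic regression, P(y = 1 | x) = sigmoid(w*x + b), fitted by batch gradient descent on the log-loss in pure Python. The function names, learning rate, epoch count and toy data are illustrative assumptions, and the sketch does not address variable selection.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. the linear score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

xs = [-3, -2, -1, 1, 2, 3]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(sigmoid(w * 2.5 + b) > 0.5, sigmoid(w * -2.5 + b) < 0.5)  # True True
```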

International Flavors & Fragrances

How would you develop a chromatography method for high-matrix samples?

4 Answers

trial and error

It depends on the type of matrix. If proteins are present, then an electrophoretic method will work well, but it's time-consuming. If other species are present, then column and fractional chromatography would be ideal.

GC-MS

Enova

Imagine a 1x1x1 cube, then imagine that you build a 10x10x10 cube out of these 1x1x1 cubes. How many cubes do you have to remove to get rid of the surface layer of cubes?

4 Answers

10*10*2 + 10*8*2 + 8*8*2 = 200 + 160 + 128 = 488

(10*10*10) - (8*8*8) = 1000 - 512 = 488

n^3 - x = (n-2)^3
x = n^3 - (n-2)^3
x = 6n^2 - 12n + 8
If n = 10, then x = 488 surface cubes to remove.
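A brute-force count is an easy way to double-check the closed form. This sketch (the function name is hypothetical) marks a unit cube as part of the surface if any of its coordinates lies on a face of the big cube. Note the formula n^3 - (n-2)^3 only makes sense for n >= 2.

```python
def surface_cubes(n):
    """Count the unit cubes on the surface of an n x n x n cube by brute force."""
    count = 0
    for x in range(n):
        for y in range(n):
            for z in range(n):
                # On the surface if any coordinate touches a face (0 or n - 1)
                if 0 in (x, y, z) or n - 1 in (x, y, z):
                    count += 1
    return count

print(surface_cubes(10))        # 488
print(6 * 10**2 - 12 * 10 + 8)  # 488, the closed form 6n^2 - 12n + 8
```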

Canopy Growth Corporation

How do you deal with a difficult situation/conflict?

4 Answers

First, make sure I have all the relevant information and a good understanding of the situation/conflict. Follow any company policies and procedures. If I'm not confident in my own knowledge or abilities, consult with people I trust. Using my best judgement and sense of fairness, form a plan for resolution. The plan itself or its implementation might involve meetings with those involved to gain further insight. Follow up to monitor progress towards resolution. If that doesn't work, try plan B.

... up in hopes of progress towards resolution. There is always plan “B”😀

How do I know if I got the job? Lol

Bank of America

How did you handle multicollinearity in a logistic model?

4 Answers

Quite a simple question: you can add or drop variables, obtain a larger dataset to estimate the regression model, transform the variables (e.g. log transformation), etc.

Try the following:
1) Remove highly correlated predictor variables from the regression model
2) Apply PCA (Principal Component Analysis) or LDA (Linear Discriminant Analysis) to the data attributes
3) Choose an appropriate sample size and ensure that the computed VIF value is below 2

1) PCA for a large number of features
2) RFE with VIF
3) If the dataset has fewer features, plot a heat map, find highly correlated features, and drop them
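Several answers mention VIF (variance inflation factor). A minimal sketch, assuming a model with exactly two predictors, where VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between them; with more predictors you would instead regress each predictor on all the others and use that regression's R^2. The function names and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    if 1.0 - r * r <= 1e-12:
        return math.inf  # (near-)perfect collinearity
    return 1.0 / (1.0 - r * r)

print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))  # 1.0, uncorrelated predictors
print(vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 6, 8, 11]) > 10)  # True, strongly collinear
```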

Viewing 1 - 10 of 8,671 Interview Questions
