In a previous article I explained the concept & formula behind advertising adstock. This article focuses on how to analytically derive the adstock rate. I’ll also compare and contrast this method to assumption based methods followed by some consulting companies in the Marketing Mix space.

First, I start with a little story. A marketing manager tries to create a model that can predict sales based on different advertising levels. He or she asks a statistician to test a particular adstock rate for the advertising variable in the model. The results comes back and it is either insignificant or unsatisfactory. The marketing manager then asks the statistician to try a different adstock level. This process repeats itself till there is something of value and makes sense.

This process however isn’t only inefficient, it’s is also wrong. The manager is “assuming” an adstock rate. What makes this even worse is some consulting companies in the Marketing Mix space repeat this mistake on a massive scale. They use expensive computing power on servers and clusters to test every potential adstock rate.

The best approach is to analytically derive the adstock rate based on the data. The field of operations research, or in particular, mathematical optimization, can lend a hand. We can set up an optimization program in Excel or any other programming language to derive the optimal adstock rate.

Here are simple instructions to follow:

- Start with the data

Get sales, i.e. actual, and the advertising variable(s) - Set up regression

Predicted Sales = α + β * adstock(Advertising)

α and β are linear regression parameters

The adstock function “adstock(Advertising)” is defined as A_{t}= X_{t}+ adstock rate * A_{t-1} - Setup & run optimization

Minimize the sum of squared errors for regression formula by changing the adstock rate

Mathematically speaking

Objective function: Minimize Σ (Actual – Predicted)^{2}

Subject to: 0 <= adstock rate < 1

The workbook here has a complete mathematical setup to calculate the optimum adstock rate as described above. I use the LINEST function to simplify the least square minimization and regression formula. You’ll need to have Solver Add-in installed with Excel. Hint: It is already installed; you just need to load it – Instructions on loading Solver Add-in.

This approach has two benefits: (1) Fast: There are no iteration is necessary and (2) Accurate: Adstock rate will be defined to the most significant decimal point. Assumption based rates as a comparison will often stop at 28% while actual adstock rate is 28.54783%. To credit consulting companies, though, an adstock rate of 28% is “good enough”.

It is important to note that the optimum adstock rate can lead to negative correlation with sales data and hence lead to the interpretation that advertising negatively affects sales. The statistician in this case has to either test different adstock rates or apply constrained optimization. Sometimes, however, advertising can lead to negative effects in the case of over saturated market.

Update: The R approach is to use the nls() function with the same set up as above.

```
# Define Adstock Function
adstock <- function(x, rate=0){
return(as.numeric(filter(x=x, filter=rate, method="recursive")))
}
# Run Optimization
modFit <- nls(sales~b0+b1*adstock(ad, rate),
start=c(b0=1, b1=1, rate=0))
summary(modFit)
```

Note that the R code above uses the Gauss-Newton algorithm which doesn’t take into account any constraints and hence the rate might not be kosherly within 0-1 bounds. If you want to set up constraints then you can put a penalty in the formula function or use the Port algorithm. However, nls documentation mentions that the Port algorithm appears unfinished so use with caution. Alternatively, you can use the nlsLM() function in the “minpack.lm” package. That function uses Levenberg-Marquardt and it does work with lower and upper bounds.

JBHey! I know it’s asking a lot, but would you pleeeease give a equivalent of this optimisation in SAS code? I spent the evening looking around the proc optmodel, but I didn’t reach any solution… any idea??

And THANKS A LOT for your articles: they helped me A LOT !

analyticsartistPost authorHi JB, I’m working on a version in R for the moment. SAS will take sometime.

Keith C.Hi Gabriel, Did you ever have any luck with the R code for deriving the Adstock Rate? Wondering if that will run faster, as the Excel version is bogging down my machine. And thanks for all of this!! It’s great stuff!!

analyticsartistPost authorI actually did. The short version is the nls function. I’ll update the blog soon but can’t commit to a time.

analyticsartistPost authorJB, I found this article and I think you’ll find it useful in building the an optimization in SAS.

Click to access 242-2010.pdf

Will you please let me know if you were able to get it done that way and post the code?

HarryHi Gabriel,

This is a fantastic blog and one of the best MMM resources I’ve found on the web!

Is it possible to use the nlsLM optimisation package with a negative exponential function? I’ve tried with the following R code (a variation of yours!), however the results keep setting all of my variables to the upper bounds set within the nlsM function: My code is as follows:

____________________________________________________________________

#FUNCTIONS####

# this function does the adstocks

adstock_fun <- function(x, lambda, init=0) {

stats::filter(x, lambda, method="recursive", init=init, sides=1)

}

# this function does diminishing returns

dim_ret_func <- function(x,b) {

1-exp(-x/b)}

#decay function

decay_func <- function(x,lambda,b){

dim_ret_func(adstock_fun(x,lambda),b*max(x))}

#EXAMPLE TEST#####

# Set seed for reproducibility

set.seed(4321)

# Create empty shell advertising for 104 weeks (2 years)

ad1 <- rep(0, 104)

ad2 <- rep(0, 104)

# Fill advertising with data

ad1[3:64] <- runif(64-3+1, 0, 200)

ad2[53:93] <- runif(93-53+1, 0, 200)

# Create sales variable

sales <- 1000 + 30 * decay_func(ad1, 0.1, 0.7) + 20 * decay_func(ad2, 0.1, 0.5)

# Create noise

sales <- sales + rnorm(104, mean = 0, sd = 3)

# Solve with nlsLM

library(minpack.lm)

solve solve

Nonlinear regression model

model: sales ~ c0 + c1 * decay_func(ad1, a1, d1) + c2 * decay_func(ad2, a2, d2)

data: parent.frame()

c0 c1 c2 a1 a2 d1 d2

998.41 20.49 15.46 0.70 0.70 0.90 0.90

residual sum-of-squares: 3119

Number of iterations to convergence: 15

Achieved convergence tolerance: 1.49e-08

All ad stocks and denominators are set to the maximums specified in the upper argument. Should this be happening?

Finally if I apply the output of the above algorithm to a linear regression model, should the coefficients be the same?

#LM CHECK#####

check <- lm( sales ~ c0 + decay_func(ad1, a1, d1) + decay_func(ad2, a2, d2))

summary(check)

Should c0, c1 and c2 from check be the same as in the solve?

Thanks,

Harry

JBI am right to suppose that in your formula, “Predicted” is to be replaced by Sales ?

analyticsartistPost authorIt is supposed to be called “Predicted Sales”. I’ve already modified.

JBRight, thanks 🙂

gauravHi,

Thanks for sharing this. I had a question regarding computing the optimum adstock rate.. I looked at your workbook for “Adstock Rate – Deriving with Analytical Method”. If we’re dealing with multiple advertising variables at the same time, does this optimization method still work??

Thanks,

Gaurav

AnalyticsArtistPost authorIt definitely does. On the site I had a simple example as a demonstration but you can take it a step further and add multiple variables.

gauravThanks.. So, say we have 2 simultaneous ad campaigns running for the product in question. Then using your excel sheet for optimizing the adstock rate, we’ll essentially need to consider 2 adstock rates for the 2 ads, and set up the regression equation such that we have sales on the left side and both advertising variables (in their adstock form) on the RHS and then use solver to optimize the 2 adstock rates together by minimizing the SSE of the regression?? Thanks again…

AnalyticsArtistPost authorThat is the basic idea. However, there are two things to consider:

AllenHi Gabriel

As you suggested, this nlsLM() is better than nls() because it works with lower and upper bounds of the parameters. So I tried installing the “minpack.lm” package, everything works well until I starting to put values in my lower and upper bounds, it produce unreasonable results and I really have no idea where went wrong. Could you show an example using these lower/upper please? Many thanks

Allen

AnalyticsArtistPost authorHi Allen,

Here is an example:

AllenThat’s fantastic! Thank you so much!!~~~~~~~ now it works really well. I booked marked your page, will check regularly for new blogs : )

AbhiIf I have log transformed variables on both sides (sales and adstock), can I still use this?

AnalyticsArtistPost authorYes, you can. Give it a try and see if your model performance improves.

Jimena ZumaetaThank you for this entry! It was really helpful. However, I am curious as to what happens with the b1 and b2 estimated.

I thought that when I re-fitted my model with my optimal r1 and r2, I would get the same betas, but that is not the case.

I am fitting several variables into a model, among them there are 2 advertising ones (TV and Radio GRP). I used this model to optimise the alphas with nlsLM.

Therefore, I am wondering if I should just ignore the betas that nlsLM found.

Thank you!

AnalyticsArtistPost authorBeta will change with every data transformation. You should look at the combined effect of B*f(x) over the modeling dataset.

philoinmeHello there Analytics Artist,

This blog is really useful. And it is an equivalent to going through many research papers and trying out stuff which works out.

Question about Adstock optimization:

I understood your way but if we talk about the ideal way, would it be to fit all the variables along with adstock variables of the respective channels and then use the nls() package to find the optimal parameters?

My question is like a follow up question to what Allen posted.

Akshay

AnalyticsArtistPost authorHi Akshay, Yes, I do that. There are multiple ways. I first try each variable independently. I can then build multiple models with various variables. I’ll eventually get a consistent adstock to use.

philoinmeBut as you consistently add on new variables, what is the use of building models with only one variable?

The parameters and respective coefficients change anyways.

Akshay

Sray AgarwalHi, I am exploring MMM and have applied all the ideas you suggested, however my most of the adstocks are insignificant. Can we connect over email or skype and explore the dataset….

AnalyticsArtistPost authorI’m happy to help you. Please use the contact page to contact me and we can take this offline. However I can also answer questions on the blog comments so other people can benefit.

JeffThis is a great blog- please keep adding content!

I am curious about the use of NLS to optimize the adstock parameters. Isn’t this a recipe for overfitting? It seems like using a grid search (or similar) with cross validation would be better in order to generalize – which I think is what you want in a MMM?

AnalyticsArtistPost authorI agree with you that cross validation is better. However, a grid search wouldn’t work. Here is why:

Now, how to address cross validation? One quick way is to subset the data into two sets according to time. Run the optimization on the first set and validate on the second. I’ll address more complex validation techniques in a different post.

JeffGood stuff.

There are cross validation techniques for time series, maybe something along those lines. Ala Rob Hyndman posts on the topic.

RE grid search, not sure if I agree. I mean a lot of meta parameters in machine learning are over the real numbers. Think of learning rates for NNs or kernel parameters for SVMs.

Short of brute strength, something like randomized search could be useful.

http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.RandomizedSearchCV.html#sklearn.grid_search.RandomizedSearchCV

My 2 cents, It just struck me that it might be better to optimize for these parameters within the context of the whole model – with control variables, trend etc. include.

Kamesh CasukhelaPardon my ignorance but what is

xin the model?_{t}AnalyticsArtistPost authorxis the advertising level for week_{t}t.AngelaHi, I tried the nls method you posted in the article and get a rate of -0.921620, since you said we need to control it between 0 and 1, so I used the nlsLM function in minpack.lm package and set the boundary, but the rate is 1.0 when calculate it with this way. What does it mean? Many thanks.

AnalyticsArtistPost authorI did noticed that this happens from time to time. Try limiting your date ranges, i.e. try your model on different chunks of the data.

AlexHi Gabriel, when you estimate the adstock rate you use the simple linear regression. But in the blogs later you also indicated the relationship could be non-linear. So do you think it’s more accurate to use some non-linear regression to estimate the adstock rate? Like log-log function (log_sales = a+b*log(Adstock)).

Thanks

Alex

AnalyticsArtistPost authorI’ve used this approach before. I wouldn’t call it more accurate though. It’s just another model formulation. Nielsen, the marketing company, popularized this approach. It’s called the ScanPro model. It would be great if you can point me to how this model is interpreted though. Like, what is the effect on sales in a log-log model?

Tamas GantnerHi Gabriel

I’ve got a question regarding how you propose to optimise the response curve/decay parameters. It is clear that you would be using a package such as nls, nls2 or nlsLM but what sort of y variable do you use in the formula as response? We are trying to optimise the parameters so we can find the contribution of a variable x but we can’t find the contribution until we’ve optimised it… This seems to be a vicious circle to me. Or are you proposing just to use the overall response in the model e.g. y from y = x0 + B1*x1+B2*x2?

Hope my question makes sense…

Cheers,

Tamas

AnalyticsArtistPost authorLet’s take it step by step. The optimization is to minimize the sum of squared residuals between ‘y’ and ‘y-hat’. The optimization parameters are the variables’ coefficients and the adstock rates. So, basically, you are trying to get a ‘y-hat’ that is as close as possible to ‘y’.

y-hat = x0 + B1*adstock(x1, r1) + B2*adstock(x2, r2)

The contribution for x1 would be B1*adstock(x1, r1)

Hope that clears it.

Tamas GantnerHi Gabriel,

Thanks very much for your response.

So if I understand correctly, once we have y_hat, there is no need to run multiple regression as we already have worked out the optimal parameters that minimise the squared residuals.

I’ve tried doing this but the more independent variables I have, I find it harder and harder to make the algorithm work. So now I’m thinking I should maybe break this down into several steps and feed in 1-3 variables or so at a time, work out the optimal parameters for these and then add them in as constants to the equation and move on to the next 1-3 independent variables. But this approach then might necessitate that you revisit the parameters of those independent variables that you had already worked out as those might not be optimal anymore once new independent variables are added in to the equation.

Do you have any other suggestions that might work with a larger number of independent variables (i.e. >30)?

Cheers,

Tamas

Hindol GangulyHello Gabriel,

i am a follower of your blog and new in retail analytics. i have gone through your posts

regarding marketing mix modelling & adstock. they are fantastic posts and in R using the nlsLM function i am able in finding the optimal adstock rate. i have a very basic question which may have a trivial answer, but i am curious to know how with the sales figures, ad1 & ad2 variable figures are related? i mean for sales figure 37 i get the ad1 figure 6, or for sales 110 we have ad1 figure 20, what do the numerical figures actually mean? are the figures represent monetary contributions?

please give me a brief overview. thanks for your time.

RainThank you very much for posting these wonderful articles and I love them!

This post gives an example on how to find the optimum adstock rate given sales and advertising variables. If the advertisement has been done through multiple channel (ie. Online, Mobile, TV), that means the sales number has been impacted by all 3 channels. Can I still estimate the adstock rate for each channel individually with the sales number and the marketing expenses in each channel?

Another question is, what if the results shows that the adstock rate is not significant. What does that imply?

Sray AgarwalI am trying to find lag effect using negative exponential decay in R. My questions are:

How to find out the optimal Decay Rate, that is the rate of decay? Will nlsLM function in R be helpful (from package minpack.lm), and if yes how? also, How to code this in R?

I would appreciate an answer with example code.

Ayush BiyaniThanks for the good work. We all benefit from this. Here is a question. what scenarios would you think can I get an adstock of 1. yes, perfect 1. I am myself surprised. This literally means that my ad only builds up . right ? what should we do ? any suggestions?

AnalyticsArtistPost authorYes it builds up but decays too.

FJ RodGreat post and blog.

Does this tool have any restrictions when used to derive adstock rates for categories with a long purchase cycle and threshold effects such as inertia that are not captured by adstock? Say vehicles.

JRReally interesting post thank you. But this is time series data, and you should really have something in the model to specify the non-independence of data points (i.e. points closer together in time are likely to be similar) – otherwise any standard errors / p-values that you get from your model are going to be incorrect. Any thoughts on how to do this?

AnalyticsArtistPost authorYou can include y_(t-1) in each model. Thanks for pointing that out, though. Have you experimented with other forms?