Adstock Rate – Deriving with Analytical Methods

In a previous article I explained the concept and formula behind advertising adstock.  This article focuses on how to analytically derive the adstock rate.  I’ll also compare and contrast this method with the assumption-based methods followed by some consulting companies in the Marketing Mix space.

First, a little story.  A marketing manager wants a model that can predict sales at different advertising levels.  He or she asks a statistician to test a particular adstock rate for the advertising variable in the model.  The results come back, and they are either insignificant or unsatisfactory.  The marketing manager then asks the statistician to try a different adstock rate.  This process repeats until something of value emerges and makes sense.

This process, however, isn’t just inefficient; it’s also wrong.  The manager is “assuming” an adstock rate.  What makes this even worse is that some consulting companies in the Marketing Mix space repeat the same mistake on a massive scale, using expensive computing power on servers and clusters to test every potential adstock rate.

The best approach is to analytically derive the adstock rate from the data. The field of operations research, in particular mathematical optimization, can lend a hand.  We can set up an optimization program in Excel or any other programming language to derive the optimal adstock rate.

Here are simple instructions to follow:

  1. Start with the data
    Get sales (i.e. actuals) and the advertising variable(s)
  2. Set up the regression
    Predicted Sales = α + β * adstock(Advertising)
    α and β are linear regression parameters
    The adstock function “adstock(Advertising)” is defined as A_t = X_t + adstock rate * A_(t-1)
  3. Set up & run the optimization
    Minimize the sum of squared errors of the regression by changing the adstock rate
    Mathematically speaking
    Objective function: Minimize Σ (Actual – Predicted)²
    Subject to: 0 <= adstock rate < 1
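The three steps above can be sketched in R as well: for a fixed adstock rate the model is plain ordinary least squares, so we can profile the rate, fitting OLS at each candidate rate and letting `optimize()` search the interval for the minimum SSE. The toy data below are hypothetical, purely for illustration:

```r
# Profile the adstock rate: for a fixed rate the model is OLS,
# so fit by lm() and search the rate on [0, 1) for minimum SSE.
adstock <- function(x, rate = 0) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

sse <- function(rate, sales, ad) {
  sum(residuals(lm(sales ~ adstock(ad, rate)))^2)
}

set.seed(1)
ad    <- c(rep(100, 10), rep(0, 10))                      # 10 weeks on, 10 off
sales <- 500 + 2 * adstock(ad, 0.5) + rnorm(20, sd = 5)   # true rate = 0.5

best <- optimize(sse, interval = c(0, 1), sales = sales, ad = ad)
best$minimum   # estimated adstock rate, close to 0.5
```

This mirrors what Solver does in the workbook: the inner regression plays the role of LINEST, and the outer one-dimensional search plays the role of Solver.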

The workbook here has a complete mathematical setup to calculate the optimum adstock rate as described above.  I use the LINEST function to simplify the least-squares minimization and regression formula.  You’ll need to have the Solver Add-in installed in Excel.  Hint: it is already installed; you just need to load it – Instructions on loading Solver Add-in.

This approach has two benefits: (1) Fast: no iterations are necessary, and (2) Accurate: the adstock rate is derived to full decimal precision.  Assumption-based rates, by comparison, will often stop at 28% while the actual adstock rate is 28.54783%.  To give consulting companies credit, though, an adstock rate of 28% is “good enough”.

It is important to note that the optimum adstock rate can produce a negative correlation with the sales data, and hence lead to the interpretation that advertising negatively affects sales.  The statistician in this case has to either test different adstock rates or apply constrained optimization.  Sometimes, however, advertising can indeed have a negative effect, as in the case of an oversaturated market.

Update: The R approach is to use the nls() function with the same setup as above.

# Define Adstock Function
adstock <- function(x, rate=0){
  return(as.numeric(filter(x=x, filter=rate, method="recursive")))
}

# Run Optimization
modFit <- nls(sales ~ b0 + b1 * adstock(ad, rate), 
              start = c(b0=1, b1=1, rate=0))

Note that the R code above uses the Gauss-Newton algorithm, which doesn’t take constraints into account, so the fitted rate might not fall within the 0–1 bounds.  If you want to impose constraints, you can put a penalty in the formula function or use the port algorithm.  However, the nls documentation mentions that the port algorithm appears unfinished, so use it with caution.  Alternatively, you can use the nlsLM() function in the “minpack.lm” package.  That function uses Levenberg-Marquardt and it does work with lower and upper bounds.
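For reference, here is a minimal sketch of the bounded fit using nls() with the port algorithm; the data are a hypothetical toy series, and adstock() is the same recursive filter as before:

```r
# Sketch: bound the rate to [0, 1] via nls() with algorithm = "port".
adstock <- function(x, rate = 0) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

set.seed(2)
ad    <- c(runif(52, 0, 200), rep(0, 52))                  # hypothetical spend
sales <- 1000 + 3 * adstock(ad, 0.4) + rnorm(104, sd = 10) # true rate = 0.4

modFit <- nls(sales ~ b0 + b1 * adstock(ad, rate),
              start     = c(b0 = 1,    b1 = 1,    rate = 0.1),
              algorithm = "port",
              lower     = c(b0 = -Inf, b1 = -Inf, rate = 0),
              upper     = c(b0 =  Inf, b1 =  Inf, rate = 1))
coef(modFit)["rate"]   # constrained to lie within [0, 1]
```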


45 thoughts on “Adstock Rate – Deriving with Analytical Methods”

  1. JB

    Hey! I know it’s asking a lot, but would you pleeeease give an equivalent of this optimisation in SAS code? I spent the evening looking around proc optmodel, but I didn’t reach any solution… any idea??
    And THANKS A LOT for your articles: they helped me A LOT !

      1. Keith C.

        Hi Gabriel, Did you ever have any luck with the R code for deriving the Adstock Rate? Wondering if that will run faster, as the Excel version is bogging down my machine. And thanks for all of this!! It’s great stuff!!

    1. Harry

      Hi Gabriel,

      This is a fantastic blog and one of the best MMM resources I’ve found on the web!

      Is it possible to use the nlsLM optimisation package with a negative exponential function? I’ve tried with the following R code (a variation of yours!), however the results keep setting all of my variables to the upper bounds set within the nlsLM function. My code is as follows:


      # this function does the adstocks
      adstock_fun <- function(x, lambda, init=0) {
        stats::filter(x, lambda, method="recursive", init=init, sides=1)
      }

      # this function does diminishing returns
      dim_ret_func <- function(x,b) {

      #decay function
      decay_func <- function(x,lambda,b){

      #EXAMPLE TEST#####

      # Set seed for reproducibility

      # Create empty shell advertising for 104 weeks (2 years)
      ad1 <- rep(0, 104)
      ad2 <- rep(0, 104)

      # Fill advertising with data
      ad1[3:64] <- runif(64-3+1, 0, 200)
      ad2[53:93] <- runif(93-53+1, 0, 200)

      # Create sales variable
      sales <- 1000 + 30 * decay_func(ad1, 0.1, 0.7) + 20 * decay_func(ad2, 0.1, 0.5)

      # Create noise
      sales <- sales + rnorm(104, mean = 0, sd = 3)

      # Solve with nlsLM
      Nonlinear regression model
      model: sales ~ c0 + c1 * decay_func(ad1, a1, d1) + c2 * decay_func(ad2, a2, d2)
      data: parent.frame()
      c0 c1 c2 a1 a2 d1 d2
      998.41 20.49 15.46 0.70 0.70 0.90 0.90
      residual sum-of-squares: 3119

      Number of iterations to convergence: 15
      Achieved convergence tolerance: 1.49e-08

      All ad stocks and denominators are set to the maximums specified in the upper argument. Should this be happening?

      Finally if I apply the output of the above algorithm to a linear regression model, should the coefficients be the same?

      #LM CHECK#####

      check <- lm( sales ~ c0 + decay_func(ad1, a1, d1) + decay_func(ad2, a2, d2))

      Should c0, c1 and c2 from check be the same as in the solve?



  2. gaurav

    Thanks for sharing this. I had a question regarding computing the optimum adstock rate.. I looked at your workbook for “Adstock Rate – Deriving with Analytical Method”. If we’re dealing with multiple advertising variables at the same time, does this optimization method still work??


      1. gaurav

        Thanks.. So, say we have 2 simultaneous ad campaigns running for the product in question. Then using your excel sheet for optimizing the adstock rate, we’ll essentially need to consider 2 adstock rates for the 2 ads, and set up the regression equation such that we have sales on the left side and both advertising variables (in their adstock form) on the RHS and then use solver to optimize the 2 adstock rates together by minimizing the SSE of the regression?? Thanks again…

      2. AnalyticsArtist Post author

        That is the basic idea. However, there are two things to consider:

        1. Each variable adstock’s rate should be constrained between 0-1.
        2. You need to incorporate trend or seasonality. The trend is particularly important because the adstock rates tend to be inflated in models where there is an obvious trend but no such variable was selected for modeling. Look at for a reference.
  3. Allen

    Hi Gabriel
    As you suggested, nlsLM() is better than nls() because it works with lower and upper bounds on the parameters. So I tried installing the “minpack.lm” package; everything works well until I start putting values in my lower and upper bounds, when it produces unreasonable results, and I really have no idea what went wrong. Could you show an example using these lower/upper bounds please? Many thanks


    1. AnalyticsArtist Post author

      Hi Allen,
      Here is an example:

      # Set seed for reproducibility
      # Create empty shell advertising for 104 weeks (2 years)
      ad1 <- rep(0, 104)
      ad2 <- rep(0, 104)
      # Fill advertising with data
      ad1[3:64]  <- runif(64-3+1, 0, 200)
      ad2[53:93] <- runif(93-53+1, 0, 200)
      # Define adstock function
      adstock <- function(x, rate=0){
          return(as.numeric(filter(x=x, filter=rate, method="recursive")))
      }
      # Create sales variable
      sales <- 1000 + 30 * adstock(ad1, .3) + 20 * adstock(ad2, .7)
      # Create noise
      sales <- sales + rnorm(104, mean = 0, sd = 3)
      # Solve with nlsLM
      nlsLM(sales~b0 + b1 * adstock(ad1, r1) + b2 * adstock(ad2, r2), 
            algorithm = "LM",
            start     = c(b0=   1, b1=   1, b2=   1, r1=0, r2=0),
            lower     = c(b0=-Inf, b1=-Inf, b2=-Inf, r1=0, r2=0),
            upper     = c(b0= Inf, b1= Inf, b2= Inf, r1=1, r2=1))
      ## Nonlinear regression model
      ##   model: sales ~ b0 + b1 * adstock(ad1, r1) + b2 * adstock(ad2, r2)
      ##    data: parent.frame()
      ##      b0      b1      b2      r1      r2
      ## 1000.45   29.99   20.00    0.30    0.70
      ##  residual sum-of-squares: 943
      ## Number of iterations to convergence: 13 
      ## Achieved convergence tolerance: 1.49e-08
      1. Allen

        That’s fantastic! Thank you so much!!~~~~~~~ now it works really well. I booked marked your page, will check regularly for new blogs : )

      2. Jimena Zumaeta

        Thank you for this entry! It was really helpful. However, I am curious as to what happens with the b1 and b2 estimated.
        I thought that when I re-fitted my model with my optimal r1 and r2, I would get the same betas, but that is not the case.
        I am fitting several variables into a model, among them there are 2 advertising ones (TV and Radio GRP). I used this model to optimise the alphas with nlsLM.

        Therefore, I am wondering if I should just ignore the betas that nlsLM found.

        Thank you!

      3. philoinme

        Hello there Analytics Artist,

        This blog is really useful. And it is an equivalent to going through many research papers and trying out stuff which works out.

        Question about Adstock optimization:

        I understood your way but if we talk about the ideal way, would it be to fit all the variables along with adstock variables of the respective channels and then use the nls() package to find the optimal parameters?

        My question is like a follow up question to what Allen posted.


      4. AnalyticsArtist Post author

        Hi Akshay, Yes, I do that. There are multiple ways. I first try each variable independently. I can then build multiple models with various variables. I’ll eventually get a consistent adstock to use.

      5. philoinme

        But as you consistently add on new variables, what is the use of building models with only one variable?

        The parameters and respective coefficients change anyways.


  4. Sray Agarwal

    Hi, I am exploring MMM and have applied all the ideas you suggested, however my most of the adstocks are insignificant. Can we connect over email or skype and explore the dataset….

    1. AnalyticsArtist Post author

      I’m happy to help you. Please use the contact page to contact me and we can take this offline. However I can also answer questions on the blog comments so other people can benefit.

  5. Jeff

    This is a great blog- please keep adding content!

    I am curious about the use of NLS to optimize the adstock parameters. Isn’t this a recipe for overfitting? It seems like using a grid search (or similar) with cross validation would be better in order to generalize – which I think is what you want in a MMM?

    1. AnalyticsArtist Post author

      I agree with you that cross validation is better. However, a grid search wouldn’t work. Here is why:

      1. Grid search will only look for adstock between 0-100% by increments of 1%, e.g. the adstock search space for each variable is [0%, 1%, 2%, 3%, …, 100%]. So if the adstock is actually 33.33%, grid search will only find the closest grid value.
      2. Grid search is an exhaustive search. So if you have 10 variables with 100 different values of adstock for each of these variables to search for you would have to do 100^10 = 100,000,000,000,000,000,000 searches. This is impractical with current computing power.
      3. Most solutions I encountered don’t use 1% increments; they use 5% increments. Hence, the search space for each variable is [0%, 5%, 10%, 15%, …, 95%]. 10 variables grid search means 20^10 = 10,240,000,000,000 searches. Again, still impractical.
      4. Can you imagine a solution that uses grid search to find coefficients? The point is coefficients aren’t very different from adstock or other transformation but we don’t use grid search to find them.

      Now, how to address cross validation? One quick way is to subset the data into two sets according to time. Run the optimization on the first set and validate on the second. I’ll address more complex validation techniques in a different post.
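That quick time-based split can be sketched as follows; the toy data and the roughly 70/30 split are hypothetical choices for illustration:

```r
# Time-based validation: derive the adstock rate on the first ~70% of weeks,
# then score the holdout. The adstock is computed on the full series so that
# carryover from the training period flows correctly into the holdout.
adstock <- function(x, rate = 0) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

set.seed(3)
ad    <- runif(104, 0, 200)
sales <- 1000 + 2 * adstock(ad, 0.3) + rnorm(104, sd = 10)  # true rate = 0.3

train <- 1:73
test  <- 74:104
fit  <- nls(sales[train] ~ b0 + b1 * adstock(ad[train], rate),
            start = c(b0 = 1, b1 = 1, rate = 0.1))
p    <- coef(fit)
pred <- p["b0"] + p["b1"] * adstock(ad, p["rate"])
holdout_rmse <- sqrt(mean((sales[test] - pred[test])^2))
```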

      1. Jeff

        Good stuff.

        There are cross validation techniques for time series, maybe something along those lines. Ala Rob Hyndman posts on the topic.

        RE grid search, not sure if I agree. I mean a lot of meta parameters in machine learning are over the real numbers. Think of learning rates for NNs or kernel parameters for SVMs.

        Short of brute strength, something like randomized search could be useful.

        My 2 cents: it just struck me that it might be better to optimize these parameters within the context of the whole model – with control variables, trend, etc. included.

  6. Angela

    Hi, I tried the nls method you posted in the article and got a rate of -0.921620. Since you said we need to keep it between 0 and 1, I used the nlsLM function in the minpack.lm package and set the boundary, but the rate comes out as 1.0 when I calculate it this way. What does that mean? Many thanks.

  7. Alex

    Hi Gabriel, when you estimate the adstock rate you use the simple linear regression. But in the blogs later you also indicated the relationship could be non-linear. So do you think it’s more accurate to use some non-linear regression to estimate the adstock rate? Like log-log function (log_sales = a+b*log(Adstock)).



    1. AnalyticsArtist Post author

      I’ve used this approach before. I wouldn’t call it more accurate though. It’s just another model formulation. Nielsen, the marketing company, popularized this approach. It’s called the ScanPro model. It would be great if you can point me to how this model is interpreted though. Like, what is the effect on sales in a log-log model?

      1. Tamas Gantner

        Hi Gabriel

        I’ve got a question regarding how you propose to optimise the response curve/decay parameters. It is clear that you would be using a package such as nls, nls2 or nlsLM but what sort of y variable do you use in the formula as response? We are trying to optimise the parameters so we can find the contribution of a variable x but we can’t find the contribution until we’ve optimised it… This seems to be a vicious circle to me. Or are you proposing just to use the overall response in the model e.g. y from y = x0 + B1*x1+B2*x2?

        Hope my question makes sense…



      2. AnalyticsArtist Post author

        Let’s take it step by step. The optimization is to minimize the sum of squared residuals between ‘y’ and ‘y-hat’. The optimization parameters are the variables’ coefficients and the adstock rates. So, basically, you are trying to get a ‘y-hat’ that is as close as possible to ‘y’.

        y-hat = x0 + B1*adstock(x1, r1) + B2*adstock(x2, r2)

        The contribution for x1 would be B1*adstock(x1, r1)

        Hope that clears it.
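A sketch of that decomposition in R, with variable names following the comment above and hypothetical toy data:

```r
# Fit the coefficients and adstock rates jointly, then decompose y-hat.
adstock <- function(x, rate = 0) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

set.seed(4)
x1 <- runif(104, 0, 100)
x2 <- runif(104, 0, 100)
y  <- 500 + 4 * adstock(x1, 0.2) + 2 * adstock(x2, 0.6) + rnorm(104, sd = 5)

fit <- nls(y ~ x0 + B1 * adstock(x1, r1) + B2 * adstock(x2, r2),
           start = c(x0 = 1, B1 = 1, B2 = 1, r1 = 0.1, r2 = 0.1))
p   <- coef(fit)

contrib_x1 <- p["B1"] * adstock(x1, p["r1"])   # x1's contribution to y-hat
```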

      3. Tamas Gantner

        Hi Gabriel,

        Thanks very much for your response.

        So if I understand correctly, once we have y_hat, there is no need to run multiple regression as we already have worked out the optimal parameters that minimise the squared residuals.

        I’ve tried doing this but the more independent variables I have, I find it harder and harder to make the algorithm work. So now I’m thinking I should maybe break this down into several steps and feed in 1-3 variables or so at a time, work out the optimal parameters for these and then add them in as constants to the equation and move on to the next 1-3 independent variables. But this approach then might necessitate that you revisit the parameters of those independent variables that you had already worked out as those might not be optimal anymore once new independent variables are added in to the equation.

        Do you have any other suggestions that might work with a larger number of independent variables (i.e. >30)?



  8. Hindol Ganguly

    Hello Gabriel,

    I am a follower of your blog and new to retail analytics. I have gone through your posts
    regarding marketing mix modelling & adstock. They are fantastic posts, and in R, using the nlsLM function, I am able to find the optimal adstock rate. I have a very basic question which may have a trivial answer, but I am curious to know how the sales figures and the ad1 & ad2 variable figures are related. I mean, for a sales figure of 37 I get the ad1 figure 6, or for sales of 110 we have the ad1 figure 20; what do these numerical figures actually mean? Do the figures represent monetary contributions?

    please give me a brief overview. thanks for your time.

  9. Rain

    Thank you very much for posting these wonderful articles and I love them!

    This post gives an example on how to find the optimum adstock rate given sales and advertising variables. If the advertisement has been done through multiple channel (ie. Online, Mobile, TV), that means the sales number has been impacted by all 3 channels. Can I still estimate the adstock rate for each channel individually with the sales number and the marketing expenses in each channel?

    Another question is, what if the results shows that the adstock rate is not significant. What does that imply?

  10. Sray Agarwal

    I am trying to find lag effect using negative exponential decay in R. My questions are:

    How to find out the optimal Decay Rate, that is the rate of decay? Will nlsLM function in R be helpful (from package minpack.lm), and if yes how? also, How to code this in R?

    I would appreciate an answer with example code.

  11. Ayush Biyani

    Thanks for the good work. We all benefit from this. Here is a question: in what scenarios could I get an adstock rate of 1? Yes, a perfect 1. I am surprised myself. This literally means that my ad only builds up, right? What should we do? Any suggestions?

  12. FJ Rod

    Great post and blog.
    Does this tool have any restrictions when used to derive adstock rates for categories with a long purchase cycle and threshold effects such as inertia that are not captured by adstock? Say vehicles.

  13. JR

    Really interesting post thank you. But this is time series data, and you should really have something in the model to specify the non-independence of data points (i.e. points closer together in time are likely to be similar) – otherwise any standard errors / p-values that you get from your model are going to be incorrect. Any thoughts on how to do this?

