Advertising Diminishing Returns & Saturation

In this post we take the basic model developed in Marketing Mix Modeling Explained – with R and add the non-linear effect of advertising. I distinguish between advertising and advertising returns; I call the latter sales for ease of readability.

Advertising non-linearity beyond the adstock/carry-over idea comes from two concepts: (1) diminishing returns and (2) saturation. Diminishing returns means that advertising has a non-constant, decreasing marginal return to scale. For example, sales from $200 of advertising are less than twice the sales from $100 of advertising. Saturation is the limiting case of diminishing returns: sales reach a limit after which more advertising has a near-zero incremental effect.

Diminishing Returns Response Curves
There are a number of functions used to model this non-linear advertising-sales relationship. I present a few of the most popular. I also specify the parameter ranges usually used with each function where applicable. But as they say, a picture is worth a thousand words.

  1. Power Function: $\alpha x^{\beta}$ ; $0 < \beta \le 1$
    The power function has the nice property that when β = 1 the function becomes linear. β = 1, of course, means that there are no diminishing returns within the observed data range. This function never saturates, since y → ∞ as x → ∞. This can be unreasonable when testing marketing levels outside the observed range of the modeled data. Aside from its use in advertising, the power function is also used to model the price variable, in which case β < 0.
  2. Michaelis-Menten Function: $\alpha x / (1 + \beta x)$ ; $\beta \ge 0$
    The Michaelis-Menten function has a similar property to the power function in that it becomes linear, but when β = 0. For β > 0 this function has the added bonus of reaching a sales saturation of α/β.
  3. Negative Exponential Function: $\alpha (1 - e^{-\beta x})$ ; $\beta > 0$
    This function is called the negative exponential function due to the −β portion. It is also referred to as the 2-parameter asymptotic exponential. The maximum sales attained by this functional form, i.e. saturation, is α.
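
To make the three curves concrete, here is a minimal R sketch that codes each function and compares the response at $100, at $200 and at a very large spend. The function names (power_curve, mm_curve, negexp_curve) and all parameter values are made up for illustration; they are not estimates from any data set.

```r
# Three diminishing-returns response curves.
# The alpha and beta values used below are illustrative assumptions.
power_curve  <- function(x, alpha, beta) alpha * x^beta               # 0 < beta <= 1
mm_curve     <- function(x, alpha, beta) (alpha * x) / (1 + beta * x) # beta >= 0
negexp_curve <- function(x, alpha, beta) alpha * (1 - exp(-beta * x)) # beta > 0

spend <- c(0, 100, 200, 1e6)

curves <- data.frame(
  spend  = spend,
  power  = power_curve(spend, alpha = 2, beta = 0.5),
  mm     = mm_curve(spend, alpha = 2, beta = 0.02),     # limit is alpha / beta = 100
  negexp = negexp_curve(spend, alpha = 100, beta = 0.01) # limit is alpha = 100
)
print(curves)

# Diminishing returns: sales at $200 are less than twice the sales at $100.
doubling_ratio <- sapply(curves[c("power", "mm", "negexp")],
                         function(y) y[3] / y[2])
print(doubling_ratio)  # every ratio is below 2
```

With these illustrative parameters, the last row of the table shows the Michaelis-Menten and negative exponential curves flattening out near their limits while the power curve keeps growing.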

There are three important questions to ask:

  1. What happens at zero level of advertising?
  2. What happens at very high level of advertising?
  3. What happens between zero and high level of advertising?

Naturally, a zero level of advertising should produce no sales effect, and the effect of ever-higher advertising should approach an upper limit, mathematically called an asymptote. What happens in between is the subject of great debate and of the next post, but suffice it to say that an S-curve function is sometimes desired.

The input variable, x, in all three functions above can be one of three things: (1) advertising, (2) advertising adstock or (3) cumulative advertising over a certain period of time.

The last thing I want to add is that these functions are monotonically increasing, i.e. sales at a higher level of advertising are always greater than sales at a lower level. Mathematically, f(x + ε) > f(x) for any ε > 0. This, of course, means advertising can do no harm, which is a whole different topic of its own.
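
The behaviour at zero, in between and at high advertising levels can be checked numerically. The sketch below does this for the negative exponential curve on a grid of spend values; the parameter values and the grid are assumptions chosen only for illustration.

```r
# Negative exponential response curve; alpha and beta are illustrative values.
negexp_curve <- function(x, alpha, beta) alpha * (1 - exp(-beta * x))

spend <- seq(0, 300, by = 10)
sales_effect <- negexp_curve(spend, alpha = 100, beta = 0.01)

# Incremental effect of each additional 10 units of advertising.
increments <- diff(sales_effect)

at_zero          <- sales_effect[1]            # effect with no advertising
is_increasing    <- all(increments > 0)        # f(x + eps) > f(x)
is_diminishing   <- all(diff(increments) < 0)  # each step adds less than the last
below_saturation <- all(sales_effect < 100)    # never exceeds alpha

print(c(at_zero = at_zero,
        is_increasing = is_increasing,
        is_diminishing = is_diminishing,
        below_saturation = below_saturation))
```

The same checks can be run on the other two functions by swapping in a different curve.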


Marketing Mix Modeling Explained – With R

Marketing mix modeling (MMM) is a process used to quantify the effects of different advertising mediums, i.e. media. It is also used to optimize spend budget over these different mediums. The popular method of choice is multiple regression analysis. The model also takes into account other variables such as pricing, distribution points and competitor tactics. This article will explain the mathematics behind MMM by starting with a simple model then adding complexities. I’ll also incorporate R code so you can immediately reproduce the results.

Start Simple:
Let’s assume there is only one advertising variable that affects sales. This simple model is usually defined as:

$\text{Sales} = \text{Base} + b_1 \cdot \text{Advertising}$

There are two aspects to this model: (1) it is linear and (2) the Base is a constant. This is OK for now as we’ll add more complexity later. However, I can quickly tell you that Base can include other variables to make it non-constant. The non-linearity part will be introduced in a future blog post.

A sample R code can be:

sales <- c(37, 89, 82, 58, 110, 77, 103, 78, 95, 106, 98, 96, 68, 96, 157, 198, 145, 132, 96, 135)
ad <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)

modFit.0 <- lm(sales~ad)
summary(modFit.0)

This model has an R² of 0.184, so there is much work to be done.

Complexity 1: The Adstock Case
The model above assumes that advertising in week t will only affect sales in that same week. This is wrong and will cause the advertising effect to be undervalued. Simply put, past ads can (and usually do) affect present and future sales. This multi-period aspect of advertising can be controlled for with the adstock transformation, which I covered in a previous blog post.

Our model now becomes:

$\text{Sales} = \text{Base} + b_1 \cdot f(\text{Advertising} \mid \alpha)$

where f() is the adstock transformation function for the Advertising variable given an adstock rate of α. Other functional forms besides adstock can be incorporated here as well. Also notice that the order of observations matters for adstocking to take place.

With an adstock rate of 50% the R code is:

sales <- c(37, 89, 82, 58, 110, 77, 103, 78, 95, 106, 98, 96, 68, 96, 157, 198, 145, 132, 96, 135)
ad <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)

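# stats::filter is named explicitly so the call still works if dplyr (which also exports filter) is loaded
ad.adstock <- as.numeric(stats::filter(x=ad, filter=.50, method="recursive"))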

modFit.1 <- lm(sales~ad.adstock)
summary(modFit.1)

Notice how we improved R² from 0.184 to 0.252.

Complexity 2: More Advertising Variables
It should be clear by now that I have been using advertising mediums and advertising variables interchangeably. From a modeling perspective, Advertising can be a paid media channel like TV, radio or banner ads, a non-paid media variable like social impressions or word-of-mouth, or a marketing campaign. When adding more variables, however, their units of measure need not be the same. Many measures can be used, including TRPs, GRPs, impressions or spend. I listed them in order of preference when available. Regardless of unit of measure, in a statistical model they are all called advertising variables and our model formulation becomes:

$\text{Sales} = \text{Base} + \sum_{i=1}^{n} b_i \cdot f(\text{Advertising}_i \mid \alpha_i)$

where f() is the adstock transformation function for $\text{Advertising}_i$ with an adstock rate of $\alpha_i$, i.e. each of the n advertising variables has its own alpha rate.

The R code for two advertising variables with adstock rates of 30% is:

sales <- c(37, 89, 82, 58, 110, 77, 103, 78, 95, 106, 98, 96, 68, 96, 157, 198, 145, 132, 96, 135)
ad1 <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)
ad2 <- c(3, 0, 4, 0, 5, 0, 0, 0, 8, 0, 0, 5, 0, 11, 16, 11, 5, 0, 0, 15)

# stats::filter is named explicitly so the calls still work if dplyr is loaded
ad1.adstock <- as.numeric(stats::filter(x=ad1, filter=.3, method="recursive"))
ad2.adstock <- as.numeric(stats::filter(x=ad2, filter=.3, method="recursive"))

modFit.2 <- lm(sales~ad1.adstock+ad2.adstock)
summary(modFit.2)

Now, our model is even stronger with an R² of 0.769.

Complexity 3: Changing Base & Other Variables
So far we assumed the Base to be a constant, i.e. an intercept. I often get asked the question of how to make the Base non-constant. The simple answer is Base includes more than just the intercept. If you notice an increasing trend in Sales then part of modeling is to create a trend variable. This trend variable gets added to the base. Seasonal variables also sometimes get added to the Base. Finally, there is the idea of distribution points.

Distribution points account for the number of outlets (stores or online) at which the product in question is sold. If a retailer, for example, doubles its number of stores, then we would expect its sales to increase not due to marketing but simply due to the number of stores available. Marketing plays a role, of course, but I think you get the point.

Finally, pricing & promotions are of prime importance. They too are variables to add to the model. However, these variables aren’t part of the base. Due to their complexity I’ll leave their discussion to a future blog post.

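Hence, our current model is now of the form:

$\text{Sales} = a_0 + a_1 \cdot \text{Trend} + a_2 \cdot \text{Distribution} + \sum_{i=1}^{n} b_i \cdot f(\text{Advertising}_i \mid \alpha_i)$

The sample data has no distribution variable, so the R code fits the trend term only: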

sales <- c(37, 89, 82, 58, 110, 77, 103, 78, 95, 106, 98, 96, 68, 96, 157, 198, 145, 132, 96, 135)
ad1 <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)
ad2 <- c(3, 0, 4, 0, 5, 0, 0, 0, 8, 0, 0, 5, 0, 11, 16, 11, 5, 0, 0, 15)
trend <- 1:20

# stats::filter is named explicitly so the calls still work if dplyr is loaded
ad1.adstock <- as.numeric(stats::filter(x=ad1, filter=.3, method="recursive"))
ad2.adstock <- as.numeric(stats::filter(x=ad2, filter=.3, method="recursive"))

modFit.3 <- lm(sales~trend+ad1.adstock+ad2.adstock)
summary(modFit.3)

Our final model’s R² is 0.940.

Business Implications & Contributions
Aside from the statistical fit of our model, clients always ask about the business implications. This is usually referred to as the sales lift or uplift due to marketing, a.k.a. the contribution. The contribution in our model is the product of the adstocked advertising and its coefficient.

$\text{Contribution}_i = b_i \cdot f(\text{Advertising}_i \mid \alpha_i)$
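
As a rough sketch of how the contributions could be pulled out of the fitted model, the code below refits the trend model from the previous section and multiplies each adstocked variable by its coefficient. The helper adstock() and the names contrib and base_sales are my own for this illustration, and reporting contribution as a share of total sales is just one common convention.

```r
sales <- c(37, 89, 82, 58, 110, 77, 103, 78, 95, 106, 98, 96, 68, 96, 157, 198, 145, 132, 96, 135)
ad1 <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)
ad2 <- c(3, 0, 4, 0, 5, 0, 0, 0, 8, 0, 0, 5, 0, 11, 16, 11, 5, 0, 0, 15)
trend <- 1:20

# Recursive adstock transformation (hypothetical helper name).
adstock <- function(x, rate) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

ad1.adstock <- adstock(ad1, 0.3)
ad2.adstock <- adstock(ad2, 0.3)
modFit.3 <- lm(sales ~ trend + ad1.adstock + ad2.adstock)
b <- coef(modFit.3)

# Weekly contribution of each advertising variable: coefficient times adstock.
contrib <- cbind(ad1 = b["ad1.adstock"] * ad1.adstock,
                 ad2 = b["ad2.adstock"] * ad2.adstock)

# Base: intercept plus trend.
base_sales <- b["(Intercept)"] + b["trend"] * trend

# Base plus contributions should reproduce the model's fitted values.
decomposition_ok <- isTRUE(all.equal(unname(base_sales + rowSums(contrib)),
                                     unname(fitted(modFit.3))))
print(decomposition_ok)

# Total contribution of each variable as a share of total sales.
print(colSums(contrib) / sum(sales))
```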

Final Remarks & a Challenge:
You can see now that Marketing Mix Modeling is a business term for regression analysis on transformed variables. Any decent data scientist or statistician can do the job. However, it is important to note that the mix in Marketing Mix refers to the different mediums, media, campaigns or variables and their effects on sales. This is in contrast to mixed effects models, which measure the effect of one variable at many different levels, DMA-level modeling being one example. Mixed effects models can be used instead of multiple regression analysis when dealing with multiple geographies, like DMAs, but the two “mix” terms refer to different things, which I thought was worth calling out.

The challenge that faces all statistical analyses is data, as it is 80% of the work. While that can be taken care of by data personnel, there is still one challenge that escapes many: what adstock rate to give to each advertising variable? This is harder than it sounds and it goes beyond basic statistics. Modelers don’t only have to worry about a particular adstock rate being statistically valid; they also have to choose among different adstock rates that yield different contributions, all of which are statistically valid as well. One reason for this is that the ultimate consumer of MMM results is a human. The model that makes the “most sense”, however that is defined, can trump the most accurate model. HBR has a good article about this problem. My recommendation for such scenarios is to track the model’s fit statistics at each decision point in the modeling process. The modeler or data scientist can then show the decision maker that choosing a higher contribution will make R² drop from 90% to 70%, and leave the final decision to the business users.
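
One way such tracking might look in practice is a small table of fit statistics per candidate adstock rate. The sketch below uses the single-variable sample data from earlier; the grid of rates, the helper adstock() and the table name fit_track are assumptions made for this illustration.

```r
sales <- c(37, 89, 82, 58, 110, 77, 103, 78, 95, 106, 98, 96, 68, 96, 157, 198, 145, 132, 96, 135)
ad <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)

# Recursive adstock transformation (hypothetical helper name).
adstock <- function(x, rate) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

# Candidate adstock rates to compare (an arbitrary grid).
rates <- seq(0, 0.9, by = 0.1)

# For each rate, record the model fit and the total advertising contribution.
fit_track <- do.call(rbind, lapply(rates, function(r) {
  ad.adstock <- adstock(ad, r)
  fit <- lm(sales ~ ad.adstock)
  data.frame(rate         = r,
             r.squared    = summary(fit)$r.squared,
             contribution = sum(coef(fit)["ad.adstock"] * ad.adstock))
}))

print(fit_track)
```

A table like this lets the decision maker see the trade-off between fit and contribution side by side before settling on a rate.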

Advertising Adstock with Maximum Period Decay

Current advertising adstock transformations in all their forms assume an infinite decay function. This means 1 week of advertising can still have an impact 100 weeks after its initial airing. This is unrealistic. In this article I’ll discuss a variation on advertising adstock called the maximum period decay effect.

A typical decay factor for advertising adstock looks like the figure on the right.

[Figure: Typical Advertising Adstock]

The graph will lead you to believe that after week 10 the adstock values are small, close to zero and can be treated as zeros. WRONG! These values are, indeed, close to zero but they aren’t zero. A human can ignore them but a computer won’t. In a regression analysis framework this also causes a multicollinearity problem with multiple sequential variables, as the adstock continues after advertising is over and the variables behave like two identical decreasing trend variables.

In my previous article on Advertising Adstock – Concept & Formula, advertising adstock was defined as
$A_t = X_t + r \cdot A_{t-1}$ ; where r is the advertising adstock rate.

Mathematically speaking this formula can be rewritten as
$A_t = X_t + r \cdot X_{t-1} + r^2 \cdot X_{t-2} + r^3 \cdot X_{t-3} + \dots + r^n \cdot X_{t-n}$ ; where n is the maximum number of weeks available, i.e. n = t.

The mathematical short-hand notation is
$A_t = \sum_{i=0}^{n} r^i \cdot X_{t-i}$ ; where n is the maximum number of weeks available.

I redefine n to be the maximum number of previous periods that week t learns from. So if we define n as 5, then the current advertising adstock learns from the previous 5 weeks only. This has many benefits as we limit the total effect of advertising. Coupons, for example, aren’t expected to have unlimited decay. We also remove that ever-decaying trend factor and solve the multicollinearity problem for sequential variables.
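
The linked files are not reproduced here, so the following is only a minimal sketch of the idea under my own assumptions: the function name adstock_max() is hypothetical, and weeks before the start of the series are treated as having zero advertising. It applies the weights 1, r, r², …, rⁿ to the current week and the previous n weeks only.

```r
# Adstock with maximum period decay: week t uses itself and the previous
# max_periods weeks only. Function name and zero-padding are assumptions.
adstock_max <- function(x, rate, max_periods) {
  weights  <- rate^(0:max_periods)          # 1, r, r^2, ..., r^n
  x_padded <- c(rep(0, max_periods), x)     # no advertising before week 1
  out <- stats::filter(x_padded, filter = weights,
                       method = "convolution", sides = 1)
  tail(as.numeric(out), length(x))          # drop the padding weeks
}

# A single burst of advertising in week 1.
pulse <- c(100, 0, 0, 0, 0, 0, 0, 0)

truncated <- adstock_max(pulse, rate = 0.5, max_periods = 2)
infinite  <- as.numeric(stats::filter(pulse, filter = 0.5, method = "recursive"))

print(rbind(truncated = truncated, infinite = infinite))
```

In this example the truncated version drops to exactly zero three weeks after the burst, while the recursive version keeps halving indefinitely.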

The files below show this improvement of advertising adstock transformation with maximum period decay:
• Excel
• SAS
• R
• Python

Adstock Rate – Deriving with Analytical Methods

In a previous article I explained the concept & formula behind advertising adstock.  This article focuses on how to analytically derive the adstock rate.  I’ll also compare and contrast this method to assumption based methods followed by some consulting companies in the Marketing Mix space.

First, I start with a little story. A marketing manager tries to create a model that can predict sales based on different advertising levels. He or she asks a statistician to test a particular adstock rate for the advertising variable in the model. The result comes back and it is either insignificant or unsatisfactory. The marketing manager then asks the statistician to try a different adstock rate. This process repeats itself until there is a result that is of value and makes sense.

This process, however, isn’t only inefficient, it is also wrong. The manager is “assuming” an adstock rate. What makes this even worse is that some consulting companies in the Marketing Mix space repeat this mistake on a massive scale. They use expensive computing power on servers and clusters to test every potential adstock rate.

The best approach is to analytically derive the adstock rate based on the data. The field of operations research, or in particular, mathematical optimization, can lend a hand.  We can set up an optimization program in Excel or any other programming language to derive the optimal adstock rate.

Here are simple instructions to follow:

  1. Start with the data
    Get sales, i.e. actual, and the advertising variable(s)
  2. Set up regression
    Predicted Sales = α + β * adstock(Advertising)
    α and β are linear regression parameters
    The adstock function “adstock(Advertising)” is defined as $A_t = X_t + \text{adstock rate} \cdot A_{t-1}$
  3. Set up & run optimization
    Minimize the sum of squared errors for the regression formula by changing the adstock rate
    Mathematically speaking
    Objective function: Minimize $\sum (\text{Actual} - \text{Predicted})^2$
    Subject to: 0 <= adstock rate < 1
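
The three steps above can be sketched in R with a bounded one-dimensional search. This is not the Excel Solver setup; it swaps in base R’s optimize(), and the helper names adstock() and sse_for_rate() plus the search interval of 0 to 0.99 are my own assumptions. For each candidate rate the inner lm() call finds the best α and β, so the search runs over the adstock rate alone.

```r
# Step 1: data (the sample series used in the marketing mix modeling post).
sales <- c(37, 89, 82, 58, 110, 77, 103, 78, 95, 106, 98, 96, 68, 96, 157, 198, 145, 132, 96, 135)
ad <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)

# Recursive adstock transformation (hypothetical helper name).
adstock <- function(x, rate) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

# Step 2: regression for a given rate, returning the sum of squared errors.
sse_for_rate <- function(rate) {
  ad.adstock <- adstock(ad, rate)
  sum(resid(lm(sales ~ ad.adstock))^2)
}

# Step 3: minimize the SSE over the adstock rate within the bounds.
opt <- optimize(sse_for_rate, interval = c(0, 0.99))

print(opt$minimum)    # estimated adstock rate
print(opt$objective)  # sum of squared errors at that rate
```

Note that optimize() still searches iteratively under the hood and can settle on a local minimum if the error curve has more than one dip, so it is worth plotting sse_for_rate over a grid of rates as a sanity check.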

The workbook here has a complete mathematical setup to calculate the optimum adstock rate as described above. I use the LINEST function to simplify the least squares minimization and regression formula. You’ll need to have the Solver Add-in installed with Excel. Hint: It is already installed; you just need to load it – Instructions on loading Solver Add-in.

This approach has two benefits: (1) Fast: no manual iteration is necessary and (2) Accurate: the adstock rate is estimated to as many decimal places as the optimizer’s tolerance allows. Assumption-based rates, by comparison, will often stop at 28% while the actual adstock rate is 28.54783%. To credit consulting companies, though, an adstock rate of 28% is “good enough”.

It is important to note that the optimum adstock rate can lead to a negative correlation with sales data and hence to the interpretation that advertising negatively affects sales. The statistician in this case has to either test different adstock rates or apply constrained optimization. Sometimes, however, advertising can genuinely have negative effects, as in the case of an oversaturated market.

Update: The R approach is to use the nls() function with the same set up as above.

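# Define Adstock Function
# stats::filter is named explicitly so the call still works if dplyr is loaded
adstock <- function(x, rate=0){
 return(as.numeric(stats::filter(x=x, filter=rate, method="recursive")))
}

# Run Optimization
# Assumes numeric vectors sales and ad are already defined.
# b0 and b1 are the regression parameters; rate is the adstock rate being estimated.
modFit <- nls(sales~b0+b1*adstock(ad, rate), 
              start=c(b0=1, b1=1, rate=0))
summary(modFit)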

Note that the R code above uses the Gauss-Newton algorithm, which doesn’t take any constraints into account, so the estimated rate might not fall within the 0-1 bounds. If you want to set up constraints then you can put a penalty in the formula function or use the port algorithm (algorithm="port"). However, the nls documentation mentions that the port algorithm appears unfinished, so use it with caution. Alternatively, you can use the nlsLM() function in the “minpack.lm” package. That function uses Levenberg-Marquardt and it does work with lower and upper bounds.

Advertising Adstock – Concept & Formula

Adstock Example

Advertising adstock is a term used to measure the memory effect of advertising carried over from the start of advertising. For example, if a company advertises at a certain level in week 1, week 2 will have a portion of the week 1 level. Week 3, in turn, will have a portion of the week 2 level. In other words, adstock is a percentage term that measures the decaying effect of advertising throughout the weeks.

The term comes up often in response models, where we try to measure the effect of advertising on sales or on purchase intent. The models are usually regression based but are often published under names like Marketing Mix Models (MMM), Marketing Mix Optimization (MMO), Network-Effects and Hierarchical models.

The theory behind adstock is that marketing exposures build awareness in consumers’ minds.  That awareness doesn’t disappear right after the consumers see the ad but rather remains in their memory.  Memory decays over the weeks and hence the decay portion of adstock.

The formula for advertising adstock is $A_t = X_t + \text{adstock rate} \cdot A_{t-1}$.
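
To show what the formula does week by week, here is a small R sketch that applies the recursion in an explicit loop and compares it with R’s built-in recursive filter. The function name adstock_loop() is made up for this illustration, and the adstock before week 1 is assumed to be zero.

```r
# Adstock via an explicit loop: A_t = X_t + rate * A_(t-1), with A_0 = 0.
adstock_loop <- function(x, rate) {
  a <- numeric(length(x))
  for (t in seq_along(x)) {
    previous <- if (t == 1) 0 else a[t - 1]
    a[t] <- x[t] + rate * previous
  }
  a
}

ad <- c(6, 27, 0, 0, 20, 0, 20, 0, 0, 18, 9, 0, 0, 0, 13, 25, 0, 15, 0, 0)

by_loop   <- adstock_loop(ad, rate = 0.5)
by_filter <- as.numeric(stats::filter(ad, filter = 0.5, method = "recursive"))

print(head(by_loop, 5))  # 6.00 30.00 15.00 7.50 23.75
print(isTRUE(all.equal(by_loop, by_filter)))
```

Week 2, for instance, is its own 27 plus half of week 1’s 6, and weeks 3 and 4 carry only the decaying remainder because there was no new advertising.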

The files below show a simple implementation of advertising adstock transformation:
• Excel Adstock Transformation
• SAS Adstock Transformation
• R Adstock Transformation
• Python Adstock Transformation – coming soon