Using Noisy Data to Interpret Campaign Performance in R
Defining the Challenge
Let’s face it. Seasonality, trend, and random variation in performance data are often hard to separate using the marketing tools currently available online. I’ve found that off-the-shelf tools aren’t effective at decomposing this data into useful information, and I often find myself building a model manually to provide this insight.
Below is one of our approaches to generating insight when understanding is limited, with an example of how to isolate the noise in a dataset so that prominent activity stands out. I will extract the noise by removing seasonal and trend information, which helps us see variations in performance. I’ve chosen the statistical analysis language R to develop the model in my example.
If you work with analytics, you’ll know that attribution is problematic when running multiple marketing campaigns. It’s hard to determine the value of each source, particularly when utilising non-digital marketing channels. This goes beyond selecting the right model of attribution, such as linear or last-click.
Google’s Academy for Ads ‘Ecosystem of Touchpoints’ shows that some channels still remain ‘non-digital’. ‘Pure digital’ channels can be easily tracked with analytics, whilst channels that can have digital components produce data that’s difficult to link. This leaves us with several channels that give us little behavioural data, which is a concern when trying to quantify the value of a TV ad, for example.
Developing the Right Approach
There are strategies that can be implemented to make your life easier when tracking non-digital channels, including:
- Tracking your website store locator and other direct offline dimensions.
- Using unique phone numbers on the website to identify conversions.
- Using unique voucher codes to track at the point of conversion.
- Joining online and offline data through the use of identifiers.
- Asking people when they convert “How did you find us?” (TV/Radio/Online Ad).
- Experimenting with controlled advertising to gauge impact.
- Researching who your consumers are and learning how you can target them.
The above tactics aren’t by any means a solution to attribution. You will face issues such as not having outlined the correct policies to capture personal information, or you simply may not have the budget available to implement some of these tactics.
An Alternative Method in R
An alternative way to acquire these insights is to carry out time series analysis. This approach decomposes traffic into trend, seasonality, and random noise, enabling us to investigate further.
I’ve generated some random data which we’ll treat as website sessions. Depending on your industry, product, and customer behaviour, you’d tend to see these types of trends.
I’ve stored my sessions data in a CSV called webData, containing columns Date (as sequential serial numbers) and website sessions by day.
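For readers who want to follow along without the original CSV, a dataset of this kind could be simulated roughly as follows. This is only a sketch: the growth rate, seasonal amplitude, noise level, and the column names date and sessions are assumptions for illustration, and real dates are used rather than serial numbers.

```r
# Hypothetical stand-in for the webData CSV: four years of daily sessions.
set.seed(42)  # fixed seed so the example is reproducible

dates <- seq(as.Date("2008-01-01"), as.Date("2011-12-31"), by = "day")
n     <- length(dates)
day   <- seq_len(n)

trend    <- 1000 * exp(0.0005 * day)            # slow exponential growth
seasonal <- 1 + 0.2 * sin(2 * pi * day / 365)   # one cycle per year
noise    <- exp(rnorm(n, mean = 0, sd = 0.05))  # multiplicative random noise

webData <- data.frame(
  date     = dates,
  sessions = round(trend * seasonal * noise)
)

head(webData)
```

Because the three parts are multiplied together, the size of the swings grows with the trend, which is the pattern that suits a multiplicative model.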
You should also consider using Google Analytics data; there are several packages in R that utilise the reporting API. I’d recommend using RGA or googleAnalyticsR. These packages can be used to create automated apps and reporting systems for ongoing reporting. This is helpful when applying your saved model to similar datasets.
I’d first like to remove leap days, so that every year has 365 observations, and then define our data as a time series object. As we aren’t directly comparing date ranges, removing them won’t affect our results. Note that Date has been converted into a Time index.

    # Drop the leap-day rows (leapyears is assumed to hold those dates)
    webData.ly <- subset(webData, !date %in% leapyears)
    # Daily series with 365 observations per year, starting in 2008
    webData.ts <- ts(webData.ly[, 2], frequency = 365, start = c(2008, 1))
    plot(webData.ts, ylab = "Sessions")
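The leapyears vector isn’t defined in the snippet above. One way it might be built, assuming the date column holds real Date values, is to pick out every 29 February. The data frame here is a hypothetical placeholder, not the original dataset.

```r
# Hypothetical daily data covering 2008-2011 (2008 is a leap year)
webData <- data.frame(
  date = seq(as.Date("2008-01-01"), as.Date("2011-12-31"), by = "day")
)
webData$sessions <- seq_len(nrow(webData))  # placeholder values

# Leap days are the dates whose month-day part is 29 February
leapyears <- webData$date[format(webData$date, "%m-%d") == "02-29"]

webData.ly <- subset(webData, !date %in% leapyears)

# Every year should now contribute the same number of rows
rows.per.year <- table(format(webData.ly$date, "%Y"))
rows.per.year
```

With equal-length years, frequency = 365 lines up each day of the year with the same position in the seasonal cycle.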
Isolating Random Noise
I’ll be using multiplicative decomposition to transform our time series data. The calculation we’re using represents the observed sessions Y as the product of seasonality S, trend T, and the remaining random noise e: Y = S × T × e.
I’ll use this model because the observed data shows exponential performance growth and increasing variation within webData over time. I’ll use the decompose() function to dissect this data. If you observe a regular trend in your data with a fixed amount of variation, then an additive model should be applied. You can do this by changing the ‘type’ argument to ‘additive’ in your call. Be sure to take some time observing your data to understand which approach is appropriate.
    # Decompose Time Series
    webData.decompose <- decompose(webData.ts, type = "multiplicative")
    plot(webData.decompose)
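To make the difference between the two model types concrete, the sketch below decomposes a simulated series both ways and checks that the components recombine into the observed data. The simulated series and its parameters are assumptions for illustration only.

```r
# Hypothetical four-year daily series with growth, a yearly cycle and noise
set.seed(1)
day <- seq_len(365 * 4)
sessions <- 1000 * exp(0.0005 * day) * (1 + 0.2 * sin(2 * pi * day / 365)) *
  exp(rnorm(length(day), sd = 0.05))
webData.ts <- ts(sessions, frequency = 365, start = c(2008, 1))

mult <- decompose(webData.ts, type = "multiplicative")
add  <- decompose(webData.ts, type = "additive")

observed <- as.numeric(webData.ts)

# Multiplicative: observed = seasonal * trend * random
rebuilt.mult <- as.numeric(mult$seasonal * mult$trend * mult$random)
# Additive: observed = seasonal + trend + random
rebuilt.add <- as.numeric(add$seasonal + add$trend + add$random)

# The moving-average trend is NA for the first and last half year,
# so compare only the days where every component is defined
ok <- !is.na(rebuilt.mult)
all.equal(rebuilt.mult[ok], observed[ok])
all.equal(rebuilt.add[ok], observed[ok])
```

Both models reproduce the data exactly where the trend is defined. What differs is how the variation is split between the components, which is why looking at the raw series first matters.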
The above chart dissects trend, seasonality, and random noise from our data. This random noise is data that’s left over from removing seasonality and trend from the observed data. This comes in handy for discovering outliers and interesting variations.
It’s worth documenting changes in strategic approach during the period you’re analysing. For example, paid media budgets could impact seasonal and trend data – making your model inaccurate. Evidence of sine waves in random noise would suggest that hidden patterns may still exist in your data, or that you may need to apply a different method of decomposition.
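One way to check for leftover patterns, offered here as a suggestion rather than part of the original analysis, is to look at the autocorrelation of the random component. The sketch uses acf() and Box.test() from base R on a simulated series; the 30-day lag window is an arbitrary choice.

```r
# Hypothetical series, decomposed as in the article
set.seed(7)
day <- seq_len(365 * 4)
sessions <- 1000 * exp(0.0005 * day) * (1 + 0.2 * sin(2 * pi * day / 365)) *
  exp(rnorm(length(day), sd = 0.05))
webData.ts <- ts(sessions, frequency = 365, start = c(2008, 1))
webData.decompose <- decompose(webData.ts, type = "multiplicative")

# Drop the NA values at either end of the random component
noise <- as.numeric(na.omit(webData.decompose$random))

# Autocorrelation of the leftover noise at lags of 0 to 30 days
noise.acf <- acf(noise, lag.max = 30, plot = FALSE)

# Approximate 95% band for white noise
band <- qnorm(0.975) / sqrt(length(noise))

# Lags (excluding lag 0) where the autocorrelation falls outside the band
suspect.lags <- which(abs(noise.acf$acf[-1]) > band)
suspect.lags

# Ljung-Box test: a small p-value suggests the noise is not purely random
Box.test(noise, lag = 30, type = "Ljung-Box")
```

A handful of lags outside the band is expected by chance. A regular run of them, such as every seventh lag, would point to a weekly cycle that the yearly decomposition hasn’t removed.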
Analysing the Campaigns
I’ve subset the random noise from webData to the 50 days following each campaign’s start date, so we can see what happened while each campaign was live. This will allow me to directly compare each campaign.

    library("tidyr")    # gather()
    library("dplyr")    # %>%
    library("ggplot2")

    # Subset Campaign Data
    # Row positions of the campaign start dates in the leap-day-free data
    index <- which(webData.ly$date %in% campaignStart)
    # TV, Press and Radio are assumed to each hold 50 days of random noise
    cData <- data.frame(day = 1:50, TV, Press, Radio)
    cData <- cData %>% gather(key = "campaign", value = "noise", -day)

    # Create Plot
    ggplot(cData, aes(x = day, y = noise, col = campaign)) +
      geom_smooth(se = FALSE, method = "loess")
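The snippet above doesn’t show how the TV, Press, and Radio vectors are created. A minimal base R sketch of one possibility follows, assuming each is the 50 days of random noise starting at that campaign’s launch. The dates, the noise values, and the helper names campaigns, window, and cLong are all hypothetical.

```r
set.seed(3)
# Hypothetical stand-in for the random component and its matching dates
dates <- seq(as.Date("2010-01-01"), by = "day", length.out = 365)
noise <- exp(rnorm(365, sd = 0.05))

# Hypothetical campaigns and their start dates
campaigns     <- c("TV", "Press", "Radio")
campaignStart <- as.Date(c("2010-02-01", "2010-05-01", "2010-09-01"))

window <- 50
# match() returns positions in the same order as campaignStart
index <- match(campaignStart, dates)

# One column per campaign, one row per day since launch
cData <- as.data.frame(sapply(index, function(i) noise[i:(i + window - 1)]))
names(cData) <- campaigns
cData$day <- seq_len(window)

# Long format (day, campaign, noise) without extra packages
cLong <- data.frame(
  day      = rep(cData$day, times = length(campaigns)),
  campaign = rep(campaigns, each = window),
  noise    = unlist(cData[campaigns], use.names = FALSE)
)

head(cLong)
```

match() is used here instead of which() so that each position stays paired with its campaign name even when the start dates aren’t in chronological order.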
By isolating our results, we can compare the impact of each campaign. TV was clearly the most successful channel, with one day gaining an additional 30% in visits. However, when the TV campaign ended, users were less likely to return to the website immediately, which could be due to the nature of the products being advertised.
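Assuming a figure like the 30% is read from the random component of a multiplicative model, a value of 1 means sessions were exactly as trend and seasonality predict, and 1.30 means 30% above that level. The values below are made up to show the conversion.

```r
# Hypothetical random-noise values for five days of a campaign
noise <- c(0.98, 1.05, 1.30, 1.10, 0.95)

# Percentage difference from the level expected from trend and seasonality
uplift.pct <- (noise - 1) * 100

# Day with the largest uplift, and its size in percent
peak.day <- which.max(uplift.pct)
peak.day
uplift.pct[peak.day]
```

In an additive model the random component is expressed in sessions rather than as a ratio, so this conversion would not apply.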
We could derive more insights by viewing new and returning sessions to the website. Radio had less of an impact than TV, but we can observe a similar effect. While our press campaigns had a couple of runs, they weren’t as successful at delivering website sessions.
There will be a lot of variables you can account for such as changes in strategy and ad spend, but you should also consider national holidays, weather, news, and events, as these situational events could change consumer behaviour. On top of this, it’s also worth considering how you’ll store this information and understanding how to remove it from your dataset.
These are just a few of the challenges of attribution and the solutions available to marketers. We use these practices to better understand trends, seasonality, and variation in performance data. These methodologies can be applied to project performance or to work out what’s really been going on in your past campaigns. It’s pretty technical, but if you’d like to learn more, please get in touch or pop us a tweet @Homeagencyuk.