Intro Examples to grizbayr

Usage

Select which piece of information you would like to calculate.

Metric	Function Call
All Below Metrics	`estimate_all_values()`
Win Probability	`estimate_win_prob()`
Value Remaining	`estimate_value_remaining()`
Lift vs. Baseline	`estimate_lift_vs_baseline()`
Win Probability vs. Baseline	`estimate_win_prob_vs_baseline()`

If you would like to calculate all the metrics, use estimate_all_values(). This is slightly more efficient than calling each function separately, since it samples from the posterior once for all 4 calculations instead of once per metric.

Create an Input Dataframe or Tibble

All of these functions require a very specific tibble format. However, the same tibble can be used in all metric calculations. A tibble is used here because it has the additional check that all column lengths are the same. A tibble of this format can also conveniently be created using dplyr’s group_by() %>% summarise() sequence of functions.

A column in the following table is required if there is an X in the box for the distribution. (Int columns can also be dbl due to R coercion.)

Distribution Type	option_name (char)	sum_impressions (int)	sum_clicks (int)	sum_sessions (int)	sum_conversions (dbl)	sum_revenue (dbl)	sum_cost (dbl)	sum_conversions_2 (dbl)	sum_revenue_2 (dbl)	sum_duration (dbl)	sum_page_views (int)
Conversion Rate	X		X		X
Response Rate	X			X	X
Click Through Rate (CTR)	X	X	X
Revenue Per Session	X			X	X	X
Multi Revenue Per Session	X			X	X	X		X	X
Cost Per Activation (CPA)	X		X		X		X
Total CM	X	X	X		X	X	X
CM Per Click	X		X		X	X	X
Cost Per Click (CPC)	X		X				X
Session Duration	X			X						X
Page Views Per Session	X			X							X
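
Before calling any of the functions, it can help to confirm that a data frame has the columns a distribution needs. The following is a minimal sketch in base R. The helper missing_columns() and the lookup list required_columns are hypothetical names introduced for illustration, not part of grizbayr. The list covers only four distributions, transcribed from the table above.

```r
# Hypothetical lookup (not part of grizbayr): required columns for a few
# distributions, transcribed from the table above.
required_columns <- list(
  conversion_rate = c("option_name", "sum_clicks", "sum_conversions"),
  response_rate   = c("option_name", "sum_sessions", "sum_conversions"),
  ctr             = c("option_name", "sum_impressions", "sum_clicks"),
  cpc             = c("option_name", "sum_clicks", "sum_cost")
)

# Hypothetical helper: returns the required columns that are absent from df.
missing_columns <- function(df, distribution) {
  setdiff(required_columns[[distribution]], names(df))
}

example_df <- data.frame(
  option_name = c("A", "B"),
  sum_clicks = c(15, 10),
  sum_conversions = c(5, 4)
)

missing_columns(example_df, "conversion_rate")  # nothing missing
missing_columns(example_df, "ctr")              # sum_impressions is missing
```

An empty result means the data frame has every column listed for that distribution.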

Example:

We will use the Conversion Rate distribution for this example so we need the columns option_name, sum_clicks, and sum_conversions.

library(grizbayr)
library(dplyr) # attaches the %>% pipe used below

raw_data_long_format <- tibble::tribble(
   ~option_name, ~clicks, ~conversions,
            "A",       6,           3,
            "A",       1,           0,
            "B",       2,           1,
            "A",       2,           0,
            "A",       1,           0,
            "B",       5,           2,
            "A",       1,           0,
            "B",       1,           1,
            "B",       1,           0,
            "A",       3,           1,
            "B",       1,           0,
            "A",       1,           1
)

raw_data_long_format %>% 
  dplyr::group_by(option_name) %>% 
  dplyr::summarise(sum_clicks = sum(clicks), 
                   sum_conversions = sum(conversions))
#> # A tibble: 2 x 3
#>   option_name sum_clicks sum_conversions
#>   <chr>            <dbl>           <dbl>
#> 1 A                   15               5
#> 2 B                   10               4

This input dataframe can also be created manually if the aggregations are already done in an external program.

# Since this is a stochastic process with a random number generator,
# it is worth setting the seed to get consistent results.
set.seed(1776)

input_df <- tibble::tibble(
  option_name = c("A", "B", "C"),
  sum_clicks = c(1000, 1000, 1000),
  sum_conversions = c(100, 120, 110)
)
input_df
#> # A tibble: 3 x 3
#>   option_name sum_clicks sum_conversions
#>   <chr>            <dbl>           <dbl>
#> 1 A                 1000             100
#> 2 B                 1000             120
#> 3 C                 1000             110

One note: sum_clicks (or sum_sessions) must be greater than or equal to sum_conversions, because the conversion rate is bounded between 0 and 1.
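
This constraint is easy to check before any sampling is done. The following base R sketch rebuilds the same counts in a plain data frame (named input_check here so that it does not overwrite input_df) and confirms that every observed rate falls between 0 and 1.

```r
# Same counts as input_df, rebuilt as a plain data frame for this check.
input_check <- data.frame(
  option_name = c("A", "B", "C"),
  sum_clicks = c(1000, 1000, 1000),
  sum_conversions = c(100, 120, 110)
)

# Each option needs at least as many clicks as conversions.
valid_rows <- input_check$sum_clicks >= input_check$sum_conversions
stopifnot(all(valid_rows))

# Observed conversion rates; each must lie in [0, 1].
observed_rate <- input_check$sum_conversions / input_check$sum_clicks
observed_rate
```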

input_df is used in the following examples.

Estimate All Metrics

This function wraps all of the functions below into one call.

estimate_all_values(input_df, distribution = "conversion_rate", wrt_option_lift = "A")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> $`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.726  72.59%  
#> 2 C                 0.228  22.76%  
#> 3 A                 0.0465 4.65%   
#> 
#> $`Value Remaining`
#>       95% 
#> 0.1328629 
#> 
#> $`Lift vs Baseline`
#>       30% 
#> 0.1205052 
#> 
#> $`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.923  92.25%  
#> 2 A                 0.0775 7.75%

Win Probability

This produces a tibble with one row per option. It contains the option name, win_prob_raw (the win probability as a double, for use in further calculations), and win_prob (the same value formatted as a percent string).

estimate_win_prob(input_df, distribution = "conversion_rate")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.725  72.47%  
#> 2 C                 0.228  22.82%  
#> 3 A                 0.0471 4.71%
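
To show the idea behind a win probability, here is a rough base R sketch. It is not grizbayr's implementation. It assumes a uniform Beta(1, 1) prior, which may differ from the package's default priors, and it draws 50,000 samples per option. The win probability of an option is estimated as the share of draws in which that option has the highest sampled conversion rate.

```r
set.seed(1776)
n_samples <- 50000  # assumed number of draws per option

clicks <- c(A = 1000, B = 1000, C = 1000)
conversions <- c(A = 100, B = 120, C = 110)

# Assumed prior: Beta(1, 1). The posterior for each option is then
# Beta(1 + conversions, 1 + clicks - conversions).
posterior_samples <- sapply(names(clicks), function(option) {
  rbeta(n_samples,
        1 + conversions[[option]],
        1 + clicks[[option]] - conversions[[option]])
})

# For each draw (row), find the column index of the highest sampled rate.
winner_index <- max.col(posterior_samples, ties.method = "first")

# Share of draws won by each option.
win_prob <- tabulate(winner_index, nbins = ncol(posterior_samples)) / n_samples
names(win_prob) <- colnames(posterior_samples)
sort(win_prob, decreasing = TRUE)
```

Because the prior and the random draws differ from those used by the package, the numbers will not match the output above exactly, but the ranking of the options should agree.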

Value Remaining (Loss)

Value Remaining is a measure of loss. If B is selected as the current best option, we can estimate with 95% confidence (the default) that no alternative option is more than X% better than B. In other words, choosing B now costs at most X% relative to the true best option.

estimate_value_remaining(input_df, distribution = "conversion_rate")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#>       95% 
#> 0.1317859

This number can also be framed in absolute dollar terms (or percentage points in the case of a rate metric).

estimate_value_remaining(input_df, distribution = "conversion_rate", metric = "absolute")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#>        95% 
#> 0.01429526

Estimate Lift

The metric argument defaults to lift, which produces a percent lift vs. the baseline. Sometimes we may want to understand this lift in absolute terms, especially when samples from the posteriors could be negative, as with Contribution Margin (CM).

estimate_lift_vs_baseline(input_df, distribution = "conversion_rate", wrt_option = "A")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#>      30% 
#> 0.120096

estimate_lift_vs_baseline(input_df, distribution = "conversion_rate", wrt_option = "A", metric = "absolute")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#>        30% 
#> 0.01268253
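
The 30% label in the output is a quantile of the lift samples: if the 30% quantile is X, then 70% of the sampled lifts are greater than X. A rough base R sketch of that calculation follows, assuming a Beta(1, 1) prior and 50,000 draws per option, and comparing B (the option with the highest observed rate) against baseline A. It is an illustration and not the package's code.

```r
set.seed(1776)
n_samples <- 50000  # assumed number of draws per option

# Posterior draws under an assumed Beta(1, 1) prior.
samples_a <- rbeta(n_samples, 1 + 100, 1 + 900)  # baseline A
samples_b <- rbeta(n_samples, 1 + 120, 1 + 880)  # option B

relative_lift <- (samples_b - samples_a) / samples_a  # fraction of A's rate
absolute_lift <- samples_b - samples_a                # difference in rate

# 30% quantile: 70% of sampled lifts are above this value.
lift_relative_30 <- quantile(relative_lift, 0.30)
lift_absolute_30 <- quantile(absolute_lift, 0.30)
lift_relative_30
lift_absolute_30
```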

Win Probability vs. Baseline

This function compares the best option against a single baseline option (wrt_option), rather than reporting the win probability of every option overall.

estimate_win_prob_vs_baseline(input_df, distribution = "conversion_rate", wrt_option = "A")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.924  92.45%  
#> 2 A                 0.0755 7.55%
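
With only two options in play, the comparison reduces to the share of posterior draws in which one option beats the other. The following base R sketch illustrates this for B against baseline A. It assumes a Beta(1, 1) prior and 50,000 draws per option, so it is an approximation of the idea and not the package's implementation.

```r
set.seed(1776)
n_samples <- 50000  # assumed number of draws per option

# Posterior draws under an assumed Beta(1, 1) prior.
samples_a <- rbeta(n_samples, 1 + 100, 1 + 900)  # baseline A
samples_b <- rbeta(n_samples, 1 + 120, 1 + 880)  # option B

# Share of draws in which B's sampled rate exceeds A's.
win_prob_b_vs_a <- mean(samples_b > samples_a)
win_prob_a_vs_b <- 1 - win_prob_b_vs_a
c(B = win_prob_b_vs_a, A = win_prob_a_vs_b)
```

Option C plays no role here, which is why B's probability of beating A alone is higher than its overall win probability.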

Sample From the Posterior

Samples can be directly collected from the posterior with the following function.

sample_from_posterior(input_df, distribution = "conversion_rate")
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> # A tibble: 150,000 x 3
#>    option_name samples sample_id
#>    <chr>         <dbl>     <int>
#>  1 A            0.0991         1
#>  2 A            0.109          2
#>  3 A            0.106          3
#>  4 A            0.0923         4
#>  5 A            0.102          5
#>  6 A            0.113          6
#>  7 A            0.103          7
#>  8 A            0.102          8
#>  9 A            0.0875         9
#> 10 A            0.0924        10
#> # … with 149,990 more rows
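
Samples in this long format (one row per option and draw) are convenient for custom summaries. The sketch below builds a small stand-in data frame with the same three columns using base R, since it does not call sample_from_posterior() itself. The Beta(1, 1) prior and the 1,000 draws per option are assumptions. It then computes the posterior mean and a 95% interval for each option.

```r
set.seed(1776)
n_samples <- 1000  # assumed; kept small for the illustration

# Stand-in with the same columns as the output above (option_name, samples,
# sample_id), generated under an assumed Beta(1, 1) prior.
stand_in <- data.frame(
  option_name = rep(c("A", "B"), each = n_samples),
  samples = c(rbeta(n_samples, 1 + 100, 1 + 900),
              rbeta(n_samples, 1 + 120, 1 + 880)),
  sample_id = rep(seq_len(n_samples), times = 2)
)

# Posterior mean and 95% interval per option; one row per option.
posterior_summary <- t(sapply(
  split(stand_in$samples, stand_in$option_name),
  function(x) {
    c(mean = mean(x),
      lower = unname(quantile(x, 0.025)),
      upper = unname(quantile(x, 0.975)))
  }
))
posterior_summary
```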

Looping Over All Distributions

You may want to evaluate the results of a test under several different distributions.

(input_df_all <- tibble::tibble(
   option_name = c("A", "B", "C"),
   sum_impressions = c(10000, 9000, 11000),
   sum_sessions = c(1000, 1000, 1000),
   sum_conversions = c(100, 120, 110),
   sum_revenue = c(900, 1200, 1150),
   sum_cost = c(10, 50, 30),
   sum_conversions_2 = c(10, 8, 20),
   sum_revenue_2 = c(10, 16, 15)
) %>% 
  dplyr::mutate(sum_clicks = sum_sessions)) # Clicks are the same as Sessions
#> # A tibble: 3 x 9
#>   option_name sum_impressions sum_sessions sum_conversions sum_revenue sum_cost
#>   <chr>                 <dbl>        <dbl>           <dbl>       <dbl>    <dbl>
#> 1 A                     10000         1000             100         900       10
#> 2 B                      9000         1000             120        1200       50
#> 3 C                     11000         1000             110        1150       30
#> # … with 3 more variables: sum_conversions_2 <dbl>, sum_revenue_2 <dbl>,
#> #   sum_clicks <dbl>

distributions <- c("conversion_rate", "response_rate", "ctr", "rev_per_session", "multi_rev_per_session", "cpa", "total_cm", "cm_per_click", "cpc")

# purrr::map() applies a function to each element of a list or vector (similar to a for loop) and returns a list.
purrr::map(distributions,
           ~ estimate_all_values(input_df_all,
                                 distribution = .x,
                                 wrt_option_lift = "A",
                                 metric = "absolute")
)
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> Using default priors.
#> [[1]]
#> [[1]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.724  72.41%  
#> 2 C                 0.229  22.93%  
#> 3 A                 0.0466 4.66%   
#> 
#> [[1]]$`Value Remaining`
#>        95% 
#> 0.01443064 
#> 
#> [[1]]$`Lift vs Baseline`
#>        30% 
#> 0.01252419 
#> 
#> [[1]]$`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.922  92.22%  
#> 2 A                 0.0778 7.78%   
#> 
#> 
#> [[2]]
#> [[2]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.726  72.57%  
#> 2 C                 0.229  22.89%  
#> 3 A                 0.0454 4.54%   
#> 
#> [[2]]$`Value Remaining`
#>        95% 
#> 0.01414997 
#> 
#> [[2]]$`Lift vs Baseline`
#>        30% 
#> 0.01266849 
#> 
#> [[2]]$`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.924  92.39%  
#> 2 A                 0.0761 7.61%   
#> 
#> 
#> [[3]]
#> [[3]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.993  99.34%  
#> 2 A                 0.0066 0.66%   
#> 3 C                 0      0%      
#> 
#> [[3]]$`Value Remaining`
#> 95% 
#>   0 
#> 
#> [[3]]$`Lift vs Baseline`
#>         30% 
#> 0.008780786 
#> 
#> [[3]]$`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.993  99.34%  
#> 2 A                 0.0066 0.66%   
#> 
#> 
#> [[4]]
#> [[4]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.578  57.77%  
#> 2 C                 0.398  39.81%  
#> 3 A                 0.0241 2.41%   
#> 
#> [[4]]$`Value Remaining`
#>       95% 
#> 0.3055693 
#> 
#> [[4]]$`Lift vs Baseline`
#>       30% 
#> 0.1960611 
#> 
#> [[4]]$`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.938  93.8%   
#> 2 A                 0.0620 6.2%    
#> 
#> 
#> [[5]]
#> [[5]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.584  58.4%   
#> 2 C                 0.394  39.38%  
#> 3 A                 0.0222 2.22%   
#> 
#> [[5]]$`Value Remaining`
#>       95% 
#> 0.3012504 
#> 
#> [[5]]$`Lift vs Baseline`
#>       30% 
#> 0.2060909 
#> 
#> [[5]]$`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.944  94.41%  
#> 2 A                 0.0559 5.59%   
#> 
#> 
#> [[6]]
#> [[6]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 A                      1 100%    
#> 2 B                      0 0%      
#> 3 C                      0 0%      
#> 
#> [[6]]$`Value Remaining`
#> 95% 
#>   0 
#> 
#> [[6]]$`Lift vs Baseline`
#> 30% 
#>   0 
#> 
#> [[6]]$`Win Probability vs Baseline`
#> # A tibble: 1 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 A                      1 100%    
#> 
#> 
#> [[7]]
#> [[7]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.534  53.38%  
#> 2 C                 0.429  42.9%   
#> 3 A                 0.0373 3.73%   
#> 
#> [[7]]$`Value Remaining`
#>      95% 
#> 336.4168 
#> 
#> [[7]]$`Lift vs Baseline`
#>      30% 
#> 155.9331 
#> 
#> [[7]]$`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.906  90.55%  
#> 2 A                 0.0945 9.45%   
#> 
#> 
#> [[8]]
#> [[8]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.534  53.44%  
#> 2 C                 0.432  43.18%  
#> 3 A                 0.0337 3.37%   
#> 
#> [[8]]$`Value Remaining`
#>       95% 
#> 0.3244191 
#> 
#> [[8]]$`Lift vs Baseline`
#>       30% 
#> 0.1576408 
#> 
#> [[8]]$`Win Probability vs Baseline`
#> # A tibble: 2 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 B                 0.911  91.11%  
#> 2 A                 0.0889 8.89%   
#> 
#> 
#> [[9]]
#> [[9]]$`Win Probability`
#> # A tibble: 3 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 A                      1 100%    
#> 2 B                      0 0%      
#> 3 C                      0 0%      
#> 
#> [[9]]$`Value Remaining`
#> 95% 
#>   0 
#> 
#> [[9]]$`Lift vs Baseline`
#> 30% 
#>   0 
#> 
#> [[9]]$`Win Probability vs Baseline`
#> # A tibble: 1 x 3
#>   option_name win_prob_raw win_prob
#>   <chr>              <dbl> <chr>   
#> 1 A                      1 100%
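
The results above are printed as [[1]] through [[9]], so each entry has to be matched back to its distribution by position. Naming the list by distribution makes the results easier to look up. The base R sketch below shows the pattern with a stand-in function, fake_estimate(), which is hypothetical and takes the place of the call to estimate_all_values() so that the example runs without the package.

```r
distributions <- c("conversion_rate", "response_rate", "ctr")

# Hypothetical stand-in for the real estimate_all_values() call.
fake_estimate <- function(distribution) {
  list(distribution = distribution)
}

# lapply() is the base R counterpart of purrr::map(); setNames() labels each
# result with the distribution that produced it.
results <- setNames(lapply(distributions, fake_estimate), distributions)

names(results)
results[["ctr"]]
```

The same idea should carry over to the purrr version, for example by passing a named vector to purrr::map(), which keeps the names of its input.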