Bernoulli Trials - Difference Between Two Processes

The Data

Independent observations with two possible outcomes on a trial. And some characteristic that allows separating the trials into to groups. Imagine an underlying process difference that might account for differences in success (e.g., coins from different mints have a different bias) and a single process \(\delta\) to characterize the difference between them.

Simulate a data set data with k successes in n trials and \(\theta\) processes. Start with 10 trials and a success rate of 90% for one group and 10% for the other.

The data contains a single process with observations and a descriptive outcome label. One observation per row.

## # A tibble: 20 x 3
##    process   obs outcome
##      <chr> <dbl>   <chr>
##  1 theta_2     0 failure
##  2 theta_2     0 failure
##  3 theta_2     0 failure
##  4 theta_2     0 failure
##  5 theta_2     1 success
##  6 theta_2     0 failure
##  7 theta_2     0 failure
##  8 theta_2     0 failure
##  9 theta_2     0 failure
## 10 theta_2     0 failure
## 11 theta_1     1 success
## 12 theta_1     1 success
## 13 theta_1     1 success
## 14 theta_1     0 failure
## 15 theta_1     1 success
## 16 theta_1     1 success
## 17 theta_1     1 success
## 18 theta_1     1 success
## 19 theta_1     1 success
## 20 theta_1     1 success

Use class(data) to learn the class of the r object.

## [1] "tbl_df"     "tbl"        "data.frame"

Use ggplot to generate a data graph showing successes, failures, and total observations.

Stan needs summarized data ( k successes in n trials) in a list. Create the list stan_data.

## $n1
## [1] 10
## 
## $k1
## [1] 9
## 
## $n2
## [1] 10
## 
## $k2
## [1] 1

Use class(stan_data) to learn the class of the r object.

## [1] "list"

The Model

A graphical representation of the model shows:

a continuous, unobserved “process” parameter \(\delta\)
a continuous, unobserved “process” parameter \(\theta\)
a discrete, observable number of successes k
a discrete, observable number of observations n

The vector’s arrow shows dependency. Successes are dependent on both the underlying process \(\theta\) and the number of observations. The difference between \(\theta\)s is addressed by \(\delta\).

The assumption for \(\theta\) is that all possible rates are equally likely. The Beta distribution is set with 1 “success” and 1 “failure”.

The assumption of k is that the outcomes are from a Binomial distribution with \(\theta\) determining the rate for n observations.

The Stan model follows from the graphical model. The data block names n and k and sets lower boundaries. The parameters block names \(\theta\) and sets upper and lower boundaries. Transformed parameters block names \(\delta\) as the difference between the two \(\theta\) processes. The model block sets the prior distribution for \(\theta\) and the model for k1 and k2.

## 
## data { 
##   int<lower=1> n1; 
##   int<lower=1> n2; 
##   int<lower=0> k1;
##   int<lower=0> k2;
## } 
## parameters {
##   real<lower=0,upper=1> theta1;
##   real<lower=0,upper=1> theta2;
## } 
## transformed parameters {
##   real<lower=-1,upper=1> delta;
##   // deterministic variable delta is the difference
##   // between rates from two processes producing k1 and k2 outcomes
##   delta = theta1 - theta2;
## }
## model {
##   // Prior Distribution for Rate Theta
##   theta1 ~ beta(1, 1);
##   theta2 ~ beta(1, 1);
##   // Observed Counts
##   k1 ~ binomial(n1, theta1);
##   k2 ~ binomial(n2, theta2);
## }

Run Stan

“Run Stan with default settings and see what happens.”

(Andrew Gelman said this and I’m taking him at his word!)

Look at the object generated with class(samples_default). The S4 object contains lots of information beyond the model results.

## [1] "stanfit"
## attr(,"package")
## [1] "rstan"

And, simply running basic commands against the Stan object will produce reports and graphs specific to the class.

A summary table:

## Inference for Stan model: binomial_two_rates.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##          mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
## theta1   0.83    0.00 0.10   0.59   0.78   0.85   0.91   0.98  3418    1
## theta2   0.16    0.00 0.10   0.02   0.09   0.15   0.23   0.40  3021    1
## delta    0.67    0.00 0.14   0.34   0.59   0.68   0.77   0.90  3427    1
## lp__   -11.87    0.03 1.07 -14.71 -12.33 -11.53 -11.10 -10.84  1532    1
## 
## Samples were drawn using NUTS(diag_e) at Sat Dec  9 10:01:27 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

A plot of parameters

## ci_level: 0.8 (80% intervals)

## outer_level: 0.95 (95% intervals)

And a better plot of \(\theta\)

Bernoulli Trials - Difference Between Two Processes

12/06/2017

The Data

The Model

Run Stan