Power calculations

Power Calculations: An Introduction to Types of Error

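A Type I error occurs when we reject the null hypothesis even though it is true. This is also called a false positive. If we reject whenever the p-value is below 0.05, then in situations where the null really is true we will still reject it about 5% of the time (1 out of 20). The p-value is almost never exactly 0, so a small p-value does not prove the null is false. It only says that data at least as extreme as ours would be unlikely if the null were true.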
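As a quick check of this reasoning, here is a minimal simulation sketch in which the null hypothesis is true by construction. Both groups come from the same hypothetical normal population (mean 0, sd 1, 12 per group, values chosen only for illustration). The proportion of p-values below 0.05 should land near 0.05, i.e. about 1 false positive in 20.

```r
# Null is TRUE by construction: both groups share one hypothetical population.
set.seed(1)
n_sim <- 2000
alpha <- 0.05
pvals <- replicate(n_sim, {
    x <- rnorm(12, mean = 0, sd = 1)
    y <- rnorm(12, mean = 0, sd = 1)
    t.test(x, y)$p.value
})
# Every rejection here is a false positive (Type I error).
type1_rate <- mean(pvals < alpha)
type1_rate
```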

On the other hand, I wondered: “If a cutoff of 0.05 means we will sometimes reject the null hypothesis even when it is true, why not set the cutoff (the significance level, alpha) much lower, say 0.0001?” The catch is that the stricter the cutoff, the more often we fail to reject the null hypothesis when it is actually false. This is called a Type II error, or a false negative.
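A similar sketch shows the Type II side. Here the null is false by construction: one hypothetical group is shifted by one standard deviation (10 per group, values chosen only for illustration). Reusing the same simulated p-values, the share of misses (p at or above the cutoff) should be much larger at alpha = 0.0001 than at alpha = 0.05.

```r
# Null is FALSE by construction: hypothetical groups differ by 1 SD.
set.seed(2)
n_sim <- 2000
pvals <- replicate(n_sim, {
    x <- rnorm(10, mean = 1, sd = 1)
    y <- rnorm(10, mean = 0, sd = 1)
    t.test(x, y)$p.value
})
# A Type II error is a failure to reject a false null: p >= alpha.
type2_rate <- function(alpha) mean(pvals >= alpha)
type2_rate(0.05)    # misses at the conventional cutoff
type2_rate(0.0001)  # far more misses at a much stricter cutoff
```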

The 0.05 and 0.01 cutoffs are arbitrary

Usually, we reject the null hypothesis when the p-value is below 0.05 or 0.01. The reasoning behind these particular values is rarely stated clearly: they happen to be the numbers used in the original papers on the topic, and they have become a custom ever since.
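One way to see what the conventional cutoffs mean in practice, assuming a two-sided test with an approximately normal test statistic, is to look at the critical values they imply: a result has to sit about 1.96 standard errors from the null to clear 0.05, and about 2.58 to clear 0.01.

```r
# Two-sided critical values of the standard normal for the usual cutoffs.
alphas <- c(0.05, 0.01)
z_crit <- qnorm(1 - alphas / 2)
round(z_crit, 3)
# [1] 1.960 2.576
```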

Here I attempt to retrace what I understand about the values:

Power calculations

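# Simulate one experiment: draw N individuals from each population, run a
# t-test, and return TRUE if the null is rejected at level alpha.
# hfPopulation and controlPopulation are assumed to be defined earlier
# (the full population vectors for the hf and control groups).
reject <- function(N, alpha = 0.05) {
    hf <- sample(hfPopulation, N)
    control <- sample(controlPopulation, N)
    pval <- t.test(hf, control)$p.value
    pval < alpha
}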
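hfPopulation and controlPopulation come from data that is not shown in this section. To try the same logic on its own, here is a self-contained sketch that swaps in hypothetical normal populations (sim_hf and sim_control, with made-up means and standard deviations, not the real data). reject_sim is a hypothetical name mirroring reject().

```r
# Hypothetical stand-in populations (NOT the real hf/control data).
set.seed(3)
sim_control <- rnorm(10000, mean = 24, sd = 3.5)
sim_hf      <- rnorm(10000, mean = 26, sd = 4.5)

# Same logic as reject(): sample N from each population, run a Welch
# t-test (the t.test default), and report whether p falls below alpha.
reject_sim <- function(N, alpha = 0.05) {
    hf <- sample(sim_hf, N)
    control <- sample(sim_control, N)
    t.test(hf, control)$p.value < alpha
}

# Estimated power at N = 12 from 2000 simulated experiments.
mean(replicate(2000, reject_sim(12)))
```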

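B <- 2000   # number of simulated experiments
N <- 12     # sample size per group
rejections <- replicate(B, reject(N))
mean(rejections)   # proportion of rejections = estimated power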

[1] 0.2245

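Drawing 12 individuals from each of the hf and control populations and repeating the experiment 2000 times, we reject the null hypothesis in only about 22% of the repetitions. Because the two populations do differ, this proportion is our estimate of the power at N = 12, and it is low: with samples this small, we miss the difference most of the time.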
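How precise is 0.2245? Each simulated experiment is a yes/no outcome, so the Monte Carlo standard error of the estimated power is sqrt(p(1 - p)/B). A quick calculation with the numbers above:

```r
# Monte Carlo standard error of a simulated power estimate.
p_hat <- 0.2245   # estimate reported above
B <- 2000         # number of simulated experiments
se <- sqrt(p_hat * (1 - p_hat) / B)
se                               # about 0.0093
p_hat + c(-1.96, 1.96) * se      # rough 95% interval, about 0.206 to 0.243
```

So the 2000 repetitions pin the power down to within roughly two percentage points, which is plenty to call it low.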

Let’s see how power improves as the sample size grows.

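Ns <- seq(5, 50, 5)   # sample sizes 5, 10, ..., 50

# Estimated power for each sample size (B simulated experiments per N)
power <- sapply(Ns, function(N) {
    rejections <- replicate(B, reject(N))
    mean(rejections)
})
plot(Ns, power, type = "b")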

Ns does not include 12 itself, but between N = 10 and N = 15 the estimated power is close to the value we computed above for N = 12. As the sample size grows, power increases accordingly.
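For comparison, base R's power.t.test() computes power for a two-sample t-test in closed form. It needs an assumed effect size; the sketch below uses a hypothetical difference in means of 2 with a standard deviation of 4 (a standardized effect of 0.5), not values estimated from the hf and control data. It also assumes equal variances, whereas t.test() defaults to Welch's test, so it approximates the simulation rather than matching it exactly.

```r
# Closed-form power under a HYPOTHETICAL effect: delta = 2, sd = 4.
Ns <- seq(5, 50, 5)
analytic_power <- sapply(Ns, function(n) {
    power.t.test(n = n, delta = 2, sd = 4, sig.level = 0.05)$power
})
round(analytic_power, 2)

# Per-group sample size needed for 80% power under the same assumptions.
n80 <- ceiling(power.t.test(power = 0.8, delta = 2, sd = 4, sig.level = 0.05)$n)
n80
```

Under these assumptions the last line gives about 64 per group, which shows how far beyond N = 50 one may need to go for a moderate effect.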

Power also depends on alpha. Holding the sample size fixed at N = 30, a stricter (smaller) alpha makes rejection harder, so power drops:

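N <- 30
alphas <- c(0.1, 0.05, 0.01, 0.001, 0.0001)

# Estimated power at each significance level, with N fixed at 30
power <- sapply(alphas, function(alpha) {
    rejections <- replicate(B, reject(N, alpha = alpha))
    mean(rejections)
})
# Log scale on the x-axis, since alpha spans several orders of magnitude
plot(alphas, power, xlab = "alpha", type = "b", log = "x")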
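The same trade-off can be sketched in closed form, again assuming the hypothetical effect (delta = 2, sd = 4) and N = 30 per group. Since the Type II error rate is 1 - power, the table makes the cost of a very small alpha explicit.

```r
# Closed-form power at each alpha, HYPOTHETICAL effect, N = 30 per group.
alphas <- c(0.1, 0.05, 0.01, 0.001, 0.0001)
power_by_alpha <- sapply(alphas, function(a) {
    power.t.test(n = 30, delta = 2, sd = 4, sig.level = a)$power
})
# Type II error rate (beta) is 1 - power; it grows as alpha shrinks.
data.frame(alpha = alphas,
           power = round(power_by_alpha, 3),
           beta  = round(1 - power_by_alpha, 3))
```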