## Problem

Allan Rossman used to live along a golf course and collected the golf balls that landed in his yard. Most of these golf balls had a number on them. Allan tallied the numbers on the first 500 golf balls that landed in his yard one summer.

Specifically, he collected the following data:

- 137 golf balls numbered 1
- 138 golf balls numbered 2
- 107 golf balls numbered 3
- 104 golf balls numbered 4
- 14 “others” (either with a different number or with no number at all; we will ignore these balls for the purposes of this question)

Question: What is the distribution of these numbers? In particular, are the numbers 1, 2, 3, and 4 equally likely?

## My solutions

```
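library(tidyverse) # also attaches ggplot2 and dplyr
library(Cairo)
# number of simulations to run
NumberOfSims <- 10000
# number of balls numbered 1 to 4 (the 14 "others" are excluded)
NumberofBalls <- 500 - 14
# set the seed of the random number generator so the results are reproducible
set.seed(123)
# create empty vectors to store the simulated values
vec <- vector()  # one simulated sample of ball numbers
vec2 <- vector() # maximum frequency per simulation
vec3 <- vector() # minimum frequency per simulation
vec4 <- vector() # variance of the frequencies per simulation
theme_set(theme_minimal())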
```

### Simulation

```
# compute the maximum frequency for each simulation
for (j in 1:NumberOfSims) {
  for (i in 1:NumberofBalls) {
    vec[i] <- sample(1:4, 1)
  }
  vec2[j] <- max(table(vec))
}
# compute the minimum frequency for each simulation
for (j in 1:NumberOfSims) {
  for (i in 1:NumberofBalls) {
    vec[i] <- sample(1:4, 1)
  }
  vec3[j] <- min(table(vec))
}
# compute the range of the frequencies
# note: vec2 and vec3 come from separate simulation loops, so each value is
# the maximum of one simulated sample minus the minimum of another
range <- vec2 - vec3
# compute the variance of the frequencies for each simulation
for (j in 1:NumberOfSims) {
  for (i in 1:NumberofBalls) {
    vec[i] <- sample(1:4, 1)
  }
  vec4[j] <- var(table(vec))
}
df <- cbind(vec2, vec3, vec4, range) %>% as.data.frame()
colnames(df) <- c("max", "min", "variance", "range")
dim(df)
```

`## [1] 10000 4`

`head(df)`

```
##   max min variance range
## 1 136 112    11.67    24
## 2 136 117   268.33    19
## 3 138 116    43.00    22
## 4 133 108    95.00    25
## 5 129 116    73.67    13
## 6 137 111   107.00    26
```
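The three loops above draw separate samples for the maximum, the minimum, and the variance. As a minimal sketch of an alternative, the counts for each simulated sample can be drawn once and every statistic computed from that same sample. This swaps the `sample(1:4, 1)` loops for `rmultinom()`, which draws the four counts directly under the equal-probability assumption. The names `n_sims`, `n_balls`, `counts`, and `df_single` are introduced here for illustration only, and the numbers this produces will not match the output shown above.

```r
# Sketch only: one draw per simulated sample, all statistics from that draw.
set.seed(123)
n_sims  <- 10000      # same number of simulations as above
n_balls <- 500 - 14   # 486 balls numbered 1 to 4

# rmultinom() returns a 4 x n_sims matrix; each column holds the counts of
# the numbers 1 to 4 in one simulated sample of n_balls equally likely draws
counts <- rmultinom(n_sims, size = n_balls, prob = rep(1 / 4, 4))

df_single <- data.frame(
  max      = apply(counts, 2, max),
  min      = apply(counts, 2, min),
  variance = apply(counts, 2, var)
)
# here the range is taken within each simulated sample
df_single$range <- df_single$max - df_single$min

head(df_single)
```

With this layout each row of `df_single` describes a single simulated sample, so the range is always consistent with that row's maximum and minimum.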

### Minimum frequency

```
# observed vector
obs <- c(137, 138, 107, 104)
# calculate the test statistic
min(obs)
```

`## [1] 104`

```
# calculate the p-value: proportion of simulated minima at or below the observed minimum
a <- df %>% filter(min > min(obs)) %>% nrow()
pvalue <- 1- a/NumberOfSims
pvalue
```

`## [1] 0.1481`

`ggplot(aes(x = min), data = df) + geom_histogram(binwidth = 2) + geom_vline(xintercept = min(obs), size = 1.4, color = "#AFAFFF") + annotate("text", x = min(obs) - 5, y = 1000, label = " test statistic = 104 \n p-value = 0.1481", size = 4)`

### Variance

```
# calculate the test statistic
var(obs)
```

`## [1] 343`

```
# calculate the p-value: proportion of simulated variances strictly greater than the observed variance
a <- df %>% filter(variance > var(obs)) %>% nrow()
pvalue <- a/NumberOfSims
pvalue
```

`## [1] 0.0336`

```
# draw the graph
ggplot(aes(x = variance), data = df) +
  geom_histogram() +
  geom_vline(xintercept = var(obs), size = 1.4, color = "#AFAFFF") +
  annotate("text", x = var(obs) + 150, y = 600,
           label = " test statistic = 343 \n p-value = 0.0336", size = 4)
```

### Range

```
# calculate the test statistic
max(obs) - min(obs)
```

`## [1] 34`

```
# calculate the p-value: proportion of simulated ranges strictly greater than the observed range
a <- df %>% filter(range > max(obs) - min(obs)) %>% nrow()
pvalue <- a/NumberOfSims
pvalue
```

`## [1] 0.0742`

```
# draw the graph
ggplot(aes(x = range), data = df) +
  geom_histogram() +
  geom_vline(xintercept = max(obs) - min(obs), size = 1.4, color = "#AFAFFF") +
  annotate("text", x = max(obs) - min(obs) + 8, y = 600,
           label = " test statistic = 34 \n p-value = 0.0742", size = 4)
```

I tried three test statistics in simulation-based hypothesis tests. The null hypothesis is that the numbers 1, 2, 3, and 4 are equally likely. The alternative hypothesis is that they are not equally likely.

Using the minimum frequency of a ball number among the 486 balls as the test statistic, I ran 10,000 simulations and made a histogram of the 10,000 simulated minimum frequencies. The observed test statistic is 104 and the p-value is 0.1481. Thus, at the 0.05 significance level, I fail to reject the null hypothesis that the numbers 1, 2, 3, and 4 are equally likely.

Using the variance of the frequencies of the numbers 1, 2, 3, and 4 as the test statistic, I ran 10,000 simulations and made a histogram of the 10,000 simulated variances. The observed test statistic is 343 and the p-value is 0.0336. Thus, at the 0.05 significance level, I reject the null hypothesis and conclude that the numbers 1, 2, 3, and 4 are not equally likely.
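As a cross-check that is not part of the analysis above: when all four expected counts are equal, the variance of the counts is a rescaled Pearson chi-squared statistic. With expected count E = 486 / 4 = 121.5 and k = 4 categories, the sum of (O - E)^2 / E equals (k - 1) * var(O) / E, because the mean of the observed counts is exactly E. The sketch below compares that rescaled variance with base R's `chisq.test()`; the names `expected`, `chisq_from_var`, and `fit` are introduced here for illustration.

```r
# observed counts for the numbers 1 to 4
obs <- c(137, 138, 107, 104)

# expected count per number under the null hypothesis
expected <- sum(obs) / length(obs)   # 121.5

# rescale the variance of the counts into a Pearson chi-squared statistic
chisq_from_var <- (length(obs) - 1) * var(obs) / expected

# goodness-of-fit test; for a vector, chisq.test() defaults to equal probabilities
fit <- chisq.test(obs)

chisq_from_var          # statistic obtained from the variance
unname(fit$statistic)   # statistic reported by chisq.test()
fit$p.value             # asymptotic p-value on 3 degrees of freedom
```

The two statistics agree, and the asymptotic p-value falls between 0.025 and 0.05, which is in line with the simulated p-value of 0.0336 for the variance statistic.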

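Using the range of the frequencies of the numbers 1, 2, 3, and 4 as the test statistic, I ran 10,000 simulations and made a histogram of the 10,000 simulated ranges. The observed test statistic is 34 and the p-value is 0.0742. Thus, at the 0.05 significance level, I fail to reject the null hypothesis that the numbers 1, 2, 3, and 4 are equally likely. Note that in the code above the simulated maximum and minimum come from separate simulation loops, so each simulated range pairs the maximum of one sample with the minimum of another.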
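Because each p-value is itself estimated from 10,000 simulations, it carries some Monte Carlo error. A rough way to gauge it, assuming the simulations are independent, is the binomial standard error sqrt(p(1 - p) / N). The helper `mc_se()` below is a hypothetical name introduced for this sketch, and the plus or minus 2 standard error band is only an approximation.

```r
# approximate Monte Carlo standard error of a simulated p-value
mc_se <- function(p_hat, n_sims) sqrt(p_hat * (1 - p_hat) / n_sims)

# p-values reported above for the three test statistics
p_hats <- c(min = 0.1481, variance = 0.0336, range = 0.0742)
se <- mc_se(p_hats, 10000)

# rough band of plus or minus 2 standard errors around each estimate
round(cbind(p_value = p_hats, se = se,
            lower = p_hats - 2 * se, upper = p_hats + 2 * se), 4)
```

Under this approximation the band for the variance statistic stays below 0.05 and the bands for the minimum and the range stay above it, so the three decisions at the 0.05 level do not appear to hinge on simulation noise.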