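The code consists of two functions. The first is simulateCLT(n, m, method, df), where n is the sample size, m is the number of simulations, method is the sampling distribution (either uniform or student), and df is the degrees of freedom of the t distribution (default 5; it has no effect for the uniform distribution). The function generates m realizations of the left-hand side of the central limit theorem, which, as computed in the code, is $T_n = \sum_{i=1}^{n} X_i / \sqrt{n S_n^2}$, where $S_n^2$ is the sample variance. Both distributions used here have mean zero, so no centering term is needed.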
simulateCLT <- function(n, m, method = "student", df = 5) {
  # n: sample size
  # m: number of simulations
  # method: sampling distribution, "uniform" or "student"
  # df: degrees of freedom of the t distribution (ignored for "uniform")
  if (method == "uniform") {
    x <- sapply(1:m, function(i) runif(n, -1, 1))
  } else if (method == "student") {
    x <- sapply(1:m, function(i) rt(n, df))
  } else {
    stop(gettextf("method = '%s' is not supported. Use 'uniform' or 'student'", method))
  }
  x <- as.data.frame(x)  # n rows, one column per simulated sample
  # Standardized sum of each sample (both distributions have mean 0)
  Tx <- sapply(x, function(y) sum(y) / sqrt(n * var(y)))
  Tx <- as.numeric(Tx)     # drop the column names
  Tx <- as.data.frame(Tx)  # one-column data frame with m rows
  return(Tx)
}
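As a quick sanity check of the statistic, the following minimal sketch recomputes it without the data frame plumbing and compares its sample mean and standard deviation with the values 0 and 1 expected under the normal approximation. The helper name clt_stat, the seed, and the choice n = 500, m = 1000 are assumptions made for illustration; they are not part of the original code.

```r
set.seed(1)  # assumed seed, only for reproducibility

n <- 500   # sample size (illustrative value)
m <- 1000  # number of simulations (illustrative value)

# Hypothetical helper: one realization of sum(y) / sqrt(n * var(y))
clt_stat <- function(y) sum(y) / sqrt(length(y) * var(y))

tx_unif <- replicate(m, clt_stat(runif(n, -1, 1)))
tx_t    <- replicate(m, clt_stat(rt(n, df = 5)))

# If the normal approximation holds, both rows should be near mean 0, sd 1
rbind(
  uniform = c(mean = mean(tx_unif), sd = sd(tx_unif)),
  student = c(mean = mean(tx_t),    sd = sd(tx_t))
)
```

Using replicate avoids building the intermediate n-by-m data frame, which may be convenient when m is large.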
simulatePlot <- function(fix = 'n', method = 'student') {
  # fix = 'n': hold the sample size at 500 and vary the number of simulations m
  # fix = 'm': hold the number of simulations at 500 and vary the sample size n
  if (!(fix %in% c('n', 'm')))
    stop(gettextf("fix = '%s' is not supported. Use 'n' or 'm'", fix))
  if (!(method %in% c('student', 'uniform')))
    stop(gettextf("method = '%s' is not supported. Use 'uniform' or 'student'", method))
  library(ggplot2)
  library(cowplot)
  iters <- c(50, 100, 500, 1000)
  plots <- vector("list", length(iters))
  for (i in seq_along(iters)) {
    k <- iters[i]  # value of whichever quantity is being varied
    if (fix == 'n') {
      Tx <- simulateCLT(500, k, method = method)
      label <- paste0("m=", k)
    } else {
      Tx <- simulateCLT(k, 500, method = method)
      label <- paste0("n=", k)
    }
    # Kernel density estimate of Tx (black) against the N(0, 1) density (red)
    plots[[i]] <- ggplot(Tx, aes(Tx)) +
      geom_density() +
      stat_function(fun = dnorm, color = "red", args = list(mean = 0, sd = 1)) +
      xlim(-4, 4) +
      ggtitle(label)
  }
  plot_grid(plotlist = plots)
}
simulatePlot(fix = 'n', method = 'student')
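The figure shows that, for the t distribution with the sample size fixed at n = 500, the two density curves move closer together as the number of simulations m increases, so the left-hand side of the central limit theorem is increasingly well approximated by $N(0,1)$.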
simulatePlot(fix = 'm', method = 'student')
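The figure shows that, for the t distribution with the number of simulations fixed at m = 500, the two density curves move closer together as the sample size n increases, so the left-hand side of the central limit theorem is increasingly well approximated by $N(0,1)$.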
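The visual comparison for the t distribution can be complemented with a number. The sketch below computes the Kolmogorov-Smirnov distance between the simulated statistic and $N(0,1)$ for several sample sizes. The helper names clt_stat and ks_distance, the seed, m = 2000, and the extra small sample sizes 2 and 5 (added only to make the contrast visible) are assumptions for illustration, not part of the original code.

```r
set.seed(2)  # assumed seed, only for reproducibility

m <- 2000                  # simulations per sample size (illustrative value)
sizes <- c(2, 5, 50, 500)  # sample sizes; 2 and 5 are added for contrast

# Hypothetical helper: one realization of sum(y) / sqrt(n * var(y))
clt_stat <- function(y) sum(y) / sqrt(length(y) * var(y))

# Kolmogorov-Smirnov distance between the simulated statistic and N(0, 1)
ks_distance <- function(n, df = 5) {
  tx <- replicate(m, clt_stat(rt(n, df)))
  unname(ks.test(tx, "pnorm")$statistic)
}

ks <- sapply(sizes, ks_distance)
data.frame(n = sizes, ks_distance = ks)
```

For the larger sample sizes the distance is likely dominated by simulation noise, so the values for n = 50 and n = 500 need not be ordered in any particular run.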
simulatePlot(fix = 'n', method = 'uniform')
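The figure shows that, for the uniform distribution with the sample size fixed at n = 500, the two density curves move closer together as the number of simulations m increases, so the left-hand side of the central limit theorem is increasingly well approximated by $N(0,1)$.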
simulatePlot(fix = 'm', method = 'uniform')
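The figure shows that, for the uniform distribution with the number of simulations fixed at m = 500, the two density curves move closer together as the sample size n increases, so the left-hand side of the central limit theorem is increasingly well approximated by $N(0,1)$.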
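For both the t distribution and the uniform distribution, increasing either the sample size or the number of simulations brings the estimated density of the left-hand side of the central limit theorem closer to that of $N(0,1)$.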
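When reading this conclusion it may help to keep in mind that n and m play different roles: the distribution of the statistic depends only on n, while m controls how accurately the kernel density estimate recovers that distribution. The sketch below illustrates the second effect for the uniform case by measuring the largest gap between the estimated density and the $N(0,1)$ density at a fixed n. The helper names clt_stat and density_gap and the seed are assumptions, and density() with its default bandwidth is used as a stand-in that should be close to, but is not guaranteed to match, what geom_density draws.

```r
set.seed(3)  # assumed seed, only for reproducibility

n <- 500                       # fixed sample size, as in simulatePlot
sims <- c(50, 100, 500, 1000)  # numbers of simulations, as in iters

# Hypothetical helper: one realization of sum(y) / sqrt(n * var(y))
clt_stat <- function(y) sum(y) / sqrt(length(y) * var(y))

# Largest gap between the kernel density estimate and dnorm on [-4, 4]
density_gap <- function(m) {
  tx <- replicate(m, clt_stat(runif(n, -1, 1)))
  est <- density(tx, from = -4, to = 4, n = 512)
  max(abs(est$y - dnorm(est$x)))
}

gaps <- sapply(sims, density_gap)
data.frame(m = sims, max_gap = gaps)
```

The gap usually shrinks as m grows, although it is itself random and need not decrease in every run.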