## Titanic

The digibook Titanic is an introduction to the analysis of statistical data, based on data about the survivors and the victims of the Titanic tragedy.

There is a passenger list with name, age, gender, class, and whether or not the passenger survived.

A digibook offers complete teaching material: the introduction to the exercises, links to the apps, and the exercises that build on the apps. Students can work with this material almost without support.

Extra: the last part of the digibook offers a first introduction to the randomization method for a cross table.
Start the lesson at www.vustat.eu/digibook/titanic_en

## The bootstrap principle for proportions

With this app you can examine the bootstrap method for estimating the mean or standard deviation of a population, and also the difference between the means of two populations. This app is designed for continuous distributions. For yes-no distributions a separate app is available.

An easy exercise is to verify that the bootstrap does not always work well in extreme situations, for example with a small sample.
The upper screen shows the distribution. It can be changed with the mouse, for example into a two-peaked distribution.

Each time, a sample is drawn from the population and the bootstrap method is executed on that sample.
At slow speed each experiment takes two steps. First, a sample is drawn. The mean and the standard deviation of the sample are shown on the left.
The bootstrap method is carried out in the second step, so many draws are taken from the sample. The thickness of the border of each sample circle shows how many times that element was selected in the latest bootstrap. Each new bootstrap is again a draw with replacement from the sample. This leads to the bootstrap distribution. In the bootstrap method the confidence interval is the central part of this distribution. What proportion of the tails is left out depends on the confidence level (a percentage).
This interval is drawn in the third screen.
As more and more bootstraps are taken from the sample, this finally results in a new confidence interval.
The program then checks whether the population parameter really lies in the calculated interval. The bottom right shows the percentage of runs in which the method was successful.
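The second step can be sketched in a few lines of Python. This is only an illustration under assumptions, not the app's own code: the function name `bootstrap_interval`, the number of bootstraps and the example data are made up, and the interval is taken as the central part of the sorted bootstrap means (the percentile method).

```python
import random
import statistics


def bootstrap_interval(sample, level=0.95, n_boot=2000, rng=None):
    """Percentile bootstrap interval for the mean (illustrative sketch)."""
    rng = rng or random.Random()
    n = len(sample)
    # Each bootstrap is a draw of n elements WITH replacement from the sample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    # Leave out the same share of bootstrap means in both tails.
    cut = round((1 - level) / 2 * n_boot)
    return means[cut], means[n_boot - 1 - cut]


if __name__ == "__main__":
    # Hypothetical sample values, only for demonstration.
    data = [12.0, 15.5, 9.8, 20.1, 14.2, 11.7, 18.3, 13.9, 16.4, 10.5]
    print(bootstrap_interval(data, level=0.95, rng=random.Random(1)))
```

A lower confidence level cuts off more of both tails, so the interval becomes narrower.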

The root-n law also applies to the bootstrap distribution.

Confidence level
The confidence level can be changed during the simulation.
In theory you could choose any percentage for the confidence level, but because the level can be changed during the simulation, only a limited number of values is offered.

## The bootstrap principle

With this app you can examine the bootstrap method for estimating the mean or standard deviation of a population, and also the difference between the means of two populations. This app is designed for continuous distributions. For yes-no distributions a separate app is available.

An easy exercise is to verify that the bootstrap does not always work well in extreme situations, for example with a small sample.
The upper screen shows the distribution. It can be changed with the mouse, for example into a two-peaked distribution.

Each time, a sample is drawn from the population and the bootstrap method is executed on that sample.
At slow speed each experiment takes two steps. First, a sample is drawn. The mean and the standard deviation of the sample are shown on the left.
The bootstrap method is carried out in the second step, so many draws are taken from the sample. The thickness of the border of each sample circle shows how many times that element was selected in the latest bootstrap. Each new bootstrap is again a draw with replacement from the sample. This leads to the bootstrap distribution. In the bootstrap method the confidence interval is the central part of this distribution. What proportion of the tails is left out depends on the confidence level (a percentage).
This interval is drawn in the third screen.
As more and more bootstraps are taken from the sample, this finally results in a new confidence interval.
The program then checks whether the population parameter really lies in the calculated interval. The bottom right shows the percentage of runs in which the method was successful.

The root-n law also applies to the bootstrap distribution.

Confidence level
The confidence level can be changed during the simulation.
In theory you could choose any percentage for the confidence level, but because the level can be changed during the simulation, only a limited number of values is offered.
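The whole experiment, including the success percentage, can be imitated with a short simulation. The sketch below is not the app's code. It assumes a normal population with mean 50 and standard deviation 10, a percentile interval for the mean, and made-up names (`percentile_interval`, `coverage`) and repetition counts.

```python
import random
import statistics


def percentile_interval(sample, level, n_boot, rng):
    """Central part of the bootstrap distribution of the mean."""
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    cut = round((1 - level) / 2 * n_boot)
    return means[cut], means[n_boot - 1 - cut]


def coverage(n, level=0.95, experiments=300, n_boot=400,
             mu=50.0, sigma=10.0, seed=1):
    """Share of experiments whose interval contains the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        # Step 1: draw a sample from the (assumed normal) population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # Step 2: run the bootstrap on that sample.
        lo, hi = percentile_interval(sample, level, n_boot, rng)
        # Check whether the population parameter lies in the interval.
        hits += lo <= mu <= hi
    return hits / experiments


if __name__ == "__main__":
    for size in (5, 30):
        print(size, coverage(size))
```

Comparing the output for a small and a larger sample size is one way to examine the claim that the bootstrap does not always work well with a small sample.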

## Introduction bootstrap and randomization test

Resampling means that the original dataset is used to generate new samples, the results of which can be analyzed. Bootstrap and randomization are two examples of resampling methods.
Bootstrap is used to estimate confidence intervals.
Randomization is used to perform tests.

Bootstrap
To bootstrap means to draw many samples with replacement from the original data. Each sample has the same size as the original data.
The idea behind the bootstrap is that the real population is best approximated by a population that consists of infinitely many copies of the original sample. Drawing from that population comes down to drawing a new sample from the original sample with replacement.

In the figure below, the width of the border shows the number of times an object is chosen. Each sample provides a new mean or median.
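A single bootstrap sample, together with the number of times each object is chosen, can be sketched as follows. The function name `bootstrap_counts` and the data values are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def bootstrap_counts(data, rng=None):
    """Draw one bootstrap sample and count how often each object is chosen."""
    rng = rng or random.Random()
    n = len(data)
    # Pick n positions with replacement: same size as the original data.
    picked = [rng.randrange(n) for _ in range(n)]
    counts = Counter(picked)
    resample = [data[i] for i in picked]
    # counts_per_object[i] plays the role of the border width of object i.
    counts_per_object = [counts[i] for i in range(n)]
    return resample, counts_per_object


if __name__ == "__main__":
    data = [3, 7, 8, 12, 15]  # hypothetical original sample
    resample, counts = bootstrap_counts(data, random.Random(2))
    print(resample, counts, sum(resample) / len(resample))
```

Some objects are chosen several times and others not at all, which is why every bootstrap sample gives a slightly different mean or median.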

The bootstrap is used to determine the confidence interval for a mean, a proportion, the difference between two means, the difference between two proportions, and the slope of a regression line. The bootstrap is a very general method, applicable in many situations.

Randomization tests
A test is about a research question. An example of a research question is: are there real differences between two groups, or can the difference be explained by chance? Membership of the group is then the explanatory variable. If the assumption is true that membership has no influence on the observed variable, the allocation to the groups might just as well have been random. The randomization test carries out this random allocation a large number of times, and in that way you gain insight into the deviation that can be expected by chance.
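A randomization test for the difference between two group means can be sketched like this. It is a minimal illustration under assumptions: the function name `randomization_test`, the number of reallocations and the measurements are made up, and the difference in means is only one possible test statistic.

```python
import random
import statistics


def randomization_test(group_a, group_b, n_shuffles=5000, seed=None):
    """Return the observed difference in means and the share of random
    allocations that give a difference at least as large (two-sided)."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Random allocation: group membership has no influence by construction.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_shuffles


if __name__ == "__main__":
    # Hypothetical measurements for two groups.
    a = [23.1, 25.4, 22.8, 27.0, 24.6, 26.2]
    b = [21.0, 22.3, 20.5, 23.9, 21.7, 22.8]
    print(randomization_test(a, b, seed=1))
```

A small share means that random allocation alone rarely produces a difference as large as the observed one.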

For many situations traditional tests such as the t-test are also available. The advantage of these resampling methods, from a pedagogical point of view, is that it is not necessary to discuss concepts such as distributions in detail before using the concepts of confidence interval and test. The methods are quite robust.

Extra information
A lot of extra information is available on the web. Many excellent articles have been published on teaching simulation-based inference, for example on the blog www.causeweb.org/sbi

## Randomization and bootstrap in contingency tables.

| | Baseball | Soccer | Total |
|---|---|---|---|
| Girl | 100 | 250 | 350 |
| Boy | 100 | 400 | 500 |
| Total | 200 | 650 | 850 |

The procedure is explained on the basis of the cross table above.

Randomization

The goal of randomization is to examine the hypothesis that the two variables, in this case sex and sport, are independent. The hypothesis of no association is simulated by starting from 850 people. The property boy or girl is distributed among the 850 people, so 350 are “girl” and 500 are “boy”. Independently of sex, the property Baseball or Soccer is distributed among the same 850 students according to the totals 200 and 650.

A cross table is made of this new distribution. The marginal totals remain the same. The results of one draw are shown in the middle window.

This procedure is carried out many times. The results are shown in the lowest window. The red line in the graph is the result for the cross table in the upper window. If the red line is far away from the simulated outcomes, there is reason to say that the two variables are not independent.
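The randomization procedure for this cross table can be sketched as follows. This is not the app's implementation. As an assumption, the number of girls who play baseball is used as the statistic, since in a 2 by 2 table with fixed marginal totals that one cell determines the other three. The function name is hypothetical.

```python
import random


def simulate_girl_baseball_counts(n_sim=2000, seed=None):
    """Redistribute the sports at random over the 850 people and record
    the number of girls who play baseball in each simulated table."""
    rng = random.Random(seed)
    sex = ["Girl"] * 350 + ["Boy"] * 500            # row totals stay fixed
    sport = ["Baseball"] * 200 + ["Soccer"] * 650   # column totals stay fixed
    counts = []
    for _ in range(n_sim):
        rng.shuffle(sport)  # sport is allocated independently of sex
        counts.append(
            sum(1 for s, p in zip(sex, sport)
                if s == "Girl" and p == "Baseball")
        )
    return counts


if __name__ == "__main__":
    counts = simulate_girl_baseball_counts(seed=1)
    observed = 100  # girls playing baseball in the original table
    share = sum(c >= observed for c in counts) / len(counts)
    print(sum(counts) / len(counts), share)
```

The simulated counts centre around 350 × 200 / 850 ≈ 82, so the observed value of 100 plays the role of the red line that lies far from most simulated outcomes.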

Several properties of a cross table can be determined: the counts and percentages (total, row, column) for each cell of the cross table, the chi-squared statistic and, for 2 by 2 contingency tables, the relative risk and the difference between two proportions.

The chi-squared statistic has the following form: $\chi^2 = \sum \frac{(f - e)^2}{e}$, in which $e$ is the expected frequency and $f$ is the observed frequency, summed over all the inner cells. The expected frequencies are shown in the upper table when the checkbox perfect table is checked.
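For the table above, the expected frequencies and the chi-squared statistic can be computed in a few lines. The sketch assumes the usual definition of the expected frequency under independence (row total times column total divided by the grand total) and sums (f − e)² / e over the inner cells. The function name `chi_squared` is made up.

```python
def chi_squared(observed):
    """Chi-squared statistic and expected frequencies for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    expected = []
    for r, row in enumerate(observed):
        expected_row = []
        for c, f in enumerate(row):
            # Expected frequency if the two variables were independent.
            e = row_totals[r] * col_totals[c] / total
            expected_row.append(e)
            chi2 += (f - e) ** 2 / e
        expected.append(expected_row)
    return chi2, expected


if __name__ == "__main__":
    # Inner cells of the table: rows Girl, Boy; columns Baseball, Soccer.
    chi2, expected = chi_squared([[100, 250], [100, 400]])
    print(round(chi2, 2), [[round(e, 2) for e in row] for row in expected])
```

For this table the expected number of girls playing baseball is about 82.35 and the chi-squared statistic is about 8.41.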

For the relative risk, see Wikipedia: https://en.wikipedia.org/wiki/Relative_risk

The difference between two proportions is easily calculated. With small percentages the relative risk is often better suited to the situation.
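Both measures are simple to compute. The sketch below uses the table above and, as a second case, made-up small percentages (2 in 100 against 1 in 100) to show why the relative risk can be more informative there. The function name is hypothetical.

```python
def compare_proportions(events_a, total_a, events_b, total_b):
    """Return (difference of proportions, relative risk) of group a versus b."""
    p_a = events_a / total_a
    p_b = events_b / total_b
    return p_a - p_b, p_a / p_b


if __name__ == "__main__":
    # Baseball among girls (100 of 350) versus boys (100 of 500).
    print(compare_proportions(100, 350, 100, 500))
    # Made-up small percentages: 2% versus 1%.
    print(compare_proportions(2, 100, 1, 100))
```

In the second case the difference is only 0.01, while the relative risk of 2 shows that the event is twice as likely in the first group.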

The bootstrap method

The data for a cross table can be collected in several ways.

For example, a study on the size of families in the different countries of Europe can be designed in several ways. One design is to use a method in which each family in Europe has the same chance of being drawn. Another design is to decide ahead of time that 100 families are to be selected from each country.

With the bootstrap method, draws are taken from a very large population that consists of very many copies of the sample. The bootstrap should use the same sampling design as was used to obtain the sample.
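The two designs lead to two different ways of resampling. The sketch below applies both to the baseball and soccer table from this post. It is an illustration under assumptions, not the app's code: in the first function all 850 people are resampled together, in the second the row totals (350 girls, 500 boys) are treated as fixed in advance. The function names are made up.

```python
import random

# Inner cells of the cross table used in this post.
OBSERVED = {
    ("Girl", "Baseball"): 100, ("Girl", "Soccer"): 250,
    ("Boy", "Baseball"): 100, ("Boy", "Soccer"): 400,
}


def bootstrap_whole_table(rng):
    """Design 1: every person has the same chance; only the grand total is fixed."""
    cells = list(OBSERVED)
    weights = [OBSERVED[c] for c in cells]
    # Drawing from many copies of the sample = drawing with replacement.
    draws = rng.choices(cells, weights=weights, k=sum(weights))
    return {c: draws.count(c) for c in cells}


def bootstrap_by_row(rng):
    """Design 2: the number of girls and boys was decided ahead of time."""
    table = {}
    for sex in ("Girl", "Boy"):
        cells = [c for c in OBSERVED if c[0] == sex]
        weights = [OBSERVED[c] for c in cells]
        draws = rng.choices(cells, weights=weights, k=sum(weights))
        for c in cells:
            table[c] = draws.count(c)
    return table


if __name__ == "__main__":
    rng = random.Random(3)
    print(bootstrap_whole_table(rng))
    print(bootstrap_by_row(rng))
```

In the second design the row totals are the same in every bootstrap table, while in the first they vary from draw to draw.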

The results in the third window show how far the bootstrap results deviate from the result of the original sample.

## VUSTAT APPS at NC State University

From Hollylynne S. Lee

These apps look wonderful! I am immediately going to use them as a resource in my current graduate course on statistical thinking!

And I will make sure they are in our next MOOC! Love them and how you are making them accessible across languages. 🙂

Hollylynne S. Lee
Professor, Mathematics and Statistics Education
University Faculty Scholar
Department of Science, Technology, Engineering, and Mathematics Education
Faculty Fellow, Friday Institute for Educational Innovation

Hollylynne S. Lee organises the MOOC for Educators Teaching Statistics Through Data Investigation. We continue to serve teachers from across the globe and have over 400 participants in the current session (open until June 30, 2017).

In fall 2017 a new MOOC will start that focuses on Teaching Statistics Through Inferential Reasoning. See the course description and outline here.

## Multi-lingual

The vustat apps are available in thirteen languages:

0 English
1 German
2 Turkish
3 Dutch
4 Polish
5 Spanish
6 Swedish
7 French
8 Russian
9 Italian
10 Chinese
11 Japanese
12 Portuguese
You can force a language by passing its number as a parameter. For example, an app is started in Dutch by adding language=3 to the address, as in www.vustat.eu/apps/index.html?language=3
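If you create many such links, for instance for a class with several home languages, a small helper can build the addresses. The sketch below only reuses the language numbers and the address given above. The names `LANGUAGE_CODES` and `app_url` are made up.

```python
from urllib.parse import urlencode

# Language numbers as listed above.
LANGUAGE_CODES = {
    "English": 0, "German": 1, "Turkish": 2, "Dutch": 3, "Polish": 4,
    "Spanish": 5, "Swedish": 6, "French": 7, "Russian": 8, "Italian": 9,
    "Chinese": 10, "Japanese": 11, "Portuguese": 12,
}


def app_url(language, base="www.vustat.eu/apps/index.html"):
    """Append the language parameter for the given language name."""
    query = urlencode({"language": LANGUAGE_CODES[language]})
    return f"{base}?{query}"


if __name__ == "__main__":
    print(app_url("Dutch"))
```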

A special app has been made for editing a multi-lingual app. All the JSON files can be made multi-lingual, although the JSON files for data analysis are different. The app www.vusoft.eu/apps/googledata lets you edit the words in the app and also the info texts. With this app you edit the JSON file directly, which is very tricky. If you do this, check that the file is still valid JSON after your changes, for example with a JSON validator such as http://jsonlint.com/
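Besides an online validator, the check can also be done locally with Python's standard `json` module. The sketch below is a minimal example. The function name `check_json` and the sample texts are hypothetical and do not reflect the real structure of the vustat files.

```python
import json


def check_json(text):
    """Return (True, message) if text is valid JSON, else (False, error position)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "valid JSON"


if __name__ == "__main__":
    good = '{"title": "Sampling", "language": 3}'
    bad = '{"title": "Sampling", "language": 3,}'  # trailing comma is not allowed
    print(check_json(good))
    print(check_json(bad))
```

To check a file, read its contents first (for example with `open(path, encoding="utf-8").read()`) and pass the text to the function.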

The files associated with data analysis are treated differently. See the post

## The square root law

Learn the square root law with the app Sampling Distribution.

Before students start with the square root law, they should first get a feeling that the SD of the mean becomes smaller as the sample size increases.

From my own classroom experience I learned that this is a surprising result for many students. I think this intuition must be established before their minds are ready for the root-n law.

The approach I’ve often used:

1) Use the app to show how the sampling distribution of the mean of a sample of six from the normal distribution is created.

2) Change the sample size to n = 100. Draw exactly one sample to make clear what a sample of 100 means.

3) Ask whether the SD of the sampling distribution gets smaller, gets larger or stays the same.

4) Let the computer simulate.

5) Let the students formulate their reaction to the result, and let them try to formulate their own explanation of it.

Discovering the square root law by themselves is too difficult for a lot of students. What can be done, however, is to have students check the square root law in different situations.
The last four distributions can be tailored with the mouse. Ask the students to make their own distribution with a sigma of 10, and then check the square root law in this situation.
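The same check can be done outside the app with a short simulation. The sketch below assumes a normal population with sigma 10 (the mean of 50 and the number of samples are arbitrary choices) and compares the simulated SD of the sample mean with sigma divided by the square root of n. The function name is made up.

```python
import math
import random
import statistics


def sd_of_sample_means(n, sigma=10.0, mu=50.0, n_samples=2000, seed=None):
    """Simulate many samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (6, 100):
        simulated = sd_of_sample_means(n, seed=1)
        predicted = 10.0 / math.sqrt(n)  # square root law: sigma / sqrt(n)
        print(n, round(simulated, 2), round(predicted, 2))
```

For n = 6 the law predicts an SD of about 4.08 and for n = 100 an SD of 1. The simulated values should be close to these.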

From the website www.vustat.eu/teaching/root_law.docx you can download a lesson that follows this path and does much more with the normal approximation to the sampling distribution.