Randomization and bootstrap in contingency tables.

        Baseball  Soccer  Total
Girl         100     250    350
Boy          100     400    500
Total        200     650    850


The procedure is explained below using the cross table above.


The goal of randomization is to test the hypothesis that two variables are independent, in this case sex and sport. The hypothesis of no association is simulated by starting from the 850 people. First the property boy or girl is distributed among the 850 people, so that 350 are “girl” and 500 are “boy”. Then, independently of sex, the property Baseball or Soccer is distributed among the same 850 students according to the column totals of 200 and 650.

A cross table is made from this new distribution. The marginal totals remain the same. The result of one draw is shown in the middle window.

This procedure is repeated many times. The results are shown in the lowest window. The red line in the graph marks the value for the cross table in the upper window. If the red line lies far from the simulated outcomes, there is reason to conclude that the two variables are not independent.
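The randomization steps can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code behind the tool: the statistic compared here is the count of girls playing baseball, the helper name `shuffle_once`, the seed and the number of draws are all hypothetical choices.

```python
import random

# Observed cross table from the text: rows are Girl, Boy; columns are Baseball, Soccer.
observed = [[100, 250], [100, 400]]

def shuffle_once(row_totals, col_totals, rng):
    """Build one simulated table with the given marginal totals under independence."""
    sex = [i for i, total in enumerate(row_totals) for _ in range(total)]
    sport = [j for j, total in enumerate(col_totals) for _ in range(total)]
    rng.shuffle(sport)  # sport is now assigned independently of sex
    table = [[0] * len(col_totals) for _ in row_totals]
    for s, p in zip(sex, sport):
        table[s][p] += 1
    return table

row_totals = [sum(row) for row in observed]        # [350, 500]
col_totals = [sum(col) for col in zip(*observed)]  # [200, 650]
n = sum(row_totals)                                # 850

rng = random.Random(1)  # fixed seed: an arbitrary choice, for reproducibility only
draws = 2000            # assumed number of repetitions

# Count of girls playing baseball that independence would give on average.
expected = row_totals[0] * col_totals[0] / n
observed_gap = abs(observed[0][0] - expected)

as_extreme = 0
for _ in range(draws):
    simulated = shuffle_once(row_totals, col_totals, rng)
    if abs(simulated[0][0] - expected) >= observed_gap:
        as_extreme += 1

share = as_extreme / draws
print(f"share of draws at least as far from {expected:.1f} as the observed count: {share}")
```

A small share means that the observed table lies far out in the tail of the simulated outcomes, which is the situation in which the red line lies far from the bulk of the graph.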

Several properties of a cross table can be determined: the counts and percentages (of total, row or column) for each cell, the chi-squared statistic, and, for 2 by 2 contingency tables, the relative risk and the difference between two proportions.

The chi-squared statistic has the following form: $\chi^2 = \sum \frac{(f - e)^2}{e}$, in which e is the expected frequency and f is the observed frequency, summed over all inner cells. The expected frequency is shown in the upper table when the checkbox "perfect table" is checked.
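As a worked example, the expected frequencies and the chi-squared statistic, the sum over the inner cells of (f - e)^2 / e, can be computed for the table above as follows. The function names are hypothetical, and the expected frequency of a cell is taken to be row total times column total divided by the grand total, which is the usual definition under independence.

```python
# Observed cross table: rows are Girl, Boy; columns are Baseball, Soccer.
observed = [[100, 250], [100, 400]]

def expected_table(table):
    """Expected frequencies under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum of (f - e)^2 / e over all inner cells."""
    expected = expected_table(table)
    return sum((f - e) ** 2 / e
               for f_row, e_row in zip(table, expected)
               for f, e in zip(f_row, e_row))

for row in expected_table(observed):
    print([round(e, 1) for e in row])
print(round(chi_squared(observed), 2))
```

For this table the expected frequencies are about 82.4 and 267.6 for girls and 117.6 and 382.4 for boys, and the chi-squared statistic is about 8.41.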

For the relative risk, see Wikipedia: https://en.wikipedia.org/wiki/Relative_risk

The difference between two proportions is easily calculated. With small percentages, the relative risk is often better suited to the situation.
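Both measures can be illustrated with the table above. In this sketch the proportion is the share of baseball players within each sex, and girls are taken as the numerator group for the relative risk; both choices are assumptions made for the example, and the tool may define the groups differently.

```python
# Observed cross table: rows are Girl, Boy; columns are Baseball, Soccer.
observed = [[100, 250], [100, 400]]

def first_column_shares(table):
    """Share of the first column (Baseball) within each row."""
    return [row[0] / sum(row) for row in table]

p_girl, p_boy = first_column_shares(observed)  # 100/350 and 100/500

difference = p_girl - p_boy      # difference between two proportions
relative_risk = p_girl / p_boy   # ratio of the two proportions

print(round(p_girl, 4), round(p_boy, 4))
print(round(difference, 4), round(relative_risk, 4))
```

Here the difference is about 0.086 and the relative risk about 1.43. With proportions such as 0.02 and 0.01, the difference is only 0.01 while the relative risk is 2, which shows why the ratio can be more informative when percentages are small.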


The bootstrap method

The data for a cross table can be collected in several ways.

For example, a study on the size of families in the different countries of Europe can be designed in several ways. One design is to use a method in which each family in Europe has the same probability of being drawn. Another design is to decide ahead of time that 100 families are to be selected from each country.

In the bootstrap method, new samples are drawn from a very large population that consists of very many copies of the sample. The bootstrap should use the same sampling design as the one used to obtain the sample.
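The two sampling designs lead to two different resampling schemes, which the following sketch tries to make concrete. Drawing with replacement from the sample stands in for drawing from a very large population of copies of the sample. The function names, the flag `fix_row_totals`, the choice of the difference between two proportions as the statistic, and the percentile interval are all assumptions made for illustration, not a description of the tool.

```python
import random

# Observed cross table: rows are Girl, Boy; columns are Baseball, Soccer.
observed = [[100, 250], [100, 400]]

def resample(table, rng, fix_row_totals):
    """Draw one bootstrap table, with replacement, following the sampling design."""
    n_cols = len(table[0])
    new = [[0] * n_cols for _ in table]
    if fix_row_totals:
        # Design with group sizes decided ahead of time: resample within each row.
        for i, row in enumerate(table):
            for j in rng.choices(range(n_cols), weights=row, k=sum(row)):
                new[i][j] += 1
    else:
        # Design where every person is equally likely to be drawn: resample all cells.
        cells = [(i, j) for i in range(len(table)) for j in range(n_cols)]
        weights = [table[i][j] for i, j in cells]
        for i, j in rng.choices(cells, weights=weights, k=sum(weights)):
            new[i][j] += 1
    return new

def difference(table):
    """Difference between the first-column shares of the two rows."""
    return table[0][0] / sum(table[0]) - table[1][0] / sum(table[1])

def percentile_interval(table, fix_row_totals, draws=1000, seed=1):
    """Rough 95% percentile interval of the statistic over bootstrap draws."""
    rng = random.Random(seed)  # fixed seed: arbitrary, for reproducibility only
    values = sorted(difference(resample(table, rng, fix_row_totals))
                    for _ in range(draws))
    return values[int(0.025 * draws)], values[int(0.975 * draws) - 1]

print(percentile_interval(observed, fix_row_totals=False))
print(percentile_interval(observed, fix_row_totals=True))
```

With `fix_row_totals=True` every bootstrap table keeps 350 girls and 500 boys, while with `fix_row_totals=False` only the grand total of 850 is kept and the row totals vary from draw to draw.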

The results in the third window show how far the bootstrap results can deviate from the value found in the sample.

