Distributome Data & Activity: Horse Kicks

Introduction

In 1898, the Polish statistician and economist Ladislaus von Bortkiewicz published his famous book “Das Gesetz der kleinen Zahlen” (translation: The Law of Small Numbers). The book contained his analysis of several fascinating data sets on the occurrence of rare events in large populations. In one of these, Bortkiewicz analyzed the number of soldiers in each of fourteen corps of the Prussian army who were killed by horse kicks in each year from 1875 to 1894; the data are available below. Ten of the fourteen corps had twenty squadrons with soldiers in similar positions, while the other four had features indicating substantive differences in their populations. Bortkiewicz therefore argued that these four corps might be excluded from analyses of the data. He writes (as translated by C. P. Winsor, 1947, Human Biology 19:154-161):

The Guard Corps contains, apart from artillery, engineers and trainees, 134 infantry companies and 40 cavalry squadrons; the XI corps has three divisions; the I corps has 30 and the VI corps has 25 squadrons, against a norm of 20 squadrons.

Problem 1:    Explain why the number of soldiers in any one of the fourteen Prussian army corps killed by horse kicks in a given year might reasonably be modeled by a Poisson distribution.
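One standard line of reasoning: if a corps has n soldiers, each with the same small chance p of a fatal kick in a year, and soldiers are affected independently, then the yearly count is Binomial(n, p), which for large n and small p is close to a Poisson distribution with rate λ = np. The sketch below compares the two distributions numerically. The values n = 1000 and p = 0.0007 are hypothetical, chosen only so that np = 0.7 matches the average number of deaths per corps-year in the table below (196 deaths over 280 corps-years); they are not actual corps sizes or risks.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability of exactly k deaths."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson(lam) probability of exactly k deaths."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical corps: n soldiers, each with a small, independent yearly risk p.
n, p = 1000, 0.0007
lam = n * p  # expected deaths per corps-year

for k in range(5):
    print(f"k={k}  binomial={binom_pmf(k, n, p):.5f}  poisson={poisson_pmf(k, lam):.5f}")

# Largest pointwise gap between the two distributions over k = 0..4.
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(5))
print(f"max |difference| = {max_gap:.6f}")
```

In the limit only the product np matters; Le Cam's inequality bounds the total variation distance between the binomial and Poisson distributions by n·p², here about 0.0005.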

Problem 2:   Consider the total number of soldiers killed by horse kicks in the fourteen corps put together (even including the four that Bortkiewicz identified as different). What distribution would provide a good model for this total?
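A useful fact here: a sum of independent Poisson random variables is itself Poisson, with rate equal to the sum of the individual rates, whether or not those rates are equal. The sketch below checks this numerically by convolving truncated Poisson distributions. The fourteen per-corps rates are hypothetical and deliberately unequal, and independence between corps is an assumption.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def convolve(pmf_a, pmf_b):
    """Distribution of X + Y for independent X and Y, given as pmf lists."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out


K = 60  # truncation point; the tail mass beyond it is negligible for these rates
# Hypothetical yearly rates for fourteen corps, deliberately unequal.
rates = [0.3, 0.4, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 1.0, 1.2, 1.3]

total = [1.0]  # pmf of an empty sum: always 0
for lam in rates:
    total = convolve(total, [poisson_pmf(k, lam) for k in range(K)])

lam_sum = sum(rates)
max_gap = max(abs(total[k] - poisson_pmf(k, lam_sum)) for k in range(20))
print(f"sum of rates = {lam_sum:.1f}, max |difference| = {max_gap:.2e}")
```

Because every term contributing to a total below K uses only untruncated entries, the convolved values for small totals are exact up to rounding, so the printed gap reflects floating-point error only.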

Problem 3:   Let’s compare the number of soldiers killed by horse kicks in the data to what would be expected under the Poisson probability model.

  1. How well do the data fit the model if you suppose the rate of death by horse kick is the same from corps to corps and from year to year for the ten corps Bortkiewicz believed were similar?
  2. How well do the data fit the model if you suppose the rate of death by horse kick is the same from corps to corps and from year to year for all fourteen corps in the data set?
  3. Does allowing each corps its own rate of horse-kick deaths improve the fit of the model? Does allowing different years to have different rates improve the fit?
  4. Researchers Preece, Ross, and Kirby suggest that corps-to-corps and year-to-year differences in average rates may be modeled as random draws from a Gamma distribution. If this is the case, what would be an appropriate model for the number of deaths by horse kick?
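For part 4, a standard result may help: if a corps-year's rate λ is drawn from a Gamma distribution with shape r and rate β, and the count given λ is Poisson(λ), then integrating λ out gives a negative binomial distribution, P(X = k) = Γ(k + r) / (k! Γ(r)) · (β/(1+β))^r · (1/(1+β))^k. The sketch below checks this identity by integrating the mixture numerically with a midpoint rule. The shape and rate values are hypothetical, not estimates from these data.

```python
import math


def gamma_pdf(lam, shape, rate):
    """Gamma density with the given shape and rate (mean = shape / rate)."""
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(lam)
                    - rate * lam - math.lgamma(shape))


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def mixture_pmf(k, shape, rate, upper=30.0, steps=30_000):
    """P(X = k) when X | lam ~ Poisson(lam) and lam ~ Gamma(shape, rate),
    integrated numerically with the midpoint rule over (0, upper)."""
    h = upper / steps
    return h * sum(
        poisson_pmf(k, lam) * gamma_pdf(lam, shape, rate)
        for lam in ((i + 0.5) * h for i in range(steps))
    )


def negbin_pmf(k, shape, rate):
    """Closed form of the same mixture: a negative binomial distribution."""
    q = rate / (1.0 + rate)
    log_coef = math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
    return math.exp(log_coef + shape * math.log(q) + k * math.log(1.0 - q))


# Hypothetical Gamma parameters: mean rate 2.0 / 2.5 = 0.8 deaths per corps-year.
shape, rate = 2.0, 2.5
for k in range(5):
    print(f"k={k}  mixture={mixture_pmf(k, shape, rate):.6f}  "
          f"negbin={negbin_pmf(k, shape, rate):.6f}")
```

Under this mixture the mean is r/β and the variance is r/β + r/β², which exceeds the mean; comparing the sample variance with the sample mean of the table is one quick, informal check for this extra variation.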

Data Description
These data give the number of deaths by horse kick in the Prussian Army from 1875 to 1894 for 14 army corps. They are taken from Andrews and Herzberg’s book (1985, p. 18) and were originally published in the 1898 book “The Law of Small Numbers” by the Polish statistician and economist Ladislaus von Bortkiewicz. Ten of the corps share a similar structure of 20 squadrons each and performed similar duties. The Guard Corps, Corps I, Corps VI, and Corps XI have different structures and performed somewhat different tasks than the others.

Data Download
Text Raw data: Distributome Data: Horse Kicks (*.txt file)

Data Table

Year Guard.corps corpsI corpsII corpsIII corpsIV corpsV corpsVI corpsVII corpsVIII corpsIX corpsX corpsXI corpsXIV corpsXV
1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0
1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1
1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0
1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0
1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0
1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0
1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0
1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1
1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0
1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1
1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1
1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0
1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0
1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0
1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2
1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2
1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0
1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0
1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0
1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0
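A minimal sketch for Problem 3, parts 1 and 2, with a starting point for part 3: parse the table above, tally how many corps-years had 0, 1, 2, ... deaths, and compare these tallies with the counts expected under a single Poisson rate estimated by the sample mean. Lumping the upper tail into the last category and reporting a Pearson chi-square statistic are choices made for this sketch, not part of the original activity.

```python
import math
from collections import Counter

# The table above, copied verbatim: Year followed by one column per corps.
RAW = """\
Year Guard.corps corpsI corpsII corpsIII corpsIV corpsV corpsVI corpsVII corpsVIII corpsIX corpsX corpsXI corpsXIV corpsXV
1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0
1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1
1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0
1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0
1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0
1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0
1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0
1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1
1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0
1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1
1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1
1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0
1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0
1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0
1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2
1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2
1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0
1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0
1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0
1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0
"""


def parse(raw):
    """Return the corps names and a {year: {corps: deaths}} mapping."""
    lines = raw.strip().splitlines()
    corps = lines[0].split()[1:]
    table = {}
    for line in lines[1:]:
        year, *counts = line.split()
        table[int(year)] = dict(zip(corps, map(int, counts)))
    return corps, table


CORPS, TABLE = parse(RAW)
# The four corps Bortkiewicz set apart because of their different structure.
DIFFERENT = {"Guard.corps", "corpsI", "corpsVI", "corpsXI"}
TEN = [c for c in CORPS if c not in DIFFERENT]


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def compare(columns):
    """Observed vs. Poisson-expected numbers of corps-years with k deaths,
    using one common rate estimated by the sample mean."""
    values = [TABLE[y][c] for y in TABLE for c in columns]
    n = len(values)
    lam = sum(values) / n
    observed = Counter(values)
    kmax = max(values)
    rows = []
    for k in range(kmax + 1):
        # The last category absorbs the upper tail, so expected counts sum to n.
        if k < kmax:
            p = poisson_pmf(k, lam)
        else:
            p = 1.0 - sum(poisson_pmf(j, lam) for j in range(kmax))
        rows.append((k, observed[k], n * p))
    chi2 = sum((o - e) ** 2 / e for _, o, e in rows)
    return n, lam, rows, chi2


for label, cols in [("ten similar corps", TEN), ("all fourteen corps", CORPS)]:
    n, lam, rows, chi2 = compare(cols)
    print(f"{label}: {n} corps-years, mean deaths = {lam:.3f}")
    for k, o, e in rows:
        tag = "+" if k == rows[-1][0] else " "
        print(f"  {k}{tag} deaths: observed {o:3d}, expected {e:6.1f}")
    print(f"  Pearson chi-square = {chi2:.2f}")

# Per-corps totals over the twenty years, a starting point for part 3.
totals = {c: sum(TABLE[y][c] for y in TABLE) for c in CORPS}
print(totals)
```

Several expected counts in the upper categories are small, so before consulting a chi-square table one would typically pool them further (for example into 0, 1, 2, and 3 or more), using degrees of freedom equal to the number of categories minus 2, since λ is estimated from the data.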