Distributome Colorblindness Activity
Colorblindness – Can you see the number in this image?
This Distributome Activity illustrates an application of probability theory to the study of colorblindness, typically a genetic disorder that results from an abnormality on the X chromosome. Because a man has only one X chromosome, a single abnormal copy is enough to make him colorblind. The condition is rarer in women, since a woman would need to have the abnormality on both of her X chromosomes in order to be colorblind (whether a woman has the abnormality on one X chromosome is essentially independent of having it on the other).
The goal of this activity is to demonstrate an efficient protocol for estimating the probability that a randomly chosen individual is colorblind.
Suppose that \(p\) is the probability that a randomly selected man is colorblind.
- 100 men are selected at random. What is the distribution of \(X_m\) = the number of these men that are colorblind?
- 100 women are selected at random. What is the distribution of \(X_f\) = the number of these women that are colorblind?
- To estimate the probability that a randomly selected woman is colorblind, you might use the proportion of colorblind women in a sample of \(n\) women. What is the variance of this estimator?
- Alternatively, to estimate the probability that a randomly selected woman is colorblind, you might use the square of the proportion of colorblind men in a sample of \(n\) men. Explain why this estimate makes sense. What is the variance of this estimator?
- For large samples, is it better to use a sample of men or a sample of women to estimate the probability that a randomly selected woman is colorblind? Explain.
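A small simulation can help compare the two estimators in the questions above. The following is a minimal sketch, not part of the original activity: it assumes that a woman is colorblind with probability \(p^2\) (the independence assumption stated in the introduction), and the values `p = 0.08` and `n = 500`, as well as the function name `simulate_estimators`, are illustrative choices made here.

```python
import random
import statistics

def simulate_estimators(p, n, reps, seed=0):
    """Return the sample variances of two estimators of P(woman is colorblind).

    direct  : proportion of colorblind women in a sample of n women
    squared : square of the proportion of colorblind men in a sample of n men
    Assumes a woman is colorblind with probability p**2 (independent X chromosomes).
    """
    rng = random.Random(seed)
    q = p * p  # assumed probability that a woman is colorblind
    direct, squared = [], []
    for _ in range(reps):
        women = sum(rng.random() < q for _ in range(n))  # Binomial(n, q) draw
        men = sum(rng.random() < p for _ in range(n))    # Binomial(n, p) draw
        direct.append(women / n)
        squared.append((men / n) ** 2)
    return statistics.variance(direct), statistics.variance(squared)

# Hypothetical values chosen only for illustration.
p, n = 0.08, 500
var_direct, var_squared = simulate_estimators(p, n, reps=2000)
print("variance using women directly:", var_direct)
print("variance using squared men's proportion:", var_squared)
```

Rerunning with other values of `p` shows how the comparison depends on how common the condition is among men.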
You can also use the delta method to find the approximate variance of the estimator based on the squared proportion of colorblind men.
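As a sketch of that calculation, write \(\hat{p}_m = X_m/n\) for the proportion of colorblind men in a sample of \(n\) men, \(\hat{p}_f = X_f/n\) for the proportion of colorblind women in a sample of \(n\) women, and \(g(p) = p^2\). The symbols \(\hat{p}_m\), \(\hat{p}_f\) and \(g\) are notation introduced here for illustration, and the calculation assumes that a woman is colorblind with probability \(p^2\). The delta method then gives

\[
\operatorname{Var}\left(\hat{p}_m^{\,2}\right) \approx \left[g'(p)\right]^2 \operatorname{Var}\left(\hat{p}_m\right) = (2p)^2 \, \frac{p(1-p)}{n} = \frac{4p^3(1-p)}{n},
\]

while the direct estimator from a sample of women has variance

\[
\operatorname{Var}\left(\hat{p}_f\right) = \frac{p^2\left(1-p^2\right)}{n}.
\]

The ratio of the two is

\[
\frac{4p^3(1-p)}{p^2(1-p)(1+p)} = \frac{4p}{1+p},
\]

which is less than 1 whenever \(p < 1/3\), so under these assumptions the estimator based on men has the smaller approximate variance when the condition is uncommon among men.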
In practice, it may be difficult to obtain reliable parameter estimates when the event at hand is very rare (such as with colorblindness in women). The use of a valid probability model, such as the relationship between the chance of colorblindness in men and the chance in women, may improve these estimates.
Distributome Homicide Trends Activity
A Columbus Dispatch newspaper story on Friday, January 1, 2010, discussed a drop in the number of homicides in the city the previous year. Here are the first few paragraphs from the article:
- Homicides take big drop in city: Trend also being seen nationally, but why is a mystery.
- The number of homicides in Columbus dropped 25 percent last year after spiking in 2008. As of last night, the city was expected to close out 2009 with 83 homicides, 27 fewer than in 2008, according to records kept by police and The Dispatch. In 2007, 79 people were slain in Columbus. “I don’t know that there’s one reason for homicides going up or down,” said Lt. David Watkins, supervisor of the Police Division’s homicide unit.
- Why one year do we have 130, and then the next year we have 80?
- “You just can’t explain it,” Sgt. Dana Norman said. He supervises the third-shift squad that investigated 44 of last year’s homicides, which occurred at a rate of 11.1 for every 100,000 people in Columbus, based on recent population estimates …
A table appearing with the article showed that there were 568 homicides in the previous 6 years.
Sergeant Norman’s statement, “You just can’t explain it,” presents an intriguing probability question: is it possible that natural random fluctuation might be a good explanation? Let’s consider probability models for the number of observed crimes and how they might fluctuate to see if the data mentioned in the article are unusual.
- If homicides are rare events that might be independently perpetrated by individuals in a large population – what distribution would approximately describe the number of murders in a year?
- Suppose the expected annual number of homicides in the city is denoted by \(\lambda\) and that the number of homicides is independent from year to year. The article notes that 2008 saw a “spike” in the number of homicides, and in fact that was the highest number in the last six years. Assuming nothing is going on except random fluctuation, we want to know whether observing 27 fewer homicides in 2009 after the peak year is unusual (peak here meaning the highest in the last 6 years).
Use the Distributome Poisson simulator for the model you specified above to examine the distribution of the change in the number of homicides you would see following the peak of a six-year stretch. Does the 27-homicide drop seem unusual? Explain.
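For readers without access to the simulator, the same experiment can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Distributome tool itself: it takes \(\lambda = 568/6\) (the six-year total from the article's table divided evenly across years, which is an assumption made here), treats the years as independent Poisson counts, and uses the hypothetical helper names `poisson_draw` and `drop_after_peak`.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def drop_after_peak(rng, lam, window=6):
    """Simulate `window` years, take the peak, and return peak minus the next year."""
    peak = max(poisson_draw(rng, lam) for _ in range(window))
    following = poisson_draw(rng, lam)
    return peak - following

# Assumption: lambda is estimated as the six-year total divided by six.
lam = 568 / 6
rng = random.Random(2010)
drops = [drop_after_peak(rng, lam) for _ in range(10000)]

# Fraction of simulated six-year peaks followed by a drop of at least 27.
frac_large_drop = sum(d >= 27 for d in drops) / len(drops)
print("estimated P(drop >= 27 after a six-year peak):", frac_large_drop)
```

The printed fraction estimates how often a drop of 27 or more follows the highest year of a six-year stretch when only random fluctuation is at work.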
In the simulated distribution of the drop, the shaded region corresponds to values of at least 27. Such drops occur about 12% of the time, so the drop in homicides in Columbus would not be particularly unusual when nothing is happening beyond regular random fluctuation.
This problem might also be viewed as an example of the regression effect, where you should expect regression to the mean following a very high observed value.
When viewing a random process over time it is the extremes that make the headlines – so the probability models we should use to answer the question “What is unusual?” should be probability models about extremes.