The simplest way
to view the distinction between the binomial distribution of Section 5.3 and the
hypergeometric distribution is to consider how the sampling is done. The types of
applications of the hypergeometric are very similar to those of the binomial
distribution. We are interested in computing probabilities for the number of
observations that fall into a particular category. But in the case of the binomial,
independence among trials is required. As a result, if the binomial is applied
to, say, sampling from a lot of items (deck of cards, batch of production items),
the sampling must be done with replacement of each item after it is observed.
On the other hand, the hypergeometric distribution does not require independence
and is based on sampling done without replacement. Applications for the
hypergeometric distribution are found in many areas, with heavy use in
acceptance sampling, electronic testing, and quality assurance. Obviously, in
many of these fields testing is done at the expense of the item being tested.
That is, the item is destroyed and hence cannot be replaced in the sample. Thus
sampling without replacement is necessary. A simple example with playing cards
will serve as our first illustration. If we wish to find the probability of
observing 3 red cards in 5 draws from an ordinary deck of 52 playing cards, the
binomial distribution of Section 5.3 does not apply unless each card is
replaced and the deck reshuffled before the next drawing is made. To solve the
problem of sampling without replacement, let us restate the problem. If 5
cards are drawn at random, we are interested in the probability of selecting 3
red cards from the 26 available and 2 black cards from the 26 available
in the deck. There are $\binom{26}{3}$ ways of
selecting 3 red cards, and for each of these ways we can choose 2 black cards
in $\binom{26}{2}$ ways. Therefore, the total number of ways to
select 3 red and 2 black cards in 5 draws is the product
$\binom{26}{3}\binom{26}{2}$. The total number of ways to select any 5 cards
from the 52 that are available is $\binom{52}{5}$. Hence the probability of
selecting 5 cards without replacement of which 3 are red and 2 are black is given by
$$\frac{\binom{26}{3}\binom{26}{2}}{\binom{52}{5}} = \frac{(2600)(325)}{2{,}598{,}960} = 0.3251.$$
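The counting argument for the card example can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later (for `math.comb`); the variable names are illustrative and do not come from the text.

```python
from math import comb

# Ways to choose 3 of the 26 red cards and 2 of the 26 black cards.
favorable = comb(26, 3) * comb(26, 2)

# Ways to choose any 5 of the 52 cards.
total = comb(52, 5)

# Probability of exactly 3 red cards in 5 draws without replacement.
p_three_red = favorable / total

print(favorable, total, round(p_three_red, 4))  # 845000 2598960 0.3251
```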
In general, we are interested in the probability of selecting x successes from
the k items labeled successes and n - x failures from the N - k items labeled
failures when a random sample of size n is selected from N items. This is known
as a hypergeometric experiment, that is, one that possesses the following two
properties:
1. A random sample of size n is selected without replacement from N items.
2. k of the N items may be classified as successes and N - k are classified as
failures.
The number X of successes of a hypergeometric experiment is called a
hypergeometric random variable. Accordingly, the probability distribution of the
hypergeometric variable is called the hypergeometric distribution, and its values
will be denoted by h(x; N, n, k), since they depend on the number of successes k
in the set of N items from which we select n items.
Hypergeometric Distribution In Acceptance Sampling
As in the case
of the binomial distribution, the hypergeometric distribution finds applications
in acceptance sampling where lots of material or parts are sampled in order to
determine whether or not the entire lot is accepted.
Example 5.11:
A particular part that is used as an
injection device is sold in lots of 10. The producer feels that the lot is
deemed acceptable if no more than one defective is in the lot. Some lots are
sampled and the sampling plan involves random sampling and testing 3 of the
parts out of 10. If none of the 3 is defective, the lot is accepted. Comment on
the utility of this plan.
Solution: Let us assume that the lot is truly unacceptable (i.e., that 2 out of
10 are defective). The probability that our sampling plan finds the lot
acceptable is
$$P(X = 0) = \frac{\binom{2}{0}\binom{8}{3}}{\binom{10}{3}} = 0.467.$$
Thus, if the lot
is truly unacceptable with 2 defective parts, this sampling plan will allow
acceptance roughly 47% of the time. As a result, this plan should be considered
faulty.
Let us now generalize in order to find a formula for h(x; N, n, k). The total
number of samples of size n chosen from N items is $\binom{N}{n}$. These samples
are assumed to be equally likely. There are $\binom{k}{x}$ ways of selecting x
successes from the k that are available, and for each of these ways we can choose
the n - x failures in $\binom{N-k}{n-x}$ ways. Thus the total number of favorable
samples among the $\binom{N}{n}$ possible samples is given by
$\binom{k}{x}\binom{N-k}{n-x}$.
Hence we have the following definition.
Hypergeometric Distribution
The probability distribution of the hypergeometric random variable X, the number
of successes in a random sample of size n selected from N items of which k are
labeled success and N - k labeled failure, is
$$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad \max\{0,\, n - (N - k)\} \le x \le \min\{n,\, k\}.$$
The range of x can be determined by the three binomial coefficients in the
definition, where x and n - x are no more than k and N - k, respectively, and
neither of them can be less than 0. Usually, when both k (the number of
successes) and N - k (the number of failures) are larger than the sample size n,
the range of a hypergeometric random variable will be x = 0, 1, ..., n.
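As a sketch of how the definition and its range might be coded, the functions below evaluate h(x; N, n, k) and list the admissible values of x. The names `hypergeom_pmf` and `support`, and the lot with N = 10, n = 5, k = 8, are assumptions chosen for illustration (they are not from the text); that lot is picked because it has fewer failures than draws, so the range does not start at 0.

```python
from math import comb

def support(N, n, k):
    """Admissible x values: max{0, n - (N - k)} <= x <= min{n, k}."""
    return range(max(0, n - (N - k)), min(n, k) + 1)

def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k) for a sample of n drawn without replacement from N items,
    k of which are successes. Returns 0.0 outside the admissible range."""
    if x not in support(N, n, k):
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Hypothetical lot: only 2 failures exist, so 5 draws must contain >= 3 successes.
xs = list(support(10, 5, 8))
probs = [hypergeom_pmf(x, 10, 5, 8) for x in xs]
print(xs)          # [3, 4, 5]
print(sum(probs))  # the probabilities over the support sum to 1 (up to rounding)
```

Returning 0.0 outside the support, rather than raising an error, is a design choice that keeps sums over a wider range of x valid.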
Example 5.12: Lots of 40 components each are called unacceptable if they contain
3 or more defectives. The procedure for sampling the lot is to select 5
components at random and to reject the lot if a defective is found. What is the
probability that exactly 1 defective is found in the sample if there are 3
defectives in the entire lot?
Solution: Using the hypergeometric distribution with n = 5, N = 40, k = 3, and
x = 1, we find the probability of obtaining one defective to be
$$h(1; 40, 5, 3) = \frac{\binom{3}{1}\binom{37}{4}}{\binom{40}{5}} = 0.3011.$$
Once again this
plan is likely not desirable since it detects a bad lot (3 defectives)
only about 30% of the
time.
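The figure in Example 5.12 can be reproduced in a few lines. The sketch below also evaluates, as an aside that is not part of the original solution, the probability of finding at least one defective, which is the event that triggers rejection under the stated plan. Variable names are illustrative.

```python
from math import comb

N, n, k = 40, 5, 3  # lot size, sample size, defectives in the lot

# Probability of exactly one defective in the sample: h(1; 40, 5, 3).
p_exactly_one = comb(k, 1) * comb(N - k, n - 1) / comb(N, n)

# Probability of no defectives, and hence of at least one (lot rejected).
p_none = comb(k, 0) * comb(N - k, n) / comb(N, n)
p_at_least_one = 1 - p_none

print(round(p_exactly_one, 4))   # 0.3011
print(round(p_at_least_one, 4))  # 0.3376
```

Either way, the plan misses a lot with 3 defectives in roughly two cases out of three.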
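Theorem 5.3: The mean and variance of the hypergeometric distribution
h(x; N, n, k) are
$$\mu = \frac{nk}{N} \quad \text{and} \quad \sigma^2 = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right).$$
The proof for the mean is shown in Appendix A.25.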
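Theorem 5.3 can be sanity-checked numerically by computing the mean and variance directly from the probability distribution and comparing them with the closed forms. This is a minimal sketch, not a proof; the function names are hypothetical, and the parameters N = 40, n = 5, k = 3 are borrowed from Example 5.12.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k) from the definition."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def moments_by_summation(N, n, k):
    """Mean and variance computed term by term over the support."""
    xs = range(max(0, n - (N - k)), min(n, k) + 1)
    mean = sum(x * hypergeom_pmf(x, N, n, k) for x in xs)
    var = sum((x - mean) ** 2 * hypergeom_pmf(x, N, n, k) for x in xs)
    return mean, var

def moments_by_theorem(N, n, k):
    """Closed forms: mu = nk/N, sigma^2 = (N-n)/(N-1) * n * (k/N) * (1 - k/N)."""
    p = k / N
    return n * p, (N - n) / (N - 1) * n * p * (1 - p)

mean_sum, var_sum = moments_by_summation(40, 5, 3)
mean_thm, var_thm = moments_by_theorem(40, 5, 3)
print(round(mean_sum, 4), round(var_sum, 4))  # 0.375 0.3113
print(round(mean_thm, 4), round(var_thm, 4))  # 0.375 0.3113
```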
Example 5.13: Let us now reinvestigate Example 3.9. The purpose of this example
was to illustrate the notion of a random variable and the corresponding sample
space. In the example, we have a lot of 100 items of which 12 are defective.
What is the probability that in a sample of 10, 3 are defective?
Solution: Using the hypergeometric probability function, we have
$$h(3; 100, 10, 12) = \frac{\binom{12}{3}\binom{88}{7}}{\binom{100}{10}} \approx 0.08.$$
Example 5.14: Find the mean and variance of the random variable of Example 5.12
and then use Chebyshev's theorem to interpret the interval $\mu \pm 2\sigma$.
Solution: Since Example 5.12 was a hypergeometric experiment, with N = 40,
n = 5, and k = 3, then by Theorem 5.3 we have
$$\mu = \frac{(5)(3)}{40} = \frac{3}{8} = 0.375 \quad \text{and} \quad \sigma^2 = \left(\frac{40-5}{39}\right)(5)\left(\frac{3}{40}\right)\left(1 - \frac{3}{40}\right) = 0.3113.$$
Taking the square root of 0.3113, we find that $\sigma = 0.558$. Hence the
required interval is 0.375 ± (2)(0.558), or from -0.741 to 1.491. Chebyshev's
theorem states that the number of defectives obtained when 5 components are
selected at random from a lot of 40 components of which 3 are defective has a
probability of at least 3/4 of falling between -0.741 and 1.491. That is, at
least three-fourths of the time, the 5 components include fewer than 2 defectives.
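Chebyshev's bound is only a guarantee, so it can be instructive to compare it with the exact probability that the count falls in the interval. The sketch below recomputes the interval of Example 5.14 and sums the hypergeometric probabilities inside it; the exact-probability comparison is an addition for illustration and is not part of the original solution.

```python
from math import comb, sqrt

N, n, k = 40, 5, 3

def hypergeom_pmf(x):
    """h(x; 40, 5, 3) from the definition."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Mean, variance, and the two-standard-deviation interval from Theorem 5.3.
mu = n * k / N
sigma = sqrt((N - n) / (N - 1) * n * (k / N) * (1 - k / N))
lower, upper = mu - 2 * sigma, mu + 2 * sigma

# Exact probability that X lies strictly inside the interval (x = 0 or 1 here).
exact = sum(hypergeom_pmf(x) for x in range(min(n, k) + 1) if lower < x < upper)

print(round(lower, 3), round(upper, 3))  # -0.741 1.491
print(exact >= 0.75)                     # True: consistent with Chebyshev's bound
```

For these parameters the exact probability is about 0.96, well above the guaranteed 3/4, which shows how conservative the bound can be.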
Relationship to the Binomial Distribution
In this chapter we discuss several important discrete distributions that have
wide applicability. Many of these distributions relate nicely to each other. The
beginning student should gain a clear understanding of these relationships.
There is an interesting relationship between the hypergeometric and the binomial
distribution. As one might expect, if n is small compared to N, the nature of
the N items changes very little in each draw. So a binomial distribution can be
used to approximate the hypergeometric distribution when n is small compared to
N. In fact, as a rule of thumb, the approximation is good when $n/N \le 0.05$.
Thus the quantity $k/N$ plays the role of the binomial parameter p. As a result,
the binomial distribution may be viewed as a large-population edition of the
hypergeometric distribution. The mean and variance then come from the formulas
$$\mu = np = \frac{nk}{N} \quad \text{and} \quad \sigma^2 = npq = n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right).$$
Comparing these formulas with those of Theorem 5.3, we see that the mean is the
same, whereas the variance differs by a correction factor of (N - n)/(N - 1),
which is negligible when n is small relative to N.
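The quality of the binomial approximation can be explored numerically. In the sketch below, the values N = 5000, k = 1000, and n = 10 are hypothetical, chosen only so that n/N = 0.002 is well under the 0.05 rule of thumb; the function names are likewise illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k): sampling without replacement."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """b(x; n, p): sampling with replacement, success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

N, k, n = 5000, 1000, 10  # hypothetical population, successes, sample size
p = k / N                 # k/N plays the role of the binomial parameter p

# Largest pointwise difference between the two distributions over x = 0..n.
max_gap = max(abs(hypergeom_pmf(x, N, n, k) - binom_pmf(x, n, p))
              for x in range(n + 1))

# Factor by which the hypergeometric variance differs from npq.
correction = (N - n) / (N - 1)

print(f"n/N = {n / N:.4f}")
print(f"largest pmf gap = {max_gap:.6f}")
print(f"variance correction factor = {correction:.4f}")
```

Increasing n toward N in this sketch makes both the gap and the departure of the correction factor from 1 grow, which is the behavior the rule of thumb guards against.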
Multivariate Hypergeometric Distribution
If N items can be partitioned into the k cells $A_1, A_2, \ldots, A_k$ with
$a_1, a_2, \ldots, a_k$ elements, respectively, then the probability distribution
of the random variables $X_1, X_2, \ldots, X_k$, representing the number of
elements selected from $A_1, A_2, \ldots, A_k$ in a random sample of size n, is
$$f(x_1, x_2, \ldots, x_k; a_1, a_2, \ldots, a_k, N, n) = \frac{\binom{a_1}{x_1}\binom{a_2}{x_2}\cdots\binom{a_k}{x_k}}{\binom{N}{n}},$$
with $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} a_i = N$.
Example 5.16: A group of 10 individuals is used for a biological case study. The
group contains 3 people with blood type O, 4 with blood type A, and 3 with blood
type B. What is the probability that a random sample of 5 will contain 1 person
with blood type O, 2 people with blood type A, and 2 people with blood type B?
Solution: Using the extension of the hypergeometric distribution with $x_1 = 1$,
$x_2 = 2$, $x_3 = 2$, $a_1 = 3$, $a_2 = 4$, $a_3 = 3$, N = 10, and n = 5, we find
that the desired probability is
$$f(1, 2, 2; 3, 4, 3, 10, 5) = \frac{\binom{3}{1}\binom{4}{2}\binom{3}{2}}{\binom{10}{5}} = \frac{3}{14}.$$
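The multivariate formula translates directly into code. The following is a minimal sketch, assuming Python 3.8 or later (for `math.comb` and `math.prod`); the function name `multivariate_hypergeom_pmf` and its argument layout are hypothetical. N and n are derived from the cell sizes and counts, and exact fractions are used so the result can be compared with a hand calculation.

```python
from fractions import Fraction
from math import comb, prod

def multivariate_hypergeom_pmf(xs, sizes):
    """f(x_1..x_k; a_1..a_k, N, n) with N = sum(sizes) and n = sum(xs)."""
    if len(xs) != len(sizes):
        raise ValueError("xs and sizes must have the same length")
    # A count below 0 or above its cell size cannot occur.
    if any(x < 0 or x > a for x, a in zip(xs, sizes)):
        return Fraction(0)
    N, n = sum(sizes), sum(xs)
    numerator = prod(comb(a, x) for a, x in zip(sizes, xs))
    return Fraction(numerator, comb(N, n))

# Blood-type example: cells O, A, B of sizes 3, 4, 3; sample has 1, 2, 2.
p = multivariate_hypergeom_pmf((1, 2, 2), (3, 4, 3))
print(p)  # 3/14
```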