Fithriyah Binti 'Ibad Abdurrahman

Sunday, 19 May 2013

Hypergeometric Distribution


The simplest way to view the distinction between the binomial distribution of Section 5.3 and the hypergeometric distribution lies in the way the sampling is done. The types of applications of the hypergeometric are very similar to those of the binomial distribution. We are interested in computing probabilities for the number of observations that fall into a particular category. But in the case of the binomial, independence among trials is required. As a result, if the binomial is applied to, say, sampling from a lot of items (deck of cards, batch of production items), the sampling must be done with replacement of each item after it is observed. On the other hand, the hypergeometric distribution does not require independence and is based on sampling done without replacement.

Applications for the hypergeometric distribution are found in many areas, with heavy uses in acceptance sampling, electronic testing, and quality assurance. In many of these fields, testing is done at the expense of the item being tested. That is, the item is destroyed and hence cannot be replaced in the sample. Thus sampling without replacement is necessary. A simple example with playing cards will serve as our first illustration.

If we wish to find the probability of observing 3 red cards in 5 draws from an ordinary deck of 52 playing cards, the binomial distribution of Section 5.3 does not apply unless each card is replaced and the deck reshuffled before the next drawing is made. To solve the problem of sampling without replacement, let us restate the problem. If 5 cards are drawn at random, we are interested in the probability of selecting 3 red cards from the 26 available and 2 black cards from the 26 black cards available in the deck. There are $\binom{26}{3}$ ways of selecting 3 red cards, and for each of these ways we can choose 2 black cards in $\binom{26}{2}$ ways. Therefore, the total number of ways to select 3 red and 2 black cards in 5 draws is the product $\binom{26}{3}\binom{26}{2}$. The total number of ways to select any 5 cards from the 52 that are available is $\binom{52}{5}$. Hence the probability of selecting 5 cards without replacement of which 3 are red and 2 are black is given by

$$\frac{\binom{26}{3}\binom{26}{2}}{\binom{52}{5}} = \frac{(2600)(325)}{2{,}598{,}960} = 0.3251.$$
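The card calculation above can be checked numerically. The following is a minimal sketch using Python's standard `math.comb`; the variable names are illustrative and not part of the original text.

```python
from math import comb

# Counting argument for 3 red and 2 black cards in 5 draws without replacement.
red_ways = comb(26, 3)      # ways to choose 3 of the 26 red cards
black_ways = comb(26, 2)    # ways to choose 2 of the 26 black cards
total_ways = comb(52, 5)    # ways to choose any 5 of the 52 cards

probability = red_ways * black_ways / total_ways

print(red_ways, black_ways, total_ways)  # 2600 325 2598960
print(round(probability, 4))             # 0.3251
```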

In general, we are interested in the probability of selecting x successes from the k items labeled successes and n - x failures from the N - k items labeled failures when a random sample of size n is selected from N items. This is known as a hypergeometric experiment, that is, one that possesses the following two properties:
1. A random sample of size n is selected without replacement from N items.
2. k of the N items may be classified as successes and N - k are classified as failures.

The number X of successes of a hypergeometric experiment is called a hypergeometric random variable. Accordingly, the probability distribution of the hypergeometric variable is called the hypergeometric distribution, and its values will be denoted by h(x; N, n, k), since they depend on the number of successes k in the set of N items from which we select n items.

Hypergeometric Distribution In Acceptance Sampling

As in the case of the binomial distribution, the hypergeometric distribution finds applications in acceptance sampling where lots of material or parts are sampled in order to determine whether or not the entire lot is accepted.

Example 5.11:
A particular part that is used as an injection device is sold in lots of 10. The producer feels that the lot is deemed acceptable if no more than one defective is in the lot. Some lots are sampled and the sampling plan involves random sampling and testing 3 of the parts out of 10. If none of the 3 is defective, the lot is accepted. Comment on the utility of this plan.

Solution: Let us assume that the lot is truly unacceptable (i.e., that 2 out of 10 are defective). The probability that our sampling plan finds the lot acceptable is

$$P(X = 0) = \frac{\binom{2}{0}\binom{8}{3}}{\binom{10}{3}} = 0.467.$$

Thus, if the lot is truly unacceptable with 2 defective parts, this sampling plan will allow acceptance roughly 47% of the time. As a result, this plan should be considered faulty.

Let us now generalize in order to find a formula for h(x; N, n, k). The total number of samples of size n chosen from N items is $\binom{N}{n}$. These samples are assumed to be equally likely. There are $\binom{k}{x}$ ways of selecting x successes from the k that are available, and for each of these ways we can choose the n - x failures in $\binom{N-k}{n-x}$ ways. Thus the total number of favorable samples among the $\binom{N}{n}$ possible samples is given by $\binom{k}{x}\binom{N-k}{n-x}$. Hence we have the following definition.

Hypergeometric Distribution
The probability distribution of the hypergeometric random variable X, the number of successes in a random sample of size n selected from N items of which k are labeled success and N - k labeled failure, is

$$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad \max\{0,\, n - (N - k)\} \le x \le \min\{n,\, k\}.$$

The range of x can be determined by the three binomial coefficients in the definition, where x and n - x are no more than k and N - k, respectively, and neither of them can be less than 0. Usually, when both k (the number of successes) and N - k (the number of failures) are larger than the sample size n, the range of a hypergeometric random variable will be x = 0, 1, ..., n.
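As an illustration of the definition and its range restriction, here is a minimal Python sketch. The function names `hypergeom_pmf` and `support` are hypothetical helpers chosen for this example, and the parameters N = 10, n = 8, k = 3 are made up to show a case where the range does not start at 0.

```python
from math import comb

def support(N: int, n: int, k: int) -> range:
    """Admissible values of x: max{0, n - (N - k)} <= x <= min{n, k}."""
    return range(max(0, n - (N - k)), min(n, k) + 1)

def hypergeom_pmf(x: int, N: int, n: int, k: int) -> float:
    """h(x; N, n, k) for a sample of size n drawn without replacement."""
    if x not in support(N, n, k):
        return 0.0  # outside the admissible range the probability is zero
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# With N = 10, n = 8, k = 3 only 7 failures exist, so at least 1 success is forced.
xs = list(support(10, 8, 3))
print(xs)                                          # [1, 2, 3]
print(sum(hypergeom_pmf(x, 10, 8, 3) for x in xs)) # the probabilities sum to 1 (up to rounding)
```

Restricting x to the support first keeps every argument passed to `comb` nonnegative, which `math.comb` requires.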
Example 5.12: Lots of 40 components each are called unacceptable if they contain 3 or more defectives. The procedure for sampling the lot is to select 5 components at random and to reject the lot if a defective is found. What is the probability that exactly 1 defective is found in the sample if there are 3 defectives in the entire lot?

Solution: Using the hypergeometric distribution with n = 5, N = 40, k = 3, and x = 1, we find the probability of obtaining one defective to be

$$h(1; 40, 5, 3) = \frac{\binom{3}{1}\binom{37}{4}}{\binom{40}{5}} = 0.3011.$$

Once again this plan is likely not desirable since it detects a bad lot (3 defectives) only about 30% of the time.
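The two sampling plans of Examples 5.11 and 5.12 can be compared with a short calculation. This is a sketch only: `accept_probability` is a hypothetical helper that assumes, as both examples do, that a lot is accepted exactly when the sample contains no defectives.

```python
from math import comb

def accept_probability(N: int, n: int, defectives: int) -> float:
    """P(X = 0): all n sampled items come from the N - defectives good items."""
    return comb(N - defectives, n) / comb(N, n)

# Example 5.11: lot of 10, sample of 3, 2 defectives in the lot.
print(round(accept_probability(10, 3, 2), 3))   # 0.467

# Example 5.12: lot of 40, sample of 5, 3 defectives in the lot.
p_accept = accept_probability(40, 5, 3)
p_exactly_one = comb(3, 1) * comb(37, 4) / comb(40, 5)
print(round(p_exactly_one, 4))                  # 0.3011, exactly one defective
print(round(1 - p_accept, 4))                   # 0.3376, at least one defective
```

The last two lines separate the probability of exactly one defective from the probability of at least one defective, which is the event that actually triggers rejection under the stated rule; both are close to 30%.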
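Theorem 5.3: The mean and variance of the hypergeometric distribution h(x; N, n, k) are

$$\mu = \frac{nk}{N} \qquad \text{and} \qquad \sigma^2 = \frac{N - n}{N - 1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right).$$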
The proof for the mean is shown in Appendix A.25.
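The formulas of Theorem 5.3 can also be checked numerically against the definitions of mean and variance. The sketch below does this for N = 40, n = 5, k = 3; the helper names are illustrative, and a numerical check of this kind is not a substitute for the proof.

```python
from math import comb, isclose

def pmf(x: int, N: int, n: int, k: int) -> float:
    """h(x; N, n, k), valid for x inside the admissible range."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def mean_var_formula(N: int, n: int, k: int) -> tuple[float, float]:
    """Mean and variance from the closed-form expressions of Theorem 5.3."""
    p = k / N
    return n * p, (N - n) / (N - 1) * n * p * (1 - p)

def mean_var_by_summation(N: int, n: int, k: int) -> tuple[float, float]:
    """Mean and variance computed directly from the probability distribution."""
    xs = range(max(0, n - (N - k)), min(n, k) + 1)
    mean = sum(x * pmf(x, N, n, k) for x in xs)
    var = sum((x - mean) ** 2 * pmf(x, N, n, k) for x in xs)
    return mean, var

mu_f, var_f = mean_var_formula(40, 5, 3)
mu_s, var_s = mean_var_by_summation(40, 5, 3)
print(round(mu_f, 3), round(var_f, 4))            # 0.375 0.3113
print(isclose(mu_f, mu_s), isclose(var_f, var_s)) # True True
```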
Example 5.13: Let us now reinvestigate Example 3.9. The purpose of this example was to illustrate the notion of a random variable and the corresponding sample space. In the example, we have a lot of 100 items of which 12 are defective. What is the probability that in a sample of 10, 3 are defective?

Solution: Using the hypergeometric probability function we have

$$h(3; 100, 10, 12) = \frac{\binom{12}{3}\binom{88}{7}}{\binom{100}{10}} \approx 0.08.$$
Example 5.14: Find the mean and variance of the random variable of Example 5.12 and then use Chebyshev's theorem to interpret the interval μ ± 2σ.

Solution: Since Example 5.12 was a hypergeometric experiment, with N = 40, n = 5, and k = 3, then by Theorem 5.3 we have

$$\mu = \frac{(5)(3)}{40} = \frac{3}{8} = 0.375 \qquad \text{and} \qquad \sigma^2 = \left(\frac{40 - 5}{39}\right)(5)\left(\frac{3}{40}\right)\left(1 - \frac{3}{40}\right) = 0.3113.$$

Taking the square root of 0.3113, we find that σ = 0.558. Hence the required interval is 0.375 ± (2)(0.558), or from -0.741 to 1.491. Chebyshev's theorem states that the number of defectives obtained when 5 components are selected at random from a lot of 40 components of which 3 are defective has a probability of at least 3/4 of falling between -0.741 and 1.491. That is, at least three-fourths of the time, the 5 components include fewer than 2 defectives.
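Chebyshev's theorem gives only a lower bound, so it may be instructive to compare it with the exact probability for this example. The following sketch recomputes the interval and then sums the hypergeometric probabilities of the values inside it; the variable names are illustrative.

```python
from math import comb, sqrt

N, n, k = 40, 5, 3   # parameters of Example 5.12

# Mean and standard deviation from Theorem 5.3.
mu = n * k / N
sigma = sqrt((N - n) / (N - 1) * n * (k / N) * (1 - k / N))
lower, upper = mu - 2 * sigma, mu + 2 * sigma
print(round(lower, 3), round(upper, 3))   # -0.741 1.491

# Exact probability that X falls inside the interval (here X = 0 or X = 1).
exact = sum(
    comb(k, x) * comb(N - k, n - x) / comb(N, n)
    for x in range(0, min(n, k) + 1)
    if lower < x < upper
)
print(exact >= 0.75)   # True: the Chebyshev bound of 3/4 holds
```

In this case the exact probability is considerably larger than 3/4, which is typical: the Chebyshev bound holds for any distribution and is therefore conservative.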

Relationship to the Binomial Distribution
In this chapter we discuss several important discrete distributions that have wide applicability. Many of these distributions relate nicely to each other. The beginning student should gain a clear understanding of these relationships. There is an interesting relationship between the hypergeometric and the binomial distribution. As one might expect, if n is small compared to N, the nature of the N items changes very little in each draw. So a binomial distribution can be used to approximate the hypergeometric distribution when n is small compared to N. In fact, as a rule of thumb the approximation is good when n/N ≤ 0.05.

Thus the quantity k/N plays the role of the binomial parameter p. As a result, the binomial distribution may be viewed as a large-population version of the hypergeometric distribution. The mean and variance then come from the formulas

$$\mu = np = \frac{nk}{N} \qquad \text{and} \qquad \sigma^2 = npq = n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right).$$

Comparing these formulas with those of Theorem 5.3, we see that the mean is the same whereas the variance differs by a correction factor of (N - n)/(N - 1), which is negligible when n is small relative to N.
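The quality of the binomial approximation can be examined numerically. The sketch below uses hypothetical values N = 1000, n = 10, k = 250 (so that n/N = 0.01, well inside the rule of thumb); these numbers are chosen for illustration and do not come from the text.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, n: int, k: int) -> float:
    """Exact probability of x successes when sampling without replacement."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

N, n, k = 1000, 10, 250   # hypothetical lot with n/N = 0.01
p = k / N                 # k/N plays the role of the binomial parameter p

# Largest pointwise gap between the two distributions over x = 0, 1, ..., n.
max_gap = max(
    abs(hypergeom_pmf(x, N, n, k) - binom_pmf(x, n, p)) for x in range(n + 1)
)
correction = (N - n) / (N - 1)   # factor relating the two variances

print(max_gap < 0.01)   # True
print(correction)       # close to 1, so the variances nearly agree
```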
Multivariate Hypergeometric Distribution
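If N items can be partitioned into the k cells $A_1, A_2, \ldots, A_k$ with $a_1, a_2, \ldots, a_k$ elements, respectively, then the probability distribution of the random variables $X_1, X_2, \ldots, X_k$, representing the number of elements selected from $A_1, A_2, \ldots, A_k$ in a random sample of size n, is

$$f(x_1, x_2, \ldots, x_k;\, a_1, a_2, \ldots, a_k, N, n) = \frac{\binom{a_1}{x_1}\binom{a_2}{x_2}\cdots\binom{a_k}{x_k}}{\binom{N}{n}},$$

with $\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} a_i = N$.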
Example 5.16: A group of 10 individuals is used for a biological case study. The group contains 3 people with blood type O, 4 with blood type A, and 3 with blood type B. What is the probability that a random sample of 5 will contain 1 person with blood type O, 2 people with blood type A, and 2 people with blood type B?

Solution: Using the extension of the hypergeometric distribution with $x_1 = 1$, $x_2 = 2$, $x_3 = 2$, $a_1 = 3$, $a_2 = 4$, $a_3 = 3$, N = 10, and n = 5, we find that the desired probability is

$$f(1, 2, 2;\, 3, 4, 3, 10, 5) = \frac{\binom{3}{1}\binom{4}{2}\binom{3}{2}}{\binom{10}{5}} = \frac{3}{14}.$$
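The multivariate formula translates directly into a product of binomial coefficients. The following is a minimal sketch; `multivariate_hypergeom_pmf` is a hypothetical helper name, and exact fractions are used so the result can be read off without rounding.

```python
from fractions import Fraction
from math import comb, prod

def multivariate_hypergeom_pmf(xs: list[int], sizes: list[int]) -> Fraction:
    """f(x_1..x_k; a_1..a_k, N, n) with N = sum of cell sizes and n = sum of counts."""
    if len(xs) != len(sizes):
        raise ValueError("one count is needed for each cell")
    N, n = sum(sizes), sum(xs)
    ways = prod(comb(a, x) for a, x in zip(sizes, xs))  # favorable samples
    return Fraction(ways, comb(N, n))                   # out of all samples of size n

# Blood-type example: cells O, A, B of sizes 3, 4, 3; sample contains 1, 2, 2.
result = multivariate_hypergeom_pmf([1, 2, 2], [3, 4, 3])
print(result)                   # 3/14
print(round(float(result), 4))  # 0.2143
```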
