Sample Survey Methodology I
Presented by Daw Nu Nu Lwin Assistant Lecturer Department of Statistics Co-operative University, Sagaing
CO-OPERATIVE UNIVERSITY, SAGAING DEPARTMENT OF STATISTICS
SAMPLE SURVEY METHODOLOGY
C Stats 2104, Second Year (Second Semester)
Reference
• Cochran, W.G. (1977). Sampling Techniques (3rd edition). John Wiley.
• Scheaffer, R.C., Mendenhall, W. and Ott, L. (1990). Elementary Survey Sampling (2nd edition). Duxbury Press.
• Introduction
  o Type of research
  o Sources of data
  o Method of data collection
  o Questionnaire design
  o Criteria and choice of sample design
• Simple Random Sampling
  o What is simple random sampling?
  o Estimation of a population mean, total and proportion
  o Sample size for mean, total and proportion
• Stratified Random Sampling
  o What is stratified random sampling?
  o Estimation of a population mean, total and proportion
• Systematic Random Sampling
  o What is a systematic random sample?
  o Estimation of a population mean, total and proportion
• Cluster Random Sampling
  o What is a cluster random sample?
  o Estimation of a population mean, total and proportion
Type of Research
The topic of the research study is sales promotion strategy, and the nature of the topic is theoretical and descriptive. Descriptive research is therefore the suitable type of research for conducting the study. The data are collected from the sales records, dealers, customers and salesmen of companies operating in the FMCG (Fast-Moving Consumer Goods) sector. Descriptive research meets the requirements of the research study.
Classification of Data
Data are classified by type as primary data or secondary data.

Primary data
o Data that have been collected from first-hand experience are known as primary data. They are more reliable and authentic, and have not been published anywhere.
o Primary data have not been changed or altered by human beings; therefore their validity is greater than that of secondary data.

Advantages of primary data
• Targeted issues are addressed
• Data interpretation is better
• Efficient spending for information
• Decency of data
• Proprietary issues
• Addresses specific research issues
• Greater control

Disadvantages of primary data
• High cost
• Time consuming
• Inaccurate feedback
• More resources are required
Secondary data
Secondary data are those that have been collected by others. They are usually found in journals, periodicals, research publications, official records, etc. Secondary data may be available in published or unpublished form. When it is not possible to collect the data by the primary method, the investigator uses the secondary method. Such data were collected for some purpose other than the problem at hand.
Secondary data sources
(i) Internal sources
(ii) External sources

(i) Internal sources
Internal sources of secondary data are usually for marketing applications:
• Sales records
• Marketing activity
• Cost information
• Distributor reports and feedback
• Customer feedback

(ii) External sources
External sources of secondary data are usually for financial applications:
• Journals
• Books
• Magazines
• Newspapers
• Libraries
• The internet

Advantages of secondary data
• Ease of access
• Low cost to acquire
• Clarification of the research question
• May answer the research question

Disadvantages of secondary data
• Not specific to the researcher's needs
• Incomplete information
• Not timely

Reference: Amogh Kadam, Rizwan Shaikh, Prathmesh Parab
Methods of Data Collection
(i) Direct Personal Interview
(ii) Questionnaires Sent Through Mail
(iii) Interviews by Enumerators
(iv) Telephone Interview
(v) Questionnaire vs. Schedule

(i) Direct Personal Interview
This method is widely used in social and economic surveys. The investigator personally contacts the respondents and can obtain the required data fairly accurately. The interviewer asks questions pertaining to the objectives of the survey and the information needed. The response rate is usually good, and the information is more reliable and correct. However, more expense and time are required to contact the respondents.

(ii) Questionnaires Sent Through Mail
The investigator prepares a questionnaire and sends it to the respondents. Respondents are requested to complete the questionnaire and return it to the investigator by a specified date. The method is suitable where respondents are spread over a wide area. It is less expensive, but it normally has a poor response rate. It can be adopted only where the respondents are literate and can understand the questions. Its success depends on how the questionnaire is drafted and on the extent to which the willing cooperation of the respondents is secured. For rural areas, this method has obvious limitations and is seldom used.

(iii) Interviews by Enumerators
This method involves the appointment of enumerators by the surveying agency. Enumerators go to the respondents, ask them the questions contained in the schedule, and fill in the responses in the schedule themselves. For this method to succeed, the enumerators should be given proper training in soliciting the cooperation of the respondents. The enumerators should carry their identity cards so that the respondents are satisfied of their authenticity. The method is usefully employed where the respondents to be covered are illiterate.

(iv) Telephone Interview
If the respondents in the population to be covered can be approached by phone, their responses to the questions in the schedule can be obtained over the phone. If only local calls are involved and no long-distance calls are needed, this mode of collecting data may also prove quite economical. It is, however, desirable that interviews conducted over the phone are kept short so as to maintain the interest of the respondents.

(v) Questionnaire vs. Schedule
In the questionnaire approach, the informants or respondents are asked pre-specified questions, and their replies are recorded by themselves or by investigators. In this case, the investigator is not supposed to influence the respondents. This approach is widely used in mail enquiries.
In the schedule approach, the exact form of the questions to be asked is not given, and the task of questioning and soliciting information is left to the investigator, who, backed by training and instructions, has to use his or her ingenuity in explaining the concepts and definitions to the informant in order to obtain reliable information.
Determining the type of question
(i) Open-ended questions
(ii) Closed-ended questions

(i) Open-ended questions
These give respondents the ability to respond in their own words. Some illustrations of this type are listed below:
• What is your age?
• How much orange juice does this bottle contain?
• Which is your favorite series?
• Why do you smoke Gold Flakes cigarettes?
• I like Nescafe because ----------------.
• My career goal is to ---------------.
• I think hybrid cars are ---------------.

(ii) Closed-ended questions
These allow the subject to choose one of the given alternatives. The types are:
• Dichotomous questions
• Multiple choice
• Scales

Figure (1.1) Types of question response option. Question content is either open-ended or closed-ended; closed-ended questions are dichotomous, multiple response, or scales.

Dichotomous questions
• Are you diabetic? Yes / No
• What kind of cola do you drink? Normal / Diet
• Your working hours in the organization are: Fixed / Flexible

Multiple choice
For example: Will you consider selling organic food products in your store?
• Definitely not in the next one year
• Probably not in the next one year
• Undecided
• Probably in the next one year
• Definitely in the next one year

Scales
(1 = Strongly disagree, 5 = Strongly agree)
1. The people in my company know their roles very clearly.   1  2  3  4  5
2. I feel the need for the organization to change.           1  2  3  4  5
3. Existing systems are very effective.                      1  2  3  4  5
Figure (1.2) Questionnaire Design Process
1. Convert the research objectives into the information needed
2. Method of administering the questionnaire
3. Content of the questions
4. Motivating the respondent to answer
5. Determining types of questions
6. Question design criteria
7. Determine the questionnaire structure
8. Physical presentation of the questionnaire
9. Pilot testing the questionnaire
10. Administering the questionnaire
Criteria and Choice of Sample Design
The six criteria are:
• Population specification
• Unit of analysis specification
• Sample size determination
• Selection procedure description
• Response rate and nonresponse treatment
• Estimation procedures
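2.1 What Is Simple Random Sampling?
Simple random sampling is a probability sampling procedure that gives every element in the target population, and each possible sample of a given size, an equal chance of being selected. As such, it is an equal probability selection method (EPSEM).

2.2 Estimation of a Population Mean, Total and Proportion

For the mean
Estimate of the population mean (the sample mean)
$$\hat{\bar{Y}} = \bar{y} = \frac{\sum y_i}{n}$$
Sample variance
$$s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1}$$
Estimated variance of the sample mean
$$\hat{v}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right)$$
Relative standard error of the estimate
$$RSE(\bar{y}) = \frac{\sqrt{\hat{v}(\bar{y})}}{\bar{y}} \times 100$$
Bound on the error of estimation for the mean
$$B = 2\sqrt{\hat{v}(\bar{y})}$$

For the total
Estimate of the population total
$$\hat{\tau} = N\bar{y}$$
Estimated variance of the total
$$\hat{v}(\hat{\tau}) = \hat{v}(N\bar{y}) = N^2\,\hat{v}(\bar{y})$$
Relative standard error of the estimate
$$RSE(\hat{\tau}) = \frac{\sqrt{\hat{v}(\hat{\tau})}}{\hat{\tau}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\hat{\tau})}$$

For the proportion
Estimate of the population proportion P
$$\hat{p} = \frac{\sum y_i}{n} \quad \text{(or)} \quad \hat{p} = \frac{x}{n}$$
Estimated variance of $\hat{p}$
$$\hat{v}(\hat{p}) = \frac{\hat{p}\hat{q}}{n-1}\left(\frac{N-n}{N}\right), \qquad \hat{q} = 1 - \hat{p}$$
Relative standard error of the estimate
$$RSE(\hat{p}) = \frac{\sqrt{\hat{v}(\hat{p})}}{\hat{p}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\hat{p})}$$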
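The equal-chance selection described in section 2.1 can be sketched in a few lines. This is only an illustrative sketch: the function name `draw_srs`, the labelling of population units as 1..N, and the fixed seed are assumptions made here, not part of the course notes.

```python
import random

def draw_srs(population_size, n, seed=None):
    """Draw a simple random sample of n unit labels from 1..population_size.

    Sampling is without replacement, so every subset of size n is
    equally likely to be chosen.
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    labels = range(1, population_size + 1)
    return sorted(rng.sample(labels, n))

# Hypothetical use: pick 9 of 484 account records
sample_ids = draw_srs(484, 9, seed=1)
print(sample_ids)
```

Sampling without replacement is what makes the finite population correction (N - n)/N appear in the variance formulas above.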
A simple random sample of n = 9 hospital records is drawn to estimate the average amount of money due on N = 484 open accounts. The sample values for these nine records are listed in the table. Estimate the average and total amount outstanding, and place a bound on the error of estimation.

Record    y_i       y_i^2
Y1        33.50     1,122.25
Y2        32.00     1,024.00
Y3        52.00     2,704.00
Y4        43.00     1,849.00
Y5        40.00     1,600.00
Y6        41.00     1,681.00
Y7        45.00     2,025.00
Y8        42.50     1,806.25
Y9        39.00     1,521.00
Total     368.00    15,332.50

Solution
Estimated population mean
$$\bar{y} = \frac{\sum y_i}{n} = \frac{368}{9} = \$40.89$$
Sample variance
$$s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1} = \frac{15332.50 - 9(40.89)^2}{9-1} = \frac{15332.50 - 15047.9289}{8} = 35.57$$
Estimated variance of the sample mean
$$\hat{v}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{35.57}{9}\left(\frac{484-9}{484}\right) = 3.95 \times 0.98 = 3.87$$
Bound on the error of estimation for the mean
$$B = 2\sqrt{\hat{v}(\bar{y})} = 2\sqrt{3.87} = \$3.93$$
To summarize, the estimate of the mean amount of money owed per account, $\mu$, is $\bar{y}$ = $40.89. Although we cannot be certain how close $\bar{y}$ is to $\mu$, we are reasonably confident that the error of estimation is less than $3.93.

Estimated population total
$$\hat{\tau} = N\bar{y} = (484)(40.89) = \$19790.76$$
Estimated variance of the total
$$\hat{v}(\hat{\tau}) = \hat{v}(N\bar{y}) = N^2\,\hat{v}(\bar{y}) = (484)^2(3.87) = 906570.72$$
Bound on the error of estimation for the total
$$B = 2\sqrt{\hat{v}(\hat{\tau})} = 2\sqrt{906570.72} = \$1904.28$$
To summarize, the estimate of the total amount of money owed, $\tau$, is $\hat{\tau}$ = $19790.76. Although we cannot be certain how close $\hat{\tau}$ is to $\tau$, we are reasonably confident that the error of estimation is less than $1904.28.
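The hospital-records calculation can be checked with a short script. It is a minimal sketch, assuming the nine sample values listed in the example; the function name `srs_mean_total` and the dictionary keys are hypothetical. Because the script carries the unrounded mean through every step, its results may differ in the second decimal place from a hand calculation that rounds the mean to 40.89 first.

```python
from math import sqrt

def srs_mean_total(sample, N):
    """Estimate the mean and total under simple random sampling,
    with 2-standard-error bounds on the error of estimation."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance s^2 with divisor n - 1
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    # Estimated variance of the mean with finite population correction
    v_mean = (s2 / n) * ((N - n) / N)
    return {
        "mean": mean,
        "s2": s2,
        "v_mean": v_mean,
        "bound_mean": 2 * sqrt(v_mean),
        "total": N * mean,
        "v_total": N ** 2 * v_mean,
        "bound_total": 2 * N * sqrt(v_mean),
    }

amounts = [33.50, 32.00, 52.00, 43.00, 40.00, 41.00, 45.00, 42.50, 39.00]
result = srs_mean_total(amounts, N=484)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The bound for the total is simply N times the bound for the mean, since the variance of the total is N squared times the variance of the mean.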
A training company wants to claim an improvement in sales performance for people who attend its training courses. A simple random sample of 10 staff members was interviewed from 100 staff members. The observed data are as follows:
38 45 55 25 48 50 70 20 40 45
(a) Estimate the average and total amount of improvement in sales performance and place a bound on the error of estimation.
(b) Estimate the proportion of staff whose improvement in sales performance is over 40.

Solution
n = 10, N = 100

y_i      y_i^2
38       1444
45       2025
55       3025
25       625
48       2304
50       2500
70       4900
20       400
40       1600
45       2025
Total    436      20848

(a) Estimate of the population mean
$$\bar{y} = \frac{\sum y_i}{n} = \frac{436}{10} = 43.6 \text{ scores}$$
Sample variance
$$s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1} = \frac{20848 - 10(43.6)^2}{10-1} = 204.2667$$
Estimated variance of $\bar{y}$
$$\hat{v}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{204.2667}{10}\left(\frac{100-10}{100}\right) = 20.4267 \times 0.9 = 18.3840$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\bar{y})} = 2\sqrt{18.3840} = 8.5753 \text{ scores}$$
To summarize, the estimate of the average amount of improvement in sales performance, $\mu$, is $\bar{y}$ = 43.6 scores. Although we cannot be certain how close $\bar{y}$ is to $\mu$, we are reasonably confident that the error of estimation is less than 8.5753 scores.

Estimate of the population total
$$\hat{\tau} = N\bar{y} = 100 \times 43.6 = 4360 \text{ scores}$$
Estimated variance of $\hat{\tau}$
$$\hat{v}(\hat{\tau}) = \hat{v}(N\bar{y}) = N^2\,\hat{v}(\bar{y}) = (100)^2(18.3840) = 183840$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\hat{\tau})} = 2\sqrt{183840} = 857.5313 \text{ scores}$$
To summarize, the estimate of the total amount of improvement in sales performance, $\tau$, is $\hat{\tau}$ = 4360 scores. Although we cannot be certain how close $\hat{\tau}$ is to $\tau$, we are reasonably confident that the error of estimation is less than 857.5313 scores.

(b) x = the number of staff members whose improvement in sales performance is over 40 = 6
Estimate of the population proportion
$$\hat{p} = \frac{x}{n} = \frac{6}{10} = 0.6$$
Estimated variance of $\hat{p}$
$$\hat{v}(\hat{p}) = \frac{\hat{p}\hat{q}}{n-1}\left(\frac{N-n}{N}\right), \qquad \hat{q} = 1 - \hat{p} = 1 - 0.6 = 0.4$$
$$\hat{v}(\hat{p}) = \frac{0.6 \times 0.4}{10-1}\left(\frac{100-10}{100}\right) = 0.0267 \times 0.9 = 0.0240$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\hat{p})} = 2\sqrt{0.0240} = 0.3098$$
Thus we estimate that 0.6 (60%) of the staff have an improvement in sales performance over 40, with a bound on the error of estimation of B = 0.3098 (30.98%).
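Part (b) of the training example can be reproduced with the sketch below. It assumes the ten observed scores and the cut-off of 40 from the example; the function name `srs_proportion` and its `threshold` parameter are illustrative choices, not notation from the notes.

```python
from math import sqrt

def srs_proportion(sample, N, threshold):
    """Estimate the proportion of units whose value exceeds `threshold`
    under simple random sampling, with a 2-standard-error bound."""
    n = len(sample)
    x = sum(1 for y in sample if y > threshold)  # units with the attribute
    p_hat = x / n
    q_hat = 1 - p_hat
    # Estimated variance uses divisor n - 1 and the finite population correction
    v_hat = (p_hat * q_hat / (n - 1)) * ((N - n) / N)
    return p_hat, v_hat, 2 * sqrt(v_hat)

scores = [38, 45, 55, 25, 48, 50, 70, 20, 40, 45]
p_hat, v_hat, bound = srs_proportion(scores, N=100, threshold=40)
print(round(p_hat, 4), round(v_hat, 4), round(bound, 4))
```

Note that the comparison is strict, so the staff member with a score of exactly 40 is not counted as "over 40".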
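Sample Size for Mean, Total
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$$
where
$$D = \frac{B^2}{4} \text{ for the mean}, \qquad D = \frac{B^2}{4N^2} \text{ for the total}$$

Sample Size for Proportion
$$n = \frac{Npq}{(N-1)D + pq}$$
where
$$D = \frac{B^2}{4}, \qquad q = 1 - p$$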
Example 2.4
An investigator is interested in estimating the total weight gain in 0 to 4 weeks for N = 1000 chicks fed on a new ration. Obviously it would be time-consuming and tedious to weigh each bird. Therefore, determine the number of chicks to be sampled in this study in order to estimate $\tau$ with a bound on the error of estimation equal to 1000 grams. Many similar studies on chick nutrition have been run in the past. Using data from these studies, the investigator found that $\sigma^2$, the population variance, was approximately equal to 36.00 (grams)². Determine the required sample size.

Solution
$\sigma^2$ = 36.00, N = 1000 chicks, B = 1000 grams
$$D = \frac{B^2}{4N^2} = \frac{(1000)^2}{4(1000)^2} = 0.25$$
$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2} = \frac{(1000)(36)}{(1000-1)(0.25) + 36} = 125.98 \approx 126 \text{ chicks}$$
∴ The required sample size is 126 chicks.
Example
Determine the sample size required to estimate a population proportion p for N = 2000 students with a bound on the error of estimation of B = 0.05, taking p = 0.5.

Solution
N = 2000, B = 0.05, p = 0.5
q = 1 - p = 1 - 0.5 = 0.5
$$D = \frac{B^2}{4} = \frac{(0.05)^2}{4} = 0.000625$$
$$n = \frac{Npq}{(N-1)D + pq} = \frac{(2000)(0.5)(0.5)}{(1999)(0.000625) + (0.5)(0.5)} = \frac{500}{1.499} = 333.56 \approx 334 \text{ students}$$
∴ The required sample size is 334 students.
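The sample-size formulas for the mean, total and proportion differ only in D and in what plays the role of the variance (σ² or pq). A minimal sketch, assuming the inputs of the two worked examples; the function name `sample_size` and its `target` argument are hypothetical:

```python
from math import ceil

def sample_size(N, variance, B, target="mean"):
    """Return (n, n rounded up) for a bound B on the error of estimation.

    `variance` is sigma^2 for a mean or total, and p*q for a proportion.
    """
    if target == "total":
        D = B ** 2 / (4 * N ** 2)
    else:  # "mean" and "proportion" both use D = B^2 / 4
        D = B ** 2 / 4
    n = N * variance / ((N - 1) * D + variance)
    return n, ceil(n)  # round up so the bound is not exceeded

# Chick weight-gain example: total, sigma^2 = 36, B = 1000 grams
n_chicks = sample_size(1000, 36.0, 1000, target="total")
# Proportion example: N = 2000 students, p = q = 0.5, B = 0.05
n_students = sample_size(2000, 0.5 * 0.5, 0.05, target="proportion")
print(n_chicks, n_students)
```

Rounding up rather than to the nearest integer is a conservative design choice: a sample one unit too small would give a bound slightly larger than the one requested.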
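What is Stratified Random Sampling?
A stratified random sample is one obtained by separating the population elements into nonoverlapping groups, called strata, and then selecting a simple random sample from each stratum.

Estimators of the Population Mean, Total and Proportion

For the mean
Estimate of the population mean
$$\bar{y}_{st} = \frac{1}{N}\sum N_i\bar{y}_i$$
Estimated variance of $\bar{y}_{st}$
$$\hat{v}(\bar{y}_{st}) = \frac{1}{N^2}\sum N_i^2\left(\frac{N_i-n_i}{N_i}\right)\frac{s_i^2}{n_i}$$
Relative standard error of the estimate
$$RSE = \frac{\sqrt{\hat{v}(\bar{y}_{st})}}{\bar{y}_{st}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\bar{y}_{st})}$$

For the total
Estimate of the population total
$$\hat{\tau}_{st} = N\bar{y}_{st}$$
Estimated variance of $\hat{\tau}_{st}$
$$\hat{v}(\hat{\tau}_{st}) = N^2\,\hat{v}(\bar{y}_{st})$$
Relative standard error of the estimate
$$RSE = \frac{\sqrt{\hat{v}(\hat{\tau}_{st})}}{\hat{\tau}_{st}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\hat{\tau}_{st})}$$

For the proportion
Estimate of the population proportion
$$\hat{p}_{st} = \frac{1}{N}\sum N_i\hat{p}_i$$
Estimated variance of $\hat{p}_{st}$
$$\hat{v}(\hat{p}_{st}) = \frac{1}{N^2}\sum N_i^2\left(\frac{N_i-n_i}{N_i}\right)\frac{\hat{p}_i\hat{q}_i}{n_i-1}$$
Relative standard error of the estimate
$$RSE = \frac{\sqrt{\hat{v}(\hat{p}_{st})}}{\hat{p}_{st}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\hat{p}_{st})}$$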
Example 3.1
The following data on man-hours lost were obtained from sampling 18 laborers, 10 technicians and 2 administrators, with N1 = 132, N2 = 92 and N3 = 27.

Stratum   Population size   Sample size   Sample mean   Sample variance
I         132               18            8.8333        81.5594
II        92                10            6.7           50.4556
III       27                2             4.5           24.5

Using the data, estimate the average and total man-hours lost per employee.

Solution

Stratum   N_i   n_i   ȳ_i      s_i^2     N_i ȳ_i     (N_i - n_i)/n_i   N_i [(N_i - n_i)/n_i] s_i^2
I         132   18    8.8333   81.5594   1165.9956   6.3333            68183.2995
II        92    10    6.7      50.4556   616.4       8.2               38063.7046
III       27    2     4.5      24.5      121.5       12.5              8268.75
Total     251   30                       1903.8956                     114515.7541

Estimate of the population mean
$$\bar{y}_{st} = \frac{1}{N}\sum N_i\bar{y}_i = \frac{1}{251}(1903.8956) = 7.5852 \text{ hours}$$
Estimated variance of $\bar{y}_{st}$
$$\hat{v}(\bar{y}_{st}) = \frac{1}{N^2}\sum N_i\left(\frac{N_i-n_i}{n_i}\right)s_i^2 = \frac{1}{(251)^2}(114515.7541) = 1.8177$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\bar{y}_{st})} = 2\sqrt{1.8177} = 2.6964 \text{ hours}$$
∴ Thus we estimate the average man-hours lost per employee to be $\bar{y}_{st}$ = 7.5852 hours. The error of estimation should be less than B = 2.6964 hours.

Estimate of the population total
$$\hat{\tau}_{st} = N\bar{y}_{st} = 251 \times 7.5852 = 1903.8852 \text{ hours}$$
Estimated variance of $\hat{\tau}_{st}$
$$\hat{v}(\hat{\tau}_{st}) = N^2\,\hat{v}(\bar{y}_{st}) = (251)^2(1.8177) = 114516.9177$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\hat{\tau}_{st})} = 2\sqrt{114516.9177} = 676.8070 \text{ hours}$$
∴ Thus we estimate the total man-hours lost to be $\hat{\tau}_{st}$ = 1903.8852 hours. The error of estimation should be less than B = 676.8070 hours.
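The stratum-by-stratum arithmetic in Example 3.1 lends itself to a loop. The following is a sketch under the assumption that each stratum is summarized by its size, sample size, sample mean and sample variance, as in the example table; the function name `stratified_mean` and the tuple layout are illustrative.

```python
from math import sqrt

def stratified_mean(strata):
    """Stratified estimate of the mean.

    `strata` is a list of (N_i, n_i, ybar_i, s2_i) tuples.
    Returns (N, estimated mean, estimated variance of the mean).
    """
    N = sum(N_i for N_i, _, _, _ in strata)
    mean_st = sum(N_i * ybar for N_i, _, ybar, _ in strata) / N
    v_st = sum(
        N_i ** 2 * ((N_i - n_i) / N_i) * (s2 / n_i)
        for N_i, n_i, _, s2 in strata
    ) / N ** 2
    return N, mean_st, v_st

# (N_i, n_i, sample mean, sample variance) for strata I, II, III
strata = [(132, 18, 8.8333, 81.5594), (92, 10, 6.7, 50.4556), (27, 2, 4.5, 24.5)]
N, mean_st, v_st = stratified_mean(strata)
bound_mean = 2 * sqrt(v_st)
total_st = N * mean_st
bound_total = N * bound_mean  # equals 2 * sqrt(N^2 * v_st)
print(round(mean_st, 4), round(v_st, 4), round(bound_mean, 4))
print(round(total_st, 2), round(bound_total, 2))
```

Small differences from the hand calculation in the last decimal places come from intermediate rounding in the table, not from a different formula.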
Example 3.2
The advertising firm wanted to estimate the proportion of households in the country of the example that view show X. The country is divided into three strata: town A, town B, and the rural area. The strata contain N1 = 155, N2 = 62, and N3 = 93 households, respectively. A stratified random sample of n = 40 households is chosen with proportional allocation. In other words, a simple random sample is taken from each stratum; the sizes of the samples are n1 = 20, n2 = 8, and n3 = 12. Interviews are conducted in the 40 sampled households; the results are shown in the table. Estimate the proportion of households viewing show X and place a bound on the error of estimation.

Stratum   Population size   Sample size   Number of households viewing show X
1         N1 = 155          n1 = 20       16
2         N2 = 62           n2 = 8        2
3         N3 = 93           n3 = 12       6

Solution

Stratum      N_i   n_i   x_i   p̂_i    q̂_i    p̂_i q̂_i   (N_i - n_i)/(n_i - 1)   N_i [(N_i - n_i)/(n_i - 1)] p̂_i q̂_i   N_i p̂_i
Town A       155   20    16    0.8    0.2    0.16      7.1053                  176.2114                              124
Town B       62    8     2     0.25   0.75   0.1875    7.7143                  89.6787                               15.5
Rural Area   93    12    6     0.5    0.5    0.25      7.3636                  171.2037                              46.5
Total        310   40                                                          437.0938                              186
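The table totals feed directly into the stratified proportion formulas: the last column gives the numerator of the estimate and the preceding column gives the numerator of its variance. The sketch below carries the calculation through from the raw counts; the function name `stratified_proportion` and the tuple layout are assumptions made for illustration.

```python
from math import sqrt

def stratified_proportion(strata):
    """Stratified estimate of a proportion.

    `strata` is a list of (N_i, n_i, x_i) tuples, where x_i is the number
    of sampled units in stratum i that have the attribute.
    Returns (estimate, estimated variance, 2-standard-error bound).
    """
    N = sum(N_i for N_i, _, _ in strata)
    weighted_sum = 0.0   # sum of N_i * p_i
    variance_sum = 0.0   # sum of N_i^2 * fpc_i * p_i * q_i / (n_i - 1)
    for N_i, n_i, x_i in strata:
        p_i = x_i / n_i
        q_i = 1 - p_i
        weighted_sum += N_i * p_i
        variance_sum += N_i ** 2 * ((N_i - n_i) / N_i) * (p_i * q_i / (n_i - 1))
    p_st = weighted_sum / N
    v_st = variance_sum / N ** 2
    return p_st, v_st, 2 * sqrt(v_st)

# (N_i, n_i, households viewing show X) for town A, town B, rural area
households = [(155, 20, 16), (62, 8, 2), (93, 12, 6)]
p_st, v_st, bound = stratified_proportion(households)
print(round(p_st, 4), round(v_st, 4), round(bound, 4))
```

With these inputs the estimate is 186/310, and the variance is the column total of about 437.09 divided by 310 squared.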
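SYSTEMATIC SAMPLING
What is systematic sampling?
A sample obtained by randomly selecting one element from the first k elements in the frame and every kth element thereafter is called a 1-in-k systematic sample.

Estimators of the Population Mean, Total and Proportion
N = nk

For the mean
Estimate of the population mean $\mu$
$$\bar{y}_{sy} = \frac{\sum_{i=1}^{n} y_i}{n}$$
Estimated variance of $\bar{y}_{sy}$
$$\hat{V}(\bar{y}_{sy}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n}$$
Relative standard error of estimation
$$RSE = \frac{\sqrt{\hat{V}(\bar{y}_{sy})}}{\bar{y}_{sy}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\bar{y}_{sy})}$$

For the total
Estimate of the population total
$$\hat{\tau}_{sy} = N\bar{y}_{sy}$$
Estimated variance of $\hat{\tau}_{sy}$
$$\hat{V}(\hat{\tau}_{sy}) = N^2\,\hat{V}(\bar{y}_{sy})$$
Relative standard error of estimation
$$RSE = \frac{\sqrt{\hat{V}(\hat{\tau}_{sy})}}{\hat{\tau}_{sy}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\hat{\tau}_{sy})}$$

For the proportion
Estimate of the population proportion p
$$\hat{p}_{sy} = \bar{y}_{sy} = \frac{\sum_{i=1}^{n} y_i}{n}$$
Estimated variance of $\hat{p}_{sy}$
$$\hat{V}(\hat{p}_{sy}) = \frac{\hat{p}_{sy}\hat{q}_{sy}}{n-1}\left(\frac{N-n}{N}\right), \qquad \hat{q}_{sy} = 1 - \hat{p}_{sy}$$
Relative standard error of estimation
$$RSE = \frac{\sqrt{\hat{V}(\hat{p}_{sy})}}{\hat{p}_{sy}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\hat{p}_{sy})}$$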
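The 1-in-k selection rule itself, a random start among the first k elements followed by every kth element, can be sketched as follows. The function name `systematic_sample`, the frame of integer labels, and the seed are illustrative assumptions; the frame size of 371 and k = 7 mirror the junior-college example in these notes.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select a 1-in-k systematic sample from an ordered frame.

    A random start is chosen among the first k elements, then every
    kth element after it is taken.
    """
    rng = random.Random(seed)
    start = rng.randrange(k)   # 0-based position within the first k elements
    return frame[start::k]

# Hypothetical frame: an ordered list of 371 unit labels
frame = list(range(1, 372))
sample = systematic_sample(frame, k=7, seed=0)
print(len(sample), sample[:5])
```

Because 371 = 53 × 7, every possible start yields exactly n = 53 units here; when N is not a multiple of k, the sample size can vary by one depending on the start.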
Example 4.3
A 1-in-6 systematic sample is obtained from a voter registration list to estimate the proportion of voters in favor of the proposed bond issue. Several different random starting points are used to ensure that the results of the sample are not affected by periodic variation in the population. The coded results of this pre-election survey are as shown in the table. Estimate p, the proportion of the 5775 registered voters in favor of the proposed bond issue (N = 5775). Place a bound on the error of estimation.

Voter     Response
4         1
10        0
16        1
.         .
.         .
5760      0
5766      0
5772      1
Total     $\sum_{i=1}^{962} y_i = 652$

Solution
Estimated population proportion
$$\hat{p}_{sy} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{652}{962} = 0.678$$
Estimated variance of $\hat{p}_{sy}$
$$\hat{V}(\hat{p}_{sy}) = \frac{\hat{p}_{sy}\hat{q}_{sy}}{n-1}\left(\frac{N-n}{N}\right) = \frac{(0.678)(0.322)}{961}\left(\frac{5775-962}{5775}\right) = 0.00019$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\hat{p}_{sy})} = 0.0275$$
Thus we estimate that 0.678 (67.8%) of the registered voters favor the proposed bond issue. We are relatively confident that the error of estimation is less than 0.0275 (2.75%).
Example
A group of guidance counselors is concerned about the average yearly tuition for out-of-state students in 371 junior colleges. From an alphabetical list of these colleges, a 1-in-7 systematic sample is drawn. Data concerning out-of-state tuition expenses for an academic year (September to June) are obtained for each college sampled. Use the following data summary:
$$\sum y_i = \$11950, \qquad \sum y_i^2 = 2731037$$
(a) Estimate the mean yearly tuition and place a bound on the error of estimation.

Solution
k = 7 (1-in-7 systematic sample), N = 371, $\sum y_i = 11950$, $\sum y_i^2 = 2731037$
N = nk, so
$$n = \frac{N}{k} = \frac{371}{7} = 53$$
(a) Estimate of the population mean
$$\bar{y}_{sy} = \frac{\sum y_i}{n} = \frac{11950}{53} = \$225.47$$
Sample variance
$$s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1} = \frac{2731037 - 53(225.47)^2}{53-1} = 705.5922$$
Estimated variance of $\bar{y}_{sy}$
$$\hat{v}(\bar{y}_{sy}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{705.5922}{53}\left(\frac{371-53}{371}\right) = 13.3131 \times 0.8571 = 11.4107$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{v}(\bar{y}_{sy})} = 2\sqrt{11.4107} = \$6.7559$$
Thus we estimate the mean yearly tuition to be $\bar{y}_{sy}$ = $225.47. The error of estimation should be less than B = $6.7559.
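CLUSTER SAMPLING
What is Cluster Sampling?
A cluster sample is a simple random sample in which each sampling unit is a collection, or cluster, of elements.

Notation
N = the number of clusters in the population
n = the number of clusters selected in a simple random sample
$m_i$ = the number of elements in cluster i, i = 1, ..., N
$\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ = the average cluster size for the sample
$M = \sum_{i=1}^{N} m_i$ = the number of elements in the population
$\bar{M} = M/N$ = the average cluster size for the population. If M is not known, $\bar{M}$ must be estimated by $\bar{m}$.
$y_i$ = the total of all observations in the ith cluster

For the mean
Estimate of the population mean $\mu$
$$\bar{y}_{cl} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i}$$
Estimated variance of $\bar{y}_{cl}$
$$\hat{V}(\bar{y}_{cl}) = \left(\frac{N-n}{Nn\bar{M}^2}\right)\frac{\sum_{i=1}^{n}(y_i - \bar{y}_{cl}m_i)^2}{n-1}$$
where
$$\sum_{i=1}^{n}(y_i - \bar{y}_{cl}m_i)^2 = \sum y_i^2 - 2\bar{y}_{cl}\sum y_i m_i + \bar{y}_{cl}^2\sum m_i^2$$
Relative standard error of estimation
$$RSE = \frac{\sqrt{\hat{V}(\bar{y}_{cl})}}{\bar{y}_{cl}} \times 100$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\bar{y}_{cl})}$$

For the total
Estimator of the population total $\tau$
$$\hat{\tau}_{cl} = M\bar{y}_{cl}$$
Estimated variance of $\hat{\tau}_{cl}$
$$\hat{V}(\hat{\tau}_{cl}) = \hat{V}(M\bar{y}_{cl}) = M^2\,\hat{V}(\bar{y}_{cl})$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\hat{\tau}_{cl})}$$

For the proportion
Estimator of the population proportion p, where $a_i$ is the number of elements in cluster i that have the characteristic of interest
$$\hat{p}_{cl} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} m_i}$$
Estimated variance of $\hat{p}_{cl}$
$$\hat{V}(\hat{p}_{cl}) = \left(\frac{N-n}{Nn\bar{M}^2}\right)\frac{\sum_{i=1}^{n}(a_i - \hat{p}_{cl}m_i)^2}{n-1}$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\hat{p}_{cl})}$$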
Example 5.2
Interviews are conducted in each of the 25 blocks sampled in the example. The data on income for adult males are presented in the table. Use the data to estimate the average income per adult male in the city and place a bound on the error of estimation. Assume N = 415 and M = 2500.
$$\sum m_i = 151, \quad \sum y_i = 1,329,000, \quad \sum m_i^2 = 1,047, \quad \sum y_i^2 = 82,039,000,000, \quad \sum m_i y_i = 8,403,000$$

Solution
N = 415, M = 2500, n = 25
Estimate of the population mean
$$\bar{y}_{cl} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i} = \frac{1,329,000}{151} = \$8,801$$
Estimated variance of $\bar{y}_{cl}$
$$\hat{V}(\bar{y}_{cl}) = \left(\frac{N-n}{Nn\bar{M}^2}\right)\frac{\sum_{i=1}^{n}(y_i - \bar{y}_{cl}m_i)^2}{n-1}$$
The average cluster size is estimated from the sample:
$$\bar{m} = \frac{\sum_{i=1}^{n} m_i}{n} = \frac{151}{25} = 6.04$$
The sum of squares is
$$\sum_{i=1}^{25}(y_i - \bar{y}_{cl}m_i)^2 = \sum y_i^2 - 2\bar{y}_{cl}\sum y_i m_i + \bar{y}_{cl}^2\sum m_i^2$$
$$= 82,039,000,000 - 2(8,801)(8,403,000) + (8,801)^2(1,047) = 15,227,502,247$$
so that
$$\hat{V}(\bar{y}_{cl}) = \left(\frac{415-25}{(415)(25)(6.04)^2}\right)\frac{15,227,502,247}{24} = 653,785$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\bar{y}_{cl})} = 2\sqrt{653,785} = \$1,617$$
Thus we estimate the average income per adult male to be $\bar{y}_{cl}$ = $8,801. The error of estimation should be less than B = $1,617.
Example 5.3
Use the data in Example 5.2 to estimate the total income of all adult males in the city, and place a bound on the error of estimation. There are 2,500 adult males in the city.

Solution
M = 2500, $\bar{y}_{cl}$ = $8,801, $\hat{V}(\bar{y}_{cl})$ = 653,785
Estimator of the population total $\tau$
$$\hat{\tau}_{cl} = M\bar{y}_{cl} = 2,500(8,801) = \$22,002,500$$
Estimated variance of $\hat{\tau}_{cl}$
$$\hat{V}(\hat{\tau}_{cl}) = \hat{V}(M\bar{y}_{cl}) = M^2\,\hat{V}(\bar{y}_{cl}) = (2500)^2(653,785) = 4,086,156,250,000$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\hat{\tau}_{cl})} = 2\sqrt{4,086,156,250,000} \approx \$4,042,849$$
Thus we estimate the total income of all adult males to be $\hat{\tau}_{cl}$ = $22,002,500. The error of estimation should be less than B = $4,042,849.
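The cluster estimates of the mean and total can be computed from the five summary sums alone. The sketch below assumes the sums quoted in Example 5.2 and, like that example, estimates the average cluster size by the sample average; the function name `cluster_mean_total` and its argument names are hypothetical. The code keeps the unrounded mean, so its figures may differ slightly from a hand calculation that rounds the mean to $8,801 first.

```python
from math import sqrt

def cluster_mean_total(N, n, M, sum_y, sum_m, sum_y2, sum_m2, sum_my):
    """Ratio-type cluster estimates of the mean and total from summary sums."""
    y_cl = sum_y / sum_m            # estimated mean per element
    m_bar = sum_m / n               # sample average cluster size, used for M-bar
    # Expanded form of sum (y_i - y_cl * m_i)^2
    ss = sum_y2 - 2 * y_cl * sum_my + y_cl ** 2 * sum_m2
    v_mean = (N - n) / (N * n * m_bar ** 2) * ss / (n - 1)
    bound_mean = 2 * sqrt(v_mean)
    return {
        "mean": y_cl,
        "v_mean": v_mean,
        "bound_mean": bound_mean,
        "total": M * y_cl,
        "v_total": M ** 2 * v_mean,
        "bound_total": M * bound_mean,   # equals 2 * sqrt(M^2 * v_mean)
    }

est = cluster_mean_total(
    N=415, n=25, M=2500,
    sum_y=1_329_000, sum_m=151,
    sum_y2=82_039_000_000, sum_m2=1_047, sum_my=8_403_000,
)
for name, value in est.items():
    print(f"{name}: {value:,.0f}")
```

The bound for the total is M times the bound for the mean, which gives a quick sanity check on the order of magnitude of any hand-computed bound.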
Example 5.5
In addition to the information on income, the adult males in the sample survey of Example 5.2 are asked whether they rent or own their homes. The results are given in the table. Use the data in the table to estimate the proportion of adult males in the city who rent their homes. Place a bound on the error of estimation.
$$\sum_{i=1}^{25} a_i^2 = 262, \quad \sum_{i=1}^{25} m_i^2 = 1047, \quad \sum_{i=1}^{25} a_i m_i = 511, \quad \sum_{i=1}^{25} m_i = 151, \quad \sum_{i=1}^{25} a_i = 72$$

Solution
N = 415, M = 2500, n = 25
Estimate of the population proportion
$$\hat{p}_{cl} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} m_i} = \frac{72}{151} = 0.477$$
Estimated variance of $\hat{p}_{cl}$
$$\hat{V}(\hat{p}_{cl}) = \left(\frac{N-n}{Nn\bar{M}^2}\right)\frac{\sum_{i=1}^{n}(a_i - \hat{p}_{cl}m_i)^2}{n-1}$$
The sum of squares is
$$\sum_{i=1}^{n}(a_i - \hat{p}_{cl}m_i)^2 = \sum a_i^2 - 2\hat{p}_{cl}\sum a_i m_i + \hat{p}_{cl}^2\sum m_i^2 = 262 - 2(0.477)(511) + (0.477)^2(1047) = 12.729$$
The average cluster size is estimated from the sample:
$$\bar{m} = \frac{\sum_{i=1}^{n} m_i}{n} = \frac{151}{25} = 6.04$$
so that
$$\hat{V}(\hat{p}_{cl}) = \frac{(415-25)(12.729)}{(415)(25)(6.04)^2(24)} = 0.00055$$
Bound on the error of estimation
$$B = 2\sqrt{\hat{V}(\hat{p}_{cl})} = 2\sqrt{0.00055} = 0.0469$$
Thus the best estimate of the proportion of adult males who rent their homes is 0.477. The error of estimation should be less than B = 0.0469.