
Sample Survey Methodology I

Presented by Daw Nu Nu Lwin, Assistant Lecturer, Department of Statistics, Co-operative University, Sagaing

SAMPLE SURVEY METHODOLOGY
C Stats 2104, Second Year (Second Semester)

Reference
• Cochran, W.G. (1977). Sampling Techniques (3rd edition). John Wiley.
• Scheaffer, R.C., Mendenhall, W. and Ott, L. (1990). Elementary Survey Sampling (2nd edition). Duxbury Press.

• Introduction
  o Type of research
  o Sources of data
  o Method of data collection
  o Questionnaire design
  o Criteria and choice of sample design
• Simple Random Sampling
  o What is simple random sampling?
  o Estimation of a population mean, total and proportion
  o Sample size for mean, total and proportion
• Stratified Random Sampling
  o What is stratified random sampling?
  o Estimation of a population mean, total and proportion
• Systematic Random Sampling
  o What is a systematic random sample?
  o Estimation of a population mean, total and proportion
• Cluster Random Sampling
  o What is a cluster random sample?
  o Estimation of a population mean, total and proportion

Type of Research
• The topic of the research study is sales promotion strategy, and the nature of the topic is theoretical and descriptive.
• To conduct this study, the suitable type of research is therefore descriptive research.
• The data are collected from sales records, dealers, customers and salesmen of companies operating in the FMCG (Fast-Moving Consumer Goods) sector.
• Descriptive research meets the requirements of the research study.

Classification of data
Types of data: primary data and secondary data.

Primary data
• Data collected from first-hand experience are known as primary data. They are more reliable and authentic, and have not been published anywhere.
• Primary data have not been changed or altered by human beings; therefore their validity is greater than that of secondary data.

Advantages of primary data
• Targeted issues are addressed
• Data interpretation is better
• Efficient spending for information
• Decency of data
• Proprietary issue
• Addresses specific research issues
• Greater control

Disadvantages of primary data
• High cost
• Time consuming
• Inaccurate feedback
• More resources are required

Secondary data
• Secondary data are those that have been collected by others.
• They are usually found in journals, periodicals, research publications, official records, etc.
• Secondary data may be available in published or unpublished form. When it is not possible to collect the data by the primary method, the investigator turns to the secondary method.
• These data were collected for some purpose other than the problem at hand.

Secondary data sources
(i) Internal sources
(ii) External sources

(i) Internal sources
Internal sources of secondary data are usually for marketing applications:
• Sales records
• Marketing activity
• Cost information
• Distributor reports and feedback
• Customer feedback

(ii) External sources
External sources of secondary data are usually for financial applications:
• Journals
• Books
• Magazines
• Newspapers
• Libraries
• The internet

Advantages of secondary data
• Ease of access
• Low cost to acquire
• Clarification of the research question
• May answer the research question

Disadvantages of secondary data
• Not specific to the researcher's needs
• Incomplete information
• Not timely

Reference: Amogh Kadam, Rizwan Shaikh, Prathmesh Parab

Method of data collection
(i) Direct Personal Interview
(ii) Questionnaires Sent Through Mail
(iii) Interviews by Enumerators
(iv) Telephone Interview
(v) Questionnaire vs. Schedule

(i) Direct Personal Interview
• Widely used in social and economic surveys.
• The investigator personally contacts the respondents and can obtain the required data fairly accurately.
• The interviewer asks the questions pertaining to the objectives of the survey and records the information.
• The response rate is usually good and the information is more reliable and correct.
• However, more expense and time are required to contact the respondents.

(ii) Questionnaires Sent Through Mail
• The investigator prepares a questionnaire and sends it to the respondents.
• Respondents are requested to complete the questionnaire and return it to the investigator by a specified date.
• Suitable where respondents are spread over a wide area.
• Less expensive, but it normally has a poor response rate.
• Adopted only where the respondents are literate and can understand the questions.
• Its success depends on how the questionnaire is drafted and on the extent to which the willing cooperation of the respondents is secured.
• For rural areas, this method has obvious limitations and is seldom used.

(iii) Interviews by Enumerators
• Involves the appointment of enumerators by the surveying agency.
• Enumerators go to the respondents, ask them the questions contained in the schedule, and then fill in the responses in the schedule themselves.
• For this method to succeed, the enumerators should be given proper training in soliciting the cooperation of the respondents.
• The enumerators should be asked to carry their identity cards, so that the respondents are satisfied of their authenticity.
• Usefully employed where the respondents to be covered are illiterate.

(iv) Telephone Interview
• Where the respondents in the population to be covered can be approached by phone, their responses to the questions included in the schedule can be obtained over the phone.
• If long-distance calls are not involved and only local calls are to be made, this mode of collecting data may also prove quite economical.
• It is, however, desirable that interviews conducted over the phone are kept short so as to maintain the interest of the respondents.

(v) Questionnaire vs. Schedule
• In the questionnaire approach, the informants or respondents are asked pre-specified questions and their replies are recorded by themselves or by investigators.
• In this case, the investigator is not supposed to influence the respondents. This approach is widely used in mail enquiries.
• In the schedule approach, the exact form of the questions to be asked is not given, and the task of questioning and soliciting information is left to the investigator, who, backed by training and instructions, has to use his ingenuity in explaining the concepts and definitions to the informant in order to obtain reliable information.

Determining the type of question
(i) Open-ended questions
(ii) Closed-ended questions

(i) Open-ended questions
These give respondents the ability to respond in their own words. Some illustrations of this type are listed below:
• What is your age?
• How much orange juice does this bottle contain?
• Which is your favorite series?
• Why do you smoke Gold Flakes cigarettes?
• I like Nescafe because ----------------.
• My career goal is to ---------------.
• I think hybrid cars are ---------------.

(ii) Closed-ended questions
These allow the subject to choose one of the given alternatives.
• Dichotomous questions
• Multiple choice
• Scales

Figure (1.1) Types of question response option. Question content divides into open-ended and closed-ended; closed-ended divides into dichotomous, multiple responses and scales.

Dichotomous questions
• Are you diabetic? Yes / No
• What kind of cola do you drink? Normal / Diet
• Your working hours in the organization are: fixed / flexible

Multiple choice
For example: Will you consider selling organic food products in your store?
• Definitely not in the next one year
• Probably not in the next one year
• Undecided
• Probably in the next one year
• Definitely in the next one year

Scales
Each statement is rated from 1 (strongly disagree) to 5 (strongly agree).
1. The people in my company know their roles very clearly.    1  2  3  4  5
2. I feel the need for the organization to change.            1  2  3  4  5
3. Existing systems are very effective.                       1  2  3  4  5

Figure (1.2) Questionnaire Design Process
1. Convert the research objectives into the information needed
2. Method of administering the questionnaire
3. Content of the questions
4. Motivating the respondent to answer
5. Determining types of questions
6. Question design criteria
7. Determine the questionnaire structure
8. Physical presentation of the questionnaire
9. Pilot testing the questionnaire
10. Administering the questionnaire

Criteria and Choice of Sample Design
These six criteria are:
• Population specification
• Unit of analysis specification
• Sample size determination
• Selection procedure description
• Response rate and nonresponse treatment
• Estimation procedures

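2.1 What Is Simple Random Sampling?
Simple random sampling is a probability sampling procedure that gives every element in the target population, and each possible sample of a given size, an equal chance of being selected. As such, it is an equal probability selection method (EPSEM).

2.2 Estimation of a Population Mean, Total and Proportion

Estimate of the population mean (the sample mean)
\[ \hat{\bar{Y}} = \bar{y} = \frac{\sum y_i}{n} \]
Sample variance
\[ s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1} \]
Estimated variance of the sample mean
\[ \hat{v}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) \]
Relative standard error of the estimate
\[ \widehat{RSE} = \frac{\sqrt{\hat{v}(\bar{y})}}{\bar{y}} \times 100 \]
Bound on the error of estimation for the mean
\[ B = 2\sqrt{\hat{v}(\bar{y})} \]

Estimate of the population total
\[ \hat{\tau} = N\bar{y} \]
Estimated variance of the total
\[ \hat{v}(\hat{\tau}) = \hat{v}(N\bar{y}) = N^2\,\hat{v}(\bar{y}) \]
Relative standard error of the estimate
\[ \widehat{RSE} = \frac{\sqrt{\hat{v}(\hat{\tau})}}{\hat{\tau}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{\tau})} \]

Estimate of the population proportion P
\[ \hat{P} = \frac{\sum y_i}{n} \quad \text{(or)} \quad \hat{P} = \frac{x}{n} \]
Estimated variance of \(\hat{P}\)
\[ \hat{v}(\hat{P}) = \frac{\hat{p}\hat{q}}{n-1}\left(\frac{N-n}{N}\right) \]
Relative standard error of the estimate
\[ \widehat{RSE} = \frac{\sqrt{\hat{v}(\hat{P})}}{\hat{P}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{P})} \]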
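The simple random sampling estimators listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `srs_mean_total` and `srs_proportion` and the returned dictionary keys are assumptions made here, not part of the course material, and the ten sample values with N = 100 are used purely as illustrative input.

```python
import math


def srs_mean_total(y, N):
    """Estimate the mean and total from a simple random sample y of a population of size N."""
    n = len(y)
    ybar = sum(y) / n
    # Sample variance: s^2 = (sum of y_i^2 - n * ybar^2) / (n - 1)
    s2 = (sum(v * v for v in y) - n * ybar ** 2) / (n - 1)
    # Estimated variance of the sample mean, with the finite population correction (N - n) / N
    var_mean = (s2 / n) * ((N - n) / N)
    # Variance of the total follows from tau_hat = N * ybar
    var_total = N ** 2 * var_mean
    return {
        "mean": ybar,
        "s2": s2,
        "var_mean": var_mean,
        "rse_mean": 100 * math.sqrt(var_mean) / ybar,
        "bound_mean": 2 * math.sqrt(var_mean),
        "total": N * ybar,
        "var_total": var_total,
        "bound_total": 2 * math.sqrt(var_total),
    }


def srs_proportion(x, n, N):
    """Estimate a proportion when x of the n sampled units have the attribute."""
    p = x / n
    q = 1 - p
    # Estimated variance of p_hat uses n - 1 in the denominator
    var_p = (p * q / (n - 1)) * ((N - n) / N)
    return {"p": p, "var_p": var_p, "bound_p": 2 * math.sqrt(var_p)}


if __name__ == "__main__":
    scores = [38, 45, 55, 25, 48, 50, 70, 20, 40, 45]  # illustrative sample, n = 10
    print(srs_mean_total(scores, N=100))
    print(srs_proportion(x=6, n=10, N=100))
```

Keeping the mean and total in one function reflects the fact that the total is just N times the mean, so its variance and bound reuse the same intermediate quantities.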

Example
A simple random sample of n = 9 hospital records is drawn to estimate the average amount of money due on N = 484 open accounts. The sample values for these nine records are listed in the table. Estimate the average and total amount outstanding, and place a bound on the error of estimation.

Record    y_i       y_i^2
Y1        33.50     1,122.25
Y2        32.00     1,024.00
Y3        52.00     2,704.00
Y4        43.00     1,849.00
Y5        40.00     1,600.00
Y6        41.00     1,681.00
Y7        45.00     2,025.00
Y8        42.50     1,806.25
Y9        39.00     1,521.00
Total     368.00    15,332.50

Solution
Estimated population mean
\[ \hat{\bar{Y}} = \bar{y} = \frac{\sum y_i}{n} = \frac{368}{9} = 40.89 \]
that is, $40.89.

Sample variance
\[ s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1} = \frac{15332.50 - 9(40.89)^2}{9-1} = \frac{15332.50 - 15047.9289}{8} = 35.57 \]

Estimated variance of the sample mean
\[ \hat{v}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{35.57}{9}\left(\frac{484-9}{484}\right) = 3.95 \times 0.98 = 3.87 \]

Bound on the error of estimation for the mean
\[ B = 2\sqrt{\hat{v}(\bar{y})} = 2\sqrt{3.87} = 3.93 \]
To summarize, the estimate of the mean amount of money owed per account, \(\mu\), is \(\bar{y}\) = $40.89. Although we cannot be certain how close \(\bar{y}\) is to \(\mu\), we are reasonably confident that the error of estimation is less than $3.93.

Estimated population total
\[ \hat{\tau} = N\bar{y} = (484)(40.89) = 19790.76 \]
Estimated variance of the total
\[ \hat{v}(\hat{\tau}) = \hat{v}(N\bar{y}) = N^2\,\hat{v}(\bar{y}) = (484)^2(3.87) = 906570.72 \]
Bound on the error of estimation for the total
\[ B = 2\sqrt{\hat{v}(\hat{\tau})} = 2\sqrt{906570.72} = 1904.28 \]
To summarize, the estimate of the total amount of money owed on all accounts, \(\tau\), is \(\hat{\tau}\) = $19,790.76. Although we cannot be certain how close \(\hat{\tau}\) is to \(\tau\), we are reasonably confident that the error of estimation is less than $1,904.28.

Example
A training company wants to claim an improvement in sales performance for people who attend its training courses. A simple random sample of 10 staff was interviewed from 100 staff. The observed data are as follows:
38 45 55 25 48 50 70 20 40 45

(a) Estimate the average and total amount of improvement in sales performance and place a bound on the error of estimation.
(b) Estimate the proportion of staff whose improvement in sales performance is over 40.

Solution
n = 10, N = 100

y_i      y_i^2
38       1444
45       2025
55       3025
25       625
48       2304
50       2500
70       4900
20       400
40       1600
45       2025
Total    436      20848

(a) Estimate of the population mean
\[ \bar{y} = \frac{\sum y_i}{n} = \frac{436}{10} = 43.6 \text{ scores} \]
Sample variance
\[ s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1} = \frac{20848 - 10(43.6)^2}{10-1} = 204.2667 \]
Estimated variance of \(\bar{y}\)
\[ \hat{v}(\bar{y}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{204.2667}{10}\left(\frac{100-10}{100}\right) = 20.4267 \times 0.9 = 18.3840 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\bar{y})} = 2\sqrt{18.3840} = 8.5753 \text{ scores} \]
To summarize, the estimate of the average improvement in sales performance, \(\mu\), is \(\bar{y}\) = 43.6 scores. Although we cannot be certain how close \(\bar{y}\) is to \(\mu\), we are reasonably confident that the error of estimation is less than 8.5753 scores.

Estimate of the population total
\[ \hat{\tau} = N\bar{y} = 100 \times 43.6 = 4360 \text{ scores} \]
Estimated variance of \(\hat{\tau}\)
\[ \hat{v}(\hat{\tau}) = \hat{v}(N\bar{y}) = N^2\,\hat{v}(\bar{y}) = (100)^2(18.3840) = 183840 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{\tau})} = 2\sqrt{183840} = 857.5313 \text{ scores} \]
To summarize, the estimate of the total improvement in sales performance, \(\tau\), is \(\hat{\tau}\) = 4360 scores. Although we cannot be certain how close \(\hat{\tau}\) is to \(\tau\), we are reasonably confident that the error of estimation is less than 857.5313 scores.

(b) x = the number of staff members whose improvement in sales performance is over 40 = 6
Estimate of the population proportion
\[ \hat{p} = \frac{x}{n} = \frac{6}{10} = 0.6 \]
Estimated variance of \(\hat{p}\), with \(\hat{q} = 1 - \hat{p} = 1 - 0.6 = 0.4\)
\[ \hat{v}(\hat{p}) = \frac{\hat{p}\hat{q}}{n-1}\left(\frac{N-n}{N}\right) = \frac{0.6 \times 0.4}{10-1}\left(\frac{100-10}{100}\right) = 0.0267 \times 0.9 = 0.0240 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{p})} = 2\sqrt{0.0240} = 0.3098 \]
Thus we estimate that 0.6 (60%) of the staff have an improvement in sales performance over 40, with a bound on the error of estimation B = 0.3098 (30.98%).

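Sample Size for Mean, Total
\[ n = \frac{N\sigma^2}{(N-1)D + \sigma^2} \]
where
\[ D = \frac{B^2}{4} \text{ for the mean}, \qquad D = \frac{B^2}{4N^2} \text{ for the total} \]

Sample Size for Proportion
\[ n = \frac{Npq}{(N-1)D + pq} \]
where
\[ D = \frac{B^2}{4} \]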
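The sample-size formulas can be checked numerically with a short Python sketch. The function names below are assumptions chosen for illustration; each returns the unrounded n, and `math.ceil` is applied afterwards because a sample size must be rounded up to a whole unit.

```python
import math


def sample_size_mean(N, sigma2, B):
    """n needed to estimate a mean with bound B: D = B^2 / 4."""
    D = B ** 2 / 4
    return N * sigma2 / ((N - 1) * D + sigma2)


def sample_size_total(N, sigma2, B):
    """n needed to estimate a total with bound B: D = B^2 / (4 N^2)."""
    D = B ** 2 / (4 * N ** 2)
    return N * sigma2 / ((N - 1) * D + sigma2)


def sample_size_proportion(N, p, B):
    """n needed to estimate a proportion with bound B: D = B^2 / 4."""
    q = 1 - p
    D = B ** 2 / 4
    return N * p * q / ((N - 1) * D + p * q)


if __name__ == "__main__":
    # Total for N = 1000 units, variance 36, bound 1000
    print(math.ceil(sample_size_total(1000, 36.0, 1000)))
    # Proportion for N = 2000 units, p = 0.5, bound 0.05
    print(math.ceil(sample_size_proportion(2000, 0.5, 0.05)))
```

The only difference between the mean and total cases is the definition of D, which is why the two functions share the same final expression.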

Example 2.4
An investigator is interested in estimating the total weight gain in 0 to 4 weeks for N = 1000 chicks fed on a new ration. Obviously it would be time-consuming and tedious to weigh each bird. Therefore, determine the number of chicks to be sampled in this study in order to estimate \(\tau\) with a bound on the error of estimation equal to 1000 grams. Many similar studies on chick nutrition have been run in the past. Using data from these studies, the investigator found that \(\sigma^2\), the population variance, was approximately equal to 36.00 (grams squared). Determine the required sample size.

Solution
\(\sigma^2\) = 36.00, N = 1000 chicks, B = 1000 grams
\[ D = \frac{B^2}{4N^2} = \frac{(1000)^2}{4(1000)^2} = 0.25 \]
\[ n = \frac{N\sigma^2}{(N-1)D + \sigma^2} = \frac{(1000)(36)}{(1000-1)(0.25) + 36} = \frac{36000}{285.75} = 125.98 \approx 126 \text{ chicks} \]
∴ The required sample size is 126 chicks.

Example
Determine the sample size required to estimate a population proportion for N = 2000 students with a bound on the error of estimation B = 0.05, using p = 0.5.

Solution
N = 2000, B = 0.05, p = 0.5, q = 1 - p = 1 - 0.5 = 0.5
\[ D = \frac{B^2}{4} = \frac{(0.05)^2}{4} = 0.000625 \]
\[ n = \frac{Npq}{(N-1)D + pq} = \frac{(2000)(0.5)(0.5)}{1999(0.000625) + (0.5)(0.5)} = \frac{500}{1.4994} = 333.47 \approx 334 \text{ students} \]
∴ The required sample size is 334 students.

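STRATIFIED RANDOM SAMPLING
What is Stratified Random Sampling?
A stratified random sample is one obtained by separating the population elements into nonoverlapping groups, called strata, and then selecting a simple random sample from each stratum.

Estimators of the population Mean, Total and Proportion
Here \(N_i\), \(n_i\), \(\bar{y}_i\) and \(s_i^2\) are the population size, sample size, sample mean and sample variance of stratum i, and \(N = \sum N_i\).

Estimate of the population mean
\[ \bar{y}_{st} = \frac{1}{N}\sum N_i\bar{y}_i \]
Estimated variance of \(\bar{y}_{st}\)
\[ \hat{v}(\bar{y}_{st}) = \frac{1}{N^2}\sum N_i^2\left(\frac{N_i-n_i}{N_i}\right)\frac{s_i^2}{n_i} \]
Relative standard error of the estimate
\[ \widehat{RSE} = \frac{\sqrt{\hat{v}(\bar{y}_{st})}}{\bar{y}_{st}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\bar{y}_{st})} \]

Estimate of the population total
\[ \hat{\tau}_{st} = N\bar{y}_{st} \]
Estimated variance of \(\hat{\tau}_{st}\)
\[ \hat{v}(\hat{\tau}_{st}) = N^2\,\hat{v}(\bar{y}_{st}) \]
Relative standard error of the estimate
\[ \widehat{RSE} = \frac{\sqrt{\hat{v}(\hat{\tau}_{st})}}{\hat{\tau}_{st}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{\tau}_{st})} \]

Estimate of the population proportion
\[ \hat{p}_{st} = \frac{1}{N}\sum N_i\hat{p}_i \]
Estimated variance of \(\hat{p}_{st}\)
\[ \hat{v}(\hat{p}_{st}) = \frac{1}{N^2}\sum N_i^2\left(\frac{N_i-n_i}{N_i}\right)\frac{\hat{p}_i\hat{q}_i}{n_i-1} \]
Relative standard error of the estimate
\[ \widehat{RSE} = \frac{\sqrt{\hat{v}(\hat{p}_{st})}}{\hat{p}_{st}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{p}_{st})} \]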
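A minimal Python sketch of the stratified estimators follows. The function names and the tuple layouts are assumptions made for illustration. The inputs in the usage section are the stratum summaries of Example 3.1 (N_i, n_i, sample mean, sample variance) and the counts of Example 3.2 (N_i, n_i, x_i) from this section.

```python
import math


def stratified_mean(strata):
    """strata: list of (N_i, n_i, ybar_i, s2_i) tuples, one per stratum."""
    N = sum(N_i for N_i, _, _, _ in strata)
    # Weighted mean: (1/N) * sum of N_i * ybar_i
    mean = sum(N_i * ybar for N_i, _, ybar, _ in strata) / N
    # Variance: (1/N^2) * sum of N_i^2 * ((N_i - n_i)/N_i) * (s_i^2 / n_i)
    var = sum(
        N_i ** 2 * ((N_i - n_i) / N_i) * (s2 / n_i)
        for N_i, n_i, _, s2 in strata
    ) / N ** 2
    return N, mean, var


def stratified_proportion(strata):
    """strata: list of (N_i, n_i, x_i) tuples; x_i = sampled units with the attribute."""
    N = sum(N_i for N_i, _, _ in strata)
    p_st = sum(N_i * (x_i / n_i) for N_i, n_i, x_i in strata) / N
    var = 0.0
    for N_i, n_i, x_i in strata:
        p_i = x_i / n_i
        var += N_i ** 2 * ((N_i - n_i) / N_i) * (p_i * (1 - p_i) / (n_i - 1))
    return N, p_st, var / N ** 2


if __name__ == "__main__":
    N, mean, var = stratified_mean(
        [(132, 18, 8.8333, 81.5594), (92, 10, 6.7, 50.4556), (27, 2, 4.5, 24.5)]
    )
    print(mean, 2 * math.sqrt(var))            # mean and its bound
    print(N * mean, 2 * N * math.sqrt(var))    # total and its bound

    N, p_st, var_p = stratified_proportion([(155, 20, 16), (62, 8, 2), (93, 12, 6)])
    print(p_st, 2 * math.sqrt(var_p))
```

Both functions return the variance rather than the bound so that the same result can be reused for the total, whose bound is N times the bound on the mean.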

Example 3.1
The following data in the table were obtained from sampling 18 laborers, 10 technicians and 2 administrators, with N1 = 132, N2 = 92 and N3 = 27.

Stratum   Population size   Sample size   Sample mean   Sample variance
I         132               18            8.8333        81.5594
II        92                10            6.7           50.4556
III       27                2             4.5           24.5

Using the data, estimate the average and total man-hours lost per employee.

Solution
Column A is \(N_i\bar{y}_i\), column B is \((N_i-n_i)/n_i\), and column C is \(N_i\,\frac{N_i-n_i}{n_i}\,s_i^2\).

Stratum   N_i    n_i   ybar_i   s_i^2     A           B         C
I         132    18    8.8333   81.5594   1165.9956   6.3333    68183.2995
II        92     10    6.7      50.4556   616.4       8.2       38063.7046
III       27     2     4.5      24.5      121.5       12.5      8268.75
Total     251    30                       1903.8956             114515.7541

Estimate of the population mean
\[ \bar{y}_{st} = \frac{1}{N}\sum N_i\bar{y}_i = \frac{1}{251}(1903.8956) = 7.5852 \text{ hours} \]
Estimated variance of \(\bar{y}_{st}\)
\[ \hat{v}(\bar{y}_{st}) = \frac{1}{N^2}\sum N_i\left(\frac{N_i-n_i}{n_i}\right)s_i^2 = \frac{1}{(251)^2}(114515.7541) = 1.8177 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\bar{y}_{st})} = 2\sqrt{1.8177} = 2.6964 \text{ hours} \]
∴ Thus we estimate the average man-hours lost per employee to be \(\bar{y}_{st}\) = 7.5852 hours. The error of estimation should be less than B = 2.6964 hours.

Estimate of the population total
\[ \hat{\tau}_{st} = N\bar{y}_{st} = 251 \times 7.5852 = 1903.8852 \text{ hours} \]
Estimated variance of \(\hat{\tau}_{st}\)
\[ \hat{v}(\hat{\tau}_{st}) = N^2\,\hat{v}(\bar{y}_{st}) = (251)^2(1.8177) = 114516.9177 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{\tau}_{st})} = 2\sqrt{114516.9177} = 676.8070 \text{ hours} \]
∴ Thus we estimate the total man-hours lost to be \(\hat{\tau}_{st}\) = 1903.8852 hours. The error of estimation should be less than B = 676.8070 hours.

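Example 3.2
An advertising firm wants to estimate the proportion of households in a county that view show X. The county is divided into three strata: town A, town B, and the rural area. The strata contain N1 = 155, N2 = 62, and N3 = 93 households, respectively. A stratified random sample of n = 40 households is chosen with proportional allocation. In other words, a simple random sample is taken from each stratum; the sizes of the samples are n1 = 20, n2 = 8, and n3 = 12. Interviews are conducted in the 40 sampled households; results are shown in the table. Estimate the proportion of households viewing show X and place a bound on the error of estimation.

Stratum   Population size   Sample size   Number of households viewing show X
1         N1 = 155          n1 = 20       16
2         N2 = 62           n2 = 8        2
3         N3 = 93           n3 = 12       6

Solution
Column A is \(N_i\hat{p}_i\), column B is \((N_i-n_i)/(n_i-1)\), and column C is \(N_i\,\frac{N_i-n_i}{n_i-1}\,\hat{p}_i\hat{q}_i\).

Stratum      N_i   n_i   x_i   p_i    q_i    p_i q_i   A       B        C
Town A       155   20    16    0.8    0.2    0.16      124     7.1053   176.2114
Town B       62    8     2     0.25   0.75   0.1875    15.5    7.7143   89.6787
Rural Area   93    12    6     0.5    0.5    0.25      46.5    7.3636   171.2037
Total        310   40                                  186              437.0938

Estimate of the population proportion
\[ \hat{p}_{st} = \frac{1}{N}\sum N_i\hat{p}_i = \frac{186}{310} = 0.60 \]
Estimated variance of \(\hat{p}_{st}\)
\[ \hat{v}(\hat{p}_{st}) = \frac{1}{N^2}\sum N_i\left(\frac{N_i-n_i}{n_i-1}\right)\hat{p}_i\hat{q}_i = \frac{437.0938}{(310)^2} = 0.0045 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\hat{p}_{st})} = 2\sqrt{0.0045} = 0.135 \]
Thus we estimate that 0.60 (60%) of the households view show X, with a bound on the error of estimation of about 0.135 (13.5%).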

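SYSTEMATIC SAMPLING
What is systematic sampling?
A sample obtained by randomly selecting one element from the first k elements in the frame and every kth element thereafter is called a 1-in-k systematic sample.

Estimators of the population Mean, Total and Proportion
\[ N = nk \]

For the mean
Estimate of the population mean \(\mu\)
\[ \bar{y}_{sy} = \frac{\sum_{i=1}^{n} y_i}{n} \]
Estimated variance of \(\bar{y}_{sy}\)
\[ \hat{V}(\bar{y}_{sy}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n} \]
Relative standard error of estimation
\[ \widehat{RSE} = \frac{\sqrt{\hat{V}(\bar{y}_{sy})}}{\bar{y}_{sy}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\bar{y}_{sy})} \]

For the total
Estimate of the population total
\[ \hat{\tau}_{sy} = N\bar{y}_{sy} \]
Estimated variance of \(\hat{\tau}_{sy}\)
\[ \hat{V}(\hat{\tau}_{sy}) = N^2\,\hat{V}(\bar{y}_{sy}) \]
Relative standard error of estimation
\[ \widehat{RSE} = \frac{\sqrt{\hat{V}(\hat{\tau}_{sy})}}{\hat{\tau}_{sy}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\hat{\tau}_{sy})} \]

For the proportion
Estimate of the population proportion p
\[ \hat{p}_{sy} = \bar{y}_{sy} = \frac{\sum_{i=1}^{n} y_i}{n} \]
Estimated variance of \(\hat{p}_{sy}\)
\[ \hat{V}(\hat{p}_{sy}) = \frac{\hat{p}_{sy}\hat{q}_{sy}}{n-1}\left(\frac{N-n}{N}\right), \quad \text{where } \hat{q}_{sy} = 1 - \hat{p}_{sy} \]
Relative standard error of estimation
\[ \widehat{RSE} = \frac{\sqrt{\hat{V}(\hat{p}_{sy})}}{\hat{p}_{sy}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\hat{p}_{sy})} \]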
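Selecting a 1-in-k systematic sample can be sketched in Python as follows. The function name `systematic_sample` and the seeded `random.Random` generator are assumptions for illustration; the frame here is simply the numbers 1 to 371, standing in for an ordered list of population units.

```python
import random


def systematic_sample(frame, k, rng=None):
    """Return a 1-in-k systematic sample from an ordered frame (a list)."""
    rng = rng or random.Random()
    # Random start among the first k elements (0-based index 0 .. k-1)
    start = rng.randrange(k)
    # Take the starting element and every kth element thereafter
    return frame[start::k]


if __name__ == "__main__":
    frame = list(range(1, 372))          # hypothetical frame of N = 371 units
    sample = systematic_sample(frame, k=7, rng=random.Random(2104))
    print(len(sample), sample[:5])
```

Once the sample is drawn, the estimators for the mean, total and proportion have the same form as those for simple random sampling, applied to the sampled values.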

Example 4.3
A 1-in-6 systematic sample is obtained from a voter registration list to estimate the proportion of voters in favor of the proposed bond issue. Several different random starting points are used to ensure that the results of the sample are not affected by periodic variation in the population. The coded results of this pre-election survey are as shown in the table. Estimate p, the proportion of the 5775 registered voters in favor of the proposed bond issue (N = 5775). Place a bound on the error of estimation.

Voter      Response
4          1
10         0
16         1
.          .
.          .
5760       0
5766       0
5772       1
Total      \(\sum_{i=1}^{962} y_i = 652\)

Solution
Estimated population proportion
\[ \hat{p}_{sy} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{652}{962} = 0.678 \]
Estimated variance of \(\hat{p}_{sy}\)
\[ \hat{V}(\hat{p}_{sy}) = \frac{\hat{p}_{sy}\hat{q}_{sy}}{n-1}\left(\frac{N-n}{N}\right) = \frac{(0.678)(0.322)}{961}\left(\frac{5775-962}{5775}\right) = 0.00019 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\hat{p}_{sy})} = 2 \times 0.0138 = 0.0275 \]
Thus we estimate that 0.678 (67.8%) of the registered voters favor the proposed bond issue. We are relatively confident that the error of estimation is less than 0.0275 (2.75%).

Example
A group of guidance counselors is concerned about the average yearly tuition for out-of-state students in 371 junior colleges. From an alphabetical list of these colleges, a 1-in-7 systematic sample is drawn. Data concerning out-of-state tuition expenses for an academic year (September to June) are obtained for each college sampled. Use the following data summary:
\[ \sum y_i = 11950 \text{ (dollars)}, \qquad \sum y_i^2 = 2731037 \]
(a) Estimate the mean yearly tuition and place a bound on the error of estimation.

Solution
k = 7 (1-in-7 systematic sample), N = 371, \(\sum y_i = 11950\), \(\sum y_i^2 = 2731037\)
\[ N = nk \quad\Rightarrow\quad n = \frac{N}{k} = \frac{371}{7} = 53 \]
(a) Estimate of the population mean
\[ \bar{y}_{sy} = \frac{\sum y_i}{n} = \frac{11950}{53} = 225.47 \]
that is, $225.47.

Sample variance
\[ s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1} = \frac{2731037 - 53(225.47)^2}{53-1} = 705.5922 \]
Estimated variance of \(\bar{y}_{sy}\)
\[ \hat{v}(\bar{y}_{sy}) = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{705.5922}{53}\left(\frac{371-53}{371}\right) = 13.3131 \times 0.8571 = 11.4107 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{v}(\bar{y}_{sy})} = 2\sqrt{11.4107} = 6.7559 \]
Thus we estimate the mean yearly tuition to be \(\bar{y}_{sy}\) = $225.47. The error of estimation should be less than B = $6.7559.

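CLUSTER SAMPLING
What is Cluster Sampling?
A cluster sample is a simple random sample in which each sampling unit is a collection, or cluster, of elements.

Notation
N = the number of clusters in the population
n = the number of clusters selected in a simple random sample
\(m_i\) = the number of elements in cluster i, i = 1, ..., N
\(\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i\) = the average cluster size for the sample
\(M = \sum_{i=1}^{N} m_i\) = the number of elements in the population
\(\bar{M} = M/N\) = the average cluster size for the population; when M is not known, \(\bar{M}\) must be estimated by \(\bar{m}\)
\(y_i\) = the total of all observations in the ith cluster
\(a_i\) = the number of elements in the ith cluster that possess the characteristic of interest

Estimate of the population mean \(\mu\)
\[ \bar{y}_{cl} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i} \]
Estimated variance of \(\bar{y}_{cl}\)
\[ \hat{V}(\bar{y}_{cl}) = \frac{N-n}{Nn\bar{M}^2}\cdot\frac{\sum_{i=1}^{n}(y_i - \bar{y}_{cl}m_i)^2}{n-1} \]
where
\[ \sum_{i=1}^{n}(y_i - \bar{y}_{cl}m_i)^2 = \sum y_i^2 - 2\bar{y}_{cl}\sum y_i m_i + \bar{y}_{cl}^2\sum m_i^2 \]
Relative standard error of estimation
\[ \widehat{RSE} = \frac{\sqrt{\hat{V}(\bar{y}_{cl})}}{\bar{y}_{cl}} \times 100 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\bar{y}_{cl})} \]

Estimator of the population total \(\tau\)
\[ \hat{\tau}_{cl} = M\bar{y}_{cl} \]
Estimated variance of \(\hat{\tau}_{cl}\)
\[ \hat{V}(\hat{\tau}_{cl}) = \hat{V}(M\bar{y}_{cl}) = M^2\,\hat{V}(\bar{y}_{cl}) \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\hat{\tau}_{cl})} \]

Estimator of the population proportion p
\[ \hat{p}_{cl} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} m_i} \]
Estimated variance of \(\hat{p}_{cl}\)
\[ \hat{V}(\hat{p}_{cl}) = \frac{N-n}{Nn\bar{M}^2}\cdot\frac{\sum_{i=1}^{n}(a_i - \hat{p}_{cl}m_i)^2}{n-1} \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\hat{p}_{cl})} \]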
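The cluster estimator of a mean or proportion can be computed directly from summary sums, as in the following Python sketch. The function name `cluster_ratio_estimate` and its parameter names are assumptions for illustration. It takes the average population cluster size as M/N when M is supplied and otherwise falls back to the sample average cluster size. The usage values are the summary sums given for Example 5.5 in this section.

```python
import math


def cluster_ratio_estimate(N, n, sum_m, sum_y, sum_m2, sum_y2, sum_my, M=None):
    """Ratio estimate (mean or proportion) from cluster-level summary sums.

    sum_y, sum_y2 and sum_my use cluster totals y_i (or counts a_i for a proportion).
    """
    est = sum_y / sum_m
    # Expanded form of the sum of (y_i - est * m_i)^2 over sampled clusters
    ss = sum_y2 - 2 * est * sum_my + est ** 2 * sum_m2
    # Average cluster size: population value if M is known, else sample value
    M_bar = M / N if M is not None else sum_m / n
    var = (N - n) / (N * n * M_bar ** 2) * ss / (n - 1)
    return est, var, 2 * math.sqrt(var)


if __name__ == "__main__":
    # Proportion of renters: a_i plays the role of y_i
    p, var_p, bound = cluster_ratio_estimate(
        N=415, n=25, sum_m=151, sum_y=72, sum_m2=1047, sum_y2=262, sum_my=511
    )
    print(round(p, 3), round(var_p, 5), round(bound, 4))
```

Because the function carries the unrounded estimate through the sum of squares, its intermediate values can differ slightly in the last digits from a hand calculation that rounds the estimate first.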

Example 5.2
Interviews are conducted in each of the 25 sampled blocks. The data on income for adult males are presented in the table. Use the data to estimate the average income per adult male in the city and place a bound on the error of estimation. Assume N = 415, M = 2500.
\[ \sum m_i = 151, \quad \sum y_i = 1{,}329{,}000, \quad \sum m_i^2 = 1{,}047, \quad \sum y_i^2 = 82{,}039{,}000{,}000, \quad \sum m_i y_i = 8{,}403{,}000 \]

Solution
N = 415, M = 2500, n = 25

Estimate of the population mean
\[ \bar{y}_{cl} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i} = \frac{1{,}329{,}000}{151} = 8{,}801 \]
that is, $8,801.

Estimated variance of \(\bar{y}_{cl}\)
\[ \hat{V}(\bar{y}_{cl}) = \frac{N-n}{Nn\bar{M}^2}\cdot\frac{\sum_{i=1}^{n}(y_i - \bar{y}_{cl}m_i)^2}{n-1} \]
with \(\bar{M}\) estimated by
\[ \bar{m} = \frac{\sum_{i=1}^{n} m_i}{n} = \frac{151}{25} = 6.04 \]
and
\[ \sum_{i=1}^{25}(y_i - \bar{y}_{cl}m_i)^2 = \sum y_i^2 - 2\bar{y}_{cl}\sum y_i m_i + \bar{y}_{cl}^2\sum m_i^2 \]
\[ = 82{,}039{,}000{,}000 - 2(8{,}801)(8{,}403{,}000) + (8{,}801)^2(1{,}047) = 15{,}227{,}502{,}247 \]
so that
\[ \hat{V}(\bar{y}_{cl}) = \frac{415-25}{(415)(25)(6.04)^2}\cdot\frac{15{,}227{,}502{,}247}{24} = 653{,}785 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\bar{y}_{cl})} = 2\sqrt{653{,}785} = 1{,}617 \]
Thus we estimate the average income per adult male to be \(\bar{y}_{cl}\) = $8,801. The error of estimation should be less than B = $1,617.

Example 5.3
Use the data in Example 5.2 to estimate the total income of all adult males in the city, and place a bound on the error of estimation. There are 2,500 adult males in the city.

Solution
M = 2500, \(\bar{y}_{cl}\) = $8,801, \(\hat{V}(\bar{y}_{cl})\) = 653,785

Estimator of the population total \(\tau\)
\[ \hat{\tau}_{cl} = M\bar{y}_{cl} = 2{,}500(8{,}801) = 22{,}002{,}500 \]
Estimated variance of \(\hat{\tau}_{cl}\)
\[ \hat{V}(\hat{\tau}_{cl}) = \hat{V}(M\bar{y}_{cl}) = M^2\,\hat{V}(\bar{y}_{cl}) = (2500)^2(653{,}785) = 4{,}086{,}156{,}250{,}000 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\hat{\tau}_{cl})} = 2(2500)\sqrt{653{,}785} \approx 4{,}042{,}849 \]
Thus we estimate the total income of all adult males to be \(\hat{\tau}_{cl}\) = $22,002,500. The error of estimation should be less than B = $4,042,849.

Example 5.5
In addition to the information on income, the adult males in the sample survey of Example 5.2 are asked whether they rent or own their homes. The results are given in the table. Use the data in the table to estimate the proportion of adult males in the city who rent their homes. Place a bound on the error of estimation.
\[ \sum_{i=1}^{25} a_i^2 = 262, \quad \sum_{i=1}^{25} m_i^2 = 1047, \quad \sum_{i=1}^{25} a_i m_i = 511, \quad \sum_{i=1}^{25} m_i = 151, \quad \sum_{i=1}^{25} a_i = 72 \]

Solution
N = 415, M = 2500, n = 25

Estimate of the population proportion
\[ \hat{p}_{cl} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} m_i} = \frac{72}{151} = 0.477 \]
Estimated variance of \(\hat{p}_{cl}\)
\[ \hat{V}(\hat{p}_{cl}) = \frac{N-n}{Nn\bar{M}^2}\cdot\frac{\sum_{i=1}^{n}(a_i - \hat{p}_{cl}m_i)^2}{n-1} \]
where
\[ \sum_{i=1}^{n}(a_i - \hat{p}_{cl}m_i)^2 = \sum a_i^2 - 2\hat{p}_{cl}\sum a_i m_i + \hat{p}_{cl}^2\sum m_i^2 = 262 - 2(0.477)(511) + (0.477)^2(1047) = 12.729 \]
and \(\bar{M}\) is estimated by
\[ \hat{\bar{M}} = \bar{m} = \frac{\sum_{i=1}^{n} m_i}{n} = \frac{151}{25} = 6.04 \]
so that
\[ \hat{V}(\hat{p}_{cl}) = \frac{(415-25)(12.729)}{(415)(25)(6.04)^2(24)} = 0.00055 \]
Bound on the error of estimation
\[ B = 2\sqrt{\hat{V}(\hat{p}_{cl})} = 2\sqrt{0.00055} = 0.0469 \]
Thus the best estimate of the proportion of adult males who rent homes is 0.477. The error of estimation should be less than B = 0.0469.