Answers

b. Because the children who had the surgery could easily determine whether the surgical procedure was laparoscopic repair or open repair based on the type of incision.

1.33 There are several possible approaches. One possibility is to write the subjects' names on otherwise identical slips of paper. Mix the slips of paper thoroughly and draw out slips one at a time. The names on the first 15 slips are assigned to the experimental condition of listening to a Mozart piano sonata for 24 minutes. The names on the next 15 slips are assigned to the experimental condition of listening to popular music for the same length of time. The remaining 15 names are assigned to the relaxation with no music experimental condition.

1.34 (1) Do ethnic group and gender influence the type of care that a heart patient receives? (2) The experimental conditions are gender and race. (3) The response variable is the type of care the heart patient received. (4) The experimental units are the 720 primary care doctors. It is not clear how the physicians were chosen. (5) Yes, the design incorporates random assignment of doctors to view one of the four different videos through rolling a four-sided die. (6) No control group was used. There is no need for a control group in this study. (7) There is no indication that the study includes blinding, but there is no need for blinding in this experiment.

Additional Exercises

1.41
a. Some surgical procedures are more complex and require a greater degree of concentration; music with a vocal component might be more distracting when the surgical procedure is more complex.
b. The temperature of the room might affect the comfort of the surgeon; if the surgeon is too hot or too cold, she or he might be uncomfortable and therefore more easily distracted by the vocal component.
c. If the music is too loud, the surgeon might be distracted and unable to focus, regardless of the presence or absence of the vocal component. If the music is too soft, the surgeon might try to concentrate on listening to the vocal component and therefore pay more attention to the music rather than to the surgical procedure.

1.43 This experiment could not have been double-blind because the surgeon would know whether or not there was a vocal component to the music.

1.45
a. Probably not, because the judges might not believe that Denny's food is as good as that of other restaurants.
b. Experiments are often blinded in this way to eliminate preconceptions about particular experimental treatments.

Section 1.4
Exercise Set 1

1.46 This was an observational study, so cause-and-effect conclusions cannot be drawn.

1.47
a. It is not reasonable to conclude that watching Oprah causes a decrease in cravings for fattening foods. This was an observational study, so cause-and-effect conclusions cannot be drawn.
b. It is not reasonable to generalize the results of this survey to all women in the United States because not all women watch daytime talk shows. It is not reasonable to generalize these results to all women who watch daytime talk shows because not all women who watch daytime shows access DietSmart.com.

1.48 The researcher would have had to assign the nine cyclists at random to one of the three experimental conditions (chocolate milk, Gatorade, or Endurox).

Study 1: This is an observational study; random selection was used; this was not an experiment, so there were no experimental groups; no, because this was not an experiment, cause and effect cannot be concluded; it is reasonable to generalize to the population of students at this particular large college.
Study 2: This study was an experiment; random selection was not used; there was no random assignment to experimental conditions (the grouping was based on gender); the conclusion is not appropriate because of confounding of gender and treatment (women ate pecans and men did not eat pecans); it is not reasonable to generalize to a larger population.
Study 3: This is an observational study; no random selection; no random assignment to experimental groups; the conclusion is not appropriate because this was an observational study and therefore cause-and-effect conclusions cannot be drawn; cannot generalize to any larger population.
Study 4: This is an experiment; no random selection; there was random assignment to experimental groups; yes, because this was an experiment with random assignment of subjects to experimental groups, we can draw cause-and-effect conclusions; cannot generalize to a larger population.
Study 5: This is an experiment; there was random selection from students enrolled at a large college; random assignment of subjects to experimental groups was used; because this was a simple comparative experiment with random assignment of subjects to experimental groups, we can draw cause-and-effect conclusions; there was random selection of students, so we can generalize conclusions from this study to the population of all students enrolled at the large college.

Additional Exercises

1.55 Would need to know if dieters were randomly assigned to the experimental conditions (large fork or small fork) and if the study participants were randomly selected from the population of dieters.

1.57 There was no random selection from some population.

1.59 Yes, because this was an experiment and there was random assignment of subjects to experimental groups.

Are You Ready to Move On?
Chapter 1 Review Exercises

1.61
a. experiment; there is random assignment of subjects to experimental conditions
b. observational study; there was no assignment of subjects to experimental conditions
c. observational study; there was no assignment of subjects to experimental conditions
d. experiment; there was random assignment of study participants to experimental conditions

1.63
a. population characteristic
b. statistic
c. population characteristic
d. statistic
e. statistic

1.65 The council president could assign a unique identifying number to each of the names on the petition, numbered from 1 to 500. On identical slips of paper, write the numbers 1 to 500, with each number on a single slip of paper. Thoroughly mix the slips of paper and select 30 numbers. The 30 numbers correspond to the unique numbers assigned to names on the petition. These 30 are the names that would be in the sample.

1.67 Without random assignment of the study participants to experimental conditions, confounding could impact the conclusions of the study. For example, people who would choose an attractive avatar might be more outgoing and willing to engage than someone who would choose an unattractive avatar.

1.69
a. The alternate assignment to the experimental groups (large serving bowls, small serving bowls) would probably produce groups that are similar.
b. Blinding ensures that individuals do not let personal beliefs influence their measurements. The research assistant who weighed the plates and estimated the calorie content of the food might (intentionally or not) have let personal beliefs influence the estimate of the calorie content of the food on the plate.

1.71
a. (1) Does using hand gestures help children learn math? (2) Using hand gestures and not using hand gestures. (3) Number correct on the six-problem test. (4) The 128 children in the study; they were selected because they were the children who answered all six questions on the pretest incorrectly. (5) Yes, the children were assigned randomly to one of the two experimental groups. (6) Yes, the control group is the experimental condition of not using any hand gestures. (7) There was no blinding. It would not be possible to include blinding of subjects in this experiment (the children would know whether or not they were using hand gestures), and there is no need to blind the person recording the response because the test was graded with each answer correct or incorrect, so there is no subjectivity in recording the responses.
b. The conclusions are reasonable because the subjects were assigned to the treatment groups at random.

1.73
a. No, the 60 games selected were the 20 most popular (by sales) for each of three different gaming systems. The study excluded the games that were not in the top 20 most popular (by sales).
b. It is not reasonable to generalize to all video games because of the exclusion of those games not in the top 20 (by sales).

Study 1: observational study; no random selection; no random assignment to experimental groups; not reasonable to conclude that taking calcium supplements is the cause of the increased heart attack risk; not reasonable to generalize conclusions from this study to a larger population.
Study 2: observational study; there was random selection from the population of people living in Minneapolis who receive Social Security; no random assignment of subjects to experimental groups; not reasonable to conclude that taking calcium supplements is the cause of the increased heart attack risk; it is reasonable to generalize the results of this study to the population of people living in Minneapolis who receive Social Security.
Study 3: experiment; there was random selection from the population of people living in Minneapolis who receive Social Security; no random assignment of subjects to experimental groups; not reasonable to conclude that taking calcium supplements is the cause of the increased risk of heart attack, because the participants in this study who did not have a previous history of heart problems were given the calcium supplement and those with a history of heart problems were not given the supplement. It is not possible to determine the role of the calcium supplement because only those study participants who did not have a history of heart problems were given the supplement; it is possible to generalize the results from this study to the population of all people living in Minneapolis who receive Social Security. However, it is unclear, due to the confounding described in Question 4, what the conclusion would be.
Study 4: experiment; no random selection from some larger population; there was random assignment of study participants to experimental groups; it is reasonable to conclude that taking calcium supplements is the cause of the increased risk of heart attack; it is not reasonable to generalize conclusions from this study to some larger population.

Chapter 2 Graphical Methods for Describing Data Distributions

Section 2.1
Exercise Set 1

2.1 a. numerical, discrete b. categorical c. numerical, continuous d. numerical, continuous e. categorical

2.2 a. discrete b. continuous c. discrete d. discrete

Data Set 1: one variable; categorical; summarize the data distribution; bar chart
Data Set 2: one variable; numerical; compare groups; comparative dotplot or comparative stem-and-leaf display
Data Set 3: two variables; numerical; investigate relationship; scatterplot
Data Set 4: one variable; categorical; compare groups; comparative bar chart
Data Set 5: one variable; numerical; summarize the data distribution; dotplot, stem-and-leaf display, or histogram

Additional Exercises

2.7 a. numerical b. numerical c. categorical d. numerical e. categorical

2.9 a. categorical b. numerical c. numerical d. categorical

2.11 a. numerical b. numerical c. numerical d. categorical e. categorical f. numerical g. categorical

2.13 one variable; numerical; compare groups; comparative dotplot or comparative stem-and-leaf display

2.15 one variable; numerical; summarize the data distribution; dotplot, stem-and-leaf plot, or histogram

Section 2.2
Exercise Set 1

[Figure: bar chart of relative frequency; categories Definitely yes, Probably yes, Probably no, Definitely no]

b. One example: "Senior Satisfaction: Over 80% say they would enroll again"

2.17
[Figure: comparative bar chart of relative frequency for Computer, Cell phone, and DVD player; legend: Can't live without, Would miss, Could live without]

Additional Exercises

2.21
[Figure: bar chart of relative frequency by type of complaint; visible categories include credit card fraud, utilities fraud, bank fraud, and employment fraud]

Credit card fraud is the most commonly occurring identity theft type. Although phone/utility, bank, and employment fraud each constitute a relatively large portion of overall type of identity theft, the collective "other" fraud category is greater than any one of these other three.

2.23 The relative frequency distribution is

Type of Household           Relative Frequency
Nonfamilies                 0.29
Married with Children       0.27
Married without Children    0.29
Single Parent               0.15

[Figure: bar chart of relative frequency by type of household]

Section 2.3
Exercise Set 1

2.24
a. [Figure: dotplot of cost (cents per gram of protein), 1.5 to 7.0]
b. Because the costs of meat and poultry products (represented by the squares on the dotplot) are generally smaller than the costs for other sources of protein, they do appear to be a good value.

[Figure: dotplots of ticket sales (millions of dollars), 120 to 540]

b. Both distributions are skewed toward larger values. The 2007 ticket sales are centered at about 210 million dollars, which is higher than the center of the 2008 ticket sales, which are centered around 150 million dollars. The lowest ticket sales for both 2007 and 2008 are approximately 127 million dollars. Ticket sales for 2008 have a maximum value of approximately 533 million dollars, which is much higher than the highest ticket sales for 2007. Without this extreme value, the spreads between the lowest and highest values are approximately equal.

2.26
[Figure: stem-and-leaf display of median ages. Legend: 34 | 1 = 34.1 years]

The distribution of median ages is centered at approximately 37 years old, with values ranging from 28.8 to 42.2 years. The distribution is approximately symmetric, with one unusual value of 28.8 years.

[Figure: comparative stem-and-leaf display, Very Large Urban Area and Large Urban Area. Legend: 4 | 6 = 46 extra hours per year]

b. The statement "The larger the urban areas, the greater the extra travel time during peak period travel" is generally consistent with the data. Although there is overlap between the times for the very large and large urban areas, the extra travel times for the very large urban areas are generally greater than those for the large urban areas.

2.28
a. [Figure: density histogram of credit card balance (Credit Bureau data)]
[Figure: density histogram of credit card balance (survey data)]
c. The histograms are similar in shape. A notable difference is that the Credit Bureau data show that 7% of students have credit card balances of at least $7,000, but no survey respondent indicated a balance of at least $7,000.
d. Yes, because students with credit card balances of $7,000 or more might be too embarrassed to admit that they have such a high balance.

2.29
a. If the exam is quite easy, the scores would be clustered at the high end of the scale, with a few low scores for the students who did not study. The histogram would be negatively skewed.
b. If the exam is difficult, the scores would be clustered around a much lower value, with only a few high scores. The histogram would be positively skewed.
c. In this case, the histogram would be bimodal, with a cluster of high scores and a cluster of low scores.

[Figure: density histogram of number of attempts, 0 to 10]

Additional Exercises

2.37 The distribution of wind speed is positively skewed and bimodal. There are peaks in the 35-40 m/s and 60-65 m/s intervals.

[Figure: frequency histogram of maximum wind speed (m/s), 25 to 65]

2.41
[Figure: comparative stem-and-leaf display, Non-Disney and Disney. Legend: 1 | 5 = 150 seconds]

In general, the Disney movies have longer tobacco exposure times than the non-Disney movies. Disney movies have a typical value of approximately 80 seconds, which is larger than a typical value for the non-Disney movies of 50 seconds. In addition, there is more variability in the Disney tobacco exposure times than the others. Disney movies vary between 6 and 548 seconds, which is greater than the observed spread for the non-Disney movies, which vary between 1 and 205 seconds. Finally, there appears to be one outlier in the Disney movies (548 seconds), and no outliers in the non-Disney movies.

Section 2.4
Exercise Set 1

[Figure: scatterplot of calories versus fat]

The scatterplot shows the expected positive relationship between grams of fat and calories. The relationship is weak.

[Figure: scatterplot of calories versus sodium, 200 to 1800]

As was observed in the calories versus fat scatterplot, there is also a weak positive relationship between calories and sodium. The relationship between calories and sodium appears to be a little stronger than the calories versus fat relationship.

[Figure: scatterplot of sodium versus fat]

There is no apparent relationship between sodium and fat.

d. [Figure: scatterplot of sodium versus fat]

The lower-left region corresponds to healthier fast-food choices. This region corresponds to food items with fewer than 3 grams of fat and fewer than 900 milligrams of sodium.

[Figure: time series plot of percent who smoke, 1960 to 2005]

There has been a steady downward trend in the percent of people who smoke among people who did not graduate from high school, from a high of 44% to 29% in 2005.

[Figure: time series plot of percent who smoke, 1960 to 2005; series: Not HS graduate, HS graduate no college, Some college, Bachelors or Higher]

c. There has been a steady downward trend in the percent of people age 25 or older who smoke, regardless of education level. In 1960, regardless of education level, the percentage was approximately the same (about 44% to 48%). Over time, however, the differences have become more pronounced. In 2005, those people with bachelor's degrees or higher had the lowest smoking rate (10%), followed by those with some college (21%). The highest rates of smoking were found among those who either did not graduate from high school (29%) or graduated high school
but did not attend college (27%).

Additional Exercises

2.47
a. [Figure: time series plot of percentage of households with computer]
b. The percentage of households with computers has increased over time, from a low of approximately 8% in 1985 to over 50% in 2000. The rate of increase has also increased over time.

2.49 There is a relatively weak positive relationship between poverty rate and dropout rate. There are two states that have poverty rates between 10% and 15% but have much higher dropout rates (over 15%) compared to other states with comparable poverty rates.

Section 2.5
Exercise Set 1

2.51
a. categorical
b. A bar chart was used because the response is a categorical variable, and dotplots are used for numerical responses.
c. This is not a correct representation of the response data because the percent values add up to over 100% (they add ...).

2.52
a. Overall score is numerical. Grade is categorical.
b. The figure is equivalent to a segmented bar graph because the bar is divided into segments, with different shaded regions representing the different grades (Top of the Class, Passing, Barely Passing, and Failing), and the height of each segment is equal to the frequency for that category (for example, there are five school districts in the Top of the Class category, three in the Passing category, and so on), making the area of each shaded region proportional to the relative frequencies for each grade.

[Figure: dotplot of overall score, 35 to 85]

One alternate assignment of grades is to require that Top of the Class schools earn grades of 72 or higher, Passing schools earn between 66 and 71, Barely Passing schools earn between 61 and 65, and Failing schools earn 60 or below. This alternative is suggested because there appear to be clusters of dots on the dotplot that correspond to the suggested ranges.

2.53 Answers will vary. For example: For teens ages 12 to 17 years, the percentage of cell phone owners increases with age in each of the years 2004, 2006, and 2008. In addition, within each age group, the percentage of teens owning cell phones increased between 2004 and 2008. The largest increase in percentage of teens owning cell phones was among 12-year-olds, and the smallest percentage increase was among 13-year-olds.

Additional Exercises

2.57
a. The areas in the display are not proportional to the values they represent. The "no" category seems to represent more than 68%.
b. [Figure: bar chart by response (Yes, No)]

2.59
a. [Figure: plot of percentage versus date, 1977 to 2002; labeled points Mar 1978, Mar 1991, Jun 2003]
b. [Figure: plot of percentage versus date, 1977 to 2002; series Good time and Bad time; labeled points Mar 1978, Mar 1991, Jun 2003, Sep 2003]
c. The time series plot best shows the trend over time.

Are You Ready to Move On?
Chapter 2 Review Exercises

2.61
Data Set 1: one variable; numerical; summarize the data distribution; a dotplot, stem-and-leaf display, or histogram would be an appropriate graphical display.
Data Set 2: two variables; numerical; investigate the relationship; a scatterplot would be an appropriate graphical display.
Data Set 3: two variables; categorical; compare groups; comparative bar chart.
Data Set 4: one variable; categorical; summarize the data distribution; a bar chart would be an appropriate graphical display.
Data Set 5: one variable; numerical; compare groups; a comparative dotplot or comparative stem-and-leaf display would be an appropriate graphical display.

2.63
a. [Figure: comparative bar chart of relative frequency for Public colleges, Private non-profit colleges, and For-profit colleges; legend: Good standing, In default]
b. The "In Default" bar in the For-Profit Colleges category is taller than either of the other "In Default" bars.

2.65
a.
0 | 8
1 | 4467788999
2 | 000011222333334445557778
3 | 01222334457
4 | 02458
Legend: 1 | 4 = 14 cents per gallon
b. The center is approximately 24 cents per gallon, and most states have a tax that is near the center value, with tax values ranging from 8 cents per gallon to 48 cents per gallon. The distribution is approximately symmetric.
c. The only value that might be considered unusual is the 8 cents per gallon tax in Alaska.

2.67
a. [Figure: dotplot of quality rating, 85 to 170]
b. About 114
c. There is a lot of variability, with quality rating ranging between a low score of 84 defects per 100 vehicles and a high score of 170 defects.
d. Two brands (Land Rover and Mitsubishi) seem to stand out as having a much greater value for number of defects. Four brands (Acura, Lexus, Mercedes-Benz, and Porsche) have smaller values for number of defects.

[Figure: histogram of APEAL rating, 720 to 880]

The histogram is centered at approximately 790, with values that range between approximately 720 and 880. The distribution is bimodal and positively skewed.

[Figure: scatterplot of APEAL rating versus quality rating, 80 to 170]

There is a weak negative relationship between customer satisfaction (as measured by the APEAL rating) and the number of defects. Brands with a higher number of defects per 100 vehicles tend to have lower satisfaction ratings.

2.69
a. [Figure: segmented bar graph of enrollment, 0 to 100; categories: Unknown/Other, Native American, African American, Hispanic/Latino, Asian American, Nonresident alien, White]
b. The segmented bar graph in part (a) is more informative because it is easier to get a sense of the percentages of each ethnicity enrolled. Specifically, in the original graphical display, with the Nonwhite category further subdivided, it is difficult to compare the Nonwhite breakout categories with the other categories represented in the pie chart.
c. The pie chart combined with the segmented bar graph could have been chosen because some of the pie slices might be very thin and hard to see, and too many pieces could be difficult to visually process.

2.71 The display is misleading because the area principle is violated. The areas of the cocaine mounds are not proportional to the relative frequencies being represented.

Chapter 3 Numerical Methods for Describing Data Distributions

Section 3.1
Exercise Set 1

3.1 The distribution is approximately symmetric with no outliers, so the mean and standard deviation should be used to describe the center and spread, respectively.

3.2 The distribution is skewed with an outlier, so the median and interquartile range should be used.

3.3 The distribution is skewed with a possible outlier, so the median and interquartile range should be used.

3.4 The average may not be the best measure of a typical value for this data set because the distribution is clearly skewed.

Additional Exercises

3.9 The distribution is skewed, so median and interquartile range should be used.

3.11 The distribution is roughly symmetric with no obvious outliers, so the mean and standard deviation should be used.

Section 3.2
Exercise Set 1

3.12 x̄ = 51.33 ounces; this is a typical value for the amount of alcohol poured. s = 15.22 ounces; this represents how much, on average, the values in the data set spread out, or deviate, from the mean.

x̄ = 59.23 ounces; this is a typical value for the amount of alcohol poured. s = 16.71 ounces; this represents how much, on average, the values in the data set spread out, or deviate, from the mean.
b. Individuals pouring alcohol into short, wide glasses pour, on average, more alcohol than when pouring into tall, slender glasses.

3.14
a. x̄ = 59.85 hours, s = 14.78 hours
b. x̄ = 56.67 hours, s = 9.75 hours. When Los Angeles was excluded from the data set, the mean and standard deviation both decreased quite a bit. This suggests that using the mean and standard deviation as measures of center and spread for data sets with outliers can be risky because outliers seem to have a significant impact on those measures.

3.15 Answers will vary. One possible answer: The mean is $444, but it is likely that some parents spend only a small amount. There is probably a lot of variability in amount spent, so we would expect a large value for the standard deviation.

Additional Exercises

3.21
a. x̄ = 287.714; the deviations from the mean are 209.286, -94.714, 40.286, -132.714, 38.286, -42.714
c. s² = 12,601.905, s = 112.258

3.23 The deviations are exactly the same as the corresponding deviations for the original data set. Since the deviations are the same, the new variance and standard deviation are also the same as the old variance and standard deviation. Subtracting the same number from, or adding the same number to, every value in a data set does not change the value of the variance or standard deviation.

Section 3.3
Exercise Set 1

3.25
a. median = 433,246.5. Half of the newspapers had average weekly circulations of less than 433,246.5, and the other half had average weekly circulations of more than 433,246.5.
b. The median is preferable because the distribution is skewed and contains outliers.
c. It is not reasonable to generalize from this sample to the population of daily U.S. newspapers because these newspapers were not randomly selected. They are the top 20 American newspapers in average weekday circulation.

3.26 Lower quartile = 10,478; 25% of the catsups have sodium contents lower than 10,478. Upper quartile = 11,778; 75% of the catsups have sodium contents lower than 11,778. The interquartile range is iqr = 11,778 - 10,478 = 1,300; the range of the middle 50% of the catsup sodium contents is 1,300.

3.27 median = 142; half of the values of number of minutes used in cell phone calls in one month are less than or equal to 142 minutes, and half of the data values of number of minutes used in cell phone calls are greater than or equal to 142 minutes. iqr = 195; the middle 50% of the data values have a range of 195 minutes.

3.28 median = 21; half of the tips were below 21 and the remaining half were above 21. iqr = 24.85; the middle 50% of tips had a range of 24.85.

Additional Exercises

3.33 The large difference between the mean and median indicates that there were some parents who spent large amounts of money on school supplies.

3.35
a. The mean will be greater than the median.
b. x̄ = 370.69 seconds, median = 369.5 seconds
c. The largest time could be increased by any amount and not affect the sample median because the position of the middle value will not change if the largest value is increased. The largest time could be decreased to 370 seconds without changing the value of the median.

Section 3.4
Exercise Set 1

3.36 Minimum = 0, lower quartile = 14, median = 33.5, upper quartile = 63, maximum = 151

[Figure: boxplot of manufacturing defects, 80 to 170]

The boxplot shows that there is one outlier (170 defects), and the value of the largest non-outlier is 146 defects. The middle 50% of the data values range between about 106 and 126 defects. The distribution is positively skewed.

3.38
a. No, they are not outliers. For this data set, values are outliers if they are greater than 32.3 + 1.5(32.3 - 20) = 50.75 cents or less than 20 - 1.5(32.3 - 20) = 1.55 cents.
b. [Figure: boxplot of gasoline tax per gallon (cents), 10 to 50]
The distribution is positively skewed.

3.39
a. lower quartile = 16.05 inches, upper quartile = 21.93 inches, iqr = 21.93 - 16.05 = 5.88 inches
b. For this data set, values are outliers if they are greater than 21.93 + 1.5(5.88) = 30.75 inches or less than 16.05 - 1.5(5.88) = 7.23 inches. The value 31.57 inches is an outlier.
c. [Figure: modified boxplot of rainfall (inches), 10 to 30]
The modified boxplot shows one outlier at the high end of the scale. The distribution of inches of rainfall is slightly positively skewed.

[Figure: comparative boxplot of amount of alcohol poured (mL), 20 to 100; groups: Short, wide and Tall, slender]

Both distributions (short, wide and tall, slender) are skewed, although the direction of skew is different for the two distributions. The amount of alcohol poured into short, wide glasses tends to be more than the amount poured into tall, slender glasses.

Additional Exercises

[Figure: comparative boxplot by region; visible group labels: Middle states, West]
b. The most noticeable difference between the wireless percent for the three geographical regions is that the Middle States region is negatively skewed and has a smaller interquartile range than the East and West regions. The Eastern region has the smallest median (11.4%), and the Middle States and Western regions have medians that are about the same (16.9% and 16.3%).

3.49 The fact that the mean is so much higher than the median indicates that the distribution is positively skewed.

Section 3.5
Exercise Set 1

3.50 First national aptitude test: z = 1.5. Second national aptitude test: z = 1.875. The student performed better on the second national aptitude test relative to the other test takers because the z-score for the second test is higher than for the first test.

3.51
a. 40 minutes is 1 standard deviation above the mean; 30 minutes is 1 standard deviation below the mean. The values that are 2 standard deviations away from the mean are 25 and 45 minutes.
b. Approximately 95% of times are between 25 and 45 minutes; approximately 0.3% of times are less than 20 minutes or greater than 50 minutes; approximately 0.15% of times are less than 20 minutes.

3.52 The 10th percentile of $0 indicates that 10% of students have $0 or less of student debt. The 25th percentile, which is the lower quartile, indicates that 25% of students have $0 or less of student debt. The 50th percentile (the median) indicates that 50% of students have $11,000 or less of student debt. The 75th percentile (the upper quartile) indicates that 75% of students have $24,600 or less of student debt. The 90th percentile indicates that 90% of students have $39,300 or less of student debt.

3.53
a. [Figure: histogram of bus travel times (minutes), 15 to 26]
b. Note: percentiles were estimated using midpoints of the histogram bar intervals.
(i) 86th percentile is approximately 20.5 minutes
(ii) 15th percentile is approximately 17.5 minutes
(iii) 90th percentile is approximately 21.5 minutes
(iv) 95th percentile is approximately 25.5 minutes
(v) 10th percentile is approximately 17.5 minutes

Additional Exercises

3.59 a. 1,100 gallons b. 1,400 gallons c. 1,700 gallons

3.61
a. 120
b. 20
e. Since a score of 40 is 3 standard deviations below the mean, that corresponds to a percentile of 0.15%. Therefore, there were relatively few scores below 40.

Are You Ready to Move On?
Chapter 3 Review Exercises

3.63 x̄ = 792.03, which is a typical or representative value for the APEAL rating. s = 36.70, which represents how much, on average, the values in the data set spread out, or deviate, from the mean APEAL rating.

3.65
a. x̄ = 27.31, s = 23.83
b. After removing the 105 tip, the new mean and standard deviation are x̄_new = 23.23 and s_new = 15.70. These values are much smaller than the mean and standard deviation computed with 105 included. This suggests that the mean and standard deviation can change dramatically when outliers are present or removed from the data set, and therefore are probably not the best measures of center and spread to use in this situation.

3.67
a. median = 140 seconds; half the values are less than 140 seconds and half the values are greater than 140 seconds. iqr = 100 seconds; the middle 50% of the data values have a range of 100 seconds.
b. There is an outlier in the data set.

3.69
a. Median = 8 grams/serving, lower quartile = 7 grams/serving, upper quartile = 12 grams/serving, interquartile range = 12 - 7 = 5 grams/serving
b. Median = 10 grams/serving, lower quartile = 6 grams/serving, upper quartile = 13 grams/serving, interquartile range = 13 - 6 = 7 grams/serving
c. There are no outliers in the sugar content data.
d. The minimum value and lower quartile are the same because the smallest five values in the data set are all equal to 7.

[Figure: comparative boxplot of fiber content and sugar content; axis: content (grams/serving), 0 to 20]
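The outlier statement in 3.69 c can be related to the quartiles reported in parts a and b. What follows is a minimal sketch, assuming the 1.5 × iqr cutoff rule that answers 3.38 and 3.39 apply; the function name `outlier_fences` is ours for illustration and does not come from the text. It computes only the cutoffs, because the cereal data themselves are not reproduced in this answer key.

```python
def outlier_fences(lower_quartile, upper_quartile, k=1.5):
    """Return (iqr, lower_fence, upper_fence) under the k * iqr rule.

    A value is flagged as an outlier if it falls below lower_fence
    or above upper_fence.
    """
    iqr = upper_quartile - lower_quartile
    lower_fence = lower_quartile - k * iqr
    upper_fence = upper_quartile + k * iqr
    return iqr, lower_fence, upper_fence


# Quartiles as reported in 3.69 a (fiber) and 3.69 b (sugar), in grams/serving.
fiber = outlier_fences(7, 12)
sugar = outlier_fences(6, 13)

print("fiber (iqr, lower fence, upper fence):", fiber)
print("sugar (iqr, lower fence, upper fence):", sugar)
```

Under these assumptions, a sugar value would have to exceed 23.5 grams/serving (a negative lower cutoff cannot be reached) to be flagged, which is consistent with 3.69 c reporting no outliers.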
3.69 (continued) The sugar content (in grams/serving) is much more variable than the fiber content (in grams/serving). The boxplot of fiber content shows that the minimum and lower quartiles are equal to each other, which is not observed in the sugar content. The distribution of sugar content values is approximately symmetric, which is different from the skewed fiber content distribution.

3.71
(a) The 25th percentile indicates that 25% of full-time female workers age 25 or older with an associate degree earn $26,800 or less. The 50th percentile indicates that 50% of full-time female workers age 25 or older with an associate degree earn $36,800 or less. The 75th percentile indicates that 75% of full-time female workers age 25 or older with an associate degree earn $51,100 or less.
(b) The 25th, 50th, and 75th percentile values for men are all greater than the corresponding percentiles for female workers, indicating that full-time employed men age 25 or older with an associate degree, in general, earn more than full-time employed women age 25 or older with an associate degree.

Chapter 4 Describing Bivariate Numerical Data

Section 4.1
Exercise Set 1

4.1
Scatterplot 1: (i) Yes (ii) Yes (iii) Negative
Scatterplot 2: (i) Yes (ii) No (iii)
Scatterplot 3: (i) Yes (ii) Yes (iii) Positive
Scatterplot 4: (i) Yes (ii) Yes (iii) Positive

4.2
(a) Negative correlation, because as interest rates rise, the number of loan applications might decrease.
(b) Close to zero, because there is no reason to believe that height and IQ should be related.
(c) Positive correlation, because taller people tend to have larger feet.
(d) Positive correlation, because as the minimum daily temperature increases, the cooling cost would also increase.

4.3
(a) There is a moderately strong positive linear relationship between school achievement test score and midlife IQ.
(b) r = 0.6, because the article says that r = 0.64 indicated a very strong relationship, higher than the correlation between height and weight in adults. Therefore, a correlation that is moderately strong (r = 0.6) with a positive association (taller people tend to weigh more) is consistent with the statement.

4.4
(a) r = 0.335; there is a weak positive linear relationship.
(b) The conclusion that heavier logging led to large forest fires cannot be justified because correlation does not imply causation.

4.5
(a) [Figure: scatterplot of Satisfaction Rating versus Quality Rating]
There is a weak negative linear relationship between satisfaction rating and quality rating (number of defects).
(b) r = -0.239; there is a weak negative linear relationship between satisfaction rating and quality rating (number of defects).

4.6 No, the statement is not correct. A correlation of 0 indicates that there is not a linear relationship between two variables. There could be a strong nonlinear relationship (for example, a quadratic relationship) between the two variables.

4.7 No, it is not reasonable to conclude that increasing alcohol consumption will increase income. Correlation measures the strength of association, but association does not imply causation.

4.8 The correlation between college GPA and academic self-worth (r = 0.48) indicates that there is a weak or moderate positive linear relationship between those variables. This tells us that athletes with a higher GPA tend to feel better about themselves academically than those with lower grades. The correlation between college GPA and high school GPA (r = 0.46) indicates that there is a weak or moderate positive relationship between those variables as well. This tells us that athletes with a higher high school GPA also tend to have a higher college GPA. Finally, the correlation between college GPA and a measure of tendency to procrastinate (r = -0.36) indicates that there is a weak negative linear relationship between those variables. Athletes with a lower college GPA tend to procrastinate more than athletes with a higher college GPA.

Additional Exercises

4.15
(a) [Figure: scatterplot]
(b) r = 0.001
(c) The correlation coefficient of r = 0.001 indicates that there is essentially no linear relationship between mare weight and foal weight. The scatterplot shows that there is no obvious relationship (linear or otherwise) between mare weight and foal weight.

4.17 r = 0.987; this value is consistent with the previous answer because the correlation coefficient is large (close to 1) and positive, which indicates a strong positive association between household debt and corporate debt.

4.19 The sample correlation coefficient would be closest to -0.9. Cars traveling at a faster rate of speed will travel the length of the highway segment more quickly than those that are traveling more slowly, and the correlation would be strong.

Section 4.2
Exercise Set 1

4.21 It makes sense to use the least squares regression line to summarize the relationship between x and y for Scatterplot 1, but not for Scatterplot 2. Scatterplot 1 shows a linear relationship between x and y, but Scatterplot 2 shows a curved relationship between x and y.

4.22 It would be larger, because the least squares regression line is the line with the minimum value for the sum of the squared vertical deviations from the line. All other lines would have larger values for the sum of the squared vertical deviations.

4.23
(a) ŷ = -5.0 + 0.017x
(b) 30.7 therms
(c) 0.017 therms
(d) No, because the regression line was determined based on house sizes between 1000 and 3000 square feet. There is no guarantee that the linear relationship will continue outside this range of house sizes.

4.24
(a) The response variable (y) is birth weight, and the predictor variable (x) is mother's age.
(b) It is reasonable to use a line to summarize the relationship because the scatterplot shows a clear linear relationship between birth weight and mother's age.
(c) ŷ = -1163.4 + 245.15x
(d) The slope of 245.15 is the amount, on average, by which the birth weight increases when the mother's age increases by one year.
(e) It is not appropriate to interpret the intercept of the least squares regression line. The intercept is the birth weight for a mother who is zero years old, which is impossible. In addition, the intercept is negative, indicating a negative birth weight, which is also impossible.
(f) 3,249.3 grams
(g) 2,513.85 grams
(h) No, because 23 years is well outside the range of ages used in determining the least squares regression line.

4.25
(a) The response variable is the cost of medical care, and the predictor variable is the measure of pollution.
(b) There is a moderate negative linear relationship.
(c) ŷ = 1082.2 - 4.691x
(d) The slope is negative, and it is consistent with the observed negative association in the scatterplot.
(e) No, the association between medical cost and pollution level is negative, which indicates that people over the age of 65 in more polluted areas tend to have lower medical costs.
(g) No, because the value 60 is far outside the range of data values in the sample.

ŷ = 11.48 + 0.970x
(b) 496.48
(c) 302.48

Additional Exercises

4.33 ŷ = 13.5 - 0.195x

4.35 Age is a better predictor of number of cell phone calls. The linear relationship between age and number of cell phone calls is stronger than the relationship between age and number of text messages sent.

4.37 The slope is the change in predicted price for each additional mile from the Bay, so the slope would be -4,000.

Section 4.3
Exercise Set 1

4.39
(a) ŷ = 1.33878 - 0.007661x
(b) r² = 0.099. Approximately 9.9% of the variability in telomere length can be explained by the linear relationship between telomere length and perceived stress.
(c) s_e = 0.159; a typical amount by which telomere length will deviate from the least squares regression line.
(d) Negative, because the slope of the least squares regression line is negative; weak, because r = -0.315.

4.40 A small value of s_e indicates that residuals tend to be small. Because residuals represent the difference between an observed y value and a predicted y value, the value of s_e tells us how much accuracy we can expect when using the least squares regression line to make predictions.

4.41 It is important to consider both r² and s_e when evaluating the usefulness of the least squares regression line. A large r² (the proportion of variability in y that can be explained by the linear relationship between x and y) tells us that knowing the value of x is helpful in predicting y, and a small s_e indicates that residuals tend to be small.

4.42
(a) Yes, the scatterplot looks reasonably linear.
[Figure: scatterplot of Median distance walked versus Representative age]
(b) ŷ = 492.80 + 14.763x
(c) The residuals are -7.55, -12.14, 26.87, 9.00, and -16.17. There is a curved pattern in the residual plot. The curvature indicates that the relationship between median distance walked and representative age is not linear.
[Figure: residual plot, Residual versus Representative age]

4.43
(a) The pattern for girls differs from boys in that the girls' scatterplot shows more apparent nonlinearity in the relationship.
(b) ŷ = 480 + 12.525x
(c) The residuals are -37.70, 10.63, 50.56, 8.52, -32.01. The curvature of the residual plot indicates that a curve is more appropriate than a line for describing the relationship between median distance walked and representative age.
[Figure: residual plot, Residual versus Representative age]

4.44
(b) ŷ = -0.03443 + 0.5803x. The predicted nitrogen retention for a flying squirrel whose nitrogen intake is 0.06 grams is 0.000388 grams. The residual associated with the observation (0.06, 0.01) is 0.009612.
(c) The observation (0.25, 0.11) is potentially influential because that point has an x value that is far away from the rest of the data set.
(d) ŷ = -0.037 + 0.627(0.06) = 0.00062. This prediction is larger than the prediction made in Part (b).

4.45
(a) There appears to be a linear relationship.
(b) ŷ = 18.483 + 0.0028655x
(c) The observation (3928, 46.8) is not influential because the x value for that observation is not far from the rest of the data. In addition, removal of the potentially influential point produces a least squares regression line with a y-intercept and slope similar to the original line.
(d) Those points are not considered influential even though they are far from the rest of the data because they follow the trend of the remaining data points. Removal of those points would produce a least squares regression line similar to the line found using the full data set.
(e) s_e = 9.16217. A typical deviation from the least squares regression line is 9.16217 percentage points.
(f) r² = 0.832. Approximately 83.2% of the variability in percentage transported can be explained by the linear relationship between percentage transported and number of [...]

Are You Ready to Move On? Chapter 4 Review Exercises

4.57
Scatterplot 1: (i) Yes (ii) Yes (iii) Negative
Scatterplot 2: (i) Yes (ii) No (iii)
Scatterplot 3: (i) Yes (ii) Yes (iii) Positive
Scatterplot 4: (i) Yes (ii) Yes (iii) Positive

4.59
(a) r = -0.10; there is a weak negative linear relationship. Because the relationship is negative, larger arch heights tend to be paired with smaller average hopping heights.
(b) The correlation coefficients support the conclusion since they are all fairly close to 0.

4.61
(a) r = 0.944; there is a strong positive linear relationship between sugar consumption and depression rate.
(b) No, because you can't conclude that a cause-and-effect relationship exists just based on a strong correlation.
(c) These countries were not a random sample of all countries, and it is unlikely that they are representative.

4.63
(a) Negative, because it is likely that the work is more stressful and less enjoyable for nurses with high patient-to-nurse ratios.
(b) Negative, because it is likely that the patients at hospitals where the patient-to-nurse ratio is high will not get as much individual attention and will be less satisfied.
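The prediction and residual arithmetic in 4.44 can be reproduced in a few lines. This is a minimal sketch using the coefficients reported in 4.44 (b) and (d); the helper names `predict` and `residual` are hypothetical, and the residual is taken as observed y minus predicted y, as described in 4.40. The last check uses the residual values as read from 4.43 (c).

```python
def predict(intercept, slope, x):
    """Predicted y from a least squares line: y-hat = intercept + slope * x."""
    return intercept + slope * x


def residual(observed_y, predicted_y):
    """Residual = observed y minus predicted y."""
    return observed_y - predicted_y


# Line reported in 4.44 (b): y-hat = -0.03443 + 0.5803x
y_hat_b = predict(-0.03443, 0.5803, 0.06)
print(round(y_hat_b, 6))                  # predicted nitrogen retention (grams)
print(round(residual(0.01, y_hat_b), 6))  # residual for the observation (0.06, 0.01)

# Line reported in 4.44 (d): y-hat = -0.037 + 0.627x
y_hat_d = predict(-0.037, 0.627, 0.06)
print(round(y_hat_d, 5))

# For a least squares line with an intercept, the residuals sum to zero
# (up to rounding). The values read from 4.43 (c) satisfy this.
residuals_443 = [-37.70, 10.63, 50.56, 8.52, -32.01]
print(abs(sum(residuals_443)) < 0.01)
```

The three rounded values agree with the 0.000388, 0.009612, and 0.00062 given in the answers, and the comparison in 4.44 (d) follows because 0.00062 is larger than 0.000388.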

