Activity 6.2

Lines of Best Fit

10 points

Due at the beginning of class, Friday, February 27, 2009

It is often desirable to use a linear function to model a given set of data. In this activity, you will work with several data sets, and for each, you will look for a suitable line that approximates the given data. You will also work with the line that is generally used as the best-fitting line, the least-squares regression line. You will learn how to use Excel to find this line and its equation.

1. For each of the following scatterplots 1.a, 1.b, and 1.c, sketch a line that fits the data by "eye-balling" the graph and judging which line comes closest to all the points. Do your sketching on a separate sheet of paper, which you do not need to turn in. These graphs also appear on pages 453 and 454 of your textbook, if you prefer to do your sketching there. Then use your sketches to type your answers to #1(a) - #1(c), where you will indicate whether the slope of the line you drew is positive, negative, or zero. For graph #1(d), in addition to estimating the slope, you should explain why a line would not be a good fit. (Source: The Wall Street Journal Almanac 1999.)

 

1. a. Slope positive, negative, or 0?

 

1. b. Slope positive, negative, or 0?

 

1. c. Slope positive, negative, or 0?

 

1. d. Slope positive, negative, or 0? Why is a line not a good fit for graph (d)?

Because different people might draw different lines, especially when the data are as scattered as they are on graph (c), we need a way to construct a line that doesn't depend on an individual's perception. The most commonly used method produces the least-squares regression line, or simply the regression line. Among all possible lines we could draw, this is the one that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. A computer or calculator can find this line easily because the formulas involved are straightforward, though tedious to work through by hand.
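To make the idea of "smallest sum of squares" concrete, here is a minimal Python sketch of the standard least-squares formulas. It is an illustration only and is not part of the Excel work in this activity. The five data points are made up for the example, and the function names least_squares_line and sum_of_squares are hypothetical names chosen here.

```python
# Illustrative data only: made up for this sketch, not taken from the textbook.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, slope, intercept)

# Nudging the slope or the intercept in either direction makes the sum larger.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
nudged = [sum_of_squares(xs, ys, slope + ds, intercept + di) for ds, di in nudges]

print("slope:", slope, "intercept:", intercept)
print("sum of squares for the regression line:", best)
print("sums of squares for nearby lines:", nudged)
```

For these made-up points the sketch gives a slope of 0.6 and an intercept of 2.2, and every nearby line it tries has a larger sum of squared vertical distances.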

2. The regression line is used to show how a response variable changes, on average, as an explanatory variable changes. You can use such a line to predict the value of the response variable for a particular value of the explanatory variable. The least-squares regression line for the data given in #1(c) is shown on the following graph.

2. a. On a separate sheet of paper, which you do not need to turn in, draw in the vertical distances from all points in the data set to the least-squares line. This graph is located on page 455 in your textbook.

2. b. How many data points lie above the line?

2. c. How many data points lie below the line?

2. d. How can you tell from a scatterplot of the data whether the slope of the regression line will be positive or negative?

3. Retrieve the data set EA6.2.1 Verbal SAT Data.xls and create a scatterplot of "percent taking the test" and "verbal SAT score." When creating the scatterplot, highlight only the two columns of data corresponding to the two quantitative variables -- do not select the names of the states. Include appropriate titles, and change the scale on the vertical axis to go from 450 to 600. (See Activity 2.1 if you need a refresher on creating scatterplots and changing the scale.) You should not paste this scatterplot into your Word document until you add the trendline in #4 below.

4. Now use the following instructions to find the regression line for these data. You should paste this graph into your Word document.

5. Write the equation of your line and indicate what the variables x and y represent.

6. What is the slope of the line, and what does it represent? Interpret the slope in the context of the data.

7. What is the y-intercept of the line, and what does it represent? Interpret the y-intercept in the context of the data. Does this interpretation really make sense in the context of these data?

8. Use the line you found to predict the average verbal SAT score for a state in which 60 percent of students take the exam.

9. Retrieve the data set EA6.2.2 Data Movies and Vid.xls. This file contains data collected from a sample of college students. Create a scatterplot of the two quantitative variables and find the regression line for these data. Note that your scatterplot (with trendline) will use all the data, even the outlier. Paste this graph into your Word document, and use it to answer #9(a) - #9(c).

9. a. Write the equation of the line and indicate what the variables x and y represent in the equation.

9. b. Is there a clear choice of explanatory variable and response variable in this data set? Why or why not?

9. c. Describe what your scatterplot and line show.

9. d. There is one clearly unusual data point in the data set: the male who estimated that he saw 200 movies at a theater last year. Delete this case and create another scatterplot (with trendline). Paste this graph into your Word document. Write the equation of this adjusted line and describe how the least-squares line changed when the outlier was deleted.
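The effect of a single unusual point on a least-squares line can also be seen in a small Python sketch. The numbers below are made up for illustration and are not the values in EA6.2.2 Data Movies and Vid.xls; the function name least_squares_line is a hypothetical name chosen here. Your own answers should still come from the Excel trendlines.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up data: five ordinary points plus one point far to the right.
xs_all = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys_all = [2.0, 4.0, 5.0, 4.0, 5.0, 2.0]

# The same data with the last (unusual) point deleted.
xs_trimmed = xs_all[:-1]
ys_trimmed = ys_all[:-1]

slope_all, intercept_all = least_squares_line(xs_all, ys_all)
slope_trimmed, intercept_trimmed = least_squares_line(xs_trimmed, ys_trimmed)

print("with the unusual point:    slope", slope_all, "intercept", intercept_all)
print("without the unusual point: slope", slope_trimmed, "intercept", intercept_trimmed)
```

With these made-up numbers, the slope is negative when the unusual point is included and positive once it is deleted. The change in real data may be smaller or larger, but the sketch shows why one extreme point can pull the line strongly toward itself.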

Summary
In this activity, you practiced creating scatterplots and learned how to find the regression line for a set of data using Excel. You interpreted the slope and y-intercept of a regression line in the context of the data set from which it was obtained, and used the regression line to predict values of the response variable. You also explored how an unusual data point can affect the regression line.