Supercomputing Networking Research Education Ohio Supercomputer Center Site Map Staff Directory Support

WebED Education, Outreach and Training

Using Real Data for Science and Math: Urban Sprawl and its Impacts on Rural America

In many regions, people are moving away from the central cities in major metropolitan areas into suburban and rural communities.  This migration consumes a tremendous amount of land resources, particularly prime agricultural land.  It also increases commuting times, energy use, and the potential for spreading air and water pollution problems.  References at the end of this mini-project will give you background on the problem.

We have assembled data on one of the fastest growing such communities in Ohio – Medina County.  A bedroom community for Cleveland and Akron-Canton, this county has grown in population by 120% from 1960 (65,315) to 2000 (143,230).  We obtained data from the auditors land records, showing the dates that different buildings were constructed and the land they consumed.

MINI-PROJECT

This mini-project includes:

1. Doing some background reading on the nature of the problem of sprawl and its possible solutions.

2. Using the maps we have created to visualize the rate of land use change in Medina County.

3.Using the auditor's dataset and data from the U.S. Census to quantify the rate of change in population and land use in the county by creating tables and graphs in Excel.

4. Making some simple forecasts using Microsoft Excel to create a Line and Curve of Best Fit.

5. Applying the same methodology in Excel to examine population growth, acreage growth, or another data set.

6. Writing a short report describing the nature of the overall problems and the specific findings for Medina County.  Represent your analysis in the form of Excel graphs, charts, and tables and describe the implications of sprawl.

GETTING STARTED

Microsoft Excel is required to use the spreadsheet and complete this mini-project.

Reading:

 DATASETS

There are two underlying datasets that you can use in your analysis. The first is the dataset showing the auditor's records of land holdings for all of Medina County.  The dataset appears on a spreadsheet called meparcels1.xls (Click here to download the spreadsheet). The second dataset – census data – is available online.

To analyze the auditor's dataset, you will need to do the following:

First, open the Excel spreadsheet that contains the raw data: the meparcels1.xls spreadsheet. Make certain you have access to the Data Analysis Toolpak that comes with Excel.  If it is already installed, you will find a menu item under Tools > Data Analysis.  If you do not see this menu item under the tools menu in Excel, do the following to turn on this option:

Step

Action

1

Make sure the meparcels1.xls spreadsheet is open.

2

Click on the Tools menu.

3

Select Add-Ins. A dialogue box appears with a list of options.

4

Check the box next to Analysis Toolpak.

5

Click OK. The Analysis Toolpak option should now appear on the Tools menu.

meparcels1.xls spreadsheet fields are as follows:

Field

Definition

PARCELID

Parcel ID: Every parcel is identified by a unique 11-digit alphanumeric code.

AUDITORSLU

Auditors Land Use code (see Appendix A)

YEARBUILT

Year Built: Year the first structure was built on the property.  0 means no structure.

ASSESSEDLA

Assessed value of the land

ASSESSEDVA

Assessed value of the structure on the land

ACRES

Acres: Number of acres

STRUCTYP

Structure Type (see Appendix B)

ODNRLAND

Ohio Department of Natural Resources Land Use Codes (see code sheet)

The coding sheets for these variables appear in Appendices A and B.

A NOTE ABOUT THE MEPARCELS1 SPREADSHEET
The data gathered from the Medina County Auditor is flawed in two ways: 1) there are, in some cases, repeated parcel ID numbers; 2) in theory, no structures should appear on undeveloped parcels (those with a 0 in the YEARBUILT column), but some do. We cannot account for the discrepancies except to say that the raw data is just that.

ANALYZING THE DATASET

NOTE: This tutorial is an example of to analyze a dataset. You are expected to extrapolate this information and apply it to other datasets.

Step

Action

1

Type 0 in cell L3.

2

Add subsequent years in 10-year increments in the L column until you reach 2000.

Now you are ready to create a histogram.  “A histogram is an analysis tool in Excel that calculates individual and cumulative frequencies for a cell range of data and data bins. This tool generates data for the number of occurrences of a value in a data set. For example, in a class of 20 students, you could determine the distribution of scores in letter-grade categories. A histogram table presents the letter-grade boundaries and the number of scores between the lowest bound and the current bound.”

Excerpted from Excel Help.

To create the histogram:

Step

Action

1

Click Tools menu > Data Analysis.

2

From the list of Analysis Tools, select Histogram.

3

Click OK.

4

The Histogram box appears. Holding down the shift key, click once on the C2 cell. Scroll down to the end of the YEARBUILT column. Click the last cell of the column. This selects the entire YEARBUILT column. Once you release your mouse, the Input Range appears in the Histogram box.

5

Next, place your cursor in the Bin Range field and select all of the dates in column L. The Bin Range field automatically populates with the years you entered in column L:

6

Click OK. A new sheet (Sheet 1) appears, identifying a range of years and the number of parcels developed during those years. These numbers will help you create Line of Best Fit and Curve of Best Fit charts to determine future growth projections.

Once you create a histogram, determine the percentage of parcels built over the last 100 years.

Step

Action

1

The histogram appears on Sheet 1 at the bottom of the spreadsheet. Right click on the label and select Rename.

2

Type Percentage.

3

Before going any further, insert a new row of cells on line 14 (this moves the current 14 line to 15); click Insert > Rows. Type the numbers from cells A2 and B2 in cells A14 and B14. Doing this ensures that you count the total number of unknown parcels at the end of the calculations.

4

Next, find the total number of parcels built over 100 years by calculating the sum of column B. To do this, type =SUM(B3:B14) in the B15 cell. Press Enter. This adds up all of the cells in the B column.

5

In the C column, right click on the letter C. Select Format Cells.

6

Under Number tab, select Percentage. Click OK. Type Percent in cell C1.

7

To calculate the growth percentage of parcels over the designated years, type the formula: =B3/$B$15 in cell C3. This indicates that you will divide cell B3 (number of parcels built in 1900) by B15 (the total number of parcels over the 100-year period).

6

Press Enter.

7

To find out the percentages of parcels per decade for each of the following cells in column C, copy the C2 cell, select the remaining cells (through cell 14) and paste the formula. Excel automatically calculates the remaining percentages

8

Next, calculate the cumulative percentage of parcel growth. In the D column, right click on the letter D. Click Format Cells and select Percentage.

9

In cell D1, type Cumulative.

10

The formula for creating the cumulative percentage on the spreadsheet is =C3+D2, =C4+D3, =C5+D4, and so on. In cell D3, type =C3+D2. Then press Enter.

11

To find out the cumulative growth percentages for each of the following cells in column D, copy cell D3, select the remaining cells (through cell 14) and paste the formula. The Cumulative sheet should now look like this:

Next, determine the Line of Best Fit:

Step

Action

1

To find the line of best fit, select cells D3 through D18.

2

Click Insert > Chart. The Chart Wizard takes you through the process of creating a chart.

3

Select the Line chart and the first box on the second row. Click Next.

4

Click Next twice. On the Chart location screen, select As Object in and Cumulative. This ensures that the chart appears on the Percentage screen.

5

Click Finish.

6

The Cumulative chart appears:

7

To create a line of best fit, right click on the plotted line. Select Add Trendline.  We will compare whether the linear or exponential is the best fit for this distribution.

8

The Add Trendline box appears.

9

Select the Linear box and click OK. The Line of Best Fit appears plotted on your chart.

Step

Action

1

To find the curve of best fit, right click on the original plotted line on your chart. Select Add Trendline.

2

Select the Exponential box and click OK.

  

3

The Curve of Best Fit appears plotted on the chart.

 

4

After determining the line and curve of best fit, prompt Excel to display the underlying equations by first right clicking on the linear line. Select Format Trendline.

5

Under Options, click the boxes next to Display equation on chart and Display R-squared value on chart.

 

6 The Linear equation appears on the chart.

 To view the equation of the exponential line, right click on the exponential line and follow the instructions in steps 4 and 5.

 REGRESSION

“The purpose of linear regression is to find the line that comes closest to your data. More precisely, the linear regression finds values for the slope and intercept that define the line that minimizes the sum of the square of the vertical distances between the points and the line.”

Linear regression allows you to “analyze how a single dependent variable is affected by the values of one or more independent variables — for example, how an athlete's performance is affected by such factors as age, height, and weight. You can apportion shares in the performance measure to each of these three factors, based on a set of performance data, and then use the results to predict the performance of a new, untested athlete.”

Regression gives you two things: 1) the predicted equation and 2) an indication of how well the equation fits the data. R2 shows whether you can use the equation to make predictions. Use the equation to find predicted Y values for known X values. Regression gives you the equation. R2 only shows you if the equation fits the data.

(Excerpted from Excel Help)

R Squared (R2)

R2 shows us whether we can use the linear or exponential equation to make predictions. Use the equation to find predicted Y values for known X values. Regression gives you the equation; R2 only shows you if the equation fits the data.

The value R2 is a fraction between 0.0 and 1.0, and has no units. An R2 value of 0.0 means that knowing X does not help you predict Y. There is no linear relationship between X and Y, and the best-fit line is a horizontal line going through the mean of all Y values.  When R2 equals 1.0, all points lie exactly on a straight line with no scatter. Knowing X lets you predict Y perfectly. It all depends on the data!

Excerpted from www.curvefit.com

QUESTION

Based on the R2 values, which of your lines best fits the data?

 EQUATION EXPLANATION

To predict parcels for the next time period, put in the number for the next time period.  To forecast the total number for the next time period (in ten more years), calculate the equation.

Linear Fit

The equation for determining the Linear Fit is Y = a ­± b*X. X represents the time period (i.e., 1920 or 1940 or 1990) numbered from the first period (1900) through the last in your data (2000) or the 11th period. Y represents the total number of structures. The letter “a” signifies the intercept number, and the letter “b” signifies the slope constant. The * represents the multiplication operation.  The final equation looks like this:

Y = (intercept number or constant) + slope constant * X variable

Exponential Fit

The equation of finding the Exponential Fit is Y=aebx. Again, “x” represents the known time period, and Y represents the total number of structures. The letter “e” signifies the exponential function. The final equation looks like this:

Y = (intercept number or constant)e(exponent constant)*x

To predict the number of developed parcels in ten years, you need to put a formula underneath your graph in the spreadsheet and substitute the appropriate values for your linear or exponential equation. 

For the linear equation, the formula in the spreadsheet will take the form:

= intercept number + slopeconstant*12.

In this example there are 11 periods.  The next period of 10 years will be the 12th period – hence substituting the “12” for X in the formula. 

For the exponential equation, the form will be:

= intercept number*EXP(exponentconstant*12)

The EXP( ) expression is the Excel function for the exponential with the items in parentheses representing the power function.  Again, the 12 reflects the next 10 year period.

QUESTIONS – Use your results to discuss the following questions

1. Is the best fit for the data linear or exponential? 

2. If this trend were to continue for the next 10 and 20 years, how many more parcels would be developed?

3. Try to apply the histogram and best-fit methodology to describe county trends in at least one other way using either the acreage variable on this spreadsheet or population data over time for the county.  What trends can you identify by looking at the number of acres of land as they develop over time? Which Best-Fit model matches the number of parcels? What conclusions can you draw by looking at census of population data?

- To obtain population census data, view these sites:

http://www.census.gov/population/cencounts/oh190090.txt

http://factfinder.census.gov/servlet/BasicFactsServlet?_lang=en

- For Tract 2000 information, go to:

http://factfinder.census.gov/servlet/GCTGeoSearchByListServlet?_lang=en

http://factfinder.census.gov/servlet/DTGeoSearchByListServlet?ds_
name=DEC_1990_STF3_&state=dt&_lang=en

 4. Now create a report on your findings.  Your report should contain an introduction that reviews the nature of the problems with urban sprawl based on your reading.  Properly cite your sources!  The second section should present your results of the analysis of the conditions in Medina County.  You can use the maps from the website, graphs and charts you created, and/or tables.  Present these items as illustrations and discuss what you found.

5. Now draw some conclusions about whether this is a big problem in Ohio and what might be done to solve it.  What are the pros and cons of alternatives?

6. Put your report in the form of an electronic document using Microsoft Word.  Instructions for submitting it.Your instructor will tell you when, where, and how to submit the final report.

GO TO APPENDIX

This material was assembled with the support of a grant from Kent State University as a part of their Preparing Tomorrow's Teachers to Use Technology program from the U.S. Department of Education. Other support came from an NSF Grant for the Education Outreach and Training Program of the Partners for Advanced Computational Infrastructure under a subcontract from NCSA at the University of Illinois.