MODEL 1

STEP 2: DERIVING STREAMFLOW AND GAGE HEIGHT RELATIONSHIPS

The second step in model building is obtaining the “site numbers” used by the USGS for these 37 stations. Site numbers are needed to match gage height and streamflow data.

To obtain “site numbers”, we went to the USGS' Streamflow Measurements for North Carolina. This page is located at http://waterdata.usgs.gov/nc/nwis/measurements&introduction. You can perform a query by selecting only Hydrologic Unit criteria. Select Hydrologic Unit and click Submit.

Next, we selected all the hydrologic units, leaving all other options in their default values, and Submited the query.  You can select all hydrologic units by clicking on the first unit, and then, on a Windows machine, hold the shift key down and scroll to the bottom item on the list and click on it.  You will have selected all entries.  On a Macintosh, the Command key produces the same result. You should obtain the entire list of stations, 198 in all, in North Carolina.
  
Now that you have a list of all the stations in North Carolina with their station numbers, you should be able to obtain the USGS“site numbers” for the 37 stations by using their names. We have already done this and placed those numbers on your template.  Look down the list to find a few of the station numbers and corroborate that they match the USGS data.

After the “site numbers” are confirmed, you should obtain the stage measurements for the available 29 stations by using site numbers one by one.

To do that, go to USGS’ Streamflow Measurements for North Carolina. This page is located at http://waterdata.usgs.gov/nc/nwis/measurements&introduction. Perform queries by checking the Multiple Site Numbers box and then clicking on Submit.

Copy all the station numbers from your spreadsheet “flood_stage.xls”, and paste them into the Multiple Site Numbers text box. Choose the period from 01/01/1998 to 12/31/2002. Put these values in the Retrieve data from text box at the end of the form. 

Notice the format order for the year, month, and day.  Select Tab-separated data and Save to file options. Leave the other options in their default values and then click Submit and you will prompted to save a file.  It will have an arbitrary name of letters and numbers.  

Open the saved file in Excel. It will not be recognized as an Excel file so you will have to change the File Type box to read All Files to find it.  When you open the file, you should see a format dialog box relating to use while importing.

Spreadsheet data import dialog box

Choose Delimited  and click Next.  In the next dialog box, choose Tab for the delimiter and click Next.  On the last dialog screen, be sure to change the data type for the station number field to text.  Do this by clicking on the Text button in the Column data format area as illustrated below.

choose text dialog box

Then click Finish.  The spreadsheet will have comment rows on the top that document the data sources.  You can delete these 39 rows.  To do this, Click on the row numbers, dragging down to the 39th row to highlight all of the rows you want to delete, and then Click on Edit Delete.  

You should have column labels on the top of the spreadsheet.  The new second row has some extraneous data you don't need starting as 15s, 5s, 5s, etc.  You can delete that row.  

gage height spreadsheet

Note that we have streamflow data for 27 of the 37 stations, for which we have flood stage data. Save this file in Excel format as “Gage_Height_Streamflow”. Now we have both gage height and streamflow data, so that we can derive the relationship between the two. Gage height is stored in the column inside_gage_va (column I) and Streamflow is stored in discharge_1 (column K). We need to have the data in adjacent columns with the first column being discharge,  We will not need the data in the measured_rating_diff column, so, you can copy the inside_gage_va data to that column.  You can then delete the extra copy of that column.  Finally, there are also missing values in the inside_gage_va column.  Find those values and delete those rows.  Save the file again.

The "cleaned" file includes 944 observations.

To see the nature of the relation and detect possible exceptional observations, create a scatter plot showing gage height versus streamflow. To do that, first select the columns discharge_1 andinside_gage_va and click on chart wizard in Excel and select the XY (Scatter) plot type. You can fill in a title, X, and Y axis labels, and save the graph to a new worksheet called gage_height.  It should look like this:

xy scatter graphic  

You can now add trendlines to this graph to see possible mathematical models that fit the data. To do this, single click Chart Add Trendline. You need to add two different trendlines. First, try the linear form, and second, try a polynomial (order=2) form. Your chart should look like the one below.

fitting a trend line  

You should see that the polynomial line seems to be a better fit.  To get more information about the goodness of the fit, click on the polynomial line and hit delete to take it off the graph.  Now add it again using Chart Add Trendline, but this time, Click on the Options tab and choose to Display equation on chart and Display R-squared value on chart.  You will see the polynomial equation for the trendline along with an R-squared value.  That value indicates the proportion of change in the gage height that is explained by the equation using streamflow.  For example, an R-squared of 0.69 means that 69% of the variations in gage height is explained by the equation.  Another way of thinking about this is that the equation fails to explain 30% of the variation.

options box example

The equation has the polynomial form:
Gage Height = Intercept + (B1 * Streamflow) + (B2 * Streamflow2).  So if we can predict streamflow, we can estimate the gage height and compare that to the flood stage for a particular place, telling us whether or not flooding will occur.


PREVIOUS PAGE                                                    NEXT PAGE