Learn how to Work Smarter

Friday, March 12, 2010

Keeping real-time river and precipitation data, er, current

Been dragging my heels too long on my "data normalization" piece -- was sharing hydrology data with another staff today and figured this needed written up.

One of our projects is 1) adjacent to a tidally-influenced body of water; 2) has a RCRA cap; and 3) a bunch of transducer-outfitted wells each collecting 48 records/day to prove or disprove hydraulic connection with said river.  So we track environmental parameters including daily river fluctuations and local rainfall.

Enter the Internet and web scraping.

There are a ton of different ways of pulling data from the 'Net but, frankly, the easiest to validate -- prove is working to those who ask -- is the HTTP interface (type of text handled by a browser - a webpage).  I found Excel is a fine way of scraping raw delimited text, then link that to Access and append new records where they belong.  (As municipalities improve XML interfaces I expect to go that direction -- which should make this a little more efficient!)

First -- get the river data.  The USGS provides 30-minute increment stage, flow and discharge data for the river through their website.  They have a whole page on automated data retrieval -- but with the usual caveats I worked the quick-n-dirty.

Second -- collecting precipitation data.  Our City (that works!) has a nice network of automated bucket gages hosted by the USGS.  Pulling that into Excel looks like this (right):

The pink-ish cells are the "scraped" HTML -- the cells to the right contain a little text handling, stuff like:

  • date_PST: 
    • =LEFT(A15, 11)
  • inch_precip: 
    • =IF(ISNUMBER(VALUE(B15)), VALUE(B15)*0.01, "")
  • bad_data_flag: 

Next, I used linked tables in Access to pull this into a "staged" area -- which I think is important when you're relying on outside data -- you can control and manipulate the actual data table by using SQL queries to pull from the staged data.

Here are the linked tables (Access shows them in little Excel icons), named ltblPrecip_top60 and ltblRiverGageUpdate:

Then the queries to pull data into tblPrecipitationGage and tblRiverGage look like this:
INSERT INTO tblPrecipitationGage ( LOC_ID, FILEORIG, DTT_PST, INCH_PRECIP, DATA_FLG, UpdateUser, UpdateTime, UpdateDescription )
SELECT "BES Station Number 193" AS LOC_ID, "http://or.water.usgs.gov/non-usgs/bes/astor.rain" AS FILE_ORIG, DateValue([date_measure_PST]) AS RainfallDatePST, ltblPrecip_top60.inches_rainfall, ltblPrecip_top60.bad_data_flag, fOSUserName() AS UpdateUser, Now() AS UpdateTime, "qryAppendNewPrecipitation" AS UpdateDescription
FROM ltblPrecip_top60
WHERE (((DateValue([date_measure_PST]))>(SELECT Max(tblPrecipitationGage.DTT_PST)  FROM tblPrecipitationGage) And (DateValue([date_measure_PST]))<(DateValue(Now()))));

INSERT INTO tblRiverGage ( LOC_ID, FILEORIG, LoggingDTT, TimeDatum, ELEV_NAVD88, Velocity, Discharge, Data_Flag, UpdateUser, UpdateTime, UpdateDescription )
SELECT ltblRiverGageUpdate.LOC_ID, "http://waterdata.usgs.gov/nwis/uv?14211720" AS FILE_ORIG, ltblRiverGageUpdate.loggingdtt, ltblRiverGageUpdate.TimeDatum, ltblRiverGageUpdate.Elevation_NAVD88, ltblRiverGageUpdate.Velocity, ltblRiverGageUpdate.Discharge, ltblRiverGageUpdate.bad_data_flag, fOSUserName() AS UpdateUser, Now() AS UpdateTime, "qryAppendNewWillametteRiver" AS UpdateDescription
FROM ltblRiverGageUpdate
WHERE (((ltblRiverGageUpdate.LOC_ID)=14211720) AND ((ltblRiverGageUpdate.loggingdtt)>(SELECT Max(tblRiverGage.LoggingDTT)  FROM tblRiverGage) And (ltblRiverGageUpdate.loggingdtt)<(DateValue(Now()))))
ORDER BY ltblRiverGageUpdate.loggingdtt;
Okay, those look a little hairy -- but they aren't that bad.  Really.  :-)

Gonna have to wait until tomorrow for a reasonable explanation of those -- this is a long post, and I'm out of time.  (The dog is home with her legs crossed!)

Thanks for checking this out.

No comments:

Post a Comment

Home | My Schedule (Free/Busy) | Professional CV | Learn how to Work Smarter

A Little More Background

Friends & Followers