Re: Subsetting and disk I/O from David Brown on 2009-02-25 (pyngl-talk 2009 archive)

From: David Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Wed, 25 Feb 2009 16:13:30 -0700

Hi Jesper,

Thanks for the link to your web application that employs PyNIO.
I really like the route feature.

You are correct that the GRIB reader reads a complete record at a time.
These records are cached so that subsequent accesses of the same record
(as long as it remains in the cache) require no further file I/O.
This is true both for GRIB1, where the record data is read using local
NIO code, and for GRIB2, which uses an external library, g2clib, from
NCEP for the low-level data I/O. There is no interface for reading less
than a complete record in either case.
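
A rough sketch of the read-then-cache behaviour described above. This is
not the NIO source; the class name, the `read_record` callable, and the
cache size are hypothetical, chosen only to show why a single-point
request still costs a full record read on a cache miss.

```python
from collections import OrderedDict


class RecordCache:
    """Tiny LRU cache keyed by record number (illustrative, not NIO's code)."""

    def __init__(self, read_record, max_records=8):
        # read_record: hypothetical callable, record number -> full unpacked record
        self._read_record = read_record
        self._max_records = max_records
        self._cache = OrderedDict()
        self.disk_reads = 0  # how many times we fell through to file I/O

    def get(self, recno):
        if recno in self._cache:
            self._cache.move_to_end(recno)  # mark as most recently used
            return self._cache[recno]
        record = self._read_record(recno)   # always the whole record
        self.disk_reads += 1
        self._cache[recno] = record
        if len(self._cache) > self._max_records:
            self._cache.popitem(last=False)  # evict the least recently used
        return record

    def point(self, recno, j, i):
        """Subset after unpacking: one value, but a full record on a miss."""
        return self.get(recno)[j][i]
```

With this layout, a time series at one grid point touches one record per
time step, so a cold cache means one full record read per step.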

I think in both cases, the code was written this way primarily for
convenience, because it is much easier to sub-set the records after
unpacking them to a standard internal format.

This approach seems to be reasonably efficient for the most common
access patterns, but is obviously sub-optimal for your application.
It would require some fairly significant new code development to do
sub-setting directly on the packed record data, although it might not
be too difficult if it were limited to the case of getting a single
point. Perhaps we could consider this.
Are you using GRIB 1 or 2 data?
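
To make the single-point case concrete, here is a minimal sketch that
assumes GRIB1 simple packing, where each grid value is stored as an
unsigned integer X of fixed bit width and decoded as
Y = (R + X * 2^E) / 10^D. The function and parameter names are
hypothetical, and the sketch ignores bitmaps, complex (second-order)
packing, and the compressed GRIB2 packing schemes, for which a value's
position cannot be computed this simply.

```python
def unpack_point(packed, index, nbits, ref, bin_scale, dec_scale):
    """Decode grid point `index` from a simple-packed bit stream.

    packed    -- bytes of the packed data values (headers already skipped)
    nbits     -- bits per packed value
    ref       -- reference value R
    bin_scale -- binary scale factor E
    dec_scale -- decimal scale factor D
    """
    if nbits == 0:
        # Constant field: every point equals the scaled reference value.
        return ref / 10.0 ** dec_scale
    bit_start = index * nbits
    first_byte = bit_start // 8
    last_byte = (bit_start + nbits - 1) // 8
    # Only the few bytes that contain this value are touched.
    chunk = int.from_bytes(packed[first_byte:last_byte + 1], "big")
    # Drop the bits to the right of the value, then mask off those to the left.
    right_pad = (last_byte + 1) * 8 - (bit_start + nbits)
    x = (chunk >> right_pad) & ((1 << nbits) - 1)
    return (ref + x * 2.0 ** bin_scale) / 10.0 ** dec_scale
```

Against a file, the same arithmetic would give the byte range to seek to
and read within the record's data section, instead of slicing an
in-memory buffer.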

Currently the only alternative I can suggest would be to pre-convert
the GRIB data to NetCDF.
This would definitely allow for faster random access of single elements.
I can understand that this might not be feasible, however.
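
The reason a converted file can serve single elements quickly is that
uncompressed values sit at computable byte offsets. The sketch below
shows only that principle with a raw file of big-endian float32 values;
it is not the NetCDF format and does not use PyNIO, and the function
names are hypothetical.

```python
import struct


def write_grid(path, grid):
    """Store a 2-D grid as contiguous big-endian float32 values, row by row."""
    with open(path, "wb") as f:
        for row in grid:
            f.write(struct.pack(">%df" % len(row), *row))


def read_point(path, ncols, j, i):
    """Read one value by seeking to its byte offset; only 4 bytes are read."""
    with open(path, "rb") as f:
        f.seek((j * ncols + i) * 4)
        return struct.unpack(">f", f.read(4))[0]
```

A point time series then costs one small read per time step rather than
one full record per time step.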
-dave

On Feb 25, 2009, at 5:02 AM, Jesper Larsen wrote:

> Hi PyNIO people,
>
> I have written a web application presenting marine weather forecasts
> using PyNIO:
>
> http://www.worldwildweather.com
>
> I am presently using forecasts from NOAA (sorry that credits are not
> yet presented along with the forecasts but on a separate page - I will
> change that before it is out of beta). Last month I upgraded from the
> 1x1 degrees resolution GFS to the 0.5x0.5 degrees resolution setup (4
> times as large). Since I am about to release an upgraded version of
> the application I would also like to begin addressing an issue that
> has haunted me for some time.
>
> One of the features of my application is that you can extract
> forecasts at specific points (look for Point Weather in the plot type
> selector in the left menu) or along trajectories in space and time
> (Route Weather). Since the upgrade I have however experienced problems
> with these features (they simply take too long to perform - so that
> the browser times out unless the GRIB records are already cached in
> memory). My suspicion is that this is caused by disk I/O. Am I correct
> in asserting that PyNIO will read in an entire GRIB record even if I
> only request data from a single point in the record (it seems like
> that from the memory usage)?
>
> This will of course accumulate to a lot of records when I request a
> timeseries at a single point. In fact I will read in ~260,000 times
> more data than I need. Do you have any suggestions for avoiding this
> (other than making sure that the entire array of GRIB records is
> already cached in memory)?
>
> Best regards,
> Jesper
> _______________________________________________
> pyngl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/pyngl-talk

Received on Wed Feb 25 2009 - 16:13:30 MST