Re: PyNIO memory behaviour

From: Jesper Larsen <jl_at_nyahnyahspammersnyahnyah>
Date: Wed, 18 Apr 2007 10:23:07 +0200

On Tuesday 17 April 2007 19:52, David Ian Brown wrote:
> The GRIB reader does cache the data of the most recently accessed
> 150 records for each file that is open. This data is freed when the
> file is closed.
> It is possible that we could make the number of records cached (or the
> total size of the cache in bytes) a user-configurable option if this
> behavior is demonstrably undesirable in situations such as yours.

Hi David,

I was primarily writing because I have had some problems with memory leaks (I
think) in the plotting module I am using (matplotlib, which I was familiar
with before stumbling upon PyNGL;-). It confused me a bit that my simple test
cases for memory leaks did not show any problems but the memory usage in my
full application seemed to grow unexpectedly. Now that I know that data is
cached from the grib files I will have an easier time figuring out what is
going on.

Well, to your question. My ocean grib files typically have about 100,000 grid
points in each vertical layer at each time step (one grib record; the grib
files are bit masked, so in reality only a small fraction of these points is
stored in the files). When I retrieve the variables in Python I can see that
they are floats (4 bytes). All my grib files have more than 150 records that
I need to access. This means that the memory requirement per grib file is
(assuming bit masking is not used in the caching, which I guess it is not):

100,000 points * 4 bytes * 150 records = 60,000,000 bytes = 60 MB

At the moment I have 7 grib files open in the application at the same time.
This means that the total memory requirement for caching is:

60 MB * 7 = 420 MB

This is approximately the amount of memory I would configure PyNIO to cache
if I had the option of setting the cache size myself, so I guess I don't
have a problem with it for the time being (assuming my calculation is
correct).
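
For reference, the estimate above can be reproduced with a few lines of plain
Python. This is only a sketch of the arithmetic: it makes no PyNIO calls, the
function name and parameters are made up for illustration, and it assumes the
cache holds unpacked 4-byte values with no bit masking, and that the limit of
150 records applies per open file as David describes.

```python
def cache_bytes(points_per_record, bytes_per_value, cached_records, open_files=1):
    """Upper bound on cache size in bytes.

    Assumes every cached record holds the full unpacked grid
    (no bit masking) and that the record limit applies per open file.
    """
    return points_per_record * bytes_per_value * cached_records * open_files


# Figures from this mail: 100,000 points, 4-byte floats, 150 cached records.
per_file = cache_bytes(100_000, 4, 150)
total = cache_bytes(100_000, 4, 150, open_files=7)

print(per_file / 1e6, "MB per file")    # 60.0 MB per file
print(total / 1e6, "MB for 7 files")    # 420.0 MB for 7 files
```

If a cached record turned out to cover all vertical levels instead of one,
points_per_record would have to be multiplied by the number of levels.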


PS. I have assumed that by 150 records you mean records in grib version 1
terms, where one record is a single vertical level at a single time step. If
you mean all vertical levels (i.e. in terms of a NioVariable), the memory
usage of the cache will definitely be a problem for me.
