Re: NIO: Problem creating NetCDF files larger 2GB

From: David Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Thu Aug 12 2010 - 11:54:42 MDT

Hi Michael,

When you read a file, PyNIO recognizes that the file was created using
NetCDF's 'LargeFile' (64-bit offset) format, but when you create a
file the default is the 'Classic' format, because it is more
universal. The format of a file cannot be changed once the file has
been created. The punch line is that you need to set the NetCDF
Format option to 'LargeFile' or its synonym '64BitOffset'. This is
documented at http://www.pyngl.ucar.edu/Nio.shtml. There are a couple
of ways to do it, but here is the original way: create an options
object and then pass it to the open_file method:

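import Nio

# Request the 64-bit offset format; 'Classic' is the default on creation.
opt = Nio.options()
opt.Format = "LargeFile"   # synonym: "64BitOffset"
fh = Nio.open_file(outfile, 'w', opt)   # outfile: path used in your script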
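
As a rough way to decide up front whether the Format option is needed,
the sketch below adds up the raw data size of the planned variables and
compares it with the 2 GiB range of a signed 32-bit file offset. The
helper name choose_netcdf_format, the itemsize parameter and the simple
size threshold are assumptions made for illustration and are not part
of PyNIO. The real format constraints are more detailed, so treat the
result as a conservative hint only.

```python
import math

# Largest value of a signed 32-bit offset, i.e. just under 2 GiB.
CLASSIC_LIMIT = 2**31 - 1


def choose_netcdf_format(shapes, itemsize=8):
    """Suggest a value for the PyNIO 'Format' option (hypothetical helper).

    shapes:   iterable of variable shapes, each a tuple of ints.
    itemsize: bytes per element, 8 for double precision (assumption:
              all variables share the same element size).
    The header size is ignored, so this underestimates the file size slightly.
    """
    total_bytes = sum(math.prod(shape) * itemsize for shape in shapes)
    return "LargeFile" if total_bytes > CLASSIC_LIMIT else "Classic"
```

The returned string could then be assigned to opt.Format before the
call to open_file.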

There are also a couple of other options you could set that could
improve the performance of your script dramatically. Setting
'PreFill' to False prevents the NetCDF library from initializing
each data element to a _FillValue prior to assignment.

If you are creating variables and assigning values sequentially, as
your script does, it also helps a lot to reserve some extra space in
the header by setting the option HeaderReserveSpace to a suitable
value. This leaves room in the header for defining new variables and
attributes. Dimensions, variables and attributes are all defined at
the beginning of a NetCDF file, so without this reserve the NetCDF
library has to move all the existing data each time you add a
variable in order to make room for the new header information. This
gets progressively slower as more variables are defined.

Another thing to realize about NetCDF is that, although PyNIO hides
it, the library has a define mode and a read/write data mode, and
there is a cost associated with switching between the two. So you can
further improve performance if you define all your variables first in
one loop and write them in another loop.
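
The points above can be combined in a short sketch. This is not one of
the attached scripts; the function name write_variables, the dimension
name "n", the nio parameter (which only exists so the function can be
exercised without PyNIO installed) and the 64 KB header reserve are
all assumptions chosen for illustration.

```python
def write_variables(path, data, nio=None, header_bytes=65536):
    """Write equal-length 1-D arrays to a NetCDF file (illustrative sketch).

    data:         dict mapping variable name -> sequence of floats.
    nio:          module providing options() and open_file(); defaults to PyNIO.
    header_bytes: value for HeaderReserveSpace (an arbitrary guess here).
    """
    if nio is None:
        import Nio as nio  # requires PyNIO to be installed

    opt = nio.options()
    opt.Format = "LargeFile"               # allow files larger than 2 GB
    opt.PreFill = False                    # skip _FillValue initialization
    opt.HeaderReserveSpace = header_bytes  # room for later header growth

    fh = nio.open_file(path, "w", opt)
    npoints = len(next(iter(data.values())))
    fh.create_dimension("n", npoints)

    # First loop: define every variable while still in define mode.
    for name in data:
        fh.create_variable(name, "d", ("n",))

    # Second loop: write the values, so the mode is switched only once.
    for name, values in data.items():
        fh.variables[name][:] = values

    fh.close()
```

A suitable HeaderReserveSpace depends on how many variables and
attributes are added later, so the value used here should be adjusted
to the actual file layout.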

I am attaching two revised versions of your script that illustrate
these concepts.
  -dave

On Aug 12, 2010, at 2:59 AM, Michael Decker wrote:

> Dear all,
>
> Yesterday I noticed that I was not able to create NetCDF files
> larger than ~2GB using PyNIO. Strangely, it is no problem to _read_
> from files > 2GB (largest I tested was around 3GB so far). So I
> guess it is not a general problem about a LARGEFILE option not being
> set if reading works and only writing does not.
>
> When the file size reaches the 2.xGB limit, nio just kills the whole
> python process with the message "ncendef: ncid 65536: NetCDF: One or
> more variable sizes violate format constraints". The ncid reported
> varies depending on the layout of the file. The value given here is
> the one I can reproduce with my test script.
>
> I originally discovered the problem using PyNIO 1.3.0b4 yesterday.
> So I upgraded to b5 but it changed nothing as far as I can tell.
>
> Am I missing something? Is it only failing for me, or is there a
> general problem? Any hints are welcome.
>
> my software versions are as follows:
> Linux 2.6.30-2-amd64 #1 SMP Fri Sep 25 22:16:56 UTC 2009 x86_64
> GNU/Linux (Debian stable)
>
> Python 2.6.5 (r265:79063, Mar 20 2010, 03:56:44)
>
> NIO 1.3.0b5 installed from
> PyNIO-1.3.0b5.linux-x86_64-py264-gcc432.tar.gz
>
> numpy 1.3.0
>
> My test script is attached.
>
> Thanks for your help,
> Michael
>
> --
> Michael Decker
> Forschungszentrum Jülich
> ICG-2: Troposphäre
>
> E-Mail: m.decker@fz-juelich.de
> <nio_sizetest.py>
> _______________________________________________
> pyngl-talk mailing list
> List instructions, subscriber options, unsubscribe:
> http://mailman.ucar.edu/mailman/listinfo/pyngl-talk

Received on Thu Aug 12 11:54:46 2010
