Re: NIO: Problem creating NetCDF files larger 2GB

From: Michael Decker <m.decker_at_nyahnyahspammersnyahnyah>
Date: Mon Aug 16 2010 - 02:52:37 MDT

Hi David,

Thanks very much for your hints. The LargeFile option does its job, and
the other two sound like good advice as well. The header tweak in
particular greatly improves performance.
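
As I understand the explanation, the gain comes from not having to shift
already-written data every time the header grows. A toy cost model shows
the effect. This is plain Python, not what the NetCDF library actually
does internally, and the function name and sizes are made up for
illustration:

```python
def bytes_moved(var_sizes, header_entry_size, reserved_header):
    """Count data bytes relocated while variables are defined and
    written one after another (toy model, not the NetCDF algorithm)."""
    header_used = 0                  # header bytes needed so far
    header_capacity = reserved_header
    data_bytes = 0                   # data already written to the file
    moved = 0
    for size in var_sizes:
        header_used += header_entry_size
        if header_used > header_capacity:
            # Header has to grow: everything written so far is shifted.
            moved += data_bytes
            header_capacity = header_used
        data_bytes += size
    return moved


print(bytes_moved([100] * 4, 10, 0))    # 600: data is shifted on each new variable
print(bytes_moved([100] * 4, 10, 40))   # 0: the reserved space absorbs all entries
```

With no reserved space the bytes moved grow roughly quadratically with
the number of equally sized variables; with enough reserved space
nothing is moved.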

Just for the record: the options are not on
http://www.pyngl.ucar.edu/Nio.shtml but on
http://www.pyngl.ucar.edu/NioFormats.shtml which I overlooked while
searching.
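
For anyone who finds this thread in the archive, here is a minimal
sketch that sets all three options at once. The option names are the
ones David mentioned; the helper name make_write_options, the 1 MiB
default and the file name are my own placeholders, and I am assuming
HeaderReserveSpace is given in bytes. The Nio module is passed in as an
argument only so the helper can be tried without PyNIO installed.

```python
def make_write_options(nio, header_reserve_bytes=1024 * 1024):
    """Build a PyNIO options object for writing a large NetCDF file.

    nio: the imported Nio module (hypothetical helper argument).
    header_reserve_bytes: placeholder value, assumed to be in bytes.
    """
    opt = nio.options()
    opt.Format = "LargeFile"    # 64-bit offset format; synonym "64BitOffset"
    opt.PreFill = False         # do not pre-fill variables with _FillValue
    opt.HeaderReserveSpace = header_reserve_bytes  # spare room in the header
    return opt


# Usage with PyNIO installed ("out.nc" is a placeholder file name):
#   import Nio
#   fh = Nio.open_file("out.nc", "w", make_write_options(Nio))
```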

Thanks for your help again,
Michael

On 08/12/10 19:54, David Brown wrote:
> Hi Michael,
>
> When you read a file, PyNIO knows that the file was created using
> NetCDF's 'LargeFile' (64 bit offset) format, but when you create a file
> the default is to create a 'Classic' format, because it is more
> universal. There is no way to dynamically change the format of a file
> once it is created. Anyway the punch line is that you need to set the
> NetCDF Format option to 'LargeFile' or its synonym '64BitOffset'. This
> is documented at http://www.pyngl.ucar.edu/Nio.shtml, and there are a
> couple of ways to do it, but here is the original way: create an option
> class variable and then pass it to the open_file method:
>
> import Nio
>
> opt = Nio.options()
> opt.Format = "LargeFile"  # or its synonym "64BitOffset"
> fh = Nio.open_file(outfile, 'w', opt)  # outfile: path of the file to create
>
> Actually there are a couple of other options you could set that could
> help improve the performance of your script dramatically. Setting
> 'PreFill' to False will prevent the NetCDF library from initializing
> each data element to a _FillValue prior to assignment. And if you are
> creating variables and assigning values sequentially, as your script
> does, it helps a lot if you reserve some extra space in the header by
> setting the option HeaderReserveSpace to a suitable value. This gives
> some extra room in the header for defining new variables and attributes.
> Otherwise, because dimensions, variables and attributes are all defined
> at the beginning of a NetCDF file, the NetCDF library has to move all
> the existing data each time you add a variable, to make room for the new
> header information. This gets progressively slower as more variables
> are defined. Another thing to realize about NetCDF is that, although
> PyNIO hides it, it has a define mode and a read/write data mode. There
> is a cost associated with switching between modes. So you can further
> improve performance if you define all your variables first in one loop
> and write them in another loop.
>
> I am attaching two revised versions of your script that illustrate these
> concepts.
> -dave
>
> On Aug 12, 2010, at 2:59 AM, Michael Decker wrote:
>
>> Dear all,
>>
>> Yesterday I noticed that I was not able to create NetCDF files larger
>> than ~2GB using PyNIO. Strangely, it is no problem to _read_ from
>> files > 2GB (largest I tested was around 3GB so far). So I guess it is
>> not a general problem about a LARGEFILE option not being set if
>> reading works and only writing does not.
>>
>> When the file size reaches the 2.xGB limit, nio just kills the whole
>> python process with the message "ncendef: ncid 65536: NetCDF: One or
>> more variable sizes violate format constraints". The ncid reported
>> varies depending on the layout of the file. The value given here is
>> the one I can reproduce with my test script.
>>
>> I originally discovered the problem using PyNIO 1.3.0b4 yesterday. So
>> I upgraded to b5 but it changed nothing as far as I can tell.
>>
>> Am I missing something? Is it failing only for me, or is there a
>> general problem? Any hints are welcome.
>>
>> My software versions are as follows:
>> Linux 2.6.30-2-amd64 #1 SMP Fri Sep 25 22:16:56 UTC 2009 x86_64
>> GNU/Linux (Debian stable)
>>
>> Python 2.6.5 (r265:79063, Mar 20 2010, 03:56:44)
>>
>> NIO 1.3.0b5 installed from PyNIO-1.3.0b5.linux-x86_64-py264-gcc432.tar.gz
>>
>> numpy 1.3.0
>>
>> My test script is attached.
>>
>> Thanks for your help,
>> Michael
>>
>> --
>> Michael Decker
>> Forschungszentrum Jülich
>> ICG-2: Troposphäre
>>
>> E-Mail: m.decker@fz-juelich.de
>> <nio_sizetest.py>
>> _______________________________________________
>> pyngl-talk mailing list
>> List instructions, subscriber options, unsubscribe:
>> http://mailman.ucar.edu/mailman/listinfo/pyngl-talk
>

-- 
Michael Decker
Forschungszentrum Jülich
ICG-2: Troposphäre
E-Mail: m.decker@fz-juelich.de

Received on Mon Aug 16 02:53:04 2010
