Re: Writing speed depends on indexed dimension

From: David Brown <dbrown_at_nyahnyahspammersnyahnyah>
Date: Tue Oct 12 2010 - 12:19:00 MDT

Hi Sourish,
I apologize for not responding to your message sooner. It slipped off my radar somehow. There are two things you can do, each of which will almost entirely eliminate the long initialization time you are seeing:
1) Set opt.PreFill = False.
This prevents the NetCDF library from writing a fill value to every element of the variable before any data is assigned. In PyNIO the fill happens when you do the first actual data assignment, which is why only your first write is slow. Of course, you should only set this option if you know you will assign something to every data element before trying to use the output file.
2) Make the 'days' dimension into an 'unlimited' or 'record' dimension. You do this by setting the dimension size to 0:

fid.create_dimension('days', 0)

This changes the organization of the NetCDF file so that the slices of all variables for each timestep are written together in a block, rather than each complete variable being written contiguously. For files that may be extended in the future with additional timesteps, this is generally a more efficient organization, although it can have drawbacks if you need to slice through all times with a subset of the other dimensions.
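
As a rough back-of-the-envelope check (a sketch, not a measurement), the arithmetic below estimates how many bytes of fill values have to be written before the first assignment can complete. The shape and the 8-byte 'd' type come from the original question. The function names are made up for illustration, and the assumption that the delay is dominated by the fill pass is mine.

```python
# Shape taken from the original question; 'd' is an 8-byte double.
DIVISIONS, LAT, LON = 25, 180, 360
BYTES_PER_DOUBLE = 8


def block_bytes():
    """Bytes in one day's slice, i.e. one conc_var[i] assignment."""
    return DIVISIONS * LAT * LON * BYTES_PER_DOUBLE


def prefill_bytes(n_days):
    """Bytes of fill values for a fixed-size 'days' dimension of length n_days."""
    return n_days * block_bytes()


if __name__ == "__main__":
    print("one block:       ", block_bytes(), "bytes")
    print("fill, days=2922: ", prefill_bytes(2922), "bytes")
    print("fill, days=2:    ", prefill_bytes(2), "bytes")
```

With 'days' fixed at 2922 the fill pass covers roughly 38 GB, against about 26 MB when 'days' is 2, which is consistent with the difference you measured. With PreFill off that pass is skipped. With an unlimited 'days' dimension the file starts with no records, so (assuming records are only filled as they are added) the first write touches about one block.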
Hope this helps.

On Oct 4, 2010, at 4:55 AM, Sourish Basu wrote:

> Update: it seems that only writing the first block takes a lot of time. So conc_var[0] = random((25,180,360)) takes more than 5 minutes, but conc_var[1] = random((25,180,360)) right after that takes less than a second. So it's not as bad as I thought it was, but I'm still confused as to why.
> -Sourish
> On 10/04/2010 12:04 PM, Sourish Basu wrote:
>> Hi All,
>> I'm trying to write a very large array to a variable var[2922,25,180,360] in a netcdf file with the option LargeFile set. The file size per se is not a problem. I'm computing and writing it in chunks of 25x180x360. The problem is that the write speed depends on the length of the first index. For example, the following
>> opt = options()
>> opt.Format = 'LargeFile'
>> fid = open_file('', 'w', options=opt)
>> fid.create_dimension('days',2922)
>> fid.create_dimension('pres_divisions',25)
>> fid.create_dimension('pres_levels',26)
>> fid.create_dimension('latitude',180)
>> fid.create_dimension('longitude',360)
>> conc_var = fid.create_variable('mixing_ratio', 'd', ('days', 'pres_divisions', 'latitude', 'longitude'))
>> conc_var[0] = random((25,180,360))
>> takes a very long time, whereas if I just change the definition of 'days' to
>> fid.create_dimension('days',2)
>> then the writing of conc_var[0] takes less than a second. I want to write the blocks sequentially, i.e., conc_var[0], then conc_var[1], then conc_var[2] and so on. Is there any way to speed up the writing (in the case where 'days' is 2922) by using that fact?
>> Cheers,
>> -Sourish
>> _______________________________________________
>> pyngl-talk mailing list
>> List instructions, subscriber options, unsubscribe:

Received on Tue Oct 12 12:19:04 2010

This archive was generated by hypermail 2.1.8 : Mon Nov 15 2010 - 09:11:30 MST