Data Formats Supported by the PyNIO module

This page provides more detailed information about each of the PyNIO supported data formats. The common interface to all the file formats supported by PyNIO is discussed in the PyNIO reference document and will (mostly) not be repeated here. This document focuses on the individual features of each format, their differences, and what PyNIO does to translate them in a more or less uniform manner. The most detailed discussion involves the GRIB format, because of all the supported formats, it has required the most work to recast into a NetCDF-like model.

The data formats currently supported by PyNIO are:

NetCDF - Network Common Data Format

Online documentation for NetCDF is available at http://www.unidata.ucar.edu/packages/netcdf/index.html.

PyNIO offers read and write access to existing NetCDF files as well as the ability to create NetCDF files from scratch. Almost all features of NetCDF's classic model (versions 3 and earlier) are supported by PyNIO because it was created using the NetCDF3 data model as a pattern.

As of version 1.3.0b1, PyNIO supports the NetCDF4 classic model format. The classic model constrains the interface to the constructs provided by NetCDF3 and earlier. However, the underlying file format, like that of all NetCDF4 files, is HDF5. Files written in this format can take advantage of the built-in file compression available in HDF5, and limitations on file and variable size are for practical purposes eliminated. Existing files in this format will be recognized automatically and handled transparently. To create a file with this format, set the Format option to the value "NetCDF4Classic". You can turn on compression using the CompressionLevel option. More information about the options is given below. Standard NetCDF4 files are NOT currently supported. If a file contains any NetCDF4-only features, you can expect core dumps if you open it in read-only mode and possible file corruption if you open an existing file in write mode.
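As a minimal sketch of the steps just described, the following helper collects the two options and a second function passes them to PyNIO. The helper names (netcdf4_classic_settings, create_file) are hypothetical and not part of PyNIO; only Nio.options, Nio.open_file, and the option names come from PyNIO itself. The Nio module is imported lazily so that the settings helper can be used without PyNIO installed.

```python
def netcdf4_classic_settings(compression_level=5):
    """Return the option names and values for a compressed NetCDF4 classic file.

    Hypothetical helper: it only checks the documented 0-9 range.
    """
    if not isinstance(compression_level, int) or not 0 <= compression_level <= 9:
        raise ValueError("CompressionLevel must be an integer from 0 through 9")
    return {"Format": "NetCDF4Classic", "CompressionLevel": compression_level}


def create_file(path, settings):
    """Create a new file with PyNIO using the given settings (requires PyNIO)."""
    import Nio  # imported here so the helper above works without PyNIO

    opt = Nio.options()
    for name, value in settings.items():
        setattr(opt, name, value)
    return Nio.open_file(path, "c", opt)  # "c" opens the file in create mode
```

For example, create_file("out.nc", netcdf4_classic_settings(5)) would request a moderately compressed file; the file name is only a placeholder.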

NetCDF file format options

CompressionLevel
Specify the level of data compression as an integer in the range 0 through 9. Increasing values indicate greater compression. Compression is lossless. There is a tradeoff between the time spent compressing the file and the amount of compression achieved. Informal tests show that compression level 9 results in a file only a few percent smaller than a compression level 5 file, but it requires 4 or 5 times as long to create. (This option is ignored unless the Format option is set to 'NetCDF4Classic'.)
Format
This option has an effect only for files opened in "create" mode. It currently has four valid values, two of which are synonyms:
'Classic' (default)
Create a standard NetCDF file. Standard NetCDF files are more limited with respect to file size. Assuming the underlying file system has support for large files, the total size can exceed 2 GB, but there are severe restrictions regarding the number of large variables and the order in which they are written. In general, because it is more universal, the NetCDF3 classic format is recommended if the total file size will be less than 2 GB and file compression is not required.
'LargeFile' or '64BitOffset'
Create a NetCDF file with support for larger variables and a theoretically much larger total size (about 9.22e+18 bytes). Each fixed-size variable, or each 'record' (element of the first dimension) of a variable with an unlimited dimension can have a size of up to 4 GB. PyNIO automatically reads NetCDF files in either the classic or the 64-bit offset format, assuming the underlying file system has support for large files. For more detailed information about large file support in NetCDF see http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Large-File-Support.html.
'NetCDF4Classic'
Create a NetCDF4 classic model file. The classic model constrains the interface to the constructs provided by NetCDF3 and earlier. However, the underlying file format, like that of all NetCDF4 files, is HDF5. Files written in this format can take advantage of the built-in file compression available in HDF5. Use the CompressionLevel option to enable compression. Also the HDF5 format removes virtually all restrictions on file and individual variable size.

Beginning with version 1.3.0b1, PyNIO is built with release versions of NetCDF4 and HDF5 and fully supports the "NetCDF4Classic" option. PyNIO version 1.2.0 provided only beta-level support for this format, using beta versions of NetCDF4 and HDF5; that version should probably not be used for mission-critical file creation.

HeaderReserveSpace
This option has an effect only for files opened for writing. This option reserves extra space within the header of a NetCDF file. Its value is an integer that specifies the number of bytes to reserve in addition to the bytes used for the currently defined dimensions, variables, and attributes. This option can improve performance when it is likely that new dimensions, variables, or attributes will be added to an already large file.

MissingToFillValue
If set to its default value, True, this option causes a "virtual" _FillValue attribute to be created for any variable that has the attribute missing_value but not _FillValue. The purpose is to more gracefully handle files that use the COARDS-compliant missing_value instead of _FillValue to indicate missing data. Note that if a variable in a file has both a missing_value and a _FillValue, or if it has neither, the option does nothing. The virtual _FillValue attribute is not actually part of the NetCDF file, but only appears to be from within PyNIO. However, if the file is opened for writing and you assign to the attribute, it becomes an actual attribute.

PreFill
This option has an effect only when a file is opened for writing. It is logical-valued with a default value of True. If set False, PyNIO alters the standard behavior of the NetCDF library such that variable element locations in the file are not "pre-filled" with the missing (fill) value associated with the variable. This can noticeably improve performance when writing large datasets. However, if you set this option False, you are responsible for ensuring that all the elements of the variables you have defined are assigned a valid value.

SafeMode
This logical-valued option may be set for any NetCDF file. Its default value is False, meaning that PyNIO only closes a NetCDF file when the close method is invoked. If set to True, PyNIO closes the file after each operation it performs, including defining a dimension or variable, adding or modifying an attribute, or reading or writing data from any variable. This helps ensure the file's integrity for writable files if the close method does not get called for some reason. However, it may result in loss of performance, particularly when adding new variables, dimensions, or attributes to files that already have large variables defined. This is because each time a new element is defined, all existing data in the file must be moved to make room for the metadata of the new element in the header. One way to mitigate the performance loss is to use the HeaderReserveSpace option when first creating the file to make room in the header for subsequently defined NetCDF elements.
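The MissingToFillValue rule described above can be modelled in a few lines. This is only an illustration of the documented behavior, not PyNIO's implementation: the function name visible_attributes is hypothetical, and it assumes that the virtual _FillValue takes the same value as missing_value.

```python
def visible_attributes(file_attrs, missing_to_fill_value=True):
    """Return a variable's attributes as they would appear from within PyNIO.

    file_attrs is a dict of the attributes actually stored in the file.
    """
    attrs = dict(file_attrs)  # copy, so the stored attributes stay untouched
    if (missing_to_fill_value
            and "missing_value" in attrs
            and "_FillValue" not in attrs):
        # Virtual attribute: visible to the caller, not written to the file.
        # Assumption: it mirrors the value of missing_value.
        attrs["_FillValue"] = attrs["missing_value"]
    return attrs
```

A variable with both attributes, or with neither, passes through unchanged, which matches the cases in which the option is documented to do nothing.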

Data model differences

While PyNIO has support for string data type, NetCDF3 does not. However, PyNIO maps NetCDF attributes of type character into Python strings for convenience. Likewise, you can set the value of a NetCDF character attribute using a Python string.

HDF - Hierarchical Data Format - (version 4) - Scientific Data Sets (SDS) only

Online documentation for HDF is available at http://www.hdfgroup.org/products/hdf4/.

PyNIO's HDF interface understands a subset of the content available in HDF4-formatted files. PyNIO can read and write data that uses the SDS (Scientific Data Set) interface. Opening an HDF file is analogous to opening a NetCDF file as documented above, with some minor exceptions.

The first thing you will probably notice when you attempt to print the contents of an HDF file object is that HDF files often begin with voluminous multiline global attributes. Some of the information contained in these attributes may be useful, but much of it is of little concern to persons only interested in getting at the data.

The HDF format allows variable, attribute, and dimension names with spaces and other non-alphanumeric characters, but for reasons of compatibility with other software, PyNIO's underlying NIO library replaces spaces and non-alphanumeric characters with the underscore '_' character. To compensate for this possible loss of information, PyNIO provides an attribute for each variable, called hdf_name, that contains the name exactly as it appears in the HDF file. This attribute is redundant in cases where the actual HDF name contains only alphanumeric or underscore characters.

PyNIO has a read-only ability to understand HDF Vgroups. When a variable that is part of a Vgroup is encountered, PyNIO appends a double underscore and the group number to the end of the variable name. This ensures that the variable will have a unique name, relative to variables that belong to other Vgroups. It also provides two additional attributes to the variable: hdf_group, whose value is the HDF string name of the group, and hdf_group_id, whose value is the same as the group number appended to the end of the variable name.
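The naming rules for HDF variables described above can be sketched as follows. The function names are hypothetical, and the regular expression is an assumption about what counts as "non-alphanumeric"; the NIO library's exact character rules may differ.

```python
import re


def nio_name(hdf_name, group_number=None):
    """Approximate the name PyNIO presents for an HDF variable."""
    # Replace spaces and other non-alphanumeric characters with '_'.
    name = re.sub(r"[^0-9A-Za-z_]", "_", hdf_name)
    if group_number is not None:
        # Variables in a Vgroup get a double underscore plus the group number.
        name = f"{name}__{group_number}"
    return name


def nio_variable(hdf_name, group_number=None, group_name=None):
    """Return (name, attributes) with the attributes described in the text."""
    attrs = {"hdf_name": hdf_name}  # original name, exactly as in the file
    if group_number is not None:
        attrs["hdf_group"] = group_name
        attrs["hdf_group_id"] = group_number
    return nio_name(hdf_name, group_number), attrs
```

Under these assumptions, a variable stored as "Sea Surface Temp" in Vgroup 3 would be presented as Sea_Surface_Temp__3, with hdf_name preserving the original spelling.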

There is currently no way to access 8-bit and 24-bit HDF images from PyNIO. There is also no access to the HDF VDATA interface.

HDF-EOS - Hierarchical Data Format Earth Observing System (version 2) - GRID and SWATH only

Online documentation for HDF-EOS is available at http://hdfeos.org/software/library.php.

PyNIO provides read-only access for SWATH and GRID data groups in HDF-EOS files. POINT data groups are currently ignored. As with all PyNIO's supported formats, HDF-EOS files are read into file variables that use NetCDF-like conventions. Note that since HDF-EOS2 files are a type of HDF4 file, it is possible, by setting the format optional argument of the open_file method to 'hdf', to use the HDF4 interface to open HDF-EOS2 files. This view of the file sometimes gives useful information not obtainable through the HDF-EOS interface. On the other hand, because HDF-EOS files often use the more generic '.hdf' suffix in their names, it is easy to mistakenly use the HDF4 interface when the HDF-EOS interface would lead more directly to the relevant information and data. In this case you should set format to an HDF-EOS-specific suffix to read the data using PyNIO's interface to the HDF-EOS library.
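A short sketch of choosing between the two views of the same file follows. The 'hdf' value comes from the text above; 'he2' is assumed here to be one of the HDF-EOS-specific suffixes that PyNIO accepts, so check it against your installation. The names HDF_VIEWS and open_hdf_view are hypothetical.

```python
# Map a descriptive view name to the value passed as the format argument.
# 'hdf' is documented above; 'he2' is an assumed HDF-EOS2 suffix.
HDF_VIEWS = {"hdf4": "hdf", "hdfeos2": "he2"}


def open_hdf_view(path, view="hdfeos2"):
    """Open path read-only through the chosen interface (requires PyNIO)."""
    import Nio  # imported lazily so the mapping can be inspected without PyNIO

    return Nio.open_file(path, "r", format=HDF_VIEWS[view])
```

Opening the same '.hdf' file once with each view is a simple way to compare what the two interfaces expose.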

Non-alphanumeric characters in HDF-EOS variable names are replaced with the '_' character when listed from PyNIO, and the variables are referenced from PyNIO using these modified names. HDF-EOS files use groups to specify the specific SWATH or GRID that a variable belongs to. In the HDF-EOS2 interface PyNIO appends the SWATH or GRID name, preceded by an underscore, to all variable names that belong to the group in order to ensure that each variable name is unique within the namespace of the NioFile instance variable. As of version 1.2.0, PyNIO's HDF-EOS interface supplies an attribute called hdfeos_name that contains the actual variable name as present in the file.

Also as of version 1.2.0 or later, PyNIO provides access to the Geolocation variables associated with SWATH data groups. These variables have 1 or 2 dimensions and serve to locate the data variables in time and space. PyNIO also provides supplementary coordinate variables for GRID data that are calculated on the fly using the GCTP (General Cartographic Transformation Package) library that is a required component of the HDF-EOS interface. These supplementary variables can be distinguished from true HDF-EOS variables by their lack of the hdfeos_name attribute. Since only a few of the possible projected GRID types were available for testing, the coordinate values contained in these supplementary variables are not yet considered to be fully reliable. Users are encouraged to report cases where the coordinate values do not seem to be correct.

HDF-EOS5 - HDF5 - Earth Observing System - GRID, SWATH and ZA; limited POINT support for some metadata only

Online documentation for HDF-EOS5 is available at http://hdfeos.org/software/library.php.

As of version 1.4.0 or later PyNIO provides read-only access for SWATH, GRID, and ZA data groups in HDF-EOS5 files. POINT data groups are currently limited to accessing the headers only. As with all PyNIO's supported formats, HDF-EOS5 files are read into file variables that use NetCDF-like conventions.

Non-alphanumeric characters in HDF-EOS5 variable names are replaced with the '_' character when listed from PyNIO, and the variables are referenced from PyNIO using these modified names. HDF-EOS5 files use groups to specify the specific SWATH or GRID that a variable belongs to. In the HDF-EOS5 interface PyNIO appends the SWATH or GRID name, preceded by an underscore, to all variable names that belong to the group in order to ensure that each variable name is unique within the namespace of the NioFile instance variable. PyNIO's HDF-EOS5 interface supplies an attribute called hdfeos_name that contains the actual variable name as present in the file.

CCM - Community Climate Model History Tape Format

The CCM format, originally in CRAY COS blocked form, is written by the NCAR CCM1, CCM2, and CCM3 global climate models. It is also possible to have IEEE CCM files. Currently, PyNIO does not support IEEE CCM files due to lack of documentation. It is possible to use the public domain tool called "ccm2nc" (available on almost all SCD computers; "man ccm2nc") to convert these files to NetCDF. PyNIO can then reference the NetCDF file(s). If you are not on an SCD machine, the "ccm2nc" software can be downloaded from http://ftp.cgd.ucar.edu/cms/ccm3/tools/

CCM files are fairly straightforward (no special naming convention is needed as with GRIB files); the variable names and unit information are stored as character data in the CCM files. When a CCM file is opened, PyNIO scans the file and creates an index of all the data in the file. This can be expensive for large files, but it facilitates quickly accessing individual variables of the file. Because of this cost, you should avoid repeatedly calling open_file on the same file whenever possible.
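One way to avoid reopening (and therefore rescanning) the same file is to keep a small cache of open handles. The sketch below is not part of PyNIO; get_file and its opener argument are hypothetical, and the opener would normally be a function that wraps Nio.open_file.

```python
_open_files = {}  # path -> already opened file object


def get_file(path, opener):
    """Return a cached handle for path, calling opener(path) only on first use."""
    if path not in _open_files:
        _open_files[path] = opener(path)
    return _open_files[path]


def close_all():
    """Close every cached handle that has a close method and empty the cache."""
    for handle in _open_files.values():
        close = getattr(handle, "close", None)
        if callable(close):
            close()
    _open_files.clear()
```

Passing the opener in explicitly keeps the cache independent of PyNIO, so the same pattern works for any file type whose open step is expensive.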

For more information on the CCM model and CCM file format, see the CCM3 User's Guide.

GRIB - Gridded Binary (version 1) or General Regularly-distributed Information in Binary Form (version 2)

(GRIB2 support available in version 1.2.0 or later.)

Online documentation for GRIB is available at:

GRIB1
http://www.nco.ncep.noaa.gov/pmb/docs/on388/
GRIB2
http://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc.shtml.

PyNIO provides read-only support for data in GRIB1 and GRIB2 formats. To open a file, you only need to know that the file is GRIB. PyNIO figures out which version of GRIB is in the file and processes the file accordingly. PyNIO's support for GRIB is an evolving process. As an ever more diverse set of GRIB files have been encountered, PyNIO has been improved to handle many more features of the GRIB format. However, since GRIB has many features that are obscure enough that they have never been encountered in practice by the NIO library developers, there are still some aspects of GRIB that PyNIO does not handle properly. Generally, the NIO library developers try to support features that appear in GRIB files that users are actually using. The best way to help improve the GRIB-decoding capabilities of PyNIO is to call attention to files that are important to your work but that PyNIO does not seem to interpret correctly. If you have problems reading a particular set of GRIB files, please contact Mary Haley

At some point PyNIO may provide the ability to write GRIB, but, for now, the NIO library developers consider their mission to be providing read access to as many types of GRIB files as possible.

GRIB is a compact file format composed of a series of independent records each generally containing a grid of data points covering some geographical extent. Each record contains in a coded form the information required to derive the location of the data points in space and time, the type of quantity the data represents, and the origin of the data, including who produced it and the generating model. But in order to decode the data, external information is required. This much is true both for GRIB1 and GRIB2. But while decoders for GRIB1 or for GRIB2 both rely on information from external tables and many of the individual pieces of information in the tables are the same, the organization of the tables is quite different. As the newer format, GRIB2 has the capability to allow for much more diversity of data without the need for the local extensions that have evolved over the years out of necessity in GRIB1. Also in spite of the evidence of some initial divergence there is hope at least that the GRIB 2 tables will become standardized enough that one set of tables can be used to decode GRIB files from anywhere in the world.

Before discussing the specifics of PyNIO's treatment of each format, it is worth looking at the common features. These arise partly from basic similarities in the formats, but also significantly from PyNIO's unifying data model.

Dimensions in GRIB

The fundamental thing that PyNIO does for both GRIB1 and GRIB2 is to assemble related records into NetCDF-like variables with named dimensions and coordinate variables. The right-most dimensions (usually two) are specified by the grid associated with each individual record. The naming conventions for the dimensions and coordinate variables vary somewhat between GRIB versions and are discussed in the version-specific details below. Dimensions to the left of the grid dimensions result from sorting the related records into a coherent order. These dimensions, in order from left to right (slowest to fastest varying) are:
  • ensemble or probability
  • initial_time
  • forecast_time
  • level

Of course, variables in GRIB files seldom have all possible dimensions present. If the variable does not belong to an ensemble or probability sequence forecast, the first of these dimensions will be absent. If all the records have the same initial time then the initial time is by default presented as an attribute of the variable. This is likewise true for forecast time and level. Note, however, that as of version 1.2.0, it is possible to tell PyNIO to convert any or all of these attributes into single element dimensions using the NioOption SingleElementDimensions. This option is intended to simplify the concatenation of conforming variables from multiple files.

Generally these dimensions have associated one-dimensional coordinate variables. Following standard NetCDF conventions the coordinate variables have the same name as the dimensions. These variables have long names, types, and units that conform as much as possible to standard CF conventions. However, there are details peculiar to each of these dimensions that need discussion:

ensemble
The ensemble dimension uses an index variable as its coordinate variable. This variable simply serves to give a numeric ordering to the ensemble members. Otherwise, there is not always an easily-defined order to these elements. An auxiliary variable, implemented as an array of strings, gives information about each ensemble member.
probability
The probability dimension has as its coordinate variable an ascending series of threshold values of the quantity whose probability is measured. The variable itself contains percentage values indicating the likelihood of the quantity at each coordinate value. Currently, PyNIO allows for only one of the ensemble or probability dimensions for any single variable. If this becomes a problem, it may change in future releases.
initial_time
The initial_time coordinate is expressed in a CF-compliant form as the number of hours since Jan 1, 1800. However, the NioOption InitialTimeCoordinateType allows you to change the type of this coordinate to human-readable strings representing the date and time. But regardless of the type of the initial_time coordinate variable, PyNIO always supplies three auxiliary variables with three different representations of this information:
  • number of hours since Jan 1, 1800 (double type)
  • string array representation of date and time
  • an encoded representation of the date of type double, with units yyyymmddhh.hh_frac indicating the position and number of digits for the year, month, day, hour, and fractional part of the hour.
forecast_time
The forecast_time dimension is an integer offset from the initial_time. Its units are usually, but not always, hours.
levels
If GRIB records for a quantity exist at multiple vertical levels, PyNIO creates a level dimension for the variable. Depending on the level type, GRIB may specify a single level value, or it may specify the lower and upper bounds of the level. If the quantity is defined using a single value for each level, these values are incorporated into a normal coordinate variable. However, if the quantity is defined using the lower and upper bounds of the level, a standard coordinate variable (a one-dimensional variable with the same name as the dimension) cannot be used. Instead two auxiliary variables are supplied that are named by adding the suffixes "_l0" for the lower bounding level and "_l1" for the upper bounding level to the name of the dimension. PyNIO uses another scheme to represent GRIB hybrid level types. However, since hybrid levels are currently supported only for GRIB1, the particulars of this scheme are described below in the GRIB1 specific section.
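The three representations of the initial time listed above can be related to one another with a few lines of standard-library Python. This is a sketch under stated assumptions: the function names are hypothetical, and the encoded form is interpreted literally from the documented units (yyyymmddhh followed by the fraction of the hour), which may differ in detail from what PyNIO stores.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1800, 1, 1)  # reference time of the numeric coordinate


def hours_since_1800(when):
    """Convert a datetime to hours since 1800-01-01 00:00."""
    return (when - EPOCH).total_seconds() / 3600.0


def from_hours_since_1800(hours):
    """Convert hours since 1800-01-01 00:00 back to a datetime."""
    return EPOCH + timedelta(hours=hours)


def encoded_date(when):
    """Encode a datetime as a double in the form yyyymmddhh.hh_frac."""
    hour_fraction = (when.minute * 60 + when.second) / 3600.0
    return (when.year * 1e6 + when.month * 1e4 + when.day * 1e2
            + when.hour + hour_fraction)
```

With these definitions, 06:30 on 15 March 2005 encodes as 2005031506.5, and from_hours_since_1800 inverts hours_since_1800.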

GRIB Grids

PyNIO supports most of the same grid types in both versions of GRIB. Grids that are fully supported, with one exception, supply one or two dimensional coordinate variables that can be used to locate each point of the grid in latitude and longitude. For other grids, PyNIO still makes the data available as long as it can figure out the dimension sizes of the data. However, these grids provide no coordinate variables and PyNIO issues a warning noting that the support for the grid type is incomplete.

Grids with one-dimensional coordinates follow the standard NetCDF convention where the name of the dimension and the name of the coordinate variable are the same. These grids include:

Latitude/Longitude and Gaussian Latitude/Longitude grids may be encoded in a "thinned" ("quasi-regular" in the GRIB documentation) format. These grids are progressively thinned, generally along the longitudinal dimension, as the latitude coordinates approach the poles, thus maintaining a more or less uniform distance between grid points as the circles around each latitude become shorter. PyNIO automatically interpolates such grids to standard rectangular grids, building normal 1D coordinate variables for them in the process. This process is transparent to the user except that the coordinate attributes specify that the grid is "quasi-regular". By default (as of version 1.2.0), a cubic interpolation algorithm is used because it has been determined to be the most accurate. However, the NioOption ThinnedGridInterpolation can be set to specify a linear interpolation instead. The linear interpolation is forced in cases where a bit-mask is used to omit some of the grid points.
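To make the idea of expanding a thinned row concrete, here is a minimal sketch of linear interpolation for a single latitude row. It is not PyNIO's algorithm: the function name expand_row is hypothetical, and the sketch assumes that each row covers a full circle of longitude, is periodic, and starts at the same longitude as the output row.

```python
def expand_row(values, n_out):
    """Linearly interpolate one thinned latitude row onto n_out points.

    values holds the row's points, equally spaced around a full circle
    of longitude; the result has n_out equally spaced points.
    """
    n_in = len(values)
    out = []
    for j in range(n_out):
        x = j * n_in / n_out       # fractional index into the thinned row
        i0 = int(x) % n_in         # source point at or before the target
        i1 = (i0 + 1) % n_in       # next source point, wrapping around
        w = x - int(x)             # weight given to the next point
        out.append((1.0 - w) * values[i0] + w * values[i1])
    return out
```

Applying this to every row with the same n_out yields a rectangular grid. A cubic scheme would use four neighbouring points per output value instead of two.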

Variables with grids that require two-dimensional coordinates use the CF-conforming attribute coordinates to list the names of the two associated two-dimensional coordinate variables. These two variables give the latitude and longitude respectively of each point in the grid. These variables also provide attributes that give sufficient information to generate a "native" map projection that exactly conforms to the projection parameters of the data. In addition, an auxiliary rotation variable gives the rotation angle that needs to be applied to vector data at each grid point to convert between grid-based angles and earth-based angles. Since in most cases the vector data angles are supplied relative to the grid, you would generally need to perform this operation in order to visualize vector data in projections other than the native projection. The formulas for converting the rotation angle are given in attributes of the rotation variable. The supported grids of this type are:

PyNIO also supports grids of Spherical Harmonic Coefficients. These are unique in that PyNIO separates the real and imaginary components into a third grid dimension called "real_imaginary". As supplied, the x and y axis do not have geographic meaning. In order to convert such a grid to a Latitude/Longitude or Gaussian grid it is necessary to process the data through an appropriate function.

Note: the GRIB1 reader also supports the following three specialized versions of Rotated Latitude/Longitude Grids:

Information specific to these grids is covered in the "GRIB1 Support Details" section below

GRIB file format options

PyNIO provides the following options for GRIB files:
DefaultNCEPPtable (ignored unless file is in GRIB 1 format)
This option has two valid values: 'Operational', the default, or 'Reanalysis'. It specifies whether to default to the use of the NCEP operational parameter table (http://www.ncl.ucar.edu/Document/Manuals/Ref_Manual/ncep_opn.htm) or the NCEP reanalysis parameter table (http://www.ncl.ucar.edu/Document/Manuals/Ref_Manual/ncep_reanal.htm). The option only applies in cases where PyNIO, on its own, cannot definitively determine which of these tables to use because of historical ambiguities in NCEP usage.

InitialTimeCoordinateType
This string-valued option has two valid values: 'Numeric', the default, or 'String'. Note that in PyNIO's representation of a GRIB file the initial time dimension is distinguished from the forecast time dimension, whose coordinate values are numerical offsets from a particular initial time. The default value results in initial time coordinates that are COARDS and CF compliant, with the time represented in units of hours since 1800-01-01. Setting the option to 'String' results in human-readable time coordinates, but with the disadvantage that they are not compliant with standard conventions and are likely not to be understood by many processing and visualization software packages. Note that in either case both the string and numerical coordinates are available as variables -- the only difference is which is considered to be the coordinate variable.

SingleElementDimensions (available in version 1.2.0 or later)
This option allows the user to specify that variables with only a single initial time, forecast time, level, ensemble or probability value, usually handled as attributes, be treated as containing single element dimensions. It is a string-valued option whose default value 'None' means that no single-element dimensions will appear in PyNIO's representation of the GRIB file. Conversely, if the option is given the value 'All', then all possible dimensions will be created for each variable. Otherwise, the desired single element dimensions may be specified individually. The valid choices are 'Initial_time', 'Forecast_time', 'Level', 'Ensemble', and 'Probability'.

Note that dimensions are not created if the variable does not have an actual value associated with the dimension type, regardless of the value given to this option. For example, variables that are not part of an ensemble forecast will never have an ensemble dimension, and variables whose level type (e.g. Tropopause) does not have a numerical value will never have a level dimension. In the case of level types, it may depend on who wrote the record: files written by some centers may give no value for certain level types where others may use a numerical value such as 0. The intent of this option is to make it easier to concatenate conforming variables from multiple files.

ThinnedGridInterpolation
This string-valued option has two valid values: 'Cubic', the default as of version 1.2.0, or 'Linear'. It has an effect only for GRIB files that contain data on a thinned grid. The GRIB documentation refers to these grids as "quasi-regular". The option controls the interpolation performed in converting variable data on the grid to the standard rectangular form that is returned by PyNIO. Linear interpolation is always used when a bit-mask omits some of the grid points.
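The options above are set on an options object before the file is opened. The sketch below checks values against the choices listed in this section and then passes them to PyNIO. The names check_grib_options and open_grib are hypothetical, passing format="grib" is an assumption about an accepted format name, and SingleElementDimensions is checked here only as a single string even though several dimensions may be requested.

```python
GRIB_OPTION_VALUES = {
    "DefaultNCEPPtable": {"Operational", "Reanalysis"},
    "InitialTimeCoordinateType": {"Numeric", "String"},
    "ThinnedGridInterpolation": {"Linear", "Cubic"},
    "SingleElementDimensions": {"None", "All", "Initial_time", "Forecast_time",
                                "Level", "Ensemble", "Probability"},
}


def check_grib_options(**settings):
    """Validate option names and values against the documented choices."""
    for name, value in settings.items():
        if name not in GRIB_OPTION_VALUES:
            raise KeyError(f"unknown GRIB option: {name}")
        if value not in GRIB_OPTION_VALUES[name]:
            raise ValueError(f"invalid value {value!r} for {name}")
    return settings


def open_grib(path, **settings):
    """Open a GRIB file read-only with the given options (requires PyNIO)."""
    import Nio  # imported lazily so the checker works without PyNIO

    opt = Nio.options()
    for name, value in check_grib_options(**settings).items():
        setattr(opt, name, value)
    # Assumption: "grib" is accepted as an explicit format name.
    return Nio.open_file(path, "r", opt, format="grib")
```

For example, open_grib("forecast.grb", SingleElementDimensions="All") would ask for every possible dimension to be created; the file name is a placeholder.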

GRIB1 Support Details

As of version 1.2.0 PyNIO can read GRIB1 files of basically unlimited size on any operating system that supports 64-bit file offsets.

GRIB1 relies on parameter tables to match a specific octet (byte) in the GRIB record to a particular parameter name. These tables have proliferated over the years because each center uses different models and produces different quantities for analysis. The WMO mandates a single standard parameter table for table entries 1 - 127, but any originating center can legitimately define its own tables for entries 128 - 255. Furthermore, in spite of the supposed mandate, important centers such as ECMWF define their own values for the lower-numbered entries as well. Additionally the GRIB specification allows a single center, such as ECMWF, to define multiple parameter tables each specialized for different kinds of data. These are distinguished using another octet called the "parameter table version".

Since there is no central repository of GRIB parameter tables and the originating centers span the globe, it is not realistic for a single GRIB reading tool to have access to all the tables needed to parse every GRIB file; or, even assuming the necessary tables are available, to be able to decide in every case which table applies to a particular GRIB record. Consequently, PyNIO's approach is to provide built-in access to a set of parameter tables that are considered to be generally reliable, but, in addition, to allow the user to supply their own parameter tables locally as text files. Because the user-supplied tables take precedence over the built-in tables, the user ultimately controls which parameter table applies to a particular GRIB file. More detailed information about the parameter tables appears at the end of this document in the sections Built-in GRIB1 parameter tables and User-defined GRIB1 parameter tables.
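The precedence rule described above, in which user-supplied tables override the built-in ones, can be illustrated with a chained lookup. This is a conceptual sketch only: the table contents, the key layout (parameter table version, parameter number), and the function name make_lookup are hypothetical and do not reflect PyNIO's internal table format.

```python
from collections import ChainMap


def make_lookup(user_table, builtin_table):
    """Return a function that resolves a parameter to its short name.

    Both tables map (table_version, parameter_number) to a short name.
    """
    tables = ChainMap(user_table, builtin_table)  # first mapping wins

    def short_name(table_version, parameter_number):
        key = (table_version, parameter_number)
        # Unknown parameters fall back to a generic VAR_<number> name.
        return tables.get(key, f"VAR_{parameter_number}")

    return short_name
```

With a built-in entry {(2, 11): "TMP"} and a user entry {(2, 11): "T"}, the lookup returns "T", while a parameter found in neither table is reported as VAR_ followed by its number.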

Since GRIB files usually have records with the same parameter on different grids and different level types, using various time range indicators, PyNIO has introduced naming conventions that encode these distinctions to help ensure unique variable names. For example, consider the variable TMP (temperature). One GRIB file may contain this variable in many different variations. Some records might represent average temperature, others temperature differences from one time to the next, and yet others the temperature at the tropopause. Clearly, these need to be treated as different variables in the file, but GRIB gives them all the parameter number corresponding to TMP.

The following section gives the algorithm PyNIO uses to assign names to GRIB1 variables.

GRIB1 data variable name encoding

    (Note: examples show intermediate steps in the formation of the name)

    if entry matching parameter table version and parameter number is found (either in built-in or user-supplied table):
       if recognized as probability product:
          <probability_parameter_short_name>_<subject_variable_short_name> (ex: PROB_A_PCP) 
       else:
          <parameter_short_name> (ex: TMP) 
    else:
       VAR_<parameter_number> (ex: VAR_179)

    if pre-defined grid:
       _<pre-defined_grid_number> (ex: TMP_6)
    else if grid defined in GDS (Grid Description Section):
       _GDS<grid_type_number> (ex: TMP_GDS4)

    _<level_type_abbreviation> (ex: TMP_GDS4_ISBL)

    if not statistically processed variable and not duplicate name the name is complete at this point.

    if statistically-processed variable with constant specified statistical processing duration:
       _<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units> (ex: ACPCP_44_SFC_acc6h)
    else if statistically-processed variable with no specified processing duration:
       _<statistical_processing_type_abbreviation> (ex: A_PCP_192_SFC_acc)

    if variable name is duplicate of existing variable name (this should not normally occur):
       _n (where n begins with 1 for first duplicate) (ex: TMP_GDS4_ISBL_1)
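
The naming steps above can be sketched in Python as follows. This is only an illustration of the pseudocode, not PyNIO's actual implementation: the function grib1_var_name and all of its parameters are hypothetical, and the parameter numbers in the usage lines are sample values. For a probability product the caller is assumed to pass the already combined short name (for example PROB_A_PCP).

```python
def grib1_var_name(short_name, param_number, level_abbrev,
                   predefined_grid=None, gds_grid_type=None,
                   stat_abbrev=None, stat_duration=None,
                   duration_units="h", existing=()):
    """Assemble a GRIB1 variable name following the steps listed above.

    short_name is None when no parameter table entry matched.
    existing is a collection of names that are already in use.
    """
    # Step 1: parameter short name, or VAR_<n> when no table entry matched.
    name = short_name if short_name is not None else "VAR_%d" % param_number

    # Step 2: grid suffix (pre-defined grid number or GDS grid type number).
    if predefined_grid is not None:
        name += "_%d" % predefined_grid
    elif gds_grid_type is not None:
        name += "_GDS%d" % gds_grid_type

    # Step 3: level type abbreviation.
    name += "_" + level_abbrev

    # Step 4: statistical processing suffix, with the duration if it is constant.
    if stat_abbrev is not None:
        name += "_" + stat_abbrev
        if stat_duration is not None:
            name += "%d%s" % (stat_duration, duration_units)

    # Step 5: disambiguate duplicates with _1, _2, ...
    if name in existing:
        n = 1
        while "%s_%d" % (name, n) in existing:
            n += 1
        name = "%s_%d" % (name, n)
    return name


print(grib1_var_name("TMP", 11, "ISBL", gds_grid_type=4))
print(grib1_var_name("ACPCP", 63, "SFC", predefined_grid=44,
                     stat_abbrev="acc", stat_duration=6))
print(grib1_var_name(None, 179, "SFC"))
```

The first two calls reproduce the examples TMP_GDS4_ISBL and ACPCP_44_SFC_acc6h from the listing above.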
Notes:

GRIB1 dimensions, coordinate, and auxiliary variables

In order to ensure the uniqueness of dimension and coordinate variable names, PyNIO assigns each GRIB1 dimension a unique number when it is first recognized. This number is usually appended to the end of the name.

GRIB1 grids
A single GRIB record generally contains data on a two-dimensional horizontal grid locating variables on the surface of the globe. This grid forms the two rightmost (occasionally three rightmost) dimensions of a PyNIO GRIB variable.

GRIB1 has two systems for defining these horizontal grids. There is a set of "pre-defined" grids that are indexed by a single octet (byte) in the GRIB record. These are completely specified as to the type of projection, number of grid points, and extent. Since these are by definition unique, dimensions and coordinates that specify these grids do not use the dimension number in their names. The other system uses the GDS (Grid Description Section) to assign a general grid type (e.g., Lambert Conformal) and to specify the projection parameters and number of grid points. In this case, the grid type is not sufficient to uniquely specify the grid, and therefore the unique dimension number is appended to the names of the dimensions and coordinate variables.

The grid dimensions, coordinate variables, and associated auxiliary rotation variables are named as follows:

if grid is pre-defined and has single-dimensioned coordinates:
    dimensions and coordinates:  
       lat_<pre-defined_grid_number>, lon_<pre-defined_grid_number> (ex: lat_30, lon_30)
else if grid is pre-defined:
    dimensions:
       gridx_<pre-defined_grid_number>, gridy_<pre-defined_grid_number> (ex: gridx_6, gridy_6)
    coordinates:
       gridlat_<pre-defined_grid_number>, gridlon_<pre-defined_grid_number> (ex: gridlat_6, gridlon_6)
    auxiliary rotation variable:
       gridrot_<pre-defined_grid_number> (ex: gridrot_6)
else if GDS grid has single-dimensioned coordinates:
    dimensions and coordinates:  
       g<grid_type_number>_lat_<dimension_number>, g<grid_type_number>_lon_<dimension_number> (ex: g0_lat_7, g0_lon_8)
else if GDS grid type is 50 (Spherical Harmonic Coefficients):
    dimensions: 
       real_imaginary, g50_lat_<dimension_number>, g50_lon_<dimension_number> (ex: real_imaginary, g50_lat_1, g50_lon_2)
else if GDS grid type is 203 (Arakawa semi-staggered E-Grid):
    mass dimensions: 
       g203m_x_<dimension_number>, g203m_y_<dimension_number> (ex: g203m_x_0, g203m_y_1)
    mass coordinates:
       g203m_lat_<dimension_number>, g203m_lon_<dimension_number> (ex: g203m_lat_0, g203m_lon_1)
    velocity dimensions: 
       g203v_x_<dimension_number>, g203v_y_<dimension_number> (ex: g203v_x_5, g203v_y_6)
    velocity coordinates:
       g203v_lat_<dimension_number>, g203v_lon_<dimension_number> (ex: g203v_lat_5, g203v_lon_6)
else (other GDS grid):
    dimensions: 
       g<grid_type_number>_x_<dimension_number>, g<grid_type_number>_y_<dimension_number> (ex: g10_x_5, g10_y_6)
    coordinates:
       g<grid_type_number>_lat_<dimension_number>, g<grid_type_number>_lon_<dimension_number> (ex: g10_lat_5, g10_lon_6)
    auxiliary rotation variable:
       g<grid_type_number>_rot_<dimension_number> (ex: g10_rot_6)
Notes:
GRIB1 levels
If a variable exists on more than one vertical level, it will have a level dimension. As stated earlier, depending on level type, a level may be specified as a single value, or using two bounding values. For hybrid level types, each GRIB record encodes the coefficients required to convert the hybrid levels into pressure level values. These appear as auxiliary variables. The actual level coordinate in this case is just an integer index. However, it specifies how to compute the pressure levels using the CF-compliant attribute formula_terms. The naming scheme for level dimensions, coordinates, and auxiliary variables is as follows:
if single level value:
    if level type is not hybrid level:
       dimensions and coordinates:  
           lv_<level_type_abbreviation><dimension_number> (ex: lv_ISBL7)
    else:
       if hybrid coefficients given at level midpoints -- DWD convention:
           dimensions and coordinates:
               lv_HYBL<dimension_number> (ex: lv_HYBL9)
           auxiliary hybrid vertical coordinates (includes scalar parameterization values as attributes):
               lv_HYBL<dimension_number>_vc (ex: lv_HYBL9_vc) 
       else if hybrid coefficients given at level midpoints:
           dimensions and coordinates:
               lv_HYBL<dimension_number> (ex: lv_HYBL0)
           auxiliary hybrid A coefficient:
               lv_HYBL<dimension_number>_a (ex: lv_HYBL0_a) 
           auxiliary hybrid B coefficient:
               lv_HYBL<dimension_number>_b (ex: lv_HYBL0_b)
           auxiliary scalar reference pressure:
               P0
       else if hybrid coefficients given at level boundaries:
           dimensions and coordinates:
               lv_HYBL<dimension_number> (ex: lv_HYBL0)
           auxiliary boundary interface dimension (sized one greater than number of levels):
               lv_HYBL_i<dimension_number> (ex: lv_HYBL_i1)
           auxiliary hybrid A boundary interface coefficient:
               lv_HYBL_i<dimension_number>_a (ex: lv_HYBL_i1_a) 
           auxiliary hybrid B boundary interface coefficient:
               lv_HYBL_i<dimension_number>_b (ex: lv_HYBL_i1_b)
           auxiliary hybrid A coefficient (derived as average of adjacent interface coefficients):
               lv_HYBL<dimension_number>_a (ex: lv_HYBL0_a)
           auxiliary hybrid B coefficient (derived as average of adjacent interface coefficients):
               lv_HYBL<dimension_number>_b (ex: lv_HYBL0_b)
           auxiliary scalar reference pressure:
               P0
else (if lower and upper boundary level values):
    dimensions:
        lv_<level_type_abbreviation><dimension_number> (ex: lv_DBLY8)
    auxiliary lower boundary level values:                 
        lv_<level_type_abbreviation><dimension_number>_l0 (ex: lv_DBLY8_l0)
    auxiliary upper boundary level values:                 
        lv_<level_type_abbreviation><dimension_number>_l1 (ex: lv_DBLY8_l1)
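
As an illustration of how the hybrid auxiliary variables described above might be used, the sketch below averages adjacent interface coefficients to obtain midpoint coefficients and then applies the CF hybrid sigma-pressure relation p = a*P0 + b*ps. The function names and the sample numbers are invented for this example; for a real file, the formula_terms attribute of the level coordinate should be consulted to confirm which variables play the roles of a, b, and P0.

```python
def midpoint_coefficients(interface_values):
    """Average each pair of adjacent interface coefficients.

    n + 1 interface values yield n midpoint values, matching the
    "sized one greater than number of levels" interface dimension.
    """
    return [(lower + upper) / 2.0
            for lower, upper in zip(interface_values[:-1], interface_values[1:])]


def hybrid_pressure(a, b, p0, ps):
    """Pressure at each hybrid level for one surface pressure value ps."""
    return [a_k * p0 + b_k * ps for a_k, b_k in zip(a, b)]


# Made-up interface coefficients for a two-level column.
a_interface = [0.0, 0.25, 0.5]
b_interface = [1.0, 0.5, 0.0]

a_mid = midpoint_coefficients(a_interface)
b_mid = midpoint_coefficients(b_interface)

# P0 = 100000 Pa and ps = 90000 Pa are sample values only.
print(hybrid_pressure(a_mid, b_mid, p0=100000.0, ps=90000.0))
```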
Notes:
GRIB1 forecast_time
The forecast_time dimension is handled in a very straightforward way in GRIB1. The dimension and coordinate names are derived as follows:
dimensions and coordinates:  
   forecast_time<dimension_number> (ex: forecast_time3)
Notes:
GRIB1 initial_time
Dimensions, coordinates, and associated auxiliary variables are named as follows:
if NioOption InitialTimeCoordinateType is set to "Numeric" (default):
   dimensions and coordinates (units of hours since 1800-01-01 00:00):  
      initial_time<dimension_number>_hours (ex: initial_time0_hours)
   auxiliary string representation:
      initial_time<dimension_number> (ex: initial_time0)
   auxiliary encoded double representation (units of yyyymmddhh.hh_frac):
      initial_time<dimension_number>_encoded (ex: initial_time0_encoded)
else if NioOption InitialTimeCoordinateType is set to "String":
   dimensions and coordinates:  
      initial_time<dimension_number> (ex: initial_time0)
   auxiliary numeric representation (units of hours since 1800-01-01 00:00):
      initial_time<dimension_number>_hours (ex: initial_time0_hours)
   auxiliary encoded double representation (units of yyyymmddhh.hh_frac):
      initial_time<dimension_number>_encoded (ex: initial_time0_encoded)
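
The two numeric representations can be related to ordinary dates with a few lines of Python. This is a sketch based only on the units stated above (hours since 1800-01-01 00:00, and a double of the form yyyymmddhh.hh_frac); the helper names are hypothetical and are not part of PyNIO. The layout of the string representation is not specified here, so it is left out.

```python
from datetime import datetime, timedelta

# Reference time used by the numeric initial_time coordinate.
EPOCH = datetime(1800, 1, 1, 0, 0)


def hours_to_datetime(hours):
    """Convert 'hours since 1800-01-01 00:00' to a datetime."""
    return EPOCH + timedelta(hours=hours)


def datetime_to_hours(when):
    """Convert a datetime to 'hours since 1800-01-01 00:00'."""
    return (when - EPOCH).total_seconds() / 3600.0


def encode_time(when):
    """Encode a datetime as a double of the form yyyymmddhh.hh_frac."""
    whole = (when.year * 1000000 + when.month * 10000
             + when.day * 100 + when.hour)
    # The fractional part is the fraction of the hour that has elapsed.
    fraction = (when.minute * 60 + when.second) / 3600.0
    return whole + fraction


sample = datetime(2005, 3, 1, 12, 30)
hours = datetime_to_hours(sample)
print(hours, hours_to_datetime(hours), encode_time(sample))
```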
GRIB1 ensemble
The ensemble dimension coordinate variable is a simple integer index. It has an auxiliary variable that explains the significance of each element in the ensemble dimension:
dimensions and coordinates:  
   ensemble<dimension_number> (ex: ensemble2)
auxiliary informational string array:
   ensemble<dimension_number>_info (ex: ensemble2_info)
GRIB1 probability
The probability dimension coordinate variable is composed of the values representing threshold values of the quantity whose probability is under consideration.
dimensions and coordinates:  
   probability<dimension_number> (ex: probability4)

GRIB2 Support Details

Available in version 1.2.0 or later.

PyNIO uses the GRIB2 g2clib encoder/decoder library from NCEP to perform the low-level decoding of GRIB2 files. Since this library is limited to processing files that are less than 2 GB in size, PyNIO only supports reading GRIB2 files of 2 GB or less.

PyNIO distinguishes records that need to be classified into separate variables based on differences in the following characteristics:

The GRIB2 documentation provides more information about these characteristics. You may notice that they are selected from different sections of the GRIB2 record structure. Some, such as "type of statistical processing", are only applicable when the "product definition template number" has certain values. Generally speaking all of the applicable characteristics must be the same for two records to be considered part of the same variable. PyNIO uses a different set of characteristics to sort the records into a coherent variable with ordered dimensions.

PyNIO forms a name for each unique variable it discovers based on the characteristics that usually vary within a file. Some characteristics, such as the originating center, are normally constant for all the records in a single GRIB file (or, for that matter, in a set of conforming files) and therefore are not encoded into the name. Most of the information required to establish the name is contained in the GRIB2 code tables that are supplied with PyNIO. In particular, the initial prefix (the "short name") of the variable is determined by looking up three characteristics: product discipline, parameter category, and parameter number. Files produced for the TIGGE project use a slightly different scheme that involves considering other characteristics, such as the type of statistical processing, to determine the initial prefix for the variable. Because this scheme is more complex, a table embedded in the source code is currently used to determine TIGGE name prefixes. Unlike the prefixes in the tables supplied by NCEP, TIGGE prefixes are conventionally in lower case. The following section gives the algorithm for encoding data variable names in GRIB2.

GRIB2 data variable name encoding

    (Note: examples show intermediate steps in the formation of the name)

    if production status is TIGGE test or operational and matches entry in TIGGE table:
       <parameter_short_name> (ex: t) 
    else if entry matching product discipline, parameter category, and parameter number is found:
       <parameter_short_name> (ex: TMP) 
    else:
       VAR_<product_discipline_number>_<parameter_category_number>_<parameter_number> (ex: VAR_3_0_9)

    _P<product_definition_template_number> (ex: TMP_P0)

    if single level type:
       _L<level_type_number> (ex: TMP_P0_L103)
    else if two levels of the same type: 
       _2L<level_type_number> (ex: TMP_P0_2L106)
    else if two levels of different types: 
       _2L<first_level_type_number>_<second_level_type_number> (ex: LCLD_P0_2L212_213)

    if grid type is supported (fully or partially):
       _G<grid_abbreviation><grid_number> (ex: UGRD_P0_L108_GLC0)
    else:
       _G<grid_number> (ex: UGRD_P0_2L104_G0)

    if not statistically processed variable and not duplicate name the name is complete at this point.

    if statistically-processed variable and constant statistical processing duration:
       if statistical processing type is defined:
          _<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units> (ex: APCP_P8_L1_GLL0_acc3h) 
       else:
          _<statistical_processing_duration><duration_units> (ex: TMAX_P8_L103_GCA0_6h)
    else if statistically-processed variable and variable-duration processing always begins at initial time:
       _<statistical_processing_type_abbreviation> (ex: ssr_P11_GCA0_acc)

    if variable name is duplicate of existing variable name (this should not normally occur):
       _n (where n begins with 1 for first duplicate) (ex: TMAX_P8_L103_GCA0_6h_1)
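
The GRIB2 naming steps above can be sketched in the same spirit. The function grib2_var_name and its parameters are hypothetical and only mirror the pseudocode; the short name lookup (TIGGE table, code tables, or the VAR_<discipline>_<category>_<number> fallback) is assumed to have been done by the caller, and duplicate handling is omitted.

```python
def grib2_var_name(short_name, pdt_number, level_types, grid_number,
                   grid_abbrev=None, stat_abbrev=None, stat_duration=None,
                   duration_units="h"):
    """Assemble a GRIB2 variable name following the steps listed above.

    level_types is a sequence of zero, one, or two level type numbers.
    grid_abbrev is None when the grid type is not supported.
    """
    # Short name plus product definition template number.
    name = "%s_P%d" % (short_name, pdt_number)

    # Level part: single level, two levels of one type, or two types.
    if len(level_types) == 1:
        name += "_L%d" % level_types[0]
    elif len(level_types) == 2:
        first, second = level_types
        if first == second:
            name += "_2L%d" % first
        else:
            name += "_2L%d_%d" % (first, second)

    # Grid part: abbreviation (if the grid type is supported) and grid number.
    name += "_G%s%d" % (grid_abbrev or "", grid_number)

    # Statistical processing part.
    if stat_duration is not None:
        # Constant duration; the type abbreviation may be undefined.
        name += "_%s%d%s" % (stat_abbrev or "", stat_duration, duration_units)
    elif stat_abbrev is not None:
        # Variable duration that always begins at the initial time.
        name += "_" + stat_abbrev
    return name


print(grib2_var_name("APCP", 8, [1], 0, "LL", "acc", 3))
print(grib2_var_name("TMAX", 8, [103], 0, "CA", stat_duration=6))
print(grib2_var_name("ssr", 11, [], 0, "CA", "acc"))
```

These three calls reproduce the examples APCP_P8_L1_GLL0_acc3h, TMAX_P8_L103_GCA0_6h, and ssr_P11_GCA0_acc from the listing above.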
The fully or partially supported grid type abbreviations are:

The statistical processing type abbreviations are:

GRIB2 dimensions, coordinate, and auxiliary variables

PyNIO's GRIB2 reader uses a slightly different scheme than the GRIB1 reader to ensure the uniqueness of dimension and coordinate variable names. Whereas the GRIB1 reader maintains a single count and numbering system for dimensions of all types, the GRIB2 reader numbers each dimension type separately. For example, in GRIB1 the name forecast_time2 means the third (since the count starts with 0) unique dimension of any sort encountered while processing a file, but in GRIB2 this same name means the third unique dimension of type forecast_time.
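
The difference between the two numbering schemes can be shown with a small sketch. It is a simplification: both functions are hypothetical, they take the dimension types in the order in which each unique dimension is first encountered, and they ignore the special naming patterns used for grid dimensions.

```python
from collections import defaultdict


def grib1_style_names(dim_types):
    """GRIB1 reader: one shared counter for dimensions of all types."""
    return ["%s%d" % (dim_type, count)
            for count, dim_type in enumerate(dim_types)]


def grib2_style_names(dim_types):
    """GRIB2 reader: a separate counter for each dimension type."""
    counts = defaultdict(int)
    names = []
    for dim_type in dim_types:
        names.append("%s%d" % (dim_type, counts[dim_type]))
        counts[dim_type] += 1
    return names


# Order in which unique dimensions are first encountered (sample data).
encountered = ["initial_time", "lv_ISBL", "forecast_time", "forecast_time"]

print(grib1_style_names(encountered))
print(grib2_style_names(encountered))
```

With the shared counter, the third dimension encountered is named forecast_time2 regardless of its type; with per-type counters, the same dimension is named forecast_time0 because it is the first of type forecast_time.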
GRIB2 grids
Unlike GRIB1, GRIB2 uses only one system, a "Grid Description Template", for specifying the parameters of a grid. PyNIO uses a single grid number that is sequentially incremented as new grids are encountered to label all the dimensions and coordinate variables belonging to a particular grid. PyNIO also uses the same naming scheme for all GRIB2 grid types. This means that PyNIO's system of naming GRIB2 grid dimensions and variables is considerably simpler than the GRIB1 scheme. The grid dimensions, coordinate variables, and associated auxiliary rotation variables are named as follows:
if grid has single-dimensioned coordinates:
    dimensions and coordinates:  
       lat_<grid_number>, lon_<grid_number> (ex: lat_4, lon_4)
else if grid type is Spherical Harmonic Coefficients:
    dimensions: 
       real_imaginary, lat_<grid_number>, lon_<grid_number> (ex: real_imaginary, lat_1, lon_1)
else (2D coordinates):
    dimensions: 
       ygrid_<grid_number>, xgrid_<grid_number> (ex: ygrid_4, xgrid_4)
    coordinates:
       gridlat_<grid_number>, gridlon_<grid_number> (ex: gridlat_4, gridlon_4)
    auxiliary rotation variable:
       gridrot_<grid_number> (ex: gridrot_4)
Notes:
GRIB2 levels
GRIB2 level dimensions and coordinates have names that are very similar to GRIB1 level names. One difference is that no auxiliary variables are currently supplied for hybrid levels. As with GRIB1, depending on level type, a level may be specified as a single value, or using two bounding values. The naming scheme for level dimensions, coordinates, and auxiliary variables is as follows:
if single level value:
    dimensions and coordinates:  
       lv_<level_type_abbreviation><level_dimension_number> (ex: lv_ISBL7)
else (if lower and upper boundary level values):
    dimensions:
        lv_<level_type_abbreviation><level_dimension_number> (ex: lv_DBLY8)
    auxiliary lower boundary level values:                 
        lv_<level_type_abbreviation><level_dimension_number>_l0 (ex: lv_DBLY8_l0)
    auxiliary upper boundary level values:                 
        lv_<level_type_abbreviation><level_dimension_number>_l1 (ex: lv_DBLY8_l1)
Notes:
GRIB2 forecast_time, initial_time, ensemble, and probability
The remaining GRIB2 dimensions and coordinate variables are handled exactly like the corresponding GRIB1 dimensions and coordinate variables, except that each dimension type is separately counted and numbered. Refer to the corresponding GRIB1 sections for details:

Built-in GRIB1 parameter tables

PyNIO's built-in parameter tables have been assembled from a number of sources. A prominent acknowledgement must go to the freely available wgrib program (http://wesley.wwb.noaa.gov/wgrib.html). The ECMWF tables are available at http://www.ecmwf.int/publications/manuals/libraries/tables/tables_index.html. The FSL tables were obtained from private contacts at NOAA's Forecast Systems Laboratory in Boulder. Thanks also to Wenchieh Yen of the Institut fuer Physik der Atmosphaere, who provided new and updated table versions for DWD. Here is a list of the supported tables with links to their contents:

The first table in this list, the NMC operational table, contains the WMO standard entries described in the GRIB documentation (http://www.nco.ncep.noaa.gov/pmb/docs/on388/table2.html). Since the parameters numbered less than 128 are mandated by the GRIB documents, they are used by default when PyNIO cannot otherwise determine an applicable parameter table. Note, however, that user-defined parameter tables must include these parameters even if they match the standard table. Once PyNIO has found a suitable user-defined parameter table, it does not fall back to the standard table for any index missing from that table.

User-defined GRIB1 parameter tables

A PyNIO GRIB parameter table file is a text file that specifies one or more distinct parameter tables for PyNIO to use when reading GRIB files. Setting the environment variable NIO_GRIB_PTABLE_PATH to the pathname of the file causes PyNIO to parse it and incorporate its parameter tables when first asked to open a GRIB file. Optionally, NIO_GRIB_PTABLE_PATH may be set to a directory path, in which case all files in the directory with the suffix '.gtb' are opened as parameter tables.
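
The file-or-directory behaviour described above can be mimicked with the following sketch. The helper ptable_files is hypothetical and is not a PyNIO function, the path is a placeholder, and the sorted order is an assumption made only to keep the result deterministic. The key point is that the environment variable has to be set before the first GRIB file is opened.

```python
import glob
import os


def ptable_files(path):
    """Return the parameter table files that a given setting would select.

    A directory selects every file in it with the '.gtb' suffix;
    anything else is treated as the path of a single table file.
    """
    if os.path.isdir(path):
        return sorted(glob.glob(os.path.join(path, "*.gtb")))
    return [path]


# Placeholder location; set this before opening the first GRIB file.
os.environ["NIO_GRIB_PTABLE_PATH"] = "/path/to/my_tables"
print(ptable_files(os.environ["NIO_GRIB_PTABLE_PATH"]))
```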

A line beginning with a '!' (optionally preceded by whitespace) is a comment. The separator string is adjustable: it may be set to any non-alphanumeric, non-whitespace character, or it may simply be two spaces in a row. It is determined from what follows immediately after the first table header row signal index (-1). Any amount of whitespace may surround the separator string. There is no way to escape the separator string, so it must be set to something that is not used in any of the informational fields.

The table header row consists of the signal index (-1) followed by 3 required fields: center, subcenter, and parameter table version. The parameter table that follows will be used for any GRIB record whose center, subcenter, and table version match what is specified in the table header row, overriding any built-in parameter tables that would otherwise match. Setting any of these fields to '-1' results in an automatic match. If you set all three fields to '-1', the table will be used for all GRIB files. If two or more tables would equally match for a particular GRIB record, it is undefined which will be used.
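
The matching rule for header rows, including the '-1' wildcard, amounts to a field-by-field comparison. The function below is a hypothetical sketch of that rule, not PyNIO code.

```python
def header_matches(header, record):
    """Check whether a table header applies to a GRIB record.

    Both arguments are (center, subcenter, table_version) tuples.
    A header field of -1 matches any value in the record.
    """
    return all(h == -1 or h == r for h, r in zip(header, record))


record = (98, 0, 128)
print(header_matches((98, 0, 128), record))    # exact match
print(header_matches((98, -1, 128), record))   # wildcard subcenter
print(header_matches((-1, -1, -1), record))    # applies to every record
print(header_matches((7, 0, 128), record))     # different center
```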

Parameter rows consist of the parameter index followed by an abbreviation suitable for use as a PyNIO file variable name, then a string representing units, and finally the longname (a short description of the variable). The abbreviation must contain only alphanumeric characters and/or the underscore character. Index values that are not defined need not be included.

Here is a sample portion of a valid parameter table file:

-1 : 98 : 0 : 128 
018 : T : 	K : 	Temperature
019 : Z : 	m2/s2 : 	Surface geopotential
020 : GP : 	kg/s2 : 	Geopotential
021 : U_TE : 	J/ms : 	Total energy u-flux
022 : V_TE : 	J/ms : 	Total energy v-flux
081 : U_KE : 	J/ms : 	Kinetic energy u-flux
082 : V_KE : 	J/ms : 	Kinetic energy v-flux
The table header row indicates center 98, subcenter 0, parameter table version 128. From the list of originating centers at http://www.nco.ncep.noaa.gov/pmb/docs/on388/table0.html in the GRIB documentation, one can determine that this means the center is the European Center for Medium-Range Weather Forecasts - Reading or ECMWF, and in fact this is an alternate version of the standard ECMWF parameter table version 128, required to decode a specific GRIB dataset.
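
To make the file format concrete, here is a minimal parser sketch for parameter table text. It follows the rules described in this section (comment lines, the -1 header signal, a separator detected from the first header row) but it is not PyNIO's parser; the function names and the returned data structure are invented for illustration, and edge cases such as malformed rows are handled only by raising ValueError.

```python
import re


def detect_separator(first_header_line):
    """Work out the separator from what follows the first '-1'."""
    rest = first_header_line[2:].lstrip()
    first = rest[:1]
    # A '-' directly followed by a digit is taken to be a negative field,
    # not a separator (an assumption of this sketch).
    if first and not first.isalnum() and not re.match(r"-\d", rest):
        return first
    return "  "  # two spaces in a row


def split_fields(line, sep):
    """Split a row on the separator, dropping surrounding whitespace."""
    parts = re.split(r"\s{2,}", line) if sep == "  " else line.split(sep)
    return [part.strip() for part in parts]


def parse_ptables(text):
    """Parse parameter table text into a list of table dictionaries."""
    tables, sep, current = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment
        if sep is None:
            if not line.startswith("-1"):
                raise ValueError("first row must be a table header")
            sep = detect_separator(line)
        fields = split_fields(line, sep)
        if fields[0] == "-1":
            center, subcenter, version = (int(f) for f in fields[1:4])
            current = {"header": (center, subcenter, version), "params": {}}
            tables.append(current)
        else:
            index, abbrev, units, long_name = fields[:4]
            current["params"][int(index)] = (abbrev, units, long_name)
    return tables


sample = """! alternate ECMWF table
-1 : 98 : 0 : 128
018 : T : K : Temperature
019 : Z : m2/s2 : Surface geopotential
"""
for table in parse_ptables(sample):
    print(table["header"], table["params"])
```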

OGR - Open Geospatial Consortium's Simple Features formats (Shapefile, MapInfo, GMT, TIGER)

Online documentation for OGR is available at http://gdal.org/ogr/index.html.

In version 1.4.0 and later, PyNIO optionally provides access to the GDAL/OGR library in order to read various OGR file formats. OGR is an implementation of the Open Geospatial Consortium's "Simple Features" API. The specific formats supported in PyNIO are ESRI Shapefiles, MapInfo and GMT formats, and classic TIGER format.

Data in these formats generally consist of a set of files that together encode the geometry, non-spatial attributes, cartographic projection, indices, etc. Although these formats are identified to PyNIO by the suffix of a specific file in the set, it is important that the complete set of files is available. For example, the suffix for Shapefiles is .shp; however, a Shapefile properly consists of four or more commonly-named files with .dbf, .shx, .prj suffixes, in addition to the .shp file.

To the user, OGR data is viewed as a set of features that consist of geometry and other non-spatial fields. The non-spatial data in a given OGR file is mapped directly to a PyNIO variable. Non-spatial fields exist in a 1:1 relationship with features, so that the data for the ith feature is found at the ith element of the PyNIO variable.

The geometry for a feature can be complex however; a feature might be comprised of one or more line segments, which in turn may represent polylines or polygons, etc. Because the relationship of geometry to features is not a simple 1:1 correspondence, the following convention has been adopted to encode geometry into specific PyNIO variables.

For each feature in the OGR file, there is an entry in a PyNIO variable named geometry. Each entry contains 2 items:

The PyNIO variable segments contains a partially ordered list of segments (individual points, polylines, polygons). There are one or more entries in this table per feature; geometry points to the first such segment of a feature, and any additional segments directly follow. Entries in segments in turn point to the actual xy(z) coordinates:

The coordinates of geometry are stored in the x, y, and optional z variables. These are one-dimensional arrays that contain partially ordered lists of coordinates that make up the individual segments.

Several global attributes are defined for convenience. The attribute layer_name contains the name of the features, as extracted from the OGR file. The attribute geometry_type denotes whether the feature-geometry is comprised of points, polylines, or polygons. The remaining attributes are intended as symbolic indices into the second index of the geometry and segments variables, and should be used in preference to hard coding indices; these are: geom_segIndex(=0), geom_numSegs(=1), segs_xyzIndex(=0), and segs_numPnts(=1). OGR file-variables do not contain coordinate variables or variable attributes, as the GDAL/OGR implementation does not expose information that could be used as such.
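
The way geometry, segments, and the coordinate arrays chain together can be illustrated with plain Python lists standing in for the file variables. This is a sketch under assumptions: the meaning of each column is inferred from the attribute names above (a start index followed by a count), the sample data are made up, and in a real file the index constants should be read from the global attributes rather than written out as they are here.

```python
# Symbolic indices; in a real file these come from the global attributes.
geom_segIndex, geom_numSegs = 0, 1
segs_xyzIndex, segs_numPnts = 0, 1


def feature_segments(i, geometry, segments, x, y):
    """Return the segments of feature i, each as a list of (x, y) points."""
    first_seg = geometry[i][geom_segIndex]
    num_segs = geometry[i][geom_numSegs]
    result = []
    # The segments of one feature are stored contiguously.
    for s in range(first_seg, first_seg + num_segs):
        start = segments[s][segs_xyzIndex]
        count = segments[s][segs_numPnts]
        result.append(list(zip(x[start:start + count],
                               y[start:start + count])))
    return result


# Made-up data: feature 0 has one 3-point segment,
# feature 1 has two 2-point segments.
geometry = [[0, 1], [1, 2]]
segments = [[0, 3], [3, 2], [5, 2]]
x = [0, 1, 1, 5, 6, 7, 8]
y = [0, 0, 1, 5, 5, 7, 7]

for feature in range(len(geometry)):
    print(feature, feature_segments(feature, geometry, segments, x, y))
```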

Known Issues