Data Formats Supported by the PyNIO module
This page provides more detailed information about each of the PyNIO supported data formats. The common interface to all the file formats supported by PyNIO is discussed in the PyNIO reference document and will (mostly) not be repeated here. This document focuses on the individual features of each format, their differences, and what PyNIO does to translate them in a more or less uniform manner. The most detailed discussion involves the GRIB format, because of all the supported formats, it has required the most work to recast into a NetCDF-like model.
The data formats currently supported by PyNIO are:
- NetCDF - Network Common Data Format (.nc, .cdf, .netcdf, .nc3, .nc4 extensions)
- HDF - Hierarchical Data Format (version 4) - Scientific Data Sets (SDS) only (.hdf or .hd extensions)
- HDF-EOS - Hierarchical Data Format Earth Observing System (version 2) - GRID and SWATH only (.hdfeos, .he2, or .he4 extensions)
- HDF-EOS5 - Hierarchical Data Format Earth Observing System (version 5) - GRID and SWATH only (.hdfeos5 or .he5 extensions)
- CCM - Community Climate Model History Tape Format (.ccm extension)
- GRIB - Gridded Binary (version 1) or General Regularly-distributed Information in Binary Form (version 2) (.gr, .gr1, .grb, .grib, .grb1, .grib1, .gr2, .grib2, or .grb2 extensions)
- OGR - Open Geospatial Consortium's Simple Feature formats (ESRI Shapefile: .shp; MapInfo: .mif; GMT: .gmt; TIGER: .rt1 extensions)
PyNIO offers read and write access to existing NetCDF files as well as the ability to create NetCDF files from scratch. Almost all features of NetCDF's classic model (versions 3 and earlier) are supported by PyNIO because it was created using the NetCDF3 data model as a pattern.
As of version 1.3.0b1, PyNIO supports the NetCDF4 classic model format. The classic model constrains the interface to the constructs provided by NetCDF3 and earlier. However, the underlying file format, like that of all NetCDF4 files, is HDF5. Files written in this format can take advantage of the built-in file compression available in HDF5, and limitations on file and variable size are for practical purposes eliminated. Existing files in this format will be recognized automatically and handled transparently. To create a file in this format, set the Format option to the value "NetCDF4Classic". You can turn on compression using the CompressionLevel option. More information about the options is given below. Standard NetCDF4 files are NOT currently supported. If a file contains any NetCDF4-only features, you can expect core dumps if you open it in read-only mode and possible file corruption if you open an existing file in write mode.
- CompressionLevel
- Specify the level of data compression as an integer in the range 0 through 9. Increasing values indicate greater compression. Compression is lossless. There is a tradeoff between the time spent compressing the file and the amount of compression achieved. Informal tests show that compression level 9 results in a file only a few percent smaller than a compression level 5 file, but it requires 4 or 5 times the amount of time to create it. (This option is ignored unless the Format option is set to 'NetCDF4Classic'.)
- This option has an effect only for files opened in
It currently has four valid values, two of which are synonyms:
- 'Classic' (default)
- Create a standard NetCDF file. Standard NetCDF files are more limited with respect to file size. Assuming the underlying file system has support for large files, the total size can exceed 2 GB, but there are severe restrictions regarding the number of large variables and the order in which they are written. In general, because it is more universal, the NetCDF3 classic format is recommended if the total file size will be less than 2 GB and file compression is not required.
- 'LargeFile' or '64BitOffset'
- Create a NetCDF file with support for larger variables and a theoretically much larger total size (about 9.22e+18 bytes). Each fixed-size variable, or each 'record' (element of the first dimension) of a variable with an unlimited dimension can have a size of up to 4 GB. PyNIO automatically reads NetCDF files in either the classic or the 64-bit offset format, assuming the underlying file system has support for large files. For more detailed information about large file support in NetCDF see http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Large-File-Support.html.
- 'NetCDF4Classic'
- Create a NetCDF4 classic model file. The classic model constrains the interface to the constructs provided by NetCDF3 and earlier. However, the underlying file format, like that of all NetCDF4 files, is HDF5. Files written in this format can take advantage of the built-in file compression available in HDF5. Use the CompressionLevel option to enable compression. The HDF5 format also removes virtually all restrictions on file and individual variable size.
Beginning with version 1.3.0b1, PyNIO is built with a release version of NetCDF4 and HDF5 and fully supports the "NetCDF4Classic" option. PyNIO version 1.2.0 provided only beta-level support for this format, using a beta version of NetCDF4 and HDF5, so version 1.2.0 should probably not be used for mission-critical file creation.
- HeaderReserveSpace
- This option has an effect only for files opened for writing. This option reserves extra space within the header of a NetCDF file. Its value is an integer that specifies the number of bytes to reserve in addition to the bytes used for the currently defined dimensions, variables, and attributes. This option can improve performance when it is likely that new dimensions, variables, or attributes will be added to an already large file.
- If set to its default value, True, this option causes a "virtual" _FillValue attribute to be created for any variable that has the attribute missing_value but not _FillValue. The purpose is to more gracefully handle files that use the COARDS-compliant missing_value instead of _FillValue to indicate missing data. Note that if a variable in a file has both a missing_value and a _FillValue, or if it has neither, the option does nothing. The virtual _FillValue attribute is not actually part of the NetCDF file, but only appears to be from within PyNIO. However, if the file is opened for writing and you assign to the attribute, it becomes an actual attribute.
- This option has an effect only when a file is opened for writing. It is logical-valued with a default value of True. If set False, PyNIO alters the standard behavior of the NetCDF library such that variable element locations in the file are not "pre-filled" with the missing (fill) value associated with the variable. This can noticeably improve performance when writing large datasets. However, if you set this option False, you are responsible for ensuring that all the elements of the variables you have defined are assigned a valid value.
- This logical-valued option may be set for any NetCDF file. Its default value is False, meaning that PyNIO only closes a NetCDF file when the close method is invoked. If set to True, PyNIO closes the file after each operation it performs, including defining a dimension or variable, adding or modifying an attribute, or reading or writing data from any variable. This helps ensure the file's integrity for writable files if the close method does not get called for some reason. However, it may result in loss of performance, particularly when adding new variables, dimensions, or attributes to files that already have large variables defined. This is because each time a new element is defined, all existing data in the file must be moved to make room for the metadata of the new element in the header. One way to mitigate the performance loss is to use the HeaderReserveSpace option when first creating the file to make room in the header for subsequently defined NetCDF elements.
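As a rough illustration of how the creation options described above might be combined, the sketch below collects the settings in a plain dictionary, checks the documented value ranges, and only then hands them to PyNIO. The helper names netcdf_create_settings and create_netcdf are hypothetical. The Nio.options() and Nio.open_file() calls follow the interface described in the PyNIO reference document and should be checked against your installed version.

```python
VALID_FORMATS = ("Classic", "LargeFile", "64BitOffset", "NetCDF4Classic")


def netcdf_create_settings(fmt="Classic", compression_level=0, header_reserve_space=0):
    """Validate option values and return them as a plain dict (hypothetical helper)."""
    if fmt not in VALID_FORMATS:
        raise ValueError("Format must be one of %s" % (VALID_FORMATS,))
    if not 0 <= compression_level <= 9:
        raise ValueError("CompressionLevel must be in the range 0 through 9")
    if header_reserve_space < 0:
        raise ValueError("HeaderReserveSpace must be a non-negative byte count")
    settings = {"Format": fmt}
    # CompressionLevel is ignored unless the format is NetCDF4Classic,
    # so it is only passed along in that case.
    if fmt == "NetCDF4Classic":
        settings["CompressionLevel"] = compression_level
    # Only request extra header space when the caller asked for some.
    if header_reserve_space > 0:
        settings["HeaderReserveSpace"] = header_reserve_space
    return settings


def create_netcdf(path, settings):
    """Create a NetCDF file using the validated settings (requires PyNIO)."""
    import Nio  # imported lazily so the validation helper works without PyNIO

    opt = Nio.options()
    for name, value in settings.items():
        setattr(opt, name, value)
    return Nio.open_file(path, "c", options=opt)


settings = netcdf_create_settings("NetCDF4Classic", compression_level=5)
```

With these settings, create_netcdf("out.nc", settings) would request a compressed NetCDF4 classic model file. Level 5 is chosen here because, according to the informal tests mentioned above, level 9 gains only a few percent in size at several times the cost in time.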
Data model differences
While PyNIO supports a string data type, NetCDF3 does not. However, PyNIO maps NetCDF attributes of type character into Python strings for convenience. Likewise, you can set the value of a NetCDF character attribute using a Python string.
Online documentation for HDF is available at http://www.hdfgroup.org/products/hdf4/.
PyNIO's HDF interface understands a subset of the content available in HDF4-formatted files. PyNIO can read and write data that uses the SDS (Scientific Data Set) interface. Opening an HDF file is analogous to opening a NetCDF file as documented above, with some minor exceptions.
The first thing you will probably notice when you attempt to print the contents of an HDF file object is that HDF files often begin with voluminous multiline global attributes. Some of the information contained in these attributes may be useful, but much of it is of little concern to users who are only interested in getting at the data.
The HDF format allows variable, attribute, and dimension names with spaces and other non-alphanumeric characters, but for reasons of compatibility with other software, PyNIO's underlying NIO library replaces spaces and non-alphanumeric characters with the underscore '_' character. To compensate for this possible loss of information, PyNIO provides an attribute for each variable, called hdf_name, that contains the name exactly as it appears in the HDF file. This attribute is redundant in cases where the actual HDF name contains only alphanumeric or underscore characters.
PyNIO has a read-only ability to understand HDF Vgroups. When a variable that is part of a Vgroup is encountered, PyNIO appends a double underscore and the group number to the end of the variable name. This ensures that the variable will have a unique name, relative to variables that belong to other Vgroups. It also provides two additional attributes to the variable: hdf_group, whose value is the HDF string name of the group, and hdf_group_id, whose value is the same as the group number appended to the end of the variable name.
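The naming rules in the two preceding paragraphs can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the NIO library's actual code. The function names are hypothetical, and the exact character class that NIO treats as alphanumeric is an assumption (ASCII letters and digits).

```python
import re


def sanitize_hdf_name(hdf_name):
    """Replace every character that is not an ASCII letter, digit, or '_' with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", hdf_name)


def describe_hdf_variable(hdf_name, group_name=None, group_number=None):
    """Return the (name, attributes) pair a variable would get under the rules above."""
    name = sanitize_hdf_name(hdf_name)
    # hdf_name always preserves the name exactly as stored in the file.
    attributes = {"hdf_name": hdf_name}
    if group_number is not None:
        # Variables in a Vgroup get a double underscore plus the group number.
        name = "%s__%d" % (name, group_number)
        attributes["hdf_group"] = group_name
        attributes["hdf_group_id"] = group_number
    return name, attributes


name, attrs = describe_hdf_variable("Cloud Top Temp", "Geo Fields", 3)
```

Here name is "Cloud_Top_Temp__3", while attrs["hdf_name"] keeps the original spelling with spaces.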
There is currently no way to access 8-bit and 24-bit HDF images from PyNIO. There is also no access to the HDF VDATA interface.
Online documentation for HDF-EOS is available at http://hdfeos.org/software/library.php.
PyNIO provides read-only access for SWATH and GRID data groups in HDF-EOS files. POINT data groups are currently ignored. As with all PyNIO's supported formats, HDF-EOS files are read into file variables that use NetCDF-like conventions. Note that since HDF-EOS2 files are a type of HDF4 file, it is possible, by setting the format optional argument of the open_file method to 'hdf', to use the HDF4 interface to open HDF-EOS2 files. This view of the file sometimes gives useful information not obtainable through the HDF-EOS interface. On the other hand, because HDF-EOS files often use the more generic '.hdf' suffix in their names, it is easy to mistakenly use the HDF4 interface when the HDF-EOS interface would lead more directly to the relevant information and data. In this case you should set format to an HDF-EOS-specific suffix to read the data using PyNIO's interface to the HDF-EOS library.
Non-alphanumeric characters in HDF-EOS variable names are replaced with the '_' character when listed from PyNIO and must be referenced from PyNIO in this form. HDF-EOS files use groups to specify the SWATH or GRID that a variable belongs to. In the HDF-EOS2 interface PyNIO appends the SWATH or GRID name, preceded by an underscore, to all variable names that belong to the group in order to ensure that each variable name is unique within the namespace of the NioFile instance variable. As of version 1.2.0, PyNIO's HDF-EOS interface supplies an attribute called hdfeos_name that contains the actual variable name as present in the file.
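The following sketch illustrates the HDF-EOS naming rule just described by building a lookup from PyNIO-style names back to the original field names. The function names and the sample field and group names are hypothetical, and it is an assumption that the group name is sanitized in the same way as the variable name.

```python
import re


def _clean(text):
    """Replace characters other than ASCII letters, digits, and '_' with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text)


def hdfeos_variable_name(field_name, group_name):
    """Append the SWATH or GRID name, preceded by an underscore."""
    return _clean(field_name) + "_" + _clean(group_name)


def build_name_index(fields):
    """Map each PyNIO-style name to its hdfeos_name for (field, group) pairs."""
    return {
        hdfeos_variable_name(field, group): {"hdfeos_name": field}
        for field, group in fields
    }


index = build_name_index([
    ("Cloud Fraction", "Swath1"),
    ("Cloud Fraction", "Swath2"),
])
```

The two fields share a name in the file but end up as "Cloud_Fraction_Swath1" and "Cloud_Fraction_Swath2", which is the uniqueness the group suffix is meant to provide.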
Also as of version 1.2.0, PyNIO provides access to the Geolocation variables associated with SWATH data groups. These variables have 1 or 2 dimensions and serve to locate the data variables in time and space. PyNIO also provides supplementary coordinate variables for GRID data that are calculated on the fly using the GCTP (General Cartographic Transformation Package) library that is a required component of the HDF-EOS interface. These supplementary variables can be distinguished from true HDF-EOS variables by their lack of the hdfeos_name attribute. Since only a few of the possible projected GRID types were available for testing, the coordinate values contained in these supplementary variables are not yet considered to be fully reliable. Users are encouraged to report cases where the coordinate values do not seem to be correct.
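One way to apply the hdfeos_name test described above is sketched here. The function split_by_origin is hypothetical, and the stand-in objects only mimic variables whose attributes are reachable as Python attributes. With PyNIO you would pass the variables dictionary of an open file instead, assuming it behaves as described in the PyNIO reference document.

```python
from types import SimpleNamespace


def split_by_origin(variables):
    """Separate variables stored in the file from supplementary coordinates.

    `variables` maps names to objects; a variable counts as stored in the
    file when it carries an hdfeos_name attribute.
    """
    stored, supplementary = [], []
    for name, var in variables.items():
        if hasattr(var, "hdfeos_name"):
            stored.append(name)
        else:
            supplementary.append(name)
    return sorted(stored), sorted(supplementary)


# Stand-in objects for demonstration only; the names are made up.
demo = {
    "Temperature_Grid1": SimpleNamespace(hdfeos_name="Temperature"),
    "GridLat_Grid1": SimpleNamespace(units="degrees_north"),
    "GridLon_Grid1": SimpleNamespace(units="degrees_east"),
}
stored, supplementary = split_by_origin(demo)
```

A check like this makes it easy to treat the calculated coordinates with the extra caution the text recommends.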
HDF-EOS5 - HDF5 - Earth Observing System - GRID, SWATH and ZA; limited POINT support for some metadata only
Online documentation for HDF-EOS5 is available at http://hdfeos.org/software/library.php.
As of version 1.4.0, PyNIO provides read-only access for SWATH, GRID, and ZA data groups in HDF-EOS5 files. Access to POINT data groups is currently limited to the headers only. As with all PyNIO's supported formats, HDF-EOS5 files are read into file variables that use NetCDF-like conventions.
Non-alphanumeric characters in HDF-EOS5 variable names are replaced with the '_' character when listed from PyNIO and must be referenced from PyNIO in this form. HDF-EOS5 files use groups to specify the SWATH or GRID that a variable belongs to. In the HDF-EOS5 interface PyNIO appends the SWATH or GRID name, preceded by an underscore, to all variable names that belong to the group in order to ensure that each variable name is unique within the namespace of the NioFile instance variable. PyNIO's HDF-EOS5 interface supplies an attribute called hdfeos_name that contains the actual variable name as present in the file.
CCM files are pretty straightforward (no special naming convention is needed as with GRIB files); the variable names and unit information are stored as character data in the CCM files. When a CCM file is opened, PyNIO scans the file and creates an index of all the data in the file. This can be expensive for large files, but it makes subsequent access to individual variables fast. Because of this cost, you should avoid repeatedly calling open_file on the same file whenever possible.
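A simple way to avoid reopening (and therefore rescanning) the same file is to keep open handles in a small cache. The class below is a minimal sketch of that idea and is not part of PyNIO. The opener is passed in as a callable, so with PyNIO installed it could be something like lambda path: Nio.open_file(path, "r").

```python
class FileCache:
    """Keep one open handle per path so each file is scanned only once."""

    def __init__(self, opener):
        self._opener = opener      # callable that opens a path and returns a handle
        self._open_files = {}      # path -> already opened handle

    def get(self, path):
        """Return the cached handle for `path`, opening the file on first use."""
        if path not in self._open_files:
            self._open_files[path] = self._opener(path)
        return self._open_files[path]

    def close_all(self):
        """Close every cached handle and forget it."""
        for handle in self._open_files.values():
            handle.close()
        self._open_files.clear()
```

Code that needs a variable from a file then asks the cache for the handle instead of opening the file itself.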
For more information on the CCM model and CCM file format, see the CCM3 User's Guide.
GRIB - Gridded Binary (version 1) or General Regularly-distributed Information in Binary Form (version 2) (GRIB2 support available in version 1.2.0 or later.)
Online documentation for GRIB is available at:
PyNIO provides read-only support for data in GRIB1 and GRIB2 formats. To open a file, you only need to know that the file is GRIB. PyNIO figures out which version of GRIB is in the file and processes the file accordingly. PyNIO's support for GRIB is an evolving process. As an ever more diverse set of GRIB files have been encountered, PyNIO has been improved to handle many more features of the GRIB format. However, since GRIB has many features that are obscure enough that they have never been encountered in practice by the NIO library developers, there are still some aspects of GRIB that PyNIO does not handle properly. Generally, the NIO library developers try to support features that appear in GRIB files that users are actually using. The best way to help improve the GRIB-decoding capabilities of PyNIO is to call attention to files that are important to your work but that PyNIO does not seem to interpret correctly. If you have problems reading a particular set of GRIB files, please contact Mary Haley
At some point PyNIO may provide the ability to write GRIB, but, for now, the NIO library developers consider their mission to be providing read access to as many types of GRIB files as possible.
GRIB is a compact file format composed of a series of independent records, each generally containing a grid of data points covering some geographical extent. Each record contains in a coded form the information required to derive the location of the data points in space and time, the type of quantity the data represents, and the origin of the data, including who produced it and the generating model. But in order to decode the data, external information is required. This much is true both for GRIB1 and GRIB2. But while decoders for GRIB1 and for GRIB2 both rely on information from external tables and many of the individual pieces of information in the tables are the same, the organization of the tables is quite different. As the newer format, GRIB2 has the capability to allow for much more diversity of data without the need for the local extensions that have evolved over the years out of necessity in GRIB1. Also, in spite of the evidence of some initial divergence, there is hope at least that the GRIB2 tables will become standardized enough that one set of tables can be used to decode GRIB files from anywhere in the world.
Before discussing the specifics of PyNIO's treatment of each format, it is worth looking at the common features. These arise partly from basic similarities in the formats, but also significantly from PyNIO's unifying data model.
- ensemble or probability (version 1.2.0 or later)
Generally these dimensions have associated one-dimensional coordinate variables. Following standard NetCDF conventions the coordinate variables have the same name as the dimensions. These variables have long names, types, and units that conform as much as possible to standard CF conventions. However, there are details peculiar to each of these dimensions that need discussion:
- The ensemble dimension uses an index variable as its coordinate variable. This variable simply serves to give a numeric ordering to the ensemble members. Otherwise, there is not always an easily-defined order to these elements. An auxiliary variable, implemented as an array of strings, gives information about each ensemble member.
- The probability dimension has as its coordinate variable an ascending series of threshold values of the quantity whose probability is measured. Note that the variable itself contains percentage values indicating the likelihood of the quantity at each coordinate value. Note that, currently, PyNIO allows for only one of the ensemble or probability dimensions for any single variable. If this becomes a problem, it may change with future releases.
- The initial_time coordinate is expressed in a CF-compliant form as the number of hours since Jan 1, 1800. However, the NioOption InitialTimeCoordinateType allows you to change the type of this coordinate to human-readable strings representing the date and time. But regardless of the type of the initial_time coordinate variable, PyNIO always supplies three auxiliary variables with three different representations of this information:
- number of hours since Jan 1, 1800 (double type)
- string array representation of date and time
- an encoded representation of the date of type double, with units yyyymmddhh.hh_frac indicating the position and number of digits for the year, month, day, hour, and fractional part of the hour.
- The forecast_time dimension is an integer offset from the initial_time. Its units are usually, but not always, hours.
- If GRIB records for a quantity exist at multiple vertical levels, PyNIO creates a level dimension for the variable. Depending on the level type, GRIB may specify a single level value, or it may specify the lower and upper bounds of the level. If the quantity is defined using a single value for each level, these values are incorporated into a normal coordinate variable. However, if the quantity is defined using the lower and upper bounds of the level, a standard coordinate variable (a one-dimensional variable with the same name as the dimension) cannot be used. Instead two auxiliary variables are supplied that are named by adding the suffixes "_l0" for the lower bounding level and "_l1" for the upper bounding level to the name of the dimension. PyNIO uses another scheme to represent GRIB hybrid level types. However, since hybrid levels are currently supported only for GRIB1, the particulars of this scheme are described below in the GRIB1 specific section.
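To make the three representations of the initial time listed above more concrete, the sketch below computes them for a single date using only the standard library. The function name is hypothetical. The layout of the date string is an assumption chosen for illustration (the text above does not specify it), and seconds are ignored in the encoded value.

```python
from datetime import datetime

EPOCH = datetime(1800, 1, 1)  # reference time: Jan 1, 1800


def initial_time_representations(when):
    """Return (hours since 1800, date string, yyyymmddhh.hh_frac) for a datetime."""
    # Numeric form: hours elapsed since the reference time.
    hours = (when - EPOCH).total_seconds() / 3600.0
    # String form: the exact layout is assumed here, not taken from PyNIO.
    text = when.strftime("%m/%d/%Y (%H:%M)")
    # Encoded form: digits give year, month, day, hour; the fraction is the
    # fractional part of the hour.
    encoded = (when.year * 1000000 + when.month * 10000 + when.day * 100
               + when.hour + when.minute / 60.0)
    return hours, text, encoded


hours, text, encoded = initial_time_representations(datetime(2007, 1, 1, 6, 30))
```

For this date the encoded value is 2007010106.5, that is, 06:30 on 1 January 2007 with the half hour expressed as a fraction.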
Grids with one-dimensional coordinates follow the standard NetCDF convention where the name of the dimension and the name of the coordinate variable are the same. These grids include:
- Latitude/Longitude (also known as Cylindrical Equidistant or Plate Carree)
- Gaussian Latitude/Longitude
Variables with grids that require two-dimensional coordinates use the CF-conforming attribute coordinates to list the names of the two associated two-dimensional coordinate variables. These two variables give the latitude and longitude respectively of each point in the grid. These variables also provide attributes that give sufficient information to generate a "native" map projection that exactly conforms to the projection parameters of the data. In addition, an auxiliary rotation variable gives the rotation angle that needs to be applied to vector data at each grid point to convert between grid-based angles and earth-based angles. Since in most cases the vector data angles are supplied relative to the grid, you would generally need to perform this operation in order to visualize vector data in projections other than the native projection. The formulas for converting the rotation angle are given in attributes of the rotation variable. The supported grids of this type are:
- Rotated Latitude/Longitude
- Polar Stereographic
- Lambert Conformal
Note: the GRIB1 reader also supports the following three specialized versions of Rotated Latitude/Longitude Grids:
- Arakawa Semi-Staggered E-Grid
- Arakawa Filled E-Grid
- Arakawa Staggered E-Grid
- DefaultNCEPPtable (ignored unless file is in GRIB 1 format)
- This option has two valid values: 'Operational', the default, or 'Reanalysis'. It specifies whether to default to the use of the NCEP operational parameter table (http://www.ncl.ucar.edu/Document/Manuals/Ref_Manual/ncep_opn.htm) or the NCEP reanalysis parameter table (http://www.ncl.ucar.edu/Document/Manuals/Ref_Manual/ncep_reanal.htm). The option only applies in cases where PyNIO, on its own, cannot definitively determine which of these tables to use because of historical ambiguities in NCEP usage.
- InitialTimeCoordinateType
- This string-valued option has two valid values: 'Numeric', the default, or 'String'. Note that in PyNIO's representation of a GRIB file the initial time dimension is distinguished from the forecast time dimension, whose coordinate values are numerical offsets from a particular initial time. The default value results in initial time coordinates that are COARDS and CF compliant, with the time represented in units of hours since 1800-01-01. Setting the option to 'String' results in human-readable time coordinates, but with the disadvantage that they are not compliant with standard conventions and are likely not to be understood by many processing and visualization software packages. Note that in either case both the string and numerical coordinates are available as variables -- the only difference is which is considered to be the coordinate variable.
- SingleElementDimensions (available in version 1.2.0 or later)
- This option allows the user to specify that variables with only a single initial time, forecast time, level, ensemble or probability value, usually handled as attributes, be treated as containing single element dimensions. It is a string-valued option whose default value 'None' means that no single-element dimensions will appear in PyNIO's representation of the GRIB file. Conversely, if the option is given the value 'All', then all possible dimensions will be created for each variable. Otherwise, the desired single element dimensions may be specified individually. The valid choices are 'Initial_time', 'Forecast_time', 'Level', 'Ensemble', and 'Probability'.
Note that dimensions are not created if the variable does not have an actual value associated with the dimension type, regardless of the value given to this option. For example, variables that are not part of an ensemble forecast will never have an ensemble dimension, and variables whose level type (e.g. Tropopause) does not have a numerical value will never have a level dimension. In the case of level types, it may depend on who wrote the record: files written by some centers may give no value for certain level types where others may use a numerical value such as 0. The intent of this option is to make it easier to concatenate conforming variables from multiple files together.
- This string-valued option has two valid values: 'Linear', the default, or 'Cubic'. It has an effect only for GRIB files that contain data on a thinned grid. The GRIB documentation refers to these grids as "quasi-regular". The option controls the interpolation performed in converting variable data on the grid to the standard rectangular form that is returned by PyNIO.
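The GRIB options that are named in this section can be checked before a file is opened, as in the sketch below. The helper grib_read_settings is hypothetical. Passing several individual SingleElementDimensions choices as a list of strings is an assumption, since the text above only says that they may be specified individually. The resulting dictionary could be applied to the object returned by Nio.options() with setattr before calling Nio.open_file.

```python
NCEP_TABLES = ("Operational", "Reanalysis")
TIME_TYPES = ("Numeric", "String")
SINGLE_ELEMENT_CHOICES = (
    "Initial_time", "Forecast_time", "Level", "Ensemble", "Probability",
)


def grib_read_settings(ncep_table="Operational", time_type="Numeric",
                       single_element="None"):
    """Validate GRIB option values and return them as a dict (hypothetical helper)."""
    if ncep_table not in NCEP_TABLES:
        raise ValueError("DefaultNCEPPtable must be one of %s" % (NCEP_TABLES,))
    if time_type not in TIME_TYPES:
        raise ValueError("InitialTimeCoordinateType must be one of %s" % (TIME_TYPES,))
    if single_element not in ("None", "All"):
        # Assumed convention: a list of the individual dimension choices.
        unknown = [s for s in single_element if s not in SINGLE_ELEMENT_CHOICES]
        if unknown:
            raise ValueError("Unknown single element dimensions: %s" % unknown)
        single_element = list(single_element)
    return {
        "DefaultNCEPPtable": ncep_table,
        "InitialTimeCoordinateType": time_type,
        "SingleElementDimensions": single_element,
    }


settings = grib_read_settings(single_element=["Level", "Forecast_time"])
```

Requesting single element dimensions in this way is mainly useful when variables from several files are to be concatenated, as noted above.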
GRIB1 relies on parameter tables to match a specific octet (byte) in the GRIB record to a particular parameter name. These tables have proliferated over the years because each center uses different models and produces different quantities for analysis. The WMO mandates a single standard parameter table for table entries 1 - 127, but any originating center can legitimately define its own tables for entries 128 - 255. Furthermore, in spite of the supposed mandate, important centers such as ECMWF define their own values for the lower-numbered entries as well. Additionally the GRIB specification allows a single center, such as ECMWF, to define multiple parameter tables each specialized for different kinds of data. These are distinguished using another octet called the "parameter table version".
Since there is no central repository of GRIB parameter tables and the originating centers span the globe, it is not realistic for a single GRIB reading tool to have access to all the tables needed to parse every GRIB file; or, even assuming the necessary tables are available, to be able to decide in every case which table applies to a particular GRIB record. Consequently, PyNIO's approach is to provide built-in access to a set of parameter tables that are considered to be generally reliable, but, in addition, to allow the user to supply their own parameter tables locally as text files. Because the user-supplied tables take precedence over the built-in tables, the user ultimately controls which parameter table applies to a particular GRIB file. More detailed information about the parameter tables appears at the end of this document in the sections Built-in GRIB1 parameter tables and User-defined GRIB1 parameter tables.
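The precedence of user-supplied tables over built-in tables can be pictured as a two-stage lookup. The sketch below is illustrative only: the in-memory table layout, the function name, and the sample entries are assumptions, and PyNIO's actual table files and matching rules are described in the parameter table sections referred to above. In particular, falling back to the built-in table when a user table exists but lacks the entry is an assumption.

```python
def find_parameter(center, table_version, number, user_tables, builtin_tables):
    """Look up a parameter, consulting user-supplied tables before built-in ones.

    Each collection maps (center, table_version) to a dict that maps
    parameter numbers to (short_name, description, units) tuples.
    Returns None when no table provides the entry.
    """
    for tables in (user_tables, builtin_tables):  # user tables take precedence
        table = tables.get((center, table_version))
        if table is not None and number in table:
            return table[number]
    return None


# Illustrative entries only, not copies of real parameter tables.
builtin = {(7, 2): {11: ("TMP", "Temperature", "K")}}
user = {(7, 2): {11: ("T", "Air temperature", "K")}}

from_user = find_parameter(7, 2, 11, user, builtin)
from_builtin = find_parameter(7, 2, 11, {}, builtin)
missing = find_parameter(7, 2, 179, user, builtin)
```

When the lookup returns None, the variable would receive a generic name, which corresponds to the VAR_ prefix case in the naming algorithm for GRIB1 variables.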
Since GRIB files usually have records with the same parameter on different grids and different level types, using various time range indicators, PyNIO has introduced naming conventions that encode these distinctions to help ensure unique variable names. For example, consider the variable TMP (temperature). One GRIB file may contain many different variations of this variable. Some records might represent average temperature, others temperature differences from one time to the next, and yet others the temperature at the tropopause. Clearly, these need to be treated as different variables in the file, but GRIB gives them all the parameter number corresponding to TMP.
The following section gives the algorithm PyNIO uses to assign names to GRIB1 variables.
(Note: examples show intermediate steps in the formation of the name)
if entry matching parameter table version and parameter number is found (either in built-in or user-supplied table):
    if recognized as probability product:
        <probability_parameter_short_name>_<subject_variable_short_name> (ex: PROB_A_PCP)
    else:
        <parameter_short_name> (ex: TMP)
else:
    VAR_<parameter_number> (ex: VAR_179)
if pre-defined grid:
    _<pre-defined_grid_number> (ex: TMP_6)
else if grid defined in GDS (Grid Description Section):
    _GDS<grid_type_number> (ex: TMP_GDS4)
_<level_type_abbreviation> (ex: TMP_GDS4_ISBL)
if not statistically processed variable and not duplicate name
    the name is complete at this point.
if statistically-processed variable with constant specified statistical processing duration:
    _<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units> (ex: ACPCP_44_SFC_acc6h)
else if statistically-processed variable with no specified processing duration
    _<statistical_processing_type_abbreviation> (ex: A_PCP_192_SFC_acc)
if variable name is duplicate of existing variable name (this should not normally occur):
    _n (where n begins with 1 for first duplicate) (ex: TMP_GDS4_ISBL_1)
Notes:
- Probability products are properly recognized in version 1.2.0 or later.
- PyNIO outputs a warning message when a variable is given the unrecognized parameter prefix "VAR_".
The parameter index could be unrecognized for several reasons:
- No parameter table has been supplied for the originating center and the index is greater than 127. (The default GRIB parameter table properly applies only to indexes less than 128.)
- The index is not present in the applicable parameter table, perhaps because the table is out of date or is otherwise incorrect.
- The GRIB file has been generated incorrectly, perhaps specifying a wrong parameter table or a non-existent index.
- Pre-defined grids are enumerated in Table B of the NCEP GRIB1 documentation.
- GDS grid types are listed in Table 6 of the NCEP GRIB1 documentation.
- Level type abbreviations are taken from Table 3 of the NCEP GRIB1 documentation.
- The abbreviations corresponding to the supported statistical processing methods are:
- ave - average
- acc - accumulation
- dif - difference
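The naming algorithm above translates fairly directly into Python. The function below is a sketch written from that description, not the NIO library's implementation. Its name and parameters are hypothetical, and the caller is expected to have already resolved the short name, grid, level type, and statistical processing information from the GRIB record.

```python
def grib1_variable_name(short_name, parameter_number, level_abbrev,
                        predefined_grid=None, gds_grid_type=None,
                        stat_abbrev=None, stat_duration=None, duration_units="",
                        probability_prefix=None, existing_names=()):
    """Assemble a GRIB1 variable name following the steps listed above."""
    # Parameter part: short name from a table, or VAR_<number> if none matched.
    if short_name is None:
        name = "VAR_%d" % parameter_number
    elif probability_prefix is not None:
        name = "%s_%s" % (probability_prefix, short_name)
    else:
        name = short_name
    # Grid part: pre-defined grid number or GDS grid type.
    if predefined_grid is not None:
        name += "_%d" % predefined_grid
    elif gds_grid_type is not None:
        name += "_GDS%d" % gds_grid_type
    # Level type abbreviation.
    name += "_" + level_abbrev
    # Statistical processing suffix, with duration and units when specified.
    if stat_abbrev is not None:
        name += "_" + stat_abbrev
        if stat_duration is not None:
            name += "%d%s" % (stat_duration, duration_units)
    # Duplicate handling: append _1, _2, ... until the name is unused.
    if name in existing_names:
        n = 1
        while "%s_%d" % (name, n) in existing_names:
            n += 1
        name = "%s_%d" % (name, n)
    return name


example = grib1_variable_name("ACPCP", 63, "SFC", predefined_grid=44,
                              stat_abbrev="acc", stat_duration=6,
                              duration_units="h")
```

The call reproduces the example ACPCP_44_SFC_acc6h from the algorithm: a 6-hour accumulation at the surface on pre-defined grid 44.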
GRIB1 grids
A single GRIB record generally contains data on a two-dimensional horizontal grid locating variables on the surface of the globe. This grid forms the two rightmost (occasionally three rightmost) dimensions of a PyNIO GRIB variable.
GRIB1 has two systems for defining these horizontal grids. There is a set of "pre-defined" grids that are indexed by a single octet (byte) in the GRIB record. These are completely specified as to the type of projection, number of grid points, and extent. Since these are by definition unique, dimensions and coordinates that specify these grids do not use the dimension number in their names. The other system uses the GDS (Grid Description Section) to assign a general grid type (e.g. Lambert Conformal) and to specify the projection parameters and number of grid points. In this case, the grid type is not sufficient to uniquely specify the grid and therefore the unique dimension number is appended to the dimensions and coordinate variables.
The grid dimensions, coordinate variables, and associated auxiliary rotation variables are named as follows:
if grid is pre-defined and has single-dimensioned coordinates:
    dimensions and coordinates: lat_<pre-defined_grid_number>, lon_<pre-defined_grid_number> (ex: lat_30, lon_30)
else if grid is pre-defined:
    dimensions: gridx_<pre-defined_grid_number>, gridy_<pre-defined_grid_number> (ex: gridx_6, gridy_6)
    coordinates: gridlat_<pre-defined_grid_number>, gridlon_<pre-defined_grid_number> (ex: gridlat_6, gridlon_6)
    auxiliary rotation variable: gridrot_<pre-defined_grid_number> (ex: gridrot_6)
else if GDS grid has single-dimensioned coordinates:
    dimensions and coordinates: g<grid_type_number>_lat_<dimension_number>, g<grid_type_number>_lon_<dimension_number> (ex: g0_lat_7, g0_lon_8)
else if GDS grid type is 50 (Spherical Harmonic Coefficients)
    dimensions: real_imaginary, g50_lat_<dimension_number>, g50_lon_<dimension_number> (ex: real_imaginary, g50_lat_1, g50_lon_2)
else if GDS grid type is 203 (Arakawa semi-staggered E-Grid)
    mass dimensions: g203m_x_<dimension_number>, g203m_y_<dimension_number> (ex: g203m_x_0, g203m_y_1)
    mass coordinates: g203m_lat_<dimension_number>, g203m_lon_<dimension_number> (ex: g203m_lat_0, g203m_lon_1)
    velocity dimensions: g203v_x_<dimension_number>, g203v_y_<dimension_number> (ex: g203v_x_5, g203v_y_6)
    velocity coordinates: g203v_lat_<dimension_number>, g203v_lon_<dimension_number> (ex: g203v_lat_5, g203v_lon_6)
else (other GDS grid):
    dimensions: g<grid_type_number>_x_<dimension_number>, g<grid_type_number>_y_<dimension_number> (ex: g10_x_5, g10_y_6)
    coordinates: g<grid_type_number>_lat_<dimension_number>, g<grid_type_number>_lon_<dimension_number> (ex: g10_lat_5, g10_lon_6)
    auxiliary rotation variable: g<grid_type_number>_rot_<dimension_number> (ex: g10_rot_6)
Notes:
- The dimensions of two-dimensional coordinates have the dimension labelled 'y' as the right-most (fastest-varying) dimension. PyNIO's GRIB2 reader changed this convention to use 'x' as the indicator of the right-most dimension. The change was intended to be more in accord with the GRIB documentation as well as with conventional usage.
- Grids of spherical harmonic coefficients have an additional 2-element dimension called "real_imaginary". However, they do not supply coordinate variables. Also, although the dimensions have "lat" and "lon" in their names, the data cannot be placed on any kind of lat/lon grid without first being transformed through an appropriate function.
- The Arakawa semi-staggered E-grid has two grids, each offset slightly from the other. One grid locates the "mass" data points, the other the "velocity" points. PyNIO assigns the velocity grid to variables that it believes are of vector type. However, if this assignment seems incorrect, users can freely decide to use either grid for any variable as they think appropriate.
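As a rough illustration of the naming scheme above, the following Python sketch builds the dimension and coordinate names for the pre-defined and general GDS cases. The function grib1_grid_names and its arguments are hypothetical and are not part of PyNIO; the special cases for grid types 50 and 203 are left out.

```python
def grib1_grid_names(predefined_grid=None, grid_type=None, dim_numbers=None,
                     one_d_coords=True):
    """Return (dimension names, coordinate names) for a GRIB1 horizontal grid.

    Hypothetical helper, not part of PyNIO. Grid types 50 and 203 are not
    handled.
    predefined_grid: pre-defined grid number, or None for a GDS grid.
    grid_type:       GDS grid type number (GDS grids only).
    dim_numbers:     pair of dimension numbers (GDS grids only).
    """
    if predefined_grid is not None:
        n = predefined_grid
        if one_d_coords:
            # 1-D coordinates share their names with the dimensions.
            names = ["lat_%d" % n, "lon_%d" % n]
            return names, list(names)
        return (["gridx_%d" % n, "gridy_%d" % n],
                ["gridlat_%d" % n, "gridlon_%d" % n])

    # GDS grids: each dimension carries its own unique dimension number.
    first, second = dim_numbers
    coords = ["g%d_lat_%d" % (grid_type, first),
              "g%d_lon_%d" % (grid_type, second)]
    if one_d_coords:
        return list(coords), coords
    dims = ["g%d_x_%d" % (grid_type, first),
            "g%d_y_%d" % (grid_type, second)]
    return dims, coords


print(grib1_grid_names(predefined_grid=30))
print(grib1_grid_names(grid_type=10, dim_numbers=(5, 6), one_d_coords=False))
```

Note that a pre-defined grid reuses one grid number for both names, whereas a GDS grid uses a separate dimension number for each dimension.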
GRIB1 levels
If a variable exists on more than one vertical level, it will have a level dimension. As stated earlier, depending on level type, a level may be specified as a single value, or using two bounding values. For hybrid level types, each GRIB record encodes the coefficients required to convert the hybrid levels into pressure level values. These appear as auxiliary variables. The actual level coordinate in this case is just an integer index. However, it specifies how to compute the pressure levels through the CF-compliant attribute formula_terms. The naming scheme for level dimensions, coordinates, and auxiliary variables is as follows:
    if single level value:
        if level type is not hybrid level:
            dimensions and coordinates:
                lv_<level_type_abbreviation><dimension_number> (ex: lv_ISBL7)
        else:
            if hybrid coefficients given at level midpoints (DWD convention):
                dimensions and coordinates:
                    lv_HYBL<dimension_number> (ex: lv_HYBL9)
                auxiliary hybrid vertical coordinates (includes scalar parameterization values as attributes):
                    lv_HYBL<dimension_number>_vc (ex: lv_HYBL9_vc)
            else if hybrid coefficients given at level midpoints:
                dimensions and coordinates:
                    lv_HYBL<dimension_number> (ex: lv_HYBL0)
                auxiliary hybrid A coefficient:
                    lv_HYBL<dimension_number>_a (ex: lv_HYBL0_a)
                auxiliary hybrid B coefficient:
                    lv_HYBL<dimension_number>_b (ex: lv_HYBL0_b)
                auxiliary scalar reference pressure:
                    P0
            else if hybrid coefficients given at level boundaries:
                dimensions and coordinates:
                    lv_HYBL<dimension_number> (ex: lv_HYBL0)
                auxiliary boundary interface dimension (sized one greater than number of levels):
                    lv_HYBL_i<dimension_number> (ex: lv_HYBL_i1)
                auxiliary hybrid A boundary interface coefficient:
                    lv_HYBL_i<dimension_number>_a (ex: lv_HYBL_i1_a)
                auxiliary hybrid B boundary interface coefficient:
                    lv_HYBL_i<dimension_number>_b (ex: lv_HYBL_i1_b)
                auxiliary hybrid a coefficient (derived as average of adjacent interface coefficients):
                    lv_HYBL<dimension_number>_a (ex: lv_HYBL0_a)
                auxiliary hybrid b coefficient (derived as average of adjacent interface coefficients):
                    lv_HYBL<dimension_number>_b (ex: lv_HYBL0_b)
                auxiliary scalar reference pressure:
                    P0
    else (if lower and upper boundary level values):
        dimensions:
            lv_<level_type_abbreviation><dimension_number> (ex: lv_DBLY8)
        auxiliary lower boundary level values:
            lv_<level_type_abbreviation><dimension_number>_l0 (ex: lv_DBLY8_l0)
        auxiliary upper boundary level values:
            lv_<level_type_abbreviation><dimension_number>_l1 (ex: lv_DBLY8_l1)
Notes:
- Level type abbreviations are taken from Table 3 of the NCEP GRIB1 documentation.
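The relationship between interface and midpoint hybrid coefficients, and the way they combine into pressures, can be sketched as follows. This is a minimal illustration using plain Python lists, not PyNIO code. The function names and the sample values are made up, and the formula p = a*P0 + b*ps is the CF hybrid sigma-pressure form, which is assumed here (not confirmed by this document) to be what the formula_terms attribute describes.

```python
def midpoint_coefficients(interface_values):
    """Average adjacent interface coefficients to get one value per level."""
    return [(lower + upper) / 2.0
            for lower, upper in zip(interface_values[:-1], interface_values[1:])]


def hybrid_pressures(a, b, p0, surface_pressure):
    """Pressure at each hybrid level, assuming p = a*p0 + b*ps."""
    return [a_k * p0 + b_k * surface_pressure for a_k, b_k in zip(a, b)]


# Made-up interface coefficients for a 3-level column (4 interfaces),
# standing in for variables such as lv_HYBL_i1_a and lv_HYBL_i1_b.
a_interface = [0.0, 0.1, 0.2, 0.0]
b_interface = [0.0, 0.2, 0.6, 1.0]

# Midpoint values, standing in for lv_HYBL0_a and lv_HYBL0_b.
a_mid = midpoint_coefficients(a_interface)
b_mid = midpoint_coefficients(b_interface)

# P0 and the surface pressure are in Pa in this example.
pressures = hybrid_pressures(a_mid, b_mid, p0=100000.0, surface_pressure=98000.0)
print(pressures)
```

The interface arrays are one element longer than the level arrays, which matches the separate interface dimension described above.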
    dimensions and coordinates:
        forecast_time<dimension_number> (ex: forecast_time3)
Notes:
- Versions of PyNIO prior to version 1.2.0 may incorrectly specify the forecast_time units as hours when they are not. These same previous versions of PyNIO do not properly handle GRIB files with forecast_time units that vary for a single variable.
    if NioOption InitialTimeCoordinateType is set to "Numeric" (default):
        dimensions and coordinates (units of hours since 1800-01-01 00:00):
            initial_time<dimension_number>_hours (ex: initial_time0_hours)
        auxiliary string representation:
            initial_time<dimension_number> (ex: initial_time0)
        auxiliary encoded double representation (units of yyyymmddhh.hh_frac):
            initial_time<dimension_number>_encoded (ex: initial_time0_encoded)
    else if NioOption InitialTimeCoordinateType is set to "String":
        dimensions and coordinates:
            initial_time<dimension_number> (ex: initial_time0)
        auxiliary numeric representation (units of hours since 1800-01-01 00:00):
            initial_time<dimension_number>_hours (ex: initial_time0_hours)
        auxiliary encoded double representation (units of yyyymmddhh.hh_frac):
            initial_time<dimension_number>_encoded (ex: initial_time0_encoded)
    dimensions and coordinates:
        ensemble<dimension_number> (ex: ensemble2)
    auxiliary informational string array:
        ensemble<dimension_number>_info (ex: ensemble2_info)
    dimensions and coordinates:
        probability<dimension_number> (ex: probability4)
Available in version 1.2.0 or later.
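The two numeric initial_time representations described above (hours since 1800-01-01 00:00 and the encoded double) can be related to an ordinary Python datetime as in the sketch below. The helper names are hypothetical, and the reading of "yyyymmddhh.hh_frac" as the digits yyyymmddhh followed by the fraction of the hour is an assumption.

```python
from datetime import datetime

# Reference time used by the numeric initial_time coordinate.
EPOCH = datetime(1800, 1, 1, 0, 0)


def hours_since_1800(t):
    """Hours elapsed between 1800-01-01 00:00 and the datetime t."""
    return (t - EPOCH).total_seconds() / 3600.0


def encoded_time(t):
    """Encode t as yyyymmddhh plus the fraction of the hour (assumed layout)."""
    hour_fraction = (t.minute * 60 + t.second) / 3600.0
    return (t.year * 1000000 + t.month * 10000 + t.day * 100
            + t.hour + hour_fraction)


t = datetime(2005, 6, 1, 12, 30)
print(hours_since_1800(t))
print(encoded_time(t))
```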
PyNIO uses the GRIB2 g2clib encoder/decoder library from NCEP to perform the low-level decoding of GRIB2 files. Since this library is limited to processing files smaller than 2 GB, PyNIO can only read GRIB2 files below that size.
PyNIO distinguishes records that need to be classified into separate variables based on differences in the following characteristics:
- originating center
- production status
- type of processed data
- significance of the reference (initial) time
- product definition template number
- product discipline
- parameter category
- parameter number
- type of statistical processing
- statistical processing duration
- first fixed surface (level) type
- second fixed surface (level) type
- probability type
- grid definition template number
- grid parameters (contents of grid template)
PyNIO forms a name for each unique variable it discovers based on the characteristics that usually vary within a file. Some characteristics, such as the originating center, are normally constant for all the records in a single GRIB file (or, for that matter, in a set of conforming files) and are therefore not encoded into the name. Most of the information required to establish the name is contained in the GRIB2 code tables that are supplied with PyNIO. In particular, the initial prefix (the "short name") of the variable is determined by looking up three characteristics: product discipline, parameter category, and parameter number. Files produced for the TIGGE project use a slightly different scheme that considers other characteristics, such as the type of statistical processing, to determine the initial prefix for the variable. Because this scheme is more complex, a table embedded in the source code is currently used to determine TIGGE name prefixes. Unlike the prefixes in the tables supplied by NCEP, TIGGE prefixes are conventionally in lower case. The following section gives the algorithm for encoding data variable names in GRIB2.
    (Note: examples show intermediate steps in the formation of the name)
    if production status is TIGGE test or operational and matches entry in TIGGE table:
        <parameter_short_name> (ex: t)
    else if entry matching product discipline, parameter category, and parameter number is found:
        <parameter_short_name> (ex: TMP)
    else:
        VAR_<product_discipline_number>_<parameter_category_number>_<parameter_number> (ex: VAR_3_0_9)
    _P<product_definition_template_number> (ex: TMP_P0)
    if single level type:
        _L<level_type_number> (ex: TMP_P0_L103)
    else if two levels of the same type:
        _2L<level_type_number> (ex: TMP_P0_2L106)
    else if two levels of different types:
        _2L<first_level_type_number>_<second_level_type_number> (ex: LCLD_P0_2L212_213)
    if grid type is supported (fully or partially):
        _G<grid_abbreviation><grid_number> (ex: UGRD_P0_L108_GLC0)
    else:
        _G<grid_number> (ex: UGRD_P0_2L104_G0)
    if not statistically processed variable and not duplicate name:
        the name is complete at this point.
    if statistically-processed variable and constant statistical processing duration:
        if statistical processing type is defined:
            _<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units> (ex: APCP_P8_L1_GLL0_acc3h)
        else:
            _<statistical_processing_duration><duration_units> (ex: TMAX_P8_L103_GCA0_6h)
    else if statistically-processed variable and variable-duration processing always begins at initial time:
        _<statistical_processing_type_abbreviation> (ex: ssr_P11_GCA0_acc)
    if variable name is duplicate of existing variable name (this should not normally occur):
        _n (where n begins with 1 for first duplicate) (ex: TMAX_P8_L103_GCA0_6h_1)
The fully or partially supported grid type abbreviations are:
- LL - Latitude/Longitude
- RLL - Rotated Latitude/Longitude
- ME - Mercator
- ST - Polar Stereographic
- LC - Lambert Conformal
- GA - Gaussian Latitude/Longitude
- SH - Spherical Harmonic Coefficients
- SV - Space View Perspective or Orthographic (partial support)
The statistical processing type abbreviations are:
- avg - average
- acc - accumulation
- max - maximum
- min - minimum
- dife - difference (end - beginning)
- rms - root mean square
- std - standard deviation
- cov - covariance
- difb - difference (beginning - end)
- rat - ratio
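The GRIB2 naming algorithm above can be sketched as a small Python function that assembles a name from characteristics that have already been decoded. This is only an illustration of the rules as stated in this document, not PyNIO's implementation: the function grib2_variable_name and all of its parameter names are hypothetical, and the table lookups that produce the prefix and the abbreviations are assumed to have been done beforehand.

```python
def grib2_variable_name(prefix, pdt_number, level_types=(), grid_number=0,
                        grid_abbreviation=None, stat_abbreviation=None,
                        stat_duration=None, duration_units="h",
                        existing_names=()):
    """Assemble a GRIB2 variable name (hypothetical helper, not PyNIO code).

    prefix is the short name, or the VAR_<discipline>_<category>_<number>
    fallback when no table entry was found.
    """
    name = "%s_P%d" % (prefix, pdt_number)

    # Level suffix: none, one level type, or two level types.
    if len(level_types) == 1:
        name += "_L%d" % level_types[0]
    elif len(level_types) == 2:
        first, second = level_types
        if first == second:
            name += "_2L%d" % first
        else:
            name += "_2L%d_%d" % (first, second)

    # Grid suffix: the abbreviation appears only for supported grid types.
    name += "_G%s%d" % (grid_abbreviation or "", grid_number)

    # Statistical processing suffix.
    if stat_duration is not None:
        # Constant duration, with or without a defined processing type.
        name += "_%s%d%s" % (stat_abbreviation or "", stat_duration,
                             duration_units)
    elif stat_abbreviation is not None:
        # Variable duration that always begins at the initial time.
        name += "_" + stat_abbreviation

    # Duplicates get _1, _2, ... appended.
    if name in existing_names:
        n = 1
        while "%s_%d" % (name, n) in existing_names:
            n += 1
        name = "%s_%d" % (name, n)
    return name


print(grib2_variable_name("APCP", 8, (1,), 0, "LL", "acc", 3))
print(grib2_variable_name("UGRD", 0, (104, 104)))
```

Keeping the lookups outside the function mirrors the text: the prefix comes from the code tables (or the TIGGE table), while the suffixes are derived mechanically from the record's characteristics.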
GRIB2 grids
Unlike GRIB1, GRIB2 uses only one system, a "Grid Definition Template", for specifying the parameters of a grid. PyNIO labels all the dimensions and coordinate variables belonging to a particular grid with a single grid number, which is incremented sequentially as new grids are encountered. PyNIO also uses the same naming scheme for all GRIB2 grid types. As a result, PyNIO's system of naming GRIB2 grid dimensions and variables is considerably simpler than the GRIB1 scheme. The grid dimensions, coordinate variables, and associated auxiliary rotation variables are named as follows:
    if grid has single-dimensioned coordinates:
        dimensions and coordinates:
            lat_<grid_number>, lon_<grid_number> (ex: lat_4, lon_4)
    else if grid type is Spherical Harmonic Coefficients:
        dimensions:
            real_imaginary, lat_<grid_number>, lon_<grid_number> (ex: real_imaginary, lat_1, lon_1)
    else (2D coordinates):
        dimensions:
            ygrid_<grid_number>, xgrid_<grid_number> (ex: ygrid_4, xgrid_4)
        coordinates:
            gridlat_<grid_number>, gridlon_<grid_number> (ex: gridlat_4, gridlon_4)
        auxiliary rotation variable:
            gridrot_<grid_number> (ex: gridrot_4)
Notes:
- GRIB1 uses the label 'y' as a designator for the right-most (fastest-varying) dimension of grids with 2D coordinates. PyNIO's GRIB2 reader changes this convention to use 'x' as the indicator of the right-most dimension. The change is intended to be more in accord with the GRIB documentation as well as with conventional usage.
- Grids of spherical harmonic coefficients have an additional 2-element dimension called "real_imaginary". However, they do not supply coordinate variables. Also, although the dimensions have "lat" and "lon" in their names, the data cannot be placed on any kind of lat/lon grid without first being transformed through an appropriate function.
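Because every GRIB2 grid is labelled with a single grid number, the names can be generated with very little logic. The sketch below is hypothetical (the function grib2_grid_names and its kind argument are not part of PyNIO) and simply restates the scheme above in Python.

```python
def grib2_grid_names(grid_number, kind="1d"):
    """Names PyNIO would be expected to use for a GRIB2 grid.

    Hypothetical helper. kind is "1d" (single-dimensioned coordinates),
    "spectral" (spherical harmonic coefficients), or "2d".
    """
    n = grid_number
    if kind == "1d":
        names = ["lat_%d" % n, "lon_%d" % n]
        return {"dimensions": names, "coordinates": list(names), "rotation": None}
    if kind == "spectral":
        # Spherical harmonic grids supply no coordinate variables.
        return {"dimensions": ["real_imaginary", "lat_%d" % n, "lon_%d" % n],
                "coordinates": [], "rotation": None}
    # 2D coordinates: 'x' labels the right-most (fastest-varying) dimension.
    return {"dimensions": ["ygrid_%d" % n, "xgrid_%d" % n],
            "coordinates": ["gridlat_%d" % n, "gridlon_%d" % n],
            "rotation": "gridrot_%d" % n}


print(grib2_grid_names(4, kind="2d"))
```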
GRIB2 levels
GRIB2 level dimensions and coordinates have names that are very similar to GRIB1 level names. One difference is that no auxiliary variables are currently supplied for hybrid levels. As with GRIB1, depending on level type, a level may be specified as a single value, or using two bounding values. The naming scheme for level dimensions, coordinates, and auxiliary variables is as follows:
    if single level value:
        dimensions and coordinates:
            lv_<level_type_abbreviation><level_dimension_number> (ex: lv_ISBL7)
    else (if lower and upper boundary level values):
        dimensions:
            lv_<level_type_abbreviation><level_dimension_number> (ex: lv_DBLY8)
        auxiliary lower boundary level values:
            lv_<level_type_abbreviation><level_dimension_number>_l0 (ex: lv_DBLY8_l0)
        auxiliary upper boundary level values:
            lv_<level_type_abbreviation><level_dimension_number>_l1 (ex: lv_DBLY8_l1)
Notes:
- GRIB2 level type abbreviations are taken from Table 4.5 of the NCEP GRIB2 documentation.
GRIB2 forecast_time, initial_time, ensemble, and probability
The remaining GRIB2 dimensions and coordinate variables are handled exactly like the corresponding GRIB1 dimensions and coordinate variables, except that each dimension type is separately counted and numbered. Refer to the corresponding GRIB1 sections for details:
http://wesley.wwb.noaa.gov/wgrib.html). The ECMWF tables are available at http://www.ecmwf.int/publications/manuals/libraries/tables/tables_index.html. The FSL tables were obtained from private contacts at NOAA's Forecast Systems Laboratory in Boulder. Thanks also to Wenchieh Yen of the Institut für Physik der Atmosphäre, who provided new and updated table versions for DWD. The supported tables are:
- US Weather Service - NMC Operational Table
- US Weather Service - NMC Reanalysis Table
- US Weather Service - NCEP Oceanographic Parameters (Parameter table version 128) (updated in version 1.3.0b5)
- US Weather Service - NCEP (Parameter table version 129) (updated in version 1.3.0b5)
- US Weather Service - NCEP Land Modeling and Land Data Assimilation (Parameter table version 130) (updated in version 1.3.0b5)
- US Weather Service - NCEP North American Regional Reanalysis (Parameter table version 131) (updated in version 1.3.0b5)
- US Weather Service - NCEP (Parameter table version 133) (available in versions 1.3.0b5 or later.)
- US Weather Service - NCEP Aviation World Area Forecast/ICAO (Parameter table version 140) (available in versions 1.3.0b5 or later.)
- US Weather Service - NCEP (Parameter table version 141) (available in versions 1.3.0b5 or later.)
- NOAA FSL
- NOAA FSL/FRD Regional Analysis and Prediction Branch
- NOAA FSL/FRD Local Analysis and Prediction Branch
- US Navy - Fleet Numerical Meteorology and Oceanography Center
- ECMWF Parameter table version 128
- ECMWF Parameter table version 129
- ECMWF Parameter table version 130
- ECMWF Parameter table version 131
- ECMWF Parameter table version 132
- ECMWF Parameter table version 133 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 140
- ECMWF Parameter table version 150
- ECMWF Parameter table version 151
- ECMWF Parameter table version 160
- ECMWF Parameter table version 162
- ECMWF Parameter table version 170
- ECMWF Parameter table version 171 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 172 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 173 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 174 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 175 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 180
- ECMWF Parameter table version 190
- ECMWF Parameter table version 200
- ECMWF Parameter table version 201 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 210 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 211 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 228 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 230 (available in versions 1.3.0b5 or later.)
- ECMWF Parameter table version 234 (available in versions 1.3.0b5 or later.)
- Offenbach (DWD) Parameter table version 2
- Offenbach (DWD) Parameter table version 201
- Offenbach (DWD) Parameter table version 202
- Offenbach (DWD) Parameter table version 203
- Offenbach (DWD) Parameter table version 204
- Offenbach (DWD) Parameter table version 205
- Offenbach (DWD) Parameter table version 206
- Offenbach (DWD) Parameter table version 207
- Brazilian Space Agency - INPE/CPTEC Parameter table version 254
- Japanese Meteorological Agency Parameter table version 3
A line beginning with a '!' (optionally preceded by whitespace) is a comment. The separator string is adjustable: it may be set to any non-alphanumeric, non-whitespace character, or it may simply be two spaces in a row. It is determined from what follows immediately after the first table header row signal index (-1). Any amount of whitespace may surround the separator string. There is no way to escape the separator string, so it must be set to something that is not used in any of the informational fields.
The table header row consists of the signal index (-1) followed by 3 required fields: center, subcenter, and parameter table version. The parameter table that follows will be used for any GRIB record whose center, subcenter, and table version match what is specified in the table header row, overriding any built-in parameter tables that would otherwise match. Setting any of these fields to '-1' results in an automatic match for that field. If you set all three fields to '-1', the table will be used for all GRIB files. If two or more tables match a particular GRIB record equally well, it is undefined which will be used.
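The matching rule, with '-1' acting as a wildcard, can be expressed in a few lines of Python. The function header_matches is a hypothetical illustration and is not part of PyNIO.

```python
def header_matches(header, center, subcenter, table_version):
    """Return True if a table header applies to a GRIB record.

    header is the (center, subcenter, table_version) triple taken from a
    table header row; a value of -1 in any position matches anything.
    """
    record = (center, subcenter, table_version)
    return all(h == -1 or h == r for h, r in zip(header, record))


print(header_matches((98, 0, 128), 98, 0, 128))
print(header_matches((-1, -1, -1), 7, 4, 2))
print(header_matches((98, 0, 128), 7, 0, 128))
```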
Parameter rows consist of the parameter index followed by an abbreviation suitable for use as a PyNIO file variable name, then a string representing units, and finally the long name (a short description of the variable). The abbreviation must contain only alphanumeric characters and/or the underscore character. Index values that are not defined need not be included.
Here is a sample portion of a valid parameter table file:
    -1 : 98 : 0 : 128
    018 : T : K : Temperature
    019 : Z : m2/s2 : Surface geopotential
    020 : GP : kg/s2 : Geopotential
    021 : U_TE : J/ms : Total energy u-flux
    022 : V_TE : J/ms : Total energy v-flux
    081 : U_KE : J/ms : Kinetic energy u-flux
    082 : V_KE : J/ms : Kinetic energy v-flux
The table header row indicates center 98, subcenter 0, parameter table version 128. According to the list of originating centers at http://www.nco.ncep.noaa.gov/pmb/docs/on388/table0.html in the GRIB documentation, center 98 is the European Center for Medium-Range Weather Forecasts (ECMWF) in Reading. This is in fact an alternate version of the standard ECMWF parameter table version 128, required to decode a specific GRIB dataset.
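To make the file format concrete, here is a minimal Python sketch of a reader for such parameter table files. It is not PyNIO's parser: the function names are hypothetical, blank lines are skipped as a convenience, and a '-' directly after the signal index is treated as the start of a negative number rather than as a separator, which is a simplifying assumption.

```python
import re


def split_fields(line, separator):
    """Split a row on the separator, trimming surrounding whitespace."""
    if separator == "  ":
        # Two-space form: any run of two or more whitespace characters.
        return re.split(r"\s{2,}", line.strip())
    return [field.strip() for field in line.split(separator)]


def parse_parameter_tables(text):
    """Parse parameter table text into a list of tables (sketch only)."""
    tables = []
    separator = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("!"):
            continue  # blank line (assumed harmless) or comment
        if separator is None:
            if not line.startswith("-1"):
                raise ValueError("expected a header row starting with -1")
            rest = line[2:].lstrip()
            # A punctuation character right after -1 is the separator;
            # otherwise assume the two-spaces-in-a-row form.
            if rest and not rest[0].isalnum() and rest[0] != "-":
                separator = rest[0]
            else:
                separator = "  "
        fields = split_fields(line, separator)
        if fields[0] == "-1":
            center, subcenter, version = (int(f) for f in fields[1:4])
            tables.append({"header": (center, subcenter, version),
                           "parameters": {}})
        else:
            abbreviation, units, long_name = fields[1:4]
            tables[-1]["parameters"][int(fields[0])] = (
                abbreviation, units, long_name)
    return tables


sample = """\
! alternate ECMWF table 128
-1 : 98 : 0 : 128
018 : T : K : Temperature
019 : Z : m2/s2 : Surface geopotential
"""
tables = parse_parameter_tables(sample)
print(tables[0]["header"], tables[0]["parameters"][18])
```

The separator is detected once, from the first header row, and then applied to every following row, as the format description requires.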
OGR - Open Geospatial Consortium's Simple Features formats (Shapefile, MapInfo, GMT, TIGER)
Online documentation for OGR is available at http://gdal.org/ogr/index.html.
As of version 1.4.0, PyNIO optionally provides access to the GDAL/OGR library in order to read various OGR file formats. OGR is an implementation of the Open Geospatial Consortium's "Simple Features" API. The specific formats supported in PyNIO are ESRI Shapefile, MapInfo, GMT, and classic TIGER.
Data in these formats generally consist of a set of files that together encode the geometry, non-spatial attributes, cartographic projection, indices, etc. Although these formats are identified to PyNIO by the suffix of a specific file in the set, it is important that the complete set of files is available. For example, the suffix for Shapefiles is .shp; however, a Shapefile properly consists of four or more files that share a common base name: the .shp file plus companion files with .dbf, .shx, and .prj suffixes.
To the user, OGR data is viewed as a set of features that consist of geometry and other non-spatial fields. Each non-spatial field in a given OGR file is mapped directly to a PyNIO variable. Non-spatial fields exist in a 1:1 relationship with features, so that the data for the ith feature is found at the ith element of the PyNIO variable.
The geometry for a feature can be complex, however: a feature might consist of one or more line segments, which in turn may represent polylines or polygons, etc. Because the relationship of geometry to features is not a simple 1:1 correspondence, the following convention has been adopted to encode geometry into specific PyNIO variables.
For each feature in the OGR file, there is an entry in a PyNIO variable named geometry. Each entry contains 2 items:
- geometry[i, 0] = index into the segments variable of the first segment for the ith feature
- geometry[i, 1] = the number of segments belonging to the ith feature
The PyNIO variable segments contains a partially ordered list of segments (individual points, polylines, polygons). There are one or more entries in this table per feature; geometry points to the first such segment of a feature, and any additional segments directly follow. Entries in segments in turn point to the actual xy(z) coordinates:
- segments[j, 0] = index into the x, y, and optional z arrays of the first point of the jth segment
- segments[j, 1] = number of points making up the jth segment
The coordinates of the geometry are stored in the x, y, and optional z variables. These are one-dimensional arrays that contain partially ordered lists of coordinates that make up the individual segments.
Several global attributes are defined for convenience. The layer_name attribute contains the name of the features, as extracted from the OGR file. The geometry_type attribute denotes whether the feature geometry is composed of points, polylines, or polygons. The remaining attributes are intended as symbolic indices into the second index of the geometry and segments variables, and should be used in preference to hard-coding indices; for example, segs_numPnts (=1). OGR file variables do not contain coordinate variables or variable attributes, as the GDAL/OGR implementation does not expose information that could be used as such.
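The two-level indirection from features to segments to coordinates can be followed with a short loop. The sketch below uses small made-up lists in place of the geometry, segments, x, and y variables read from a file, and the function feature_segments is hypothetical. It hard-codes the column numbers 0 and 1 for brevity; with a real file, the symbolic index attributes should be used instead, as recommended above.

```python
def feature_segments(i, geometry, segments, x, y):
    """Return the segments of feature i as lists of (x, y) points.

    Works with nested lists or with 2-D array values, since both accept
    the geometry[i][0] style of indexing used here.
    """
    first_segment = int(geometry[i][0])
    segment_count = int(geometry[i][1])
    result = []
    for j in range(first_segment, first_segment + segment_count):
        first_point = int(segments[j][0])
        point_count = int(segments[j][1])
        stop = first_point + point_count
        result.append(list(zip(x[first_point:stop], y[first_point:stop])))
    return result


# Made-up data: feature 0 has one 3-point segment,
# feature 1 has two 2-point segments.
geometry = [[0, 1], [1, 2]]
segments = [[0, 3], [3, 2], [5, 2]]
x = [0.0, 1.0, 1.0, 5.0, 6.0, 7.0, 8.0]
y = [0.0, 0.0, 1.0, 5.0, 5.0, 7.0, 7.0]

for i in range(len(geometry)):
    print(i, feature_segments(i, geometry, segments, x, y))
```

Because the segments of a feature are stored contiguously, only the first index and a count are needed at each level.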
- Incomplete mapping of database types. For shapefiles in particular, certain types in the associated .dbf database have not yet been mapped to a suitable NumPy type, e.g., timestamps, binary data, variable-length data, etc. In the present implementation, non-spatial fields of these types are disregarded and no corresponding PyNIO variable is created.