面冷心慈的饭盒 · 线程池 | 现代C++并发编程教程· 4 周前 · |
淡定的钥匙 · 【NASA每日一图】Messier 88 ...· 2 月前 · |
越狱的大脸猫 · 12000字分析万青新作-虎嗅网· 3 月前 · |
痴情的啄木鸟 · 辽国“女皇”萧太后堪称“铁娘子”,嫁过六位帝 ...· 5 月前 · |
憨厚的皮蛋 · aoa官网app(中国)有限公司· 5 月前 · |
netcdf4-python is a Python interface to the netCDF C library.
netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5 . This module can read and write files in both the new netCDF 4 and the old netCDF 3 format, and can create files that are readable by HDF5 clients. The API modelled after Scientific.IO.NetCDF , and should be familiar to users of that module.
Most new features of netCDF 4 are implemented, such as multiple unlimited dimensions, groups and data compression. All the new numeric data types (such as 64 bit and unsigned integer types) are implemented. Compound (struct), variable length (vlen) and enumerated (enum) data types are supported, but not the opaque data type. Mixtures of compound, vlen and enum data types (such as compound types containing enums, or vlens containing compound types) are not supported.
pip install
netCDF4
.
(or if you use the
conda
package manager
conda install -c conda-forge netCDF4
).
git submodule update --init
to
ensure all the submodules are also checked out.
nc-config
(installed with netcdf-c)
will be run used to determine where all the dependencies live.
nc-config
is not in your default
PATH
, you can set the
NETCDF4_DIR
environment variable and
setup.py
will look in
$NETCDF4_DIR/bin
.
You can also use the file
setup.cfg
to set the path to
nc-config
, or
enter the paths to the libraries and include files manually. Just
edit the
setup.cfg
file
in a text editor and follow the instructions in the comments.
To disable the use of
nc-config
, set the env var
USE_NCCONFIG
to 0.
To disable the use of
setup.cfg
, set
USE_SETUPCFG
to 0.
As a last resort, the library and include paths can be set via environment variables.
If you go this route, set
USE_NCCONFIG
and
USE_SETUPCFG
to 0, and specify
NETCDF4_LIBDIR
,
NETCDF4_INCDIR
,
HDF5_LIBDIR
and
HDF5_INCDIR
.
If the dependencies are not found
in any of the paths specified by environment variables, then standard locations
(such as
/usr
and
/usr/local
) are searched.
NETCDF_PLUGIN_DIR
is set to point to the location of the netcdf-c compression
plugins built by netcdf >= 4.9.0, they will be installed inside the package.
In this
case
HDF5_PLUGIN_PATH
will be set to the package installation path on import,
so the extra compression algorithms available in netcdf-c >= 4.9.0 will automatically
be available.
Otherwise, the user will have to set
HDF5_PLUGIN_PATH
explicitly
to have access to the extra compression plugins.
pip install -v .
(as root if necessary)
python run_all.py
.
All of the code in this tutorial is available in
examples/tutorial.py
, except
the parallel IO example, which is in
examples/mpi_example.py
.
Unit tests are in the
test
directory.
To create a netCDF file from python, you simply call the
Dataset
constructor. This is also the method used to open an existing netCDF
file.
If the file is open for write access (
mode='w', 'r+'
or
'a'
), you may
write any type of data including new dimensions, groups, variables and
attributes.
netCDF files come in five flavors (
NETCDF3_CLASSIC,
NETCDF3_64BIT_OFFSET, NETCDF3_64BIT_DATA, NETCDF4_CLASSIC<code>, and </code>NETCDF4
).
NETCDF3_CLASSIC
was the original netcdf binary format, and was limited
to file sizes less than 2 Gb.
NETCDF3_64BIT_OFFSET
was introduced
in version 3.6.0 of the library, and extended the original binary format
to allow for file sizes greater than 2 Gb.
NETCDF3_64BIT_DATA
is a new format that requires version 4.4.0 of
the C library - it extends the
NETCDF3_64BIT_OFFSET
binary format to
allow for unsigned/64 bit integer data types and 64-bit dimension sizes.
NETCDF3_64BIT
is an alias for
NETCDF3_64BIT_OFFSET
.
NETCDF4_CLASSIC
files use the version 4 disk format (HDF5), but omits features
not found in the version 3 API. They can be read by netCDF 3 clients
only if they have been relinked against the netCDF 4 library. They can
also be read by HDF5 clients.
NETCDF4
files use the version 4 disk
format (HDF5) and use the new features of the version 4 API.
netCDF4 module can read and write files in any of these formats. When
creating a new file, the format may be specified using the
format
keyword in the
Dataset
constructor.
The default format is
NETCDF4
. To see how a given file is formatted, you can examine the
data_model
attribute.
Closing the netCDF file is
accomplished via the
Dataset.close()
method of the
Dataset
instance.
Here's an example:
>>> from netCDF4 import Dataset
>>> rootgrp = Dataset("test.nc", "w", format="NETCDF4")
>>> print(rootgrp.data_model)
NETCDF4
>>> rootgrp.close()
Remote OPeNDAP-hosted datasets can be accessed for
reading over http if a URL is provided to the Dataset
constructor instead of a
filename.
However, this requires that the netCDF library be built with
OPenDAP support, via the --enable-dap
configure option (added in
version 4.0.1).
Groups in a netCDF file
netCDF version 4 added support for organizing data in hierarchical
groups, which are analogous to directories in a filesystem. Groups serve
as containers for variables, dimensions and attributes, as well as other
groups.
A Dataset
creates a special group, called
the 'root group', which is similar to the root directory in a unix
filesystem.
To create Group
instances, use the
Dataset.createGroup()
method of a Dataset
or Group
instance. Dataset.createGroup()
takes a single argument, a
python string containing the name of the new group. The new Group
instances contained within the root group can be accessed by name using
the groups
dictionary attribute of the Dataset
instance.
NETCDF4
formatted files support Groups, if you try to create a Group
in a netCDF 3 file you will get an error message.
>>> rootgrp = Dataset("test.nc", "a")
>>> fcstgrp = rootgrp.createGroup("forecasts")
>>> analgrp = rootgrp.createGroup("analyses")
>>> print(rootgrp.groups)
{'forecasts': <class 'netCDF4._netCDF4.Group'>
group /forecasts:
dimensions(sizes):
variables(dimensions):
groups: , 'analyses': <class 'netCDF4._netCDF4.Group'>
group /analyses:
dimensions(sizes):
variables(dimensions):
groups: }
Groups can exist within groups in a Dataset
, just as directories
exist within directories in a unix filesystem. Each Group
instance
has a groups
attribute dictionary containing all of the group
instances contained within that group. Each Group
instance also has a
path
attribute that contains a simulated unix directory path to
that group.
To simplify the creation of nested groups, you can
use a unix-like path as an argument to Dataset.createGroup()
.
>>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
>>> fcstgrp2 = rootgrp.createGroup("/forecasts/model2")
If any of the intermediate elements of the path do not exist, they are created,
just as with the unix command 'mkdir -p'
. If you try to create a group
that already exists, no error will be raised, and the existing group will be
returned.
Here's an example that shows how to navigate all the groups in a
Dataset
. The function walktree
is a Python generator that is used
to walk the directory tree. Note that printing the Dataset
or Group
object yields summary information about it's contents.
>>> def walktree(top):
... yield top.groups.values()
... for value in top.groups.values():
... yield from walktree(value)
>>> print(rootgrp)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes):
variables(dimensions):
groups: forecasts, analyses
>>> for children in walktree(rootgrp):
... for child in children:
... print(child)
<class 'netCDF4._netCDF4.Group'>
group /forecasts:
dimensions(sizes):
variables(dimensions):
groups: model1, model2
<class 'netCDF4._netCDF4.Group'>
group /analyses:
dimensions(sizes):
variables(dimensions):
groups:
<class 'netCDF4._netCDF4.Group'>
group /forecasts/model1:
dimensions(sizes):
variables(dimensions):
groups:
<class 'netCDF4._netCDF4.Group'>
group /forecasts/model2:
dimensions(sizes):
variables(dimensions):
groups:
Dimensions in a netCDF file
netCDF defines the sizes of all variables in terms of dimensions, so
before any variables can be created the dimensions they use must be
created first. A special case, not often used in practice, is that of a
scalar variable, which has no dimensions. A dimension is created using
the Dataset.createDimension()
method of a Dataset
or Group
instance. A Python string is used to set the name of the
dimension, and an integer value is used to set the size. To create an
unlimited dimension (a dimension that can be appended to), the size
value is set to None
or 0. In this example, there both the time
and
level
dimensions are unlimited.
Having more than one unlimited
dimension is a new netCDF 4 feature, in netCDF 3 files there may be only
one, and it must be the first (leftmost) dimension of the variable.
>>> level = rootgrp.createDimension("level", None)
>>> time = rootgrp.createDimension("time", None)
>>> lat = rootgrp.createDimension("lat", 73)
>>> lon = rootgrp.createDimension("lon", 144)
All of the Dimension
instances are stored in a python dictionary.
>>> print(rootgrp.dimensions)
{'level': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0, 'time': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0, 'lat': <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73, 'lon': <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144}
Using the python len
function with a Dimension
instance returns
current size of that dimension.
Dimension.isunlimited()
method of a Dimension
instance
be used to determine if the dimensions is unlimited, or appendable.
>>> print(len(lon))
>>> print(lon.isunlimited())
False
>>> print(time.isunlimited())
Printing the Dimension
object
provides useful summary info, including the name and length of the dimension,
and whether it is unlimited.
>>> for dimobj in rootgrp.dimensions.values():
... print(dimobj)
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
<class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73
<class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144
Dimension
names can be changed using the
Datatset.renameDimension
method of a Dataset
or
Group
instance.
Variables in a netCDF file
netCDF variables behave much like python multidimensional array objects
supplied by the numpy module. However,
unlike numpy arrays, netCDF4 variables can be appended to along one or
more 'unlimited' dimensions. To create a netCDF variable, use the
Dataset.createVariable()
method of a Dataset
or
Group
instance. The Dataset.createVariable()
j method
has two mandatory arguments, the variable name (a Python string), and
the variable datatype. The variable's dimensions are given by a tuple
containing the dimension names (defined previously with
Dataset.createDimension()
). To create a scalar
variable, simply leave out the dimensions keyword. The variable
primitive datatypes correspond to the dtype attribute of a numpy array.
You can specify the datatype as a numpy dtype object, or anything that
can be converted to a numpy dtype object.
Valid datatype specifiers
include: 'f4'
(32-bit floating point), 'f8'
(64-bit floating
point), 'i4'
(32-bit signed integer), 'i2'
(16-bit signed
integer), 'i8'
(64-bit signed integer), 'i1'
(8-bit signed
integer), 'u1'
(8-bit unsigned integer), 'u2'
(16-bit unsigned
integer), 'u4'
(32-bit unsigned integer), 'u8'
(64-bit unsigned
integer), or 'S1'
(single-character string).
The old Numeric
single-character typecodes ('f'
,'d'
,'h'
,
's'
,'b'
,'B'
,'c'
,'i'
,'l'
), corresponding to
('f4'
,'f8'
,'i2'
,'i2'
,'i1'
,'i1'
,'S1'
,'i4'
,'i4'
),
will also work. The unsigned integer types and the 64-bit integer type
can only be used if the file format is NETCDF4
.
The dimensions themselves are usually also defined as variables, called
coordinate variables. The Dataset.createVariable()
method returns an instance of the Variable
class whose methods can be
used later to access and set variable data and attributes.
>>> times = rootgrp.createVariable("time","f8",("time",))
>>> levels = rootgrp.createVariable("level","i4",("level",))
>>> latitudes = rootgrp.createVariable("lat","f4",("lat",))
>>> longitudes = rootgrp.createVariable("lon","f4",("lon",))
>>> # two dimensions unlimited
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
>>> temp.units = "K"
To get summary info on a Variable
instance in an interactive session,
just print it.
>>> print(temp)
<class 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
units: K
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.969209968386869e+36 used
You can use a path to create a Variable inside a hierarchy of groups.
>>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))
If the intermediate groups do not yet exist, they will be created.
You can also query a Dataset
or Group
instance directly to obtain Group
or
Variable
instances using paths.
>>> print(rootgrp["/forecasts/model1"]) # a Group instance
<class 'netCDF4._netCDF4.Group'>
group /forecasts/model1:
dimensions(sizes):
variables(dimensions): float32 temp(time,level,lat,lon)
groups:
>>> print(rootgrp["/forecasts/model1/temp"]) # a Variable instance
<class 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
path = /forecasts/model1
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.969209968386869e+36 used
All of the variables in the Dataset
or Group
are stored in a
Python dictionary, in the same way as the dimensions:
>>> print(rootgrp.variables)
{'time': <class 'netCDF4._netCDF4.Variable'>
float64 time(time)
unlimited dimensions: time
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used, 'level': <class 'netCDF4._netCDF4.Variable'>
int32 level(level)
unlimited dimensions: level
current shape = (0,)
filling on, default _FillValue of -2147483647 used, 'lat': <class 'netCDF4._netCDF4.Variable'>
float32 lat(lat)
unlimited dimensions:
current shape = (73,)
filling on, default _FillValue of 9.969209968386869e+36 used, 'lon': <class 'netCDF4._netCDF4.Variable'>
float32 lon(lon)
unlimited dimensions:
current shape = (144,)
filling on, default _FillValue of 9.969209968386869e+36 used, 'temp': <class 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
units: K
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.969209968386869e+36 used}
Variable
names can be changed using the
Dataset.renameVariable()
method of a Dataset
instance.
Variables can be sliced similar to numpy arrays, but there are some differences.
Writing data to and retrieving data from a netCDF variable for more details.
Attributes in a netCDF file
There are two types of attributes in a netCDF file, global and variable.
Global attributes provide information about a group, or the entire
dataset, as a whole. Variable
attributes provide information about
one of the variables in a group. Global attributes are set by assigning
values to Dataset
or Group
instance variables. Variable
attributes are set by assigning values to Variable
instances
variables. Attributes can be strings, numbers or sequences. Returning to
our example,
>>> import time
>>> rootgrp.description = "bogus example script"
>>> rootgrp.history = "Created " + time.ctime(time.time())
>>> rootgrp.source = "netCDF4 python module tutorial"
>>> latitudes.units = "degrees north"
>>> longitudes.units = "degrees east"
>>> levels.units = "hPa"
>>> temp.units = "K"
>>> times.units = "hours since 0001-01-01 00:00:00.0"
>>> times.calendar = "gregorian"
The Dataset.ncattrs()
method of a Dataset
, Group
or
Variable
instance can be used to retrieve the names of all the netCDF
attributes. This method is provided as a convenience, since using the
built-in dir
Python function will return a bunch of private methods
and attributes that cannot (or should not) be modified by the user.
>>> for name in rootgrp.ncattrs():
... print("Global attr {} = {}".format(name, getattr(rootgrp, name)))
Global attr description = bogus example script
Global attr history = Created Mon Jul 8 14:19:41 2019
Global attr source = netCDF4 python module tutorial
The __dict__
attribute of a Dataset
, Group
or Variable
instance provides all the netCDF attribute name/value pairs in a python
dictionary:
>>> print(rootgrp.__dict__)
{'description': 'bogus example script', 'history': 'Created Mon Jul 8 14:19:41 2019', 'source': 'netCDF4 python module tutorial'}
Attributes can be deleted from a netCDF Dataset
, Group
or
Variable
using the python del
statement (i.e. del grp.foo
removes the attribute foo
the the group grp
).
Writing data to and retrieving data from a netCDF variable
Now that you have a netCDF Variable
instance, how do you put data
into it? You can just treat it like an array and assign data to a slice.
>>> import numpy as np
>>> lats = np.arange(-90,91,2.5)
>>> lons = np.arange(-180,180,2.5)
>>> latitudes[:] = lats
>>> longitudes[:] = lons
>>> print("latitudes =\n{}".format(latitudes[:]))
latitudes =
[-90. -87.5 -85. -82.5 -80. -77.5 -75. -72.5 -70. -67.5 -65. -62.5
-60. -57.5 -55. -52.5 -50. -47.5 -45. -42.5 -40. -37.5 -35. -32.5
-30. -27.5 -25. -22.5 -20. -17.5 -15. -12.5 -10. -7.5 -5. -2.5
0. 2.5 5. 7.5 10. 12.5 15. 17.5 20. 22.5 25. 27.5
30. 32.5 35. 37.5 40. 42.5 45. 47.5 50. 52.5 55. 57.5
60. 62.5 65. 67.5 70. 72.5 75. 77.5 80. 82.5 85. 87.5
90. ]
Unlike NumPy's array objects, netCDF Variable
objects with unlimited dimensions will grow along those dimensions if you
assign data outside the currently defined range of indices.
>>> # append along two unlimited dimensions by assigning to slice.
>>> nlats = len(rootgrp.dimensions["lat"])
>>> nlons = len(rootgrp.dimensions["lon"])
>>> print("temp shape before adding data = {}".format(temp.shape))
temp shape before adding data = (0, 0, 73, 144)
>>> from numpy.random import uniform
>>> temp[0:5, 0:10, :, :] = uniform(size=(5, 10, nlats, nlons))
>>> print("temp shape after adding data = {}".format(temp.shape))
temp shape after adding data = (5, 10, 73, 144)
>>> # levels have grown, but no values yet assigned.
>>> print("levels shape after adding pressure data = {}".format(levels.shape))
levels shape after adding pressure data = (10,)
Note that the size of the levels variable grows when data is appended
along the level
dimension of the variable temp
, even though no
data has yet been assigned to levels.
>>> # now, assign data to levels dimension variable.
>>> levels[:] = [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]
However, that there are some differences between NumPy and netCDF
variable slicing rules. Slices behave as usual, being specified as a
start:stop:step
triplet. Using a scalar integer index i
takes the ith
element and reduces the rank of the output array by one. Boolean array and
integer sequence indexing behaves differently for netCDF variables
than for numpy arrays.
Only 1-d boolean arrays and integer sequences are
allowed, and these indices work independently along each dimension (similar
to the way vector subscripts work in fortran).
This means that
>>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
(4, 4)
returns an array of shape (4,4) when slicing a netCDF variable, but for a
numpy array it returns an array of shape (4,).
Similarly, a netCDF variable of shape (2,3,4,5)
indexed
with [0, array([True, False, True]), array([False, True, True, True]), :]
would return a (2, 3, 5)
array. In NumPy, this would raise an error since
it would be equivalent to [0, [0,1], [1,2,3], :]
. When slicing with integer
sequences, the indices need not be sorted and may contain
duplicates (both of these are new features in version 1.2.1).
While this behaviour may cause some confusion for those used to NumPy's 'fancy indexing' rules,
it provides a very powerful way to extract data from multidimensional netCDF
variables by using logical operations on the dimension arrays to create slices.
For example,
>>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
will extract time indices 0,2 and 4, pressure levels
850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern
Hemisphere longitudes, resulting in a numpy array of shape
(3, 3, 36, 71).
>>> print("shape of fancy temp slice = {}".format(tempdat.shape))
shape of fancy temp slice = (3, 3, 36, 71)
Special note for scalar variables: To extract data from a scalar variable
v
with no associated dimensions, use numpy.asarray(v)
or v[…]
.
The result will be a numpy scalar array.
By default, netcdf4-python returns numpy masked arrays with values equal to the
missing_value
or _FillValue
variable attributes masked for primitive and
enum data types.
The Dataset.set_auto_mask()
Dataset
and Variable
methods
can be used to disable this feature so that
numpy arrays are always returned, with the missing values included. Prior to
version 1.4.0 the default behavior was to only return masked arrays when the
requested slice contained missing values.
This behavior can be recovered
using the Dataset.set_always_mask()
method. If a masked array is
written to a netCDF variable, the masked elements are filled with the
value specified by the missing_value
attribute.
If the variable has
no missing_value
, the _FillValue
is used instead.
Dealing with time coordinates
Time coordinate values pose a special challenge to netCDF users.
metadata standards (such as CF) specify that time should be
measure relative to a fixed date using a certain calendar, with units
specified like hours since YY-MM-DD hh:mm:ss
.
These units can be
awkward to deal with, without a utility to convert the values to and
from calendar dates.
The functions num2date
and date2num are
provided by cftime to do just that.
Here's an example of how they can be used:
>>> # fill in times.
>>> from datetime import datetime, timedelta
>>> from cftime import num2date, date2num
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = date2num(dates,units=times.units,calendar=times.calendar)
>>> print("time values (in units {}):\n{}".format(times.units, times[:]))
time values (in units hours since 0001-01-01 00:00:00.0):
[17533104. 17533116. 17533128. 17533140. 17533152.]
>>> dates = num2date(times[:],units=times.units,calendar=times.calendar)
>>> print("dates corresponding to time values:\n{}".format(dates))
[cftime.DatetimeGregorian(2001, 3, 1, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(2001, 3, 1, 12, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(2001, 3, 2, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(2001, 3, 2, 12, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(2001, 3, 3, 0, 0, 0, 0, has_year_zero=False)]
num2date()
converts numeric values of time in the specified units
and calendar
to datetime objects, and date2num()
does the reverse.
All the calendars currently defined in the
CF metadata convention are supported.
A function called date2index()
is also provided which returns the indices
of a netCDF time variable corresponding to a sequence of datetime instances.
Reading data from a multi-file netCDF dataset
If you want to read data from a variable that spans multiple netCDF files,
you can use the MFDataset
class to read the data as if it were
contained in a single file. Instead of using a single filename to create
a Dataset
instance, create a MFDataset
instance with either a list
of filenames, or a string with a wildcard (which is then converted to
a sorted list of files using the python glob module).
Variables in the list of files that share the same unlimited
dimension are aggregated together, and can be sliced across multiple
files.
To illustrate this, let's first create a bunch of netCDF files with
the same variable (with the same unlimited dimension).
The files
must in be in NETCDF3_64BIT_OFFSET
, NETCDF3_64BIT_DATA
, NETCDF3_CLASSIC
or
NETCDF4_CLASSIC
format (NETCDF4
formatted multi-file
datasets are not supported).
>>> for nf in range(10):
... with Dataset("mftest%s.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
... _ = f.createDimension("x",None)
... x = f.createVariable("x","i",("x",))
... x[0:10] = np.arange(nf*10,10*(nf+1))
Now read all the files back in at once with MFDataset
>>> from netCDF4 import MFDataset
>>> f = MFDataset("mftest*nc")
>>> print(f.variables["x"][:])
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99]
Note that MFDataset
can only be used to read, not write, multi-file
datasets.
Efficient compression of netCDF variables
Data stored in netCDF Variable
objects can be compressed and
decompressed on the fly. The compression algorithm used is determined
by the compression
keyword argument to the Dataset.createVariable()
method.
zlib
compression is always available, szip
is available if the linked HDF5
library supports it, and zstd
, bzip2
, blosc_lz
,blosc_lz4
,blosc_lz4hc
,
blosc_zlib
and blosc_zstd
are available via optional external plugins.
The complevel
keyword regulates the
speed and efficiency of the compression for zlib
, bzip
and zstd
(1 being fastest, but lowest
compression ratio, 9 being slowest but best compression ratio). The
default value of complevel
is 4. Setting shuffle=False
will turn
off the HDF5 shuffle filter, which de-interlaces a block of data before
zlib
compression by reordering the bytes.
The shuffle filter can
significantly improve compression ratios, and is on by default if compression=zlib
.
Setting
fletcher32
keyword argument to
Dataset.createVariable()
to True
(it's False
by
default) enables the Fletcher32 checksum algorithm for error detection.
It's also possible to set the HDF5 chunking parameters and endian-ness
of the binary data stored in the HDF5 file with the chunksizes
and endian
keyword arguments to
Dataset.createVariable()
.
These keyword arguments only
are relevant for NETCDF4
and NETCDF4_CLASSIC
files (where the
underlying file format is HDF5) and are silently ignored if the file
format is NETCDF3_CLASSIC
, NETCDF3_64BIT_OFFSET
or NETCDF3_64BIT_DATA
.
If the HDF5 library is built with szip support, compression=szip
can also
be used (in conjunction with the szip_coding
and szip_pixels_per_block
keyword
arguments).
If your data only has a certain number of digits of precision (say for
example, it is temperature data that was measured with a precision of
0.1 degrees), you can dramatically improve compression by
quantizing (or truncating) the data. There are two methods supplied for
doing this.
You can use the least_significant_digit
keyword argument to Dataset.createVariable()
to specify
the power of ten of the smallest decimal place in
the data that is a reliable value. For example if the data has a
precision of 0.1, then setting least_significant_digit=1
will cause
data the data to be quantized using numpy.around(scale*data)/scale
, where
scale = 2**bits, and bits is determined so that a precision of 0.1 is
retained (in this case bits=4).
This is done at the python level and is
not a part of the underlying C library.
Starting with netcdf-c version 4.9.0,
a quantization capability is provided in the library.
This can be
used via the significant_digits
Dataset.createVariable()
kwarg (new in
version 1.6.0).
The interpretation of significant_digits
is different than least_signficant_digit
in that it specifies the absolute number of significant digits independent
of the magnitude of the variable (the floating point exponent).
Either of these approaches makes the compression
'lossy' instead of 'lossless', that is some precision in the data is
sacrificed for the sake of disk space.
In our example, try replacing the line
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
and then
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
or with netcdf-c >= 4.9.0
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
and see how much smaller the resulting files are.
Beyond homogeneous arrays of a fixed type - compound data types
Compound data types map directly to numpy structured (a.k.a 'record')
arrays.
Structured arrays are akin to C structs, or derived types
in Fortran. They allow for the construction of table-like structures
composed of combinations of other data types, including other
compound types. Compound types might be useful for representing multiple
parameter values at each point on a grid, or at each time and space
location for scattered (point) data. You can then access all the
information for a point by reading one variable, instead of reading
different parameters from different variables.
Compound data types
are created from the corresponding numpy data type using the
Dataset.createCompoundType()
method of a Dataset
or Group
instance.
Since there is no native complex data type in netcdf (but see
Support for complex numbers), compound
types are handy for storing numpy complex arrays. Here's an example:
>>> f = Dataset("complex.nc","w")
>>> size = 3 # length of 1-d complex array
>>> # create sample complex data.
>>> datac = np.exp(1j*(1.+np.linspace(0, np.pi, size)))
>>> # create complex128 compound data type.
>>> complex128 = np.dtype([("real",np.float64),("imag",np.float64)])
>>> complex128_t = f.createCompoundType(complex128,"complex128")
>>> # create a variable with this data type, write some data to it.
>>> x_dim = f.createDimension("x_dim",None)
>>> v = f.createVariable("cmplx_var",complex128_t,"x_dim")
>>> data = np.empty(size,complex128) # numpy structured array
>>> data["real"] = datac.real; data["imag"] = datac.imag
>>> v[:] = data # write numpy structured array to netcdf compound var
>>> # close and reopen the file, check the contents.
>>> f.close(); f = Dataset("complex.nc")
>>> v = f.variables["cmplx_var"]
>>> datain = v[:] # read in all the data into a numpy structured array
>>> # create an empty numpy complex array
>>> datac2 = np.empty(datain.shape,np.complex128)
>>> # .. fill it with contents of structured array.
>>> datac2.real = datain["real"]; datac2.imag = datain["imag"]
>>> print('{}: {}'.format(datac.dtype, datac)) # original data
complex128: [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>> print('{}: {}'.format(datac2.dtype, datac2)) # data from file
complex128: [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
Compound types can be nested, but you must create the 'inner'
ones first. All possible numpy structured arrays cannot be
represented as Compound variables - an error message will be
raise if you try to create one that is not supported.
All of the compound types defined for a Dataset
or Group
are stored
in a Python dictionary, just like variables and dimensions. As always, printing
objects gives useful summary information in an interactive session:
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x_dim(3)
variables(dimensions): {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True} cmplx_var(x_dim)
groups:
>>> print(f.variables["cmplx_var"])
<class 'netCDF4._netCDF4.Variable'>
compound cmplx_var(x_dim)
compound data type: {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
unlimited dimensions: x_dim
current shape = (3,)
>>> print(f.cmptypes)
{'complex128': <class 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}}
>>> print(f.cmptypes["complex128"])
<class 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
Variable-length (vlen) data types
NetCDF 4 has support for variable-length or "ragged" arrays.
These are arrays
of variable length sequences having the same type. To create a variable-length
data type, use the Dataset.createVLType()
method
method of a Dataset
or Group
instance.
>>> f = Dataset("tst_vlen.nc","w")
>>> vlen_t = f.createVLType(np.int32, "phony_vlen")
The numpy datatype of the variable-length sequences and the name of the
new datatype must be specified. Any of the primitive datatypes can be
used (signed and unsigned integers, 32 and 64 bit floats, and characters),
but compound data types cannot.
A new variable can then be created using this datatype.
>>> x = f.createDimension("x",3)
>>> y = f.createDimension("y",4)
>>> vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y","x"))
Since there is no native vlen datatype in numpy, vlen arrays are represented
in python as object arrays (arrays of dtype object
). These are arrays whose
elements are Python object pointers, and can contain any type of python object.
For this application, they must contain 1-D numpy arrays all of the same type
but of varying length.
In this case, they contain 1-D numpy int32
arrays of random length between
1 and 10.
>>> import random
>>> random.seed(54321)
>>> data = np.empty(len(y)*len(x),object)
>>> for n in range(len(y)*len(x)):
... data[n] = np.arange(random.randint(1,10),dtype="int32")+1
>>> data = np.reshape(data,(len(y),len(x)))
>>> vlvar[:] = data
>>> print("vlen variable =\n{}".format(vlvar[:]))
vlen variable =
[[array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32) array([1, 2], dtype=int32)
array([1, 2, 3, 4], dtype=int32)]
[array([1, 2, 3], dtype=int32)
array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)]
[array([1, 2, 3, 4, 5, 6, 7], dtype=int32) array([1, 2, 3], dtype=int32)
array([1, 2, 3, 4, 5, 6], dtype=int32)]
[array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
array([1, 2, 3, 4, 5], dtype=int32) array([1, 2], dtype=int32)]]
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(3), y(4)
variables(dimensions): int32 phony_vlen_var(y,x)
groups:
>>> print(f.variables["phony_vlen_var"])
<class 'netCDF4._netCDF4.Variable'>
vlen phony_vlen_var(y, x)
vlen data type: int32
unlimited dimensions:
current shape = (4, 3)
>>> print(f.vltypes["phony_vlen"])
<class 'netCDF4._netCDF4.VLType'>: name = 'phony_vlen', numpy dtype = int32
Numpy object arrays containing python strings can also be written as vlen
variables,
For vlen strings, you don't need to create a vlen data type.
Instead, simply use the python str
builtin (or a numpy string datatype
with fixed length greater than 1) when calling the
Dataset.createVariable()
method.
>>> z = f.createDimension("z",10)
>>> strvar = f.createVariable("strvar", str, "z")
In this example, an object array is filled with random python strings with
random lengths between 2 and 12 characters, and the data in the object
array is assigned to the vlen string variable.
>>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
>>> data = np.empty(10,"O")
>>> for n in range(10):
... stringlen = random.randint(2,12)
... data[n] = "".join([random.choice(chars) for i in range(stringlen)])
>>> strvar[:] = data
>>> print("variable-length string variable:\n{}".format(strvar[:]))
variable-length string variable:
['Lh' '25F8wBbMI' '53rmM' 'vvjnb3t63ao' 'qjRBQk6w' 'aJh' 'QF'
'jtIJbJACaQk4' '3Z5' 'bftIIq']
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(3), y(4), z(10)
variables(dimensions): int32 phony_vlen_var(y,x), <class 'str'> strvar(z)
groups:
>>> print(f.variables["strvar"])
<class 'netCDF4._netCDF4.Variable'>
vlen strvar(z)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (10,)
It is also possible to set contents of vlen string variables with numpy arrays
of any string or unicode data type. Note, however, that accessing the contents
of such variables will always return numpy arrays with dtype object
.
Enum data type
netCDF4 has an enumerated data type, which is an integer datatype that is
restricted to certain named values. Since Enums don't map directly to
a numpy data type, they are read and written as integer arrays.
Here's an example of using an Enum type to hold cloud type data.
The base integer data type and a python dictionary describing the allowed
values and their names are used to define an Enum data type using
Dataset.createEnumType()
.
>>> nc = Dataset('clouds.nc','w')
>>> # python dict with allowed values and their names.
>>> enum_dict = {'Altocumulus': 7, 'Missing': 255,
... 'Stratus': 2, 'Clear': 0,
... 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5,
... 'Cumulonimbus': 1, 'Stratocumulus': 3}
>>> # create the Enum type called 'cloud_t'.
>>> cloud_type = nc.createEnumType(np.uint8,'cloud_t',enum_dict)
>>> print(cloud_type)
<class 'netCDF4._netCDF4.EnumType'>: name = 'cloud_t', numpy dtype = uint8, fields/values ={'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0, 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1, 'Stratocumulus': 3}
A new variable can be created in the usual way using this data type.
Integer data is written to the variable that represents the named
cloud types in enum_dict. A ValueError
will be raised if an attempt
is made to write an integer value not associated with one of the
specified names.
>>> time = nc.createDimension('time',None)
>>> # create a 1d variable of type 'cloud_type'.
>>> # The fill_value is set to the 'Missing' named value.
>>> cloud_var = nc.createVariable('primary_cloud',cloud_type,'time',
... fill_value=enum_dict['Missing'])
>>> # write some data to the variable.
>>> cloud_var[:] = [enum_dict[k] for k in ['Clear', 'Stratus', 'Cumulus',
... 'Missing', 'Cumulonimbus']]
>>> nc.close()
>>> # reopen the file, read the data.
>>> nc = Dataset('clouds.nc')
>>> cloud_var = nc.variables['primary_cloud']
>>> print(cloud_var)
<class 'netCDF4._netCDF4.Variable'>
enum primary_cloud(time)
_FillValue: 255
enum data type: uint8
unlimited dimensions: time
current shape = (5,)
>>> print(cloud_var.datatype.enum_dict)
{'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0, 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1, 'Stratocumulus': 3}
>>> print(cloud_var[:])
[0 2 4 -- 1]
>>> nc.close()
Parallel IO
If MPI parallel enabled versions of netcdf and hdf5 or pnetcdf are detected,
and mpi4py is installed, netcdf4-python will
be built with parallel IO capabilities enabled. Parallel IO of NETCDF4 or
NETCDF4_CLASSIC formatted files is only available if the MPI parallel HDF5
library is available. Parallel IO of classic netcdf-3 file formats is only
available if the PnetCDF library is
available. To use parallel IO, your program must be running in an MPI
environment using mpi4py.
>>> from mpi4py import MPI
>>> import numpy as np
>>> from netCDF4 import Dataset
>>> rank = MPI.COMM_WORLD.rank # The process ID (integer 0-3 for 4-process run)
To run an MPI-based parallel program like this, you must use mpiexec
to launch several
parallel instances of Python (for example, using mpiexec -np 4 python mpi_example.py
).
The parallel features of netcdf4-python are mostly transparent -
when a new dataset is created or an existing dataset is opened,
use the parallel
keyword to enable parallel access.
>>> nc = Dataset('parallel_test.nc','w',parallel=True)
The optional comm
keyword may be used to specify a particular
MPI communicator (MPI_COMM_WORLD
is used by default).
Each process (or rank)
can now write to the file indepedently.
In this example the process rank is
written to a different variable index on each task
>>> d = nc.createDimension('dim',4)
>>> v = nc.createVariable('var', np.int64, 'dim')
>>> v[rank] = rank
>>> nc.close()
% ncdump parallel_test.nc
netcdf parallel_test {
dimensions:
dim = 4 ;
variables:
int64 var(dim) ;
data:
var = 0, 1, 2, 3 ;
There are two types of parallel IO, independent (the default) and collective.
Independent IO means that each process can do IO independently. It should not
depend on or be affected by other processes. Collective IO is a way of doing
IO defined in the MPI-IO standard; unlike independent IO, all processes must
participate in doing IO. To toggle back and forth between
the two types of IO, use the Variable.set_collective()
Variable
method. All metadata
operations (such as creation of groups, types, variables, dimensions, or attributes)
are collective.
There are a couple of important limitations of parallel IO:
parallel IO for NETCDF4 or NETCDF4_CLASSIC formatted files is only available
if the netcdf library was compiled with MPI enabled HDF5.
parallel IO for all classic netcdf-3 file formats is only available if the
netcdf library was compiled with PnetCDF.
If a variable has an unlimited dimension, appending data must be done in collective mode.
If the write is done in independent mode, the operation will fail with a
a generic "HDF Error".
You can write compressed data in parallel only with netcdf-c >= 4.7.4
and hdf5 >= 1.10.3 (although you can read in parallel with earlier versions). To write
compressed data in parallel, the variable must be in 'collective IO mode'.
This is done
automatically on variable creation if compression is turned on, but if you are appending
to a variable in an existing file, you must use Variable.set_collective()(True)
before attempting
to write to it.
You cannot use variable-length (VLEN) data types.
Dealing with strings
The most flexible way to store arrays of strings is with the
Variable-length (vlen) string data type. However, this requires
the use of the NETCDF4 data model, and the vlen type does not map very well
numpy arrays (you have to use numpy arrays of dtype=object
, which are arrays of
arbitrary python objects). numpy does have a fixed-width string array
data type, but unfortunately the netCDF data model does not.
Instead fixed-width byte strings are typically stored as arrays of 8-bit
characters.
To perform the conversion to and from character arrays to fixed-width numpy string arrays, the
following convention is followed by the python interface.
If the _Encoding
special attribute is set for a character array
(dtype S1
) variable, the chartostring()
utility function is used to convert the array of
characters to an array of strings with one less dimension (the last dimension is
interpreted as the length of each string) when reading the data. The character
set (usually ascii) is specified by the _Encoding
attribute. If _Encoding
is 'none' or 'bytes', then the character array is converted to a numpy
fixed-width byte string array (dtype S#
), otherwise a numpy unicode (dtype
U#
) array is created.
When writing the data,
stringtochar()
is used to convert the numpy string array to an array of
characters with one more dimension. For example,
>>> from netCDF4 import stringtochar
>>> nc = Dataset('stringtest.nc','w',format='NETCDF4_CLASSIC')
>>> _ = nc.createDimension('nchars',3)
>>> _ = nc.createDimension('nstrings',None)
>>> v = nc.createVariable('strings','S1',('nstrings','nchars'))
>>> datain = np.array(['foo','bar'],dtype='S3')
>>> v[:] = stringtochar(datain) # manual conversion to char array
>>> print(v[:]) # data returned as char array
[[b'f' b'o' b'o']
[b'b' b'a' b'r']]
>>> v._Encoding = 'ascii' # this enables automatic conversion
>>> v[:] = datain # conversion to char array done internally
>>> print(v[:]) # data returned in numpy string array
['foo' 'bar']
>>> nc.close()
Even if the _Encoding
attribute is set, the automatic conversion of char
arrays to/from string arrays can be disabled with
Variable.set_auto_chartostring()
.
A similar situation is often encountered with numpy structured arrays with subdtypes
containing fixed-wdith byte strings (dtype=S#
). Since there is no native fixed-length string
netCDF datatype, these numpy structure arrays are mapped onto netCDF compound
types with character array elements.
In this case the string <-> char array
conversion is handled automatically (without the need to set the _Encoding
attribute) using numpy
views.
The structured array dtype (including the string elements) can even be used to
define the compound data type - the string dtype will be converted to
character array dtype under the hood when creating the netcdf compound type.
Here's an example:
>>> nc = Dataset('compoundstring_example.nc','w')
>>> dtype = np.dtype([('observation', 'f4'),
... ('station_name','S10')])
>>> station_data_t = nc.createCompoundType(dtype,'station_data')
>>> _ = nc.createDimension('station',None)
>>> statdat = nc.createVariable('station_obs', station_data_t, ('station',))
>>> data = np.empty(2,dtype)
>>> data['observation'][:] = (123.,3.14)
>>> data['station_name'][:] = ('Boulder','New York')
>>> print(statdat.dtype) # strings actually stored as character arrays
{'names':['observation','station_name'], 'formats':['<f4',('S1', (10,))], 'offsets':[0,4], 'itemsize':16, 'aligned':True}
>>> statdat[:] = data # strings converted to character arrays internally
>>> print(statdat[:]) # character arrays converted back to strings
[(123. , b'Boulder') ( 3.14, b'New York')]
>>> print(statdat[:].dtype)
{'names':['observation','station_name'], 'formats':['<f4','S10'], 'offsets':[0,4], 'itemsize':16, 'aligned':True}
>>> statdat.set_auto_chartostring(False) # turn off auto-conversion
>>> statdat[:] = data.view(dtype=[('observation', 'f4'),('station_name','S1',10)])
>>> print(statdat[:]) # now structured array with char array subtype is returned
[(123. , [b'B', b'o', b'u', b'l', b'd', b'e', b'r', b'', b'', b''])
( 3.14, [b'N', b'e', b'w', b' ', b'Y', b'o', b'r', b'k', b'', b''])]
>>> nc.close()
Note that there is currently no support for mapping numpy structured arrays with
unicode elements (dtype U#
) onto netCDF compound types, nor is there support
for netCDF compound types with vlen string components.
In-memory (diskless) Datasets
You can create netCDF Datasets whose content is held in memory
instead of in a disk file.
There are two ways to do this.
If you
don't need access to the memory buffer containing the Dataset from
within python, the best way is to use the diskless=True
keyword
argument when creating the Dataset.
If you want to save the Dataset
to disk when you close it, also set persist=True
.
If you want to
create a new read-only Dataset from an existing python memory buffer, use the
memory
keyword argument to pass the memory buffer when creating the Dataset.
If you want to create a new in-memory Dataset, and then access the memory buffer
directly from Python, use the memory
keyword argument to specify the
estimated size of the Dataset in bytes when creating the Dataset with
mode='w'
.
Then, the Dataset.close()
method will return a python memoryview
object representing the Dataset. Below are examples illustrating both
approaches.
>>> # create a diskless (in-memory) Dataset,
>>> # and persist the file to disk when it is closed.
>>> nc = Dataset('diskless_example.nc','w',diskless=True,persist=True)
>>> d = nc.createDimension('x',None)
>>> v = nc.createVariable('v',np.int32,'x')
>>> v[0:5] = np.arange(5)
>>> print(nc)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(5)
variables(dimensions): int32 v(x)
groups:
>>> print(nc['v'][:])
[0 1 2 3 4]
>>> nc.close() # file saved to disk
>>> # create an in-memory dataset from an existing python
>>> # python memory buffer.
>>> # read the newly created netcdf file into a python
>>> # bytes object.
>>> with open('diskless_example.nc', 'rb') as f:
... nc_bytes = f.read()
>>> # create a netCDF in-memory dataset from the bytes object.
>>> nc = Dataset('inmemory.nc', memory=nc_bytes)
>>> print(nc)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(5)
variables(dimensions): int32 v(x)
groups:
>>> print(nc['v'][:])
[0 1 2 3 4]
>>> nc.close()
>>> # create an in-memory Dataset and retrieve memory buffer
>>> # estimated size is 1028 bytes - this is actually only
>>> # used if format is NETCDF3
>>> # (ignored for NETCDF4/HDF5 files).
>>> nc = Dataset('inmemory.nc', mode='w',memory=1028)
>>> d = nc.createDimension('x',None)
>>> v = nc.createVariable('v',np.int32,'x')
>>> v[0:5] = np.arange(5)
>>> nc_buf = nc.close() # close returns memoryview
>>> print(type(nc_buf))
<class 'memoryview'>
>>> # save nc_buf to disk, read it back in and check.
>>> with open('inmemory.nc', 'wb') as f:
... f.write(nc_buf)
>>> nc = Dataset('inmemory.nc')
>>> print(nc)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
dimensions(sizes): x(5)
variables(dimensions): int32 v(x)
groups:
>>> print(nc['v'][:])
[0 1 2 3 4]
>>> nc.close()
Support for complex numbers
Although there is no native support for complex numbers in netCDF, there are
some common conventions for storing them. Two of the most common are to either
use a compound datatype for the real and imaginary components, or a separate
dimension. netCDF4
supports reading several of these conventions, as well as
writing using one of two conventions (depending on file format). This support
for complex numbers is enabled by setting auto_complex=True
when opening a
Dataset
:
>>> complex_array = np.array([0 + 0j, 1 + 0j, 0 + 1j, 1 + 1j, 0.25 + 0.75j])
>>> with netCDF4.Dataset("complex.nc", "w", auto_complex=True) as nc:
... nc.createDimension("x", size=len(complex_array))
... var = nc.createVariable("data", "c16", ("x",))
... var[:] = complex_array
... print(var)
<class 'netCDF4._netCDF4.Variable'>
compound data(x)
compound data type: complex128
unlimited dimensions:
current shape = (5,)
When reading files using auto_complex=True
, netCDF4
will interpret variables
stored using the following conventions as complex numbers:
compound datatypes with two float
or double
members who names begin with
r
and i
(case insensitive)
a dimension of length 2 named complex
or ri
When writing files using auto_complex=True
, netCDF4
will use:
a compound datatype named _PFNC_DOUBLE_COMPLEX_TYPE
(or *FLOAT*
as
appropriate) with members r
and i
for netCDF4 formats;
or a dimension of length 2 named _pfnc_complex
for netCDF3 or classic
formats.
Support for complex numbers is handled via the
nc-complex
library. See there for
further details.
contact: Jeffrey Whitaker [email protected]
copyright: 2008 by Jeffrey Whitaker.
license: Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
chartostring(b,encoding='utf-8')
convert a character array to a string array with one less dimension.
Input character array (numpy datatype 'S1'
or 'U1'
).
Will be converted to a array of strings, where each string has a fixed
length of b.shape[-1]
characters.
optional kwarg encoding
can be used to specify character encoding (default
utf-8
). If encoding
is 'none' or 'bytes', a numpy.string_
btye array is
returned.
returns a numpy string array with datatype 'UN'
(or 'SN'
) and shape
b.shape[:-1]
where where N=b.shape[-1]
.
def date2index(dates, nctime, calendar=None, select='exact', has_year_zero=None)
Return indices of a netCDF time variable corresponding to the given dates.
dates: A datetime object or a sequence of datetime objects.
The datetime objects should not include a time-zone offset.
nctime: A netCDF time variable object. The nctime object must have a
units attribute.
calendar: describes the calendar to be used in the time calculations.
All the values currently defined in the
CF metadata convention <http://cfconventions.org/cf-conventions/cf-conventions#calendar>
__ are supported.
Valid calendars 'standard', 'gregorian', 'proleptic_gregorian'
'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'.
Default is None
which means the calendar associated with the first
input datetime instance will be used.
select: 'exact', 'before', 'after', 'nearest'
The index selection method. exact will return the indices perfectly
matching the dates given. before and after will return the indices
corresponding to the dates just before or just after the given dates if
an exact match cannot be found. nearest will return the indices that
correspond to the closest dates.
has_year_zero: if set to True, astronomical year numbering
is used and the year zero exists.
If set to False for real-world
calendars, then historical year numbering is used and the year 1 is
preceded by year -1 and no year zero exists.
The defaults are set to conform with
CF version 1.9 conventions (False for 'julian', 'gregorian'/'standard', True
for 'proleptic_gregorian' (ISO 8601) and True for the idealized
calendars 'noleap'/'365_day', '360_day', 366_day'/'all_leap')
The defaults can only be over-ridden for the real-world calendars,
for the the idealized calendars the year zero
always exists and the has_year_zero kwarg is ignored.
This kwarg is not needed to define calendar systems allowed by CF
(the calendar-specific defaults do this).
returns an index (indices) of the netCDF time variable corresponding
to the given datetime object(s).
def date2num(dates, units, calendar=None, has_year_zero=None, longdouble=False)
Return numeric time values given datetime objects. The units
of the numeric time values are described by the units argument
and the calendar keyword. The datetime objects must
be in UTC with no time-zone offset.
If there is a
time-zone offset in units, it will be applied to the
returned numeric values.
dates: A datetime object or a sequence of datetime objects.
The datetime objects should not include a time-zone offset. They
can be either native python datetime instances (which use
the proleptic gregorian calendar) or cftime.datetime instances.
units: a string of the form
describing the time units. can be days, hours, minutes,
seconds, milliseconds or microseconds. is the time
origin. months since is allowed only for the 360_day calendar
and common_years since is allowed only for the 365_day calendar.
calendar: describes the calendar to be used in the time calculations.
All the values currently defined in the
CF metadata convention <http://cfconventions.org/cf-conventions/cf-conventions#calendar>
__ are supported.
Valid calendars 'standard', 'gregorian', 'proleptic_gregorian'
'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'.
Default is None
which means the calendar associated with the first
input datetime instance will be used.
has_year_zero: If set to True, astronomical year numbering
is used and the year zero exists.
If set to False for real-world
calendars, then historical year numbering is used and the year 1 is
preceded by year -1 and no year zero exists.
The defaults are set to conform with
CF version 1.9 conventions (False for 'julian', 'gregorian'/'standard', True
for 'proleptic_gregorian' (ISO 8601) and True for the idealized
calendars 'noleap'/'365_day', '360_day', 366_day'/'all_leap')
Note that CF v1.9 does not specifically mention whether year zero
is allowed in the proleptic_gregorian calendar, but ISO-8601 has
a year zero so we have adopted this as the default.
The defaults can only be over-ridden for the real-world calendars,
for the the idealized calendars the year zero
always exists and the has_year_zero kwarg is ignored.
This kwarg is not needed to define calendar systems allowed by CF
(the calendar-specific defaults do this).
longdouble: If set True, output is in the long double float type
(numpy.float128) instead of float (numpy.float64), allowing microsecond
accuracy when converting a time value to a date and back again. Otherwise
this is only possible if the discretization of the time variable is an
integer multiple of the units.
returns a numeric time value, or an array of numeric time values
with approximately 1 microsecond accuracy.
def get_alignment()
return current netCDF alignment within HDF5 files in a tuple
(threshold,alignment). See netcdf C library documentation for
nc_get_alignment
for details. Values can be reset with
set_alignment()
.
This function was added in netcdf 4.9.0.
def get_chunk_cache()
return current netCDF chunk cache information in a tuple (size,nelems,preemption).
See netcdf C library documentation for nc_get_chunk_cache
for
details. Values can be reset with set_chunk_cache()
.
def getlibversion()
returns a string describing the version of the netcdf library
used to build the module, and when it was built.
def num2date(times, units, calendar='standard', only_use_cftime_datetimes=True, only_use_python_datetimes=False, has_year_zero=None)
Return datetime objects given numeric time values. The units
of the numeric time values are described by the units argument
and the calendar keyword. The returned datetime objects represent
UTC with no time-zone offset, even if the specified
units contain a time-zone offset.
times: numeric time values.
units: a string of the form
describing the time units. can be days, hours, minutes,
seconds, milliseconds or microseconds. is the time
origin. months since is allowed only for the 360_day calendar
and common_years since is allowed only for the 365_day calendar.
calendar: describes the calendar used in the time calculations.
All the values currently defined in the
CF metadata convention <http://cfconventions.org/cf-conventions/cf-conventions#calendar>
__ are supported.
Valid calendars 'standard', 'gregorian', 'proleptic_gregorian'
'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'.
Default is 'standard', which is a mixed Julian/Gregorian calendar.
only_use_cftime_datetimes: if False, python datetime.datetime
objects are returned from num2date where possible; if True dates which
subclass cftime.datetime are returned for all calendars. Default True.
only_use_python_datetimes: always return python datetime.datetime
objects and raise an error if this is not possible. Ignored unless
only_use_cftime_datetimes=False. Default False.
has_year_zero: if set to True, astronomical year numbering
is used and the year zero exists.
If set to False for real-world
calendars, then historical year numbering is used and the year 1 is
preceded by year -1 and no year zero exists.
The defaults are set to conform with
CF version 1.9 conventions (False for 'julian', 'gregorian'/'standard', True
for 'proleptic_gregorian' (ISO 8601) and True for the idealized
calendars 'noleap'/'365_day', '360_day', 366_day'/'all_leap')
The defaults can only be over-ridden for the real-world calendars,
for the the idealized calendars the year zero
always exists and the has_year_zero kwarg is ignored.
This kwarg is not needed to define calendar systems allowed by CF
(the calendar-specific defaults do this).
returns a datetime instance, or an array of datetime instances with
microsecond accuracy, if possible.
Note: If only_use_cftime_datetimes=False and
use_only_python_datetimes=False, the datetime instances
returned are 'real' python datetime
objects if calendar='proleptic_gregorian', or
calendar='standard' or 'gregorian'
and the date is after the breakpoint between the Julian and
Gregorian calendars (1582-10-15). Otherwise, they are ctime.datetime
objects which support some but not all the methods of native python
datetime objects. The datetime instances
do not contain a time-zone offset, even if the specified units
contains one.
def set_alignment(threshold, alignment)
set_alignment()(threshold,alignment)
Change the HDF5 file alignment.
See netcdf C library documentation for nc_set_alignment
for
details.
This function was added in netcdf 4.9.0.
def set_chunk_cache(size=None, nelems=None, preemption=None)
set_chunk_cache(self,size=None,nelems=None,preemption=None)
change netCDF4 chunk cache settings.
See netcdf C library documentation for nc_set_chunk_cache
for
details.
def stringtoarr(string, NUMCHARS, dtype='S')
stringtoarr(a, NUMCHARS,dtype='S')
convert a string to a character array of length NUMCHARS
Input python string.
NUMCHARS
:
number of characters used to represent string
(if len(a) < NUMCHARS
, it will be padded on the right with blanks).
dtype
:
type of numpy array to return.
Default is 'S'
, which
means an array of dtype 'S1'
will be returned.
If dtype='U'
, a
unicode array (dtype = 'U1'
) will be returned.
returns a rank 1 numpy character array of length NUMCHARS with datatype 'S1'
(default) or 'U1'
(if dtype='U'
)
def stringtochar(a, encoding='utf-8')
stringtochar(a,encoding='utf-8')
convert a string array to a character array with one extra dimension
Input numpy string array with numpy datatype 'SN'
or 'UN'
, where N
is the number of characters in each string.
Will be converted to
an array of characters (datatype 'S1'
or 'U1'
) of shape a.shape + (N,)
.
optional kwarg encoding
can be used to specify character encoding (default
utf-8
). If encoding
is 'none' or 'bytes', a numpy.string_
the input array
is treated a raw byte strings (numpy.string_
).
returns a numpy character array with datatype 'S1'
or 'U1'
and shape a.shape + (N,)
, where N is the length of each string in a.
A CompoundType
instance is used to describe a compound data
type, and can be passed to the the Dataset.createVariable()
method of
a Dataset
or Group
instance.
Compound data types map to numpy structured arrays.
See CompoundType
for more details.
The instance variables dtype
and name
should not be modified by
the user.
__init__(group, datatype, datatype_name)
CompoundType constructor.
group
: Group
instance to associate with the compound datatype.
datatype
: A numpy dtype object describing a structured (a.k.a record)
array.
Can be composed of homogeneous numeric or character data types, or
other structured array data types.
datatype_name
: a Python string containing a description of the
compound data type.
Note 1: When creating nested compound data types,
the inner compound data types must already be associated with CompoundType
instances (so create CompoundType instances for the innermost structures
first).
Note 2: CompoundType
instances should be created using the
Dataset.createCompoundType()
method of a Dataset
or
Group
instance, not using this class directly.
Instance variables
var dtype
Return an attribute of instance, which is of type owner.
var dtype_view
Return an attribute of instance, which is of type owner.
var name
Return an attribute of instance, which is of type owner.
A netCDF Dataset
is a collection of dimensions, groups, variables and
attributes. Together they describe the meaning of data and relations among
data fields stored in a netCDF file. See Dataset
for more
details.
A list of attribute names corresponding to global netCDF attributes
defined for the Dataset
can be obtained with the
Dataset.ncattrs()
method.
These attributes can be created by assigning to an attribute of the
Dataset
instance. A dictionary containing all the netCDF attribute
name/value pairs is provided by the __dict__
attribute of a
Dataset
instance.
The following class variables are read-only and should not be
modified by the user.
dimensions
: The dimensions
dictionary maps the names of
dimensions defined for the Group
or Dataset
to instances of the
Dimension
class.
variables
: The variables
dictionary maps the names of variables
defined for this Dataset
or Group
to instances of the
Variable
class.
groups
: The groups dictionary maps the names of groups created for
this Dataset
or Group
to instances of the Group
class (the
Dataset
class is simply a special case of the Group
class which
describes the root group in the netCDF4 file).
cmptypes
: The cmptypes
dictionary maps the names of
compound types defined for the Group
or Dataset
to instances of the
CompoundType
class.
vltypes
: The vltypes
dictionary maps the names of
variable-length types defined for the Group
or Dataset
to instances
of the VLType
class.
enumtypes
: The enumtypes
dictionary maps the names of
Enum types defined for the Group
or Dataset
to instances
of the EnumType
class.
data_model
: data_model
describes the netCDF
data model version, one of NETCDF3_CLASSIC
, NETCDF4
,
NETCDF4_CLASSIC
, NETCDF3_64BIT_OFFSET
or NETCDF3_64BIT_DATA
.
file_format
: same as data_model
, retained for backwards compatibility.
disk_format
: disk_format
describes the underlying
file format, one of NETCDF3
, HDF5
, HDF4
,
PNETCDF
, DAP2
, DAP4
or UNDEFINED
. Only available if using
netcdf C library version >= 4.3.1, otherwise will always return
UNDEFINED
.
parent
: parent
is a reference to the parent
Group
instance. None
for the root group or Dataset
instance.
path
: path
shows the location of the Group
in
the Dataset
in a unix directory format (the names of groups in the
hierarchy separated by backslashes). A Dataset
instance is the root
group, so the path is simply '/'
.
keepweakref
: If True
, child Dimension and Variables objects only keep weak
references to the parent Dataset or Group.
_ncstring_attrs__
: If True
, all text attributes will be written as variable-length
strings.
__init__(self, filename, mode="r", clobber=True, diskless=False,
persist=False, keepweakref=False, memory=None, encoding=None,
parallel=False, comm=None, info=None, format='NETCDF4')
Dataset
constructor.
filename
: Name of netCDF file to hold dataset. Can also
be a python 3 pathlib instance or the URL of an OpenDAP dataset.
When memory is
set this is just used to set the filepath()
.
mode
: access mode. r
means read-only; no data can be
modified. w
means write; a new file is created, an existing file with
the same name is deleted. x
means write, but fail if an existing
file with the same name already exists. a
and r+
mean append;
an existing file is opened for reading and writing, if
file does not exist already, one is created.
Appending s
to modes r
, w
, r+
or a
will enable unbuffered shared
access to NETCDF3_CLASSIC
, NETCDF3_64BIT_OFFSET
or
NETCDF3_64BIT_DATA
formatted files.
Unbuffered access may be useful even if you don't need shared
access, since it may be faster for programs that don't access data
sequentially. This option is ignored for NETCDF4
and NETCDF4_CLASSIC
formatted files.
clobber
: if True
(default), opening a file with mode='w'
will clobber an existing file with the same name.
if False
, an
exception will be raised if a file with the same name already exists.
mode=x
is identical to mode=w
with clobber=False.
format
: underlying file format (one of 'NETCDF4',
'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC'<code>, </code>'NETCDF3_64BIT_OFFSET'
or
'NETCDF3_64BIT_DATA'
.
Only relevant if mode = 'w'
(if mode = 'r','a'
or 'r+'
the file format
is automatically detected). Default 'NETCDF4'
, which means the data is
stored in an HDF5 file, using netCDF 4 API features.
Setting
format='NETCDF4_CLASSIC'
will create an HDF5 file, using only netCDF 3
compatible API features. netCDF 3 clients must be recompiled and linked
against the netCDF 4 library to read files in NETCDF4_CLASSIC
format.
'NETCDF3_CLASSIC'
is the classic netCDF 3 file format that does not
handle 2+ Gb files. 'NETCDF3_64BIT_OFFSET'
is the 64-bit offset
version of the netCDF 3 file format, which fully supports 2+ GB files, but
is only compatible with clients linked against netCDF version 3.6.0 or
later. 'NETCDF3_64BIT_DATA'
is the 64-bit data version of the netCDF 3
file format, which supports 64-bit dimension sizes plus unsigned and
64 bit integer data types, but is only compatible with clients linked against
netCDF version 4.4.0 or later.
diskless
: If True
, create diskless (in-core) file.
This is a feature added to the C library after the
netcdf-4.2 release. If you need to access the memory buffer directly,
use the in-memory feature instead (see memory
kwarg).
persist
: if diskless=True
, persist file to disk when closed
(default False
).
keepweakref
: if True
, child Dimension and Variable instances will keep weak
references to the parent Dataset or Group object.
Default is False
, which
means strong references will be kept.
Having Dimension and Variable instances
keep a strong reference to the parent Dataset instance, which in turn keeps a
reference to child Dimension and Variable instances, creates circular references.
Circular references complicate garbage collection, which may mean increased
memory usage for programs that create may Dataset instances with lots of
Variables. It also will result in the Dataset object never being deleted, which
means it may keep open files alive as well. Setting keepweakref=True
allows
Dataset instances to be garbage collected as soon as they go out of scope, potentially
reducing memory usage and open file handles.
However, in many cases this is not
desirable, since the associated Variable instances may still be needed, but are
rendered unusable when the parent Dataset instance is garbage collected.
memory
: if not None
, create or open an in-memory Dataset.
If mode = r
, the memory kwarg must contain a memory buffer object
(an object that supports the python buffer interface).
The Dataset will then be created with contents taken from this block of memory.
If mode = w
, the memory kwarg should contain the anticipated size
of the Dataset in bytes (used only for NETCDF3 files).
A memory
buffer containing a copy of the Dataset is returned by the
Dataset.close()
method. Requires netcdf-c version 4.4.1 for mode=r
netcdf-c 4.6.2 for mode=w
. To persist the file to disk, the raw
bytes from the returned buffer can be written into a binary file.
The Dataset can also be re-opened using this memory buffer.
encoding
: encoding used to encode filename string into bytes.
Default is None (sys.getdefaultfileencoding()
is used).
parallel
: open for parallel access using MPI (requires mpi4py and
parallel-enabled netcdf-c and hdf5 libraries).
Default is False
. If
True
, comm
and info
kwargs may also be specified.
comm
: MPI_Comm object for parallel access. Default None
, which
means MPI_COMM_WORLD will be used.
Ignored if parallel=False
.
info
: MPI_Info object for parallel access. Default None
, which
means MPI_INFO_NULL will be used.
Ignored if parallel=False
.
auto_complex
: if True
, then automatically convert complex number types
Subclasses
netCDF4._netCDF4.Group
netCDF4._netCDF4.MFDataset
Static methods
def fromcdl(cdlfilename, ncfilename=None, mode='a', format='NETCDF4')
fromcdl(cdlfilename, ncfilename=None, mode='a',format='NETCDF4')
call ncgen via subprocess to create Dataset from CDL
text representation. Requires ncgen to be installed and in $PATH
.
cdlfilename
:
CDL file.
ncfilename
: netCDF file to create. If not given, CDL filename with
suffix replaced by .nc
is used..
mode
:
Access mode to open Dataset (Default 'a'
).
format
: underlying file format to use (one of 'NETCDF4',
'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC'<code>, </code>'NETCDF3_64BIT_OFFSET'
or
'NETCDF3_64BIT_DATA'
. Default 'NETCDF4'
.
Dataset instance for ncfilename
is returned.
Instance variables
var auto_complex
Return an attribute of instance, which is of type owner.
var cmptypes
Return an attribute of instance, which is of type owner.
var data_model
Return an attribute of instance, which is of type owner.
var dimensions
Return an attribute of instance, which is of type owner.
var disk_format
Return an attribute of instance, which is of type owner.
var enumtypes
Return an attribute of instance, which is of type owner.
var file_format
Return an attribute of instance, which is of type owner.
var groups
Return an attribute of instance, which is of type owner.
var keepweakref
Return an attribute of instance, which is of type owner.
var name
string name of Group instance
var parent
Return an attribute of instance, which is of type owner.
var path
Return an attribute of instance, which is of type owner.
var variables
Return an attribute of instance, which is of type owner.
var vltypes
Return an attribute of instance, which is of type owner.
Methods
def close(self)
close(self)
Close the Dataset.
def createCompoundType(self, datatype, datatype_name)
createCompoundType(self, datatype, datatype_name)
Creates a new compound data type named datatype_name
from the numpy
dtype object datatype
.
Note: If the new compound data type contains other compound data types
(i.e. it is a 'nested' compound type, where not all of the elements
are homogeneous numeric data types), then the 'inner' compound types must be
created first.
The return value is the CompoundType
class instance describing the new
datatype.
def createDimension(self, dimname, size=None)
createDimension(self, dimname, size=None)
Creates a new dimension with the given dimname
and size
.
size
must be a positive integer or None
, which stands for
"unlimited" (default is None
). Specifying a size of 0 also
results in an unlimited dimension. The return value is the Dimension
class instance describing the new dimension.
To determine the current
maximum size of the dimension, use the len
function on the Dimension
instance. To determine if a dimension is 'unlimited', use the
Dimension.isunlimited()
method of the Dimension
instance.
def createEnumType(self, datatype, datatype_name, enum_dict)
createEnumType(self, datatype, datatype_name, enum_dict)
Creates a new Enum data type named datatype_name
from a numpy
integer dtype object datatype
, and a python dictionary
defining the enum fields and values.
The return value is the EnumType
class instance describing the new
datatype.
def createGroup(self, groupname)
createGroup(self, groupname)
Creates a new Group
with the given groupname
.
If groupname
is specified as a path, using forward slashes as in unix to
separate components, then intermediate groups will be created as necessary
(analogous to mkdir -p
in unix).
For example,
createGroup('/GroupA/GroupB/GroupC')
will create GroupA
,
GroupA/GroupB
, and GroupA/GroupB/GroupC
, if they don't already exist.
If the specified path describes a group that already exists, no error is
raised.
The return value is a Group
class instance.
def createVLType(self, datatype, datatype_name)
createVLType(self, datatype, datatype_name)
Creates a new VLEN data type named datatype_name
from a numpy
dtype object datatype
.
The return value is the VLType
class instance describing the new
datatype.
def createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)
createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False,
complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None,
szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1,
endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom',
fill_value=None, chunk_cache=None)
Creates a new variable with the given varname
, datatype
, and
dimensions
. If dimensions are not given, the variable is assumed to be
a scalar.
If varname
is specified as a path, using forward slashes as in unix to
separate components, then intermediate groups will be created as necessary
For example, createVariable('/GroupA/GroupB/VarC', float, ('x','y'))
will create groups GroupA
and GroupA/GroupB
, plus the variable GroupA/GroupB/VarC
, if the preceding
groups don't already exist.
The datatype
can be a numpy datatype object, or a string that describes
a numpy dtype object (like the dtype.str
attribute of a numpy array).
Supported specifiers include: 'S1' or 'c' (NC_CHAR), 'i1' or 'b' or 'B'
(NC_BYTE), 'u1' (NC_UBYTE), 'i2' or 'h' or 's' (NC_SHORT), 'u2'
(NC_USHORT), 'i4' or 'i' or 'l' (NC_INT), 'u4' (NC_UINT), 'i8' (NC_INT64),
'u8' (NC_UINT64), 'f4' or 'f' (NC_FLOAT), 'f8' or 'd' (NC_DOUBLE)
.
datatype
can also be a CompoundType
instance
(for a structured, or compound array), a VLType
instance
(for a variable-length array), or the python str
builtin
(for a variable-length string array). Numpy string and unicode datatypes with
length greater than one are aliases for str
.
Data from netCDF variables is presented to python as numpy arrays with
the corresponding data type.
dimensions
must be a tuple containing Dimension
instances and/or
dimension names (strings) that have been defined
previously using Dataset.createDimension()
. The default value
is an empty tuple, which means the variable is a scalar.
If the optional keyword argument compression
is set, the data will be
compressed in the netCDF file using the specified compression algorithm.
Currently zlib
,szip
,zstd
,bzip2
,blosc_lz
,blosc_lz4
,blosc_lz4hc
,
blosc_zlib
and blosc_zstd
are supported.
Default is None
(no compression).
All of the compressors except
zlib
and szip
use the HDF5 plugin architecture.
If the optional keyword zlib
is True
, the data will be compressed in
the netCDF file using zlib compression (default False
).
The use of this option is
deprecated in favor of compression='zlib'
.
The optional keyword complevel
is an integer between 0 and 9 describing
the level of compression desired (default 4). Ignored if compression=None
.
A value of zero disables compression.
If the optional keyword shuffle
is True
, the HDF5 shuffle filter
will be applied before compressing the data with zlib (default True
).
significantly improves compression. Default is True
. Ignored if
zlib=False
.
The optional kwarg blosc_shuffle
is
ignored
unless the blosc compressor is used. blosc_shuffle
can be 0 (no shuffle),
1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1.
The optional kwargs szip_coding
and szip_pixels_per_block
are ignored
unless the szip compressor is used. szip_coding
can be ec
(entropy coding)
or nn
(nearest neighbor coding). Default is nn
. szip_pixels_per_block
can be 4, 8, 16 or 32 (default 8).
If the optional keyword fletcher32
is True
, the Fletcher32 HDF5
checksum algorithm is activated to detect errors. Default False
.
If the optional keyword contiguous
is True
, the variable data is
stored contiguously on disk.
Default False
. Setting to True
for
a variable with an unlimited dimension will trigger an error.
Fixed size variables (with no unlimited dimension) with no compression filters
are contiguous by default.
The optional keyword chunksizes
can be used to manually specify the
HDF5 chunksizes for each dimension of the variable.
A detailed discussion of HDF chunking and I/O performance is available
here.
The default chunking scheme in the netcdf-c library is discussed
here.
Basically, you want the chunk size for each dimension to match as
closely as possible the size of the data block that users will read
from the file. chunksizes
cannot be set if contiguous=True
.
The optional keyword endian
can be used to control whether the
data is stored in little or big endian format on disk. Possible
values are little, big
or native
(default). The library
will automatically handle endian conversions when the data is read,
but if the data is always going to be read on a computer with the
opposite format as the one used to create the file, there may be
some performance advantage to be gained by setting the endian-ness.
The optional keyword fill_value
can be used to override the default
netCDF _FillValue
(the value that the variable gets filled with before
any data is written to it, defaults given in the dict netCDF4.default_fillvals
).
If fill_value is set to False
, then the variable is not pre-filled.
If the optional keyword parameters least_significant_digit
or significant_digits
are
specified, variable data will be truncated (quantized). In conjunction
with compression='zlib'
this produces 'lossy', but significantly more
efficient compression. For example, if least_significant_digit=1
,
data will be quantized using numpy.around(scale*data)/scale
, where
scale = 2**bits, and bits is determined so that a precision of 0.1 is
retained (in this case bits=4). From the
PSL metadata conventions:
"least_significant_digit – power of ten of the smallest decimal place
in unpacked data that is a reliable value." Default is None
, or no
quantization, or 'lossless' compression.
If significant_digits=3
then the data will be quantized so that three significant digits are retained, independent
of the floating point exponent. The keyword argument quantize_mode
controls
the quantization algorithm (default 'BitGroom', 'BitRound' and
'GranularBitRound' also available).
The 'GranularBitRound'
algorithm may result in better compression for typical geophysical datasets.
This significant_digits
kwarg is only available
with netcdf-c >= 4.9.0, and
only works with NETCDF4
or NETCDF4_CLASSIC
formatted files.
When creating variables in a NETCDF4
or NETCDF4_CLASSIC
formatted file,
HDF5 creates something called a 'chunk cache' for each variable.
default size of the chunk cache may be large enough to completely fill
available memory when creating thousands of variables.
The optional
keyword chunk_cache
allows you to reduce (or increase) the size of
the default chunk cache when creating a variable.
The setting only
persists as long as the Dataset is open - you can use the set_var_chunk_cache
method to change it the next time the Dataset is opened.
Warning - messing with this parameter can seriously degrade performance.
The return value is the Variable
class instance describing the new
variable.
A list of names corresponding to netCDF variable attributes can be
obtained with the Variable
method Variable.ncattrs()
. A dictionary
containing all the netCDF attribute name/value pairs is provided by
the __dict__
attribute of a Variable
instance.
Variable
instances behave much like array objects. Data can be
assigned to or retrieved from a variable with indexing and slicing
operations on the Variable
instance. A Variable
instance has six
Dataset standard attributes: dimensions, dtype, shape, ndim, name
and
least_significant_digit
. Application programs should never modify
these attributes. The dimensions
attribute is a tuple containing the
names of the dimensions associated with this variable. The dtype
attribute is a string describing the variable's data type (i4, f8,
S1,<code> etc). The </code>shape
attribute is a tuple describing the current
sizes of all the variable's dimensions. The name
attribute is a
string containing the name of the Variable instance.
The least_significant_digit
attributes describes the power of ten of the smallest decimal place in
the data the contains a reliable value.
assigned to the Variable
instance. The ndim
attribute
is the number of variable dimensions.
def delncattr(self, name)
delncattr(self,name,value)
delete a netCDF dataset or group attribute.
Use if you need to delete a
netCDF attribute with the same name as one of the reserved python
attributes.
def filepath(self, encoding=None)
filepath(self,encoding=None)
Get the file system path (or the opendap URL) which was used to
open/create the Dataset. Requires netcdf >= 4.1.2.
The path
is decoded into a string using sys.getfilesystemencoding()
by default, this can be
changed using the encoding
kwarg.
def get_variables_by_attributes(self, **kwargs)
get_variables_by_attributes(self, **kwargs)
Returns a list of variables that match specific conditions.
Can pass in key=value parameters and variables are returned that
contain all of the matches. For example,
>>> # Get variables with x-axis attribute.
>>> vs = nc.get_variables_by_attributes(axis='X')
>>> # Get variables with matching "standard_name" attribute
>>> vs = nc.get_variables_by_attributes(standard_name='northward_sea_water_velocity')
Can pass in key=callable parameter and variables are returned if the
callable returns True.
The callable should accept a single parameter,
the attribute value.
None is given as the attribute value when the
attribute does not exist on the variable. For example,
>>> # Get Axis variables
>>> vs = nc.get_variables_by_attributes(axis=lambda v: v in ['X', 'Y', 'Z', 'T'])
>>> # Get variables that don't have an "axis" attribute
>>> vs = nc.get_variables_by_attributes(axis=lambda v: v is None)
>>> # Get variables that have a "grid_mapping" attribute
>>> vs = nc.get_variables_by_attributes(grid_mapping=lambda v: v is not None)
def getncattr(self, name, encoding='utf-8')
getncattr(self,name)
retrieve a netCDF dataset or group attribute.
Use if you need to get a netCDF attribute with the same
name as one of the reserved python attributes.
option kwarg encoding
can be used to specify the
character encoding of a string attribute (default is utf-8
).
def has_blosc_filter(self)
has_blosc_filter(self)
returns True if blosc compression filter is available
def has_bzip2_filter(self)
has_bzip2_filter(self)
returns True if bzip2 compression filter is available
def has_szip_filter(self)
has_szip_filter(self)
returns True if szip compression filter is available
def has_zstd_filter(self)
has_zstd_filter(self)
returns True if zstd compression filter is available
def isopen(self)
isopen(self)
Is the Dataset open or closed?
def ncattrs(self)
def renameAttribute(self, oldname, newname)
renameAttribute(self, oldname, newname)
rename a Dataset
or Group
attribute named oldname
to newname
.
def renameDimension(self, oldname, newname)
renameDimension(self, oldname, newname)
rename a Dimension
named oldname
to newname
.
def renameGroup(self, oldname, newname)
renameGroup(self, oldname, newname)
rename a Group
named oldname
to newname
(requires netcdf >= 4.3.1).
def renameVariable(self, oldname, newname)
renameVariable(self, oldname, newname)
rename a Variable
named oldname
to newname
def set_always_mask(self, value)
set_always_mask(self, True_or_False)
Call Variable.set_always_mask()
for all variables contained in
this Dataset
or Group
, as well as for all
variables in all its subgroups.
True_or_False
: Boolean determining if automatic conversion of
masked arrays with no missing values to regular numpy arrays shall be
applied for all variables. Default True. Set to False to restore the default behaviour
in versions prior to 1.4.1 (numpy array returned unless missing values are present,
otherwise masked array returned).
Note: Calling this function only affects existing
variables. Variables created after calling this function will follow
the default behaviour.
def set_auto_chartostring(self, value)
set_auto_chartostring(self, True_or_False)
Call Variable.set_auto_chartostring()
for all variables contained in this Dataset
or
Group
, as well as for all variables in all its subgroups.
True_or_False
: Boolean determining if automatic conversion of
all character arrays <–> string arrays should be performed for
character variables (variables of type NC_CHAR
or S1
) with the
_Encoding
attribute set.
Note: Calling this function only affects existing variables. Variables created
after calling this function will follow the default behaviour.
def set_auto_mask(self, value)
set_auto_mask(self, True_or_False)
Call Variable.set_auto_mask()
for all variables contained in this Dataset
or
Group
, as well as for all variables in all its subgroups. Only affects
Variables with primitive or enum types (not compound or vlen Variables).
True_or_False
: Boolean determining if automatic conversion to masked arrays
shall be applied for all variables.
Note: Calling this function only affects existing variables. Variables created
after calling this function will follow the default behaviour.
def set_auto_maskandscale(self, value)
set_auto_maskandscale(self, True_or_False)
Call Variable.set_auto_maskandscale()
for all variables contained in this Dataset
or
Group
, as well as for all variables in all its subgroups.
True_or_False
: Boolean determining if automatic conversion to masked arrays
and variable scaling shall be applied for all variables.
Note: Calling this function only affects existing variables. Variables created
after calling this function will follow the default behaviour.
def set_auto_scale(self, value)
set_auto_scale(self, True_or_False)
Call Variable.set_auto_scale()
for all variables contained in this Dataset
or
Group
, as well as for all variables in all its subgroups.
True_or_False
: Boolean determining if automatic variable scaling
shall be applied for all variables.
Note: Calling this function only affects existing variables. Variables created
after calling this function will follow the default behaviour.
def set_fill_off(self)
set_fill_off(self)
Sets the fill mode for a Dataset
open for writing to off
.
This will prevent the data from being pre-filled with fill values, which
may result in some performance improvements. However, you must then make
sure the data is actually written before being read.
def set_fill_on(self)
set_fill_on(self)
Sets the fill mode for a Dataset
open for writing to on
.
This causes data to be pre-filled with fill values. The fill values can be
controlled by the variable's _Fill_Value
attribute, but is usually
sufficient to the use the netCDF default _Fill_Value
(defined
separately for each variable type). The default behavior of the netCDF
library corresponds to set_fill_on
.
Data which are equal to the
_Fill_Value
indicate that the variable was created, but never written
def set_ncstring_attrs(self, value)
set_ncstring_attrs(self, True_or_False)
Call Variable.set_ncstring_attrs()
for all variables contained in
this Dataset
or Group
, as well as for all its
subgroups and their variables.
True_or_False
: Boolean determining if all string attributes are
created as variable-length NC_STRINGs, (if True), or if ascii text
attributes are stored as NC_CHARs (if False; default)
Note: Calling this function only affects newly created attributes
of existing (sub-) groups and their variables.
def setncattr(self, name, value)
setncattr(self,name,value)
set a netCDF dataset or group attribute using name,value pair.
Use if you need to set a netCDF attribute with the
with the same name as one of the reserved python attributes.
def setncattr_string(self, name, value)
setncattr_string(self,name,value)
set a netCDF dataset or group string attribute using name,value pair.
Use if you need to ensure that a netCDF attribute is created with type
NC_STRING
if the file format is NETCDF4
.
def setncatts(self, attdict)
setncatts(self,attdict)
set a bunch of netCDF dataset or group attributes at once using a python dictionary.
This may be faster when setting a lot of attributes for a NETCDF3
formatted file, since nc_redef/nc_enddef is not called in between setting
each attribute
def sync(self)
sync(self)
Writes all buffered data in the Dataset
to the disk file.
def tocdl(self, coordvars=False, data=False, outfile=None)
tocdl(self, coordvars=False, data=False, outfile=None)
call ncdump via subprocess to create CDL
text representation of Dataset. Requires ncdump
to be installed and in $PATH
.
coordvars
: include coordinate variable data (via ncdump -c
). Default False
data
: if True, write out variable data (Default False).
outfile
: If not None, file to output ncdump to. Default is to return a string.
A netCDF Dimension
is used to describe the coordinates of a Variable
.
See Dimension
for more details.
The current maximum size of a Dimension
instance can be obtained by
calling the python len
function on the Dimension
instance. The
Dimension.isunlimited()
method of a Dimension
instance can be used to
determine if the dimension is unlimited.
Read-only class variables:
name
: String name, used when creating a Variable
with
Dataset.createVariable()
.
size
: Current Dimension
size (same as len(d)
, where d
is a
Dimension
instance).
__init__(self, group, name, size=None)
Dimension
constructor.
group
: Group
instance to associate with dimension.
name
: Name of the dimension.
size
: Size of the dimension. None
or 0 means unlimited. (Default None
).
Note: Dimension
instances should be created using the
Dataset.createDimension()
method of a Group
or
Dataset
instance, not using Dimension
directly.
Instance variables
var name
string name of Dimension instance
var size
current size of Dimension (calls len
on Dimension instance)
Methods
def group(self)
group(self)
return the group that this Dimension
is a member of.
def isunlimited(self)
isunlimited(self)
returns True
if the Dimension
instance is unlimited, False
otherwise.
A EnumType
instance is used to describe an Enum data
type, and can be passed to the the Dataset.createVariable()
method of
a Dataset
or Group
instance. See
EnumType
for more details.
The instance variables dtype
, name
and enum_dict
should not be modified by
the user.
__init__(group, datatype, datatype_name, enum_dict)
EnumType constructor.
group
: Group
instance to associate with the VLEN datatype.
datatype
: An numpy integer dtype object describing the base type
for the Enum.
datatype_name
: a Python string containing a description of the
Enum data type.
enum_dict
: a Python dictionary containing the Enum field/value
pairs.
Note
: EnumType
instances should be created using the
Dataset.createEnumType()
method of a Dataset
or
Group
instance, not using this class directly.
Instance variables
var dtype
Return an attribute of instance, which is of type owner.
var enum_dict
Return an attribute of instance, which is of type owner.
var name
Return an attribute of instance, which is of type owner.
Groups define a hierarchical namespace within a netCDF file. They are
analogous to directories in a unix filesystem. Each Group
behaves like
a Dataset
within a Dataset, and can contain it's own variables,
dimensions and attributes (and other Groups). See Group
for more details.
Group
inherits from Dataset
, so all the
Dataset
class methods and variables are available
to a Group
instance (except the close
method).
Additional read-only class variables:
name
: String describing the group name.
__init__(self, parent, name)
Group
constructor.
parent
: Group
instance for the parent group.
If being created
in the root group, use a Dataset
instance.
name
: - Name of the group.
Note: Group
instances should be created using the
Dataset.createGroup()
method of a Dataset
instance, or
another Group
instance, not using this class directly.
Ancestors
netCDF4._netCDF4.Dataset
Methods
def close(self)
Class for reading multi-file netCDF Datasets, making variables
spanning multiple files appear as if they were in one file.
Datasets must be in NETCDF4_CLASSIC, NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET
or NETCDF3_64BIT_DATA<code> format (</code>NETCDF4
Datasets won't work).
Adapted from pycdf by Andre Gosselin.
Example usage (See MFDataset
for more details):
>>> import numpy as np
>>> # create a series of netCDF files with a variable sharing
>>> # the same unlimited dimension.
>>> for nf in range(10):
... with Dataset("mftest%s.nc" % nf, "w", format='NETCDF4_CLASSIC') as f:
... f.createDimension("x",None)
... x = f.createVariable("x","i",("x",))
... x[0:10] = np.arange(nf*10,10*(nf+1))
>>> # now read all those files in at once, in one Dataset.
>>> f = MFDataset("mftest*nc")
>>> print(f.variables["x"][:])
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99]
__init__(self, files, check=False, aggdim=None, exclude=[],
master_file=None)
Open a Dataset spanning multiple files, making it look as if it was a
single file. Variables in the list of files that share the same
dimension (specified with the keyword aggdim
) are aggregated. If
aggdim
is not specified, the unlimited is aggregated.
Currently,
aggdim
must be the leftmost (slowest varying) dimension of each
of the variables to be aggregated.
files
: either a sequence of netCDF files or a string with a
wildcard (converted to a sorted list of files using glob)
the master_file
kwarg is not specified, the first file
in the list will become the "master" file, defining all the
variables with an aggregation dimension which may span
subsequent files. Attribute access returns attributes only from "master"
file. The files are always opened in read-only mode.
check
: True if you want to do consistency checking to ensure the
correct variables structure for all of the netcdf files.
Checking makes
the initialization of the MFDataset instance much slower. Default is
False.
aggdim
: The name of the dimension to aggregate over (must
be the leftmost dimension of each of the variables to be aggregated).
If None (default), aggregate over the unlimited dimension.
exclude
: A list of variable names to exclude from aggregation.
Default is an empty list.
master_file
: file to use as "master file", defining all the
variables with an aggregation dimension and all global attributes.
Ancestors
netCDF4._netCDF4.Dataset
Methods
def close(self)
close(self)
close all the open files.
def isopen(self)
isopen(self)
True if all files are open, False otherwise.
def ncattrs(self)
ncattrs(self)
return the netcdf attribute names from the master file.
Class providing an interface to a MFDataset time Variable by imposing a unique common
time unit and/or calendar to all files.
Example usage (See MFTime
for more details):
>>> import numpy as np
>>> f1 = Dataset("mftest_1.nc","w", format="NETCDF4_CLASSIC")
>>> f2 = Dataset("mftest_2.nc","w", format="NETCDF4_CLASSIC")
>>> f1.createDimension("time",None)
>>> f2.createDimension("time",None)
>>> t1 = f1.createVariable("time","i",("time",))
>>> t2 = f2.createVariable("time","i",("time",))
>>> t1.units = "days since 2000-01-01"
>>> t2.units = "days since 2000-02-01"
>>> t1.calendar = "standard"
>>> t2.calendar = "standard"
>>> t1[:] = np.arange(31)
>>> t2[:] = np.arange(30)
>>> f1.close()
>>> f2.close()
>>> # Read the two files in at once, in one Dataset.
>>> f = MFDataset("mftest_*nc")
>>> t = f.variables["time"]
>>> print(t.units)
days since 2000-01-01
>>> print(t[32]) # The value written in the file, inconsistent with the MF time units.
>>> T = MFTime(t)
>>> print(T[32])
__init__(self, time, units=None, calendar=None)
Create a time Variable with units consistent across a multifile
dataset.
time
: Time variable from a MFDataset
.
units
: Time units, for example, 'days since 1979-01-01'
. If None
,
use the units from the master variable.
calendar
: Calendar overload to use across all files, for example,
'standard'
or 'gregorian'
. If None
, check that the calendar attribute
is present on each variable and values are unique across files raising a
ValueError
otherwise.
Ancestors
netCDF4._netCDF4._Variable
class VLType
(...)
A VLType
instance is used to describe a variable length (VLEN) data
type, and can be passed to the the Dataset.createVariable()
method of
a Dataset
or Group
instance. See
VLType
for more details.
The instance variables dtype
and name
should not be modified by
the user.
__init__(group, datatype, datatype_name)
VLType constructor.
group
: Group
instance to associate with the VLEN datatype.
datatype
: An numpy dtype object describing the component type for the
variable length array.
datatype_name
: a Python string containing a description of the
VLEN data type.
Note
: VLType
instances should be created using the
Dataset.createVLType()
method of a Dataset
or
Group
instance, not using this class directly.
Instance variables
var dtype
Return an attribute of instance, which is of type owner.
var name
Return an attribute of instance, which is of type owner.
A netCDF Variable
is used to read and write netCDF data.
They are
analogous to numpy array objects. See Variable
for more
details.
A list of attribute names corresponding to netCDF attributes defined for
the variable can be obtained with the Variable.ncattrs()
method. These
attributes can be created by assigning to an attribute of the
Variable
instance. A dictionary containing all the netCDF attribute
name/value pairs is provided by the __dict__
attribute of a
Variable
instance.
The following class variables are read-only:
dimensions
: A tuple containing the names of the
dimensions associated with this variable.
dtype
: A numpy dtype object describing the
variable's data type.
ndim
: The number of variable dimensions.
shape
: A tuple with the current shape (length of all dimensions).
scale
: If True, scale_factor
and add_offset
are
applied, and signed integer data is automatically converted to
unsigned integer data if the _Unsigned
attribute is set to "true" or "True".
Default is True
, can be reset using Variable.set_auto_scale()
and
Variable.set_auto_maskandscale()
methods.
mask
: If True, data is automatically converted to/from masked
arrays when missing values or fill values are present. Default is True
, can be
reset using Variable.set_auto_mask()
and Variable.set_auto_maskandscale()
methods. Only relevant for Variables with primitive or enum types (ignored
for compound and vlen Variables).
chartostring()
: If True, data is automatically converted to/from character
arrays to string arrays when the _Encoding
variable attribute is set.
Default is True
, can be reset using
Variable.set_auto_chartostring()
method.
least_significant_digit
: Describes the power of ten of the
smallest decimal place in the data the contains a reliable value.
Data is
truncated to this decimal place when it is assigned to the Variable
instance. If None
, the data is not truncated.
significant_digits
: New in version 1.6.0. Describes the number of significant
digits in the data the contains a reliable value.
Data is
truncated to retain this number of significant digits when it is assigned to the
Variable
instance. If None
, the data is not truncated.
Only available with netcdf-c >= 4.9.0,
and only works with NETCDF4
or NETCDF4_CLASSIC
formatted files.
The number of significant digits used in the quantization of variable data can be
obtained using the Variable.significant_digits
method. Default None
-
no quantization done.
quantize_mode
: New in version 1.6.0. Controls
the quantization algorithm (default 'BitGroom', 'BitRound' and
'GranularBitRound' also available).
The 'GranularBitRound'
algorithm may result in better compression for typical geophysical datasets.
Ignored if significant_digits
not specified. If 'BitRound' is used, then
significant_digits
is interpreted as binary (not decimal) digits.
__orthogonal_indexing__
: Always True
.
Indicates to client code
that the object supports 'orthogonal indexing', which means that slices
that are 1d arrays or lists slice along each dimension independently.
behavior is similar to Fortran or Matlab, but different than numpy.
datatype
: numpy data type (for primitive data types) or VLType/CompoundType
instance (for compound or vlen data types).
name
: String name.
size
: The number of stored elements.
__init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False,
complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8,
blosc_shuffle=1, fletcher32=False, contiguous=False,
chunksizes=None, endian='native',
least_significant_digit=None,fill_value=None,chunk_cache=None)
Variable
constructor.
group
: Group
or Dataset
instance to associate with variable.
name
: Name of the variable.
datatype
: Variable
data type. Can be specified by providing a
numpy dtype object, or a string that describes a numpy dtype object.
Supported values, corresponding to str
attribute of numpy dtype
objects, include 'f4'
(32-bit floating point), 'f8'
(64-bit floating
point), 'i4'
(32-bit signed integer), 'i2'
(16-bit signed integer),
'i8'
(64-bit signed integer), 'i4'
(8-bit signed integer), 'i1'
(8-bit signed integer), 'u1'
(8-bit unsigned integer), 'u2'
(16-bit
unsigned integer), 'u4'
(32-bit unsigned integer), 'u8'
(64-bit
unsigned integer), or 'S1'
(single-character string).
compatibility with Scientific.IO.NetCDF, the old Numeric single character
typecodes can also be used ('f'
instead of 'f4'
, 'd'
instead of
'f8'
, 'h'
or 's'
instead of 'i2'
, 'b'
or 'B'
instead of
'i1'
, 'c'
instead of 'S1'
, and 'i'
or 'l'
instead of
'i4'
). datatype
can also be a CompoundType
instance
(for a structured, or compound array), a VLType
instance
(for a variable-length array), or the python str
builtin
(for a variable-length string array). Numpy string and unicode datatypes with
length greater than one are aliases for str
.
dimensions
: a tuple containing the variable's Dimension instances
(defined previously with createDimension
). Default is an empty tuple
which means the variable is a scalar (and therefore has no dimensions).
compression
: compression algorithm to use.
Currently zlib
,szip
,zstd
,bzip2
,blosc_lz
,blosc_lz4
,blosc_lz4hc
,
blosc_zlib
and blosc_zstd
are supported.
Default is None
(no compression).
All of the compressors except
zlib
and szip
use the HDF5 plugin architecture.
zlib
: if True
, data assigned to the Variable
instance is compressed on disk. Default False
. Deprecated - use
compression='zlib'
instead.
complevel
: the level of compression to use (1 is the fastest,
but poorest compression, 9 is the slowest but best compression). Default 4.
Ignored if compression=None
or szip
. A value of 0 disables compression.
shuffle
: if True
, the HDF5 shuffle filter is applied
to improve zlib compression. Default True
. Ignored unless compression = 'zlib'
.
blosc_shuffle
: shuffle filter inside blosc compressor (only
relevant if compression kwarg set to one of the blosc compressors).
Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise
shuffle)). Default is 1. Ignored if blosc compressor not used.
szip_coding
: szip coding method. Can be ec
(entropy coding)
or nn
(nearest neighbor coding). Default is nn
.
Ignored if szip compressor not used.
szip_pixels_per_block
: Can be 4,8,16 or 32 (Default 8).
Ignored if szip compressor not used.
fletcher32
: if True
(default False
), the Fletcher32 checksum
algorithm is used for error detection.
contiguous
: if True
(default False
), the variable data is
stored contiguously on disk.
Default False
. Setting to True
for
a variable with an unlimited dimension will trigger an error. Fixed
size variables (with no unlimited dimension) with no compression
filters are contiguous by default.
chunksizes
: Can be used to specify the HDF5 chunksizes for each
dimension of the variable. A detailed discussion of HDF chunking and I/O
performance is available
here.
The default chunking scheme in the netcdf-c library is discussed
here.
Basically, you want the chunk size for each dimension to match as
closely as possible the size of the data block that users will read
from the file. chunksizes
cannot be set if contiguous=True
.
endian
: Can be used to control whether the
data is stored in little or big endian format on disk. Possible
values are little, big
or native
(default). The library
will automatically handle endian conversions when the data is read,
but if the data is always going to be read on a computer with the
opposite format as the one used to create the file, there may be
some performance advantage to be gained by setting the endian-ness.
For netCDF 3 files (that don't use HDF5), only endian='native'
is allowed.
The compression, zlib, complevel, shuffle, fletcher32, contiguous
and chunksizes
keywords are silently ignored for netCDF 3 files that do not use HDF5.
least_significant_digit
: If this or significant_digits
are specified,
variable data will be truncated (quantized).
In conjunction with compression='zlib'
this produces
'lossy', but significantly more efficient compression. For example, if
least_significant_digit=1
, data will be quantized using
around(scaledata)/scale, where scale = 2*bits, and bits is determined
so that a precision of 0.1 is retained (in this case bits=4). Default is
None
, or no quantization.
significant_digits
: New in version 1.6.0.
As described for least_significant_digit
except the number of significant digits retained is prescribed independent
of the floating point exponent. Default None
- no quantization done.
quantize_mode
: New in version 1.6.0. Controls
the quantization algorithm (default 'BitGroom', 'BitRound' and
'GranularBitRound' also available).
The 'GranularBitRound'
algorithm may result in better compression for typical geophysical datasets.
Ignored if significant_digts
not specified. If 'BitRound' is used, then
significant_digits
is interpreted as binary (not decimal) digits.
fill_value
:
If specified, the default netCDF _FillValue
(the
value that the variable gets filled with before any data is written to it)
is replaced with this value.
If fill_value is set to False
, then
the variable is not pre-filled. The default netCDF fill values can be found
in the dictionary netCDF4.default_fillvals
.
chunk_cache
: If specified, sets the chunk cache size for this variable.
Persists as long as Dataset is open. Use set_var_chunk_cache
to
change it when Dataset is re-opened.
Note: Variable
instances should be created using the
Dataset.createVariable()
method of a Dataset
or
Group
instance, not using this class directly.
Instance variables
var always_mask
Return an attribute of instance, which is of type owner.
var auto_complex
Return an attribute of instance, which is of type owner.
var chartostring
Return an attribute of instance, which is of type owner.
var datatype
numpy data type (for primitive data types) or
VLType/CompoundType/EnumType instance
(for compound, vlen
or enum data types)
var dimensions
get variables's dimension names
var dtype
Return an attribute of instance, which is of type owner.
var mask
Return an attribute of instance, which is of type owner.
var name
string name of Variable instance
var ndim
Return an attribute of instance, which is of type owner.
var scale
Return an attribute of instance, which is of type owner.
var shape
find current sizes of all variable dimensions
var size
Return the number of stored elements.
Methods
def assignValue(self, val)
assignValue(self, val)
assign a value to a scalar variable.
Provided for compatibility with
Scientific.IO.NetCDF, can also be done by assigning to an Ellipsis slice ([…]).
def chunking(self)
chunking(self)
return variable chunking information.
If the dataset is
defined to be contiguous (and hence there is no chunking) the word 'contiguous'
is returned.
Otherwise, a sequence with the chunksize for
each dimension is returned.
def delncattr(self, name)
delncattr(self,name,value)
delete a netCDF variable attribute.
Use if you need to delete a
netCDF attribute with the same name as one of the reserved python
attributes.
def endian(self)
endian(self)
return endian-ness (little,big,native
) of variable (as stored in HDF5 file).
def filters(self)
filters(self)
return dictionary containing HDF5 filter parameters.
def getValue(self)
getValue(self)
get the value of a scalar variable.
Provided for compatibility with
Scientific.IO.NetCDF, can also be done by slicing with an Ellipsis ([…]).
def get_dims(self)
def get_var_chunk_cache(self)
get_var_chunk_cache(self)
return variable chunk cache information in a tuple (size,nelems,preemption).
See netcdf C library documentation for nc_get_var_chunk_cache
for
details.
def getncattr(self, name, encoding='utf-8')
getncattr(self,name)
retrieve a netCDF variable attribute.
Use if you need to set a
netCDF attribute with the same name as one of the reserved python
attributes.
option kwarg encoding
can be used to specify the
character encoding of a string attribute (default is utf-8
).
def group(self)
group(self)
return the group that this Variable
is a member of.
def ncattrs(self)
ncattrs(self)
return netCDF attribute names for this Variable
in a list.
def quantization(self)
quantization(self)
return number of significant digits and the algorithm used in quantization.
Returns None if quantization not active.
def renameAttribute(self, oldname, newname)
renameAttribute(self, oldname, newname)
rename a Variable
attribute named oldname
to newname
.
def set_always_mask(self, always_mask)
set_always_mask(self,always_mask)
turn on or off conversion of data without missing values to regular
numpy arrays.
always_mask
is a Boolean determining if automatic conversion of
masked arrays with no missing values to regular numpy arrays shall be
applied. Default is True. Set to False to restore the default behaviour
in versions prior to 1.4.1 (numpy array returned unless missing values are present,
otherwise masked array returned).
def set_auto_chartostring(self, chartostring)
set_auto_chartostring(self,chartostring())
turn on or off automatic conversion of character variable data to and
from numpy fixed length string arrays when the _Encoding
variable attribute
is set.
If chartostring()
is set to True
, when data is read from a character variable
(dtype = S1
) that has an _Encoding
attribute, it is converted to a numpy
fixed length unicode string array (dtype = UN
, where N
is the length
of the the rightmost dimension of the variable).
The value of _Encoding
is the unicode encoding that is used to decode the bytes into strings.
When numpy string data is written to a variable it is converted back to
indiviual bytes, with the number of bytes in each string equalling the
rightmost dimension of the variable.
The default value of chartostring()
is True
(automatic conversions are performed).
def set_auto_mask(self, mask)
set_auto_mask(self,mask)
turn on or off automatic conversion of variable data to and
from masked arrays .
If mask
is set to True
, when data is read from a variable
it is converted to a masked array if any of the values are exactly
equal to the either the netCDF _FillValue or the value specified by the
missing_value variable attribute. The fill_value of the masked array
is set to the missing_value attribute (if it exists), otherwise
the netCDF _FillValue attribute (which has a default value
for each data type). If the variable has no missing_value attribute, the
_FillValue is used instead. If the variable has valid_min/valid_max and
missing_value attributes, data outside the specified range will be masked.
When data is written to a variable, the masked
array is converted back to a regular numpy array by replacing all the
masked values by the missing_value attribute of the variable (if it
exists).
If the variable has no missing_value attribute, the _FillValue
is used instead.
The default value of mask
is True
(automatic conversions are performed).
def set_auto_maskandscale(self, maskandscale)
set_auto_maskandscale(self,maskandscale)
turn on or off automatic conversion of variable data to and
from masked arrays, automatic packing/unpacking of variable
data using scale_factor
and add_offset
attributes and
automatic conversion of signed integer data to unsigned integer
data if the _Unsigned
attribute exists and is set to "true" (or "True").
If maskandscale
is set to True
, when data is read from a variable
it is converted to a masked array if any of the values are exactly
equal to the either the netCDF _FillValue or the value specified by the
missing_value variable attribute. The fill_value of the masked array
is set to the missing_value attribute (if it exists), otherwise
the netCDF _FillValue attribute (which has a default value
for each data type). If the variable has no missing_value attribute, the
_FillValue is used instead. If the variable has valid_min/valid_max and
missing_value attributes, data outside the specified range will be masked.
When data is written to a variable, the masked
array is converted back to a regular numpy array by replacing all the
masked values by the missing_value attribute of the variable (if it
exists).
If the variable has no missing_value attribute, the _FillValue
is used instead.
If maskandscale
is set to True
, and the variable has a
scale_factor
or an add_offset
attribute, then data read
from that variable is unpacked using::
data = self.scale_factor*data + self.add_offset
When data is written to a variable it is packed using::
data = (data - self.add_offset)/self.scale_factor
If either scale_factor is present, but add_offset is missing, add_offset
is assumed zero.
If add_offset is present, but scale_factor is missing,
scale_factor is assumed to be one.
For more information on how scale_factor
and add_offset
can be
used to provide simple compression, see the
PSL metadata conventions.
In addition, if maskandscale
is set to True
, and if the variable has an
attribute _Unsigned
set to "true", and the variable has a signed integer data type,
a view to the data is returned with the corresponding unsigned integer data type.
This convention is used by the netcdf-java library to save unsigned integer
data in NETCDF3
or NETCDF4_CLASSIC
files (since the NETCDF3
data model does not have unsigned integer data types).
The default value of maskandscale
is True
(automatic conversions are performed).
def set_auto_scale(self, scale)
set_auto_scale(self,scale)
turn on or off automatic packing/unpacking of variable
data using scale_factor
and add_offset
attributes.
Also turns on and off automatic conversion of signed integer data
to unsigned integer data if the variable has an _Unsigned
attribute set to "true" or "True".
If scale
is set to True
, and the variable has a
scale_factor
or an add_offset
attribute, then data read
from that variable is unpacked using::
data = self.scale_factor*data + self.add_offset
When data is written to a variable it is packed using::
data = (data - self.add_offset)/self.scale_factor
If either scale_factor is present, but add_offset is missing, add_offset
is assumed zero.
If add_offset is present, but scale_factor is missing,
scale_factor is assumed to be one.
For more information on how scale_factor
and add_offset
can be
used to provide simple compression, see the
PSL metadata conventions.
In addition, if scale
is set to True
, and if the variable has an
attribute _Unsigned
set to "true", and the variable has a signed integer data type,
a view to the data is returned with the corresponding unsigned integer datatype.
This convention is used by the netcdf-java library to save unsigned integer
data in NETCDF3
or NETCDF4_CLASSIC
files (since the NETCDF3
data model does not have unsigned integer data types).
The default value of scale
is True
(automatic conversions are performed).
def set_collective(self, value)
set_collective(self,True_or_False)
turn on or off collective parallel IO access. Ignored if file is not
open for parallel access.
def set_ncstring_attrs(self, ncstring_attrs)
set_always_mask(self,ncstring_attrs)
turn on or off creating NC_STRING string attributes.
If ncstring_attrs
is set to True
then text attributes will be variable-length
NC_STRINGs.
The default value of ncstring_attrs
is False
(writing ascii text attributes as
NC_CHAR).
def set_var_chunk_cache(self, size=None, nelems=None, preemption=None)
set_var_chunk_cache(self,size=None,nelems=None,preemption=None)
change variable chunk cache settings.
See netcdf C library documentation for nc_set_var_chunk_cache
for
details.
def setncattr(self, name, value)
setncattr(self,name,value)
set a netCDF variable attribute using name,value pair.
Use if you need to set a
netCDF attribute with the same name as one of the reserved python
attributes.
def setncattr_string(self, name, value)
setncattr_string(self,name,value)
set a netCDF variable string attribute using name,value pair.
Use if you need to ensure that a netCDF attribute is created with type
NC_STRING
if the file format is NETCDF4
.
Use if you need to set an attribute to an array of variable-length strings.
def setncatts(self, attdict)
setncatts(self,attdict)
set a bunch of netCDF variable attributes at once using a python dictionary.
This may be faster when setting a lot of attributes for a NETCDF3
formatted file, since nc_redef/nc_enddef is not called in between setting
each attribute
def use_nc_get_vars(self, use_nc_get_vars)
use_nc_get_vars(self,_use_get_vars)
enable the use of netcdf library routine nc_get_vars
to retrieve strided variable slices.
By default,
nc_get_vars
may not used by default (depending on the
version of the netcdf-c library being used) since it may be
slower than multiple calls to the unstrided read routine nc_get_vara
.
Variables in a netCDF file
Attributes in a netCDF file
Writing data to and retrieving data from a netCDF variable
Dealing with time coordinates
Reading data from a multi-file netCDF dataset
Efficient compression of netCDF variables
Beyond homogeneous arrays of a fixed type - compound data types
Variable-length (vlen) data types
Enum data type
Parallel IO
Dealing with strings
In-memory (diskless) Datasets
Support for complex numbers