5. CIS as a Python library (API)
5.1. Main API
As a command line tool, CIS has not been designed with a Python API in mind. There are however some utility functions
that may provide a useful start for those who wish to use CIS as a Python library. For example, the functions in the
base cis module provide a straightforward way to load your data. They can be easily imported using, for example: from cis import read_data.
One of the advantages of using CIS as a Python library is that you are able to perform multiple operations in one go,
that is without writing to disk in between. In certain cases this may provide a significant speed-up.
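As a minimal sketch of what "multiple operations in one go" can look like, the function below reads two datasets, subsets one of them and collocates the result onto the other, with no intermediate file. It relies only on read_data(), subset() and collocated_onto() as documented in this section. The function name and its arguments are hypothetical, and cis is imported inside the function so that the sketch can be loaded where CIS is not installed.

```python
def subset_then_collocate(data_file, sample_file, variable, **constraints):
    """Read, subset and collocate entirely in memory (illustrative helper)."""
    import cis  # imported here so the module loads even where CIS is absent

    data = cis.read_data(data_file, variable)
    sample = cis.read_data(sample_file, variable)

    # subset() returns a new object; nothing is written to disk at this point
    subset = data.subset(**constraints)

    # The in-memory subset is passed straight to the collocation step
    return subset.collocated_onto(sample)
```

A call such as subset_then_collocate('2009.nc', '2010.nc', 'Temperature', altitude=[45.0, 75.0]) would then return the collocated object directly.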
Note
This section of the documentation expects a greater level of Python experience than the other sections. There are many helpful Python guides and tutorials available around the web if you wish to learn more.
The read_data()
function is a simple way to read a single gridded or ungridded data object (e.g. a NetCDF
variable) from one or more files. CIS will determine the best way to interpret the datafile by comparing the file
signature with the built-in data reading plugins and any user defined plugins. Specifying a particular product
allows the user to override this automatic detection.
cis.read_data(filenames, variable, product=None)
Read a specific variable from a list of files. Files can be either gridded or ungridded but not a mix of both. First tries to read data as gridded; if that fails, tries as ungridded.
Parameters:
- filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
- variable (str) – The variable to read from the files
- product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
Returns: The specified data as either a GriddedData or UngriddedData object.
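The different forms accepted by the filenames argument can be illustrated with a short sketch. The wrapper function and all file names below are hypothetical; only the cis.read_data(filenames, variable, product=None) signature comes from the description above.

```python
# Forms the `filenames` argument may take (all paths are made-up examples)
SINGLE_FILE = "2010.nc"
COMMA_SEPARATED = "2009.nc,2010.nc"
AS_LIST = ["2009.nc", "2010.nc"]
WILDCARD = "data/20*.nc"


def read_temperature(filenames, product=None):
    """Read the 'Temperature' variable from one or more files."""
    import cis  # imported here so the module loads even where CIS is absent

    # product=None leaves the choice of reading plugin to CIS
    return cis.read_data(filenames, "Temperature", product=product)
```

Any of the four constants could be passed as filenames, for example read_temperature(AS_LIST).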
The read_data_list()
function is very similar to read_data()
except that it allows the user to specify
more than one variable name. This function returns a list of data objects, either all of which will be gridded, or all
ungridded, but not a mix. For ungridded data lists it is assumed that all objects share the same coordinates.
cis.read_data_list(filenames, variables, product=None, aliases=None)
Read multiple data objects from a list of files. Files can be either gridded or ungridded but not a mix of both.
Parameters:
- filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
- variables (string or list) – One or more variables to read from the files
- product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
- aliases (string or list) – List of aliases to put on each variable’s data object as an alternative means of identifying them.
Returns: A list of the data read out (either a GriddedDataList or UngriddedDataList depending on the type of data contained in the files)
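One possible use of the aliases argument is to look up the returned objects by a short name. The sketch below assumes that each returned object exposes the alias it was given through its alias property (described under CommonData below); the helper name is hypothetical.

```python
def read_by_alias(filenames, variables, aliases):
    """Read several variables and index the results by alias (illustrative helper)."""
    import cis  # imported here so the module loads even where CIS is absent

    data_list = cis.read_data_list(filenames, variables, aliases=aliases)

    # The returned list is iterated like any Python list
    return {data.alias: data for data in data_list}
```

For example, read_by_alias('2010.nc', ['Temperature', 'Pressure'], ['T', 'P']) would give a dictionary with keys 'T' and 'P', where 'Pressure' is a made-up variable name.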
The get_variables()
function returns a list of variable names from one or more specified files. This can be useful
to inspect a set of files before calling the read routines described above.
cis.get_variables(filenames, product=None, type=None)
Get a list of variable names from a list of files. Files can be either gridded or ungridded but not a mix of both.
Parameters:
- filenames (string or list) – The filenames of the files to read. This can be either a single filename as a string, a comma separated list, or a list of string filenames. Filenames can include directories which will be expanded to include all files in that directory, or wildcards such as * or ?.
- product (str) – The name of the data reading plugin to use to read the data (e.g. Cloud_CCI).
- type (str) – The type of HDF data to read, i.e. ‘VD’ or ‘SD’
Returns: A list of the variables
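Inspecting the files before reading could be sketched as follows. The helper name and the error message are hypothetical; the sketch assumes only that get_variables() returns a collection of variable names, as stated above.

```python
def read_checked(filenames, variable, product=None):
    """Read `variable` only after confirming that the files provide it."""
    import cis  # imported here so the module loads even where CIS is absent

    available = cis.get_variables(filenames, product=product)
    if variable not in available:
        # Fail early with a list of what could have been requested instead
        raise ValueError(
            "%r not found; available variables: %s"
            % (variable, ", ".join(sorted(available)))
        )
    return cis.read_data(filenames, variable, product=product)
```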
5.1.1. Data Objects
The read functions above return GriddedData or UngriddedData objects (or lists of them). These objects are the main
data handling objects used within CIS, and their main methods are discussed in the following section. These classes
share a common interface, defined by the CommonData class, which is detailed below. For technical reasons some
methods which are common to both GriddedData and UngriddedData are not defined in the CommonData interface.
The most useful of these methods are probably summary() and save_data().
These objects can also be ‘sliced’ analogously to the underlying numpy arrays, and will return a copy of the requested
data as a new CommonData
object with the correct data, coordinates and metadata.
class cis.data_io.common_data.CommonData
Interface of common methods implemented for gridded and ungridded data.
alias
Return an alias for the variable name. This is an alternative name by which this data object may be identified if, for example, the actual variable name is not valid for some use (such as performing a python evaluation).
Returns: The alias
Return type: str
as_data_frame(copy)
Convert a CommonData object to a Pandas DataFrame.
Parameters: copy – Create a copy of the data for the new DataFrame? Default is True.
Returns: A Pandas DataFrame representing the data and coordinates. Note that this won’t include any metadata.
collocated_onto(sample, how='', kernel=None, missing_data_for_missing_sample=True, fill_value=None, var_name='', var_long_name='', var_units='', **kwargs)
Collocate the CommonData object with another CommonData object using the specified collocator and kernel.
Parameters:
- sample (CommonData) – The sample data to collocate onto
- how (str) – Collocation method (e.g. lin, nn, bin or box)
- kernel (str or cis.collocation.col_framework.Kernel) – The kernel to use
- missing_data_for_missing_sample (bool) – Should missing values in sample data be ignored for collocation?
- fill_value (float) – Value to use for missing data
- var_name (str) – The output variable name
- var_long_name (str) – The output variable’s long name
- var_units (str) – The output variable’s units
- kwargs – Constraint arguments such as h_sep, a_sep, etc.
Returns: The collocated dataset
Return type: CommonData
get_all_points()
Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over points and access to individual points.
Returns: list-like object of data points
get_coordinates_points()
Returns a list-like object allowing access to the coordinates of all points as HyperPoints. The object should allow iteration over points and access to individual points.
Returns: list-like object of data points
get_non_masked_points()
Returns a list-like object allowing access to all points as HyperPoints. The object should allow iteration over non-masked points and access to individual points.
Returns: list-like object of data points
history
Return the associated history of the object
Returns: The history
Return type: str
is_gridded()
Returns value indicating whether the data/coordinates are gridded.
plot(*args, **kwargs)
Plot the data. A matplotlib Axes is created if none is provided.
The default method for series data is ‘line’, otherwise (for e.g. a map plot) is ‘scatter2d’ for UngriddedData and ‘heatmap’ for GriddedData.
Parameters:
- how (string) – The method to use, one of: “contour”, “contourf”, “heatmap”, “line”, “scatter”, “scatter2d”, “comparativescatter”, “histogram”, “histogram2d” or “taylor”
- ax (Axes) – A matplotlib axes on which to draw the plot
- xaxis (Coord or CommonData) – The data to plot on the x axis
- yaxis (Coord or CommonData) – The data to plot on the y axis
- projection (string or cartopy.crs.Projection) – The projection to use for map plots (default is PlateCarree)
- central_longitude (float) – The central longitude to use for PlateCarree (if no other projection specified)
- label (string) – A label for the data. This is used for the title, colorbar or legend depending on plot type
- args – Other plot-specific args
- kwargs – Other plot-specific kwargs
Returns: The matplotlib Axes on which the plot was drawn
Return type: Axes
sampled_from(data, how='', kernel=None, missing_data_for_missing_sample=True, fill_value=None, var_name='', var_long_name='', var_units='', **kwargs)
Collocate the CommonData object with another CommonData object using the specified collocator and kernel.
Parameters:
- data (CommonData or CommonDataList) – The data to resample
- how (str) – Collocation method (e.g. lin, nn, bin or box)
- kernel (str or cis.collocation.col_framework.Kernel) – The kernel to use
- missing_data_for_missing_sample (bool) – Should missing values in sample data be ignored for collocation?
- fill_value (float) – Value to use for missing data
- var_name (str) – The output variable name
- var_long_name (str) – The output variable’s long name
- var_units (str) – The output variable’s units
- kwargs – Constraint arguments such as h_sep, a_sep, etc.
Returns: The collocated dataset
Return type: CommonData
set_longitude_range(range_start)
Rotates the longitude coordinate array and changes its values by 360 as necessary to force the values to be within a 360 range starting at the specified value.
Parameters: range_start – Starting value of required longitude range
subset(**kwargs)
Subset the CommonData object based on the specified constraints. Constraints on arbitrary coordinates are specified using keyword arguments. Each constraint must have two entries (a maximum and a minimum) although one of these can be None. Datetime objects can be used to specify upper and lower datetime limits, or a single PartialDateTime object can be used to specify a datetime range.
The keyword keys are used to find the relevant coordinate; they are looked for in order of name, standard_name, axis and var_name.
For example:
data.subset(time=[datetime.datetime(1984, 8, 28), datetime.datetime(1984, 8, 29)],
            altitude=[45.0, 75.0])
will subset the data from the start of the 28th of August 1984, to the end of the 29th, and between altitudes of 45 and 75 (in whatever units are used for that Coordinate).
And:
data.subset(time=[PartialDateTime(1984, 9)])
will subset the data to all of September 1984.
Parameters: kwargs – The constraint arguments
Returns: The subset of the data
Return type: CommonData
var_name
Return the variable name associated with this data object
Returns: The variable name
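The value shift described for set_longitude_range() can be illustrated in plain Python. This is only a sketch of the arithmetic on a list of numbers under the assumption that values are wrapped into [range_start, range_start + 360); it is not the CIS implementation, and it does not reorder ("rotate") the array as the method is described as doing. The function name is hypothetical.

```python
def wrap_longitudes(longitudes, range_start):
    """Shift each longitude by a multiple of 360 into
    the half-open interval [range_start, range_start + 360)."""
    return [((lon - range_start) % 360) + range_start for lon in longitudes]


# Longitudes given on 0..360 expressed on -180..180
print(wrap_longitudes([0, 90, 180, 270, 359], -180))  # [0, 90, -180, -90, -1]
```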
5.2. Analysis Methods
5.2.1. Collocation
Each data object provides both collocated_onto()
and sampled_from()
methods, which are different ways of
calling the collocation depending on whether the object being called is the source or the sample. For example the
function performed by the command line:
$ cis col Temperature:2010.nc 2009.nc:variable=Temperature
can be performed in Python using:
import cis
# read_data takes the filenames first, then the variable name
temperature_2010 = cis.read_data('2010.nc', 'Temperature')
temperature_2009 = cis.read_data('2009.nc', 'Temperature')
temperature_2010.sampled_from(temperature_2009)
or, equivalently:
temperature_2009.collocated_onto(temperature_2010)
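The relationship between the two methods can be sketched with a small helper that performs the same collocation from either side. The helper name is hypothetical; 'nn' is one of the collocation methods listed for the how parameter, and any extra keyword arguments are passed through as constraints (such as h_sep).

```python
def collocate_both_ways(data, sample, how="nn", **constraints):
    """Collocate `data` onto `sample` using each of the two equivalent calls."""
    # Called on the source object: "collocate me onto that sample"
    onto = data.collocated_onto(sample, how=how, **constraints)

    # Called on the sample object: "sample that data at my points"
    sampled = sample.sampled_from(data, how=how, **constraints)

    return onto, sampled
```

With the objects read above, collocate_both_ways(temperature_2009, temperature_2010) would be expected to return two equivalent results.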
5.2.2. Aggregation
UngriddedData
objects provide the aggregate()
method to allow easy aggregation. Each dimension of the
desired grid is specified as a keyword and the start, end and step as the argument (as a tuple, list or slice).
For example:
data.aggregate(x=[-180, 180, 360], y=slice(-90, 90, 10))
or:
data.aggregate(how='mean', t=[PartialDateTime(2008, 9), timedelta(days=1)])
Datetime objects can be used to specify upper and lower datetime limits, or a single PartialDateTime object can be used to specify a datetime range. The grid step can be specified as a timedelta object, as in the example above.
The keyword keys are used to find the relevant coordinate, they are looked for in order of name, standard_name, axis and var_name.
GriddedData objects provide the collapsed() method which shadows the Iris method of the same name. Our
implementation is a slight extension of the Iris method which allows partial collapsing of multi-dimensional auxiliary
coordinates.
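To make the start, end and step arguments of aggregate() concrete, the sketch below normalises a tuple, list or slice into a (start, end, step) triple and lists the cell edges that such a specification would imply. It is an illustration in plain Python, not CIS code: both function names are hypothetical, and treating the end value as the final cell edge is an assumption.

```python
def grid_spec(arg):
    """Return (start, end, step) from a tuple, list or slice."""
    if isinstance(arg, slice):
        return arg.start, arg.stop, arg.step
    start, end, step = arg
    return start, end, step


def cell_edges(arg):
    """List the edges of the cells implied by a (start, end, step) spec."""
    start, end, step = grid_spec(arg)
    n_cells = int(round((end - start) / step))
    return [start + i * step for i in range(n_cells + 1)]


print(cell_edges([-180, 180, 360]))          # [-180, 180]: a single cell in x
print(len(cell_edges(slice(-90, 90, 10))))   # 19 edges, i.e. 18 cells in y
```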
5.2.3. Subsetting
All objects have a subset()
method for easily subsetting data across arbitrary dimensions. Constraints on
arbitrary coordinates are specified using keyword arguments. Each constraint must have two entries (a maximum and a
minimum) although one of these can be None. Datetime objects can be used to specify upper and lower datetime limits, or
a single PartialDateTime object can be used to specify a datetime range.
The keyword keys are used to find the relevant coordinate, they are looked for in order of name, standard_name, axis and var_name.
For example:
data.subset(time=[datetime.datetime(1984, 8, 28), datetime.datetime(1984, 8, 29)],
altitude=[45.0, 75.0])
will subset the data from the start of the 28th of August 1984, to the end of the 29th, and between altitudes of 45 and 75 (in whatever units are used for that Coordinate).
And:
data.subset(time=[PartialDateTime(1984, 9)])
will subset the data to all of September 1984.
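A constraint with only one bound can be written by passing None for the other entry. The sketch below combines the datetime range from the example above with an upper altitude limit only; the function name and the default altitude are hypothetical, and the sketch assumes the data object has time and altitude coordinates.

```python
import datetime


def subset_august_window(data, max_altitude=75.0):
    """Subset to 28-29 August 1984 with an upper altitude bound only."""
    return data.subset(
        time=[datetime.datetime(1984, 8, 28), datetime.datetime(1984, 8, 29)],
        # None as the first entry leaves the lower altitude limit open
        altitude=[None, max_altitude],
    )
```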
5.2.4. Plotting
Plotting can also easily be performed on these objects. Many options are available depending on the plot type, but CIS will attempt to make a sensible default plot regardless of the datatype or dimensionality. The default method for series data is ‘line’, otherwise (for e.g. a map plot) is ‘scatter2d’ for UngriddedData and ‘heatmap’ for GriddedData.
A matplotlib Axes is created if none is provided, meaning the user is able to reformat, or export the plot however they like.
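Because plot() returns a matplotlib Axes, the usual matplotlib calls can be applied to the result. The helper below is a hypothetical sketch of that pattern: it forwards any keyword arguments (such as how) to plot(), optionally sets a title and writes the figure to a file.

```python
def plot_to_file(data, path, title=None, **plot_kwargs):
    """Plot a CIS data object and save the resulting figure."""
    # With no keyword arguments CIS chooses a default plot type
    ax = data.plot(**plot_kwargs)

    if title is not None:
        ax.set_title(title)

    # The Axes keeps a reference to the Figure that contains it
    ax.figure.savefig(path)
    return ax
```

For instance, plot_to_file(data, 'map.png', how='scatter2d') would request a scatter2d plot and write it to a hypothetical file map.png.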