API Documentation

hdfstream.open(server, name, user=None, password=None, max_depth=5, data_size_limit=1048576)

Connect to the server and return a RemoteDirectory or RemoteFile corresponding to the specified virtual path. If a user name is specified with no password, prompt for the password.

Parameters:
  • server (str) – URL of the server to connect to

  • name (str) – path to the virtual file or directory on the server

  • user (str, optional) – name of the user account for login, defaults to None

  • password (str, optional) – password for login, defaults to None

  • max_depth (int, optional) – maximum recursion depth for group metadata requests

  • data_size_limit (int, optional) – max. dataset size (bytes) to download with metadata

Returns:

RemoteFile or RemoteDirectory corresponding to the requested path

Return type:

RemoteFile or RemoteDirectory

class hdfstream.RemoteDirectory(server, name='/', user=None, password=None, data=None, max_depth=5, data_size_limit=1048576, lazy_load=False, connection=None)

Bases: Mapping

This class represents a virtual directory on the server. To open a remote directory, call hdfstream.open() with the required path or index the parent RemoteDirectory with a relative path. The class constructor documented here is used to implement lazy loading of directory information and should not usually be called directly.

Indexing a RemoteDirectory with a relative path yields another RemoteDirectory or a RemoteFile.

Parameters:
  • server (str) – URL of the server

  • name (str) – virtual path of the directory to open, defaults to “/”

  • user (str, optional) – name of the user account for login, defaults to None

  • password (str, optional) – password for login, defaults to None

  • data (dict, optional) – decoded msgpack data describing the directory, defaults to None

  • max_depth (int, optional) – maximum recursion depth for group metadata requests

  • data_size_limit (int, optional) – max. dataset size (bytes) to be downloaded with metadata

  • lazy_load (bool, optional) – directory listing is requested immediately if False, or delayed until needed if True

  • connection (hdfstream.connection.Connection) – connection object which stores http session information

property files

Return a {name : RemoteFile} dict of files in this directory

Return type:

dict

property directories

Return a {name : RemoteDirectory} dict of sub-directories in this directory

Return type:

dict

property size

Return the size of this directory’s contents in bytes

Return type:

int

property filename

Return the full path to this remote directory

Return type:

str

File(filename, mode='r')

Open the file at the specified path relative to this directory. The mode parameter is present for compatibility with h5py. Only mode="r" is accepted.

Parameters:
  • filename (str) – path of the file to open

  • mode (str) – mode to open the file, defaults to “r”

Return type:

hdfstream.RemoteFile

is_hdf5(filename)

Return True if the specified file is a HDF5 file, False otherwise

Parameters:

filename (str) – name of the file to check

Return type:

bool
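As an illustration of the files and directories properties, the sketch below walks a directory tree and collects relative file paths. The helper name list_files is hypothetical and not part of hdfstream. It assumes the dict keys are entry names relative to the directory, and it only relies on the two documented properties.

```python
def list_files(directory, prefix=""):
    """Return the relative paths of all files below `directory`, depth first.

    `directory` is expected to expose `files` and `directories` dicts, as
    RemoteDirectory is documented to do. Treating the dict keys as names
    relative to the directory is an assumption of this sketch.
    """
    paths = []
    # `files` maps names to file objects; only the names are needed here.
    for name in sorted(directory.files):
        paths.append(prefix + name)
    # `directories` maps names to sub-directory objects, so recurse into each.
    for name in sorted(directory.directories):
        subdir = directory.directories[name]
        paths.extend(list_files(subdir, prefix + name + "/"))
    return paths
```

With a directory returned by hdfstream.open(), list_files(directory) would return a flat list of paths. On a large tree this may trigger one listing request per sub-directory.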

class hdfstream.RemoteFile(connection, file_path, max_depth=5, data_size_limit=1048576, data=None)

Bases: Mapping

This class represents a file on the server. To open a remote file, call hdfstream.open() with the full virtual path or index the parent RemoteDirectory object. The class constructor documented here is used to implement lazy loading of file metadata and should not usually be called directly.

Indexing a RemoteFile with a HDF5 object name will yield a RemoteGroup or RemoteDataset object, if the file is a HDF5 file.

Parameters:
  • connection (hdfstream.connection.Connection) – connection object which stores http session information

  • file_path (str) – virtual path of the file

  • max_depth (int, optional) – maximum recursion depth for group metadata requests

  • data_size_limit (int, optional) – max. dataset size (bytes) to be downloaded with metadata

  • data (dict, optional) – decoded msgpack data describing the file, defaults to None

property root

Return a RemoteGroup corresponding to this file’s HDF5 root group

Return type:

hdfstream.RemoteGroup

open(mode='r')

Return a File-like object with the contents of the file. This can be used to access non-HDF5 files.

Parameters:

mode (str) – open the file in binary (‘rb’) or text (‘r’) mode

Return type:

requests.Response.raw

get(key, getlink=False)

Return the object at the specified path in the HDF5 file.

Parameters:
  • key (str) – path to the object

  • getlink (bool) – if True, returns a SoftLink or HardLink object

is_hdf5()

Return True if this is a HDF5 file, False otherwise

Return type:

bool

property parent

For RemoteFile objects, the parent property returns the root HDF5 group

Return type:

hdfstream.RemoteGroup

visit(func)

Recursively call func on all HDF5 objects in the file. The function should take a single parameter which is the name of the visited object. If the function returns a value other than None then iteration stops and the value is returned.

Parameters:

func (callable func(name)) – The function to call

Return type:

returns the value returned by func
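The early-exit behaviour of visit() can be used to search for an object by name. The following is a minimal sketch. The helper find_first is hypothetical, and it works with any object that provides visit(func) with the semantics described above.

```python
def find_first(container, suffix):
    """Return the name of the first visited object ending with `suffix`.

    Returns None if no visited name matches. `container` can be any object
    with a visit(func) method that stops at the first non-None return value.
    """
    def callback(name):
        # A non-None return value stops the traversal and becomes the result.
        if name.endswith(suffix):
            return name
        # Returning None tells visit() to continue with the next object.
        return None

    return container.visit(callback)
```

Because the traversal stops at the first match, objects after the match are never visited.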

visititems(func)

Recursively call func on all HDF5 objects in the file. The function should take two parameters: the name of the visited object and the object itself. If the function returns a value other than None then iteration stops and the value is returned.

Parameters:

func (callable func(name, object)) – The function to call

Return type:

returns the value returned by func

close()

Close the file. Only included for compatibility (there’s nothing to close.)

copy(source, dest, name=None, shallow=False, expand_soft=False)

Copy a RemoteGroup or RemoteDataset object to a writable h5py.File or h5py.Group.

Parameters:
  • source (RemoteGroup, RemoteDataset or str) – the object or path to copy

  • dest (h5py.File or h5py.Group) – a local HDF5 file or group to copy the object to

  • name (str) – name of the new object to create in dest

  • shallow (bool) – only copy immediate group members

  • expand_soft (bool) – follow soft links and copy linked objects

property filename

Return the full path to this remote file

Return type:

str

class hdfstream.RemoteGroup(connection, file_path, name, max_depth=5, data_size_limit=1048576, data=None, parent=None)

Bases: Mapping

This class represents a HDF5 group in a file on the server. To open a group, index the parent RemoteFile object. The class constructor documented here is used to implement lazy loading of HDF5 metadata and should not usually be called directly.

Indexing a RemoteGroup with a HDF5 object name yields a RemoteGroup or RemoteDataset object.

Parameters:
  • connection (hdfstream.connection.Connection) – connection object which stores http session information

  • file_path (str) – virtual path of the file containing the group

  • name (str) – name of the HDF5 group

  • max_depth (int, optional) – maximum recursion depth for group metadata requests

  • data_size_limit (int, optional) – max. dataset size (bytes) to be downloaded with metadata

  • data (dict, optional) – decoded msgpack data describing the group, defaults to None

  • parent (hdfstream.RemoteGroup, optional) – parent HDF5 group, defaults to None

property attrs

Return a dict containing this group’s HDF5 attributes

get(key, getlink=False, getval=True)

Return the object at the specified absolute or relative path.

Can be used to distinguish soft links if getlink=True.

If getval=False, the object itself is not returned. This is used to implement __contains__ without triggering lazy loading of objects. In this case the method either returns None or raises a KeyError.

Parameters:
  • key (str or Path) – path to the object

  • getlink (bool) – if True, returns a SoftLink or HardLink object

  • getval (bool) – if False, just check if the object exists

property parent

Return the parent group of this group

Return type:

hdfstream.RemoteGroup

visit(func)

Recursively call func on all members of this HDF5 group. The function should take a single parameter which is the name of the visited object. If the function returns a value other than None then iteration stops and the value is returned.

Parameters:

func (callable func(name)) – The function to call

Return type:

returns the value returned by func

visititems(func)

Recursively call func on all members of this HDF5 group. The function should take two parameters: the name of the visited object and the object itself. If the function returns a value other than None then iteration stops and the value is returned.

Parameters:

func (callable func(name, object)) – The function to call

Return type:

returns the value returned by func

close()

Close the group. Only included for compatibility (there’s nothing to close.)

copy(source, dest, name=None, shallow=False, expand_soft=False)

Copy a RemoteGroup or RemoteDataset object to a writable h5py.File or h5py.Group.

Parameters:
  • source (RemoteGroup, RemoteDataset or str) – the object or path to copy

  • dest (h5py.File or h5py.Group) – a local HDF5 file or group to copy the object to

  • name (str) – name of the new object to create in dest

  • shallow (bool) – only copy immediate group members

  • expand_soft (bool) – follow soft links and copy linked objects

class hdfstream.RemoteDataset(connection, file_path, name, data, parent)

Bases: object

This class represents a HDF5 dataset in a file on the server. To open a dataset, index the parent RemoteGroup or RemoteFile object. The class constructor documented here is used to implement lazy loading of HDF5 metadata and should not usually be called directly.

Indexing a RemoteDataset with numpy style slicing yields a numpy array with the dataset contents. Indexing with an integer or boolean array is supported, but only in the first dimension.

Parameters:
  • connection (hdfstream.connection.Connection) – connection object which stores http session information

  • file_path (str) – virtual path of the file containing the dataset

  • name (str) – name of the HDF5 dataset

  • data (dict, optional) – decoded msgpack data describing the dataset, defaults to None

  • parent (hdfstream.RemoteGroup, optional) – parent HDF5 group, defaults to None

Variables:
  • attrs (dict) – dict of HDF5 attribute values of the form {name : np.ndarray}

  • dtype (np.dtype) – data type for this dataset

  • shape (tuple of integers) – shape of this dataset

read_direct(array, source_sel=None, dest_sel=None)

Read data directly into a destination buffer. This can save time by preventing unnecessary copying of the data but only works for fixed length types (e.g. integer or floating point data).

Copies the data if the destination array does not have the same data type as the dataset.

Parameters:
  • array (np.ndarray) – output array which will receive the data

  • source_sel (slice or list of slices, optional) – selection in the source dataset as a numpy slice, defaults to None

  • dest_sel (slice or list of slices, optional) – selection in the output array as a numpy slice, defaults to None

close()

Close the dataset. Only included for compatibility (there’s nothing to close.)

request_slices(slices, dest=None)

Request a series of dataset slices from the server and return a single array with the slices concatenated along the first dimension. Slices may only differ in the first dimension, must be in ascending order of starting index in the first dimension, and must not overlap. Slices must have step=1. Example usage:

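import numpy as np

# dataset is a RemoteDataset with at least two dimensions
slices = []
slices.append(np.s_[0:10,:])     # rows 0 to 9
slices.append(np.s_[100:110,:])  # rows 100 to 109
result = dataset.request_slices(slices)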

If the optional dest parameter is used the result is written to dest. Otherwise a new np.ndarray is returned.

Parameters:
  • slices (list of tuples of slice objects) – list of multidimensional slices to read

  • dest (np.ndarray, optional) – destination buffer to write to, defaults to None

Return type:

np.ndarray or None
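The constraints on the slices (ascending order, no overlap, step 1, differing only in the first dimension) can be checked before sending a request. The sketch below builds a valid list from (start, stop) row ranges. The helper build_row_slices is hypothetical. It uses built-in slice objects, which are what np.s_ produces.

```python
def build_row_slices(row_ranges, ndim=2):
    """Turn (start, stop) row ranges into a list of slice tuples.

    Raises ValueError if the ranges are not in ascending order or overlap.
    All dimensions after the first are selected in full.
    """
    if ndim < 1:
        raise ValueError("ndim must be at least 1")
    slices = []
    previous_stop = 0
    for start, stop in row_ranges:
        if start < 0 or stop < start:
            raise ValueError("each range needs 0 <= start <= stop")
        if start < previous_stop:
            raise ValueError("row ranges must be ascending and must not overlap")
        # The first dimension selects rows with the default step of 1;
        # every remaining dimension is taken whole, like ':' in np.s_.
        slices.append((slice(start, stop),) + (slice(None),) * (ndim - 1))
        previous_stop = stop
    return slices
```

The returned list could then be passed as the slices argument, for example dataset.request_slices(build_row_slices([(0, 10), (100, 110)])).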

Bases: object

This class represents a soft link in a HDF5 file. It’s just a container for a single string with the link target path.

Parameters:

data (dict, optional) – decoded msgpack data describing the link

Bases: object

hdfstream.disable_progress(disable)

Control whether the progress bar is shown when downloading data.

Parameters:

disable (bool or None) – set True to never show the progress bar, False to always show, and None to show if stdout is a terminal

hdfstream.set_progress_delay(delay)

Set the delay in seconds before the progress bar is shown

Parameters:

delay (float) – time delay in seconds

class hdfstream.Config

Bases: object

Class to store module configuration info

add_alias(name, url, user=None, use_keyring=False)

Add a new alias for the specified URL

Parameters:
  • name (str) – name of the alias to create

  • url (str) – URL of the alias to create

  • user (str or None) – default username to use when connecting

  • use_keyring (bool) – whether to use the system keyring to store passwords

write(filename=None, mode='x')

Write this config object to a yaml file. Writes to config.yml in the user’s default configuration directory (supplied by the platformdirs module) if no filename is provided.

Parameters:
  • filename (str or None) – name of the file to write

  • mode (str) – mode used to open the file (usually ‘w’ or ‘x’)

read(filename=None)

Read the specified config file and update this Config object. Reads config.yml in the user’s default configuration directory (supplied by the platformdirs module) if no filename is provided.

Parameters:

filename (str or None) – name of the file to read

resolve_alias(name, user)

Given an alias, return the corresponding server URL, the user name to use, and a flag indicating if we should use the system keyring to access passwords. If the supplied name is not an alias then it is assumed to be a URL already and is returned unmodified. If a username is specified, it overrides any configured username.

Parameters:
  • name (str) – name of the alias to look up

  • user (str or None) – overrides configured user name if not None

Return type:

(str, str, bool)
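A short sketch of how the three values returned by resolve_alias() might be consumed. The helper describe_alias is hypothetical. It accepts any object with the documented resolve_alias(name, user) method, such as the object returned by hdfstream.get_config().

```python
def describe_alias(config, name, user=None):
    """Summarise how `name` would be resolved by `config`.

    `config` must provide resolve_alias(name, user) returning a
    (url, user, use_keyring) tuple, as documented for Config.
    """
    # Unpack the three documented return values.
    url, resolved_user, use_keyring = config.resolve_alias(name, user)
    login = resolved_user if resolved_user is not None else "no user name"
    keyring = "keyring" if use_keyring else "no keyring"
    return f"{name} -> {url} ({login}, {keyring})"
```

Passing a non-None user shows the override behaviour described above, since the supplied user name takes precedence over the configured one.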

hdfstream.get_config()

Return the active configuration object. Read the user’s config file if possible, otherwise write a new default config file.

Return type:

hdfstream.Config

hdfstream.set_config(config)

Set the active configuration object

Parameters:

config (hdfstream.Config) – the Config object to use

hdfstream.verify_cert(enable)

Enable or disable SSL certificate validation. Disabling validation should only be done for testing.

Parameters:

enable (bool) – whether to validate the server’s certificate

class hdfstream.testing.GzipMsgpackSerializer

Bases: object

Gzipped msgpack serializer for use with vcrpy and pytest-recording

serialize(cassette_dict)

Serialize a cassette dict to bytes

Parameters:

cassette_dict (dict) – dict containing the data to serialize

Returns:

serialized data

Return type:

bytes

deserialize(cassette_bytes)

Deserialize bytes to a cassette dict

Parameters:

cassette_bytes (bytes) – the data to deserialize

Returns:

a cassette dict

Return type:

dict

class hdfstream.testing.BinaryFilesystemPersister

Bases: object

A vcrpy persister which can write binary files

classmethod load_cassette(cassette_path, serializer)

Read and deserialize cassette data from a file path

Parameters:
  • cassette_path (Path) – path to the file to read

  • serializer (class) – serializer class to use

Returns:

a cassette dict

Return type:

dict

static save_cassette(cassette_path, cassette_dict, serializer)

Serialize cassette data and write it to a file

Parameters:
  • cassette_path (Path) – path to the file to write

  • cassette_dict (dict) – dict containing the data to write

  • serializer (class) – serializer class to use

hdfstream.testing.pytest_recording_configure(config, vcr)

This registers the vcrpy serializer and persister used to store responses from the server for use in unit tests. Should be imported in conftest.py when using pytest.

Parameters:
  • config (_pytest.config.Config) – pytest configuration object

  • vcr (vcr.config.VCR) – an instance of the VCR config object

hdfstream.testing.vcr_config()

Configure vcrpy to use the gzipped messagepack serializer. Should be imported in conftest.py when using pytest. Also strip out auth headers in case we accidentally record an authenticated request.

exception hdfstream.testing.KeyringNotAvailableError

Bases: Exception

class hdfstream.util.LocalOrRemoteFile

Bases: object

Mixin class used to open local or remote files.

This is intended to help with implementing classes which can read from local HDF5 files or from a hdfstream server. Classes which inherit from this should call LocalOrRemoteFile.set_directory() in their __init__ method to specify where files should be read from. Set the remote_dir parameter to None to read local HDF5 files. Set it to a hdfstream.RemoteDirectory instance to read from a remote server.

Class methods can then open files as follows:

with self.open_file(filename) as f:
  # read from file f here

If remote_dir was set to a remote directory, then the filename is taken to be relative to that directory.

set_directory(remote_dir=None)

Specify where to read files from.

Parameters:

remote_dir (hdfstream.RemoteDirectory or None) – The remote directory to read from, if any

open_direct(filename)

Open the specified file and return a file object.

Parameters:

filename (str) – The name of the file to open

Return type:

h5py.File or hdfstream.RemoteFile

open_file(filename)

Context manager used to open local or remote files.

Parameters:

filename (str) – Name of the file to open.

Yields:

A file object opened for reading.

Return type:

h5py.File or hdfstream.RemoteFile