API Documentation
- hdfstream.open(server, name, user=None, password=None, max_depth=5, data_size_limit=1048576)
Connect to the server and return a RemoteDirectory or RemoteFile corresponding to the specified virtual path. If a user name is specified with no password, prompt for the password.
- Parameters:
server (str) – URL of the server to connect to
name (str) – path to the virtual file or directory on the server
user (str, optional) – name of the user account for login, defaults to None
password (str, optional) – password for login, defaults to None
max_depth (int, optional) – maximum recursion depth for group metadata requests
data_size_limit (int, optional) – max. dataset size (bytes) to download with metadata
- Returns:
RemoteFile or RemoteDirectory corresponding to the requested path
- Return type:
RemoteFile or RemoteDirectory
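A minimal usage sketch follows. The server URL and virtual paths are placeholders, not real endpoints, and `read_leading_elements` is a hypothetical helper rather than part of hdfstream. Because RemoteFile and RemoteGroup are mappings, the helper relies only on indexing, so a plain dict can stand in for a remote file when testing it.

```python
def read_leading_elements(remote_file, dataset_name, n):
    """Return the first n elements of a dataset in a file opened with hdfstream.open().

    remote_file can be any mapping whose values support slicing, such as a
    hdfstream.RemoteFile (hypothetical helper, not part of hdfstream).
    """
    dataset = remote_file[dataset_name]
    # Slicing a RemoteDataset returns a numpy array with the requested elements.
    return dataset[:n]


if __name__ == "__main__":
    import hdfstream

    # Placeholder server URL and virtual paths: substitute your own.
    remote_file = hdfstream.open("https://example.org/hdfstream", "/path/to/file.hdf5")
    print(read_leading_elements(remote_file, "/path/to/dataset", 10))
```

If `user` is passed to hdfstream.open() without a password, the password is prompted for interactively.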
- class hdfstream.RemoteDirectory(server, name='/', user=None, password=None, data=None, max_depth=5, data_size_limit=1048576, lazy_load=False, connection=None)
Bases:
Mapping
This class represents a virtual directory on the server. To open a remote directory, call hdfstream.open() with the required path or index the parent RemoteDirectory with a relative path. The class constructor documented here is used to implement lazy loading of directory information and should not usually be called directly.
Indexing a RemoteDirectory with a relative path yields another RemoteDirectory or a RemoteFile.
- Parameters:
server (str) – URL of the server
name (str) – virtual path of the directory to open, defaults to “/”
user (str, optional) – name of the user account for login, defaults to None
password (str, optional) – password for login, defaults to None
data (dict, optional) – decoded msgpack data describing the directory, defaults to None
max_depth (int, optional) – maximum recursion depth for group metadata requests
data_size_limit (int, optional) – max. dataset size (bytes) to be downloaded with metadata
lazy_load (bool, optional) – directory listing is requested immediately if False, or delayed until needed if True
connection (hdfstream.connection.Connection) – connection object which stores http session information
- property files
Return a {name : RemoteFile} dict of files in this directory
- Return type:
dict
- property directories
Return a {name : RemoteDirectory} dict of sub-directories in this directory
- Return type:
dict
- property size
Return the size of this directory’s contents in bytes
- Return type:
int
- property filename
Return the full path to this remote directory
- Return type:
str
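The files and directories properties are enough to walk a directory tree. The sketch below is a hypothetical helper (`list_files` is not part of hdfstream) that assumes only those two documented properties, each mapping names to RemoteFile or RemoteDirectory objects.

```python
def list_files(directory, prefix=""):
    """Recursively list file paths below a RemoteDirectory-like object.

    Uses only the ``files`` and ``directories`` properties, which map names
    to RemoteFile and RemoteDirectory objects respectively.
    """
    paths = [prefix + name for name in sorted(directory.files)]
    for name, subdir in sorted(directory.directories.items()):
        # Each sub-directory is itself a RemoteDirectory, so recurse into it.
        paths.extend(list_files(subdir, prefix + name + "/"))
    return paths
```

When directory information is loaded lazily, each recursion step may trigger a further listing request to the server, so walking a large tree can be slow.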
- File(filename, mode='r')
Open the file at the specified path relative to this directory. The mode parameter is present for compatibility with h5py. Only mode=”r” is accepted.
- Parameters:
filename (str) – path of the file to open
mode (str) – mode to open the file, defaults to “r”
- Return type:
RemoteFile
- is_hdf5(filename)
Return True if the specified file is a HDF5 file, False otherwise
- Parameters:
filename (str) – name of the file to check
- Return type:
bool
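is_hdf5() and File() can be combined to open only the HDF5 files in a directory. This is a sketch with a hypothetical helper name (`open_hdf5_files`); it assumes the directory object exposes the files property, is_hdf5() and File() exactly as documented above.

```python
def open_hdf5_files(directory):
    """Return {name: RemoteFile} for the HDF5 files directly inside a RemoteDirectory."""
    opened = {}
    for name in sorted(directory.files):
        if directory.is_hdf5(name):
            # Only mode="r" is accepted; it is passed explicitly for clarity.
            opened[name] = directory.File(name, mode="r")
    return opened
```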
- class hdfstream.RemoteFile(connection, file_path, max_depth=5, data_size_limit=1048576, data=None)
Bases:
Mapping
This class represents a file on the server. To open a remote file, call hdfstream.open() with the full virtual path or index the parent RemoteDirectory object. The class constructor documented here is used to implement lazy loading of file metadata and should not usually be called directly.
Indexing a RemoteFile with a HDF5 object name will yield a RemoteGroup or RemoteDataset object, if the file is a HDF5 file.
- Parameters:
connection (hdfstream.connection.Connection) – connection object which stores http session information
file_path (str) – virtual path of the file
max_depth (int, optional) – maximum recursion depth for group metadata requests
data_size_limit (int, optional) – max. dataset size (bytes) to be downloaded with metadata
data (dict, optional) – decoded msgpack data describing the file, defaults to None
- property root
Return a RemoteGroup corresponding to this file’s HDF5 root group
- Return type:
RemoteGroup
- open(mode='r')
Return a file-like object with the contents of the file. This can be used to access non-HDF5 files.
- Parameters:
mode (str) – open the file in binary (‘rb’) or text (‘r’) mode
- Return type:
requests.Response.raw
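A sketch of reading a non-HDF5 file as raw bytes. The helper name `read_raw_bytes` and its `max_bytes` argument are hypothetical; the sketch assumes only the documented is_hdf5() and open() methods and that the returned object provides read() and close(), as requests.Response.raw does.

```python
def read_raw_bytes(remote_file, max_bytes=None):
    """Read the contents of a non-HDF5 RemoteFile as bytes.

    max_bytes (hypothetical convenience argument) limits how much is read.
    """
    if remote_file.is_hdf5():
        raise ValueError("index the file to access HDF5 objects instead")
    stream = remote_file.open(mode="rb")
    try:
        return stream.read() if max_bytes is None else stream.read(max_bytes)
    finally:
        # Release the underlying HTTP connection even if reading fails.
        stream.close()
```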
- get(key, getlink=False)
Return the object at the specified path in the HDF5 file.
- Parameters:
key (str) – path to the object
getlink (bool) – if True, returns a SoftLink or HardLink object
- is_hdf5()
Return True if this is a HDF5 file, False otherwise
- Return type:
bool
- property parent
For RemoteFile objects, the parent property returns the root HDF5 group
- Return type:
RemoteGroup
- visit(func)
Recursively call func on all HDF5 objects in the file. The function should take a single parameter which is the name of the visited object. If the function returns a value other than None then iteration stops and the value is returned.
- Parameters:
func (callable func(name)) – The function to call
- Return type:
returns the value returned by func
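Returning a value other than None from the callback stops the traversal early. The sketch below (a hypothetical helper, `find_first_named`) uses this to locate the first object whose name ends with a given suffix; it assumes only the visit() behaviour described above.

```python
def find_first_named(remote_file, suffix):
    """Return the name of the first HDF5 object whose name ends with suffix, or None."""

    def callback(name):
        if name.endswith(suffix):
            return name  # a non-None return value stops the traversal
        return None  # None means "keep visiting"

    return remote_file.visit(callback)
```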
- visititems(func)
Recursively call func on all HDF5 objects in the file. The function should take two parameters: the name of the visited object and the object itself. If the function returns a value other than None then iteration stops and the value is returned.
- Parameters:
func (callable func(name, object)) – The function to call
- Return type:
returns the value returned by func
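visititems() passes each object as well as its name, which makes it easy to collect per-dataset metadata. This sketch (hypothetical helper `dataset_shapes`) assumes that only dataset objects expose a shape attribute, as documented for RemoteDataset.

```python
def dataset_shapes(remote_file):
    """Return {name: shape} for every dataset found by visititems()."""
    shapes = {}

    def callback(name, obj):
        # Assumption: only RemoteDataset objects expose a shape attribute.
        shape = getattr(obj, "shape", None)
        if shape is not None:
            shapes[name] = tuple(shape)
        # Falling through returns None, so visititems() keeps going.

    remote_file.visititems(callback)
    return shapes
```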
- close()
Close the file. Only included for compatibility (there’s nothing to close).
- copy(source, dest, name=None, shallow=False, expand_soft=False)
Copy a RemoteGroup or RemoteDataset object to a writable h5py.File or h5py.Group.
- Parameters:
source (RemoteGroup, RemoteDataset or str) – the object or path to copy
dest (h5py.File or h5py.Group) – a local HDF5 file or group to copy the object to
name (str) – name of the new object to create in dest
shallow (bool) – only copy immediate group members
expand_soft (bool) – follow soft links and copy linked objects
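A sketch of copying several remote objects into a local HDF5 file. `copy_to_local` is a hypothetical helper; the `__main__` part assumes h5py is installed and uses placeholder URLs and paths. Each object keeps the last component of its path as its name in the destination.

```python
def copy_to_local(remote, dest, paths, shallow=False):
    """Copy remote HDF5 objects (given by path) into a writable h5py.File or h5py.Group."""
    for path in paths:
        # Keep the final path component as the name of the new object.
        name = path.rstrip("/").split("/")[-1]
        remote.copy(path, dest, name=name, shallow=shallow)


if __name__ == "__main__":
    import h5py
    import hdfstream

    # Placeholder server URL and paths: substitute your own.
    remote_file = hdfstream.open("https://example.org/hdfstream", "/path/to/file.hdf5")
    with h5py.File("local_copy.hdf5", "w") as local:
        copy_to_local(remote_file, local, ["/path/to/group"])
```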
- property filename
Return the full path to this remote file
- Return type:
str
- class hdfstream.RemoteGroup(connection, file_path, name, max_depth=5, data_size_limit=1048576, data=None, parent=None)
Bases:
Mapping
This class represents a HDF5 group in a file on the server. To open a group, index the parent RemoteFile object. The class constructor documented here is used to implement lazy loading of HDF5 metadata and should not usually be called directly.
Indexing a RemoteGroup with a HDF5 object name yields a RemoteGroup or RemoteDataset object.
- Parameters:
connection (hdfstream.connection.Connection) – connection object which stores http session information
file_path (str) – virtual path of the file containing the group
name (str) – name of the HDF5 group
max_depth (int, optional) – maximum recursion depth for group metadata requests
data_size_limit (int, optional) – max. dataset size (bytes) to be downloaded with metadata
data (dict, optional) – decoded msgpack data describing the group, defaults to None
parent (hdfstream.RemoteGroup, optional) – parent HDF5 group, defaults to None
- property attrs
Return a dict containing this group’s HDF5 attributes
- get(key, getlink=False, getval=True)
Return the object at the specified absolute or relative path.
Can be used to distinguish soft links if getlink=True.
If getval=False, the object itself is not returned. This is used to implement __contains__ without triggering lazy loading of objects. In this case the method either returns None or raises a KeyError.
- Parameters:
key (str or Path) – path to the object
getlink (bool) – if True, returns a SoftLink or HardLink object
getval (bool) – if False, only check whether the object exists
- property parent
Return the parent group of this group
- Return type:
RemoteGroup
- visit(func)
Recursively call func on all members of this HDF5 group. The function should take a single parameter which is the name of the visited object. If the function returns a value other than None then iteration stops and the value is returned.
- Parameters:
func (callable func(name)) – The function to call
- Return type:
returns the value returned by func
- visititems(func)
Recursively call func on all members of this HDF5 group. The function should take two parameters: the name of the visited object and the object itself. If the function returns a value other than None then iteration stops and the value is returned.
- Parameters:
func (callable func(name, object)) – The function to call
- Return type:
returns the value returned by func
- close()
Close the group. Only included for compatibility (there’s nothing to close).
- copy(source, dest, name=None, shallow=False, expand_soft=False)
Copy a RemoteGroup or RemoteDataset object to a writable h5py.File or h5py.Group.
- Parameters:
source (RemoteGroup, RemoteDataset or str) – the object or path to copy
dest (h5py.File or h5py.Group) – a local HDF5 file or group to copy the object to
name (str) – name of the new object to create in dest
shallow (bool) – only copy immediate group members
expand_soft (bool) – follow soft links and copy linked objects
- class hdfstream.RemoteDataset(connection, file_path, name, data, parent)
Bases:
object
This class represents a HDF5 dataset in a file on the server. To open a dataset, index the parent RemoteGroup or RemoteFile object. The class constructor documented here is used to implement lazy loading of HDF5 metadata and should not usually be called directly.
Indexing a RemoteDataset with numpy style slicing yields a numpy array with the dataset contents. Indexing with an integer or boolean array is supported, but only in the first dimension.
- Parameters:
connection (hdfstream.connection.Connection) – connection object which stores http session information
file_path (str) – virtual path of the file containing the dataset
name (str) – name of the HDF5 dataset
data (dict, optional) – decoded msgpack data describing the dataset, defaults to None
parent (hdfstream.RemoteGroup, optional) – parent HDF5 group, defaults to None
- Variables:
attrs (dict) – dict of HDF5 attribute values of the form {name : np.ndarray}
dtype (np.dtype) – data type for this dataset
shape (tuple of integers) – shape of this dataset
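Boolean indexing in the first dimension can be used to fetch only the rows that pass a cut. This sketch uses a hypothetical helper, `select_rows`; a numpy array stands in for the dataset because both support `shape` and first-dimension boolean indexing.

```python
import numpy as np


def select_rows(dataset, values, threshold):
    """Return the rows of dataset whose matching entry in values exceeds threshold.

    Boolean indexing of a RemoteDataset is only supported in the first
    dimension, so the mask must be 1D with one entry per row.
    """
    mask = np.asarray(values) > threshold
    if mask.ndim != 1 or mask.shape[0] != dataset.shape[0]:
        raise ValueError("mask must be 1D with one entry per row")
    return dataset[mask]
```

In practice `values` might itself be read from another dataset in the same file, so the cut is evaluated locally and only the selected rows are requested.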
- read_direct(array, source_sel=None, dest_sel=None)
Read data directly into a destination buffer. This can save time by avoiding unnecessary copying of the data, but only works for fixed-length types (e.g. integer or floating point data).
Copies the data if the destination array does not have the same data type as the dataset.
- Parameters:
array (np.ndarray) – output array which will receive the data
source_sel (slice or list of slices, optional) – selection in the source dataset as a numpy slice, defaults to None
dest_sel (slice or list of slices, optional) – selection in the output array as a numpy slice, defaults to None
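A sketch of filling part of a preallocated buffer with read_direct(). The helper name `read_block_into` is hypothetical; the sketch assumes single numpy slices are accepted for source_sel and dest_sel, as the parameter descriptions state. Allocating the buffer with the dataset's own dtype (for example `np.empty(n, dtype=dataset.dtype)`) avoids the extra copy mentioned above.

```python
import numpy as np


def read_block_into(dataset, out, start, stop, offset=0):
    """Read dataset[start:stop] into out[offset:offset + (stop - start)]."""
    n = stop - start
    dataset.read_direct(
        out,
        source_sel=np.s_[start:stop],         # rows to read from the dataset
        dest_sel=np.s_[offset:offset + n],    # where they land in the buffer
    )
    return out
```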
- close()
Close the dataset. Only included for compatibility (there’s nothing to close).
- request_slices(slices, dest=None)
Request a series of dataset slices from the server and return a single array with the slices concatenated along the first dimension. Slices may only differ in the first dimension, must be in ascending order of starting index in the first dimension, and must not overlap. Slices must have step=1. Example usage:
    slices = []
    slices.append(np.s_[0:10,:])
    slices.append(np.s_[100:110,:])
    result = dataset.request_slices(slices)
If the optional dest parameter is used the result is written to dest. Otherwise a new np.ndarray is returned.
- Parameters:
slices (list of tuples of slice objects) – list of multidimensional slices to read
dest (np.ndarray, optional) – destination buffer to write to, defaults to None
- Return type:
np.ndarray or None
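To make the constraints on request_slices() concrete, here is a local reference sketch. `concatenate_slices` is a hypothetical name, not part of hdfstream: it checks the documented rules (step 1, ascending and non-overlapping in the first dimension, identical selections in the other dimensions) on an in-memory numpy array and builds the array that request_slices() is documented to return.

```python
import numpy as np


def concatenate_slices(array, slices):
    """Validate slices like request_slices() and concatenate array[s] along axis 0."""
    previous_stop = None
    trailing = None
    parts = []
    for key in slices:
        key = key if isinstance(key, tuple) else (key,)
        first, rest = key[0], key[1:]
        if any(isinstance(s, slice) and s.step not in (None, 1) for s in key):
            raise ValueError("slices must have step=1")
        if not isinstance(first, slice):
            raise ValueError("the first dimension must be selected with a slice")
        start, stop, _ = first.indices(array.shape[0])
        if previous_stop is not None and start < previous_stop:
            raise ValueError("slices must be ascending and must not overlap")
        if trailing is not None and rest != trailing:
            raise ValueError("slices may only differ in the first dimension")
        previous_stop, trailing = stop, rest
        parts.append(array[key])
    if not parts:
        raise ValueError("at least one slice is required")
    return np.concatenate(parts, axis=0)
```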
- class hdfstream.SoftLink(data)
Bases:
object
This class represents a soft link in a HDF5 file. It’s just a container for a single string with the link target path.
- Parameters:
data (dict, optional) – decoded msgpack data describing the link
- class hdfstream.HardLink
Bases:
object
- hdfstream.disable_progress(disable)
Disable the progress bar when downloading data.
- Parameters:
disable (bool or None) – set True to never show the progress bar, False to always show, and None to show if stdout is a terminal
- hdfstream.set_progress_delay(delay)
Set the delay in seconds before the progress bar is shown
- Parameters:
delay (float) – time delay in seconds
- class hdfstream.Config
Bases:
object
Class to store module configuration info
- add_alias(name, url, user=None, use_keyring=False)
Add a new alias for the specified URL
- Parameters:
name (str) – name of the alias to create
url (str) – URL of the alias to create
user (str or None) – default username to use when connecting
use_keyring (bool) – whether to use the system keyring to store passwords
- write(filename=None, mode='x')
Write this config object to a yaml file. Writes to config.yml in the user’s default configuration directory (supplied by the platformdirs module) if no filename is provided.
- Parameters:
filename (str or None) – name of the file to write
mode (str) – mode used to open the file (usually ‘w’ or ‘x’)
- read(filename=None)
Read the specified config file and update this Config object. Reads config.yml in the user’s default configuration directory (supplied by the platformdirs module) if no filename is provided.
- Parameters:
filename (str or None) – name of the file to read
- resolve_alias(name, user)
Given an alias, return the corresponding server URL, the user name to use, and a flag indicating if we should use the system keyring to access passwords. If the supplied name is not an alias then it is assumed to be a URL already and is returned unmodified. If a username is specified, it overrides any configured username.
- Parameters:
name (str) – name of the alias to look up
user (str or None) – overrides configured user name if not None
- Return type:
(str, str, bool)
- hdfstream.get_config()
Return the active configuration object. Read the user’s config file if possible, otherwise write a new default config file.
- Return type:
hdfstream.Config
- hdfstream.set_config(config)
Set the active configuration object
- Parameters:
config (hdfstream.Config) – the Config object to use
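A sketch of registering a server alias and checking how it resolves. `register_alias` is a hypothetical helper; the alias name, URL and user name in the `__main__` part are placeholders. Only the documented add_alias(), resolve_alias(), write(), get_config() and set_config() calls are used.

```python
def register_alias(config, alias, url, user=None):
    """Add an alias to a hdfstream.Config and return its (url, user, use_keyring) tuple."""
    config.add_alias(alias, url, user=user, use_keyring=False)
    # Passing user=None keeps the configured default user name.
    return config.resolve_alias(alias, None)


if __name__ == "__main__":
    import hdfstream

    config = hdfstream.get_config()
    # Placeholder alias, URL and user name, for illustration only.
    print(register_alias(config, "myserver", "https://example.org/hdfstream", user="alice"))
    hdfstream.set_config(config)
    # mode="w" overwrites config.yml in the user's default configuration directory.
    config.write(mode="w")
```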
- hdfstream.verify_cert(enable)
Enable or disable SSL certificate validation. Disabling validation should only be used for testing.
- Parameters:
enable (bool) – whether to validate the server’s certificate
- class hdfstream.testing.GzipMsgpackSerializer
Bases:
object
Gzipped msgpack serializer for use with vcrpy and pytest-recording
- serialize(cassette_dict)
Serialize a cassette dict to bytes
- Parameters:
cassette_dict (dict) – dict containing the data to serialize
- Returns:
serialized data
- Return type:
bytes
- deserialize(cassette_bytes)
Deserialize bytes to a cassette dict
- Parameters:
cassette_bytes (bytes) – the data to deserialize
- Returns:
a cassette dict
- Return type:
dict
- class hdfstream.testing.BinaryFilesystemPersister
Bases:
object
A vcrpy persister which can write binary files
- classmethod load_cassette(cassette_path, serializer)
Read and deserialize cassette data from a file path
- Parameters:
cassette_path (Path) – path to the file to read
serializer (class) – serializer class to use
- Returns:
a cassette dict
- Return type:
dict
- static save_cassette(cassette_path, cassette_dict, serializer)
Serialize cassette data and write it to a file
- Parameters:
cassette_path (Path) – path to the file to write
cassette_dict (dict) – dict containing the data to write
serializer (class) – serializer class to use
- hdfstream.testing.pytest_recording_configure(config, vcr)
This registers the vcrpy serializer and persister used to store responses from the server for use in unit tests. Should be imported in conftest.py when using pytest.
- Parameters:
config (_pytest.config.Config) – pytest configuration object
vcr (vcr.config.VCR) – an instance of the VCR config object
- hdfstream.testing.vcr_config()
Configure vcrpy to use the gzipped messagepack serializer. Should be imported in conftest.py when using pytest. Also strip out auth headers in case we accidentally record an authenticated request.
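A minimal conftest.py fragment, assuming pytest and pytest-recording are installed. It only performs the imports described above; tests that should replay recorded server responses are then marked with pytest-recording's `@pytest.mark.vcr` marker.

```python
# conftest.py
# Import both helpers here, as described above, so that pytest and
# pytest-recording can find them when the test session starts.
from hdfstream.testing import pytest_recording_configure, vcr_config  # noqa: F401
```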
- exception hdfstream.testing.KeyringNotAvailableError
Bases:
Exception
- class hdfstream.util.LocalOrRemoteFile
Bases:
object
Mixin class used to open local or remote files.
This is intended to help with implementing classes which can read from local HDF5 files or from a hdfstream server. Classes which inherit from this should call LocalOrRemoteFile.set_directory() in their __init__ method to specify where files should be read from. Set the remote_dir parameter to None to read local HDF5 files. Set it to a hdfstream.RemoteDirectory instance to read from a remote server. Class methods can then open files as follows:
    with self.open_file(filename) as f:
        # read from file f here
        ...
If remote_dir was set to a remote directory, then the filename is taken to be relative to that directory.
- set_directory(remote_dir=None)
Specify where to read files from.
- Parameters:
remote_dir (hdfstream.RemoteDirectory or None) – The remote directory to read from, if any
- open_direct(filename)
Open the specified file and return a file object.
- Parameters:
filename (str) – The name of the file to open
- Return type:
h5py.File or hdfstream.RemoteFile
- open_file(filename)
Context manager used to open local or remote files.
- Parameters:
filename (str) – Name of the file to open.
- Yields:
A file object opened for reading.
- Return type:
h5py.File or hdfstream.RemoteFile