cvpl_tools/im/ndblock.py

View source at ndblock.py.

APIs

class cvpl_tools.im.ndblock.NDBlock(arr: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Array | NDBlock | None)

This class represents an N-dimensional grid, where each block in the grid is an ndarray of arbitrary shape

When the grid is of size 1 in all axes, it represents a Numpy array; when the grid has blocks of matching size on all axes, it represents a Dask array; in the general case the blocks can be of varying sizes, e.g. a block of size (2, 2) may neighbor a block of size (5, 10)

Currently, we assume block_index is a list ordered as (0, …, 0), (0, …, 1), …, (N, …, M-1), increasing from the last axis first
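
As a minimal construction sketch (the shapes and variable names here are illustrative, not part of the API):

    import numpy as np
    import dask.array as da
    from cvpl_tools.im.ndblock import NDBlock

    np_block = NDBlock(np.zeros((4, 4), dtype=np.float32))  # grid of size 1 in all axes
    da_block = NDBlock(da.zeros((8, 8), chunks=(4, 4)))     # 2x2 grid of matching-size blocks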

as_dask_array(storage_options: dict | None = None) Array

Get a copy of the array value as a Dask array

Parameters:

storage_options – Optionally, specify a compression format to use when the array is persisted

Returns:

converted/retrieved dask array
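
For example (whether storage_options here accepts the same ‘compressor’ key documented for save()/load() below is an assumption):

    darr = da_block.as_dask_array()
    # hypothetical: darr = da_block.as_dask_array(storage_options={'compressor': None})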

get_chunksize() tuple[int, ...]

Get a single tuple of chunk size on each axis

Returns:

A tuple of int of length ndim
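
Continuing the construction sketch above:

    da_block.get_chunksize()  # (4, 4), matching the chunks given at construction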

is_numpy() bool

Returns True if this is a Numpy array

Note that besides ReprFormat.NUMPY, the ReprFormat.DICT_BLOCK_INDEX_SLICES format may hold either Numpy arrays or Dask delayed objects (each returning a Numpy array) as its blocks; in the former case is_numpy() returns True, and in the latter case it returns False.

Returns:

True if this is a Numpy array
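
A short illustration using the constructors above:

    NDBlock(np.zeros((2, 2))).is_numpy()  # True
    NDBlock(da.zeros((2, 2))).is_numpy()  # False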

static load(file: str | RDirFileSystem, storage_options: dict | None = None) NDBlock

Load the NDBlock from the given path.

Parameters:
  • file – The path to load from, same as used in the save() function

  • storage_options – Specifies options for the saving method and the saved file format; this includes ‘compressor’ (numcodecs.abc.Codec, optional): the compressor used to compress the array or its chunks

Returns:

The loaded NDBlock. Guaranteed to have the same properties as the saved one, and the content of each block will be the same as when they are saved.
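
A save/load round trip might look as follows (a sketch; the directory name 'ndblock_dir' is illustrative):

    import asyncio

    async def roundtrip(ndb: NDBlock) -> NDBlock:
        await NDBlock.save('ndblock_dir', ndb, storage_options={'compressor': None})
        return NDBlock.load('ndblock_dir', storage_options={'compressor': None})

    # loaded = asyncio.run(roundtrip(da_block))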

async static map_ndblocks(inputs: Sequence[NDBlock], fn: Callable, out_dtype: dtype, use_input_index_as_arrloc: int = 0, new_slices: list = None, fn_args: dict = None) NDBlock

Similar to da.map_blocks, but works with NDBlock.

Unlike dask array’s map_blocks, for each input i the block_info[i] provided to the mapping function will contain only two keys: ‘chunk-location’ and ‘array-location’.

Parameters:
  • inputs – A list of inputs to be mapped, either all Dask or all Numpy; all inputs must have the same number of blocks and the same block indices, and must cover the same slices if the inputs are Dask images

  • fn – fn(*block, block_info) maps input blocks to an output block

  • out_dtype – Output block type (provide a Numpy dtype to this)

  • use_input_index_as_arrloc – The output slices_list will be copied from the input at this index (ignore this for variable-sized blocks)

  • new_slices – will be used to replace the slices attribute if specified

  • fn_args – extra arguments to be passed to the mapping function

Returns:

Result mapped, of format ReprFormat.DICT_BLOCK_INDEX_SLICES
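
A sketch of a mapping call (whether block_info is passed positionally or by keyword is an assumption; add_one and run_map are illustrative names):

    import numpy as np

    def add_one(block, block_info=None):
        # block_info[0] carries 'chunk-location' and 'array-location' for input 0
        return block + 1

    async def run_map(ndb: NDBlock) -> NDBlock:
        return await NDBlock.map_ndblocks([ndb], add_one, out_dtype=np.dtype(np.float32))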

property ndim: int

Returns: Integer indicating the dimensionality of the image contained in the NDBlock object

persist(compressor=None) NDBlock

Uses the dask client’s persist() to save and reload the NDBlock object

Parameters:

compressor – The compressor to use when persisting

Returns:

Reloaded NDBlock object; if the array is Numpy, no saving is done and the object is returned directly
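
For example (the numcodecs compressor shown is a hypothetical choice):

    persisted = da_block.persist()
    # with compression, hypothetically:
    # import numcodecs
    # persisted = da_block.persist(compressor=numcodecs.Blosc())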

async reduce(force_numpy: bool = False) ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Array

Concatenate all blocks on the first axis

Parameters:

force_numpy – If True, the result will be forced from Dask to a Numpy array if it is not already one; useful for outputs feeding into analyses that require Numpy input

Returns:

The concatenated result; it is Numpy if the original array is Numpy or if force_numpy is True
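
reduce() is async, so it must be awaited; a sketch:

    async def gather(ndb: NDBlock):
        return await ndb.reduce(force_numpy=True)  # blocks concatenated along axis 0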

async static save(file: str | RDirFileSystem, ndblock: NDBlock, storage_options: dict | None = None)

Save the NDBlock to the given path

Will compute immediately if the ndblock contains delayed Dask computations

Storage Options
multiscale (int) = 0

This only applies if ndblock is Dask; a multi-level image will be written if non-zero

compressor = None

The compressor to use to compress the array or its chunks

Parameters:
  • file – The file to save to

  • ndblock – The block which will be saved

  • storage_options – Specifies options in saving method and saved file format
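
A hedged invocation showing the two documented storage option keys (the output path here is illustrative):

    async def write_out(ndb: NDBlock) -> None:
        await NDBlock.save(
            'output_dir', ndb,
            storage_options={'multiscale': 0, 'compressor': None},
        )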

select_columns(cols: slice | Sequence[int] | int) NDBlock

Performs column selection on a 2d array
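
For instance, on a 2d NDBlock (ndblock2d is an assumed variable, e.g. holding per-block coordinate tables):

    xy = ndblock2d.select_columns([0, 1])  # keep columns 0 and 1 of each block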

sum(axis: Sequence | None = None, keepdims: bool = False)

Sum over the given axes for each block
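
A sketch, summing within each block:

    per_block = da_block.sum(axis=(0,), keepdims=True)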

to_dask_array(storage_options: dict | None = None)

Convert representation format to dask array

Parameters:

storage_options – Optionally, specify a compression format to use when the array is persisted

to_dict_block_index_slices()

Convert representation format to ReprFormat.DICT_BLOCK_INDEX_SLICES

async to_numpy_array()

Convert representation format to numpy array
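
The three conversion methods together, as a sketch (these convert the NDBlock’s representation format, and to_numpy_array() is async):

    async def to_numpy(ndb: NDBlock) -> None:
        ndb.to_dict_block_index_slices()  # ReprFormat.DICT_BLOCK_INDEX_SLICES
        ndb.to_dask_array()               # ReprFormat.DASK_ARRAY
        await ndb.to_numpy_array()        # ReprFormat.NUMPY_ARRAY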

class cvpl_tools.im.ndblock.ReprFormat(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

ReprFormat specifies all possible NDBlock formats to use.

NUMPY_ARRAY = 0
DASK_ARRAY = 1
DICT_BLOCK_INDEX_SLICES = 2  # blocks can be either Numpy or Dask