.. _beta_amyloid_plaque_detection:

Beta-Amyloid Plaque Detection
#############################

Workflow Overview
*****************

This section pertains to mouse brain lightsheet data, in effort to quantify the locations and sizes of plaques
in a mouse brain lightsheet scan image.

Below are figures illustrating an overview of the training and prediction stages of the pipeline, ultimately
processing image volumes each to a list of centroids:

.. figure:: ../assets/image_to_nnunet_training.png

    training workflow, overview

.. figure:: ../assets/image_to_list_of_centroids.png

    prediction workflow, overview

Things to note:

1. n4 calculation is calculated at third downsample (i.e. (8, 16, 16) times downsampled compared to original)
level as the n4 bias correction is a smooth operation insensitive to small location change. It runs slowly
on large images so easier to work with using a smaller input.

2. The final step of prediction workflow involves 3 inputs: Original full resolution image, third downsampled
bias image, and second downsampled negative mask prediction. The image is first masked by upsampled negative
mask, then corrected by the upsampled bias image. Finally, the corrected image is used to compute the list
of centroids.

3. The goal of the training workflow is to obtain the model file; after which the model file may be uploaded
to cloud to be downloaded somewhere else where prediction workflow needs to be run. Unlike other operations
that can be done on cloud, nn-UNet predicts on local files and write to a local output location (though you
can write a wrapper that wires the input and output to cloud locations).

4. Rather than nearest neighbor, downsampling is done using max-pooling to avoid a shift in image origin
location after downsampling. For up-sampling, this is not an issue.

Python Implementation
*********************

For more information about nn-UNet related annotation, training and prediction, refer to the
`nn-UNet <GettingStarted/nnunet>`_ page.

Below draws example from the file :code:`cvpl_tools/examples/mousebrain_processing.py` to explain how various
steps in the above diagram is done via code. We start with an image on network in OME ZARR file and would
like to obtain a final counting of list of centroids of where plaques are at.

- Downsampling

.. code-block:: Python

    print(f'first downsample: from path {subject.OME_ZARR_PATH}')
    first_downsample = lightsheet_preprocess.downsample(
        subject.OME_ZARR_PATH, reduce_fn=np.max, ndownsample_level=(1, 2, 2), ba_channel=subject.BA_CHANNEL,
        write_loc=subject.FIRST_DOWNSAMPLE_PATH
    )
    print(f'first downsample done. result is of shape {first_downsample.shape}')

    second_downsample = lightsheet_preprocess.downsample(
        first_downsample, reduce_fn=np.max, ndownsample_level=(1,) * 3,
        write_loc=subject.SECOND_DOWNSAMPLE_PATH
    )

Here, :code:`lightsheet_preprocess.downsample` is a function that outputs a dask array, and also writes to
local disk at :code:`write_loc`. This location can be either local or on a gcs cloud bucket. Writing to cloud
allow us to switch coiled cloud computing if needed, as the file location is not bound to our local laptop.

BA_CHANNEL stands for beta-amyloid channel, which is an integer selecting the channel of the image we want
to process. the downsample function takes this parameter for 4d images, but can skip for 3d images.

- n4

.. code-block:: Python

    import cvpl_tools.nnunet.n4 as n4

    third_downsample_bias = await n4.obtain_bias(third_downsample, write_loc=subject.THIRD_DOWNSAMPLE_BIAS_PATH)
    print('third downsample bias done.')

    print(f'im.shape={second_downsample.shape}, bias.shape={third_downsample_bias.shape}; applying bias over image to obtain corrected image...')
    second_downsample_bias = dask_ndinterp.scale_nearest(third_downsample_bias, scale=(2, 2, 2),
                                                         output_shape=second_downsample.shape, output_chunks=(4, 4096, 4096)).persist()

    second_downsample_corr = lightsheet_preprocess.apply_bias(second_downsample, (1,) * 3, second_downsample_bias, (1,) * 3)
    asyncio.run(ome_io.write_ome_zarr_image(subject.SECOND_DOWNSAMPLE_CORR_PATH, da_arr=second_downsample_corr, MAX_LAYER=1))
    print('second downsample corrected image done')

:code:`coiled_run` can be used instead if local laptop can not install :code:`antspyx` library, as follows:

.. code-block:: Python

    async def compute_bias(dask_worker):
        third_downsample = ome_io.load_dask_array_from_path(subject.THIRD_DOWNSAMPLE_PATH, mode='r', level=0)
        await n4.obtain_bias(third_downsample, write_loc=subject.THIRD_DOWNSAMPLE_BIAS_PATH)
    if not RDirFileSystem(subject.THIRD_DOWNSAMPLE_BIAS_PATH).exists(''):
        cvpl_nnunet_api.coiled_run(fn=compute_bias, nworkers=1, local_testing=False)
    third_downsample_bias = ome_io.load_dask_array_from_path(subject.THIRD_DOWNSAMPLE_BIAS_PATH, mode='r', level=0)

- upscale

.. code-block:: Python

    second_downsample_bias = dask_ndinterp.scale_nearest(third_downsample_bias, scale=(2, 2, 2),
                                                         output_shape=second_downsample.shape, output_chunks=(4, 4096, 4096)).persist()

This step takes apparently redundant :code:`output_shape` parameter alongside scale and the pre-scaled image. However, downscale and
upscale often has to deal with integer division issues, while the scale is whole number, the larger image size may
not be a perfect multiple of the size of the image to be scaled. :code:`output_shape` adjusts for this by crop or pad the
resulting image with zeros.

- uploading bias and negative mask

.. code-block:: Python

    if not RDirFileSystem(subject.GCS_NEG_MASK_TGT).exists(''):
        cvpl_nnunet_api.upload_negmask(
            subject.NNUNET_OUTPUT_TIFF_PATH,
            subject.GCS_NEG_MASK_TGT,
            subject.THIRD_DOWNSAMPLE_BIAS_PATH,
            f'{subject.SUBJECT_FOLDER}/.temp',
            subject.GCS_BIAS_PATH
        )

This is an intermediate step not drawn in the diagram. It simply uploads the bias image and nn-UNet output
negative mask to cloud, since those are computed locally in :code:`mousebrain_processing.py`

- computing list of centroids

.. code-block:: Python

    ppm_to_im_upscale = (4, 8, 8)
    async def fn(dask_worker):
        await cvpl_nnunet_api.mousebrain_forward(
            dask_worker=dask_worker,
            CACHE_DIR_PATH=subject.COILED_CACHE_DIR_PATH,
            ORIG_IM_PATH=subject.OME_ZARR_PATH,
            NEG_MASK_PATH=subject.GCS_NEG_MASK_TGT,
            GCS_BIAS_PATH=subject.GCS_BIAS_PATH,
            BA_CHANNEL=subject.BA_CHANNEL,
            MAX_THRESHOLD=subject.MAX_THRESHOLD,
            ppm_to_im_upscale=ppm_to_im_upscale
        )
    cvpl_nnunet_api.coiled_run(fn=fn, nworkers=10, local_testing=False)

:code:`ppm_to_im_upscale` is the scale from negative mask image to the size of the original image. This is
the final step of the prediction workflow, where we will obtain our list of centroids from gcs after running
this. In a local test (without coiled, but still using gcs), we may set :code:`local_testing=True`

Here, the MAX_THRESHOLD variable specifies the threshold above which objects will be counted as plaque. This
threshold is applied over corrected ome zarr image based on the original image and bias image provided.

After running the above code, a numpy file will be written to :code:`f'{subject.COILED_CACHE_DIR_PATH}/final_lc.npy'`,
which contains the results in a N * 5 matrix of type np.float64. First three columns are the Z, Y, X locations of
the plaques, and the fourth and fifth column are sizes (in voxels) and a unique ID for the plaque. To retrieve:

.. code-block:: Python

    from cvpl_tools.fsspec import RDirFileSystem
    cdir_fs = RDirFileSystem(subject.COILED_CACHE_DIR_PATH)
    with cdir_fs.open('final_lc.npy', mode='rb') as fd:
        lc = np.load(cdir_fs)
    print(f'First 10 rows of lc:\n')
    print(lc[:10])

Alternatively, use :code:`cvpl_tools.fsspec.copyfile` function to obtain the .npy file locally.