Input/Output

Overview

Our goal with movement is to enable pipelines that are input-agnostic, meaning they are not tied to a specific motion tracking tool or data format. Therefore, our input/output functions are designed to facilitate data flows between various third-party formats and movement’s own native data structure based on xarray.

It may be useful to think of movement supporting two types of data loading/saving:

  • Supported third-party formats. movement provides convenient functions for loading/saving data in formats written by popular motion tracking tools as well as established data specifications. You can think of these as “Import” and “Export/Save As” functions.

  • Native saving and loading with netCDF. movement leverages xarray’s built-in netCDF support to save and load datasets while preserving all variables and metadata. This is the recommended way to save your analysis state, allowing your movement-powered workflows to resume exactly where they left off.

You are also welcome to try movement by loading some sample data included with the package.

Supported third-party formats

movement supports the analysis of trajectories of keypoints (pose tracks) and of bounding box centroids (bounding box tracks), which are represented as movement datasets and can be loaded from and saved to various third-party formats.

| Source Software | Abbreviation | Source Format | Dataset Type | Supported Operations |
|---|---|---|---|---|
| DeepLabCut | DLC | DLC-style .h5 or .csv file, or corresponding pandas DataFrame | Pose | Load & Save |
| SLEAP | SLEAP | analysis .h5 or .slp file | Pose | Load & Save |
| LightningPose | LP | DLC-style .csv file, or corresponding pandas DataFrame | Pose | Load & Save |
| Anipose | | triangulation .csv file, or corresponding pandas DataFrame | Pose | Load |
| VGG Image Annotator | VIA | .csv file for tracks annotation | Bounding box | Load & Save |
| Neurodata Without Borders | NWB | .nwb file or NWBFile object with the ndx-pose extension | Pose | Load & Save |
| Any | | NumPy arrays | Pose or Bounding box | Load & Save* |

*Exporting any movement DataArray to a NumPy array is as simple as calling xarray’s built-in xarray.DataArray.to_numpy() method, so no specialised “Export/Save As” function is needed. See xarray’s documentation for more details.
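As a minimal sketch of this export, the snippet below builds a stand-in DataArray and converts it, and a selection of it, to NumPy arrays. The dimension names, the zero-filled values, and the labels (Alice, snout) are illustrative assumptions standing in for the position variable of a real movement poses dataset.

```python
import numpy as np
import xarray as xr

# Stand-in for ds.position; dimension names are assumed to match
# those of a movement poses dataset
position = xr.DataArray(
    np.zeros((100, 2, 3, 2)),
    dims=("time", "space", "keypoints", "individuals"),
    coords={
        "space": ["x", "y"],
        "keypoints": ["snout", "centre", "tail_base"],
        "individuals": ["Alice", "Bob"],
    },
    name="position",
)

# Export the whole array: labels are dropped, axis order is kept
position_np = position.to_numpy()

# Export only Alice's snout trajectory as a (time, space) array
snout_np = position.sel(individuals="Alice", keypoints="snout").to_numpy()

print(position_np.shape, snout_np.shape)  # (100, 2, 3, 2) (100, 2)
```

Selecting by label before calling to_numpy() avoids having to remember which axis index corresponds to which dimension.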

Note

Currently, movement only works with tracked data: either keypoints or bounding boxes whose identities are known from one frame to the next, across consecutive frames. For pose estimation, this means it only supports the predictions output by the supported software packages listed above. Loading manually labelled data—often defined over a non-continuous set of frames—is not currently supported.

Below, we explain how to load pose and bounding box tracks from these supported formats, as well as how to save them back to some of these formats.

Loading pose tracks

The pose tracks loading functionalities are provided by the movement.io.load_poses module, which can be imported as follows:

from movement.io import load_poses

To read a pose tracks file into a movement poses dataset, we provide specific functions for each of the supported formats. We additionally provide a more general from_numpy() function, with which we can build a movement poses dataset from a set of NumPy arrays.

To load DeepLabCut files in .h5 format:

ds = load_poses.from_dlc_file("/path/to/file.h5", fps=30)

# or equivalently
ds = load_poses.from_file(
    "/path/to/file.h5", source_software="DeepLabCut", fps=30
)

To load DeepLabCut files in .csv format:

ds = load_poses.from_dlc_file("/path/to/file.csv", fps=30)

You can also directly load any pandas DataFrame df that’s formatted in the DeepLabCut style:

ds = load_poses.from_dlc_style_df(df, fps=30)

To load SLEAP analysis files in .h5 format (recommended):

ds = load_poses.from_sleap_file("/path/to/file.analysis.h5", fps=30)

# or equivalently
ds = load_poses.from_file(
    "/path/to/file.analysis.h5", source_software="SLEAP", fps=30
)

To load SLEAP files in .slp format (experimental, see notes in movement.io.load_poses.from_sleap_file()):

ds = load_poses.from_sleap_file("/path/to/file.predictions.slp", fps=30)

To load LightningPose files in .csv format:

ds = load_poses.from_lp_file("/path/to/file.analysis.csv", fps=30)

# or equivalently
ds = load_poses.from_file(
    "/path/to/file.analysis.csv", source_software="LightningPose", fps=30
)

Because LightningPose follows the DeepLabCut dataframe format, you can also directly load an appropriately formatted pandas DataFrame df:

ds = load_poses.from_dlc_style_df(df, fps=30, source_software="LightningPose")

To load Anipose files in .csv format:

ds = load_poses.from_anipose_file(
    "/path/to/file.analysis.csv", fps=30, individual_name="id_0"
)  # Optionally specify the individual name; defaults to "id_0"

# or equivalently
ds = load_poses.from_file(
    "/path/to/file.analysis.csv",
    source_software="Anipose",
    fps=30,
    individual_name="id_0",
)

You can also directly load any pandas DataFrame df that’s formatted in the Anipose triangulation style:

ds = load_poses.from_anipose_style_df(
    df, fps=30, individual_name="id_0"
)

To load NWB files in .nwb format:

ds = load_poses.from_nwb_file(
    "path/to/file.nwb",
    # Optional: name of the ProcessingModule to load
    processing_module_key="behavior",
    # Optional: name of the PoseEstimation object to load
    pose_estimation_key="PoseEstimation",
)

# or equivalently
ds = load_poses.from_file(
    "path/to/file.nwb",
    source_software="NWB",
    processing_module_key="behavior",
    pose_estimation_key="PoseEstimation",
)

The above functions also accept an NWBFile object as input:

import pynwb

with pynwb.NWBHDF5IO("path/to/file.nwb", mode="r") as io:
    nwb_file = io.read()
    ds = load_poses.from_nwb_file(
        nwb_file, pose_estimation_key="PoseEstimation"
    )

In the example below, we use from_numpy() to create random position data for two individuals, Alice and Bob, with three keypoints each: snout, centre, and tail_base. These keypoints are tracked in 2D space for 100 frames, at 30 fps. The confidence scores are set to 1 for all points.

import numpy as np

rng = np.random.default_rng(seed=42)
ds = load_poses.from_numpy(
    position_array=rng.random((100, 2, 3, 2)),
    confidence_array=np.ones((100, 3, 2)),
    individual_names=["Alice", "Bob"],
    keypoint_names=["snout", "centre", "tail_base"],
    fps=30,
)

The resulting poses data structure ds will include the predicted trajectories for each individual and keypoint, as well as the associated point-wise confidence values reported by the pose estimation software.
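Before calling from_numpy(), it can help to confirm that the input arrays agree with each other. The sketch below is not part of movement. It reuses the arrays from the example above and assumes the axis order implied by their shapes: (n_frames, n_space, n_keypoints, n_individuals) for positions and (n_frames, n_keypoints, n_individuals) for confidence.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
position_array = rng.random((100, 2, 3, 2))
confidence_array = np.ones((100, 3, 2))
individual_names = ["Alice", "Bob"]
keypoint_names = ["snout", "centre", "tail_base"]
fps = 30

# Axis order assumed from the example above
n_frames, n_space, n_keypoints, n_individuals = position_array.shape

# Each check compares one input against the position array's shape
checks = {
    "confidence matches position": confidence_array.shape
    == (n_frames, n_keypoints, n_individuals),
    "one name per individual": len(individual_names) == n_individuals,
    "one name per keypoint": len(keypoint_names) == n_keypoints,
    "space is 2D or 3D": n_space in (2, 3),
}
for description, passed in checks.items():
    print(f"{description}: {passed}")

# Assumption: with fps given, frame i corresponds to i / fps seconds
duration_s = (n_frames - 1) / fps
```

A mismatch here, such as a confidence array that still carries a space axis, is easier to diagnose before the arrays are wrapped into a dataset.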

For more information on the poses data structure, see the movement datasets page.

Loading bounding box tracks

To load bounding box tracks into a movement bounding boxes dataset, we need the functions from the movement.io.load_bboxes module, which can be imported as follows:

from movement.io import load_bboxes

We currently support loading bounding box tracks in the VGG Image Annotator (VIA) format only. However, as with poses datasets, we additionally provide a from_numpy() function, with which we can build a movement bounding boxes dataset from a set of NumPy arrays.

To load a VIA tracks .csv file:

ds = load_bboxes.from_via_tracks_file("path/to/file.csv", fps=30)

# or equivalently
ds = load_bboxes.from_file(
    "path/to/file.csv",
    source_software="VIA-tracks",
    fps=30,
)

Bounding boxes format

Note that the x,y coordinates in the input VIA tracks .csv file represent the top-left corner of each bounding box. In contrast, the position array of the corresponding movement dataset ds holds the centroid of each bounding box.
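The relationship between the two conventions can be sketched with plain NumPy: the centroid is the top-left corner shifted by half the box width and half the box height. The coordinates below are made-up values in pixels, and the snippet only illustrates the arithmetic, not how movement performs the conversion internally.

```python
import numpy as np

# Hypothetical top-left corners (x, y) and sizes (width, height) of three boxes
top_left_xy = np.array([[10.0, 20.0], [0.0, 0.0], [100.0, 50.0]])
width_height = np.array([[40.0, 30.0], [40.0, 30.0], [20.0, 10.0]])

# Centroid = top-left corner + half of the box size along each axis
centroid_xy = top_left_xy + width_height / 2

print(centroid_xy)
# [[ 30.  35.]
#  [ 20.  15.]
#  [110.  55.]]
```

The inverse, top_left_xy = centroid_xy - width_height / 2, recovers the corner coordinates.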

In the example below, we create random position data for two bounding boxes, id_0 and id_1, both with the same width (40 pixels) and height (30 pixels). These are tracked in 2D space for 100 frames, which will be numbered in the resulting dataset from 0 to 99. The confidence score for all bounding boxes is set to 0.5.

import numpy as np

rng = np.random.default_rng(seed=42)
ds = load_bboxes.from_numpy(
    position_array=rng.random((100, 2, 2)),
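    # axes are (n_frames, n_space, n_individuals), so the (width, height)
    # values must vary along axis 1: a (2, 1) array broadcasts that way
    shape_array=np.ones((100, 2, 2)) * np.array([[40], [30]]),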
    confidence_array=np.ones((100, 2)) * 0.5,
    individual_names=["id_0", "id_1"]
)

The resulting data structure ds will include the centroid trajectories for each tracked bounding box, the boxes’ widths and heights, and their associated confidence values if provided.

For more information on the bounding boxes data structure, see the movement datasets page.

Saving pose tracks

To export movement poses datasets to any of the supported third-party formats, we’ll need functions from the movement.io.save_poses module:

from movement.io import save_poses

Depending on the desired format, use one of the following functions:

To save as a DeepLabCut file, in .h5 or .csv format:

save_poses.to_dlc_file(ds, "/path/to/file.h5")  # preferred format
save_poses.to_dlc_file(ds, "/path/to/file.csv")

The movement.io.save_poses.to_dlc_file() function also accepts a split_individuals boolean argument. If set to True, the function will save the data as separate single-animal DeepLabCut-style files.

To save as a SLEAP analysis file in .h5 format:

save_poses.to_sleap_analysis_file(ds, "/path/to/file.h5")

When saving to SLEAP-style files, only track_names, node_names, tracks, track_occupancy, and point_scores are saved. labels_path will only be saved if the source file of the dataset is a SLEAP .slp file. Otherwise, it will be an empty string. Other attributes and data variables (i.e., instance_scores, tracking_scores, edge_names, edge_inds, video_path, video_ind, and provenance) are not currently supported. To learn more about what each attribute and data variable represents, see the SLEAP documentation.

To save as a LightningPose file in .csv format:

save_poses.to_lp_file(ds, "/path/to/file.csv")

Because LightningPose follows the single-animal DeepLabCut .csv format, the above command is equivalent to:

save_poses.to_dlc_file(ds, "/path/to/file.csv", split_individuals=True)

To convert a movement poses dataset to NWBFile objects:

nwb_files = save_poses.to_nwb_file(ds)

To allow adding additional data to NWB files before saving, to_nwb_file does not write to disk directly. Instead, it returns a list of NWBFile objects—one per individual in the dataset—since NWB files are designed to represent data from a single individual.

The to_nwb_file function also accepts an NWBFileSaveConfig object as its config argument for customising metadata such as session or subject information in the resulting NWBFiles (see the API reference for examples).

These NWBFile objects can then be saved to disk as .nwb files using pynwb.NWBHDF5IO:

from pynwb import NWBHDF5IO

for file in nwb_files:
    with NWBHDF5IO(f"{file.identifier}.nwb", "w") as io:
        io.write(file)

Saving bounding box tracks

We currently support exporting a movement bounding boxes dataset as a VIA tracks .csv file, so that you can visualise and correct your bounding box tracks with the VGG Image Annotator (VIA-2) software. Alternatively, you can save the bounding box tracks to a .csv file with a custom header using the csv module from the Python standard library.

To export your bounding boxes dataset ds, you will need to import the movement.io.save_bboxes module:

from movement.io import save_bboxes

Then you can save it as a VIA tracks .csv file:

save_bboxes.to_via_tracks_file(ds, "/path/to/output/file.csv")

By default, the movement.io.save_bboxes.to_via_tracks_file() function will try to derive the track IDs from the trailing numbers in the individuals’ names. Alternatively, you can set track_ids_from_trailing_numbers=False to assign the track IDs sequentially (0, 1, 2, …) based on the alphabetically sorted list of individuals.
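To illustrate the difference between the two strategies, here is a rough sketch in plain Python. The helper track_ids_from_names is hypothetical and is not movement's implementation. In particular, how the real function behaves when a name has no trailing number is not covered here, and raising an error is an assumption of this sketch.

```python
import re


def track_ids_from_names(names, from_trailing_numbers=True):
    """Map individual names to integer track IDs (hypothetical helper)."""
    if not from_trailing_numbers:
        # Sequential IDs based on the alphabetically sorted names
        return {name: i for i, name in enumerate(sorted(names))}

    ids = {}
    for name in names:
        match = re.search(r"(\d+)$", name)  # digits at the end of the name
        if match is None:
            # Assumption of this sketch: fail loudly if no number is found
            raise ValueError(f"No trailing number in name: {name!r}")
        ids[name] = int(match.group(1))
    return ids


names = ["id_2", "id_10"]
print(track_ids_from_names(names))
# {'id_2': 2, 'id_10': 10}
print(track_ids_from_names(names, from_trailing_numbers=False))
# {'id_10': 0, 'id_2': 1}
```

Note that alphabetical sorting places "id_10" before "id_2", so sequential IDs do not necessarily follow the numeric order of the names.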

Below is an example of how you can export a movement bounding boxes dataset as a .csv file with a custom header:

import csv

# define name for output csv file
filepath = "tracking_output.csv"

# open the csv file in write mode
with open(filepath, mode="w", newline="") as file:
    writer = csv.writer(file)

    # write the header
    writer.writerow(["frame_idx", "bbox_ID", "x", "y", "width", "height", "confidence"])

    # write the data
    for individual in ds.individuals.data:
        for frame in ds.time.data:
            x, y = ds.position.sel(time=frame, individuals=individual).data
            width, height = ds.shape.sel(time=frame, individuals=individual).data
            confidence = ds.confidence.sel(time=frame, individuals=individual).data
            writer.writerow([frame, individual, x, y, width, height, confidence])

Using pandas

If you prefer to work with pandas, you can alternatively convert the movement bounding boxes dataset to a pandas DataFrame with the xarray.DataArray.to_dataframe() method, wrangle the dataframe as required, and then apply the pandas.DataFrame.to_csv() method to save the data as a .csv file.
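A minimal sketch of this pandas route is shown below. It uses a small stand-in xarray.Dataset with random values rather than a dataset loaded by movement, and the dimension names (time, space, individuals) are assumptions chosen to mirror a bounding boxes dataset.

```python
import numpy as np
import xarray as xr

# Stand-in for a movement bounding boxes dataset (assumed dimension names)
rng = np.random.default_rng(seed=42)
n_frames = 5
ds = xr.Dataset(
    data_vars={
        "position": (
            ("time", "space", "individuals"),
            rng.random((n_frames, 2, 2)),
        ),
        "confidence": (("time", "individuals"), np.full((n_frames, 2), 0.5)),
    },
    coords={
        "time": np.arange(n_frames),
        "space": ["x", "y"],
        "individuals": ["id_0", "id_1"],
    },
)

# Long format: one row per (time, space, individuals) combination
position_long = ds["position"].to_dataframe()

# Wide format: one row per (time, individuals), with x and y as columns
table = position_long["position"].unstack("space")
table["confidence"] = ds["confidence"].to_dataframe()["confidence"]
table = table.reset_index()

# Pass a file path to to_csv() to write to disk instead of returning a string
csv_text = table.to_csv(index=False)
print(csv_text.splitlines()[0])  # time,individuals,x,y,confidence
```

Other variables, such as the box widths and heights, can be added as extra columns in the same way before saving.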

Native saving and loading with netCDF

Because movement datasets are xarray.Dataset objects, we can rely on xarray’s built-in support for the netCDF file format.

netCDF is a binary file format for self-described datasets that originated in the geosciences, and netCDF files on disk directly correspond to xarray.Dataset objects.

Saving to netCDF is the recommended way to preserve the complete state of your analysis, including all variables, coordinates, and attributes.

To save any xarray dataset ds to a netCDF file:

ds.to_netcdf("/path/to/my_data.nc")

To load the dataset back:

import xarray as xr

ds = xr.open_dataset("/path/to/my_data.nc")

Similarly, an xarray.DataArray object (e.g. the position variable of a movement dataset) can be saved to disk using the to_netcdf() method, and loaded from disk using the xarray.open_dataarray() function. As netCDF files correspond to Dataset objects, these functions internally convert the DataArray to a Dataset before saving, and then convert back when loading.

Note

xarray also supports compression and chunking options with netCDF, which can be useful for managing large datasets. For more details, see the xarray documentation on netCDF.

Below is an example of how you may integrate netCDF into your movement-powered workflows:

from movement.io import load_poses
from movement.filtering import rolling_filter
from movement.kinematics import compute_speed

ds = load_poses.from_file(
    "path/to/my_data.h5", source_software="DeepLabCut", fps=30
)

# Apply a rolling median filter to smooth the position data
ds["position_smooth"] = rolling_filter(
    ds["position"], window=5, statistic="median"
)

# Compute speed based on the smoothed position data
ds["speed"] = compute_speed(ds["position_smooth"])

# Save the dataset to a netCDF file
# This includes the original position and confidence data,
# the smoothed position, and the computed speed
ds.to_netcdf("my_data_processed.nc")

Sample data

movement includes some sample data files that you can use to try the package out. These files contain pose and bounding box tracks from various supported third-party formats.

You can list the available sample data files using:

from movement import sample_data

file_names = sample_data.list_datasets()
print(*file_names, sep='\n')  # print each sample file in a separate line

Each sample file is prefixed with the name (or abbreviation) of the software package that was used to generate it.
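Because of this naming convention, the file names can be grouped by source software with a few lines of plain Python. In the sketch below, file_names is a hard-coded list standing in for the output of list_datasets(). Only the SLEAP name appears in this guide, and the other two names are hypothetical.

```python
from collections import defaultdict

# Stand-in for sample_data.list_datasets(); the DLC and VIA names are made up
file_names = [
    "SLEAP_three-mice_Aeon_proofread.analysis.h5",
    "DLC_hypothetical-example.predictions.h5",
    "VIA_hypothetical-example.csv",
]

# Group file names by the prefix that precedes the first underscore
by_software = defaultdict(list)
for name in file_names:
    prefix, _, _ = name.partition("_")
    by_software[prefix].append(name)

for prefix, names in sorted(by_software.items()):
    print(f"{prefix}: {len(names)} file(s)")
```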

To load one of the sample files as a movement dataset, use the fetch_dataset function:

filename = "SLEAP_three-mice_Aeon_proofread.analysis.h5"
ds = sample_data.fetch_dataset(filename)

Some sample datasets also have an associated video file (the video for which the data was predicted). You can request to download the sample video by setting with_video=True:

ds = sample_data.fetch_dataset(filename, with_video=True)

If available, the video file is downloaded and its path is stored in the video_path attribute of the dataset (i.e., ds.video_path). This attribute will not be set if no video is available for this dataset, or if you did not request it.

Some datasets also include a sample frame file, which is a single still frame extracted from the video. This can be useful for visualisation (e.g., as a background image for plotting trajectories). If available, this file is always downloaded when fetching the dataset, and its path is stored in the frame_path attribute (i.e., ds.frame_path). If no frame file is available for the dataset, the frame_path attribute will not be set.

Under the hood

When you import the sample_data module with from movement import sample_data, movement downloads a small metadata file to your local machine with information about the latest sample datasets available. Then, the first time you call the fetch_dataset() function, movement downloads the requested file to your machine and caches it in the ~/.movement/data directory. On subsequent calls, the data are directly loaded from this local cache.
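If you want to see what has been cached so far, you can inspect that directory with the standard library. The sketch below only assumes the cache location stated above. The layout of files inside the directory is not specified here, so the snippet simply lists every file it finds.

```python
from pathlib import Path

# Cache location described above
cache_dir = Path.home() / ".movement" / "data"

# Collect cached files, if the directory exists yet
if cache_dir.is_dir():
    cached_files = sorted(
        str(path.relative_to(cache_dir))
        for path in cache_dir.rglob("*")
        if path.is_file()
    )
else:
    cached_files = []

print(f"{len(cached_files)} cached file(s) in {cache_dir}")
for name in cached_files:
    print(" ", name)
```

An empty list simply means that no sample data has been fetched on this machine yet.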