CUDA 0-Copy IPC
So let's say you have a PyTorch tensor on CUDA and you want to share it between nodes.
The good news is that you can do so without copying the data, using CUDA 0-Copy IPC.
Installation
To use this feature, make sure the following requirements are installed:
# Install pyarrow with GPU support
conda install pyarrow "arrow-cpp-proc=*=cuda" -c conda-forge
## Test installation with
python -c "import pyarrow.cuda"
# Install numba for translation from arrow to torch
pip install numba
## Test installation with
python -c "import numba.cuda"
# Install torch if it's not already present
pip install torch
## Test installation with
python -c "import torch; assert torch.cuda.is_available()"
Sending data
To create an IPC buffer that can be sent between processes, do the following:
import torch
from dora import Node
from dora.cuda import torch_to_ipc_buffer

node = Node()
torch_tensor = torch.tensor([1, 2, 3], dtype=torch.int64, device="cuda")
# Export the CUDA tensor as an IPC buffer plus the metadata needed to rebuild it
ipc_buffer, metadata = torch_to_ipc_buffer(torch_tensor)
node.send_output("latency", ipc_buffer, metadata)
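In practice the sender usually lives inside the node's event loop. The sketch below is an illustration only: the "tick" input and the dataflow wiring are assumptions, not part of the API above.
import torch
from dora import Node
from dora.cuda import torch_to_ipc_buffer

node = Node()
for event in node:
    # Hypothetical "tick" input, used here only to drive the loop
    if event["type"] == "INPUT" and event["id"] == "tick":
        torch_tensor = torch.rand(1000, device="cuda")
        ipc_buffer, metadata = torch_to_ipc_buffer(torch_tensor)
        # "latency" must be declared as an output of this node in the dataflow
        node.send_output("latency", ipc_buffer, metadata)
Note that with CUDA IPC the exporting process generally has to keep the underlying allocation alive while the receiver is still using it, so check how your dataflow manages tensor lifetime.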
Receiving data
import pyarrow as pa
import pyarrow.cuda  # pa.cuda is a submodule and must be imported explicitly
from dora import Node
from dora.cuda import ipc_buffer_to_ipc_handle, cudabuffer_to_torch

ctx = pa.cuda.Context()
node = Node()

event = node.next()  # Get an event carrying the IPC buffer and its metadata
ipc_handle = ipc_buffer_to_ipc_handle(event["value"])
cudabuffer = ctx.open_ipc_buffer(ipc_handle)  # Map the sender's GPU memory, no copy
torch_tensor = cudabuffer_to_torch(cudabuffer, event["metadata"])  # still on CUDA
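As on the sending side, the receiver normally sits inside the node's event loop. A minimal sketch, assuming the input is wired under the id "latency" in the dataflow:
import pyarrow as pa
import pyarrow.cuda
from dora import Node
from dora.cuda import ipc_buffer_to_ipc_handle, cudabuffer_to_torch

ctx = pa.cuda.Context()
node = Node()
for event in node:
    if event["type"] == "INPUT" and event["id"] == "latency":
        ipc_handle = ipc_buffer_to_ipc_handle(event["value"])
        cudabuffer = ctx.open_ipc_buffer(ipc_handle)
        torch_tensor = cudabuffer_to_torch(cudabuffer, event["metadata"])
        # The tensor is already on the GPU: work with it here, on device
        print(torch_tensor.sum().item())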