Arrow
dora-rs uses the Apache Arrow data format to transport messages.
When you receive data, the value is an Arrow array. In Python, you can convert it as follows:
import numpy as np
import pandas as pd
import pyarrow as pa
## ...
arrow_array = dora_event["value"]
values = arrow_array.to_pylist()  # named `values` to avoid shadowing the built-in `list`
numpy_array = arrow_array.to_numpy()  # read-only, zero-copy
pandas_series = arrow_array.to_pandas()
send_output("topic", arrow_array)
Note
In Arrow, everything is an array. Even if you only want to pass a single scalar, you must wrap it in a list.
import pyarrow as pa
import numpy as np
import pandas as pd
# List message
array = pa.array([1, 2, 3])
assert array.to_pylist() == [1, 2, 3], "Did not convert to the Same list"
# String message
array = pa.array(["Hello World"])
assert array.to_pylist() == ["Hello World"], "Did not convert to the Same list"
# Dictionary / struct message
array = pa.array([{"a": 1, "b": 2, "c":[1, 2, 3]}])
assert array.to_pylist() == [{"a": 1, "b": 2, "c":[1, 2, 3]}], "Did not convert to the Same list"
# NumPy array
array = np.array([1, 2, 3])
pyarrow_array = pa.array(array)
assert (pyarrow_array.to_numpy() == array).all(), "Did not convert to the Same list"
# Pandas Series
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
pyarrow_array = pa.array(df["col1"])
assert (pyarrow_array.to_pandas() == df["col1"]).all(), "Did not convert to the Same list"