Serialization API
Class ComplexDataSerializer supports serialization of complex data types that contain lambdas, functions, member functions and large data arrays. In particular, the class supports out-of-band data serialization for libraries compatible with pickle protocol 5, namely pandas and NumPy (specifically the pandas.DataFrame, pandas.Series and np.ndarray types).
Use this class to serialize compound objects whose element types are not known in advance and which may contain large arrays.
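To make "out-of-band" concrete, the following is a minimal sketch that uses only the standard pickle module, not unidist itself. It assumes a plain bytearray wrapped in pickle.PickleBuffer as a stand-in for the memory of a DataFrame or ndarray; the names payload, header and buffers are illustrative only.

```python
import pickle

# Stand-in for a large array: one megabyte of raw data (illustrative only).
payload = bytearray(b"x" * 1_000_000)

# With protocol 5, a buffer_callback that returns None (list.append does)
# tells pickle to keep the buffer out of the pickle stream.
buffers = []
header = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=buffers.append,
)

# The pickle stream itself stays tiny; the data is carried separately.
header_size = len(header)

# The receiver must supply the same buffers, in the same order.
restored = pickle.loads(header, buffers=buffers)
restored_bytes = bytes(restored)
```

The large buffer never enters the pickle stream, so a transport layer is free to send it separately from the small header.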
Class SimpleDataSerializer supports serialization of simple data types.
The class also provides an API for serializing lambdas, functions and member functions (it uses the cloudpickle library for that purpose), but without performance optimizations for large raw data. Use this class when the exact type of the object to serialize is known and the object does not contain large datasets.
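The reason a separate cloudpickle path is needed for lambdas can be sketched as follows. This is an illustration rather than unidist code: the lambda square is hypothetical, and the cloudpickle part runs only if that library happens to be installed.

```python
import pickle

square = lambda x: x * x  # hypothetical function to ship to another process

# Plain pickle stores functions by reference (module + qualified name),
# and a lambda has no importable name, so this attempt fails.
try:
    pickle.dumps(square)
    plain_pickle_ok = True
except (pickle.PicklingError, AttributeError):
    plain_pickle_ok = False

# cloudpickle serializes the function by value instead.
try:
    import cloudpickle
except ImportError:  # assumption: cloudpickle is optional for this sketch
    cloudpickle = None

restored = None
if cloudpickle is not None:
    # Output of cloudpickle.dumps can be read back with pickle.loads.
    restored = pickle.loads(cloudpickle.dumps(square))
```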
API
- class unidist.core.backends.mpi.core.serialization.ComplexDataSerializer(buffers=None, buffer_count=None)
Class for data serialization/de-serialization for MPI communication.
- Parameters:
buffers (list, default: None) – A list of PickleBuffer objects for data decoding.
buffer_count (list, default: None) – List of the number of buffers for each object to be serialized/deserialized using the pickle 5 protocol.
Notes
Uses a combination of the msgpack, cloudpickle and pickle libraries. Msgpack makes it possible to serialize/deserialize the internal objects of a container separately but send them as one object. For example, for an array of pandas DataFrames, each DataFrame is serialized separately using pickle 5, and all buffers are stored in one array to be sent together. To deserialize the data, buffer_count is used; it records the number of buffers for each internal object.
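The bookkeeping described in these notes can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the class's actual implementation: a plain list of pickle streams stands in for the msgpack container, the helpers serialize_all and deserialize_all are hypothetical, and dicts holding pickle.PickleBuffer objects stand in for DataFrames.

```python
import pickle


def serialize_all(objects):
    """Pickle each object separately and pool all out-of-band buffers."""
    payloads = []      # in-band pickle stream of each object
    buffers = []       # out-of-band buffers of all objects, in order
    buffer_count = []  # number of buffers each object contributed
    for obj in objects:
        before = len(buffers)
        payloads.append(
            pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
        )
        buffer_count.append(len(buffers) - before)
    return payloads, buffers, buffer_count


def deserialize_all(payloads, buffers, buffer_count):
    """Hand each pickle stream exactly the slice of buffers it produced."""
    objects = []
    start = 0
    for payload, count in zip(payloads, buffer_count):
        chunk = buffers[start:start + count]
        objects.append(pickle.loads(payload, buffers=chunk))
        start += count
    return objects


# Stand-ins for objects with large data: each PickleBuffer is one buffer.
first = {
    "a": pickle.PickleBuffer(bytearray(b"aaaa")),
    "b": pickle.PickleBuffer(bytearray(b"bb")),
}
second = {"c": pickle.PickleBuffer(bytearray(b"cccccc"))}
third = {"label": "no large data"}

payloads, buffers, buffer_count = serialize_all([first, second, third])
restored = deserialize_all(payloads, buffers, buffer_count)
```

Without buffer_count the receiver would see one flat list of buffers and could not tell where one object's buffers end and the next object's begin.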
- _buffer_callback(pickle_buffer)
Callback for pickle protocol 5 out-of-band data buffers collection.
- Parameters:
pickle_buffer (pickle.PickleBuffer) – Pickle library buffer wrapper.
- _cpkl_encode(obj)
Encode with cloudpickle library.
- Parameters:
obj (object) – Python object.
- Returns:
Dictionary with array of serialized bytes.
- Return type:
dict
- _dataframe_encode(frame)
Encode with pickle library using protocol 5.
- Parameters:
frame (object) – Pickle 5 serializable object (e.g. pandas DataFrame or NumPy array).
- Returns:
Dictionary with array of serialized bytes.
- Return type:
dict
- _decode_custom(obj)
De-serialization hook for msgpack library.
It decodes complex data types that the library cannot handle.
- Parameters:
obj (object) – Python object.
- _encode_custom(obj)
Serialization hook for msgpack library.
It encodes complex data types that the library cannot handle.
- Parameters:
obj (object) – Python object.
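The encode/decode hook pattern can be illustrated with a standard-library analogue. Note the substitution: this sketch uses json in place of msgpack, because json exposes the same two kinds of hooks (default for encoding, object_hook for decoding). The functions encode_custom and decode_custom and the "__pickled__" tag are hypothetical and are not taken from unidist.

```python
import base64
import json
import pickle


def encode_custom(obj):
    """Encoding hook: called only for objects the library cannot encode."""
    raw = pickle.dumps(obj)
    # Wrap the pickled bytes in a tagged dict the decoder can recognize.
    return {"__pickled__": base64.b64encode(raw).decode("ascii")}


def decode_custom(obj):
    """Decoding hook: called for every decoded dict; unwrap tagged ones."""
    if "__pickled__" in obj:
        return pickle.loads(base64.b64decode(obj["__pickled__"]))
    return obj


# complex numbers and sets are not natively supported by json,
# so they are routed through the hooks; everything else is untouched.
data = {"name": "example", "value": complex(1, 2), "items": [1, 2, {3, 4}]}

encoded = json.dumps(data, default=encode_custom)
decoded = json.loads(encoded, object_hook=decode_custom)
```

The base format handles the types it knows, and the hooks act as an escape hatch for everything else.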
- _pkl_encode(obj)
Encode with pickle library.
- Parameters:
obj (object) – Python object.
- Returns:
Dictionary with array of serialized bytes.
- Return type:
dict
- deserialize(s_data)
De-serialize data from a bytearray.
- Parameters:
s_data (bytearray) – Data to de-serialize.
- Returns:
Deserialized data.
- Return type:
object
Notes
Uses msgpack, cloudpickle and pickle libraries.
- serialize(data)
Serialize data to a byte array.
- Parameters:
data (object) – Data to serialize.
- Returns:
Serialized data.
- Return type:
bytes
Notes
Uses msgpack, cloudpickle and pickle libraries.
- class unidist.core.backends.mpi.core.serialization.SimpleDataSerializer
Class for simple data serialization/de-serialization for MPI communication.
Notes
Uses cloudpickle and pickle libraries as separate APIs.
- deserialize_cloudpickle(data)
De-serialization with cloudpickle library.
- Parameters:
data (bytearray) – Serialized Python object.
- Returns:
Original reconstructed object.
- Return type:
object
- deserialize_pickle(data)
De-serialization with pickle library.
- Parameters:
data (bytearray) – Serialized Python object.
- Returns:
Original reconstructed object.
- Return type:
object
- serialize_cloudpickle(data)
Encode with the cloudpickle library.
- Parameters:
data (object) – Python object.
- Returns:
Array of serialized bytes.
- Return type:
bytearray
- serialize_pickle(data)
Encode with the pickle library.
- Parameters:
data (object) – Python object.
- Returns:
Array of serialized bytes.
- Return type:
bytearray