Serialization API

Class ComplexDataSerializer supports serialization of complex types containing lambdas, functions, member functions and large data arrays. In particular, it supports out-of-band data serialization for libraries compatible with pickle protocol 5, namely pandas and NumPy (specifically the pandas.DataFrame, pandas.Series and np.ndarray types). Use this class to serialize compound objects whose member types are not known in advance and which may contain large arrays.

Class SimpleDataSerializer supports serialization of simple data types. It also provides an API for serializing lambdas, functions and member functions (using the cloudpickle library), but without performance optimizations for large raw data. Use this class when the exact type of the object is known and the object does not contain large datasets.
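A separate library is needed for lambdas because the standard pickle module serializes functions by reference to an importable name, which an anonymous function does not have. The following is a minimal sketch using only the standard library; the helper name can_pickle is illustrative and not part of unidist.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


plain_data = {"values": [1, 2, 3], "name": "example"}
square = lambda x: x * x  # anonymous function with no importable name

data_ok = can_pickle(plain_data)  # plain containers pickle fine
lambda_ok = can_pickle(square)    # the lambda is rejected by plain pickle
```

Objects like square are the case the cloudpickle-based methods are meant to cover.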

API

class unidist.core.backends.mpi.core.serialization.ComplexDataSerializer(buffers=None, buffer_count=None)

Class for data serialization/de-serialization for MPI communication.

Parameters:
  • buffers (list, default: None) – A list of PickleBuffer objects for data decoding.

  • buffer_count (list, default: None) – List of the number of buffers for each object to be serialized/deserialized using the pickle 5 protocol.

Notes

Uses a combination of the msgpack, cloudpickle and pickle libraries. Msgpack makes it possible to serialize/deserialize the internal objects of a container separately but send them as one object. For example, for an array of pandas DataFrames, each DataFrame is serialized separately using pickle 5, and all buffers are stored in one array to be sent together. To deserialize the data, buffer_count is used; it records the number of buffers belonging to each internal object.
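The buffer bookkeeping described in these notes can be sketched with the standard pickle module alone. This is an illustration under stated assumptions, not the unidist implementation: the function names encode_all and decode_all are hypothetical, msgpack is left out, and pickle.PickleBuffer objects wrapping small bytearrays stand in for large pandas or NumPy data.

```python
import pickle


def encode_all(objects):
    """Pickle each object separately and gather all out-of-band buffers in one flat list."""
    payloads, buffers, buffer_count = [], [], []
    for obj in objects:
        before = len(buffers)
        # list.append returns None, so every buffer is kept out-of-band
        payloads.append(pickle.dumps(obj, protocol=5, buffer_callback=buffers.append))
        buffer_count.append(len(buffers) - before)
    return payloads, buffers, buffer_count


def decode_all(payloads, buffers, buffer_count):
    """Rebuild each object, handing it only the slice of buffers it produced."""
    objects, start = [], 0
    for payload, count in zip(payloads, buffer_count):
        objects.append(pickle.loads(payload, buffers=buffers[start:start + count]))
        start += count
    return objects


# Stand-ins for large arrays: PickleBuffer objects are emitted out-of-band
items = [
    pickle.PickleBuffer(bytearray(b"first")),
    "no buffers here",
    [pickle.PickleBuffer(bytearray(b"a")), pickle.PickleBuffer(bytearray(b"bc"))],
]

payloads, buffers, buffer_count = encode_all(items)
restored = decode_all(payloads, buffers, buffer_count)
```

Here buffer_count records one buffer for the first item, none for the string and two for the list, which is the information needed to split the flat buffer list again on the receiving side.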

_buffer_callback(pickle_buffer)

Callback for pickle protocol 5 out-of-band data buffers collection.

Parameters:

pickle_buffer (pickle.PickleBuffer) – Pickle library buffer wrapper.

_cpkl_encode(obj)

Encode with cloudpickle library.

Parameters:

obj (object) – Python object.

Returns:

Dictionary with array of serialized bytes.

Return type:

dict

_dataframe_encode(frame)

Encode with pickle library using protocol 5.

Parameters:

frame (object) – Pickle 5 serializable object (e.g. pandas DataFrame or NumPy array).

Returns:

Dictionary with array of serialized bytes.

Return type:

dict

_decode_custom(obj)

De-serialization hook for msgpack library.

It decodes complex data types the library couldn't handle.

Parameters:

obj (object) – Python object.

_encode_custom(obj)

Serialization hook for msgpack library.

It encodes complex data types the library couldn't handle.

Parameters:

obj (object) – Python object.
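The hook pattern behind _encode_custom and _decode_custom can be illustrated with the standard json module, used here as a stand-in for msgpack: json.dumps and json.loads accept default and object_hook arguments that play the same role as the ones taken by msgpack.packb and msgpack.unpackb. The function names and the "__pickled__" marker key below are hypothetical and are not taken from unidist.

```python
import base64
import json
import pickle


def encode_custom(obj):
    """Serialization hook: called only for objects json cannot handle natively."""
    raw = pickle.dumps(obj)
    # json cannot carry raw bytes, so the pickle output is base64-encoded
    return {"__pickled__": base64.b64encode(raw).decode("ascii")}


def decode_custom(obj):
    """De-serialization hook: called for every decoded dict, innermost first."""
    if "__pickled__" in obj:
        return pickle.loads(base64.b64decode(obj["__pickled__"]))
    return obj


# A set and a complex number are not JSON-serializable, so the hook handles them
message = {"id": 7, "payload": {1, 2, 3}, "z": complex(1, 2)}

wire = json.dumps(message, default=encode_custom)
restored = json.loads(wire, object_hook=decode_custom)
```

The base format handles what it can on its own, and only the unsupported members go through the fallback encoder, so a container is still sent as a single message.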

_pkl_encode(obj)

Encode with pickle library.

Parameters:

obj (object) – Python object.

Returns:

Dictionary with array of serialized bytes.

Return type:

dict

deserialize(s_data)

De-serialize data from a bytearray.

Parameters:

s_data (bytearray) – Data to de-serialize.

Returns:

Deserialized data.

Return type:

object

Notes

Uses msgpack, cloudpickle and pickle libraries.

serialize(data)

Serialize data to a byte array.

Parameters:

data (object) – Data to serialize.

Returns:

Serialized data.

Return type:

bytes

Notes

Uses msgpack, cloudpickle and pickle libraries.

class unidist.core.backends.mpi.core.serialization.SimpleDataSerializer

Class for simple data serialization/de-serialization for MPI communication.

Notes

Uses cloudpickle and pickle libraries as separate APIs.

deserialize_cloudpickle(data)

De-serialization with cloudpickle library.

Parameters:

data (bytearray) – Data to de-serialize.

Returns:

Original reconstructed object.

Return type:

object

deserialize_pickle(data)

De-serialization with pickle library.

Parameters:

data (bytearray) – Data to de-serialize.

Returns:

Original reconstructed object.

Return type:

object

serialize_cloudpickle(data)

Encode with the cloudpickle library.

Parameters:

data (object) – Python object.

Returns:

Array of serialized bytes.

Return type:

bytearray

serialize_pickle(data)

Encode with the pickle library.

Parameters:

data (object) – Python object.

Returns:

Array of serialized bytes.

Return type:

bytearray
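A round trip through the pickle-based pair of methods might look like the sketch below. It assumes the methods are thin wrappers around pickle.dumps and pickle.loads, which this reference does not spell out; SimpleSerializerSketch is a hypothetical stand-in class, not the unidist implementation.

```python
import pickle


class SimpleSerializerSketch:
    """Hypothetical stand-in that mirrors the documented method names."""

    def serialize_pickle(self, data):
        # Convert to bytearray to match the documented return type
        return bytearray(pickle.dumps(data))

    def deserialize_pickle(self, data):
        # pickle.loads accepts any bytes-like object, including bytearray
        return pickle.loads(data)


serializer = SimpleSerializerSketch()
task_info = {"rank": 3, "args": (1, "two", 3.0)}  # small object of a known type

wire = serializer.serialize_pickle(task_info)
restored = serializer.deserialize_pickle(wire)
```

This path suits small objects of a known type, where the extra buffer handling of ComplexDataSerializer would bring no benefit.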