Custom Serialization

When we communicate data between computers we first convert that data into a sequence of bytes that can be communicated across the network.

Dask can convert data to bytes using the standard solutions of Pickle and Cloudpickle. However, sometimes pickle and cloudpickle are suboptimal so Dask also supports custom serialization formats for special types. This helps Dask to be faster on common formats like NumPy and Pandas and gives power-users more control about how their objects get moved around on the network if they want to extend the system.

We include a small example and then follow with the full API documentation describing the serialize and deserialize functions, which convert objects into a msgpack header and a list of bytestrings and back.

Example

Here is how we special case handling raw Python bytes objects. In this case there is no need to call pickle.dumps on the object. The object is already a sequence of bytes.

def serialize_bytes(obj):
    header = {}  # no special metadata
    frames = [obj]
    return header, frames

def deserialize_bytes(header, frames):
    return frames[0]

register_serialization(bytes, serialize_bytes, deserialize_bytes)

API

register_serialization(cls, serialize, …) Register a new class for custom serialization
serialize(x) Convert object to a header and list of bytestrings
deserialize(header, frames) Convert serialized header and list of bytestrings back to a Python object
distributed.protocol.serialize.register_serialization(cls, serialize, deserialize)[source]

Register a new class for custom serialization

Parameters:

cls: type

serialize: function

deserialize: function

Examples

>>> class Human(object):
...     def __init__(self, name):
...         self.name = name
>>> def serialize(human):
...     header = {}
...     frames = [human.name.encode()]
...     return header, frames
>>> def deserialize(header, frames):
...     return Human(frames[0].decode())
>>> register_serialization(Human, serialize, deserialize)
>>> serialize(Human('Alice'))
({}, [b'Alice'])
distributed.protocol.serialize.serialize(x)[source]

Convert object to a header and list of bytestrings

This takes in an arbitrary Python object and returns a msgpack serializable header and a list of bytes or memoryview objects. By default this uses pickle/cloudpickle but can use special functions if they have been pre-registered.

Returns:

header: dictionary containing any msgpack-serializable metadata

frames: list of bytes or memoryviews, commonly of length one

See also

deserialize
Convert header and frames back to object
to_serialize
Mark that data in a message should be serialized
register_serialization
Register custom serialization functions

Examples

>>> serialize(1)
({}, [b'\x80\x04\x95\x03\x00\x00\x00\x00\x00\x00\x00K\x01.'])
>>> serialize(b'123')  # some special types get custom treatment
({'type': 'builtins.bytes'}, [b'123'])
>>> deserialize(*serialize(1))
1
distributed.protocol.serialize.deserialize(header, frames)[source]

Convert serialized header and list of bytestrings back to a Python object

Parameters:

header: dict

frames: list of bytes

See also

serialize