Categorize
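class numcodecs.categorize.Categorize(labels, dtype, astype='u1')
Filter encoding categorical string data as integers. The label at position i in labels is encoded as i + 1; values not found in labels are encoded as 0 and decode to an empty string.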
- Parameters
- labels : sequence of strings
Category labels.
- dtype : dtype
Data type to use for decoded data.
- astype : dtype, optional
Data type to use for encoded data. Defaults to 'u1' (unsigned 8-bit integer).
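Because code 0 is reserved for values not found in labels (as the Examples section shows), an unsigned astype can hold one fewer label than it has distinct values. A minimal stdlib-only sketch of that arithmetic, assuming astype is an unsigned integer type; the helper name max_labels is hypothetical and not part of numcodecs:

```python
def max_labels(itemsize: int) -> int:
    """Return how many labels fit in an unsigned integer type of `itemsize`
    bytes when code 0 is reserved for values not found in `labels`.
    (Hypothetical helper, not a numcodecs function.)"""
    # n bytes give 2**(8*n) distinct codes (0 .. 2**(8*n) - 1); code 0 is reserved.
    return 2 ** (8 * itemsize) - 1


print(max_labels(1))  # 255 for astype='u1'
print(max_labels(2))  # 65535 for astype='u2'
```

So with the default astype='u1', at most 255 labels can be encoded without overflowing the code range.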
Examples
>>> import numcodecs
>>> import numpy as np
>>> x = np.array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> x
array(['male', 'female', 'female', 'male', 'unexpected'], dtype=object)
>>> codec = numcodecs.Categorize(labels=['female', 'male'], dtype=object)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array(['male', 'female', 'female', 'male', ''], dtype=object)
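The mapping in the example above can be reproduced with plain Python lists. This is a sketch of the numbering scheme only, not the numcodecs implementation (which works on NumPy arrays); the function names categorize_encode and categorize_decode are hypothetical:

```python
def categorize_encode(values, labels):
    """Hypothetical helper: label at position i -> i + 1, anything else -> 0."""
    codes = {label: i + 1 for i, label in enumerate(labels)}
    return [codes.get(v, 0) for v in values]


def categorize_decode(codes, labels):
    """Hypothetical helper: code c in 1..len(labels) -> labels[c - 1], otherwise ''."""
    return [labels[c - 1] if 1 <= c <= len(labels) else '' for c in codes]


labels = ['female', 'male']
x = ['male', 'female', 'female', 'male', 'unexpected']
y = categorize_encode(x, labels)
z = categorize_decode(y, labels)
print(y)  # [2, 1, 1, 2, 0]
print(z)  # ['male', 'female', 'female', 'male', '']
```

Note that the round trip is lossy for unexpected values: 'unexpected' comes back as ''.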
codec_id = 'categorize'
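The codec_id is the identifier that get_config stores under the 'id' field, which is what allows a configuration dictionary to be mapped back to a codec class. numcodecs ships its own registry for this (for example numcodecs.get_codec); the stdlib-only sketch below only illustrates the idea, and REGISTRY, register, lookup, and DemoCategorize are all hypothetical names:

```python
REGISTRY = {}  # hypothetical: codec_id -> codec class


def register(cls):
    """Hypothetical decorator: index a class by its codec_id attribute."""
    REGISTRY[cls.codec_id] = cls
    return cls


@register
class DemoCategorize:
    # Stand-in class; only the codec_id attribute matters here.
    codec_id = 'categorize'


def lookup(config):
    """Hypothetical: select the class named by the config's 'id' field."""
    return REGISTRY[config['id']]


print(lookup({'id': 'categorize'}).__name__)  # DemoCategorize
```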
encode(self, buf)
Encode data in buf.
- Parameters
- buf : buffer-like
Data to be encoded. May be any object supporting the new-style buffer protocol or array.array under Python 2.
- Returns
- enc : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.
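Since the encoded result is just a sequence of small unsigned integers, it can be exposed through the buffer protocol. A stdlib-only sketch, assuming array.array with typecode 'B' (unsigned char) as a stand-in for astype='u1'; encode_to_buffer is a hypothetical helper, not the numcodecs method:

```python
from array import array


def encode_to_buffer(values, labels):
    """Hypothetical helper returning the codes in a buffer-protocol object."""
    codes = {label: i + 1 for i, label in enumerate(labels)}
    # 'B' is an unsigned 8-bit code, standing in for astype='u1';
    # array() raises OverflowError for any code above 255.
    return array('B', (codes.get(v, 0) for v in values))


enc = encode_to_buffer(['male', 'female', 'unexpected'], ['female', 'male'])
view = memoryview(enc)  # new-style buffer protocol view, no copy
print(view.format, view.itemsize, bytes(view))  # B 1 b'\x02\x01\x00'
```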
decode(self, buf, out=None)
Decode data in buf.
- Parameters
- buf : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.
- out : buffer-like, optional
Writable buffer to store decoded data. N.B.: if provided, this buffer must be exactly the right size to store the decoded data.
- Returns
- dec : buffer-like
Decoded data. May be any object supporting the new-style buffer protocol or array.array under Python 2.
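The out contract (write into a caller-supplied buffer that must be exactly the right size) can be sketched with a plain list. This is an illustration under assumptions: decode_into is a hypothetical helper, and raising ValueError on a size mismatch is this sketch's choice, not necessarily what numcodecs raises:

```python
def decode_into(codes, labels, out=None):
    """Hypothetical helper: decode codes, optionally writing into `out`."""
    # Code c in 1..len(labels) -> labels[c - 1]; 0 or unknown codes -> ''.
    dec = [labels[c - 1] if 1 <= c <= len(labels) else '' for c in codes]
    if out is None:
        return dec
    # Mirror the documented requirement: out must be exactly the right size.
    if len(out) != len(dec):
        raise ValueError(f'out has length {len(out)}, expected {len(dec)}')
    out[:] = dec  # fill the caller's buffer in place
    return out


buf = [None] * 3
decode_into([2, 1, 0], ['female', 'male'], out=buf)
print(buf)  # ['male', 'female', '']
```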
get_config(self)
Return a dictionary holding configuration parameters for this codec. Must include an 'id' field with the codec identifier. All values must be compatible with JSON encoding.
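The JSON requirement means the configuration should survive a dumps/loads round trip unchanged. A minimal sketch with a hypothetical categorize_config helper; the dtype strings '|O' and '|u1' (NumPy's string forms for object and uint8) are assumptions about how such a config might store dtypes:

```python
import json


def categorize_config(labels, dtype='|O', astype='|u1'):
    """Hypothetical helper: a JSON-compatible config with the required 'id'."""
    return {'id': 'categorize', 'labels': list(labels),
            'dtype': dtype, 'astype': astype}


cfg = categorize_config(['female', 'male'])
text = json.dumps(cfg)
assert json.loads(text) == cfg  # only JSON-compatible values, so it round-trips
print(text)  # {"id": "categorize", "labels": ["female", "male"], "dtype": "|O", "astype": "|u1"}
```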
classmethod from_config(config)
Instantiate codec from a configuration object.
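A get_config/from_config pair should round-trip: building a codec from another codec's config yields an equivalent codec. A stdlib-only sketch with a hypothetical DemoCodec class, assuming from_config drops the 'id' field and passes the remaining keys to the constructor as keyword arguments:

```python
class DemoCodec:
    """Hypothetical stand-in for a codec; not the numcodecs class."""
    codec_id = 'categorize'

    def __init__(self, labels, dtype, astype='u1'):
        self.labels = list(labels)
        self.dtype = dtype
        self.astype = astype

    def get_config(self):
        # Include the codec identifier under 'id', as get_config requires.
        return {'id': self.codec_id, 'labels': self.labels,
                'dtype': self.dtype, 'astype': self.astype}

    @classmethod
    def from_config(cls, config):
        # Assumption: 'id' is not a constructor argument, so leave it out.
        params = {k: v for k, v in config.items() if k != 'id'}
        return cls(**params)


original = DemoCodec(['female', 'male'], dtype='|O')
clone = DemoCodec.from_config(original.get_config())
print(clone.get_config() == original.get_config())  # True
```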