zgli.encoder

Getting started

Zgli.Folder

Zgli.encoder

Zgli.tree

Encoder.encode_df

Encode the given categorical columns into patterned strings.

Parameters:

df : pandas.DataFrame
A DataFrame whose categorical columns have been standardized with the standardize_categorical_cols method.
cols : list
A list with the columns to be encoded.
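hop : int
A number giving the character difference between consecutive patterned strings, i.e. how many more characters each string's repeating unit has than the previous one. Ex: with hop = 1, s1 = 000000 and s2 = 010101; with hop = 2, s1 = 000000 and s2 = 012012.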

Returns:

df_enc : pandas.DataFrame
The DataFrame with the given columns encoded.
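The docstring only illustrates hop with two short pairs of strings. The following is a hypothetical reconstruction inferred from those pairs and from the 12-character strings in the example output; it is not zgli's source code. It assumes that category i is encoded by repeating the digits 0 through i * hop, and that all strings in a column share a length equal to the least common multiple of the pattern periods (periods 1, 2, 3 and 4 give 12 for four categories with hop = 1). The names pattern_string and encode_codes are made up for illustration.

```python
from math import lcm  # requires Python 3.9+


def pattern_string(category: int, hop: int, length: int) -> str:
    """Hypothetical: repeat the digits 0..category*hop until `length` chars."""
    period = category * hop + 1
    # Assumes period <= 10 so every symbol is a single decimal digit.
    cycle = "".join(str(d) for d in range(period))
    # Repeat the cycle enough times, then truncate to the requested length.
    return (cycle * (length // period + 1))[:length]


def encode_codes(codes: list[int], n_categories: int, hop: int) -> list[str]:
    """Hypothetical: encode standardized codes so that every category's
    pattern fits a whole number of times into the common string length."""
    length = lcm(*(c * hop + 1 for c in range(n_categories)))
    return [pattern_string(c, hop, length) for c in codes]


print(pattern_string(1, 2, 6))
print(encode_codes([2, 1, 1, 1, 2], n_categories=4, hop=1))
```

Under these assumptions, encode_codes([2, 1, 1, 1, 2], 4, 1) returns the five strings shown in column 1 of df_enc.head() in the example. The LCM length is a guess that keeps every pattern a whole number of repetitions long; the real implementation may choose the length differently.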

Example:

# Imports
>>> import pandas as pd
>>> from zgli.encoder import Encoder
>>> from sklearn import datasets

# Load Iris df
>>> iris = datasets.load_iris()
>>> iris_df = pd.DataFrame(iris['data'])

# Columns of the iris df to process
>>> cols = [0,1,2,3]

# Divide each iris column into 4 intervals
>>> cuts = [4,4,4,4]
>>> encoder = Encoder()
>>> df_ct = encoder.categorize_cols(iris_df,cols,cuts)
>>> df_ct.head()
              0           1               2              3
0  (4.296, 5.2]  (3.2, 3.8]  (0.994, 2.475]  (0.0976, 0.7]
1  (4.296, 5.2]  (2.6, 3.2]  (0.994, 2.475]  (0.0976, 0.7]
2  (4.296, 5.2]  (2.6, 3.2]  (0.994, 2.475]  (0.0976, 0.7]
3  (4.296, 5.2]  (2.6, 3.2]  (0.994, 2.475]  (0.0976, 0.7]
4  (4.296, 5.2]  (3.2, 3.8]  (0.994, 2.475]  (0.0976, 0.7]

# Standardize the categorized iris df (df_ct)
>>> df_std = encoder.standardize_categorical_cols(df_ct,cols)
>>> df_std.head()
   0  1  2  3
0  0  2  0  0
1  0  1  0  0
2  0  1  0  0
3  0  1  0  0
4  0  2  0  0

# Encode df
>>> hop = 1
>>> df_enc = encoder.encode_df(df_std,cols,hop) # We use the encoding function here.
>>> df_enc.head()
              0             1             2             3
0  000000000000  012012012012  000000000000  000000000000
1  000000000000  010101010101  000000000000  000000000000
2  000000000000  010101010101  000000000000  000000000000
3  000000000000  010101010101  000000000000  000000000000
4  000000000000  012012012012  000000000000  000000000000
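The interval labels printed by categorize_cols, such as (4.296, 5.2], look like the equal-width bins produced by pandas.cut, and the standardized values look like the resulting category codes (0 for the lowest interval). Assuming that is what the two helper methods do, the preparation steps can be approximated with plain pandas. The function categorize_and_standardize below is hypothetical and is not part of zgli.

```python
import pandas as pd


def categorize_and_standardize(df: pd.DataFrame, cols, cuts) -> pd.DataFrame:
    """Hypothetical approximation of categorize_cols followed by
    standardize_categorical_cols, assuming equal-width binning."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col, n_bins in zip(cols, cuts):
        # pd.cut with an integer `bins` splits the column's range into
        # n_bins equal-width, right-closed intervals.
        intervals = pd.cut(df[col], bins=n_bins)
        # Category codes number the intervals 0..n_bins-1, lowest first.
        out[col] = intervals.cat.codes
    return out


toy = pd.DataFrame({0: [1.0, 2.0, 3.0, 4.0, 5.0]})
print(categorize_and_standardize(toy, [0], [4])[0].tolist())  # [0, 0, 1, 2, 3]
```

If the assumption holds, categorize_and_standardize(iris_df, cols, cuts).head() should show the same values as df_std.head() in the example.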
