Perform operations and compute the normalized compression distance (NCD) matrix.
Categorize continuous features and encode categorical features.
Cluster using a distance matrix and trees.
Keep track of changes
Compress and cluster your files using Python!
Zgli is inspired by the CompLearn tool available here. Show them some support if you find our tool useful!
We created zgli to make clustering by compression simple to use and easy to integrate into Python machine learning pipelines.
We've implemented four compression methods and a feature encoder, and we intend to implement the quartet method in the foreseeable future.
This code is only made possible by the awesome work that came before it.
All our Python source code is available at our GitHub.
Take a look at the original command line tool that inspired our work here.
Check the work of Paul Vitányi et al. here to get a closer look at the theory behind this method.
Take a look at the simple workflow
1. We start by collecting the files we want to cluster and placing them inside a folder.
2. We take the files from the folder as input, compress them, and calculate the normalized compression distance between each pair of files.
3. After all distances are computed, the zgli library outputs a distance matrix.
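The three steps above can be sketched with nothing but the Python standard library. This is a minimal illustration, not zgli's actual API: the function names `compressed_size`, `ncd` and `ncd_matrix` are hypothetical, and zlib is assumed as the compressor purely for convenience. It uses the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size in bytes.

```python
import zlib
from pathlib import Path


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(folder):
    """Return (file names, square NCD matrix) for the regular files in folder.

    Hypothetical helper for illustration; it is not part of zgli.
    """
    # Step 1: collect the files placed inside the folder (sorted for stable order).
    paths = sorted(p for p in Path(folder).iterdir() if p.is_file())
    blobs = [p.read_bytes() for p in paths]
    # Step 2: compute the NCD between every pair of files.
    matrix = [[ncd(a, b) for b in blobs] for a in blobs]
    # Step 3: hand back the distance matrix together with its row/column labels.
    return [p.name for p in paths], matrix
```

Calling `names, matrix = ncd_matrix("my_files")` on a folder of your choice (the folder name here is just a placeholder) gives a list of lists where `matrix[i][j]` is the distance between `names[i]` and `names[j]`. With a real compressor the diagonal is small but usually not exactly zero, and values can slightly exceed 1 for very dissimilar inputs.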
What have we been up to?
February 24th, 2023
v0.2.0
November 22nd, 2022
v0.1.0
February 21st, 2022
v0.0.6
February 15th, 2022
v0.0.5
January 21st, 2022
v0.0.4