DiviK package
Python implementation of Divisive iK-means (DiviK) algorithm.
Clustering at your command line with
Set of algorithm implementations for unsupervised analyses
- hands-free clustering method with built-in feature selection
for selecting the number of clusters
for selecting the number of clusters
Modular with custom distance metrics and initializations
meta-clustering
data-driven feature selection
- allows you to select highly variant features above noise level, based on GMM-decomposition
- allows you to select highly variant features above noise level, based on outlier detection
- allows you to select highly variant features above noise level with your predefined thresholds for each
- generates samples of fixed number of rows from given dataset, preserving groups proportion
- generates samples of random observations within boundaries of an original dataset, and preserving the rotation of the data
- generates samples of random observations within boundaries of an original dataset
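As an illustration of the last sampler described above, here is a minimal standard-library sketch of drawing random observations within the boundaries of a dataset. It is not the package's implementation: the function name `sample_uniform_in_bounds` and its parameters are hypothetical, and it assumes that "boundaries" means the per-feature minimum and maximum.

```python
import random


def sample_uniform_in_bounds(data, n_samples, seed=None):
    """Draw rows uniformly from the axis-aligned bounding box of `data`.

    `data` is a non-empty list of equal-length numeric rows.
    (Hypothetical helper for illustration; not part of divik.)
    """
    if not data:
        raise ValueError("data must contain at least one row")
    rng = random.Random(seed)
    # Transpose rows into columns to find per-feature boundaries.
    columns = list(zip(*data))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    # Each feature of each new row is drawn independently within its range.
    return [
        [rng.uniform(low, high) for low, high in zip(lows, highs)]
        for _ in range(n_samples)
    ]


data = [[0.0, 10.0], [1.0, 12.0], [0.5, 11.0]]
samples = sample_uniform_in_bounds(data, n_samples=5, seed=0)
```

Every generated row has the same number of features as the input and stays inside the observed range of each feature. Preserving the rotation of the data, as the second-to-last sampler does, would need an additional change of basis that this sketch omits.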
To install the latest stable version, use:
Prerequisites for installation of base package:
Python 3.6 / 3.7 / 3.8
a compiler capable of compiling the native C code, with OpenMP support
You should already have it installed with the GCC compiler; if not, try the following:
OpenMP is available as part of LLVM. You may need to install it with conda:
With the prerequisites installed, you can install the latest base version of the package:
Note: In the zsh shell, remember to put \ before [ and ].
You can install all extras with:
- set all parameters named n_jobs to 1;
- set all parameters named allow_dask to True.
Note: Never set n_jobs>1 and allow_dask=True at the same time; the computations will freeze because of how multiprocessing and dask handle parallelism.
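The two settings and the warning above can be sketched as a small check over a configuration. This is only an illustration: it assumes the parameters are held in a nested Python dictionary, which is not necessarily how the package stores its configuration, and the helper names `force_dask_settings` and `find_conflicts` as well as the section names in the example are hypothetical.

```python
def force_dask_settings(config):
    """Return a copy of `config` with every n_jobs set to 1 and every
    allow_dask set to True. (Hypothetical helper; not part of divik.)"""
    if isinstance(config, dict):
        updated = {}
        for key, value in config.items():
            if key == "n_jobs":
                updated[key] = 1
            elif key == "allow_dask":
                updated[key] = True
            else:
                updated[key] = force_dask_settings(value)
        return updated
    if isinstance(config, list):
        return [force_dask_settings(item) for item in config]
    return config


def find_conflicts(config, path=""):
    """List the paths of sections that combine n_jobs > 1 with allow_dask=True."""
    conflicts = []
    if isinstance(config, dict):
        n_jobs = config.get("n_jobs")
        if config.get("allow_dask") is True and isinstance(n_jobs, int) and n_jobs > 1:
            conflicts.append(path or "<root>")
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            conflicts.extend(find_conflicts(value, child))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            conflicts.extend(find_conflicts(item, f"{path}[{index}]"))
    return conflicts


# Made-up section names, used only to exercise the helpers.
risky = {"kmeans": {"n_jobs": 4, "allow_dask": True}, "sampler": {"n_jobs": 2}}
problems = find_conflicts(risky)
safe = force_dask_settings(risky)
```

Here `find_conflicts` flags only sections that set both options in the problematic combination, and the dictionary returned by `force_dask_settings` no longer contains any such section.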
It can happen if the gamred_native package (part of the divik package) was compiled against a different numpy ABI than scikit-learn. This could happen if you used a different set of compilers than the developers of the scikit-learn package.
In such a case, a handler is defined to display the stack trace. If the trace comes from _matlab_legacy.py, this is most probably the issue.
To resolve the issue, consider following the installation instructions once again. The exact versions get updated to avoid the issue.
Contribution guide will be developed soon.
Format the code with:
The recommended way to use this software is through . This is the most convenient way if you want to use the divik application.
If you want to have compatibility with , you can install necessary extras with:
If you are using DiviK to run an analysis that might not fit in your computer's RAM, consider disabling the default parallelism and switching to . It's easy to achieve through configuration:
This software is part of contribution made by , rest of which is published .