Usage
Installation
To use Rel-AI, first make sure you have the latest version of pip installed:
(.venv) $ pip install --upgrade pip
Then install it using pip:
(.venv) $ python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ReliabilityPackage
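To confirm that the installation succeeded in the active virtual environment, a quick check along the following lines may help. This is only a sketch: the helper name is_installed is hypothetical, and the module name ReliabilityPackage is taken from the import statements used in the examples on this page.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# 'ReliabilityPackage' is the module name imported in the usage examples.
print("ReliabilityPackage installed:", is_installed("ReliabilityPackage"))
```

If the check prints False, the package was most likely installed into a different interpreter than the one running the script.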
Usage - Classification Problem
The following example applies RelAI to a typical classification problem, using the breast_cancer dataset from sklearn.
Import the needed functions from the package
from ReliabilityPackage.ReliabilityFunctions import *
Import all the other relevant packages
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import plotly.offline as pyo
Load the breast cancer dataset and split it into a training, a validation, and a test set
X, y = datasets.load_breast_cancer(return_X_y=True)
X_seventy, X_test, y_seventy, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_seventy, y_seventy, test_size=0.30, random_state=42)
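Because the second split is applied to the 70% left over from the first, the final proportions are not 70/30 but roughly 49/21/30. The sketch below works out the resulting set sizes by hand. It assumes that train_test_split rounds the held-out share up when only a fractional test_size is given; the helper split_sizes is hypothetical and not part of any library.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    """Return (n_kept, n_held_out) for a fractional test_size.

    Assumption: the held-out part is rounded up, and the kept part is
    whatever remains.
    """
    n_held_out = math.ceil(test_size * n_samples)
    return n_samples - n_held_out, n_held_out


n_total = 569  # number of samples in the breast cancer dataset

# First split: 70% kept for training/validation, 30% held out for testing.
n_seventy, n_test = split_sizes(n_total, 0.30)
# Second split: 30% of the remaining samples become the validation set.
n_train, n_val = split_sizes(n_seventy, 0.30)

for name, n in [("train", n_train), ("validation", n_val), ("test", n_test)]:
    print(f"{name:<10} {n:>4} samples ({n / n_total:.1%})")
```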
Train a classifier on the training set
clf = RandomForestClassifier(random_state=42, min_samples_leaf=10, n_estimators=100)
clf.fit(X_train, y_train)
Create and train an autoencoder for the implementation of the Density Principle. If layer_sizes is not specified, the default autoencoder is built with the layer sizes [dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]; specify a more suitable architecture if needed.
ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)
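To see what the default architecture means in practice, the layer sizes listed in the note can be computed for a given input dimension. The helper default_layer_sizes below is hypothetical and only reproduces the arithmetic of that list; it does not call the package.

```python
def default_layer_sizes(dim_input: int) -> list:
    """Layer sizes of the default autoencoder, following the list in the note."""
    return [dim_input + offset for offset in (0, 4, 8, 16, 32)]


# The breast cancer dataset has 30 features.
print(default_layer_sizes(30))
```

For 30 input features the default layers grow from 30 to 62 units, so a dataset with many more or far fewer features may call for explicitly chosen layer_sizes.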
Generate the dataset of the synthetic points and their associated values of accuracy
syn_pts, acc_syn_pts = generate_synthetic_points(problem_type='classification', predict_func=clf.predict, X_train=X_train, y_train=y_train, method='GN', k=5)
Define a Mean Squared Error threshold and an accuracy threshold. The mse_threshold_plot function can be used to see how performance changes across percentiles of the MSE of the validation set.
fig_mse_thresh = mse_threshold_plot(ae, X_val, y_val, clf.predict, metric='balanced_accuracy')
fig_mse_thresh.show()
mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
acc_thresh = 0.90
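The idea behind a percentile-based MSE threshold can be illustrated with NumPy alone. The sketch below is an assumption about what such a threshold represents, not the package's implementation: it takes the 95th percentile of the per-sample reconstruction error, and it simulates the autoencoder output by adding noise to toy data. All names in it are hypothetical.

```python
import numpy as np


def per_sample_mse(X: np.ndarray, X_rec: np.ndarray) -> np.ndarray:
    """Mean squared reconstruction error of each row."""
    return np.mean((X - X_rec) ** 2, axis=1)


rng = np.random.default_rng(0)

# Toy validation set and a simulated reconstruction (stand-in for an autoencoder).
X_val_demo = rng.normal(size=(200, 5))
X_rec_demo = X_val_demo + rng.normal(scale=0.1, size=X_val_demo.shape)

errors = per_sample_mse(X_val_demo, X_rec_demo)

# The 95th percentile: about 95% of validation samples fall at or below it.
demo_thresh = np.percentile(errors, 95)
is_dense = errors <= demo_thresh

print(f"threshold: {demo_thresh:.5f}")
print(f"samples at or below threshold: {is_dense.sum()} of {len(errors)}")
```

A lower percentile gives a stricter threshold, so fewer samples count as lying in a well-represented region of the training data.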
Generate an instance of the ReliabilityDetector class for classification problems
RD = create_reliability_detector('classification', ae, syn_pts, acc_syn_pts, mse_thresh=mse_thresh, perf_thresh=acc_thresh, proxy_model="MLP")
Compute the reliability of the test set
test_reliability = compute_dataset_reliability(RD, X_test, mode='total')
reliable_test = X_test[np.where(test_reliability == 1)]
unreliable_test = X_test[np.where(test_reliability == 0)]
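A natural follow-up is to compare the classifier's accuracy on the reliable and unreliable subsets. The sketch below uses small toy arrays in place of y_test, clf.predict(X_test), and test_reliability; it assumes the reliability output is an array of 0/1 flags aligned with the rows of the test set, as the indexing above suggests.

```python
import numpy as np

# Toy stand-ins for y_test, clf.predict(X_test) and test_reliability.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 1, 1, 1, 1])
reliability = np.array([1, 1, 1, 0, 1, 1, 0, 0])

# A boolean mask selects the same rows as np.where(reliability == 1).
reliable_mask = reliability == 1


def accuracy(y_a: np.ndarray, y_b: np.ndarray) -> float:
    """Fraction of positions where the two label arrays agree."""
    return float(np.mean(y_a == y_b))


acc_reliable = accuracy(y_true[reliable_mask], y_pred[reliable_mask])
acc_unreliable = accuracy(y_true[~reliable_mask], y_pred[~reliable_mask])

print(f"accuracy on reliable samples:   {acc_reliable:.2f}")
print(f"accuracy on unreliable samples: {acc_unreliable:.2f}")
```

With real data, a clearly higher accuracy on the reliable subset is the behaviour one would hope to observe.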
Usage - Regression Problem
The following example applies RelAI to a regression problem generated with the make_regression function of sklearn.
Import the needed functions from the package
from ReliabilityPackage.ReliabilityFunctions import *
Import all the other relevant packages
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Generate a random regression dataset and split it into a training, a validation, and a test set
X, y = make_regression(n_samples=1000, n_features=20, noise=1, random_state=42)
X_seventy, X_test, y_seventy, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_seventy, y_seventy, test_size=0.30, random_state=42)
Train a linear regressor on the training set
reg = LinearRegression().fit(X_train, y_train)
Create and train an autoencoder for the implementation of the Density Principle. If layer_sizes is not specified, the default autoencoder is built with the layer sizes [dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]; specify a more suitable architecture if needed.
ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)
Generate the dataset of the synthetic points and their associated values of Mean Squared Error
syn_pts, mse_syn_pts = generate_synthetic_points(problem_type='regression', predict_func=reg.predict, X_train=X_train, y_train=y_train, method='GN', k=5)
Define a Mean Squared Error threshold for the Density Principle and a performance threshold for the Local Fit Principle (the performance metric for the Local Fit Principle is the MSE)
mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
performance_thresh = 0.8
Generate an instance of the ReliabilityDetector class for regression problems
RD = create_reliability_detector('regression', ae, syn_pts, mse_syn_pts, mse_thresh=mse_thresh, perf_thresh=performance_thresh, proxy_model="MLP")
Compute the reliability of the test set
test_reliability = compute_dataset_reliability(RD, X_test, mode='total')
reliable_test = X_test[np.where(test_reliability == 1)]
unreliable_test = X_test[np.where(test_reliability == 0)]
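For regression, the two subsets can be compared through their prediction error. The sketch below uses toy arrays in place of y_test, reg.predict(X_test), and test_reliability, again assuming that the reliability output is an array of 0/1 flags aligned with the test rows.

```python
import numpy as np

# Toy stand-ins for y_test, reg.predict(X_test) and test_reliability.
y_true = np.array([2.0, 0.5, -1.0, 3.0, 1.5, 0.0])
y_pred = np.array([2.1, 0.4, -1.2, 4.0, 1.5, -1.0])
reliability = np.array([1, 1, 1, 0, 1, 0])

reliable_mask = reliability == 1


def mse(y_a: np.ndarray, y_b: np.ndarray) -> float:
    """Mean squared error between two arrays of targets."""
    return float(np.mean((y_a - y_b) ** 2))


mse_reliable = mse(y_true[reliable_mask], y_pred[reliable_mask])
mse_unreliable = mse(y_true[~reliable_mask], y_pred[~reliable_mask])

print(f"MSE on reliable samples:   {mse_reliable:.3f}")
print(f"MSE on unreliable samples: {mse_unreliable:.3f}")
```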