Usage

Installation

To use RelAI, first make sure you have the latest version of pip installed:

(.venv) $ python -m pip install --upgrade pip

Then install RelAI from TestPyPI, with PyPI as the extra index for its dependencies:

(.venv) $ python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ReliabilityPackage

Usage - Classification Problem

Here is a simple example of using RelAI on a typical classification problem, based on sklearn's breast_cancer dataset.

1. Import the needed functions from the package

from ReliabilityPackage.ReliabilityFunctions import *

2. Import the other relevant packages

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import plotly.offline as pyo

3. Load the breast cancer dataset and split it into training, validation, and test sets

X, y = datasets.load_breast_cancer(return_X_y=True)
X_seventy, X_test, y_seventy, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_seventy, y_seventy, test_size=0.30, random_state=42)
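
Two successive 70/30 splits leave roughly 49% of the samples for training, 21% for validation, and 30% for testing. The sketch below works through that arithmetic for the 569 samples of the breast cancer dataset. It assumes that train_test_split rounds the test count up and gives the remainder to training when only a fractional test_size is passed; split_sizes is a hypothetical helper written for illustration, not part of RelAI or sklearn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the remainder
    goes to the training side of the split.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# First split: 569 samples into a 70% part and the test set.
n_seventy, n_test = split_sizes(569, 0.30)
# Second split: the 70% part into training and validation sets.
n_train, n_val = split_sizes(n_seventy, 0.30)
print(n_train, n_val, n_test)  # 278 120 171
```

The training set therefore ends up with a little under half of the original data, which is worth keeping in mind when choosing min_samples_leaf or the autoencoder batch size.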

4. Train a classifier on the training set

clf = RandomForestClassifier(random_state=42, min_samples_leaf=10, n_estimators=100)
clf.fit(X_train, y_train)

5. Create and train an autoencoder to implement the Density Principle. If layer_sizes is not specified, the default autoencoder is built with the layer sizes [dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]; specify a more suitable architecture if needed.

ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)
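
To see what the default architecture means for a concrete dataset, the following sketch simply evaluates the list of layer sizes quoted in step 5. default_layer_sizes is a hypothetical helper, not a RelAI function, and it only restates the documented default; how the package mirrors these sizes in the decoder is not covered here.

```python
def default_layer_sizes(dim_input):
    """Layer sizes of the default autoencoder as described in the text."""
    return [dim_input + offset for offset in (0, 4, 8, 16, 32)]


# The breast cancer dataset has 30 features.
print(default_layer_sizes(30))  # [30, 34, 38, 46, 62]
```

Every default layer is wider than the input, so a narrower, bottleneck-style architecture is one example of a custom choice. It would presumably be supplied through the layer_sizes argument mentioned above; check the function signature before relying on that.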

6. Generate the dataset of synthetic points and their associated accuracy values

syn_pts, acc_syn_pts = generate_synthetic_points(problem_type='classification', predict_func=clf.predict, X_train=X_train, y_train=y_train, method='GN', k=5)

7. Define a Mean Squared Error (MSE) threshold and an accuracy threshold. The plot returned by mse_threshold_plot shows how performance changes across percentiles of the validation-set MSE, which can help in choosing the MSE threshold.

fig_mse_thresh = mse_threshold_plot(ae, X_val, y_val, clf.predict, metric='balanced_accuracy')
fig_mse_thresh.show()

mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
acc_thresh = 0.90
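
The idea behind a percentile-based MSE threshold can be sketched in plain Python. The code below is an illustration under stated assumptions, not RelAI's implementation: it assumes each validation sample has one error value from the autoencoder, that samples with an error at or below the threshold are the ones kept, and that balanced accuracy is the mean of the per-class recalls. The helper names percentile, balanced_accuracy, and sweep are hypothetical, and the data are toy values.

```python
def percentile(values, perc):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def sweep(errors, y_true, y_pred, percs):
    """Balanced accuracy on the samples kept at each percentile threshold."""
    results = {}
    for perc in percs:
        thresh = percentile(errors, perc)
        kept = [i for i, e in enumerate(errors) if e <= thresh]
        results[perc] = balanced_accuracy(
            [y_true[i] for i in kept], [y_pred[i] for i in kept]
        )
    return results


# Toy values: per-sample errors, true labels, and predictions.
errors = [0.1, 0.2, 0.3, 0.4, 0.9, 1.5]
y_true = [0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1, 0]
results = sweep(errors, y_true, y_pred, [50, 100])
print(results)
```

In this toy data the misclassified samples are the ones with the largest errors, so the stricter threshold keeps only correctly classified samples. That trade-off between coverage and performance is what the threshold choice controls.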

8. Generate an instance of the ReliabilityDetector class for classification problems

RD = create_reliability_detector('classification', ae, syn_pts, acc_syn_pts, mse_thresh=mse_thresh, perf_thresh=acc_thresh, proxy_model="MLP")

9. Compute the reliability of the test set and separate the reliable samples from the unreliable ones

test_reliability = compute_dataset_reliability(RD, X_test, mode='total')
reliable_test = X_test[np.where(test_reliability == 1)]
unreliable_test = X_test[np.where(test_reliability == 0)]
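
A natural follow-up is to check whether the classifier really performs better on the samples flagged as reliable. The sketch below compares accuracy on the two groups. accuracy_by_reliability is a hypothetical helper, and the only assumption about RelAI is the one visible in the code above: reliability is encoded as 1 (reliable) or 0 (unreliable). The data are toy values.

```python
def accuracy_by_reliability(y_true, y_pred, reliability):
    """Return (accuracy on reliable samples, accuracy on unreliable samples).

    A group with no samples yields None.
    """
    def accuracy(flag):
        idx = [i for i, r in enumerate(reliability) if r == flag]
        if not idx:
            return None
        return sum(y_true[i] == y_pred[i] for i in idx) / len(idx)

    return accuracy(1), accuracy(0)


# Toy values; in the walkthrough these would be y_test,
# clf.predict(X_test), and test_reliability.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0]
reliability = [1, 1, 1, 1, 0, 0]
print(accuracy_by_reliability(y_true, y_pred, reliability))  # (0.75, 0.5)
```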

Usage - Regression Problem

Here is a simple example of using RelAI on a regression problem generated with sklearn's make_regression function.

1. Import the needed functions from the package

from ReliabilityPackage.ReliabilityFunctions import *

2. Import the other relevant packages

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

3. Generate a random regression dataset and split it into training, validation, and test sets

X, y = make_regression(n_samples=1000, n_features=20, noise=1, random_state=42)
X_seventy, X_test, y_seventy, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_seventy, y_seventy, test_size=0.30, random_state=42)

4. Train a linear regressor on the training set

reg = LinearRegression().fit(X_train, y_train)

5. Create and train an autoencoder to implement the Density Principle. If layer_sizes is not specified, the default autoencoder is built with the layer sizes [dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]; specify a more suitable architecture if needed.

ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)

6. Generate the dataset of synthetic points and their associated Mean Squared Error values

syn_pts, mse_syn_pts = generate_synthetic_points(problem_type='regression', predict_func=reg.predict, X_train=X_train, y_train=y_train, method='GN', k=5)

7. Define a Mean Squared Error (MSE) threshold for the Density Principle and a performance threshold for the Local Fit Principle, which uses MSE as its performance metric

mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
performance_thresh = 0.8

8. Generate an instance of the ReliabilityDetector class for regression problems

RD = create_reliability_detector('regression', ae, syn_pts, mse_syn_pts, mse_thresh=mse_thresh, perf_thresh=performance_thresh, proxy_model="MLP")

9. Compute the reliability of the test set and separate the reliable samples from the unreliable ones

test_reliability = compute_dataset_reliability(RD, X_test, mode='total')
reliable_test = X_test[np.where(test_reliability == 1)]
unreliable_test = X_test[np.where(test_reliability == 0)]
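
For regression, the corresponding sanity check is to compare the Mean Squared Error of the regressor on the reliable and unreliable groups. The sketch below is a minimal illustration with toy values. mse_by_reliability is a hypothetical helper, and it assumes only that reliability is encoded as 1 or 0, as in the code above.

```python
def mse_by_reliability(y_true, y_pred, reliability):
    """Return (MSE on reliable samples, MSE on unreliable samples).

    A group with no samples yields None.
    """
    groups = {1: [], 0: []}
    for t, p, r in zip(y_true, y_pred, reliability):
        groups[r].append((t - p) ** 2)
    return tuple(
        sum(errs) / len(errs) if errs else None
        for errs in (groups[1], groups[0])
    )


# Toy values; in the walkthrough these would be y_test,
# reg.predict(X_test), and test_reliability.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.5, 3.0, 6.0]
reliability = [1, 1, 0, 0]
print(mse_by_reliability(y_true, y_pred, reliability))  # (0.125, 2.0)
```

A clearly lower error on the reliable group suggests that the chosen thresholds separate the two groups in a useful way.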