Usage
=====

.. _installation:

Installation
------------

To use Rel-AI, first make sure you have the latest version of pip installed:

.. code-block:: console

   (.venv) $ pip install --upgrade pip

Then install the package using pip:

.. code-block:: console

   (.venv) $ python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ReliabilityPackage

Usage - Classification Problem
------------------------------

Here is a simple example of using ``RelAI`` for a typical classification problem, with the ``breast_cancer`` dataset from ``sklearn``.

1. Import the needed functions from the package:

   .. code-block:: python

      from ReliabilityPackage.ReliabilityFunctions import *

2. Import all the other relevant packages:

   .. code-block:: python

      import numpy as np
      from sklearn import datasets
      from sklearn.model_selection import train_test_split
      from sklearn.ensemble import RandomForestClassifier
      import plotly.offline as pyo

3. Load the breast cancer dataset and split it into a training, a validation, and a test set:

   .. code-block:: python

      X, y = datasets.load_breast_cancer(return_X_y=True)
      # Hold out 30% of the data as the test set
      X_seventy, X_test, y_seventy, y_test = train_test_split(
          X, y, test_size=0.30, random_state=42)
      # Split the remaining 70% into training and validation sets
      X_train, X_val, y_train, y_val = train_test_split(
          X_seventy, y_seventy, test_size=0.30, random_state=42)

4. Train a classifier on the training set:

   .. code-block:: python

      clf = RandomForestClassifier(random_state=42, min_samples_leaf=10, n_estimators=100)
      clf.fit(X_train, y_train)

5. Create and train an autoencoder to implement the Density Principle. If ``layer_sizes`` is not specified, the default autoencoder is built with the layer sizes ``[dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]``; specify a more suitable architecture if needed:

   .. code-block:: python

      ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)

6. Generate the dataset of synthetic points and their associated accuracy values:

   .. code-block:: python

      syn_pts, acc_syn_pts = generate_synthetic_points(
          problem_type='classification', predict_func=clf.predict,
          X_train=X_train, y_train=y_train, method='GN', k=5)

7. Define a Mean Squared Error (MSE) threshold and an accuracy threshold. The ``mse_threshold_plot`` function can be used to see how performance changes across percentiles of the validation-set MSE:

   .. code-block:: python

      fig_mse_thresh = mse_threshold_plot(ae, X_val, y_val, clf.predict, metric='balanced_accuracy')
      fig_mse_thresh.show()
      mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
      acc_thresh = 0.90

8. Create an instance of the ``ReliabilityDetector`` class for classification problems:

   .. code-block:: python

      RD = create_reliability_detector(
          'classification', ae, syn_pts, acc_syn_pts,
          mse_thresh=mse_thresh, perf_thresh=acc_thresh, proxy_model="MLP")

9. Compute the reliability of the test set:

   .. code-block:: python

      test_reliability = compute_dataset_reliability(RD, X_test, mode='total')
      # 1 marks a reliable sample, 0 an unreliable one
      reliable_test = X_test[np.where(test_reliability == 1)]
      unreliable_test = X_test[np.where(test_reliability == 0)]

Usage - Regression Problem
--------------------------

Here is a simple example of using ``RelAI`` for a regression problem generated with the ``make_regression`` function of ``sklearn``.

1. Import the needed functions from the package:

   .. code-block:: python

      from ReliabilityPackage.ReliabilityFunctions import *

2. Import all the other relevant packages:

   .. code-block:: python

      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.model_selection import train_test_split
      from sklearn.linear_model import LinearRegression

3. Generate a random regression dataset and split it into a training, a validation, and a test set:

   .. code-block:: python

      X, y = make_regression(n_samples=1000, n_features=20, noise=1, random_state=42)
      # Hold out 30% of the data as the test set
      X_seventy, X_test, y_seventy, y_test = train_test_split(
          X, y, test_size=0.30, random_state=42)
      # Split the remaining 70% into training and validation sets
      X_train, X_val, y_train, y_val = train_test_split(
          X_seventy, y_seventy, test_size=0.30, random_state=42)

4. Train a linear regressor on the training set:

   .. code-block:: python

      reg = LinearRegression().fit(X_train, y_train)

5. Create and train an autoencoder to implement the Density Principle. If ``layer_sizes`` is not specified, the default autoencoder is built with the layer sizes ``[dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]``; specify a more suitable architecture if needed:

   .. code-block:: python

      ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)

6. Generate the dataset of synthetic points and their associated Mean Squared Error (MSE) values:

   .. code-block:: python

      syn_pts, mse_syn_pts = generate_synthetic_points(
          problem_type='regression', predict_func=reg.predict,
          X_train=X_train, y_train=y_train, method='GN', k=5)

7. Define an MSE threshold for the Density Principle and a performance threshold for the Local Fit Principle (MSE is the performance metric for the Local Fit Principle):

   .. code-block:: python

      mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
      performance_thresh = 0.8

8. Create an instance of the ``ReliabilityDetector`` class for regression problems:

   .. code-block:: python

      RD = create_reliability_detector(
          'regression', ae, syn_pts, mse_syn_pts,
          mse_thresh=mse_thresh, perf_thresh=performance_thresh, proxy_model="MLP")

9. Compute the reliability of the test set:

   .. code-block:: python

      test_reliability = compute_dataset_reliability(RD, X_test, mode='total')
      # 1 marks a reliable sample, 0 an unreliable one
      reliable_test = X_test[np.where(test_reliability == 1)]
      unreliable_test = X_test[np.where(test_reliability == 0)]
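
Once the reliability labels are available, one way to check that they are informative is to compare the model's error on the reliable and unreliable subsets. The sketch below is not part of the package: ``mse_by_reliability`` is a hypothetical helper, and it assumes that the reliability array holds one 0/1 label per row of the test set, as the ``== 1`` and ``== 0`` comparisons in the last step suggest. The toy arrays are illustrative stand-ins for ``y_test``, ``reg.predict(X_test)``, and ``test_reliability``.

.. code-block:: python

   import numpy as np


   def mse_by_reliability(y_true, y_pred, reliability):
       """Return sample count and MSE for reliable (1) and unreliable (0) rows.

       Hypothetical helper, not provided by ReliabilityPackage.
       """
       y_true = np.asarray(y_true, dtype=float)
       y_pred = np.asarray(y_pred, dtype=float)
       reliability = np.asarray(reliability)
       summary = {}
       for name, flag in (("reliable", 1), ("unreliable", 0)):
           mask = reliability == flag
           n = int(mask.sum())
           # An empty subset has no defined error, so report NaN
           mse = float(np.mean((y_true[mask] - y_pred[mask]) ** 2)) if n else float("nan")
           summary[name] = {"n": n, "mse": mse}
       return summary


   # Toy values only; in practice pass y_test, reg.predict(X_test), test_reliability
   toy_y_true = np.array([1.0, 2.0, 3.0, 4.0])
   toy_y_pred = np.array([1.0, 2.0, 5.0, 2.0])
   toy_reliability = np.array([1, 1, 0, 0])
   toy_summary = mse_by_reliability(toy_y_true, toy_y_pred, toy_reliability)
   print(toy_summary)

A noticeably lower error on the reliable subset than on the unreliable one would suggest that the chosen thresholds separate the two groups in a useful way. For a classification problem, the same pattern applies with accuracy in place of the squared error.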