Skip to content

Latest commit

 

History

History
299 lines (216 loc) · 12.3 KB

File metadata and controls

299 lines (216 loc) · 12.3 KB

Explainable AI Techniques for UCI Adult Income Dataset Using Python

Slide 1: Introduction to Explainable AI (XAI)

Explainable AI (XAI) refers to methods and techniques that make AI systems' decisions more transparent and interpretable to humans. As machine learning models become increasingly complex, understanding their decision-making process becomes crucial for trust, accountability, and regulatory compliance. This slideshow will explore XAI techniques using the UCI Adult Income dataset.

# Importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import shap
import lime
import lime.lime_tabular

# Print XAI definition
print("Explainable AI (XAI): Methods and techniques that make AI systems' decisions more transparent and interpretable to humans.")

Slide 2: Loading the UCI Adult Income Dataset

The UCI Adult Income dataset is commonly used for binary classification tasks. It contains demographic information about individuals and predicts whether their income exceeds $50,000 per year. We'll use this dataset to demonstrate XAI techniques.

# Load the UCI Adult Income dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
column_names = ['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income']
data = pd.read_csv(url, names=column_names, skipinitialspace=True)

print(data.head())
print(f"Dataset shape: {data.shape}")

Slide 3: Data Preparation

Before applying XAI techniques, we need to preprocess the data. This involves handling missing values, encoding categorical variables, and splitting the data into training and test sets.

# Handle missing values
data = data.replace('?', np.nan).dropna()

# Encode categorical variables
categorical_columns = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
data_encoded = pd.get_dummies(data, columns=categorical_columns)

# Encode target variable
data_encoded['income'] = data_encoded['income'].map({' <=50K': 0, ' >50K': 1})

# Split the data
X = data_encoded.drop('income', axis=1)
y = data_encoded['income']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Preprocessed data shape:", X.shape)
print("Target variable distribution:")
print(y.value_counts(normalize=True))

Slide 4: Training a Random Forest Classifier

We'll train a Random Forest classifier on our preprocessed data. Random Forests are ensemble learning methods that construct multiple decision trees and merge them to get a more accurate and stable prediction.

# Train a Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Evaluate the model
train_accuracy = rf_classifier.score(X_train, y_train)
test_accuracy = rf_classifier.score(X_test, y_test)

print(f"Train accuracy: {train_accuracy:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")

Slide 5: Introduction to SHAP (SHapley Additive exPlanations)

SHAP is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.

# Create a SHAP explainer
explainer = shap.TreeExplainer(rf_classifier)

# Calculate SHAP values for a subset of the test data
shap_values = explainer.shap_values(X_test.iloc[:100])

# Plot summary of SHAP values
shap.summary_plot(shap_values[1], X_test.iloc[:100], plot_type="bar")

Slide 6: SHAP Feature Importance

SHAP values provide a measure of feature importance. The higher the absolute SHAP value for a feature, the more impact it has on the model's prediction.

# Calculate mean absolute SHAP values
mean_shap = np.abs(shap_values[1]).mean(axis=0)
feature_importance = pd.DataFrame(list(zip(X.columns, mean_shap)), columns=['feature', 'shap_importance'])
feature_importance = feature_importance.sort_values('shap_importance', ascending=False)

print(feature_importance.head(10))

# Plot feature importance
feature_importance.plot(x='feature', y='shap_importance', kind='bar', figsize=(12, 6))

Slide 7: SHAP Force Plot

SHAP force plots show how each feature contributes to pushing the model output from the base value to the final prediction for a single instance.

# Generate a force plot for a single instance
instance = X_test.iloc[0]
instance_shap_values = explainer.shap_values(instance)

shap.force_plot(explainer.expected_value[1], instance_shap_values[1], instance, matplotlib=True)

Slide 8: Introduction to LIME (Local Interpretable Model-agnostic Explanations)

LIME explains the predictions of any classifier by approximating it locally with an interpretable model. It helps understand why a model made a specific prediction for an instance.

# Create a LIME explainer
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns,
    class_names=['<=50K', '>50K'],
    mode='classification'
)

# Generate a LIME explanation for a single instance
instance = X_test.iloc[0]
lime_exp = lime_explainer.explain_instance(instance.values, rf_classifier.predict_proba, num_features=10)

# Plot the LIME explanation
lime_exp.as_pyplot_figure()

Slide 9: LIME Feature Importance

LIME provides feature importance scores for individual predictions. These scores indicate how much each feature contributes to the prediction for a specific instance.

# Get feature importance from LIME explanation
lime_feature_importance = lime_exp.as_list()

# Plot LIME feature importance
import matplotlib.pyplot as plt

features, scores = zip(*lime_feature_importance)
plt.figure(figsize=(10, 6))
plt.barh(features, scores)
plt.title('LIME Feature Importance for Single Instance')
plt.xlabel('Importance Score')
plt.ylabel('Feature')
plt.show()

Slide 10: Comparing SHAP and LIME

Both SHAP and LIME provide local explanations, but they use different approaches. SHAP is based on game theory and provides consistent global interpretations, while LIME focuses on local linear approximations.

# Compare SHAP and LIME explanations for a single instance
instance = X_test.iloc[0]
shap_values_instance = explainer.shap_values(instance)[1]
lime_exp = lime_explainer.explain_instance(instance.values, rf_classifier.predict_proba, num_features=10)

# SHAP values
shap_df = pd.DataFrame(list(zip(X.columns, shap_values_instance)), columns=['Feature', 'SHAP Value'])
shap_df = shap_df.sort_values('SHAP Value', key=abs, ascending=False).head(10)

# LIME values
lime_df = pd.DataFrame(lime_exp.as_list(), columns=['Feature', 'LIME Value'])

print("Top 10 SHAP values:")
print(shap_df)
print("\nTop 10 LIME values:")
print(lime_df)

Slide 11: Real-life Example: Healthcare Diagnosis

In healthcare, XAI techniques can be used to explain diagnostic decisions made by AI systems. For instance, a model predicting the likelihood of a patient having a certain disease based on various health indicators can be interpreted using SHAP or LIME.

# Simulating a healthcare diagnosis scenario
health_data = pd.DataFrame({
    'age': [45], 'blood_pressure': [130], 'cholesterol': [220],
    'bmi': [28], 'glucose': [100], 'smoking': [0], 'alcohol': [1],
    'exercise': [2], 'family_history': [1]
})

# Assume we have a trained model 'health_model'
# health_model = trained_model_here

# LIME explanation
health_explainer = lime.lime_tabular.LimeTabularExplainer(
    health_data.values,
    feature_names=health_data.columns,
    class_names=['Low Risk', 'High Risk'],
    mode='classification'
)

# exp = health_explainer.explain_instance(health_data.iloc[0].values, health_model.predict_proba, num_features=5)
# exp.as_pyplot_figure()

print("In a real scenario, LIME would explain which factors (e.g., high cholesterol, lack of exercise) contribute most to the diagnosis.")

Slide 12: Real-life Example: Customer Churn Prediction

XAI techniques can be valuable in customer relationship management, particularly for explaining churn predictions. This helps businesses understand why a customer might be likely to leave and take appropriate retention actions.

# Simulating a customer churn prediction scenario
customer_data = pd.DataFrame({
    'tenure': [24], 'monthly_charges': [65.0], 'total_charges': [1560.0],
    'contract_type': ['Month-to-month'], 'internet_service': ['Fiber optic'],
    'online_security': ['No'], 'tech_support': ['No'], 'payment_method': ['Electronic check']
})

# Assume we have a trained model 'churn_model'
# churn_model = trained_model_here

# SHAP explanation
# explainer = shap.TreeExplainer(churn_model)
# shap_values = explainer.shap_values(customer_data)

# shap.force_plot(explainer.expected_value[1], shap_values[1], customer_data)

print("In a real scenario, SHAP would show how factors like contract type and lack of additional services contribute to churn risk.")

Slide 13: Challenges and Limitations of XAI

While XAI techniques provide valuable insights, they also have limitations. It's important to be aware of these challenges when applying XAI in practice.

  1. Model Complexity: As models become more complex, explanations may become less intuitive.
  2. Stability: Explanations can vary for similar instances, potentially leading to inconsistent interpretations.
  3. Computational Cost: Generating explanations, especially for large datasets, can be computationally expensive.
  4. Human Interpretability: Even with XAI, some explanations may still be difficult for non-experts to understand.
# Demonstrating explanation stability
instance1 = X_test.iloc[0]
instance2 = X_test.iloc[1]

# LIME explanations
exp1 = lime_explainer.explain_instance(instance1.values, rf_classifier.predict_proba, num_features=5)
exp2 = lime_explainer.explain_instance(instance2.values, rf_classifier.predict_proba, num_features=5)

print("LIME explanation for instance 1:")
print(exp1.as_list())
print("\nLIME explanation for instance 2:")
print(exp2.as_list())

print("\nNote how explanations can differ even for instances with similar predictions.")

Slide 14: Future Directions in XAI

The field of XAI is rapidly evolving. Some promising future directions include:

  1. Causal Explanations: Incorporating causal inference to provide more meaningful explanations.
  2. Interactive Explanations: Developing tools that allow users to interactively explore model behavior.
  3. Standardization: Establishing industry standards for XAI techniques and their evaluation.
  4. Multidisciplinary Approach: Integrating insights from cognitive science, psychology, and human-computer interaction to improve explanation quality.
# Simulating an interactive explanation tool
def interactive_explanation(instance, feature_to_change):
    original_prediction = rf_classifier.predict_proba(instance.values.reshape(1, -1))[0][1]
    
    modified_instance = instance.()
    modified_instance[feature_to_change] *= 1.1  # Increase feature value by 10%
    
    new_prediction = rf_classifier.predict_proba(modified_instance.values.reshape(1, -1))[0][1]
    
    print(f"Original probability of income >50K: {original_prediction:.4f}")
    print(f"After increasing {feature_to_change} by 10%: {new_prediction:.4f}")
    print(f"Change in probability: {new_prediction - original_prediction:.4f}")

# Example usage
instance = X_test.iloc[0]
interactive_explanation(instance, 'age')

Slide 15: Additional Resources

For those interested in diving deeper into Explainable AI, here are some valuable resources:

  1. "A Survey of Methods for Explaining Black Box Models" by Guidotti et al. (2018) ArXiv: https://arxiv.org/abs/1802.01933
  2. "Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI" by Arrieta et al. (2020) ArXiv: https://arxiv.org/abs/1910.10045
  3. "Explainable Machine Learning for Scientific Insights and Discoveries" by Roscher et al. (2020) ArXiv: https://arxiv.org/abs/1905.08883

These papers provide comprehensive overviews of XAI techniques, their applications, and future research directions.