Add Q&A Entry Regarding Feature Importance Computation in Khiops Sklearn Estimators

This Q&A entry follows from the mitigation of https://github.com/KhiopsML/khiops-python/issues/480. It should:
- explain why Khiops Sklearn estimators cannot expose an attribute like `feature_importances_` (see https://github.com/KhiopsML/khiops-python/issues/480#issuecomment-3382572200):
   - Khiops uses only part of the features in the input dataset (those kept by feature selection in the preprocessing phase), plus the constructed features (feature pairs, trees, multi-table features, or features derived via rules). Importances, computed as Shapley values averaged over the training dataset, are available for these used features, which only *partially* overlap with the input features.
   - as a result, it is not possible, in practice and in general, to provide a `feature_importances_` estimator attribute that meets the expectations of Scikit-learn, i.e. one that contains the importances of all the input features and only those.
   - consequently, providing such an attribute would only make sense in very particular cases:
      - using only monotable training datasets (which rules out the multi-table specific feature construction rules);
      - forbidding the construction of variable pairs, trees, and text features; while this is possible, it would give up much of the strength of the Khiops models and could yield predictors that fall short of what is achievable.
   - as this would be very limiting and rather confusing, it seems better not to provide the `feature_importances_` attribute directly, and to rely on the `model_report_` attribute instead to retrieve the importances of the variables of interest.
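
The partial overlap described above can be illustrated with a minimal sketch in plain Python. All feature names and importance values here are hypothetical and do not come from a real Khiops model; the point is only that the set of features with an importance neither covers nor is contained in the set of input features.

```python
# Hypothetical input columns of a training dataset (illustration only).
input_features = ["age", "income", "zip_code"]

# Hypothetical importances of the features actually used by the model:
# one native feature plus two constructed ones.
used_importances = {
    "age": 0.42,
    "Mean(Orders.amount)": 0.35,  # constructed multi-table feature
    "pair(age, income)": 0.23,    # constructed feature pair
}

# Input features that do have an importance.
covered = [f for f in input_features if f in used_importances]
# Input features with no importance: a sklearn-style array would have gaps here.
missing = [f for f in input_features if f not in used_importances]
# Used features that are not input features: they have no slot in such an array.
extra = [f for f in used_importances if f not in input_features]

print(covered, missing, extra)
```

A Scikit-learn style `feature_importances_` array has exactly one entry per input feature, so both `missing` and `extra` would need to be empty for it to be meaningful.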

- show how the `model_report_` attribute of `KhiopsPredictor` can be used to retrieve the importances of the evaluated and selected features, with the caveat that this approach is only relevant in the few limiting cases stated above;
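
A minimal sketch of such a lookup is given below. It assumes that `model_report_` exposes `modeling_report.get_snb_predictor().selected_variables` (items with `name` and `importance`) and `preparation_report.variables_statistics` (items with `name` and `level`); these attribute names are recalled from the `khiops.core` analysis results API and should be checked against the installed version. The two helper functions are hypothetical, and the demo uses stand-in objects so that it runs without Khiops.

```python
from types import SimpleNamespace


def selected_variable_importances(model_report):
    """Return (name, importance) pairs of the selected variables, highest first.

    Assumes the report layout described in the lead-in (to be verified).
    """
    predictor = model_report.modeling_report.get_snb_predictor()
    pairs = [(var.name, var.importance) for var in predictor.selected_variables]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def evaluated_variable_levels(model_report):
    """Return {name: level} for every variable evaluated during preparation."""
    stats = model_report.preparation_report.variables_statistics
    return {var.name: var.level for var in stats}


# Stand-in report with hypothetical names and values, mimicking the assumed layout.
_selected = [
    SimpleNamespace(name="age", importance=0.2),
    SimpleNamespace(name="Mean(Orders.amount)", importance=0.5),
]
_evaluated = [
    SimpleNamespace(name="age", level=0.1),
    SimpleNamespace(name="zip_code", level=0.0),
]
fake_report = SimpleNamespace(
    modeling_report=SimpleNamespace(
        get_snb_predictor=lambda: SimpleNamespace(selected_variables=_selected)
    ),
    preparation_report=SimpleNamespace(variables_statistics=_evaluated),
)

print(selected_variable_importances(fake_report))
print(evaluated_variable_levels(fake_report))
```

With a fitted estimator, the same helpers would presumably be called as `selected_variable_importances(clf.model_report_)`. Note that the returned names may include constructed features, which is exactly why they cannot be mapped onto the input columns in general.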

- explain and show how feature importance can be given a consistent meaning in the Khiops context by:
   - using the Core API: call `train_recoder` to "flatten" a multi-table dataset, then `train_predictor` to build an SNB predictor, then mark as "unused" in the encoder model the variables that are unused in the predictor model.
   - using the flattened dataset representation (obtained via the encoder, see the point above) as input to a custom subclass of `KhiopsPredictor` which provides the `feature_importances_` and `feature_names_in_` attributes as explained above; the relevant features are then the input features, and their importances are all non-zero because the input contains only features that the predictor uses.
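
The bookkeeping behind these two steps can be sketched as follows. This is not the Khiops API: `mark_unused` and `input_feature_importances` are hypothetical helpers, the variables are stand-in objects with `name` and `used` attributes (mirroring what a dictionary variable is assumed to look like), and the calls to `train_recoder` and `train_predictor` are left out because their exact parameters depend on the Khiops version. The sketch also assumes that variable names match between the encoder and predictor models.

```python
from types import SimpleNamespace


def mark_unused(variables, used_names, keep=()):
    """Set `used = False` on every variable the predictor does not use.

    `keep` lists names that must stay used regardless (e.g. the target).
    Returns the names of the variables that remain used.
    """
    wanted = set(used_names) | set(keep)
    for var in variables:
        if var.name not in wanted:
            var.used = False
    return [var.name for var in variables if var.used]


def input_feature_importances(feature_names_in, importance_by_name):
    """Return importances aligned with the input features, one per feature.

    Raises ValueError if an input feature has no importance, i.e. if the
    flattened input still contains features the predictor does not use.
    """
    missing = [n for n in feature_names_in if n not in importance_by_name]
    if missing:
        raise ValueError(f"no importance for input features: {missing}")
    return [importance_by_name[n] for n in feature_names_in]


# Stand-in for the variables of the encoder (recoder) model; names are hypothetical.
encoder_variables = [
    SimpleNamespace(name=n, used=True)
    for n in ["Id", "age", "zip_code", "Mean(Orders.amount)", "Class"]
]
# Hypothetical importances of the variables selected by the predictor.
importances = {"age": 0.2, "Mean(Orders.amount)": 0.5}

still_used = mark_unused(encoder_variables, importances, keep=["Class"])
feature_names_in = [n for n in still_used if n != "Class"]
feature_importances = input_feature_importances(feature_names_in, importances)
print(feature_names_in, feature_importances)
```

In a custom `KhiopsPredictor` subclass, `feature_names_in` and `feature_importances` would be converted to arrays and stored as `feature_names_in_` and `feature_importances_` after fitting on the flattened data.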

Add Q&A Entry Regarding Feature Importance Computation in Khiops Sklearn Estimators #575
