Understand Machine Learning Easily Using Python Shapash Library
Topics to be covered
- What is the Shapash library?
- Objectives of the Shapash library
- Features of the Shapash library
- How does Shapash work
- Getting Started
What is the Shapash library?
Model interpretability and understandability have been the focus of many research papers and open-source projects. However, most of these tools are aimed at data experts and trained practitioners. Shapash is a Python library that visualizes how machine learning models make their decisions. It aims to make models trustworthy for everyone by making them simpler to understand. Shapash produces clear visualizations of both global and local explainability.
It also makes it easy to build a web app that can bring a lot of value to end users and business owners. Shapash is compatible with most sklearn, lightgbm, xgboost and catboost models and can be used for classification and regression tasks. It uses a Shap backend to compute the local contributions of features, but this can be replaced by any other method for computing local contributions. Data scientists can use the Shapash explainer to explore and debug their models, or deploy it to provide visualizations with every inference.
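To make "local contribution" concrete: for a linear model whose features are treated as independent, the Shap value of feature i for a row x is coef_i * (x_i - mean(x_i)), and these contributions plus the average prediction add up exactly to the model's prediction for that row. The minimal sketch below checks this by hand with scikit-learn on synthetic data; the feature names and coefficients are made up for illustration, and this is not Shapash's own code (Shapash computes contributions for you through its Shap backend).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, hypothetical data: two features and a noisy linear target
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "GrLivArea": rng.uniform(500, 3000, 100),
    "OverallQual": rng.integers(1, 11, 100).astype(float),
})
y = 60 * X["GrLivArea"] + 9000 * X["OverallQual"] + rng.normal(0, 5000, 100)

model = LinearRegression().fit(X, y)

# Exact Shap values for a linear model (features treated as independent):
# contribution_i = coef_i * (x_i - mean(x_i)), computed for every row at once
contributions = (X - X.mean()) * model.coef_

# Base value = average prediction over the data
base_value = model.predict(X).mean()

# Local explanation of the first house: base value + contributions = prediction
row = 0
reconstructed = base_value + contributions.iloc[row].sum()
print(contributions.iloc[row])
print(reconstructed, model.predict(X.iloc[[row]])[0])
```

This additivity (base value plus contributions equals the prediction) is what lets a local plot read as a breakdown of one prediction.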
Objectives of Shapash
1. To display clear and understandable results: plots and outputs use explicit labels for each feature and its modalities.
2. To allow data scientists to understand their models quickly, using a web app to navigate easily between global and local explainability and to see how the different features contribute.
3. To summarize and export local explanations: Shapash provides a short, clear local explanation. Thanks to this summarized explanation of the features, any user, whatever their data background, can understand an individual prediction of a supervised model.
4. To produce a full data science report.
Check the report example here
Features of Shapash
Some of the main features of Shapash are listed below:
- Machine learning models: it works with both classification (binary or multiclass) and regression problems, and supports many models such as CatBoost, XGBoost, LightGBM, sklearn ensembles, linear models and SVM.
- Feature encoding: it supports many encoders for categorical features from the category_encoders package, such as one-hot, ordinal, base-N, target and binary encoding.
- sklearn ColumnTransformer: preprocessing built with OneHotEncoder, OrdinalEncoder, StandardScaler, QuantileTransformer or PowerTransformer is also supported.
- Visualizations: it provides a set of visuals that make results easy to interpret and display them clearly.
- Backends: it is compatible with Lime and Shap, and uses the Shap backend to show results in just a few lines of code.
- Parameters: it offers many options for summarizing results concisely.
- Ease of use: Shapash is easy to install and use. It provides a SmartExplainer class to understand your model and summarize explanations with simple syntax.
- Deployment: explanations can be used both for exploration and for operational deployment (through an API or in batch mode), and a web app for navigating from global to local explainability is easy to create.
- High flexibility: very few arguments are needed to display results. However, the more work you put into cleaning and documenting the data, the clearer the results will be for the end user.
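Encoders matter because an explainer needs to map encoded columns back to their original categories, so that plots show readable modalities instead of integer codes. The idea can be illustrated with scikit-learn's OrdinalEncoder alone. This is a sketch of the concept under that assumption, not Shapash's internal code, and the column name and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical categorical feature as it appears in the raw data
raw = pd.DataFrame({"Neighborhood": ["NAmes", "CollgCr", "OldTown", "NAmes"]})

# Encode categories to integers (categories are sorted alphabetically)
encoder = OrdinalEncoder()
encoded = encoder.fit_transform(raw)

# Decode back to the original labels: this inverse mapping is what an
# explainer needs in order to display human-readable modalities
decoded = pd.DataFrame(encoder.inverse_transform(encoded), columns=raw.columns)
print(encoded.ravel())
print(decoded["Neighborhood"].tolist())
```

In Shapash, the fitted encoder is passed to the explainer (the `preprocessing` argument of SmartExplainer) so this decoding happens automatically.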
How does Shapash work
Shapash is a package that makes machine learning understandable and interpretable. Data enthusiasts can understand their models easily and share them at the same time. Shapash can use Lime or Shap as a backend to show results in just a few lines of code. It draws on the different steps needed to build a machine learning model to make the results understandable. The image below shows how the Shapash package works.
It works on the following principle:
- First, it compiles the elements of each step: data preparation, feature engineering, model fitting, model evaluation and model understanding.
- Second, it provides a web app and charts to understand the model better, so that we can share and discuss our results with clients.
- Finally, it provides a summary of the model's explainability.
Getting Started
Shapash can be installed with the following command:
pip install shapash
For Jupyter Notebook: if you want to see inline graphs in Jupyter Notebook, you also need to install ipywidgets:
pip install ipywidgets
In this section, we explore Shapash using the House Prices Prediction dataset. It is a regression problem where we have to predict house prices. The link for the dataset is here. First, we analyse the dataset with univariate and multivariate analysis; then we look at model explainability using feature importance, feature contribution, local and compare plots; after that, we check model performance; and finally, we build the web app.
The image below shows the analysis of a feature named First Floor Square Feet. A table shows summary statistics of the training and test datasets, such as mean, max, min, standard deviation and median. On the right-hand side, we can see distribution graphs for both datasets. Shapash also indicates whether the feature is categorical or numerical, and provides a drop-down menu listing all the features.
For categorical features, the number of distinct values and missing values is shown for the training and test datasets. On the right-hand side, a bar plot shows the percentage of each category in the feature.
We can also see a detailed analysis of the target variable, Sale Price. On the left-hand side, statistics such as count, mean, standard deviation, min, max and median are shown for both the training and prediction datasets. On the right-hand side, the distributions for both datasets are shown.
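The figures in these tables are standard summary statistics. The pandas sketch below reproduces the same train-versus-test comparison; it is not Shapash's report code, and the small train/test frames (one numerical and one categorical column) are hypothetical.

```python
import pandas as pd

# Hypothetical train/test frames: one numerical and one categorical feature
train = pd.DataFrame({
    "1stFlrSF": [856, 1262, 920, 961, 1145],
    "MSZoning": ["RL", "RL", "RM", None, "RL"],
})
test = pd.DataFrame({
    "1stFlrSF": [1022, 1077, 756],
    "MSZoning": ["RL", "FV", "RL"],
})

# Numerical feature: count, mean, std, min, quartiles (50% = median), max
num_stats = pd.concat(
    {"train": train["1stFlrSF"].describe(), "test": test["1stFlrSF"].describe()},
    axis=1,
)

def cat_summary(s: pd.Series) -> dict:
    """Distinct values, missing values and share of each category (%)."""
    return {
        "distinct": s.nunique(),
        "missing": int(s.isna().sum()),
        "percent": (s.value_counts(normalize=True) * 100).round(1).to_dict(),
    }

print(num_stats)
print("train:", cat_summary(train["MSZoning"]))
print("test:", cat_summary(test["MSZoning"]))
```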
Having covered univariate analysis, we now move on to multivariate analysis. The image below shows the correlation matrix of the top 20 features for both the training and test datasets, with a color scale indicating the strength of each correlation. This is how Shapash lets us visualize the relationships between features.
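A correlation matrix like this can be reproduced with pandas. How Shapash selects its "top 20" features is not described here, so the sketch below assumes a hypothetical rule (keep the k numeric features most correlated with the target) and then computes Pearson correlations separately on the training and test sets, using synthetic data.

```python
import numpy as np
import pandas as pd

def correlation_matrices(train: pd.DataFrame, test: pd.DataFrame,
                         target: str, k: int = 20):
    """Pearson correlations of the k numeric features most correlated with
    the target (hypothetical selection rule), on train and test separately."""
    numeric = train.select_dtypes("number").drop(columns=[target])
    top = numeric.corrwith(train[target]).abs().nlargest(k).index
    return train[top].corr(), test[top].corr()

# Synthetic example: "b" is almost a copy of "a", "c" is independent noise
rng = np.random.default_rng(0)

def make(n: int) -> pd.DataFrame:
    a = rng.normal(size=n)
    df = pd.DataFrame({"a": a,
                       "b": a + rng.normal(scale=0.1, size=n),
                       "c": rng.normal(size=n)})
    df["SalePrice"] = 3 * df["a"] + df["c"]
    return df

train_corr, test_corr = correlation_matrices(make(200), make(100), "SalePrice", k=3)
print(train_corr.round(2))
```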
Feature Importance Plot
Shapash can also display feature importance, which measures how much each input feature contributes to predicting the output value. The image below shows the feature importance plot.
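Global importance can be derived from local contributions. The aggregation assumed in this sketch is the total absolute contribution of each feature, normalized so the importances sum to 1; this is believed to be close to what Shapash does, but treat it as an assumption. The contributions matrix is made up for illustration; in practice it would come from the compiled explainer.

```python
import pandas as pd

# Hypothetical local contributions: one row per house, one column per feature
contributions = pd.DataFrame({
    "OverallQual": [12000, -8000, 15000, -4000],
    "GrLivArea":   [6000, 3000, -5000, 2000],
    "YearBuilt":   [-1000, 500, 1500, -1000],
})

# Global importance: total absolute contribution per feature, normalized to sum to 1
importance = contributions.abs().sum()
importance = (importance / importance.sum()).sort_values(ascending=False)
print(importance)
```

Taking absolute values matters: a feature that pushes some predictions up and others down would otherwise appear unimportant.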
Feature Contribution Plot
Contribution plots help answer questions such as: how does a feature impact my prediction? Does it contribute positively or negatively? This plot complements feature importance for interpretability and for the global understanding of the model, making it easier to see the impact of a feature on the model.
We can see the contribution plots for both numerical and categorical features.
For a numerical feature:
For a categorical feature:
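Behind a contribution plot is each row's feature value paired with that feature's contribution: a scatter for a numerical feature, and a distribution per category for a categorical one. The pandas sketch below uses hypothetical values and summarizes the categorical case by the mean contribution per category, a simplification of the full per-category distribution.

```python
import pandas as pd

# Hypothetical feature values and their local contributions for five houses
data = pd.DataFrame({
    "GrLivArea": [900, 1500, 2100, 2800, 1200],
    "GrLivArea_contrib": [-15000, -2000, 9000, 21000, -6000],
    "Neighborhood": ["OldTown", "NAmes", "NridgHt", "NridgHt", "OldTown"],
    "Neighborhood_contrib": [-7000, -1000, 12000, 14000, -5000],
})

# Numerical feature: (value, contribution) points; sorting by value shows the trend
numeric_points = data[["GrLivArea", "GrLivArea_contrib"]].sort_values("GrLivArea")

# Categorical feature: average contribution per category
# (positive values push the predicted price up, negative values push it down)
by_category = (data.groupby("Neighborhood")["Neighborhood_contrib"]
               .mean().sort_values())
print(numeric_points)
print(by_category)
```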
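We can also draw local plots, which show the feature contributions for a single prediction. The image below shows a local plot.
We can also draw compare plots, which show the contributions of several predictions side by side. The image below shows a compare plot.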
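Both views can be sketched in pandas: a local view ranks the contributions of one prediction by absolute size, and a compare view lines up several predictions. The contributions, house ids and the `top_k` cut-off below are hypothetical and illustrative, not Shapash defaults.

```python
import pandas as pd

# Hypothetical local contributions, indexed by house id
contributions = pd.DataFrame(
    {"OverallQual": [12000, -8000, 15000],
     "GrLivArea":   [6000, 3000, -5000],
     "YearBuilt":   [-1000, 500, 1500]},
    index=[101, 102, 103],
)

def local_view(contribs: pd.DataFrame, row, top_k: int = 2) -> pd.Series:
    """Top-k contributions of one prediction, ranked by absolute size."""
    r = contribs.loc[row]
    return r.loc[r.abs().sort_values(ascending=False).index[:top_k]]

def compare_view(contribs: pd.DataFrame, rows) -> pd.DataFrame:
    """Contributions of several predictions side by side (features as rows)."""
    return contribs.loc[rows].T

print(local_view(contributions, 102))
print(compare_view(contributions, [101, 103]))
```

Ranking by absolute value keeps strong negative contributions (such as OverallQual for house 102) visible.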
After the data analysis, we train the machine learning model. The image below shows the output of our predictions. On the left-hand side, statistics such as count, min, max, median and standard deviation are shown for the true and predicted values. On the right-hand side, the distributions of the predicted and actual values are shown.
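The same true-versus-predicted comparison can be computed directly with pandas and scikit-learn. In this sketch the prices are hypothetical, and MAE and R² are illustrative metric choices.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true and predicted sale prices for a test set
y_true = pd.Series([208500, 181500, 223500, 140000, 250000])
y_pred = pd.Series([201000, 190000, 215000, 152000, 238000])

# Side-by-side summary statistics, as in the performance table
stats = pd.concat(
    {"true": y_true.describe(), "predicted": y_pred.describe()}, axis=1
)

# Headline error metrics (choice of metrics is illustrative)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(stats)
print(f"MAE: {mae:.0f}  R2: {r2:.3f}")
```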
After training the model, we can also build a web app. It shows a complete dashboard of our data and model, including everything covered so far. The image below shows the dashboard.
For more information, check this Link
In this blog, we have studied Shapash in detail. You can try this library to make your models easier to interpret and to save time.
You can check my articles here: Articles
Thanks for reading this article and for your patience. Do let me know your thoughts in the comment section. Share this article; it will motivate me to write more blogs for the data science community.
Follow me on LinkedIn: LinkedIn
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.