A dataset is a two-dimensional array of categorical or continuous data points, where each column (also called a feature) influences the dependent variable that a machine learning model must predict. Regardless of which features a dataset contains, how well we make use of them largely determines the performance of the forecasting model.
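As a minimal illustration of this idea, a dataset is just a 2-D table whose columns mix categorical and continuous features plus the dependent variable (the column names below are hypothetical, not taken from the dataset used later):

```python
import pandas as pd

# Hypothetical toy dataset: two continuous features, one categorical
# feature, and a binary dependent variable to be predicted.
data = pd.DataFrame({
    "age": [63, 45, 58, 70],               # continuous
    "serum_sodium": [137, 140, 134, 132],  # continuous
    "smoking": [1, 0, 0, 1],               # categorical (binary)
    "target": [1, 0, 0, 1],                # dependent variable
})

X = data.drop(columns="target")  # feature matrix (2-D array)
y = data["target"]               # dependent variable
print(X.shape)  # (4, 3): 4 rows, 3 features
```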
- Extra Tree Classifier
- Pearson correlation
- Chi-square test
- Forward selection
- Logit (Logistic Regression model)
Extra Tree Classifier
```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier

# heart: DataFrame loaded earlier from the heart-failure dataset
X = heart.iloc[:, 0:12]    # the 12 candidate features
Y = heart['DEATH_EVENT']   # target variable

model = ExtraTreesClassifier()
model.fit(X, Y)
print(model.feature_importances_)

# Plot the impurity-based importance of each of the 12 features
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(12).plot.bar()
plt.show()

list1 = feat_importances.keys().to_list()
```
Pearson correlation

```python
# Correlation of every feature with the output variable
corr = heart.corr()
cor_target = abs(corr["DEATH_EVENT"])

# Selecting highly correlated features
relevant_features = cor_target[cor_target > 0.2]
list4 = relevant_features.keys().to_list()
list4
```
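The same correlation filter can be tried end to end on synthetic data (the column names `strong` and `noise` are made up for this sketch; only `DEATH_EVENT` mirrors the article's target):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the heart-failure data: "strong" tracks the
# target, "noise" does not.
rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, n)
df = pd.DataFrame({
    "strong": target + rng.normal(0, 0.5, n),
    "noise": rng.normal(0, 1, n),
    "DEATH_EVENT": target,
})

# Absolute correlation of every column with the output variable
cor_target = abs(df.corr()["DEATH_EVENT"])

# Keep columns whose |correlation| exceeds the 0.2 threshold
relevant_features = cor_target[cor_target > 0.2].keys().to_list()
print(relevant_features)
```

The correlated column survives the threshold while pure noise is (almost always) filtered out.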
Chi-square test

```python
from sklearn.feature_selection import SelectKBest, chi2

# Keep the 6 features with the highest chi-square scores
chi2_features = SelectKBest(chi2, k=6)
X_kbest_features = chi2_features.fit_transform(X, Y)

# Recover the names of the selected features
mask = chi2_features.get_support()
new_feature = []
for selected, feature in zip(mask, X.columns):
    if selected:
        new_feature.append(feature)
list3 = new_feature
list3
```
Forward selection

```python
from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import SequentialFeatureSelector

# Greedily add features one at a time, keeping the 6 that most
# improve cross-validated ROC AUC of a random forest
forward_feature_selector = SequentialFeatureSelector(
    RandomForestClassifier(n_jobs=-1),
    k_features=6,
    forward=True,
    verbose=2,
    scoring='roc_auc',
    cv=4,
)
fselector = forward_feature_selector.fit(X, Y)
```
Logit (Logistic Regression model)

```python
import statsmodels.api as sm

# Fit a logistic regression and inspect each feature's p-value
logit_model = sm.Logit(Y, X)
result = logit_model.fit()
print(result.summary2())
```
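Each method above yields its own list of selected features (`list1`, `list3`, `list4`, and the forward-selection result). One simple way to compare them is a set intersection; the lists below are illustrative placeholders, not the article's actual output:

```python
# Hypothetical selections from three methods (placeholders only).
list1 = ["time", "ejection_fraction", "serum_creatinine", "age",
         "platelets", "creatinine_phosphokinase"]
list3 = ["time", "ejection_fraction", "serum_creatinine", "age",
         "serum_sodium", "anaemia"]
list4 = ["time", "ejection_fraction", "serum_creatinine", "age"]

# Features chosen by every method are the strongest consensus candidates.
consensus = set(list1) & set(list3) & set(list4)
print(sorted(consensus))  # ['age', 'ejection_fraction', 'serum_creatinine', 'time']
```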
You can reach out to me through LinkedIn
The media shown in this article on feature selection methods is not owned by Analytics Vidhya and is used at the Author's discretion.