Sklearn - My Cheat Sheet

Sometimes I get just really lost with all available commands and tricks one can make on sklearn. This way, I really wanted a place to gather my tricks that I really don’t want to forget.

How to check results of accuracy for each of the classes available on a classification problem?

from sklearn.metrics import classification_report

classification_report(y_test, y_pred, target_names=['A', 'B', 'C']

# Results:

             precision    recall    f1-score 

A              0.9         ...         ...
B              0.9         ...         ...
C              0.9         ...         ...
avg/total      0.9         ...         ...

How to normalize features?

from sklearn import preprocessing

normalized_X = preprocessing.normalize(X)

How to create a bag of words dataframe matrix

import pandas as pd
from sklearn.feature_extraction.text imprt CountVectorizer

documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

count_vector = CountVectorizer()

doc_array = count_vector.transform(documents).toarray()

freq_matrix = pd.DataFrame(doc_array, columns=count_vector.get_feature_name())

How to export a trained model

from sklearn.externals import joblib

joblib.dump(model, 'name.pkl')

# to read
model = joblib.load('name.pkl')