Series
How to change a Series type?
import pandas as pd
serie = pd.Series([1, 2, 3, 4])
serie.astype(float)
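One detail the snippet above does not show is that astype returns a new Series rather than changing the original. A minimal sketch of this, where the names as_float and still_integer are only illustrative:

```python
import pandas as pd

serie = pd.Series([1, 2, 3, 4])

# astype returns a converted copy; serie itself keeps its integer dtype
as_float = serie.astype(float)
still_integer = serie.dtype.kind == 'i'

# Assign the result back if the conversion should stick
serie = serie.astype(float)
```

After the last line, serie holds floats as well.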
How to apply a function to every item of my Series?
import pandas as pd
serie = pd.Series(['a', 'b', 'b', 'a'])
serie.apply(lambda x: 0 if x == 'a' else 1)
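When the function is just a lookup from one value to another, Series.map with a dictionary is a possible alternative. The sketch below compares the two; the variable names with_apply and with_map are only illustrative.

```python
import pandas as pd

serie = pd.Series(['a', 'b', 'b', 'a'])

# apply calls the function once per item
with_apply = serie.apply(lambda x: 0 if x == 'a' else 1)

# map looks each item up in the dictionary
with_map = serie.map({'a': 0, 'b': 1})
```

The two are not fully equivalent: the lambda sends anything other than 'a' to 1, while map returns NaN for values missing from the dictionary.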
How to prepare my DataFrame to apply get_dummies?
import pandas as pd
X = pd.read_csv(..)
categorical = ['x1', 'x2', 'x4'] # columns that have categorical features in your X
for cat in categorical:
    X[cat] = X[cat].astype(object)
X_dummy = pd.get_dummies(X)
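The cast to object matters because, by default, get_dummies only encodes object, string and category columns, so a category stored as integers would be left as it is. A small sketch with made-up data (the column names x1 and x3 and their values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: x1 is a category stored as integers, x3 is numeric
X = pd.DataFrame({'x1': [1, 2, 1], 'x3': [0.5, 1.5, 2.5]})

# Without the cast, the integer column x1 is not encoded
untouched = pd.get_dummies(X)

# After the cast, x1 is replaced by the indicator columns x1_1 and x1_2
X['x1'] = X['x1'].astype(object)
X_dummy = pd.get_dummies(X)
```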
read_csv encoding errors
Usually you can read a CSV file with something like:
pd.read_csv('file.csv')
Sometimes an encoding error appears. The first option is to pass 'utf8' as the value of the encoding parameter.
pd.read_csv('file.csv', encoding='utf8')
But there are some cases where this is not enough and the following error keeps appearing:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 4: invalid continuation byte
In that case, the only thing that resolved it was reading the file as latin-1:
pd.read_csv('file.csv', encoding='latin-1')
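The two attempts above can be combined into a small helper that tries each encoding in turn. This is only a sketch: read_csv_with_fallback is a hypothetical function, not part of pandas, and the list of encodings is an assumption.

```python
import pandas as pd

def read_csv_with_fallback(path, encodings=('utf-8', 'latin-1'), **kwargs):
    """Hypothetical helper: return the first DataFrame that decodes cleanly."""
    if not encodings:
        raise ValueError('at least one encoding is required')
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding, **kwargs)
        except UnicodeDecodeError as error:
            # Remember the failure and try the next encoding
            last_error = error
    raise last_error
```

Keep in mind that latin-1 maps every byte to some character, so it never raises a decoding error. If the file was actually written in another encoding, the read succeeds but some characters may come out wrong.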
Sum the values of all columns for each row
df.sum(axis=1)
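The axis argument is easy to mix up, so here is a small sketch on a made-up DataFrame (the columns a and b and their values are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# axis=1 adds across the columns: one total per row (11 and 22)
row_totals = df.sum(axis=1)

# axis=0 adds down the rows: one total per column (a=3, b=30)
column_totals = df.sum(axis=0)

# A single grand total over every cell (33)
grand_total = df.to_numpy().sum()
```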
Use apply for multiple columns
def my_function(a, b):
    return a + b
df.apply(lambda row: my_function(row['a'], row['b']), axis=1)
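When the function only uses operations that also work on whole columns, as the addition here does, it can be called on the columns directly instead of row by row. A sketch with made-up data, where the names row_wise and vectorized are only illustrative:

```python
import pandas as pd

def my_function(a, b):
    return a + b

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Row by row: my_function receives two scalars on each call
row_wise = df.apply(lambda row: my_function(row['a'], row['b']), axis=1)

# Whole columns at once: my_function receives two Series
vectorized = my_function(df['a'], df['b'])
```

Both give the same values here. The column version is usually faster on large DataFrames, but it only works if the function body does not rely on per-item logic such as an if on a single value.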