# Machine Learning Models Cheat Sheet

## Supervised Models

This is a short review of the advantages and disadvantages of each model, based on the models suggested in Udacity’s Machine Learning Engineer Nanodegree.

### Logistic Regression

#### Advantages

- You don’t have to worry as much about features being correlated as you do with Naive Bayes
- You can easily update your model to take in new data (unlike Decision Trees or SVM)

#### Disadvantages

- Handles outliers poorly
- Needs many samples for each class
- Sensitive to multicollinearity: strongly correlated features make the coefficient estimates unstable

### Decision Tree

#### Advantages

- Easy to understand and interpret (for some people)
- Easy to use: doesn’t need data normalisation, dummy variables, etc.
- Can handle multi-output models
- Easily handle feature interactions
- Don’t have to worry about outliers

#### Disadvantages

- Overfits easily
- Instability: small changes in the data can lead to completely different trees
- Can easily become biased if one class dominates
- Doesn’t support online learning: you have to rebuild the tree when new data arrives
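
The overfitting risk listed above can be seen by comparing an unconstrained tree with a depth-limited one. This is a minimal sketch on synthetic data; the dataset, the `max_depth=3` setting, and the variable names are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y), so memorising it hurts
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: keeps splitting until every training leaf is pure
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth-limited tree: one simple way to regularise
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

deep_train_acc = deep_tree.score(X_train, y_train)
deep_test_acc = deep_tree.score(X_test, y_test)
shallow_train_acc = shallow_tree.score(X_train, y_train)
shallow_test_acc = shallow_tree.score(X_test, y_test)
print(f"deep:    train={deep_train_acc:.2f} test={deep_test_acc:.2f}")
print(f"shallow: train={shallow_train_acc:.2f} test={shallow_test_acc:.2f}")
```

A large gap between training and test accuracy for the deep tree is the usual sign of overfitting; limiting depth or minimum leaf size are common ways to reduce it.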

### Ensemble Methods

#### Advantages

- Harder to overfit
- Usually better performance than a single model

#### Disadvantages

- Scaling: several models are usually trained, which can perform poorly on larger datasets
- Hard to deploy on real-time platforms
- Complexity increases
- Boosting delivers poor probability estimates (https://arxiv.org/ftp/arxiv/papers/1207/1207.1403.pdf)
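
As a rough illustration of the trade-off above, the sketch below compares a single decision tree with a random forest using cross-validation. The synthetic dataset and the choice of `RandomForestClassifier` with 100 trees are assumptions for the example; the forest typically scores higher but trains 100 models instead of one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy for each model
tree_scores = cross_val_score(single_tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)
print(f"single tree:   {tree_scores.mean():.3f}")
print(f"random forest: {forest_scores.mean():.3f}")
```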

### K-nearest Neighbors

#### Advantages

- Little training time
- Works well with multiclass datasets
- Good for highly unusual data

#### Disadvantages

- Need to choose the value of k (the number of neighbors) and a distance metric
- Neighbors-based methods are known as non-generalizing machine learning methods, since they simply “remember” all of their training data
- The accuracy of KNN can be severely degraded with high-dimensional data because there is little difference between the nearest and farthest neighbor.
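
The high-dimensionality problem can be checked numerically. The sketch below draws uniform random points (an assumption made for illustration) and measures how the ratio between the nearest and the farthest distance from a query point moves towards 1 as the number of dimensions grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_to_farthest_ratio(n_dims, n_points=500):
    """Ratio of the nearest to the farthest distance from one query point."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()

ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, ratio in ratios.items():
    print(f"dims={d:>4}  nearest/farthest={ratio:.3f}")
```

When the ratio is close to 1, the "nearest" neighbors are barely closer than any other point, so they carry little information.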

### Gaussian Naive Bayes

#### Advantages

- Needs less training data than models like logistic regression
- Highly scalable
- Not sensitive to irrelevant features
- Returns the degree of certainty of the answer
- Good when you need something fast that performs well

#### Disadvantages

- Can’t learn interactions between features (e.g., it can’t learn that although you love movies with Brad Pitt and Tom Cruise, you hate movies where they’re together).
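
The interaction problem can be reproduced with XOR-style data, where the class depends only on the combination of two features. This is a sketch on made-up data; the decision tree is included only as a contrasting model that can represent the interaction.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# XOR-style labels: the class depends only on the interaction of the two features
X = rng.uniform(-1, 1, size=(2000, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive Bayes models each feature independently given the class
nb_acc = GaussianNB().fit(X_train, y_train).score(X_test, y_test)
# A tree can split on one feature and then on the other
tree_acc = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test)
print(f"GaussianNB: {nb_acc:.2f}  DecisionTree: {tree_acc:.2f}")
```

Each feature on its own says nothing about the class here, so Naive Bayes should stay close to chance level while the tree recovers the pattern.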

### SVM

#### Advantages

- High accuracy
- Nice theoretical guarantees regarding overfitting
- Especially popular in text classification problems

#### Disadvantages

- Memory-intensive
- Hard to interpret
- Complicated to run and tune
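
Much of the tuning effort goes into scaling the features and searching over `C` and `gamma`. The sketch below shows one common setup, assuming an RBF kernel, a small hand-picked grid, and synthetic data; none of these values come from the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scale inside the pipeline so each CV fold is scaled using only its training part
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(pipeline, param_grid, cv=5).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```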

### Stochastic Gradient Descent

#### Advantages

- Efficiency
- Ease of implementation

#### Disadvantages

- A lot of hyperparameters to tune
- Sensitive to feature scaling
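
Because of the sensitivity to feature scaling, SGD-based models are usually wrapped in a pipeline with a scaler. The sketch below compares the two setups on synthetic data whose columns are deliberately put on very different scales; the data and the scale factors are assumptions for illustration, and the exact scores will vary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
# Put the features on very different scales to mimic raw, unscaled data
X = X * np.logspace(0, 4, X.shape[1])

raw_sgd = SGDClassifier(random_state=0)
scaled_sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))

raw_scores = cross_val_score(raw_sgd, X, y, cv=5)
scaled_scores = cross_val_score(scaled_sgd, X, y, cv=5)
print(f"without scaling: {raw_scores.mean():.3f}")
print(f"with scaling:    {scaled_scores.mean():.3f}")
```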

## Unsupervised Models

### KMeans

#### Advantages

- Good when you have an idea of an ideal number of clusters
- Scales well to large numbers of samples and moderately well with the number of clusters

#### Disadvantages

- Doesn’t handle missing values very well
- Can’t find clusters that aren’t circular or spherical

#### Choosing the value of K

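To choose the number of clusters k, fit the model for several candidate values and score each one. The snippet below uses the silhouette score, where higher is better; the elbow method applies the same loop to the model’s `inertia_` instead: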

```
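import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = pd.DataFrame(...)  # your feature matrix
# silhouette_score needs between 2 and len(X) - 1 clusters
possible_k_values = range(2, len(X), 5)
scores = []
for k in possible_k_values:
    model = KMeans(n_clusters=k).fit(X)
    predictions = model.predict(X)
    # Mean silhouette coefficient: higher means better-separated clusters
    score = silhouette_score(X, predictions)
    scores.append((k, score))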
```

Then choose the k with the highest silhouette score that still makes sense for your problem. (With the elbow method you would instead look for the k after which the error stops dropping noticeably.)
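
The elbow method itself is usually based on inertia (the within-cluster sum of squared distances) rather than the silhouette score. A minimal sketch, assuming synthetic blobs with four groups and candidate values of k from 1 to 8:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

k_values = list(range(1, 9))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances to the nearest centroid
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"k={k}  inertia={inertia:.1f}")
```

Inertia keeps decreasing as k grows, so the usual reading is to pick the k where the curve bends (the "elbow") rather than the minimum.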

### Hierarchical Clustering

#### Advantages

- Resulting hierarchical representation can be very informative
- Provides an additional way to visualize the data
- Especially potent when the dataset contains real hierarchical relationships (e.g. evolutionary biology)

#### Disadvantages

- Sensitive to noise and outliers
- Computationally intensive: at least O(N^2)

#### Implementation on Sklearn

```
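import pandas as pd
from sklearn import cluster

X = pd.DataFrame(...)  # your feature matrix
cls = cluster.AgglomerativeClustering(n_clusters=3, linkage='ward')
# AgglomerativeClustering has no predict(); fit and label in one step
labels = cls.fit_predict(X)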
```

#### Get a dendrogram from a hierarchical clustering

```
import matplotlib.pyplot as plt
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, ward

X = pd.DataFrame(...)  # your feature matrix
linkage_matrix = ward(X)  # Ward-linkage merge history
dendrogram(linkage_matrix)
plt.show()
```

### DBSCAN

#### Advantages

- We don’t need to specify the number of clusters
- Flexibility in shapes and sizes of clusters
- Able to deal with noise and outliers

#### Disadvantages

- Border points that are reachable from two clusters are assigned to whichever cluster finds them first
- Faces difficulty finding clusters of varying densities

#### Tips

- Small min samples and small epsilon result in many small clusters
- Small min samples and large epsilon result in most points being in the same cluster
- Large min samples results in most points being classified as noise, except in dense regions when epsilon is high
- Do not use the silhouette coefficient to evaluate this model!
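
The behaviour described above can be tried on a shape that KMeans handles badly. This is a sketch on the two-moons toy dataset; the `eps=0.2` and `min_samples=5` values are assumptions chosen for this data, and the adjusted Rand index is used only because true labels are available for a synthetic example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: clusters that are not spherical
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN marks noise points with the label -1
n_clusters = len(set(dbscan_labels) - {-1})
n_noise = int(np.sum(dbscan_labels == -1))
print(f"DBSCAN: {n_clusters} clusters, {n_noise} noise points")
print(f"ARI DBSCAN: {adjusted_rand_score(y_true, dbscan_labels):.2f}")
print(f"ARI KMeans: {adjusted_rand_score(y_true, kmeans_labels):.2f}")
```

Changing `eps` and `min_samples` here is a quick way to see the effects listed in the tips.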

### Gaussian Mixture Model

#### Advantages

- Soft clustering (each sample gets a membership probability for every cluster)
- Cluster shape flexibility

#### Disadvantages

- Sensitive to initialization values
- Possible to converge to a local optimum
- Slow convergence rate
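
Soft clustering and the sensitivity to initialization can both be seen in a short sketch. The synthetic blobs, `n_components=3`, and `n_init=5` are illustrative assumptions; `n_init` simply reruns the fit from several starting points and keeps the best one.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=0)

# n_init > 1 restarts EM from several initialisations and keeps the best fit,
# which reduces the sensitivity to the initial values
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      n_init=5, random_state=0).fit(X)

hard_labels = gmm.predict(X)        # one cluster per sample
soft_labels = gmm.predict_proba(X)  # one probability per cluster per sample
print(soft_labels[:3].round(3))
```

Each row of `soft_labels` sums to 1, and the hard label is the cluster with the highest probability.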
