Machine Learning & NLTK Analyses: Extra Trees Classifier & Effect of Parameters

Extra Trees Classifier

The Extra Trees Classifier is tested here to see how its parameters affect the accuracy of its output.

Python program:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from matplotlib.colors import ListedColormap
>>> from sklearn import ensemble, datasets
>>> iris = datasets.load_iris()
>>> x = iris.data[:, :2]
>>> y = iris.target
>>> h = .02
>>> cmap_bold = ListedColormap(['firebrick', 'lawngreen', 'b'])
>>> cmap_light = ListedColormap(['pink', 'palegreen', 'lightcyan'])

>>> # Plotting the analysis: one figure per parameter combination
>>> for min_samples_leaf in [1, 8, 50, 250]:
...     for n_estimators in [1, 12, 25, 125, 1250]:
...         for n_jobs in [1, 12, 250, 1250, 7200]:
...             for bootstrap in [True]:  # booleans, not the strings 'True'/'False'
...                 for oob_score in [False, True]:
...                     clf = ensemble.ExtraTreesClassifier(min_samples_leaf=min_samples_leaf, n_estimators=n_estimators, n_jobs=n_jobs, bootstrap=bootstrap, oob_score=oob_score)
...                     clf.fit(x, y)
...                     # Grid covering the two features, padded by 1 on each side
...                     x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
...                     y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
...                     xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
...                     # Predict a class for every grid point to draw the decision regions
...                     z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
...                     z = z.reshape(xx.shape)
...                     plt.figure()
...                     plt.pcolormesh(xx, yy, z, cmap=cmap_light)
...                     plt.scatter(x[:, 0], x[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=24)
...                     plt.xlim(xx.min(), xx.max())
...                     plt.ylim(yy.min(), yy.max())
...                     plt.title("ExtraTreesClassifier (min_samples_leaf='%s', n_estimators='%s', n_jobs='%s', bootstrap='%s', oob_score='%s')" % (min_samples_leaf, n_estimators, n_jobs, bootstrap, oob_score))
...
>>> plt.show()
Output:
a) Effect of minimum samples per leaf (min_samples_leaf) on output:

Increasing min_samples_leaf reduces the accuracy of the analysis. The iris data set has only 150 samples, so at min_samples_leaf=250 no split is possible and every tree is a single leaf.
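
The plots show the decision regions but do not put a number on accuracy. Below is a minimal sketch of one way to do that. The 5-fold cross-validation, the 25 trees and the fixed random_state are assumptions and are not part of the original program:

```python
from sklearn import datasets, ensemble
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
x = iris.data[:, :2]  # same two features used for the plots
y = iris.target

scores = {}
for min_samples_leaf in [1, 8, 50, 250]:
    # n_estimators=25 and random_state=0 are illustrative choices
    clf = ensemble.ExtraTreesClassifier(
        n_estimators=25, min_samples_leaf=min_samples_leaf, random_state=0)
    # Mean accuracy over 5 stratified folds
    scores[min_samples_leaf] = cross_val_score(clf, x, y, cv=5).mean()
    print(min_samples_leaf, round(scores[min_samples_leaf], 3))
```

With min_samples_leaf=250 the forest can only predict a single class, so its score should sit near chance level for three balanced classes.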

b) Effect of minimum samples required to split a node (min_samples_split) on output:

Accuracy decreases as the value of min_samples_split increases.
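
The program above does not loop over min_samples_split. Here is a sketch of how the effect could be checked with scikit-learn's validation_curve. The candidate values [2, 10, 50, 150], the 25 trees and the 5-fold cross-validation are assumptions for illustration:

```python
from sklearn import datasets, ensemble
from sklearn.model_selection import validation_curve

iris = datasets.load_iris()
x = iris.data[:, :2]  # same two features used for the plots
y = iris.target

param_range = [2, 10, 50, 150]  # illustrative values, not from the original post
train_scores, test_scores = validation_curve(
    ensemble.ExtraTreesClassifier(n_estimators=25, random_state=0),
    x, y, param_name="min_samples_split", param_range=param_range, cv=5)

# Each row holds one parameter value, each column one CV fold
mean_train = train_scores.mean(axis=1)
mean_test = test_scores.mean(axis=1)
for value, tr, te in zip(param_range, mean_train, mean_test):
    print(value, round(tr, 3), round(te, 3))
```

Comparing the training and test columns also shows whether a lower score comes from underfitting (both drop) rather than from noise between folds.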

c) Effect of number of estimators (n_estimators) on output:

Increasing the number of estimators increases the accuracy of the analysis.
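
When bootstrap and oob_score are both enabled, the fitted model exposes an out-of-bag accuracy estimate (oob_score_) that needs no separate test set. The sketch below assumes a fixed random_state and starts at 25 trees, so that every sample is very likely to be left out of at least one tree:

```python
from sklearn import datasets, ensemble

iris = datasets.load_iris()
x = iris.data[:, :2]  # same two features used for the plots
y = iris.target

oob = {}
for n_estimators in [25, 125, 1250]:
    # oob_score=True requires bootstrap=True; random_state=0 is an illustrative choice
    clf = ensemble.ExtraTreesClassifier(
        n_estimators=n_estimators, bootstrap=True, oob_score=True, random_state=0)
    clf.fit(x, y)
    oob[n_estimators] = clf.oob_score_  # accuracy on samples each tree did not see
    print(n_estimators, round(clf.oob_score_, 3))
```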

Increasing the number of jobs (n_jobs) at a high number of estimators (n_estimators=1250) does not improve accuracy. This is expected: n_jobs only sets how many trees are built in parallel, not what the model learns.
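
A small sketch of a check on this point: fit the same forest with two n_jobs settings and compare the predicted probabilities. The values n_jobs=1 and n_jobs=4, the 125 trees and the fixed random_state are assumptions chosen to keep the run short:

```python
import numpy as np
from sklearn import datasets, ensemble

iris = datasets.load_iris()
x = iris.data[:, :2]  # same two features used for the plots
y = iris.target

probas = []
for n_jobs in [1, 4]:
    # Same seed for both fits, so only the degree of parallelism differs
    clf = ensemble.ExtraTreesClassifier(
        n_estimators=125, n_jobs=n_jobs, random_state=0)
    clf.fit(x, y)
    probas.append(clf.predict_proba(x))

# Compare with a tolerance, since parallel summation order may vary slightly
same = np.allclose(probas[0], probas[1])
print(same)
```

Without a fixed random_state, two runs can differ, but that difference comes from the random seed rather than from n_jobs.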

d) Effect of number of jobs (n_jobs) on output: