This lesson is in the early stages of development (Alpha version)

Introduction to Machine Learning with Scikit Learn: Glossary

Key Points

Introduction
  • Machine learning is a set of tools and techniques that use data to make predictions.

  • Artificial intelligence is a broader term that refers to making computers show human-like intelligence.

  • Deep learning is a subset of machine learning.

  • All machine learning systems have limitations to be aware of.

Supervised methods - Regression
  • Scikit-Learn is a Python library with lots of useful machine learning functions.

  • Scikit-Learn includes a linear regression function.

  • Scikit-Learn can perform polynomial regressions to model non-linear data.
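
As a rough sketch of the last two points, the snippet below fits a straight line and a degree-2 polynomial to the same curved data. The data is made up for illustration (it follows y = x^2 exactly), and the variable names are our own choices; `LinearRegression` and `PolynomialFeatures` are the Scikit-Learn classes assumed here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up data that follows y = x^2 exactly (for illustration only)
x = np.arange(-5, 6, dtype=float).reshape(-1, 1)
y = (x ** 2).ravel()

# A straight line cannot follow the curve, so its R^2 score is poor
linear_model = LinearRegression().fit(x, y)
linear_r2 = linear_model.score(x, y)

# Expand x into the features [1, x, x^2], then fit an ordinary
# linear regression on those expanded features
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)
poly_model = LinearRegression().fit(x_poly, y)
poly_r2 = poly_model.score(x_poly, y)

print(f"linear R^2: {linear_r2:.3f}, polynomial R^2: {poly_r2:.3f}")
```

The polynomial model is still a linear regression underneath; only the input features have changed.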

Supervised methods - Classification
  • Classification is a supervised method: it requires labelled data.
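
A minimal sketch of what "labelled data" means in practice: the classifier is given both the measurements (`X`) and the known class of each sample (`y`). The iris dataset and the decision tree are assumptions chosen because they ship with Scikit-Learn; other datasets and classifiers would work the same way.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X holds the measurements, y holds the label (species) for each sample
X, y = load_iris(return_X_y=True)

# Hold back 30% of the labelled samples to check the classifier afterwards
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The labels y_train are what make this a supervised method
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"accuracy on unseen samples: {accuracy:.2f}")
```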

Ensemble methods
  • Ensemble methods combine several models and can be used to reduce underfitting or overfitting of the training data.
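
One way to see this is to compare a single decision tree with a random forest, which is an ensemble of many trees. This is only a sketch: the breast cancer dataset and the settings below are assumptions chosen for convenience, and the exact scores will vary with the data and the random seed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# A single, fully grown tree tends to overfit the training data
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_accuracy = tree.score(X_test, y_test)

# A random forest averages the votes of many trees, each trained on a
# random resample of the training data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_accuracy = forest.score(X_test, y_test)

print(f"single tree: {tree_accuracy:.3f}, random forest: {forest_accuracy:.3f}")
```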

Unsupervised methods - Clustering
  • Clustering is a form of unsupervised learning.

  • Unsupervised learning algorithms don’t need labelled training data.

  • K-means is a popular clustering algorithm.

  • K-means is less useful when one cluster exists within another, such as concentric circles.

  • Spectral clustering can overcome some of the limitations of K-means.

  • Spectral clustering is much slower than K-means.

  • Scikit-Learn has functions to create example data, such as make_blobs and make_circles.
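
The concentric circles case can be sketched as follows. The data is generated with `make_circles`; the parameter values (`factor`, `noise`, `n_neighbors`) are assumptions picked for illustration, and the adjusted Rand index is used here only as a convenient way to compare each clustering with the known circles. Scikit-Learn may warn that the neighbour graph is not fully connected, which is expected for two well-separated circles.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Example data: a small circle inside a larger one
X, true_labels = make_circles(
    n_samples=300, factor=0.4, noise=0.04, random_state=0
)

# K-means splits the plane around two centres, so it cuts across both circles
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering groups points that are connected through near neighbours
spectral = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", n_neighbors=10, random_state=0
)
spectral_labels = spectral.fit_predict(X)

# Adjusted Rand index: 1.0 is a perfect match, values near 0 are no better than chance
kmeans_ari = adjusted_rand_score(true_labels, kmeans_labels)
spectral_ari = adjusted_rand_score(true_labels, spectral_labels)
print(f"K-means ARI: {kmeans_ari:.2f}, spectral ARI: {spectral_ari:.2f}")
```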

Unsupervised methods - Dimensionality reduction
  • PCA is a linear dimensionality reduction technique for tabular data.

  • t-SNE is a non-linear dimensionality reduction technique for tabular data, so it can capture structure that PCA cannot.
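
Both techniques follow the same pattern in Scikit-Learn: create the object, then call `fit_transform` on the table of data. The sketch below assumes the built-in digits dataset (64 pixel columns per sample) and keeps only the first 500 rows so that t-SNE runs quickly; the `perplexity` value is an arbitrary choice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Each row is an 8x8 image flattened into 64 columns
X, y = load_digits(return_X_y=True)
X = X[:500]

# PCA: project onto the two directions of greatest variance (linear)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# t-SNE: place similar samples near each other in two dimensions (non-linear)
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)

print(X.shape, "->", X_pca.shape, "and", X_tsne.shape)
print("variance explained by the two PCA components:",
      pca.explained_variance_ratio_.sum())
```

Either result can then be drawn as a scatter plot, coloured by `y`, to see whether the digits form visible groups.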

Neural Networks
  • Perceptrons are artificial neurons, the building blocks of neural networks.

  • A perceptron takes multiple inputs, multiplies each by a weight value and sums the weighted inputs. It then applies an activation function to the sum.

  • A single perceptron can only solve simple problems that are linearly separable.

  • Multiple perceptrons can be combined to form a neural network, which can solve problems that aren’t linearly separable.

  • We can train a whole neural network with the backpropagation algorithm. Scikit-Learn includes an implementation of this algorithm.

  • Training a neural network requires some training data to show the network examples of what to learn.

  • To validate our training we split the available data into a training set and a test set.

  • To ensure the whole dataset can be used in training and testing we can train multiple times with different subsets of the data acting as training/testing data. This is called cross validation.

  • Deep learning neural networks are a very powerful modern machine learning technique. Scikit-Learn does not support these but other libraries like TensorFlow do.

  • Several companies now offer cloud APIs where we can train neural networks on powerful computers.
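
The points above can be sketched in two parts. The first part writes out a single perceptron by hand, with weights and a threshold picked by us (not learned) so that it behaves like a logical AND. The second part trains a small network with Scikit-Learn's `MLPClassifier`, using a train/test split and then 5-fold cross validation. The digits dataset, the layer size and the other settings are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier


def perceptron(inputs, weights, threshold):
    """Sum the weighted inputs, then apply a step activation function."""
    weighted_sum = np.dot(inputs, weights)
    return 1 if weighted_sum >= threshold else 0


# Hand-picked weights and threshold: the output is 1 only when both inputs are 1
and_outputs = [
    perceptron([a, b], weights=[1.0, 1.0], threshold=1.5)
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
print("AND outputs:", and_outputs)

# A network of many neurons, trained with Scikit-Learn
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# Split the data so the network is tested on examples it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
test_accuracy = mlp.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")

# Cross validation: train 5 times, each time holding out a different fifth
cv_model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
cv_scores = cross_val_score(cv_model, X, y, cv=5)
print("cross validation scores:", cv_scores)
```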

Ethics and the Implications of Machine Learning
  • The results of machine learning reflect biases in the training and input data.

  • Many machine learning algorithms can’t explain how they arrived at a decision.

  • Machine learning can be used for unethical purposes.

  • Consider the implications of false positives and false negatives.
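
False positives and false negatives can be counted with a confusion matrix. The labels below are made up purely for illustration (1 means the condition is present, 0 means it is absent); the point is that a single accuracy figure hides which kind of mistake the model makes.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = condition present, 0 = absent
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# For binary labels, ravel() returns the counts in this order
tn, fp, fn, tp = (int(n) for n in confusion_matrix(y_true, y_pred).ravel())

accuracy = (tp + tn) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
print(f"false positives (flagged but absent): {fp}")
print(f"false negatives (missed but present): {fn}")
```

Which of the two errors matters more depends on the application, for example a missed diagnosis versus an unnecessary follow-up test.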

Find out more
  • This course has only touched on a few areas of machine learning and is designed to teach you just enough to do something useful.

  • Machine learning is a rapidly evolving field and new tools and techniques are constantly appearing.

Glossary