AI in a nutshell

Neurons are computational units of cognition.

They are inspired by biological neurons, like the ones in the human brain.

The human brain tries to learn prototypes of classes, i.e. the class means; analogously, a computer can classify by learned class means.

Prototypes can be interpreted as class means.

Build a classification boundary with a line.

This method is often also called the Nearest Centroid Classifier (NCC).
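A minimal NumPy sketch of the nearest centroid classifier (all names are illustrative, not from any particular library):

```python
import numpy as np

def nearest_centroid(X_train, y_train, X_test):
    """Classify each test point by the nearest class mean (prototype)."""
    classes = np.unique(y_train)
    # One prototype per class: the mean of that class's training points
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test point to every centroid
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

Because the decision is "which mean is closer", the boundary between two classes is the line (hyperplane) perpendicular to the segment joining the two centroids.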

___________________________________________

Another method for linear classification: the Perceptron

It is more advanced than the prototype method because it uses a learning rate and an error function to learn.

Algorithm overview:

The actual algorithm:

What makes this so powerful is:
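As a concrete stand-in, here is a minimal NumPy sketch of the classic perceptron learning rule, assuming labels in {-1, +1} (function names are illustrative):

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=100):
    """Perceptron learning rule: on each misclassified point,
    nudge the weights toward (or away from) that point."""
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:     # misclassified (or on the boundary)
                w += lr * yi * xi      # error-driven update
                errors += 1
        if errors == 0:                # converged: data is separated
            break
    return w

def perceptron_predict(X, w):
    X = np.hstack([X, np.ones((len(X), 1))])
    return np.where(X @ w >= 0, 1, -1)
```

The update only fires on errors, which is exactly the "error function plus learning rate" idea: the learning rate `lr` scales how far each mistake moves the boundary.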

Problems with Nearest Centroid Classification

Since NCC is a linear classifier, it is bound to have problems with non-linear data and correlated data.

Correlated data makes prediction more difficult; however, we have a tool called LDA for dealing with the correlation and decorrelating the data.

Applications: handwritten digit recognition, classification, automatic character recognition

Supervised Linear Classification with Fisher's LDA

Kinda like PCA, but it focuses on maximizing the class separability among known categories.

LDA creates a new axis and maps the data onto it, maximizing the between-class variance (the distance between the class means) while minimizing the within-class variance.

Decorrelate correlated data and measure class separability.

We do this by maximizing the Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w), where S_B is the between-class and S_W the within-class scatter matrix, and setting its derivative to 0!

If we compare NCC and LDA on the same dataset, we can see that LDA separates the classes better than NCC.

LDA maximizes the difference BETWEEN the classes!

LDA minimizes the difference INSIDE the classes!

LDA first decorrelates the data and then uses the nearest centroid classification method.

LDA Algorithm:
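A sketch of two-class Fisher LDA, assuming the standard closed-form solution w ∝ S_W⁻¹(μ₁ − μ₀) with the threshold placed at the projected midpoint of the class means (illustrative, not a library API):

```python
import numpy as np

def fisher_lda(X, y):
    """Two-class Fisher LDA: w = Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two class scatter matrices
    Sw = np.cov(X0.T, bias=True) * len(X0) + np.cov(X1.T, bias=True) * len(X1)
    w = np.linalg.solve(Sw, mu1 - mu0)   # decorrelating projection direction
    b = -w @ (mu0 + mu1) / 2             # threshold midway between the means
    return w, b

def lda_predict(X, w, b):
    return (X @ w + b > 0).astype(int)
```

Multiplying by S_W⁻¹ is the decorrelation step; after it, classifying by the nearer projected mean is exactly NCC on the decorrelated data.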

Applications: Brain-Computer Interfaces

If the data is Gaussian with equal class covariances, then LDA is the optimal classifier.

Cross Validation

Cross-validation is a method of measuring the performance of a classification algorithm. Instead of using the whole dataset for training, we train the model on one part of the data and then evaluate it on another, disjoint part.

Repeat this process on different folds (disjoint sets) and average the performance over those folds.

Cross-Validation Algorithm
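A k-fold cross-validation sketch; `train_fn` and `predict_fn` are placeholders for any classifier (all names illustrative):

```python
import numpy as np

def k_fold_cv(X, y, train_fn, predict_fn, k=5, seed=0):
    """Split into k disjoint folds; train on k-1 folds, evaluate on the
    held-out fold; return the mean accuracy over all k rounds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # shuffle before splitting
    folds = np.array_split(idx, k)           # k disjoint index sets
    accs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        preds = predict_fn(model, X[test_idx])
        accs.append(np.mean(preds == y[test_idx]))
    return float(np.mean(accs))
```

Every point is used for testing exactly once, so the averaged score is a less noisy estimate than a single train/test split.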

Regression

Example of the simplest form of linear regression: ordinary least squares (OLS).
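A minimal OLS sketch: find the weights minimizing the squared error ||Xw − y||², here solved with `np.linalg.lstsq` (names illustrative):

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares: w = argmin ||Xw - y||^2."""
    X = np.hstack([X, np.ones((len(X), 1))])   # bias column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
    return w

def ols_predict(X, w):
    X = np.hstack([X, np.ones((len(X), 1))])
    return X @ w
```

For data lying exactly on a line y = 2x + 1, the fitted weights come out as slope 2 and intercept 1.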

Applications of Ridge Regression: stock prediction based on company performance measures and economic data, predicting crop production from weather variables, controlling a robotic arm from electrical activity measured on the arm

Comparison of Supervised Algorithms

Ridge Regression

Regression for data points in a finite label space (plain regression has infinitely many possible labels; here the output is limited again to a certain number of classes that can be predicted).

Linear Regression:

- is a generic framework for prediction
- straightforwardly extends to vector labels
- can model nonlinear dependencies between data and labels
- can be made more robust (Ridge Regression)
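A sketch of the ridge variant, which adds an L2 penalty λ‖w‖² to the least-squares objective; note that this simple version also penalizes the bias term (illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression: w = (X^T X + lam*I)^{-1} X^T y.
    The lam*I term shrinks the weights and keeps the matrix invertible,
    which is what makes ridge more robust than plain OLS."""
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column (also penalized here)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)
```

As λ → 0 the solution approaches OLS; as λ grows, all weights are shrunk toward zero.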

Applications: myoelectric control of prostheses, mind-controlled robot arms!

Kernel Methods

A trick for classifying linearly non-separable data.

Calculations are done in a higher-dimensional space: take the data, project it into a higher-dimensional space, then compare it in that space (look for linear relationships) and use this knowledge to classify the data. The kernel is a measure of the similarity between data points.

Non-linear problems become linear in kernel space.
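A sketch of this idea using kernel ridge regression with an RBF kernel: XOR-like data, which no line can separate in the input space, is classified correctly once similarity is measured by the kernel (the hyperparameters `gamma` and `lam` are illustrative):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2): similarity between points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, gamma=1.0, lam=1e-3):
    # All computation happens through the kernel matrix K, never through
    # explicit coordinates in the high-dimensional feature space.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# XOR: not linearly separable in the 2D input space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([-1., 1., 1., -1.])
alpha = kernel_ridge_fit(X, y)
print(np.sign(kernel_ridge_predict(X, alpha, X)))  # → [-1.  1.  1. -1.]
```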

Neural Networks - Multilayer Neural Networks

Example of a one-layer network

As you can see, it only has one layer of input nodes, x1-x4, which all feed into one activation function that generates the label y.

One-layer neural networks are not powerful enough to solve non-linear problems, like:

Solution for labeling non-linear problems/data?

Multilayer networks!
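A hand-wired example of why a hidden layer helps, assuming the classic XOR problem (weights chosen by hand, not learned): one hidden layer of two threshold units makes XOR computable, while no single-layer network can.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return (np.asarray(z) >= 0).astype(int)

def xor_net(x1, x2):
    """Two-layer network computing XOR.
    Hidden unit h1 fires when at least one input is on (OR);
    hidden unit h2 fires only when both are on (AND);
    the output fires for h1 AND NOT h2 -- exactly XOR."""
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    return step(h1 - h2 - 0.5)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_net(inputs[:, 0], inputs[:, 1]))  # → [0 1 1 0]
```

The hidden layer carves the input space into regions that the output unit can then combine linearly, which is something no single threshold unit can do on its own for XOR.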

Unsupervised Learning :

"Normal" neural networks usually have one or two hidden layers and are used for supervised learning.

Deep Learning

Deep learning architectures differ from "normal" neural networks because they have more hidden layers. One special difference is that deep learning algorithms can work supervised or unsupervised.


Applications of ConvNets: style transfer, manifold transfer (faceswap)

Applications of Recurrent Neural Networks: generating images and sound, generating text, generating a picture from text, force estimation for robotic surgical arms, visual question answering

Generative Adversarial Networks (GANs) applications: super-resolution, image post-processing, image generation

Deep Reinforcement Learning applications: video-game-playing AIs (Go bot, Doom bot, Atari bot)

Unsupervised Learning Methods

Principal Component Analysis (PCA)

Can be used for maximizing variance in the data, but also for dimensionality reduction or for finding a line that fits the data.

When the dimensionality of the data is too high to visualize it in 3-dimensional space, we can reduce the dimensions by using PCA.

The variance of the projected data is maximized by projecting the data into another, lower-dimensional space.

PCA for Maximizing Variance

PCA can be defined as the orthogonal projection of the data onto a lower-dimensional linear space, known as the principal subspace. The projection must be chosen such that the variance of the projected data is maximized.

A line is drawn into the data in the direction of maximum variance.
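A minimal PCA sketch via the eigendecomposition of the covariance matrix (illustrative): the top eigenvectors are exactly those maximum-variance directions.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top n_components eigenvectors of its
    covariance matrix -- the directions of maximum variance."""
    Xc = X - X.mean(axis=0)               # center the data
    C = np.cov(Xc.T)                      # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]     # sort by variance, descending
    W = eigvecs[:, order[:n_components]]  # principal subspace
    return Xc @ W, W
```

For points lying on the line y = x, the first principal component comes out (up to sign) as the direction (1, 1)/√2.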

Non-Negative Matrix Factorization (NMF)

For some use cases PCA does not make sense, for example for non-negative (only positive) data.

PCA fails on datasets that are strictly positive, like text data, image data, probabilistic data.
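A sketch of NMF using the Lee & Seung multiplicative update rules, which keep both factors non-negative by construction (iteration count and initialization are illustrative):

```python
import numpy as np

def nmf(V, k, iters=500, seed=0):
    """Factor a non-negative matrix V ≈ W @ H with W, H >= 0,
    via multiplicative updates on the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1   # random non-negative init
    H = rng.random((k, m)) + 0.1
    eps = 1e-9                     # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the updates only multiply by non-negative ratios, W and H can never go negative, which is what makes NMF a good fit for text counts, pixel intensities, and probabilities.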

Applications: face recognition, learning news topics from bag-of-words representations

Clustering

Kinda like NCC, but with multiple classes and without labels!
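A minimal k-means (Lloyd's algorithm) sketch; note the NCC-like inner step of assigning each point to its nearest centroid (names illustrative):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points to the
    nearest centroid and moving each centroid to the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # init from data
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = np.argmin(dists, axis=1)                # assignment step
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):                  # converged
            break
        centroids = new                                  # update step
    return labels, centroids
```

Unlike NCC, no labels are given: the "class means" are discovered by the alternation itself, and the result depends on the initialization and on the choice of K.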

How to choose the hyperparameter K? Choose the K with the least unstable clusterings! In supervised scenarios we could use cross-validation to optimize hyperparameters; here we can't, so it's magic-number tryout!

Applications: pulse-code modulation, geyser eruptions
