The Perceptron and the convergence rule

1. The Perceptron
The Perceptron is a single-layer artificial neural network with only one neuron. The neuron computes a linear combination of its real-valued or boolean inputs and passes it through a threshold activation function:
o = Threshold( Σ_{i=0}^{d} w_i x_i )
where the x_i are the components of an input x^e = ( x_1^e, x_2^e, ..., x_d^e ) from the training set {( x^e, y^e )}_{e=1}^{N}.
Threshold is the activation function defined as follows: Threshold( s ) = 1 if s > 0 and -1 otherwise.


[Figure: the Perceptron]
The Perceptron is sometimes referred to as a threshold logic unit (TLU), since it discriminates the data depending on whether the sum exceeds the threshold value, Σ_{i=1}^{d} w_i x_i > -w_0, or falls below it, Σ_{i=1}^{d} w_i x_i < -w_0. In the above formulation the threshold value w_0 is treated as the weight of an additional connection whose input is held constantly at x_0 = 1.
The Perceptron is strictly equivalent to a linear discriminant, and it is often used as a device that decides whether an input pattern belongs to one of two classes.

[Figure: a linear discriminant]
Networks of such threshold units can represent a rich variety of functions, while single units alone cannot. For example, every boolean function can be represented by some network of interconnected units.
A single Perceptron can represent the primitive boolean functions AND, OR, NAND and NOR, but it cannot represent XOR.
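As an illustration (the weights below are my own choice, not from the text), a single threshold unit with fixed weights implements AND and OR directly; no choice of weights lets one unit implement XOR, since no single line separates its outputs:

```python
def threshold_unit(x, w, w0):
    """Output 1 if w0 + sum_i w_i * x_i > 0, and -1 otherwise."""
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

def AND(x):   # fires only when both inputs are 1 (inputs in {0, 1})
    return threshold_unit(x, w=[1, 1], w0=-1.5)

def OR(x):    # fires when at least one input is 1
    return threshold_unit(x, w=[1, 1], w0=-0.5)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, AND(x), OR(x))
```

The threshold w0 is the only thing that changes between AND and OR: it sets how many active inputs are needed to push the sum past zero.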
2. Perceptron Learning Algorithms
As with other linear discriminants the main learning task is to learn the weights.
Unlike the normal linear discriminant, these training procedures are non-parametric; no assumptions are made about the population distribution; no means or covariances are calculated.
Instead of examining sample statistics, sample cases are presented sequentially, and errors are corrected by adjusting the weights. If the output is correct, the weights are not adjusted.

There are several learning algorithms for training single-layer Perceptrons based on: 
- the Perceptron rule 
- the Gradient descent rule 
- the Delta rule

All these rules define error-correction learning procedures.
2.1. The Perceptron Rule
The Perceptron rule is a sequential learning procedure for updating the weights.
Perceptron Learning Algorithm
Initialization: examples {( x^e, y^e )}_{e=1}^{N}, initial weights w_i set to small random values, learning rate parameter η = 0.1
Repeat
for each training example ( x^e, y^e )
calculate the output: o^e = Threshold( Σ_{i=0}^{d} w_i x_i^e )
- if the Perceptron does not respond correctly, update the weights:
w_i = w_i + η ( y^e - o^e ) x_i^e // this is the Perceptron Rule
until the termination condition is satisfied.
where: y^e is the desired output, o^e is the output generated by the Perceptron,
w_i is the weight associated with the i-th connection.
The role of the learning parameter is to moderate the degree to which weights are changed at each step of the learning algorithm.
During learning the decision boundary defined by the Perceptron moves, and some points that were previously misclassified become correctly classified, so the set of misclassified examples that drive the weight updates changes.
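The procedure above can be sketched in Python (a minimal version of my own, assuming a constant x_0 = 1 prepended to each input and targets in {-1, +1}):

```python
import random

def train_perceptron(examples, d, eta=0.1, max_epochs=100):
    """Perceptron rule: examples is a list of (x, y) pairs with x of
    length d and y in {-1, +1}; returns weights [w0, w1, ..., wd]."""
    random.seed(0)                               # reproducible small random init
    w = [random.uniform(-0.05, 0.05) for _ in range(d + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            xa = [1] + list(x)                   # x0 = 1 carries the threshold w0
            s = sum(wi * xi for wi, xi in zip(w, xa))
            o = 1 if s > 0 else -1               # Threshold activation
            if o != y:                           # update only on errors
                w = [wi + eta * (y - o) * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:                        # termination: every example correct
            break
    return w

# OR is linearly separable, so the learned weights classify it perfectly.
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(data, d=2)
```

Note that the update direction comes entirely from ( y - o ): it is +2 when the output was too low and -2 when it was too high, scaled by η and the input component.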
2.2. Perceptron Convergence Theorem
The Perceptron convergence theorem states that for any data set which is linearly separable the Perceptron learning rule is guaranteed to find a solution in a finite number of steps.
In other words, the Perceptron learning rule is guaranteed to converge to a weight vector that correctly classifies the examples provided the training examples are linearly separable.
A function is said to be linearly separable when its outputs can be discriminated by a function that is a linear combination of the features, that is, when we can separate its outputs by a line or a hyperplane.
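A quick empirical check of the theorem (my own sketch, not from the text): running the Perceptron rule on AND, which is linearly separable, reaches zero errors in a finite number of epochs, while on XOR at least one example remains misclassified in every epoch:

```python
def errors_after_training(data, eta=0.1, epochs=200):
    """Run the Perceptron rule from zero weights and return the number
    of misclassified examples in the last epoch performed."""
    w = [0.0, 0.0, 0.0]                      # [w0, w1, w2]; x0 = 1 is implicit
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            xa = [1] + list(x)
            o = 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else -1
            if o != y:
                w = [wi + eta * (y - o) * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:                    # converged: a separating line was found
            break
    return mistakes

AND_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR_data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
print(errors_after_training(AND_data))  # 0: linearly separable, the rule converges
print(errors_after_training(XOR_data))  # > 0: not separable, never converges
```

The theorem only guarantees convergence in the separable case; on XOR the weights cycle forever, which is why the sketch caps the number of epochs.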
Example: Suppose a Perceptron that accepts two inputs x_1 = 2 and x_2 = 1, with weights w_1 = 0.5, w_2 = 0.3 and w_0 = -1.
The weighted sum computed by the Perceptron is:
s = 2 * 0.5 + 1 * 0.3 - 1 = 0.3
Since s > 0, the output is 1. If the desired output however is -1, the weights are adjusted according to the Perceptron rule (with η = 0.1) as follows:
w_1 = 0.5 + 0.1 * ( -1 - 1 ) * 2 = 0.1
w_2 = 0.3 + 0.1 * ( -1 - 1 ) * 1 = 0.1
w_0 = -1 + 0.1 * ( -1 - 1 ) * 1 = -1.2
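This single update step can be checked numerically (a sketch assuming the {-1, +1} target convention and η = 0.1 used by the algorithm above):

```python
eta = 0.1
w = [-1.0, 0.5, 0.3]   # [w0, w1, w2]
x = [1, 2, 1]          # [x0, x1, x2] with x0 = 1 for the threshold
y = -1                 # desired output

s = sum(wi * xi for wi, xi in zip(w, x))   # 2*0.5 + 1*0.3 - 1, approximately 0.3
o = 1 if s > 0 else -1                     # o = 1: a misclassification

# Perceptron rule: w_i <- w_i + eta * (y - o) * x_i
w_new = [round(wi + eta * (y - o) * xi, 6) for wi, xi in zip(w, x)]
print(w_new)  # [-1.2, 0.1, 0.1]
```

The rounding only tidies up floating-point noise in the printed values; the update itself is the plain Perceptron rule.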
