The Perceptron and the convergence rule

1. The Perceptron
The Perceptron is a single-layer artificial neural network with only one neuron. The neuron computes a linear combination of its real-valued or boolean inputs and passes it through a threshold activation function:
o = Threshold( Σ_{i=0}^{d} w_i x_i )
where the x_i are the components of an input x^e = ( x_1^e, x_2^e, ..., x_d^e ) from the training set {( x^e, y^e )}_{e=1}^{N}.
Threshold is the activation function defined as follows: Threshold( s ) = 1 if s > 0 and -1 otherwise.


[Figure: the Perceptron]
The Perceptron is sometimes referred to as a threshold logic unit (TLU), since it discriminates the data depending on whether the sum exceeds the threshold value, Σ_{i=1}^{d} w_i x_i > -w_0, or falls below it, Σ_{i=1}^{d} w_i x_i < -w_0. In the above formulation the threshold value w_0 is treated as the weight of an additional connection whose input is held constantly at x_0 = 1.
The Perceptron is strictly equivalent to a linear discriminant, and it is often used as a device that decides whether an input pattern belongs to one of two classes.

[Figure: a linear discriminant]
Networks of such threshold units can represent a rich variety of functions, while single units alone cannot. For example, every boolean function can be represented by some network of interconnected units.
A single Perceptron can represent the primitive boolean functions AND, OR, NAND and NOR, but it cannot represent XOR.
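As an illustration (the weights below are my own choice, not from the text), a single threshold unit with fixed weights implements AND and OR directly; no choice of weights lets one unit implement XOR, since no single line separates its outputs:

```python
def threshold_unit(x, w, w0):
    """Output 1 if w0 + sum_i w_i * x_i > 0, and -1 otherwise."""
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

def AND(x):   # fires only when both inputs are 1 (inputs in {0, 1})
    return threshold_unit(x, w=[1, 1], w0=-1.5)

def OR(x):    # fires when at least one input is 1
    return threshold_unit(x, w=[1, 1], w0=-0.5)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, AND(x), OR(x))
```

The threshold w0 is the only thing that changes between AND and OR: it sets how many active inputs are needed to push the sum past zero.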
2. Perceptron Learning Algorithms
As with other linear discriminants the main learning task is to learn the weights.
Unlike the normal linear discriminant, these training procedures are non-parametric; no assumptions are made about the population distribution; no means or covariances are calculated.
Instead of examining sample statistics, sample cases are presented sequentially, and errors are corrected by adjusting the weights. If the output is correct, the weights are not adjusted.

There are several learning algorithms for training single-layer Perceptrons based on: 
- the Perceptron rule 
- the Gradient descent rule 
- the Delta rule

All these rules define error-correction learning procedures.
2.1. The Perceptron Rule
The Perceptron rule is a sequential learning procedure for updating the weights.
Perceptron Learning Algorithm
Initialization: examples {( x^e, y^e )}_{e=1}^{N}, initial weights w_i set to small random values, learning rate parameter η = 0.1
Repeat
for each training example ( x^e, y^e )
calculate the output: o^e = Threshold( Σ_{i=0}^{d} w_i x_i^e )
- if the Perceptron does not respond correctly, update the weights:
w_i = w_i + η ( y^e - o^e ) x_i^e // this is the Perceptron Rule
until the termination condition is satisfied.
where: y^e is the desired output, o^e is the output generated by the Perceptron,
w_i is the weight associated with the i-th connection.
The role of the learning parameter is to moderate the degree to which weights are changed at each step of the learning algorithm.
During learning the decision boundary defined by the Perceptron moves, and some points that were previously misclassified become correctly classified, so the set of misclassified examples that drive the weight updates changes.
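The procedure above can be sketched in Python (a minimal version of my own, assuming a constant x_0 = 1 prepended to each input and targets in {-1, +1}):

```python
import random

def train_perceptron(examples, d, eta=0.1, max_epochs=100):
    """Perceptron rule: examples is a list of (x, y) pairs with x of
    length d and y in {-1, +1}; returns weights [w0, w1, ..., wd]."""
    random.seed(0)                               # reproducible small random init
    w = [random.uniform(-0.05, 0.05) for _ in range(d + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            xa = [1] + list(x)                   # x0 = 1 carries the threshold w0
            s = sum(wi * xi for wi, xi in zip(w, xa))
            o = 1 if s > 0 else -1               # Threshold activation
            if o != y:                           # update only on errors
                w = [wi + eta * (y - o) * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:                        # termination: every example correct
            break
    return w

# OR is linearly separable, so the learned weights classify it perfectly.
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(data, d=2)
```

Note that the update direction comes entirely from ( y - o ): it is +2 when the output was too low and -2 when it was too high, scaled by η and the input component.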
2.2. Perceptron Convergence Theorem
The Perceptron convergence theorem states that for any data set which is linearly separable the Perceptron learning rule is guaranteed to find a solution in a finite number of steps.
In other words, the Perceptron learning rule is guaranteed to converge to a weight vector that correctly classifies the examples provided the training examples are linearly separable.
A function is said to be linearly separable when its outputs can be discriminated by a function that is a linear combination of the features, that is, when we can separate its outputs by a line or a hyperplane.
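A quick empirical check of the theorem (my own sketch, not from the text): running the Perceptron rule on AND, which is linearly separable, reaches zero errors in a finite number of epochs, while on XOR at least one example remains misclassified in every epoch:

```python
def errors_after_training(data, eta=0.1, epochs=200):
    """Run the Perceptron rule from zero weights and return the number
    of misclassified examples in the last epoch performed."""
    w = [0.0, 0.0, 0.0]                      # [w0, w1, w2]; x0 = 1 is implicit
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            xa = [1] + list(x)
            o = 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else -1
            if o != y:
                w = [wi + eta * (y - o) * xi for wi, xi in zip(w, xa)]
                mistakes += 1
        if mistakes == 0:                    # converged: a separating line was found
            break
    return mistakes

AND_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR_data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
print(errors_after_training(AND_data))  # 0: linearly separable, the rule converges
print(errors_after_training(XOR_data))  # > 0: not separable, never converges
```

The theorem only guarantees convergence in the separable case; on XOR the weights cycle forever, which is why the sketch caps the number of epochs.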
Example: Suppose a Perceptron that accepts two inputs x_1 = 2 and x_2 = 1, with weights w_1 = 0.5, w_2 = 0.3 and w_0 = -1.
The weighted sum computed by the Perceptron is:
s = 2 * 0.5 + 1 * 0.3 - 1 = 0.3
Since s > 0, the output is 1. If the desired output however is -1, the weights are adjusted according to the Perceptron rule (with η = 0.1) as follows:
w_1 = 0.5 + 0.1 * ( -1 - 1 ) * 2 = 0.1
w_2 = 0.3 + 0.1 * ( -1 - 1 ) * 1 = 0.1
w_0 = -1 + 0.1 * ( -1 - 1 ) * 1 = -1.2
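This single update step can be checked numerically (a sketch assuming the {-1, +1} target convention and η = 0.1 used by the algorithm above):

```python
eta = 0.1
w = [-1.0, 0.5, 0.3]   # [w0, w1, w2]
x = [1, 2, 1]          # [x0, x1, x2] with x0 = 1 for the threshold
y = -1                 # desired output

s = sum(wi * xi for wi, xi in zip(w, x))   # 2*0.5 + 1*0.3 - 1, approximately 0.3
o = 1 if s > 0 else -1                     # o = 1: a misclassification

# Perceptron rule: w_i <- w_i + eta * (y - o) * x_i
w_new = [round(wi + eta * (y - o) * xi, 6) for wi, xi in zip(w, x)]
print(w_new)  # [-1.2, 0.1, 0.1]
```

The rounding only tidies up floating-point noise in the printed values; the update itself is the plain Perceptron rule.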
