Derive the Backpropagation algorithm (only for the output nodes of a multi-layer perceptron) if the activation...

Question

(10) 3. Derive the Backpropagation algorithm (only for the output nodes of a multi-layer perceptron) if the activation function is $\varphi (\nu) = e^{-(\frac{\nu}{\sigma})^{2}}$, where $\sigma$ is a parameter to be learned for each neuron. Simplify the equations as much as possible.

Answer:

For the derivation, let's assume the following conventions:

1. Subscript j indexes an output node ($j = 1, \dots, p$) and subscript h indexes a hidden node ($h = 0, \dots, q$). Superscript k denotes the training iteration.

2. The weight from hidden node h to output node j will be represented as $w_{hj}^{k}$ .

3. The weight change rule for hidden-to-output weights is given by:

$w_{hj}^{k+1}=w_{hj}^{k} + \Delta w_{hj}^{k}$

where $\Delta w_{hj}^{k}$ is the weight change.

4. The activation function will be represented as $\varphi (\nu)$ . In the present case, $\varphi (\nu) = e^{-(\frac{\nu}{\sigma})^{2}}$ , where $\sigma$ is learned separately for each neuron ($\sigma_{j}$ for output node j).

5. $y_{j}^{k}$ will be the net input (weighted sum) at output node j, so the output of that node is $\varphi (y_{j}^{k})$ .

Derivation of backpropagation algorithm for output layer:

Computation of neuronal signals:

A. For the output layer, the net input at any output node and the activations feeding it can be calculated as:

1. $y_{j}^{k} = \sum_{h=0}^{q}w_{hj}^{k}\varphi (\nu_{h}^{k}), \quad j = 1, \dots, p$

2. $\varphi (\nu_{h}^{k}) = e^{-(\frac{\nu_{h}^{k}}{\sigma_{h}})^{2}}, \quad h = 1, \dots, q$

where $w_{0j}^{k}$ are the biases of the output neurons, with the bias input fixed at $\varphi (\nu_{0}^{k}) = 1$. The output of node j is then $\varphi (y_{j}^{k}) = e^{-(\frac{y_{j}^{k}}{\sigma_{j}})^{2}}$.
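The forward computation in equations 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function names `gaussian` and `output_layer` are hypothetical, and the sketch assumes that the bias input is fixed at 1 and that every neuron carries its own $\sigma$.

```python
import math


def gaussian(v, sigma):
    """Gaussian activation phi(v) = exp(-(v / sigma)^2)."""
    return math.exp(-(v / sigma) ** 2)


def output_layer(weights, hidden_net, hidden_sigma, output_sigma):
    """Forward pass from hidden net inputs to output activations.

    weights[j] holds the q + 1 weights of output node j; weights[j][0]
    is the bias w_0j (assumption: the bias input is the constant 1).
    """
    # Hidden activations phi(nu_h), with the constant bias input first.
    z = [1.0] + [gaussian(v, s) for v, s in zip(hidden_net, hidden_sigma)]
    # Net input y_j = sum_h w_hj * phi(nu_h) for each output node j.
    y = [sum(w_hj * z_h for w_hj, z_h in zip(w_j, z)) for w_j in weights]
    # Output of node j is phi(y_j) with that node's own sigma_j.
    out = [gaussian(y_j, s_j) for y_j, s_j in zip(y, output_sigma)]
    return z, y, out
```

Returning `z` and `y` along with the outputs keeps the intermediate values available for the gradient computation.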

Computation of error gradients:

B. Gradients of the hidden-to-output weights:

These can be calculated using the chain rule of calculus, as shown below:

3. $\frac{\partial \varepsilon _{k}}{\partial w_{hj}^{k}} = \frac{\partial \varepsilon _{k}}{\partial \varphi(y_{j}^{k})}\frac{\partial \varphi(y_{j}^{k})}{\partial y_{j}^{k}}\frac{\partial y_{j}^{k}}{\partial w_{hj}^{k}}$

Individual derivatives in the above equation can be calculated as:

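i) $\frac{\partial \varepsilon _{k}}{\partial \varphi(y_{j}^{k})} = -(d_{j}^{k} - \varphi(y_{j}^{k})) = -e_{j}^{k}$

where $d_{j}^{k}$ is the desired output for output node j and the error is taken to be the squared error $\varepsilon _{k} = \frac{1}{2}\sum_{j=1}^{p}(e_{j}^{k})^{2}$.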

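ii) $\frac{\partial \varphi(y_{j}^{k})}{\partial y_{j}^{k}} = -\frac{2y_{j}^{k}}{\sigma_{j}^{2}}\varphi(y_{j}^{k})$

since $\varphi(y_{j}^{k}) = e^{-(\frac{y_{j}^{k}}{\sigma_{j}})^{2}}$, where $\sigma_{j}$ is the parameter of output node j.

iii) $\frac{\partial y_{j}^{k}}{\partial w_{hj}^{k}} = \varphi(\nu_{h}^{k})$

where $\varphi(\nu_{h}^{k})$ is the activation of the hidden unit h concerned.

Now on substituting the values from i), ii) and iii) into 3, we get:

$\frac{\partial \varepsilon _{k}}{\partial w_{hj}^{k}} = (-e_{j}^{k})\left ( -\frac{2y_{j}^{k}}{\sigma_{j}^{2}}\varphi(y_{j}^{k}) \right )\varphi (\nu_{h}^{k}) = \frac{2e_{j}^{k}y_{j}^{k}}{\sigma_{j}^{2}}\varphi(y_{j}^{k})\varphi (\nu_{h}^{k})$

Weight updates for output layer:

C. For hidden to output layer weights:

$w_{hj}^{k+1}=w_{hj}^{k} + \Delta w_{hj}^{k}$

$=w_{hj}^{k} + \eta \left ( -\frac{\partial \varepsilon _{k}}{\partial w_{hj}^{k}} \right )$

$= w_{hj}^{k} - \eta \frac{2e_{j}^{k}y_{j}^{k}}{\sigma_{j}^{2}}\varphi(y_{j}^{k})\varphi (\nu_{h}^{k})$

where $\eta$ is the learning rate, written as $\eta$ so that it cannot be confused with the activation parameter $\sigma_{j}$.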
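The question also asks for $\sigma$ to be learned for each neuron. Assuming the squared error $\varepsilon _{k} = \frac{1}{2}(e_{j}^{k})^{2}$ with $e_{j}^{k} = d_{j}^{k} - \varphi(y_{j}^{k})$, the same chain rule with $\frac{\partial \varphi(y_{j}^{k})}{\partial \sigma_{j}} = \frac{2(y_{j}^{k})^{2}}{\sigma_{j}^{3}}\varphi(y_{j}^{k})$ gives $\frac{\partial \varepsilon _{k}}{\partial \sigma_{j}} = -\frac{2e_{j}^{k}(y_{j}^{k})^{2}}{\sigma_{j}^{3}}\varphi(y_{j}^{k})$, and for a Gaussian activation $\frac{\partial \varepsilon _{k}}{\partial w_{hj}^{k}} = \frac{2e_{j}^{k}y_{j}^{k}}{\sigma_{j}^{2}}\varphi(y_{j}^{k})\varphi (\nu_{h}^{k})$. The sketch below checks both expressions against central finite differences for a single output node. All function names, and the learning rate `eta`, are hypothetical choices made for illustration.

```python
import math


def phi(v, sigma):
    """Gaussian activation phi(v) = exp(-(v / sigma)^2)."""
    return math.exp(-(v / sigma) ** 2)


def loss(w, hidden, sigma, d):
    """Squared error 0.5 * (d - phi(y))^2 for one output node."""
    y = sum(w_h * z_h for w_h, z_h in zip(w, hidden))
    return 0.5 * (d - phi(y, sigma)) ** 2


def gradients(w, hidden, sigma, d):
    """Analytic gradients of the loss w.r.t. the weights and sigma."""
    y = sum(w_h * z_h for w_h, z_h in zip(w, hidden))
    out = phi(y, sigma)
    e = d - out
    # d(loss)/dw_h = (2 e y / sigma^2) * phi(y) * z_h
    grad_w = [2.0 * e * y * out * z_h / sigma ** 2 for z_h in hidden]
    # d(loss)/dsigma = -(2 e y^2 / sigma^3) * phi(y)
    grad_sigma = -2.0 * e * y ** 2 * out / sigma ** 3
    return grad_w, grad_sigma


def numeric_gradients(w, hidden, sigma, d, h=1e-6):
    """Central finite-difference estimates of the same gradients."""
    grad_w = []
    for i in range(len(w)):
        up = w[:i] + [w[i] + h] + w[i + 1:]
        down = w[:i] + [w[i] - h] + w[i + 1:]
        diff = loss(up, hidden, sigma, d) - loss(down, hidden, sigma, d)
        grad_w.append(diff / (2.0 * h))
    diff = loss(w, hidden, sigma + h, d) - loss(w, hidden, sigma - h, d)
    return grad_w, diff / (2.0 * h)


def sgd_step(w, hidden, sigma, d, eta=0.1):
    """One gradient-descent update of the weights and of sigma."""
    grad_w, grad_sigma = gradients(w, hidden, sigma, d)
    new_w = [w_h - eta * g for w_h, g in zip(w, grad_w)]
    return new_w, sigma - eta * grad_sigma
```

Here `hidden` is the list of hidden activations $\varphi (\nu_{h}^{k})$ with the constant bias input 1 in the first position. In practice $\sigma_{j}$ should stay away from zero, since both gradients divide by powers of it.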
