Question

Suppose gamma = 0.5 and the following sequence of rewards is received r1 = -3, r2 = 2,...

Suppose gamma =0.5 and the following sequence of rewards is received r1 = -3, r2 = 2, r3 = 6, r4 = 3, r5 = 2, with T = 5. What is the return at time-step 0 (G0)?

Answer #1

Hey,

Note: if you have any questions, just leave a comment and I will be happy to help.

The prerequisite for this question is that, with a discount rate $\gamma$, the agent tries to maximize the discounted return it receives by choosing actions that support its goal.

We have the following formula for the total return we get using discounted rewards:

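$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots$

$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$

$\Rightarrow G_t = R_{t+1} + \gamma G_{t+1}$

where $\gamma$, with $0 \le \gamma \le 1$, is called the discount rate.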
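As a quick illustration of the summation form, here is a minimal Python sketch. The function name `discounted_return` and the convention that `rewards[k]` stores $R_{t+k+1}$ are assumptions made for this example, not part of the original question.

```python
def discounted_return(rewards, gamma):
    """Sum gamma**k * rewards[k], where rewards[k] plays the role of R_{t+k+1}."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Rewards R_1..R_5 from the question, with gamma = 0.5.
rewards = [-3, 2, 6, 3, 2]
g0 = discounted_return(rewards, 0.5)
print(g0)
```

Because the episode ends at T = 5, the infinite sum reduces to these five terms.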

With this formula in hand, and using the hint given to us, let us calculate the values of $G_0, G_1, \ldots, G_5$.

Since we are given $T = 5$, the reward and the return after the terminal step are both 0, i.e., $R_6 = 0$ and $G_6 = 0$.

Applying the formula we obtained earlier for $G_5$,

$G_5 = R_6 + \gamma G_6$

$G_5 = 0 + 0.5 \times 0$

$\Rightarrow G_5 = 0$

For $G_4$,

$G_4 = R_5 + \gamma G_5$

$G_4 = 2 + 0.5 \times 0$

$\Rightarrow G_4 = 2$

For $G_3$,

$G_3 = R_4 + \gamma G_4$

$G_3 = 3 + 0.5 \times 2$

$G_3 = 3 + 1$

$\Rightarrow G_3 = 4$

For $G_2$,

$G_2 = R_3 + \gamma G_3$

$G_2 = 6 + 0.5 \times 4$

$G_2 = 6 + 2$

$\Rightarrow G_2 = 8$

For $G_1$,

$G_1 = R_2 + \gamma G_2$

$G_1 = 2 + 0.5 \times 8$

$G_1 = 2 + 4$

$\Rightarrow G_1 = 6$

And for $G_0$,

$G_0 = R_1 + \gamma G_1$

$G_0 = -3 + 0.5 \times 6$

$G_0 = -3 + 3$

$\Rightarrow G_0 = 0$

Therefore $G_0 = 0$.
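The backward recursion used in the steps above can also be checked with a short Python sketch. The helper name `returns_backward` and the list layout (index `t` holds $R_{t+1}$) are assumptions for illustration only.

```python
def returns_backward(rewards, gamma):
    """Compute [G_0, ..., G_T] using G_T = 0 and G_t = R_{t+1} + gamma * G_{t+1}."""
    T = len(rewards)
    G = [0.0] * (T + 1)  # G[T] = 0 at the terminal step
    for t in range(T - 1, -1, -1):
        # rewards[t] holds R_{t+1}
        G[t] = rewards[t] + gamma * G[t + 1]
    return G


all_returns = returns_backward([-3, 2, 6, 3, 2], 0.5)
print(all_returns)  # [0.0, 6.0, 8.0, 4.0, 2.0, 0.0]
```

The first entry is $G_0$ and the last is $G_5$, matching the values worked out by hand.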

Kindly revert for any queries

Thanks.
