Monday, November 21, 2016

Bellman Equation

A Bellman equation, also known as a dynamic programming equation, is a necessary condition for optimality that allows discrete-time optimization problems to be solved by dynamic programming. (In continuous-time optimization problems, the analogous equation is a partial differential equation, usually called the Hamilton–Jacobi–Bellman equation.)

First, any optimization problem has some objective. The mathematical function that describes this objective is called the objective function. In an MDP, the objective is typically the expected (discounted) sum of rewards.

In an MDP, a Bellman equation refers to a recursion for expected rewards. For example, the expected reward for being in a particular state s and following some fixed policy π satisfies the Bellman equation:
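In standard MDP notation, with R(s, a) the immediate reward, γ the discount factor, and P(s' | s, a) the transition probability, the recursion can be written as:

V^π(s) = R(s, π(s)) + γ · Σ_{s'} P(s' | s, π(s)) · V^π(s')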

In practical solutions, the Bellman equation is used to express these chain relations: the value of a state is defined recursively in terms of the immediate reward and the values of the states that can follow it.
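As a concrete sketch, the following example runs iterative policy evaluation on a small, made-up MDP; the states, actions, rewards, transition probabilities, and the stopping tolerance are all invented for illustration.

# Minimal sketch of iterative policy evaluation using the Bellman equation.
# The MDP below (states, actions, rewards, transitions) is invented for illustration.

gamma = 0.9  # discount factor

states = ["s0", "s1", "s2"]

# Fixed policy: the action chosen in each state.
policy = {"s0": "go", "s1": "go", "s2": "stay"}

# Immediate reward R(s, a) for the action the policy picks in each state.
reward = {("s0", "go"): 0.0, ("s1", "go"): 1.0, ("s2", "stay"): 0.0}

# Transition probabilities P(s' | s, a) for the policy's actions.
transition = {
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s1", "go"): {"s2": 1.0},
    ("s2", "stay"): {"s2": 1.0},
}

# Start from V(s) = 0 and repeatedly apply the Bellman equation
#   V(s) = R(s, pi(s)) + gamma * sum_{s'} P(s' | s, pi(s)) * V(s')
# until the values stop changing.
V = {s: 0.0 for s in states}
for _ in range(1000):
    delta = 0.0
    for s in states:
        a = policy[s]
        new_v = reward[(s, a)] + gamma * sum(
            p * V[s_next] for s_next, p in transition[(s, a)].items()
        )
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-8:
        break

print(V)  # expected value of each state under the fixed policy

Each sweep applies the Bellman equation once to every state; because γ < 1, the values converge to the expected discounted return of following the fixed policy from each state.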

 
