Dynamic programming has two methods for finding an optimal policy: one is ( ), and the other is policy iteration.
Correct answer: value iteration
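
For reference, value iteration can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `value_iteration`, the `transitions` layout, the discount `gamma`, the tolerance `tol`, and the two-state toy MDP are all assumptions made up for this example.

```python
def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-8):
    """Hypothetical helper: transitions[s][a] is a list of
    (probability, next_state, reward) tuples."""
    V = {s: 0.0 for s in states}

    def q(s, a):
        # Expected return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])

    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup: keep the best action value.
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Read off a greedy policy from the converged values.
    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return V, policy


# Toy MDP (made up for illustration): two states, deterministic moves.
states = ["A", "B"]
actions = ["stay", "move"]
transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "move": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "move": [(1.0, "A", 0.0)]},
}

V, policy = value_iteration(states, actions, transitions)
print(V, policy)
```

Unlike policy iteration, which alternates a full policy evaluation with a policy improvement step, this sketch folds both into a single repeated backup and extracts the policy only at the end.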