OpenAI gym: when is reset required?

2024/12/23 16:15:28  Source: https://blog.csdn.net/suiusoar/article/details/141463942

Question:

Although I can get the examples and my own code to run, I am more curious about the real semantics and expectations behind the OpenAI Gym API, in particular `Env.reset()`.

When is `reset` expected/required? At the end of each episode? Or only after creating an environment?

I would think it makes sense to call it before each episode, but I have not been able to find that stated explicitly!

Answer:

You typically call `reset` after an entire episode has finished. That could be after you reach a terminal state in the MDP, or after you hit your maximum number of time steps (set by you). I also reset it at the very start of training, before the first call to `step`.
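The two ways an episode can end can be sketched without installing anything. The block below is only an illustration: `LineWorld` and `run_episode` are hypothetical names, not part of the gym library, and the class merely imitates the classic `reset()` / `step()` interface, in which `step` returns four values.

```python
class LineWorld:
    """Hypothetical toy environment with a Gym-like reset()/step() interface."""

    def __init__(self, goal=5):
        self.goal = goal
        self.position = None  # None means reset() has not been called yet

    def reset(self):
        # Put the agent back at the start and return the first state.
        self.position = 0
        return self.position

    def step(self, action):
        # Refuse to step an environment that was never reset.
        if self.position is None:
            raise RuntimeError("call reset() before step()")
        self.position += action  # action is 0 (stay) or 1 (move right)
        done = self.position >= self.goal  # terminal state reached?
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps):
    """Run one episode and report why it ended."""
    state = env.reset()  # every episode starts with a reset
    for _ in range(max_steps):
        state, reward, done, _ = env.step(policy(state))
        if done:
            return "terminal state"
    return "step limit"


env = LineWorld(goal=5)
print(run_episode(env, lambda s: 1, max_steps=100))  # policy always moves right
print(run_episode(env, lambda s: 0, max_steps=100))  # policy never moves
```

The first call ends because the goal is reached, the second because the step budget runs out. In both cases the next episode begins with another `reset()`.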

So if you start in state 'A' and want to reach state 'Z', you run your time steps going from 'A' -> 'B' -> 'C' ..., and when you reach the terminal state 'Z' you start a new episode by calling `reset`, which takes you back to 'A'.

    for episode in range(iterations):
        state = env.reset()  # first state of the episode
        for time_step in range(1000):  # maximum number of steps per episode
            action = take_action(state)
            state, reward, done, _ = env.step(action)
            if done:
                break  # go on to the next episode, where the environment is reset
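To try the loop above end to end, here is a minimal self-contained sketch of the 'A' to 'Z' example. `ChainEnv` is a hypothetical stand-in written for this illustration (it is not a gym environment), and `take_action` is a placeholder policy that always moves forward.

```python
import string


class ChainEnv:
    """Hypothetical toy env: states 'A'..'Z' in a line, 'Z' is terminal."""

    STATES = string.ascii_uppercase  # 'A' ... 'Z'

    def __init__(self):
        self.index = None

    def reset(self):
        # Start a new episode at 'A'.
        self.index = 0
        return self.STATES[self.index]

    def step(self, action):
        # action: 1 moves one state forward, 0 stays put.
        self.index = min(self.index + action, len(self.STATES) - 1)
        state = self.STATES[self.index]
        done = state == "Z"
        reward = 1.0 if done else 0.0
        return state, reward, done, {}


def take_action(state):
    return 1  # placeholder policy: always move forward


env = ChainEnv()
iterations = 3
start_states, episode_lengths = [], []

for episode in range(iterations):
    state = env.reset()  # back to 'A' at the start of every episode
    start_states.append(state)
    for time_step in range(1000):
        action = take_action(state)
        state, reward, done, _ = env.step(action)
        if done:
            break
    episode_lengths.append(time_step + 1)

print(start_states)     # ['A', 'A', 'A']
print(episode_lengths)  # [25, 25, 25]
```

Note that this follows the classic Gym convention used in the answer. In Gym 0.26 and later, and in Gymnasium, `reset()` returns `(observation, info)` and `step()` returns five values, with `done` split into `terminated` and `truncated`; an episode is over when either is true, and `reset()` is still what starts the next one.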
