ding.world_model.base_world_model
WorldModel
Bases: ABC
Overview
Abstract base class for world models.
Interfaces
should_train, should_eval, train, eval, step
should_train(envstep)
Overview
Check whether the world model needs to be trained at the given envstep.
should_eval(envstep)
Overview
Check whether the world model needs to be evaluated at the given envstep.
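The page does not state how `should_train` and `should_eval` reach their decision. One plausible rule, sketched below under the assumption that training and evaluation happen at fixed intervals measured in environment steps, compares `envstep` with the step of the last training or evaluation. `FrequencyGate`, `train_freq`, `eval_freq`, `last_train_step` and `last_eval_step` are hypothetical names, not part of the documented API.

```python
class FrequencyGate:
    """Hypothetical helper: decides when to train/evaluate based on env steps."""

    def __init__(self, train_freq: int, eval_freq: int) -> None:
        self.train_freq = train_freq  # assumed interval (env steps) between trainings
        self.eval_freq = eval_freq    # assumed interval (env steps) between evaluations
        self.last_train_step = 0
        self.last_eval_step = 0

    def should_train(self, envstep: int) -> bool:
        # Train once at least `train_freq` new env steps have been collected.
        return envstep - self.last_train_step >= self.train_freq

    def should_eval(self, envstep: int) -> bool:
        # Evaluate only if enough steps have passed AND the model was trained at least once.
        return (envstep - self.last_eval_step >= self.eval_freq
                and self.last_train_step != 0)


gate = FrequencyGate(train_freq=250, eval_freq=250)
print(gate.should_train(100))  # False: only 100 env steps so far
print(gate.should_train(250))  # True
gate.last_train_step = 250
print(gate.should_eval(250))   # True: trained once, 250 steps since the last evaluation
```

Gating evaluation on a prior training run avoids scoring an untrained model; whether the real implementation does this is not stated on this page.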
train(env_buffer, envstep, train_iter)
abstractmethod
Overview
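Train the world model using data from env_buffer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| env_buffer | | Buffer of data collected in the real environment. | required |
| envstep | | Current number of environment steps. | required |
| train_iter | | Current number of training iterations. | required |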
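A concrete subclass must implement `train`. The sketch below is a deliberately trivial, hypothetical model (not a DI-engine class): it "trains" by averaging the observed state change and reward over the real transitions in `env_buffer`, which is assumed here to be a plain list of made-up `Transition` records rather than a DI-engine buffer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    """Hypothetical record of one real environment step."""
    obs: List[float]
    action: List[float]
    reward: float
    next_obs: List[float]


class ToyWorldModel:
    """Hypothetical model: next_obs = obs + mean_delta, reward = mean_reward."""

    def __init__(self, obs_dim: int) -> None:
        self.mean_delta = [0.0] * obs_dim
        self.mean_reward = 0.0
        self.last_train_step = 0

    def train(self, env_buffer: List[Transition], envstep: int, train_iter: int) -> None:
        # "Fit" both statistics by averaging over every real transition in the buffer.
        # train_iter matches the documented signature but is unused in this toy model.
        n = len(env_buffer)
        self.mean_delta = [
            sum(t.next_obs[i] - t.obs[i] for t in env_buffer) / n
            for i in range(len(self.mean_delta))
        ]
        self.mean_reward = sum(t.reward for t in env_buffer) / n
        self.last_train_step = envstep  # record when the model was last trained


buffer = [
    Transition(obs=[0.0, 0.0], action=[1.0], reward=1.0, next_obs=[1.0, 0.5]),
    Transition(obs=[1.0, 0.5], action=[1.0], reward=0.0, next_obs=[2.0, 1.5]),
]
model = ToyWorldModel(obs_dim=2)
model.train(buffer, envstep=500, train_iter=10)
print(model.mean_delta)   # [1.0, 0.75]
print(model.mean_reward)  # 0.5
```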
eval(env_buffer, envstep, train_iter)
abstractmethod
Overview
Evaluate the world model using data from env_buffer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| env_buffer | | Buffer of data collected in the real environment. | required |
| envstep | | Current number of environment steps. | required |
| train_iter | | Current number of training iterations. | required |
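`eval` uses the same kind of real data to measure model quality. This hypothetical sketch computes the mean squared errors of a fixed toy model's predictions on `env_buffer` (again a plain list of made-up `Transition` records). Returning the metrics as a dict is an assumption for illustration, since the documented interface specifies no return value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transition:
    """Hypothetical record of one real environment step."""
    obs: List[float]
    action: List[float]
    reward: float
    next_obs: List[float]


class ToyWorldModel:
    """Hypothetical fixed model: next_obs = obs + mean_delta, reward = mean_reward."""

    def __init__(self, mean_delta: List[float], mean_reward: float) -> None:
        self.mean_delta = mean_delta
        self.mean_reward = mean_reward
        self.last_eval_step = 0

    def eval(self, env_buffer: List[Transition], envstep: int, train_iter: int) -> Dict[str, float]:
        # Accumulate per-transition squared errors of the predictions against real data.
        obs_err, rew_err = 0.0, 0.0
        for t in env_buffer:
            pred = [o + d for o, d in zip(t.obs, self.mean_delta)]
            obs_err += sum((p - q) ** 2 for p, q in zip(pred, t.next_obs)) / len(pred)
            rew_err += (self.mean_reward - t.reward) ** 2
        self.last_eval_step = envstep  # record when the model was last evaluated
        n = len(env_buffer)
        return {"obs_mse": obs_err / n, "reward_mse": rew_err / n}


buffer = [
    Transition(obs=[0.0, 0.0], action=[1.0], reward=1.0, next_obs=[1.0, 0.5]),
    Transition(obs=[1.0, 0.5], action=[1.0], reward=0.0, next_obs=[2.0, 1.5]),
]
model = ToyWorldModel(mean_delta=[1.0, 0.75], mean_reward=0.5)
metrics = model.eval(buffer, envstep=500, train_iter=10)
print(metrics)  # {'obs_mse': 0.03125, 'reward_mse': 0.25}
```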
step(obs, action)
abstractmethod
Overview
Take one step in the world model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs | | Current observations. | required |
| action | | Actions taken at the current observations. | required |
Returns:
| Type | Description |
|---|---|
| Tensor | reward: predicted rewards. |
| Tensor | next_obs: predicted next observations. |
| Tensor | done: whether each episode has ended. |
Shapes
`B`: batch size
`O`: observation dimension
`A`: action dimension
- obs: [B, O]
- action: [B, A]
- reward: [B, ]
- next_obs: [B, O]
- done: [B, ]
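The shape contract of `step` can be illustrated with a toy model. In this sketch, nested Python lists stand in for `torch.Tensor`s, `StepOnlyWorldModel` is a hypothetical stand-in for `WorldModel` that keeps only the abstract `step`, and the point-mass dynamics assume `O == A`. The return order (reward, next_obs, done) follows the Shapes list above.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Batch = List[List[float]]  # [B, dim] as nested lists, a stand-in for a torch.Tensor


class StepOnlyWorldModel(ABC):
    """Hypothetical stand-in for WorldModel that keeps only the abstract step()."""

    @abstractmethod
    def step(self, obs: Batch, action: Batch) -> Tuple[List[float], Batch, List[bool]]:
        raise NotImplementedError


class PointMassModel(StepOnlyWorldModel):
    """Toy dynamics assuming O == A: next_obs = obs + action."""

    def __init__(self, bound: float = 10.0) -> None:
        self.bound = bound  # an episode ends once any coordinate leaves [-bound, bound]

    def step(self, obs: Batch, action: Batch) -> Tuple[List[float], Batch, List[bool]]:
        next_obs = [[o + a for o, a in zip(ob, ac)] for ob, ac in zip(obs, action)]  # [B, O]
        reward = [-sum(x * x for x in row) for row in next_obs]                     # [B]
        done = [any(abs(x) > self.bound for x in row) for row in next_obs]          # [B]
        return reward, next_obs, done


model = PointMassModel()
reward, next_obs, done = model.step([[0.0, 1.0], [9.0, 0.0]], [[1.0, 1.0], [2.0, 0.0]])
print(reward)    # [-5.0, -121.0]
print(next_obs)  # [[1.0, 2.0], [11.0, 0.0]]
print(done)      # [False, True]
```

Because `step` is declared with `abstractmethod`, instantiating the base class directly raises `TypeError`; only subclasses that implement it can be created.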
DynaWorldModel
Bases: WorldModel, ABC
Overview
Dyna-style world model (summarized in arXiv:1907.02057) that stores imagination rollouts in the imagination buffer and reuses them.
Interfaces
sample, fill_img_buffer, should_train, should_eval, train, eval, step
sample(env_buffer, img_buffer, batch_size, train_iter)
Overview
Sample from the combination of the environment buffer and the imagination buffer in a certain ratio to generate batched data for policy training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| env_buffer | | Buffer of data collected in the real environment. | required |
| img_buffer | | Buffer of imagination rollouts generated by the world model. | required |
| batch_size | | Number of samples in the returned batch. | required |
| train_iter | | Current number of training iterations. | required |
Returns:
| Type | Description |
|---|---|
| dict | Batched data for policy training. |
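Dyna-style training mixes real and imagined data in each batch. The sketch below assumes a hypothetical `real_ratio` parameter that fixes the fraction of real transitions (the documentation does not specify how the ratio is set), uses plain lists as buffers, and samples with replacement. `sample_mixed` and its returned keys are illustrative names, not the library's API.

```python
import random
from typing import Any, Dict, List


def sample_mixed(env_buffer: List[Any], img_buffer: List[Any], batch_size: int,
                 real_ratio: float, rng: random.Random) -> Dict[str, Any]:
    """Hypothetical Dyna-style sampler: `real_ratio` of the batch comes from env_buffer."""
    n_env = int(batch_size * real_ratio)          # number of real transitions
    n_img = batch_size - n_env                    # imagined transitions fill the rest
    env_part = rng.choices(env_buffer, k=n_env)   # sampling with replacement
    img_part = rng.choices(img_buffer, k=n_img)
    batch = env_part + img_part
    rng.shuffle(batch)                            # mix sources within the batch
    return {"data": batch, "n_env": n_env, "n_img": n_img}


env_buffer = [("env", i) for i in range(10)]
img_buffer = [("img", i) for i in range(50)]
out = sample_mixed(env_buffer, img_buffer, batch_size=100, real_ratio=0.25, rng=random.Random(0))
print(out["n_env"], out["n_img"])  # 25 75
```

A small real fraction lets the policy train mostly on cheap imagined data while still being anchored to real transitions.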
fill_img_buffer(policy, env_buffer, img_buffer, envstep, train_iter)
Overview
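Sample from env_buffer, perform imagination rollouts in the world model starting from the sampled data, and push the generated data into img_buffer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| policy | | Policy used to select actions during the imagination rollouts. | required |
| env_buffer | | Buffer of data collected in the real environment. | required |
| img_buffer | | Buffer that receives the imagined data. | required |
| envstep | | Current number of environment steps. | required |
| train_iter | | Current number of training iterations. | required |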
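`fill_img_buffer` can be pictured as branching short model rollouts from real states. In this hypothetical sketch, `policy` and `model_step` are plain callables that act on a single observation rather than a batch, buffers are lists of `(obs, action, reward, next_obs, done)` tuples, and `num_starts` and `rollout_length` are assumed parameters that do not appear in the documented signature.

```python
import random
from typing import Callable, List, Tuple

Obs = List[float]
Transition = Tuple[Obs, Obs, float, Obs, bool]  # (obs, action, reward, next_obs, done)


def fill_img_buffer(policy: Callable[[Obs], Obs],
                    model_step: Callable[[Obs, Obs], Tuple[float, Obs, bool]],
                    env_buffer: List[Transition], img_buffer: List[Transition],
                    num_starts: int, rollout_length: int, rng: random.Random) -> int:
    """Hypothetical sketch: branch short model rollouts from real start states."""
    starts = [t[0] for t in rng.choices(env_buffer, k=num_starts)]  # real obs as start states
    pushed = 0
    for obs in starts:
        for _ in range(rollout_length):
            action = policy(obs)
            reward, next_obs, done = model_step(obs, action)
            img_buffer.append((obs, action, reward, next_obs, done))
            pushed += 1
            if done:
                break  # stop this branch when the model predicts termination
            obs = next_obs
    return pushed


def policy(o: Obs) -> Obs:
    return [-0.5 * x for x in o]  # toy policy that pushes the state toward zero


def model_step(o: Obs, a: Obs) -> Tuple[float, Obs, bool]:
    nxt = [x + y for x, y in zip(o, a)]
    return -abs(nxt[0]), nxt, False  # toy dynamics that never terminate


env_buffer: List[Transition] = [([4.0], [0.0], 0.0, [4.0], False)]
img_buffer: List[Transition] = []
pushed = fill_img_buffer(policy, model_step, env_buffer, img_buffer,
                         num_starts=3, rollout_length=5, rng=random.Random(0))
print(pushed, len(img_buffer))  # 15 15
```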
DreamWorldModel
Bases: WorldModel, ABC
Overview
Dreamer-style world model that uses each imagination rollout only once and backpropagates through time (i.e., through the rollout) to optimize the policy.
Interfaces
rollout, should_train, should_eval, train, eval, step
rollout(obs, actor_fn, envstep, **kwargs)
Overview
Generate batched imagination rollouts starting from the current observations. This function is useful for value-gradient methods, in which the policy is optimized by backpropagation through time (BPTT).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obs | | Current (real) observations from which the rollouts start. | required |
| actor_fn | | Actor that maps observations to actions during the rollout. | required |
| envstep | | Current number of environment steps. | required |
Returns:
| Type | Description |
|---|---|
| Tensor | obss: observations along the rollout. |
| Tensor | actions: actions produced by actor_fn. |
| Tensor | rewards: predicted rewards. |
| Tensor | aug_rewards: augmented rewards. |
| Tensor | dones: predicted episode-termination flags. |
Shapes
`N`: number of time steps in the rollout
`B`: batch size
`O`: observation dimension
`A`: action dimension
- obss: [N+1, B, O], where obss[0] are the real observations
- actions: [N+1, B, A]
- rewards: [N, B]
- aug_rewards: [N+1, B]
- dones: [N, B]
Note:
- The rollout length is determined by the rollout length scheduler.
- The input and output shapes of actor_fn follow the same conventions as WorldModel.step().
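The shapes listed for `rollout` follow from the loop structure: N model steps produce N rewards and N done flags, N+1 observations (including the real starting ones), and N+1 actions if the actor is also queried at the final observation. The sketch below assumes a hypothetical linear rollout-length schedule (the form of the documented scheduler is not given on this page), uses nested lists instead of tensors, and omits `aug_rewards`. In a real BPTT setting the values would be tensors with the autograd graph retained so that gradients can flow back through the rollout.

```python
from typing import Callable, List, Tuple

Batch = List[List[float]]  # [B, dim], a stand-in for a torch.Tensor
StepFn = Callable[[Batch, Batch], Tuple[List[float], Batch, List[bool]]]


def linear_rollout_length(envstep: int, start: int, end: int, min_len: int, max_len: int) -> int:
    """Hypothetical schedule: grow linearly from min_len at `start` to max_len at `end`."""
    if envstep <= start:
        return min_len
    if envstep >= end:
        return max_len
    frac = (envstep - start) / (end - start)
    return int(min_len + frac * (max_len - min_len))


def rollout(obs: Batch, actor_fn: Callable[[Batch], Batch], step_fn: StepFn, n_steps: int
            ) -> Tuple[List[Batch], List[Batch], List[List[float]], List[List[bool]]]:
    obss: List[Batch] = [obs]        # ends as [N+1, B, O]; obss[0] are the real observations
    actions: List[Batch] = []        # ends as [N+1, B, A]
    rewards: List[List[float]] = []  # ends as [N, B]
    dones: List[List[bool]] = []     # ends as [N, B]
    for _ in range(n_steps):
        action = actor_fn(obss[-1])
        reward, next_obs, done = step_fn(obss[-1], action)
        actions.append(action)
        rewards.append(reward)
        dones.append(done)
        obss.append(next_obs)
    actions.append(actor_fn(obss[-1]))  # one extra action at the final observation
    return obss, actions, rewards, dones


def actor(o: Batch) -> Batch:
    return [[-x for x in row] for row in o]  # toy actor, assumes A == O


def step(o: Batch, a: Batch) -> Tuple[List[float], Batch, List[bool]]:
    nxt = [[x + 0.5 * y for x, y in zip(r, s)] for r, s in zip(o, a)]
    return [0.0] * len(o), nxt, [False] * len(o)


n = linear_rollout_length(envstep=50, start=0, end=100, min_len=1, max_len=5)  # 3
obss, actions, rewards, dones = rollout([[1.0, 2.0], [3.0, 4.0]], actor, step, n)
print(len(obss), len(actions), len(rewards), len(dones))  # 4 4 3 3
```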
HybridWorldModel
Bases: DynaWorldModel, DreamWorldModel, ABC
Overview
Hybrid world model that combines reused (Dyna-style) rollouts with on-the-fly (Dreamer-style) rollouts.
Interfaces
rollout, sample, fill_img_buffer, should_train, should_eval, train, eval, step
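`HybridWorldModel` obtains both interfaces through multiple inheritance. The hypothetical stand-in classes below (not DI-engine classes) show how Python's method resolution order (MRO) lets a single subclass expose both `sample` and `rollout` while still requiring the abstract `step` to be implemented.

```python
from abc import ABC, abstractmethod


class WorldModelSketch(ABC):
    @abstractmethod
    def step(self, obs, action):
        ...


class DynaSketch(WorldModelSketch, ABC):
    def sample(self) -> str:
        return "reused rollouts from the imagination buffer"


class DreamSketch(WorldModelSketch, ABC):
    def rollout(self) -> str:
        return "on-the-fly rollouts for BPTT"


class HybridSketch(DynaSketch, DreamSketch, ABC):
    pass  # inherits sample() and rollout(); step() is still abstract


class ToyHybrid(HybridSketch):
    def step(self, obs, action):
        return 0.0, obs, False  # trivial concrete step so the class can be instantiated


print([c.__name__ for c in ToyHybrid.__mro__])
# ['ToyHybrid', 'HybridSketch', 'DynaSketch', 'DreamSketch', 'WorldModelSketch', 'ABC', 'object']
m = ToyHybrid()
print(m.sample())
print(m.rollout())
```

The shared base appears only once in the MRO, after both Dyna and Dreamer branches, so neither parent's methods shadow the other's.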
Full Source Code
../ding/world_model/base_world_model.py