
tactics2d.envs

ParkingEnv

Bases: Env

This class provides an environment to train a vehicle to park in a parking lot without dynamic traffic participants, such as pedestrians and other vehicles.

Observation

ParkingEnv provides two types of observations:

  • Camera: A top-down semantic segmentation image of the agent vehicle and its surroundings. The perception range is 20 meters. The image is returned as a 3D numpy array of shape (200, 200, 3).
  • LiDAR: A single-line LiDAR sensor that scans a full 360-degree view with a range of 20 meters. The LiDAR data is returned as a 1D numpy array of shape (120,).
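Assuming the 120 LiDAR beams are evenly spaced over the full 360-degree scan (the documentation states only the beam count and field of view), each beam covers 3 degrees. A minimal sketch of converting a beam index to its angle (`beam_angle` is an illustrative helper, not part of the tactics2d API):

```python
import math

NUM_BEAMS = 120    # the LiDAR observation is a (120,) array
FOV = 2 * math.pi  # full 360-degree scan

def beam_angle(i):
    """Angle (radians) of the i-th beam, assuming evenly spaced beams."""
    return i * FOV / NUM_BEAMS
```

Under this assumption, consecutive beams are `math.degrees(beam_angle(1)) = 3.0` degrees apart.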

Action

ParkingEnv accepts either a continuous or a discrete action command for the agent vehicle.

  • The continuous action is a 1D numpy array of shape (2,), representing the steering angle and the acceleration.
  • The discrete action is an integer from 1 to 5, representing the following (steering angle, acceleration) pairs:
    1. Do nothing: (0, 0)
    2. Turn left: (-0.5, 0)
    3. Turn right: (0.5, 0)
    4. Move forward: (0, 1)
    5. Move backward: (0, -1)

The first element of the action tuple is the steering angle in radians, which should be in the range [-0.75, 0.75]. The second element is the acceleration in m/s², which should be in the range [-2.0, 2.0].
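The discrete-to-continuous mapping above can be sketched as follows. `DISCRETE_ACTIONS`, `clip`, and `to_continuous` are hypothetical names for illustration, not part of the tactics2d API:

```python
# Discrete actions 1-5 mapped to (steering, acceleration), per the list above.
DISCRETE_ACTIONS = {
    1: (0.0, 0.0),    # do nothing
    2: (-0.5, 0.0),   # turn left
    3: (0.5, 0.0),    # turn right
    4: (0.0, 1.0),    # move forward
    5: (0.0, -1.0),   # move backward
}

STEER_RANGE = (-0.75, 0.75)  # radians
ACCEL_RANGE = (-2.0, 2.0)    # m/s^2

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def to_continuous(action):
    """Turn a discrete action index into a clipped continuous command."""
    steer, accel = DISCRETE_ACTIONS[action]
    return (clip(steer, *STEER_RANGE), clip(accel, *ACCEL_RANGE))
```

Note that all five discrete pairs already lie inside the continuous bounds; the clipping matters only for user-supplied continuous commands.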

Status and Reward

The environment provides a reward at each step. The status checks and the reward are computed according to the following rules:

  1. Check time exceeded: If the time step exceeds the maximum time step (20000 steps), the scenario status is set to TIME_EXCEEDED and a reward of -1 is given.
  2. Check no action: If the agent vehicle does not move for over 100 steps, the scenario status is set to NO_ACTION and a reward of -1 is given.
  3. Check out of bound: If the agent vehicle goes out of the boundary of the map, the scenario status is set to OUT_BOUND and a reward of -5 is given.
  4. Check collision: If the agent vehicle collides with a static obstacle, the traffic status is set to COLLISION_STATIC and a reward of -5 is given.
  5. Check completed: If the agent vehicle successfully parks in the target area, the scenario status is set to COMPLETED and a reward of +5 is given.
  6. Otherwise, the reward is the sum of a time penalty and an IoU reward. The time penalty is -tanh(t / T) * 0.1, where t is the current time step and T is the maximum time step. The IoU reward is the difference between the current IoU and the maximum IoU.
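The rule 6 reward can be sketched as below. `time_penalty` and `step_reward` are illustrative names, not part of the tactics2d API; `current_iou` and `max_iou` stand for the IoU values the rule refers to:

```python
import math

T = 20000  # maximum time step of ParkingEnv

def time_penalty(t, max_step=T):
    """Rule 6 time penalty: -tanh(t / T) * 0.1."""
    return -math.tanh(t / max_step) * 0.1

def step_reward(t, current_iou, max_iou, max_step=T):
    """Rule 6 reward: the time penalty plus the IoU difference."""
    return time_penalty(t, max_step) + (current_iou - max_iou)
```

Because tanh saturates at 1, the time penalty is bounded: it is 0 at t = 0 and never exceeds 0.1 in magnitude.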

If the agent has successfully completed the scenario, the environment sets the terminated flag to True. If the scenario status or the traffic status becomes abnormal, the environment sets the truncated flag to True.

The status information is returned as a dictionary with the following keys:

  • lidar: A 1D numpy array of shape (120,) containing the LiDAR data.
  • state: The current state of the agent vehicle.
  • target_area: The coordinates of the target area.
  • target_heading: The heading of the target area.
  • traffic_status: The status of the traffic scenario.
  • scenario_status: The status of the scenario.

__init__(type_proportion=0.5, render_mode='human', render_fps=60, max_step=20000, continuous=True)

Initialize the parking environment.

Parameters:

  • type_proportion (float, default 0.5): The proportion of "bay" parking scenarios among all generated scenarios. It should be in the range [0, 1]; out-of-range values are clipped. When it is 0, the generator only generates "parallel" parking scenarios; when it is 1, it only generates "bay" parking scenarios.
  • render_mode (str, default 'human'): The rendering mode. It can be "human" or "rgb_array".
  • render_fps (int, default 60): The frame rate of the rendering.
  • max_step (int, default 20000): The maximum time step of the scenario.
  • continuous (bool, default True): Whether to use a continuous action space.

Raises:

  • NotImplementedError: If the render mode is not supported.
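The clipping and scenario-selection behavior of type_proportion described above can be sketched as follows (`pick_scenario_type` is a hypothetical helper, not part of the tactics2d API):

```python
import random

def pick_scenario_type(type_proportion, rng=random):
    """Choose "bay" vs. "parallel" according to type_proportion.

    Out-of-range inputs are clipped to [0, 1]; 0 always yields "parallel"
    and 1 always yields "bay", matching the parameter description above.
    """
    p = max(0.0, min(1.0, type_proportion))
    # random() is in [0, 1), so p = 1 always selects "bay".
    return "bay" if rng.random() < p else "parallel"
```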

reset(seed=None, options=None)

This function resets the environment.

Parameters:

  • seed (int, default None): The random seed.
  • options (dict, default None): The options for the environment.

Returns:

  • observation (np.array): The BEV image observation of the environment.
  • infos (dict): The information of the environment.

step(action)

This function takes a step in the environment.

Parameters:

  • action (Union[Tuple[float], int], required): The action command for the agent vehicle.

Raises:

  • InvalidAction: If the action is not in the action space.

Returns:

  • observation (np.array): The BEV image observation of the environment.
  • reward (float): The reward of this step.
  • terminated (bool): Whether the scenario is terminated. The scenario terminates once the agent has completed it.
  • truncated (bool): Whether the scenario is truncated. The scenario is truncated if the scenario status or the traffic status becomes abnormal.
  • infos (dict): The information of the environment.
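The terminated/truncated semantics of step() imply the usual Gymnasium-style interaction loop. A minimal sketch, using a DummyEnv stand-in (constructing the real ParkingEnv requires tactics2d; the stand-in just ends the episode as "terminated" after 3 steps):

```python
class DummyEnv:
    """Stand-in for ParkingEnv with the same reset()/step() return shapes."""
    def __init__(self):
        self.t = 0
    def reset(self, seed=None, options=None):
        self.t = 0
        return None, {}
    def step(self, action):
        self.t += 1
        terminated = self.t >= 3  # stands in for the COMPLETED status
        truncated = False         # stands in for an abnormal status
        return None, 0.0, terminated, truncated, {}

env = DummyEnv()
obs, infos = env.reset()
steps = 0
while True:
    obs, reward, terminated, truncated, infos = env.step(1)  # "do nothing"
    steps += 1
    if terminated or truncated:  # either flag ends the episode
        break
```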

RacingEnv

Bases: Env

This class provides an environment to train a racing car to drive on a racing track.

Observation

RacingEnv provides a top-down semantic segmentation image of the agent and its surroundings. The observation is returned as a 3D numpy array (a BEV image).

Action

RacingEnv accepts either a continuous or a discrete action command for the agent vehicle.

  • The continuous action is a 1D numpy array of shape (2,), representing the steering angle and the acceleration.
  • The discrete action is an integer from 1 to 5, representing the following (steering angle, acceleration) pairs:
    1. Do nothing: (0, 0)
    2. Turn left: (-0.5, 0)
    3. Turn right: (0.5, 0)
    4. Move forward: (0, 1)
    5. Move backward: (0, -1)

The first element of the action tuple is the steering angle in radians, which should be in the range [-0.75, 0.75]. The second element is the acceleration in m/s², which should be in the range [-2.0, 2.0].

Status and Reward

The environment provides a reward at each step. The status checks and the reward are computed according to the following rules:

  1. Check time exceeded: If the time step exceeds the maximum time step (100000 steps), the scenario status is set to TIME_EXCEEDED and a reward of -1 is given.
  2. Check no action: If the agent vehicle does not move for over 100 steps, the scenario status is set to NO_ACTION and a reward of -1 is given.
  3. Check out of bound: If the agent vehicle goes out of the boundary of the map, the scenario status is set to OUT_BOUND and a reward of -5 is given.
  4. Check off road: If the agent vehicle drives off the road, the scenario status is set to OFF_ROAD and a reward of -5 is given.
  5. Check arrived: If the agent vehicle arrives at the destination, the scenario status is set to ARRIVED and a reward of +10 is given.
  6. Otherwise,

If the agent has successfully completed the scenario, the environment sets the terminated flag to True. If the scenario status or the traffic status becomes abnormal, the environment sets the truncated flag to True.

The status information is returned as a dictionary with the following keys:

  • state: The current state of the agent vehicle.
  • traffic_status: The status of the traffic scenario.
  • scenario_status: The status of the scenario.

__init__(render_mode='human', render_fps=60, max_step=100000, continuous=True)

Initialize the racing environment.

Parameters:

  • render_mode (str, default 'human'): The rendering mode. It can be "human" or "rgb_array".
  • render_fps (int, default 60): The frame rate of the rendering.
  • max_step (int, default 100000): The maximum time step of the scenario.
  • continuous (bool, default True): Whether to use a continuous action space.

Raises:

  • NotImplementedError: If the render mode is not supported.

step(action)

This function takes a step in the environment.

Parameters:

  • action (Union[array, int], required): The action command for the agent vehicle.

Raises:

  • InvalidAction: If the action is not in the action space.

Returns:

  • observation (np.array): The BEV image observation of the environment.
  • reward (float): The reward of this step.
  • terminated (bool): Whether the scenario is terminated. The scenario terminates once the agent has completed it.
  • truncated (bool): Whether the scenario is truncated. The scenario is truncated if the scenario status or the traffic status becomes abnormal.
  • infos (dict): The information of the environment.