Derk’s Gym 1.1.1

This is the documentation for gym-derk, a python package that exposes the game “Dr. Derk’s Mutant Battlegrounds” as an OpenAI gym environment.

Main website: Please get a license on the website if you’re using this in a commercial or academic context.

Installing: pip install gym-derk (see Installation & Running, os specific instructions for details)


Basic example

In this example, the Derklings just take random actions.

from gym_derk.envs import DerkEnv

env = DerkEnv()

for t in range(3):
  observation_n = env.reset()
  while True:
    action_n = [env.action_space.sample() for i in range(env.n_agents)]
    observation_n, reward_n, done_n, info = env.step(action_n)
    if all(done_n):
      print("Episode finished")
      break
env.close()

Neural network example

This is an example of how to use a genetic algorithm to train a single-layer neural network.

from gym_derk.envs import DerkEnv
from gym_derk import ObservationKeys
import numpy as np
import gym
import math
import os.path

env = DerkEnv()

class Network:
  def __init__(self, weights=None, biases=None):
    self.network_outputs = 13
    if weights is None:
      # No saved weights: start from random values
      weights_shape = (self.network_outputs, len(ObservationKeys))
      self.weights = np.random.normal(size=weights_shape)
    else:
      self.weights = weights
    if biases is None:
      self.biases = np.random.normal(size=(self.network_outputs))
    else:
      self.biases = biases

  def clone(self):
    return Network(np.copy(self.weights), np.copy(self.biases))

  def forward(self, observations):
    outputs = np.add(np.matmul(self.weights, observations), self.biases)
    casts = outputs[3:6]
    cast_i = np.argmax(casts)
    focuses = outputs[6:13]
    focus_i = np.argmax(focuses)
    return (
      math.tanh(outputs[0]), # MoveX
      math.tanh(outputs[1]), # Rotate
      max(min(outputs[2], 1), 0), # ChaseFocus
      (cast_i + 1) if casts[cast_i] > 0 else 0, # CastingSlot
      (focus_i + 1) if focuses[focus_i] > 0 else 0, # ChangeFocus
    )

  def copy_and_mutate(self, network, mr=0.1):
    self.weights = np.add(network.weights, np.random.normal(size=self.weights.shape) * mr)
    self.biases = np.add(network.biases, np.random.normal(size=self.biases.shape) * mr)

weights = np.load('weights.npy') if os.path.isfile('weights.npy') else None
biases = np.load('biases.npy') if os.path.isfile('biases.npy') else None

networks = [Network(weights, biases) for i in range(env.n_agents)]

for e in range(10):
  observation_n = env.reset()
  while True:
    action_n = [networks[i].forward(observation_n[i]) for i in range(env.n_agents)]
    observation_n, reward_n, done_n, info = env.step(action_n)
    if all(done_n):
      print("Episode finished")
      break
  if env.mode == 'train':
    reward_n = env.total_reward
    top_network_i = np.argmax(reward_n)
    top_network = networks[top_network_i].clone()
    for network in networks:
      # Replace every network with a mutated copy of the best one
      network.copy_and_mutate(top_network)
    print('top reward', reward_n[top_network_i])
    np.save('weights.npy', top_network.weights)
    np.save('biases.npy', top_network.biases)
env.close()


Environment details

This is a MOBA-inspired RL environment in which two teams battle each other while trying to defend their own “statue”. Each team is composed of three units, and each unit gets a random loadout (see Items for available items). The goal is to attack the opponent’s statue and units while defending your own. With the default reward, you get one point for killing an enemy creature and four points for killing an enemy statue.

Arenas and parallelism

The environment is designed to run multiple game instances in parallel on the GPU. Each game instance is called an arena. Functions such as step and reset take and return values for multiple arenas at once. Thanks to this functionality, it’s possible to collect a large amount of experience very quickly.

Team and episode stats

There are a number of statistics you can access about teams. Use gym_derk.envs.DerkEnv.team_stats or gym_derk.DerkSession.team_stats to get the data. See gym_derk.TeamStatsKeys for available keys. For example, to read the Reward of the third team:

env.team_stats[2, TeamStatsKeys.Reward.value]

You can also get stats for all arenas in an episode with gym_derk.envs.DerkEnv.episode_stats and gym_derk.DerkSession.episode_stats.
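The indexing pattern above can be tried without the game running. The following is a minimal sketch: the enum is a local mirror of gym_derk.TeamStatsKeys as listed in this document, the helper team_stat is hypothetical, and the numbers are made up. A nested list stands in for the numpy array that env.team_stats returns.

```python
from enum import IntEnum

# Local mirror of gym_derk.TeamStatsKeys so the sketch runs without the package.
class TeamStatsKeys(IntEnum):
    Reward = 0
    OpponentReward = 1
    Hitpoints = 2
    AliveTime = 3
    CumulativeHitpoints = 4

def team_stat(team_stats, team_index, key):
    """Return one statistic for one team from an (n_teams, n_keys) table."""
    return team_stats[team_index][key.value]

# Made-up numbers for three teams; real values would come from env.team_stats.
sample_stats = [
    [4.0, 1.0, 0.8, 30.0, 25.0],
    [1.0, 4.0, 0.0, 12.0, 9.0],
    [2.5, 2.0, 0.5, 30.0, 18.0],
]

# Reward of the third team (index 2), as in the example above.
third_team_reward = team_stat(sample_stats, 2, TeamStatsKeys.Reward)
print(third_team_reward)
```

With the real numpy array, the same lookup is the single expression env.team_stats[2, TeamStatsKeys.Reward.value].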

Running against other agents / Benchmarking

First, we need an agent to run; you can try for instance Clone the repo, and start them with python --server. This will start a websocket server for that agent.

Next, set mode="connected" in your own agent environment. The environment will connect with a websocket to the server running locally (by default), and the away team will now be controlled by the server. You can now train against these agents, or if you wish to benchmark against them you can look at episode_stats at the end of an episode to see how your agents were performing against the opponents.

Installation & Running, os specific instructions

The Derk environment is implemented as a WebGL2 web app, and runs in a chromium instance, through pyppeteer. This means that you can get the environment working anywhere where you can get chromium with WebGL2 working. On Desktop systems (Windows, OSX, Desktop linux), using the Derk environment is fairly straightforward; just run pip install gym-derk. If you get any errors, make sure that WebGL2 works on your system; you can verify that it does by visiting WebGL2 report. Unfortunately it’s not possible to run the environment in a headless mode, since chromium doesn’t support GPU acceleration in headless mode yet (see this issue).

On a server environment, or if you’re using Docker, there are two main ways to run the environment. The easiest is to use xvfb, which usually means the environment will run on the CPU. See for a Debian based Docker image, and for an Ubuntu based image, both using xvfb. To utilize GPU acceleration on a server/in docker, you’ll need to use virtualgl. Virtualgl can be a bit tricky to set up, but there are Docker images with it that could serve as a base.

The environment can also be run on Google Colab. See the Derk Colab GPU example (virtualgl based) or Derk Colab CPU example (xvfb based).

Finally, it’s also possible to set up an agent as a server, without running the environment. This makes it possible to set up a trained agent as a service which you can connect to. See for an example of how to do this. This is for instance useful for running a competition, where participants can submit Dockerized images with their agents, but where the actual environment is run outside of their images.

Competition (AICrowd)

We’re partnering with AICrowd to run a competition for Derk, where you can submit your agents to see how well they are performing compared to other participants’ agents. The API is free to use for the competition.

Competition page (with starter kit and submission guidelines):

Configuring your Derklings

You can configure a number of attributes on your Derklings, such as their appearance and their load-out. The configuration is read modulo the number of entries, so you can specify 1, 3 or n_arenas * 3 configurations (or any other number) depending on how you want it repeated. Here’s a basic example:

env = DerkEnv(
   home_team=[
      { 'primaryColor': '#ff00ff' },
      { 'primaryColor': '#00ff00', 'slots': ['Talons', None, None] },
      { 'primaryColor': '#ff0000', 'rewardFunction': { 'healTeammate1': 1 } }
   ]
)
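
To illustrate what reading the configuration list modulo its length means, here is a minimal sketch. The helper expand_configs is hypothetical and is not part of gym-derk; it only shows how a short list would be repeated over all Derkling slots under that reading.

```python
def expand_configs(configs, n_slots):
    """Repeat `configs` cyclically so that every slot gets one entry."""
    if not configs:
        raise ValueError("at least one configuration is required")
    return [configs[i % len(configs)] for i in range(n_slots)]

# Two configurations spread over two arenas with three Derklings each.
configs = [{'primaryColor': '#ff00ff'}, {'primaryColor': '#00ff00'}]
expanded = expand_configs(configs, n_slots=2 * 3)
print([c['primaryColor'] for c in expanded])
```

With a single configuration every Derkling looks the same; with three, each position in a team gets its own configuration in every arena.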

The properties you can configure for a Derkling are:

  • Cosmetics:
    • primaryColor: A hex color: e.g. #ff00ff

    • secondaryColor: Also a hex color

    • ears: Integer between 1-4

    • eyes: Integer between 1-5

    • backSpikes: Integer between 1-7

  • slots: An array with exactly 3 items. Each item is a weapon/attachment slot. The first one is the arms attachment, the second is the tail attachment and the third is the misc attachment. See Items for available items.

  • rewardFunction: A specific reward function for this Derkling. See Reward function


Please use this BibTeX to cite this environment in your publications:

   author = {John Fredrik Wilhelm Norén},
   title = {Derk Gym Environment},
   year = {2020},
   publisher = {Mount Rouke},
   journal = {Mount Rouke},
   howpublished = {\url{}},

API reference

High-level API

The high-level API provides a simple, OpenAI Gym compatible DerkEnv class which is suitable for a Python notebook environment.

class gym_derk.envs.DerkEnv(mode=False, n_arenas=None, reward_function=None, turbo_mode=False, home_team=None, away_team=None, session_args={}, app_args={}, agent_server_args={})

Reinforcement Learning environment for “Dr. Derk’s Mutant Battlegrounds”

There are two modes for the environment:


This is a convenience wrapper of the more low level api of gym_derk.DerkAppInstance, gym_derk.DerkAgentServer and gym_derk.DerkSession.

property action_space

Gym space for actions

async async_close()

Async version of close()

async async_reset()

Async version of reset()

async async_step(action_n=None)

Async version of step()

Return type

Tuple[ndarray, ndarray, List[bool], List[Dict]]


close()

Shut down environment

property episode_stats

Stats for the last episode

Return type


property n_agents

Number of agents controlled by this environment

I.e. env.n_teams * env.n_agents_per_team

Return type


property n_agents_per_team

Number of agents in a team (3)

Return type


property n_teams

Number of teams controlled by this environment

Return type


property observation_space

Gym space for observations


reset()

Resets the state of the environment and returns an initial observation.

Return type


Returns

The initial observation for each agent, with shape (n_agents, len(gym_derk.ObservationKeys)).

Raises

ConnectionLostError – If there was a connection error in connected mode


step(action_n=None)

Run one timestep.

Accepts a list of actions, one for each agent, and returns the current state.

Actions can have one of the following formats/shapes:

The returned observations are laid out in the same way as the actions, and can therefore be reshaped like the above. For instance: observations.reshape((env.n_teams, env.n_agents_per_team, -1))
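
The reshape above implies a row-major layout in which consecutive agents belong to the same team. The following pure-Python sketch shows that mapping; the helper names agent_position and group_by_team are hypothetical, and the per-agent values are made up.

```python
N_AGENTS_PER_TEAM = 3  # documented value of env.n_agents_per_team

def agent_position(agent_index, n_agents_per_team=N_AGENTS_PER_TEAM):
    """Map a flat agent index to (team, member), matching a row-major reshape."""
    return divmod(agent_index, n_agents_per_team)

def group_by_team(rows, n_agents_per_team=N_AGENTS_PER_TEAM):
    """Split a flat per-agent list into one sub-list per team."""
    return [rows[i:i + n_agents_per_team]
            for i in range(0, len(rows), n_agents_per_team)]

rewards = [0.1, 0.2, 0.3, 1.0, 2.0, 3.0]  # made-up values for two teams
print(agent_position(4))       # (1, 1): second team, second member
print(group_by_team(rewards))
```

This is the same grouping that a reshape to (n_teams, n_agents_per_team, -1) produces on a numpy array.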


Parameters

action_n (Optional[ndarray]) – Numpy array or list of actions. See gym_derk.ActionKeys

Return type

Tuple[ndarray, ndarray, List[bool], List[Dict]]

Returns

A tuple of (observation_n, reward_n, done_n, info). observation_n has shape (n_agents, len(gym_derk.ObservationKeys))

Raises

ConnectionLostError – If there was a connection error in connected mode

property team_stats

Stats for each team for the last episode

Numpy array of shape (env.n_teams, len(gym_derk.TeamStatsKeys))

See Team and episode stats

Return type


property total_reward

Accumulated rewards over an episode

Numpy array of shape (n_agents)

Return type


Low-level API

The low-level API is more versatile and makes it possible to do things like setting up an agent as a service or running many different agents together, even if they are running on completely different machines. Here’s an example of how it works:

from gym_derk import DerkAgentServer, DerkSession, DerkAppInstance
import asyncio

async def run_fixed(env: DerkSession, actions):
  await env.reset()
  while not env.done:
    await env.step([actions for i in range(env.n_agents)])

async def main():
  # Agent servers are just websocket servers which can be connected to by a DerkAppInstance
  # That means these three could be running in different processes or even on different machines
  agent_walk  = DerkAgentServer(run_fixed, args={ 'actions': [0.1,  0, 0, 0, 0] }, port=8788)
  agent_turn  = DerkAgentServer(run_fixed, args={ 'actions': [0,  0.1, 0, 0, 0] }, port=8789)
  agent_chase = DerkAgentServer(run_fixed, args={ 'actions': [0,    0, 1, 1, 5] }, port=8790)

  await agent_walk.start()
  await agent_turn.start()
  await agent_chase.start()

  # This creates an actual instance of the game to run simulations in
  app = DerkAppInstance()
  await app.start()
  # We can specify any number of agent hosts here, and which sides and arenas they control
  await app.run_session(
    n_arenas=2,  # the away regions below refer to arenas 0 and 1
    agent_hosts=[
      { 'uri': agent_walk.uri,  'regions': [{ 'sides': 'home' }] },
      { 'uri': agent_turn.uri,  'regions': [{ 'sides': 'away', 'start_arena': 0, 'n_arenas': 1 }] },
      { 'uri': agent_chase.uri, 'regions': [{ 'sides': 'away', 'start_arena': 1, 'n_arenas': 1 }] },
    ]
  )
  await app.print_team_stats()

asyncio.get_event_loop().run_until_complete(main())

class gym_derk.DerkAgentServer(handle_session, port=None, host=None, args={})

Agent server

This creates a websocket agent server, listening on host:port

  • handle_session – A coroutine accepting the session and optionally a list of arguments

  • port (Optional[int]) – Port to listen to. Defaults to 8789

  • host (Optional[str]) – Host to listen to. Defaults to

  • args (Dict) – Dictionary of args passed to handle_session



async start()

Start the server

class gym_derk.DerkSession(websocket, init_msg)

A single training/evaluation session, consisting of multiple episodes


  • n_teams – Number of teams controlled by this environment

  • n_agents_per_team – Number of agents in a team (3)

  • action_space – Gym space for actions

  • observation_space – Gym space for observations

  • total_reward – Accumulated rewards over an episode. Numpy array of shape (n_agents)

  • team_stats – Stats for each team for the last episode. Numpy array of shape (n_teams, len(gym_derk.TeamStatsKeys)). See Team and episode stats

  • episode_stats – Stats for the last episode. See Team and episode stats

async close()

Close session

property n_agents

Number of agents controlled by this environment

I.e. env.n_teams * env.n_agents_per_team

async reset()

See gym_derk.envs.DerkEnv.reset()

Return type


async step(action_n=None)

See gym_derk.envs.DerkEnv.step()

Return type

Tuple[ndarray, ndarray, List[bool], List[Dict]]

class gym_derk.DerkAppInstance(app_host=None, chrome_executable=None, chrome_args=[], chrome_devtools=False, window_size=[1000, 750], browser=None, browser_logs=False, internal_http_server=False)

Application instance of “Dr. Derk’s Mutant Battlegrounds”

  • app_host (Optional[str]) – Configure an alternative app bundle host. (Environment variable: DERK_APP_HOST)

  • chrome_executable (Optional[str]) – Path to chrome or chromium. (Environment variable: DERK_CHROME_EXECUTABLE)

  • chrome_args (List[str]) – List of command line switches passed to chrome

  • chrome_devtools (bool) – Launch devtools when chrome starts

  • window_size (Tuple[int, int]) – Tuple with the size of the window

  • browser (Optional[Browser]) – A pyppeteer browser instance

  • browser_logs (bool) – Show log output from browser

  • web_socket_worker – Run websockets in a web worker

async async_get_webgl_renderer()

Async version of get_webgl_renderer()

async close()

Shut down app instance

async connect_to_agent_hosts()

Connect to agent hosts specified when the session was created


True if all hosts are connected, False otherwise

This method can be called in a loop to wait for all hosts to come online.

async create_session(n_arenas=1, reward_function=None, turbo_mode=False, home_team=None, away_team=None, substeps=8, interleaved=True, agent_hosts=None, debug_no_observations=False, web_socket_worker=None, ai_crowd_logo=False, read_game_state=False)

Create a session

All arguments are optional.

  • n_arenas (int) – Number of parallel arenas to run

  • reward_function (Optional[Dict]) – Reward function. See Reward function for available options

  • turbo_mode (bool) – Skip rendering to the screen to run as fast as possible

  • home_team (Optional[List[Dict]]) – Home team creatures. See Configuring your Derklings.

  • away_team (Optional[List[Dict]]) – Away team creatures. See Configuring your Derklings.

  • substeps (int) – Number of game steps to run for each call to step

  • interleaved (bool) – Run each step in the background, returning the previous steps observations

  • agent_hosts (Union[List[Dict], str, None]) – List of DerkAgentServer’s to connect to, or "single_local", or "dual_local". See below for details.

  • read_game_state (bool) – Read the entire internal game state each step, and provide it as a JSON in the info object returned from the step function.

With the interleaved mode on, there’s a delay between observation and action of size substeps. E.g. if substeps=8 there’s an 8*16ms = 128ms “reaction time” from observation to action. This means that the game and the python code can in effect run in parallel.
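
The reaction-time arithmetic above can be written out as a one-line helper. This is only an illustration: the function name is hypothetical, and the 16 ms per game step is taken from the example in the text.

```python
STEP_MS = 16  # duration of one game step, as used in the text above

def reaction_time_ms(substeps, step_ms=STEP_MS):
    """Delay between observation and action in interleaved mode."""
    return substeps * step_ms

print(reaction_time_ms(8))   # 128, the default substeps=8 case
print(reaction_time_ms(4))   # 64
```

Lowering substeps shortens the reaction time but means more calls to step for the same amount of game time.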

The agent_hosts argument takes a list of dicts with the following format: { uri: str, regions: [{ sides: str, start_arena: int, n_arenas: int }] }, where uri specifies a running DerkAgentServer to connect to, and regions defines which arenas and sides that agent will control. sides can be 'home', 'away' or 'both'. start_arena and n_arenas can be omitted to run the agent on all arenas. You can also pass the string "single_local", in which case agent_hosts defaults to [{ 'uri': 'ws://', 'regions': [{ 'sides': 'both' }] }], or "dual_local", in which case it defaults to

[
  { 'uri': 'ws://', 'regions': [{ 'sides': 'home' }] },
  { 'uri': 'ws://', 'regions': [{ 'sides': 'away' }] }
]

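
As a sketch of how such agent_hosts entries could be assembled by hand, the helper below builds one entry with a single region. The function agent_host is hypothetical and not part of gym-derk, the URIs are placeholders, and the 'sides' key follows the examples in this document.

```python
VALID_SIDES = ('home', 'away', 'both')

def agent_host(uri, sides, start_arena=None, n_arenas=None):
    """Build one agent_hosts entry containing a single region."""
    if sides not in VALID_SIDES:
        raise ValueError("sides must be one of %r" % (VALID_SIDES,))
    region = {'sides': sides}
    # Leaving both out means the agent runs on all arenas.
    if start_arena is not None:
        region['start_arena'] = start_arena
    if n_arenas is not None:
        region['n_arenas'] = n_arenas
    return {'uri': uri, 'regions': [region]}

# Placeholder URIs; use the uri of your own DerkAgentServer instances.
hosts = [
    agent_host('ws://127.0.0.1:8788', 'home'),
    agent_host('ws://127.0.0.1:8789', 'away', start_arena=0, n_arenas=1),
]
print(hosts)
```

The resulting list has the same shape as the agent_hosts values shown in the low-level example.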
async disconnect_all_remotes()

Disconnect all remotes

async episode_reset()

Reset for an episode

async episode_step()

Step for an episode

async get_episode_stats()

Gets a summary of stats for the last episode, based on team_stats

async get_team_stats()

Read all team stats from the last episode

Return type



Team stats for all teams; a numpy array of shape (2, n_arenas, len(gym_derk.TeamStatsKeys)). The first dimension is the side (0=home, 1=away).
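
The (2, n_arenas, len(TeamStatsKeys)) layout can be used to compare the two sides arena by arena. The sketch below assumes that the winner is the side with the higher Reward, as the changelog entry for 0.15.0 states; the helper arena_winners, the tie handling and the sample numbers are assumptions for illustration. Nested lists stand in for the numpy array.

```python
REWARD = 0  # index of TeamStatsKeys.Reward

def arena_winners(team_stats):
    """Return 'home', 'away' or 'tie' for each arena.

    team_stats is indexed as [side][arena][key], with side 0=home, 1=away.
    """
    home, away = team_stats
    winners = []
    for home_row, away_row in zip(home, away):
        if home_row[REWARD] > away_row[REWARD]:
            winners.append('home')
        elif home_row[REWARD] < away_row[REWARD]:
            winners.append('away')
        else:
            winners.append('tie')
    return winners

# Made-up stats for two arenas: [Reward, OpponentReward, Hitpoints, AliveTime, CumulativeHitpoints]
sample = [
    [[4.0, 1.0, 0.8, 30.0, 25.0], [0.0, 2.0, 0.0, 10.0, 5.0]],   # home side
    [[1.0, 4.0, 0.0, 12.0, 9.0], [2.0, 0.0, 0.6, 30.0, 20.0]],   # away side
]
print(arena_winners(sample))
```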


get_webgl_renderer()

Return which WebGL renderer is being used by the game

Return type


async print_team_stats(team_stats=None)

Reads and prints the team stats from the last episode

async reload()

Reload the game

async run_episode()

Run a single episode

Shorthand for:

try:
  await app.episode_reset()
  while not (await app.episode_step()):
    pass
except Exception as e:
  ...  # the handler body is not shown here

async run_episodes_loop()

Runs episodes in a loop until agents disconnect

async run_session(**kwargs)

Creates a session, connect hosts and runs episodes loop.

See create_session() for args.

This is just a shorthand for:

  await self.create_session(**kwargs)
  await self.connect_to_agent_hosts()
  await self.run_episodes_loop()

property running

Returns true if the app is still running

async start()

Start the application

async update_away_team_config(config)

Update the away team’s configuration.

The session needs to be created first.


config – See Configuring your Derklings

async update_home_team_config(config)

Update the home team’s configuration.

The session needs to be created first.


config – See Configuring your Derklings

async update_reward_function(reward_function)

Update the reward function.

The session needs to be created first.


reward_function – See Reward function

class gym_derk.ObservationKeys(value)

An enumeration.

Hitpoints = 0
Ability0Ready = 1
FriendStatueDistance = 2
FriendStatueAngle = 3
Friend1Distance = 4
Friend1Angle = 5
Friend2Distance = 6
Friend2Angle = 7
EnemyStatueDistance = 8
EnemyStatueAngle = 9
Enemy1Distance = 10
Enemy1Angle = 11
Enemy2Distance = 12
Enemy2Angle = 13
Enemy3Distance = 14
Enemy3Angle = 15
HasFocus = 16
FocusRelativeRotation = 17
FocusFacingUs = 18
FocusFocusingBack = 19
FocusHitpoints = 20
Ability1Ready = 21
Ability2Ready = 22
FocusDazed = 23
FocusCrippled = 24
HeightFront1 = 25
HeightFront5 = 26
HeightBack2 = 27
PositionLeftRight = 28
PositionUpDown = 29
Stuck = 30
UnusedSense31 = 31
HasTalons = 32
HasBloodClaws = 33
HasCleavers = 34
HasCripplers = 35
HasHealingGland = 36
HasVampireGland = 37
HasFrogLegs = 38
HasPistol = 39
HasMagnum = 40
HasBlaster = 41
HasParalyzingDart = 42
HasIronBubblegum = 43
HasHeliumBubblegum = 44
HasShell = 45
HasTrombone = 46
FocusHasTalons = 47
FocusHasBloodClaws = 48
FocusHasCleavers = 49
FocusHasCripplers = 50
FocusHasHealingGland = 51
FocusHasVampireGland = 52
FocusHasFrogLegs = 53
FocusHasPistol = 54
FocusHasMagnum = 55
FocusHasBlaster = 56
FocusHasParalyzingDart = 57
FocusHasIronBubblegum = 58
FocusHasHeliumBubblegum = 59
FocusHasShell = 60
FocusHasTrombone = 61
UnusedExtraSense30 = 62
UnusedExtraSense31 = 63
class gym_derk.ActionKeys(value)

These are the actions a Derkling can take, which you send to the step function.

MoveX = 0

A number between -1 and 1. This controls forward/backward movement of the Derkling.

Rotate = 1

A number between -1 and 1. This controls the rotation of the Derkling. A value of -1 means turn left at full speed.

ChaseFocus = 2

A number between 0 and 1. If this is 1, the MoveX and Rotate actions are ignored and instead the Derkling runs towards its current focus. Numbers between 0 and 1 interpolate between this behavior and the MoveX/Rotate actions, and 0 means only MoveX and Rotate are used.

CastingSlot = 3

0=don’t cast. 1-3=cast corresponding ability.

ChangeFocus = 4

0=keep current focus. 1=focus home statue. 2-3=focus teammates, 4=focus enemy statue, 5-7=focus enemy
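
The value ranges listed above can be enforced before an action is sent to step. This is a minimal sketch: make_action and clamp are hypothetical helpers, not part of gym-derk, and the output order follows the ActionKeys indices.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))

def make_action(move_x, rotate, chase_focus, casting_slot, change_focus):
    """Build one action in ActionKeys order, clamped to the documented ranges."""
    return [
        clamp(float(move_x), -1.0, 1.0),      # MoveX
        clamp(float(rotate), -1.0, 1.0),      # Rotate
        clamp(float(chase_focus), 0.0, 1.0),  # ChaseFocus
        clamp(int(casting_slot), 0, 3),       # CastingSlot: 0=don't cast, 1-3=ability
        clamp(int(change_focus), 0, 7),       # ChangeFocus: 0=keep, 1-7=targets
    ]

action = make_action(1.7, -0.25, 0.5, 2, 9)
print(action)   # [1.0, -0.25, 0.5, 2, 7]
```

Clamping is one possible policy; rejecting out-of-range values with an error would be an equally reasonable choice.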

class gym_derk.TeamStatsKeys(value)

An enumeration.

Reward = 0
OpponentReward = 1
Hitpoints = 2
AliveTime = 3
CumulativeHitpoints = 4
gym_derk.run_derk_agent_server_in_background(handle_session, **kwargs)

Launch a DerkAgentServer in a background thread

Accepts the same arguments as gym_derk.DerkAgentServer

Reward function

The reward function is based on the OpenAI Five reward function. These are the possible fields:


[Table of reward fields with a field name, a default value and notes for each. The field names and default values are missing here. The notes that remain are:]

  • Per hitpoint (ten fields, one of which applies to the team’s own statue)

  • Every 5 seconds (four fields)

  • If this is 1, it means all rewards are averaged between teammates

  • This is a linear falloff with time for reward; 0 means no reward at all at the last step
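
The last two notes above describe reward averaging between teammates and a linear falloff over time. The sketch below shows one plausible reading of each. The function and parameter names (team_spirit, time_scaling) and both formulas are assumptions for illustration, not the game's actual implementation.

```python
def apply_team_spirit(rewards, team_spirit):
    """Blend each teammate's reward with the team mean.

    team_spirit=0 keeps individual rewards; team_spirit=1 gives everyone the mean.
    """
    mean = sum(rewards) / len(rewards)
    return [(1.0 - team_spirit) * r + team_spirit * mean for r in rewards]

def time_scale(step, last_step, time_scaling):
    """Linear falloff factor: 1 at step 0, time_scaling at the last step."""
    progress = step / last_step
    return 1.0 - (1.0 - time_scaling) * progress

print(apply_team_spirit([3.0, 0.0, 0.0], 1.0))   # every teammate gets the mean
print(time_scale(100, 100, 0.0))                 # no reward at the last step
```

Under this reading, a reward earned at a given step would be multiplied by time_scale before being shared between teammates.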


Items

By default, a Derkling gets a random loadout assigned. Each slot has a 70% chance to be filled, which means there’s a 34% chance of three items, 44% chance of two items, 19% chance of one item and 3% chance of no items.
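
The percentages above follow from a binomial distribution over three independent slots, each filled with probability 0.7. A short sketch that reproduces them (the function name is hypothetical):

```python
from math import comb

def loadout_distribution(n_slots=3, fill_chance=0.7):
    """Probability of ending up with k items, for k = 0 .. n_slots."""
    return [
        comb(n_slots, k) * fill_chance ** k * (1 - fill_chance) ** (n_slots - k)
        for k in range(n_slots + 1)
    ]

probabilities = loadout_distribution()
percentages = [round(100 * p) for p in probabilities]
print(percentages)   # [3, 19, 44, 34] for 0, 1, 2 and 3 items
```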






  • Melee item dealing good, steady damage to a target.

  • Damage dealing melee item that also heals the equipper with each hit.

  • Heavy and powerful, but slow hitting melee item.

  • Melee item that also cripples the opponent, making them move slower.

  • Ranged weapon. Pew pew!

  • Heavy ranged weapon that knocks the target back.

  • Heavy ranged weapon that deals massive damage.

  • Long strong legs, enabling the Derkling to quickly jump forward.

  • Blows an iron-enforced bubble around a target, protecting them from damage.

  • Blows a bubble filled with helium around a target, making them float up into the air.

  • Increases the armor of a Derkling. Armor is further increased when they duck.

  • When the horn is blown, all enemies are forced to focus on the musician.

  • Siphons hitpoints to the target.

  • Drains a target of hitpoints and restores the caster’s hitpoints.

  • Launches a projectile at a target, dazing them for a short moment.


Changelog

  • 0.15.1: Make it possible to display the AICrowd logo in-game

  • 0.15.0

    • Remove “Points”; we only have Reward now

    • Update default reward

    • Update team_stats_keys; remove “Gold”, “Points” and “OpponentPoints” and add “Reward” and “OpponentReward”

    • Winner is now based on the team with the highest reward

  • 0.14.3: Prevent Derklings from moving too far off camera

  • 0.14.2: Tweak camera to make more of the map visible

  • 0.14.1: Configurable window size

  • 0.14.0: Improve Derkling configuration documentation, and change Derkling bounties field name to rewardFunction.

  • 0.13.2: Fix two memory leaks

  • 0.13.1: Fix bug that prevented DerkEnv from starting since 0.13.0

  • 0.13.0

  • 0.12.4:

  • 0.12.3:

    • Fix argument bug to run_derk_agent_server_in_background

  • 0.12.2:

    • Add run_derk_agent_server_in_background convenience method

  • 0.12.1:

    • Re-added a couple of convenience arguments to DerkEnv: n_arenas, reward_function, turbo_mode, home_team and away_team. These simply get added to the session_args argument.

  • 0.12.0:

    • Rename DerkAppInstance.async_init_browser to DerkAppInstance.start, which the user now needs to call to start the app

    • If the app closes for any reason, the DerkEnv is now able to restart it on reset

  • 0.11.1:

    • Add args to DerkAgentServer which are passed to the session runner

  • 0.11.0:

    • This version breaks the API into two parts; a high-level DerkEnv, suitable for working in for instance a notebook environment, and a low-level API with DerkAgentServer, DerkSession and DerkAppInstance.

    • The arguments to DerkEnv have changed and are now just three dict args that get passed down to the low-level API. To set for instance n_arenas and app_host, it would look like this now: DerkEnv(session_args={ 'n_arenas': 10 }, app_args={ 'app_host': 'http://localhost:3000' })

  • 0.10.0:

    • Added env.action_keys and env.observation_keys

    • Removed env.n_actions; use len(env.action_keys) instead.

    • Removed env.n_senses; use len(env.observation_keys) instead.

  • 0.9.0:

    • Started keeping a changelog

    • The connected_host argument was replaced with a connected_envs argument, and documentation added on how to specify it
