.. gym-derk documentation master file, created by
   sphinx-quickstart on Mon Aug 17 09:52:51 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. include:: title.rst

This is the documentation for gym-derk, a Python package that exposes the game
"Dr. Derk's Mutant Battlegrounds" as an OpenAI Gym environment.

Main website: https://gym.derkgame.com

Please get a license on the website if you're using this in a commercial or
academic context.

Installing: ``pip install gym-derk`` (see :ref:`installing` for details)

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Examples
========

Basic example
-------------

In this example the Derklings just take random actions:

.. literalinclude:: ../examples/random_actions.py

Neural network example
----------------------

This is an example of how to use a genetic algorithm to train a single-layer
neural network.

.. literalinclude:: ../examples/neural_network.py

Environment
===========

Environment details
-------------------

This is a MOBA-inspired RL environment, where two teams battle each other
while trying to defend their own "statue". Each team is composed of three
units, and each unit gets a random loadout (see :ref:`items` for available
items). The goal is to attack the opponent's statue and units while defending
your own. With the default reward function, you get one point for killing an
enemy creature and four points for killing an enemy statue.

Arenas and parallelism
----------------------

The environment is designed to run multiple game instances in parallel on the
GPU. Each game instance is called an `arena`. Functions such as `step` and
`reset` accept and return values for multiple arenas at once. Thanks to this
functionality, it's possible to collect a large amount of experience very
quickly.

.. _reading-stats:

Team and episode stats
----------------------

There are a number of statistics you can access about teams.
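To make the ``[team, key]`` indexing pattern concrete without a live
environment, here is a self-contained sketch using a mock NumPy array in place
of ``env.team_stats`` and a mock ``TeamStatsKeys`` enum. Only the ``Reward``
key and the indexing pattern are taken from this page; the member's index
value and the sample numbers are made up, so refer to
:class:`gym_derk.TeamStatsKeys` for the real keys.

```python
from enum import Enum

import numpy as np


class TeamStatsKeys(Enum):
    # Illustrative stand-in for gym_derk.TeamStatsKeys; the index value
    # 0 is hypothetical, only the Reward key name comes from these docs.
    Reward = 0


# Stand-in for env.team_stats: one row per team, one column per stat key.
team_stats = np.array([
    [3.0],  # team 0
    [7.0],  # team 1
    [5.0],  # team 2
])

# Reward of the third team (teams are zero-indexed):
third_team_reward = team_stats[2, TeamStatsKeys.Reward.value]
print(third_team_reward)  # 5.0
```

With the real environment, the same expression is used against the live
``team_stats`` attribute described below.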
Use :attr:`gym_derk.envs.DerkEnv.team_stats` or
:attr:`gym_derk.DerkSession.team_stats` to get the data. See
:class:`gym_derk.TeamStatsKeys` for available keys. For example, to read the
reward of the third team:

.. code-block:: python

   env.team_stats[2, TeamStatsKeys.Reward.value]

You can also get stats for all arenas in an episode with
:attr:`gym_derk.envs.DerkEnv.episode_stats` and
:attr:`gym_derk.DerkSession.episode_stats`.

.. _connected_mode:

Running against other agents / Benchmarking
-------------------------------------------

First, we need an agent to run; you can try for instance
https://github.com/MountRouke/Randos. Clone the repo and start the agents with
``python main.py --server``. This starts a websocket server for that agent.

Next, set ``mode="connected"`` in your own agent's environment. The
environment will connect with a websocket to the server running locally (by
default), and the away team will now be controlled by the server. You can now
train against these agents, or, if you wish to benchmark against them, you can
look at ``episode_stats`` at the end of an episode to see how your agents
performed against the opponents.

.. _installing:

Installation & running: OS-specific instructions
------------------------------------------------

The Derk environment is implemented as a WebGL2 web app and runs in a chromium
instance through `pyppeteer `_. This means that you can get the environment
working anywhere you can get chromium with WebGL2 working.

On desktop systems (Windows, OSX, desktop Linux), using the Derk environment
is fairly straightforward; just run ``pip install gym-derk``. If you get any
errors, make sure that WebGL2 works on your system; you can verify this by
visiting `WebGL2 report `_. Unfortunately it's not possible to run the
environment in a headless mode, since chromium doesn't support GPU
acceleration in headless mode yet (`see this issue `_).
On a server environment, or if you're using Docker, there are two main ways to
run the environment. The easiest is to use xvfb, which usually means the
environment will run on the CPU. See
https://github.com/MountRouke/DerkAppInstanceDockerized for a Debian-based
Docker image, and
https://github.com/MountRouke/DerkAppInstanceDockerized/tree/ubuntu for an
Ubuntu-based image, both using xvfb.

To utilize GPU acceleration on a server or in Docker, you'll need to use
`virtualgl `_. VirtualGL can be a bit tricky to set up, but there are Docker
images with it that could serve as a base.

The environment can also be run on Google Colab. See the `Derk Colab GPU
example `_ (virtualgl-based) or the `Derk Colab CPU example `_ (xvfb-based).

Finally, it's also possible to set up an agent as a server, *without running
the environment*. This makes it possible to set up a trained agent as a
service which you can connect to. See https://github.com/MountRouke/Randos for
an example of how to do this. This is for instance useful for running a
competition, where participants submit Dockerized images with their agents,
but where the actual environment runs outside of their images.

Competition (AICrowd)
---------------------

We're partnering with AICrowd to run a competition for Derk, where you can
submit your agents to see how well they perform compared to other
participants' agents. The API is free to use for the competition.

Competition page (with starter kit and submission guidelines):
https://aicrowd.com/derk

.. _creature-config:

Configuring your Derklings
--------------------------

You can configure a number of attributes on your Derklings, such as their
appearance and their loadout. The configuration is read modulo its length, so
you can specify 1, 3 or ``n_arenas * 3`` configurations (or any other number)
depending on how you want it repeated.

Here's a basic example:
.. code-block:: python

   env = DerkEnv(
     home_team=[
       { 'primaryColor': '#ff00ff' },
       { 'primaryColor': '#00ff00', 'slots': ['Talons', None, None] },
       { 'primaryColor': '#ff0000', 'rewardFunction': { 'healTeammate1': 1 } }
     ]
   )

The properties you can configure for a Derkling are:

- Cosmetics:

  - ``primaryColor``: A hex color, e.g. ``#ff00ff``
  - ``secondaryColor``: Also a hex color
  - ``ears``: Integer between 1-4
  - ``eyes``: Integer between 1-5
  - ``backSpikes``: Integer between 1-7

- ``slots``: An array with exactly 3 items. Each item is a weapon/attachment
  slot. The first is the arms attachment, the second the tail attachment and
  the third the misc attachment. See :ref:`items` for available items.
- ``rewardFunction``: A specific reward function for this Derkling. See
  :ref:`reward-function`.

Citation
--------

Please use this BibTeX entry to cite this environment in your publications:

.. code-block:: bibtex

   @misc{gym_derk,
     author = {John Fredrik Wilhelm Norén},
     title = {Derk Gym Environment},
     year = {2020},
     publisher = {Mount Rouke},
     journal = {Mount Rouke},
     howpublished = {\url{https://gym.derkgame.com}},
   }

API reference
=============

High-level API
--------------

The high-level API provides a simple, OpenAI Gym compatible ``DerkEnv`` class
which is suitable for a Python notebook environment.

.. autoclass:: gym_derk.envs.DerkEnv
   :members:

Low-level API
-------------

The low-level API is more versatile and makes it possible to do things like
setting up an agent as a service, or running many different agents together,
even if they are running on completely different machines. Here's an example
of how it works:

.. literalinclude:: ../examples/multi_agent.py

.. autoclass:: gym_derk.DerkAgentServer
   :members:

.. autoclass:: gym_derk.DerkSession
   :members:

.. autoclass:: gym_derk.DerkAppInstance
   :members:

.. autoclass:: gym_derk.ObservationKeys
   :members:
   :undoc-members:
   :member-order: bysource

.. autoclass:: gym_derk.ActionKeys
   :members:
   :member-order: bysource
.. autoclass:: gym_derk.TeamStatsKeys
   :members:
   :undoc-members:
   :member-order: bysource

.. autofunction:: gym_derk.run_derk_agent_server_in_background

.. _reward-function:

Reward function
---------------

The reward function is based on the OpenAI Five reward function
(https://gist.github.com/dfarhi/66ec9d760ae0c49a5c492c9fae93984a). These are
the possible fields:

====================== ============= =====
Field                  Default value Notes
====================== ============= =====
damageEnemyStatue      0             Per hitpoint
damageEnemyUnit        0             Per hitpoint
killEnemyStatue        4
killEnemyUnit          1
healFriendlyStatue     0             Per hitpoint
healTeammate1          0             Per hitpoint
healTeammate2          0             Per hitpoint
timeSpentHomeBase      0             Every 5 seconds
timeSpentHomeTerritory 0             Every 5 seconds
timeSpentAwayTerritory 0             Every 5 seconds
timeSpentAwayBase      0             Every 5 seconds
damageTaken            0             Per hitpoint
friendlyFire           0             Per hitpoint
healEnemy              0             Per hitpoint
fallDamageTaken        0             Per hitpoint
statueDamageTaken      0             Per hitpoint (the team's own statue)
teamSpirit             0             If this is 1, all rewards are averaged between teammates
timeScaling            1             A linear falloff of reward with time; 0 means no reward at all at the last step
====================== ============= =====

.. _items:

Items
-----

By default, a Derkling gets a random loadout assigned. Each slot has a 70%
chance of being filled, which means there's a 34% chance of three items, a 44%
chance of two items, a 19% chance of one item and a 3% chance of no items.

.. include:: items.rst

.. include:: changelog.rst

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`