
Commit ac81d00

Merge pull request #324 from cpnota/develop: v0.9.1

2 parents (9ce894f + f8073e5); commit ac81d00

File tree: 190 files changed, +5014 −3118 lines


.github/workflows/python-package.yml

Lines changed: 3 additions & 4 deletions

@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.8, 3.9]
+        python-version: [3.8, 3.11]
 
     steps:
     - uses: actions/checkout@v2
@@ -25,9 +25,8 @@ jobs:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
       run: |
-        sudo apt-get install swig
-        sudo apt-get install unrar
-        pip install torch~=1.11 --extra-index-url https://download.pytorch.org/whl/cpu
+        python -m pip install --upgrade pip
+        pip install torch~=2.0 --extra-index-url https://download.pytorch.org/whl/cpu
         make install
     - name: Lint code
       run: |

.github/workflows/python-publish.yml

Lines changed: 18 additions & 17 deletions

@@ -1,33 +1,34 @@
 # This workflow will upload a Python Package using Twine when a release is created
-# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
+# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
 
 name: Upload Python Package
 
 on:
   release:
-    types: [created]
+    types: [published]
+
+permissions:
+  contents: read
 
 jobs:
   deploy:
-
     runs-on: ubuntu-latest
-
-    environment: deployment
-
+    environment: publish
+    permissions:
+      id-token: write
     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v3
     - name: Set up Python
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v3
       with:
-        python-version: '3.x'
+        python-version: 3.11
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install setuptools wheel twine
-    - name: Build and publish
-      env:
-        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
-        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
-      run: |
-        python setup.py sdist bdist_wheel
-        twine upload dist/*
+        pip install torch~=2.0 --extra-index-url https://download.pytorch.org/whl/cpu
+        pip install setuptools wheel
+        make install
+    - name: Build package
+      run: make build
+    - name: Publish package
+      uses: pypa/gh-action-pypi-publish@release/v1

.readthedocs.yml

Lines changed: 7 additions & 17 deletions

@@ -1,26 +1,16 @@
-# .readthedocs.yml
-# Read the Docs configuration file
-# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
-
-# Required
 version: 2
 
-# Build documentation in the docs/ directory with Sphinx
-sphinx:
-  configuration: docs/source/conf.py
-
-# Build documentation with MkDocs
-#mkdocs:
-#  configuration: mkdocs.yml
+build:
+  os: "ubuntu-22.04"
+  tools:
+    python: "3.11"
 
-# Optionally build your docs in additional formats such as PDF and ePub
-formats: all
-
-# Optionally set the version of Python and requirements required to build your docs
 python:
-  version: 3.7
   install:
     - method: pip
       path: .
       extra_requirements:
         - docs
+
+sphinx:
+  configuration: docs/source/conf.py

Makefile

Lines changed: 5 additions & 2 deletions

@@ -11,10 +11,13 @@ integration-test:
 	python -m unittest discover -s integration -p "*test.py"
 
 lint:
-	flake8 --ignore "E501,E731,E74,E402,F401,W503,E128" all
+	black --check all benchmarks examples integration setup.py
+	isort --profile black --check all benchmarks examples integration setup.py
+	flake8 --select "F401" all benchmarks examples integration setup.py
 
 format:
-	autopep8 --in-place --aggressive --aggressive --ignore "E501,E731,E74,E402,F401,W503,E128" -r all
+	black all benchmarks examples integration setup.py
+	isort --profile black all benchmarks examples integration setup.py
 
 tensorboard:
 	tensorboard --logdir runs

README.md

Lines changed: 4 additions & 3 deletions

@@ -21,10 +21,11 @@ Additionally, we provide an [example project](https://github.com/cpnota/all-exam
 
 ## High-Quality Reference Implementations
 
-The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms which can be adapted to new problems and environments, and `all.presets` which provides specific instansiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and PyBullet robotics simulations. Some benchmark results showing results on-par with published results can be found below:
+The `autonomous-learning-library` separates reinforcement learning agents into two modules: `all.agents`, which provides flexible, high-level implementations of many common algorithms which can be adapted to new problems and environments, and `all.presets` which provides specific instansiations of these agents tuned for particular sets of environments, including Atari games, classic control tasks, and MuJoCo/Pybullet robotics simulations. Some benchmark results showing results on-par with published results can be found below:
 
-![atari40](benchmarks/atari40.png)
-![pybullet](benchmarks/pybullet.png)
+![atari40](benchmarks/atari_40m.png)
+![atari40](benchmarks/mujoco_v4.png)
+![pybullet](benchmarks/pybullet_v0.png)
 
 As of today, `all` contains implementations of the following deep RL algorithms:
 
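The README paragraph changed above describes how `all.agents` and `all.presets` divide responsibilities. As a rough illustration of how the two halves are meant to be combined, here is a minimal sketch assuming the high-level experiment API the project documents elsewhere (`run_experiment`, `AtariEnvironment`, and the `atari` preset module); treat the exact names and arguments as illustrative rather than authoritative:

# Minimal usage sketch (assumed API; check the project docs for exact signatures).
from all.environments import AtariEnvironment
from all.experiments import run_experiment
from all.presets import atari

# A preset bundles a tuned agent configuration for a family of environments;
# run_experiment drives the training loop and logging.
run_experiment(
    agents=[atari.dqn()],
    envs=[AtariEnvironment("Breakout")],
    frames=1_000_000,
)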

all/__init__.py

Lines changed: 12 additions & 22 deletions

@@ -1,26 +1,16 @@
-import all.agents
-import all.approximation
-import all.core
-import all.environments
-import all.logging
-import all.memory
-import all.nn
-import all.optim
-import all.policies
-import all.presets
 from all.core import State, StateArray
 
 __all__ = [
-    'agents',
-    'approximation',
-    'core',
-    'environments',
-    'logging',
-    'memory',
-    'nn',
-    'optim',
-    'policies',
-    'presets',
-    'State',
-    'StateArray'
+    "agents",
+    "approximation",
+    "core",
+    "environments",
+    "logging",
+    "memory",
+    "nn",
+    "optim",
+    "policies",
+    "presets",
+    "State",
+    "StateArray",
 ]

all/agents/__init__.py

Lines changed: 0 additions & 1 deletion

@@ -15,7 +15,6 @@
 from .vqn import VQN, VQNTestAgent
 from .vsarsa import VSarsa, VSarsaTestAgent
 
-
 __all__ = [
     # Agent interfaces
     "Agent",

all/agents/_agent.py

Lines changed: 1 addition & 0 deletions

@@ -1,4 +1,5 @@
 from abc import ABC, abstractmethod
+
 from all.optim import Schedulable
 
 

all/agents/_multiagent.py

Lines changed: 1 addition & 0 deletions

@@ -1,4 +1,5 @@
 from abc import ABC, abstractmethod
+
 from all.optim import Schedulable
 
 

all/agents/_parallel_agent.py

Lines changed: 1 addition & 0 deletions

@@ -1,4 +1,5 @@
 from abc import ABC, abstractmethod
+
 from all.optim import Schedulable
 
 

all/agents/a2c.py

Lines changed: 17 additions & 14 deletions

@@ -1,7 +1,8 @@
-import torch
 from torch.nn.functional import mse_loss
+
 from all.logging import DummyLogger
 from all.memory import NStepAdvantageBuffer
+
 from ._agent import Agent
 from ._parallel_agent import ParallelAgent
 
@@ -28,15 +29,15 @@ class A2C(ParallelAgent):
     """
 
     def __init__(
-            self,
-            features,
-            v,
-            policy,
-            discount_factor=0.99,
-            entropy_loss_scaling=0.01,
-            n_envs=None,
-            n_steps=4,
-            logger=DummyLogger()
+        self,
+        features,
+        v,
+        policy,
+        discount_factor=0.99,
+        entropy_loss_scaling=0.01,
+        n_envs=None,
+        n_steps=4,
+        logger=DummyLogger(),
     ):
         if n_envs is None:
             raise RuntimeError("Must specify n_envs.")
@@ -80,7 +81,9 @@ def _train(self, next_states):
         value_loss = mse_loss(values, targets)
         policy_gradient_loss = -(distribution.log_prob(actions) * advantages).mean()
         entropy_loss = -distribution.entropy().mean()
-        policy_loss = policy_gradient_loss + self.entropy_loss_scaling * entropy_loss
+        policy_loss = (
+            policy_gradient_loss + self.entropy_loss_scaling * entropy_loss
+        )
         loss = value_loss + policy_loss
 
         # backward pass
@@ -90,16 +93,16 @@ def _train(self, next_states):
         self.features.step()
 
         # record metrics
-        self.logger.add_info('entropy', -entropy_loss)
-        self.logger.add_info('normalized_value_error', value_loss / targets.var())
+        self.logger.add_info("entropy", -entropy_loss)
+        self.logger.add_info("normalized_value_error", value_loss / targets.var())
 
     def _make_buffer(self):
         return NStepAdvantageBuffer(
             self.v,
             self.features,
             self.n_steps,
             self.n_envs,
-            discount_factor=self.discount_factor
+            discount_factor=self.discount_factor,
         )
 
 
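For readers skimming the reformatted loss code above: A2C's objective combines a value-regression term, an advantage-weighted policy-gradient term, and an entropy bonus scaled by `entropy_loss_scaling`. A self-contained toy sketch of that arithmetic (all tensors below are stand-ins, not the library's internals):

import torch
from torch.distributions import Categorical
from torch.nn.functional import mse_loss

# Stand-in tensors for one batch; in the agent these come from the feature,
# value, and policy networks and the n-step advantage buffer.
values = torch.randn(3, requires_grad=True)     # V(s) estimates
targets = torch.tensor([0.7, 0.1, 1.0])         # n-step return targets
advantages = (targets - values).detach()        # advantage estimates
logits = torch.randn(3, 4, requires_grad=True)  # policy logits over 4 actions
actions = torch.tensor([0, 2, 1])               # actions actually taken
entropy_loss_scaling = 0.01

distribution = Categorical(logits=logits)

# Same structure as the lines in the diff above.
value_loss = mse_loss(values, targets)
policy_gradient_loss = -(distribution.log_prob(actions) * advantages).mean()
entropy_loss = -distribution.entropy().mean()
policy_loss = policy_gradient_loss + entropy_loss_scaling * entropy_loss
loss = value_loss + policy_loss
loss.backward()  # gradients reach both the value and policy parameters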

all/agents/c51.py

Lines changed: 22 additions & 17 deletions

@@ -1,6 +1,8 @@
-import torch
 import numpy as np
+import torch
+
 from all.logging import DummyLogger
+
 from ._agent import Agent
 
 
@@ -26,16 +28,16 @@ class C51(Agent):
     """
 
     def __init__(
-            self,
-            q_dist,
-            replay_buffer,
-            discount_factor=0.99,
-            eps=1e-5,
-            exploration=0.02,
-            minibatch_size=32,
-            replay_start_size=5000,
-            update_frequency=1,
-            logger=DummyLogger(),
+        self,
+        q_dist,
+        replay_buffer,
+        discount_factor=0.99,
+        eps=1e-5,
+        exploration=0.02,
+        minibatch_size=32,
+        replay_start_size=5000,
+        update_frequency=1,
+        logger=DummyLogger(),
     ):
         # objects
         self.q_dist = q_dist
@@ -81,7 +83,9 @@ def _best_actions(self, probs):
     def _train(self):
         if self._should_train():
             # sample transitions from buffer
-            states, actions, rewards, next_states, weights = self.replay_buffer.sample(self.minibatch_size)
+            states, actions, rewards, next_states, weights = self.replay_buffer.sample(
+                self.minibatch_size
+            )
             # forward pass
             dist = self.q_dist(states, actions)
             # compute target distribution
@@ -100,14 +104,15 @@ def _train(self):
 
     def _should_train(self):
         self._frames_seen += 1
-        return self._frames_seen > self.replay_start_size and self._frames_seen % self.update_frequency == 0
+        return (
+            self._frames_seen > self.replay_start_size
+            and self._frames_seen % self.update_frequency == 0
+        )
 
     def _compute_target_dist(self, states, rewards):
         actions = self._best_actions(self.q_dist.no_grad(states))
         dist = self.q_dist.target(states, actions)
-        shifted_atoms = (
-            rewards.view((-1, 1)) + self.discount_factor * self.q_dist.atoms
-        )
+        shifted_atoms = rewards.view((-1, 1)) + self.discount_factor * self.q_dist.atoms
         return self.q_dist.project(dist, shifted_atoms)
 
     def _kl(self, dist, target_dist):
@@ -117,7 +122,7 @@ def _kl(self, dist, target_dist):
 
 
 class C51TestAgent(Agent):
-    def __init__(self, q_dist, n_actions, exploration=0.):
+    def __init__(self, q_dist, n_actions, exploration=0.0):
         self.q_dist = q_dist
         self.n_actions = n_actions
         self.exploration = exploration
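Two small pieces of C51 logic are touched by the reformatting above: the replay gating in `_should_train` and the shifted-atom computation in `_compute_target_dist`. A toy sketch of both, with made-up values (the 51-atom support below is a typical C51 choice, not necessarily this library's default):

import torch

# Replay gating, mirroring _should_train: train only after replay_start_size
# frames have been seen, and then only every update_frequency frames.
def should_train(frames_seen, replay_start_size=5000, update_frequency=4):
    return (
        frames_seen > replay_start_size
        and frames_seen % update_frequency == 0
    )

assert not should_train(10)     # still filling the replay buffer
assert should_train(5004)       # past warm-up, on an update boundary

# Shifted atoms, mirroring _compute_target_dist: broadcast each reward
# against the discounted fixed support z, giving r + gamma * z per sample.
rewards = torch.tensor([1.0, 0.0])
atoms = torch.linspace(-10.0, 10.0, 51)
discount_factor = 0.99
shifted_atoms = rewards.view((-1, 1)) + discount_factor * atoms
print(shifted_atoms.shape)  # torch.Size([2, 51])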
