Implementing the MADDPG Algorithm Step by Step with Baidu PaddlePaddle and PARL
jsdbzcm, posted 2020-07

AI Studio implementation: https://aistudio.baidu.com/aistudio/projectdetail/634944

How to run it locally:

1. First, clone the multiagent-particle-envs repository:
git clone git@github.com:openai/multiagent-particle-envs.git
Cloning into 'multiagent-particle-envs'...
remote: Enumerating objects: 234, done.
remote: Total 234 (delta 0), reused 0 (delta 0), pack-reused 234
Receiving objects: 100% (234/234), 100.83 KiB | 6.00 KiB/s, done.
Resolving deltas: 100% (127/127), done.

2. Then install it from inside the cloned directory:
pip install -e .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///F:/magic/git/multiagent-particle-envs
Requirement already satisfied: gym in d:\programdata\anaconda3\lib\site-packages (from multiagent==0.0.1) (0.17.2)
Collecting numpy-stl
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ef/08/2d8533798a08e1878120a1bf4970eb8ee50f6860cd50db917c9defe5dda2/numpy-stl-2.11.2.tar.gz (484 kB)
Requirement already satisfied: pyglet<=1.5.0,>=1.4.0 in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.5.0)
Requirement already satisfied: numpy>=1.10.4 in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.16.4)
Requirement already satisfied: scipy in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.2.1)
Requirement already satisfied: cloudpickle<1.4.0,>=1.2.0 in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.2.1)
Collecting python-utils>=1.6.2
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d9/ff/623dfa533f3277199957229f053fdb2c73a9c18048680e1899c9a5c95e6b/python_utils-2.4.0-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: future in d:\programdata\anaconda3\lib\site-packages (from pyglet<=1.5.0,>=1.4.0->gym->multiagent==0.0.1) (0.18.0)
Requirement already satisfied: six in d:\programdata\anaconda3\lib\site-packages (from python-utils>=1.6.2->numpy-stl->multiagent==0.0.1) (1.12.0)
Building wheels for collected packages: numpy-stl
Building wheel for numpy-stl (setup.py): started
Building wheel for numpy-stl (setup.py): finished with status 'done'
Created wheel for numpy-stl: filename=numpy_stl-2.11.2-py3-none-any.whl size=17634 sha256=17c74ea7b966600fea0159b3af85a143b89c687e4e93d52f9a855f2c30abb045
Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\30\9f\04\49b6630b2c10a5fff136a9de1c77935d370377e6b63e671ae6
Successfully built numpy-stl
Installing collected packages: python-utils, numpy-stl, multiagent
Running setup.py develop for multiagent
Successfully installed multiagent numpy-stl-2.11.2 python-utils-2.4.0

3. Running train.py then fails with an import error:
ImportError: cannot import name 'prng'
The installed gym turned out to be 0.17.2; recent gym releases no longer ship gym.spaces.prng, which multiagent-particle-envs still imports.
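Downgrading gym (step 4 below) is the fix used here. Purely as a hypothetical alternative, an untested sketch of a compatibility shim is shown below; it recreates the removed gym.spaces.prng module before multiagent is imported, and is not part of the original workflow:

import sys
import types

import numpy as np
import gym.spaces

# Untested sketch: recreate the module object that older gym versions exposed as
# gym.spaces.prng (a module holding an np_random RandomState).
prng = types.ModuleType("gym.spaces.prng")
prng.np_random = np.random.RandomState()
gym.spaces.prng = prng
sys.modules["gym.spaces.prng"] = prng

# Only import multiagent after the shim is registered.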
4. Downgrade gym to version 0.10.5:
 pip install gym==0.10.5
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting gym==0.10.5
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9b/50/ed4a03d2be47ffd043be2ee514f329ce45d98a30fe2d1b9c61dea5a9d861/gym-0.10.5.tar.gz (1.5 MB)
Requirement already satisfied: numpy>=1.10.4 in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (1.16.4)
Requirement already satisfied: requests>=2.0 in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (2.22.0)
Requirement already satisfied: six in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (1.12.0)
Requirement already satisfied: pyglet>=1.2.0 in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (1.5.0)
Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (1.25.6)
Requirement already satisfied: idna<2.9,>=2.5 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (3.0.4)
Requirement already satisfied: future in d:\programdata\anaconda3\lib\site-packages (from pyglet>=1.2.0->gym==0.10.5) (0.18.0)
Building wheels for collected packages: gym
Building wheel for gym (setup.py): started
Building wheel for gym (setup.py): finished with status 'done'
Created wheel for gym: filename=gym-0.10.5-py3-none-any.whl size=1581312 sha256=fb811ccccc4594d0f8dd39d3a33c4fc8dfb8dacf52db3671c9ec8b25207dc375
Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\5c\ef\aa\e0b69113808c1103383f11762afbe30fbf8094661d2eea0997
Successfully built gym
Installing collected packages: gym
Attempting uninstall: gym
Found existing installation: gym 0.17.2
Uninstalling gym-0.17.2:
Successfully uninstalled gym-0.17.2
Successfully installed gym-0.10.5
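
With a compatible gym in place, a quick sanity check that the environment can be built is shown below. This is a minimal sketch following the multiagent-particle-envs README; the scenario name is just an example, and train.py performs the equivalent setup itself:

import multiagent.scenarios as scenarios
from multiagent.environment import MultiAgentEnv

# Load a scenario module and build its world (pattern from the MPE README).
scenario = scenarios.load("simple_speaker_listener.py").Scenario()
world = scenario.make_world()

# Wrap the world as a gym-style multi-agent environment.
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)
print(env.n, env.observation_space, env.action_space)
obs_n = env.reset()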

5. Run train.py:
python train.py
[07-15 14:57:40 MainThread @logger.py:224] Argv: train.py
[07-15 14:57:41 MainThread @train.py:73] agent num: 2
[07-15 14:57:41 MainThread @train.py:74] observation_space: [Box(3,), Box(11,)]
[07-15 14:57:41 MainThread @train.py:75] action_space: [Discrete(3), Discrete(5)]
[07-15 14:57:41 MainThread @train.py:76] obs_shape_n: [3, 11]
[07-15 14:57:41 MainThread @train.py:77] act_shape_n: [3, 5]
[07-15 14:57:41 MainThread @train.py:80] agent 0 obs_low:[-inf -inf -inf] obs_high:[inf inf inf]
[07-15 14:57:41 MainThread @train.py:81] agent 0 act_n:3
[07-15 14:57:41 MainThread @train.py:80] agent 1 obs_low:[-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf] obs_high:[inf inf inf inf inf inf inf inf inf inf inf]
[07-15 14:57:41 MainThread @train.py:81] agent 1 act_n:5
[07-15 14:57:41 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:42 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:42 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:42 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:43 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:43 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:43 MainThread @train.py:131] Starting...
[07-15 14:57:43 MainThread @tensorboard.py:34] WRN [tensorboard] logdir is None, will save tensorboard files to train_log\train
View the data using: tensorboard --logdir=./train_log\train --host=192.168.1.18
[07-15 14:58:31 MainThread @train.py:156] Steps: 25000, Episodes: 1000, Mean episode reward: -145.28204924995373, Time: 48.015
6. An error occurs when saving the model:
Error: Cannot open .\./model/agent_0.ckpt to write at (D:\1.6.3\paddle\paddle/fluid/operators/save_combine_op.h:51)
After creating the model directory by hand, training runs successfully.
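
The same guard can also be added in code before the agents are saved; this is a minimal sketch assuming the './model' path shown in the error message above:

import os

# Make sure the checkpoint directory exists before saving
# (the './model' path is taken from the error message; adjust if yours differs).
os.makedirs('./model', exist_ok=True)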

python train.py
[07-15 15:11:32 MainThread @logger.py:224] Argv: train.py
[07-15 15:11:33 MainThread @train.py:73] agent num: 2
[07-15 15:11:33 MainThread @train.py:74] observation_space: [Box(3,), Box(11,)]
[07-15 15:11:33 MainThread @train.py:75] action_space: [Discrete(3), Discrete(5)]
[07-15 15:11:33 MainThread @train.py:76] obs_shape_n: [3, 11]
[07-15 15:11:33 MainThread @train.py:77] act_shape_n: [3, 5]
[07-15 15:11:33 MainThread @train.py:80] agent 0 obs_low:[-inf -inf -inf] obs_high:[inf inf inf]
[07-15 15:11:33 MainThread @train.py:81] agent 0 act_n:3
[07-15 15:11:33 MainThread @train.py:80] agent 1 obs_low:[-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf] obs_high:[inf inf inf inf inf inf inf inf inf inf inf]
[07-15 15:11:33 MainThread @train.py:81] agent 1 act_n:5
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:34 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:34 MainThread @train.py:131] Starting...
[07-15 15:11:34 MainThread @tensorboard.py:34] WRN [tensorboard] logdir is None, will save tensorboard files to train_log\train
View the data using: tensorboard --logdir=./train_log\train --host=192.168.1.18
[07-15 15:12:20 MainThread @train.py:156] Steps: 25000, Episodes: 1000, Mean episode reward: -139.45457618596407, Time: 46.62
[07-15 15:12:22 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
I0715 15:11:34.038677 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:11:34.038677 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:11:34.039677 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:11:34.039677 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:11:34.040678 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:11:34.045678 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:11:34.045678 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:11:34.046679 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:11:34.047678 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:11:34.047678 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.928417 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.928417 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:21.929417 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:21.930418 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:21.930418 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.941417 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.941417 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:21.942417 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:21.943418 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:21.944417 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.953418 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.953418 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:21.954418 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:21.954418 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:21.955418 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.992420 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.992420 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:22.002421 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:22.008421 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:22.012421 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:22.232434 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:22.232434 7304 pa[07-15 15:12:22 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:13:25 MainThread @train.py:156] Steps: 50000, Episodes: 2000, Mean episode reward: -202.16885853410054, Time: 64.749
[07-15 15:14:29 MainThread @train.py:156] Steps: 75000, Episodes: 3000, Mean episode reward: -62.100308289830025, Time: 63.836
[07-15 15:15:32 MainThread @train.py:156] Steps: 100000, Episodes: 4000, Mean episode reward: -60.38213499056411, Time: 63.734
[07-15 15:16:36 MainThread @train.py:156] Steps: 125000, Episodes: 5000, Mean episode reward: -57.73754570420472, Time: 63.111
[07-15 15:17:39 MainThread @train.py:156] Steps: 150000, Episodes: 6000, Mean episode reward: -60.41700897470501, Time: 62.931

7. Running the different environments

Several scenarios are available:

simple、simple_adversary、simple_crypto、simple_push、simple_reference、simple_speaker_listener、simple_spread、simple_tag、simple_world_comm

python train.py  # by default this trains the simple_speaker_listener scenario

python train.py --env [ENV_NAME]  # trains a different scenario, for example:

python train.py --env simple_world_comm  # train the simple_world_comm scenario

python train.py --env [ENV_NAME] --show --restore  # restore a trained model and render it, for example:

python train.py --env simple_world_comm --show --restore  # watch the trained simple_world_comm agents

8. About the simple_world_comm scenario

There are six agents in total, four red and two green. The black circle is an impassable obstacle, the two blue circles are food, and the two large green circles are forests; once a green agent enters a forest, the red agents can no longer observe its position.
The green agents are rewarded for getting close to food; they are faster but fewer. The red agents have some ability to cooperate and are rewarded for blocking the good agents (good_agent); they are slower but more numerous.
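
To peek at how the scenario is put together, the sketch below builds it directly with the MPE API used in the sanity check above; the adversary attribute is an assumption based on how the particle scenarios typically tag their agents:

import multiagent.scenarios as scenarios

# Build the raw simple_world_comm world and inspect its entities.
scenario = scenarios.load("simple_world_comm.py").Scenario()
world = scenario.make_world()

print("agents:", len(world.agents))          # expect 6: 4 adversaries + 2 good agents
print("adversaries:", [getattr(a, "adversary", False) for a in world.agents])
print("landmarks:", len(world.landmarks))    # obstacle, food and forest entities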

9. Adjusting the training parameters

    parser.add_argument(
        '--max_episodes',
        type=int,
        default=600000,  # change this default to adjust how many episodes to train for
        help='stop condition: number of episodes')

    parser.add_argument(
        '--lr',
        type=float,
        default=1e-3,  # change this default to adjust the learning rate
        help='learning rate for Adam optimizer')

    parser.add_argument(
        '--stat_rate',
        type=int,
        default=1000,  # every 1000 episodes, save the model and report the mean reward
        help='statistical interval of save model or count reward')
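
Because these are ordinary argparse options, they can also be overridden on the command line instead of editing the defaults, for example (flag names taken from the snippet above):

python train.py --env simple_world_comm --max_episodes 100000 --lr 5e-4 --stat_rate 500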

10. After running the simple_world_comm scenario for a while, the four red agents have learned to chase and corner one green agent (with the other green agent luring the four red agents away), and the reward ends up fluctuating between 45 and 50. (GIFs cannot be posted here, so a still image is attached.)

 

#3 jsdbzcm replied on 2020-08:

https://aistudio.baidu.com/aistudio/projectdetail/634944

#2 wangwei8638 replied on 2020-07:

I suggest sharing the full project.
