Implementing the MADDPG algorithm step by step with Baidu PaddlePaddle and PARL


jsdbzcm, posted 2020-07, views: 5008, replies: 2


AI Studio implementation: https://aistudio.baidu.com/aistudio/projectdetail/634944

Local setup:

1. First, clone the environment
git clone git@github.com:openai/multiagent-particle-envs.git
Cloning into 'multiagent-particle-envs'...
remote: Enumerating objects: 234, done.
remote: Total 234 (delta 0), reused 0 (delta 0), pack-reused 234
Receiving objects: 100% (234/234), 100.83 KiB | 6.00 KiB/s, done.
Resolving deltas: 100% (127/127), done.

2. Then install the environment (run from inside the cloned multiagent-particle-envs directory)
pip install -e .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///F:/magic/git/multiagent-particle-envs
Requirement already satisfied: gym in d:\programdata\anaconda3\lib\site-packages (from multiagent==0.0.1) (0.17.2)
Collecting numpy-stl
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ef/08/2d8533798a08e1878120a1bf4970eb8ee50f6860cd50db917c9defe5dda2/numpy-stl-2.11.2.tar.gz (484 kB)
Requirement already satisfied: pyglet<=1.5.0,>=1.4.0 in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.5.0)
Requirement already satisfied: numpy>=1.10.4 in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.16.4)
Requirement already satisfied: scipy in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.2.1)
Requirement already satisfied: cloudpickle<1.4.0,>=1.2.0 in d:\programdata\anaconda3\lib\site-packages (from gym->multiagent==0.0.1) (1.2.1)
Collecting python-utils>=1.6.2
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d9/ff/623dfa533f3277199957229f053fdb2c73a9c18048680e1899c9a5c95e6b/python_utils-2.4.0-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: future in d:\programdata\anaconda3\lib\site-packages (from pyglet<=1.5.0,>=1.4.0->gym->multiagent==0.0.1) (0.18.0)
Requirement already satisfied: six in d:\programdata\anaconda3\lib\site-packages (from python-utils>=1.6.2->numpy-stl->multiagent==0.0.1) (1.12.0)
Building wheels for collected packages: numpy-stl
Building wheel for numpy-stl (setup.py): started
Building wheel for numpy-stl (setup.py): finished with status 'done'
Created wheel for numpy-stl: filename=numpy_stl-2.11.2-py3-none-any.whl size=17634 sha256=17c74ea7b966600fea0159b3af85a143b89c687e4e93d52f9a855f2c30abb045
Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\30\9f\04\49b6630b2c10a5fff136a9de1c77935d370377e6b63e671ae6
Successfully built numpy-stl
Installing collected packages: python-utils, numpy-stl, multiagent
Running setup.py develop for multiagent
Successfully installed multiagent numpy-stl-2.11.2 python-utils-2.4.0

3. Running train.py then fails with an error
ImportError: cannot import name 'prng'
The installed gym version turned out to be 0.17.2.
4. Install gym version 0.10.5
pip install gym==0.10.5
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting gym==0.10.5
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9b/50/ed4a03d2be47ffd043be2ee514f329ce45d98a30fe2d1b9c61dea5a9d861/gym-0.10.5.tar.gz (1.5 MB)
Requirement already satisfied: numpy>=1.10.4 in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (1.16.4)
Requirement already satisfied: requests>=2.0 in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (2.22.0)
Requirement already satisfied: six in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (1.12.0)
Requirement already satisfied: pyglet>=1.2.0 in d:\programdata\anaconda3\lib\site-packages (from gym==0.10.5) (1.5.0)
Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (1.25.6)
Requirement already satisfied: idna<2.9,>=2.5 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in d:\programdata\anaconda3\lib\site-packages (from requests>=2.0->gym==0.10.5) (3.0.4)
Requirement already satisfied: future in d:\programdata\anaconda3\lib\site-packages (from pyglet>=1.2.0->gym==0.10.5) (0.18.0)
Building wheels for collected packages: gym
Building wheel for gym (setup.py): started
Building wheel for gym (setup.py): finished with status 'done'
Created wheel for gym: filename=gym-0.10.5-py3-none-any.whl size=1581312 sha256=fb811ccccc4594d0f8dd39d3a33c4fc8dfb8dacf52db3671c9ec8b25207dc375
Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\5c\ef\aa\e0b69113808c1103383f11762afbe30fbf8094661d2eea0997
Successfully built gym
Installing collected packages: gym
Attempting uninstall: gym
Found existing installation: gym 0.17.2
Uninstalling gym-0.17.2:
Successfully uninstalled gym-0.17.2
Successfully installed gym-0.10.5
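
To catch this mismatch before launching training, a small version check can help. This is a minimal sketch: the helper names `version_tuple` and `gym_is_compatible` are hypothetical, and the "no newer than 0.10.5" threshold is an assumption, since the steps above only show that 0.17.2 fails and 0.10.5 works (versions in between were not tried).

```python
import re


def version_tuple(version):
    """Turn a dotted version string such as '0.17.2' into (0, 17, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def gym_is_compatible(installed, known_good="0.10.5"):
    """Assumption: anything newer than the version that worked above is suspect."""
    return version_tuple(installed) <= version_tuple(known_good)


if __name__ == "__main__":
    try:
        import gym
    except ImportError:
        print("gym is not installed")
    else:
        if gym_is_compatible(gym.__version__):
            print("gym", gym.__version__, "should be fine")
        else:
            print("gym", gym.__version__, "may fail; try: pip install gym==0.10.5")
```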

5. Run train.py
python train.py
[07-15 14:57:40 MainThread @logger.py:224] Argv: train.py
[07-15 14:57:41 MainThread @train.py:73] agent num: 2
[07-15 14:57:41 MainThread @train.py:74] observation_space: [Box(3,), Box(11,)]
[07-15 14:57:41 MainThread @train.py:75] action_space: [Discrete(3), Discrete(5)]
[07-15 14:57:41 MainThread @train.py:76] obs_shape_n: [3, 11]
[07-15 14:57:41 MainThread @train.py:77] act_shape_n: [3, 5]
[07-15 14:57:41 MainThread @train.py:80] agent 0 obs_low:[-inf -inf -inf] obs_high:[inf inf inf]
[07-15 14:57:41 MainThread @train.py:81] agent 0 act_n:3
[07-15 14:57:41 MainThread @train.py:80] agent 1 obs_low:[-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf] obs_high:[inf inf inf inf inf inf inf inf inf inf inf]
[07-15 14:57:41 MainThread @train.py:81] agent 1 act_n:5
[07-15 14:57:41 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:42 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:42 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:42 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:43 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:43 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 14:57:43 MainThread @train.py:131] Starting...
[07-15 14:57:43 MainThread @tensorboard.py:34] WRN [tensorboard] logdir is None, will save tensorboard files to train_log\train
View the data using: tensorboard --logdir=./train_log\train --host=192.168.1.18
[07-15 14:58:31 MainThread @train.py:156] Steps: 25000, Episodes: 1000, Mean episode reward: -145.28204924995373, Time: 48.015
6. An error occurs
Error: Cannot open .\./model/agent_0.ckpt to write at (D:\1.6.3\paddle\paddle/fluid/operators/save_combine_op.h:51)
After creating the model directory, the script runs.
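
Instead of creating the directory by hand, it can be created from Python before training starts. A minimal sketch, assuming the checkpoints go to `./model` as the error message suggests; the helper name `ensure_model_dir` is hypothetical and not part of train.py.

```python
import os


def ensure_model_dir(path="./model"):
    """Create the checkpoint directory if it is missing and return its path."""
    # exist_ok=True makes the call safe to repeat on every run.
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `ensure_model_dir()` once near the top of the training script should be enough to avoid the "Cannot open ... to write" error above.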

python train.py
[07-15 15:11:32 MainThread @logger.py:224] Argv: train.py
[07-15 15:11:33 MainThread @train.py:73] agent num: 2
[07-15 15:11:33 MainThread @train.py:74] observation_space: [Box(3,), Box(11,)]
[07-15 15:11:33 MainThread @train.py:75] action_space: [Discrete(3), Discrete(5)]
[07-15 15:11:33 MainThread @train.py:76] obs_shape_n: [3, 11]
[07-15 15:11:33 MainThread @train.py:77] act_shape_n: [3, 5]
[07-15 15:11:33 MainThread @train.py:80] agent 0 obs_low:[-inf -inf -inf] obs_high:[inf inf inf]
[07-15 15:11:33 MainThread @train.py:81] agent 0 act_n:3
[07-15 15:11:33 MainThread @train.py:80] agent 1 obs_low:[-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf] obs_high:[inf inf inf inf inf inf inf inf inf inf inf]
[07-15 15:11:33 MainThread @train.py:81] agent 1 act_n:5
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:33 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:34 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:11:34 MainThread @train.py:131] Starting...
[07-15 15:11:34 MainThread @tensorboard.py:34] WRN [tensorboard] logdir is None, will save tensorboard files to train_log\train
View the data using: tensorboard --logdir=./train_log\train --host=192.168.1.18
[07-15 15:12:20 MainThread @train.py:156] Steps: 25000, Episodes: 1000, Mean episode reward: -139.45457618596407, Time: 46.62
[07-15 15:12:22 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
I0715 15:11:34.038677 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:11:34.038677 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:11:34.039677 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:11:34.039677 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:11:34.040678 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:11:34.045678 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:11:34.045678 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:11:34.046679 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:11:34.047678 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:11:34.047678 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.928417 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.928417 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:21.929417 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:21.930418 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:21.930418 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.941417 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.941417 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:21.942417 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:21.943418 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:21.944417 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.953418 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.953418 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:21.954418 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:21.954418 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:21.955418 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:21.992420 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:21.992420 7304 parallel_executor.cc:421] The number of CPUPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0715 15:12:22.002421 7304 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0715 15:12:22.008421 7304 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0715 15:12:22.012421 7304 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
I0715 15:12:22.232434 7304 parallel_executor.cc:409] If you set build_strategy.reduce with 'Reduce',the number of places should be greater than 1.
I0715 15:12:22.232434 7304 pa[07-15 15:12:22 MainThread @machine_info.py:88] Cannot find available GPU devices, using CPU now.
[07-15 15:13:25 MainThread @train.py:156] Steps: 50000, Episodes: 2000, Mean episode reward: -202.16885853410054, Time: 64.749
[07-15 15:14:29 MainThread @train.py:156] Steps: 75000, Episodes: 3000, Mean episode reward: -62.100308289830025, Time: 63.836
[07-15 15:15:32 MainThread @train.py:156] Steps: 100000, Episodes: 4000, Mean episode reward: -60.38213499056411, Time: 63.734
[07-15 15:16:36 MainThread @train.py:156] Steps: 125000, Episodes: 5000, Mean episode reward: -57.73754570420472, Time: 63.111
[07-15 15:17:39 MainThread @train.py:156] Steps: 150000, Episodes: 6000, Mean episode reward: -60.41700897470501, Time: 62.931
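
The progress lines above are easy to pull out of the surrounding log noise if the reward curve is needed later. A rough sketch, assuming the line format stays exactly as printed in this log; `LOG_PATTERN` and `parse_progress` are hypothetical names.

```python
import re

# Matches the "Steps: ..., Episodes: ..., Mean episode reward: ..., Time: ..." lines.
LOG_PATTERN = re.compile(
    r"Steps: (\d+), Episodes: (\d+), "
    r"Mean episode reward: (-?\d+(?:\.\d+)?), Time: (\d+(?:\.\d+)?)"
)


def parse_progress(line):
    """Return the numbers from one progress line, or None for any other line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    steps, episodes, reward, seconds = match.groups()
    return {
        "steps": int(steps),
        "episodes": int(episodes),
        "mean_reward": float(reward),
        "seconds": float(seconds),
    }


sample = ("[07-15 15:14:29 MainThread @train.py:156] Steps: 75000, "
          "Episodes: 3000, Mean episode reward: -62.100308289830025, Time: 63.836")
print(parse_progress(sample))
```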

7. Running the environments

Several environments are available:

simple, simple_adversary, simple_crypto, simple_push, simple_reference, simple_speaker_listener, simple_spread, simple_tag, simple_world_comm

python train.py  # runs the simple_speaker_listener environment by default

python train.py --env [ENV_NAME] runs another environment, for example:

python train.py --env simple_world_comm  # runs the simple_world_comm environment

python train.py --env [ENV_NAME] --show --restore  # shows the trained result, for example:

python train.py --env simple_world_comm --show --restore  # shows the simple_world_comm result
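
When switching between environments often, a typo in the name is easy to make. The sketch below builds the command lines shown above and rejects unknown names. `ENV_NAMES` and `build_train_command` are hypothetical helpers; only the `--env`, `--show` and `--restore` flags come from the commands in this post.

```python
import sys

# Environment names as listed in this post.
ENV_NAMES = [
    "simple", "simple_adversary", "simple_crypto", "simple_push",
    "simple_reference", "simple_speaker_listener", "simple_spread",
    "simple_tag", "simple_world_comm",
]


def build_train_command(env_name, show=False, restore=False):
    """Build the argument list for one train.py run."""
    if env_name not in ENV_NAMES:
        raise ValueError("unknown environment: " + env_name)
    command = [sys.executable, "train.py", "--env", env_name]
    if show:
        command.append("--show")
    if restore:
        command.append("--restore")
    return command


print(build_train_command("simple_world_comm", show=True, restore=True))
```

The returned list can be handed to `subprocess.run(command, check=True)` from the directory that contains train.py.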

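8. About the simple_world_comm environment

There are 6 agents in total: 4 red and 2 green. The 1 black circle is an impassable obstacle, the 2 blue circles are food, and the 2 large green circles are forests; once a green agent enters a forest, the red agents can no longer obtain its position.
The green agents (the good agents) earn reward by getting close to the food; they are faster but fewer in number. The red agents can cooperate to some degree and earn reward by obstructing the good agents; they are slower but more numerous.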

9. Tuning the parameters

    parser.add_argument(
        '--max_episodes',
        type=int,
        default=600000,  # change this default to change the number of training episodes
        help='stop condition:number of episodes')

    parser.add_argument(
        '--lr',
        type=float,
        default=1e-3,  # change this default to change the learning rate
        help='learning rate for Adam optimizer')

    parser.add_argument(
        '--stat_rate',
        type=int,
        default=1000,  # every 1000 episodes, save the model and print the reward
        help='statistical interval of save model or count reward')
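
Because these are ordinary argparse options, the values can also be overridden on the command line without editing the defaults. The sketch below rebuilds just the three options above in a standalone parser to show this; `build_parser` is a hypothetical wrapper, and the values 5e-4 and 500 are only illustrative.

```python
import argparse


def build_parser():
    """Standalone parser with only the three options discussed above."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--max_episodes',
        type=int,
        default=600000,
        help='stop condition:number of episodes')
    parser.add_argument(
        '--lr',
        type=float,
        default=1e-3,
        help='learning rate for Adam optimizer')
    parser.add_argument(
        '--stat_rate',
        type=int,
        default=1000,
        help='statistical interval of save model or count reward')
    return parser


# Equivalent to: python train.py --lr 5e-4 --stat_rate 500
args = build_parser().parse_args(['--lr', '5e-4', '--stat_rate', '500'])
print(args.max_episodes, args.lr, args.stat_rate)
```

Options that are not passed keep their defaults, so `--max_episodes` stays at 600000 here.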

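10. After running the simple_world_comm environment for a while, the agents have learned a chasing behaviour: the 4 red agents pursue 1 green agent (seen the other way, 1 green agent lures the 4 red agents away). In the end the reward keeps fluctuating between 45 and 50. (I can't post a GIF, so here is a still image.)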


2 replies in total; last reply by a muted user, 2022-04

#3 jsdbzcm replied 2020-08

https://aistudio.baidu.com/aistudio/projectdetail/634944

#2 wangwei8638 replied 2020-07

I suggest sharing the project.
