Help: how do I fix the Paddle training problem shown in the details below?
星榕基 posted 2020-03 Views: 3303 Replies: 3

[2020-03-25 21:08:52,561] [ INFO] - Installing mobilenet_v2_imagenet module
[2020-03-25 21:08:53,152] [ INFO] - Module mobilenet_v2_imagenet already installed in /Users/handy/.paddlehub/modules/mobilenet_v2_imagenet
[2020-03-25 21:08:53,660] [ INFO] - 267 pretrained paramaters loaded by PaddleHub
[2020-03-25 21:08:53,661] [ INFO] - Dataset label map = {'roses': 0, 'daisy': 1}
[2020-03-25 21:08:53,661] [ INFO] - Checkpoint dir: ckpt_model
[2020-03-25 21:09:00,721] [ INFO] - Strategy with slanted triangle learning rate, L2 regularization,
/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddle/fluid/executor.py:804: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
warnings.warn(error_info)
[2020-03-25 21:09:00,763] [ INFO] - Try loading checkpoint from ckpt_model/ckpt.meta
[2020-03-25 21:09:00,763] [ INFO] - PaddleHub model checkpoint not found, start from scratch...
[2020-03-25 21:09:00,810] [ INFO] - PaddleHub finetune start
I0325 21:09:01.170374 272072128 parallel_executor.cc:440] The Program will be executed on CPU using ParallelExecutor, 2 cards are used, so 2 programs are executed in parallel.
W0325 21:09:01.280500 272072128 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 161. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 161.
I0325 21:09:01.301883 272072128 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0325 21:09:01.572834 272072128 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0325 21:09:01.647723 272072128 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
[2020-03-25 21:09:05,624] [ INFO] - Evaluation on dev dataset start
Traceback (most recent call last):
File "classify_task.py", line 27, in <module>
run_states = paddle_wrap.task.finetune_and_eval()
File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 864, in finetune_and_eval
return self.finetune(do_eval=True)
File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 893, in finetune
self.eval(phase="dev")
File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 923, in eval
self._eval_end_event(run_states)
File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 631, in hook_function
func(*args)
File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py", line 719, in _default_eval_end_event
eval_scores, eval_loss, run_speed = self._calculate_metrics(run_states)
File "/Users/handy/.virtualenvs/PaddleApp/lib/python3.7/site-packages/paddlehub/finetune/task/classifier_task.py", line 115, in _calculate_metrics
run_time_used = time.time() - run_states[0].run_time_begin
IndexError: list index out of range
libc++abi.dylib: terminating with uncaught exception of type paddle::platform::EnforceNotMet:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::__1::basic_string, std::__1::allocator > paddle::platform::GetTraceBackString, std::__1::allocator > const&>(std::__1::basic_string, std::__1::allocator > const&, char const*, int)
1 paddle::framework::RWLock::RDLock()
2 paddle::framework::Scope::HasKid(paddle::framework::Scope const*) const
3 paddle::framework::ParallelExecutorPrivate::~ParallelExecutorPrivate()
4 paddle::framework::ParallelExecutor::~ParallelExecutor()
5 pybind11::class_::dealloc(pybind11::detail::value_and_holder&)
6 pybind11::detail::clear_instance(_object*)
7 pybind11_object_dealloc

----------------------
Error Message Summary:
----------------------
Error: acquire read lock failed
[Hint: Expected pthread_rwlock_rdlock(&lock_) == 0, but received pthread_rwlock_rdlock(&lock_):22 != 0:0.] at (/home/teamcity/work/ef54dc8a5b211854/paddle/fluid/framework/rw_lock.h:36)

W0325 21:09:06.206892 272072128 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0325 21:09:06.206903 272072128 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0325 21:09:06.206907 272072128 init.cc:214] The detail failure signal is:

W0325 21:09:06.206910 272072128 init.cc:217] *** Aborted at 1585141746 (unix time) try "date -d @1585141746" if you are using GNU date ***
W0325 21:09:06.207233 272072128 init.cc:217] PC: @ 0x0 (unknown)
W0325 21:09:06.209056 272072128 init.cc:217] *** SIGABRT (@0x7fff682af7fa) received by PID 9529 (TID 0x110377dc0) stack trace: ***
W0325 21:09:06.209656 272072128 init.cc:217] @ 0x7fff6836142d _sigtramp
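One possible reading of the Python traceback above, not confirmed anywhere in this thread: `run_states[0]` raises `IndexError` only when `run_states` is empty, which would happen if the evaluation on the dev dataset finished without producing a single batch (for example, a dev set smaller than the batch size). The sketch below reproduces that situation in plain Python. `RunState`, `batches`, `evaluate`, and `elapsed` are hypothetical names for illustration, not PaddleHub APIs, and the drop-partial-batch behaviour is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunState:
    """Hypothetical stand-in for the per-batch state objects in the traceback."""
    run_time_begin: float = field(default_factory=time.time)


def batches(num_examples: int, batch_size: int, drop_last: bool = True) -> List[int]:
    """Return the sizes of the batches a simple reader would yield.

    Assumption: an incomplete final batch is dropped when drop_last is True.
    """
    full, rest = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)
    return sizes


def evaluate(num_examples: int, batch_size: int) -> List[RunState]:
    """Collect one RunState per evaluated batch (none if no batch is produced)."""
    return [RunState() for _ in batches(num_examples, batch_size)]


def elapsed(run_states: List[RunState]) -> float:
    """Same arithmetic as the failing line, with a guard for the empty case."""
    if not run_states:
        raise ValueError(
            "evaluation produced no batches; compare dev set size with batch size"
        )
    return time.time() - run_states[0].run_time_begin


if __name__ == "__main__":
    # 10 dev examples with batch size 32: no full batch, so the list is empty.
    print(len(evaluate(num_examples=10, batch_size=32)))
    # 64 dev examples with batch size 32: two batches, so indexing [0] is safe.
    print(len(evaluate(num_examples=64, batch_size=32)))
```

If this is what happened here, checking how many examples the dev split actually contains, or lowering the batch size, would be a cheap first test. The C++ "acquire read lock failed" abort that follows appears to be raised while the executor is being torn down after the Python exception, so it may be a consequence rather than the root cause.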

3 replies in total; last reply by Randcase in 2020-08
#4 Randcase replied 2020-08

What were you doing when you ran into this?

#3 550474936 replied 2020-07

I ran into a similar problem. Has it been solved?

#2 小马奔特 replied 2020-05

Try joining the Paddle technical support group and asking there, or file a support ticket.
