Debugging and Fixing a Deadlock When Calling Python Functions from a C++ Callback

Time: 2022-11-15 15:07:21

I. Finding the cause of the deadlock:

1. Start gdb on the executable (gdb <path to executable>) to get to the gdb command line, then type r to run the program.

gdb /home/sdhm/catkin_ws/devel/lib/gpd_ros/gpd_server
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later </licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
</software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
</software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/sdhm/catkin_ws/devel/lib/gpd_ros/gpd_server...done.
(gdb) r
Starting program: /home/sdhm/catkin_ws/devel/lib/gpd_ros/gpd_server
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd5c41700 (LWP 1837)]
[New Thread 0x7fffd5440700 (LWP 1838)]
[New Thread 0x7fffd4c3f700 (LWP 1839)]
[New Thread 0x7fffcffff700 (LWP 1844)]
[New Thread 0x7fffcc85d700 (LWP 1847)]
[New Thread 0x7fffbdfff700 (LWP 1848)]
[New Thread 0x7fffbd7fe700 (LWP 1849)]
[New Thread 0x7fffb8ffd700 (LWP 1850)]
[New Thread 0x7fffb67fc700 (LWP 1851)]
[New Thread 0x7fffb3ffb700 (LWP 1852)]
[New Thread 0x7fffb17fa700 (LWP 1853)]
[Thread 0x7fffb17fa700 (LWP 1853) exited]
[Thread 0x7fffb3ffb700 (LWP 1852) exited]
[Thread 0x7fffb67fc700 (LWP 1851) exited]
[Thread 0x7fffb8ffd700 (LWP 1850) exited]
[Thread 0x7fffbd7fe700 (LWP 1849) exited]
[Thread 0x7fffbdfff700 (LWP 1848) exited]
[Thread 0x7fffcc85d700 (LWP 1847) exited]
[New Thread 0x7fffb17fa700 (LWP 1874)]
[New Thread 0x7fffb3ffb700 (LWP 1875)]
[New Thread 0x7fffb67fc700 (LWP 1876)]
[New Thread 0x7fffb8ffd700 (LWP 1925)]
[New Thread 0x7fff006eb700 (LWP 1926)]
[New Thread 0x7ffeffeea700 (LWP 1927)]
[New Thread 0x7ffeff6e9700 (LWP 1928)]
[New Thread 0x7ffefeee8700 (LWP 1929)]
[New Thread 0x7ffefe6e7700 (LWP 1930)]
[New Thread 0x7ffefdee6700 (LWP 1931)]
[New Thread 0x7ffefd6e5700 (LWP 1933)]
[New Thread 0x7ffefcee4700 (LWP 1935)]
[New Thread 0x7ffed7fff700 (LWP 1936)]
[New Thread 0x7ffed77fe700 (LWP 1937)]
[New Thread 0x7ffed6ffd700 (LWP 1938)]
[New Thread 0x7ffed67fc700 (LWP 1939)]
[New Thread 0x7ffed5ffb700 (LWP 1940)]
[New Thread 0x7ffed57fa700 (LWP 1941)]
[New Thread 0x7ffed4ff9700 (LWP 1942)]
[New Thread 0x7ffeb767e700 (LWP 1943)]
[New Thread 0x7ffeb6e7d700 (LWP 1944)]
[New Thread 0x7ffeb667c700 (LWP 1945)]
[New Thread 0x7ffeb5e7b700 (LWP 1946)]
[New Thread 0x7ffeb567a700 (LWP 1948)]
[New Thread 0x7ffeb4e79700 (LWP 1949)]
[New Thread 0x7ffeaffff700 (LWP 1950)]
[New Thread 0x7ffeaf7fe700 (LWP 1951)]
[New Thread 0x7ffeaeffd700 (LWP 1952)]
[New Thread 0x7ffeae7fc700 (LWP 1953)]
[New Thread 0x7ffeadffb700 (LWP 2166)]
[New Thread 0x7ffead7fa700 (LWP 2167)]
[New Thread 0x7ffeacff9700 (LWP 2168)]
[New Thread 0x7ffea7fff700 (LWP 2169)]
[New Thread 0x7ffea77fe700 (LWP 2170)]
[New Thread 0x7ffea6ffd700 (LWP 2171)]
[New Thread 0x7ffea67fc700 (LWP 2172)]

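2. At this point the program is deadlocked: the process is still alive but makes no progress. Press Ctrl+C to interrupt it and return to the gdb prompt.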

^C
Thread 1 "gpd_server" received signal SIGINT, Interrupt.
0x00007ffff72fcc1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
84  ../sysdeps/unix/syscall-template.S: No such file or directory.

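3. Show the stack with info stack. This command only shows the stack of the thread gdb currently has selected, here the main thread, which is just sleeping inside ros::waitForShutdown() and therefore does not reveal the problem.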

(gdb) info stack
#0  0x00007ffff72fcc1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff647d96b in ros::ros_wallsleep(unsigned int, unsigned int) ()
   from /opt/ros/kinetic/lib/librostime.so
#2  0x00007ffff6bf6f77 in ros::waitForShutdown() () from /opt/ros/kinetic/lib/libroscpp.so
#3  0x00007ffff6c15671 in ros::MultiThreadedSpinner::spin(ros::CallbackQueue*) ()
   from /opt/ros/kinetic/lib/libroscpp.so
#4  0x0000000000430f30 in main (argc=1, argv=<optimized out>)
    at /home/sdhm/catkin_ws/src/gpd_ros/src/gpd_ros/gpd_server.cpp:132

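4. List all threads and their IDs with info threads. The thread marked with * is the one gdb currently has selected. After the interrupt every thread is stopped, so the unmarked threads are not automatically blocked or deadlocked; the Frame column shows where each one is waiting (epoll_wait, poll, pthread_cond_wait, a futex wait, and so on), and threads parked in lock or futex waits are the candidates to inspect.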

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7ffff7f33c40 (LWP 1767) "gpd_server" 0x00007ffff72fcc1d in nanosleep ()
    at ../sysdeps/unix/syscall-template.S:84
  2    Thread 0x7fffd5c41700 (LWP 1837) "gpd_server" 0x00007ffff10a3a13 in epoll_wait ()
    at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7fffd5440700 (LWP 1838) "gpd_server" 0x00007ffff109774d in poll ()
    at ../sysdeps/unix/syscall-template.S:84
  4    Thread 0x7fffd4c3f700 (LWP 1839) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  5    Thread 0x7fffcffff700 (LWP 1844) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  13   Thread 0x7fffb17fa700 (LWP 1874) "gpd_server" 0x00007ffff10a48c8 in accept4 (fd=17, addr=..., addr_len=0x7fffb17f9918, flags=524288)
    at ../sysdeps/unix/sysv/linux/accept4.c:40
  14   Thread 0x7fffb3ffb700 (LWP 1875) "gpd_server" 0x00007ffff109774d in poll ()
    at ../sysdeps/unix/syscall-template.S:84
  15   Thread 0x7fffb67fc700 (LWP 1876) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  16   Thread 0x7fffb8ffd700 (LWP 1925) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  17   Thread 0x7fff006eb700 (LWP 1926) "gpd_server" 0x00007ffff72fb827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x8b4030)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  18   Thread 0x7ffeffeea700 (LWP 1927) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  19   Thread 0x7ffeff6e9700 (LWP 1928) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
---Type <return> to continue, or q <return> to quit---return
  20   Thread 0x7ffefeee8700 (LWP 1929) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  21   Thread 0x7ffefe6e7700 (LWP 1930) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  22   Thread 0x7ffefdee6700 (LWP 1931) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  23   Thread 0x7ffefd6e5700 (LWP 1933) "gpd_server" pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  24   Thread 0x7ffefcee4700 (LWP 1935) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  25   Thread 0x7ffed7fff700 (LWP 1936) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  26   Thread 0x7ffed77fe700 (LWP 1937) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  27   Thread 0x7ffed6ffd700 (LWP 1938) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  28   Thread 0x7ffed67fc700 (LWP 1939) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  29   Thread 0x7ffed5ffb700 (LWP 1940) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  30   Thread 0x7ffed57fa700 (LWP 1941) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  31   Thread 0x7ffed4ff9700 (LWP 1942) "gpd_server" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  32   Thread 0x7ffeb767e700 (LWP 1943) "gpd_server" 0x00007ffff72fcc1d in nanosleep ()
    at ../sysdeps/unix/syscall-template.S:84
  33   Thread 0x7ffeb6e7d700 (LWP 1944) "gpd_server" 0x00007ffff158db4f in ?? ()
---Type <return> to continue, or q <return> to quit---quit
from /usr/lib
Quit

5. Run thread apply all bt. thread apply all makes gdb execute the given command in every thread, so with bt it prints the full backtrace of each thread.

(gdb) thread apply all bt

Thread 48 (Thread 0x7ffea67fc700 (LWP 2172)):
#0  0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff72f36ba in start_thread (arg=0x7ffea67fc700) at pthread_create.c:333
#3  0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 47 (Thread 0x7ffea6ffd700 (LWP 2171)):
#0  0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff72f36ba in start_thread (arg=0x7ffea6ffd700) at pthread_create.c:333
#3  0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 46 (Thread 0x7ffea77fe700 (LWP 2170)):
#0  0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff72f36ba in start_thread (arg=0x7ffea77fe700) at pthread_create.c:333
#3  0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 45 (Thread 0x7ffea7fff700 (LWP 2169)):
#0  0x00007ffff158db4f in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#1  0x00007ffff158b418 in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#2  0x00007ffff72f36ba in start_thread (arg=0x7ffea7fff700) at pthread_create.c:333
#3  0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
..........

6. Copy the output into a file and search it for threads whose stacks contain "lock".

Thread 17 (Thread 0x7fff006eb700 (LWP 1926)):
#0  0x00007ffff72fb827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x8b4030) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x8b4030, abstime=0x0) at sem_waitcommon.c:111
---Type <return> to continue, or q <return> to quit---
#2  0x00007ffff72fb8d4 in __new_sem_wait_slow (sem=0x8b4030, abstime=0x0)
    at sem_waitcommon.c:181
#3  0x00007ffff72fb97a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007ffff05d5028 in PyThread_acquire_lock ()
   from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#5  0x00007ffff05a9966 in PyEval_RestoreThread ()
   from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6  0x00007ffff0624cf6 in PyGILState_Ensure ()
   from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7  0x00007fffab08b47e in std::_Function_handler<void (void*), torch::utils::tensor_from_numpy(_object*)::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*) ()
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#8  0x00007fff70bba37c in c10::deleteInefficientStdFunctionContext(void*) ()
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libc10.so
#9  0x00007fff71471610 in at::TensorImpl::release_resources() ()
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libcaffe2.so
#10 0x00007fff6f55d04b in c10::intrusive_ptr<at::TensorImpl, at::UndefinedTensorImpl>::reset_() () from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch.so.1
#11 0x00007fff6f7d1e67 in torch::autograd::Variable::Impl::release_resources() ()
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch.so.1
#12 0x00007fffaad02b6b in c10::intrusive_ptr<at::TensorImpl, at::UndefinedTensorImpl>::reset_() () from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#13 0x00007fffab0854d0 in torch::utils::(anonymous namespace)::internal_new_from_data(at::Type const&, c10::optional<c10::Device>, _object*, bool, bool, bool) ()
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#14 0x00007fffab0872fd in torch::utils::legacy_new_from_data(at::Type const&, c10::optional<c10::Device>, _object*) ()
---Type <return> to continue, or q <return> to quit---
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#15 0x00007fffab087363 in torch::utils::(anonymous namespace)::legacy_new_from_sequence(at::Type const&, c10::optional<c10::Device>, _object*) ()
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#16 0x00007fffab0896a8 in torch::utils::legacy_tensor_ctor(at::Type const&, _object*, _object*) () from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#17 0x00007fffab06285a in torch::tensors::Tensor_new(_typeobject*, _object*, _object*) ()
   from /usr/local/lib/python2.7/dist-packages/torch/lib/libtorch_python.so
#18 0x00007ffff05c81b3 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#19 0x00007ffff06112b3 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#20 0x00007ffff05af39c in PyEval_EvalFrameEx ()
   from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#21 0x00007ffff06e811c in PyEval_EvalCodeEx ()
   from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#22 0x00007ffff063e3b0 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#23 0x00007ffff06112b3 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#24 0x00007ffff06e7547 in PyEval_CallObjectWithKeywords ()
   from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#25 0x00007ffff775cc1b in gpd::net::PythonClassifier::classifyPointsBatch(std::vector<std::unique_ptr<Eigen::Matrix<double, 3, -1, 0, 3, -1>, std::default_delete<Eigen::Matrix<double, 3, -1, 0, 3, -1> > >, std::allocator<std::unique_ptr<Eigen::Matrix<double, 3, -1, 0, 3, -1>, std::default_delete<Eigen::Matrix<double, 3, -1, 0, 3, -1> > > > > const&) ()
   from /usr/local/lib/libgpd_pointnet.so
#26 0x00007ffff7747c4f in gpd::GraspDetectorPointNet::detectGrasps(gpd::util::Cloud&) ()
   from /usr/local/lib/libgpd_pointnet.so
#27 0x0000000000432663 in GraspDetectionServer::run (this=this@entry=0x7fffffffcd90, loop_rate=loop_rate@entry=15)
---Type <return> to continue, or q <return> to quit---
    at /home/sdhm/catkin_ws/src/gpd_ros/src/gpd_ros/gpd_server.cpp:57
#28 0x00000000004328ef in GraspDetectionServer::detectGrasps (this=0x7fffffffcd90, req=..., res=...)
    at /home/sdhm/catkin_ws/src/gpd_ros/src/gpd_ros/gpd_server.cpp:72
#29 0x0000000000469b12 in boost::function2<bool, gpd_ros::detect_graspsRequest_<std::allocator<void> >&, gpd_ros::detect_graspsResponse_<std::allocator<void> >&>::operator() (a1=..., a0=..., this=0x205efe8) at /usr/include/boost/function/function_template.hpp:773
#30 ros::ServiceSpec<gpd_ros::detect_graspsRequest_<std::allocator<void> >, gpd_ros::detect_graspsResponse_<std::allocator<void> > >::call(boost::function<bool (gpd_ros::detect_graspsRequest_<std::allocator<void> >&, gpd_ros::detect_graspsResponse_<std::allocator<void> >&)> const&, ros::ServiceSpecCallParams<gpd_ros::detect_graspsRequest_<std::allocator<void> >, gpd_ros::detect_graspsResponse_<std::allocator<void> > >&) (params=<synthetic pointer>, cb=...)
    at /opt/ros/kinetic/include/ros/service_callback_helper.h:125
#31 ros::ServiceCallbackHelperT<ros::ServiceSpec<gpd_ros::detect_graspsRequest_<std::allocator<void> >, gpd_ros::detect_graspsResponse_<std::allocator<void> > > >::call (this=0x205efe0, params=...) at /opt/ros/kinetic/include/ros/service_callback_helper.h:182
#32 0x00007ffff6b5f501 in ros::ServiceCallback::call() ()
   from /opt/ros/kinetic/lib/libroscpp.so
#33 0x00007ffff6bb3838 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) ()
   from /opt/ros/kinetic/lib/libroscpp.so
#34 0x00007ffff6bb4074 in ros::CallbackQueue::callOne(ros::WallDuration) ()
   from /opt/ros/kinetic/lib/libroscpp.so
#35 0x00007ffff6c11265 in ros::AsyncSpinnerImpl::threadFunc() ()
   from /opt/ros/kinetic/lib/libroscpp.so
#36 0x00007fffeef395d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#37 0x00007ffff72f36ba in start_thread (arg=0x7fff006eb700) at pthread_create.c:333
#38 0x00007ffff10a341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

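Thread 17 is stuck in PyThread_acquire_lock, reached from PyGILState_Ensure (frames #4 to #6): the thread is waiting to acquire Python's GIL and never gets it, so the deadlock is on the Python side. Frames #24 to #35 show how it got there: a ROS spinner thread entered Python through PyEval_CallObjectWithKeywords in classifyPointsBatch.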
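Step 6 is a plain text search, so it is easy to script when the backtrace output is long. The following is a minimal sketch, assuming the output of thread apply all bt has been saved as text in the format shown above; the function name threads_mentioning is hypothetical and not part of gdb. It splits the text at lines beginning with "Thread " and returns the header of every thread whose frames contain a given substring.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of gdb): scan the saved text of
// "thread apply all bt" and return the header line of every thread
// whose backtrace contains `needle`.
std::vector<std::string> threads_mentioning(const std::string& bt_output,
                                            const std::string& needle) {
    std::vector<std::string> hits;
    std::istringstream in(bt_output);
    std::string line;
    std::string header;    // header of the thread section being read
    bool matched = false;  // whether that section contains `needle`

    // Record the current section if any of its lines matched.
    auto finish_section = [&] {
        if (!header.empty() && matched) hits.push_back(header);
    };

    while (std::getline(in, line)) {
        if (line.rfind("Thread ", 0) == 0) {
            // "Thread N (Thread 0x... (LWP ...)):" starts a new section.
            finish_section();
            header = line;
            matched = false;
        } else if (!header.empty() &&
                   line.find(needle) != std::string::npos) {
            matched = true;
        }
    }
    finish_section();  // the last section has no following header
    return hits;
}
```

A plain substring search for "lock" also matches names such as clock_nanosleep, so treat the result as a list of threads to inspect, not as proof of a deadlock. Searching for "PyThread_acquire_lock" instead narrows it to threads waiting on the Python GIL, such as Thread 17 above. Running set pagination off in gdb before thread apply all bt keeps the "---Type <return>..." prompts out of the saved text; they would not confuse this scan anyway, since they never start with "Thread ".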

II. Fixing the deadlock

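The problem occurs because the C++ callback calls a Python function without first acquiring Python's GIL. The callback runs on whichever thread invokes it, here a ROS spinner thread (ros::AsyncSpinnerImpl::threadFunc in frame #35) that was created by C++, not by Python. A thread that Python did not create must acquire the Global Interpreter Lock (GIL) before calling any Python API function; otherwise the behavior of the program is undefined.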

Solution: acquire the GIL before calling the Python function, and release it afterwards.

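// Callback function in C++, or a function called from inside the callback.
// pFunc and args are assumed to be prepared elsewhere; they are passed in here.
#include <Python.h>

void callback(PyObject *pFunc, PyObject *args) {
    static bool gil_init = false;  // note: this check is not thread-safe if callbacks run concurrently
    if (!gil_init) {  // make sure the GIL has been created, and only once
        PyEval_InitThreads();
        PyEval_SaveThread();  // release the GIL so that PyGILState_Ensure() below can acquire it
        gil_init = true;
    }
    // Acquire the GIL
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    // Prepare arguments, etc.
    // Call the Python function
    PyObject *pInstance = PyObject_CallObject(pFunc, args);
    // Other Python operations
    Py_XDECREF(pInstance);
    // Release the GIL; no Python API calls may follow
    PyGILState_Release(gstate);
}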
