julia
40e05c5d - fix cmdlineargs test hang (#59942)

Committed 59 days ago
Fix #59768

Running this a lot in a loop locally, it fails for me on the following test at cmdlineargs.jl:77:79:95 (aka `wait(p)`):

```
/home/vtjnash/julia1/usr/bin/julia --check-bounds=yes --startup-file=no --depwarn=error ./runtests.jl --buildroot=/home/vtjnash/julia1 cmdlineargs
└─ /home/vtjnash/julia1/usr/bin/julia -C native -J/home/vtjnash/julia1/usr/lib/julia/sys.so --depwarn=error --check-bounds=yes -g1 --startup-file=no --startup-file=no --color=no -e mutable struct RLimit cur::Int
```

It seems that the child very clearly must have seen the signal, and just as clearly chosen to ignore it.

```
(gdb) p jl_gc_disable_counter
$21 = 0
(gdb) p thread0_exit_count
$22 = 1
```

```
pid tcomm ppid pending blocked sigign sigcatch
1519170 (julia) 1517484 0 1000000010 1000000010000 1000000000000000000111011101010
1519172 (iou-sqp-1519170) 1517484 0 1111111111110111111111011111111 1000000010000 1000000000000000000111011101010
1519173 (julia) 1517484 0 100001000000010 1000000010000 1000000000000000000111011101010
1519190 (julia) 1517484 0 100001000000110 1000000010000 1000000000000000000111011101010
1517484 (julia) 1517483 0 100001000000110 1000000000000 1000000000000010000111011101010
1517488 (iou-sqp-1517484) 1517483 0 1111111111110111111111011111111 1000000000000 1000000000000010000111011101010
1517489 (julia) 1517483 0 100001000000110 1000000000000 1000000000000010000111011101010
1517500 (julia) 1517483 0 100001000000110 1000000000000 1000000000000010000111011101010
```

```
(gdb) thr apply all bt

Thread 4 (Thread 0x7ff6b007d640 (LWP 1519190) "julia"):

Thread 3 (Thread 0x7ff6b67cf640 (LWP 1519173) "julia"):

Thread 2 (Thread 0x7ff6c981e740 (LWP 1519172) "iou-sqp-1519170"):
Backtrace stopped: Cannot access memory at address 0x0

Thread 1 (Thread 0x7ff6c981e740 (LWP 1519170) "julia"):
```

And why did we ignore SIGQUIT?
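The `blocked`, `sigign`, and `sigcatch` columns in the table above appear to be per-thread signal bitmasks printed in binary. The C sketch below shows one way to read them. It assumes the Linux convention used for the masks in `/proc/[pid]/status`, where bit *k* (counting from the least significant bit at 0) stands for signal *k* + 1. It also assumes the leftmost digit is the most significant. `decode_sigmask` is a hypothetical helper, not part of Julia:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode a signal mask written as a string of binary
 * digits (most significant bit first) into the signal numbers it contains.
 * Assumes bit k (from the least significant end, starting at 0) means
 * signal k + 1, as in the Linux /proc/[pid]/status masks.
 * Writes at most cap signal numbers to out and returns how many it wrote. */
size_t decode_sigmask(const char *bits, int *out, size_t cap)
{
    assert(bits != NULL && out != NULL);
    size_t len = strlen(bits);
    size_t n = 0;
    /* Walk from the rightmost digit (bit 0) towards the leftmost one. */
    for (size_t k = 0; k < len && n < cap; k++) {
        if (bits[len - 1 - k] == '1')
            out[n++] = (int)k + 1;
    }
    return n;
}
```

Under these assumptions, the main thread's `blocked` mask `1000000010` decodes to signals 2 and 10. On Linux x86-64, those are SIGINT and SIGUSR1.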
Probably this:

```
==== Thread 1 created 2 live tasks
---- Root task (0x7f80a3fcc010) (sticky: 1, started: 1, state: 0, tid: 1)
 jl_start_fiber_swap at /home/vtjnash/julia1/src/task.c:1467
 ctx_switch at /home/vtjnash/julia1/src/task.c:652
 ijl_switch at /home/vtjnash/julia1/src/task.c:692
 try_yieldto at ./task.jl:1101
 yieldto at ./task.jl:1096
 yieldto at ./task.jl:1082 [inlined]
 wait at ./task.jl:1215
 wait at ./condition.jl:136 [inlined]
 _wait at ./task.jl:312
 jfptr_YY.start_profile_listenerYY.YY.2_14042 at /home/vtjnash/julia1/usr/lib/julia/sys.so (unknown line)
 _atexit at ./initdefs.jl:464
 jfptr__atexit_27874 at /home/vtjnash/julia1/usr/lib/julia/sys.so (unknown line)
 jl_apply at /home/vtjnash/julia1/src/julia.h:2275 [inlined]
 ijl_atexit_hook at /home/vtjnash/julia1/src/init.c:263
 jl_exit_thread0_cb at /home/vtjnash/julia1/src/signals-unix.c:557
---- End root task
---- Task 1 (0x7f80a3fcf1c0) (sticky: 1, started: 1, state: 0, tid: 1)
 jl_record_backtrace at /home/vtjnash/julia1/src/stackwalk.c:1290
 jl_fprint_backtracet at /home/vtjnash/julia1/src/stackwalk.c:1397
 jl_fprint_task_backtraces at /home/vtjnash/julia1/src/stackwalk.c:1457
 unknown function (ip: 0x7f809993683e) at (unknown file)
---- End task 1
==== End thread 1
==== Thread 2 created 2 live tasks
---- Task 1 (0x7f80a7fc8100) (sticky: 1, started: 1, state: 0, tid: 2)
 no backtrace recorded
---- End task 1
---- Task 2 (0x7f80a3fcc1f0) (sticky: 0, started: 1, state: 0, tid: 2)
 unknown function (ip: 0x7f80bbd702be) at /lib/x86_64-linux-gnu/libc.so.6
 pthread_mutex_lock at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
 dlsym at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
 ijl_dlsym at /home/vtjnash/julia1/src/dlload.c:445
 ijl_load_and_lookup at /home/vtjnash/julia1/src/runtime_ccall.cpp:62
 jlplt_ijl_eqtable_pop_6260 at /home/vtjnash/julia1/usr/lib/julia/sys.so (unknown line)
 pop! at ./iddict.jl:104 [inlined]
 pop! at ./iddict.jl:115 [inlined]
 unpreserve_handle at ./libuv.jl:76
 _trywait at ./asyncevent.jl:193
 profile_printing_listener at ./Base.jl:335
 jfptr_YY.start_profile_listenerYY.YY.0_12244 at /home/vtjnash/julia1/usr/lib/julia/sys.so (unknown line)
 jl_apply at /home/vtjnash/julia1/src/julia.h:2275 [inlined]
 start_task at /home/vtjnash/julia1/src/task.c:1281
---- End task 2
==== End thread 2
==== Done
```

We probably interrupted the child's main thread while it was holding some lock (most likely the one taken by `dlsym`). This causes a lock-ordering problem. The exit path on that thread then waits for `profile_printing_listener`, while still holding the pthread mutex that `profile_printing_listener` needs in order to continue and return. In the gdb session below, the mutex at `_rtld_global+2568` is owned by LWP 1509489 (thread 1, the main thread). Thread 4 is blocked in `futex_wait` on that same address:

```
128 ./nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) p mutex
$7 = (pthread_mutex_t *) 0x7f80bbf58a48 <_rtld_global+2568>
(gdb) p *mutex
$8 = {__data = {__lock = 2, __count = 1, __owner = 1509489, __nusers = 1, __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\001\000\000\000q\b\027\000\001\000\000\000\001", '\000' <repeats 22 times>, __align = 4294967298}
(gdb) info thr
  Id Target Id Frame
  1 Thread 0x7f80bbcb9740 (LWP 1509489) "julia" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55a138d5ae28) at ./nptl/futex-internal.c:57
  2 Thread 0x7f80bbcb9740 (LWP 1509491) "iou-sqp-1509489" 0x0000000000000000 in ?? ()
  3 Thread 0x7f80a8bcf640 (LWP 1509492) "julia" 0x00007f80bbd2221a in __GI___sigtimedwait (set=set@entry=0x7f80a8bce950, info=info@entry=0x7f80a8bce9d0, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:61
* 4 Thread 0x7f80a24fd640 (LWP 1509503) "julia" futex_wait (private=0, expected=2, futex_word=0x7f80bbf58a48 <_rtld_global+2568>) at ../sysdeps/nptl/futex-internal.h:146
```

n.b. There is a small chance this test will fail in the future with other errors (e.g. signal 11), but we haven't seen that yet.
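The lock-ordering cycle described above can be reproduced in a minimal C sketch, under some loud assumptions:

- `loader_lock`, `listener`, and `wait_for_listener` are hypothetical names.
- A POSIX thread and `pthread_join` stand in for the Julia task and the `wait` in `_atexit`.
- A plain mutex stands in for the loader lock. The gdb output reports `__kind = 1`, which glibc uses for recursive mutexes, but recursion does not matter across threads.
- The listener uses `pthread_mutex_timedlock` with a 200 ms timeout, so the demo reports the cycle instead of hanging.

It illustrates the failure mode only, not the change made in this commit:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the loader lock that dlsym() takes. */
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for profile_printing_listener: it needs the loader
 * lock to make progress. To keep the demo from hanging, it gives up after
 * 200 ms and stores the result of the lock attempt in *arg. */
static void *listener(void *arg)
{
    int *result = arg;
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline); /* timedlock uses CLOCK_REALTIME */
    deadline.tv_nsec += 200L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    *result = pthread_mutex_timedlock(&loader_lock, &deadline);
    if (*result == 0)
        pthread_mutex_unlock(&loader_lock);
    return NULL;
}

/* The "main thread" takes the loader lock, then (as if interrupted and sent
 * down the exit path) waits for the listener. If it still holds the lock
 * while waiting, the listener can never acquire it and ETIMEDOUT is
 * returned, standing in for the hang. If the lock is released before
 * waiting, the listener succeeds and 0 is returned. */
int wait_for_listener(int release_lock_first)
{
    int result = -1;
    pthread_t tid;

    pthread_mutex_lock(&loader_lock);       /* e.g. inside dlsym() */
    if (release_lock_first)
        pthread_mutex_unlock(&loader_lock); /* hold nothing while waiting */

    int rc = pthread_create(&tid, NULL, listener, &result);
    assert(rc == 0);
    (void)rc;
    pthread_join(tid, NULL);                /* wait for the listener to finish */

    if (!release_lock_first)
        pthread_mutex_unlock(&loader_lock);
    return result;
}
```

Here `wait_for_listener(0)` returns `ETIMEDOUT`, because the waiter still holds the lock the listener needs. `wait_for_listener(1)` drops the lock before waiting and returns 0.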