<W>2021-11-17 20:14:43.118 QObject::connect: Cannot queue arguments of type 'QTextCursor'
(Make sure 'QTextCursor' is registered using qRegisterMetaType().)
[Thread 0x7fffc37fe640 (LWP 12229) exited]
[Thread 0x7fffc2ffd640 (LWP 12230) exited]
[Thread 0x7fffc3fff640 (LWP 12228) exited]
[Thread 0x7fffe889b640 (LWP 12227) exited]
[Thread 0x7fff7578d640 (LWP 12250) exited]
[Thread 0x7fff74f8c640 (LWP 12251) exited]
[Thread 0x7fff767fc640 (LWP 12249) exited]
Thread 1 "mumble" received signal SIGSEGV, Segmentation fault.
0x00007ffff65ccbf3 in QTextEngine::lineNumberForTextPosition(int) () from /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5
(gdb) bt
#0 0x00007ffff65ccbf3 in QTextEngine::lineNumberForTextPosition(int) () at /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5
#1 0x00007ffff65cefac in QTextLayout::lineForTextPosition(int) const () at /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5
#2 0x00007ffff662f099 in QTextCursorPrivate::setX() () at /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5
#3 0x00007ffff7869fcd in () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#4 0x00007ffff5fdad6e in QObject::event(QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5 0x00007ffff767a6af in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#6 0x00007ffff5fae75a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#7 0x00007ffff5fb17a7 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#8 0x00007ffff6006733 in () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9 0x00007ffff5489c7b in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007ffff5489f28 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007ffff5489fdf in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007ffff6005db4 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#13 0x00007ffff5fad16b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007ffff5fb5440 in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x000055555560b1e5 in main(int, char**) (argc=1, argv=0x7fffffffdc98) at /path/to/mumble/src/mumble/main.cpp:814
(gdb)
The original crashes were produced using a modified version of Minecraft, with JNI being used to call the code that interacts with Mumble's Link plugin.
I was able to reproduce the crashes using the built-in Manual placement plugin and the Pulse backend by toggling the "Link" checkbox in its configuration dialog. Only I was using the plugin; my test partner had no active positional audio plugin.
The resulting backtrace was very different from the one I posted above, and a second try with the Manual placement plugin resulted in yet another backtrace.
The second attempt at reproducing the crash took more than 20 link-unlink cycles. During these, I observed that the second-to-last line of the chat log window would disappear after some unlink operations and reappear once the next link operation added further lines.
Attached file "manual 1" took roughly 10 link-unlink cycles to crash; "manual 2" was the attempt with the disappearing lines in the chat log.
mumble gdb crash manual 1 full.txt
mumble gdb crash manual 2 full.txt
yeah the different backtraces are kinda expected - that can happen with memory corruptions (segfaults).
But every time you had your conversation partner talk to you while the link/unlink cycles happened?
Yes, as mentioned at the top, any audio activity (from them or me) will cause these issues.
I should be able to provide gdb runs with other backends (where the crashes happen on the first unlink if any audio is active) in the next few days if needed.
Okay so I just tried to replicate this on my system and in my case Mumble doesn't crash but hangs at some point when performing multiple link/unlink cycles. I haven't checked yet whether this only happens while someone is talking to me (in my tests so far that was the case) but maybe this is the same issue or at least has the same root cause.
From my gdb backtrace it seems that there might be a deadlock when fetching positional data... I'll have to have a deeper look at this at some point 👀
Ah I think I have a good idea what might have caused the crash: When reporting that a plugin has lost link, the UI code is accessed directly. However, when audio playback is involved, it can happen that the corresponding function is called from a different thread (the audio thread). Accessing GUI stuff from a different thread can cause a segfault (see also https://medium.com/@armin.samii/avoiding-random-crashes-when-multithreading-qt-f740dc16059)
When a positional data plugin lost link while Mumble was playing audio
(or recording & sending audio from the mic), it could end up crashing.
The reason for this was that whenever a plugin lost link,
PluginManager::reportLostLink was called. In this function a
corresponding message was logged to Mumble's console. This involves
accessing a Qt GUI object. However, if the lost link was detected from
within the audio thread (either AudioInput or AudioOutput) this function
would be called from a different thread than the GUI (main) thread. This
results in undefined behavior and in our case causes a segmentation
fault, which crashes Mumble.
Mitigating this issue is a simple matter of converting this function
into a slot and then only emitting the corresponding signal instead of
calling the function directly. This way, reportLostLink is only
ever called from within the plugin manager's thread (which lives in the
GUI thread).
Fixes mumble-voip#5319
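
The idea behind this fix can be illustrated without Qt. The following is a minimal sketch in standard C++, not Mumble's actual code: `MainLoop`, `post`, `quit` and `reportLostLinkFromAudioThread` are hypothetical names, and the hand-written task queue only stands in for the Qt event loop that delivers a queued signal to a slot in the receiver's thread. The "audio" thread never touches the log itself. It only posts a task, and the task runs on the thread that owns the log.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop: tasks may be posted from any
// thread, but they are executed only by the thread that calls run().
class MainLoop {
public:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_tasks.push(std::move(task));
        }
        m_wakeup.notify_one();
    }

    // Ask the loop to stop once all previously posted tasks have run.
    void quit() {
        post([this] { m_running = false; });
    }

    void run() {
        while (m_running) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wakeup.wait(lock, [this] { return !m_tasks.empty(); });
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // executed outside the queue lock, on the loop's thread
        }
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_wakeup;
    std::queue<std::function<void()>> m_tasks;
    bool m_running = true; // only read and written by the loop's thread
};

// Simulates an audio thread that detects a lost link. Returns true if the
// log was only ever modified by the thread running the loop.
bool reportLostLinkFromAudioThread(std::vector<std::string> &logLines) {
    MainLoop loop;
    const std::thread::id guiThread = std::this_thread::get_id();
    bool onlyGuiThread = true;

    std::thread audio([&] {
        // Do not log directly from here; hand the work to the loop instead.
        loop.post([&] {
            onlyGuiThread = onlyGuiThread
                            && std::this_thread::get_id() == guiThread;
            logLines.push_back("plugin lost link");
        });
        loop.quit();
    });

    loop.run(); // runs the posted tasks on the calling ("GUI") thread
    audio.join();
    return onlyGuiThread;
}
```

Calling `reportLostLinkFromAudioThread(lines)` from the main thread appends the message to `lines` and returns `true`, because the append happens inside `loop.run()` rather than inside the audio thread.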
@Missingmew since you seem to be able to reproduce the crash rather reliably, could you check whether it is fixed by #5338? I was unable to reproduce the crash (and wasn't even able to reproduce the freeze that I have been observing earlier) and can therefore not verify that my changes actually fix the issue.
Using your fork, I have managed to reproduce the crash when not running under gdb.
With gdb there are no hard crashes but full freezes (deadlocks, I assume) instead, which had consistent backtraces over three attempts; see the attached file for one such run.
mumble gdb freeze.txt
I updated #5338 (there was another place at which the log function was called from the wrong thread).
I also looked into why a deadlock may occur at the place the backtrace seems to indicate but I did not figure it out yet. However, the line numbers in your backtrace don't seem to match the ones in my fork 🤔
In any case, could you please completely wipe your build directory, pull the latest changes of my fork, rebuild with that and then check if the crash and/or the deadlock still occur?
As it turns out, either I or git screwed up: that last log was using your fork, but on master.
I cleaned up, switched to the correct branch and rebuilt.
Using PA as backend, I was able to go through roughly 5 minutes of Link-Unlink and it ran just fine.
Using ALSA as backend, I was able to get another deadlock, but it took a few Link-Unlink cycles (so definitely an improvement over the initial report, where it crashed instantly). See the attachment for the gdb run and backtrace (this time the line numbers should match up).
mumble gdb freeze 2.txt
Okay great - no more crashes. It seems we're getting somewhere ^^
And yes now the line numbers appear to be matching up 👍
I'll look into this some more in order to dig to the root cause of the deadlock.
Okay I am still a bit lost as to how and why there would be a deadlock involving this particular area of the code.
@Missingmew could you attempt to reproduce this freeze again while gdb is attached and then provide me with a backtrace of all threads (instead of only the main one)? See https://stackoverflow.com/questions/18391808/how-do-i-get-the-backtrace-for-all-the-threads-in-gdb
See attached file for full thread dump. This was done using the PA backend and happened after 5 link-unlink cycles.
mumble gdb freeze all threads.txt
Just had a go at it with my partner, about 5 minutes of continuous link-unlink using ALSA backend and no lockups.
About an hour of chatting with unlink-relink every so often and everything is fine, I think this might be it :D
When a positional data plugin lost link while Mumble was playing audio
(or recording & sending audio from the mic), it could end up crashing.
The reason for this was that whenever a plugin lost link,
PluginManager::reportLostLink was called. In this function a
corresponding message was logged to Mumble's console. This involves
accessing a Qt GUI object. However, if the lost link was detected from
within the audio thread (either AudioInput or AudioOutput) this function
would be called from a different thread than the GUI (main) thread. This
results in undefined behavior and in our case causes a segmentation
fault, which crashes Mumble.
Mitigating this issue is a simple matter of converting this function
into a slot and then only emitting the corresponding signal instead of
calling the function directly. This way, reportLostLink is only
ever called from within the plugin manager's thread (which lives in the
GUI thread).
The same argument applies to the messages logged upon linking a plugin
or upon reporting that a plugin has encountered a permanent error.
Fixes mumble-voip#5319
When a positional data plugin was unlinked while Mumble was playing
audio, a deadlock could be created. This deadlock involved the
m_activePosDataPluginLock and m_positionalData.m_lock. Both locks are
taken in PluginManager::fetchPositionalData. This function calls
PluginManager::selectActivePositionalDataPlugin, but before doing so,
releases the m_activePosDataPluginLock which is re-taken in the
mentioned function.
At the same time the audio thread would call
PluginManager::fetchPositionalData, which attempts to take the same two
locks. If it acquired m_activePosDataPluginLock in the window in which
the main thread had released it, the main thread (still holding
m_positionalData.m_lock) would block inside
PluginManager::selectActivePositionalDataPlugin while waiting for
m_activePosDataPluginLock. The audio thread, in turn, held that lock but
was blocked waiting for m_positionalData.m_lock. Thus, a deadlock was
created.
This commit fixes this situation by making sure m_positionalData.m_lock
is released before calling selectActivePositionalDataPlugin.
Fixes mumble-voip#5319
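
The locking pattern described in this commit message can be sketched with two plain mutexes. This is an illustrative reduction in standard C++ and not the actual PluginManager code: `PluginManagerSketch`, `selectActivePlugin`, `runLinkUnlinkCycles` and the counters are hypothetical, and `std::mutex` stands in for whatever lock types Mumble really uses. The sketch shows the fixed ordering, in which no lock is held while the re-selection function is called.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct PluginManagerSketch {
    std::mutex activePluginLock;   // stands in for m_activePosDataPluginLock
    std::mutex positionalDataLock; // stands in for m_positionalData.m_lock
    int fetches      = 0;          // guarded by positionalDataLock
    int reselections = 0;          // guarded by activePluginLock

    // Takes activePluginLock on its own, like the re-selection function
    // described above.
    void selectActivePlugin() {
        std::lock_guard<std::mutex> guard(activePluginLock);
        ++reselections;
    }

    // lostLink == true simulates the plugin losing link during a fetch.
    void fetchPositionalData(bool lostLink) {
        std::unique_lock<std::mutex> pluginLock(activePluginLock);
        std::unique_lock<std::mutex> dataLock(positionalDataLock);
        ++fetches;

        if (lostLink) {
            // Problematic variant: release only pluginLock here and keep
            // dataLock. Another thread can then grab activePluginLock and
            // wait for positionalDataLock, while this thread waits for
            // activePluginLock inside selectActivePlugin().
            // Fixed variant: release BOTH locks before the call.
            dataLock.unlock();
            pluginLock.unlock();
            selectActivePlugin();
        }
    }
};

// Runs a "main" thread that keeps losing link against an "audio" thread
// that keeps fetching. Returns the number of re-selections performed.
int runLinkUnlinkCycles(int cycles) {
    PluginManagerSketch manager;

    std::thread audio([&] {
        for (int i = 0; i < cycles; ++i) {
            manager.fetchPositionalData(false);
        }
    });

    for (int i = 0; i < cycles; ++i) {
        manager.fetchPositionalData(true);
    }

    audio.join();
    return manager.reselections; // safe to read: the audio thread is done
}
```

With the fixed ordering, `runLinkUnlinkCycles(1000)` terminates and returns 1000. Every code path either acquires the two locks in the same order or acquires `activePluginLock` while holding nothing else, so the circular wait described above cannot form.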
This PR fixes a crash that could occur on logging plugin-related info such as
linking/unlinking a positional data plugin and also fixes a deadlock that could
occur in the linking process.
Fixes #5319