
5-2-MUTE: Zoom Interpretation Turn-Taking

When communications don't follow a protocol, bad things can happen.
[Tenerife airport disaster, 1977-March-27]

1) Introduction

How should conference interpreters coordinate their "turns" (or "shifts" or "handoffs") when using the currently buggy Zoom videoconference software? This article is my current best guess; feel free to send me improvements.

Two warnings: This article will become totally obsolete as soon as Zoom fixes its bug (i.e. when the interpretation feature actually allows interpreters to coordinate their turns using Zoom, not despite Zoom; as I'm writing, Zoom is at Version 5.6.5). Also, none of this is of any interest if you're not a conference interpreter.

2) Overview of the communication protocol

Interpreters on the verge of changing turns.

None of this is complicated, since all we're doing is standardizing the communications between the "Interpreter-Working" and the "Interpreter-Watching", so they can switch roles without crashing.

In the screenshot of an actual Zoom meeting above, you can see the "Interpreter-Watching" as (1) and the "Interpreter-Working" as (2). (You are seeing things from the point of view of the "Interpreter-Watching".) If we had the full transcript of the messages they exchanged in the Zoom chat (4), we would see something like:

[Interpreter-Watching]	5 MIN
[Interpreter-Working]	OK
[Interpreter-Watching]	2 MIN
[Interpreter-Watching]	WATCHING YOUR MUTE

And that's it! (Hence the title "5-2-MUTE" of this article.) When the icon of the working interpreter flips to the "muted" icon, that's the signal for the watching interpreter to unmute and start working.
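
For readers who like to see a protocol written down formally, the exchange above can be sketched as a tiny state machine. This is only an illustration: the state names, the event labels, and the rule that out-of-sequence events are ignored are my assumptions, not part of Zoom or of any interpreting standard.

```python
from enum import Enum, auto


class Handoff(Enum):
    """States as seen by the Interpreter-Watching (names are hypothetical)."""
    IDLE = auto()          # nothing sent yet
    FIVE_SENT = auto()     # "5 MIN" sent, waiting for "OK"
    ACKNOWLEDGED = auto()  # "OK" received from the Interpreter-Working
    TWO_SENT = auto()      # "2 MIN" sent
    WATCHING = auto()      # "WATCHING YOUR MUTE" sent, eyes on the mute icon
    SWITCHED = auto()      # icon flipped to "muted": unmute and start working


# (current state, event) -> next state. Event labels are illustrative only.
TRANSITIONS = {
    (Handoff.IDLE, "send 5 MIN"): Handoff.FIVE_SENT,
    (Handoff.FIVE_SENT, "receive OK"): Handoff.ACKNOWLEDGED,
    (Handoff.ACKNOWLEDGED, "send 2 MIN"): Handoff.TWO_SENT,
    (Handoff.TWO_SENT, "send WATCHING YOUR MUTE"): Handoff.WATCHING,
    (Handoff.WATCHING, "working interpreter muted"): Handoff.SWITCHED,
}


def step(state, event):
    """Return the next state; an out-of-sequence event leaves the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events):
    """Feed a list of events through the state machine, starting from IDLE."""
    state = Handoff.IDLE
    for event in events:
        state = step(state, event)
    return state


full_handoff = [
    "send 5 MIN",
    "receive OK",
    "send 2 MIN",
    "send WATCHING YOUR MUTE",
    "working interpreter muted",
]
final_state = run(full_handoff)
```

Note that the only event coming from the Interpreter-Working, apart from the single "OK", is the mute itself; everything else is driven by the Interpreter-Watching.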

3) Now the hard part: Why this way?

The hardest part about all this is "herding the cats", i.e. convincing all interpreters to change their old habits and to adopt a behavior they probably don't want to adopt, because nobody likes to change.

This is even more difficult because we're talking about a communication protocol, which by definition is a convention, i.e. something that can easily be different. Think about traffic lights: why green for Go and red for Stop? There is no reason, and we could all argue for any other colors. The benefit is all agreeing on the same colors, not that red or green are particularly sexy or soothing.

Allow me to propose some principles:

3.1) Just about any standardization is better than nothing. If we all agree, we don't have to reinvent the wheel and have long discussions before each contract, because each interpreter wants to coordinate shift changes differently.

3.2) The Zoom microphone icon is the ultimate authority. Many interpreters advocate for an external coordination mechanism, like "WhatsApp" or "FaceTime" or cellular telephone text messages, etc. One of the many problems with such external signals is that when they fail (which often happens with technology), the conference continues, but now the turn coordination has broken down. Zoom does fail, and not rarely, but if the Zoom microphone icon fails, all of Zoom has failed and you cannot interpret anyway.

3.3) The intellectual burden should be on the Interpreter-Watching. Splitting your attention while you're translating is very difficult, and the few brain cells you manage to take away from your main task usually amount to the I.Q. of a teabag. In other words, no matter how smart the interpreter, his or her intelligence will be a lot lower when attempting to do something on top of interpreting. So any protocol should spare the Interpreter-Working from as much thought process as possible.

3.4) Prepare the turn switch, but not too much. Every time the Interpreter-Working has to look at the Zoom chat and read a message, there is a chance of "crashing" (losing one or more words, or even a whole sentence, etc.), so the number of messages sent by the Interpreter-Watching should be minimized. On the other hand, if you send only one message, the Interpreter-Working might take a while to see it, or even miss it (one of the Zoom bugs is that interpreters don't have their separate chat, and sometimes participants gab a lot, which can obscure communications). After trial and error, three messages seem a reasonable balance.

3.5) The three actual messages. The first "5 MIN" is far out enough for the Interpreter-Working to see it, even if he or she is really concentrated on a tough speaker, while not being too far out to be meaningless (like "15 MIN" which doesn't help). Also, the fact it's fairly far out means if the Interpreter-Working is mixed up and says "OK" to the wrong person (frequent in teams of three interpreters), the Interpreter-Watching can notice he didn't get an "OK". He can then chat to the Interpreter-Working something like "Are you chatting back to somebody else?", which gives the Interpreter-Working plenty of time to fix his mistake.

The second "2 MIN" is not too close to the actual time for switching ("1 MIN" is more of a distraction than a help), while still meaning both interpreters are now getting ready to switch.

The last message, "WATCHING YOUR MUTE", means: "Do it, do it now!" The Interpreter-Working has finished his turn (usually 30 minutes), and should now mute himself as soon as possible. Since the Interpreter-Watching has just sent a Zoom chat, his Zoom window is already active (i.e. the active program on his computer is not Termium or Linguee or something else, which would prevent him from turning on his Zoom microphone). He keeps his fingers on the keyboard shortcut that activates his Zoom microphone (ALT-A) and stares at the microphone icon of the Interpreter-Working, while mentally keeping track of what the speaker is saying, so he can patch his translation seamlessly into the flow. As soon as the microphone icon of the Interpreter-Working changes to the "muted" status, the Interpreter-Watching turns on his microphone and starts to work.
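
The timing of the three messages is simple arithmetic, and the Interpreter-Watching may find it handy to work it out before the turn starts. Here is a minimal sketch, assuming a 30-minute turn and assuming that "WATCHING YOUR MUTE" goes out exactly at the end of the turn; the function name and the sample start time are made up for the example.

```python
from datetime import datetime, timedelta


def handoff_schedule(turn_start, turn_minutes=30):
    """Return (time, message) pairs for the Interpreter-Watching to send.

    Assumes the switch happens turn_minutes after turn_start.
    """
    switch_time = turn_start + timedelta(minutes=turn_minutes)
    return [
        (switch_time - timedelta(minutes=5), "5 MIN"),
        (switch_time - timedelta(minutes=2), "2 MIN"),
        (switch_time, "WATCHING YOUR MUTE"),
    ]


# Hypothetical turn starting at 14:00 (the date is arbitrary).
schedule = handoff_schedule(datetime(2021, 5, 1, 14, 0))
for when, message in schedule:
    print(when.strftime("%H:%M"), message)
# 14:25 5 MIN
# 14:28 2 MIN
# 14:30 WATCHING YOUR MUTE
```

If the team agrees on shorter or longer turns, only the turn_minutes argument changes; the "5" and the "2" stay the same.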

3.6) "GO" is very different. Some interpreters think "GO" is the same as "WATCHING YOUR MUTE", but "GO" puts the burden of turn-switching on the Interpreter-Working: he has to keep track of how long he has been working, he has to start typing to tell the Interpreter-Watching to start working, and he has to make sure he is sending the message correctly. How many times have I received chats like "GO!", "GO DARNIT!", "GO GO GO!" when it wasn't even my turn to start working! But as I often repeat, the Interpreter-Working has the I.Q. of a teabag, so such mistakes occur regularly when the Interpreter-Watching waits helplessly instead of calmly organizing the handoff.

4) A few details

Here are some additional details about this "5-2-MUTE" method:

4.1) Filter the interpreters. Most meetings have many participants. If all interpreters put the word "Interpreter" somewhere in their Zoom name, it's very easy to type "interp" or such in the Search box for the Participants sub-window. This way, only the actual interpreters appear in the list of Participants, so you can see their microphone status, as well as the language channel they are on (handy when an interpreter chooses the wrong channel, so the others can chat to him to fix his mistake). You can see this at (3) in the screen shot above.
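
The naming convention works because the Search box only needs a substring to match. The sketch below imitates that behavior with a plain case-insensitive substring test; it is not Zoom's actual search code (which I have not seen), and the participant names are invented.

```python
def filter_participants(names, query):
    """Keep only the names that contain the query, ignoring upper/lower case."""
    needle = query.casefold()
    return [name for name in names if needle in name.casefold()]


# Invented participant list for illustration.
participants = [
    "Alice Tremblay",
    "Interpreter FR - Bob",
    "Carol (Host)",
    "INTERPRETER EN - Dana",
]

interpreters = filter_participants(participants, "interp")
print(interpreters)
# ['Interpreter FR - Bob', 'INTERPRETER EN - Dana']
```

An interpreter who forgets to put "Interpreter" in his Zoom name simply disappears from the filtered list, which is one more reason to agree on the naming before the meeting starts.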

4.2) ALL CAPS. Because of the Zoom bug, communications between interpreters are not on a separate channel, so by putting them in ALL CAPS, they are easier to pick out from all the other messages in the chat.

4.3) "MUTE" and not "MIKE" or something else. Yes, you could write: "Considering the aspect of your microphone's icon", but by using the word "mute", you help the Interpreter-Working (who, as you remember, currently has the I.Q. of a teabag). All his brain cells need to do is read "mute", and that tells him what to do! Just mute yourself, and the Interpreter-Watching will save your bacon!

4.4) This whole "5-2-MUTE" is for worst case scenarios. Strict communication protocols are necessary to avoid crashes when things are difficult (like an unusually busy airport that is blanketed by fog). But if you are all alone at the airport and the visibility is perfect, any communication will do the job. It's the same for conferences. For example, if the meeting has allocated each speaker 2 minutes at the microphone, or if the speaker is using slides and changing them regularly, then you just type: "after this lady", or "whenever they change the PowerPoint slide", etc.

In nearly ideal conditions, I've even seen turn coordination by "dead reckoning" (i.e. purely by looking at the clock). I say ideal conditions because we had all the documentation in English and in French, as well as the script for the Master of Ceremonies, and they were respecting their schedule to the minute, and all the interpreters were experienced and had been working together for many years. We synchronized our watches before the conference started, then, after exactly 30 minutes, we would start watching the Zoom microphone icon of the other interpreter, and start talking when it muted. I don't recommend turn-switching by "dead reckoning"; it's like doing trapeze without a safety net. Also, the only reason we did this was that the "Zoom Host" had stupidly deactivated the chat feature.

4.5) Yes, some conference organizers disable the chat. There is no reason to prevent the interpreters from using the chat. If the "Zoom Host" somehow has a good reason to deactivate the chat for normal participants, he can make an exception for the interpreters by just assigning them as "Zoom Co-Hosts", which lets the interpreters continue to use the chat.

I have actually seen a conference where the interpreters wrote "RE-ENABLE THE CHAT!" in large letters on a physical sheet of paper and held it in front of their camera to try to attract the attention of the moron responsible for that Zoom conference! And even that didn't work! That leads me to yet another reason to never prevent the interpreters from having access to the chat: they must be able to contact the Zoom Host if there is a technical problem.

5) What would a Zoom bug fix look like?

73% good reasons to worry about Zoom bugs.
[Source]

If the Zoom interpretation feature were fixed, what would it look like? Probably a lot like the kludge or workaround many interpreters currently use: some dedicated third-party communication channel that somehow reproduces the good old translation booth. When physically sitting together, the Interpreter-Working and the Interpreter-Watching can hear and see each other, so turn-switching is very natural. The Interpreter-Watching can check the elapsed time, and when his turn comes up, he just puts his finger above the button on the translation console that decides which microphone is active (i.e. it turns off the microphone of the Interpreter-Working, and turns on the microphone of the Interpreter-Watching). With peripheral vision, the Interpreter-Working sees this out of the corner of his eye, so he finishes his sentence and just waves his hand or something, giving the signal to the Interpreter-Watching to push the button and start working.

There are many ways to do this, for example (blurbs from Wikipedia):

5.1) Google Chat is a communication software developed by Google built for teams that provides direct messages and team chat rooms, similar to competitors Slack and Microsoft Teams, along with a group messaging function that allows Google Drive content sharing.

5.2) WhatsApp Messenger is a freeware, cross-platform centralized messaging and voice-over-IP (VoIP) service owned by Facebook, Inc. It allows users to send text messages and voice messages, make voice and video calls, and share images, documents, user locations, and other content. WhatsApp's client application runs on mobile devices but is also accessible from desktop computers, as long as the user's mobile device remains connected to the Internet while they use the desktop app.

5.3) Apple FaceTime is a proprietary videotelephony product developed by Apple Inc. FaceTime is available on supported iOS mobile devices running iOS 4 and later and Mac computers that run Mac OS X 10.6.6 and later.

5.4) Facebook Messenger is a messaging app and platform developed by Facebook, Inc. Facebook has launched a dedicated website interface (Messenger.com), and separated the messaging functionality from the main Facebook app, allowing users to use the web interface or download one of the standalone apps. In April 2020, Facebook officially released Messenger for Desktop, which is supported on Windows 10.

What do I think about this option? It's certainly better than crashing because of bad turn coordination. It's also certainly worse than if Zoom just fixed their bug. Zoom already has access to the voice and the video of all the interpreters... they just need to make the connections so the interpreters can see and hear each other. (I'm told other multilingual web conferencing programs already do this, like Kudo, but this is practically meaningless without sufficient market share.)

There is also the aforementioned extra complexity on top of the already complex Zoom software.

Personally, the biggest disadvantage is just spending more money for stuff I don't need or want. I'd rather not give more of my money to Mark Zuckerberg and Bill Gates and whoever controls Apple and Bell Canada, etc. "Big Tech" is already rich enough, and controls enough of our lives already.

6) Conclusion

On March 27, 1977, two Boeing 747 passenger jets collided on the runway on the Spanish island of Tenerife, resulting in 583 fatalities, the deadliest crash in aviation history.

I was told that ever since then, the communications between the pilot and the control tower have been standardized, greatly reducing the frequency of such accidents.
