Remote Collaboration Infrastructure

by Telcosystems

Remote Collaboration Infrastructure studio setups.

With the Remote Collaboration Infrastructure project, we investigated the possibilities for collaborating remotely on audiovisual projects, in real time and via the internet. Existing commercial solutions for remote collaboration fall short on a number of points that are essential for us to collaborate in real time on our audiovisual projects.

Because physical collaboration was hardly possible during the COVID-19 pandemic, we started looking for ways to work together from different locations via the internet. In recent years we had already developed a number of our own protocols in which machines communicate locally via digital audio connections. It therefore seemed obvious to use the Remote Collaboration Infrastructure project to investigate how such communication protocols could also be made suitable for an audio connection over the internet.

Research and experiment

In the first phase of the project, we researched a number of technical problems and issues that needed solutions before we could develop a prototype.

Providers

We examined a large number of providers of uncompressed audio over the internet for functionality (supported sample rates and bit depths, stereo only or also multichannel) and user-friendliness (how well and how easily they integrate into our workflow). The providers we researched included: Source Elements (https://www.source-elements.com), Earshot (https://github.com/EnvelopSound/Earshot), JamTaba (https://github.com/elieserdejesus/JamTaba), Sessionwire (https://www.sessionwire.com), Soundwhale (https://soundwhale.com), NINJAM (https://www.cockos.com/ninjam), TPF (https://gitlab.zhdk.ch/TPF/jacktrip), Quaxtrip (https://llllllll.co/t/quaxtrip-low-latency-audio-over-the-internet-in-max-built-on-quacktrip-jacktrip/38423, https://cycling74.com/articles/quaxtrip-and-internet-performance-an-interview-with-damon-holzborn) and Audiomovers (https://audiomovers.com).
During our research we also came across several projects that developed a hardware solution for remote collaboration, such as the Touchlab project by the Dutch Ensemble Insomnio (https://www.nrc.nl/nieuws/2021/05/04/gezelschap-building-superstream-for-live-music-a4042335, https://www.ensembleinsomnio.nl/touchlab) and the Swedish ELK (https://www.elk.live).

Ultimately, Audiomovers turned out to be the most promising candidate for our project: it integrated well and easily into our workflow, its sample rate and bit depth can be set, and it supports multichannel streams. In addition, we ran a series of sessions with the hardware from ELK. This is a very user-friendly solution, but it still has many limitations, such as stereo-only audio and little control over the audio settings.

Network

Throughout the project we experimented with optimizing and stabilizing local network connections. Our main findings are that Wi-Fi is generally not stable enough; that using a VPN causes instability in the audio signal; and that it helps to pause or switch off network-intensive background processes (such as email, Dropbox and software updates) as much as possible. Ideally, use a dedicated wired internet connection via broadband cable or, better still, fiber.
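
To compare connections (for example Wi-Fi against a wired line), one option is to express their stability as a single jitter figure. The sketch below uses the interarrival jitter estimator defined in RFC 3550 (RTP). It assumes you have send and receive timestamps for your own test packets, in the same unit; the function name is hypothetical, and none of the services named above is assumed to expose such timestamps.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate as in RFC 3550 (same unit as the inputs).

    send_times and recv_times are hypothetical per-packet timestamps
    from one's own test traffic.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("send_times and recv_times must have equal length")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Change in transit time between two consecutive packets.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as specified in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

A connection that delivers every packet with the same delay yields a jitter of 0; the more the transit time varies from packet to packet, the higher the figure, which makes it easy to compare, say, the same test run over Wi-Fi and over a cable.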

Synchronization and feedback

To be able to work in sync with each other and to avoid data feedback, we experimented with two different network approaches: a non-hierarchical peer-to-peer setup and a hierarchical system with a server and multiple clients. The peer-to-peer approach seemed the most efficient beforehand, but in practice it caused many problems because there was no central sync. Moreover, it required many more audio channels than the hierarchical system. In the end we therefore opted for a system with one server and three clients. The server sends a central sync signal to the clients, the clients send their audio and data to the server, and the server sends a mix of all audio signals plus a data track back to the clients. Each client thus handles a 4-channel input and a 4-channel output stream, and the server handles 12 input channels and 4 output channels. Running the server on its own independent internet connection makes this system very stable. Because all audio and data pass through the server, which determines what is sent back to the clients, this setup also makes it very easy to prevent data feedback.
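
The difference in channel count between the two topologies can be sketched with a little arithmetic. The model below assumes four locations that each publish one 4-channel stream, and that a published stream can be received by several listeners at once; the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Channels:
    inputs: int
    outputs: int


def peer_to_peer(participants: int, per_stream: int) -> dict:
    # Every node receives the stream of each other node and publishes its own
    # stream once (assumes one published stream can have several listeners).
    return {
        f"node{i + 1}": Channels(inputs=(participants - 1) * per_stream, outputs=per_stream)
        for i in range(participants)
    }


def server_clients(clients: int, per_stream: int) -> dict:
    # Clients send their local stream to the server; all receive one shared mix.
    plan = {
        f"client{i + 1}": Channels(inputs=per_stream, outputs=per_stream)
        for i in range(clients)
    }
    plan["server"] = Channels(inputs=clients * per_stream, outputs=per_stream)
    return plan


def total_inputs(plan: dict) -> int:
    # Sum of all receiving channels across every location in the plan.
    return sum(c.inputs for c in plan.values())


p2p = peer_to_peer(participants=4, per_stream=4)
star = server_clients(clients=3, per_stream=4)
print("peer-to-peer:", p2p["node1"], "total inputs:", total_inputs(p2p))
print("server/clients:", star["server"], "total inputs:", total_inputs(star))
```

Under these assumptions every peer-to-peer node has to receive 12 channels (48 in total), whereas in the server/client layout only the server receives 12 and the clients 4 each (24 in total).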

Protocols

For the parameter data protocol, we initially worked with a self-developed protocol consisting of two synced audio channels: the data was sent in real time as audio on one channel, while the other channel carried a sync signal that allowed the receiving side to read the data back correctly. However, this solution turned out to be too error-prone. We then developed a solution in which the data is stored in an audio buffer on a server, and this buffer is shared with the clients via the internet. This turned out to be a very stable solution.
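
The encoding itself is not specified here, so the following is only a sketch of the two ideas in the paragraph above, under assumptions: parameter values are quantized to 128 levels (a hypothetical 7-bit resolution) and carried as sample amplitudes, either streamed on a data channel alongside a clock channel that toggles for every new value, or held in a fixed buffer in which each slot always stores the same parameter. All names are hypothetical.

```python
LEVELS = 128  # hypothetical resolution: parameter values 0..127


def value_to_sample(value: int) -> float:
    # Map 0..LEVELS-1 linearly onto the audio range [-1.0, 1.0].
    return -1.0 + 2.0 * value / (LEVELS - 1)


def sample_to_value(sample: float) -> int:
    # Round to the nearest level so small amplitude errors are tolerated.
    v = round((sample + 1.0) * (LEVELS - 1) / 2.0)
    return max(0, min(LEVELS - 1, v))


# Variant 1: streamed data channel plus clock (sync) channel.
def encode_clocked(values, samples_per_value=4):
    data, clock = [], []
    level = 1.0
    for v in values:
        data += [value_to_sample(v)] * samples_per_value
        clock += [level] * samples_per_value
        level = -level  # toggle the clock for each new value
    return data, clock


def decode_clocked(data, clock):
    values, previous = [], None
    for d, c in zip(data, clock):
        state = c > 0.0
        if state != previous:  # clock edge: a new value starts here
            values.append(sample_to_value(d))
            previous = state
    return values


# Variant 2: a fixed buffer in which slot i always holds parameter i.
def write_buffer(params: dict, size: int) -> list:
    buf = [value_to_sample(0)] * size
    for slot, v in params.items():
        buf[slot] = value_to_sample(v)
    return buf


def read_buffer(buf: list) -> list:
    return [sample_to_value(s) for s in buf]
```

Rounding to the nearest level lets the decoder absorb small amplitude deviations. In the streamed variant, though, a dropout or timing glitch on either channel misaligns data and clock, which is one plausible reason such a scheme is fragile; a shared buffer, by contrast, can simply be read again in full.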

Network drift

Network drift is inevitable when working over the internet, so rather than trying to rule it out, we investigated how best to deal with it in the system. In all our sessions, the delay between each client and the server was always slightly different, but the resulting network drift was never severe. During this project we therefore treated drift as a feature rather than as a problem that needed to be solved.

Development prototypes

Based on the research and experiments, we developed several prototypes. The first was based on real-time peer-to-peer transfer of sync and data as a continuous signal. This turned out to be too prone to failure: synchronization proved difficult, and data feedback occurred. The second prototype was based on one server and three clients. However, this solution still requires a relatively large amount of bandwidth (4 channels in and 4 channels out for each client, 12 channels in and 4 channels out for the server). Importantly, the bandwidth must also be constant and stable: because we send and receive signals in near real time, with a latency of 0.3 seconds, buffering is not possible, unlike when streaming pre-recorded audio or video (as Netflix does, for example). Hiccups in the signal, or the speed-ups and slowdowns that often occur with services such as Skype, Zoom and Teams, must not occur, because they disrupt the sync with the local AV sources.
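
How much bandwidth this means can be estimated from the uncompressed PCM bit rate, which is channels × sample rate × bit depth. The sample rate and bit depth are not stated here, so the sketch assumes 48 kHz and 24 bit; the figures cover the raw audio payload only, before any protocol or packet overhead.

```python
def pcm_bitrate_mbps(channels: int, sample_rate_hz: int, bit_depth: int) -> float:
    # Raw PCM payload only; network and protocol overhead come on top.
    return channels * sample_rate_hz * bit_depth / 1_000_000


def latency_in_samples(latency_s: float, sample_rate_hz: int) -> int:
    # Express a latency as a number of sample frames.
    return round(latency_s * sample_rate_hz)


RATE, DEPTH = 48_000, 24  # hypothetical session settings

client_up = pcm_bitrate_mbps(4, RATE, DEPTH)    # 4 channels out per client
client_down = pcm_bitrate_mbps(4, RATE, DEPTH)  # 4 channels in per client
server_down = pcm_bitrate_mbps(12, RATE, DEPTH) # 12 channels in at the server
server_up = pcm_bitrate_mbps(4, RATE, DEPTH)    # 4 channels out from the server

print(f"each client: {client_up:.3f} Mbit/s up, {client_down:.3f} Mbit/s down")
print(f"server: {server_down:.3f} Mbit/s down, {server_up:.3f} Mbit/s up")
print(f"0.3 s latency = {latency_in_samples(0.3, RATE)} samples at {RATE} Hz")
```

With these assumed settings each client needs 4.608 Mbit/s in each direction and the server 13.824 Mbit/s inbound. Since nothing can be buffered, that rate has to be sustained continuously rather than on average.
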
In addition, Audiomovers appears to change its protocols when updating its software and service. We therefore had to adjust our own data-over-audio protocol several times during the project, because small changes had apparently been made to the translation/conversion of audio signals that Audiomovers applies between sender and receiver. Initially, setting up the prototype required quite a bit of work; this was greatly simplified after a number of updates. Essential here is a dedicated communication channel (Skype, WhatsApp, Slack, etc.) for sharing the Audiomovers links.

We used our second prototype in test sessions over a period of one year. In these sessions we focused as much as possible on the joint creative process; we noted technical problems and shortcomings and resolved them after each session. Before each session we created a new routing plan for a different set of audiovisual instruments, so that we tested a wide variety of possibilities and protocols. Further development and testing took much more work and time than expected, but ultimately resulted in a very usable solution for working together from different locations.
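
A routing plan of this kind can be written down as data and checked before a session starts. The sketch below uses a hypothetical format: the source names are placeholders, not the instruments used in the sessions. It assigns each client's four channels to consecutive server inputs and rejects a plan that does not fit the 4-channel client stream.

```python
CHANNELS_PER_CLIENT = 4

# Hypothetical plan: the source names are placeholders only.
ROUTING_PLAN = {
    "client1": ["source 1 L", "source 1 R", "source 2", "parameter data"],
    "client2": ["source 3 L", "source 3 R", "source 4", "parameter data"],
    "client3": ["source 5 L", "source 5 R", "source 6", "parameter data"],
}


def server_input_map(plan: dict) -> list:
    """Return (server input channel, client, source) tuples, numbered from 1."""
    mapping = []
    # Dicts keep insertion order, so clients are numbered in the order listed.
    for index, (client, sources) in enumerate(plan.items()):
        if len(sources) != CHANNELS_PER_CLIENT:
            raise ValueError(f"{client} must use exactly {CHANNELS_PER_CLIENT} channels")
        for local, source in enumerate(sources):
            mapping.append((index * CHANNELS_PER_CLIENT + local + 1, client, source))
    return mapping


for channel, client, source in server_input_map(ROUTING_PLAN):
    print(f"server in {channel:2d} <- {client}: {source}")
```

Keeping the plan as data means a new set of instruments only changes the source lists, while the check on channel counts stays the same from session to session.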