Scalability matters, but let’s start by talking about video conferencing in general.
What is video-conferencing?
The goal of video conferencing is to let two or more people communicate in real time. This communication may include video, audio, chat, file sharing and screen sharing, and it normally runs over IP networks. Building basic video-conferencing software is fairly simple: an implementation based on audio/video device interfaces can connect two clients on a network.
It is often assumed that all video-conferencing providers use the same technology, but this is certainly not the case.
eyeson is based on the principles of the WebRTC (Web Real-Time Communication) standards. WebRTC is a free, open-source project that provides real-time communication in web browsers and mobile applications through simple APIs (application programming interfaces). The most commonly used browsers, such as Google Chrome, Mozilla Firefox and even Apple’s Safari, support WebRTC. Skype and Zoom, by comparison, do NOT use WebRTC, which means a native program has to be installed before a video call can start.
Scalability is a decisive factor in video conferencing, because it determines whether video and audio quality stay excellent for every connected client as the number of participants rises. The only technology that guarantees this is the MCU (multipoint control unit) topology. However, it is also the least used technology, because it is complex, requires specialized development expertise and is expensive to operate. Even so, scalability remains one of the most important aspects of video conferencing.
Three technologies currently dominate the video-conferencing market:
- Peer-To-Peer (Mesh)
- Selective Forwarding Unit (SFU)
- Multipoint Control Unit (MCU)
Peer-To-Peer (Mesh)
The lowest-cost solution for video conferencing is mesh. It needs no intermediate infrastructure such as a media server or cloud service (apart from call signaling), because each client sends its streams directly to, and receives streams directly from, every other participant. Mesh is easy to implement, but it runs into major scalability problems as soon as more than two people join the call. Skype and FaceTime use this technology at all times, whereas Google Hangouts uses it only when exactly two people are in the call.
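To make the scaling problem concrete, here is a minimal Python sketch that lists the direct peer connections a full mesh needs. The function names and participant names are purely illustrative. Every pair of participants shares one connection, so the number of links grows as n(n-1)/2, while each individual client has to maintain n-1 of them.

```python
from itertools import combinations


def mesh_connections(participants: list[str]) -> list[tuple[str, str]]:
    """Return every direct peer-to-peer link needed for a full mesh.

    Each pair of participants shares one connection carrying their streams
    in both directions, so n participants need n*(n-1)/2 links.
    """
    return list(combinations(participants, 2))


def links_per_client(n: int) -> int:
    """Each client keeps one connection to every other participant."""
    return n - 1


if __name__ == "__main__":
    people = ["alice", "bob", "carol", "dave"]
    links = mesh_connections(people)
    print(len(links))                     # 6 links for 4 participants
    print(links_per_client(len(people)))  # 3 connections per client
```

Going from 4 to 10 participants raises the total from 6 to 45 links, which is why mesh calls degrade so quickly.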
So what happens when the number of participants in a mesh-based call increases?
That is where the problems start. Bandwidth keeps rising, because every new participant adds audio and video streams that each client has to send and receive. This overwhelms the client’s processing capacity, such as its CPU, as well as its limited network connection, and the quality of the video conference suffers tremendously.
The results are choppy audio, frozen video and dropped calls, and they get worse with every additional participant. These issues can be solved with either SFU or MCU technology.
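The bandwidth growth can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that every audio+video stream needs the same 1500 kbit/s; the constant and function name are hypothetical, and real bitrates vary with codec, resolution and network conditions.

```python
# Illustrative bitrate per audio+video stream (an assumption, not a measurement).
STREAM_KBPS = 1500


def mesh_client_bandwidth_kbps(participants: int,
                               stream_kbps: int = STREAM_KBPS) -> tuple[int, int]:
    """Return (upload, download) in kbit/s for one client in a full mesh.

    The client uploads a separate copy of its own stream to every other
    participant and downloads one stream from each of them.
    """
    others = participants - 1
    return others * stream_kbps, others * stream_kbps


if __name__ == "__main__":
    for n in (2, 4, 8):
        up, down = mesh_client_bandwidth_kbps(n)
        print(f"{n} participants: upload {up} kbit/s, download {down} kbit/s")
```

Under this assumption, a client in an 8-person mesh call must upload 10500 kbit/s on its own, which exceeds the upstream capacity of many home connections.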
Selective Forwarding Unit (SFU)
This technology is based on a central unit that receives the streams from all clients. Each client then decides which of the other participants’ streams it wants to receive. Consequently, the more participants there are, the more streams each client has to download, and the higher its bandwidth usage. The SFU therefore forwards streams selectively but does no audio or video processing itself. Note: this requires the client to communicate fully with the SFU media server.
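The forwarding behavior can be illustrated with a toy model. In this hypothetical Python sketch (the class and method names are illustrative, not a real media-server API), each client registers the senders it wants to receive, and the SFU passes each incoming packet through unchanged to those subscribers without decoding or mixing anything.

```python
from dataclasses import dataclass, field


@dataclass
class SelectiveForwardingUnit:
    """Toy SFU that forwards each incoming packet unchanged.

    It never decodes, mixes or re-encodes media; it only routes packets
    to the clients that subscribed to a given sender.
    """
    subscriptions: dict[str, set[str]] = field(default_factory=dict)

    def subscribe(self, receiver: str, senders: set[str]) -> None:
        # The receiving client chooses which other participants' streams it
        # wants; a client never receives its own stream back.
        self.subscriptions[receiver] = set(senders) - {receiver}

    def forward(self, sender: str, packet: bytes) -> dict[str, bytes]:
        # Deliver the packet as-is to every client subscribed to this sender.
        return {
            receiver: packet
            for receiver, senders in self.subscriptions.items()
            if sender in senders
        }


if __name__ == "__main__":
    sfu = SelectiveForwardingUnit()
    sfu.subscribe("alice", {"bob", "carol"})
    sfu.subscribe("bob", {"alice"})
    sfu.subscribe("carol", {"alice", "bob"})
    print(sfu.forward("bob", b"frame-1"))  # {'alice': b'frame-1', 'carol': b'frame-1'}
```

Each sender uploads once, but every receiver still gets one separate stream per subscription, which is why download bandwidth keeps growing with the number of participants.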
So what are the advantages of SFU compared to P2P?
Each client uploads its stream only once, to a single central unit, which then selectively forwards it to the other participants. Latency is minimal, but each client still has to decode every stream it receives, which slows things down when a large number of participants take part in the group video call.
Nevertheless, it is the most popular technology in the WebRTC community. Google Hangouts uses it when more than two people are in the call. eyeson, by contrast, uses SFU only when two people are in the call; as soon as a third person joins, it automatically switches to MCU.
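The switching rule described for eyeson can be expressed as a small decision function. This is only a hypothetical sketch of the rule as stated in the text, not eyeson’s actual implementation; `Topology`, `choose_topology` and `MAX_SFU_PARTICIPANTS` are illustrative names.

```python
from enum import Enum


class Topology(Enum):
    SFU = "sfu"
    MCU = "mcu"


# Threshold taken from the text: SFU for up to two participants, MCU beyond that.
MAX_SFU_PARTICIPANTS = 2


def choose_topology(participants: int) -> Topology:
    """Pick the media topology for the current number of participants."""
    if participants < 1:
        raise ValueError("a call needs at least one participant")
    return Topology.SFU if participants <= MAX_SFU_PARTICIPANTS else Topology.MCU


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, choose_topology(n).value)
```

In a real system, a function like this would be re-evaluated whenever someone joins or leaves, so a call that drops back to two participants could return to SFU.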
Is there a solution for group video calls with a high number of participants? Indeed, there is!
Multipoint Control Unit (MCU)
The MCU mixes the streams of all participants: it decodes every incoming stream, combines them into a single stream, re-encodes it and sends it out to all participants. Each client therefore receives only one common stream, which significantly reduces bandwidth usage.
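The mixing step is easiest to picture for audio. The sketch below is a simplified, hypothetical illustration: it assumes the MCU has already decoded each participant’s audio into lists of 16-bit PCM samples and shows only the “mix into one” step. A real MCU would also handle decoding, resampling, video composition and re-encoding; `mix_pcm16` is an illustrative name.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(streams: list[list[int]]) -> list[int]:
    """Mix several decoded 16-bit PCM audio streams into one.

    Samples at the same position are summed and clipped to the int16 range.
    Shorter streams are treated as silence once they run out.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


if __name__ == "__main__":
    print(mix_pcm16([[1000, -2000, 30000], [500, 500, 10000]]))  # [1500, -1500, 32767]
```

Clipping keeps the sum valid but can distort loud passages; production mixers often scale or compress the signal instead.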
An unlimited number of people can join the group video call without any loss of the all-important audio and video quality. Overall call quality also improves, because additional central processing such as noise filtering, echo reduction and image processing can easily be done on the server as well.
It also keeps bandwidth consumption low and stable. eyeson uses this technology whenever more than two people are in the video call: it offers its customers the SFU topology for up to two people and switches automatically to the MCU topology as soon as additional participants join.
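Putting the three topologies side by side, the number of streams each client handles can be summarized in a few lines. This is a simplified model, not a measurement: it counts one combined audio+video stream per participant, assumes an SFU client subscribes to everyone (the worst case) and ignores any per-stream quality adaptation. The function name is illustrative.

```python
def streams_per_client(topology: str, participants: int) -> tuple[int, int]:
    """Return (streams sent, streams received) for one client."""
    others = participants - 1
    if topology == "mesh":
        return others, others  # one copy to, and one stream from, each peer
    if topology == "sfu":
        return 1, others       # upload once; the SFU forwards everyone else's stream
    if topology == "mcu":
        return 1, 1            # upload once; receive a single mixed stream
    raise ValueError(f"unknown topology: {topology!r}")


if __name__ == "__main__":
    for n in (2, 5, 10):
        row = {t: streams_per_client(t, n) for t in ("mesh", "sfu", "mcu")}
        print(n, row)
```

Only the MCU row stays constant as the call grows; the cost moves to the server, which has to decode, mix and re-encode every stream.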
So why don’t Skype, FaceTime, Hangouts or Zoom use MCU technology as well?
Although scalability is the most important factor in video conferencing, the MCU topology requires a lot of expertise and server-side computing resources, and it is considerably more complex. Even so, it clearly beats mesh and SFU technology whenever more than two people are in a video call.