Video conferencing is a great way to communicate with your colleagues, partners, and clients. Online meetings are more engaging than traditional phone calls, email conversations or instant messages and can really boost your team's productivity. However, video meetings are also more demanding: they place higher requirements on both your video conferencing endpoints and your communication channels.
Bandwidth is often seen as the most important factor in a successful video conference, and we are used to judging video conferencing quality by bandwidth alone. However, this is only part of the picture. The connection speed can change rapidly during a meeting, dropping or shifting depending on the transmission mode, while video conferences critically need data streams that are stable, smooth and predictable.
A video conferencing system can easily adjust its bandwidth from 64 kbit/s to 4 Mbit/s depending on the conferencing mode and the signal quality of the participants. It is much more difficult to adapt the stream to the constantly changing network conditions of each conference participant.
Video conferencing architecture and its ability to operate under constantly changing conditions play a key role in ensuring video conferencing quality. Here are some common video conferencing challenges that may negatively affect your meetings:
How can you prevent the most common video conferencing challenges? The simplest, and the most expensive, solution is to put fixed restrictions on both the hardware and the network resources of your video conferencing system.
Fortunately, technology is evolving fast, and modern video conferencing systems can maintain good connection quality under widely varying conditions thanks to advanced software architecture.
In any group video conference, data has to be transmitted between participants in some defined way. Since direct connections between conference participants are rarely practical because of the most common video conferencing challenges, we need to consider a system that supports a star topology and acts as an intermediary, i.e. a video conferencing server.
Video conferencing solutions used to be divided into two categories: software and hardware. This approach has been considered outdated since 2014, for two reasons. First, the clear separation between hardware and software solutions has simply disappeared: there are hardware systems that use a typical software architecture (switching and SVC) and software systems with a built-in MCU. Second, leading video conferencing vendors tend to deliver their video infrastructure as software in a virtualized environment.
In a mixing (MCU) architecture, the server receives a stream from each participant during a video conference, decodes it and reduces its resolution, creates a new image of the required quality and resolution for each participant (adjusted for the common video conferencing challenges), encodes the resulting stream and sends it. All these stages require massive computational power, add processing delay on the server and may impair video quality as a result of recompression. The scalability of such an architecture is extremely low, even considering its virtualization capabilities, so the price of such an infrastructure is extremely high and hard to justify.
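To illustrate why the mixing approach is computationally heavy, the following minimal sketch counts the media operations a mixing server performs for one conference. The function name `mcu_operations` and the assumption that every participant receives an individually composed layout are illustrative simplifications, not a description of any particular product.

```python
def mcu_operations(participants: int) -> dict:
    """Count media operations a mixing server performs for one conference.

    Simplifying assumption: every participant receives an individually
    composed layout, so the server encodes one outgoing stream per participant.
    """
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    return {
        "decodes": participants,       # one incoming stream per participant
        "compositions": participants,  # one individual layout per participant
        "encodes": participants,       # one recompressed stream per participant
    }


for n in (3, 10, 25):
    print(n, mcu_operations(n))
```

Every one of these operations runs on the server for the whole duration of the call, which is why the load grows with each participant who joins.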
A classic example of the switching (multiplexing) architecture is a software video conferencing system, such as Skype. Unlike an MCU, the video conferencing server does not recompress the video; instead it creates copies of the incoming streams and sends them to the other participants "as is". Thus, each endpoint receives several streams in full quality and cannot display them all simultaneously at their original resolution. The endpoint has to either reduce the resolution of each incoming video stream on its own side or request the other side to reduce it before sending, which affects both the video quality and the bandwidth requirements for all other participants.
This approach has one particular advantage: the infrastructure is not resource-demanding, and even an ordinary PC can run hundreds of such conferences simultaneously. However, the disadvantages outweigh it: an endpoint (usually an ordinary PC) has to decode several streams simultaneously, and the video server requires several times more outgoing channel bandwidth to send all the copies of the streams it creates.
Under real-world conditions, the result is a system that can hardly hold a video conference with more than 3 participants and that degrades video quality for everyone when a mobile device cannot process the original-quality video it receives from the other participants.
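The bandwidth cost of copying streams "as is" can be estimated with simple arithmetic. The sketch below is a rough model only: the function name `switching_load_kbps` is hypothetical, and it assumes that every participant sends one stream at the same bitrate and receives an unmodified copy of every other participant's stream.

```python
def switching_load_kbps(participants: int, stream_kbps: int) -> dict:
    """Estimate link load in a switching (multiplexing) conference.

    Assumes every participant sends one stream of `stream_kbps` and
    receives an unmodified copy of every other participant's stream.
    """
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    others = participants - 1
    return {
        "server_in": participants * stream_kbps,            # one upload per participant
        "server_out": participants * others * stream_kbps,  # N-1 copies sent to each participant
        "endpoint_down": others * stream_kbps,              # what each endpoint must receive
        "streams_decoded_per_endpoint": others,             # what each endpoint must decode
    }


print(switching_load_kbps(participants=4, stream_kbps=1000))
```

With four participants sending 1000 kbit/s each, the server receives 4000 kbit/s but has to send 12000 kbit/s, and every endpoint downloads and decodes three full-quality streams.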
The SVC-based (Scalable Video Coding) architecture offers all the advantages of the mixing approach and avoids all the drawbacks of multiplex-based systems. It is affordable and easily scalable, and it runs on any platform thanks to advanced signal processing and data compression technologies.
Here’s what SVC-based architecture does: an endpoint compresses its video stream in layers, and each additional layer adds resolution, quality and frame rate (FPS). If the channel between an endpoint and a video conferencing server provides high bandwidth, the endpoint sends the maximum number of layers. An SVC stream differs in bandwidth from a comparable non-SVC stream by only 15-20%, and it requires much less bandwidth than the switching approach.
After receiving an SVC stream with layers, the video conferencing server cuts off excess layers without transcoding, simply by discarding the corresponding data packets. In this way it creates an individual set of streams for each participant of a group video conference on the fly, in accordance with that participant's actual connection conditions, available resources, requested layout, screen resolution, etc. This, in turn, makes the conference highly resilient to changing conditions.
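The layer-dropping step can be sketched as follows. This is a minimal illustration, not a real SVC implementation: the `Layer` class, the `select_layers` function, the layer names and all bitrates are hypothetical, and the selection is reduced to a single bandwidth budget per recipient.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int  # bitrate this layer adds on top of the layers below it


def select_layers(layers: List[Layer], budget_kbps: int) -> List[Layer]:
    """Keep the longest run of layers, base layer first, that fits the budget.

    An enhancement layer is only usable together with every layer below it,
    so selection stops at the first layer that does not fit.
    """
    kept: List[Layer] = []
    used = 0
    for layer in layers:
        if used + layer.kbps > budget_kbps:
            break
        kept.append(layer)
        used += layer.kbps
    return kept


# Hypothetical layered stream sent by one endpoint.
stream = [Layer("base-180p", 150), Layer("enh-360p", 350), Layer("enh-720p", 1000)]

# Hypothetical downlink budgets of three recipients, in kbit/s.
for recipient, budget in {"desktop": 2000, "phone": 600, "weak-link": 200}.items():
    print(recipient, [layer.name for layer in select_layers(stream, budget)])
```

The server never decodes or re-encodes anything here: it only decides which layers to forward, which is why the per-recipient adaptation is cheap.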
Standard data transfer protocols are used to hold video conferences between software systems and hardware endpoints from third-party manufacturers.
Compression and playback of video and audio during a video session are handled by audio and video codecs.