What is Video Conferencing?

Video conferencing is a communication session between two users or a group of users, regardless of their location, in which the participants can see and hear each other in a way determined by the type of video conference.

Video conferencing requires special tools: hardware- or software-based solutions for meeting rooms, PCs, mobile devices and browsers.

Various peripherals provide participants with audio and video: cameras, screens, microphones, speakerphones, headsets, conferencing systems and projectors. Either an enterprise network or the public Internet can serve as the communication medium.

Existing video and audio codecs, specialized network protocols and various signal processing algorithms ensure high-quality communication over virtually any communication channel.

During a video conferencing session, participants often want to share various media. For this purpose, video conferencing systems offer tools for sharing the screen or individual windows and for capturing and transferring presentations and documents of various formats to remote participants. This is achieved with special software, additional cameras (e.g., document cameras) and capture of the video output of laptops, PCs and other systems, including medical complexes.

To sum up: video conferencing is a modern high-tech communication tool that increases business efficiency, optimizes business processes, accelerates decision-making and saves money on travel.

Types of Video Conferencing

There are two main types of video conferencing: personal video conferencing and group conferences. Personal video conferencing is a video session between two users; group conferencing covers all other types of conferences. The well-established rules that determine how each party's video is displayed are called conferencing modes. Let's gain insight into the matter.

Video Call 1-on-1

A 1-on-1 video call involves two participants, who can see and hear each other simultaneously. Note that users can use various collaboration tools during a video conferencing session of any type, e.g. text messaging, file transfer, and sharing of presentations and other media.

Symmetric Video Conference

Also called a continuous presence video conference, a symmetric video conference involves more than two participants who can see and hear each other simultaneously. This type of video conferencing provides full-duplex communication; in other words, it reproduces a roundtable meeting where all participants have equal rights. Group video conferencing is perfect for meetings that require the maximum involvement of all participants.

Voice-Activated Switching

With voice-activated switching (VAS), all participants see and hear only the active speaker, while the speaker sees either himself or the previous speaker. The technology can be implemented in various ways, but the idea behind VAS remains the same: the video conferencing server monitors the voice activity of the participants and switches the focus to the current speaker. This mode has considerable disadvantages, e.g. false switching triggered by noise, coughing or incoming mobile calls.
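
As a rough illustration of the idea, here is a minimal Python sketch (all names and thresholds are hypothetical, not taken from any particular product): the server tracks per-participant voice levels and switches focus only after a new speaker has been the loudest for a short hold time, which helps suppress false switches caused by coughing or a ringing phone.

```python
import time

class VoiceActivatedSwitcher:
    """Toy voice-activated switching (VAS) selector.

    Assumes audio energy levels are already measured per participant
    and normalized to the 0.0-1.0 range.
    """

    def __init__(self, level_threshold=0.3, hold_seconds=1.5):
        self.level_threshold = level_threshold  # minimum level that counts as speech
        self.hold_seconds = hold_seconds        # new speaker must stay loudest this long
        self.active_speaker = None
        self._candidate = None
        self._candidate_since = 0.0

    def update(self, levels, now=None):
        """levels: dict {participant_id: audio_level}; returns the active speaker."""
        now = time.monotonic() if now is None else now
        loudest, level = max(levels.items(), key=lambda kv: kv[1], default=(None, 0.0))

        if loudest is None or level < self.level_threshold:
            # Nobody is speaking loudly enough: keep showing the current speaker.
            self._candidate = None
            return self.active_speaker

        if loudest == self.active_speaker:
            self._candidate = None
            return self.active_speaker

        # A different participant is loudest: switch only after the hold time,
        # so a cough or a ringing phone does not steal the focus.
        if loudest != self._candidate:
            self._candidate, self._candidate_since = loudest, now
        elif now - self._candidate_since >= self.hold_seconds:
            self.active_speaker, self._candidate = loudest, None

        return self.active_speaker
```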

Role-based Meeting

There are two types of participants: speakers and listeners, and any listener can become a speaker with the permission of the conference host. The host appoints speakers and can remove them from the podium at any time.

Video Conferencing for Distance Learning

A special mode that enables the speaker (the instructor) to see and hear all students, while the students see and hear only the instructor. This way, students can focus on the learning process without being distracted by other students, and the instructor is able to monitor them.

Streaming

Streaming is a conference mode in which the speaker broadcasts to a wide audience but cannot see or hear the listeners; the other participants can see and hear only the speaker. Feedback is available only through text chat. A noticeable latency of up to several seconds is often introduced between the speaker and the viewers in order to smooth out changes in network conditions.
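
To illustrate why that delay helps, here is a small Python sketch of a playout buffer (illustrative only; the class and its parameters are invented for this example): the viewer's player deliberately stays a few seconds behind the live edge, so short drops in throughput drain the buffer instead of freezing the video.

```python
from collections import deque

class PlayoutBuffer:
    """Toy playout buffer for a live stream (illustrative only).

    The player holds back `target_delay` seconds of media before starting,
    so temporary network slowdowns drain the buffer instead of freezing video.
    """

    def __init__(self, target_delay=3.0):
        self.target_delay = target_delay
        self.buffered = 0.0          # seconds of media currently buffered
        self.segments = deque()      # (duration_seconds, payload) pairs
        self.playing = False

    def on_segment_received(self, duration, payload=None):
        self.segments.append((duration, payload))
        self.buffered += duration
        if not self.playing and self.buffered >= self.target_delay:
            self.playing = True      # start playback only once the cushion is full

    def tick(self, elapsed):
        """Advance playback by `elapsed` seconds of wall-clock time."""
        if not self.playing:
            return
        while elapsed > 0 and self.segments:
            duration, payload = self.segments[0]
            consumed = min(duration, elapsed)
            elapsed -= consumed
            self.buffered -= consumed
            if consumed >= duration:
                self.segments.popleft()
            else:
                self.segments[0] = (duration - consumed, payload)
        if not self.segments:
            self.playing = False     # buffer underrun: wait until the cushion refills
```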

Equipment for Video Conferencing

Depending on where and how you join a conference, you will need different peripherals.

Video Conferencing in a Meeting Room or a Conference Hall

Equipping a meeting room for high-quality video conferencing involves many details, all of which affect the cost of video communication. The first thing to take care of is choosing the right sound reinforcement system. If the room is small, one or a few speakerphones will be enough (a speakerphone is a device with one or several built-in microphones and speakers, designed to suppress noise and echo).

You will also need a PTZ camera that can pan, tilt and zoom and switch between the speakers and the audience in both manual and automatic modes. It is recommended to use two large LCD screens: one to display the picture from the PTZ camera and the remote participants, the other for presentations and other content.

There are also other aspects of great significance, such as lighting, walls painted in contrasting but not bright colors, sound-absorbing panels, etc. As can be seen, the cost of equipping a meeting room may differ considerably depending on the video solution, peripherals and furnishing.

Video Conferencing in the Workplace

There are many all-in-one video conferencing products that include everything you might need to hold a conference, but they take up a lot of space. That is why, for various reasons including cost, an ordinary PC is often used as a video conferencing endpoint: with the right peripherals, it performs as well as specialized hardware.

To conduct video conferences on a PC you will need a good-quality webcam (see the list of recommended equipment; unfortunately, most cameras built into all-in-one PCs and laptops are of insufficient quality for video conferencing) and a headset (preferably a USB headset) or a portable speakerphone that connects to the PC via USB.

Mobile Video Conferencing

One of the advantages of video conferencing is its mobility: you can conduct video conferences even while travelling or on the go. A smartphone, a tablet, a PC or even a smartwatch can serve as a video conferencing endpoint. All you need to do is install a special application.

The manufacturers of these devices have taken care of everything else: front camera, powerful CPU, hardware support for video codecs (that is also required for watching movies or YouTube videos), high-quality speaker and microphone. Mobile video conferencing will enable you to always be in touch with your colleagues, business partners and relatives, regardless of location.

On the other hand, mobile conferencing comes with certain difficulties that must be resolved before it becomes as convenient and popular as conferencing from an ordinary PC.

What Affects the Quality of Video Conferences?

Unlike ordinary electronic communications such as email and messaging, video conferencing belongs to real-time communications, which place increased requirements on both the video conferencing endpoints and the communication channels that connect them.

We normally evaluate connection quality by bandwidth, which is not quite appropriate for video conferencing. The advertised speed can change rapidly over time, drop under heavy load and differ radically between the transmit and receive directions. This matters critically for video conferencing, where smoothness and predictability of the data streams are what counts most.

A video conferencing system can easily adjust its bandwidth across a wide range of values, from 64 kbit/s to, say, 4 Mbit/s, to match the conferencing mode and the signal quality of the participants; what is difficult is adapting the stream in real time to the changing conditions of each participant in the session.
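
As an illustration of the first, easier half of that task, the sketch below (Python, with invented thresholds) periodically re-estimates a target bitrate from measured throughput and packet loss, and clamps it to the range from 64 kbit/s to 4 Mbit/s mentioned above.

```python
MIN_BITRATE = 64_000       # 64 kbit/s, the lower bound mentioned above
MAX_BITRATE = 4_000_000    # 4 Mbit/s, the upper bound mentioned above

def next_target_bitrate(current, measured_throughput, packet_loss):
    """Very rough additive-increase / multiplicative-decrease style adaptation.

    current             -- current encoder target, bit/s
    measured_throughput -- recently measured usable throughput, bit/s
    packet_loss         -- recent packet loss ratio, 0.0-1.0
    """
    if packet_loss > 0.10:
        # Heavy loss: back off sharply.
        target = current * 0.7
    elif packet_loss > 0.02:
        # Mild loss: hold the current rate.
        target = current
    else:
        # Clean channel: probe upward, but never above what was actually measured.
        target = min(current * 1.08, measured_throughput * 0.9)

    return int(max(MIN_BITRATE, min(MAX_BITRATE, target)))

# Example: a clean 1 Mbit/s channel lets the encoder creep up from 500 kbit/s.
print(next_target_bitrate(500_000, 1_000_000, 0.0))   # 540000
```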

The architecture of the video conferencing system and its ability to operate under constantly changing conditions play the key role in the quality of video conferencing. Those conditions include the following (see the sketch after the list):

  • CPU power of the endpoints. A user might start performing resource-intensive tasks during a video conference.
  • The endpoint camera's ability to capture video. A camera may have excellent resolution yet produce a grainy picture in low light.
  • The endpoint's ability to display the video conference on its screen. For example, if a user exits full-screen mode, there is no need to send him high-resolution video.
  • Channel bandwidth between the video conferencing server and the participants. This is the most common issue: someone in the office may start downloading large amounts of data, noticeably reducing the network resources available to the video conference, or you may find yourself in a large crowd while conferencing on your smartphone and the nearest base station of your provider can no longer maintain the same speed and connection quality.
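
The sketch below (Python; all names and limits are invented for illustration) shows how the four constraints from the list might be combined: the sender simply picks the highest standard resolution that every constraint still allows.

```python
def choose_send_resolution(cpu_load, camera_height, viewer_window_height, bandwidth_bps):
    """Pick the tallest standard resolution that satisfies every constraint.

    cpu_load             -- 0.0-1.0, current CPU utilisation of the sender
    camera_height        -- native capture height of the camera, in lines
    viewer_window_height -- height of the window the receiver actually displays
    bandwidth_bps        -- usable channel bandwidth towards the receiver

    All thresholds below are illustrative, not taken from a real product.
    """
    # Rough bitrate needed for each resolution (H.264-class encoding, order of magnitude).
    ladder = [(1080, 3_000_000), (720, 1_500_000), (480, 800_000), (360, 500_000), (240, 250_000)]

    max_by_cpu = 1080 if cpu_load < 0.5 else 720 if cpu_load < 0.8 else 480

    for height, needed_bps in ladder:
        if (height <= max_by_cpu                    # endpoint CPU can encode it
                and height <= camera_height         # camera can actually capture it
                and height <= viewer_window_height  # receiver can actually show it
                and needed_bps <= bandwidth_bps):   # channel can carry it
            return height
    return 240  # fall back to the lowest rung

# A laptop with a 720p camera, a half-busy CPU and a 1 Mbit/s channel -> 480 lines.
print(choose_send_resolution(0.6, 720, 1080, 1_000_000))
```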

The simplest solution is a fixed reservation of both the hardware and the network resources of the video conferencing system; however, it is also the costliest. Fortunately, science and technology are evolving fast, and present-day video conferencing systems provide excellent connection quality under almost any conditions thanks to advanced software architecture. Let's take a closer look at this issue.

Architecture Types of Video Conferencing Systems

Conducting any group conference requires a way to transmit data between its participants. Since direct connections between participants are rarely practical (see the real-world conditions in the previous section), we will consider an option with an intermediary, let's call it a "video conferencing server": a system that supports a star topology (from the central point to each participant).

In traditional hardware video conferencing systems such servers are called MCUs (Multipoint Control Units); in software systems there is no established name. The MCU's task is switching, transcoding and processing the streams of a group video conference. The video conferencing server is the core of the video conferencing infrastructure and provides resources for the endpoints.

All solutions used to be divided into two categories, software and hardware, but by 2015 this division had become outdated. First, there are hardware solutions that combine a typical software architecture (switching and SVC) with hardware attributes (MCU-like capabilities). Second, all major vendors tend to deliver their video infrastructure as software in a virtualized environment. Today it makes more sense to compare mixing-based solutions with SVC-based ones rather than software with hardware.

Mixing-based Video Conferencing Architecture (MCU)

During a video conference, the server receives a stream from each participant, decodes it, scales it down, composes a new picture of the required quality and resolution for each participant (remember the real-world conditions described above), then encodes and sends the result. All these stages require massive computational power, add processing delay on the server and may impair video quality as a result of recompression. The scalability of such an architecture is extremely low even with virtualization, so the price of such an infrastructure is extremely high and hard to justify.
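
Schematically, one processing cycle of such a server looks like the sketch below (Python; the media operations are placeholder callables, not a real API). It makes the cost visible: every incoming stream is decoded, and a separate picture is composed and re-encoded for every receiver.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    id: str
    incoming_stream: str
    layout: str
    max_resolution: int
    max_bitrate: int

def mcu_process_tick(participants, decode, compose, encode, send):
    """One processing cycle of a mixing (MCU-style) server, schematically.

    `decode`, `compose`, `encode` and `send` stand in for the heavy media
    operations a real MCU would perform.
    """
    # 1. Decode the incoming stream of every participant: N decodes per cycle.
    raw_frames = {p.id: decode(p.incoming_stream) for p in participants}

    # 2-3. For every receiver, compose an individual layout at the quality and
    #      resolution that receiver can handle, then re-encode it: another
    #      N compositions and N encodes per cycle.
    for receiver in participants:
        others = {pid: frame for pid, frame in raw_frames.items() if pid != receiver.id}
        picture = compose(others, layout=receiver.layout,
                          resolution=receiver.max_resolution)
        send(receiver, encode(picture, bitrate=receiver.max_bitrate))

if __name__ == "__main__":
    people = [Participant(i, f"stream-{i}", "grid", 720, 1_000_000) for i in "abc"]
    mcu_process_tick(
        people,
        decode=lambda s: f"frames({s})",
        compose=lambda frames, layout, resolution: f"{layout}@{resolution}:{sorted(frames)}",
        encode=lambda pic, bitrate: f"h264({pic},{bitrate})",
        send=lambda r, payload: print(r.id, "<-", payload),
    )
```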

Multiplexing-based Video Conferencing Architecture (Switching)

This is the classical approach to designing software video conferencing systems, implemented in Skype, for instance. Unlike an MCU, the video conferencing server does no recompression: it creates copies of the incoming streams and forwards them to the other participants as is. Thus each endpoint receives several full-quality streams that it cannot display simultaneously at their original resolution. The endpoint has to downscale each incoming stream on its side, or ask the sender to reduce the resolution before sending, which impairs both video quality and bandwidth requirements for all other participants.

This approach has one advantage: the infrastructure is not resource-demanding, and even an ordinary PC can host hundreds of such conferences simultaneously. But it has far more disadvantages: each endpoint (traditionally an ordinary PC) has to decode several streams simultaneously, and the video server needs several times more outgoing channel bandwidth to send all the copies of the streams.

Add the real-world conditions, and we get a system that can hardly handle more than three participants and degrades video quality for everyone as soon as one mobile participant's endpoint is unable to process the original video quality it receives from the others.
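
A back-of-the-envelope calculation (illustrative numbers only) shows where the server bandwidth goes in a switching architecture: with N participants each sending one stream, the server forwards N - 1 copies of every stream, so its outgoing traffic grows roughly as N × (N - 1).

```python
def switching_server_traffic(participants, stream_bps):
    """Approximate server bandwidth for a pure forwarding (switching) server.

    Every participant uploads one stream; the server forwards each stream
    to the other N - 1 participants without recompression.
    """
    incoming = participants * stream_bps
    outgoing = participants * (participants - 1) * stream_bps
    return incoming, outgoing

# Illustrative numbers: five participants, each sending 1 Mbit/s.
inc, out = switching_server_traffic(5, 1_000_000)
print(f"in: {inc / 1e6:.0f} Mbit/s, out: {out / 1e6:.0f} Mbit/s")  # in: 5, out: 20
```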

Scalable Video Coding (SVC)-based Video Conferencing Architecture

This type of architecture offers the advantages of the mixing approach without the weaknesses of multiplexing-based systems. It is inexpensive, easily scalable and runs on any platform. This has become possible thanks to advances in signal processing and data compression technologies.

The idea is that an endpoint compresses its video stream in layers: each additional layer adds video resolution, quality and frame rate. If the channel between an endpoint and the video server provides high bandwidth, the endpoint sends the maximum number of layers. Note that a layer is not a separate lower-quality video stream but the incremental difference between its quality level and the previous layer. As a result, an SVC stream differs from a non-SVC stream by only 15-20% in bandwidth, yet requires much less bandwidth on the server than the switching approach.

After receiving a layered SVC stream, the video server simply cuts off excess layers without transcoding, dropping data packets according to certain rules. It thus creates an individual set of streams on the fly for each participant of a group video conference, in accordance with that participant's actual connection conditions, available resources, requested layout, screen resolution, etc. All of this brings great resilience.
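
The layer-dropping step can be sketched as follows (Python; the layer ladder and its bitrates are invented for illustration): for each receiver the server keeps only the layers that fit that receiver's bandwidth and tile size, and never re-encodes anything.

```python
# Hypothetical SVC layer ladder: each layer adds resolution or frame rate on top
# of the previous ones and costs some extra bandwidth (numbers are illustrative).
LAYERS = [
    {"name": "base 360p@15", "cumulative_bps": 300_000, "height": 360},
    {"name": "+360p@30",     "cumulative_bps": 450_000, "height": 360},
    {"name": "+720p@30",     "cumulative_bps": 1_200_000, "height": 720},
    {"name": "+1080p@30",    "cumulative_bps": 2_500_000, "height": 1080},
]

def select_layers(receiver_bandwidth_bps, receiver_tile_height):
    """Return the layers the server should forward to one receiver.

    The server never re-encodes anything: it simply stops forwarding the
    layers that exceed the receiver's bandwidth or requested tile size.
    """
    selected = []
    for layer in LAYERS:
        if layer["cumulative_bps"] > receiver_bandwidth_bps:
            break
        if layer["height"] > receiver_tile_height:
            break
        selected.append(layer["name"])
    return selected or [LAYERS[0]["name"]]  # always keep at least the base layer

# A phone on a weak link showing a small tile gets only the base layer;
# a desktop on a good link gets everything up to 720p for a 720-pixel tile.
print(select_layers(400_000, 360))     # ['base 360p@15']
print(select_layers(3_000_000, 720))   # ['base 360p@15', '+360p@30', '+720p@30']
```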

The Use of Advanced Protocols and Codecs

Standard data transfer protocols make it possible to hold video conferencing sessions between participants using software and hardware from different third-party manufacturers.

  • H.239 is a communication protocol supporting two media streams from different sources. It is suitable for conducting video conferences where the picture is displayed on two different screens (e.g., two screens in a meeting room, one displaying the presenter and the other displaying the presentation).
  • H.323 is a protocol suite for multimedia communication over networks with non-guaranteed bandwidth, used in both personal and group video conferencing.
  • SIP is a network protocol for connecting client applications from different vendors. SIP has largely replaced H.323 and is used in video conferencing and IP telephony.

Compression and playback of video and audio during a video session is carried out through the use of audio and video codecs.

  • H.264 is a video compression standard providing a high compression level while preserving the original quality.
  • H.264 Scalable Video Coding (SVC) is an extension of H.264 that transmits video as several layered streams and can compensate for missing data, making it resistant to network errors such as packet loss.
  • H.265 is a video compression standard featuring more efficient encoding algorithms than H.264. The key strengths of this codec include increased resistance to data packet loss during media transfer and minimal signal delay during video conferences. The standard supports UltraHD formats: 4K and 8K.
  • Opus is an audio compression codec with excellent performance that copes well with changes in the Internet connection during a session.
  • G.722.1 Annex C is a broadband audio signal compression standard.
  • VP8 is a video codec with fast decoding and increased resistance to frame loss.
  • VP9 is an open-source video compression standard. It was initially designed to improve on the VP8 and H.265 codecs: compared with VP8, the developers' main objective was to decrease the bitrate by 50% with no loss of video quality; compared with H.265, the goal was to achieve better compression efficiency of the video stream.

Bottom Line

We recommend carefully reviewing the characteristics when choosing a video conferencing system and selecting the one that requires minimum expenses for implementation, scaling and maintenance (i.e. the lowest TCO). As of today, all software SVC-based video conferencing servers meet those requirements.