Understanding Audio Networks

The world of audio data networking is often clouded by conflicting information and opinions from proponents of proprietary systems, who often feel their approach is the best for all users. As a result, clear, unbiased information can be hard to find, and what is available is often so filled with jargon and deep technical detail that readers are left bewildered.

Yet in choosing a networking system, there are no easy answers. The needs of live audio users can range from simply sharing mic/line feeds between a central stagebox and the FOH/monitor consoles, to systems that add parameter control, allowing a system tech to remotely monitor and tweak amplifiers/crossovers/routing/levels/zones/room combining and more. Depending on the system, some or all of this may be accomplished by little more than patching into a hub via a Cat-5 feed, or using a WiFi-enabled laptop or tablet.

With the goal of presenting what are sometimes difficult concepts in straightforward, approachable terms, we present this article, which we adapted for sound reinforcement users from The Calrec Audio Network Primer, an excellent document created by broadcast console manufacturer Calrec. For those interested, the complete 40-page version is available as a PDF from the company’s website at www.calrec.com.

Networking Evolution

Forty years ago, audio systems consisted of components and microphones hard-wired into discrete hardware, patchbays and mixing consoles routed to specific destinations (P.A. or recorders) in a single location. In such systems, a separate physical connection is required for each audio channel. Modern audio networking offers more flexibility; all of the hardware connects to a data network, and the precise nature of the interconnections between the equipment can be redefined or reassigned at any time under software control, remotely if required.

Over the past decade, the declining complexity and improving cost-to-benefit ratio of implementing large-scale, networked audio systems, together with the ever-widening capabilities of existing IT network technology, control protocols and equipment, have tempted more and more audio users to move to networked systems.

Today’s data networks have sufficient bandwidth to do much more than route audio around. It’s now theoretically possible to send hundreds of channels of audio, video and hardware control signals over a modern network. The technology is not yet quite at the stage where all of these signals interface seamlessly with one another, but progress is moving swiftly in that direction.

Interoperability — The Holy Grail

Throughout the world, we take access to electricity for granted, yet the irritating adaptors and transformers we carry from continent to continent are a small reminder that even today, we’re still far from having a universal standard.

International data networks have been with us for far less time, so it should be no surprise to learn that IT infrastructures which were originally designed to handle office-based data transport are far from optimal for routing, mixing, processing and controlling real-time multichannel audio and video.

That’s not to say widespread Ethernet-based networking technology can’t be developed to form the basis of audio networks; this is already a reality. The problem is, there are already many proprietary ways to do this, and there is no single standard for handling audio across a network.

The Holy Grail is a networking standard that would allow the use of a single high-capacity network, with audio data and equipment monitoring/control protocols, together with some kind of system management, with all the equipment being able to talk to all of the control protocols on the network irrespective of its manufacturer.

This shining goal is known as “interoperability.” The good news is that cross-manufacturer standards to facilitate it are in development, and later, we will look at two of the most promising, AVB and Ravenna.

Flexibility Galore

Given the bandwidth of today’s networks, which typically allow hundreds of audio channels to be passed down a single connection, there’s no need to stop at interconnecting a pair of consoles and their associated I/O. This scenario is fairly common, especially in large events where audio feeds are shared between the FOH and monitor consoles, as well as a recording system and/or one or more broadcast trucks. Why not link many consoles together with a standalone audio network router, and thus allow several mixers to freely swap audio channels? This is the basis for a star network, with several consoles interconnected to a stand-alone router.

Such things can be done with analog or digital tie lines, but a vast amount of expensive wiring is required. This is not a concern with networked audio, given you can route several hundred channels of high-resolution audio down a single inexpensive Ethernet-style network cable.
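The "several hundred channels" figure can be checked with simple arithmetic. The sketch below is only an estimate: the 48 kHz / 24-bit format and the assumption that 75 percent of the link rate is left for audio payload after packet overhead are illustrative choices, not figures from any particular protocol.

```python
def raw_audio_bitrate(channels, sample_rate_hz=48_000, bits_per_sample=24):
    """Raw PCM payload rate in bits per second, ignoring all packet overhead."""
    return channels * sample_rate_hz * bits_per_sample


def max_channels(link_bps, sample_rate_hz=48_000, bits_per_sample=24,
                 usable_percent=75):
    """Estimate how many channels fit on a link.

    usable_percent is an assumed share of the link rate left for audio
    payload after headers and other traffic; real protocols differ.
    """
    per_channel_bps = sample_rate_hz * bits_per_sample
    usable_bps = link_bps * usable_percent // 100
    return usable_bps // per_channel_bps


if __name__ == "__main__":
    print(raw_audio_bitrate(64))          # 64 channels of 48 kHz / 24-bit audio
    print(max_channels(1_000_000_000))    # Gigabit Ethernet
    print(max_channels(100_000_000))      # 100 Mbit Ethernet
```

Under these assumptions a Gigabit link has room for several hundred channels, while a 100 Mbit link tops out in the tens, which is consistent with the channel counts quoted above.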

But a star network is only the beginning. Once multiple consoles are networked, far more complex routing possibilities can be achieved. Creation of a network allows something more like the modified star structure shown below in Fig. 1. Here an assortment of I/O interfacing boxes in a central location, such as a stage, is shared by all of the different consoles (FOH/monitor/recording/broadcast), all connected via the central stand-alone router.

Likewise, it’s considerably less costly to design fail-safe systems if your audio is part of a network. Doing this with traditional analog or digital connections requires twice the amount of expensive cabling and a lot of complicated cable splits. With networked audio, the entire output of a production can be duplicated on a few Ethernet-style IT cables.

Network Protocol Layers

A number of proprietary standards have been developed for transmitting audio over IT networks. Some are better than others, but all have been designed to deal with the requirements for delivering hi-res audio over a network, which are considerably more stringent than those for IT-related data.

IT networking protocols for data transfer (such as the ubiquitous Ethernet) are asynchronous, meaning that the order in which data arrives is not so important as long as it arrives eventually. However, in a live audio context, the audio feed usually consists of many channels of high-res audio, all of which must be kept in sync with respect to each other and which have to be delivered in real time.
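To see why real-time delivery is demanding, it helps to work out how much audio each packet carries. The following is a rough sketch: the packet sizes and the three-packet receive buffer are illustrative assumptions, not values taken from any protocol discussed here.

```python
def packet_time_ms(samples_per_packet, sample_rate_hz=48_000):
    """Duration of audio carried in one packet, in milliseconds."""
    return 1000.0 * samples_per_packet / sample_rate_hz


def buffer_latency_ms(packets_buffered, samples_per_packet, sample_rate_hz=48_000):
    """Delay added by a receiver that holds a few packets before playback."""
    return packets_buffered * packet_time_ms(samples_per_packet, sample_rate_hz)


if __name__ == "__main__":
    print(packet_time_ms(48))         # 48 samples per packet at 48 kHz
    print(packet_time_ms(6))          # 6 samples per packet at 48 kHz
    print(buffer_latency_ms(3, 48))   # assumed three-packet receive buffer
```

Smaller packets and shallower buffers reduce delay, but they leave less margin for packets that arrive late, which is the tension between latency and reliability that runs through the rest of this article.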

The protocols for transporting audio data over a network vary in terms of how closely (or not) they resemble IT networking data standards. Modern electronic networks are often described in terms of a notional model of up to seven layers of increasing complexity. For audio networking, the most important of these are the first four layers (which are also the most fundamental).

Layer 1 describes the basic electrical standards and voltages used to transmit data over a wired or wireless network, such as an Ethernet network. Layer 1 audio protocols use Ethernet wiring and signaling components, but do not use the Ethernet frame structure. These protocols, which include Riedel’s RockNet, Aviom’s A-Net, Gibson’s MaGIC and Calrec’s Hydra2, have the least in common with IT-style network data. They are geographically limited and require proprietary routing hardware. Although compatibility with off-the-shelf networking hardware is lost in a Layer 1 protocol, dispensing with the “higher-layer” data structures allows the development of very efficient, robust, high-performance, low-latency protocols. When coupled with the hardware required to use them, they are well suited to pro audio applications (albeit usually more expensive to implement).

Layer 2 describes the most basic unit of data used on the network. In an Ethernet network, this is the “frame” containing the electronic data. Layer 2 protocols encapsulate audio data into standard Ethernet frames. Many use standard Ethernet hubs and switches. Ethernet frames include source and destination MAC addresses to identify the source and destination device for data being transmitted. The data structure of Layer 2 protocols, which include the IEEE’s Audio Video Bridging standard (AVB), AES51, Peak Audio’s CobraNet and Digigram’s EtherSound, resembles standard Ethernet data less closely. These protocols dispense with the IP packet structure and thereby lose the ability to be routed to other standard LANs. However, they still use the Ethernet frame structure and can therefore still be routed within their network via off-the-shelf Ethernet hubs and switches.
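The frame layout described above can be sketched in a few lines. This is an illustration only: the MAC addresses are made-up locally administered values, the payload layout (two 16-bit samples) is hypothetical, and 0x88B5 is used because it is an EtherType set aside for local experimental use. The preamble, frame check sequence and minimum-size padding that real hardware adds are left out.

```python
import struct

ETHERTYPE_EXPERIMENTAL = 0x88B5  # reserved for local experimental use


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to six raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def bytes_to_mac(raw):
    """Convert six raw bytes back to 'aa:bb:cc:dd:ee:ff'."""
    return ":".join(f"{b:02x}" for b in raw)


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Prepend the 14-byte Ethernet header: destination, source, EtherType."""
    header = struct.pack("!6s6sH", mac_to_bytes(dst_mac),
                         mac_to_bytes(src_mac), ethertype)
    return header + payload


def parse_frame(frame):
    """Split a frame back into (destination, source, EtherType, payload)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return bytes_to_mac(dst), bytes_to_mac(src), ethertype, frame[14:]


if __name__ == "__main__":
    samples = struct.pack("!hh", 1000, -1000)  # hypothetical audio payload
    frame = build_frame("02:00:00:00:00:02", "02:00:00:00:00:01",
                        ETHERTYPE_EXPERIMENTAL, samples)
    print(parse_frame(frame))
```

Note that nothing in the header says which network the destination lives on. A switch can deliver the frame by MAC address within the local network, but there is no IP address for a router to act on.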

Layer 3 adds the IP addressing and subnet structure used on most Ethernet LANs (and the Internet) to uniquely identify network devices, and packages the data being transferred in standard IP packets. These are all numbered to ensure that all of the data arrives in the right order and can be accounted for. Some networking products using Layer 3 include Audinate Dante, QSC Q-LAN, Ravenna from ALC Networx and Axia Livewire. Layer 3 protocols conform more closely to the defined standards of Gigabit Ethernet (the most common network standard), including Layer 4-style packet checking spliced into Layer 3-style IP packets. These packets sit in turn within an overall Layer 2 Ethernet frame structure and adhere to the basic Layer 1 electrical definitions of Gigabit Ethernet. The structure of the data in Layer 3 protocols closely resembles that passing over a standard Gigabit Ethernet network. As a result, they can multicast data to multiple IP addresses simultaneously, as on an office network, and they can pass data via connected Ethernet bridges and routers. This potentially allows the data to be passed over a wide geographical area and not to remain locked within one Local Area Network.
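The two Layer 3 ideas mentioned above, subnets and multicast, can be illustrated with Python's standard `ipaddress` module. The addresses and the /24 subnet below are hypothetical examples, not values used by any of the products named.

```python
import ipaddress


def same_subnet(addr_a, addr_b, network):
    """True if both devices sit inside the given subnet (no router needed)."""
    net = ipaddress.ip_network(network)
    return (ipaddress.ip_address(addr_a) in net
            and ipaddress.ip_address(addr_b) in net)


def is_multicast(addr):
    """True if the address is in the IP multicast range (one-to-many delivery)."""
    return ipaddress.ip_address(addr).is_multicast


if __name__ == "__main__":
    # Hypothetical stagebox and FOH console on the same LAN
    print(same_subnet("192.168.1.10", "192.168.1.20", "192.168.1.0/24"))
    # Hypothetical broadcast truck on a different subnet: traffic must be routed
    print(same_subnet("192.168.1.10", "10.0.5.7", "192.168.1.0/24"))
    # A multicast group address versus an ordinary unicast device address
    print(is_multicast("239.1.2.3"), is_multicast("192.168.1.20"))
```

Because every packet carries these addresses, ordinary IP routers can forward the audio between subnets, which is what lets Layer 3 traffic leave the local network.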

Layer 4 adds the ability to check that the arrival of these packets has occurred in the correct order, without losses or duplication.
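As a rough illustration of this kind of checking, the sketch below inspects a list of packet sequence numbers and reports losses, duplicates and late arrivals. The function name and the simple integer numbering are assumptions for the example; real protocols use fixed-width counters that wrap around, which this sketch ignores.

```python
def check_sequence(received):
    """Classify problems in a stream of packet sequence numbers.

    Returns a dict with the numbers that never arrived, arrived more than
    once, or arrived after a higher-numbered packet had already been seen.
    """
    seen = set()
    duplicates = []
    out_of_order = []
    highest = None
    for seq in received:
        if seq in seen:
            duplicates.append(seq)
            continue
        if highest is not None and seq < highest:
            out_of_order.append(seq)
        seen.add(seq)
        highest = seq if highest is None else max(highest, seq)
    lost = []
    if seen:
        lost = [s for s in range(min(seen), max(seen) + 1) if s not in seen]
    return {"lost": lost, "duplicates": duplicates, "out_of_order": out_of_order}


if __name__ == "__main__":
    # Packet 3 arrives late and then again; packet 5 never arrives
    print(check_sequence([1, 2, 4, 3, 3, 6]))
```

For file transfers, a missing packet can simply be requested again. For live audio there is rarely time for that, so detecting the problem is usually only the first step toward concealing it.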

In practical terms, all of the audio networking technologies currently on the market are either Layer 1, 2 or 3 protocols (and all of the Layer 3 protocols contain Layer 4-style data verification capabilities). However, it would be misleading to suggest that Layer 1 protocols are the most basic and Layer 3 the most feature-rich.

“Higher-layer” protocols offer far greater compatibility with standard networking formats and allow the use of standard, affordable networking hardware. This can make installation more cost-effective and usable over a wider area, but it can also mean these protocols are less efficient and higher in latency. Moreover, because the data in “higher-layer” networks is usually passed via non-proprietary hardware that is not specifically designed to carry audio data, the reliability of these networks can be lower. However, compromises that marry the wider compatibility and greater interoperability of Layer 2 and 3 protocols with further standards designed to improve reliability are under development.

Routes To Interoperability

Earlier, we touched on the idea of interoperability: the concept of data being shared freely between video and audio equipment. Over the next few years, more pro audio manufacturers will produce equipment that will interface with common transports, although there is still much work to do. A number of manufacturers are working together to encourage the development of true interoperability between different systems but, as is often the case when an industry tries to establish standards, there are already several approaches.

AES-X192

An Audio Engineering Society standards task group called SC-02-12-H has been formed to develop an interoperability standard for high-performance professional digital audio IP networking. This project has been designated AES-X192 and is partially inspired by an EBU initiative called N/ACIP, which published interoperability recommendations for audio over wide-area IP networks.

This AES initiative focuses on higher-performing networks that allow high-quality, high-capacity and low-latency digital audio transport. A number of systems shipping and under development offer the targeted capabilities (AVB, Dante, Livewire, Q-LAN and Ravenna), and the aim of this initiative is to identify common approaches and protocols and to suggest and standardize a means for interoperability between the systems.

The AVnu Alliance and AVB

The AVnu Alliance promotes Audio Video Bridging (AVB) as a brand with a view to establishing complete interoperability between manufacturers, and is developing an ecosystem with a clear accreditation scheme (i.e., an “AVB Approved” label). Founded by a handful of companies including Harman, Cisco and Intel, the Alliance has members including Avid, Beyerdynamic, Bosch, Bose, Calrec, Dolby, Focusrite, LabX, Meyer Sound, Powersoft, Riedel, Sennheiser, Shure, TC Group, Waves and Yamaha.

AVB is a Layer 2 protocol that supports various channel and latency options on 100 Mbit or Gigabit Ethernet. The unique innovation of AVB is that it makes use of a number of extensions to the Ethernet standard (part of the IEEE 802.1 family of standards) that are designed to support real-time streaming services.

It’s a common misconception that AVB is an “audio over IP” protocol. In fact, as a Layer 2 protocol, it uses Ethernet frames rather than IP packets to transport data. AVB networks are geographically limited to their local network, and cannot extend across routers or bridges. Furthermore, the benefits of the IEEE 802.1 extensions are only felt if the infrastructure explicitly supports them, which requires specially manufactured AVB switches and hubs.

The trade-off is the guarantee that what you put in is what you get out, and this reliability and predictability are attractive to pro users. Other useful functions include AVB’s AVDECC (Discovery, Enumeration, Connection management and Control) protocol, which presents a view of all AVB devices on the network as soon as a connection is established. If AVB achieves a commercial critical mass, it promises a convenient technology for connecting various devices from different manufacturers across a common network infrastructure.

Ravenna

Proposed by ALC Networx at the 2010 IBC show as “a technology for real-time transport of audio and other media data in IP-based network environments,” Ravenna is an open technology standard without a proprietary licensing policy. As such, it encourages partners to participate in ongoing development.

Ravenna is a Layer 3 protocol. Although intended for an Ethernet infrastructure, its use of IP packets abstracts it from the underlying network fabric, extending its reach beyond LANs to public networks, and even the Internet.

ALC Networx has attempted to address the issue of interoperability by forming a Ravenna ecosystem, and has already signed up a healthy collection of well-respected manufacturers as developers, including AEQ, Digigram, Genelec, Lawo, LSB, Merging Technologies, Neumann, Schoeps, Sonifex, Telos and Linear Acoustics.

The Future of Networks

Clearly, networking technology is on the brink of delivering unparalleled audio, control and media integration, even if some aspects still remain just out of reach. Even half a decade ago, such claims would have been dismissed as misty-eyed exaggeration, but today the scale of development is bringing an interoperable future closer to reality.
