MCU stands for Multipoint Control Unit, also expanded as Multipoint Conferencing Unit.
An MCU is a media server that implements the mixing architecture for multiparty communication. It receives media streams from all participants, decodes them, composes them into a mixed output, re-encodes the result, and sends a single mixed stream back to each participant. The mix can differ per participant; for example, each participant's audio mix excludes their own voice.
How MCU works
In an MCU-based conference:
- Each participant sends a single audio and video stream to the MCU
- The MCU decodes all incoming streams
- For video, the MCU composites multiple video feeds into a single layout (e.g., grid or speaker-focused)
- For audio, the MCU mixes the audio tracks into a single output per participant, leaving out that participant's own audio (a "mix-minus")
- Each participant receives a single mixed stream, so the client only has to decode and render one stream
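The compositing and mixing steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular MCU is implemented: the function names `grid_layout` and `mix_minus`, the 1280x720 canvas, and the use of plain lists of 16-bit PCM samples are all assumptions made for the example. A real MCU would work on decoded video frames and audio buffers inside a media pipeline.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def grid_layout(num_tiles, canvas_w=1280, canvas_h=720):
    """Return one (x, y, w, h) rectangle per video feed, in a near-square grid.

    Hypothetical helper: the canvas size and the grid policy are assumptions.
    """
    if num_tiles <= 0:
        return []
    cols = math.ceil(math.sqrt(num_tiles))
    rows = math.ceil(num_tiles / cols)
    tile_w, tile_h = canvas_w // cols, canvas_h // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(num_tiles)
    ]


def mix_minus(frames):
    """Mix decoded audio so that nobody hears their own voice.

    frames maps a participant id to a list of int16 PCM samples; all lists
    are assumed to have the same length. The result maps each participant id
    to the sum of everyone else's samples, clipped to the int16 range.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))

    # Sum all participants once, then subtract each participant's own samples.
    total = [0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample

    mixes = {}
    for pid, samples in frames.items():
        mixes[pid] = [
            max(INT16_MIN, min(INT16_MAX, total[i] - samples[i]))
            for i in range(length)
        ]
    return mixes
```

For four participants, `grid_layout(4)` yields four 640x360 tiles. Summing once and subtracting each participant's own samples keeps the audio mixing cost linear in the number of participants, at the price of hard clipping when many people speak at once.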
MCU vs SFU
| Aspect | MCU | SFU |
|---|---|---|
| Server CPU | Very high (decode + encode) | Low (forward only) |
| Client CPU | Low (single stream) | Higher (multiple streams) |
| Client bandwidth | Low (single stream) | Higher (multiple streams) |
| Scalability | Limited (expensive) | Better (cheaper per user) |
| Layout flexibility | Server-controlled only | Client-controlled |
| E2EE | Not possible (server decodes) | Possible via Insertable Streams (now standardized as WebRTC Encoded Transform) |
| Latency | Higher (processing) | Lower (forwarding) |
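The CPU and bandwidth rows of the table follow from simple stream counting. The sketch below is a rough model under stated assumptions: every participant sends exactly one stream, everyone receives everyone else, the SFU uses no simulcast, and the MCU encodes one mix per participant. The names `ConferenceLoad`, `mcu_load`, and `sfu_load` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConferenceLoad:
    server_decodes: int       # streams the server must decode
    server_encodes: int       # streams the server must encode
    server_streams_out: int   # streams the server sends in total
    client_streams_up: int    # streams each client sends
    client_streams_down: int  # streams each client receives and decodes


def mcu_load(n):
    """MCU: decode every incoming stream, encode one mix per participant."""
    return ConferenceLoad(
        server_decodes=n,
        server_encodes=n,
        server_streams_out=n,
        client_streams_up=1 if n > 0 else 0,
        client_streams_down=1 if n > 0 else 0,
    )


def sfu_load(n):
    """SFU: no transcoding, forward each stream to every other participant."""
    others = max(n - 1, 0)
    return ConferenceLoad(
        server_decodes=0,
        server_encodes=0,
        server_streams_out=n * others,
        client_streams_up=1 if n > 0 else 0,
        client_streams_down=others,
    )
```

With five participants, this model gives the MCU five decodes and five encodes while each client handles one incoming stream; the SFU does no transcoding but forwards 20 streams, and each client decodes four.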
When to use an MCU
MCUs are still relevant in scenarios such as:
- Legacy interop: Traditional SIP/PSTN endpoints that can only handle a single stream
- Audio-only conferences: Audio mixing is computationally cheaper than video mixing
- Live broadcast: Broadcasting to a social network (Facebook Live, YouTube Live, etc.) requires a single composed media stream, which is usually delivered over an RTMP interface
- Recording: Generating a mixed recording is simpler with an MCU
However, SFUs with simulcast have largely replaced MCUs for most modern video conferencing use cases due to better scalability and lower server costs.


