Video streaming protocol

Console Port

Pad Port

Direction

50020

50120

Console ↔ Pad

The video streaming protocol, also known as vstrm, is used to stream compressed game video data from the Wii U to a GamePad, or to stream camera data from a GamePad to the Wii U. It is using H.264 with a custom wrapper around the VCL instead of NAL.

Each packet has an 8 byte header (4 is also possible but never seen), followed by an 8 byte extended header, followed by the compressed video data.

Protocol header

struct VstrmHeader {
    u32 magic : 4;
    u32 packet_type : 2;
    u32 seq_id : 10;
    u32 init : 1;
    u32 frame_begin : 1;
    u32 chunk_end : 1;
    u32 frame_end : 1;
    u32 has_timestamp : 1;
    u32 payload_size : 11;

    // If has_timestamp is set (almost always)
    u32 timestamp : 32;
};
../../_images/vstrm-header.png

This header is then followed by an 8 byte extended header.

Extended header

The extended header is an 8 byte area after the vstrm header where options can be set. Each option is represented by a byte, and options are read linearly from the extended header until either 8 bytes were read or a 0x00 byte was read (signaling the end of options). Here are the known options:

Option Byte

Option Name

Arg

Description

0x00

End

N

End of options.

0x80

IDR

N

Indicated that the frame is an IDR.

0x81

Unimplemented

Y

Takes one byte of argument but does nothing.

0x82

Frame Rate

Y

0 = 59.94Hz, 1 = 50Hz, 2 = 29.97Hz, 3 = 25Hz.

0x83

Force Decoding

N

Forces decoding even if buffer is too small.

0x84

Unset Force Flag

N

Unsets the force decoding flag.

0x85

NumMbRowsInChunk

Y

Usually 6 (6 * 16px * 5 chunks = 480px).

../../_images/vstrm-ext-header.png

Chunking and stream reconstruction

A single H.264 frame is obviously not sent in one vstrm packet: the network MTU is around 1800 bytes, making it impossible to transfer most frames. To separate a frame into several packets, the vstrm protocol does the following:

  • Cut each frame into 5 equal chunks. Each chunk represents a 854x96 region of the decoded image (6 macroblock rows).

  • Send as many packets as needed for each chunk, but the last packet of a chunk needs to have the chunk_end flag set.

On top of that, the first packet of the first chunk of a frame has the frame_begin flag set, and the last packet of the last chunk of a frame has the frame_end flag set.

H.264 configuration

TODO: see src/video-streamer.cpp.