Apple-Compliant MP4 Encapsulation with Chapter Markers and Subtitles via FFmpeg
Objective
Encode and mux video assets into MP4 files that are compatible with iOS 26+ and macOS 26+ (Tahoe).
This includes:
- Embedding subtitles in Apple’s tx3g format.
- Incorporating chapter markers with both text labels and thumbnail images.
1. Chapter Metadata Specification
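To ensure chapters are recognized, the metadata file must begin with the ;FFMETADATA1 header, and each [CHAPTER] section must give START and END as integers in units of its TIMEBASE (nanoseconds when TIMEBASE is omitted; the example below uses 1/1000, i.e. milliseconds).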
;FFMETADATA1
major_brand=isom
minor_version=512
compatible_brands=isomdby1iso2mp41
encoder=Lavf62.3.100
[CHAPTER]
TIMEBASE=1/1000
START=0
END=150483
title=01. ハジメテノオト (Hajimete no Oto)
[CHAPTER]
TIMEBASE=1/1000
START=150483
END=325658
title=02. Project DIVA desu.
[CHAPTER]
TIMEBASE=1/1000
START=325658
END=503002
title=03. ワールドイズマイン (World is Mine)
... ...
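A file like the one above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of the original workflow: the helper names to_ms, escape, and build_ffmetadata are hypothetical, and it assumes chapter starts are given as HH:MM:SS.mmm strings and that the total duration is known in milliseconds. Titles are backslash-escaped for the characters FFmpeg's metadata format treats as special (=, ;, #, \ and newline).

```python
def to_ms(timestamp: str) -> int:
    """Convert 'HH:MM:SS.mmm' to integer milliseconds (TIMEBASE=1/1000)."""
    hours, minutes, seconds = timestamp.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int((frac + "000")[:3])  # pad or truncate to exactly 3 digits
    return (int(hours) * 3600 + int(minutes) * 60 + int(whole)) * 1000 + millis


def escape(value: str) -> str:
    """Backslash-escape characters that are special in FFMETADATA values."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first, so it is not doubled twice
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(chapters, total_ms: int) -> str:
    """chapters: list of (start 'HH:MM:SS.mmm', title), sorted by start time.

    Each chapter ends where the next one starts; the last ends at total_ms.
    """
    starts = [to_ms(ts) for ts, _ in chapters]
    ends = starts[1:] + [total_ms]
    lines = [";FFMETADATA1"]
    for (_, title), start, end in zip(chapters, starts, ends):
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",
            f"START={start}",
            f"END={end}",
            f"title={escape(title)}",
        ]
    return "\n".join(lines) + "\n"


# Illustrative input reusing the first two chapters from the example file.
print(build_ffmetadata(
    [("00:00:00.000", "01. ハジメテノオト (Hajimete no Oto)"),
     ("00:02:30.483", "02. Project DIVA desu.")],
    total_ms=325658,
))
```

Writing the returned string to a UTF-8 text file gives something usable as the third input of the mux command; the global tags (major_brand, encoder, and so on) are omitted here because only the chapter sections are needed for this sketch.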
2. The Unified Encapsulation
The following command maps all streams and enforces the creation of an independent chapter track required by the iOS/macOS TV App.
ffmpeg -fflags +genpts -i "video_input.mp4" -i "subs_input.srt" -i "chapters_metadata_input.txt" \
-map 0:v -map 0:a -map 1 \
-map_metadata 2 -map_chapters 2 \
-c:v copy -c:a copy \
-c:s mov_text -metadata:s:s:0 title="JP-CN Dual" -metadata:s:s:0 language=mul \
-movflags +faststart \
"output.mp4"
Key Parameter Breakdown:
- -c:s mov_text: Converts SRT to Apple’s tx3g Timed Text format, the only subtitle format natively supported by the iOS MP4 container.
- -map_chapters 2: Explicitly maps the third input (index 2, the metadata file containing chapter info) as the source for the global chapter table. QuickTime won’t show chapters unless this is specified.
- -movflags +faststart: Relocates the moov atom to the beginning of the file, allowing “instant-play” via progressive downloading. Otherwise, the video may not start until fully loaded.
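For batch work, the same command can be assembled programmatically. The sketch below is only an illustration: build_mux_command is a hypothetical helper, the file names are placeholders, and the flags are copied unchanged from the command above. Building the arguments as a list avoids shell-quoting problems with titles or paths that contain spaces.

```python
import shlex


def build_mux_command(video, subs, chapters, output,
                      title="JP-CN Dual", language="mul"):
    """Return the ffmpeg argument list for the mux described above."""
    return [
        "ffmpeg", "-fflags", "+genpts",
        "-i", video, "-i", subs, "-i", chapters,
        "-map", "0:v", "-map", "0:a", "-map", "1",
        "-map_metadata", "2", "-map_chapters", "2",  # input index 2 = metadata file
        "-c:v", "copy", "-c:a", "copy",
        "-c:s", "mov_text",
        "-metadata:s:s:0", f"title={title}",
        "-metadata:s:s:0", f"language={language}",
        "-movflags", "+faststart",
        output,
    ]


cmd = build_mux_command("video_input.mp4", "subs_input.srt",
                        "chapters_metadata_input.txt", "output.mp4")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

To actually run it, pass the list to subprocess.run(cmd, check=True); this requires ffmpeg on the PATH and is left out of the sketch so that it runs without side effects.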
3. Verification Workflow
To confirm a successful mux, verify the internal atom structure:
- mediainfo Check: Ensure Menu #1 and Menu #2 appear in the summary. The text track must be linked via the tref (track reference) atom. For instance, as shown below, Menu #1 (Track ID 3) is for Track IDs 1 and 2.
  Command Reference:
  mediainfo OUTPUT.mp4
  Expected Output Sample:
    Video #1
    ID : 1
    Format : HEVC
    Format/Info : High Efficiency Video Coding
    Format profile : Main@L4@Main
    Codec ID : hvc1
    Codec ID/Info : High Efficiency Video Coding
    Duration : 5 min 0 s
    Source duration : 5 min 0 s
    Maximum bit rate : 2 646 kb/s
    Width : 1 920 pixels
    Height : 1 080 pixels
    Display aspect ratio : 16:9
    Frame rate mode : Constant
    Frame rate : 29.970 (30000/1001) FPS
    Color space : YUV
    Chroma subsampling : 4:2:0 (Type 0)
    Bit depth : 8 bits
    Scan type : Progressive
    Source stream size : 88.4 MiB (86%)
    Writing library : x265 4.1+1-1d117be:[Mac OS X][clang 17.0.0][64 bit] 8bit+10bit+12bit
    Encoding settings : cpuid=98 / ... /no-frame-rc
    Language : English
    Tagged date : 2025-12-22 10:17:14 UTC
    Color range : Limited
    Color primaries : BT.709
    Transfer characteristics : BT.709
    Matrix coefficients : BT.709
    Menus : 3,4
    mdhd_Duration : 300133
    Codec configuration box : hvcC

    Video #2
    ID : 4
    Format : JPEG
    Codec ID : jpeg
    Duration : 5 min 0 s
    Duration_FirstFrame : 966 ms
    Bit rate mode : Variable
    Bit rate : 2 040 b/s
    Width : 640 pixels
    Height : 360 pixels
    Display aspect ratio : 16:9
    Frame rate mode : Constant
    Frame rate : 0.007 FPS
    Color space : YUV
    Chroma subsampling : 4:2:0
    Bit depth : 8 bits
    Compression mode : Lossy
    Bits/(Pixel*Frame) : 1.265
    Stream size : 71.2 KiB (0%)
    Language : English
    Default : No
    Encoded date : 2025-12-22 09:54:15 UTC
    Tagged date : 2025-12-22 10:17:14 UTC
    For : 1
    ColorSpace_ICC : RGB
    colour_primaries_ICC_Description : Generic RGB Profile

    Audio
    ID : 2
    Format : AC-3
    Format/Info : Audio Coding 3
    Commercial name : Dolby Digital
    Codec ID : ac-3
    Duration : 5 min 0 s
    Bit rate mode : Constant
    Bit rate : 384 kb/s
    Channel(s) : 6 channels
    Channel layout : L R C LFE Ls Rs
    Sampling rate : 48.0 kHz
    Frame rate : 31.250 FPS (1536 SPF)
    Compression mode : Lossy
    Stream size : 13.7 MiB (13%)
    Language : English
    Service kind : Complete Main / Complete Main
    Default : Yes
    Alternate group : 1
    Tagged date : 2025-12-22 10:17:14 UTC
    Menus : 3
    Dialog Normalization : -31 dB
    compr : -0.28 dB
    cmixlev : -3.0 dB
    surmixlev : -3 dB
    dialnorm_Average : -31 dB
    dialnorm_Minimum : -31 dB
    dialnorm_Maximum : -31 dB

    Menu #1
    ID : 3
    Format : Timed Text
    Codec ID : text
    Duration : 5 min 0 s
    Title : Chapters
    Language : English
    Tagged date : 2025-12-22 10:17:14 UTC
    Menu For : 1,2
    Duration_FirstFrame : 966
    00:00:00.000 : 01. ハジメテノオト (Hajimete no Oto)
    00:02:30.483 : 02. Project DIVA desu.

    Menu #2
    00:00:00.000 : 01. ハジメテノオト (Hajimete no Oto)
    00:02:30.483 : 02. Project DIVA desu.
  Note: The chapter text track should be ID 3 and the chapter thumbnail track should be ID 4.
- mp4box Check: Ensure the chapter track is listed.
  Command Reference:
  mp4box -info OUTPUT.mp4
  Expected Output Sample:
    # Track 1 Info - ID 1 - TimeScale 30000
    Media Duration 01:58:23.396 (recomputed 01:58:23.429)
    Track has 1 edits: track duration is 01:58:23.397
    Track flags: Enabled In Movie
    Media Language: English (en)
    Media Samples: 212889 - CFR 29.970030/sec
    Visual Track layout: x=0 y=0 width=1920 height=1080
    Media Type: vide:hvc1
    Visual Sample Entry Info: width=1920 height=1080 (depth=24 bits)
    HEVC Video - Visual Size 1920 x 1080
    HEVC Info: Profile Main @ Level 4 - Chroma Format YUV 4:2:0
    NAL Unit length bits: 32 - general profile compatibility 0x60000000
    Parameter Sets: 1 VPS 1 SPS 1 PPS
    SPS resolution 1920x1080 - Pixel Aspect Ratio 1:1 - Indicated track size 1920 x 1080
    Bit Depth luma 8 - Chroma 8 - 1 temporal layers
    VPS#1 hash: 404DE42FD661F021A6D65558D6262919A28BB2DC
    SPS#1 hash: C84BDEDD5842CA8F651116584B60EDD80CF32A8A
    PPS#1 hash: 476CC254A14C7C38F1EC443B8C1E3E48B9736FDF
    RFC6381 Codec Parameters: hvc1.1.6.L120.90
    Average GOP length: 90 samples
    Max sample duration: 1001 / 30000

    # Track 2 Info - ID 2 - TimeScale 48000
    Media Duration 01:58:19.488
    Track has 1 edits: track duration is 01:58:19.488
    Track flags: Enabled In Movie
    Media Language: Japanese (jpn)
    Media Samples: 221859
    1 UDTA types:
      tagc: public.main-program-content
    Alternate Group ID 1
    Media Type: soun:ac-3
    AC-3 stream - Sample Rate 48000 - 5.1 channel(s) - bitrate 384000
    RFC6381 Codec Parameters: ac-3
    All samples are sync
    Max sample duration: 1537 / 48000

    # Track 3 Info - ID 3 - TimeScale 1000000
    Media Duration 01:55:33.430
    Track has 1 edits: track duration is 01:55:33.430
    Track flags: Enabled In Movie
    Media Language: Multiple languages (mul)
    Media Samples: 1613
    2 UDTA types:
      name: 日中双语 (JP/CN)
      tagc: public.main-program-content
    Alternate Group ID 2
    Media Type: sbtl:tx3g
    QT/3GPP subtitle
    Size 0 x 0 - Translation X=0 Y=0 - Layer 0
    RFC6381 Codec Parameters: tx3g
    All samples are sync
    Max sample duration: 56850000 / 1000000

    # Track 4 Info - ID 4 - TimeScale 1000
    Media Duration 01:58:23.396
    Track has 1 edits: track duration is 01:58:23.396
    Track flags: Disabled In Movie
    Media Language: English (en)
    Chapter Labels
    Media Samples: 45
    Media Type: text:text
    QT/3GPP text
    Size 0 x 0 - Translation X=0 Y=0 - Layer 0
    RFC6381 Codec Parameters: text
    All samples are sync
    Max sample duration: 288621 / 1000

    # Track 5 Info - ID 5 - TimeScale 1000
    Media Duration 01:58:23.396
    Track has 1 edits: track duration is 01:58:23.396
    Track flags: Disabled In Movie
    Media Language: Japanese (ja)
    Chapter Thumbnails
    Media Samples: 45
    Visual Track layout: x=0 y=0 width=640 height=360
    Media Type: vide:jpeg
    Visual Sample Entry Info: width=640 height=360 (depth=24 bits)
    JPEG Image - Resolution 640 x 360
    RFC6381 Codec Parameters: mp4v.6C
    All samples are sync
    Max sample duration: 288621 / 1000
  Note: The chapter text track should be Track 4 and the chapter thumbnail track should be Track 5.
- ffprobe Check: Ensure chapter metadata is present as one text track and one video track (thumbnails).
  Command Reference:
  ffprobe -v error -show_entries stream=index,codec_type:stream_tags=handler_name -of default=noprint_wrappers=1 'OUTPUT.mp4'
  Expected Output Sample:
    index=0
    codec_type=video
    TAG:handler_name=VideoHandler
    index=1
    codec_type=audio
    TAG:handler_name=SoundHandler
    index=2
    codec_type=subtitle
    TAG:handler_name=JP/CN Dual Subtitles
    index=3
    codec_type=data
    TAG:handler_name=SubtitleHandler
    index=4
    codec_type=video
    index=5
    codec_type=video
  Note: The chapter tracks should be index 3 (text) and index 4 (thumbnail), and index 5 should be the video cover track.
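The ffprobe check can be scripted by parsing the key=value lines that the command prints. The following Python sketch is illustrative only: parse_streams and looks_like_chapter_layout are hypothetical names, and the rule "at least one subtitle stream, one data stream, and two video streams" is a heuristic read off the sample output above, not a documented ffprobe contract.

```python
from collections import Counter

# Sample copied from the expected ffprobe output shown in this section.
SAMPLE = """\
index=0
codec_type=video
TAG:handler_name=VideoHandler
index=1
codec_type=audio
TAG:handler_name=SoundHandler
index=2
codec_type=subtitle
TAG:handler_name=JP/CN Dual Subtitles
index=3
codec_type=data
TAG:handler_name=SubtitleHandler
index=4
codec_type=video
index=5
codec_type=video
"""


def parse_streams(text):
    """Group 'key=value' lines into one dict per stream (a new dict at each index=)."""
    streams = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        if key == "index":
            streams.append({"index": int(value)})
        elif streams:
            streams[-1][key] = value
    return streams


def looks_like_chapter_layout(streams):
    """Heuristic (assumption): subtitle + chapter text (data) + main and thumbnail video."""
    kinds = Counter(s.get("codec_type") for s in streams)
    return kinds["subtitle"] >= 1 and kinds["data"] >= 1 and kinds["video"] >= 2


streams = parse_streams(SAMPLE)
for s in streams:
    print(s["index"], s["codec_type"], s.get("TAG:handler_name", "-"))
print("chapter layout present:", looks_like_chapter_layout(streams))
```

In practice the text would come from running the ffprobe command above, for example with subprocess.run(..., capture_output=True, text=True).stdout, instead of the embedded sample.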
4. Conclusion
By following the encapsulation procedure above, video files can be prepared for playback on Apple devices with subtitles that work in both the iOS and macOS environments.
While the QuickTime-compatible chapter track is properly recognized on macOS, it is not displayed in the native iOS player. However, it may be visible when using third-party players like Infuse. This is likely due to Apple’s restrictions on third-party chapter markers since iOS 12.3, as noted by the developer of mChapters.

More Background on tref and MP4 Atoms
An atom (also known as a box) is the fundamental building block of the MP4 (MPEG-4 Part 14) and QuickTime (MOV) container formats.
Every atom begins with a header made of two fields:
- Size: A 4-byte big-endian integer defining the total length of the atom, header included.
- Type (FourCC): A 4-character code (like moov, trak, or mdat) that identifies what the atom does.
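The two header fields can be decoded with a few lines of Python. This is a minimal sketch with a hypothetical helper name, read_atom_header. It also handles one case the list above does not mention: in the ISO base media file format, a size of 1 means the real length follows the type as an 8-byte integer, and a size of 0 means the atom runs to the end of the file.

```python
import struct


def read_atom_header(buf: bytes, offset: int = 0):
    """Return (size, fourcc, header_length) for the atom starting at offset.

    A returned size of 0 means "extends to the end of the file".
    """
    # 4-byte big-endian size followed by the 4-character type code.
    size, fourcc = struct.unpack_from(">I4s", buf, offset)
    header_length = 8
    if size == 1:
        # 64-bit "largesize" stored right after the type field.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_length = 16
    return size, fourcc.decode("latin-1"), header_length


# A hand-built header for a 24-byte ftyp atom.
header = struct.pack(">I4s", 24, b"ftyp")
print(read_atom_header(header))  # (24, 'ftyp', 8)
```

The payload of the atom then spans from offset + header_length to offset + size.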
Primary Atoms in MP4:
- ftyp (File Type): The very first atom. It identifies the “Brand” (like isom or mp42) so the iPad knows which standards the file follows.
- mdat (Media Data): The largest atom. This contains the actual encoded “meat” of the file: your H.265 video frames and AC-3 audio samples.
- moov (Movie Resource): The “Table of Contents.” It contains all the metadata, timing information, and pointers to the data inside the mdat. Note: Using -movflags +faststart moves this atom to the beginning of the file so your iPad can start playing before the full 17GB is downloaded.
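Whether +faststart took effect can be checked by listing the top-level atoms and confirming that moov comes before mdat. The sketch below is an illustration under stated assumptions: top_level_atoms and moov_before_mdat are hypothetical helpers, and the demo runs on a tiny in-memory stand-in rather than a real video. It seeks from header to header, so it never reads the mdat payload.

```python
import io
import struct


def top_level_atoms(fh):
    """Return [(fourcc, offset, size)] for each top-level atom in a seekable file."""
    fh.seek(0, io.SEEK_END)
    end = fh.tell()
    atoms, offset = [], 0
    while offset + 8 <= end:
        fh.seek(offset)
        header = fh.read(16)
        size, fourcc = struct.unpack_from(">I4s", header)
        if size == 1:                      # 64-bit size follows the type
            if len(header) < 16:
                break
            (size,) = struct.unpack_from(">Q", header, 8)
        elif size == 0:                    # atom runs to end of file
            size = end - offset
        if size < 8:                       # corrupt header; stop scanning
            break
        atoms.append((fourcc.decode("latin-1"), offset, size))
        offset += size
    return atoms


def moov_before_mdat(fh) -> bool:
    """True when the file has both atoms and moov is located first."""
    order = [name for name, _, _ in top_level_atoms(fh)]
    return ("moov" in order and "mdat" in order
            and order.index("moov") < order.index("mdat"))


def box(name: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal atom for the in-memory demo."""
    return struct.pack(">I4s", 8 + len(payload), name) + payload


demo = io.BytesIO(box(b"ftyp", b"isom") + box(b"moov") + box(b"mdat", b"xx"))
print(top_level_atoms(demo))
print(moov_before_mdat(demo))
```

For a real file, open it in binary mode (open("output.mp4", "rb")) and pass the file object to moov_before_mdat.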
Atoms Relevant to the Chapter Problem:
- trak (Track): Each video, audio, and subtitle stream has its own trak atom.
- udta (User Data): This is where FFmpeg writes your metadata.txt chapter info by default. iOS 19+ often ignores this for navigation.
- tref (Track Reference): It is a sub-atom inside the Video Track (trak) that must explicitly “point” to the Chapter Track.
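The tref link can be inspected directly by walking moov, then each trak, then tref, and reading the chap reference, whose payload is a list of 4-byte track IDs. The Python sketch below is a simplified illustration: iter_atoms and chapter_references are hypothetical names, the tkhd parsing assumes the standard layout (track ID after the version/flags word and the two timestamps, which are 4 bytes each in version 0 and 8 bytes each in version 1), and the demo uses a synthetic moov rather than a real file.

```python
import struct


def iter_atoms(buf, start, end):
    """Yield (fourcc, payload_start, payload_end) for atoms in buf[start:end]."""
    offset = start
    while offset + 8 <= end:
        size, fourcc = struct.unpack_from(">I4s", buf, offset)
        header = 8
        if size == 1 and offset + 16 <= end:   # 64-bit size follows the type
            (size,) = struct.unpack_from(">Q", buf, offset + 8)
            header = 16
        elif size == 0:                        # atom runs to the end
            size = end - offset
        if size < header or offset + size > end:
            break                              # truncated or corrupt atom
        yield fourcc.decode("latin-1"), offset + header, offset + size
        offset += size


def chapter_references(buf):
    """Return {track_id: [chapter track ids]} for every trak with a tref/chap."""
    refs = {}
    for top, ms, me in iter_atoms(buf, 0, len(buf)):
        if top != "moov":
            continue
        for name, ts, te in iter_atoms(buf, ms, me):
            if name != "trak":
                continue
            track_id, chapters = None, []
            for child, cs, ce in iter_atoms(buf, ts, te):
                if child == "tkhd":
                    version = buf[cs]
                    id_offset = cs + 4 + (16 if version == 1 else 8)
                    (track_id,) = struct.unpack_from(">I", buf, id_offset)
                elif child == "tref":
                    for ref, rs, re_ in iter_atoms(buf, cs, ce):
                        if ref == "chap":
                            count = (re_ - rs) // 4
                            chapters = list(struct.unpack_from(f">{count}I", buf, rs))
            if chapters:
                refs[track_id] = chapters
    return refs


def box(name: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal atom for the synthetic demo."""
    return struct.pack(">I4s", 8 + len(payload), name) + payload


# Synthetic moov: track 1 references chapter tracks 3 and 4.
tkhd = box(b"tkhd", bytes(12) + struct.pack(">I", 1))
tref = box(b"tref", box(b"chap", struct.pack(">II", 3, 4)))
moov = box(b"moov", box(b"trak", tkhd + tref))
print(chapter_references(moov))  # {1: [3, 4]}
```

The demo mirrors the mediainfo sample, where video track 1 lists Menus 3,4. For a real +faststart file, passing a prefix of the file that is large enough to contain the whole moov atom should be sufficient, since the scan stops at the first atom that extends past the buffer.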
References
- Demystifying the mp4 container format: https://agama.tv/demystifying-the-mp4-container-format/#:~:text=a%20moov%20(movie)%20box%2C,containing%20the%20audio%2Fvideo%20payload
- A Quick Dive into MP4: https://dev.to/alfg/a-quick-dive-into-mp4-57fo#:~:text=file%2Dmp4_example_2%2Dgo-,%24%20go%20run%20mp4.go%20tears%2Dof%2Dsteel.,mp41%5D%20moov.name%3A%20moov