Hi there! Apologies if this question has already been addressed in a previous post or directly in the manual. My team is using ELAN 6.2 to annotate GoPro videos (.mp4, 59.94 fps, i.e. 60000/1001 frames per second) as part of a larger project which requires synchronization of multiple data streams. These GoPro videos (and any annotations of them) would be aligned with reference to another data stream that is not brought into ELAN at all. We would greatly appreciate it if anyone could answer the following:
To what degree can we trust the values given in the exported variable ‘XXXX Time - hh:mm:ss.ms’ to refer to that moment in time on the original video? Is there any processing done to the video within ELAN? If so, where would I find that information?
This is an important issue which is touched upon in bits and pieces in other posts and manuals, but is, I believe, not discussed extensively in a single article or post.
Concerning your question, there are two things to take into account:
ELAN doesn’t do any video processing itself but instead incorporates several media players/frameworks to which media playback is delegated. ELAN interacts with the players by sending them messages such as start, pause and get/set media time. Which media frameworks are available depends on the platform (operating system). Some frameworks (JavaFX and VLC) are available on all supported platforms, but even then issues and accuracy can differ per platform and per media format. This much is covered in the ELAN manual.
The best performance and accuracy are achieved with the default media frameworks on Windows (based on Microsoft Media Foundation) and macOS (based on Apple’s AVFoundation). In all cases ELAN depends on the accuracy of the player: when the player is paused, for example, ELAN cannot really verify that the image that is shown and the current media time that is reported by the player are in sync.
To illustrate this, it has been observed that with (some) mp4 files, jumping to a certain point in time (e.g. the start time of an annotation) can show a different video frame depending on, for instance, whether the jump is forward or backward in time. The reported media time will correspond with the start time of the annotation, but the shown image might be 1 or 2 frames off. The risk of running into this issue depends on the encoding of the video, especially the GOP (group of pictures) and/or B-frame settings. The least problematic files are those encoded with I-frames (keyframes) only, but these are also the largest in size, and producing them usually requires re-encoding the video (which is not always feasible). Our media encoding guide mentions this.
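To get a feeling for how exposed a given file is, it can help to look at its sequence of frame types. The following is only a rough sketch, not something ELAN provides: it assumes you have already obtained one picture-type letter (I, P or B) per video frame, for example with a tool such as ffprobe, and the function name `summarize_gop` is made up for this illustration. It also treats every I-frame as a keyframe, which is a simplification.

```python
def summarize_gop(pict_types):
    """Summarize a string of per-frame picture types such as 'IBBPBBP...'.

    Assumption: one letter per video frame; anything other than
    I, P or B (spaces, newlines, commas) is ignored.
    """
    types = [t for t in pict_types.upper() if t in "IPB"]
    # Positions of I-frames, treated here as keyframes (a simplification).
    keyframes = [i for i, t in enumerate(types) if t == "I"]
    # Distance in frames between consecutive I-frames.
    gaps = [b - a for a, b in zip(keyframes, keyframes[1:])]
    return {
        "frames": len(types),
        "has_b_frames": "B" in types,
        "intra_only": bool(types) and all(t == "I" for t in types),
        "max_keyframe_interval": max(gaps) if gaps else None,
    }


if __name__ == "__main__":
    print(summarize_gop("IBBPBBPIBBP"))
    print(summarize_gop("IIII"))
```

A file for which `intra_only` is true falls in the least problematic category described above; long keyframe intervals combined with B-frames are where seeking is more likely to land a frame or two off.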
Media times in ELAN and in its EAF files are expressed in whole milliseconds (for historical reasons). So there will be rounding effects for video frame rates where the frame duration in ms is not an integer value. E.g. if the frame duration is 16.683… ms, the start time of an annotation aligned with the second frame will be stored as 16 or 17 ms.
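For the synchronization use case, the rounding can be undone when mapping exported millisecond values back to frame numbers. Below is a minimal sketch, assuming a constant frame rate of exactly 60000/1001 fps, a first frame that starts at time 0, and annotation boundaries that were placed on frame boundaries; the function names are invented for this example and are not part of ELAN.

```python
from fractions import Fraction
import math

# 59.94 fps expressed as an exact ratio (assumed constant frame rate).
FPS = Fraction(60000, 1001)


def frame_start_ms(frame_index):
    """Exact start time in ms of a frame; frame 0 is assumed to start at 0."""
    return Fraction(frame_index) * 1000 / FPS


def nearest_frame(ms):
    """Frame whose start time is closest to a whole-millisecond value.

    Safe against the +/- 1 ms rounding because half a frame is about 8.3 ms.
    """
    return round(Fraction(ms) * FPS / 1000)


def containing_frame(ms):
    """Frame that contains the given time (naive floor, shown for contrast)."""
    return math.floor(Fraction(ms) * FPS / 1000)


if __name__ == "__main__":
    print(float(frame_start_ms(1)))  # start of the second frame, in ms
    for ms in (16, 17):
        print(ms, nearest_frame(ms), containing_frame(ms))
```

The contrast between the two lookups is the point: a value stored as 16 ms lies just before the exact start of the second frame, so a plain floor assigns it to the first frame, whereas snapping to the nearest frame start recovers the second frame for both 16 and 17 ms.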