Naturalistic driving studies often use cameras to monitor driver behavior, and the resulting video images are typically analyzed through human annotation. These annotations then serve as the 'gold standard' for training and evaluating automated computer vision algorithms, even though it is uncertain how accurate human annotation actually is. In this study, we provide a first evaluation of glance direction annotation by comparing the instructed, actual glance directions of truck drivers with the annotated directions. The findings indicate that, while high annotation accuracy is achieved for some glance locations, accuracy for most locations is well below 50%. Higher accuracy can be obtained by clustering these locations, but clustering also reduces the level of detail of the annotation, suggesting that the decision to cluster should take the purpose of the annotation into account. The data also show that high agreement between annotators does not guarantee high accuracy. We argue that the accuracy of annotation should be verified experimentally more often.