The SEWA database contains a total of 199 experiment sessions, involving 398 subjects from 6 different cultures (British, German, Hungarian, Greek, Serbian, and Chinese). Based on the subject’s activity in the experiment, each subject’s recording is further divided into 5 parts: recordings of the subject watching adverts #1, #2, #3 and #4, and a video-chat recording. In addition to the audio and video data, we also include low-level audio descriptor (LLD) features and per-frame facial landmark locations for all SEWA recordings. For the video-chat recordings, audio transcripts and hand-gesture annotations are also provided. Last but not least, we provide 197 episodes of mimicry and a total of 248 agreement and disagreement episodes at three levels of intensity (low, medium and high), all manually selected from the full video-chat recordings.

In addition to the full recordings, a subset of the SEWA data has been selected to be annotated in further dimensions. This dataset, called the basic SEWA dataset, consists of 538 short (10–30 second long) segments cut from the full video-chat recordings. The selection criterion for these segments was that the subject was in an emotional state of low / high arousal or low / high valence, or showed liking or disliking toward the advert / product. All segments were selected by annotators from the same culture as the recorded subjects. In addition to the aforementioned LLD features, facial landmarks, transcripts, and hand-gesture annotations, the basic SEWA dataset also includes annotations of continuously valued arousal, valence and liking / disliking, head gestures (nod / shake), facial action units (FAUs) 1, 2, 4, 12 and 17, and template behaviours.
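As a minimal sketch of how the annotation dimensions listed above might be grouped per segment, the following Python dataclass is purely illustrative: the class and field names are assumptions for exposition, not the dataset's actual file format or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SewaSegment:
    """Hypothetical record for one basic-SEWA segment (names are
    illustrative assumptions, not the dataset's actual schema)."""
    subject_id: str
    culture: str                  # e.g. "British", "Chinese"
    duration_s: float             # segments are roughly 10-30 s long
    # Continuously valued annotations (one value per frame):
    arousal: List[float] = field(default_factory=list)
    valence: List[float] = field(default_factory=list)
    liking: List[float] = field(default_factory=list)
    # Annotated facial action units in the basic SEWA dataset:
    fau_ids: tuple = (1, 2, 4, 12, 17)
    head_gesture: Optional[str] = None  # "nod", "shake", or None
```

Grouping the per-frame signals (arousal, valence, liking) separately from the segment-level labels (head gesture, FAU set) mirrors the distinction the text draws between continuously valued annotations and discrete ones.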