The SEWA Database

 

A database of annotated audio and 2D visual dynamic behaviour (recorded by standard webcams used by the volunteers) has been collected within the SEWA project.

In the data-collection experiments, volunteers were divided into pairs based on their cultural background, age and gender. During initial sign-up, the volunteers completed demographic measures (in particular gender, age, country of origin, education, personality, and familiarity with the person with whom they would chat in the computer-mediated face-to-face interaction session). Each pair of volunteers then participated in two parts of the experiment, resulting in two sets of recordings.

  • Experimental Setup Part 1: Each volunteer is asked to watch 4 adverts, each about 60 seconds long. These adverts have been chosen to elicit mental states including amusement, empathy, liking and boredom. After watching the advert, the volunteer is also asked to fill in a questionnaire to self-report his/her emotional state and sentiment toward the advert.
  • Experimental Setup Part 2: After watching the 4th advert, the volunteer discusses it, and the content he/she has just seen, with another volunteer (usually someone he/she already knows) using video-chat software. The conversations are 3 minutes long on average. The discussion is intended to elicit further reactions and opinions about the advert and the advertised product, such as whether the product is to be purchased, whether it is to be recommended to others, what the best parts of the advert are, whether the advert is appropriate, how it can be enhanced, etc. After the discussion, each participant is asked to fill in a questionnaire to self-report his/her emotional state and sentiment toward the discussion.

Both the advert watching and the subsequent conversation between the volunteers are recorded in full using the webcams and microphones integrated into the volunteers' laptops/PCs.

In the SEWA project, we aimed to record 6 groups of volunteers (30 persons per group) from six different cultural backgrounds: British, German, Hungarian, Greek, Serbian, and Chinese. The volunteers in each group have a broad distribution in gender and age. Specifically, for each culture there are at least three pairs of native speakers in each age group (20+, 30+, 40+, 50+, 60+). The resulting database contains a total of 199 sessions of experiment recordings, with 1525 minutes of audio-visual data of people's reactions to adverts from 398 individuals and more than 550 minutes of recorded computer-mediated face-to-face interactions between pairs of subjects.
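As a rough consistency check, the figures quoted above can be related to one another with some simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical and are not part of the SEWA release, and the 550-minute figure is treated as a lower bound because the text states "more than 550 minutes".

```python
# Figures quoted in the database description (names are illustrative only).
SESSIONS = 199
INDIVIDUALS = 398
ADVERT_MINUTES = 1525
CONVERSATION_MINUTES = 550        # stated as "more than 550", so a lower bound
CULTURES = 6
TARGET_PER_CULTURE = 30
AGE_GROUPS = 5                    # 20+, 30+, 40+, 50+, 60+
MIN_PAIRS_PER_AGE_GROUP = 3


def summarise():
    """Derive per-session and per-person averages from the quoted totals."""
    return {
        # Each session involves one pair of volunteers.
        "people_per_session": INDIVIDUALS / SESSIONS,
        # Minimum head count per culture implied by the age-group quota.
        "min_people_per_culture": AGE_GROUPS * MIN_PAIRS_PER_AGE_GROUP * 2,
        # Recruitment target across all six cultures.
        "target_total": CULTURES * TARGET_PER_CULTURE,
        # Average advert-watching footage per individual, in minutes.
        "advert_minutes_per_person": ADVERT_MINUTES / INDIVIDUALS,
        # Lower bound on the average conversation length, in minutes.
        "conversation_minutes_per_session": CONVERSATION_MINUTES / SESSIONS,
    }


if __name__ == "__main__":
    for name, value in summarise().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the age-group quota (5 groups of at least 3 pairs) implies the 30 persons per culture mentioned above, the advert footage averages a little under 4 minutes per person (in line with 4 adverts of about 60 seconds each), and the conversation footage averages close to the stated 3 minutes per session.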

The SEWA database includes annotations of the recordings in terms of facial landmarks, facial action unit (FAU) intensities, various vocalisations, verbal cues, mirroring, rapport, continuously valued valence, arousal and liking, and prototypic examples (templates) of (dis)liking and sentiment. The data has been annotated in an iterative fashion, starting with a sufficient number of examples to be annotated in a semi-automated manner and used to train various feature extraction algorithms developed in SEWA, and ending with a large database of annotated facial behaviour recorded in the wild.

Accurately labelled and annotated real-world data are crucial for designing audio-visual human behaviour sensing, tracking and interpretation algorithms that achieve robust performance in the wild. The SEWA DB is the very first database of this kind to be released for research purposes. It is not only an extremely valuable resource for researchers in Europe and internationally, but will also push forward research in automatic human behavioural analysis, user-centric HCI and FF-HCI.