A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries

Data collection setup

We chose the Samsung Galaxy Active 2 for dataset collection since, at the time of the study (2021), it was one of the few devices that (1) allowed third parties to access raw PPG data through the development and embedding of a wearable application and (2) enabled convenient adjustment of the sampling rate of features via the app. In addition, prior studies had validated the measurement accuracy of this device¹⁸. Data collection began with an in-person orientation session where we explained experimental details, including the purpose and duration of the study, data collection methods, participation rewards, and instructions for using the wearable device. Participants were given a smartwatch and instructed to wear it on the wrist of their non-dominant hand at all times during the experimental period, except during sleep, when they were asked to charge the device. Although the wearable device is water-resistant, participants were advised to remove it during long baths or swimming.

Data collection

Procedure

Figure 1 illustrates the overall data collection workflow. The initial orientation was followed by four weeks of data gathering, concluding with device reclamation. A pre-survey was conducted at the outset to gather general demographics and lifestyle information from the participants. During the data-gathering period, participants completed a short online survey assessing their mental health three times (i.e., at two-week intervals). We evaluated insomnia using the Insomnia Severity Index (ISI)¹⁹ questionnaire. We also surveyed mental health using PHQ-9 (Patient Health Questionnaire-9)²⁰ and GAD-7 (Generalized Anxiety Disorder-7)²¹ to assess depression and anxiety, respectively.

Application

We developed a wearable device application named “Heart+” to facilitate data collection. This application gathered three types of data: (1) activity, based on an accelerometer, gyroscope, gyroscope rotational vector, and pedometer; (2) physiological, including heart rate and PPG; and (3) environmental sensing, specifically recordings of ambient light. Built on the Tizen platform (https://www.tizen.org/)—Samsung’s operating system designed specifically for embedded programming—the application was configured to sample data at 100 ms intervals (i.e., 10 Hz).

PPG sampling frequencies vary widely in the literature, with studies using rates ranging from 5 Hz to 100 Hz and recent experiments sampling at 20–25 Hz²²,²³. We selected 10 Hz to balance temporal resolution against practical constraints while still satisfying basic signal processing requirements. More specifically, according to the Nyquist theorem, a continuous signal must be sampled at a rate of at least twice the highest frequency to be captured. Since human heart rates range from 40 to 220 bpm (i.e., 0.67–3.67 Hz), a minimum sampling rate of 7.34 Hz (2 × 3.67 Hz) is theoretically sufficient to capture the maximum heart frequency²⁴. Additionally, studies demonstrate that interpolation methods can enhance HRV accuracy from lower temporal resolution PPG signals, supporting our sampling frequency decision²².
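The arithmetic behind this choice can be checked with a short sketch. The function and constant names below are illustrative and not part of the study's code; the only inputs taken from the text are the 220 bpm upper bound and the 10 Hz sampling rate. Without intermediate rounding the minimum comes out at about 7.33 Hz; the 7.34 Hz quoted above results from rounding 3.67 Hz before doubling.

```python
def bpm_to_hz(bpm: float) -> float:
    """Convert beats per minute to cycles per second."""
    return bpm / 60.0


def nyquist_min_rate_hz(max_signal_hz: float) -> float:
    """Minimum sampling rate that satisfies the Nyquist criterion."""
    return 2.0 * max_signal_hz


MAX_HEART_RATE_BPM = 220.0  # upper heart-rate bound quoted in the text
CHOSEN_RATE_HZ = 10.0       # one sample every 100 ms

max_hz = bpm_to_hz(MAX_HEART_RATE_BPM)
min_rate = nyquist_min_rate_hz(max_hz)
margin = CHOSEN_RATE_HZ / min_rate  # how far 10 Hz sits above the minimum

print(f"maximum heart frequency: {max_hz:.2f} Hz")
print(f"Nyquist minimum rate:    {min_rate:.2f} Hz")
print(f"margin at 10 Hz:         {margin:.2f}x")
```

This covers only the fundamental heart-rate frequency, which is the criterion used in the text; it says nothing about the resolution needed for beat-to-beat interval estimation.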

In terms of practical considerations, adjusting the sampling frequency is a key strategy for reducing computational load and power consumption, given the limited system resources of wearable devices²⁵. The selected sampling frequency enables continuous recording throughout the day while maintaining a battery life of up to 14 hours. Thus, 10 Hz provides an adequate margin above the theoretical minimum while minimizing battery consumption, a key consideration for uninterrupted 24/7 monitoring over four weeks. Lower sampling rates also reduce power consumption, memory use, and data transmission burdens²². Finally, the heart rate monitor on this device, provided by the Samsung Health application developed by Tizen, uses an internal proprietary algorithm to automatically record real-time heart rate based on PPG signals. All of these signals are temporarily stored on the watch and transmitted to the server at regular intervals (every 30 minutes) when the watch is connected to Wi-Fi, thus minimizing battery use²⁴.

We used the Tornado framework (https://www.tornadoweb.org/) to create a RESTful API that facilitates communication between the application and the server. This API has two endpoints: the first stores data from the registration portal and generates unique user identifiers, and the second handles the storage of all sensor data retrieved from the watch.

User information, including user ID and sign-up time, was stored in a MongoDB instance within the Users collection. Device information, such as device ID and remaining battery ratio, was stored in the Devices collection. Meanwhile, continuous raw sensor data were packaged into CSV files and stored locally on the watch. Once the watch connected to Wi-Fi, the packaged CSV files were transferred to the server along with the device ID, and the transferred files were then deleted from the watch. Before storing the files, the server referenced the MongoDB collections, matched the transmitted device ID with the corresponding user ID, and sorted the CSV files into directories categorized by user ID.
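The server-side sorting step can be illustrated with a minimal sketch. It is not the study's implementation: a plain dictionary stands in for the lookup against the MongoDB collections, and the function name route_upload, the user ID "user_001", the file name, and the directory layout are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def route_upload(csv_path: Path, device_id: str,
                 device_to_user: dict, storage_root: Path) -> Path:
    """Move an uploaded CSV into the directory of the user who owns device_id."""
    user_id = device_to_user.get(device_id)
    if user_id is None:
        # Unknown devices are rejected rather than stored under a guessed user.
        raise KeyError(f"unknown device ID: {device_id}")
    user_dir = storage_root / user_id
    user_dir.mkdir(parents=True, exist_ok=True)
    destination = user_dir / csv_path.name
    shutil.move(str(csv_path), str(destination))
    return destination


# Demo in a throwaway directory; the mapping is a stand-in for the database lookup.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    upload = root / "ppg_0001.csv"  # hypothetical file name
    upload.write_text("timestamp,ppg\n0,123\n")
    stored = route_upload(upload, "ab50", {"ab50": "user_001"}, root / "by_user")
    print(stored.relative_to(root))
```

Keeping the device-to-user lookup outside the function makes it straightforward to swap the dictionary for a real database query.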

Monitoring

We monitored the data collection process to ensure it was conducted properly. First, we reviewed the recordings from wearables and sleep diaries daily. Participants were informed that the research team would monitor daily recordings solely to verify that the smartwatch was worn correctly and that the data were stored accurately. If a wearable device was not worn (1) more than three times or (2) for longer than two hours per day (excluding sleep time), an individual warning was sent. Additionally, we checked daily to ensure that all participants logged an entry in their sleep diary.
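The warning rule can be sketched as a small check. This assumes one reading of the rule: criterion (1) counts separate off-wrist episodes within a day, and criterion (2) refers to the total off-wrist time that day, with sleep time already excluded from the input. The function and constant names are illustrative.

```python
from datetime import datetime, timedelta

MAX_OFF_WRIST_EPISODES = 3                   # criterion (1): more than three times
MAX_OFF_WRIST_DURATION = timedelta(hours=2)  # criterion (2): longer than two hours


def needs_warning(off_wrist_episodes) -> bool:
    """Return True if one day's off-wrist episodes trigger a warning.

    off_wrist_episodes: list of (start, end) datetime pairs for a single day,
    with sleep time already excluded (an assumption of this sketch).
    """
    total = sum((end - start for start, end in off_wrist_episodes), timedelta())
    too_many = len(off_wrist_episodes) > MAX_OFF_WRIST_EPISODES
    too_long = total > MAX_OFF_WRIST_DURATION
    return too_many or too_long


day = datetime(2021, 6, 1)
episodes = [
    (day.replace(hour=9), day.replace(hour=10)),                 # 1 h off-wrist
    (day.replace(hour=14), day.replace(hour=15, minute=30)),     # 1.5 h off-wrist
]
print(needs_warning(episodes))  # two episodes, 2.5 h in total
```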

Second, we set up an online chat room through a social network messenger. Reminders to wear the watch and complete daily tasks, such as connecting to the Heart+ app for data storage and filling out daily diaries, along with important announcements (e.g., the three clinical surveys), were sent via this group chat. Participants were also instructed to report any issues in the chat and were provided with an emergency contact number for urgent questions and updates.

Ethics statement

The dataset workflow was developed with and approved by the Institutional Review Board of KAIST (KH2020-027). We obtained written informed consent from participants, using a form that outlined the purpose, duration, and procedure of data collection. All participants agreed to the use of anonymized personal information for research purposes and were compensated with USD 100 if they faithfully participated in the study to its completion. The 10 most diligent participants were rewarded with an extra coffee coupon. All data were anonymized before release to protect the privacy rights of the participants.

Participant recruitment and preparation

Our objective was to collect real-world daily sensor data along with information on sleep and mental health in healthy individuals. We recruited participants from a university and a research institution in South Korea through online postings and flyers. The recruitment announcement specified the following eligibility criteria: (1) aged 20 to 50, and (2) not undergoing any hospital treatment for acute medical, surgical, or psychiatric illness. Ultimately, 49 participants were recruited for a four-week experiment.

Table 1 summarizes participant demographic information both in aggregate and stratified by gender. The participants were balanced across three categories: office workers (35%), undergraduate students (30%), and graduate students (35%). Figure 2 presents the distribution of participants by self-reported lifestyle factors: smoking, exercise frequency, consumption of alcohol and coffee, and overall lifestyle regularity. The vast majority of participants reported being non-smokers, consuming alcohol at most once a week, and maintaining a regular lifestyle. Participants were relatively balanced in their exercise frequency and daily coffee consumption levels. The distribution of participants based on scores from clinical questionnaires assessing insomnia (ISI), depression (PHQ-9), and anxiety (GAD-7) is presented in Fig. 3. Although these assessments were conducted at three time points (before, midway through, and after the experiment), we report only the distributions from the pre-experiment and post-experiment assessments (i.e., four weeks apart) for visual conciseness. All the scores are available in the provided data.

Table 1 Demographic characteristics of participants.

Fig. 2

Distribution of participant lifestyle survey responses. The figure illustrates the frequency of participant responses across five categories: weekly exercise frequency (A), daily coffee consumption (B), overall lifestyle regularity (consistency in daily activities) (C), alcohol consumption frequency (D), and smoking frequency (E). Data are presented for all participants, with bar colors indicating female (red), male (purple), and all (grey) participants.

Data processing

The final dataset collected over the four weeks includes five types of data: (1) participant demographics, (2) smartwatch sensor data, including PPG signals, (3) computed HRV features, (4) sleep diaries, and (5) biweekly clinical survey results. The demographic and lifestyle information of participants is matched to their device IDs. To process raw signals from wearable devices, we referred to the official Tizen documentation for sensor descriptions (https://docs.tizen.org/application/native/guides/location-sensors/device-sensors/), filtering out values outside the acceptable range for each sensor before aggregating them into 5-minute chunks. Additionally, we used sensor recordings to detect periods when smartwatches were off-wrist (i.e., not being worn) by excluding any data that lacked a heartbeat frequency or whose heartbeat values fell outside the acceptable range. The research team manually inspected the sleep diaries to ensure their accuracy, correcting any AM/PM confusion by participants. Through daily monitoring, we detected issues with two participants (i.e., device IDs ab50 and kb24) during the first three days of the experiment. These participants were instructed to wear the device for an additional three days beyond the end of the study.
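The range filtering, 5-minute aggregation, and off-wrist detection can be sketched as follows. This is an illustration under stated assumptions, not the released processing code: the valid range reuses the 40–220 bpm bounds mentioned earlier rather than the ranges in the Tizen documentation, the mean is an assumed aggregation statistic, and the function name aggregate_heart_rate is hypothetical.

```python
from collections import defaultdict
from statistics import mean

CHUNK_SECONDS = 5 * 60          # 5-minute chunks, as in the text
HR_VALID_RANGE = (40.0, 220.0)  # assumed bpm bounds, not the Tizen-documented range


def aggregate_heart_rate(samples):
    """Aggregate (unix_seconds, bpm) samples into 5-minute chunks.

    Returns {chunk_start_seconds: mean_bpm or None}. None marks a chunk with
    no valid heart-rate reading, which this sketch treats as off-wrist.
    """
    low, high = HR_VALID_RANGE
    chunks = defaultdict(list)
    for timestamp, bpm in samples:
        chunk_start = int(timestamp // CHUNK_SECONDS) * CHUNK_SECONDS
        valid = chunks[chunk_start]  # registers the chunk even if nothing is valid
        if bpm is not None and low <= bpm <= high:
            valid.append(bpm)
    return {start: (mean(values) if values else None)
            for start, values in sorted(chunks.items())}


samples = [
    (0, 60.0), (10, 70.0), (299, 500.0),  # 500 bpm is out of range and dropped
    (300, 0.0), (310, None),              # no valid reading: chunk flagged off-wrist
    (600, 80.0),
]
for start, value in aggregate_heart_rate(samples).items():
    print(start, "off-wrist" if value is None else value)
```

Chunks flagged as None can then be dropped before computing HRV features, which mirrors the exclusion described above.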
