Validation of the art education in medical education questionnaire (AEMEQ): integrating medical sciences with the humanities | BMC Medical Education

A literature review was performed to clarify the role of art education as a tool for integrating the medical humanities into medical education. The literature search was conducted in databases including Google Scholar, PubMed, Iran Medex, Elsevier, Springer, Embase, Scopus, PsycINFO, and ERIC, using the keywords medical humanities/humanities, medical education, teaching, integration, art, learning, medicine, and clinical skills. After the dimensions and items of the instrument were determined, interviews and an expert focus group were conducted to match the items to the participants’/respondents’ conceptualization of the construct. The initial AEME instrument comprised 53 items scored on a five-point Likert scale. The corresponding author completed item generation as part of NASRME project No. 972,475. Evaluating the questionnaire’s validity requires a systematic sequence of steps; this study followed the AMEE Guide for designing high-quality questionnaires, since addressing each step systematically improves the probability that the survey will accurately measure what it intends to measure [16]. According to AMEE Guide No. 87, the survey scale design process has seven steps. After item development, which comprises the first four steps, the remaining three steps are expert validation, cognitive interviews, and pilot testing, as described below.

Participants

This study was conducted on medical students at Birjand University of Medical Sciences. Based on Krejcie and Morgan’s table, a sample of 555 was drawn from the 923 medical students enrolled at different stages of training; three incomplete questionnaires were subsequently excluded. Sample size determination based on Krejcie and Morgan’s table [17] was deemed sufficient for the psychometric evaluation of the tool, in line with previous studies [18]. Samples were allocated in proportion to the number of students in basic medical sciences (n = 350), clinical preparation (n = 146), and internship (n = 427). The questionnaire was administered in electronic format: at the beginning, students were asked to provide their consent to participate, and upon informed acceptance they were presented with the electronic form. Participants were informally invited to take part through convenience sampling, and the questionnaire link was shared with university students (https://survey.porsline.ir/s/MIG2vGxJ).

Expert validation-face and content validity

Face validity is the lowest level of content validity and is assessed after the tool has been prepared. Without requiring any specific statistical analysis, it quickly determines whether the tool appears, on inspection, suitable for measuring the variable of interest [19].

Content validity concerns whether the stated questions cover all important and necessary domains of the variable of interest; generally, the more fully the items sample the concept domain of the variable, the better the content validity [20]. The criteria for selecting an expert are usually based on experience or knowledge of the survey field and, in practice, on the individual’s willingness and availability [21]. Accordingly, 20 faculty members familiar with art education and representing various specialties (medical education, biostatistics, and basic and clinical sciences) were included. The criterion for item acceptability should be determined in advance; the content validity ratio (CVR) and the content validity index (CVI) are common metrics for decision-making on individual items. In addition to the quantitative data, the experts’ free-text comments were collected to identify items in which the construct was not well represented. The CVR, as originally proposed by Lawshe [22], was used to quantify the perceived essentiality of individual items: experts independently rated each item as “essential,” “useful but not essential,” or “not necessary,” and CVR values were calculated to reflect the proportion of experts identifying an item as essential. Items exceeding the critical threshold determined by the number of raters were considered to demonstrate acceptable content validity. To complement this assessment, the CVI was calculated to evaluate the overall clarity and relevance of the items.
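As a minimal illustration of how these two metrics are computed, the sketch below applies Lawshe’s CVR formula and the item-level CVI (I-CVI) to hypothetical ratings; the counts shown are invented for the example, not taken from the study.

```python
# Illustrative computation of Lawshe's CVR and the item-level CVI.
# The rating counts below are hypothetical examples only.

def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half

def item_cvi(n_relevant, n_experts):
    """I-CVI: proportion of experts rating the item relevant/clear."""
    return n_relevant / n_experts

# Example: a panel of 20 experts (as in this study); suppose 17 rate
# an item "essential" and 18 rate it relevant.
cvr = content_validity_ratio(17, 20)   # (17 - 10) / 10 = 0.70
cvi = item_cvi(18, 20)                 # 0.90
print(round(cvr, 2), round(cvi, 2))
```

For 20 raters, Lawshe’s table places the critical CVR at roughly 0.42, so an item with CVR = 0.70 would be retained; an I-CVI of at least 0.78 is an often-cited benchmark for relevance.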

Cognitive interview

Cognitive interviews were conducted with seven medical students using immediate retrospective verbal probes designed to elicit specific information [23]. The common verbal probes were: (1) Restate the item in your own words; (2) What do you think this item means? and (3) Could you provide an example?

Information gathered from the cognitive interviews was incorporated into the overall questionnaire, and each item was refined. The students’ responses were analyzed thematically, and interviews continued until data saturation was reached (i.e., no new codes emerged).

Pilot testing

The pilot test collected validity evidence from members of the target population (medical students, n = 552) based on two main approaches: internal structure (factor analysis) and reliability analysis (internal consistency assessed with Cronbach’s alpha coefficient). The details are described in Sect. 2.5-7.

Exploratory factor analysis

The sample was split in half: exploratory factor analysis (EFA) was conducted on one half, and the remaining half (n = 276) was reserved for confirmatory factor analysis (CFA) to validate the identified factor structure. EFA was performed using principal component analysis with Varimax rotation, retaining factors with eigenvalues above 1 and inspecting the scree plot. The suitability of the dataset was evaluated using the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity. The KMO value exceeded 0.67, suggesting that the sample was adequate and that multicollinearity among items was within an acceptable range (values above 0.50–0.60 are generally considered adequate). Bartlett’s test was statistically significant (p < 0.05), indicating that the correlation matrix was not an identity matrix and that meaningful shared variance existed among the items. Together, these results supported the suitability of the dataset for factor analysis and the robustness of the underlying factor structure [24, 25].
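The study ran these checks in SPSS; for readers who want to see what the KMO measure and Bartlett’s test actually compute, the following self-contained sketch implements both from their standard formulas and applies them to synthetic data with a two-factor structure (all data here are simulated, not the study’s).

```python
# Illustrative numpy/scipy computation of the KMO measure and Bartlett's
# test of sphericity on synthetic data (the study used SPSS for this step).
import numpy as np
from scipy.stats import chi2

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(r)
    # Anti-image (partial) correlations from the inverse correlation matrix
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d
    np.fill_diagonal(r, 0)
    np.fill_diagonal(partial, 0)
    return (r ** 2).sum() / ((r ** 2).sum() + (partial ** 2).sum())

def bartlett(data):
    """Bartlett's test: H0 = the correlation matrix is an identity matrix."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

# Synthetic data: 8 items driven by two latent factors plus noise.
rng = np.random.default_rng(0)
f = rng.normal(size=(300, 2))
x = np.hstack([f[:, [0]] + 0.5 * rng.normal(size=(300, 4)),
               f[:, [1]] + 0.5 * rng.normal(size=(300, 4))])
print(kmo(x) > 0.5, bartlett(x)[1] < 0.05)  # expect: True True
```

With a genuine factor structure, KMO comfortably exceeds the 0.50–0.60 adequacy range and Bartlett’s test rejects the identity-matrix hypothesis, mirroring the results reported above.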

Confirmatory factor analysis (CFA)

CFA was performed using AMOS software with maximum likelihood (ML) estimation, a method widely adopted when the data approximately satisfy the assumption of multivariate normality; this assumption was examined using skewness and kurtosis statistics. Various statistics are available for assessing goodness of fit. Hu and Bentler [26] recommended multiple fit indices and corresponding cut-off values: the Tucker–Lewis Index (TLI ≥ 0.90), the Comparative Fit Index (CFI ≥ 0.90), and the Root Mean Square Error of Approximation (RMSEA < 0.08) [27], along with the Parsimonious Normed Fit Index (PNFI ≥ 0.50). Additional indices, such as the Goodness-of-Fit Index (GFI ≥ 0.90) and the Parsimonious Comparative Fit Index (PCFI ≥ 0.60), have also been suggested [28].
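Among these indices, RMSEA has a simple closed form that can be recovered from the model chi-square, its degrees of freedom, and the sample size. The sketch below computes it for a hypothetical model (the chi-square and df values are invented for illustration; the study’s CFA was run in AMOS).

```python
# RMSEA from the model chi-square; the model values are hypothetical.
import math

def rmsea(chi_sq, df, n):
    """Root Mean Square Error of Approximation:
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_sq - df, 0) / (df * (n - 1)))

# Hypothetical model: chi-square = 120 on 100 df, with n = 276.
value = rmsea(120, 100, 276)
print(round(value, 3), value < 0.08)  # 0.027 True
```

A value of about 0.027 falls well under the 0.08 cut-off cited above, which would indicate close model fit.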

Discriminant validity was assessed using the Average Variance Extracted (AVE) index. The AVE indicates the proportion of variance in the indicators explained by the latent constructs, providing a more cautious assessment of the measurement model’s validity; values above 0.5 are typically accepted [29]. Acceptable composite reliability (CR) values indicate that the measurement items exhibit high internal reliability, with values greater than 0.7 required for confirmation [29]. Another approach to evaluating discriminant validity is the Heterotrait-Monotrait (HTMT) correlation ratio [30]: a value approaching 1 suggests insufficient discriminant validity. The HTMT value is compared against a predefined cut-off, and if it exceeds that cut-off, discriminant validity may not be present. Gold et al. recommended setting this threshold at 0.90 [31].
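To make these three criteria concrete, the sketch below computes AVE and CR from standardized factor loadings, and an HTMT ratio from an item correlation matrix. All loading and correlation values are hypothetical and chosen only to illustrate the cut-offs named above.

```python
# AVE, composite reliability (CR), and HTMT on hypothetical values.
import numpy as np

def ave(loadings):
    """Average Variance Extracted: mean of squared standardized loadings."""
    lam = np.asarray(loadings)
    return (lam ** 2).mean()

def composite_reliability(loadings):
    """CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))."""
    lam = np.asarray(loadings)
    num = lam.sum() ** 2
    return num / (num + (1 - lam ** 2).sum())

def htmt(corr, idx_a, idx_b):
    """HTMT: mean heterotrait correlation divided by the geometric mean
    of the mean within-construct (monotrait) correlations."""
    c = np.abs(np.asarray(corr))
    hetero = c[np.ix_(idx_a, idx_b)].mean()
    def mono(idx):
        sub = c[np.ix_(idx, idx)]
        return sub[np.triu_indices_from(sub, k=1)].mean()
    return hetero / np.sqrt(mono(idx_a) * mono(idx_b))

loadings = [0.70, 0.80, 0.75, 0.72]               # hypothetical loadings
print(round(ave(loadings), 3))                    # 0.553 (> 0.5)
print(round(composite_reliability(loadings), 3))  # 0.831 (> 0.7)

# Hypothetical 4-item correlation matrix: items 0-1 form construct A,
# items 2-3 form construct B.
R = np.array([[1.0, 0.8, 0.3, 0.3],
              [0.8, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.8],
              [0.3, 0.3, 0.8, 1.0]])
print(htmt(R, [0, 1], [2, 3]))  # approx. 0.375 (< 0.90)
```

In this toy example the construct clears all three thresholds: AVE above 0.5, CR above 0.7, and an HTMT ratio well under the 0.90 cut-off.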

Reliability assessment

Reliability was assessed via internal consistency using Cronbach’s alpha; a value greater than 0.70 indicates good reliability [32, 33]. Additionally, the test-retest method was employed to evaluate reliability over time, with a coefficient of 0.75 or higher considered acceptable [32].
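The standard variance-based formula for Cronbach’s alpha can be sketched as follows, applied to simulated Likert-style responses driven by a single latent trait (the data are synthetic; the study computed alpha in SPSS).

```python
# Illustrative Cronbach's alpha on a synthetic item matrix.
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Synthetic responses: 200 respondents, 6 items sharing one latent trait.
rng = np.random.default_rng(1)
trait = rng.normal(size=(200, 1))
responses = trait + 0.6 * rng.normal(size=(200, 6))
print(cronbach_alpha(responses) > 0.70)  # expect: True
```

Because all six simulated items share a common trait, alpha comfortably exceeds the 0.70 threshold cited above.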

Statistical analysis

Quantitative data were described using measures of central tendency and dispersion, while qualitative variables were described using counts and percentages. Given the normality of the quantitative variables, Pearson’s correlation coefficient was applied to assess the relationships among them.

Exploratory factor analysis was conducted using SPSS 26, and confirmatory factor analysis was performed using AMOS software (version 4.5.0); McDonald’s omega was calculated using the semTools package.
