Part V: Creating the EQ-i 2.0 and EQ 360 2.0

Standardization, Reliability, and Validity

EQ-i 2.0 Pilot Study and Standardization

Standardization is an important part of test development, involving the collection of pilot and normative data. Pilot data are used to test the basic functions of an assessment, such as its reading level, response instructions, and completion time. Issues that arise in these areas can then be addressed before normative data collection begins. Normative data establish a baseline against which all subsequent results are compared, and enable the test developer to capture the characteristics of an “average” respondent. Norms indicate the average performance on a test and the distribution of scores above and below the average (Anastasi, 1988). A large, representative normative database ensures that the reference group is inclusive with respect to demographic variables such as age, gender, education level, and race/ethnicity, increasing the audience to which the assessment is relevant. This section describes the method of data collection and the breakdown of the pilot and normative samples, including the effects of age and gender on the EQ-i 2.0 results. Data collection for the EQ-i 2.0 proceeded in multiple stages between June 2009 and December 2010. More than 10,000 participants completed the EQ-i 2.0 over this time period. These data were collected for pilot testing, the creation of norms, and validation analyses.

Data Collection

PILOT PHASE

This first stage of data collection took place between June 2009 and November 2009. Participants in the EQ-i 2.0 pilot dataset (N = 1,346; Table A.1) were 58.8% female with a mean age of 35.5 years (SD = 14.6 years). The majority of the sample was White (74.9%) and the largest education level group was college/university degree or higher (45.7%). These data were collected to ensure that the basic functionality of the EQ-i 2.0 (e.g., instructions, response options, administration time) was adequate. The pilot study confirmed most aspects of the test protocol and, where needed, some adjustments were made to fine-tune the assessment.

NORMATIVE PHASE

This second phase, which involved the collection of data for the normative sample as well as reliability and validity data, took place between March 2010 and December 2010.

Data were gathered from all 50 U.S. states and the District of Columbia, as well as from all 10 Canadian provinces. All prospective participants were sent an email invitation to take part in the EQ-i 2.0 data collection process. Those who agreed to participate completed the assessment online and were compensated for their time. Various measures were undertaken to ensure that all data met the highest levels of data authenticity. For example, data were screened so that any potentially illegitimate assessments (e.g., the participant responded with only a single response option for a significant number of items in a row, left too many items missing, or took less than 10 minutes or more than 90 minutes to complete the assessment) were excluded from the dataset. The following section focuses on a description of the normative sample; see the EQ-i 2.0 Reliability and EQ-i 2.0 Validity sections further down this page for more information on the reliability and validity samples.
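The screening rules described above can be illustrated with a minimal sketch. Only the 10-minute and 90-minute time limits come from the text; the cutoffs for identical consecutive responses and for missing items are not stated in this manual, so the values below (and all names, such as Protocol and is_legitimate) are hypothetical placeholders rather than the criteria actually applied.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Time limits are taken from the text; the other two cutoffs are ASSUMED
# placeholders, because the manual does not report the values that were used.
MIN_MINUTES = 10.0
MAX_MINUTES = 90.0
MAX_IDENTICAL_RUN = 20   # hypothetical: "a significant number of items in a row"
MAX_MISSING_ITEMS = 8    # hypothetical: "too many items missing"


@dataclass
class Protocol:
    responses: Sequence[Optional[int]]  # item responses; None marks a missing item
    minutes: float                      # completion time in minutes


def longest_identical_run(responses):
    """Length of the longest streak of identical, non-missing responses."""
    longest = current = 0
    previous = None
    for r in responses:
        if r is not None and r == previous:
            current += 1
        else:
            current = 1 if r is not None else 0
        previous = r
        longest = max(longest, current)
    return longest


def is_legitimate(p):
    """True when the protocol passes all three screening checks."""
    missing = sum(1 for r in p.responses if r is None)
    return (
        longest_identical_run(p.responses) < MAX_IDENTICAL_RUN
        and missing <= MAX_MISSING_ITEMS
        and MIN_MINUTES <= p.minutes <= MAX_MINUTES
    )
```

A dataset would then be filtered with an expression such as [p for p in protocols if is_legitimate(p)], where protocols is a list of Protocol records.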

To create representative normative samples, specific demographic targets (i.e., age, gender, race/ethnicity, education level, and geographic region), guided by recent Canadian and U.S. Census information (i.e., Statistics Canada, 2006; U.S. Bureau of the Census, 2008), were used during data collection. To this end, information was collected on each participant’s gender, age, race/ethnicity (Asian/Pacific Islander, Black/African-American/African-Canadian, Hispanic/Latino, White, Multiracial, or Other), highest level of educational attainment (high school or less, some college/university, college/university degree or higher), employment status (employed/self-employed, unemployed, retired, or other), and geographic location (state/province and country). For ease of presentation, race/ethnicity groups are referred to in this manual as follows: Black, Hispanic/Latino, White, and Other.

Standardization

This section describes the process of standardization for the EQ-i 2.0, including a description of the normative sample, and the statistical analyses that were conducted in order to create normative groups and standardized scores.

NORMATIVE SAMPLE

Normative data were collected between March 2010 and April 2010. During this time period, 4,996 participants provided EQ-i 2.0 data for standardization purposes. A final sample of 4,000 participants was selected as the normative dataset. Statistical analyses showed no meaningful differences between U.S. and Canadian participants in EQ-i 2.0 scores (i.e., none of the Cohen’s d values even reached a small effect size; Table A.2), so data from both countries were included together in a single normative sample. The EQ-i 2.0 normative sample was collected within ten age ranges (400 cases in each age range), equally proportioned by gender (Table A.3). The data provided in Tables A.4 through A.6 indicate that the normative sample is very similar to the Censuses (within 3%) in terms of race/ethnicity, geographic region, and education level. Therefore, the reference group against which individual EQ-i 2.0 scores are compared is representative of the North American general population.

NORMING PROCEDURES

The first step in preparing the norms was to determine whether any trends existed in the data. For instance, large differences in scores between men and women, or across various age groups, could provide an argument for creating separate gender- or age-based norm groups. Conversely, a lack of such differences may dictate the use of a single norm group with genders and age groups combined. A series of analyses of covariance (ANCOVA; for Total EI) and multivariate analyses of covariance (MANCOVA; for the Composites and Subscales) were used to examine the relationships of gender and age with EQ-i 2.0 scores. For ease of interpretability, the ten age groups were condensed into five (18–29 years, 30–39 years, 40–49 years, 50–59 years, and 60+ years) for these analyses, with education level and race/ethnicity as covariates (in order to control for the effects of these demographic variables). To control for Type I errors that might occur with multiple analyses, a more conservative criterion of p < .01 was used for all F-tests.

The Wilks’ lambda statistic generated from these analyses ranges from 0.00 to 1.00 and conveys the proportion of variance that is not explained by the effect (in this case, the interaction between gender and age) in the multivariate analyses. These values were all close to 1.00, suggesting that only a small amount of variance could be explained by the interaction. However, F-tests revealed significant effects of gender, age, and the interaction of gender and age (see Table A.7). Given these results, the univariate effects are described in detail below.

Focus on Effect Size. The large sample size dictates that effect sizes should be considered more strongly than significance tests (see the previous section on Effect Size). The effect sizes are provided in Table A.8. While Cohen’s d values are reported to describe the size of the gender effects, Cohen’s d values are not appropriate for describing age effects (where there are more than two groups). Furthermore, previous research has determined that associations between age and EI are generally non-linear, with scores increasing up to a certain age (around age 40–50) then either decreasing slightly or stabilizing (Bar-On, 1997). Therefore, it is inappropriate to examine correlations between age and EI, because Pearson’s correlations are used to estimate linear trends and can therefore underestimate or completely overlook non-linear relationships. Instead, partial eta-squared (partial η²) values are reported and are used to summarize the overall effect of age on EI (technically speaking, partial η² quantifies the proportion of variance in EI scores accounted for by the age groups).
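For readers who wish to reproduce these two effect-size measures on their own data, the following is a minimal sketch using the common textbook definitions: Cohen's d as a mean difference divided by the pooled standard deviation, and partial η² as the effect sum of squares divided by the sum of the effect and error sums of squares. The function names are illustrative, and the sums of squares are assumed to be taken from an ANCOVA or MANCOVA output; the software used for the analyses in this manual is not specified here.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b), pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def partial_eta_squared(ss_effect, ss_error):
    """Proportion of variance attributable to an effect, excluding the
    variance explained by the other effects in the model."""
    return ss_effect / (ss_effect + ss_error)


# Illustrative call with made-up scores (not EQ-i 2.0 data).
d = cohens_d([2, 4, 6], [1, 3, 5])
```

With men as group_a and women as group_b, a negative d indicates that women scored higher; this sign convention is inferred from the values reported in the Gender Effects paragraph and is an assumption.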

Gender Effects. Results of the gender analyses showed that males and females did not differ significantly on the EQ-i 2.0 Total EI score, indicating that overall emotional intelligence as measured by the EQ-i 2.0 is the same for males and females; however, small to medium gender effects were found for some subscales (see Table A.8 for effect sizes and Table A.9 for descriptive statistics and significance test results). The largest difference was on Empathy, with women scoring higher than men with a moderate effect size (d = -0.49). Smaller differences were found with women scoring higher than men on the Interpersonal Composite (d = -0.33), Emotional Expression (d = -0.31), and Emotional Self-Awareness (d = -0.22). Men scored higher than women with small effect sizes on Stress Tolerance (d = 0.30), Problem Solving (d = 0.26), and Independence (d = 0.21). These differences are compatible with the logic of the EQ-i 2.0 conceptual framework and have empirical precedent, such as in the original EQ-i (see Bar-On, 2004). However, it is important to note that these effects were small and represent only a few absolute standard score points.

Age Effects. Significant but small age effects were found for the EQ-i 2.0 (see Table A.8 for effect sizes and Table A.10 for descriptive statistics and significance test results). The age differences varied from scale to scale. In some instances, scale scores increased with age (i.e., Total EI, Self-Regard, Interpersonal Composite, Interpersonal Relationships, Empathy, Stress Management Composite). In other cases, scores increased until about age 40–49 years, then the scores stabilized or decreased slightly (i.e., Self-Expression Composite, Independence, Problem Solving, Flexibility, Stress Tolerance). Differences between age groups were generally only a few standard score points in magnitude. Previous research has demonstrated similar age trends (see Bar-On, 2004). Emotional Self-Awareness and Assertiveness were the only subscales that failed to show at least a small effect size.

Gender × Age Interaction. There were no meaningful interactions between age and gender; partial η² values did not reach the minimum criterion for a small effect size (Table A.8). In fact, partial η² values were .00 for all scales. In other words, any age effects were consistent within males and females, and any gender effects were consistent within age groups.

Overall, the age and gender analyses revealed significant but small effects. Therefore, both specific “Age and Gender norms” (i.e., age and gender specific) and “General population norms” (i.e., neither age nor gender specific) were developed. Actual construction of the norms was conducted by a multi-step statistical process. In the first step, the score distributions were examined: skewness and kurtosis values were close to 0 (skewness values ranged from -0.93 to -0.15; kurtosis values ranged from -0.17 to 0.77), and an examination of the scale histograms did not reveal any significant departures from normality (an example histogram for the EQ-i 2.0 Total EI score is provided in Figure A.1). Therefore, artificial transformation of scores to fit normal distributions was deemed unnecessary.
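The distribution check described above can be sketched as follows. This example uses the simple moment-based estimators of skewness and excess kurtosis (both equal 0 for a normal distribution); the manual does not state which estimator was used, so this choice and the function name are assumptions.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(scores):
    """Moment-based skewness and excess kurtosis of a list of scores."""
    n = len(scores)
    m = fmean(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in scores) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


# Illustrative call with made-up scores (not EQ-i 2.0 data).
skew, kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
```

Negative skewness, as reported for the EQ-i 2.0 scales, indicates a longer tail toward low scores. Statistical packages often apply small-sample corrections to these estimators, so their output can differ slightly from this sketch.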

In the next step, means were statistically smoothed for the Age and Gender norms. Data points that diverged significantly from a smooth curve partly reflect true differences and partly reflect sampling variability (Zachary & Gorsuch, 1985). To mitigate the effect of sampling variability, the data were smoothed using the following technique. Means and standard deviations were computed at each age, separately for males and females, for every score. For each scale, regression analysis was used to find the best fitting curve (linear or curvilinear) across age. Linear and quadratic effects of age were the independent variables, and the mean scores at each age were the dependent variables. At each age, the predicted score mean from the regression was used in conjunction with the original (unsmoothed) mean to produce the final norms. Specifically, the final “smoothed” mean was a weighted mean of the regression-generated value and the original, unsmoothed mean (each weighted 50%). Use of this smoothed normative value allows for irregular but real differences between age groups to have an effect, while reducing the impact of random fluctuation. The smoothed values were averaged within each of the five age groups for the computation of the standard scores. For example, the means and standard deviations for 18-year-olds, 19-year-olds, 20-year-olds, and so on up to 29-year-olds were averaged to produce the values for the 18–29 years group.
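The smoothing technique described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: it always fits a quadratic curve (the manual does not say how the choice between a linear and a curvilinear fit was made), it centers age before fitting for numerical stability, and the function names are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (b0, b1, b2) for y = b0 + b1*x + b2*x**2."""
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented 3x4 matrix of the normal equations (X'X | X'y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


def smooth_means(ages, raw_means, weight=0.5):
    """Blend regression-predicted means with the raw means.

    weight is the share given to the regression value; 0.5 matches the
    50% weighting described in the text.
    """
    center = sum(ages) / len(ages)
    xs = [age - center for age in ages]  # centered age (assumption)
    b0, b1, b2 = fit_quadratic(xs, raw_means)
    fitted = [b0 + b1 * x + b2 * x * x for x in xs]
    return [weight * f + (1 - weight) * m for f, m in zip(fitted, raw_means)]


def group_average(values):
    """Average smoothed values within an age group (e.g., ages 18 to 29)."""
    return sum(values) / len(values)
```

Here ages would hold the single-year ages for one gender and raw_means the corresponding unsmoothed scale means; the same blending could be applied to the standard deviations. Because half of the weight stays on the raw mean, an age whose mean departs from the fitted curve keeps part of that departure, which is the stated purpose of the 50% weighting.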

Standardization Summary

Over 10,000 EQ-i 2.0 assessments were collected between 2009 and 2010 for the standardization of the tool. A sample of 4,000 participants was chosen as the EQ-i 2.0 normative sample. The sample was evenly distributed by gender and age, and matched to the Census based on race/ethnicity, geographic region, and highest level of educational attainment. Statistical analyses revealed small differences across gender and age; therefore, general norms as well as separate age and gender norms are available as options in the use of the EQ-i 2.0. The norming process resulted in standard scores with means of 100 and standard deviations of 15 for the Total EI score, Composite Scales, and Subscales.
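The standard-score metric mentioned in this summary (mean of 100, standard deviation of 15) can be illustrated with the usual linear transformation. This sketch assumes that no further adjustment, such as rounding or truncation at extreme values, is applied; the function name and the example values are hypothetical.

```python
def standard_score(raw, norm_mean, norm_sd):
    """Convert a raw scale score to the mean-100, SD-15 metric using the
    mean and SD of the relevant norm group."""
    return 100.0 + 15.0 * (raw - norm_mean) / norm_sd


# Illustrative call with made-up values: a raw score one norm-group SD
# above the norm-group mean.
example = standard_score(raw=60.0, norm_mean=50.0, norm_sd=10.0)
```

A respondent scoring exactly at the norm-group mean receives 100 under this transformation, and each norm-group standard deviation above or below the mean shifts the result by 15 points.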