Part V: Creating the EQ-i 2.0 and EQ 360 2.0

Standardization, Reliability, and Validity

EQ-i 2.0 Reliability

Reliability is defined as “the consistency of scores obtained by the same person when re-examined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions” (Anastasi, 1988, p. 102). Two basic statistical methods for evaluating a test’s reliability are internal consistency and test-retest reliability analyses. Internal consistency refers to the general cohesiveness of a test’s items, or the degree to which a particular set of items assesses a single construct. Test-retest reliability refers to the stability of scores over time. Both of these analyses were conducted for the EQ-i 2.0. From a practical perspective, internal consistency may be used to calculate the precision or “margin of error” associated with an individual’s EQ-i 2.0 score; these values are also referred to as confidence intervals (CI). Reliability analyses are also used to determine which subscales are in balance or out of balance with one another within a client’s EQ-i 2.0 profile (i.e., what constitutes a meaningful difference between an individual’s subscale scores?).

Internal Consistency

Internal consistency conveys the degree to which the items in a set are associated with one another. High levels of internal consistency suggest that the items are measuring a single, cohesive construct. Internal consistency is typically measured using Cronbach’s alpha (Cronbach, 1951). Cronbach’s alpha ranges from 0.0 to 1.0 and is a function of (a) the interrelatedness of the items in a test or scale and (b) the length of the test (John & Benet-Martinez, 2000). Higher values reflect higher internal consistency.
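To make the calculation concrete, the sketch below (Python, using simulated item responses rather than actual EQ-i 2.0 data) illustrates the standard Cronbach’s alpha formula, alpha = (k / (k − 1)) × (1 − sum of item variances / variance of total scores). The function name and the simulated data are illustrative assumptions, not part of the EQ-i 2.0 scoring procedure.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = item_scores.shape[1]                               # number of items in the scale
    item_variances = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Simulated responses: 300 respondents answering 8 items that share a common factor.
rng = np.random.default_rng(0)
common_factor = rng.normal(size=(300, 1))
items = common_factor + rng.normal(scale=0.8, size=(300, 8))
print(round(cronbach_alpha(items), 2))  # high alpha, since the items tap one construct
```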

Cronbach’s alpha values for the EQ-i 2.0 scales in the normative sample are presented in Table A.11 (see the previous Standardization section for a description of the normative sample). Given that Cronbach’s alpha is influenced by the number of items in a set (with more items generally leading to higher alphas), the number of items per scale is also displayed in this table. Though there is no universal criterion for a “good” alpha level, informal guidelines typically regard values of .90 and above as “excellent,” .80 to .89 as “good,” .70 to .79 as “acceptable,” and values below .70 as “unacceptable.” Most of the values found in Table A.11 demonstrate excellent reliability for the EQ-i 2.0, which is particularly notable given the small number of items included in most subscales. Looking at the General (Total Sample) column, the alpha value for the Total EI scale was .97, values for the composite scales ranged from .88 to .93, and values were .77 or higher for all subscales. These values were similar within the various age and gender normative groups, including a Total EI alpha of at least .97 in each norm group. Furthermore, these values are generally higher than those found in the original EQ-i normative samples. For instance, the average alpha reliability value for the original EQ-i Total EI score across nine normative samples was .79 (Bar-On, 2004). The high level of internal consistency found in the EQ-i 2.0 Total EI score supports the idea that, taken together, the EQ-i 2.0 items are measuring a single cohesive construct: namely, emotional intelligence.

CONFIDENCE INTERVALS

A practical application of alpha values is that they may be used to calculate the precision or margin of error associated with individual scores. Specifically, alpha values may be used to calculate confidence intervals for each individual score. Unlike physical attributes, such as height and blood pressure, psychological characteristics (such as EI) cannot be measured directly. Psychological assessments serve as estimates of an individual’s true score on these dimensions, and therefore some degree of uncertainty is associated with the obtained scores. Confidence intervals are a method of quantifying this uncertainty. The relationship between alpha values and confidence intervals is inverse: as alpha values increase, confidence intervals narrow. In other words, as the internal consistency of an assessment increases, the degree of uncertainty decreases.
Confidence intervals at the 90% confidence level for all EQ-i 2.0 scores are integrated into the computerized reports as an option the user may select. For example, if a client obtains a score of 105 on the EQ-i 2.0 Total EI scale, 90% confidence intervals suggest that the margin of error is ± 4 points, with the true score ranging from a low of 101 to a high of 109. In other words, this individual’s actual level of EI will fall within this interval 90% of the time. Note that the score of 105 still remains the best single point estimate of the client’s Total EI.
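The worked example above can be reproduced with the conventional standard-error-of-measurement approach, SEM = SD × sqrt(1 − alpha), and a 90% interval of the obtained score ± 1.645 × SEM. The sketch below assumes this conventional formula and the standard-score SD of 15; it is an illustration of the arithmetic, not the exact algorithm used by the scoring software.

```python
import math

def confidence_interval(score: float, alpha: float, sd: float = 15.0,
                        z: float = 1.645) -> tuple[float, float]:
    """90% confidence interval based on the standard error of measurement."""
    sem = sd * math.sqrt(1.0 - alpha)   # SEM shrinks as alpha rises
    margin = z * sem                    # z = 1.645 for a 90% confidence level
    return score - margin, score + margin

# A Total EI alpha of .97 yields roughly the +/- 4-point band described above.
low, high = confidence_interval(105, alpha=0.97)
print(round(low), round(high))  # approximately 101 and 109
```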

BALANCING EI: Comparing Differences in Subscale Scores

The EQ-i 2.0 report includes an optional Balancing Your EI section. This section compares each subscale score to the scores of three related subscales. For example, Self-Regard is compared to Self-Actualization, Problem Solving, and Reality Testing (see Understanding the Results for details on interpreting the Balancing Your EI section). Analyses similar to those used to generate confidence intervals were used to calculate the size of “gaps” between EQ-i 2.0 subscales. Results from these analyses were used to guide selection of the critical value at which scales are determined to be “in balance” or “out of balance” with each other. Specifically, considering the results of these analyses as well as practical functionality, a critical value of 10 points was selected for the Balancing Your EI section. This value is actually slightly smaller than those suggested by the statistical analyses, but it was selected so that users can be confident they are identifying any potentially important imbalances in EI abilities. Thus, if two subscales in the Balancing Your EI section are less than 10 points apart, they will be reported as being “in balance,” whereas subscale scores that are 10 or more points apart will be described as being “out of balance.”
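As a simple illustration of this decision rule, the sketch below applies the 10-point critical value to a hypothetical set of standard scores. The helper function and the scores are invented for illustration, and only the Self-Regard comparisons named above are shown.

```python
def balance_status(score_a: float, score_b: float, critical_value: float = 10.0) -> str:
    """Apply the 10-point rule: gaps of 10 or more points are 'out of balance'."""
    gap = abs(score_a - score_b)
    return "out of balance" if gap >= critical_value else "in balance"

# Hypothetical subscale standard scores for one client.
scores = {"Self-Regard": 112, "Self-Actualization": 99,
          "Problem Solving": 105, "Reality Testing": 108}
for other in ("Self-Actualization", "Problem Solving", "Reality Testing"):
    print(f"Self-Regard vs. {other}: {balance_status(scores['Self-Regard'], scores[other])}")
```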

Test-Retest Reliability and Stability

The test-retest reliability of an assessment refers to the consistency of scores over time. This type of reliability is typically calculated by examining the correlation between an individual’s scores on the same assessment at two different times. The time interval must not be so long that factors such as developmental changes obscure the assessment of the instrument’s reliability (Anastasi, 1982), nor so short that results are contaminated by memory effects (Downie & Heath, 1970). A two- to eight-week interval between administrations is usually recommended.

When test-retest reliability is assessed at the group level, high correlations indicate that the rank order of individuals’ assessment scores has remained consistent over time. However, differences in mean scores may confound these results. For example, if each individual’s score increases or decreases in a dramatic but uniform manner over time, the test-retest correlation for the overall sample will remain high even though the scores themselves have changed considerably. Test-retest stability analyses can be used to determine not only whether the rank order of scores remains consistent, but also whether the actual scores themselves remain stable over time. Test-retest stability was examined by calculating the difference between Time 1 and Time 2 standard scores for each individual in the test-retest samples.
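The confound described above is easy to demonstrate with simulated numbers: a large but uniform shift leaves the test-retest correlation essentially perfect even though every score has changed. The data below are simulated purely for illustration and do not come from the EQ-i 2.0 retest samples.

```python
import numpy as np

rng = np.random.default_rng(1)
time1 = rng.normal(loc=100, scale=15, size=200)  # simulated Time 1 standard scores
time2 = time1 + 12                               # every score rises by a uniform 12 points

r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))                               # 1.0 -- rank order perfectly preserved
print(round(float((time2 - time1).mean()), 1))   # 12.0 -- yet every score shifted substantially
```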

For the EQ-i 2.0, test-retest data were available for 204 individuals who were assessed two to four weeks apart (mean interval = 18.41 days, SD = 3.22 days), and for 104 individuals who were assessed approximately eight weeks apart (mean interval = 56.80 days, SD = 1.25 days). Demographic characteristics of the two retest samples are displayed in Table A.12. EQ-i 2.0 test-retest correlations are expected to be high for the 2- to 4-week interval, supporting the reliability of the EQ-i 2.0 as a tool, because a person’s EI should not change much over two to four weeks, especially in the absence of any EI-targeted intervention, as was the case in our data (see Stein & Book, 2000). However, test-retest correlations also tend to decrease as the time interval between assessments increases, because there is more opportunity for developmental changes or other events to occur. Therefore, the 8-week test-retest values are expected to be slightly lower than the 2- to 4-week values. Nonetheless, test-retest correlations (see Table A.13) were high for the EQ-i 2.0 Total EI score in both the 2- to 4-week (r = .92) and 8-week samples (r = .81). Test-retest correlations for the composite scales were very high, ranging from r = .86 (Self-Expression Composite) to r = .91 (Interpersonal Composite) in the 2- to 4-week sample, and from r = .76 (Interpersonal Composite) to r = .83 (Decision Making Composite) in the 8-week sample. Finally, results for the subscales were also high, ranging from r = .78 (Impulse Control) to r = .89 (Empathy) in the 2- to 4-week sample and from r = .70 (Flexibility) to r = .84 (Self-Regard, Happiness) in the 8-week sample. These values were generally similar to those found for the original EQ-i (Bar-On, 2004).

The stability of the EQ-i 2.0 scores was examined by calculating the difference between Time 1 and Time 2 standard scores for each individual in the test-retest samples. Tables A.14 (2 to 4 weeks) and A.15 (8 weeks) display the frequencies of these differences, as well as the mean differences (i.e., the difference between Time 1 and Time 2 ratings for each individual, averaged across the sample) and the 95% confidence interval surrounding each mean difference. Positive mean differences indicate that scores increased over time, whereas negative mean differences indicate that scores decreased over time. The results suggest scores remained highly stable over time: for almost all scales, roughly 90% or more of individuals’ scores did not change by more than one normative standard deviation (i.e., 15 standard score points) in either the 2- to 4-week or the 8-week sample. Confidence intervals around the mean differences were also consistently small, and when an interval encapsulates zero, the corresponding mean difference is not statistically significant at the .05 level. These results support the conclusion that the EQ-i 2.0 captures the temporal stability of emotional intelligence.
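A minimal sketch of this kind of stability summary is shown below, assuming the 95% interval is formed as the mean difference ± 1.96 standard errors. The function and the simulated scores are illustrative assumptions, not the analyses that produced Tables A.14 and A.15.

```python
import numpy as np

def stability_summary(time1: np.ndarray, time2: np.ndarray, band: float = 15.0) -> dict:
    """Mean Time 2 minus Time 1 difference, its 95% CI, and the proportion of
    respondents whose scores changed by no more than one normative SD (15 points)."""
    diffs = time2 - time1
    mean_diff = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(diffs.size)   # standard error of the mean difference
    return {"mean_diff": mean_diff,
            "ci_95": (mean_diff - 1.96 * se, mean_diff + 1.96 * se),
            "prop_within_band": float(np.mean(np.abs(diffs) <= band))}

# Simulated retest scores: mostly stable, with small individual fluctuations.
rng = np.random.default_rng(2)
t1 = rng.normal(100, 15, size=204)
t2 = t1 + rng.normal(0, 5, size=204)
s = stability_summary(t1, t2)
print(f"mean diff = {s['mean_diff']:.2f}, "
      f"95% CI = ({s['ci_95'][0]:.2f}, {s['ci_95'][1]:.2f}), "
      f"within 15 points = {s['prop_within_band']:.0%}")
```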

Reliability Summary

Overall, the EQ-i 2.0 demonstrates sound reliability. Internal consistency (alpha) values were generally high for the overall normative groups and within specific age and gender subgroups, suggesting that the items cohesively measure Total EI, as well as the constructs represented by the composite scales and subscales. Test-retest reliability and stability values were also high at both 2- to 4-week and 8-week intervals, reflecting a level of temporal stability that would be expected for emotional intelligence. Users of the EQ-i 2.0 can be confident that the scores generated by this assessment will be consistent and reliable.