Total, interval, exact-agreement, and proportional reliability indices were calculated for all sessions, as described in Study 1. The accuracy of the calculations was checked by having a PhD student independently recalculate the reliability scores for 33% of the sessions; agreement with the original calculations was 100% for these sessions. Variations in response rate had different effects on the four common methods of calculating reliability (Study 1). Unlike total and interval reliability, proportional reliability was sensitive to response rate, although it was not affected as adversely as exact-agreement reliability. When the characteristics of high-rate responding were examined in more detail (Study 2), responses occurring at the ends of intervals, rather than response rate per se or response bursting, had the greatest impact on reliability scores. As in Study 1, the exact-agreement method was the most affected by end-of-interval responses.

Exact agreement IOA. The partial-agreement-within-intervals approach is clearly a stricter measure of agreement between two observers than total count. The most conservative approach, however, is to treat any discrepancy within an interval as a complete disagreement and score that interval as zero.
Exact agreement is one such approach. With this measure, only intervals in which both observers record exactly the same count are scored as 100% (or 1.0) agreement. Using our current example, exact matches are obtained in intervals 5 through 14, or 10 of the 15 intervals. Dividing 10 by the total number of intervals (15) yields an IOA of 66.7%, a somewhat lower agreement value than the partial-agreement-within-intervals approach produces.

Given that much of the work published in the field's flagship journal, the Journal of Applied Behavior Analysis, relies on simple total count or interval-by-interval reliability measures (see Mudford et al., 2009), it may surprise some readers that there are many formulas from which the behavior analyst can choose when conducting an IOA analysis. Given the exceptional standing the field accords to Cooper, Heron, and Heward's (2007) textbook Applied Behavior Analysis, as well as our own professional experience that this book contains the most comprehensive discussion of the various IOA algorithms used in our field, our discussion of IOA procedures below is based mainly on Chapter 5 of that text. Readers interested in this topic are therefore strongly encouraged to consult Cooper et al.'s text; we refer to other sources where warranted. For more information on the practical aspects of collecting IOA data (e.g., how often such data should be collected, how to interpret and use them, acceptable levels of IOA), the reader is directed to Vollmer et al. (2008), also published in Behavior Analysis in Practice.

Response measurement in applied research usually involves data collection by human observers, which is likely to be more error prone than automatic transduction. Assessment of observer consistency, or reliability, has therefore become a standard feature of applied research and is accomplished by determining the extent of scoring agreement between the records of independent observers.
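As a sketch of the exact count-per-interval calculation illustrated earlier (10 exactly matching intervals out of 15), consider the following snippet. The function name and the per-interval counts are hypothetical, constructed only so that intervals 5 through 14 match exactly while intervals 1 through 4 and 15 do not.

```python
def exact_agreement_ioa(counts_a, counts_b):
    """Exact count-per-interval IOA: an interval counts as an agreement
    only when both observers recorded exactly the same number of
    responses; the result is agreements divided by total intervals."""
    matches = sum(1 for a, b in zip(counts_a, counts_b) if a == b)
    return 100.0 * matches / len(counts_a)

# Hypothetical counts for 15 intervals: the observers match exactly in
# intervals 5-14 (10 intervals) and differ in intervals 1-4 and 15.
obs_a = [1, 2, 0, 3, 1, 1, 0, 0, 2, 2, 1, 0, 1, 3, 2]
obs_b = [0, 1, 1, 2, 1, 1, 0, 0, 2, 2, 1, 0, 1, 3, 1]
print(round(exact_agreement_ioa(obs_a, obs_b), 1))  # -> 66.7
```

Note how a one-response difference in an interval (e.g., interval 1) costs as much as any larger discrepancy, which is what makes this index the most conservative of the count-based measures.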
A number of factors can affect reliability (see reviews by Kazdin, 1977, and Page & Iwata, 1986); this study focuses on the methods used to calculate reliability statistics and on how those methods are affected by response rate and distribution.

Interval-based IOA algorithms evaluate agreement between two observers' interval-based recordings (including time samples). These measures include (a) interval-by-interval IOA, (b) scored-interval IOA, and (c) unscored-interval IOA. Table 2 summarizes the relative strengths of the three interval-based algorithms, which are reviewed briefly below. As a running example of interval-based IOA, consider the hypothetical data stream in Figure 2, in which two independent observers recorded the occurrence and nonoccurrence of a target response across seven consecutive intervals. In the first and seventh intervals, the observers disagree about whether a response occurred. Both observers agree, however, that no response occurred in the second, third, and fourth intervals, and both agree that at least one response occurred in the fifth and sixth intervals.
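The interval-by-interval calculation for this kind of occurrence/nonoccurrence record can be sketched as follows. The Boolean lists are illustrative, arranged to mirror the Figure 2 pattern described above: agreement in intervals 2 through 6, disagreement in intervals 1 and 7.

```python
def interval_by_interval_ioa(record_a, record_b):
    """Interval-by-interval IOA for occurrence/nonoccurrence records:
    intervals in which both observers scored the same way (both
    occurrence or both nonoccurrence) divided by total intervals."""
    agreements = sum(1 for a, b in zip(record_a, record_b) if a == b)
    return 100.0 * agreements / len(record_a)

# Occurrence (True) / nonoccurrence (False) over seven intervals,
# mirroring the Figure 2 example: disagreement in intervals 1 and 7.
obs_a = [True,  False, False, False, True, True, True]
obs_b = [False, False, False, False, True, True, False]
print(round(interval_by_interval_ioa(obs_a, obs_b), 1))  # -> 71.4
```

With five agreements in seven intervals, the index is 5/7, or about 71.4%; note that agreements on nonoccurrence (intervals 2 through 4) count just as much as agreements on occurrence.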
Scored-interval IOA. One approach to sharpening the agreement measure for two observers' interval recordings is simply to restrict the analysis to those intervals in which at least one observer scored a target response. Intervals in which neither observer scored a target response are excluded from the calculation, yielding a more stringent agreement statistic. Cooper et al. (2007) suggest that scored-interval IOA (also referred to as "occurrence" agreement in the research literature) is most useful when target responses occur at low rates. In the sample data in Figure 2, the second, third, and fourth intervals are ignored for computational purposes because neither observer scored a response in them. The IOA statistic is therefore calculated from the first, fifth, sixth, and seventh intervals only. Because the observers agreed in just half of these intervals (the fifth and sixth), the agreement value is 50% (2/4).

We examined the effects of several manipulations of response rate on the calculation of total, interval, exact-agreement, and proportional reliability indices. Trained observers recorded computer-generated data that appeared on a computer screen. In Study 1, target responses occurred at low, moderate, and high rates in separate sessions, allowing reliability outcomes based on all four calculations to be compared across a range of values. Total reliability was uniformly high; interval reliability was spuriously high for high-rate responding; proportional reliability was somewhat lower for high-rate responding; and exact-agreement reliability was the lowest of the measures, especially for high-rate responding.
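The scored-interval calculation from the Figure 2 example (agreement in 2 of the 4 scored intervals) can be sketched as a small variation on the interval-by-interval computation; the function name and Boolean records are illustrative, not from the study.

```python
def scored_interval_ioa(record_a, record_b):
    """Scored-interval IOA: consider only intervals in which at least
    one observer scored a response, then divide agreements by the
    number of those scored intervals."""
    scored = [(a, b) for a, b in zip(record_a, record_b) if a or b]
    if not scored:
        return 100.0  # no scored intervals, so nothing to disagree about
    agreements = sum(1 for a, b in scored if a == b)
    return 100.0 * agreements / len(scored)

# Same seven-interval Figure 2 pattern: intervals 2-4 (neither observer
# scored) are dropped, leaving agreement in 2 of the 4 remaining intervals.
obs_a = [True,  False, False, False, True, True, True]
obs_b = [False, False, False, False, True, True, False]
print(scored_interval_ioa(obs_a, obs_b))  # -> 50.0
```

Dropping the mutually unscored intervals is what pushes the value from 71.4% (interval-by-interval) down to 50%, which is why this index is the stricter choice for low-rate behavior.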
In Study 2, we examined the separate effects of response rate per se, response bursting, and end-of-interval responding. Response rate and bursting had little impact on reliability scores; however, distributing some responses at the ends of intervals reduced interval reliability somewhat and reduced proportional and exact-agreement reliability substantially. Repp, Deitz, et al. (1976) compared three methods of calculating interobserver reliability for five responses recorded by two observers: total (described as "whole session"), interval (described as "category"), and exact agreement.
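For reference, the two count-based indices compared throughout alongside exact agreement, total (whole-session) reliability and proportional reliability, can be sketched as follows. This is a minimal sketch using the standard smaller-count-divided-by-larger-count formulations; the function names and sample counts are hypothetical, and treating intervals where both observers record zero as full agreement is an assumption of this sketch.

```python
def total_count_ioa(counts_a, counts_b):
    """Total count IOA: the smaller session total divided by the larger,
    expressed as a percentage."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    if max(total_a, total_b) == 0:
        return 100.0  # both observers recorded nothing (assumed convention)
    return 100.0 * min(total_a, total_b) / max(total_a, total_b)

def proportional_ioa(counts_a, counts_b):
    """Proportional (mean count-per-interval) IOA: average, across
    intervals, the smaller count divided by the larger count."""
    ratios = []
    for a, b in zip(counts_a, counts_b):
        if max(a, b) == 0:
            ratios.append(1.0)  # exact agreement on zero (assumed convention)
        else:
            ratios.append(min(a, b) / max(a, b))
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical per-interval counts from two observers
obs_a = [2, 0, 3, 1]
obs_b = [2, 1, 2, 1]
print(round(total_count_ioa(obs_a, obs_b), 1))   # -> 100.0 (totals 6 and 6)
print(round(proportional_ioa(obs_a, obs_b), 1))  # -> 66.7
```

The example shows why total count reliability can be uniformly high even when observers disagree interval by interval: the per-interval discrepancies cancel in the session totals, while the proportional index registers them.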