### COMMENSURABILITY

Assuming that we have obtained valid A-ROI estimates for multiple investments, there remain issues to be considered and addressed when using those results for investment comparisons and decisions. The issues center on the extent to which those different A-ROI estimates can be compared to gauge relative cost-effectiveness and, if the answer is yes, how to compare them properly.

First, there is the issue of different outcome measures. For example, suppose A-ROI is calculated for three investments that are intended to increase reading achievement, reduce suspension, and improve sense of belonging, respectively. Comparing the results directly is clearly inappropriate because success is measured by different metrics.

One way to deal with this problem is to convert program impact into effect size to standardize the results by standard deviation (*SD*). With this approach, the appropriateness of comparing the three A-ROI values is based on the premise of one *SD* change being equivalent for all of the three different outcome measures. In other words, one *SD* increase in reading achievement is equivalent to one *SD* improvement in suspension reduction and sense of belonging, or 1 *SD* increase in reading achievement is better than 0.9 *SD* improvement in suspension reduction and sense of belonging.

The second way to address this is to convert program impact into growth in percentage so that A-ROI results are standardized by change in reference to the baseline data. In this way, A-ROI results are represented by percentage of growth at a certain cost. With the second approach, cost-effectiveness can be compared between the three investments because 1% change is considered to be equivalent across all three outcome measures. In other words, 1% growth in reading achievement is equivalent to 1% improvement in suspension reduction and sense of belonging, or 2% increase in reading achievement is better than 1% improvement in suspension reduction and sense of belonging.
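The two standardizations described above can be sketched in a few lines of Python. This is only an illustration, assuming A-ROI is expressed as standardized improvement per $1,000 spent per pupil. The `Outcome` class, the function names, and every program figure below are hypothetical and do not come from a real evaluation.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical program results; every number used below is illustrative."""
    name: str
    baseline: float          # outcome level before the investment
    post: float              # outcome level after the investment
    sd: float                # standard deviation of the outcome measure
    cost_per_pupil: float    # dollars spent per student
    higher_is_better: bool = True


def improvement(o: Outcome) -> float:
    """Raw change, signed so that a positive value always means improvement."""
    change = o.post - o.baseline
    return change if o.higher_is_better else -change


def aroi_effect_size(o: Outcome) -> float:
    """Improvement in SD units per $1,000 spent per pupil."""
    return (improvement(o) / o.sd) / (o.cost_per_pupil / 1000)


def aroi_percent_growth(o: Outcome) -> float:
    """Improvement as a percentage of baseline per $1,000 spent per pupil."""
    return (100 * improvement(o) / o.baseline) / (o.cost_per_pupil / 1000)


programs = [
    Outcome("reading score", baseline=200.0, post=206.0, sd=20.0, cost_per_pupil=500.0),
    Outcome("suspensions per 100 students", baseline=12.0, post=9.0, sd=10.0,
            cost_per_pupil=300.0, higher_is_better=False),
    Outcome("belonging survey (1-5)", baseline=3.2, post=3.4, sd=0.8, cost_per_pupil=200.0),
]

for p in programs:
    print(f"{p.name}: {aroi_effect_size(p):.2f} SD, "
          f"{aroi_percent_growth(p):.1f}% per $1,000 per pupil")
```

With these made-up numbers, the effect-size scale ranks the belonging investment first, while the percentage-growth scale ranks the suspension investment first. The choice of scale is therefore itself a substantive assumption rather than a neutral convenience.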

It should be noted that both methods presented here ignore baseline differences between programs, which are likely to affect program effect. That is, for two programs that serve students of different reading levels (e.g., one serves students in the bottom 20th percentile and the other serves students in the bottom 5th percentile), one *SD* or 1% improvement in reading achievement would be considered equivalent using either approach just discussed. In our experience, obtaining such an improvement for students in the 5th percentile is probably qualitatively different from achieving the same for students in the 20th percentile. However, little research exists in this area to guide appropriate adjustments for taking baseline differences into consideration.

Building on the aforementioned two methods, the third way is to take the cost-utility approach by assigning different weights to the standardized A-ROI results (either in *SD* or percentage of growth). In this way, 1 *SD* or 1% change is not considered equivalent across different outcome measures. Rather, 1 *SD* or 1% increase in reading achievement is equivalent to *x SD* or *x*% change in suspension reduction and *y SD* or *y*% improvement in sense of belonging, with *x* and *y* being the weights for the two latter outcome measures, respectively.
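As a rough sketch of this weighting step, the snippet below treats *x* and *y* as exchange rates against reading achievement and converts every effect into reading-equivalent *SD* before dividing by cost. The weight values, the investment figures, and the function name are all assumptions made for illustration. In practice the weights would come from stakeholder judgment, not from this code.

```python
# Exchange rates relative to reading: 1 SD of reading is treated as worth
# x SD of suspension reduction and y SD of sense of belonging.
# The values of x and y are hypothetical value judgments, not empirical estimates.
EQUIVALENT_TO_ONE_READING_SD = {
    "reading": 1.0,
    "suspension": 2.0,   # x (assumed)
    "belonging": 4.0,    # y (assumed)
}


def reading_equivalent_aroi(outcome: str, effect_sd: float, cost_per_pupil: float) -> float:
    """Reading-equivalent SD gained per $1,000 spent per pupil."""
    reading_equivalent_effect = effect_sd / EQUIVALENT_TO_ONE_READING_SD[outcome]
    return reading_equivalent_effect / (cost_per_pupil / 1000)


# Hypothetical investments: (outcome, effect in SD, cost per pupil in dollars).
investments = [
    ("reading", 0.3, 500.0),
    ("suspension", 0.4, 400.0),
    ("belonging", 0.6, 300.0),
]

for outcome, effect_sd, cost in investments:
    value = reading_equivalent_aroi(outcome, effect_sd, cost)
    print(f"{outcome}: {value:.2f} reading-equivalent SD per $1,000 per pupil")
```

Changing *x* or *y* changes the ranking, which makes the value-driven nature of the comparison visible rather than hidden.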

In essence, all three methods involve constructing a new scale and projecting the raw A-ROI values onto the new scale through some sort of linear transformation, with each approach making a different assumption about the appropriateness of the transformation that would render commensurate results. These assumptions are necessary and largely value-driven, since there is no empirical basis for addressing commensurability when different outcome measures are involved. Because of this, in certain fields such as public health, it has been suggested that “ROI should only be used for equivalent alternatives and not to compare interventions that are different in their objectives” (Brousselle, Benmarhnia, & Benhadj, 2016).

To further complicate the problem, investments often target more than one outcome for improvement. There are statistical methods that reduce dimensionality or allow comparisons with multiple outcome variables, which basically involve more sophisticated transformations. However, they are often too complex to carry out and produce results that are difficult to interpret. A more practical approach is to employ the cost-utility method, which assigns a weight to each measure of effectiveness and combines the measures into a single estimate of utility (Levin & McEwan, 2000). Based on those weights, a composite A-ROI can then be derived for each investment. With that said, if possible, it would be desirable to compare A-ROI results for investments that target the same rather than different outcomes.
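One simple way to picture a composite A-ROI is a weighted average of standardized effects divided by cost. The sketch below assumes that form; it is not necessarily the exact procedure in Levin and McEwan (2000). The weights, the two investments, and the function name are hypothetical.

```python
# Hypothetical weights reflecting how much a district values each outcome.
WEIGHTS = {"reading": 0.5, "suspension": 0.3, "belonging": 0.2}


def composite_aroi(effects_sd: dict, cost_per_pupil: float, weights: dict = WEIGHTS) -> float:
    """Weighted-average effect (in SD) per $1,000 spent per pupil.

    Outcomes an investment does not target are treated as a zero effect.
    """
    unknown = set(effects_sd) - set(weights)
    if unknown:
        raise ValueError(f"no weight defined for: {sorted(unknown)}")
    utility = sum(w * effects_sd.get(name, 0.0) for name, w in weights.items())
    utility /= sum(weights.values())
    return utility / (cost_per_pupil / 1000)


# Two hypothetical investments, each targeting more than one outcome.
tutoring = composite_aroi({"reading": 0.4, "belonging": 0.1}, cost_per_pupil=800.0)
mentoring = composite_aroi({"suspension": 0.3, "belonging": 0.3}, cost_per_pupil=500.0)
print(f"tutoring: {tutoring:.3f}, mentoring: {mentoring:.3f}")
```

Treating untargeted outcomes as zero effect is itself a design choice. A district could instead restrict the comparison to the outcomes both investments share.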

The second issue around commensurability of A-ROI results concerns linear extrapolation. When comparing two investments with different costs and returns, we are often implicitly engaged in a linear extrapolation, either upward or downward, depending on where the reference point is set. Figure 1 shows one example of such linear extrapolations involving comparing A-ROI results for two separate investments. In the chart, investment A, represented in a diamond shape, produces 0.3 *SD* growth in reading achievement for 1,000 students at a cost of $500,000 ($500 per pupil), and investment B, represented in a square shape, produces 0.5 *SD* growth in reading achievement for 250 students at a cost of $250,000 ($1,000 per pupil).

When comparing the two A-ROI results and concluding that, when cost is the same, investment A has a higher return and is thus more cost-effective, we are either extrapolating the A-ROI result of investment A upward along the solid blue line or extrapolating the A-ROI result of investment B downward along the solid green line, assuming that the relationship between cost and return remains unchanged in each case and return reduces to zero when there is no investment.
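The implicit extrapolation can be made explicit with a short calculation. The sketch below uses the figures for investments A and B given above and assumes, as the comparison does, a straight line through the origin. The function name is ours.

```python
def linear_extrapolate(effect_sd: float, cost_per_pupil: float, target_cost: float) -> float:
    """Project the effect to a new cost per pupil along a line through the origin."""
    return effect_sd / cost_per_pupil * target_cost


a_effect, a_cost = 0.3, 500.0     # investment A: 0.3 SD at $500 per pupil
b_effect, b_cost = 0.5, 1000.0    # investment B: 0.5 SD at $1,000 per pupil

a_at_b_cost = linear_extrapolate(a_effect, a_cost, b_cost)  # A moved up to $1,000
b_at_a_cost = linear_extrapolate(b_effect, b_cost, a_cost)  # B moved down to $500

print(f"A at $1,000 per pupil: {a_at_b_cost:.2f} SD vs. B's observed {b_effect:.2f} SD")
print(f"B at $500 per pupil: {b_at_a_cost:.2f} SD vs. A's observed {a_effect:.2f} SD")
```

Under the linear assumption, A would yield 0.6 *SD* at B's cost level and B would yield 0.25 *SD* at A's cost level, so A looks more cost-effective from either reference point.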

However, the relationship between cost and return is most likely not linear for most investments. With economies of scale, it is possible for an investment to cover more students without incurring cost proportionally, which, as a result, reduces its cost per pupil level. For example, investment B might be able to cover the same number of students as investment A does by only doubling the total cost, as opposed to quadrupling it. At the same time, the return probably will not be reduced by half as a result of the scaling up. Consequently, the actual A-ROI for investment B might follow the dotted green line as more students are covered, which leads to investment B being more cost-effective when it is at the same cost per pupil level as investment A.

On the flip side, we could increase the cost per pupil level to boost return. For example, as an incentive program, investment A offers a $5,000 annual bonus for high-quality teachers to teach special education classes with an average of 10 high-need students at low-performing schools. We could double the incentive to $10,000 to attract more high-quality teachers to boost student achievement in those schools. However, we probably would not expect the return to be doubled as the result of the increased cost per pupil level. Consequently, the actual A-ROI for investment A might follow the dotted blue line as the bonus amount increases, which, again, leads to investment A being less cost-effective when it is at the same cost per pupil level of investment B.
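To see how the two non-linear scenarios above can reverse the linear conclusion, the sketch below recomputes *SD* per $1,000 per pupil after each adjustment. The adjusted effects (0.4 *SD* for the scaled-up investment B and 0.45 *SD* for the intensified investment A) are pure assumptions chosen for illustration, since the actual curves are unknown.

```python
def sd_per_1000_dollars(effect_sd: float, total_cost: float, students: int) -> float:
    """Effect in SD per $1,000 spent per pupil."""
    cost_per_pupil = total_cost / students
    return effect_sd / (cost_per_pupil / 1000)


# Observed results from the example.
a_observed = sd_per_1000_dollars(0.3, 500_000, 1_000)   # $500 per pupil
b_observed = sd_per_1000_dollars(0.5, 250_000, 250)     # $1,000 per pupil

# Scenario 1: B serves 1,000 students by doubling (not quadrupling) total cost.
# The 0.4 SD effect is ASSUMED: lower than 0.5 SD but not cut in half.
b_scaled_up = sd_per_1000_dollars(0.4, 500_000, 1_000)  # $500 per pupil

# Scenario 2: A doubles its bonus, so cost per pupil rises to $1,000.
# The 0.45 SD effect is ASSUMED: higher than 0.3 SD but not doubled.
a_intensified = sd_per_1000_dollars(0.45, 1_000_000, 1_000)

print(f"At $500 per pupil:   A = {a_observed:.2f}, scaled-up B = {b_scaled_up:.2f}")
print(f"At $1,000 per pupil: intensified A = {a_intensified:.2f}, B = {b_observed:.2f}")
```

With these assumed effects, B comes out ahead at either cost level, the opposite of what the linear extrapolation suggests.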

The above discussion is based on two investments only. Things could quickly become complex when more investments are involved. The challenge here is to decide where the reference point is and figure out how cost per pupil changes when an investment is scaled up or down toward the reference point as well as the subsequent movement of return. Unfortunately, research in this area is rather thin and does not provide much guidance on how adjustments should be made to cost per pupil and return, respectively, when comparing A-ROI results calculated from investments of different scales.

Third, as shown in Figure 1 in this post discussing validity of A-ROI, program effect could vary in the first few years, and comparing A-ROI for programs at different implementation phases could lead to misleading or even wrong conclusions. For example, assuming investments B and C in Figure 1 started at the same cost per pupil level that remained unchanged in subsequent years, comparing investment B’s year 1 result (full effect not yet developed) and investment C’s year 2 result (full effect realized) would lead to the conclusion that investment C is more cost-effective, when the opposite is true after both investments become established with stabilized program effects.

Ideally, cost-effectiveness should be compared between investments with stabilized returns and costs. In reality, however, it is difficult to know for sure when an investment’s A-ROI becomes stable with the variation being random fluctuation. Even when stabilized A-ROI results are attainable, there could be a variety of reasons (e.g., political pressure) for cost-effectiveness comparison between programs at different phases of implementation. It is important to help decision makers be aware that some of the A-ROI results might still be in flux.

Fourth, investment decisions often involve comparing A-ROI results for investments from the same context or similar contexts. For example, a school district might need to decide, between two programs, which one to retain and which one to cut due to a budget shortfall, or the district might be weighing whether to replace its own elementary reading intervention program with the one implemented in a neighboring school district. At other times, decision making requires comparing A-ROI results between investments from rather different contexts, such as one investment implemented in a small rural school district and the other implemented in a large school system that serves students from rural, suburban, and urban areas.

It is important to point out that, when comparing A-ROI between investments from different contexts, considerations need to be given to not just adjustments relating to cost of living and labor market, but also how change in context could affect implementation and result in a different return. In other words, a 0.5 *SD* growth in reading among low achieving students in a rural school district is probably not directly comparable to the same growth among low achieving students in a suburban school district.

The last issue around commensurability of A-ROI results deals with how to interpret investments with similar or even identical A-ROI values and their implications for decision making. One major motivation for A-ROI is to reduce the complexity of decision making by encapsulating multiple pieces of information including program effect and cost into a single data point. While achieving this goal, inevitably, some nuanced but important information, which could be critical sometimes, is masked in this simplified representation.

One such example involves comparison between two investments, with one producing 0.1 *SD* growth in reading at a cost of $100 per student and the other producing 0.8 *SD* growth in reading at a cost of $800 per student. Computationally, these two investments have an identical A-ROI. Assuming both investments serve 500 at-risk students who share similar demographics and academic performance[1], practically, the $350,000 difference ($50,000 vs. $400,000) in cost could have a quite different implication than the 0.7 *SD* difference in return, especially for financially strapped school districts. In this case, A-ROI basically loses its unidimensional discriminating power and it is helpful to reverse the encapsulation by projecting A-ROI results back onto some of the original dimensions. One strategy is to present the A-ROI results on the cost and return dimensions at the same time. Based on certain criteria, a district might group a portfolio of investments into nine categories shown in Table 1. While cells of the same background color have similar and potentially identical A-ROI results, this presentation makes the difference in cost and return explicit respectively for those investments.
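A minimal sketch of this kind of presentation follows, assuming the nine categories are formed by crossing low, medium, and high bands of cost and return. The cut points are hypothetical; a district would set its own criteria.

```python
def band(value: float, low_cut: float, high_cut: float) -> str:
    """Label a value as low, medium, or high using two cut points."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"


def describe(effect_sd: float, cost_per_pupil: float, students: int) -> dict:
    """Report A-ROI alongside the cost and return dimensions it is built from."""
    return {
        "aroi_sd_per_1000": effect_sd / (cost_per_pupil / 1000),
        "return": band(effect_sd, low_cut=0.2, high_cut=0.5),         # assumed cut points
        "cost": band(cost_per_pupil, low_cut=250.0, high_cut=750.0),  # assumed cut points
        "total_cost": cost_per_pupil * students,
    }


# The two investments from the example, each serving 500 students.
first = describe(effect_sd=0.1, cost_per_pupil=100.0, students=500)
second = describe(effect_sd=0.8, cost_per_pupil=800.0, students=500)

for label, row in (("first", first), ("second", second)):
    print(label, row)
```

Both investments share the same ratio, yet the first lands in the low-return, low-cost cell and the second in the high-return, high-cost cell, with total costs of $50,000 and $400,000.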

Sometimes, we might need to go back further and reintroduce even more dimensions. For example, investments A and B are both high return and high cost with similar A-ROI values for improving reading achievement. However, they differ in the students they intend to serve, with investment A targeting Tier 3 students and investment B targeting Tier 2 students. In this case, it might help to further break down the information from which the A-ROI results are derived.

Table 2[2] provides an example of unpacking A-ROI along five dimensions for two investments that both produce a 0.5 *SD* increase in reading at a cost of $1,250 per pupil. Despite the identical A-ROI value, it shows the two investments differ in total cost, number of students served, and the particular group of students they serve. At this point, A-ROI provides little value for comparing cost-effectiveness of the two investments. Leaders will have to base their decision on something else, such as whether Tier 2 or Tier 3 students should be the focus, if a choice has to be made between the two.

It can be argued that the information shown in Table 2 should be presented and used in all investment decisions, even for those involving investments with rather different A-ROI values. This is because a case can be made in some situations that it is more important to focus on a particular group of students even when the investment does not have the highest A-ROI. Although important, A-ROI is only one piece of information leaders consider when making budget decisions, which are always multifaceted.

In this post, we focus on issues around comparing A-ROI results for cost-effectiveness after they become available to decision makers. The intention is not to discourage people from using A-ROI for that purpose because of these complex issues. Rather, the goal is to: 1) propose solutions when A-ROI results cannot be easily compared (e.g., different outcome measures, multiple measures, identical A-ROI values) and 2) make explicit the assumptions embedded in comparisons when A-ROI results can be compared, so that proper adjustments can be made (e.g., two investments are on very different implementation scales, but we have some understanding of how program effect and cost per pupil level for one investment might move when it is scaled up or down to the reference point) and inappropriate comparisons can be avoided (e.g., two programs are on very different implementation scales, but we know little about how program effect and cost per pupil level might change for either investment when it is scaled up or down to the reference point).

### NOTE

[1] In this hypothetical scenario, we avoid the problem of making the linear extrapolation assumption discussed earlier for comparison.

[2] In this example, investment B serves as the reference and it is assumed that investment A can reduce its cost per pupil level to $1,250 when it is scaled up to serve 400 or so students, at the price of the program effect slipping from 0.6 *SD* to 0.5 *SD*.