Metrics Selection Framework

Zeeshan Amjad
Jan 5, 2024

“Tell me how you measure me, and I will tell you how I will behave. If you measure me in an illogical way… do not complain about illogical behavior” (Goldratt, 1990).

Effective measurement is essential for understanding the current state of affairs, but metrics must be chosen carefully to avoid unintended consequences. A few principles should guide that choice before any final decision is made. First, two related terms are often confused: ‘metric’ and ‘measurement.’ A measurement is the actual value of a variable, such as the current room temperature, a home’s sale price, a test score, or the number of visits. A metric is a way to interpret or apply those measurements, such as return on investment (ROI), user engagement, sales revenue, or process efficiency. The principles below apply whenever we create a metric or take a measurement.
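To make the distinction concrete, here is a minimal Python sketch using ROI, one of the metrics named above. The two dollar figures are hypothetical measurements, and `roi` is an illustrative helper that applies the common definition (gain − cost) / cost; it is not a formula prescribed by this framework.

```python
# Measurements: raw observed values (hypothetical figures for illustration)
investment_cost = 50_000.0   # dollars spent on a project
investment_gain = 65_000.0   # dollars the project returned


def roi(gain: float, cost: float) -> float:
    """Metric: interpret two measurements as return on investment, (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# The metric turns raw numbers into something we can act on, e.g. comparing projects
print(f"ROI: {roi(investment_gain, investment_cost):.0%}")  # ROI: 30%
```

Neither dollar figure says much on its own; the metric combines them into a value that supports a decision.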

Choosing what to measure hinges on three considerations, which we will call the PDC criteria, after Purpose, Decision, and Cost.

Purpose-Precondition: Before selecting any metric or measurement, first understand the core problem we are trying to address. Without a clear objective, measurement efforts risk quantifying mere noise and may turn into a vanity metric. An airline passenger mainly wants to know how long it will take to reach the destination; fuel quantity, the course deviation indicator, and the rate of descent are very useful to the pilot but are just noise to the passengers.

Decision-Relevance: A measurement is worth taking only if it can inform a decision; if it plays no substantive role in shaping decisions, it is probably superfluous. Metrics such as lines of code, the ratio of bugs to lines of code, and the number of classes, files, or commits often fall into this category.

Cost-Benefit Evaluation: Measuring consumes resources and effort, so that expense must be weighed against the economic benefit the measurement yields. As a rule of thumb, if the cost of measuring exceeds 20% of the anticipated economic gain, reconsider whether the measurement is worth taking (Hubbard, 2014).
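The three criteria can be expressed as a simple screening function. This is a minimal Python sketch under stated assumptions: `CandidateMeasure`, `pdc_check`, and the example figures are hypothetical, and each criterion is reduced to a yes/no check, which is cruder than the judgment the criteria actually call for. The 20% threshold is the rule of thumb quoted above.

```python
from dataclasses import dataclass, field

# Rule of thumb as stated in the article: reconsider a measurement whose cost
# exceeds 20% of the anticipated economic gain.
MAX_COST_TO_BENEFIT = 0.20


@dataclass
class CandidateMeasure:
    name: str
    purpose: str                                                 # problem it helps address ("" if none)
    decisions_informed: list[str] = field(default_factory=list)  # decisions its value could change
    measurement_cost: float = 0.0                                # cost to collect and maintain the data
    expected_benefit: float = 0.0                                # anticipated economic gain


def pdc_check(m: CandidateMeasure) -> list[str]:
    """Return the PDC criteria the candidate fails; an empty list means it passes."""
    failures = []
    if not m.purpose.strip():
        failures.append("Purpose: no clear problem to address")
    if not m.decisions_informed:
        failures.append("Decision: does not inform any decision")
    if m.expected_benefit <= 0 or m.measurement_cost > MAX_COST_TO_BENEFIT * m.expected_benefit:
        failures.append("Cost: cost exceeds 20% of expected benefit")
    return failures


# Hypothetical candidates for illustration
lines_of_code = CandidateMeasure("lines of code", purpose="", measurement_cost=500, expected_benefit=1_000)
lead_time = CandidateMeasure(
    "lead time for changes",
    purpose="shorten time from idea to customer",
    decisions_informed=["where to invest in build automation"],
    measurement_cost=4_000,
    expected_benefit=40_000,
)

for candidate in (lines_of_code, lead_time):
    print(candidate.name, pdc_check(candidate) or "passes PDC")
```

Returning the list of failed criteria, rather than a bare boolean, keeps the reason for rejecting a metric visible to whoever reviews it.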

The PDC criteria are a useful first filter, but on their own they are not enough to decide whether a metric is suitable. Five concepts from management, social science, and economics each offer a distinct perspective for evaluating metrics. Together, we may call them the ‘five measurement lenses.’

Let’s explore all of these one by one.

McNamara fallacy: When assembling a basketball team, what should you consider? Beyond the quantitative statistics, you also need to account for intangible qualities that are hard to assess conventionally, such as teamwork, attitude, and defensive ability. The McNamara fallacy names the error of relying exclusively on quantifiable data and overlooking crucial but less measurable factors. It is therefore prudent not to rely solely on what can be quantified when pursuing an objective.

Hawthorne effect: At the Hawthorne Works factory near Chicago, researchers observed that workers altered their behavior in response to being observed (Introduction — The Human Relations Movement — Baker Library | Bloomberg Center, Historical Collections). The same principle applies to metrics in many contexts: the act of measuring can change the behavior of the people being measured. For instance, when an organization tracks and rewards individuals by the sheer number of their commits, people respond by committing more often, but the extra commits do not necessarily add commensurate value. In such cases, the emphasis shifts toward quantity at the expense of quality.
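A toy model makes the commit-count example concrete. In this minimal Python sketch, `Change`, its `value` scores, and the threefold split are invented for illustration: the same three pieces of work are counted before and after commits are rewarded, and only the measured number moves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    value: int    # hypothetical value the change delivers to users
    commits: int  # number of commits the change was split into


def commit_count(changes: list[Change]) -> int:
    """The measured quantity: total commits."""
    return sum(c.commits for c in changes)


def delivered_value(changes: list[Change]) -> int:
    """What we actually care about: total value delivered."""
    return sum(c.value for c in changes)


# Before commits are rewarded: three changes, committed naturally
before = [Change(value=5, commits=1), Change(value=3, commits=2), Change(value=8, commits=1)]

# After commits are rewarded: identical work, each change split into three times as many commits
after = [Change(value=c.value, commits=c.commits * 3) for c in before]

print(commit_count(before), delivered_value(before))  # 4 16
print(commit_count(after), delivered_value(after))    # 12 16
```

The commit metric triples while delivered value stays flat, which is the shift toward quantity over quality described above.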

Goodhart’s law: The economist Charles Goodhart stated that “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes” (Goodhart, 1984). Marilyn Strathern later popularized a simpler phrasing: “When a measure becomes a target, it ceases to be a good measure” (Strathern, 1997). The law is related to the Hawthorne effect, since both describe people changing their behavior in response to measurement. McGreal and Jocham (2018) put it another way: “If you punish bad news, you will only get good news — or, more accurately, camouflaged bad news made to look good.”

Cobra effect: The cobra effect is an unintended adverse consequence of an incentive designed to improve societal welfare or individual well-being. The term, coined by Horst Siebert, comes from an account of cobra control in Delhi. To reduce the number of cobras, the British government offered a reward for every dead cobra, and the population initially fell. Then an unanticipated outcome emerged: people began breeding cobras to collect the reward. In response, the government ended the program. The breeders, whose cobras were now worthless, released them, and the cobra population surged. Although the policy began with good intentions, the result was worse than the original problem.

Campbell’s law: Campbell’s law, which is closely related to the cobra effect, states: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (Campbell, 1979). Put simply, when a single measure drives a major decision, people focus too heavily on it and change their behavior in ways that produce unintended consequences. Examples include relying too heavily on a student’s GPA, an Agile team’s velocity, or a business’s sales target.

Consider driving a vehicle. In one case, you see a speed limit sign ahead; in another, you have ignored the sign and notice a police car trailing behind you. This contrast highlights the distinction between leading and lagging indicators. The speed limit sign is a leading indicator: it signals a potential future event and allows you to take preemptive action. The police car is a lagging indicator: it shows that an event has already occurred, leaving little room for proactive measures.

Once a metric has passed the PDC criteria and been examined through the ‘five measurement lenses,’ the next step is to determine whether it operates as a leading or lagging indicator in the context of our analysis. Here is a visual representation of the Metrics and Measure Selection Framework.
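The selection flow described in this article (PDC criteria, then the five measurement lenses, then leading versus lagging) can be sketched as a small pipeline. This is a minimal Python sketch, not the author's tooling: `Candidate`, `select_metric`, the lens questions, and the example metric are all hypothetical. Because the article presents the lenses as perspectives rather than pass/fail rules, the sketch reports flagged lenses as open concerns instead of rejecting the metric.

```python
from dataclasses import dataclass, field
from enum import Enum


class Indicator(Enum):
    LEADING = "leading"  # signals a likely future outcome, leaving room to act
    LAGGING = "lagging"  # confirms an outcome that has already happened


# One illustrative review question per measurement lens (wording is hypothetical)
LENSES = {
    "McNamara fallacy": "Are important but hard-to-quantify factors being ignored?",
    "Hawthorne effect": "Will being observed change behavior in unhelpful ways?",
    "Goodhart's law": "Will the measure stop being informative once it becomes a target?",
    "Cobra effect": "Could rewarding it create incentives that worsen the problem?",
    "Campbell's law": "Is it the sole basis for a major decision?",
}


@dataclass
class Candidate:
    name: str
    has_purpose: bool                # Purpose-Precondition
    informs_decision: bool           # Decision-Relevance
    cost: float                      # Cost-Benefit: cost of measuring
    expected_benefit: float          # Cost-Benefit: anticipated economic gain
    available_before_outcome: bool   # True if the value arrives before the outcome it tracks
    flagged_lenses: set[str] = field(default_factory=set)  # lenses raised during review


def select_metric(c: Candidate) -> dict:
    # Step 1: PDC criteria as an initial filter (20% rule of thumb from the article)
    within_budget = c.expected_benefit > 0 and c.cost <= 0.20 * c.expected_benefit
    if not (c.has_purpose and c.informs_decision and within_budget):
        return {"metric": c.name, "accepted": False, "stopped_at": "PDC criteria"}

    # Step 2: review through the five lenses; flagged lenses become open concerns
    unknown = c.flagged_lenses - set(LENSES)
    if unknown:
        raise ValueError(f"unknown lenses: {sorted(unknown)}")
    concerns = {lens: LENSES[lens] for lens in sorted(c.flagged_lenses)}

    # Step 3: classify as a leading or lagging indicator
    kind = Indicator.LEADING if c.available_before_outcome else Indicator.LAGGING
    return {"metric": c.name, "accepted": True, "indicator": kind.value, "open_concerns": concerns}


failed_builds = Candidate(
    name="failed builds per week",
    has_purpose=True,
    informs_decision=True,
    cost=1_000,
    expected_benefit=20_000,
    available_before_outcome=True,  # hints at release quality before customers see defects
    flagged_lenses={"Goodhart's law"},
)
print(select_metric(failed_builds))
```

Keeping the leading/lagging step last mirrors the order in the text: classification only matters for metrics that survived the earlier filters.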

References

Campbell, D. T. (1979). Assessing the impact of planned social change. Evaluation and Program Planning, 2(1), 67–90. https://doi.org/10.1016/0149-7189(79)90048-x

Goldratt, E. M. (1990). The haystack syndrome: Sifting information out of the data ocean. North River Press. https://ci.nii.ac.jp/ncid/BA23507388

Goodhart, C. A. E. (1984). Problems of monetary management: The UK experience. In Monetary theory and practice: The UK experience. Macmillan.

Hubbard, D. W. (2014). How to measure anything: Finding the value of intangibles in business. John Wiley & Sons.

Introduction — The Human Relations Movement — Baker Library | Bloomberg Center, Historical Collections. https://www.library.hbs.edu/hc/hawthorne/intro.html

McGreal, D., & Jocham, R. M. (2018). The professional product owner: Leveraging Scrum as a competitive advantage. Addison-Wesley Professional.

Strathern, M. (1997). ‘Improving ratings’: audit in the British University system. European Review, 5(3), 305–321. https://doi.org/10.1002/(sici)1234-981x(199707)5:3


Zeeshan Amjad is a lifelong learner. He loves reading, writing, traveling, photography, and healthy discussion.