Somewhere, right now, a technology executive tells their directors: “we
need a way to measure the productivity of our engineering teams.” A working
group assembles to explore potential solutions, and weeks later, proposes
implementing these metrics: lead time, deployment frequency, and number of
pull requests created per engineer.
Soon after, senior engineering leaders meet to review their newly created
dashboards. Immediately, questions and doubts are raised. One leader says:
“Our lead time is 2 days, which is ‘low performing’ according to these
benchmarks, but is there actually a problem?” Another leader says: “It’s
unsurprising to see that some of our teams are deploying less often than
others. But I’m not sure if this spells an opportunity for improvement.”
If this story arc is familiar to you, don’t worry: it’s familiar to
most, including some of the biggest tech companies in the world. It’s not uncommon
for measurement programs to fall short when metrics like DORA fail to provide
the insights leaders had hoped for.
There is, however, a better approach: one that focuses on
capturing insights from developers themselves, rather than relying solely on
basic measures of speed and output. We’ve helped many organizations make the
leap to this human-centered approach, and we’ve seen firsthand the
dramatically improved understanding of developer productivity that it
provides.
What we’re referring to here is qualitative measurement. In this
article, we provide a primer on this approach, derived from our experience
helping many organizations on this journey. We begin with a definition of
qualitative metrics and how to advocate for them. We follow with practical
guidance on how to capture, track, and utilize this data.
Today, developer productivity is a critical concern for businesses amid
a backdrop of fiscal tightening and transformational technologies such as
AI. In addition, developer experience and platform engineering are garnering
increased attention as enterprises look beyond Agile and DevOps
transformation. What all these concerns share is a reliance on measurement
to help guide decisions and track progress. And for this, qualitative
measurement is key.
Note: when we say “developer productivity”, we mean the degree to which
developers can do their work in a frictionless manner, not the individual
performance of developers. Some organizations find “developer productivity”
to be a problematic term because of the way it can be misinterpreted by
developers. We recommend that organizations use the term “developer
experience,” which has more positive connotations for developers.
What is a qualitative metric?
We define a qualitative metric as a measurement composed of data
provided by humans. This is a practical definition: we haven’t found a
single definition within the social sciences, and the alternative
definitions we’ve seen have flaws that we discuss later in this
section.
The definition of the word “metric” is unambiguous. The term
“qualitative,” however, has no authoritative definition, as noted in the
2019 journal paper “What is Qualitative in Qualitative Research”:
There are many definitions of qualitative research, but if we look for
a definition that addresses its distinctive feature of being
“qualitative,” the literature across the broad field of social science is
meager. The main reason behind this article lies in the paradox, which, to
put it bluntly, is that researchers act as if they know what it is, but
they cannot formulate a coherent definition.
An alternative definition we’ve heard is that qualitative metrics measure
quality, while quantitative metrics measure quantity. We’ve found this
definition problematic for two reasons. First, the term “qualitative
metric” includes the term metric, which implies that the output is a
quantity (i.e., a measurement). Second, quality is typically measured
through ordinal scales that are translated into numerical values and
scores, which again contradicts the definition.
Another argument we’ve heard is that the output of sentiment analysis
is quantitative because the analysis results in numbers. While we agree
that the data resulting from sentiment analysis is quantitative, based on
our original definition this is still a qualitative metric (i.e., a quantity
produced qualitatively), unless one were to take the position that
“qualitative metric” is altogether an oxymoron.
Aside from the problem of defining what a qualitative metric is, we’ve
also encountered problematic colloquialisms. One example is the term “soft
metric”. We caution against this phrase because it harmfully and
incorrectly implies that data collected from humans is weaker than “hard
metrics” collected from systems. We also discourage the term “subjective
metrics” because it misconstrues the fact that data collected from humans
can be either objective or subjective, as we discuss in the next
section.
Qualitative metrics: Measurements derived from humans
Type | Definition | Example
Attitudinal metrics | Subjective feelings, opinions, or attitudes toward a specific subject. | How satisfied are you with your IDE, on a scale of 1–10?
Behavioral metrics | Objective facts or events pertaining to an individual’s work experience. | How long does it take for you to deploy a change to production?
Later in this article we provide guidance on how to collect and use
these measurements, but first we’ll provide a real-world example of this
approach put into practice.
Peloton is an American technology company
whose developer productivity measurement strategy centers around
qualitative metrics. To collect qualitative metrics, their organization
runs a semi-annual developer experience survey led by their Tech
Enablement & Developer Experience team, which is part of their Product
Operations organization.
Thansha Sadacharam, head of tech learning and insights, explains: “I
very strongly believe, and I think a lot of our engineers also really
appreciate this, that engineers aren’t robots, they’re humans. And just
looking at basic numbers doesn’t drive the whole story. So for us, having
a really comprehensive survey that helped us understand that overall
developer experience was really important.”
Each survey is sent to
a random sample of roughly half of their developers. With this approach,
individual developers only need to participate in one survey per year,
minimizing the overall time spent filling out surveys while still
providing a statistically significant, representative set of results.
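A sampling scheme like this can be sketched in a few lines. The following is a minimal illustration, not Peloton’s actual tooling: it assumes a hypothetical list of developer IDs, shuffles it once per year, and splits it into two halves, one per semi-annual survey, so that each developer is invited exactly once a year. The function name and seed are illustrative assumptions.

```python
import random


def split_into_survey_waves(developer_ids, seed=None):
    """Randomly split developers into two waves for a semi-annual survey.

    Every developer lands in exactly one wave, so each person is asked to
    answer only one of the two surveys in a year.
    """
    rng = random.Random(seed)        # independent RNG; pass a seed to reproduce
    shuffled = list(developer_ids)   # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


if __name__ == "__main__":
    # Hypothetical roster; in practice this would come from an HR or SSO export.
    roster = [f"dev-{i}" for i in range(10)]
    first_wave, second_wave = split_into_survey_waves(roster, seed=2024)
    print("First-half survey:", first_wave)
    print("Second-half survey:", second_wave)
```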
The Tech Enablement & Developer Experience team is also responsible for
analyzing the survey findings and sharing them with leaders across
the organization.
For more on Peloton’s developer experience survey, listen to this interview with Thansha Sadacharam.
Advocating for qualitative metrics
Executives are often skeptical about the reliability or usefulness of
qualitative metrics. Even highly scientific organizations like Google have
had to overcome these biases. Engineering leaders are inclined toward
system metrics since they are accustomed to working with telemetry data
for inspecting systems. However, we cannot rely on this same approach for
measuring people.
Avoid pitting qualitative and quantitative metrics against each other.
We’ve seen some organizations get into an internal “battle of the
metrics,” which is not a good use of time or energy. Our advice for
champions is to avoid framing qualitative and quantitative metrics as an
either/or. It’s better to make the argument that they are
complementary tools, as we cover at the end of this article.
We’ve found that the underlying cause of opposition to qualitative data
is a set of misconceptions, which we address below. Later in this article, we
outline the distinct benefits of self-reported data, such as its ability to
measure intangibles and surface critical context.
Misconception: Qualitative data is only subjective
Traditional workplace surveys typically focus on the subjective
opinions and feelings of employees. Thus many engineering leaders
intuitively believe that surveys can only collect subjective data from
developers.
As we describe in the following section, surveys can also capture
objective information about facts or events. Google’s DevOps Research and
Assessment (DORA) program is an excellent concrete
example.
Some examples of objective survey questions:
- How long does it take to go from code committed to code successfully
running in production?
- How often does your organization deploy code to production or
release it to end users?
Misconception: Qualitative data is unreliable
One challenge of surveys is that people from all manner of backgrounds
write survey questions with no special training. As a result, many
workplace surveys don’t meet the minimum standards needed to produce
reliable or valid measures. Well-designed surveys, however, produce
accurate and reliable data (we provide guidance on how to do this later in
the article).
Some organizations are concerned that people may lie in surveys. This
can happen in situations where there is fear around how the data will be
used. In our experience, when surveys are deployed as a tool to help
understand and improve bottlenecks affecting developers, there is no
incentive for respondents to lie or game the system.
While it’s true that survey data isn’t always 100% accurate, we often
remind leaders that system metrics are often imperfect too. For example,
many organizations attempt to measure CI build times using data aggregated
from their pipelines, only to find that it requires significant effort to
clean the data (e.g., excluding background jobs, accounting for parallel
jobs) to produce an accurate result.
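To illustrate the kind of cleaning involved, here is a minimal sketch that assumes a hypothetical export of CI job records (the `JobRecord` fields, including the `is_background` flag, are assumptions, not any particular CI system’s schema). Naively summing job durations double-counts jobs that ran in parallel and includes background work; the cleaned version drops background jobs and measures each pipeline from its earliest job start to its latest job end.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    pipeline_id: str
    job_name: str
    start: float          # seconds since epoch
    end: float
    is_background: bool   # hypothetical flag, e.g. cache warmers or cleanup jobs


def naive_build_time(jobs):
    """Sum of every job's duration per pipeline (overcounts parallel jobs)."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job.pipeline_id] += job.end - job.start
    return dict(totals)


def cleaned_build_time(jobs):
    """Wall-clock time per pipeline (earliest start to latest end), skipping background jobs."""
    spans = {}
    for job in jobs:
        if job.is_background:
            continue  # not part of the build a developer waits on
        first, last = spans.get(job.pipeline_id, (job.start, job.end))
        spans[job.pipeline_id] = (min(first, job.start), max(last, job.end))
    return {pid: last - first for pid, (first, last) in spans.items()}


if __name__ == "__main__":
    jobs = [
        JobRecord("p1", "compile", 0, 100, False),
        JobRecord("p1", "unit-tests", 100, 400, False),
        JobRecord("p1", "lint", 100, 160, False),        # runs in parallel with tests
        JobRecord("p1", "cache-warm", 0, 900, True),     # background job
    ]
    print(naive_build_time(jobs))    # {'p1': 1360.0}
    print(cleaned_build_time(jobs))  # {'p1': 400}
```

The wall-clock span still includes any queueing gaps between jobs, which is one more judgment call a real cleaning step would have to make.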
The two types of qualitative metrics
There are two key types of qualitative metrics:
- Attitudinal metrics capture subjective feelings, opinions, or
attitudes toward a specific subject. An example of an attitudinal measure would
be the numeric value captured in response to the question: “How satisfied are
you with your IDE, on a scale of 1–10?”
- Behavioral metrics capture objective facts or events pertaining to an
individual’s work experiences. An example of a behavioral measure would be the
quantity captured in response to the question: “How long does it take for you to
deploy a change to production?”
We’ve found that most tech practitioners overlook behavioral measures
when thinking about qualitative metrics. This occurs despite the
prevalence of qualitative behavioral measures in software research, such
as Google’s DORA program mentioned earlier.
DORA publishes annual benchmarks for metrics such as lead time for
changes, deployment frequency, and change fail rate. Unbeknownst to many,
DORA’s benchmarks are captured using qualitative methods with the survey
items shown below:
Lead time
For the primary application or service you work on,
what is your lead time for changes (that is, how long does it take to go
from code committed to code successfully running in production)?
More than six months
One to six months
One week to one month
One day to one week
Less than one day
Less than one hour
Deploy frequency
For the primary application or service you
work on, how often does your organization deploy code to production or
release it to end users?
Fewer than once per six months
Between once per month and once every six months
Between once per week and once per month
Between once per day and once per week
Between once per hour and once per day
On demand (multiple deploys per day)
Change fail percentage
For the primary application or service you work on, what
percentage of changes to production or releases to users result in
degraded service (for example, lead to service impairment or service
outage) and subsequently require remediation (for example, require a
hotfix, rollback, fix forward, patch)?
0–15%
16–30%
31–45%
46–60%
61–75%
76–100%
Time to restore
For the primary application or service you work on, how long
does it generally take to restore service when a service incident or a
defect that impacts users occurs (for example, unplanned outage, service
impairment)?
More than six months
One to six months
One week to one month
One day to one week
Less than one day
Less than one hour
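Because the answer options for behavioral items like these are ordered but unevenly sized time ranges, one reasonable way to summarize them is to report the median bucket rather than averaging option numbers. The sketch below assumes responses are stored as the option text of the lead time item; the list and function names are illustrative assumptions.

```python
from statistics import median_low

# Options of the lead time item, ordered from slowest to fastest.
LEAD_TIME_OPTIONS = [
    "More than six months",
    "One to six months",
    "One week to one month",
    "One day to one week",
    "Less than one day",
    "Less than one hour",
]


def median_bucket(responses, options=LEAD_TIME_OPTIONS):
    """Return the response option at the median rank.

    median_low is used so the result is always an actual option: with an
    even number of responses it picks the lower (slower) of the two middle
    ranks rather than interpolating between buckets.
    """
    rank = {option: i for i, option in enumerate(options)}
    ranks = [rank[r] for r in responses]  # KeyError flags an unexpected answer
    return options[median_low(ranks)]


if __name__ == "__main__":
    answers = [
        "One day to one week",
        "Less than one day",
        "One day to one week",
        "One week to one month",
    ]
    print(median_bucket(answers))  # One day to one week
```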
We’ve found that the ability to collect attitudinal and behavioral data
at the same time is a powerful benefit of qualitative measurement.
For example, behavioral data might show you that your release process
is fast and efficient. But only attitudinal data could tell you whether it
is smooth and painless, which has important implications for developer
burnout and retention.
To use a non-tech analogy: imagine you feel sick and go to a
doctor. The doctor takes your blood pressure, your temperature, and your heart
rate, and says, “Well, it looks like you’re all good. There’s nothing
wrong with you.” You would be taken aback! You’d say, “Wait, I’m telling
you that something feels wrong.”
The benefits of qualitative metrics
One argument for qualitative metrics is that they avoid subjecting
developers to the feeling of “being measured” by management. While we’ve
found this to be true, especially when compared to metrics derived from
developers’ Git or Jira data, it doesn’t address the main objective
benefits that qualitative approaches can provide.
There are three main benefits of qualitative metrics when it comes to
measuring developer productivity:
Qualitative metrics allow you to measure things that are otherwise
unmeasurable
System metrics like lead time and deployment volume capture what’s
happening in our pipelines or ticketing systems. But there are many more
aspects of developers’ work that need to be understood in order to improve
productivity: for example, whether developers are able to stay in the flow
of work or easily navigate their codebases. Qualitative metrics let you
measure these intangibles that are otherwise difficult or impossible to
measure.
An interesting example of this is technical debt. At Google, a study to
identify metrics for technical debt included an analysis of 117 metrics
that were proposed as potential indicators. To the disappointment of
Google researchers, no single metric or combination of metrics was found
to be a valid indicator (for more on how Google measures technical debt,
listen to this interview).
While there may exist an undiscovered objective metric for technical
debt, one can suppose that this may be impossible, because
assessing technical debt relies on comparing the current
state of a system or codebase with its imagined ideal state. In other
words, human judgment is essential.
Qualitative metrics provide missing visibility across teams and
systems
Metrics from ticketing systems and pipelines give us visibility into
some of the work that developers do. But this data alone cannot give us
the full story. Developers do a lot of work that’s not captured in tickets
or builds: for example, designing key features, shaping the direction of a
project, or helping a teammate get onboarded.
It’s impossible to gain visibility into all these activities through
data from our systems alone. And even if we could theoretically collect
all the data through systems, there are additional challenges to capturing
metrics through instrumentation.
One example is the difficulty of normalizing metrics across different
team workflows. For example, if you’re trying to measure how long it takes
for tasks to go from start to completion, you might try to get this data
from your ticketing tool. But individual teams often have different
workflows that make it difficult to produce an accurate metric. In
contrast, simply asking developers how long tasks typically take can be
much simpler.
Another common challenge is cross-system visibility. For example, a
small startup can measure TTR (time to restore) using just an issue
tracker such as Jira. A large organization, however, will likely need to
consolidate and cross-attribute data across planning systems and deployment
pipelines in order to gain end-to-end system visibility. This can be a
yearlong effort, whereas capturing this data from developers can provide a
baseline quickly.
Qualitative metrics provide context for quantitative data
As technologists, it’s easy to focus heavily on quantitative measures.
They seem clean and clear, after all. There is a risk, however, that the
full story isn’t being told without richer data, and that this may lead us
to focus on the wrong thing.
One example of this is code review: a typical optimization is to try to
speed up code review. This seems logical, as waiting for a code review
can cause wasted time or unwanted context switching. We could measure the
time it takes for reviews to be completed and incentivize teams to improve
it. But this approach may encourage negative behavior: reviewers rushing
through reviews, or developers not finding the right experts to perform
reviews.
Code reviews exist for an important purpose: to ensure high-quality
software is delivered. If we do a more holistic analysis, focusing on the
outcomes of the process rather than just speed, we find that optimizing
code review must ensure good code quality, mitigation of security
risks, building shared knowledge across team members, as well as ensuring
that our coworkers aren’t stuck waiting. Qualitative measures can help us
assess whether these outcomes are being met.
Another example is developer onboarding processes. Software development
is a team activity. Thus if we only measure individual output metrics such
as the rate at which new developers are committing or time to first commit, we miss
important outcomes, e.g., whether we are fully utilizing the ideas the
developers are bringing, whether they feel safe to ask questions, and whether
they are collaborating with cross-functional peers.
How to capture qualitative metrics
Many tech practitioners don’t realize how difficult it is to write good
survey questions and design good survey instruments. In fact, there are
whole fields of study related to this, such as psychometrics and
industrial psychology. It is important to bring in or build expertise here
when possible.
Below are a few good rules for writing surveys to avoid the most common
mistakes we see organizations make:
- Survey items should be carefully worded, and every question should ask only
one thing.
- If you want to compare results between surveys, be careful about changing
the wording of questions such that you’re measuring something different.
- If you change any wording, you must do rigorous statistical tests.
In survey parlance, “good surveys” means “valid and reliable” or
“demonstrating good psychometric properties.” Validity is the degree to
which a survey item actually measures the construct you intend to measure.
Reliability is the degree to which a survey item produces consistent
results from your population and over time.
One way of thinking about survey design that we’ve found helpful to
tech practitioners: think of the survey response process as an algorithm
that takes place in the human mind.
When an individual is presented with a survey question, a series of mental
steps take place in order to arrive at a response. The model below is from
the seminal 2012 book, The Psychology of Survey
Response:
Components of the Response Process
Component | Specific Processes
Comprehension | Attend to questions and instructions; represent logical form of question; identify question focus (information sought); link key terms to relevant concepts
Retrieval | Generate retrieval strategy and cues; retrieve specific, generic memories; fill in missing details
Judgment | Assess completeness and relevance of memories; draw inferences based on accessibility; integrate material retrieved; make estimate based on partial retrieval
Response | Map judgment onto response category; edit response
Decomposing the survey response process and inspecting each step
can help us refine our inputs to produce more accurate survey results.
Developing good survey items requires rigorous design, testing, and
analysis, just like the process of designing software!
But good survey design is just one aspect of running successful surveys.
Additional challenges include participation rates, data analysis, and knowing
how to act on data. Below are some of the best practices we’ve
learned.
Segment results by team and persona
A common mistake made by organizational leaders is to focus on companywide
results instead of data broken down by team and persona (e.g., role, tenure,
seniority). As previously described, developer experience is highly contextual
and can differ radically across teams or roles. Focusing only on aggregate
results can lead to overlooking problems that affect small but important
populations within the company, such as mobile developers.
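A minimal sketch of segmenting results, assuming each response is a dictionary with hypothetical `team`, `role`, and `score` (1–5) fields. Grouping by any attribute surfaces segments whose scores an overall average hides. The minimum group size is an illustrative assumption, included so that very small segments are not over-interpreted.

```python
from collections import defaultdict
from statistics import mean


def segment_scores(responses, key, min_group_size=5):
    """Average score and response count per segment (e.g. team or role).

    Segments with fewer than `min_group_size` responses are left out; the
    default of 5 is an illustrative threshold, not a standard.
    """
    groups = defaultdict(list)
    for response in responses:
        groups[response[key]].append(response["score"])
    return {
        segment: (round(mean(scores), 2), len(scores))
        for segment, scores in groups.items()
        if len(scores) >= min_group_size
    }


if __name__ == "__main__":
    # Hypothetical 1-5 scores: a healthy overall average can hide a struggling team.
    responses = (
        [{"team": "backend", "role": "engineer", "score": 4}] * 8
        + [{"team": "mobile", "role": "engineer", "score": 2}] * 5
        + [{"team": "data", "role": "analyst", "score": 5}] * 2
    )
    print("overall:", round(mean(r["score"] for r in responses), 2))
    print("by team:", segment_scores(responses, "team"))
    print("by role:", segment_scores(responses, "role"))
```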
Compare results against benchmarks
Comparative analysis can help contextualize data and help drive action. For
example, developer sentiment toward code quality commonly skews negative, making
it difficult to identify true problems or gauge their magnitude. The more
actionable data point is: “Are our developers more frustrated about code
quality than other teams or organizations?” Teams with lower sentiment scores
than their peers, and organizations with lower scores than their industry peers,
can surface notable opportunities for improvement.
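One simple way to put this into practice is to compute each team’s gap from a reference point. The sketch below assumes mean scores on a common scale and an externally supplied benchmark value; the team names and the benchmark figure in the example are made up for illustration and are not real industry data.

```python
from statistics import median


def compare_to_benchmarks(team_scores, industry_benchmark):
    """Gap between each team's score and (a) its peers, (b) an external benchmark.

    The peer reference for a team is the median of all *other* teams, so a
    team's own score does not pull its comparison point toward itself.
    Negative gaps mark teams scoring below the reference.
    """
    report = {}
    for team, score in team_scores.items():
        peers = [s for t, s in team_scores.items() if t != team]
        peer_median = median(peers) if peers else score  # lone team: no peer gap
        report[team] = {
            "vs_peers": round(score - peer_median, 2),
            "vs_industry": round(score - industry_benchmark, 2),
        }
    return report


if __name__ == "__main__":
    # Hypothetical mean "code quality" sentiment scores and a made-up benchmark.
    scores = {"payments": 2.9, "search": 3.6, "mobile": 3.4, "platform": 3.8}
    for team, gaps in compare_to_benchmarks(scores, industry_benchmark=3.5).items():
        print(team, gaps)
```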
Use transactional surveys where appropriate
Transactional surveys capture feedback during specific touchpoints or
interactions in the developer workflow. For example, platform teams can use
transactional surveys to prompt developers for feedback while they are in the middle of
creating a new service in an internal developer portal. Transactional surveys can
also augment data from periodic surveys by producing higher-frequency feedback and
more granular insights.
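A transactional survey needs a rule for when to prompt. One possible rule, sketched below under assumptions: prompt only on chosen touchpoint events, and only if the developer has not been asked within a cooldown window. The event name `service_created`, the 14-day cooldown, and the in-memory store are hypothetical; a real developer portal would persist this state and show the prompt through its own UI.

```python
from datetime import datetime, timedelta


class TransactionalSurveyGate:
    """Decides whether to show a short survey right after a workflow event."""

    def __init__(self, trigger_events, cooldown=timedelta(days=14)):
        self.trigger_events = set(trigger_events)  # touchpoints that get a prompt
        self.cooldown = cooldown                   # minimum gap between prompts
        self.last_prompted = {}                    # developer id -> last prompt time

    def should_prompt(self, developer_id, event, now=None):
        if now is None:
            now = datetime.now()
        if event not in self.trigger_events:
            return False  # not a touchpoint we collect feedback on
        last = self.last_prompted.get(developer_id)
        if last is not None and now - last < self.cooldown:
            return False  # asked too recently; skip to limit fatigue
        self.last_prompted[developer_id] = now
        return True


if __name__ == "__main__":
    gate = TransactionalSurveyGate(["service_created"])
    start = datetime(2024, 1, 1)
    print(gate.should_prompt("dev-1", "service_created", now=start))                       # True
    print(gate.should_prompt("dev-1", "service_created", now=start + timedelta(days=3)))   # False
    print(gate.should_prompt("dev-1", "service_created", now=start + timedelta(days=15)))  # True
```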
Avoid survey fatigue
Many organizations struggle to sustain high participation rates in surveys
over time. Lack of follow-up can cause developers to feel that
repeatedly responding to surveys is not worthwhile. It is therefore
critical that leaders and teams follow up and take meaningful action after surveys.
While a quarterly or
semi-annual survey cadence is optimal for most organizations, we’ve seen some
organizations succeed with more frequent surveys that are integrated into
regular team rituals such as retrospectives.
Survey Template
Below is a simple set of survey questions for getting started. Load the questions
into your preferred survey tool, or get started quickly by making a copy of our ready-to-go
Google Forms template.
The template is intentionally simple, but surveys often become quite sizable as your measurement
strategy matures. For example, Shopify’s developer survey is 20 minutes
long and Google’s is over 30 minutes long.
After you’ve collected responses, score the multiple-choice questions
using either mean or top box scoring. Mean scores are calculated by
assigning each option a value from 1 to 5 and taking the average.
Top box scores are the percentage of responses that
select one of the two most favorable options.
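Both scoring methods fit in a few lines. This sketch assumes each answer has already been mapped to a value from 1 (least favorable) to 5 (most favorable), so the top two boxes are 4 and 5; the sample answers are hypothetical.

```python
from statistics import mean


def mean_score(values):
    """Average of answers already mapped to 1-5."""
    return round(mean(values), 2)


def top_box_score(values, top_options=(4, 5)):
    """Percentage of answers that fall in the two most favorable options."""
    favorable = sum(1 for v in values if v in top_options)
    return round(100 * favorable / len(values), 1)


if __name__ == "__main__":
    # Hypothetical answers to "How often do you feel highly productive in your work?"
    # mapped as Never=1, A little of the time=2, ..., All of the time=5.
    answers = [5, 4, 4, 3, 2, 4, 5, 3, 1, 4]
    print(mean_score(answers))     # 3.5
    print(top_box_score(answers))  # 60.0
```

Mean scores are sensitive to every answer, while top box scores emphasize how many people are clearly satisfied, so it can be useful to look at both.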
Be sure to review open-text responses, which can contain great
information. If you’ve collected a large number of comments, LLM tools
such as ChatGPT can be useful for extracting core themes and
suggestions. When you’ve finished analyzing results, be sure to share
your findings with respondents so their time filling out the survey
feels worthwhile.
How easy or difficult is it for you to do work as a
developer or technical contributor at [INSERT ORGANIZATION NAME]?
Very difficult
Somewhat difficult
Neither easy nor difficult
Somewhat easy
Very easy
For the primary application or service you work on, what
is your lead time for changes (that is, how long does it take to go
from code committed to code successfully running in
production)?
More than one month
One week to one month
One day to one week
Less than one day
Less than one hour
How often do you feel highly productive in your
work?
Never
A little of the time
Some of the time
Most of the time
All of the time
Please rate your agreement or disagreement with the following
statements:
My team follows development best practices. | □ | □ | □ | □ | □
I have enough time for deep work. | □ | □ | □ | □ | □
I am satisfied with the amount of automated test coverage in my project. | □ | □ | □ | □ | □
It is easy for me to deploy to production. | □ | □ | □ | □ | □
I am satisfied with the quality of our CI/CD tooling. | □ | □ | □ | □ | □
My team’s codebase is easy for me to contribute to. | □ | □ | □ | □ | □
The amount of technical debt on my team is appropriate based on our goals. | □ | □ | □ | □ | □
Specifications are continuously revisited and reprioritized according to user signals. | □ | □ | □ | □ | □
Please share any additional feedback on how your developer experience could be improved.
[open textarea]
Using qualitative and quantitative metrics together
Qualitative metrics and quantitative metrics are complementary approaches
to measuring developer productivity. Qualitative metrics, derived from
surveys, provide a holistic view of productivity that includes both subjective
and objective measurements. Quantitative metrics, on the other hand, provide
distinct advantages as well:
- Precision. Humans can tell you whether their CI/CD builds are generally
fast or slow (i.e., whether durations are closer to a minute or an hour), but
they cannot report on build times down to millisecond precision. Quantitative
metrics are needed when a high degree of precision is required in our
measurements.
- Continuity. Typically, the frequency at which an organization can survey
its developers is at most once or twice per quarter. In order to collect more
frequent or continuous metrics, organizations must gather data
systematically.
Ultimately, it’s through the combination of qualitative and quantitative metrics (a mixed-methods approach)
that organizations can gain maximum visibility into the productivity and
experience of developers. So how do you use qualitative and quantitative
metrics together?
We’ve seen organizations find success when they start with qualitative
metrics to establish baselines and determine where to focus, then follow with
quantitative metrics to drill deeper into specific areas.
Engineering leaders find this approach effective because qualitative
metrics provide a holistic view and context, giving a broad understanding of
potential opportunities. Quantitative metrics, on the other hand, are
typically only available for a narrower slice of the software delivery
process.
Google similarly advises its engineering leaders to go to survey data first
before looking at logs data, for this reason. Google engineering researcher
Ciera Jaspan explains: “We encourage leaders to go to the survey data first,
because if you only look at logs data it doesn’t really tell you whether
something is good or bad. For example, we have a metric that tracks the time
to make a change, but that number is useless by itself. You don’t know, is
this a good thing? Is it a bad thing? Do we have a problem?”
A mixed-methods approach allows us to take advantage of the benefits of
both qualitative and quantitative metrics while getting a full understanding of
developer productivity:
- Start with qualitative data to identify your top opportunities
- Once you know what you want to improve, use quantitative metrics to
drill in further
- Track your progress using both qualitative and quantitative metrics
It is only by combining as much data as possible, both qualitative and
quantitative, that organizations can begin to build a full understanding of
developer productivity.
In the end, however, it’s important to remember: organizations spend a lot
on highly qualified humans who can observe and detect problems that log-based
metrics can’t. By tapping into the minds and voices of developers,
organizations can unlock insights previously seen as impossible.