Cultivating an Experimental Mindset in Your Organization

AMANDA KERSEY: Welcome to HBR “On Leadership”: case studies and conversations with the world’s top business and management experts, hand-selected to help you unlock the best in those around you. I’m HBR senior editor and producer Amanda Kersey. As a leader, you face uncertainty all the time. Experiments offer a way to test assumptions, but it’s not enough to simply run them. Their value comes from designing them carefully and being willing to act on what they reveal, even when the findings upend your expectations. Here’s HBR IdeaCast host Curt Nickisch with a conversation from 2020 about what leaders need to know to design rigorous experiments and then put the evidence to work.

CURT NICKISCH: In science, the need for experimentation is cut and dried. You come up with a hypothesis, whether it’s about how storm clouds move or how cells in the body die, and you set up an experiment to test it. There’s a method, it’s called the scientific method, and you test it over and over again until you’re sure that it’s replicable and your answers are right, or at least as right as they can be until new variables come to light or the landscape changes. In business, there isn’t currently as much experimentation. Value has been placed on experience, on the intuition of managers and leaders. And that’s a bad thing, says today’s guest. Even in the most innovative industries we can think of, more can be done to set up experiments, test the results, and deliver better products and services to customers. And this goes far beyond A/B testing at tech giants. Our guest today is Stefan Thomke. He’s a professor at Harvard Business School. He’s the author of the book “Experimentation Works: The Surprising Power of Business Experiments,” and he also wrote the HBR article “Building a Culture of Experimentation.” Stefan, thanks for coming in.

STEFAN THOMKE: Thanks for having me.

CURT NICKISCH: Just to start, pretend I’m a business leader, make the case for me. Why do we need to experiment more in business?

STEFAN THOMKE: Well, first of all, it can generate a tremendous amount of value. Let me give you an example: Microsoft Bing, which is its search engine. An employee working at Bing came up with an idea on how to display ads. The manager didn’t think much of it, and they kind of shelved it. But the employee insisted. At some point, the employee decided just to launch an experiment, to run a test, a controlled test. And when he ran the test, that little change, a few days of work, generated more than a hundred million dollars of additional revenue in that year alone, and of course more revenue going forward. In fact, it was the most successful experiment that was run at Bing. So what made the difference? Well, the difference was that the employee essentially had the power, or the authority, to run the experiment, to launch it and to test it. It’s the test that actually told you what works and doesn’t work.

CURT NICKISCH: And not the manager.

STEFAN THOMKE: And not the manager. The problem is in a lot of innovation, especially when you’re trying to predict customer behavior, we get it wrong most of the time. And so rather than trying to follow our intuition or our opinions, why not just run the test and let the test tell us what works and doesn’t work?

CURT NICKISCH: And what’s the answer to that? Why aren’t people doing it?

STEFAN THOMKE: Well, there are lots of reasons why people are not doing it, especially at scale. Some people are running simple experiments because they refer to an experiment as something like a trial: we’re trying something. That’s not really an experiment in the scientific sense. And they don’t do many of those because they don’t have the infrastructure to run many tests. They may not have the tools to do so. It may be too expensive to run it. And then they may decide that, listen, we run a test and we get some results, and then nobody listens to us anyway.

CURT NICKISCH: Right. Do managers overestimate the downside to experiments and underestimate the upside?

STEFAN THOMKE: I think sometimes they’re overly concerned about the risk of running the experiment, for good reasons. You have a lot of traffic. You may not want to launch something that results in a loss of customers visiting your websites, for example.

CURT NICKISCH: Right, if it goes down.

STEFAN THOMKE: If it goes down. And so if you don’t have good stopping rules, kill switches, and things like that in place, there may be risk aversion. It’s also stepping into the unknown. And quite honestly, it takes humility to admit that I just don’t know. I’m walking into a meeting and we’re launching this thing and everybody has some hypothesis about what the outcome is going to look like, and I just go into the meeting and tell everybody, listen, honestly, I don’t know what’s going to happen, so let’s just find out.

CURT NICKISCH: Even though I get paid more.

STEFAN THOMKE: We get paid.

CURT NICKISCH: I’m in charge. I don’t know either.

STEFAN THOMKE: Exactly. And the higher up we go, the more you get paid. The more senior you are, the more you get paid to make tough decisions, and you want to be the decision maker. So you need to create an organization that ticks a little differently to do this sort of thing.

By the way, it’s not just the online world. It’s also the physical world where companies are running experiments, and even there we have to make big decisions, sometimes very expensive decisions, and it’s the experiments that can in fact adjudicate whether we want to do something or not. Take Kohl’s, the big retailer. Kohl’s hires a consulting company, and the consulting company basically does a cost analysis, and they go to senior management and tell them, listen, we figured out that you can save a lot of money if you open your stores an hour later. Now here you are running this company and you have to make a decision. Should we do that? Calculating the cost savings is easy, but the big question is, what’s actually going to happen to our revenue? Are customers going to buy less if we open an hour later? So how do you make these kinds of decisions? We can analyze and analyze, but we won’t know until we actually do it, until we run the test. And in this case, they did. They ran controlled experiments in which they set up these tests opening an hour later, and lo and behold, at the end, the result was that it didn’t make much difference.

CURT NICKISCH: Just so we’re on the same page, how do you go about setting up an experiment? Are there playbooks for this?

STEFAN THOMKE: Well, first of all, there are tools. A lot of the companies that I describe in the book built their own infrastructure, built their own tools, because when they got started many years ago, the tools weren’t around. So you look at an Amazon, a Microsoft, a Netflix, a Booking.com. I mean, you go through them, and there were about a dozen or so that decided to do it themselves.

CURT NICKISCH: So they knew that they had questions they wanted to answer, and they just figured out a way to do it.

STEFAN THOMKE: They figured this is going to give them a competitive advantage if they can go out and just test a lot. And they knew that they often get it wrong. So they started to invest in an infrastructure. At a place like Microsoft, for example, you have a very, very large group that basically runs the infrastructure. The last time I checked, it was something like 85 or 90 people who are just doing infrastructure. But the good thing that happened a few years ago is there are now third-party tools as well that can do this, both in the online spaces and in the brick-and-mortar spaces, which do a lot of the heavy lifting for you, a lot of the statistical stuff and so forth. And so it’s gotten a lot easier than if you had wanted to start, say, five or 10 years ago.

CURT NICKISCH: Developing a culture for this is probably a little bit different.

STEFAN THOMKE: I think it may be potentially harder than getting the tools and building the tools because now we’re dealing with behaviors, with beliefs, with norms and all sorts of things.

CURT NICKISCH: How does this show up in companies? If the culture for experimentation is not working, what do you actually see and observe?

STEFAN THOMKE: Well, the classical example is they start running experiments. We have an experiment, we hand over the result to the group that asked us to run the experiment, and then nothing happens, or they start to challenge the experiment. Something must have gone wrong. I remember a story where an angry person actually called one of the tool vendors in this space and complained about the tool being wrong. The person ran an experiment that actually showed that if you give customers less choice, in his setting, you get better performance. And that was just counterintuitive, because everything that he believed up to this point was that you should give people more choices. He was really disturbed by the finding, and so he called them and complained that there’s a flaw in the tool, that the tool must be wrong, because the result doesn’t match the experience that he’s had. And he’s been doing this for a long time. So you run into that sort of thing.

CURT NICKISCH: Which kind of underlines your point that experiments bring new insights that you just can’t develop on your own?

STEFAN THOMKE: Correct. There’s a company called Booking.com, which most of us use. In fact, it’s the biggest accommodations platform in the world. More than 1.5 million room nights are booked on the platform each day. It’s what we call a two-sided platform. It’s got suppliers on one side, which are hotel operators, for example, and of course it’s got customers like us on the other side. And Booking.com runs a massive number of experiments. My estimate, and they told me I’m probably on the low side, is over 30,000 experiments a year. It’s a really, really fascinating company. It’s also a highly successful company. Their gross profits are in the high nineties percent, and they don’t really have any assets. They don’t really own any accommodations. It’s a super competitive industry, too. So how do they get away with this? The answer is they run a lot of experiments, and they created an experimentation culture where running experiments is almost like breathing. You kind of do it every single day. Curt, you have to think about the numbers here. Even if I’m using a lower number, they’re running more than a hundred new experiments a day. You have to have an organization that can even come up with so many hypotheses.

CURT NICKISCH: I mean, you mentioned the number of transactions that Booking.com does in a day. How key is that to being able to run experiments? Does that also work for places that just don’t have data like that?

STEFAN THOMKE: Yes, it also works for places that have a lot less traffic. The underlying math changes; what you have to do algorithmically is very different. If you have very large sample sizes, a lot of traffic, for example, you can really fine-tune. You can make very, very small changes and pick up whether that change actually causes something to happen. As your sample size shrinks, you’re going to have to go for bigger changes. We call it the power of an experiment. You have to power an experiment; that’s statistical power. So I recommend that smaller companies run experiments that are a little bigger. Now, something else can happen, and this actually happened at IBM when they started to do this: they realized that they had way too many websites. Yes, they had very little traffic on some of these websites, but they didn’t need all the websites. So it actually led to a process of consolidation. They said, listen, we don’t really need all these things, so what we’ll do is consolidate, and we get more traffic on fewer websites, which then allows us to run more experiments.
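
The link Thomke draws between traffic and the size of change an experiment can pick up can be sketched with a standard normal-approximation formula for a two-arm conversion test. This is only a rough illustration and not something taken from the conversation: the function name `min_detectable_lift`, the 5% baseline conversion rate, and the 5% significance and 80% power defaults are all assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_lift(baseline_rate, n_per_arm, alpha=0.05, power=0.80):
    """Approximate smallest absolute change in conversion rate that a
    two-arm test can reliably detect (hypothetical helper).

    Uses the normal approximation and assumes both arms have roughly
    the baseline variance and the same number of visitors.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    standard_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * standard_error


# Assumed 5% baseline conversion rate; traffic levels are illustrative only.
for n in (1_000, 100_000, 10_000_000):
    lift = min_detectable_lift(0.05, n)
    print(f"{n:>10,} visitors per arm -> detectable change of about {lift:.3%}")
```

Under these assumptions the detectable change shrinks with the square root of traffic, so a site with 100 times fewer visitors needs a change roughly 10 times larger before a test can see it, which is the sense in which smaller companies have to run bigger experiments.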

CURT NICKISCH: I wondered if there are companies or industries outside of consumer facing tech or outside of scientific or pharmaceutical companies where experimentation really feels foreign?

STEFAN THOMKE: Well, I mean, the classical companies, I think, are in the creative industries, where the assumption is that everything is driven by creatives. Look at entertainment, for example, and look at what Netflix has done. Netflix kind of flipped it around. They operate in the creative industry, but they’re completely experimentation driven. And I think it was a big wake-up call for the entertainment industry, because when you go in and you use Netflix, you are part of their ecosystem, their experimentation ecosystem. They run a massive number of tests. They want to find out what works and doesn’t work. By the way, running the test and getting a result doesn’t mean that you have to blindly follow what the result is, because sometimes there are good strategic reasons why you may not want to implement what the test tells you.

CURT NICKISCH: Or there are trade-offs to whatever benefits.

STEFAN THOMKE: Or trade-offs, for example. Or there may be a contractual violation or something like that. But what that test does is it actually adds transparency to the decision. So you cannot pretend that we’re doing this because it’s good for the customer or something like that, or good for the viewer. It adds clarity that we understand from the test what’s good for the viewer, but there may be other reasons why we may not want to do it. And adding that transparency to what you’re doing I think is a big value, and it allows a company like Netflix to operate really in the creative industry with a testing approach.

CURT NICKISCH: Yeah.

STEFAN THOMKE: I don’t want to diminish the value of creative talent, because creative talent is really important, but that doesn’t create certainty in terms of decision-making. To me, the creative talent and the intuition are an important part of experimentation because they allow us to create hypotheses. You have to ask yourself, Curt, where do these hypotheses come from?

CURT NICKISCH: Still from people. It’s still people asking questions or having ideas.

STEFAN THOMKE: Absolutely. So what I’m saying is they’re running all these experiments, they’re all hypotheses that came out of product groups, and it’s the people who come up with these hypotheses. And so where do they get the ideas? Well, it’s intuition. Sometimes it’s insight. Sometimes it’s customer surprises: things that they thought were true, and then they observe something that doesn’t quite fit what they know. It’s usability labs. These companies all still run qualitative research. They do all the kinds of things that other companies do, but they do it for generating hypotheses, which are then rigorously tested, versus other organizations that generate the hypothesis and go directly from hypothesis to launch.

CURT NICKISCH: Right, based on whoever is the best public speaker or makes the best case in a meeting, rather than...

STEFAN THOMKE: Yeah, yeah, yeah. There’s a word for that in the community. They’re called hippos.

CURT NICKISCH: Hippos?

STEFAN THOMKE: Yes. Highest paid person’s opinion: hippos. And we all know that hippos are very dangerous animals.

CURT NICKISCH: I think a lot of executives are probably also not used to knowing how much experimentation to do. How do you know what to experiment on and how do you know what to let be?

STEFAN THOMKE: Yes, you have to empower people to make that decision. And the reality is right now, I think most organizations test too little. So I don’t think you should be too worried about testing too much. Yes, there’s probably a point at which you test too much because you need an organization that can absorb all that knowledge or all those findings that are generated by all these tests. That’s true. And we need to think about that, but I don’t think that’s the problem in most organizations right now. Right now, they’re not doing enough.

CURT NICKISCH: If you’re bringing this into a company, do you try to do this company-wide? Do you try to start with a team or a division and scale it up from there?

STEFAN THOMKE: So there are different ways to organize your experimentation teams. There are three models that I describe in the book. One model is really more of a centralized approach. I basically have a center, a group that’s responsible for experiments, and they’re like a service organization where you can come from a business unit, you can commission an experiment, and they’ll run it for you and give you the results.

CURT NICKISCH: Oh, that’s interesting.

STEFAN THOMKE: That’s one model. And a lot of companies start out that way because they’re a little uncertain how this is all going to work out. They may not believe that the company’s ready to do this at large scale.

CURT NICKISCH: It probably simplifies training and it...

STEFAN THOMKE: A lot.

CURT NICKISCH: ...lets people dip their toe in without really having to.

STEFAN THOMKE: Exactly. And you have a few experts and they kind of make sure that people don’t do foolish things.

Then another form is to be decentralized, completely decentralized. So now we’re shifting the autonomy basically to people, and we allow pretty much anybody to run experiments, and we don’t centralize it anymore. And of course there you have to trust people. You have to know that they’re actually capable of doing this. And it’s a way, of course, to rapidly scale things. But what happens there is that when you spread all your experts around throughout the company, they get very busy, and you kind of lose the focus on building capabilities, because you always need to get better and better. And so there’s no coordinated approach to this. Everybody kind of does their own thing. So what companies have found is they go from centralized to decentralized because they want to scale things, but then they realize that they need to have a more coordinated approach, and then they create something which they call a center of excellence.

And the center of excellence is kind of a hybrid model, where you have a core group that actually is responsible for developing capabilities, experimentation capabilities, that knows what tools to use and pushes the envelope. But at the same time, you take people out of that group and place them into the different organizational units that are doing this, and they’re basically there to help as well. And companies found that that’s actually a very good compromise, because on one hand you empower people to do things on their own. At the same time, you actually have someone who centrally owns this capability as well.

CURT NICKISCH: How do you know when it’s really working?

STEFAN THOMKE: The way you know it’s really working, I think, is a cultural test. And I’ll tell you, here’s the test. You sit in a meeting and you’re discussing a decision. It’s working either when someone asks, where’s the experiment? Or when someone actually walks into the meeting and says, here is the experiment. When these kinds of discussions are happening every single day without you having to ask for these things, then things are kind of working. It’s like running the numbers. When you go into a meeting, you always expect people to do some financial analysis. It’s almost a given. So it has to be like that. It has to be like running a financial analysis. It has to be a given that you do a test, that you run an experiment, and that unless you’ve done it, we are not going to make a decision.

CURT NICKISCH: Say you’re an individual contributor, you may be a manager, you may be a frontline worker, but you buy into this, you see the value of experiments. You want your organization to do more. What do you do to try to bring a culture of experimentation to a place that is still relatively new to it?

STEFAN THOMKE: What you can do as an employee is first of all, raise the awareness around you.

CURT NICKISCH: What does that mean?

STEFAN THOMKE: That means basically explaining to people what the value of experiments is, what experiments are. But then, at the same time, maybe try to do some of these things in the areas that you control. Yes, I see the difficulty sometimes, and I hear this from people saying, okay, I get you, but the people two levels up, I’m not sure that they do, so what can I do? So I always tell them, start small, get going. And this is often what happens. I’ve talked to organizations that actually started this way and then got bigger and bigger. They said, we started out and we ran an experiment and we went to the meeting and we told people what the experiments showed us and so forth, and they listened to it and they gradually started to understand the value of it. But you’ve got to get started. Don’t wait.

CURT NICKISCH: What kind of manager is then the successful manager in a company that has a culture of experimentation? Because in the past, maybe, it used to be people who had experience, people who had intuition. Now when you run experiments, what is the type of manager who excels and advances in an organization that has a culture of experimentation?

STEFAN THOMKE: So you can ask the question, if everything is adjudicated by experiments or by tests, what’s the role of the manager anyway?

CURT NICKISCH: Right.

STEFAN THOMKE: I kind of break it down into three different things that they should do. The first role, I think, of a manager is to set a grand challenge. What we don’t want is an organization that just does experiments willy-nilly with no direction. So there needs to be a grand challenge. A grand challenge, for example, could be: we want to have the best user experience in the industry. And that grand challenge then can be broken down into different pieces, which then can be addressed with hypotheses, which are then tested. So you give them a directionality. There needs to be a program, a systematic program that aims for some bigger goal. So that’s the grand challenge. The second thing I think that managers need to do, especially in this kind of environment, is put in place the systems, resources, and organizational designs that allow for that large-scale experimentation to happen.

Things like that don’t happen by themselves. You need to invest in tools. You need to make sure that you’ve got the right organizational design to start out with and maybe then change it when things don’t work. So you have to think about that as well, and you need to make sure that all the systems are in place. So someone like that employee at Microsoft can just kind of push a button essentially and just launch and run this thing. If it takes employees weeks and weeks to set up an experiment, what are the odds of them doing it at large scale? It’s not going to happen. So you’ve got to make it easy as well, and you need to empower people to do it. You need to democratize experiments. And the third one is they need to be a role model. They need to live by the same rules. So when we go into a meeting and we propose a course of action and someone says, that’s really nice, we’ll run a test and let you know what happens. We need to then have the humility to say, let’s do it, and let’s do it quickly. So we need to live the same way. We need to do the same thing that we ask our employees to do. So that’s a different style of leading.

CURT NICKISCH: Stefan, thank you so much. Maybe we’ll try some experimentation on this show as well.

STEFAN THOMKE: Thank you. Great to be here.

CURT NICKISCH: Stefan Thomke is a professor at Harvard Business School. He’s the author of the book “Experimentation Works: The Surprising Power of Business Experiments,” as well as the HBR article “Building a Culture of Experimentation.”

AMANDA KERSEY: HBR “On Leadership” will be back next Wednesday with another handpicked conversation from Harvard Business Review. If this episode helped you, share it with your friends and colleagues and follow the show on Apple Podcasts, Spotify, or wherever you listen to podcasts. And while you’re there, consider leaving us a review. And when you’re ready for more podcasts, articles, case studies, books and videos with the world’s top business and management experts, find it all at hbr.org. This episode was produced by Mary Dooe and me, Amanda Kersey. On Leadership’s team includes Maureen Hoch, Rob Eckhardt, Tina Tobey Mack, Erica Truxler, Ramsey Khabbaz, Nicole Smith, and Anne Bartholomew. Music is by Coma Media. Thanks for listening.
