Sonya Huang: We are delighted to hear today from one of the legends of the semiconductor industry, Michael Kagan, the CTO of Nvidia. Michael was formerly Chief Architect at Intel and later co-founder and CTO of Mellanox, which Nvidia acquired for $7 billion in March 2019. Since then, Michael has been a major driver of Nvidia’s dominance as the AI compute platform—thanks in large part to Mellanox’s interconnect technology, which has been key to pushing chips beyond Moore’s Law.
The AI race is ultimately a silicon race—to squeeze the most intelligence possible out of each unit of silicon. Michael takes us on a journey through how the compute frontier has evolved: from packing more transistors onto a single chip to connecting thousands—or even hundreds of thousands—of chips into a unified fabric across an AI data center.
Michael has been advancing the compute frontier for more than four decades, and we’re honored to have him on today’s show.
Pat Grady: Okay. We’re here with Michael Kagan, the CTO of Nvidia, currently the world’s most valuable company. Michael, thank you for joining us.
Michael Kagan: Thank you. My pleasure.
Pat Grady: So I thought where we could start: our partner Shaun likes to make the case about every six months that Nvidia would not be Nvidia without Mellanox. Mellanox was a company that you co-founded some 25 years ago, and have been a part of to this day. So can you kind of paint that picture for us? Why is it that the Mellanox acquisition was so critical to Nvidia?
Michael Kagan: You know, there was a huge transition in the world in terms of computing and the need for computing. And it grows exponentially. It's one of those things that we tend to estimate linearly, but the world is exponential. And that exponential growth has actually accelerated. It used to be like Moore's Law, which is basic silicon: twice every other year. And that's setting aside the discussion that Moore's Law, in terms of the physics, is not quite running anymore.
Once AI kicked in, which was in 2010, 2011, when the GPU went from graphics processing unit to general processing unit, and for the first time AI workloads were run on the GPU, taking advantage of the programmability and parallel nature of this machine, the requirements for performance started to grow at a much higher coefficient. Models started to go up in size and capacity 2X every three months, which requires 10X or 16X a year of performance growth, versus the old school of twice every other year.
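To make the compounding concrete, here is a quick back-of-the-envelope sketch in Python; the doubling periods are the ones quoted above, and the rest is just arithmetic.

```python
# Doubling every 3 months compounds to 2**(12/3) = 16x per year, versus the
# classic silicon cadence of doubling every other year (~1.4x per year).

def annual_growth_factor(doubling_period_months: float) -> float:
    """Growth factor accumulated over one year for a given doubling period."""
    return 2 ** (12 / doubling_period_months)

print(annual_growth_factor(3))   # 16.0  -> model size/capacity, 2x every 3 months
print(annual_growth_factor(24))  # ~1.41 -> silicon, 2x every other year
```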
And in order to grow at this scale, you need to innovate, and you need to develop solutions at a much higher level than just basic components. And that's where the network kicks in; that's where the network is. And there are multiple layers of scaling performance that require high-speed, high-performance networks.
The one is what we call scale up. Basically, if you go back to the CPU days, scaling up was more [inaudible], more transistors, and also some advances in the microarchitecture like out-of-order execution and, at some point, multicore and so on and so forth. So this is the basic building block of computing. In the GPU world, the basic building block is the GPU. And in order to scale it up beyond what you can do on a single piece of silicon, with all the advances that we are making in the microarchitecture and advanced technologies, we actually need to do something like the multicore CPU, but at a much larger scale. And that's what we are doing with NVLink. This is a scale-up solution. So our GPU, what we call a GPU today, is a rack-sized machine. You need a forklift to lift it. So, you know, if you order just a GPU on Amazon, don't be surprised when this huge rack shows up …
Pat Grady: Yeah, people think chip, but it’s really a system.
Michael Kagan: Right. Right. And that's just one GPU, okay? So the basic building block, the very basic computer that application software runs on, is this GPU. And it is not just silicon, it's not just hardware, it's not just wires; there is also a software layer that exposes CUDA as the API. And that's what actually enables it to scale pretty much seamlessly. I'm simplifying the story a little bit, but it scales seamlessly from one component, which used to be a single GPU, all the way up to 72, maintaining the same software interface.
And once you get this building block as big as it conceivably can be built, in terms of power, cost and efficiency, then you start scaling out. To scale out means you take many of these building blocks, connect them together, and now on the algorithm level, on the application level, you actually split your application into multiple pieces running in parallel on these big machines.
Pat Grady: Mm-hmm.
Michael Kagan: And that's again where the network comes in. So if you talk about scale up, we basically made the memory-like domain go beyond a single compute node and a single GPU. And that's actually the first place where Mellanox technology comes in, because before the Mellanox acquisition, Nvidia's scaling up with NVLink was limited to a single-node machine. Going outside of a single compute node (the 72 GPUs are actually 36 computers, each with two GPUs, wired together to present all of this as a single GPU) is not just plugging a wire into a connector. It's a lot of software, it's a lot of technology within the network: how to make multiple nodes work as a single machine. And that's the first, most immediate place where Mellanox came in, in terms of the way we go upstream. That's the first one.
The second one is how do you split the operation across multiple machines? And the way to do it: if I have a task that takes one GPU one second, and I want to accelerate it, I split it into 10, or, you know, a thousand pieces, and send each piece to a different GPU. And now in one millisecond I get done whatever I was doing in a second. But, you know, you need to communicate to split this job into partial pieces, split the task, and then you need to consolidate the results. And you run this, you know, multiple times; you have multiple iterations or multiple applications running, so there is a part of it doing communication and a part of it doing computation.
Now the thing is that you want to split it into as many pieces as you possibly can, because that's your speed-up factor. But then if your communication is actually blocking you, you waste time, you waste energy, you waste everything. So what you need is very fast communication. You split it into many, many pieces, so each piece takes very little time. But then there is another piece that has to be communicated, and you need to fit it within that time. So that's just pure bandwidth.
And another thing is that when you tune your application, you tune it so that communication can be hidden behind computation. And that means that if communication for some reason gets longer, then everybody waits. So in the network you need not only raw performance, the so-called hero numbers, you know, I can get to that many gigabits per second. I also need to make sure that no matter who communicates with whom, the latency, the time it takes, has a very narrow distribution.
So if you look at other network technologies or other network products, you know, on the hero numbers, sending a bit from one place to another, it's basically physics.
Pat Grady: Yeah.
Michael Kagan: It's pretty much the same for everyone. You know, we are a little bit better, but that's not the big advantage. The advantage is that when you do it thousands of times, it takes the same time every time, versus the very wide distribution of other technologies, where the machine becomes less efficient. So instead of being able to split your job across a thousand GPUs, you can split it across only ten GPUs, because you need to accommodate the jitter on the network within the computation phase.
So inherently, the network determines the performance of this cluster. And we look at this data center as basically a single unit of computing.
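A toy model of that jitter point, with illustrative numbers and a Gaussian latency distribution assumed for the sketch (not Nvidia measurements): each iteration waits for the slowest of N messages, so the tail of the latency distribution, not its average, decides how far you can usefully split the job.

```python
import random

def effective_speedup(n_gpus: int, total_work_s: float,
                      mean_latency_s: float, jitter_s: float,
                      trials: int = 2000) -> float:
    """Average speedup when each iteration = (work / n_gpus) + slowest message."""
    iteration_times = []
    for _ in range(trials):
        compute = total_work_s / n_gpus                       # perfectly split work
        slowest = max(random.gauss(mean_latency_s, jitter_s)  # sync waits for the
                      for _ in range(n_gpus))                 # slowest of n messages
        iteration_times.append(compute + max(slowest, 0.0))
    return total_work_s / (sum(iteration_times) / trials)

# One second of work split across 1,000 GPUs, ~50 microseconds mean network latency.
print(effective_speedup(1000, 1.0, 50e-6, jitter_s=1e-6))  # narrow distribution: ~950x
print(effective_speedup(1000, 1.0, 50e-6, jitter_s=1e-3))  # wide distribution: ~230x
```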
Pat Grady: Yeah.
Michael Kagan: Okay? A single unit of computing means that you look at this and you start architecting your components, your software and your hardware, at the point where the data center, the 100,000 GPUs that we want to make work together, is the machine. We need to make multiple chips: compute chips, two of them; network chips, five. Okay, so this is the scale, just in terms of the impact and the investment you need to make to create this single unit of computing. So that's where Mellanox technology came in.
And another aspect of this: we talked about the network that connects the GPUs to run the task. But there is another side of this machine, which is customer facing. This machine needs to serve multiple tenants, and this machine needs to run an operating system; every computer runs an operating system. Another part of the Mellanox technology is what we call the BlueField DPU, a data processing unit, which is actually the computing platform that runs the operating system of the data center.
In a conventional computer you have a CPU that runs an operating system and runs application software. And there are many things we could talk about, you know, the advantages versus the disadvantages. But there are two key things. One is how much time do you spend on your general purpose computing to run the application? You want to maximize it. And the other is how do you isolate your infrastructure computing from the application computing? Because, you know, viruses and cyber attacks and so on and so forth. Being able to run infrastructure computing on a different computing platform actually reduces the attack surface significantly, especially for side-channel attacks, versus what happens if you run it all on the same computer.
If you remember, five or six, well, actually almost 10 years ago now, there was Meltdown and all these side-channel attacks on CPUs. That cannot happen, or the attack surface is reduced significantly, when you run infrastructure on a different platform. So on the other side of the network we also have technology. That's what makes the data center more efficient. And I, well, I may not be objective, but I do agree that this merger of Mellanox and Nvidia actually goes both ways. I don't think the networking business, now Nvidia's, previously Mellanox's, could have grown as significantly as it did otherwise. Now I think we are the fastest-growing Ethernet business, you know, let alone NVLink and InfiniBand; just the Ethernet business is the fastest-growing business ever.
Sonya Huang: What are the things that break as you get to 100,000, maybe eventually a million GPU clusters? And how do you use software to help design around that?
Michael Kagan: It's a multi-stage challenge, okay? One of the things that you need to keep in mind, and it's not very obvious to all engineers, is that when you design the machine, or think about how to operate it, you assume, well, you know, you have these components, they are working, and now let's just figure it out. Okay, so the thing is that a hardware component works 99.999-whatever percent of the time, and that's usually okay if you are dealing with a single box with a couple of them.
But if you are building a 100,000-GPU machine, which in terms of components means there are millions of them, the chance that everything works is zero. So something is definitely broken, and you need to design it, both from a hardware and from a software perspective, to keep going, and to keep going as efficiently as you can: to keep your performance, keep your power efficiency and, of course, keep the service running. So this is challenge number one, even before you get to millions. This challenge actually starts at, you know, a few tens of thousands. And that's number one.
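The "chance that everything works is zero" point follows directly from the five-nines figure; here is a minimal sketch, assuming the 99.999% availability applies independently to each component at any given instant.

```python
def p_all_healthy(n_components: int, availability: float = 0.99999) -> float:
    """Probability that every one of n independent components is healthy right now."""
    return availability ** n_components

print(p_all_healthy(100))        # ~0.999   -> a single box: almost always fine
print(p_all_healthy(1_000_000))  # ~4.5e-5  -> millions of components: effectively zero
```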
Number two is how you run these workloads. Sometimes you run a single job on the entire data center, and then you need to write the software, and you need to provide all the interfaces to the software, to place the different parts of the job as efficiently as possible.
Building networks at this scale, a compute network at this scale, is a very different story than building just a general purpose data center network. A general purpose data center network is Ethernet. It's not a big deal. Well, it is a big deal, but it is a different deal. There you are serving, you know, loosely coupled, collaborating microservices that create the service you see as a customer from the outside. Here you are running one single application on 100,000 machines, and they need to …
Pat Grady: Is that specific to training workloads, or is that also true with inference workloads?
Michael Kagan: It's true for everything; it depends on the scale. And inference is yet another topic that we [inaudible]. Until recently, training was the key thing, you know, a lot of GPUs. And there was a very specific way the training was being done: you basically copy the model onto multiple machines, or multiple sets of machines, run them, then consolidate the results, and so on and so forth.
On the inference side, the story is a little bit different, but the thing is that you need to provide the hooks in the hardware and in your low-level system software for the application and for the scheduler to place the job, and the different parts of the job, in the most efficient way. And as long as your machine fits in a building, which is about 100,000 GPUs (now we're talking about a gigawatt; it's all power driven), then you are there.
But the problem is—the challenge is that for many reasons you want to split your workloads across multiple data centers. And sometimes data centers are at a distance of many kilometers, many miles. It may be across the continent. And this comes with yet another challenge, which is the speed of light.
Pat Grady: Yeah.
Michael Kagan: Okay, now the latency between different parts of your machine is dramatically different. And what is even more challenging is that when you talk about networks, congestion is one of the key problems that deteriorates network performance. And managing congestion across such a latency difference is not like, you know, the old telco days, where you put some box at the edge of your data center with a huge buffer as a shock absorber for congestion. A huge buffer is not good. You know, bigger is not better. There is a famous statement from a very famous woman.
And those buffers, those devices, are basically there to isolate the external world from the internals. But when you want to run a single workload across data centers that are kilometers apart, you need every machine on one side to be aware of whom it is communicating with, whether it's a short communication or a long communication, and to adjust all the communication patterns accordingly, so you don't need these big buffers. Because big buffers mean jitter.
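For a rough feel of why distance changes the picture, here is some bandwidth-delay arithmetic with illustrative numbers (400 Gb/s links, fiber at roughly two-thirds the speed of light), not any specific deployment: the data in flight per link grows with round-trip time, which is exactly what those big telco-style edge buffers would have to absorb.

```python
FIBER_SPEED_KM_PER_S = 200_000  # roughly 2/3 of the speed of light in vacuum

def in_flight_megabytes(distance_km: float, link_gbps: float) -> float:
    """Bandwidth-delay product: data on the wire during one round trip, in megabytes."""
    round_trip_s = 2 * distance_km / FIBER_SPEED_KM_PER_S
    return link_gbps * 1e9 / 8 * round_trip_s / 1e6

print(in_flight_megabytes(0.1, 400))  # within a building (~100 m): ~0.05 MB per link
print(in_flight_megabytes(100, 400))  # data centers 100 km apart: ~50 MB per link
```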
Pat Grady: Yeah.
Michael Kagan: And so we have a technology, we actually developed it recently; all of our Ethernet networking is Spectrum-X. This is a device that we designed and developed, based on the Spectrum switch, that we put at the edge of the data center. And it provides all the information and telemetry the endpoints need to adjust for the congestion.
Sonya Huang: Can we talk a little bit more about training versus inference? Like, how does the shape of the workload differ? I guess back prop is a lot more computationally intensive, forward pass less so. And then, are you seeing customer demand start to shift from pre-training towards inference, or do you think it's still very training heavy right now?
Pat Grady: And if I could just ask a quick follow-up question with that. Will people be running inference workloads on the same data centers that they use for training, or will these end up being two separate—because they’re different optimizations, people end up using two different sets of data centers?
Michael Kagan: Okay. Yeah, that's a great question, and let me start with the first one. So training has two phases. One is inference, which is just the forward propagation, and then there is back propagation to adjust the weights. And for data-parallel training, there is yet another phase, to consolidate the results of the weight updates across multiple model copies.
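As a minimal sketch of the data-parallel pattern being described, with a stand-in "gradient" function rather than a real model: each copy does its forward and backward pass on its own data shard, then a consolidation step averages the updates across copies, which is the role a network collective such as all-reduce plays in practice.

```python
import numpy as np

def local_gradient(weights: np.ndarray, data_shard: np.ndarray) -> np.ndarray:
    """Stand-in for forward plus back propagation on one model copy's data shard."""
    return data_shard.mean(axis=0) - weights

def consolidate(gradients: list) -> np.ndarray:
    """Stand-in for the all-reduce that averages weight updates across model copies."""
    return np.mean(gradients, axis=0)

weights = np.zeros(4)
shards = [np.random.randn(256, 4) for _ in range(8)]      # 8 model copies, 8 data shards
for step in range(100):
    grads = [local_gradient(weights, s) for s in shards]  # runs in parallel in practice
    weights += 0.1 * consolidate(grads)                   # consolidated weight update
```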
So until recently, training was the main driver of compute, because until not very long ago, maybe two years, which is ages in the AI era, inference, or AI, was mainly perceptual. You show a picture: that's a dog. You show a photo of a person: that's Michael and that's Sonya. So that's a single pass, and that's it.
Then came generative AI, where you actually get recursive generation. So when you pass the prompt, it's not just one inference, it's many inferences. Because for every token, when you generate text or generate a picture, for every new token you need to go through the entire machine all over again. So instead of one shot, [inaudible] and then there is more. And now there is reasoning, which means the machine starts, you know, sort of thinking. Okay? If you ask me what time it is now, I can tell you. It's easy, right? What time is it now? But if you ask me a more complicated question, then I need to think; I probably need to weigh or compare multiple solutions or multiple paths, and every such thing is an inference.
Pat Grady: Yeah.
Michael Kagan: Every such thing is an inference. And inference itself actually has two phases: one is much more compute intensive, and the other one is memory intensive. The first is what we call prefill. When you do the inference, you have some sort of background, the prompt, some relevant data that you need to process to create the context to generate the answer. And this is very compute intensive; it's not very memory intensive.
And the other part is actually generating the answer, which is the decode part of the inference, where you generate token by token, okay? Well, there are some techniques where you can generate more than one token at a time, but a single pass is still much less than the final answer.
So if you combine all these things together, the inference demand for computing is actually not less than training; it's actually even more. And there are two reasons for this. One is what I explained: there's much more computing for inference than there used to be. The other is that you train a model once, but you infer many times. You know, ChatGPT, billions of people, or almost billions of people, customers, are pounding on the same model all the time. They trained it once.
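The prefill/decode split described above, as a schematic generation loop. The ToyModel here is a dummy stand-in so the control flow actually runs; it is not a real serving stack, but the shape is the same: one compute-heavy pass over the whole prompt, then one memory-bound step per generated token.

```python
class ToyModel:
    """Dummy stand-in for an LLM, so the prefill/decode control flow is runnable."""
    EOS = 0

    def prefill(self, prompt_tokens):
        # Compute-bound phase: process the whole prompt at once, build the "KV cache".
        return list(prompt_tokens)

    def decode(self, last_token, kv_cache):
        # Memory-bound phase: one token per step, reusing and extending the cache.
        next_token = (2 * last_token + 1) % 7  # arbitrary dummy next-token rule
        return next_token, kv_cache + [next_token]

def generate(model, prompt_tokens, max_new_tokens):
    kv_cache = model.prefill(prompt_tokens)       # prefill: once per request
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):               # decode: once per generated token
        next_token, kv_cache = model.decode(tokens[-1], kv_cache)
        tokens.append(next_token)
        if next_token == ToyModel.EOS:
            break
    return tokens

print(generate(ToyModel(), prompt_tokens=[3, 1, 4], max_new_tokens=8))
```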
Sonya Huang: And now making videos. Now they’re making videos.
Michael Kagan: Right. Right. Now they're making videos, and, you know, everybody is doing inference. My wife, I think she talks to ChatGPT more than to me these days. Once she discovered it, that's her best friend. Now, to your question about machines. You can infer on a phone, okay? So there are definitely going to be much smaller-scale installations for inference, like mobile devices. But if you look at the data center scale and its efficiency, programmability is much more valuable than optimizing the hardware. And, you know, every hardware variant has its own cost and its own drawbacks. The GPU for prefill versus decode is a very similar GPU; it's the same programming model. I don't remember when it happened, but we actually announced that we are building a GPU SKU that is optimized for prefill. It can still do decode, and a decode GPU can do prefill, but you can equip your data center with prefill SKUs and decode SKUs to optimize for the typical use. And if your workload shifts toward more decode or more prefill, you can use either one of them to compensate. And this is the importance of programmability: the same interfaces for the GPUs, based on CUDA and up. That's what made Nvidia. Nvidia before Mellanox.
Pat Grady: Yeah. Yeah. Can I ask you a question about data center scaling? So for many decades we had Moore’s Law, and chips got more and more dense and produced better and better performance. And then we ran into the laws of physics, and chips just couldn’t get more dense because their quantum mechanical properties caused them to break down. And so then we had to scale up to the rack level and now we got to scale out to the data center level. Is there some analogous law of data center scaling that says when data centers get too big the communication overhead causes the performance to break down? Or just said differently or maybe said more simply: Is there a natural limit to how big data centers can get?
Michael Kagan: I think there is a practical limit of how much energy you can consume within the given size of the data center.
Pat Grady: If you were surrounded by nuclear power plants and the energy was available, would the data center itself perform?
Michael Kagan: I don't know. I'm not an expert in construction. But if you do that, there's energy coming in, and now the heat has to go out. So there is a whole set of things there. We have now moved pretty much entirely to liquid cooling, and one of the reasons we did it is to enable much denser compute power. We couldn't build computing as dense as we're building now with air cooling.
Pat Grady: Yeah.
Michael Kagan: So there's a whole bunch of technologies coming to help make this denser and denser. The last big data centers, at the xAI scale, are 100 or 150 megawatts. Now we're talking about gigawatt data centers, and people are talking about 10-gigawatt data centers. So, you know, we're looking forward to building much bigger data centers.
Sonya Huang: Are you sending the data centers to outer space?
Pat Grady: Pretty cool.
Michael Kagan: I think, well, one of the things that determines the speed of data center deployment is, you know, how fast the concrete sets.
Pat Grady: [laughs]
Sonya Huang: So before starting Mellanox, you were at Intel.
Michael Kagan: That’s right.
Sonya Huang: Sixteen years?
Michael Kagan: Sixteen years.
Sonya Huang: Became chief architect. Nvidia and Intel recently announced a partnership. Can you share a little bit about what the vision for that might be?
Michael Kagan: You know, the starting point is that computing changed in the last decade, or a little bit more than a decade. Nvidia started as the accelerated computing company. Video games were the first. And then it evolved into AI, which is a new way of data processing. A general von Neumann machine just is not capable of being used as a platform to solve problems like this, because, you know, programming a machine is just explaining to somebody what to do. Okay, I can explain many things, and I can explain to many people what to do, but I can't explain how to distinguish between a cat and a dog, right?
So there are new challenges that AI solves, and you need acceleration there. And our partnership with Intel is actually about fusing accelerated computing with general purpose computing. Because general purpose computing is not going away. Everything will be accelerated, but we accelerate the general purpose computing, we accelerate the applications. And x86 is the architecture that is dominant there, and it will serve both companies greatly.
That's actually one of the interesting things about Nvidia: the culture of win-win, okay? We are not after taking a bigger piece of the existing pie. We are after baking a bigger pie for everybody, and our success is our customers' success. Our success is not the failure of our competition. Our success is the success of our customers and the success of the ecosystem.
And I think fusing together conventional computing, von Neumann machines, and the accelerated computing that Nvidia provides probably opens up yet another dimension. I'm not sure yet what it is, but on the practical, short-term view, it gives Nvidia and Intel channels to the market: expanding the market and serving markets that were otherwise more challenging.
Pat Grady: You mentioned the culture of Nvidia. So when Mellanox became part of Nvidia in 2019, the market cap of the combined company was about $100 billion—which is no joke. But the market cap today is about $4.5 trillion. And so 45x growth in value in six years is pretty phenomenal. How has that changed the culture of Nvidia? How is Nvidia different today now that it’s one of the most admired companies in the world, if not the most admired, versus six years ago?
Michael Kagan: Yeah. About this, you know, when we had just joined, Jensen was in Israel, and I presented to him that I believed one plus one would be ten. And I was actually off by a factor of four.
Pat Grady: [laughs]
Michael Kagan: But, you know, Mellanox and Nvidia are in a sense sort of similar; the culture was very similar to begin with, but there are some differences, nothing is absolutely the same. And I was the only founder left at Mellanox after Eyal resigned, a few months after the acquisition. And my main focus in the beginning, you know, the thing you think about in the shower, was how to make sure that this acquisition would succeed.
Pat Grady: Yeah.
Michael Kagan: You know, Nvidia paid $7 billion for a company that I founded, and, you know, there were all the mixed feelings that come with that. But once it's done, it's done. Now I have to make it successful. So eventually it worked.
Sonya Huang: [laughs]
Michael Kagan: Most of the Israeli employees stayed. I think it’s 85 or 90 percent of the original employees stayed. Actually, Nvidia grew more than 2x in Israel in terms of manpower.
Pat Grady: Yeah.
Michael Kagan: So we're growing, and we are announcing that we are actually going to build a new campus for Nvidia in Israel. And so, overall, I think the merger was very successful. I did my best to make sure it succeeds. And besides the technology that I was looking at, part of it is sort of technology, but it's technology and theology, there are many other things: making sure that people who were at the center of Mellanox, with its headquarters in Israel, are comfortable and don't feel left somewhere far away. And Jensen basically emphasizes that networking is a critical part of Nvidia's success.
Pat Grady: Yeah.
Michael Kagan: And he's right. So I think it's considered to be the most successful merger in the history of technology. You guys probably track these things better than I do, but overall I think it was a great move.
Pat Grady: Yeah.
Michael Kagan: Looking backwards.
Sonya Huang: What are the science fiction things that you spend your time thinking about? I was just even wondering, like, for example, optical interconnects. Do you think that will exist? Do you think AI will ever be better at physics than us and better at data center design than us?
Michael Kagan: Well, what I'm thinking about, you know, if you look at science fiction, is how to make history an experimental science. In physics, you can try something, see if it works, and then try something else. In history, time goes in one direction, but if you have a good simulation of the world… [inaudible] And we have an Earth-2 climate simulator. With this type of technology, we can actually simulate how what we do today will impact global warming 50 years from now, okay? So it becomes experimental science: you know, you try something, and you see what happens 50 years later.
So that's the science fiction part. And, you know, the physics? Now we are moving on from reasoning and so on and so forth. Once we get AI models to understand physics, we can actually learn physics. AI can teach us physics, because the way we get to the laws of physics, theoretical physics, is that you observe some phenomena, you generalize them, and you compose the rule: the physical law that sits underneath those phenomena. And AI is really great at observing, processing data and generalizing, so AI can help us get to know laws of physics that we can't even imagine now.
Sonya Huang: Okay, so Huang’s law was 2x every two years. Huang plus Kagan’s law is, what is the slope and how long do you think you can sustain it?
Michael Kagan: Well, the slope is somewhere in the range of 10x or a few orders of magnitude a year.
Sonya Huang: Okay.
Michael Kagan: And that's what we are doing now, by the way. Since about two or three years ago, we accelerated our product introduction from every other year to every year. Now we introduce a new wave of products every year, and it's an order of magnitude higher performance. And it's not the chip-level performance, it's the performance of the machine that you can build with it. That's what we are looking at: a single unit of computing.
And how long will it stay? I don't know. I don't know. But we'll do our best to maintain it as long as needed, and probably even accelerate. It's all about the exponent. It's all about the exponent. It's hard to imagine. You know, if you look at these Moore's Law curves, or any of these law curves, they are usually plotted on a logarithmic scale so they look linear. But that's the wrong way to look at it. When I'm showing this [inaudible]. So it's just like this, you know: boom! And a year later it's the same: boom! You know, you can't predict what's going to happen. Who could have predicted, when the iPhone was first introduced, when the smartphone was first introduced, you know, 15 years ago?
Pat Grady: 2007.
Michael Kagan: Yeah, 2007. Oh, 17 years ago. Okay, who could have imagined that on this smartphone, the least-used function, at least for me, would be the phone?
Pat Grady: Yeah.
Michael Kagan: Now it's e-commerce, it's texting, it's news, it's mail; you're basically running your life from this machine. It's your authentication; your ID is there. So, you know, now who can imagine what's going to happen 10 years from now, with all these developments that we are doing today? But we are building the platform for innovation.
Pat Grady: Your point about who can imagine notwithstanding, what is the most optimistic view of our future with AI that you like to think about? Like, what could AI do for the world five, ten, fifteen years from now?
Michael Kagan: The thing is that Steve Jobs called the computer the bicycle for the mind.
Pat Grady: Yeah.
Michael Kagan: Okay? So AI is—it’s maybe, I don’t know if it’s—it’s probably a spaceship.
Pat Grady: [laughs]
Michael Kagan: Because there are a lot of things that I would like to do, but I just don't have enough time, don't have enough resources to do them. With AI, I will. And it doesn't mean that, you know, I will do twice as much. Maybe I will do 10 times as much. But the thing is that I will want to do a hundred times as much as I want to do today. And, you know, you go to any project leader and nobody says, you know, "I have enough. I have enough manpower, I have enough resources. I don't need any more." Okay? If you give him resources which are twice as efficient, he will do four times more.
Pat Grady: Yeah.
Michael Kagan: And he will want to do 10 times more. So it's going to be like electricity changing the world, right? You know, in London you still see these gas lamps, this infrastructure for using gas as the source of energy. Who could have thought, once electricity was invented, that it would change the world so much that we can't live without it? The same with AI.
Pat Grady: Awesome.
Sonya Huang: Beautifully said.
Michael Kagan: New world.
Sonya Huang: Thank you so much for joining us today. I love this conversation.
Michael Kagan: Thank you.
Pat Grady: Thank you.
Michael Kagan: Thank you for having me.
