ISC-2025 Cluster Competition: Meet the Teams

There were 10 teams in the ISC-2025 cluster competition, two more than in 2024. It’s also great to see a couple of entirely new teams joining in the HPC fun. With the new 6kW power limit, the teams have twice as much juice to design their systems around but, as we noted in our first article, system power requirements just keep climbing, driven mainly by CPUs and GPUs, with memory and fan power draw also rising significantly over time.

We did two interviews with each team, and they’re both in the same video. The first took place during the assembly/practice time, when everyone was confident and had rosy dreams of standing at the top of the podium, waving at the crowd and holding bouquets of roses. (That has never happened, but it’s a common dream, right?) The second interview was either late the second day (Wednesday) or on Thursday before the early afternoon finish.

By the time the second interview rolls around, the kids are tired and have been put through the wringer by broken code and, yes, some broken dreams. But you’ll notice that their attitudes are still positive and that none of them have given up; they’re still in there pitching. So let’s meet them…

Centre for High Performance Computing (CHPC) is a very familiar name for all the millions (and millions) of student cluster competition fans. In their thirteen ISC appearances, Team CHPC has harvested four gold medals, three silvers, and a bronze. This is even more impressive when you factor in that every single team member is here for the first time. CHPC has a policy that national team members get to do it only once; this helps spread HPC education throughout the country more quickly.

The 2025 team is made up of the winning team from the CHPC intra-country competition last December plus two standout students from other teams. The top team was from Cape Peninsula University of Technology (CPUT, not ‘Kaput’), who broke the Wits University winning streak. They did it in a brash and fun way, as you can see from this video.

In this first interview, which took place during the system testing phase, the team seems to be doing reasonably well. The hardware is working well: three HPE DL380 nodes, each with Intel Xeon 8480+ CPUs (56 cores, 2.0/3.8 GHz) and four Nvidia H100 GPUs per node (12 total).

In addition to our regular interview chat, I also introduce the PRSM (“Prism”) cable management award. I invented it to shine a light on the vital importance of cable management (as outlined in this article LINK) and to have a bit of fun. I also preview “The Great ISC Chocolate War,” which promises to be even more fun.

Update: We caught up with Team CHPC with less than two hours left in the competition. They’re working on the mystery application (LAMMPS) and are ready to submit OpenMX. The team managed to get to some of the sponsor parties, which gave them a chance to both relax and soak up some of the industry ambiance. I should have broken the news to them that the typical day-to-day HPC/AI industry experience doesn’t feature all-you-can-eat shrimp and open bars, but why be a buzzkill, right?

ETH Zurich has seen a lot of cluster competition action with eleven appearances in ISC and SC events, taking home a gold, a couple of bronzes, and two Highest LINPACK awards. They have some experienced students on this edition of Team Racklette, which will definitely help out.

I try to stir the pot by pitting students majoring in computer science against those majoring in computational science – it didn’t work. An interesting point comes out in the interview when we discuss the code_saturne application. They managed, with difficulty, to get it to run on GPUs with one test case but then found that it behaved differently with a different test case. In short, it seems to them that whether it can run efficiently on GPUs depends on the dataset rather than the application itself. Interesting.

The team is driving an eight-node cluster equipped with, wait for it, the first appearance of Nvidia’s Grace Hopper GH200 superchip. Pretty sporty when you consider the bandwidth associated with NVLink for GPU-to-GPU comms plus their integrated InfiniBand node interconnect. According to the team, the hardware is running well and their main challenge seems to be optimizing the software.

We ended the interview with some controversy. I tell the students that Team Helsinki claim that Finland makes the best chocolate in the world. I continue by saying that they told me that they wouldn’t feed Swiss chocolate to farm animals. (Later on, off camera, I add that the Finns used to feed Swiss chocolate to prisoners as a punishment but were stopped by international human rights advocates.) I then set the stage for the first ever ISC Great Chocolate Battle. Stay tuned for the thrilling taste test showdown.

Update: On the last day, Team ETH is busy, but seemingly happy, with 90 minutes until the end of the competition. They’re pushing hard on LAMMPS, trying to wring maximum performance out of the perennial cluster competition go-to app.

But I had to bring up a sensitive topic, the elephant in the room: the shocking Swiss loss to Finland in the ISC Great Chocolate Battle. The Swiss team took it well and now agree that they made a strategic error in the chocolate they chose to represent their side. Great team and a lot of fun too…

 

University of Bristol: Team Bristol has participated in several virtual competitions, but ISC25 was a whole new ball game for them. Here, they had to (or got to?) deal with physical hardware and all that comes with it. In the virtual events, they didn’t need to worry about power consumption – but in a physical competition, it’s their biggest concern.

Team Bristol might be a bit of a misnomer since the team is made up of students from different UK universities – sort of an all-star team in a way.

On day one of the competition, the team seems to be in good order: the system is working and they’re making good progress on the applications. The team is riding an HPE dual-node cluster with 52-core Intel CPUs and four Nvidia H100 GPUs, all interconnected with 800Gb InfiniBand.

The team was concerned about going over the power cap, which is common for first-time competitors. There isn’t any penalty for going over the limit; they just can’t submit that particular run for scoring. My advice is that you need to explore where that limit is, and you can’t find the edge of the cliff without going over it.
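To make that exploration a bit more concrete, here’s a minimal sketch of how a team might watch its power headroom during a run. This is an illustration only, not any team’s actual tooling: it polls GPU draw via nvidia-smi and adds a guessed constant for everything nvidia-smi can’t see (CPUs, memory, fans). The budget, overhead, and margin numbers are all hypothetical.

```python
#!/usr/bin/env python3
"""Rough power-headroom watcher -- a sketch, not any team's real tooling."""
import subprocess
import time

NODE_BUDGET_W = 3000    # hypothetical per-node slice of the 6 kW cap
NODE_OVERHEAD_W = 700   # assumed CPU/memory/fan draw that nvidia-smi can't see
WARN_MARGIN_W = 200     # start worrying this far below the budget

def gpu_power_watts() -> float:
    """Sum the reported power.draw across every GPU in this node."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True)
    return sum(float(line) for line in out.splitlines() if line.strip())

while True:
    total = gpu_power_watts() + NODE_OVERHEAD_W
    headroom = NODE_BUDGET_W - total
    status = "BACK OFF" if headroom < WARN_MARGIN_W else "ok"
    print(f"node draw ~{total:.0f} W, headroom {headroom:.0f} W  [{status}]")
    time.sleep(2)
```

Something this crude at least tells you which side of the cliff you’re on while you creep toward the edge.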

They had a big problem with their drives. While working on the IO500 benchmark, they discovered that their drives were running in RAID 5 (via a hardware controller). They needed to be running RAID 0 for maximum performance, but when they tried to make the change, they found that the process would take TWO DAYS to complete. Uh oh, they didn’t have two days. So they bit the bullet, crossed their fingers, and rebooted, hoping that nothing would break.

The result?  “Everything broke.” At least in one node. After an all-nighter, which included a clean install of everything and reprovisioning, the kids managed to get that node functioning again. The organizers cut them a break by allowing them extra time to complete the competition tasks, which means the team is basically doing twice the amount of work today. Yikes.

Tough break for the team, but much learning has taken place and that’s the point of this thing anyway, right?

Update: We talked to Team Bristol again on the last day. They’re happy to see LAMMPS as the mystery app, since they have some familiarity with it. It’s been a very tough couple of days, but the students have had a great time working with hardware, meeting the other teams, and getting the whole in-person experience. They’ve been a great addition to the ISC25 lineup and here’s hoping they become a fixture in future competitions.

 

FAU/USI:  This is a combined team featuring students from cluster competition mainstay Friedrich-Alexander-Universität and Università della Svizzera italiana (aka the University of Lugano). For want of anything more creative, I’m going with Team FAUSI as shorthand.

Our first interview with Team FAUSI is during their LINPACK/HPCG runs. They’re pretty happy with how their system is handling “The Dongarra Suite” as I refer to it. The student handling the benchmark tasks is positively basking in how well their eight Nvidia H200 GPUs are ripping through them. Gotta give their hardware sponsor Megware credit for providing them with a killer system.

The team captain drove their server to Hamburg in the back of the truck. I should have investigated this further as it’s nearly 400 miles from their home base in Nuremberg to the show floor in Hamburg. Did the other team members ride in the back of the truck or was it a solo trip? Unclear.

One consequence of the truck trip was a loose connection somewhere in a node that forced them to completely tear it down and put it back together again. Painful? Yes, but it worked.

We finished off by talking some cable management; it’s something that FAU has pioneered in cluster competitions over the years, so we’ll see what they bring to the table for the PRSM award. Plus I previewed the Great Chocolate Battle. At the very end, we discuss whether beer could be used as a liquid cooling medium… interesting topic… needs more research.

Update: We do our final interview with Team FAUSI just an hour or two before the end of the competition. The team is maybe a little partied out from their ‘industry meetings’ on Tuesday night, like several of the other teams. More teams hit the parties than I remember in past years, or maybe I just didn’t notice.

But today? Fate has reared its ugly head in the form of OpenMX. The team has found that the run will take two hours, which is two hours they don’t have. That’s because the team figured the competition dataset would be roughly equivalent to the testing/training datasets in terms of completion time. So the team is devoting their time to LAMMPS and putting OpenMX in the rearview mirror.

Speaking of rearview mirrors, bright and early tomorrow morning the students will be loading the server back onto their truck for the long ride home. They said it was a few hours, but if ‘home’ is Nuremberg, it’s about 400 miles, or roughly a seven-hour drive. But maybe I’m not factoring autobahn speed limits into the mix. Google Maps assumes a 61 MPH average, which is sadly slow. With the right truck or van capable of holding a 90 MPH average, they could cut that to about four and a half hours. More questions than answers…

 

University of Helsinki is a brand-new team in student clustering, at least for in-person events. Very personable and fun team, a very good interview. The team has divided the workloads for the most part and has a good approach, but I don’t think they’ve had much experience with managing power, a skill they need to pick up quickly in Hamburg.

Introductions out of the way, we take a look at their innovative rack. Who needs an expensive rack for a couple of nodes and a switch when you can put together something perfectly suitable for tens of dollars? It’s based on the Ikea Lack table (a two-pack is only $45.40 from Amazon) which, as Erica says, “fits like a glove,” width-wise. The top of the second table serves as the base of their DIY rack, and the addition of casters makes it portable. The nodes are separated by scrap lumber scavenged from the show floor build-up. Nicely done!

Their cluster consists of two nodes, each sporting four Nvidia H100 GPUs, driven by 32-core AMD CPUs. At this point in the warmup phase, they don’t have all their apps up yet but are on the way.

Team Helsinki is an early front runner for the PRSM award. They took my cable management nit to heart and, with an assist from show electricians, taped down every possible trip hazard cable to the table leg and floor. It’s beautiful. Sure, there’s a big bundle of cords by the wall, but that’s sort of unavoidable since they’re issued much longer cables than they need.

What looked like an F1 fire suit on the wall turns out to be a cool Finnish student tradition – the “student overall.” The black color is for computer science majors, and the patches are for activities the student has participated in. They wear them everywhere; they’re a source of pride and fun.

Did you know that the game Angry Birds originated in Finland? I didn’t either. The team brought two Finnish candies, a salty licorice (which I didn’t try) and milk chocolate, which I did try and found delightful. This is also the exact moment (TIME in video) when I came up with the idea of fomenting a Finland vs. Switzerland Chocolate Battle. As you’ll see, the Finns are in with both feet – they love their Karl Fazer milk chocolate!

Update: With the competition winding down, we visit Team Helsinki to get a final report. They had a good first day, turning in valid benchmark results for HPL, HPCG, and the IO500. Not sure how the afternoon went, as they didn’t mention SeisSol, code_saturne, or the LLaMA tasks. They’re putting in an intensive effort to finish OpenMX and LAMMPS, so I didn’t want to take up their time going over the other stuff. In the demanding world of interviewing student cluster competition teams, you have to be able to read the room, right?

I congratulated them on their victory in the ISC25 Chocolate War, which is sure to be highlighted by the Finnish press and celebrated countrywide.

They’ve been a great addition to the 2025 competition, and here’s hoping they keep coming back.

 

Nanyang Technological University has been, as a school, in 21 student cluster competitions and built up a nice record with two championships, four second place finishes, and four LINPACK awards.

In the first interview, the team is working through their pre-competition issues, which are both hardware and software related. We didn’t go into a lot of detail as they seemed pretty busy, but I think their hardware might still be a little shaky, and there are definitely some challenges with getting code_saturne to run on their GPUs.

They’re using three Supermicro servers with twelve Nvidia H100 GPUs, which pack a serious punch – about the equivalent of 8.5 H200 GPUs, depending on the application and task, of course. Assuming they can work out the problems, they have enough hardware to compete and definitely have the brain power too.

Update: Team Nanyang is trying to finish strong, working on LAMMPS with half the team on the GPU version and the other half on the CPU version. On a five-person team, that means one member has to split their time evenly between the two. I don’t probe the issue as they’ve been heads down all morning and aren’t looking for an extended conversation. Good luck, Team Nanyang, bring it home.

 

National Tsing Hua University from Taiwan is very familiar to cluster competition fans worldwide. They were in the very first competition way back in 2007, winning LINPACK. They were one of the first teams to use GPUs (SC11 in Seattle), where they won another championship.

Most, if not all, of these students were competing at ASC25 in Xining, China. It wasn’t their best outing, but it gave them experience in what was probably the most difficult cluster competition to date.

They certainly brought enough hardware to make a big statement at ISC25. Six nodes of 96-core AMD EPYC CPUs coupled with eight Nvidia H200 GPUs is serious business. We talk about the challenges of controlling power when you have gear that can gulp much more than 6kW if left unchecked.

At interview time, the SeisSol specialist is having problems getting it to run, something I hadn’t heard from most other teams. I loved her answer when I brought it up: “I’ll work harder then…” Her partner on the app said it’s doing, or not doing, things he’s never seen before. The team member in charge of LLaMA also noted that the competition dataset is 3x larger than what they were given for training, which is to be expected; the organizers are a wily bunch.

Another interesting thing about Team NTHU this year is that their coach is only a year older than the rest of the team. He’s been in a couple of competitions, so he has experience, but it’s not like he’s a Doctor of HPC (or at least not yet).

I finished the interview with a little history lesson about the NTHU cluster competition legacy and a pep talk. Not sure if they were inspired, but I was.

Update: When we catch up with Team NTHU on the last day, they’re having problems installing LAMMPS. It’s a library thing and they need to install something else, I guess. They aren’t panicking, but they are concerned. Overall, the team has had a great experience at ISC25 and particularly liked the generous food and beverage availability, so good job, ISC organizers!

 

The Pittsburgh Supercomputing Center team is made up of students from both the University of Pittsburgh and Carnegie Mellon University. I would have loved to stir up some intra-team rivalry, but it wasn’t in the cards; they’re getting along well.

This is their first in-person competition but they all have solid virtual competition experience behind them.

In the interview, we had some fun with the team captain, discussing his leadership role and the heavy responsibilities on his shoulders. We also define the term “rising junior,” which was a new one for me. A “rising junior” is in reality a sophomore who hasn’t started his or her junior year yet. Essentially, a sophomore who is puffing up the resume.

The team has participated in virtual competitions before, but ISC25 is the first time they’ve gotten elbow deep into real hardware, hearing the scream of the fans as they strain to push air over hot components. The team brought four compute nodes, each packing dual 48-core AMD CPUs. Three of the nodes have AMD Instinct MI210 GPUs, which have solid floating point capabilities. While the MI210 is a bit long in the tooth, a dozen of them will still pack a punch – but the team will have to optimize the hell out of the apps in order to get a place on the podium.

We close out by talking about the traditional modified Le Mans start to the competition. Rather than running to their cars, the students have to run through the show floor and collect their uniquely colored t-shirts before they can start computing. The team and I talk about a few ideas to spice that up even more.

Update: On the final day, Team Pittsburgh feels good about OpenMX and is still working LAMMPS over. It’s running, no issues, but needs more optimization. It sounds like it was a great inaugural in-person competition for the Steel City team. Great to have them, and let’s hope they show up again next year.

 

Tsinghua University has participated in more student cluster competitions than any other school (34) and has finished on the podium 26 times. Oh, and they’ve also won four Highest LINPACK awards along the way. So, yeah, they’re fairly good at this HPC clustering thing.

The team here is fresh off a heartbreaking fourth-place finish at ASC25. They did extremely well, swapping the lead with archrival Peking University, but an invalid result on one application knocked them off the podium. It wasn’t all bad, though; they did take home the Highest LINPACK award.

I was watching the power tracking at the organizer booth when Tsinghua started their HPCG run. They had one tiny blip over the 6 kW power limit, stopped the job immediately, and then restarted it. Their power consumption was then a straight line at 5.803 kW. No wavering, no spikes. They had it completely dialed in. If they can keep that up on the other apps, they’re going to be hard to beat.
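For readers wondering how a team holds a line that flat, one common lever is clamping each GPU’s power limit so the hungriest components simply can’t spike. Here’s a minimal sketch of that idea; the 640 W figure and the approach itself are illustrative assumptions on my part, not Tsinghua’s actual settings, and real teams pair this with application tuning and careful scheduling.

```python
#!/usr/bin/env python3
"""Clamp every GPU's power limit -- a sketch of one way to keep draw flat.

Requires root/admin privileges. The 640 W cap is a made-up example value.
"""
import subprocess

GPU_POWER_LIMIT_W = 640  # hypothetical per-GPU cap chosen to stay under budget

# Ask nvidia-smi which GPU indexes are present on this node.
gpu_indexes = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
    text=True).split()

for idx in gpu_indexes:
    # -pl sets the software power limit (in watts) for the selected GPU.
    subprocess.run(["nvidia-smi", "-i", idx, "-pl", str(GPU_POWER_LIMIT_W)],
                   check=True)
    print(f"GPU {idx}: power limit set to {GPU_POWER_LIMIT_W} W")
```

Cap every GPU a little below its worst-case draw and the whole node’s consumption flattens out, which is roughly the kind of ruler-straight line we saw on the tracking screen.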

The team has a four-node cluster with two nodes each equipped with six Nvidia H100 GPUs. That’s a lot of firepower in the right hands. Barring any missteps, Tsinghua looks to be the favorite at ISC25. But anything can happen….

Update: We grab a quick last-minute interview with Team Diablo just before the end of the competition. They’ve nailed down their OpenMX score and are working on LAMMPS, which should be a relatively easy exercise for them, given their skills and experience. Again, anything can happen, but they seem to be on cruise control and heading for another podium finish.

 

Universitat Politècnica de Catalunya (UPC) has participated in every ISC competition since their debut in 2015. They are always, always driving Arm-based systems, except for a one-time affair with RISC-V in the first competition after Covid.

The team is doing well with SeisSol, a wave propagation app. It was used in, I think, a recent ASC competition and, at that time, was very hard to get a handle on. At the time of the interview, the team has successfully completed HPCG and, presumably, LINPACK as well.

With the benchmarks behind them, today’s menu includes SeisSol, code_saturne, and the LLM challenge. Wednesday won’t be any easier since they have to tackle OpenMX and the mystery application (LAMMPS) on a short timeline. As Lloyd Bridges said in the documentary Airplane! “Guess I picked the wrong week to give up amphetamines.”

Update: The team is crunching through OpenMX and evaluating both the CPU and GPU versions of LAMMPS. They don’t seem to be taking joy in their compiling but expect to love it after it’s done. Great team, finishing strong.

 

Now that we’ve met the teams, our last installment will cover who won and how they did it, and will unveil the new PRSM award. Stay tuned…
