Microsoft Sets Inference Record With 1.1 Mlillon Tokens/Sec Azure ND GB300

Microsoft has announced that its Azure ND GB300 v6 virtual machine achieved an inference speed of 1.1 million tokens per second on Meta’s Llama2 70 B model. 

CEO Satya Nadella took to X, calling it “An industry record made possible…

Continue Reading