Developments in Japan are creating a risk that Japanese investors may one day pull the rug out from under the U.S. Treasury market by keeping more of their savings at home
Why turmoil around Japan’s new government could wash up in U.S. financial markets.
Recent developments overseas have the potential to complicate the White House’s agenda to bring down borrowing costs, while heightening competition for investors in the U.S. and Japanese bond markets.
Aggressive fiscal-stimulus efforts by the cabinet of Japan’s first female prime minister, Sanae Takaichi, have triggered a spike in long-dated Japanese government-bond yields and further weakness in the yen (USDJPY) over the past few weeks. The situation is being likened to the September-October 2022 turmoil in the U.K., which stemmed from a collapse in confidence over a package of unfunded tax cuts proposed by then-Prime Minister Liz Truss’s government.
Read: Liz Truss redux? Simultaneous drop for Japanese currency and bonds draws eerie parallels
The U.S. needs to manage the cost of interest payments given a more than $38 trillion national debt, and this is a primary motivation for why the Trump administration wants to bring down long-term Treasury yields. Last week, Treasury Secretary Scott Bessent said in a speech in New York that the U.S. is making substantial progress in keeping most market-based rates down. He also said the 10-year “term premium,” or additional compensation demanded by investors to hold the long-dated maturity, is basically unchanged. Longer-duration yields matter because they provide a peg for borrowing rates used by U.S. households, businesses and the government.
Developments in Japan are now creating the risk that U.S. yields could rise alongside Japan’s yields. This week, Japanese government-bond yields hit their highest levels in almost two decades, with the country’s 10-year rate BX:TMBMKJP-10Y spiking above 1.78% to its highest level in more than 17 years. The 40-year yield BX:TMBMKJP-40Y climbed to an all-time high just above 3.7%.
In the U.S., 2-year BX:TMUBMUSD02Y and 10-year BX:TMUBMUSD10Y yields finished Friday’s session at their lowest levels of the past three weeks, at 3.51% and almost 4.06%, respectively. The 30-year U.S. yield BX:TMUBMUSD30Y fell to 4.71%, its lowest level since Nov. 13.
There’s a risk now that U.S. yields may not fall as much as they otherwise might after factoring in market-implied expectations for a series of interest-rate cuts by the Federal Reserve into 2026.
Japan’s large U.S. footprint
Treasury yields are not necessarily going to follow rates on Japanese government bonds higher “on a one-for-one basis,” but there might be a limit on how low they can go, said Adam Turnquist, chief technical strategist at LPL Financial. He added that the impact of Japanese developments on the U.S. bond market could take years to play out, but “we care now because of the direction Japan’s policy is going in” and because that impact might arrive even sooner.
Some of the catalysts that usually tend to push Treasury yields lower, such as any commentary from U.S. monetary policymakers that suggests the Fed might be inclined to cut rates, “might be muted because of the increased value of foreign debt,” Turnquist added.
U.S. government debt rallied for a second day on Friday, pushing yields lower, after New York Fed President John Williams said there is room to cut interest rates in the near term.
All three major U.S. stock indexes DJIA SPX COMP closed higher Friday as doubts over the artificial-intelligence trade eased, but they still notched sharp weekly losses.
The troubling spike in yields on Japanese government bonds hasn’t fully spilled over into the U.S. bond market yet, but it remains a risk. “A repeat of the Truss episode is what people are afraid of,” said Marc Chandler, chief market strategist and managing director at Bannockburn Capital Markets.
Concerns about Japan gained added significance on Friday, when Takaichi’s cabinet approved a 21.3 trillion yen (or roughly $140 billion) economic stimulus package, which Reuters described as lavish. The amount of new spending being injected into the country’s economy from a supplementary budget, much of which is not repurposed from existing funds, is 17.7 trillion yen ($112 billion).
Anxiety over Takaichi’s stimulus efforts has weakened the Japanese yen against its major peers, pushing it to a 10-month low ahead of Friday’s session, and has driven a spike in the country’s long-dated yields. The yield on 30-year BX:TMBMKJP-30Y Japanese government debt has risen this month to 3.33%.
Japan is the biggest foreign holder of Treasurys, with a roughly 13% share, according to the most recent data from the U.S. Treasury Department, and the concern is that the country’s investors might one day pull the rug out by keeping more of their savings at home.
Bond-auction anxiety
Earlier in the week, a weak 20-year auction in Japan was cited as one reason why U.S. Treasury yields were a touch lower in early New York trading, a sign that demand for U.S. government paper remained in place. Global investors are often incentivized to move their money based on which country offers the highest yields and best overall value.
“The conventional wisdom is that as yields rise in Japan, the Japanese are more likely to keep their savings at home rather than export it,” Chandler said. “The Japanese have been buyers of Treasurys and U.S. stocks, and if they decide to keep their money at home, those U.S. markets could lose a bid.”
For now, Japanese investors, including insurers and pension funds, appear to be continuing to export their savings by buying more foreign government debt such as Treasurys. Data from the U.S. Treasury Department shows that as of September, Japanese investors held just under $1.19 trillion in Treasurys, a figure that has climbed every month this year, up from about $1.06 trillion last December.
One reason for this is the exchange rate. The yen has depreciated against almost every major currency this year, and Japanese investors have been buying U.S. Treasurys to diversify away from the yen, which is the weakest of the G-10 currencies on an unhedged basis, according to Chandler.
If concerns about the Takaichi government’s stimulus efforts translate into even higher yields in Japan, local investors could be incentivized to keep more of their savings at home, which in turn could mean rising yields for countries like the U.S.
-Vivien Lou Chen
This content was created by MarketWatch, which is operated by Dow Jones & Co. MarketWatch is published independently from Dow Jones Newswires and The Wall Street Journal.
We’re introducing Zoomer, Meta’s comprehensive, automated debugging and optimization platform for AI.
Zoomer works across all of our training and inference workloads at Meta and provides deep performance insights that enable energy savings, workflow acceleration, and efficiency gains in our AI infrastructure.
Zoomer has delivered training-time reductions and significant QPS improvements, making it the de facto tool for AI performance optimization across Meta’s entire AI infrastructure.
At the scale that Meta’s AI infrastructure operates, poor performance debugging can lead to massive energy inefficiency, increased operational costs, and suboptimal hardware utilization across hundreds of thousands of GPUs. The fundamental challenge is achieving maximum computational efficiency while minimizing waste. Every percentage point of utilization improvement translates to significant capacity gains that can be redirected to innovation and growth.
Zoomer is Meta’s automated, one-stop-shop platform for performance profiling, debugging, analysis, and optimization of AI training and inference workloads. Since its inception, Zoomer has become the de facto tool across Meta for GPU workload optimization, generating tens of thousands of profiling reports daily for teams across all of our apps.
Why Debugging Performance Matters
Our AI infrastructure supports large-scale and advanced workloads across a global fleet of GPU clusters, continually evolving to meet the growing scale and complexity of generative AI.
At the training level, it supports a diverse range of workloads, including the models powering ads ranking, content recommendations, and GenAI features.
At the inference level, we serve hundreds of trillions of AI model executions per day.
Operating at this scale means putting a high priority on eliminating GPU underutilization. Training inefficiencies delay model iterations and product launches, while inference bottlenecks limit our ability to serve user requests at scale. Removing resource waste and accelerating workflows helps us train larger models more efficiently, serve more users, and reduce our environmental footprint.
AI Performance Optimization Using Zoomer
Zoomer is an automated debugging and optimization platform that works across all of our AI model types (ads recommendations, GenAI, computer vision, etc.) and both training and inference paradigms, providing deep performance insights that enable energy savings, workflow acceleration, and efficiency gains.
Zoomer’s architecture consists of three essential layers that work together to deliver comprehensive AI performance insights:
Infrastructure and Platform Layer
The foundation provides the enterprise-grade scalability and reliability needed to profile workloads across Meta’s massive infrastructure. This includes distributed storage systems using Manifold (Meta’s blob storage platform) for trace data, fault-tolerant processing pipelines that handle multi-gigabyte trace files, and low-latency data collection with automatic profiling triggers across thousands of hosts simultaneously. The platform maintains high availability through redundant processing workers and can absorb spikes in profiling requests during peak usage periods.
Analytics and Insights Engine
The core intelligence layer delivers deep analytical capabilities through multiple specialized analyzers. This includes: GPU trace analysis via Kineto integration and NVIDIA DCGM, CPU profiling through Strobelight integration, host-level metrics analysis via dyno telemetry, communication pattern analysis for distributed training, straggler detection across distributed ranks, memory allocation profiling (including GPU memory snooping), request/response profiling for inference workloads, and much more. The engine automatically detects performance anti-patterns and provides actionable recommendations.
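As a concrete illustration of the anti-pattern idea, here is a minimal, hypothetical sketch of the kind of rule such an engine can run. The threshold values and the recommendation text are illustrative only, not Zoomer’s actual rules:

```python
def flag_tiny_kernel_overhead(kernel_durations_us: list[float],
                              tiny_us: float = 5.0,
                              share_threshold: float = 0.5) -> str | None:
    """Flag traces dominated by very short GPU kernels, a common sign of
    missed operator fusion or excessive per-op launch overhead."""
    if not kernel_durations_us:
        return None
    tiny_share = sum(d < tiny_us for d in kernel_durations_us) / len(kernel_durations_us)
    if tiny_share > share_threshold:
        return (f"{tiny_share:.0%} of GPU kernels run for under {tiny_us}us; "
                "consider operator fusion or CUDA Graphs to cut launch overhead.")
    return None

# Example: three of the four kernels are tiny, so the rule fires.
print(flag_tiny_kernel_overhead([2.0, 3.0, 4.0, 120.0]))
```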
Visualization and User Interface Layer
The presentation layer transforms complex performance data into intuitive, actionable insights. This includes interactive timeline visualizations showing GPU activity across thousands of ranks, multi-iteration analysis for long-running training workloads, drill-down dashboards with percentile analysis across devices, trace data visualization integrated with Perfetto for kernel-level inspection, heat map visualizations for identifying outliers across GPU deployments, and automated insight summaries that highlight critical bottlenecks and optimization opportunities.
The three essential layers of Zoomer’s architecture.
How Zoomer Profiling Works: From Trigger to Insights
Understanding how Zoomer conducts a complete performance analysis provides insight into its sophisticated approach to AI workload optimization.
Profiling Trigger Mechanisms
Zoomer operates through both automatic and on-demand profiling strategies tailored to different workload types. For training workloads, which involve multiple iterations and can run for days or weeks, Zoomer automatically triggers profiling around iteration 550-555 to capture stable-state performance while avoiding startup noise. For inference workloads, profiling can be triggered on-demand for immediate debugging or through integration with automated load testing and benchmarking systems for continuous monitoring.
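Zoomer’s trigger machinery is internal to Meta, but the underlying mechanism is available in open-source PyTorch: the built-in profiler (backed by Kineto) accepts a schedule that skips startup iterations and records a stable-state window. A minimal sketch, with the window chosen to mirror the iteration 550-555 behavior described above and a toy model standing in for the real training loop:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# Toy model so the sketch is self-contained; any training loop works.
model = torch.nn.Linear(1024, 1024)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def train_one_iteration():
    x = torch.randn(64, 1024)
    loss = model(x).sum()
    loss.backward()
    opt.step()
    opt.zero_grad()

def trace_handler(prof):
    # Export a Chrome/Perfetto-compatible trace for downstream analysis.
    prof.export_chrome_trace("trace_iter550.json")

with profile(
    # Drop ProfilerActivity.CUDA on a CPU-only machine.
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    # Wait out 548 steps, warm up for 2, then record iterations 550-554.
    schedule=schedule(wait=548, warmup=2, active=5, repeat=1),
    on_trace_ready=trace_handler,
    with_stack=True,      # capture Python stacks for source attribution
    profile_memory=True,  # record allocator events
) as prof:
    for _ in range(600):
        train_one_iteration()
        prof.step()  # advance the profiling schedule
```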
Comprehensive Data Capture
During each profiling session, Zoomer simultaneously collects multiple data streams to build a holistic performance picture:
GPU Performance Metrics: SM utilization, GPU memory utilization, GPU busy time, memory bandwidth, Tensor Core utilization, power consumption, and clock frequencies via DCGM integration (see the telemetry sketch after this list).
Detailed Execution Traces: Kernel-level GPU operations, memory transfers, CUDA API calls, and communication collectives via PyTorch Profiler and Kineto.
Host-Level Performance Data: CPU utilization, memory usage, network I/O, storage access patterns, and system-level bottlenecks via dyno telemetry.
Application-Level Annotations: Training iterations, forward/backward passes, optimizer steps, data loading phases, and custom user annotations.
Inference-Specific Data: Rate of inference requests, server latency, active requests, GPU memory allocation patterns, request latency breakdowns via Strobelight’s Crochet profiler, serving parameter analysis, and thrift request-level profiling.
Communication Analysis: NCCL collective operations, inter-node communication patterns, and network utilization for distributed workloads.
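For a sense of what the device-level stream looks like, here is a hedged sketch of GPU telemetry sampling using NVIDIA’s NVML Python bindings (the nvidia-ml-py package). Meta’s pipeline uses DCGM and dyno rather than this exact stack; the sketch only illustrates the kind of per-device counters involved:

```python
import time

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

def sample():
    rows = []
    for i, h in enumerate(handles):
        util = pynvml.nvmlDeviceGetUtilizationRates(h)   # SM and memory, %
        mem = pynvml.nvmlDeviceGetMemoryInfo(h)          # bytes used/total
        power_mw = pynvml.nvmlDeviceGetPowerUsage(h)     # milliwatts
        sm_mhz = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_SM)
        rows.append({"gpu": i,
                     "sm_util_pct": util.gpu,
                     "mem_util_pct": util.memory,
                     "mem_used_gib": mem.used / 2**30,
                     "power_w": power_mw / 1000,
                     "sm_clock_mhz": sm_mhz})
    return rows

for _ in range(10):  # sample once per second for ~10 seconds
    print(sample())
    time.sleep(1)
pynvml.nvmlShutdown()
```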
Distributed Analysis Pipeline
Raw profiling data flows through sophisticated processing systems that deliver multiple types of automated analysis including:
Straggler Detection: Identifies slow ranks in distributed training through comparative analysis of execution timelines and communication patterns (a minimal sketch follows this list).
Critical Path Analysis: Systematically identifies the longest execution paths to focus optimization efforts on highest-impact opportunities.
Anti-Pattern Detection: Rule-based systems that identify common efficiency issues and generate specific recommendations.
Parallelism Analysis: Deep understanding of tensor, pipeline, data, and expert parallelism interactions for large-scale distributed training.
Memory Analysis: Comprehensive analysis of GPU memory usage patterns, allocation tracking, and leak detection.
Load Imbalance Analysis: Detects workload-distribution issues across distributed ranks and recommends optimizations.
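The straggler check reduces to a simple idea: because collectives synchronize all ranks, one slow rank delays everyone, so any rank whose step time is a statistical outlier gets flagged. A minimal sketch over per-rank timings (Zoomer’s real analyzers work on full traces and communication patterns):

```python
import statistics

def find_stragglers(step_time_ms_by_rank: dict[int, float],
                    threshold_sigma: float = 3.0) -> list[int]:
    """Flag ranks whose step time is an outlier relative to the fleet."""
    times = list(step_time_ms_by_rank.values())
    mean = statistics.fmean(times)
    stdev = statistics.pstdev(times)
    if stdev == 0:
        return []
    return [rank for rank, t in step_time_ms_by_rank.items()
            if (t - mean) / stdev > threshold_sigma]

# Example: rank 13 is ~25% slower than its peers and stalls every collective.
timings = {rank: 100.0 for rank in range(64)}
timings[13] = 125.0
print(find_stragglers(timings))  # -> [13]
```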
Multi-Format Output Generation
Results are presented through multiple interfaces tailored to different user needs: interactive timeline visualizations showing activity across all ranks and hosts, comprehensive metrics dashboards with drill-down capabilities and percentile analysis, trace viewers integrated with Perfetto for detailed kernel inspection, automated insights summaries highlighting key bottlenecks and recommendations, and actionable notebooks that users can clone to rerun jobs with suggested optimizations.
Specialized Workload Support
For massive distributed training of specialized workloads like GenAI, Zoomer includes a purpose-built platform for LLM workloads, offering specialized capabilities such as GPU-efficiency heat maps and N-dimensional parallelism visualization. For inference, specialized analysis covers single-GPU models today and will soon expand to massive distributed inference across thousands of servers.
A Glimpse Into Advanced Zoomer Capabilities
Zoomer offers an extensive suite of advanced capabilities designed for different AI workload types and scales. While a comprehensive overview of all features would require multiple blog posts, here’s a glimpse at some of the most compelling capabilities that demonstrate Zoomer’s depth:
Training Powerhouse Features:
Straggler Analysis: Helps identify ranks in distributed training jobs that are significantly slower than others, causing overall job delays due to synchronization bottlenecks. Zoomer provides information that helps diagnose root causes like sharding imbalance or hardware issues.
Critical Path Analysis: Identification of the longest execution paths in PyTorch applications, enabling accurate performance improvement projections.
Advanced Trace Manipulation: Sophisticated tools for compression, filtering, combination, and segmentation of massive trace files (2GB+ per rank), enabling analysis of large-scale training jobs that were previously impossible to process (a minimal sketch follows).
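To make the trace-manipulation item concrete, here is a hedged sketch of one such primitive: slicing a Chrome-format trace (the format PyTorch/Kineto exports) down to a time window and GPU-side events, so a multi-gigabyte file becomes small enough to open in a viewer. The category names follow common Kineto output; Zoomer’s internal tooling is far more elaborate:

```python
import gzip
import json

def slice_trace(path: str, out_path: str,
                start_us: float, end_us: float) -> None:
    """Keep only GPU-side events inside [start_us, end_us]."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        trace = json.load(f)

    def keep(ev):
        ts = ev.get("ts", 0)
        return (start_us <= ts <= end_us
                and ev.get("cat") in ("kernel", "gpu_memcpy", "gpu_memset"))

    trace["traceEvents"] = [e for e in trace.get("traceEvents", []) if keep(e)]
    with gzip.open(out_path, "wt") as f:
        json.dump(trace, f)

# e.g. keep the window between 1s and 2s of a rank-0 trace:
# slice_trace("rank0.json.gz", "rank0_window.json.gz", 1e6, 2e6)
```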
Inference Excellence Features:
Single-Click QPS Optimization: A workflow that identifies bottlenecks and triggers automated load tests with one click, reducing optimization time while delivering QPS improvements of +2% to +50% across different models, depending on model characteristics.
Request-Level Deep Dive: Integration with Crochet profiler provides Thrift request-level analysis, enabling identification of queue time bottlenecks and serving inefficiencies that traditional metrics miss.
Real-Time Memory Profiling: GPU memory-allocation tracking, providing live insights into memory leaks, allocation patterns, and optimization opportunities (a minimal sketch follows).
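Outside Meta, a comparable allocation history can be captured with PyTorch’s memory-snapshot API (underscore-prefixed but publicly documented). A minimal sketch, with a stand-in workload in place of the real suspect job:

```python
import torch

# Start recording allocator events (allocations, frees, stack traces).
torch.cuda.memory._record_memory_history(max_entries=100_000)

# ... run the suspect workload; a stand-in here ...
x = torch.randn(4096, 4096, device="cuda")
y = x @ x
del x, y

# Dump a pickle that can be inspected offline or rendered in the
# interactive viewer at pytorch.org/memory_viz.
torch.cuda.memory._dump_snapshot("memory_snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
```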
GenAI Specialized Features:
LLM Zoomer for Scale: A purpose-built platform supporting 100k+ GPU workloads with N-dimensional parallelism visualization, GPU efficiency heat maps across thousands of devices, and specialized analysis for tensor, pipeline, data, and expert parallelism interactions.
Post-Training Workflow Support: Enhanced capabilities for GenAI post-training tasks including SFT, DPO, and ARPG workflows with generator and trainer profiling separation.
Universal Intelligence Features:
Holistic Trace Analysis (HTA): Advanced framework for diagnosing distributed-training bottlenecks across communication overhead, workload imbalance, and kernel inefficiencies, with automatic load-balancing recommendations (see the sketch after this list).
Zoomer Actionable Recommendations Engine (Zoomer AR): Automated detection of efficiency anti-patterns with machine learning-driven recommendation systems that generate auto-fix diffs, optimization notebooks, and one-click job re-launches with suggested improvements.
Multi-Hardware Profiling: Native support across NVIDIA GPUs, AMD MI300X, MTIA, and CPU-only workloads with consistent analysis and optimization recommendations regardless of hardware platform.
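HTA is also available as open source (github.com/facebookresearch/HolisticTraceAnalysis, installable as the HolisticTraceAnalysis package). A short sketch of its public API on a directory of per-rank Kineto traces; the method names come from the open-source release and may differ from Zoomer’s internal integration, and the trace path is hypothetical:

```python
from hta.trace_analysis import TraceAnalysis

# Directory containing one Kineto trace file per rank (hypothetical path).
analyzer = TraceAnalysis(trace_dir="/tmp/traces/job_1234")

# Time split between computation, communication, and idle, per rank.
temporal_df = analyzer.get_temporal_breakdown()

# How well communication overlaps with computation (higher is better).
overlap_df = analyzer.get_comm_comp_overlap()

# Which GPU kernels dominate the run.
kernel_breakdown = analyzer.get_gpu_kernel_breakdown()

print(temporal_df.head())
```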
Zoomer’s Optimization Impact: From Debugging to Energy Efficiency
Performance debugging with Zoomer creates a cascading effect that transforms low-level optimizations into massive efficiency gains.
The optimization pathway flows from: identifying bottlenecks → improving key metrics → accelerating workflows → reducing resource consumption → saving energy and costs.
Zoomer’s Training Optimization Pipeline
Zoomer’s training analysis identifies bottlenecks in GPU utilization, memory bandwidth, and communication patterns.
Example of Training Efficiency Wins:
Algorithmic Optimizations: We delivered power savings through systematic efficiency improvements across the training fleet by fixing reliability issues in low-efficiency jobs.
Training Time Reduction Success: In 2024, we observed a 75% training-time reduction for Ads relevance models, leading to a 78% reduction in power consumption.
Memory Optimizations: A one-line code change fixing an inefficient memory copy identified by Zoomer delivered 20% QPS improvements with minimal engineering effort (illustrated below).
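The exact Zoomer-identified change isn’t public, but a classic example of this class of fix is replacing a synchronous host-to-device copy with a pinned-memory, non-blocking one, which lets the transfer overlap with compute. A minimal sketch (requires a CUDA device):

```python
import torch

model = torch.nn.Linear(1024, 10).cuda()
# Pinned (page-locked) host memory is what makes async copies possible.
batches = [torch.randn(256, 1024).pin_memory() for _ in range(8)]

for batch in batches:
    # non_blocking=True lets the host-to-device copy proceed asynchronously,
    # overlapping with the previous iteration's compute. With pageable
    # (non-pinned) memory the flag is silently ignored and the copy blocks.
    x = batch.to("cuda", non_blocking=True)
    out = model(x)
```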
Zoomer’s Inference Optimization Pipeline
Inference debugging focuses on latency reduction, throughput optimization, and serving efficiency. Zoomer identifies opportunities in kernel execution, memory access patterns, and serving parameter tuning to maximize requests per GPU.
Inference Efficiency Wins:
GPU and CPU Serving-Parameter Improvements: Automated GPU and CPU bottleneck identification and parameter tuning, leading to 10% to 45% reductions in power consumption.
QPS Optimization: GPU trace analysis used to boost serving QPS and optimize serving capacity.
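A toy sketch of what automated serving-parameter tuning reduces to: sweep a knob (batch size here) against a measurement harness and keep the setting that maximizes throughput within a latency budget. The measurement function is a hypothetical stand-in for integration with a real load-testing system:

```python
def tune_batch_size(measure_qps_and_latency,
                    candidates=(1, 2, 4, 8, 16, 32),
                    p99_budget_ms=50.0):
    """Return (batch_size, qps) maximizing QPS under the latency budget,
    or None if no candidate meets the budget. measure_qps_and_latency is
    a hypothetical load-test hook returning (qps, p99_latency_ms)."""
    best = None
    for bs in candidates:
        qps, p99_ms = measure_qps_and_latency(batch_size=bs)
        if p99_ms <= p99_budget_ms and (best is None or qps > best[1]):
            best = (bs, qps)
    return best
```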
Zoomer’s GenAI and Large-Scale Impact
For massive distributed workloads, even small optimizations compound dramatically. Optimizations to a 32k-GPU benchmark achieved 30% speedups by resolving a broadcast issue, while 64k-GPU configurations delivered 25% speedups after just one day of optimization.
The Future of AI Performance Debugging
As AI workloads expand in size and complexity, Zoomer is advancing to meet new challenges on several innovation fronts: broadening unified performance insights across heterogeneous hardware (including MTIA and next-gen accelerators), building advanced analyzers for proactive optimization, enabling inference performance tuning through serving-parameter optimization, and democratizing optimization with automated, intuitive tools for all engineers. As Meta’s AI infrastructure continues its rapid growth, Zoomer plays an important role in helping us innovate efficiently and sustainably.
Concerns about stock valuations in companies tied to artificial intelligence knocked the market around this week. Whether these worries will recede, as they did Friday, or flare up again will certainly be something to watch in the days and weeks ahead.

We understand the concerns about valuations in the speculative corners of the AI trade, such as nuclear stocks and neoclouds. Jim Cramer has repeatedly warned about them. But in the past week, the broader AI cohort, including real companies that make money and are driving what many are calling the fourth industrial revolution, has been getting hit. We own many of them: Nvidia and Broadcom on the chip side, and GE Vernova and Eaton on the derivative trade of powering these energy-gobbling AI data centers. That’s not what should be happening based on their fundamentals.

Outside of valuations, worries also center on capital expenditures and the depreciation that results from massive investments in AI infrastructure. On this point, investors face a choice. You can go with the bears who are glued to their spreadsheets, extrapolating the usable life of tech assets from history (a seemingly understandable approach) and applying those depreciation rates to their financial models, arguing the chips should be near worthless after three years. Or you can go with the commentary from the management teams running the largest companies driving the AI trade, and with what Jim has gleaned from talking with the smartest CEOs in the world.

When it comes to the real players driving this AI investment cycle, like the ones we’re invested in, we don’t think valuations are all that high or unreasonable when you consider their growth rates and their importance to the U.S. and, by extension, the global economy.

Take Nvidia CEO Jensen Huang, who would tell you that advancements in his company’s CUDA software have extended the life of GPU chip platforms to roughly five to six years. Don’t forget, CoreWeave recently re-contracted for H100s from Nvidia, which were released in late 2022. The bears with their spreadsheets would tell you those chips are worthless; in fact, H100s have held most of their value. Or listen to Lisa Su, CEO of Advanced Micro Devices, who said last week that her customers are at the point now where “they can see the return on the other side” of these massive investments.

For our part, we understand the spending concerns and the depreciation issues that will arise if these companies are indeed overstating the useful lives of these assets. However, those who have bet against the likes of Jensen Huang and Lisa Su, or Meta Platforms CEO Mark Zuckerberg, Microsoft CEO Satya Nadella, and others who have driven innovation in the tech world for over a decade, have been burned time and again.

While the bears’ concerns aren’t invalid, long-term investors are better off taking their cues from technology experts. AI is real, and it will increasingly lead to productivity gains as adoption ramps up and the technology becomes ingrained in our everyday lives, just as the internet has. We have faith in the management teams of the AI stocks in which we are invested. And while faith is not an investment strategy, that faith is based on a historical track record of strong execution, the knowledge that these companies’ offerings are best in class, and scrutiny of their underlying business fundamentals and financial profiles.
Siding with these technology-expert management teams, over the loud financial-expert bears, has kept us on the right side of the trade for years, and we don’t see that changing. (See here for a full list of the stocks in Jim Cramer’s Charitable Trust, including NVDA, AVGO, GEV, ETN, META, MSFT.)