Category: 3. Business

  • China’s Ming Yang to invest up to £1.5bn in Scottish turbine factory

    One of China’s largest turbine makers has announced plans to invest up to £1.5bn in a new factory in Scotland, setting up a test of the UK government’s appetite for investment from Chinese companies.

    The privately owned company, based in Guangdong, said it wanted to build a factory to serve offshore wind projects in the UK, Europe and some other markets, with its preferred location being Ardersier Port near Inverness.

    The announcement marks the first time Ming Yang has spoken publicly in detail about the plans, following what it described as “extensive discussions” with the UK and Scottish governments over the past two years. 

    The company on Friday confirmed the plans were “subject to final approvals from the UK government”, which has come under pressure from some MPs and US officials to reject the proposed investment due to concerns about China’s involvement in critical national infrastructure. 

    While Ming Yang is not state-owned, critics argue there is a risk of interference from Beijing in private companies’ decision-making and are concerned about over-reliance on China in supply chains more widely. 

    The move comes as the UK government’s relationship with China is under scrutiny amid questions over its role in the collapse of the prosecution of two men accused of spying on parliamentarians at Westminster on behalf of China.

    In June, the Financial Times reported that the Trump administration had raised concerns with the UK over what it argued were national security risks attached to allowing Ming Yang to build a plant in Britain.

    But the government also wants to boost supply chains to help meet its clean energy goals, including its target to decarbonise the power sector by 2030.

    Ming Yang said the first phase of its planned factory could be in production by late 2028. The company is listed in Shanghai and trades global depositary receipts in London.

    Ardersier Port near Inverness, the preferred location for the proposed Ming Yang factory

    One government official said a decision on whether to allow Ming Yang to go ahead with the factory was “imminent”.

    In September, Ming Yang announced a partnership with Octopus Energy, the UK’s largest household energy supplier, whose chief executive, Greg Jackson, is a non-executive director on the UK government’s cabinet office board.

    As well as government approval, Ming Yang is seeking some co-investment for the site. It said it had held “detailed commercial discussions” with entities including the UK’s state-owned Great British Energy, which was set up by the current government to help Britain become a “global leader in clean energy”.

    Given supply chain constraints across Europe, many regard Ming Yang’s capacity as necessary to unlock the full potential of Scotland’s offshore wind sector, especially nascent floating wind technology. Ming Yang said the project could create up to 1,500 jobs in its initial phase.

    The Scottish government’s industrial strategy has identified floating wind as a sector providing a “first-mover advantage”. But it has been waiting for UK approval for the factory, including the security services’ review into the implications of introducing advanced Chinese technology into energy infrastructure.

    One person familiar with the discussions said there had “clearly” been a delay in that process.

    “Patience is a finite resource — lots of investment and jobs await this decision,” the person said.

    Kate Forbes, deputy first minister of Scotland, speaks at a wind farm last year © Iain Masterton/Alamy

    Last year, Kate Forbes, deputy first minister, said there was “room” for Ming Yang to open a factory in Scotland, given ambitions for an “enormous” transition to renewables.

    The Scottish government had “no reason” to have an issue with Ming Yang but any investment would require UK approval, the person added.

    Scotland, which operates about 3 gigawatts of offshore wind, has a potential pipeline of a further 40GW, including about 25GW of floating capacity.

    A UK government spokesperson said: “This is one of a number of companies that wants to invest in the UK. Any decisions made will be consistent with our national security.”

    The Scottish government did not immediately respond to a request for comment.

  • France Risks Trigger Fresh Downgrade

    This article first appeared on GuruFocus.

    Goldman Sachs (NYSE:GS) analysts have turned more cautious on European bank debt, cutting their recommendation on euro-denominated high-grade bonds by banks to underweight, only months after scrapping their earlier bullish stance. Led by Lotfi Karoui, the team noted that bank bonds are now trading at even tighter spreads than the rest of the market, leaving little room for additional value. The analysts said the spread premium that once made the sector appealing is decidedly a thing of the past, and warned that tight valuations could limit upside potential from here.

    The downgrade follows a strong rally in European bank bonds this year, as investors piled into the sector amid optimism over the industry’s recovery. But Goldman’s analysts cautioned that fiscal risks, especially in France, are beginning to resurface, posing potential headwinds for the market. Political uncertainty in Paris has already pressured some French bank bonds, and Goldman said the fluid backdrop and challenging fiscal outlook make French issuers particularly vulnerable. The firm added that sovereign risks could be more damaging to bank bonds than to other corporate debt.

    In late June, Goldman had already shifted both its U.S. dollar and euro bank bond calls to neutral, ending an overweight that had been in place since early 2024. The move to underweight suggests a more defensive tone, as the team believes investors may find better risk-reward opportunities elsewhere in credit markets. With spreads compressed and fiscal clouds gathering over Europe’s largest economies, Goldman’s message to clients is clear: the easy gains in bank debt could be behind us.

  • Apple sued over use of copyrighted books to train Apple Intelligence

    Oct 10 (Reuters) – Apple (AAPL.O) was hit with a lawsuit in California federal court by a pair of neuroscientists who say the tech company misused thousands of copyrighted books to train its Apple Intelligence artificial intelligence model.
    Susana Martinez-Conde and Stephen Macknik, professors at SUNY Downstate Health Sciences University in Brooklyn, New York, told the court in a proposed class action on Thursday that Apple used illegal “shadow libraries” of pirated books to train Apple Intelligence.

    A separate group of authors sued Apple last month for allegedly misusing their work in AI training.

    TECH COMPANIES FACING LAWSUITS

    The lawsuit is one of many high-stakes cases brought by copyright owners such as authors, news outlets, and music labels against tech companies, including OpenAI, Microsoft (MSFT.O), and Meta Platforms (META.O), over the unauthorized use of their work in AI training. In August, Anthropic agreed to pay $1.5 billion to settle a lawsuit from another group of authors over the training of its AI-powered chatbot Claude.

    Spokespeople for Apple and Martinez-Conde, Macknik, and their attorney did not immediately respond to requests for comment on the new complaint on Friday.

    Apple Intelligence is a suite of AI-powered features integrated into iOS devices, including the iPhone and iPad.

    “The day after Apple officially introduced Apple Intelligence, the company gained more than $200 billion in value: ‘the single most lucrative day in the history of the company,’” the lawsuit said.

    According to the complaint, Apple utilized datasets comprising thousands of pirated books as well as other copyright-infringing materials scraped from the internet to train its AI system.

    The lawsuit said that the pirated books included Martinez-Conde and Macknik’s “Champions of Illusion: The Science Behind Mind-Boggling Images and Mystifying Brain Puzzles” and “Sleights of Mind: What the Neuroscience of Magic Reveals About Our Everyday Deceptions.”

    The professors requested an unspecified amount of monetary damages and an order for Apple to stop misusing their copyrighted work.

    Reporting by Blake Brittain in Washington, Editing by Alexia Garamfalvi and Rod Nickel

    Our Standards: The Thomson Reuters Trust Principles.

  • Gold pares gains after brief run above $4,000/oz on Trump's China tariff warning – Reuters

    1. Gold pares gains after brief run above $4,000/oz on Trump’s China tariff warning – Reuters
    2. Gold surges past $4,000 an ounce as uncertainty fuels rally – BBC
    3. Gold’s record run creates new rulebooks for investors – Reuters
    4. Gold is hitting new highs — here’s one way to hedge a potential price pull-back – CNBC
    5. Gold tops $4,000 for first time as traders pile into safe haven – Dawn

  • Major banks explore issuing stablecoin pegged to G7 currencies – Reuters

    1. Major banks explore issuing stablecoin pegged to G7 currencies – Reuters
    2. Goldman Sachs, Citi, Bank Of America To Walk Through The Door Opened By Trump-Backed GENIUS Act – Yahoo Finance
    3. Goldman, Santander Among Banks Exploring Blockchain-Based Money – Bloomberg.com
    4. 10 Banks Partner to Explore Issuing Digital Money – PYMNTS.com
    5. Group of leading international banks explores issuance of a 1:1 reserve-backed form of digital money – group.bnpparibas

  • European Commission says existing rules address stablecoin risks

    Oct 10 (Reuters) – Europe’s crypto rules do enough to address the risks around stablecoins, the European Commission said on Friday, signalling it does not see the need for major change after the European Central Bank called for more safeguards.

    Stablecoins – cryptocurrencies pegged to real-world currencies – are among the fastest-growing parts of the digital assets industry, with the U.S. this year passing legislation to promote their usage.

    Europe has launched a landmark set of crypto-specific rules, but lawmakers in Brussels are facing pressure from the ECB to block the so-called “multi-issuance” stablecoin model.

    ECB CALLS FOR SAFEGUARDS

    At the heart of the dispute is the question of whether a multinational stablecoin company can treat the tokens it issues within the EU as interchangeable with those held outside the EU.

    In a letter sent to European Commissioner Maria Luis Albuquerque on Tuesday, six crypto industry associations, whose members include major stablecoin issuer Circle (CRCL.N), called on the EU to publish guidance “confirming multi-issuance in principle” and clarify how it works under the EU’s crypto rules, called MiCA.

    “We believe MiCA provides a robust and proportionate framework for addressing risks stemming from stablecoins,” a Commission spokesperson told Reuters in emailed comments, acknowledging receipt of the letter.

    “The Commission is working towards providing such clarification as soon as possible.”

    The European Systemic Risk Board, headed by ECB President Christine Lagarde, has said that the multi-issuance structure would bring built-in risks for financial stability and called for urgent safeguards.
    The ECB is concerned that people holding tokens created by a stablecoin’s non-EU entity could choose to redeem them with the EU entity, potentially creating a run on reserves held within the EU.

    But stablecoin issuers say that they can make sure they always have enough reserves to meet redemption requests, wherever they take place.

    JP Morgan analysts said this week that 99% of stablecoin supply is pegged to the dollar, and that the sector’s growth would boost demand for the greenback.

    Reporting by Elizabeth Howcroft. Editing by Tommy Reggiori Wilkes and Mark Potter


  • Temporal recurrence as a general mechanism to explain neural responses in the auditory system

    Temporal recurrence as a general mechanism to explain neural responses in the auditory system

    We begin this section with a summary of common models of auditory neural responses and introduce a novel, Transformer-based architecture, as well as our fully recurrent model called StateNet. Then, we describe the electrophysiology datasets used in this study and set out the mathematical framework of the neural response fitting task. Because conventional models and most datasets were already described in a prior study from our group67, we invite readers to consult it for a more detailed description of both. Lastly, we explain our process for reverse-engineering models, which generalizes STRFs to arbitrary network depth, width, and degree of nonlinearity.

    Canonical computational models of auditory neural responses

    The aim of computational models of neural responses in auditory cortex is to convert (“encode”) incoming sound stimuli into time-varying firing rates/probabilities that predict electrophysiological measurements made in auditory areas. Traditionally, these models use the cochleagram of the stimulus—a spectrogram-like representation that mimics processing in the cochlea—and are rate-based (as opposed to spiking). Many such models have been proposed in the literature, ranging from simple Linear (“STRF” or “L”) approaches17,18 to more complex methods based on multi-layer CNNs31. However, all these models use temporal convolutions with finite window lengths and, therefore, finite TRFs. In this case, the duration of the TRF is a hyperparameter that is arbitrarily defined by the modeling scientist.

    More formally, a model $\mathcal{M}$ is a causal map $\mathbb{R}^{F\times T}\mapsto \mathbb{R}^{N\times T}$, where F is the number of frequency bands of a stimulus spectrogram, T a variable number of time steps, and N the number of units/channels whose activity is to be predicted.

    In this paper, we use five models based on this approach: the Linear (L) model17,18, the Linear-Nonlinear (LN) model45,68, the Network Receptive Field (NRF) model22, the Dynamic Network (DNet) model25, and finally a deep 2D-CNN model31. A general architecture of a convolutional model is illustrated in Fig. 7a. Because a full description of these approaches was already provided in previous benchmarks27, we only review here some of their shared general features. We also report minor modifications that we introduced in their implementations so as to make them work on our unified PyTorch pipeline. We invite the reader to consult the original studies that introduced these models for a more detailed description of their functioning.

    Auditory periphery

    The initial processing stage converts the stimulus sound waveform into a biologically plausible spectrogram-like representation $x\in\mathbb{R}^{F\times T}$, reflecting operations realized by the cochlea. In the literature, the waveform-to-spectrogram transformation is performed either through a simple short-term Fourier decomposition or, more often, through temporal convolutions with a bank of mel or gammatone filters scaled logarithmically along the frequency axis. A compressive function such as a cubic root or logarithm is then applied. Although the combination of these two operations is standard, implementations vary across studies. However, it has been shown that such variations are largely equivalent and still provide good cochlear sound encodings when modeling higher-order auditory neural responses; as a result, simple transformations should be preferred69. To facilitate present and future comparisons with previous methods, and to limit as much as possible the introduction of biases due to different data pre-processing, we directly use the cochleagrams provided with each dataset.

    Core principle

    Classical models rely on a cascade of temporal convolutions with a stride of 1, performed on the cochleagram of the sound stimulus, interleaved with standard nonlinear activation functions (e.g., Sigmoid, LeakyReLU) and followed by a parametric output nonlinearity with learnable parameters (e.g., baseline activity, slope, saturation value). In all models, the cochleagram is systematically padded to the left (i.e., in the past) with zeros prior to the temporal convolution operations, in order to respect causality and to output a time series of neural activity with as many time bins as the input cochleagram.
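    As a concrete illustration, the left-padding scheme can be sketched in numpy for a single STRF-like filter (the function name and single-output shape are illustrative assumptions; the actual models use multi-channel PyTorch convolutions):

```python
import numpy as np

def causal_temporal_conv(cochleagram, kernel):
    """Causal 1-D temporal convolution over a (F, T) cochleagram.

    kernel has shape (F, K): one temporal filter per frequency band,
    summed across bands (a single-output, STRF-like filter).
    The input is left-padded with K-1 zeros so the output at time t
    depends only on inputs at times <= t (causality), and the output
    keeps the same number of time bins T as the input.
    """
    F, T = cochleagram.shape
    K = kernel.shape[1]
    padded = np.pad(cochleagram, ((0, 0), (K - 1, 0)))  # zeros in the past
    out = np.empty(T)
    for t in range(T):
        # window covering time steps t-K+1 .. t (in padded coordinates)
        out[t] = np.sum(padded[:, t:t + K] * kernel)
    return out
```

    With a kernel whose weight sits entirely on the most recent tap, the output at each bin reduces to the current spectral frame summed across bands, which is a quick sanity check of the padding alignment.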

    Single unit vs. population fitting

    In datasets where all sensory neurons were probed with the same set of stimuli, computational models can predict the (vector) activity of the whole population31. This population coding paradigm allows a single model to be trained with some learnable parameters shared across all neural units under consideration and some specific to each unit. As a result, the common backbone tends to learn robust and meaningful embeddings, which further reduces overfitting. Performance is, on average, better across the population than when fitting an entire model per unit. Furthermore, this process drastically reduces training time, bringing its computational complexity down to O(1) instead of O(N), where N is the total number of units in each dataset. For these reasons, we adopt the population coding paradigm whenever possible, that is, when multiple single-unit responses were recorded for the same stimuli. This is the case for the NS1, NAT4-A1, NAT4-PEG, AA1-MLd and AA1-Field_L datasets.
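    The shared-backbone idea can be sketched with a toy linear model (all names, dimensions, and the linear backbone are illustrative stand-ins for the actual architectures): one forward pass through the shared weights predicts all N units at once.

```python
import numpy as np

rng = np.random.default_rng(0)

F, T, N, E = 16, 100, 8, 4      # freq bins, time steps, units, embed dim
x = rng.normal(size=(F, T))     # cochleagram

# Shared backbone: one embedding used for every unit
W_shared = rng.normal(size=(E, F))
h = W_shared @ x                # (E, T) embedding common to all units

# Per-unit parameters: one readout vector per neuron
W_units = rng.normal(size=(N, E))
r_hat = W_units @ h             # (N, T): whole-population prediction in one pass
```
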

    Output nonlinearity

    All but the L model were equipped with a parametric nonlinear output activation function, learned alongside all other parameters through gradient descent. We used the following 4-parameter double exponential:

    $$f(x)=a\exp(-\exp(kx-s))+b$$

    (1)

    where b represents the baseline spike rate, a the saturated firing rate, s the firing threshold, and k the gain21,31. Importantly, in the case of population models (i.e., when predicting the simultaneous activity of several units, see below), each output neuron learns a different set of these four parameters.
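    Equation (1) translates directly to numpy (the function name is ours; parameter roles follow the text):

```python
import numpy as np

def double_exponential(x, a, b, k, s):
    """f(x) = a * exp(-exp(k*x - s)) + b  (Eq. 1).

    b: baseline spike rate, a: saturated firing rate above baseline,
    s: firing threshold, k: gain. The output is smoothly bounded
    between b and a + b for any real input.
    """
    return a * np.exp(-np.exp(k * x - s)) + b
```

    In a population model, each output neuron would carry its own (a, b, k, s) quadruple, broadcast across time bins.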

    Regularization and parameterization

    Canonical models based on convolutions are prone to overfitting, and many strategies have been proposed to limit this effect, such as the parameterization of spectro-temporal convolutional kernels21,60,68,70. To stick to the most extensively reviewed version of these canonical models, as well as to highlight their limitations, our implementation did not include any such methods. Furthermore, we did not use any data augmentation techniques, weight decay, or dropout during training, as it was previously shown that such approaches complicate training and yield little to no improvement in performance27,31. Instead, we used Batch Normalization (BN), which greatly improved the robustness and performance of all models, including the linear one (L), without compromising its nature: after training, BN’s scale and bias terms can be absorbed into the model’s own weights and biases.
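    The absorption of BN into the preceding layer can be sketched for a linear layer in inference mode (a numpy stand-in with hypothetical names; the actual pipeline uses PyTorch BatchNorm with its running statistics frozen):

```python
import numpy as np

rng = np.random.default_rng(1)
D_in, D_out = 5, 3

# A trained linear layer followed by a frozen (inference-mode) BatchNorm
W = rng.normal(size=(D_out, D_in))
c = rng.normal(size=D_out)
gamma, beta = rng.normal(size=D_out), rng.normal(size=D_out)    # BN scale/bias
mu, var = rng.normal(size=D_out), rng.uniform(0.5, 2.0, D_out)  # BN running stats
eps = 1e-5

def linear_then_bn(x):
    y = W @ x + c
    return gamma * (y - mu) / np.sqrt(var + eps) + beta

# Fold BN into the layer's own weights and bias: the composite map is
# still affine, so the model stays strictly linear after training.
scale = gamma / np.sqrt(var + eps)
W_fold = scale[:, None] * W
c_fold = scale * (c - mu) + beta
```
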

    Transformer model

    In recent years, attention-based Transformer architectures47 have been increasingly used by the AI community as an alternative to RNNs for modeling long sequences57, from text56 to images55. Unlike stateful approaches, which rely on BPTT for training, Transformers do not suffer from vanishing or exploding gradients. However, they have drawbacks of their own, such as an algorithmic complexity that scales quadratically with sequence length. To investigate whether this class of model is well suited to fitting dynamic neural responses in the auditory cortex, we developed a novel architecture based on the attention mechanism (see Fig. 7b). To the best of our knowledge, it is the first model of its kind proposed for this task.

    As for stateless models, a hyperparameter T defines the length of the temporal context window used to predict single-unit or population activity at the current time step. Within this window, the spectrogram of the auditory stimulus is projected into T tokens (one per time step) of embedding size E by a fully connected layer applied to each frequency vector. A learnable positional embedding is then added to this compressed spectrogram representation before it is fed to a Transformer encoder with 1 layer, 4 heads, and a dimensionality of 48 (ref. 47). These hyperparameters were fixed for all datasets. Since the Transformer encoder outputs T processed tokens, we apply global average pooling over the token dimension and use the resulting tensor as the input to a final fully connected readout layer, followed by a double-exponential activation function with per-unit learnable parameters. We observed empirically that the global average pooling operation is crucial for reaching good performance while drastically reducing the size of the last fully connected layer.
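    The token pipeline can be sketched in numpy with a single attention head standing in for the 4-head encoder layer (all weights are random stand-ins and the dimensions are illustrative, not those used in the study):

```python
import numpy as np

rng = np.random.default_rng(2)
F, T, E, N = 16, 20, 8, 4   # freq bins, context length, embed dim, units

x = rng.normal(size=(F, T))                 # spectrogram context window

# 1. Tokenize: one token per time step via a shared frequency projection,
#    plus a (here random) stand-in for the learnable positional embedding.
W_embed = rng.normal(size=(E, F)) / np.sqrt(F)
tokens = (W_embed @ x).T + 0.01 * rng.normal(size=(T, E))   # (T, E)

# 2. Single-head self-attention (stand-in for the encoder layer)
Wq, Wk, Wv = (rng.normal(size=(E, E)) / np.sqrt(E) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
scores = Q @ K.T / np.sqrt(E)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)     # row-wise softmax
encoded = attn @ V                          # (T, E)

# 3. Global average pooling over tokens, then per-unit linear readout
pooled = encoded.mean(axis=0)               # (E,)
W_read = rng.normal(size=(N, E))
activity = W_read @ pooled                  # (N,): one value per unit
```

    Pooling before the readout means the final layer needs only E inputs instead of T × E, which is the size reduction mentioned above.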

    StateNet models

    A high-level schematic of the processing realized by our StateNet models is provided in Fig. 7c. Their architecture can be decomposed into three main elements: a downsampling layer, a stateful bottleneck, and a readout.

    Downsampling locally connected (LC) layer

    At each time step, we downsample the current vector of spectral information to reduce the dimensionality of input stimuli. Contrary to natural images, which are shift-invariant, spectrograms have very different statistics in low and high frequencies, making weight sharing a less efficient computational strategy. Further motivated by the tonotopic organization observed along the auditory pathway71,72, we use an LC layer with restricted receptive fields (as in convolutional layers) but with independent weights across frequency bands (see Supplementary Fig. S2). In other words, LC contains a subset of the weights of a fully connected (FC) layer, defined by a convolutional (CONV) connectivity pattern over the frequency axis. In theory, the performance obtained with this LC scheme is only a lower bound on what can be reached with FC. In practice, however, we found that LC yields similar or only slightly lower results with the same hyperparameters (see Supplementary section “Ablation study: connectivity in the first layer of StateNet”), but with fewer free learnable parameters, hence reducing the risk of overfitting and permitting better generalization. LC also outperformed the CONV approach because it relaxes the weight-sharing constraint.

    Despite its biological and computational motivations, this approach has only rarely been incorporated into models of auditory processing. For example, Chen et al.73 also used a local connectivity for speech recognition, but weight kernels were 2d (spectro-temporal) instead of 1d (only spectral and shared across the temporal dimension). In the field of computational neuroscience, Khatami and Escabí74 imposed local Gaussian kernels as fully connected weights. Our implementation differs in that it implements the trade-off between CONV and FC, with fewer parameters than FC, and possibly faster execution.

    All in all, our proposed LC downsampling scheme is more biologically plausible than the FC and CONV alternatives, while providing a better trade-off between performance on the neural response fitting task and model complexity. In addition, it executes faster than the FC approach and prior LC implementations. An optimized PyTorch module is available in our code repository.
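    The CONV/LC/FC trade-off can be made concrete on a single spectral frame (dimensions and names are ours, chosen for illustration): LC keeps the convolutional connectivity pattern but unties the weights across frequency positions.

```python
import numpy as np

rng = np.random.default_rng(3)
F, K = 12, 3                 # input bands, receptive-field width
F_out = F - K + 1            # outputs for stride 1, no padding

x = rng.normal(size=F)       # one spectral frame (single time step)

# CONV: a single kernel shared across all frequency positions (K weights)
w_conv = rng.normal(size=K)
y_conv = np.array([w_conv @ x[i:i + K] for i in range(F_out)])

# LC: same restricted receptive fields, but an independent kernel per
# position (F_out * K weights) -- a strict subset of FC's F_out * F weights
w_lc = rng.normal(size=(F_out, K))
y_lc = np.array([w_lc[i] @ x[i:i + K] for i in range(F_out)])
```

    Tying all LC kernels to the same values recovers the CONV output exactly, which is why CONV performance is a special case of what LC can express.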

    Stateful bottleneck(s)

    The stateful bottleneck is composed of a single layer of one of the RNN variants described below, as adding more layers did not improve performance in our preliminary experiments. RNNs are a class of artificial neural networks specifically developed to learn and process sequential inputs, notably temporal sequences. They work iteratively, building their output at each timestep from the current input as well as a constantly updated internal representation called the hidden state. Because the mathematical details of the modules used here are fully provided in previous studies, we only report their main properties below. We invite readers to consult the associated papers for a more thorough understanding of their computational principles.

    Vanilla RNN

    In this paper, we use “vanilla RNN” to designate the classical Elman network48 natively implemented in PyTorch, often considered the most naive implementation of this class of models.

    Gated RNNs: LSTM and GRU

    A notorious problem with vanilla RNNs arises with long sequences, as gradients can explode or vanish in the network unrolled over time75,76, preventing them from exploiting long-range dependencies and therefore from performing well at large time scales. Gated RNNs such as the LSTM49 and GRU50 successfully circumvent these difficulties and have established themselves as efficient modules for learning sequences recurrently.
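    A minimal numpy implementation of one GRU time step illustrates the gating mechanism (a sketch of the standard GRU equations, not the PyTorch module used in the study; biases are omitted for brevity):

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU time step (standard formulation, biases omitted).

    z is the update gate, r the reset gate. The mostly-linear update
    h <- (1 - z) * h + z * h_tilde lets gradients flow across many
    time steps, mitigating the vanishing-gradient problem.
    """
    Wz, Uz, Wr, Ur, Wh, Uh = params
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(Wz @ x + Uz @ h)             # update gate
    r = sig(Wr @ x + Ur @ h)             # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde
```

    Since the new state is a convex combination of the old state and a tanh-bounded candidate, the hidden state stays in (-1, 1) when initialized at zero.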

    State-space models (SSMs): S4 and Mamba

    State-space models (SSMs) are a recent class of models that take inspiration from other fields of applied mathematics, such as signal processing and control theory77. Specifically designed for sequence-to-sequence modeling tasks (and thus for time-series prediction, as in the present study), they build upon the state-space equations below, with various parameterization techniques and numerical optimization methods.

    $$\left\{\begin{array}{l}\dot{x}(t)=Ax(t)+Bu(t)\\ y(t)=Cx(t)+Du(t)\end{array}\right.$$

    (2)

    where $u\in\mathbb{R}$ is the input, $x\in\mathbb{R}^{N}$ the hidden state vector, $y\in\mathbb{R}$ the output, and A, B, C, and D the system matrices.

    In particular, the original Structured State-Space Sequence (S4) model is one of the simplest versions of this paradigm, with only a few constraints on the system51. At the other extreme, the Mamba architecture is one of the most recent and sophisticated variants, in which the system matrices are input-dependent; it has been shown to perform on par with Transformers on various benchmarks52. As a whole, SSMs hold the promise of data-scalable models with strong performance and a solid theoretical foundation that connects them to convolutional models (CNNs), to RNNs with discrete timesteps, and to continuous linear time-invariant systems of ordinary differential equations. This last property is particularly interesting because it can ease the reverse-engineering of a fitted model, allowing for high levels of interpretability. In addition, trained SSMs can easily be modified to work at any temporal resolution, opening interesting use cases for computational neuroscience and neural engineering.
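    The continuous system of Eq. (2) can be simulated with a simple forward-Euler discretization, which also illustrates why a trained SSM can be re-run at any temporal resolution by changing the step size (a naive sketch; S4 and Mamba use more sophisticated discretizations and parameterizations):

```python
import numpy as np

def simulate_ssm(A, B, C, D, u, dt):
    """Simulate x' = Ax + Bu, y = Cx + Du (Eq. 2) with forward-Euler steps.

    u is a 1-D array of scalar inputs sampled every dt; the same trained
    (A, B, C, D) can be re-discretized with any dt because the underlying
    system is continuous-time.
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        ys.append(C @ x + D * u_t)          # read out, then step the state
        x = x + dt * (A @ x + B * u_t)
    return np.array(ys)
```

    For a stable system (e.g., A = -I) driven by a constant input, the simulated output converges to the analytic steady state y* = C(-A)^{-1}Bu + Du, a quick check of the integration.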

    Readout

    The readout neural activity for a given unit is computed as a linear projection of the output state vector onto a single scalar, applied at each timestep.

    Electrophysiology datasets of auditory responses

    To characterize the ability of models to capture responses in the auditory cortex, we fitted them in a supervised manner on a wide gamut of natural audio stimulus-response datasets. These datasets were collected in different species (ferret, rat, zebra finch) and brain areas (MGB, AAF, A1, PEG, MLd, Field L), under varying behavioral conditions (awake, anesthetized) and using different recording modalities (spikes, membrane potentials). All of them are freely accessible on online repositories and were used with respect to their original license. Because the “NS1”22, “NAT4”31,78, and “Wehr”24,44 datasets have already been used in a previous study from our group and their pre-processing pipelines have been extensively described in the associated article27, we only describe here the new datasets added to the current study. The corresponding data pre-processing methods are representative of what was performed for the previous datasets.

    AA1 datasets: MLd, field L (zebra finch)

    These two datasets consist of single-unit responses recorded from two auditory areas (MLd and Field L) of anesthetized male zebra finches by Frederic Theunissen’s group at UC Berkeley43,46,79. Stimuli were short clips (<5 s) of conspecific songs to which the animals had no prior exposure, and were modeled by log-compressed mel spectrograms with 32 frequencies ranging from 0 to 16 kHz, at a temporal resolution of 1 ms. Extracellular recordings yielded a total of 50 single units in each area after spike sorting. Spike trains were binned in non-overlapping windows of 1 or 5 ms, matching the resolution of the stimulus spectrogram. Sounds were presented for an average of 10 trials, and PSTHs were obtained for each neuron and stimulus by averaging spike trains across trials and smoothing with a 21 ms Hanning window26,80. For each neuron, recordings were performed in response to the same 20 audio stimuli, thereby allowing a single model to be trained to predict the simultaneous activity of the whole population (i.e., a “population coding” paradigm).
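    The PSTH construction described above (binning, trial averaging, then 21-ms Hanning smoothing) can be sketched in numpy on synthetic spike trains (all sizes and rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
M, T = 10, 500                     # trials, 1-ms bins

# Binned spike trains: M repeats of the same stimulus (0/1 per bin)
spikes = rng.random((M, T)) < 0.05

# PSTH: average across trials, then smooth with a 21-ms Hanning window
psth = spikes.mean(axis=0)
win = np.hanning(21)
win /= win.sum()                   # normalize so smoothing preserves rate
psth_smooth = np.convolve(psth, win, mode="same")
```

    Normalizing the window to unit sum keeps the smoothed trace on the same firing-rate scale as the raw trial average, up to small edge effects.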

    These data, also referred to as “AA1”, are freely accessible from the CRCNS website (https://crcns.org/data-sets/aa/aa-1/about) and were used with respect to their original license.

    Asari dataset: A1, MGB (rat)

    This dataset consists of single-unit responses recorded from primary auditory cortex (A1) and medial geniculate body (MGB) neurons in anesthetized rats by Anthony Zador’s group at Cold Spring Harbor Laboratory11,44. Stimuli were natural sounds typically lasting around 2–7 s, originally sampled at 44.1 kHz and then resampled at 97 kHz for presentation. Their cochleagrams were obtained using a short-term Fourier transform with 54 logarithmically distributed spectral bands from 0.1 to 45 kHz, whose outputs were then passed through a logarithmic compressive activation. The temporal resolution of stimulus cochleagrams and responses was set to 5 ms. Because recordings for each cell were performed in response to a different set of probe sounds, we could not apply the population coding paradigm, and a full model was fitted on the stimulus-response pairs of each unit. Contrary to the other datasets, the recordings here are intracellular membrane potentials obtained through whole-cell patch-clamp techniques. One remarkable feature of these data is their very high trial-to-trial response reliability, which makes the noise-corrected normalized correlation coefficients of model predictions almost equal to the raw correlation coefficients (see “Performance metrics” subsection).

    Despite the good signal-to-noise ratio of this dataset, some trials are subject to recording artifacts, notably drifts, which may be caused by motion of the animal and/or the recording electrode, or by electromagnetic interference with nearby devices. Note that drifts do not contaminate supra-threshold signals resulting from spike-sorted activity, such as PSTHs, because those are strictly positive and thus have a guaranteed stationarity. In order to remove these drifts, we detrended all responses using a custom approach described further in Supplementary section “Detrending with MedGauss filter”.

Similar to the previous dataset, these data can be found on the CRCNS website (https://crcns.org/data-sets/ac/ac-1) as a subset of the “AC1” dataset.

    Neural response fitting task

Because the current benchmark builds directly upon a previous study by our group27, model training and performance evaluation followed the same methods, which we describe again below.

    Task definition

Neural response fitting is a sequence-to-sequence time-series regression task, taking a spectrogram representation \(x \in \mathbb{R}^{F\times T}\) of a sound stimulus as input and outputting one 1-d time series of neural response per unit, \(\hat{r} \in \mathbb{R}^{N\times T}\). As the target, we use the peri-stimulus time histogram (PSTH), i.e., the recorded neural response averaged across repeats. The loss function is the mean squared error (MSE) between the predicted time series and the recorded PSTH, evaluated for each time bin of each sequence:

$$\mathcal{L}=\frac{1}{NT}\sum\limits_{n=1}^{N}\sum\limits_{t=1}^{T}\left(\hat{r}_{n}[t]-r_{n}[t]\right)^{2}$$

    (3)

where \(\hat{r}_{n}[t]\) is the predicted neural response for neuron n at time step t, \(r_{n}[t]\) the corresponding PSTH, N is the total number of recorded neurons to fit, and T is the total number of time steps in the time series (to simplify notations, we drop the time dependency symbol [t] hereafter).
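In code, the loss of Eq. (3) is a one-liner; the minimal NumPy sketch below uses toy (N, T) arrays purely for illustration.

```python
import numpy as np

def mse_loss(r_hat, r):
    """Eq. (3): MSE between predicted responses and PSTHs, averaged
    over all N units and T time bins. Inputs are (N, T) arrays."""
    return ((r_hat - r) ** 2).mean()

# Toy example with N = 2 units and T = 2 time bins
r = np.array([[0.0, 1.0], [2.0, 3.0]])
r_hat = np.array([[0.0, 2.0], [2.0, 1.0]])
print(mse_loss(r_hat, r))                    # (0 + 1 + 0 + 4) / 4 = 1.25
```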

    Performance metrics

The neural response fitting accuracy of the different models is estimated using the raw correlation coefficient (Pearson’s r), noted CCraw, between the model’s predicted activity \(\hat{r}\) and the ground-truth PSTH r, which is the response averaged over all M trials \(r^{(m)}\):

$$r=\frac{1}{M}\sum\limits_{m=1}^{M}r^{(m)}$$

    (4)

$$CC_{raw}=\frac{\mathrm{Cov}(r,\hat{r})}{\sqrt{\mathrm{Var}(r)\,\mathrm{Var}(\hat{r})}}$$

    (5)

where the covariance and variance are computed along the temporal dimension. Assuming neural variability is purely noise, and given a limited number of stimulus presentations, perfect fits (i.e., CCraw = 1) are impossible to obtain in practice. To estimate the best reachable performance given neuronal and experimental trial-to-trial variability, we use the normalized correlation coefficient CCnorm, as defined in refs. 80,81. For a given optimization set (e.g., train, validation, or test) composed of multiple clips of stimulus-response pairs, we first create a long sequence by temporally concatenating all clips. We then evaluate the signal power SP in the recorded responses as:

$$SP=\frac{\mathrm{Var}\left(\sum_{m=1}^{M}r^{(m)}\right)-\sum_{m=1}^{M}\mathrm{Var}\left(r^{(m)}\right)}{M(M-1)}$$

    (6)

which allows us to compute the normalized correlation coefficient:

$$CC_{norm}=\frac{\mathrm{Cov}(r,\hat{r})}{\sqrt{SP\times \mathrm{Var}(\hat{r})}}$$

    (7)

When only one trial is available, we set CCnorm = CCraw, which corresponds to a fully repeatable recording uncontaminated by noise; this lower bound prevents any overestimation of performance in the absence of repeated trials.
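The metrics of Eqs. (4)–(7) can be sketched in NumPy as follows; the single-trial fallback implements the CCnorm = CCraw convention just described, and the toy noisy-sine data is illustrative only.

```python
import numpy as np

def cc_raw(r, r_hat):
    """Eq. (5): Pearson correlation between PSTH r and prediction r_hat."""
    return np.corrcoef(r, r_hat)[0, 1]

def cc_norm(trials, r_hat):
    """Eqs. (4), (6), (7). trials: (M, T) single-trial responses."""
    M = trials.shape[0]
    if M == 1:                                   # lower bound: CCnorm = CCraw
        return cc_raw(trials[0], r_hat)
    r = trials.mean(axis=0)                      # Eq. (4): the PSTH
    sp = (np.var(trials.sum(axis=0))
          - np.var(trials, axis=1).sum()) / (M * (M - 1))  # Eq. (6)
    cov = ((r - r.mean()) * (r_hat - r_hat.mean())).mean()
    return cov / np.sqrt(sp * np.var(r_hat))     # Eq. (7)

rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 6 * np.pi, 500))  # underlying "true" response
trials = signal + 0.5 * rng.normal(size=(10, 500))  # 10 noisy repeats
psth = trials.mean(axis=0)
print(cc_raw(psth, signal), cc_norm(trials, signal))
```

For a perfect prediction of the noiseless signal, CCraw stays below 1 because the PSTH retains residual noise, while CCnorm corrects for it and lands near 1.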

    Optimization process/model training

All models were randomly initialized using default PyTorch methods and trained using gradient descent and backpropagation. Time-recurrent models (DNet and StateNets) were trained using BPTT. Each training sample was a full stimulus-response pair whose duration varied between datasets because of the different recording protocols, but also within some datasets (AA1, Wehr, Asari), in order to maximize the amount of trial information for evaluation. We used the AdamW optimizer82 with its default PyTorch hyperparameters (β1 = 0.9, β2 = 0.999). We used a batch size of 1 for all datasets except the two NAT4 datasets: the former have a limited number of training examples, whereas the latter have considerably more and were trained with a batch size of 16 (see Supplementary Note 6: “Dataset and model details”). The learning rate was held constant during training at 10−3. We found empirically that these values led to better results.

We split each dataset into a training, a validation, and a test subset, respecting a 70–10–20% ratio as much as possible, depending on the number of stimulus-response pairs available for each cell in each dataset. After each training epoch, models were evaluated on the validation set, and if the validation loss had decreased with respect to the previous best model, the new model was saved. Models were trained until there was no improvement for 50 consecutive epochs on the validation set, at which point learning was stopped and the last best-performing model was saved and evaluated on the test set. This procedure was repeated 10 times, each repetition corresponding to a random seed with different train-valid-test splits and model parameter initializations, and the test metrics were averaged across splits. All models went through the exact same training pipeline (i.e., waveform-to-spectrogram transform, training hyperparameters, etc.), ensuring a fair comparison between them and implying that architectures with higher test accuracy are genuinely better, despite potential differences from their original studies.
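A minimal sketch of this protocol (70–10–20 split, best-model checkpointing on the validation loss, and 50-epoch patience), with a toy linear model trained by plain gradient descent standing in for the paper’s PyTorch networks; data, dimensions, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                        # toy regression problem
w_true = rng.normal(size=8)
y = X @ w_true + 0.05 * rng.normal(size=200)

idx = rng.permutation(200)
tr, va, te = idx[:140], idx[140:160], idx[160:]      # 70-10-20 split

w = np.zeros(8)
best_w, best_val, patience = w.copy(), np.inf, 0
lr = 1e-2                                            # constant learning rate

while patience < 50:                                 # 50-epoch early stopping
    grad = 2 * X[tr].T @ (X[tr] @ w - y[tr]) / len(tr)   # MSE gradient
    w -= lr * grad
    val = np.mean((X[va] @ w - y[va]) ** 2)
    if val < best_val:                               # checkpoint the best model
        best_val, best_w, patience = val, w.copy(), 0
    else:
        patience += 1

test_mse = np.mean((X[te] @ best_w - y[te]) ** 2)    # evaluate best model once
print(test_mse)
```

The key design point mirrored here is that the test set is touched only once, by the checkpointed best-on-validation model.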

    Truncated backpropagation through time

With regular BPTT, the entire RNN model is unrolled back in time, creating a graph that grows linearly with the sequence length. In addition, for most tasks, old time steps are less informative about the present than recent ones, and gradients can either vanish or explode (the “vanishing/exploding gradient problem”)75. TBPTT is a variation of this training algorithm that alleviates these computational constraints83. Its principle is to remove from the graph time steps older than a fixed temporal horizon K2. If the loss is evaluated every K1 time steps, we note this algorithm TBPTT(K1, K2). Fig. 8 illustrates the underlying computations. In this study, in order to maximize the number of samples used for training, the loss is evaluated and backpropagated at every time step, so K1 = 1. As a result, we simplify notations by referring to K2 as K. For the first t < K2 time steps, the graph is built from the start of the sequence. For generality, the regular BPTT algorithm used to train our models in the main experiments corresponds to TBPTT(K1 = 1, K2 = T), T being the sequence length. We also distinguish two sub-cases of TBPTT:

    • TBPTT with warmup. The model is initialized with the default (null) hidden state h0 = 0 at the very start of the sequence (see Fig. 8a). Inputs are then processed sequentially, without building the computational graph, up to the last K time steps before loss evaluation. We refer to these first steps as a “warmup”. As a result, the graph starts with the model in an intermediary, non-default, and ecological hidden state resulting from this procedure.

    • TBPTT without warmup. Here, warmup steps are skipped and the model is directly initialized to the default hidden state at the Kth time step before loss evaluation (see Fig. 8b). The model therefore has no access to any prior information, making this a fairer comparison to the training of stateless models.
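The two variants above can be sketched as follows for the K1 = 1 case. The toy RNNCell, dimensions, and sequence length are illustrative; only the with-warmup variant is run, and dropping the no-grad warmup loop (so h stays at h0 = 0 when the graph starts) yields the without-warmup variant.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNNCell(4, 8)                 # toy recurrent model (illustrative)
readout = torch.nn.Linear(8, 1)
opt = torch.optim.AdamW(list(rnn.parameters()) + list(readout.parameters()),
                        lr=1e-3)
w0 = rnn.weight_hh.detach().clone()          # snapshot to verify learning below

T, K = 30, 5                                 # sequence length, horizon K2 = K
x = torch.randn(T, 1, 4)
target = torch.randn(T, 1, 1)

for t in range(T):                           # K1 = 1: loss at every time step
    start = max(0, t + 1 - K)                # graph covers at most K steps
    h = torch.zeros(1, 8)                    # default (null) hidden state h0 = 0
    with torch.no_grad():                    # warmup: advance the state graph-free
        for s in range(start):
            h = rnn(x[s], h)
    for s in range(start, t + 1):            # only these steps build the graph
        h = rnn(x[s], h)
    loss = ((readout(h) - target[t]) ** 2).mean()
    opt.zero_grad()
    loss.backward()                          # backprop through at most K steps
    opt.step()
print(float(loss))
```

Recomputing the warmup from h0 at every step keeps the sketch faithful to Fig. 8a at the cost of extra forward passes; practical implementations typically cache detached hidden states instead.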

    Fig. 8: Truncated backpropagation through time (TBPTT): methods.

    a With warmup, the initial hidden state (purple) is the first of the training sequence. Subsequent time steps (in gray, delimited by “no grad”) correspond to the warmup, during which the hidden state is updated. In this example, loss evaluation is shown for two time steps (4 and 5), and their respective graphs are colored in red and blue. K1 can be interpreted as the temporal stride between two loss evaluations and K2 (here, 3) as the maximum graph length, in number of time steps. b Without warmup, the first time steps are skipped and do not belong to the computational graphs. Therefore, the latter start from the default, null hidden state instead of an intermediary value resulting from the warmup procedure above.

    Model interpretability with feature visualization: gradient maps and deep dreams

We propose here a gradient-based iterative method which, for each unit of a trained neural network \(\mathcal{M}\), extracts its nonlinear receptive field and estimates the auditory features \(x_{i}\) that maximize its responses. This method builds on feature visualization techniques originally introduced in the AI community38,39,40, known as gradient ascent, and leverages the fact that all mathematical operations in our models are differentiable. The steps of this approach are illustrated in Fig. 3a and can be summarized as follows:

    1. As the first input to the model, use the null stimulus \(x_{0}\in \mathbb{R}^{F\times T}\), a uniform spectrogram of constant value (0 in our case). This initial stimulus is unbiased and bears no spectro-temporal information; from an information-theoretic perspective, it has no entropy. For a parallel with electrophysiology experiments, it is worth noting that the spectrogram of white noise is theoretically uniform too. The absence of spectro-temporal correlations within probe stimuli (which we respect here) is a strong theoretical requirement that led to the use of white noise in the initial development of the linear STRF theory17, while later studies preferring natural stimuli used advanced techniques to correct for their structure18.

    2. Pass x0 through the model and compute its outputs (i.e., the predicted time series of activation for the whole neural population): \(\hat{r}=\mathcal{M}(x_{0})\in \mathbb{R}^{N\times T}\).

    3. Define a loss \(\mathcal{L}:\mathbb{R}^{N\times T}\mapsto \mathbb{R}\) to minimize and compute it from the model prediction \(\hat{r}\). In this paper, we only targeted a single unit n and used the opposite of its activation at the last (Tth) time step: \(\mathcal{L}(\hat{r})=-\hat{r}_{n}[T]\). To compute the STRF associated with any neural population \(\mathcal{N}\), the following general loss can be used: \(\mathcal{L}(\hat{r})=-\frac{1}{\mathrm{Card}(\mathcal{N})}\sum_{n\in \mathcal{N}}\hat{r}_{n}[T]\). This choice of loss function draws a connection with the spike-triggered average (STA) approach used by electrophysiologists, in which the stimulus instances preceding the discharge of the target unit are averaged17,41,42. Models in our study were fitted to PSTHs or membrane potentials and thus output floating-point values, which can be viewed as spike probabilities. Maximizing this value at the present time step (the last of the time series) by constructing the preceding stimulus time steps closely mimics STA. Maximizing the average firing rate across the whole stimulus presentation could be interesting to investigate in future studies.

    4. Back-propagate through the network the gradients of this loss, \(g_{0}=\frac{\partial \mathcal{L}}{\partial x_{0}}\), thereafter referred to as GradMaps. By definition, these GradMaps can be directly related to linear STRFs (see Supplementary text “Bridging the gap between STRFs, gradMaps and dreams: theoretical framework”).

    5. Use these gradients to perform a gradient ascent step and modify the input stimulus. For simplicity, we denote this operation as x1 = x0 − αg0, although some optimizers have momentum and more elaborate update rules. This is notably the case for Adam and AdamW82, which were used in our study. We did not use the SGD optimizer, as it converged to much higher (i.e., less optimal) loss values in preliminary experiments.

    6. Repeat these steps until an early stopping criterion is satisfied, in our case after a fixed number of iterations (1500, a value which led to sufficient loss decreases; see curves in Fig. 3). The result of this process is an optimized input spectrogram d = x_i that maximizes the activation of the target unit(s), thereafter referred to as a Dream. If we define \(\mathcal{F}:\mathbb{R}^{F\times T}\mapsto \mathbb{R}^{F\times T}\) as one iteration of the above process, such that \(x_{i}=\mathcal{F}(x_{i-1}\,|\,\mathcal{M},\mathcal{L},n)\), then recursively \(x_{i}=\mathcal{F}\circ \mathcal{F}\circ \cdots \circ \mathcal{F}(x_{0}\,|\,\mathcal{M},\mathcal{L},n)=\mathcal{F}^{i}(x_{0}\,|\,\mathcal{M},\mathcal{L},n)\).

This approach imposes no requirements on the model being interpreted. It can produce GradMaps of arbitrary length, which are relatable to linear STRFs, and it can be applied to any architecture, including RNNs like StateNet, but also stateless models and transformers.
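Steps 1–6 above can be sketched compactly on a toy differentiable model. The stand-in network below maps a spectrogram to one scalar activation per unit (whereas the real models output a full (N, T) time series), and the 200-iteration budget is illustrative rather than the paper’s 1500.

```python
import torch

torch.manual_seed(0)
F_bins, T_steps, n_units = 8, 20, 3
# Toy stand-in for a trained model M (illustrative assumption)
model = torch.nn.Sequential(torch.nn.Flatten(start_dim=0),
                            torch.nn.Linear(F_bins * T_steps, n_units))

x = torch.zeros(F_bins, T_steps, requires_grad=True)   # step 1: null stimulus x0
opt = torch.optim.AdamW([x], lr=0.1)                   # step 5: Adam-family updates
n = 0                                                  # target unit

for i in range(200):                                   # step 6: fixed budget
    r_hat = model(x)                                   # step 2: forward pass
    loss = -r_hat[n]                                   # step 3: maximize unit n
    opt.zero_grad()
    loss.backward()                                    # step 4: gradients w.r.t. x
    if i == 0:
        gradmap = x.grad.detach().clone()              # GradMap g0 at x0
    opt.step()

dream = x.detach()                                     # the optimized "Dream"
print(dream.shape, float(loss))
```

Note that the optimizer here updates the input stimulus, not the (frozen, in the real setting) model parameters.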

    Dream and GradMap energy

A trace of temporal integration for a model can be defined simply from its GradMap g as the mean, over all frequency bands f, of its squared elements at each latency t. We designate this measure the “Energy” of the GradMap:

$$E[t]=\frac{1}{F}\sum\limits_{f=1}^{F}g[f,t]^{2}$$

    (8)
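Eq. (8) in code, on a toy F = 2, T = 3 GradMap:

```python
import numpy as np

def gradmap_energy(g):
    """Eq. (8): mean of squared GradMap elements over frequency bands,
    yielding one Energy value per latency t. g has shape (F, T)."""
    return (g ** 2).mean(axis=0)

g = np.array([[1.0, 0.0, -2.0],
              [3.0, 0.0,  2.0]])
print(gradmap_energy(g))                 # [5.  0.  4.]
```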

    GradMap similarity matrix

    As shown in the corresponding results section, this matrix aims to identify functional clusters of models based on their GradMaps; it is built using the following methodology.

The GradMaps of the models to compare are first computed with a number of time steps T slightly above their theoretical TRF size. In the case of the present study, stateless models had a TRF size of up to 43 time steps; we therefore computed GradMaps of T = 50 time steps for all models, including StateNet. The length of the GradMap should not be much greater than the theoretical TRF size of stateless models; otherwise, correlations between the GradMaps of the latter would artificially tend towards 1, as time steps beyond the receptive field do not receive any gradient and remain at their initial value of 0. This step yields one GradMap per neuron, dataset, and model. After flattening each map into an F · T-dimensional vector, we compute Pearson’s correlation coefficient as a pixel-wise metric of how similar GradMaps are between models, for a given neuron and dataset. The choice of the CC here, instead of other distance metrics (e.g., Euclidean), is motivated by the fact that we are comparing the overall structure of the GradMap/STRF (e.g., how inhibitory and excitatory regions are placed relative to each other) rather than its values. Furthermore, the goodness-of-fit of the linear STRF model, to which we relate the GradMap, is insensitive to scaling and shifting, precisely because the primary evaluation metric in the neural response modeling community is based on the CC too. As a result, if two models present the same GradMap up to an affine transformation, their functional similarity should be classified as perfect, which would not be the case with metrics such as a pixel-wise MSE.
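A sketch of the similarity-matrix construction, with random toy GradMaps standing in for real ones; it also demonstrates the affine-invariance argument above, since a scaled-and-shifted copy of a map correlates perfectly with the original.

```python
import numpy as np

def similarity_matrix(gradmaps):
    """Pairwise Pearson CC between flattened (F, T) GradMaps.
    gradmaps: list of (F, T) arrays, one per model."""
    flat = np.stack([g.ravel() for g in gradmaps])   # (n_models, F*T) vectors
    return np.corrcoef(flat)                         # pixel-wise Pearson CC

rng = np.random.default_rng(0)
base = rng.normal(size=(54, 50))                     # toy GradMap, T = 50
maps = [base,
        2.0 * base + 1.0,                            # affine copy: CC = 1
        rng.normal(size=(54, 50))]                   # unrelated map: CC near 0
S = similarity_matrix(maps)
print(np.round(S, 2))
```

A pixel-wise MSE would instead report a large distance between the first two maps, despite their identical structure.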

    Statistics and reproducibility

One major contribution of this work is the large number of implemented models and compiled datasets. All models were implemented and trained using PyTorch, a gold-standard library for deep learning in Python that leverages automatic differentiation. Datasets were pre-processed into the same format, a PyTorch Dataset class, for convenience. Jobs required less than 2 GiB of memory and were executed on Nvidia Titan V GPUs, taking tens of minutes to several hours depending on the complexity of the model. As an example, the population training (5 seeds) of the StateNet GRU model in population coding on all 73 NS1 neurons in parallel typically takes less than 10 min. Conversely, the single-unit training (5 seeds) of the same model on the 21 neurons of the Wehr dataset takes more than 3 h.

    Reporting summary

    Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

    Continue Reading

  • US drillers cut oil and gas rigs for first time in 6 weeks, Baker Hughes says – Reuters

    1. US drillers cut oil and gas rigs for first time in 6 weeks, Baker Hughes says  Reuters
    2. Basin rig count down one as prices slide  Odessa American
    3. U.S. Rig Count Remained Even @ 549; Pa. Lost 1 Marcellus Rig  Marcellus Drilling News
    4. U.S. rig count flat after four straight weekly gains in Baker Hughes survey  MSN
    5. US Oil Drillers Continue to Back Off As Prices Sink  Yahoo Finance

    Continue Reading

  • Government shutdown hasn’t left consumers glum about the economy – for now, at least

    Government shutdown hasn’t left consumers glum about the economy – for now, at least

    The ongoing federal shutdown has resulted in a pause on regular government data releases, meaning economic data has been in short supply of late. That has left market-watchers and monetary policymakers somewhat in the dark over key indicators in the U.S. economy.

    Fortunately, the University of Michigan’s Surveys of Consumers is unaffected by the impasse in Washington and released its preliminary monthly report on Oct. 10, 2025; the final read of the month will be released in two weeks.

    The Conversation U.S. spoke with Joanne Hsu, the director of the Surveys of Consumers, on what the latest data shows about consumer sentiment – and whether the shutdown has left Americans feeling blue.

    What is consumer sentiment?

    Consumer sentiment is something that we at the University of Michigan have measured since 1946. It looks at American attitudes toward the current state of the economy and the future direction of the economy through questions on personal finances, business conditions and buying conditions for big-ticket items.

    Over the decades, it has been closely followed by policymakers, business leaders, academic researchers and investors as a leading indicator of the overall state of the economy.

    When sentiment is on the decline, consumers tend to pull back on spending – and that can lead to a slowdown in the economy. The opposite is also true: High or rising sentiment tends to lead to increased spending and a growing economy.

    How is the survey compiled?

    Every month, we interview a random sample of the U.S. population across the 48 contiguous states and the District of Columbia. Around 1,000 or so people take part in it every month, and we include a representative sample across ages, income, education level, demography and geography. People from across all walks of life are asked around 50 questions pertaining to the economy, personal finances, job prospects, inflation expectations and the like.

    When you aggregate that all together, it gives a useful measure of the health of the U.S. economy.

    What does the latest survey show?

    The latest survey shows virtually no change in overall sentiment between September and October. Consumers are not feeling that optimistic at the moment, but generally no worse than they were last month.

    Pocketbook issues – high prices of goods, inflation and possible weakening in the labor market – are suppressing sentiment. Views of consumers across the country converged earlier in the year when the Trump administration’s tariffs were announced. But since then, higher-wealth and higher-income consumers have reported improved consumer sentiment. It is for lower-income Americans – those not owning stock – that sentiment hasn’t lifted since April.


    [Chart omitted. Source: University of Michigan]

In October, we also saw a slight decline in inflation expectations, but they remain relatively high, midway between where they were around a year ago and the highs around the time of the tariff announcements in April and May.

    Has the government shutdown affected consumer sentiment?

    The government shutdown was in place for around half the time of the latest survey period, which ran from Sept. 23-Oct. 6, 2025. And so far, we are not seeing evidence that it is impacting consumer sentiment one way or another.

    And that is not super-surprising. It is not that people don’t care about the shutdown, just that it hasn’t affected how they see the economy and their personal finances yet.

    History shows that federal shutdowns do move the needle a little. In 2019, around 10% of people spontaneously mentioned the then-shutdown in the January survey. We saw a decline in sentiment in that month, but it did improve again the following month.

    Looking back, we tend to see stronger reaction to shutdowns when there is a debt ceiling crisis attached. In 2013, for example, there was a decline in consumer sentiment coinciding with concerns over the debt ceiling being breached. But it did quickly rebound when the government opened again.

    Whether or not we see a decline in sentiment because of the current shutdown depends on how long it lasts – and how consumers believe it will impact pocketbook issues, namely prices and job prospects.

    Continue Reading

  • Ghanaian justice and security officials better equipped to protect victims of cybercrime

    Ghanaian justice and security officials better equipped to protect victims of cybercrime

    Ghanaians could soon benefit from stronger protections against cybercrime, following a Commonwealth programme that trained more than 60 judges, investigators and prosecutors in Accra this week.

    Supported by the UK’s Foreign, Commonwealth and Development Office, the programme brought together justice and security agencies for two symposiums from 7-10 October 2025, aimed at strengthening skills and teamwork for a coordinated response to cybercrime. 

    Participants, including Nigerian Federal High Court judges, worked through fictional scenarios simulating real-world cybercrime cases to test how existing laws, international agreements and mutual legal assistance apply in practice. 

    The sessions also explored common courtroom challenges, such as evaluating the merits of electronic evidence and fostering cross-border cooperation in legal proceedings.

    Ghana has one of West Africa’s most vibrant digital economies. However, like elsewhere in the world, this connectivity has also exposed people to new forms of cyber risk. 

    Policy, protection and partnership

    At one of the symposiums, Lydia Yaako Donkor, Director General of the Criminal Investigation Department at the Ghana Police Service, said the fight against cybercrime depended on “policy, protection and partnership”.

    She said: 

    “Our policy frameworks must keep pace with technology. We must strengthen our capacity to collect, preserve and present electronic evidence that is admissible in court. No single agency can combat this alone. Collaboration is essential.”

    Donkor said a proposal to create specialised cybercrime courts had been sent to the Attorney General’s office, noting that judges’ training would be important to their success.

    High Court Judge Justice Patricia Quansah described the training as critical to helping judges better understand the complexity of cybercrime.

    She said the sessions gave her practical tools to assess digital evidence in court, including how to detect tampering. 

    This knowledge, she added, will help judges respond more confidently to cybercrime cases, ensure justice for victims and hand down punishments that deter future offences.

    ‘An eye-opener’

    Chief Inspector Nancy Paintsil, a prosecutor handling cybercrime cases, called the training “an eye-opener”.

    She said:

    “The training deepened my understanding of cybercrime, which relies heavily on electronic evidence. I learned how the way we collect, store and maintain the chain of custody determines whether that evidence is admissible and whether we can convict cybercriminals.”

    In a pre-recorded message, Commonwealth Secretary-General Hon Shirley Botchwey highlighted the programme’s impact, noting that past symposiums had led to a 50 per cent improvement in Ghanaian judicial officers’ handling of electronic evidence.

    She added:

    “Now, we extend this achievement to High Court Judges, whose leadership will be vital to sustaining progress. Their work is essential to ensuring that our digital future is safe, secure and inclusive.”

    Final line of defence

    Supreme Court Justice Tanko Amadu, Director of Ghana’s Judicial Training Institute, said: 

    “The judiciary is the final line of defence in the fight against cybercrime. Cases ultimately depend on judges’ ability to fairly and efficiently adjudicate them. 

“Continuous professional development is essential for judicial officers to keep up with technological change. We will continue to learn and serve with honour to protect our citizens.”

    Hooman Nouruzi of the British High Commission in Accra said the threat of online crime was rapidly evolving, citing INTERPOL data indicating a significant year-on-year rise in cyber-attacks in Africa.

    He said:

    “It is a stark reminder that our work is far from done… By working together, we can share knowledge, strengthen legal frameworks, and build the capacity needed to investigate, prosecute and prevent cybercrime.”

    Members of the public reacted positively to the training. Raphael Boateng, a 20-year-old resident of Nungua in Accra, described it as “a step in the right direction”. 

    He said:

    “Many innocent people fall victim to online scams. It is good that our judges are being trained. It will help ensure criminals who target others face justice without delay.”

    This was the fourth programme on cybercrime and electronic evidence delivered by the Commonwealth Secretariat in Ghana since 2022. 
     


    Media contact

    • Snober Abbasi, Senior Communications Officer, Communications Division, Commonwealth Secretariat


    Continue Reading