Advancing computational evaluation of adsorption via porous materials by artificial intelligence and computational fluid dynamics

Local outlier factor (LOF)

LOF is a robust and effective algorithm for identifying outliers within a dataset. It operates under the assumption that outliers are data points that deviate significantly from their local neighborhood. By using LOF, one can uncover and subsequently remove outliers from a dataset, enhancing the overall data quality. The LOF score of a data point \(x_i\) is defined in the following manner23:

$$LOF\left(x_i\right)=\frac{\sum_{\forall x_j\in N\left(x_i\right)}\frac{\text{density}\left(x_j\right)}{\text{density}\left(x_i\right)}}{\left|N\left(x_i\right)\right|}$$

Here, \(N(x_i)\) represents the neighborhood of data point \(x_i\), and \(\text{density}(x_i)\) denotes the local density of \(x_i\). The LOF of a data point quantifies how its density compares with the densities of its neighbors. A LOF value significantly greater than 1 suggests that the data point is an outlier24.
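As an illustration, a minimal sketch of this outlier-removal step using scikit-learn's LocalOutlierFactor is shown below; the synthetic dataset, the neighborhood size of 20, and the inlier/outlier labeling convention are illustrative assumptions, not the study's actual configuration.

```python
# Minimal sketch: detecting and removing outliers with LocalOutlierFactor.
# The data and n_neighbors value are illustrative assumptions.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))           # hypothetical dataset: 200 samples, 3 features

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # +1 for inliers, -1 for outliers
scores = -lof.negative_outlier_factor_  # LOF values; values well above 1 flag outliers

X_clean = X[labels == 1]                # keep only the inliers
print(f"Removed {np.sum(labels == -1)} outliers; {len(X_clean)} samples remain")
```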

GPR (Gaussian process regression)

GPR is a flexible, non-parametric Bayesian technique for regression analysis. Unlike traditional parametric regression methods, GPR makes no explicit assumptions about the functional form of the underlying data distribution. Instead, it models the data as a distribution over functions, allowing for uncertainty quantification and robust predictions19.

The predictive distribution of GPR is derived through Bayesian inference. Given a set of observed data points (X, y), where X denotes the input data and y the corresponding outputs, the goal is to make predictions \(y^*\) for new input points \(X^*\). The predictive distribution of \(y^*\) is expressed as follows25:

$$p\left(y^*\mid X,y,X^*\right)=\mathcal{N}\left(\mu^*,\sigma^*\right)$$

where \(\mu^*\) denotes the mean of the predictive distribution, and \(\sigma^*\) stands for its standard deviation. These quantities can be computed as follows25:

$$\mu^*=\mu\left(X^*\right)+K\left(X^*,X\right)\left[K\left(X,X\right)+\sigma_n^2 I\right]^{-1}\left(y-\mu\left(X\right)\right)$$

$$\sigma^*=K\left(X^*,X^*\right)-K\left(X^*,X\right)\left[K\left(X,X\right)+\sigma_n^2 I\right]^{-1}K\left(X,X^*\right)$$

In the equations above, K(X, X) is the covariance matrix of the training inputs, K(X*, X) is the covariance between the test and training inputs, \(\sigma_n^2\) is the noise variance, and I is the identity matrix.
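A minimal sketch of computing these predictive quantities with scikit-learn's GaussianProcessRegressor follows; the RBF-plus-white-noise kernel and the one-dimensional toy data are illustrative assumptions and do not reflect the study's dataset or kernel choice.

```python
# Minimal sketch: GPR predictive mean and standard deviation with scikit-learn.
# Kernel and data are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))                 # training inputs X
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)    # noisy training targets y

# WhiteKernel models the noise variance sigma_n^2 added to K(X, X)
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_star = np.linspace(0, 10, 100).reshape(-1, 1)      # test inputs X*
mu_star, sigma_star = gpr.predict(X_star, return_std=True)  # predictive mean and std
```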

MLP regression (Multi-layer perceptron regression)

MLP Regression is a variant of artificial neural networks characterized by its multi-layered architecture, where nodes (neurons) are interconnected across these layers. It is a versatile and powerful regression technique capable of modeling complex, nonlinear relationships between inputs and outputs26.

The key equations for MLP regression are the forward-propagation equations for a single neuron26:

$$z_j=\sum_{i=1}^{n}w_{ij}x_i+b_j$$

$$a_j=\sigma\left(z_j\right)$$

In this context, \(z_j\) signifies the weighted summation of inputs for neuron j, \(w_{ij}\) represents the weight of the connection linking neuron i to neuron j, \(b_j\) is the bias term of neuron j, \(\sigma\) is the activation function, and \(a_j\) is the resulting activation (output) of neuron j.
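A minimal NumPy sketch of this forward pass is given below; the layer size, the random weights and biases, and the sigmoid activation are illustrative assumptions rather than the network actually used in the study.

```python
# Minimal sketch of the forward pass described by the two equations above.
# Layer sizes, weights, and the sigmoid activation are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)           # inputs x_i to the layer (n = 4)
W = rng.normal(size=(4, 3))      # weights w_ij for a layer with 3 neurons
b = rng.normal(size=3)           # biases b_j

z = x @ W + b                    # z_j = sum_i w_ij * x_i + b_j
a = sigmoid(z)                   # a_j = sigma(z_j)
print(a)
```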

PR (Polynomial regression)

PR is commonly employed in statistics and ML to model relationships between variables when a polynomial relationship is suspected. Whereas linear regression assumes linearity, polynomial regression can model more complex, nonlinear relationships27. In PR, the relationship between the dependent variable (typically denoted y) and the independent variable (typically denoted x) is expressed as a polynomial function of a chosen degree, often denoted n. The general form of a polynomial regression equation is as follows21:

$$y=\beta_0+\beta_1 x+\beta_2 x^2+\dots+\beta_n x^n+\epsilon$$

In this context, y denotes the dependent variable, which serves as the target we seek to predict or explain, while x signifies the independent variable or predictor upon which y depends. The coefficients \(\beta_0\), \(\beta_1\), ..., \(\beta_n\) are estimated from the data, and \(\epsilon\) is the error term.
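A minimal sketch of estimating these coefficients with scikit-learn follows; the synthetic data and the chosen degree n = 3 are illustrative assumptions.

```python
# Minimal sketch: polynomial regression via polynomial features + linear regression.
# Data and degree are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(100, 1))
y = 1.0 - 2.0 * x.ravel() + 0.5 * x.ravel() ** 3 + 0.1 * rng.normal(size=100)

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)                                    # estimates beta_0 ... beta_n
print(model.named_steps["linearregression"].coef_)
y_pred = model.predict(x)
```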
