Multimodal feature distinguishing and deep learning approach to detect lung disease from MRI images

The design goal of the MFDM is to improve lung disease detection accuracy through feature differentiation. Extractable and non-extractable features are exploited for training based on their homogeneous and heterogeneous nature. The detected features are differentiated using a transformer network in which verification and training classifications are independent. Using the supporting features, the MRI image is analyzed for improved disease detection. A schematic representation of the proposed MFDM is given in Fig. 1.

Fig. 1

Schematic representation of the MFDM method.

The detailed functions of the proposed method are portrayed in Fig. 1. The extracted features are differentiated using the heterogeneous and homogeneous nature of successive pixels. This nature is classified across various segments to identify the detected heterogeneous sequences. The conventional transformer network is used in the differentiation and classification process; it is described later with appropriate equations (refer to Fig. 1). First, the feature extraction relies on the maximum bits/pixel, varying from \(16\times 16\) and \(32\times 32\) up to \(256\times 256\). To avoid biases, we employed a broad, publicly available lung MRI dataset with heterogeneity in age, gender, and disease severity. Regular normalization and segmentation were used to prevent spurious feature enhancement, and homogeneity-based feature extraction allows a data-driven, unbiased representation in MFDM. In unclear circumstances, the transformer-based differentiation reinitializes from the previous valid homogeneity point, reducing cumulative bias. Stratified cross-validation was utilized to test performance across varied data subsets, ensuring fair generalization and reducing selection or preparation bias across the detection pipeline. In an MRI image, \(p(i,j)\) represents a pixel's intensity. It is crucial when computing the content feature \(C\) and the homogeneity feature \(H\): this computation measures the distribution of pixel intensities (\(C\)) and the smoothness or consistency of intensities between surrounding pixels (\(H\)). For any such variation, the content \(\left(C\right)\) and homogeneity \(\left(H\right)\) features are extracted. Let \(I\left(x,y\right)\) denote an input MRI with \(x\times y\) pixels such that,

$$C=\sum {\left(x,y\right)}^{2}\cdot p(i,j)$$

(1a)

And

$$H={\sum}_{i=1,\,j=1}^{i=x,\,j=y}\frac{p\left(i,j\right)}{1+\left|j-i\right|}\ \forall\ \begin{array}{c}i\in x\ \text{and}\ j\in y\\ \text{and}\ i\le x\ \text{and}\ j\le y\end{array}$$

(1b)

Where \(p(i,j)\) is a specific pixel location in \(\left(x,y\right)\); if \(i=x\) and \(j=y\), the complete image \(I\) is represented. The above features are consistent for the non-infected regions and show changes between heterogeneous pixels for any \(i\in x\) and \(j\in y\). Therefore, \(I\left(i,j\right)\) is analyzed for inconsistency \(\left(\psi\right)\) and consistency \(\left(\complement\right)\) to support the differentiation. The content \(C\) determines how much pixel values vary with position, indicating the weighted intensity magnitude across the image; \(H\) measures pixel intensity homogeneity, with higher values indicating smoother zones. These elements distinguish infected regions from non-infected ones: infected areas show increased heterogeneity (lower \(H\)) and altered intensity patterns (changes in \(C\)). For the differentiation process, the transformer network is constructed. This network contains \(L\) layers for differentiation and a maximum of \(\left(L-1\right)\) layers for classification, owing to the intermediate training classification of the image features in meeting precision-supporting extractions. The transformer network requires \(L\) layers for distinguishing sequences from \(16\times 16\) to \(256\times 256\) or intermediate sizes. Therefore, the conventional differentiation in the first \(L\) layers requires sequential integration. This integration for \(C\) and \(H\) (i.e., \(dC\) and \(dH\) corresponding to \(p\left(i,j\right)\)) is defined in equations (2a) and (2b), respectively:
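As an illustration only, the feature pair of Eqs. (1a) and (1b) can be sketched as below. The positional weight \({\left(x,y\right)}^{2}\) in Eq. (1a) is read here as the squared coordinate product \((i\cdot j)^2\) — that reading, and the function names, are assumptions, not the authors' implementation.

```python
def content_feature(img):
    """Content feature C (Eq. 1a): positionally weighted intensity sum.

    ASSUMPTION: the weight (x, y)^2 is interpreted as (i * j)^2 for each
    pixel coordinate pair; the paper's notation is ambiguous here.
    img is a 2-D list of pixel intensities.
    """
    total = 0.0
    for i, row in enumerate(img, start=1):
        for j, p in enumerate(row, start=1):
            total += (i * j) ** 2 * p
    return total


def homogeneity_feature(img):
    """Homogeneity feature H (Eq. 1b): each intensity p(i, j) is damped by
    the index distance |j - i|, so off-diagonal pixels contribute less."""
    total = 0.0
    for i, row in enumerate(img, start=1):
        for j, p in enumerate(row, start=1):
            total += p / (1.0 + abs(j - i))
    return total


uniform = [[1.0, 1.0], [1.0, 1.0]]
print(content_feature(uniform))      # 1 + 4 + 4 + 16 = 25.0
print(homogeneity_feature(uniform))  # 1 + 0.5 + 0.5 + 1 = 3.0
```

For a uniform image the damping weights alone determine \(H\), which is why the feature is stable over non-infected (smooth) regions.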

$$\frac{dC}{dp\left(i,j\right)}=1-\left[\frac{{C}_{1}}{\left(L-1\right)}+\frac{{C}_{2}}{\left(L-2\right)}+\dots+\frac{{C}_{i,j}}{\left(L-\left(i+j\right)\right)}\right]$$

(2a)

$$\frac{dH}{dp\left(i,j\right)}=1-\left[\frac{{H}_{1}}{L}+\frac{{H}_{2}}{L}+\dots+\frac{{H}_{\left(i,j\right)}}{L}\right]=1-\frac{1}{L}\sum_{i=1,\,j=1}^{i=x,\,j=y}\left({H}_{i,j}\right)$$

(2b)

The equations calculate the derivatives of content (\(dC\)) and homogeneity (\(dH\)) for each pixel value, testing how local pixel changes affect global feature trends. To account for the less immediate impact of deeper layers on categorization decisions, Eq. (2a) weights the content contributions from successive layers by the layer depth \(\left(L-\left(i+j\right)\right)\). Using Eq. (2b), smoother regions are identified by eliminating noisy, heterogeneous pixels.
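A minimal sketch of Eqs. (2a) and (2b), assuming the per-layer contributions \(C_k\) and \(H_{i,j}\) are supplied as plain Python lists (the bookkeeping and function names are hypothetical):

```python
def dC_dp(contributions, L):
    """Eq. (2a): dC/dp(i,j) = 1 - sum_k C_k / (L - k).

    contributions is a list of (k, C_k) pairs, where k is the layer-depth
    index of each content contribution; k < L is assumed so every
    denominator stays positive.
    """
    return 1.0 - sum(c / (L - k) for k, c in contributions)


def dH_dp(H_vals, L):
    """Eq. (2b): dH/dp(i,j) = 1 - (1/L) * sum of homogeneity contributions,
    i.e. every H term is weighted equally by the layer count L."""
    return 1.0 - sum(H_vals) / L


print(dC_dp([(1, 1.0)], L=2))    # 1 - 1/(2-1) = 0.0
print(dH_dp([1, 1, 1, 1], L=4))  # 1 - 4/4   = 0.0
```

A zero derivative in either branch is exactly the condition the differentiation network iterates on (Fig. 2).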

The above sequence is continuous such that the pixel variations are optimal across unsegmented regions. A change in \(C\) or \(H\) or both results in \(\left[\frac{C_{\left(i,j\right)}}{L-\left(i+j\right)}\right]\) and \(\left(\frac{H_{i,j}}{L}\right)\ne 1\), such that the differentiation is not 0. If the differentiation is nonzero, the classification is not required and the transformer network's layer is terminated. The network representation for differentiation = 0 is illustrated in Fig. 2.

Fig. 2

Transformer Network Differentiation Representation.

The network represented in Fig. 2 is used for differentiation, representing 0 along with \(\psi\) detection. This transformer network performs recurrent iteration to identify whether \(\frac{dC}{dp\left(i,j\right)}\) (or) \(\frac{dH}{dp\left(i,j\right)}=0\). The case is valid only if \(\frac{C_L}{L-1}\) or \(\frac{H_L}{L}\) is not 1, in which case the verification moves on to the next \(p\left(i,j\right)\). The classification part of the transformer network comes into play if \(\psi\) is observed, such that \(L=L+1\) is the next possible representation beyond 1 to \(L\). However, this network is self-iterated for \(C\), and the classification is iterated for \(\psi\). These two factors are computed using equations (3a) and (3b). The transformer network uses \(C\), \(H\), and their derivatives across the \(L\) layers for progressive feature distinction and categorization: early layers focus on differentiation using \(dC\) and \(dH\), whereas later layers apply learnt feature patterns for classification.
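The per-pixel control flow just described can be sketched as a simple dispatcher. This is a reading of the text, not the authors' implementation; the return labels and parameter names are hypothetical.

```python
def pixel_step(dC, dH, C_ratio, H_ratio, psi):
    """One recurrent-iteration step of the differentiation network (Fig. 2).

    A zero derivative, with the ratio C_L/(L-1) or H_L/L not equal to 1,
    advances verification to the next pixel; an observed inconsistency
    psi > 0 hands the pixel to the classification part (L = L + 1);
    otherwise differentiation continues in the current layers.
    """
    if (dC == 0.0 or dH == 0.0) and C_ratio != 1.0 and H_ratio != 1.0:
        return "next_pixel"
    if psi > 0.0:
        return "classify"        # classification branch, network grows by one layer
    return "differentiate"       # keep differentiating the current pixel

print(pixel_step(0.0, 0.3, C_ratio=0.8, H_ratio=0.9, psi=0.0))  # next_pixel
```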

Before \(L=L+1\),

$$C=\left[\frac{dH}{dp\left(i,j\right)}+\frac{dC}{dp\left(i,j\right)}\right]-\left.\begin{array}{c}2\cdot\left[\frac{\left(i\cdot j\right)}{\left(L-x\right)}\right],\ \text{if}\ x>y\\ -2\cdot\left[\frac{\left(i\cdot j\right)}{\left(L-y\right)}\right],\ \text{if}\ y>x\end{array}\right\}$$

(3a)

The above two equations are defined based on invariant pixels \(\left(x>y\ \text{or}\ y>x\right)\) and varying \(L\). They consider the maximum possibilities for inconsistency and the need for \(L\). If \(L\) remains unconsidered, then the feature differentiation is repeated from the last \(L\), provided the complexity for \(\left(i=x\ \text{and}\ j=y\right)\) is high. Therefore, the inconsistency break is achieved when \(L\) is sufficient for \(I(x,y)\). This also prevents the same classification of the transformer network at \(\left(L-1\right)\) or \(L\). The \(\left(L+1\right)\) case is validated only if \(\psi>0\) holds; if \(\psi=0\), then \(C=1\) and no further differentiation is required. These estimations are used for identifying homogeneity-failing features for correlation, and such correlations are used for classification training. The infected regions exhibit either \(C\) or \(\psi\) or both between \(p\left(x,y\right)\) or the segments. This variation is required for identifying non-infected and infected regions more precisely, which requires classifications based on differentiations satisfying \(C\) and \(\psi\). In the classification process, the \(L\) and \(\left(L-1\right)\) variations are identified based on \(C\) and \(\psi\) for precise detection. The classification network is presented in Fig. 3.
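The piecewise correction of Eq. (3a) can be sketched directly. The square-image branch (\(x=y\)) is an assumption, since Eq. (3a) only covers \(x>y\) and \(y>x\); \(L>\max(x,y)\) is also assumed so the denominators stay positive.

```python
def supporting_content(dH, dC, i, j, L, x, y):
    """Eq. (3a): combine the two derivatives, then subtract a size-dependent
    correction chosen by whether the image is wider (x > y) or taller (y > x).
    Note the y > x branch subtracts a negative term, so it adds overall."""
    base = dH + dC
    if x > y:
        return base - 2.0 * (i * j) / (L - x)
    if y > x:
        return base + 2.0 * (i * j) / (L - y)
    return base  # x == y: unspecified in Eq. (3a); passthrough is an assumption

print(supporting_content(0.5, 0.5, i=1, j=1, L=5, x=3, y=2))  # 1.0 - 2/2 = 0.0
print(supporting_content(0.5, 0.5, i=1, j=1, L=5, x=2, y=3))  # 1.0 + 2/2 = 2.0
```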

Fig. 3

Transformer Network Classification.

The \(L\) inputs are distinguished \(\forall\left(i,j\right)\) until \(\left(x,y\right)\ \forall p\) is reached. The classification is performed for \(C\) and \(L\) individually, such that the corresponding condition for \(L=L+1\) is validated. The above conditions are validated from 1 to \(L\) for \(\frac{1}{dp\left(i,j\right)}\) with \(i=x\) and \(j=y\) using recurrent iterations. This iteration is pursued until \(x\) and \(y\) in all \(L\) are attained. In the alternate case of increasing \(L\), \(\psi>0\) is verified for \(\frac{dH}{dp\left(i,j\right)}\) until \(\left(L+j\right)\) and \(\left(L+i\right)=p\left(x,y\right)\) is reached. This alternating case is iterated only if \(\psi>0\) is obtained in any pursuing \(p\left(i,j\right)\ \forall\ i\in x\) and \(j\in y\). In this case, the \(dH\) output is verified for \(L\) up to the final \(\left(x,y\right)\) for further differentiation (Fig. 3). The above classification pursues two different trainings as discussed before. The training is performed for two distinct instances under 3 classifications:

$$\left.\begin{array}{cc}\left.\begin{array}{c}{O}_{1}=\frac{{C}_{1}}{1}\\ \underbrace{{O}_{L}=\frac{{C}_{L}}{L}-\frac{dH}{dp\left(i,j\right)}}_{\text{without }\psi}\\ \left(\text{or}\right)\\ {O}_{1}=\frac{{C}_{1}}{1}-\left(1-\frac{{\psi}_{1}}{L}\right)\\ \vdots\\ \underbrace{{O}_{L}=\frac{{C}_{L}}{L}-\left(1-\frac{{\psi}_{L}}{L}\right)}_{\text{with }\psi}\end{array}\right| & \begin{array}{c}\text{Classification 1}\\ {O}_{2}={\psi}_{1}-\frac{dC}{dp\left(i,j\right)}\cdot\frac{1}{L}\\ \vdots\\ \underbrace{{O}_{L+1}={\psi}_{L}-\frac{dC}{dp\left(i,j\right)}\cdot\frac{1}{L}}_{\psi>0\ \text{Case, Classification 2}}\end{array}\end{array}\right\}$$

(4)

In Eq. (4), the outputs \(O_1\) to \(O_L\) for \(\frac{dH}{dp\left(i,j\right)}\) and \(\psi_L\) denote classifications 1 and 2, respectively. These classifications are used for consistent pixel assessment before a new segment is pursued. In classification 3, \(\psi\) is observed between the 2nd layer and the \({\left(L+1\right)}^{th}\) layer. This classification case suppresses \(\psi>0\) with \(dH\)-to-\(dC\) differentiation for region detection. For alternating pixel inputs, the transformer network swaps between \(\left(O_1\ \text{to}\ O_L\right)\) and \(\left(O_2\ \text{and}\ O_{L+1}\right)\). This case is optimal under continuous pixel distributions with no identified region. Therefore, the features extracted between the classifications (1, 2, and 3) are validated for the \(\psi<0\) condition. This is required to confine the complexity under similar/dissimilar pixel distributions. When features cannot be differentiated, the network enters a self-correction process; the criteria for identifying such features are as follows.

Features are non-differentiable if their homogeneity values across segments are within a small margin (\(\Delta H\approx 0\)), indicating inadequate variation for accurate categorization. If the transformer network finds little or ambiguous variation in \(dC\) and \(dH\) during training, it labels these features as indeterminate. The 3 classification processes using the experimental analysis are represented in Table 2.
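The indeterminacy criterion above — little or ambiguous variation in \(dC\) and \(dH\) — can be sketched as a simple margin test. The margin value `eps` is an assumption: the paper states the criterion but no numeric threshold.

```python
def is_indeterminate(dC, dH, eps=1e-3):
    """Flag a feature as non-differentiable when both derivative magnitudes
    fall below a margin eps (delta-H approx. 0 in the text). The default
    eps is an assumed, illustrative value, not one given by the authors."""
    return abs(dC) < eps and abs(dH) < eps

print(is_indeterminate(0.0, 0.0))   # True: no variation in either derivative
print(is_indeterminate(0.5, 0.0))   # False: content still varies
```

A flagged feature is what triggers the self-correction pass rather than a classification decision.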

The process is common under different segments of \(i\ne j\), provided the region remains unnoticed. Therefore, the detected region is pursued using \(p(i,j)\notin\) either of the classifications, reducing complexity. Depending on the available unclassified pixels, the transformer network processes are modified abruptly. Such modification requires formal conditions between the 3 classifications. The conditions and their occurrences are represented in Eq. (5).

(5)

Table 2 Classification process illustrations from experimental Results.

The occurrences are marked as either 1 (if the classification is present) or 0 (if not classified), for which the transformer network operations are modified. In the \(C\) and \(L\) differentiation transformer networks, the intermediate classifications (i.e., \(O_2\) to \(O_{L+1}\)) are incorporated. Therefore, the network for unclassified pixel detection is cultured from the \(L=L+1\) and \(\psi>0\) layers. This layer is represented in Fig. 4.

Fig. 4

Unclassified pixel detection Network Layer.

The variable \(\Delta\) represents the occurrences as either true/false based on the \(O_2\) to \(O_{L+1}\) estimation. In this case, \(0<\Delta\) and \(\Delta<1\) are segregated using \(C\) and \(\psi\) for the last condition in Eq. (5). The first two conditions require \(L\) (or) \(L+1\) rather than retaining the previous \(O_L\). Therefore, \(i=x\) and \(j=y\) are the satisfying (terminating) conditions for \(O_{L+1}\). In this feature assessment process, the features that fail \(\frac{dC}{dp\left(i,j\right)}\) and \(\frac{dH}{dp\left(i,j\right)}\) are identified as abnormal. If the region surpasses the continuity issue, then a new \(p\left(i,j\right)\) with an increment from \(\left(L+1\right)\) is observed. This process requires a formal correction such that new cases are identified using the intermediate classification (refer to Fig. 4). The above network part classification is performed to augment precision under controlled complexity. This briefing is presented in the following section based on the condition in Eq. (5). In Table 3, the unclassified output from the experimental input is represented. Similarly, Table 4 presents the final output with the error measurement.

Table 3 Unclassified output Representations.
Table 4 \(\left(L+1\right)\) and final output.

The feature classification performed in the above-proposed method provides complexity reduction by preventing iterations for \(L\) (or) \(L+1\) for the \(C\) and \(\psi\) instances. This case is analyzed using the complexity analysis and its significance to the precision improvements. Therefore, two cases are considered eventually: the self-training of \(C=1\) and the \(\psi\) training as in Fig. 2. First, the intermediate classifications (unclassified) for the \(\psi\) and \(C\) sequences are differentiated. The differentiations are used for assessing the complexities individually. Thus, the differentiating unclassified outputs are given as:

$$\begin{array}{cc}\left.\Delta=\frac{dC}{dp\left(i,j\right)}+\frac{{O}_{L+1}}{L}-C\right| & \Delta=\frac{dH}{dp\left(i,j\right)}-\psi-\frac{L}{L+1}\end{array}$$

(6)

The above equation refers to the \(\Delta=1\) or \(0\) case under the varying segments wherein the occurrences are valid. Based on the cases discussed in Eq. (5), only the \(O_{L+1}\) extractions are considered for the complexity analysis. In the case of \(0<\Delta\) and \(\Delta<1\), the unclassified output does not cause complexity. This is inappropriate for validating the segment-correlated outputs, such that the complexity-raising cases are \(\left(L-1\right)\) to \(L\) and \(L\) to \(\left(L+1\right)\). In the first range, the layers are revisited for the same segment features. In the latter range, the segments are induced with \(O_L\) rather than \(O_{L+1}\), wherein the complexity is high. Therefore, the extracted features (up to the last \(p\)) experience two processes, \(\frac{dH}{dp\left(i,j\right)}\) and \(\frac{dC}{dp\left(i,j\right)}\), which increase the computation. Such computations retain the precision rather than increasing their order. The two cases are derived in equations (7a) and (7b), respectively.
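The two \(\Delta\) branches of Eq. (6), together with the \(0<\Delta<1\) no-complexity band described above, can be sketched as follows (the helper names are hypothetical):

```python
def delta_content(dC, O_L1, L, C):
    """Eq. (6), left branch: Delta from the content path, using the
    intermediate output O_{L+1}."""
    return dC + O_L1 / L - C


def delta_homogeneity(dH, psi, L):
    """Eq. (6), right branch: Delta from the homogeneity path, with the
    layer-ratio penalty L/(L+1)."""
    return dH - psi - L / (L + 1.0)


def causes_complexity(delta):
    """Per the text, 0 < Delta < 1 marks an unclassified output that does
    NOT add complexity; values outside that band do."""
    return not (0.0 < delta < 1.0)


print(delta_content(1.0, 2.0, L=4, C=0.5))   # 1.0 + 0.5 - 0.5 = 1.0
print(causes_complexity(0.5))                # False: inside the band
```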

$$\left.\begin{array}{c}\text{Feature with }L,\ \left(L+1\right)\ \text{Assessment},\\ \Delta=\frac{dC}{dp\left(i,j\right)}+\frac{{O}_{L+1}}{C}-\frac{1}{\psi}\\ \left(\text{or}\right)\\ \underbrace{\Delta=\frac{dH}{dp\left(i,j\right)}+\frac{1}{\psi}-\frac{C}{L}}_{\left(C,\psi\right)\ \text{cases}}\end{array}\right\}$$

(7a)

$$\left.\begin{array}{c}\text{Feature with }\left(L-1\right),\ L\ \text{Assessment}\\ \Delta=\frac{dC}{dp\left(i,j\right)}-\frac{{O}_{L}}{L}\\ \left(\text{or}\right)\\ \underbrace{\Delta=\frac{dH}{dp\left(i,j\right)}-\frac{\psi}{C}}_{\psi\ \text{case only}}\end{array}\right\}$$

(7b)

In the above validations, the \(\left(C,\psi\right)\) and \(\psi\) cases that cause complexity are handled. These two validations are not finalized until classification 1 or 2 [from Eq. (4)] is replaced by the above in the iteration presented in Fig. 2. Therefore, from Eqs. (6), (7a), and (7b), the complexity reduction process is defined as below:

$$\left.\begin{array}{cc}\left.\begin{array}{c}\frac{dC}{dp\left(i,j\right)}+\frac{{O}_{L+1}}{L}-C=\frac{dC}{dp\left(i,j\right)}-\frac{{O}_{L}}{L}\\ \underbrace{C=\frac{1}{L}\left({O}_{L+1}+{O}_{L}\right)}_{\text{for }C}\end{array}\right| & \begin{array}{c}\frac{dH}{dp\left(i,j\right)}-\psi-\frac{L}{\left(L+1\right)}=\frac{dH}{dp\left(i,j\right)}-\frac{\psi}{C}\\ \underbrace{\psi\left(\frac{1}{C}-1\right)=\frac{L}{L+1}}_{\text{for }\psi}\end{array}\end{array}\right\}$$

(8)

Out of the 4 matching possibilities between equations (7a), (7b), and (6), the above extractions are identified as optimal. This optimality induces less complexity compared with the other 2 cases. Therefore, the complexity-increasing features for continuous/segmented features are validated across multiple possibilities from the transformer network. As discussed before, if both the \(\psi\) and \(C=1\) iterations persist, then the complexity reduction follows the above, for which either the \(C\) or the \(\psi\) iteration is performed. For single segments of varying size, the transformer network performs either the \(\frac{1}{L}\left(O_{L+1}+O_L\right)\) or the \(\left(\frac{L}{L+1}\right)\) iteration to reduce extended complexity. Therefore, the identified \(C\) and \(\psi\) improve the precision for similar segments and features. This means \(L\) is either increased or reduced for a classified feature unless it remains unclassified. The transformer network activates the iterative self-correction process when undifferentiated features are encountered, signaling ambiguity, with the differentiation metrics \(dC\) and \(dH\) below a predetermined threshold. An adaptive learning rate increases the attention weights as the network returns to the last distinguished section during training. It then reprocesses the uncertain region by adding contextual segments to the receptive field for better distinction. If segment differentiation fails during testing, the model keeps the homogeneity-based categorization and refines locally. This pass uses contextual patterns to re-evaluate segment transitions to determine the best classification.
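The two closed forms extracted in Eq. (8) can be evaluated numerically as below; the helper names are hypothetical, and `reduced_psi` assumes \(C\ne 1\), since at \(C=1\) the \(\psi\) branch of Eq. (8) has no solution.

```python
def reduced_content(O_L1, O_L, L):
    """Eq. (8), content branch: equating the Delta of Eq. (6) with that of
    Eq. (7b) gives C = (O_{L+1} + O_L) / L."""
    return (O_L1 + O_L) / L


def reduced_psi(C, L):
    """Eq. (8), homogeneity branch: psi * (1/C - 1) = L / (L + 1),
    solved for psi. Valid only for C != 1 (assumption noted above)."""
    return (L / (L + 1.0)) / (1.0 / C - 1.0)


print(reduced_content(2.0, 2.0, L=4))  # (2 + 2) / 4 = 1.0
print(reduced_psi(0.5, L=4))           # (4/5) / (2 - 1) = 0.8
```

These are the single-iteration quantities the text cites, \(\frac{1}{L}\left(O_{L+1}+O_L\right)\) and \(\frac{L}{L+1}\), so each segment needs only one of the two updates instead of both derivative passes.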
