Deep learning-driven IoT solution for smart tomato farming

The proposed smart platform was implemented in a controlled greenhouse environment. The sensor network successfully collected real-time data on soil moisture, temperature, and humidity. The ESP32 transmitted this data wirelessly to the ThingsBoard platform, where it was visualized on a user-friendly dashboard, as shown in Fig. 9. The Raspberry Pi captured images of tomatoes at regular intervals. The YOLOv8 model accurately detected tomatoes in the images and classified them according to their ripening stage (green, half-ripened, fully ripened). The processed image data, including bounding boxes and ripening stage labels, was transmitted to the ThingsBoard platform.
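The firmware itself is not listed in this section; as an illustrative sketch (in Python rather than the ESP32's C firmware), building the telemetry payload for ThingsBoard's device HTTP API could look as follows. The field names, host, and access token are placeholder assumptions, not the paper's actual schema:

```python
import json

def build_telemetry(soil_moisture: float, temperature: float, humidity: float) -> str:
    """Build the JSON body that ThingsBoard's device telemetry endpoint
    (POST /api/v1/<ACCESS_TOKEN>/telemetry) accepts; key names here are
    illustrative placeholders."""
    return json.dumps({
        "soil_moisture": soil_moisture,
        "temperature": temperature,
        "humidity": humidity,
    })

# Sending it (placeholder host/token; the network call is commented out on purpose):
# import urllib.request
# req = urllib.request.Request(
#     "http://<thingsboard-host>/api/v1/<ACCESS_TOKEN>/telemetry",
#     data=build_telemetry(41.2, 27.5, 63.0).encode(),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```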

Fig. 9

Serial monitor of Arduino IDE.

Image with ripening stage detection: Convolutional neural network model

The YOLOv8 model accurately detected the ripening stages of the tomatoes. The Raspberry Pi captured images of the tomatoes at different stages of ripeness, and the model classified them into green, half-ripened, and fully ripened categories with high accuracy. Figure 10 presents an example image captured by the Raspberry Pi camera. The image overlays bounding boxes around detected tomatoes with labels indicating their respective ripening stages (green, half-ripened, or fully ripened) predicted by the YOLOv8 model. This visual representation provides valuable information for harvest planning and resource allocation, as shown in Figs. 11 and 12.
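The paper does not show its post-processing code; a minimal sketch of how per-frame detections could be aggregated into ripening-stage counts for the dashboard, assuming each detection arrives as a (class_name, confidence) pair with the b_/l_ class prefixes used later in the paper:

```python
from collections import Counter

STAGES = ("green", "half_ripened", "fully_ripened")

def count_stages(detections, conf_threshold=0.5):
    """Aggregate (class_name, confidence) detections into per-stage counts,
    discarding low-confidence boxes."""
    counts = Counter()
    for class_name, conf in detections:
        if conf < conf_threshold:
            continue
        # Strip the variety prefix, e.g. 'l_green' -> 'green'.
        stage = class_name.split("_", 1)[1]
        if stage in STAGES:
            counts[stage] += 1
    return dict(counts)
```

For example, `count_stages([("l_green", 0.9), ("b_half_ripened", 0.7)])` yields `{"green": 1, "half_ripened": 1}`.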

Fig. 10

Ripening stages of tomatoes [5].

Fig. 11
Fig. 12

Detection of ripening stages with bounding-box classification.

Confusion matrix

To examine the effectiveness of the YOLOv8n model, a confusion matrix was created (Figs. 17, 18, 19). The matrix displays the predicted versus actual classes, allowing us to assess the model’s accuracy for each class. The results show that the model performs well in distinguishing between different ripeness stages. For instance, ‘l_green’ instances are mostly correctly classified, while some misclassifications occur between ‘b_half_ripened’ and ‘b_green’. The confusion matrix reveals that the ‘l_green’ class has the highest number of correct predictions, with a few instances of misclassification in other categories. This indicates that the model can accurately identify green tomatoes, but there is some room for improvement in distinguishing between half-ripened and fully ripened stages.

Model performance

Detection accuracy

The model demonstrates strong object detection capability across all six classes, with the highest accuracy in detecting l_green and l_fully_ripened tomatoes. Detection performance is slightly lower for b_fully_ripened and b_half_ripened, which can be attributed to the lower number of annotated samples.

Table 4 reports the precision, recall, F1-score, and support for each class. Precision indicates the accuracy of the positive predictions made by the model, recall indicates how well the model identifies all actual positives, and the F1-score is the harmonic mean of precision and recall. Based on the evaluation metrics from Fig. 13, the model achieved an accuracy of 52% on the test data.

Table 4 Confusion matrix evaluation results of the ripening detection model.
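The per-class figures in Table 4 follow directly from the confusion matrix. As a sketch (with a small illustrative matrix, not the paper's actual counts), the metrics can be computed as:

```python
def per_class_metrics(cm, labels):
    """Per-class precision, recall and F1 from a square confusion matrix,
    where cm[i][j] counts samples of true class i predicted as class j."""
    n = len(labels)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true other
        fn = sum(cm[k][j] for j in range(n)) - tp  # true k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[labels[k]] = (precision, recall, f1)
    return metrics

# Illustrative two-class example: 8/11 precision and 8/10 recall for 'l_green'.
m = per_class_metrics([[8, 2], [3, 7]], ["l_green", "b_green"])
```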

Visual prediction result

  • The output detection images clearly show accurate bounding boxes and class labels overlaid on test images, confirming the model’s robustness under varied lighting and angle conditions.

  • Even in dense clusters or partial occlusions, the YOLOv8n model maintained reliable detections, making it viable for real-time agricultural applications.

Fig. 13

Confusion matrix of the model.

Cloud dashboard: visualization and analysis

Figures 14 and 15 depict the ThingsBoard dashboard displaying real-time sensor data from the greenhouse. The dashboard includes widgets showcasing the current values of soil moisture, temperature, and humidity. Users can customize the dashboard to display historical data in the form of graphs and charts, providing insights into trends and fluctuations over time. Analysing these trends enables farmers to identify potential issues, such as rising temperatures or declining soil moisture levels, and take corrective actions before they negatively impact crop health.

Fig. 14
Fig. 15

Data visualization

Figure 16 illustrates a sample graph generated from the sensor data collected by the ThingsBoard platform. The graph depicts the variations in temperature within the greenhouse over a specific period. The cloud-based dashboard provided a comprehensive view of the sensor data and ripening stages. The dashboard displayed real-time graphs of soil moisture, temperature, and humidity levels, as well as images of the tomatoes with their respective ripening stages.

Fig. 16

Graphs of data collected by sensors in ThingsBoard.

Analysis

In this section, we present the results and analysis of the YOLOv8 model trained to detect the ripening stages of tomatoes. Various visualizations and metrics have been used to evaluate the model’s performance.

The instance distribution graph (Fig. 17) shows the number of instances for each class (‘b_fully_ripened’, ‘b_half_ripened’, ‘b_green’, ‘l_fully_ripened’, ‘l_half_ripened’, ‘l_green’). It highlights that the ‘l_green’ class has the highest number of instances, followed by ‘b_green’. The bounding box distributions (Fig. 18) display the normalized x and y coordinates, showing the density and distribution of bounding boxes in the images. The width and height distributions (Fig. 19) indicate the range of sizes of the bounding boxes, revealing a correlation between width and height.

Fig. 17

Instance distribution graph.

Fig. 18

Bounding box distribution.

Fig. 19

Width and height distributions.

Figure 20 presents the training and validation loss curves over 20 epochs. The training and validation box loss, class loss, and DFL loss are plotted. The results indicate a steady decrease in losses, suggesting that the model is learning effectively. Metrics such as precision, recall, mAP50, and mAP50-95 are also displayed, showing improvements as training progresses. These metrics are critical for evaluating the model’s performance in detecting and classifying tomato ripeness stages.

Fig. 20

Training and validation loss curves.

Energy consumption

Understanding the energy consumption of hardware components is critical for designing efficient and sustainable systems, especially in applications like smart greenhouses where devices are continuously operational. This section presents a detailed analysis of the energy requirements of the Raspberry Pi 3 Model B and the ESP32 microcontroller, two essential components of the smart greenhouse system.

Energy consumption of ESP32 [8]

The ESP32 microcontroller is designed for low-power applications and offers significant energy efficiency. It operates at a voltage of 3.3 V and has distinct power consumption modes depending on its activity:

Active mode

When actively processing data or transmitting information, the ESP32 consumes approximately 160 mA.

The power consumption in this mode is calculated as

$$P_{\mathrm{active}} = V \times I = 3.3\ \mathrm{V} \times 160\ \mathrm{mA} = 0.528\ \mathrm{W}$$

(1)

Sleep mode

In deep sleep mode, the ESP32 significantly reduces its current draw to about 10 µA. The power consumption during sleep is:

$$P_{\mathrm{sleep}} = V \times I = 3.3\ \mathrm{V} \times 10\ \mu\mathrm{A} = 0.033\ \mathrm{mW}$$

(2)

Assuming the ESP32 is active for 12 h a day and in deep sleep for the remaining 12 h, the daily energy consumption is:

$$\text{Active energy: } E_{\mathrm{active}} = 0.528\ \mathrm{W} \times 12\ \mathrm{h} = 6.336\ \mathrm{Wh}$$

(3)

$$\text{Sleep energy: } E_{\mathrm{sleep}} = 0.033\ \mathrm{mW} \times 12\ \mathrm{h} = 0.000396\ \mathrm{Wh}$$

(4)

Total daily energy consumption

$$E_{\mathrm{total}} = 6.336\ \mathrm{Wh} + 0.000396\ \mathrm{Wh} \approx 6.336\ \mathrm{Wh}$$

(5)

Wi-Fi transmission power

Wi-Fi communication is a major factor affecting ESP32’s power usage. The ESP32 uses ~ 260 mA when transmitting data over Wi-Fi, significantly increasing its power draw. To estimate the additional energy consumption:

$$P_{\text{Wi-Fi}} = V \times I = 3.3\ \mathrm{V} \times 260\ \mathrm{mA} = 0.858\ \mathrm{W}$$

(6)

Wi-Fi usage scenario: assuming the ESP32 transmits data for 15 min per hour throughout a 12-hour active period:

$$\text{Wi-Fi energy: } E_{\text{Wi-Fi}} = 0.858\ \mathrm{W} \times 3\ \mathrm{h} = 2.574\ \mathrm{Wh/day}$$

(7)

Revised total ESP32 energy consumption:

$$E_{\mathrm{total}} = 6.336\ \mathrm{Wh} + 2.574\ \mathrm{Wh} = 8.910\ \mathrm{Wh/day}$$

(8)
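Equations (1)–(8) reduce to a duty-cycle sum of P × t terms. A short sketch reproducing the ESP32 figures above:

```python
def daily_energy_wh(modes):
    """Daily energy in Wh: sum of power (W) x duration (h) over modes."""
    return sum(p * t for p, t in modes)

V = 3.3  # ESP32 supply voltage in volts
esp32_wh = daily_energy_wh([
    (V * 0.160, 12),   # active: 160 mA for 12 h      -> 6.336 Wh
    (V * 10e-6, 12),   # deep sleep: 10 uA for 12 h   -> ~0.0004 Wh
    (V * 0.260, 3),    # Wi-Fi TX: 260 mA, 15 min/h over 12 h = 3 h
])
print(round(esp32_wh, 2))  # -> 8.91, matching Eq. (8)
```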

Energy consumption of Raspberry Pi 3 B [7]

The Raspberry Pi 3 Model B is intended for more demanding computational operations and runs at a higher voltage of 5 V. The way it operates affects how much power it uses:

Active mode

When fully operational, the Raspberry Pi consumes approximately 500 mA. The power consumption is:

$$P_{\mathrm{active}} = V \times I = 5\ \mathrm{V} \times 500\ \mathrm{mA} = 2.5\ \mathrm{W}$$

(9)

Idle mode

When in idle mode, the Raspberry Pi’s current draw drops to about 400 mA. The power consumption is:

$$P_{\mathrm{idle}} = V \times I = 5\ \mathrm{V} \times 400\ \mathrm{mA} = 2\ \mathrm{W}$$

(10)

For a continuous operational period of 24 h, the daily energy consumption is:

$$\text{Active energy: } E_{\mathrm{active}} = 2.5\ \mathrm{W} \times 24\ \mathrm{h} = 60\ \mathrm{Wh}$$

(11)

Wi-Fi power consumption

The Raspberry Pi’s Wi-Fi transmission draws an additional ~300 mA, leading to increased consumption.

$$\text{Additional power due to Wi-Fi: } P_{\text{Wi-Fi}} = 5\ \mathrm{V} \times 300\ \mathrm{mA} = 1.5\ \mathrm{W}$$

(12)

$$\text{Estimated additional energy: } E_{\text{Wi-Fi}} = 1.5\ \mathrm{W} \times 12\ \mathrm{h} = 18\ \mathrm{Wh/day}$$

(13)

Total energy consumption

$$E_{\mathrm{total}} = 60\ \mathrm{Wh} + 18\ \mathrm{Wh} = 78\ \mathrm{Wh/day}$$

(14)

The energy consumption comparison between the ESP32 and the Raspberry Pi reveals substantial differences as shown in Fig. 21.

ESP32

Consumes about 8.91 Wh/day when active for 12 h and in sleep mode for 12 h. This low energy requirement highlights its suitability for battery-powered and energy-efficient applications.

Raspberry Pi

Consumes approximately 78 Wh/day when continuously active. This higher energy consumption reflects its more intensive computational capabilities and constant operation.

Figure 21 shows the comparative energy usage, highlighting the need for power-efficient alternatives in future deployments (e.g., LoRa, Edge TPU).
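For reference, the comparison in Fig. 21 follows directly from the two daily totals derived above; a quick check of the ratio:

```python
V = 5.0  # Raspberry Pi supply voltage in volts
rpi_wh = V * 0.500 * 24 + V * 0.300 * 12  # active 24 h + Wi-Fi 12 h, per Eqs. (11)-(14)
esp32_wh = 8.91                           # ESP32 daily total, per Eq. (8)

ratio = rpi_wh / esp32_wh
print(round(rpi_wh), round(ratio, 1))  # -> 78 8.8
```

The roughly ninefold gap is what motivates reserving the Raspberry Pi for the vision workload while the ESP32 handles continuous sensing.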

Fig. 21

Energy consumption of the devices.

The proposed work considered a 9 V Hi-Watt battery as a potential power source. While we have not physically tested this battery, we analyzed its theoretical performance based on known specifications and expected power consumption patterns. The Hi-Watt 9 V battery is a commonly available dry-cell battery with an estimated 500 mAh capacity, providing approximately 4.5 Wh of total energy.

Since the ESP32 operates at 3.3 V, a voltage regulation circuit (such as an LDO or buck converter) would be required to step down the voltage. This introduces additional power losses, reducing the effective energy available to the ESP32. Moreover, battery performance is influenced by nonlinear discharge characteristics, meaning that as the battery depletes, its voltage output gradually drops, affecting system stability.

The analysis highlights the importance of selecting a battery based on actual usage requirements rather than theoretical capacity alone. Factors like duty cycle, voltage regulation losses, and environmental conditions can significantly impact battery life. If a longer operational period is required, alternative power solutions such as higher-capacity Li-ion or Li-Po batteries, energy harvesting modules, or solar-powered setups could be considered.

The battery depletion curve for the ESP32 shown in Fig. 22 is not linear due to several factors, including internal resistance, chemical reaction kinetics, and discharge rate effects. As seen in the graph, the depletion follows an exponential decay trend rather than a straight-line decline. Initially, energy drains more rapidly due to higher available charge, but as the battery nears depletion, the rate of voltage drop slows down.

Fig. 22

ESP32 battery depletion over time.

The ESP32’s energy consumption varies depending on its operating mode. In active transmission states (Wi-Fi/Bluetooth operations), power draw is significantly higher, whereas in deep sleep modes, the current consumption is minimal. The graph reflects an estimated depletion time of around 17–18 h, assuming a mixed operation profile. Once the battery level reaches the low battery threshold (20%), the system may enter power-saving modes or shut down due to insufficient voltage supply.
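The qualitative decay behavior described above can be sketched with a simple exponential model. The time constant below is an illustrative assumption, chosen so the 20% low-battery threshold is crossed in the 17–18 h window shown in Fig. 22, not a value fitted to measurements:

```python
import math

def time_to_threshold(tau_hours, threshold=0.20):
    """Solve remaining(t) = exp(-t / tau) = threshold for t (hours)."""
    return -tau_hours * math.log(threshold)

# Illustrative time constant; the 20 % threshold is crossed at ~17.5 h.
t_low = time_to_threshold(tau_hours=10.9)
```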

Understanding throughput vs. delay

Throughput refers to the rate at which data is successfully transmitted and received over the network, typically measured in kilobits per second (Kbps) or megabits per second (Mbps). Delay, on the other hand, represents the latency or time taken for data to reach the destination, which includes processing, queuing, and transmission delays.

In an ideal scenario, high throughput and low delay are desirable for efficient data transmission. However, real-world environments introduce factors such as:

  • Network Congestion: High data traffic may lead to increased queuing delays.

  • Wi-Fi Signal Strength: Weak signal strength can reduce data rates and increase retransmissions.

  • Cloud Processing Time: The ThingsBoard platform introduces an additional delay due to server-side data processing and dashboard updates.

From the throughput vs. delay graph shown in Fig. 23, it is evident that as throughput increases, delays tend to rise beyond a certain threshold. This can be attributed to limited bandwidth and network contention, where a higher data rate results in increased packet queuing and retransmissions. The ESP32, operating on a typical 2.4 GHz Wi-Fi connection, experiences latencies ranging from a few milliseconds to several hundred milliseconds, depending on environmental interference and cloud response times.
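This rising-delay behavior can be illustrated with a standard M/M/1 queueing approximation (our simplification, not a model used in the paper): the average delay W = 1/(μ − λ) grows without bound as the offered throughput λ approaches the link's service rate μ:

```python
def mm1_delay_ms(throughput_kbps, capacity_kbps, packet_kb=1.0):
    """Average M/M/1 sojourn time in ms for a link of the given capacity;
    returns infinity once the offered load saturates the link."""
    mu = capacity_kbps / packet_kb     # service rate, packets/s
    lam = throughput_kbps / packet_kb  # arrival rate, packets/s
    if lam >= mu:
        return float("inf")
    return 1000.0 / (mu - lam)         # seconds -> milliseconds

# Delay climbs from ~1.1 ms at light load to 100 ms near saturation.
delays = [mm1_delay_ms(x, capacity_kbps=1000) for x in (100, 500, 900, 990)]
```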

Fig. 23

Throughput vs. delay for the ESP32 over 2.4 GHz Wi-Fi.
