Accurate electricity demand forecasting is crucial for the stable operation of smart grids, as it enables proactive resource allocation and prevents grid failures caused by demand–supply mismatches. Precise prediction, however, requires modeling both the temporal consumption patterns and the peak variations in electricity usage data. Regional power consumption data may contain commercially sensitive information that cannot be freely shared, and federated learning (FL) offers a privacy-preserving way to address the resulting data scarcity. Nevertheless, existing FL approaches suffer from two critical limitations: (1) an inherent risk of overfitting when modeling peak demand variations from sparse client-side data, and (2) the loss of client-specific features during aggregation, where parameter inconsistencies across local models can over-smooth the predictions for some clients. To overcome these challenges, this paper proposes a Federated Two-Edge Graph Attention Network with Weighted Global Aggregation (FapDGN) for electricity demand forecasting. FapDGN first constructs a hybrid feature representation that captures both the temporal dynamics and the numerical fluctuations of electricity consumption. Because temporal characteristics are crucial for prediction accuracy while peak variations carry a higher risk of overfitting, the framework processes these two components separately using a two-edge graph structure. Specifically, it combines temporal edges with a multi-scale attention mechanism to capture consumption trends over time, and it uses numerical structure edges with a dynamic covariance to represent peak variations as parameterized Gaussian distributions, which mitigates overfitting. Finally, the model fuses the extracted temporal and peak-variation features to produce its predictions.
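The two-edge idea can be illustrated with a minimal sketch, assuming a univariate load series. It builds temporal edges at a few hypothetical lag scales and numerical-structure edges between time steps whose loads lie within a hypothetical tolerance `tol`, then summarizes each step's numerical neighborhood as a Gaussian (mean and standard deviation) and draws a sample with the reparameterization trick. The function names, the lag set, and the per-node variance (a diagonal simplification of the dynamic covariance) are illustrative assumptions rather than the paper's implementation; the multi-scale attention and all learned parameters are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def temporal_edges(n: int, scales: Sequence[int] = (1, 2, 4)) -> List[Edge]:
    """Hypothetical multi-scale temporal edges: link step t-s -> t for every lag s."""
    return [(t - s, t) for s in scales for t in range(s, n)]


def numerical_edges(x: Sequence[float], tol: float) -> List[Edge]:
    """Hypothetical numerical-structure edges: link steps whose loads differ by <= tol."""
    n = len(x)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(x[i] - x[j]) <= tol]


def gaussian_params(x: Sequence[float], edges: Sequence[Edge]) -> List[Tuple[float, float]]:
    """Per-node (mean, std) over the node and its numerical neighbours.
    A diagonal stand-in (assumption) for the paper's dynamic covariance."""
    groups = [[v] for v in x]  # each node starts with its own value
    for i, j in edges:
        groups[i].append(x[j])
        groups[j].append(x[i])
    params = []
    for g in groups:
        mu = sum(g) / len(g)
        var = sum((v - mu) ** 2 for v in g) / len(g)
        params.append((mu, math.sqrt(var)))
    return params


def sample_peaks(params: Sequence[Tuple[float, float]], rng: random.Random) -> List[float]:
    """Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [mu + sigma * rng.gauss(0.0, 1.0) for mu, sigma in params]


if __name__ == "__main__":
    load = [1.0, 1.1, 5.0, 5.05, 3.0]  # toy series with two value clusters
    print(temporal_edges(len(load), scales=(1, 2)))
    num = numerical_edges(load, tol=0.2)
    print(num)  # [(0, 1), (2, 3)]
    params = gaussian_params(load, num)
    print(sample_peaks(params, random.Random(0)))
```

Sampling around a neighborhood mean, instead of fitting every peak value exactly, acts as noise injection scaled by the local spread; under these assumptions, that is one plausible reason a Gaussian treatment of peaks can reduce overfitting on sparse client data.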
Furthermore, to counter over-smoothing, FapDGN employs a similarity-based adaptive dynamic fusion mechanism that weights client parameters when the server aggregates them into the global model. Experimental results show that FapDGN outperforms commonly used FL methods in electricity demand forecasting.
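One possible reading of similarity-based weighted aggregation is sketched below, assuming each client's model is flattened into a parameter vector. The server scores every client by the cosine similarity between its parameters and the plain average, converts the scores into weights with a temperature-scaled softmax, and forms the global model as the weighted average. The scoring rule, the temperature `tau`, and the function names are hypothetical; the paper's adaptive dynamic fusion may compute similarities and weights differently.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similarity_weights(clients: List[List[float]], tau: float = 1.0) -> List[float]:
    """Hypothetical adaptive weights: softmax(cos(theta_k, mean) / tau)."""
    k, dim = len(clients), len(clients[0])
    mean = [sum(c[d] for c in clients) / k for d in range(dim)]
    scores = [cosine(c, mean) / tau for c in clients]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(clients: List[List[float]], tau: float = 1.0) -> List[float]:
    """Global parameters as the similarity-weighted average of client parameters."""
    w = similarity_weights(clients, tau)
    dim = len(clients[0])
    return [sum(wk * c[d] for wk, c in zip(w, clients)) for d in range(dim)]


if __name__ == "__main__":
    clients = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(similarity_weights(clients))  # the third client gets the smallest weight
    print(aggregate(clients))
```

As `tau` grows, the weights approach uniform averaging; as it shrinks, clients whose parameters diverge from the consensus contribute less, which under this reading limits how much inconsistent parameters blur the global model. Client sample counts, which standard FedAvg uses as weights, are omitted for brevity.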