
HIERARCHICAL GNN FRAMEWORK FOR ENERGY-AWARE SCHEDULING IN GPU-ACCELERATED DISTRIBUTED SYSTEMS


Volume 2, Issue 2, pp. 69-79, 2025

DOI: https://doi.org/10.61784/adsj3027

Author(s)

Isabella Marino, Lukas Hoffmann*

Affiliation(s)

Department of Computer Science, Technical University of Munich (TUM), Germany.

Corresponding Author

Lukas Hoffmann

ABSTRACT

Graphics Processing Units (GPUs) have become indispensable computational accelerators in modern distributed computing systems, powering applications ranging from deep learning to scientific simulations. However, the increasing computational demands and energy consumption of GPU-accelerated systems pose significant challenges for resource management and scheduling. Traditional scheduling algorithms often fail to capture the complex hierarchical structure and dynamic dependencies inherent in distributed GPU environments, leading to suboptimal energy efficiency and performance degradation. This paper proposes a novel hierarchical Graph Neural Network (GNN) framework for energy-aware scheduling in GPU-accelerated distributed systems. The framework leverages the representational power of GNNs to model the complex interactions among computational tasks, GPU resources, and energy constraints at multiple hierarchical levels. By incorporating graph-structured representations of workload dependencies, resource topologies, and energy profiles, the framework enables adaptive scheduling decisions that jointly optimize task completion time and energy consumption. Experimental results demonstrate that the proposed approach achieves up to a 36 percent reduction in energy consumption compared to state-of-the-art scheduling methods while maintaining quality-of-service requirements. The hierarchical architecture captures both fine-grained GPU-level characteristics and coarse-grained cluster-level dynamics, enabling scalable and efficient scheduling for large-scale distributed systems.
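To make the two-level design described in the abstract concrete, the following minimal PyTorch sketch shows one way a hierarchical GNN scheduler of this kind could be structured. Only the abstract is available on this page, so the layer sizes, the mean-aggregation message passing, the cluster pooling scheme, and the lambda-weighted time/energy objective below are illustrative assumptions, not the authors' actual architecture.

```python
# Illustrative sketch only: the paper's architecture is not specified on this
# page, so all module names, dimensions, and the scalarised time/energy
# objective are assumptions made for exposition.

import torch
import torch.nn as nn


class GraphLayer(nn.Module):
    """One round of mean-aggregation message passing over a dense adjacency matrix."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: [N, in_dim], adj: [N, N] with 1.0 where an edge exists
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adj @ x / deg  # mean of neighbour features
        return torch.relu(self.lin_self(x) + self.lin_neigh(neigh))


class HierarchicalScheduler(nn.Module):
    """Two-level GNN: GPU-level embeddings are pooled per cluster, refined at
    the cluster level, and broadcast back to score (task, GPU) placements."""

    def __init__(self, gpu_dim, task_dim, hidden=64):
        super().__init__()
        self.gpu_gnn = GraphLayer(gpu_dim, hidden)
        self.cluster_gnn = GraphLayer(hidden, hidden)
        self.task_enc = nn.Linear(task_dim, hidden)
        self.score = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, gpu_x, gpu_adj, cluster_adj, assign, task_x):
        # gpu_x:  [G, gpu_dim] per-GPU features (utilisation, power draw, ...)
        # assign: [C, G] 0/1 membership of GPUs in clusters
        h_gpu = self.gpu_gnn(gpu_x, gpu_adj)                        # fine-grained level
        pooled = assign @ h_gpu / assign.sum(1, keepdim=True).clamp(min=1.0)
        h_cluster = self.cluster_gnn(pooled, cluster_adj)           # coarse-grained level
        h_ctx = assign.t() @ h_cluster                              # broadcast back to GPUs
        t = self.task_enc(task_x)                                   # [T, hidden]
        T, G = t.size(0), h_gpu.size(0)
        pair = torch.cat(
            [t.unsqueeze(1).expand(T, G, -1),
             h_gpu.unsqueeze(0).expand(T, G, -1),
             h_ctx.unsqueeze(0).expand(T, G, -1)], dim=-1)
        return self.score(pair).squeeze(-1)                         # [T, G] placement logits


def joint_objective(pred_time, pred_energy, lam=0.5):
    # Hypothetical scalarisation of the time/energy trade-off described in the
    # abstract; lam would be tuned against the quality-of-service constraints.
    return lam * pred_energy + (1.0 - lam) * pred_time
```

In use, the [tasks x GPUs] score matrix could be converted into placement decisions, for example by a per-task argmax or by training the scorer as a policy against the joint time/energy objective; the actual decision and training procedure used in the paper is not described on this page.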

KEYWORDS

Graph neural networks; Energy-aware scheduling; GPU computing; Distributed systems; Hierarchical framework; Resource management; Deep learning

CITE THIS PAPER

Isabella Marino, Lukas Hoffmann. Hierarchical GNN framework for energy-aware scheduling in GPU-accelerated distributed systems. AI and Data Science Journal. 2025, 2(2): 69-79. DOI: https://doi.org/10.61784/adsj3027.

