A Review of Fault Tolerance Techniques in Generative Multi-Agent Systems for Real-Time Applications

Authors

  • Subhan Uddin, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
  • Babar Hussain, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
  • Sidra Fareed, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
  • Aqsa Arif, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
  • Babar Ali, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China

DOI:

https://doi.org/10.64229/d8g06y36

Keywords:

Generative Agents, Rollback-Recovery, Human Behavior, Message-Passing Systems, Interactive Simulacra

Abstract

Low-light image enhancement remains a significant challenge in real-world computer vision applications, particularly in environments where lighting conditions are unpredictable and paired low-/normal-light training images are unavailable. Inconsistent illumination, sensor noise, and non-uniform exposure settings severely degrade image quality, adversely affecting both human perception and the performance of downstream vision tasks such as object detection, recognition, and tracking. Although transformer-based models have recently demonstrated promising results on benchmark datasets under controlled conditions, their generalization to naturally degraded real-world images is still limited. Most existing models rely on synthetically paired datasets, which do not adequately capture the complexity and variability found in real-world low-light scenes. This paper introduces a novel approach to adapt a Retinex-inspired transformer model, Retinexformer, for real-world low-light image enhancement using unpaired data. We propose a domain adaptation framework that leverages unsupervised reconstruction losses, perceptual feature alignment, and domain-invariant regularization to fine-tune the model on naturally captured images. Our training pipeline eliminates the need for paired supervision, making the model more scalable and deployable across diverse applications. Extensive experiments on public real-world datasets such as ExDark and See-in-the-Dark (SID) demonstrate that our adapted model achieves superior performance in terms of both perceptual quality and quantitative metrics (PSNR, SSIM, LPIPS) when compared to baseline and state-of-the-art enhancement methods. Moreover, the proposed strategy significantly improves robustness in challenging low-light conditions while maintaining computational efficiency.
Our findings show the potential of unpaired, real-world adaptation of transformer-based architectures for practical low-light imaging tasks, including night photography, surveillance, and mobile vision systems.

Published

2025-07-03

Section

Articles