US20250299059
2025-09-25
Physics
G06N3/092
An AI foundation model is introduced that is designed specifically for the environmental, social, and governance (ESG) domain. It uses a Transformer-based architecture with approximately 30 billion parameters and supports context windows of up to 128,000 tokens, a capacity that is crucial for analyzing long ESG documents such as complete sustainability reports and policies. The model integrates textual and visual data through gated cross-attention and a Mixture-of-Experts (MoE) architecture, which improves its understanding of multimodal context.
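The Mixture-of-Experts idea can be illustrated with a minimal top-k routing sketch. It assumes a linear router and top-2 selection with softmax gates renormalized over the chosen experts, a common MoE design. The patent does not state its expert count, k, or router form, so `route_top_k`, `moe_layer`, and the toy experts below are hypothetical stand-ins for per-expert feed-forward networks.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(x: Vector, router_weights: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Score every expert with a (hypothetical) linear router, keep the
    k highest-scoring experts, and renormalize their gates with softmax."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(x: Vector, experts: Sequence[Expert],
              router_weights: Sequence[Vector], k: int = 2) -> Vector:
    """Sparse MoE: only the k selected experts run; their outputs are
    combined as a gate-weighted sum."""
    out = [0.0] * len(x)
    for idx, gate in route_top_k(x, router_weights, k):
        y = experts[idx](x)
        out = [o + gate * y_i for o, y_i in zip(out, y)]
    return out


# Toy experts (hypothetical) standing in for per-expert feed-forward networks.
experts: List[Expert] = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 3.0 for a in v],
    lambda v: [0.0 for _ in v],
]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
print(moe_layer([1.0, 1.0], experts, router_weights, k=2))  # prints [3.0, 3.0]
```

Because only k experts execute per token, the compute spent on each token is a fraction of what the total parameter count would suggest, which is one common reason to pair MoE layers with a large model.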
Conventional AI models are limited on ESG tasks because they are trained for general purposes and have short context windows. They often fail to process lengthy ESG-specific documents accurately or to integrate visual data with textual analysis. Moreover, typical adaptation techniques adjust only a subset of the model's parameters, which leads to incomplete domain adaptation. The new model addresses these issues by fully fine-tuning all of its parameters for the ESG domain, aiming for a more complete understanding and analysis of ESG material.
ESG analysis often requires integrating text and visual data. For instance, environmental assessments may rely on satellite imagery, while corporate reports include charts. Conventional models cannot process such multimodal data cohesively. The new model combines a vision encoder with a language model, so it can correlate textual descriptions with visual evidence and analyze complex ESG topics holistically.
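One way to connect a vision encoder to a language model is a tanh-gated cross-attention layer in the style popularized by DeepMind's Flamingo. The minimal sketch below assumes that text hidden states query vision-encoder outputs of the same width and omits the learned query/key/value projections. The patent does not specify its gating function, head count, or projections, so `cross_attention`, `gated_cross_attention`, and `alpha` are illustrative names, not the disclosed implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(text: Matrix, visual: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: each text token (query)
    attends over the visual tokens (keys and values). Learned Q/K/V
    projections are omitted (identity) to keep the sketch short."""
    d = len(text[0])
    out: Matrix = []
    for q in text:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visual]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, visual)) for c in range(d)])
    return out


def gated_cross_attention(text: Matrix, visual: Matrix, alpha: float) -> Matrix:
    """Residual update scaled by tanh(alpha). With alpha = 0 the gate is
    closed and the text stream passes through unchanged."""
    gate = math.tanh(alpha)
    attended = cross_attention(text, visual)
    return [[t + gate * a for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(text, attended)]


text_states = [[1.0, 0.0], [0.0, 1.0]]   # two text tokens
image_tokens = [[0.5, 0.5], [1.0, -1.0]]  # two visual tokens
print(gated_cross_attention(text_states, image_tokens, alpha=0.5))
```

Initializing `alpha` at zero lets the pretrained language model start out unchanged, and training then learns how much visual evidence each layer should admit.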
The model employs Group Relative Policy Optimization (GRPO), a reinforcement learning strategy that refines outputs using advantages computed relative to a group of candidate generations for the same prompt. This approach strengthens the model's reasoning and output quality in ESG contexts by rewarding not only correctness but also the comprehensiveness and articulation of responses. Because it scores multiple outputs together rather than each in isolation, the method improves upon standard reinforcement learning techniques.
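The group-relative advantage at the core of GRPO can be sketched briefly, following the formulation published with DeepSeekMath. Several candidates are sampled for one prompt and scored; each reward is normalized by the group's mean and standard deviation; the result feeds a clipped policy-ratio objective. The weighted `composite_reward` over correctness, comprehensiveness, and articulation is a hypothetical stand-in for the patent's unspecified reward, and its weights are assumptions. For brevity the sketch uses sequence-level log-probabilities and omits the KL penalty toward a reference model.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def composite_reward(correct: float, comprehensive: float, articulate: float,
                     weights: Tuple[float, float, float] = (0.6, 0.25, 0.15)) -> float:
    """Hypothetical reward mixing the three criteria named in the text;
    the weights are illustrative assumptions."""
    w_c, w_m, w_a = weights
    return w_c * correct + w_m * comprehensive + w_a * articulate


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """A_i = (r_i - mean(r)) / (std(r) + eps) over one prompt's group,
    using the population standard deviation. No value network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                      advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate averaged over the group (to be maximized).
    Sequence-level log-probs and no KL term, for brevity."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Four candidate answers to one ESG question, scored on (correct, comprehensive, articulate).
scores = [(1, 1, 1), (1, 0, 0), (0, 1, 1), (0, 0, 0)]
rewards = [composite_reward(*s) for s in scores]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

If every candidate in a group receives the same reward, all advantages are zero and that group contributes no learning signal, so the group size and reward resolution matter in practice.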
The model is trained on a corpus of approximately 20 trillion tokens drawn from diverse sources, covering both general language and ESG-specific knowledge. A detailed 47-class ESG classification framework is applied during data preprocessing and training to maintain domain specificity. Training fine-tunes all parameters and incorporates safety controls to mitigate bias and inappropriate content. Together, these measures are intended to keep the model's outputs accurate, coherent, and aligned with ESG values.