Enhancing XBRL tagging with Large Language Model (LLM)
In financial reporting, the eXtensible Business Reporting Language (XBRL) serves as a crucial tool for standardizing and organizing data, making it easier to compare financial information across different entities. Despite its advantages, XBRL tagging, the task of assigning specific labels to financial data, can be complex and error-prone due to the intricate nature of financial terminology and data.
Recent breakthroughs in large language models (LLMs) and natural language processing (NLP) have introduced exciting possibilities for improving XBRL tagging. These models, particularly those built on transformer architectures, are set to revolutionize how we handle financial data tagging. This article explores how LLMs are transforming XBRL tagging, comparing traditional methods with these cutting-edge technologies.
Traditional Tagging Methods
Rule-Based Systems and Basic Machine Learning
Historically, XBRL tagging has depended on rule-based systems and basic machine learning models. Here’s a brief overview:
- FiNER: This rule-based system is designed to recognize a limited set of financial entities using predefined rules and simple machine learning techniques. While effective for common tags, FiNER struggles with less frequent or more complex tags.
- AttentionXML Pipeline: Employing BERT-based models, this method improves tagging accuracy by focusing on relevant parts of the text. It outperforms older methods by handling a broader range of XBRL tags (Sharma et al., 2023).
- GalaXC: Enhances tagging accuracy through the use of document-label graphs, which provide additional context around labels. This approach improves classification by considering metadata associated with labels (Saini et al., 2021).
- Label Semantics: This method uses entity descriptions within named entity recognition (NER) frameworks to address semantic issues in tagging. It offers better accuracy by understanding the context of labels but faces scalability challenges (Ma et al., 2022).
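To make the rule-based approach concrete, here is a minimal Python sketch of a FiNER-style tagger that matches numeric spans against hand-written patterns. The patterns and tag names are illustrative assumptions, not taken from any of the systems above; real systems use far richer rules and learned features.

```python
import re

# Illustrative rule set: regex pattern -> hypothetical XBRL tag.
RULES = [
    (re.compile(r"revenue[s]?\s+of\s+\$[\d,.]+", re.I), "us-gaap:Revenues"),
    (re.compile(r"net\s+income\s+of\s+\$[\d,.]+", re.I), "us-gaap:NetIncomeLoss"),
]

def rule_based_tag(sentence: str) -> list[tuple[str, str]]:
    """Return (matched span, tag) pairs found by the rules."""
    hits = []
    for pattern, tag in RULES:
        for match in pattern.finditer(sentence):
            hits.append((match.group(0), tag))
    return hits

print(rule_based_tag("The company reported revenues of $1,200,000 this quarter."))
# → [('revenues of $1,200,000', 'us-gaap:Revenues')]
```

The weakness described above is visible even in this toy version: any phrasing the rules do not anticipate, and any tag without a rule, is silently missed.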
The Impact of Generative Models
Transformer-Based Approaches
Generative models, especially those based on transformer architectures, have introduced new methods for XBRL tagging:
- ChatGPT: Generative models like GPT-3.5-turbo are known for their strong general language understanding. However, they can be limited in specialized financial tagging tasks due to their broad focus and lack of domain-specific training.
- FLAN-T5: An instruction-tuned variant of the T5 model, FLAN-T5 is trained to follow natural-language instructions across many tasks. This instruction-following ability makes it a strong base for financial adaptation: tuning it further on targeted instructions for financial contexts makes it more adept at complex XBRL tagging tasks.
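The instruction-tuning idea is easiest to see in how the input is framed. Below is a hedged sketch of turning a sentence and a target numeral into an instruction prompt for a FLAN-T5-style model; the template is an illustrative assumption, not the exact prompt used by FLAN-FinXC.

```python
def build_tagging_prompt(sentence: str, numeral: str) -> str:
    """Frame XBRL tagging as an instruction-following task.
    Template is illustrative, not the one from the paper."""
    return (
        "Instruction: Assign the correct XBRL tag to the highlighted "
        "numeral in the sentence below.\n"
        f"Sentence: {sentence}\n"
        f"Numeral: {numeral}\n"
        "Answer:"
    )

prompt = build_tagging_prompt(
    "Net income was $4.2 million in fiscal 2022.", "$4.2 million"
)
print(prompt)
```

An instruction-tuned model completes the `Answer:` slot with a tag name, so the same model and template generalize to tags never seen during training, which is what enables the zero-shot behavior discussed later.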
Introducing FLAN-FinXC
What Makes FLAN-FinXC Unique?
The FLAN-FinXC framework represents a significant advancement in XBRL tagging. Here’s what makes it stand out:
- Instruction Tuning: FLAN-FinXC uses specific instructions tailored for financial data to fine-tune large language models. This approach enhances the model's accuracy in applying XBRL tags by providing precise guidance during training.
- Efficient Techniques: The framework employs parameter-efficient methods like Prefix Tuning and Low-Rank Adaptation (LoRA). These techniques optimize model performance while keeping computational costs manageable, making it suitable for complex tagging tasks.
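The parameter savings from LoRA come from replacing a full weight update with two small low-rank factors. The NumPy sketch below illustrates the arithmetic with assumed dimensions (d=1024, rank r=8); it is a conceptual illustration, not FLAN-FinXC's actual training code.

```python
import numpy as np

# LoRA idea: freeze the pretrained d x d weight W and train only
# two low-rank factors A (d x r) and B (r x d), with r << d.
d, r = 1024, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((d, r)) * 0.01   # trainable down-projection
B = np.zeros((r, d))                     # trainable up-projection (init 0)

# Effective weight at inference time; B=0 means no change at init.
W_adapted = W + A @ B

full_params = d * d
lora_params = d * r + r * d
print(f"trainable params: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
# → trainable params: 16384 vs 1048576 (1.6%)
```

Training roughly 1.6% of the parameters per adapted layer is what keeps fine-tuning a FLAN-T5-Large-scale model computationally manageable.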
Comparing FLAN-FinXC
To evaluate the performance of FLAN-FinXC, we compare it with various traditional and modern methods:
- FiNER: Provides a baseline for understanding how well the model handles a limited set of tags.
- AttentionXML Pipeline: Serves as a modern benchmark with advanced attention mechanisms, allowing us to assess the improvements FLAN-FinXC offers.
- GalaXC: Offers a comparison point with models using document-label graphs, highlighting the impact of additional context on tagging accuracy.
- Label Semantics: Evaluates how semantic understanding enhances tagging accuracy compared to using tags alone.
- ChatGPT: Provides insights into the performance of generative models in a few-shot setting, comparing their effectiveness with FLAN-FinXC.
Results and Analysis
Key Findings
- FLAN-FinXC Performance: Achieves a Macro-F1 score of 66.23 on the FNXL dataset, demonstrating significant improvements over traditional models. This score reflects its ability to handle a wide range of XBRL tags effectively.
- Comparison Insights: FLAN-FinXC shows a 39.3% improvement in Macro-F1 and a 17.2% improvement in Hits@1 compared to the AttentionXML Pipeline. These results underscore its effectiveness in addressing complex XBRL tagging scenarios.
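Both metrics are straightforward to define. Macro-F1 averages per-label F1 with equal weight, so rare tags count as much as frequent ones; Hits@1 is the fraction of examples whose top-ranked tag is correct. The sketch below shows plain-Python implementations with made-up example labels.

```python
from collections import defaultdict

def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Per-label F1, averaged equally across all labels."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = []
    for lbl in set(y_true) | set(y_pred):
        prec = tp[lbl] / (tp[lbl] + fp[lbl]) if tp[lbl] + fp[lbl] else 0.0
        rec = tp[lbl] / (tp[lbl] + fn[lbl]) if tp[lbl] + fn[lbl] else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def hits_at_1(y_true: list[str], ranked_preds: list[list[str]]) -> float:
    """Fraction of examples whose top-ranked prediction is correct."""
    return sum(t == r[0] for t, r in zip(y_true, ranked_preds)) / len(y_true)

# Toy example: always predicting the frequent tag hurts Macro-F1 badly.
print(round(macro_f1(["Revenues", "NetIncomeLoss", "Revenues"],
                     ["Revenues", "Revenues", "Revenues"]), 3))
# → 0.4
```

The toy example shows why Macro-F1 is the right headline metric here: a model that only gets frequent tags right is heavily penalized, which is exactly the rare-label challenge discussed below.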
Model Variations
- T5-Large vs. T5-Base: Larger models like T5-Large deliver better performance, reinforcing the benefits of using more sophisticated models for tagging tasks.
- Instruction Tuning vs. Fine-Tuning: Instruction tuning, particularly with FLAN-T5-Large and LoRA, outperforms traditional fine-tuning approaches, highlighting the advantages of parameter-efficient training techniques.
In-Depth Analysis
Handling Rare Labels
FLAN-FinXC excels in tagging rare labels, a common challenge in financial data, ensuring accurate tagging across diverse scenarios.
Zero-Shot Performance
The model shows strong zero-shot capabilities, achieving a Macro-F1 score of 58.89 on unseen labels. This indicates FLAN-FinXC’s ability to adapt to new tagging situations.
Comparative Performance
- ChatGPT Performance: In a few-shot setting, ChatGPT achieves a Macro-F1 score of 9.08, showing its limitations compared to FLAN-FinXC in specialized tagging tasks.
Conclusion
FLAN-FinXC represents a major advancement in XBRL tagging, leveraging instruction tuning and efficient techniques to achieve notable improvements in accuracy. Its ability to handle diverse tagging scenarios and adapt to new situations highlights its potential to transform financial reporting.
Future research will focus on integrating external financial knowledge and exploring additional contextual elements to further enhance performance. As the field of financial reporting evolves, innovations like FLAN-FinXC will play a vital role in ensuring the accuracy and reliability of financial data.
Limitations
Despite its advancements, FLAN-FinXC has some limitations:
- External Financial Knowledge: The current model does not incorporate external financial data, which could provide additional context and enhance accuracy.
- Contextual Limitations: The model's performance is confined to sentence-level text, and incorporating broader contextual elements could further improve its effectiveness.
Addressing these limitations will be crucial for future improvements in XBRL tagging and overall model robustness.