Yubi, the world’s first unified credit platform for corporates and lenders, has launched YubiBERT, India’s first indigenous open-source fintech language model. YubiBERT is a language model trained from scratch (similar to Google’s BERT) that understands fintech text in the Indian context better than general-purpose models. It currently supports 13 major regional Indian languages along with English.
Although India is the world’s biggest and most innovative fintech market, Indian fintech companies have been compelled to use large language models (LLMs) that are designed neither for the fintech sector nor for the Indian context. This has led to multiple inefficiencies across the sector. With YubiBERT, Yubi aims to solve this problem for the entire fintech industry so that the ecosystem can grow collectively.
Commenting on the launch of the language model, Mathangi Sri, Chief Data Officer, Yubi, said, “Despite having an innovative fintech ecosystem in India, very few data science teams in Indian fintech companies attempt to train a model from scratch, because of which most fintech companies fine-tune models released by Google, Microsoft, and Facebook. This approach has severely hindered the growth of the sector. India, being a unique market for financial services, needed a unique language model, and with a very strong data team at Yubi, we wanted to be the pioneers in building it. We are thrilled to launch this language model as open source so that the entire Indian fintech ecosystem can thrive collectively.”
YubiBERT was trained on 200+ GB of public fintech data comprising over 1 billion sentences, making it one of the most robustly trained language models in the world. When fine-tuned on fintech-related Natural Language Processing (NLP) tasks, it performs better than BERT, RoBERTa, FinBERT, and DistilBERT.
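For illustration, fine-tuning a BERT-style encoder like this on a fintech classification task typically looks like the sketch below. The checkpoint id "yubi/yubibert-base", the label count, and the example texts are placeholders for illustration only, not Yubi’s published setup or API.

```python
# Minimal fine-tuning sketch for a BERT-style encoder on a fintech text
# classification task. "yubi/yubibert-base" is a placeholder checkpoint id,
# and the labels/texts are hypothetical examples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("yubi/yubibert-base")  # placeholder id
model = AutoModelForSequenceClassification.from_pretrained(
    "yubi/yubibert-base", num_labels=3                           # hypothetical intent classes
)

texts = ["EMI bounce on home loan", "requesting working capital limit enhancement"]
labels = torch.tensor([0, 1])
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One gradient step, just to show the shape of the fine-tuning loop.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**enc, labels=labels)
outputs.loss.backward()
optimizer.step()
```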
Commenting on the rationale for building YubiBERT, Swapnil Ashok Jadhav, Director of Data Science, Yubi, said, “Natural language processing has been crucial to the success of many tech companies. However, we noticed two main pain points. First, India is a complex market with multiple languages, and there was no model to analyze regional languages. Second, domain-specific models perform better than generalized state-of-the-art models, and while there are domain-specific models for fintech, none of them consider the vastly different context of the Indian fintech market. These two pain points motivated us to train a model from scratch, which resulted in YubiBERT. We are positive that this will have a massive impact on the fintech community, and we are excited to see how the data science community takes this language model to the next level.”
With over 90% accuracy across different fintech natural language processing (NLP) use cases, YubiBERT is more accurate than fine-tuned state-of-the-art (SOTA) models. It is also faster, since it is built on a much smaller architecture, and it runs inference on a CPU in milliseconds, making it cost-effective to deploy.
Data scientists can access the model here:
https://github.com/Yubi2Community/YubiAI/tree/master/yubiai/nlp/yubiEmbeddings
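As an illustration of a typical downstream use, the sketch below extracts sentence embeddings on CPU and compares an English sentence with a Marathi one via cosine similarity. The checkpoint id and example sentences are assumptions for illustration, not taken from the YubiAI repository’s documented API.

```python
# Illustrative CPU-side embedding extraction; "yubi/yubibert-base" is a
# placeholder checkpoint id, not a confirmed model name.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("yubi/yubibert-base")   # placeholder id
model = AutoModel.from_pretrained("yubi/yubibert-base").eval()    # placeholder id

# One English and one Marathi sentence (illustrative regional-language input).
sentences = ["loan restructuring request", "कर्ज पुनर्रचना विनंती"]
enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state                       # [batch, seq_len, dim]

# Mean-pool token vectors into one sentence embedding each, ignoring padding.
mask = enc["attention_mask"].unsqueeze(-1)
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

similarity = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```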