Debasish Dhal

Machine Learning | Physics | Data Analytics | LLMs

About Me

Greetings and welcome to my website.

I'm interested in Machine Learning, Data Analytics, Data Visualization and LLMs.

I completed my Master's in Physics in 2023 at the National Institute of Science Education and Research, Odisha, India, a research institute under the Department of Atomic Energy.

My Résumé.

Portfolio

  1. Retrieval of Atmospheric Properties using INSAT-3DR Satellite data (Master's Thesis)
  2. Datasets for fine-tuning of Large Language Models for Indic languages

  3. YouTube Playlist Statistics (Web App deployed for Public use)
  4. Language Transliteration Project (Web App deployed for Public use)
  5. Upwork Freelancer
    • Completed one contract on optimizing an implementation of the simulated annealing algorithm.

Publication

  1. OdiaGenAI's participation at WAT2023 (Paper)
    This paper offers an in-depth overview of the team “ODIAGEN’s” translation system submitted to the Workshop on Asian Translation (WAT2023). Our focus lies in the domain of Indic Multimodal tasks, specifically targeting English to Hindi, English to Malayalam, and English to Bengali translations. The system uses a state-of-the-art Transformer-based architecture, specifically the NLLB-200 model, fine-tuned with language-specific Visual Genome Datasets. With this robust system, we were able to manage both text-to-text and multimodal translations, demonstrating versatility in handling different translation modes. Our results showcase strong performance across the board, with particularly promising results in the Hindi and Bengali translation tasks. A noteworthy achievement of our system lies in its stellar performance across all text-to-text translation tasks. In the categories of English to Hindi, English to Bengali, and English to Malayalam translations, our system claimed the top positions for both the evaluation and challenge sets. This system not only advances our understanding of the challenges and nuances of Indic language translation but also opens avenues for future research to enhance translation accuracy and performance.

  2. Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set (arXiv)
    Building LLMs for languages other than English is in great demand due to the unavailability and performance of multilingual LLMs, such as understanding the local context. The problem is critical for low-resource languages due to the need for instruction sets. In a multilingual country like India, there is a need for LLMs supporting Indic languages to provide generative AI and LLM-based technologies and services to its citizens.

    This paper presents our approach of i) generating a large Odia instruction set, including domain knowledge data suitable for LLM fine-tuning, and ii) building a Llama2-finetuned model tailored for enhanced performance in the Odia domain. The proposed work will help researchers build an instruction set and LLM, particularly for Indic languages. We will release the model and instruction set for the public for research and noncommercial purposes.

Miscellaneous

  1. Workshop on Asian Translation 2023
  2. Odisha AI ML Conference 2023
  3. Generative AI and LLM Workshop 2023
  4. 1-year streak on StackOverflow

Contact and Links

Mail, GitHub, Medium, LinkedIn, Hugging Face, Twitter, StackOverflow

Extracurriculars