Palmyra-Med: instruction-based fine-tuning of LLMs enhancing medical domain performance
Our research paper, “Palmyra-Med: instruction-based fine-tuning of LLMs enhancing medical domain performance,” presents a comprehensive study on fine-tuning large language models (LLMs) for medical applications. The Palmyra-Med project fine-tunes the Palmyra-20b and Palmyra-40b models on a custom-curated medical dataset of 200,000 examples drawn from PubMedQA and MedQA, covering biomedical research questions and USMLE-style questions. Training uses instruction-based fine-tuning with the AdamW optimizer, the WarmupDecayLR learning-rate scheduler, and bf16 precision across multiple GPUs. This approach significantly enhances the models’ ability to understand and generate medically relevant responses.
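To give a concrete picture of this setup, here is a minimal sketch in plain PyTorch: it pairs AdamW with a linear warmup-then-decay schedule (the shape implemented by DeepSpeed’s WarmupDecayLR) and runs the forward pass in bf16. The checkpoint identifier, learning rate, and step counts are illustrative assumptions, not the paper’s exact configuration, and a real multi-GPU run would typically wrap this in a distributed framework such as DeepSpeed.

```python
# Minimal single-GPU sketch of the training setup described above.
# The warmup-then-linear-decay schedule mirrors the shape of DeepSpeed's
# WarmupDecayLR; the checkpoint id, learning rate, and step counts are
# illustrative assumptions, not the paper's exact configuration.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Writer/palmyra-20b"  # hypothetical identifier for the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16
).cuda()

optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)  # assumed values
TOTAL_STEPS, WARMUP_STEPS = 10_000, 500  # assumed values

def warmup_decay(step: int) -> float:
    # Linear warmup to the peak learning rate, then linear decay to zero.
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))

scheduler = LambdaLR(optimizer, lr_lambda=warmup_decay)

def training_step(batch: dict) -> float:
    # batch holds input_ids / attention_mask / labels tensors already on GPU.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```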
Key findings and takeaways:
- Enhanced model performance: The Palmyra-Med models, Palmyra-Med-20b and Palmyra-Med-40b, outperformed both their base models and other domain-specific pretrained LLMs, and even surpassed GPT-4 in certain evaluations.
- High accuracy on medical datasets: On the PubMedQA dataset, Palmyra-Med-20b achieved an accuracy of 75.6% and Palmyra-Med-40b reached 81.1%. On the MedQA dataset, the two models scored 44.6% and 72.4%, respectively, significant improvements over comparable models.
- Effective use of instruction-based fine-tuning: Instruction-based fine-tuning proved effective at substantially improving LLM performance on medical-domain tasks, underscoring the importance of tailored training protocols and domain-specific datasets (see the example sketch after this list).
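To make the instruction format concrete, the sketch below shows what a single instruction-formatted training example in the style of the USMLE-type MedQA data might look like. The field names and prompt template here are hypothetical illustrations, not the exact format used for Palmyra-Med.

```python
# Hypothetical instruction-formatted training example in the style of the
# USMLE-type MedQA data; the template is an assumption, not the exact
# format used to train Palmyra-Med.
example = {
    "instruction": (
        "Answer the following USMLE-style question. "
        "Reply with the letter of the single best answer."
    ),
    "input": (
        "A 45-year-old man presents with crushing substernal chest pain. "
        "Which of the following is the most appropriate next step?\n"
        "(A) ...  (B) ...  (C) ...  (D) ..."
    ),
    "output": "(A)",
}

# During fine-tuning the fields are concatenated into one prompt, and the
# loss is typically computed only on the response tokens.
prompt = (
    f"### Instruction:\n{example['instruction']}\n\n"
    f"### Input:\n{example['input']}\n\n"
    f"### Response:\n{example['output']}"
)
```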
Our fine-tuned models outperform both their base counterparts and other LLMs pre-trained on domain-specific knowledge. This research demonstrates the effectiveness of instruction-based fine-tuning in enhancing LLM performance in the medical domain.