Abstract
Purpose: Automated classification of clinical trials and of key categories within the medical literature is increasingly relevant, particularly in oncology, as the volume of publications and trial reports continues to expand. Large Language Models (LLMs) may provide new opportunities for automating diverse classification tasks and could serve as general-purpose text classifiers for retrieving information about oncological trials. Methods and Materials: A general text classification framework was developed in which the prompt, the model, and the classification categories can be adapted. The framework was tested on four datasets comprising nine binary classification questions related to oncological trials. Evaluation was conducted using a locally hosted Mixtral-8x7B-Instruct v0.1-GPTQ model and three cloud-based LLMs: Mixtral-8x7B-Instruct v0.1, Llama3.1-70B-Instruct, and Qwen-2.5-72B. Results: The system consistently produced valid responses with the local Mixtral-8x7B-Instruct model and the Llama3.1-70B-Instruct model. Response validity rates were 99.70% and 99.88% for the cloud-based Mixtral and Qwen models, respectively. Across all models, the framework achieved an overall accuracy of >94%, precision of >92%, recall of >90%, and an F1-score of >92%. Question-specific accuracy ranged from 86.33% to 99.83% for the local Mixtral model, 85.49% to 99.83% for the cloud-based Mixtral model, 90.50% to 99.83% for the Llama3.1 model, and 77.13% to 99.83% for the Qwen model. Conclusions: The LLM-based classification framework exhibits robust accuracy and adaptability across various oncological trial classification tasks. Although challenges remain, such as strong prompt dependence and high computational and hardware demands, LLMs will play a crucial role in automating the classification of oncological trials and literature as the technology continues to advance.
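To make the reported quantities concrete, the following is a minimal Python sketch of how a classification framework with an adaptable prompt, model, and category set could be structured, and how response validity, accuracy, precision, recall, and F1-score could be computed for one binary question. It is not the authors' implementation. The names `ClassificationTask`, `classify`, `evaluate`, and `keyword_model` are hypothetical, the keyword-based stand-in replaces a real LLM call, the example texts and labels are invented for illustration, and excluding invalid responses from the accuracy calculation is an assumption that the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class ClassificationTask:
    """One classification question: a prompt template plus its allowed categories."""
    prompt_template: str        # must contain a {text} placeholder
    categories: Sequence[str]   # e.g. ("yes", "no") for a binary question


def classify(task: ClassificationTask, text: str,
             model: Callable[[str], str]) -> Optional[str]:
    """Return the category the response starts with, or None if the response is invalid."""
    response = model(task.prompt_template.format(text=text)).strip().lower()
    matches = [c for c in task.categories if response.startswith(c.lower())]
    return matches[0] if len(matches) == 1 else None


def evaluate(predictions: List[Optional[str]], labels: List[str],
             positive: str) -> Dict[str, float]:
    """Compute validity rate and binary metrics.

    Assumption: invalid (None) responses are excluded from accuracy,
    precision, recall, and F1 and are reported only through the validity rate.
    """
    valid = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    tp = sum(p == positive and y == positive for p, y in valid)
    fp = sum(p == positive and y != positive for p, y in valid)
    fn = sum(p != positive and y == positive for p, y in valid)
    tn = len(valid) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "validity": len(valid) / len(predictions) if predictions else 0.0,
        "accuracy": (tp + tn) / len(valid) if valid else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def keyword_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; inspects only the text after 'Text:'."""
    text = prompt.split("Text:", 1)[-1].lower()
    if "unclear" in text:
        return "I cannot tell."  # deliberately invalid response
    return "Yes" if "randomized" in text else "No"


# Illustrative task and toy data (invented for this sketch, not from the datasets).
task = ClassificationTask(
    prompt_template=("Does the text describe a randomized trial? "
                     "Answer yes or no.\nText: {text}"),
    categories=("yes", "no"),
)
texts = [
    "A randomized phase III trial of adjuvant chemotherapy",
    "A retrospective cohort study of radiotherapy outcomes",
    "Unclear design",
    "Randomized controlled trial in breast cancer",
]
labels = ["yes", "no", "no", "yes"]

predictions = [classify(task, t, keyword_model) for t in texts]
metrics = evaluate(predictions, labels, positive="yes")
print(predictions)
print(metrics)
```

Swapping the prompt, the category set, or the `model` callable changes the task without touching the evaluation code, which is one plausible reading of the adaptability described above. Treating an unparseable answer as `None` keeps the validity rate separate from the accuracy metrics.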