Researchers have publicly released Flan-T5 checkpoints that achieve strong few-shot performance compared to the much larger PaLM 62B model.
Fine-tuning language models on a collection of instruction-based datasets plays a prominent role in improving generalization and performance on unseen tasks. In an effort to move in this direction, Google AI has released a new open-source language model, Flan-T5, which is instruction-finetuned on more than 1,800 different tasks. The work focuses primarily on instruction finetuning along several axes: scaling the number of tasks, scaling the model size, and finetuning on chain-of-thought data.
The paper reads, “We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).” The team has publicly released the Flan-T5 checkpoints, which achieve strong few-shot performance compared to the much larger PaLM 62B model.
In addition, instruction finetuning is a general method for improving the performance and usability of pretrained language models. As for Flan-T5, the researchers claim that the new model leads to improved prompting and multi-step reasoning abilities.
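For readers who want to try the released checkpoints, the sketch below shows one plausible way to load a Flan-T5 model and issue a zero-shot instruction using the Hugging Face Transformers library; the specific checkpoint name and prompt are illustrative choices, not details taken from the article.

```python
# Minimal sketch (assumption: the released checkpoints are hosted on the
# Hugging Face Hub under names such as "google/flan-t5-base").
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Zero-shot prompting: a plain natural-language instruction, no examples.
prompt = "Answer the following question: what is the boiling point of water in Celsius?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```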