Abstract
GPT introduced generative pre-training of a large Transformer language model on unlabeled text, followed by discriminative fine-tuning on each downstream task. This two-stage approach enabled a single model architecture to solve many NLP tasks end to end with only minimal task-specific modifications.
- 📚 Authors: Radford et al.
- 📅 Year: 2018
- 📄 Publisher: OpenAI (technical report)
Key Contribution
Demonstrated that a language model pre-trained at scale on unlabeled text learns representations that transfer effectively to many diverse tasks, requiring only lightweight task-specific fine-tuning rather than task-specific architectures.
Technical Implementation
Model Architecture
A 12-layer decoder-only Transformer language model with masked (causal) self-attention, 768-dimensional hidden states, and 12 attention heads. It is trained with a standard left-to-right language modeling objective, not the masked language modeling and next-sentence prediction objectives later used by BERT. A sketch of one decoder block follows.
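The block below is a minimal sketch, assuming PyTorch; the class name `DecoderBlock` and the toy forward pass are illustrative, while the hyperparameters (768-dimensional states, 12 heads, 4x feed-forward width) follow the 12-layer configuration described above.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One GPT-style Transformer decoder block: causal self-attention + MLP."""

    def __init__(self, d_model: int = 768, n_heads: int = 12, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),   # position-wise feed-forward, 4x expansion
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: position t may attend only to positions <= t.
        seq_len = x.size(1)
        causal_mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        attn_out, _ = self.attn(x, x, x, attn_mask=causal_mask, need_weights=False)
        x = self.ln1(x + attn_out)   # residual connection followed by layer norm (post-norm)
        x = self.ln2(x + self.mlp(x))
        return x

# Toy forward pass over random embeddings: (batch, sequence, d_model).
x = torch.randn(2, 16, 768)
print(DecoderBlock()(x).shape)  # torch.Size([2, 16, 768])
```

The full model stacks 12 of these blocks on top of token and learned position embeddings, with a softmax over the vocabulary at the output.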
Training Process
Pre-trained on the BooksCorpus dataset (roughly 7,000 unpublished books) by maximizing the likelihood of each token given its preceding context, i.e. a next-token cross-entropy objective. The pre-trained model is then fine-tuned on each downstream task through a simple linear output layer, with the language modeling loss optionally retained as an auxiliary objective; both losses are sketched below.
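A rough sketch of those two losses, assuming PyTorch; the function names and the random usage tensors are hypothetical stand-ins, and `aux_weight` plays the role of the auxiliary coefficient used during fine-tuning.

```python
import torch
import torch.nn.functional as F

def lm_loss(lm_logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Next-token prediction: logits at position t are scored against the token at t+1."""
    logits = lm_logits[:, :-1, :]
    targets = tokens[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

def finetune_loss(task_logits, labels, lm_logits, tokens, aux_weight: float = 0.5) -> torch.Tensor:
    """Supervised task loss plus the auxiliary language modeling loss."""
    return F.cross_entropy(task_logits, labels) + aux_weight * lm_loss(lm_logits, tokens)

# Usage with random stand-in tensors: batch of 4, 16 tokens, vocab of 100, 3 task labels.
tokens = torch.randint(0, 100, (4, 16))
lm_logits = torch.randn(4, 16, 100)    # would come from the language model head
task_logits = torch.randn(4, 3)        # would come from the task-specific linear layer
labels = torch.randint(0, 3, (4,))
print(lm_loss(lm_logits, tokens).item())
print(finetune_loss(task_logits, labels, lm_logits, tokens).item())
```

During pre-training only the language modeling loss is optimized; during fine-tuning the combined loss is used.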
Model Performance
Natural Language Inference
Improves the state of the art on entailment benchmarks such as MultiNLI and SciTail after fine-tuning.
Question Answering and Commonsense Reasoning
Outperforms task-specific architectures on benchmarks such as RACE and Story Cloze.
Classification and Semantic Similarity
Achieves strong results on tasks such as CoLA, SST-2, and STS-B, improving the state of the art on 9 of the 12 evaluated datasets.