Open Datasets Library

Access thousands of curated datasets across every domain, all available for immediate research and development.

Browse by Category

Explore datasets by category, or search across thousands of open resources available for research and experimentation.

Biology

2,147 datasets

AI

3,892 datasets

Geospatial

1,563 datasets

Economics

2,345 datasets

Medicine

1,987 datasets

5,000 datasets found

AI | Public

Comprehensive LLM Training Data (2023)

2.1TB of curated text from GitHub, Wikipedia, and academic papers for training large language models.

NLP | 175B tokens | Multilingual | CC BY-NC