Dataset
This section lists the training data we used to train our various models.
| Dataset | Description |
|---|---|
| MLDR | Document retrieval dataset covering 13 languages |
| bge-m3-data | Fine-tuning data used by bge-m3 |
| public-data | Public data identical to that used by e5-mistral |
| full-data | The full dataset we used to train bge-en-icl |
| reranker-data | A mixture of multilingual datasets |