mirror of
https://github.com/FlagOpen/FlagEmbedding.git
synced 2025-06-27 02:39:58 +00:00
Datasets
This page lists the training data we use to train our models.
Dataset | Introduction |
---|---|
MLDR | Document Retrieval Dataset, covering 13 languages |
bge-m3-data | Fine-tuning data used by bge-m3 |
public-data | Public data, identical to the data used by e5-mistral |
full-data | The full dataset we used for training bge-en-icl |
bge-multilingual-gemma2-data | The full multilingual dataset we used for training bge-multilingual-gemma2 |
reranker-data | A mixture of multilingual datasets |