2024-10-28 11:06:47 +08:00

Dataset

This section points to the training data we use to train our various models.

Dataset          Introduction
MLDR             Document retrieval dataset covering 13 languages
bge-m3-data      Fine-tuning data used by bge-m3
public-data      Public data identical to that used by e5-mistral
full-data        The full dataset we used for training bge-en-icl
reranker-data    A mixture of multilingual datasets