diff --git a/dataset/README.md b/dataset/README.md
index 2a2d4a2..e9aaa1f 100644
--- a/dataset/README.md
+++ b/dataset/README.md
@@ -4,7 +4,7 @@ This will point to the training data we use for training various models.
 
 | Dataset | Introduction |
 | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| [MLDR](https://huggingface.co/datasets/Shitao/MLDR) | Docuemtn Retrieval Dataset, covering 13 languages |
+| [MLDR](https://huggingface.co/datasets/Shitao/MLDR) | Document Retrieval Dataset, covering 13 languages |
 | [bge-m3-data](https://huggingface.co/datasets/Shitao/bge-m3-data) | Fine-tuning data used by [bge-m3](https://huggingface.co/BAAI/bge-m3) |
 | [public-data](https://huggingface.co/datasets/cfli/bge-e5data) | Public data identical to [e5-mistral](https://huggingface.co/intfloat/e5-mistral-7b-instruct) |
 | [full-data](https://huggingface.co/datasets/cfli/bge-full-data) | The full dataset we used for training [bge-en-icl](https://huggingface.co/BAAI/bge-en-icl) |