mirror of https://github.com/FlagOpen/FlagEmbedding.git
update docs
This commit is contained in:
parent 235f967a01
commit 875fd4ffcb
36  docs/source/bge/bge_code.rst  Normal file
@@ -0,0 +1,36 @@
BGE-Code-v1
===========

**`BGE-Code-v1 <https://huggingface.co/BAAI/bge-code-v1>`_** is an LLM-based code embedding model that supports code retrieval, text retrieval, and multilingual retrieval. It primarily demonstrates the following capabilities:

- Superior Code Retrieval Performance: the model demonstrates exceptional code retrieval capabilities, supporting natural language queries in both English and Chinese, as well as 20 programming languages.
- Robust Text Retrieval Capabilities: the model maintains strong text retrieval performance, comparable to text embedding models of a similar scale.
- Extensive Multilingual Support: BGE-Code-v1 offers comprehensive multilingual retrieval capabilities, excelling in languages such as English, Chinese, Japanese, French, and more.

+------------------------------------------------------------------+--------------+------------+------------+------------------------------------------------------------------------------------------------------+
| Model                                                            | Language     | Parameters | Model Size | Description                                                                                          |
+==================================================================+==============+============+============+======================================================================================================+
| `BAAI/bge-code-v1 <https://huggingface.co/BAAI/bge-code-v1>`_    | Multilingual | 1.5B       | 6.18 GB    | SOTA code retrieval model, with exceptional multilingual text retrieval performance as well         |
+------------------------------------------------------------------+--------------+------------+------------+------------------------------------------------------------------------------------------------------+

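The quick-start example below encodes natural-language questions and candidate SQL statements with ``FlagLLMModel``, then scores each query against each document with an inner product:
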
.. code:: python

    from FlagEmbedding import FlagLLMModel

    queries = [
        "Delete the record with ID 4 from the 'Staff' table.",
        'Delete all records in the "Livestock" table where age is greater than 5'
    ]
    documents = [
        "DELETE FROM Staff WHERE StaffID = 4;",
        "DELETE FROM Livestock WHERE age > 5;"
    ]

    model = FlagLLMModel(
        'BAAI/bge-code-v1',
        query_instruction_format="<instruct>{}\n<query>{}",
        query_instruction_for_retrieval="Given a question in text, retrieve SQL queries that are appropriate responses to the question.",
        trust_remote_code=True,
        use_fp16=True,  # Setting use_fp16 to True speeds up computation with a slight performance degradation
    )

    embeddings_1 = model.encode_queries(queries)
    embeddings_2 = model.encode_corpus(documents)
    similarity = embeddings_1 @ embeddings_2.T
    print(similarity)
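
Since FlagEmbedding returns L2-normalized embeddings by default, ``similarity`` is a ``len(queries) × len(documents)`` matrix of cosine similarities. The short sketch below is not part of the original example; it only assumes the ``queries``, ``documents``, and ``similarity`` objects created above (with ``similarity`` as a NumPy array) and ranks the candidate SQL statements for each query:

.. code:: python

    import numpy as np

    # Sort candidate documents for each query, from most to least similar.
    ranking = np.argsort(-similarity, axis=1)

    for q_idx, query in enumerate(queries):
        print(query)
        for rank, d_idx in enumerate(ranking[q_idx], start=1):
            print(f"  {rank}. {documents[d_idx]}  (score: {similarity[q_idx, d_idx]:.4f})")
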

@@ -16,6 +16,8 @@ BGE-VL contains lightweight CLIP based models as well as more powerful LLAVA-Ne

+--------------------------------------------------------------------------+---------+-------+----------+------------------------------------------------------------------------------------------+
| `BAAI/bge-vl-MLLM-S2 <https://huggingface.co/BAAI/BGE-VL-MLLM-S2>`_      | English | 7.57B | 15.14 GB | Fine-tuned from BGE-VL-MLLM-S1 for one epoch on the MMEB training set                     |
+--------------------------------------------------------------------------+---------+-------+----------+------------------------------------------------------------------------------------------+
| `BAAI/BGE-VL-v1.5-zs <https://huggingface.co/BAAI/BGE-VL-v1.5-zs>`_      | English | 7.57B | 15.14 GB | Better multi-modal retrieval model that performs well across a wide range of tasks        |
+--------------------------------------------------------------------------+---------+-------+----------+------------------------------------------------------------------------------------------+
| `BAAI/BGE-VL-v1.5-mmeb <https://huggingface.co/BAAI/BGE-VL-v1.5-mmeb>`_  | English | 7.57B | 15.14 GB | Better multi-modal retrieval model, additionally fine-tuned on the MMEB training set      |
+--------------------------------------------------------------------------+---------+-------+----------+------------------------------------------------------------------------------------------+

BGE-VL-CLIP

@@ -107,4 +109,50 @@ The normalized last hidden state of the [EOS] token in the MLLM is used as the e
    print(scores)


BGE-VL-v1.5
-----------

The BGE-VL-v1.5 series is the updated version of BGE-VL, bringing better performance on both retrieval and multi-modal understanding. The models were trained on the 30M MegaPairs data plus an extra 10M natural and synthetic samples.

``bge-vl-v1.5-zs`` is a zero-shot model trained only on the data mentioned above; ``bge-vl-v1.5-mmeb`` is additionally fine-tuned on the MMEB training set.

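The example below encodes a composed image retrieval query (a reference image plus a textual modification instruction) together with two candidate images, then ranks the candidates by cosine similarity:
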
.. code:: python

    import torch
    from transformers import AutoModel
    from PIL import Image

    MODEL_NAME = "BAAI/BGE-VL-v1.5-mmeb"  # or "BAAI/BGE-VL-v1.5-zs"

    model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
    model.eval()
    model.cuda()

    with torch.no_grad():
        model.set_processor(MODEL_NAME)

        query_inputs = model.data_process(
            text="Make the background dark, as if the camera has taken the photo at night",
            images="../../imgs/cir_query.png",
            q_or_c="q",
            task_instruction="Retrieve the target image that best meets the combined criteria by using both the provided image and the image retrieval instructions: "
        )
        candidate_inputs = model.data_process(
            images=["../../imgs/cir_candi_1.png", "../../imgs/cir_candi_2.png"],
            q_or_c="c",
        )

        query_embs = model(**query_inputs, output_hidden_states=True)[:, -1, :]
        candi_embs = model(**candidate_inputs, output_hidden_states=True)[:, -1, :]

        query_embs = torch.nn.functional.normalize(query_embs, dim=-1)
        candi_embs = torch.nn.functional.normalize(candi_embs, dim=-1)

        scores = torch.matmul(query_embs, candi_embs.T)
    print(scores)
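
The resulting ``scores`` tensor has one row per query and one column per candidate (here ``1 × 2``). The small follow-up sketch below is not part of the original example; it reuses only the ``scores`` tensor from above to pick the best-matching candidate:

.. code:: python

    # Index of the highest-scoring candidate for the single query above.
    best = scores.argmax(dim=1)[0].item()
    print(f"Best candidate: index {best}, score {scores[0, best].item():.4f}")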

For more details, check out the `MegaPairs <https://github.com/VectorSpaceLab/MegaPairs>`_ repository.

@@ -15,6 +15,7 @@ BGE
   bge_m3
   bge_icl
   bge_vl
   bge_code

.. toctree::
   :maxdepth: 1