895 Commits

Author SHA1 Message Date
hanhainebula
f9f5c477a6 debug for EncoderOnlyEmbedderM3Model: colbert 2024-10-21 17:46:53 +08:00
hanhainebula
7f98e4a864 debug for EncoderOnlyEmbedderM3Model: colbert 2024-10-21 17:39:38 +08:00
hanhainebula
58b18d56fd disable cross_device negs for m3 embedder example 2024-10-21 17:28:55 +08:00
hanhainebula
7f0d3c677f debug for EncoderOnlyEmbedderM3Model: colbert:done 2024-10-21 17:22:32 +08:00
hanhainebula
75b71a8808 debug for EncoderOnlyEmbedderM3Model: colbert:done 2024-10-21 17:01:08 +08:00
hanhainebula
b30b922049 debug for EncoderOnlyEmbedderM3Model: colbert 2024-10-21 16:38:16 +08:00
hanhainebula
f5d10e4a3c debug for EncoderOnlyEmbedderM3Model: colbert 2024-10-21 16:27:04 +08:00
hanhainebula
f3172a0a7e Merge branch 'new-flagembedding-v1' of github.com:hanhainebula/FlagEmbedding into new-flagembedding-v1 2024-10-21 16:14:24 +08:00
hanhainebula
11725b356a simplify refresh data print info 2024-10-21 16:13:50 +08:00
cfli
626e95123c Merge branch 'new-flagembedding-v1' of https://github.com/hanhainebula/FlagEmbedding into new-flagembedding-v1 2024-10-21 16:13:26 +08:00
cfli
4e9d0b386f update reranker finetune 2024-10-21 16:13:13 +08:00
hanhainebula
48af6d3f98 remove old para for embedder finetune examples
- rm kd_loss_plus_normal_loss
2024-10-21 16:10:07 +08:00
hanhainebula
bce8a96553 update forward func and rm old para for embedder
- adapt to post-refactor embedder finetune code
  - rm kd_loss_plus_normal_loss
  - refactor m3 modeling code
2024-10-21 16:09:10 +08:00
hanhainebula
7c83edeff0 Merge branch 'new-flagembedding-v1' of github.com:hanhainebula/FlagEmbedding into new-flagembedding-v1 2024-10-21 16:04:54 +08:00
hanhainebula
a16dd081e5 refactor embedder finetune code
- remove kd_loss_plus_normal_loss para
- simplify forward function
2024-10-21 16:03:41 +08:00
cfli
dbbff43909 add source 2024-10-21 14:09:40 +08:00
hanhainebula
8f3b1e0c15 decrease num_epochs for embedder finetune examples 2024-10-21 13:38:02 +08:00
hanhainebula
dbbcabb888 fix a bug in AbsEmbedderSameDatasetTrainDataset
- allow empty pos_scores and neg_scores
2024-10-21 13:31:55 +08:00
hanhainebula
efa9703506 fix a bug in EncoderOnlyEmbedderTrainer 2024-10-21 13:28:34 +08:00
hanhainebula
ff611e127e fix a bug in AbsArguments.py for embedder 2024-10-21 12:59:12 +08:00
hanhainebula
918571ad6e upload example finetune scripts for embedder 2024-10-21 12:56:30 +08:00
hanhainebula
bc99e9a126 upload deepspeed configs for embedder finetune 2024-10-21 12:56:06 +08:00
hanhainebula
94822a897f upload example-data for embedder finetune 2024-10-21 12:55:31 +08:00
hanhainebula
dbb01ed6f8 upload finetune code for embedder: decoder icl 2024-10-21 01:38:54 +08:00
hanhainebula
d973adafad adapt finetune code of embedder for new para: kd
- encoder-only: base & m3
- decoder-only: base
2024-10-21 01:38:18 +08:00
hanhainebula
70212678e9 update loss computation for embedder finetune
- add para: kd_loss_type & kd_loss_plus_normal_loss
- fix no_in_batch_neg option
- add two kinds of kd loss: kl_div, m3_kd_loss
2024-10-21 01:36:10 +08:00
hanhainebula
97be9b0f48 simplify example code for embedder inference 2024-10-19 22:31:30 +08:00
hanhainebula
effc2bb352 simplify example code for embedder inference 2024-10-19 22:23:53 +08:00
hanhainebula
9a8bcd7dfa update embedder inference code
- use_fp16=True, normalize_embeddings=True
- add more default para for auto_embedder
2024-10-19 22:23:28 +08:00
hanhainebula
33970b899b add expected results for m3 compute_score 2024-10-19 21:56:55 +08:00
hanhainebula
291dec2f6a fix a bug in M3Embedder 2024-10-19 21:44:27 +08:00
hanhainebula
46dc5a031f fix a bug in EncoderOnlyEmbedderM3Model 2024-10-19 21:40:10 +08:00
hanhainebula
8916939a8d fix a bug in EncoderOnlyEmbedderM3Model 2024-10-19 21:38:46 +08:00
hanhainebula
d2232b4159 fix a bug in EncoderOnlyEmbedderM3Model 2024-10-19 21:37:38 +08:00
hanhainebula
d3b4f7bb78 fix a bug in m3 embedder compute_score 2024-10-19 21:35:10 +08:00
hanhainebula
2c4c9629f3 upload m3 embedder compute_score examples 2024-10-19 21:33:16 +08:00
hanhainebula
7fc9fca908 implement compute_score func for m3 embedder 2024-10-19 21:23:22 +08:00
hanhainebula
02197abddf fix a bug for embedder inference: single input 2024-10-19 21:22:22 +08:00
hanhainebula
f4d46ff73a add expected results for embedder inference code 2024-10-19 20:48:46 +08:00
hanhainebula
ae72acf12c fix a bug in ICLLLMEmbedder inference 2024-10-19 20:28:52 +08:00
hanhainebula
fce4e46635 del debug code for FlagLLMModel inference 2024-10-19 20:28:04 +08:00
hanhainebula
0ba20823a2 fix a bug in ICLLLMEmbedder inference 2024-10-19 20:25:14 +08:00
hanhainebula
39b362112a fix bugs in embedder inference: enable use_fp16 2024-10-19 20:22:52 +08:00
hanhainebula
eb724106ad fix UserWarning: rm max_length para when padding 2024-10-19 20:11:31 +08:00
hanhainebula
99dca1d3d2 update example infer code for multilingual-gemma2 2024-10-19 20:03:05 +08:00
hanhainebula
78d1a8727c update example code for embedder 2024-10-19 18:57:56 +08:00
hanhainebula
baa06bf033 fix UserWarning: rm max_length para when encoding 2024-10-19 18:15:43 +08:00
hanhainebula
2d3b64a43c set weights_only=True when loading m3 linear 2024-10-19 18:09:33 +08:00
hanhainebula
4d2916cfe6 update m3 example code 2024-10-19 18:01:31 +08:00
hanhainebula
052954588a update multi-gpu code for decoder-only embedder 2024-10-19 18:01:08 +08:00