Make ctx_segment_ids a list instead of np.zeros_like

* fix #1687
* fix - UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow..
* fix RuntimeError: received 0 items of ancdata
* Remove set_sharing_strategy from this branch and replace numpy.zeros_like with a Python list
2026-01-05 11:38:20 +00:00 · 2022-01-03 09:33:55 +02:00 · 2022-01-03 09:33:55 +02:00 · a1fb70bbbd
commit a1fb70bbbd
parent 39573cf0a9
1 changed files with 1 additions and 2 deletions
--- a/haystack/modeling/data_handler/processor.py
+++ b/haystack/modeling/data_handler/processor.py
@@ -1102,8 +1102,7 @@ class TextSimilarityProcessor(Processor):
                        return_token_type_ids=True
                    )

-                    # TODO check if we need this and potentially remove
-                    ctx_segment_ids = np.zeros_like(ctx_inputs["token_type_ids"], dtype=np.int32)
+                    ctx_segment_ids = [[0] * len(ctx_inputs["token_type_ids"][0])] * len(ctx_inputs["token_type_ids"])

                    # get tokens in string format
                    tokenized_passage = [self.passage_tokenizer.convert_ids_to_tokens(ctx) for ctx in ctx_inputs["input_ids"]]
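Review note: the added line builds the nested list by repetition (`[[0] * n] * m`), so every row is the same list object. That is harmless if `ctx_segment_ids` is only read afterwards, which appears to be the case in this hunk, but it would matter if any later code modified a row in place. The sketch below illustrates the difference in plain Python; `token_type_ids` is a hypothetical stand-in for `ctx_inputs["token_type_ids"]`, and both helper functions are illustrative names, not part of Haystack.

```python
# Hypothetical stand-in for ctx_inputs["token_type_ids"]:
# 3 passages, each padded to 4 tokens.
token_type_ids = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]


def zeros_by_repetition(rows):
    """Same expression as the added line: the outer list repeats one inner list."""
    return [[0] * len(rows[0])] * len(rows)


def zeros_by_comprehension(rows):
    """Alternative sketch: build one independent zero list per row."""
    return [[0] * len(row) for row in rows]


shared = zeros_by_repetition(token_type_ids)
independent = zeros_by_comprehension(token_type_ids)

# As long as nothing is mutated, both results compare equal.
same_values = shared == independent

# Rows built by repetition are one object; comprehension rows are separate objects.
rows_are_aliased = shared[0] is shared[1]
rows_are_distinct = independent[0] is not independent[1]

# Writing to one aliased row changes every row; a copy is used here so that
# `shared` itself stays untouched for the checks above.
demo = zeros_by_repetition(token_type_ids)
demo[0][0] = 1
first_column_after_write = [row[0] for row in demo]
```

Note that the repetition form takes the row length from the first row only, so it assumes all passages are padded to the same length; the comprehension form measures each row separately.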