MichelBartels 0b0b9689a4
Add TinyBERT data augmentation (#1923)
* add tinybert data augmentation

* don't reload glove in tinybert data augmentation

* fix unnecessary load_glove call

* fix type hints

* add comments and type hints

* add batch_size argument

* don't predict subwords as alternative for words

* fix subword predictions

* limit sequence length

* actually limit sequence length

* improve performance by calculating nearest glove vector on gpu

* add model and tokenizer parameter

* fix type hints

* improve data augmentation performance

* explained limits of script

* corrected comment

* added data augmentation test

* don't label every question in augmented dataset as impossible

* add sample glove

* better handling of downloading of glove

* fix typo of last commit
2022-01-04 18:34:16 +01:00
..
2019-11-27 17:53:42 +01:00
2021-10-25 12:27:02 +02:00
2020-06-08 11:07:19 +02:00
2021-12-22 17:20:23 +01:00