# pdelfin

Toolkit for truly understanding PDF documents in the wild.

Things supported:

- A prompting strategy to get really good natural text parsing using ChatGPT 4o (silver_data)
- An eval toolkit for comparing different pipeline versions
- Basic filtering by language and SEO spam removal
- Finetuning code for Qwen2-VL (and soon other VLMs)

### Note: Font installation

You will probably need to install some fonts on your computer so that any PDFs you render come out looking nice.

```
sudo apt-get install ttf-mscorefonts-installer msttcorefonts fonts-crosextra-caladea fonts-crosextra-carlito gsfonts lcdf-typetools
```
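After installing, it can help to confirm that the fonts are actually visible to the system before rendering anything. The following is a minimal sketch, not part of pdelfin: it assumes fontconfig's `fc-list` tool is available, and the helper names (`parse_families`, `missing_families`, `installed_font_listing`) and the list of expected family names are hypothetical choices made for illustration.

```python
import shutil
import subprocess

# Assumption: a few families the packages above are expected to provide.
# Adjust this list to the fonts your PDFs actually rely on.
EXPECTED_FAMILIES = ("Arial", "Times New Roman", "Caladea", "Carlito")


def parse_families(fc_list_output):
    """Return the lowercased family names found in `fc-list : family` output."""
    families = set()
    for line in fc_list_output.splitlines():
        # One line may list several comma-separated names for the same font.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name.lower())
    return families


def missing_families(fc_list_output, expected=EXPECTED_FAMILIES):
    """Return the expected families absent from the listing, in input order."""
    installed = parse_families(fc_list_output)
    return [name for name in expected if name.lower() not in installed]


def installed_font_listing():
    """Run `fc-list : family` and return its raw text output."""
    if shutil.which("fc-list") is None:
        raise RuntimeError("fc-list not found; install fontconfig first")
    result = subprocess.run(
        ["fc-list", ":", "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `missing_families(installed_font_listing())` returns the expected families that are still missing; an empty list suggests the install worked. The parsing is kept separate from the subprocess call so it can be checked against sample text on machines without fontconfig.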