Commit Graph

  • 3990b94a4a Bump version to v0.4.3 for release main v0.4.3 Jake Poznanski 2025-10-31 17:37:01 +00:00
  • 419b57649c Version bump Jake Poznanski 2025-10-31 17:36:53 +00:00
  • a39ca53168 Merge branch 'jakep/priority_scheduler' Jake Poznanski 2025-10-30 21:25:21 +00:00
  • 5c82a15cdf Qwen3 torch compile broken jakep/v0.5 Jake Poznanski 2025-10-30 18:57:13 +00:00
  • 3362b6e5b1 Try without torch compile Jake Poznanski 2025-10-30 18:43:39 +00:00
  • 471c25b8ed Test config to fix some trainer dependency issues Jake Poznanski 2025-10-30 18:38:51 +00:00
  • e95aaede75 Fixing issue which likely corrects #371 Jake Poznanski 2025-10-30 18:21:00 +00:00
  • e1e3d5bf22 README for finetuning Jake Poznanski 2025-10-30 18:13:04 +00:00
  • 1de9e4ba76 Trying idea of priority scheduler to get more throughput on cluster jakep/priority_scheduler Jake Poznanski 2025-10-30 16:52:13 +00:00
  • ec1bf2471c Removing $ inline math from dolma viewer script, it's not reliable and our models don't output that format anymore Jake Poznanski 2025-10-30 16:23:33 +00:00
  • 3d2c977ac5 Merge branch 'jakep/lorafix' Jake Poznanski 2025-10-30 16:20:53 +00:00
  • d754446756 Prepare checkpoint fixes jakep/lorafix Jake Poznanski 2025-10-29 22:09:14 +00:00
  • 41bee1edb4 Update README.md Jake Poznanski 2025-10-29 15:02:54 -07:00
  • 39b813d125 Configs Jake Poznanski 2025-10-29 21:50:47 +00:00
  • 0ee4536375 Config adjust Jake Poznanski 2025-10-28 22:42:03 +00:00
  • d6915b7044 Fixing lora Jake Poznanski 2025-10-28 22:40:38 +00:00
  • 4768db63d2 Fixes to train.py Jake Poznanski 2025-10-28 22:15:45 +00:00
  • 91962d64e2 Example finetuning config Jake Poznanski 2025-10-28 22:11:19 +00:00
  • c62e221b69 Adding init file in data Jake Poznanski 2025-10-28 21:56:50 +00:00
  • f9e0afc0cd Cleanup Jake Poznanski 2025-10-28 20:43:13 +00:00
  • f9ac725e2c Cleanup of some table synth data generation stuff Jake Poznanski 2025-10-28 18:59:45 +00:00
  • 6fe1ea29ad Fixing race condition if multiple mine templates running at same time Jake Poznanski 2025-10-28 18:17:12 +00:00
  • abd755dcec Adjusting prompt to fix table columns in synth data Jake Poznanski 2025-10-28 17:56:18 +00:00
  • 4f6c92e1a3 Adding reliable check to see if table is fully specified, which we can use later to help synth pipeline avoid this Jake Poznanski 2025-10-27 22:59:47 +00:00
  • 1da20c934d Reorganizing table parsing code Jake Poznanski 2025-10-27 20:54:36 +00:00
  • c3761eb6c7 Merge branch 'main' into jakep/v0.5 Jake Poznanski 2025-10-27 18:12:10 +00:00
  • 4ee20056c0 Adding a flag to run_benchmark script Jake Poznanski 2025-10-27 18:12:00 +00:00
  • 2a47b7067d Merge branch 'main' into jakep/v0.5 Jake Poznanski 2025-10-27 17:55:36 +00:00
  • 579e52a77f Adding Cirrascale as inference provider Jake Poznanski 2025-10-27 17:47:09 +00:00
  • 4e5dc6ebf6 Ok, dataloader hopefully works with new transformers which only supports pt tensor outputs in processor Jake Poznanski 2025-10-24 22:48:36 +00:00
  • 97e357e9d5 Bringing in bench updates Jake Poznanski 2025-10-24 22:20:54 +00:00
  • a93c780b87 Can uncomment some table tests that failed before Jake Poznanski 2025-10-24 22:16:19 +00:00
  • c4dcc4ded4 Some small table test gen fixes Jake Poznanski 2025-10-24 22:15:25 +00:00
  • 2e03b8a825 Bump actions/download-artifact from 4 to 6 dependabot/github_actions/actions/download-artifact-6 dependabot[bot] 2025-10-24 22:12:55 +00:00
  • 6dca0b9476 Bump actions/upload-artifact from 4 to 5 dependabot/github_actions/actions/upload-artifact-5 dependabot[bot] 2025-10-24 22:12:51 +00:00
  • cce7a6c4de Adding more row span col span tests Jake Poznanski 2025-10-24 21:50:32 +00:00
  • c9f0b2c709 Table checking code refactored Jake Poznanski 2025-10-24 21:37:25 +00:00
  • 81d11ced06 Simplifying table test code Jake Poznanski 2025-10-24 20:28:41 +00:00
  • e25b4e4bac Table parsing improved Jake Poznanski 2025-10-24 20:12:01 +00:00
  • 8633256ddb Ok, making the heading crawls a bit different, it's nice now Jake Poznanski 2025-10-24 20:06:45 +00:00
  • 8079609004 Launching qwen3vl version again Jake Poznanski 2025-10-24 19:50:28 +00:00
  • 247d51d747 Updating deepinfra pricing Jake Poznanski 2025-10-24 17:29:13 +00:00
  • f31f68a4d2 New transformers for qwen3 vl stuff Jake Poznanski 2025-10-24 16:56:39 +00:00
  • 10ab6a60e0 Adding support for smaller error bars in benchmarking Jake Poznanski 2025-10-24 16:49:49 +00:00
  • ba9e14154e Setting new transformers version for training Jake Poznanski 2025-10-23 22:27:04 +00:00
  • 15496b973f Infrapartner benchmarking script Jake Poznanski 2025-10-23 21:49:23 +00:00
  • 1ef66fd313 Working on table parsing Jake Poznanski 2025-10-23 21:19:27 +00:00
  • 88937c6e40 Prepping to train qwen3 vl Jake Poznanski 2025-10-23 18:54:38 +00:00
  • 2d2c2c9202 Adjusted table Jake Poznanski 2025-10-23 18:43:16 +00:00
  • 7d551ecb56 Cleanup bench results table Jake Poznanski 2025-10-23 18:42:51 +00:00
  • 7c5a3854d2 Copy over results table to main readme Jake Poznanski 2025-10-23 18:42:07 +00:00
  • 2c90e6aaf2 More readme updated with paper benchmark scores Jake Poznanski 2025-10-23 18:36:11 +00:00
  • 53be91a0ed Add citations and arxiv paper links Jake Poznanski 2025-10-23 18:26:39 +00:00
  • ab108a5c5a Lint fixes Jake Poznanski 2025-10-23 04:03:39 +00:00
  • 210389f0f1 Merge branch 'main' of https://github.com/allenai/olmocr Jake Poznanski 2025-10-22 22:22:50 +00:00
  • 5083967589 Cleaning up repo to move all unit tests to a consistent place Jake Poznanski 2025-10-22 22:22:49 +00:00
  • e7d6036ab3 Add files via upload Jake Poznanski 2025-10-22 09:30:40 -07:00
  • d41584772e Update README.md Jake Poznanski 2025-10-22 09:09:09 -07:00
  • f5569fc443 Add files via upload Jake Poznanski 2025-10-22 09:07:03 -07:00
  • 197be00aa4 Update README.md Jake Poznanski 2025-10-22 08:44:11 -07:00
  • da3b1a8b60 Update README.md Jake Poznanski 2025-10-22 08:23:44 -07:00
  • ffa4ecc9c2 Add files via upload Luca Soldaini 2025-10-22 10:49:31 -04:00
  • 4a4e5a5406 paper Kyle Lo 2025-10-22 02:47:50 -05:00
  • f5fad405c0 Bump version to v0.4.2 for release v0.4.2 Jake Poznanski 2025-10-22 04:49:12 +00:00
  • ee37a6a0ac Updating tests and a few CI fixes Jake Poznanski 2025-10-22 04:49:01 +00:00
  • fe0bde009c Bump version to v0.4.1 for release v0.4.1 Jake Poznanski 2025-10-22 04:27:51 +00:00
  • b21c933af2 Version bump to rebuild Jake Poznanski 2025-10-22 04:27:47 +00:00
  • 970c5d08d0 Adding dependencies so unit tests run in CI Jake Poznanski 2025-10-22 04:26:51 +00:00
  • 373704ccab Bump transformers from 4.55.2 to 4.57.1 dependabot/pip/transformers-4.57.1 dependabot[bot] 2025-10-22 04:14:11 +00:00
  • a426f0c462 Bump version to v0.4.0 for release v0.4.0 Jake Poznanski 2025-10-22 04:13:54 +00:00
  • ea414c9e71 Version bump Jake Poznanski 2025-10-22 04:13:32 +00:00
  • 87137db70c Merge branch 'jakep/new_data' Jake Poznanski 2025-10-22 04:12:09 +00:00
  • 4b8146c532 Unit test fixes jakep/new_data Jake Poznanski 2025-10-22 03:47:58 +00:00
  • 3786c4c5ba Renaming Jake Poznanski 2025-10-21 20:15:26 +00:00
  • 5c16f52d3b Paddle vl benchmark runner saves off data Jake Poznanski 2025-10-21 20:09:39 +00:00
  • 0c3d2d2e16 One more args fix Jake Poznanski 2025-10-20 22:38:49 +00:00
  • 8118680b4b Fixes Jake Poznanski 2025-10-20 22:12:33 +00:00
  • 7472ef905e More args Jake Poznanski 2025-10-20 22:09:05 +00:00
  • 3d3fd78499 Test Jake Poznanski 2025-10-20 22:08:19 +00:00
  • d211276a73 Adjust again Jake Poznanski 2025-10-20 22:07:29 +00:00
  • 096cb3e521 Ugh Jake Poznanski 2025-10-20 22:02:22 +00:00
  • 255ee48594 Fixing other way to run benchmark Jake Poznanski 2025-10-20 22:00:32 +00:00
  • eaf83026d3 Lints Jake Poznanski 2025-10-20 18:43:13 +00:00
  • 4fc9cd112b Improving docs Jake Poznanski 2025-10-20 18:42:49 +00:00
  • 47ed6bbe66 VLLM based nanonets ocr2 Jake Poznanski 2025-10-20 17:32:30 +00:00
  • 76e05f8165 Fixes, adding more runners Jake Poznanski 2025-10-20 17:11:12 +00:00
  • e796448482 Adding paddlevl script Jake Poznanski 2025-10-20 16:26:08 +00:00
  • 7a744cc0b4 Final docs on model setup Jake Poznanski 2025-10-19 18:21:45 +00:00
  • 5959cf1ae6 More smaller jakep/new_data_finetune_on_bench_data Jake Poznanski 2025-10-17 23:45:52 +00:00
  • 30f59751a8 More seeds just in case Jake Poznanski 2025-10-17 22:23:15 +00:00
  • b1758ed4f1 Fixing configs Jake Poznanski 2025-10-17 22:14:37 +00:00
  • baa380d9f4 Try a soup Jake Poznanski 2025-10-17 22:12:44 +00:00
  • e0c02dfb4f 10 epoch finetunes Jake Poznanski 2025-10-17 20:08:08 +00:00
  • c1107c2902 More script changes Jake Poznanski 2025-10-17 17:23:00 +00:00
  • 09fc557b59 Fixing olmocr v3 bench fine tune Jake Poznanski 2025-10-17 16:55:17 +00:00
  • ae6ed3b200 Adjusting scripts slightly Jake Poznanski 2025-10-17 16:45:54 +00:00
  • 0c036ea589 Adding a few configs to get ablations for paper Jake Poznanski 2025-10-17 03:58:34 +00:00
  • ba56823218 Merge branch 'jakep/new_data' into jakep/new_data_finetune_on_bench_data Jake Poznanski 2025-10-17 03:49:48 +00:00
  • da66d69e77 Bump sphinx-autodoc-typehints from 1.23.3 to 3.5.2 dependabot/pip/sphinx-autodoc-typehints-3.5.2 dependabot[bot] 2025-10-16 22:14:12 +00:00
  • 6c32ff2c7d Update dates Jake Poznanski 2025-10-16 18:21:18 +00:00