Jake Poznanski
|
0ddaf9023d
|
Getting ready to launch a new training run
|
2024-10-02 23:04:56 +00:00 |
|
Jake Poznanski
|
1686790ac8
|
Checking filtering logic
|
2024-10-02 22:45:40 +00:00 |
|
Jake Poznanski
|
b340ae5092
|
A few notes, starting to test dataloader with new structured response format
|
2024-10-02 22:17:15 +00:00 |
|
Jake Poznanski
|
8315162a25
|
Merge branch 'main' of https://github.com/allenai/pdelfin
|
2024-10-02 20:48:58 +00:00 |
|
Jake Poznanski
|
6d8e638152
|
Readme
|
2024-10-02 20:48:39 +00:00 |
|
Jake Poznanski
|
ad1d818816
|
Update README.md
|
2024-10-02 13:42:43 -07:00 |
|
Jake Poznanski
|
68b9ee8c90
|
Small prompt fix
|
2024-10-02 20:19:03 +00:00 |
|
Jake Poznanski
|
a5c27212f0
|
Need more token output due to structured outputs
|
2024-10-02 19:54:54 +00:00 |
|
Jake Poznanski
|
d05832ebee
|
Fixes and evals for structured outputs
|
2024-10-02 19:51:15 +00:00 |
|
Jake Poznanski
|
802632c49f
|
Building openai prompt with structured output
|
2024-10-02 18:10:47 +00:00 |
|
Jake Poznanski
|
be00ccf321
|
Switching buildsilver to use new anchor code
|
2024-10-02 17:29:44 +00:00 |
|
Jake Poznanski
|
0071cbd788
|
Appears as if the report method works really well, might need one last step to detect rotated pages
|
2024-10-02 16:44:39 +00:00 |
|
Jake Poznanski
|
5703a59e50
|
Fix for voting on multiple docs in the same eval page
|
2024-10-02 16:31:59 +00:00 |
|
Jake Poznanski
|
73fb81ef6c
|
Review page size option, fixing mkdirs in convertsilver script
|
2024-10-02 15:53:21 +00:00 |
|
Jake Poznanski
|
276465aab1
|
Adding flag to allow skipping filter
|
2024-10-02 15:46:12 +00:00 |
|
Jake Poznanski
|
549e07bed0
|
filtering out stupid ads
|
2024-10-02 15:36:41 +00:00 |
|
Jake Poznanski
|
6ef8226347
|
Can spit out anchor text for a gpt engine using pypdf, showing locations of images and text
|
2024-10-01 23:15:53 +00:00 |
|
Jake Poznanski
|
e42cecf96c
|
Adding anchor code based off of pypdf that visits each text block, hopefully so we can make it output good bboxes
|
2024-10-01 22:10:58 +00:00 |
|
Jake Poznanski
|
09e8840c56
|
coherency based anchor text
|
2024-10-01 20:19:03 +00:00 |
|
Jake Poznanski
|
28fe314539
|
prepping anchor text generation code
|
2024-10-01 19:59:48 +00:00 |
|
Jake Poznanski
|
7795f65a53
|
Fixing bug where we were not showing all the worst alignments
|
2024-10-01 16:56:15 +00:00 |
|
Jake Poznanski
|
9d6e2faf95
|
Runeval is much improved now
|
2024-10-01 16:46:35 +00:00 |
|
Jake Poznanski
|
8a66ecee25
|
Script to rerun openai prompts on the same data
|
2024-10-01 16:25:16 +00:00 |
|
Jake Poznanski
|
f99f6a6729
|
Prompt utils
|
2024-10-01 16:02:24 +00:00 |
|
Jake Poznanski
|
b6543a4f65
|
Qwen checkpoint fixer script
|
2024-10-01 16:02:10 +00:00 |
|
Jake Poznanski
|
2c7323d1c4
|
Convert silver adjustments
|
2024-09-30 22:41:51 +00:00 |
|
Jake Poznanski
|
80bb0cbc23
|
OpenAI-to-OpenAI comparison now supported, new prompts
|
2024-09-30 22:08:30 +00:00 |
|
Jake Poznanski
|
e179453cc5
|
Fixing qwen checkpoint script
|
2024-09-30 20:34:06 +00:00 |
|
Jake Poznanski
|
963e946233
|
Convertsilver birr script can go in and out of S3 now
|
2024-09-30 20:06:45 +00:00 |
|
Jake Poznanski
|
b856b4551f
|
Fixes to convertsilver to birr script
|
2024-09-30 19:54:30 +00:00 |
|
Jake Poznanski
|
da1982acb8
|
Refactoring prompts into their own new folder
|
2024-09-30 18:48:17 +00:00 |
|
Jake Poznanski
|
d74f9a352b
|
Send silver script tries to open file first, before sending an API request
|
2024-09-30 18:41:50 +00:00 |
|
Jake Poznanski
|
1216d9c7c9
|
retrieve silver script reports errors better
|
2024-09-30 18:41:33 +00:00 |
|
Jake Poznanski
|
b4e9d6a2b8
|
Buildsilver script supports reservoir sampling so it can sample 100M+ paths now efficiently
|
2024-09-30 18:41:18 +00:00 |
|
Jake Poznanski
|
8ec9e35f22
|
dataprep issue
|
2024-09-28 04:31:11 +00:00 |
|
Jake Poznanski
|
e53f782b0f
|
Datasetdict fix
|
2024-09-28 03:38:29 +00:00 |
|
Jake Poznanski
|
decfd7fbc1
|
Fixing the refiner input prompt to something simpler that doesn't depend on the training data. Fixing beaker job workspace and bumping priority to high.
|
2024-09-27 22:54:07 +00:00 |
|
Jake Poznanski
|
22b765e6be
|
Going back to non iterable dataset, so shuffling works better, applying a light filter
|
2024-09-27 15:48:56 +00:00 |
|
Jake Poznanski
|
65a9c9981e
|
Hopefully will train now
|
2024-09-27 15:16:12 +00:00 |
|
Jake Poznanski
|
e864b9d88f
|
weird dataloader stuff now
|
2024-09-27 02:53:59 +00:00 |
|
Jake Poznanski
|
37f10051f6
|
typo
|
2024-09-27 01:19:21 +00:00 |
|
Jake Poznanski
|
c00e40d1c4
|
More fixes
|
2024-09-26 23:10:07 +00:00 |
|
Jake Poznanski
|
d098a87ed2
|
Column name fix
|
2024-09-26 22:29:19 +00:00 |
|
Jake Poznanski
|
84e9da637c
|
Removing lambda due to pickling errors
|
2024-09-26 21:39:08 +00:00 |
|
Jake Poznanski
|
61dd7bb61f
|
Fix for map in iterable mode
|
2024-09-26 20:44:47 +00:00 |
|
Jake Poznanski
|
49efa5cb40
|
Typo
|
2024-09-26 19:57:53 +00:00 |
|
Jake Poznanski
|
cf1aa0176e
|
Proper use of iterable_dataset
|
2024-09-26 19:55:54 +00:00 |
|
Jake Poznanski
|
05fdb81da2
|
map and filter on iterable dataset
|
2024-09-26 19:01:34 +00:00 |
|
Jake Poznanski
|
f14e910175
|
bnb
|
2024-09-26 03:30:35 +00:00 |
|
Jake Poznanski
|
7707bc08da
|
trying cheaper optimizer to solve ooms
|
2024-09-25 22:56:05 +00:00 |
|