72 Commits

Author SHA1 Message Date
Sebastian Raschka
4014bdd520
Ch06 classifier function asserts (#703) 2025-06-23 08:21:55 -05:00
Sebastian Raschka
222803737d
Fix data download if UCI is temporarily down (#592) 2025-03-31 16:25:53 -05:00
Sebastian Raschka
c21bfe4a23
Add PyPI package (#576)
* Add PyPI package

* fixes

* fixes
2025-03-23 19:28:49 -05:00
Sebastian Raschka
86b714a5e0
Specify UTF-8 encoding in the json load command explicitly (#557) 2025-03-05 11:46:21 -06:00
Sebastian Raschka
d1e99f6092
Fix timeout issue related to spam data backup url (#544)
* Add backup url for Spam Dataset

* import urllib

* fix url

* fix timeout issue
2025-02-20 09:26:23 -06:00
Sebastian Raschka
c39aa32ef5
Add backup url for Spam Dataset (#543)
* Add backup url for Spam Dataset

* import urllib

* fix url
2025-02-20 08:08:28 -06:00
Sebastian Raschka
a08d7aaa84
Uv workflow improvements (#531)
* Uv workflow improvements

* Uv workflow improvements

* linter improvements

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* pyproject.toml fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* windows fixes

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix

* win32 fix
2025-02-16 13:16:51 -06:00
Sebastian Raschka
a6cc574605
Upgrade to NumPy 2.0 (#520)
* Upgrade to NumPy 2.0

* bump pytorch

* bump pytorch

* bump pytorch

* bump pytorch

* bump pytorch

* update

* update packages
2025-02-09 06:21:58 -06:00
Sebastian Raschka
8cfa52bf1d
More pythonic way to find the longest sequence (#512)
* More pythonic way to find the longest sequence

* pep8 fix
2025-02-01 10:22:47 -06:00
Sebastian Raschka
701090815e
Add backup URL for gpt2 weights (#469)
* Add backup URL for gpt2 weights

* newline
2025-01-05 11:28:09 -06:00
Sebastian Raschka
f6281ab91b
Add utility to prevent double execution of certain cells (#437) 2024-11-14 19:56:49 +09:00
rasbt
a20ce1b817
remove redundant code line 2024-10-13 15:58:11 -05:00
Sebastian Raschka
7ef5129e18
Fix truncation issue in classify_review function (#373) 2024-09-25 19:54:36 -05:00
Sebastian Raschka
52ee1c7cdb
Add missing bullet point 2024-09-21 12:59:12 -05:00
Mingyuan Xu
f77c376b05
Run generate example in ch06 optionally on GPU (#352)
* model.to("cuda")

model.to("cuda")

* update device placement

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-09-13 08:01:52 -05:00
Sebastian Raschka
c443035d56
Note about MPS in ch06 and ch07 (#325) 2024-08-19 08:11:33 -05:00
TITC
38390b2a8d
track tokens seen in chapter5, track examples seen in chapter6 (#319) 2024-08-13 07:09:05 -05:00
Sebastian Raschka
08040f024c
Test code in pytorch 2.4 (#285)
* test code in pytorch 2.4

* update
2024-07-24 21:53:41 -05:00
Sebastian Raschka
8d02cb1cee
Add download help message (#274) 2024-07-19 08:29:29 -05:00
Sebastian Raschka
4f0a107692
show how to use the finetuned model 2024-07-09 06:43:26 -07:00
Sebastian Raschka
f6bcdd37bd
Fix links in summary sections (#254) 2024-06-29 07:51:31 -05:00
rasbt
31806828d0
add links to summary sections 2024-06-29 07:33:26 -05:00
Daniel Kleine
1e69c8e0b5
fixed minor issues (#252)
* fixed typo

* fixed var name in md text
2024-06-29 06:38:25 -05:00
Daniel Kleine
06921f3333
minor markdown fixes (#236) 2024-06-21 13:55:34 -05:00
Sebastian Raschka
6c0dc2362b
Add standalone finetuning and evaluation scripts for chapter 7 (#234)
* add finetuning and eval scripts

* update link

* update links

* fix link
2024-06-21 05:23:24 -05:00
rasbt
283397aaf2
add main and optional sections 2024-06-19 17:48:25 -05:00
Daniel Kleine
bbb2a0c3d5
fixed num_workers (#229)
* fixed num_workers

* ch06 & ch07: added num_workers to create_dataloader_v1
2024-06-19 17:36:46 -05:00
Jinge Wang
10018e00ff
Fixed some typos in ch06.ipynb (#219) 2024-06-18 05:54:01 -05:00
rasbt
3e0b0c66a8
fix spelling 2024-06-18 05:50:40 -05:00
rasbt
19c5784f82
replace figure 2024-06-18 05:46:36 -05:00
Daniel Kleine
dcbdc1d2e5
fixes for code (#206)
* updated .gitignore

* removed unused GELU import

* fixed model_configs, fixed all tensors on same device

* removed unused tiktoken

* update

* update hparam search

* remove redundant tokenizer argument

---------

Co-authored-by: rasbt <mail@sebastianraschka.com>
2024-06-11 20:59:48 -05:00
rasbt
1b1fd21d64
fix typo in comment 2024-06-09 06:14:02 -05:00
Sebastian Raschka
72a073bbbf
Remove leftover instances of self.tokenizer (#201)
* Remove leftover instances of self.tokenizer

* add endoftext token
2024-06-08 14:57:34 -05:00
rasbt
98d453b666
update formatting 2024-05-24 07:20:37 -05:00
rasbt
18e729643d
add assertion about data set length 2024-05-23 06:50:43 -05:00
rasbt
86f6c2df43
Fix device setting 2024-05-22 17:51:51 -05:00
rasbt
a8a28017c0
remove duplicated text 2024-05-19 11:34:47 -05:00
rasbt
02e6f06a11
add test mode for dataset download 2024-05-18 17:38:19 -05:00
rasbt
c7c83904a0
tokens seen -> examples seen 2024-05-13 20:08:48 -05:00
rasbt
16d19751b0
spelling 2024-05-13 20:06:38 -05:00
rasbt
cd7ea15e8d
add readme 2024-05-13 08:50:55 -05:00
rasbt
b28cc0cb8c
pep8 fixes 2024-05-13 07:50:51 -05:00
rasbt
a740a62239
tests and exercises 2024-05-13 07:45:59 -05:00
rasbt
8bc15ab316
fix tests 2024-05-12 19:03:14 -05:00
rasbt
21172a6a7e
add chapter 6 unit test 2024-05-12 18:51:28 -05:00
rasbt
281400feca
add missing figure 2024-05-12 18:37:02 -05:00
rasbt
88176a82eb
chapter 06 summary file 2024-05-12 18:27:50 -05:00
rasbt
2e47a6e61c
update dataset naming 2024-05-12 09:22:42 -05:00
rasbt
55c3a91838
rename download_and_unzip to make it more specific 2024-05-12 08:36:24 -05:00
rasbt
4b4e1e1ad5
use spam / not spam labels 2024-05-11 13:42:18 -05:00