mirror of https://github.com/rasbt/LLMs-from-scratch.git (synced 2025-08-30 11:31:08 +00:00)
add usage
parent fb54b064c9
commit d311bae25a
@@ -7,6 +7,8 @@ For example,
- comparing rows 1 and 3 answers the question: "What is the performance difference when we train only the last layer instead of the last block?";
- and so forth.
| | Model | Weights | Trainable token | Trainable layers | Context length | CPU/GPU | Training time | Training acc | Validation acc | Test acc |
|---|--------------------|------------|-----------------|------------------|-------------------------|---------|---------------|--------------|----------------|----------|
| 1 | gpt2-small (124M) | pretrained | last | last_block | longest train ex. (120) | V100 | 0.39 min | 96.63% | 97.99% | 94.33% |
@@ -17,3 +19,16 @@ For example,
| 6 | gpt2-large (774M) | pretrained | last | last_block | longest train ex. (120) | V100 | 1.91 min | 99.52% | 98.66% | 96.67% |
| 7 | gpt2-small (124M) | random | last | all | longest train ex. (120) | V100 | 0.93 min | 100% | 97.32% | 93.00% |
| 8 | gpt2-small (124M) | pretrained | last | last_block | context length (1024) | V100 | 3.24 min | 83.08% | 87.92% | 78.33% |
### Usage:
- Row 1: `python additional-experiments.py`
- Row 2: `python additional-experiments.py --trainable_token first`
- Row 3: `python additional-experiments.py --trainable_layers last_layer`
- Row 4: `python additional-experiments.py --trainable_layers all`
- Row 5: `python additional-experiments.py --model_size "gpt2-medium (355M)"`
- Row 6: `python additional-experiments.py --model_size "gpt2-large (774M)"`
- Row 7: `python additional-experiments.py --weights random --trainable_layers all`
- Row 8: `python additional-experiments.py --context_length "model_context_length"`
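
For orientation, here is a minimal sketch of how flags like these could be parsed with `argparse`. It is not the actual parser in `additional-experiments.py`: the function name `build_parser`, the help strings, and the default values are assumptions, with the defaults inferred from the settings listed for row 1. The sketch also shows why a model name containing a space and parentheses has to be quoted on the command line so that it arrives as a single argument.

```python
import argparse
import shlex


def build_parser():
    # Hypothetical parser; defaults are assumed from row 1 of the table.
    parser = argparse.ArgumentParser(description="Sketch of the experiment flags")
    parser.add_argument("--model_size", type=str, default="gpt2-small (124M)",
                        help="Model name as shown in the table")
    parser.add_argument("--weights", type=str, default="pretrained",
                        help="'pretrained' or 'random'")
    parser.add_argument("--trainable_layers", type=str, default="last_block",
                        help="'last_layer', 'last_block', or 'all'")
    parser.add_argument("--trainable_token", type=str, default="last",
                        help="'first' or 'last'")
    parser.add_argument("--context_length", type=str,
                        default="longest_training_example",  # assumed default
                        help="Padding length setting")
    return parser


parser = build_parser()

# Row 1: no flags, so every setting keeps its (assumed) default.
row1 = parser.parse_args([])

# Row 5: shlex.split mimics shell word splitting; the quotes keep
# "gpt2-medium (355M)" together as one argument.
row5 = parser.parse_args(shlex.split('--model_size "gpt2-medium (355M)"'))

# Row 7: two flags combined.
row7 = parser.parse_args(shlex.split("--weights random --trainable_layers all"))

print(row1.trainable_layers, "|", row5.model_size, "|", row7.weights)
```

Without the quotes, a typical shell would treat the parentheses as syntax rather than as part of the model name, so the command would fail before the script starts.
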