How can I track model loss and accuracy for each epoch during fine-tuning, to make sure the model is stable? #99

XuanrZhang opened this issue on Apr 13, 2023
Hi @Zhihan1996,

Thanks for developing this useful tool.

I used your pre-trained DNA6M model to fine-tune on my dataset for binary classification, and it takes 16 hours even for a very small dataset. I tried to use a GPU, but it didn't work. Any suggestions for running fine-tuning on a GPU?
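
For context, here is the quick check I can run to confirm whether PyTorch sees a GPU at all (a minimal sketch; if torch.cuda.is_available() returns False, I assume my torch build is CPU-only and needs to be replaced with a CUDA-enabled build matching the cluster's CUDA version):

import torch

# If this prints False, the installed torch build has no CUDA support,
# and the fine-tuning script presumably falls back to the CPU silently.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device count:", torch.cuda.device_count())
    print("Device name:", torch.cuda.get_device_name(0))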

I would also like to know where I can find the training log file, so that I can track loss and accuracy during fine-tuning.
After training, the files below are everything I got. I need to plot the training loss to see whether the model has trained enough and is stable.
├── config.json
├── eval_results.txt
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
├── training_args.bin
└── vocab.txt
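
If the script behaves like the Hugging Face example scripts it appears to be based on, the metrics logged every --logging_steps with --evaluate_during_training go to a TensorBoard event directory (typically runs/) via SummaryWriter rather than to a plain text file. Assuming that is the case, a sketch like this could pull the curves out for plotting; the "eval_loss" tag name is my guess, and printing the available tags shows what was actually recorded:

import matplotlib.pyplot as plt
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point this at the event-file directory written during training
# ("runs" is an assumed location).
ea = EventAccumulator("runs")
ea.Reload()
print(ea.Tags())  # lists the scalar tags that were actually logged

events = ea.Scalars("eval_loss")  # assumed tag name; pick one from Tags()
steps = [e.step for e in events]
values = [e.value for e in events]

plt.plot(steps, values)
plt.xlabel("global step")
plt.ylabel("eval loss")
plt.savefig("eval_loss.png")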

I ran fine-tuning with the parameters below.

python3 /g/data/zk16/xzhang/DNABERT/examples/run_finetune.py \
    --model_type dnalongcat \
    --tokenizer_name /g/data/zk16/xzhang/DNABERT/pre_model/6-new-12w-0/vocab.txt \
    --model_name_or_path $MODEL_PATH \
    --task_name dnaprom \
    --do_train \
    --do_eval \
    --data_dir $DATA_PATH \
    --max_seq_length 1536 \
    --per_gpu_train_batch_size 32 \
    --per_gpu_eval_batch_size 32 \
    --learning_rate 2e-4 \
    --num_train_epochs 5.0 \
    --output_dir $OUTPUT_PATH \
    --evaluate_during_training \
    --logging_steps 100 \
    --save_steps 4000 \
    --warmup_percent 0.1 \
    --hidden_dropout_prob 0.1 \
    --overwrite_output \
    --weight_decay 0.01 \
    --n_process 8
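
If the metrics only appear on the console, I can also capture the output (for example by appending 2>&1 | tee train.log to the command above) and parse the printed loss values. The regex below is just an assumption about the log line format and would need adjusting to whatever the script actually prints every --logging_steps:

import re
import matplotlib.pyplot as plt

# Collect every "loss = <number>"-style value from the captured console log.
# The pattern is a guess at the format; tweak it to match the real output.
losses = []
with open("train.log") as f:
    for line in f:
        m = re.search(r"loss[\s=:]+([0-9]*\.[0-9]+)", line)
        if m:
            losses.append(float(m.group(1)))

plt.plot(losses)
plt.xlabel("logging event (every --logging_steps steps)")
plt.ylabel("loss")
plt.savefig("train_loss.png")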

I really appreciate any help you can provide.

Best,
Xuan
