Code-Generator

Image Classification Template by Code-Generator

This is the image classification template generated by Code-Generator. It trains a resnet18 model on the cifar10 dataset, both taken from TorchVision. Training is powered by PyTorch and PyTorch-Ignite.

Getting Started

Install the dependencies with pip:

pip install -r requirements.txt --progress-bar off -U

Code structure

|
|- README.md
|
|- main.py : main script to run
|- data.py : helper module with functions to set up input datasets and create dataloaders
|- models.py : helper module with functions to create a model or multiple models
|- trainers.py : helper module with functions to create trainer and evaluator
|- utils.py : module with various helper functions
|
|- requirements.txt : dependencies to install with pip
|
|- config.yaml : global configuration YAML file
|
|- test_all.py : test file with a few basic sanity checks

Training

Multi GPU Training (torchrun) (recommended)

torchrun \
  --nproc_per_node 2 \
  main.py config.yaml --backend nccl
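
For orientation, torchrun starts one process per GPU (two here) and exports the environment variables RANK, LOCAL_RANK and WORLD_SIZE to each of them. The sketch below shows one way a script could read those values. The helper name `read_torchrun_env` is hypothetical and is not part of this template, and the fallback defaults for a plain `python` launch are an assumption.

```python
import os


def read_torchrun_env(env=None):
    """Read the rank information that torchrun exports to each process.

    Hypothetical helper for illustration; not part of this template.
    """
    env = os.environ if env is None else env
    return {
        # Global index of this process across all nodes.
        "rank": int(env.get("RANK", 0)),
        # Index of this process on its own node, often used to pick the GPU.
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        # Total number of processes in the job.
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    # Without torchrun, the assumed single-process defaults are printed.
    print(read_torchrun_env())
```

With `--nproc_per_node 2` on a single node, the two processes would see ranks 0 and 1 and a world size of 2.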

Multi Node, Multi GPU Training (torchrun) (recommended)

  • Execute on the master node (replace 127.0.0.1 with an address of the master node that the worker nodes can reach)
torchrun \
  --nproc_per_node 4 \
  --nnodes 2 \
  --node_rank 0 \
  --master_addr 127.0.0.1 \
  --master_port 8080 \
  main.py config.yaml --backend nccl
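  • Execute on each worker node, setting <node_rank> to that node's rank (1 for the single worker node when --nnodes is 2) and using the same master address and port as on the master node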
torchrun \
  --nproc_per_node 4 \
  --nnodes 2 \
  --node_rank <node_rank> \
  --master_addr 127.0.0.1 \
  --master_port 8080 \
  main.py config.yaml --backend nccl
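
As a quick check on the numbers in the commands above, the total process count is `nnodes * nproc_per_node`. The sketch below computes it, along with the global ranks each node would host. It assumes ranks are assigned contiguously in node-rank order, which is typical for a static launch like this one but is not guaranteed by every rendezvous setup. Both function names are hypothetical.

```python
def total_processes(nnodes, nproc_per_node):
    """Total number of training processes (the world size) for the job."""
    return nnodes * nproc_per_node


def ranks_on_node(node_rank, nproc_per_node):
    """Global ranks hosted on one node.

    Assumption: ranks are contiguous and ordered by node rank.
    """
    start = node_rank * nproc_per_node
    return list(range(start, start + nproc_per_node))


if __name__ == "__main__":
    # Values taken from the commands above: 2 nodes, 4 processes each.
    print(total_processes(2, 4))
    print(ranks_on_node(0, 4))
    print(ranks_on_node(1, 4))
```

Under that assumption, the master node (`--node_rank 0`) hosts ranks 0 to 3 and the worker node hosts ranks 4 to 7, for 8 processes in total.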