Welcome to ZhiJian!

https://github.com/zhangyikaii/LAMDA-ZhiJian/blob/main/docs/source/_static/images/overview.png?raw=true

ZhiJian (执简驭繁) is a comprehensive and user-friendly PyTorch-based toolbox for leveraging foundation pre-trained models and their fine-tuned counterparts to extract knowledge and expedite learning in real-world tasks, i.e., serving the Model Reuse tasks.

The rapid progress in deep learning has led to the emergence of numerous open-source Pre-Trained Models (PTMs) on platforms like PyTorch, TensorFlow, and HuggingFace Transformers. Leveraging these PTMs for specific tasks empowers them to handle objectives effectively, creating valuable resources for the machine-learning community. Reusing PTMs is vital in enhancing target models’ capabilities and efficiency, achieved through adapting the architecture, customizing learning on target data, or devising optimized inference strategies to leverage PTM knowledge. To facilitate a holistic consideration of various model reuse strategies, ZhiJian categorizes model reuse methods into three sequential modules: Architect, Tuner, and Merger, aligning with the stages of model preparation, model learning, and model inference on the target task, respectively. The provided interface methods include:

  • Architect Module

    The Architect module involves modifying the pre-trained model to fit the target task, and reusing certain parts of the pre-trained model while introducing new learnable parameters with specialized structures.

    • Linear Probing & Partial-k, How transferable are features in deep neural networks? In: NeurIPS’14. [Paper]

    • Adapter, Parameter-Efficient Transfer Learning for NLP. In: ICML’19. [Paper]

    • Diff Pruning, Parameter-Efficient Transfer Learning with Diff Pruning. In: ACL’21. [Paper]

    • LoRA, LoRA: Low-Rank Adaptation of Large Language Models. In: ICLR’22. [Paper]

    • Visual Prompt Tuning / Prefix, Visual Prompt Tuning. In: ECCV’22. [Paper]

    • Head2Toe, Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning. In: ICML’22. [Paper]

    • Scaling & Shifting, Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning. In: NeurIPS’22. [Paper]

    • AdaptFormer, AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition. In: NeurIPS’22. [Paper]

    • BitFit, BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. In: ACL’22. [Paper]

    • Convpass, Convolutional Bypasses Are Better Vision Transformer Adapters. In: Tech Report 07-2022. [Paper]

    • Fact-Tuning, FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer. In: AAAI’23. [Paper]

    • VQT, Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning. In: CVPR’23. [Paper]

  • Tuner Module

    The Tuner module focuses on training the target model with guidance from pre-trained model knowledge to expedite the optimization process, e.g., via adjusting objectives, optimizers, or regularizers.

    • Knowledge Transfer and Matching, NeC4.5: neural ensemble based C4.5. In: IEEE Trans. Knowl. Data Eng. 2004. [Paper]

    • FitNet, FitNets: Hints for Thin Deep Nets. In: ICLR’15. [Paper]

    • LwF, Learning without Forgetting. In: ECCV’16. [Paper]

    • FSP, A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In: CVPR’17. [Paper]

    • NST, Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. In: CVPR’17. [Paper]

    • RKD, Relational Knowledge Distillation. In: CVPR’19. [Paper]

    • SPKD, Similarity-Preserving Knowledge Distillation. In: CVPR’19. [Paper]

    • CRD, Contrastive Representation Distillation. In: ICLR’20. [Paper]

    • REFILLED, Distilling Cross-Task Knowledge via Relationship Matching. In: CVPR’20. [Paper]

    • WiSE-FT, Robust fine-tuning of zero-shot models. In: CVPR’22. [Paper]

    • L2 penalty / L2 SP, Explicit Inductive Bias for Transfer Learning with Convolutional Networks. In: ICML’18. [Paper]

    • Spectral Norm, Spectral Normalization for Generative Adversarial Networks. In: ICLR’18. [Paper]

    • BSS, Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning. In: NeurIPS’19. [Paper]

    • DELTA, DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks. In: ICLR’19. [Paper]

    • DeiT, Training data-efficient image transformers & distillation through attention. In: ICML’21. [Paper]

    • DIST, Knowledge Distillation from A Stronger Teacher. In: NeurIPS’22. [Paper]

  • Merger Module

    The Merger module influences the inference phase by either reusing pre-trained features or incorporating adapted logits from the pre-trained model (see the sketch after this list for the simplest case, logits ensembling).

    • Logits Ensemble, Ensemble Methods: Foundations and Algorithms. 2012. [Book]

    • Nearest Class Mean, Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost. In: IEEE Trans. Pattern Anal. Mach. Intell. 2013. [Paper]

    • SimpleShot, SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning. In: CVPR’19. [Paper]

    • via Optimal Transport, Model Fusion via Optimal Transport. In: NeurIPS’20. [Paper]

    • Model Soup, Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In: ICML’22. [Paper]

    • Fisher Merging, Merging Models with Fisher-Weighted Averaging. In: NeurIPS’22. [Paper]

    • Deep Model Reassembly, Deep Model Reassembly. In: NeurIPS’22. [Paper]

    • REPAIR, REPAIR: REnormalizing Permuted Activations for Interpolation Repair. In: ICLR’23. [Paper]

    • Git Re-Basin, Git Re-Basin: Merging Models modulo Permutation Symmetries. In: ICLR’23. [Paper]

    • ZipIt, ZipIt! Merging Models from Different Tasks without Training. [Paper]
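
As a concrete illustration of the Merger idea, the sketch below averages the logits of several models (e.g., a pre-trained model and its fine-tuned counterpart) at inference time, the simplest form of logits ensembling. This is a minimal plain-PyTorch sketch rather than ZhiJian's interface; the helper name ensemble_logits is only for illustration.

    import torch

    def ensemble_logits(models, x):
        # Average the logits of several models at inference time (logits ensemble).
        with torch.no_grad():
            return torch.stack([model(x) for model in models]).mean(dim=0)

    # e.g. probs = ensemble_logits([pretrained_model, finetuned_model], images).softmax(dim=-1)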

💡 ZhiJian also has the following highlights:

  • Support reuse of various pre-trained model zoos

  • Extremely easy to get started and customize

  • Concise code does big things

    • Only ~5000 lines of base code; new methods are incorporated like building LEGO blocks

    • State-of-the-art results on the VTAB benchmark, backed by approximately 10k experiments [here]

    • Friendly guidelines and comprehensive documentation for customizing datasets and pre-trained models [here]

🔥 The Naming of ZhiJian: In Chinese, “ZhiJian-YuFan” (执简驭繁) means handling complexity with concise and efficient methods. Given the variations in pre-trained models and the deployment overhead of full-parameter fine-tuning, ZhiJian represents a solution that is easily reusable, maintains high accuracy, and maximizes the potential of pre-trained models. The “complexity” refers to the wide variety, large differences, and deployment difficulty of existing pre-trained models and reuse methods; the name “ZhiJian” (keeping it simple) expresses that, with this toolbox, model reuse methods become easy to master: easy to start, quick to reuse, stable in accuracy, awakening the knowledge of pre-trained models to the fullest extent.

🕹️ Quick Start

  1. An environment with Python 3.7+ from conda, venv, or virtualenv.

  2. Install ZhiJian using pip:

    $ pip install zhijian
    

    For more details please click installation instructions.

    • [Option] Install with the newest version through GitHub:

      $ pip install git+https://github.com/zhangyikaii/lamda-zhijian.git@main --upgrade
      
  3. Open your python console and type:

    import zhijian
    print(zhijian.__version__)
    

    If no error occurs, you have successfully installed ZhiJian.

📚 Documentation

The tutorials and API documentation are hosted on zhijian.readthedocs.io

The Chinese documentation is available at zhijian.readthedocs.io/zh

Why ZhiJian?

Related Library            | Stars | # of Alg. | # of Model | # of Dataset | # of Fields | LLM Supp. | Docs.
---------------------------|-------|-----------|------------|--------------|-------------|-----------|------
PEFT                       | 8k+   | 6         | ~15        | (3)          | 1 (a)       | ✔️        | ✔️
adapter-transformers       | 1k+   | 10        | ~15        | (3)          | 1 (a)       |           | ✔️
LLaMA-Efficient-Tuning     | 2k+   | 4         | 5          | ~20          | 1 (a)       | ✔️        |
Knowledge-Distillation-Zoo | 1k+   | 20        | 2          | 2            | 1 (b)       |           |
Easy Few-Shot Learning     | 608   | 10        | 3          | 2            | 1 (c)       |           |
Model Soups                | 255   | 3         | 3          | 5            | 1 (d)       |           |
Git Re-Basin               | 410   | 3         | 5          | 4            | 1 (d)       |           |
ZhiJian (Ours)             | ing   | 30+       | ~50        | 19           | 1 (a,b,c,d) | ✔️        | ✔️

Get Started

👋🏼

ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.

  • What & Why Reuse?

    • Performing downstream tasks with the help of pre-trained models, including their structures, weights, or other derived rules.

    • Significantly accelerating convergence and improving downstream performance.

The recent booming development of deep learning techniques has resulted in a mushrooming of open-source pre-trained models, with significant contributions from PyTorch, TensorFlow, and HuggingFace Transformers. These PTMs follow the de facto “pre-training then full-parameter fine-tuning” paradigm, in which the most fundamental and representative approach is to initialize the target model with pre-trained weights. Recently, advanced methods have been developed to harness PTM knowledge from diverse perspectives. These approaches include expanding model structures, applying constraints on weight initialization, or seeking guidance from the source hypothesis space. They are applicable in scenarios where target task data accumulates dynamically or exhibits distribution shifts.

To better categorize and summarize the reuse methods, we consider the architect, tuner, and merger pipeline. ZhiJian offers a modular design, easy-to-use interfaces, and rich custom configuration, empowering deep learning practitioners to seamlessly switch between and combine diverse reuse methods. Furthermore, it facilitates the creation of novel reuse methods tailored to the current target task.

Overview

In the following example, we show how ZhiJian:

  • Construct a Pre-Trained Vision Transformer from timm

    • with custom LoRA module

  • Tune with supervision on CIFAR-100 dataset

  • Infer to evaluate the performance

The figure below shows the three stages of our example. To run the following code, please click [Open In Colab].

https://github.com/zhangyikaii/LAMDA-ZhiJian/blob/main/docs/source/_static/images/tutorials_get_started_vit_lora.png?raw=true
  • All in just 10 minutes

    • 1 min to install zhijian

    • 2 mins to select the dataset

    • 3 mins to construct the Vision Transformer from timm with custom LoRA module

    • 4 mins to deploy supervised fine-tuning and test process

🚀 Let’s get started!

Install ZhiJian

$ pip install zhijian

After installation, open your Python console and type:

import zhijian
print(zhijian.__version__)

If no error occurs, you have successfully installed ZhiJian.

Select Dataset

ZhiJian provides loading interfaces for the 19 datasets of the VTAB benchmark, which span several domains including general objects, animals and plants, food and daily necessities, medicine, remote sensing, and so on. To customize your own dataset, please see here.

  • For better prompting, we first import a tool function that guides the input:

    from zhijian.models.utils import select_from_input
    
  • Now, run the following code block, select the target dataset (CIFAR-100) and corresponding directory to be deployed:

    available_datasets = [
        'VTAB-1k.CIFAR-100', 'VTAB-1k.CLEVR-Count', 'VTAB-1k.CLEVR-Distance', 'VTAB-1k.Caltech101', 'VTAB-1k.DTD',
        'VTAB-1k.Diabetic-Retinopathy', 'VTAB-1k.Dmlab', 'VTAB-1k.EuroSAT', 'VTAB-1k.KITTI', 'VTAB-1k.Oxford-Flowers-102',
        'VTAB-1k.Oxford-IIIT-Pet', 'VTAB-1k.PatchCamelyon', 'VTAB-1k.RESISC45', 'VTAB-1k.SUN397', 'VTAB-1k.SVHN',
        'VTAB-1k.dSprites-Location', 'VTAB-1k.dSprites-Orientation', 'VTAB-1k.smallNORB-Azimuth', 'VTAB-1k.smallNORB-Elevation'
    ] # dataset options.
    dataset     = select_from_input('dataset', available_datasets)  # user input about dataset
    dataset_dir = input(f"Please input your dataset directory: ")   # user input about dataset directory
    
    $ Please input a dataset, type 'help' to show the options: help
    $ Available dataset(s):
              [1] VTAB-1k.CIFAR-100
              [2] VTAB-1k.CLEVR-Count
              [3] VTAB-1k.CLEVR-Distance
              [4] VTAB-1k.Caltech101
              [5] VTAB-1k.DTD
              [6] VTAB-1k.Diabetic-Retinopathy
              [7] VTAB-1k.Dmlab
              [8] VTAB-1k.EuroSAT
              [9] VTAB-1k.KITTI
              [10] VTAB-1k.Oxford-Flowers-102
              [11] VTAB-1k.Oxford-IIIT-Pet
              [12] VTAB-1k.PatchCamelyon
              [13] VTAB-1k.RESISC45
              [14] VTAB-1k.SUN397
              [15] VTAB-1k.SVHN
              [16] VTAB-1k.dSprites-Location
              [17] VTAB-1k.dSprites-Orientation
              [18] VTAB-1k.smallNORB-Azimuth
              [19] VTAB-1k.smallNORB-Elevation
    $ Please input a dataset, type 'help' to show the options: 1
    $ Your selection: [1] VTAB-1k.CIFAR-100
    
    $ Please input your dataset directory: your/dataset/directory
    

Construct Pre-trained Model

Next, we will construct a pre-trained Vision Transformer from the timm library, with the custom LoRA module.

  • Seamlessly modifying the structure is possible. ZhiJian welcomes any base model and any additional modifications. The base part supports a variety of pre-trained backbones.

ZhiJian also supports assembling additional tuning structures, much like building LEGO bricks. For more detailed customization of each part, please see here.

Adapting the Vision Transformer structure requires just 1~3 lines of code.

  • Now, run the following code block, select the model architecture (Vision Transformer as below):

    available_example_models = {
        'timm.vit_base_patch16_224_in21k': {
            'LoRA': '(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...',
            'Adapter': '(Adapter.adapt): ...->(blocks[0:12].drop_path1){inout1}->...',
            'Convpass': ('(Convpass.adapt): ...->(blocks[0:12].norm1){in1}->(blocks[0:11].drop_path1){in2}->...,' # follow the next line
                        '(Convpass.adapt): ...->{in1}(blocks[0:11].norm2)->(blocks[0:12].drop_path2){in2}->...'),
            'None': None
        }
    } # model options, Dict(model name: Dict(add-in structure name: add-in blitz configuration)).
    
    model = select_from_input('model', list(available_example_models.keys())) # user input about model
    
    $ Please input a model, type 'help' to show the options: help
    $ Available model(s):
                    [1] timm.vit_base_patch16_224_in21k
    $ Please input a model, type 'help' to show the options: 1
    $ Your selection: [1] timm.vit_base_patch16_224_in21k
    
  • Next, run the following code block, select the additional add-in structure (LoRA as below):

    availables   = available_example_models[model]
    config_blitz = availables[select_from_input('add-in structure', availables.keys())]   # user input about add-in structure
    
    $ Please input a add-in structure, type 'help' to show the options: help
    $ Available add-in structure(s):
                        [1] LoRA
                        [2] Adapter
                        [3] Convpass
                        [4] None
    $ Please input a add-in structure, type 'help' to show the options: 1
    $ Your selection: [1] LoRA
    

Deploy Training and Test Process

ZhiJian lets you specify which part of the parameters to fine-tune via args.reuse_key; for example, assigning blocks[6:8] tunes only model.blocks[6], model.blocks[7], and their sub-modules.
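
Conceptually, choosing which parameters to fine-tune amounts to freezing everything else. The sketch below shows this idea in plain PyTorch; it is not ZhiJian's internal implementation, and the helper name freeze_except and the dotted key format (e.g. 'blocks.11') are assumptions for illustration only.

    import torch.nn as nn

    def freeze_except(model: nn.Module, reuse_keys):
        # Keep gradients only for parameters whose name contains one of the given substrings.
        for name, param in model.named_parameters():
            param.requires_grad = any(key in name for key in reuse_keys)

    # e.g. tune only the classification head and the last block:
    # freeze_except(model, ['head', 'fc_norm', 'blocks.11'])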

  • Now, run the following code block, select which part of the parameters to fine-tune (the rest are frozen)

    available_example_reuse_modules = {
        'timm.vit_base_patch16_224_in21k': {
            'linear layer only': 'addin,head,fc_norm',
            'the last block and the linear layer (Partial-1)': 'addin,blocks[11],head,fc_norm',
            'the last two blocks and the linear layer (Partial-2)': 'addin,blocks[10:12],head,fc_norm',
            'the last four blocks and the linear layer (Partial-4)': 'addin,blocks[8:12],head,fc_norm',
            'all parameters': ''
        }
    }
    
    availables          = available_example_reuse_modules[model]
    reuse_modules_blitz = availables[select_from_input('reuse module', availables.keys())] # user input about reuse modules
    
    $ Please input a reuse module, type 'help' to show the options: help
    $ Available reuse modules(s):
                      [1] add-ins and linear layer
                      [2] add-ins and the last block and the linear layer (Partial-1)
                      [3] add-ins and the last two blocks and the linear layer (Partial-2)
                      [4] add-ins and the last four blocks and the linear layer (Partial-4)
    $ Please input a reuse module, type 'help' to show the options: 1
    $ Your selection: [1] add-ins and linear layer
    
  • Setting training_mode to finetune, we next configure the parameters

    For the rest of the training configuration with more customization options, please see here

    training_mode = 'finetune'
    args = get_args(
        dataset=dataset,                # dataset
        dataset_dir=dataset_dir,        # dataset directory
        model=model,                    # backbone network
        config_blitz=config_blitz,      # addin blitz configuration
        training_mode=training_mode,    # training mode
        optimizer='adam',               # optimizer
        lr=1e-2,                        # learning rate
        wd=1e-5,                        # weight decay
        gpu='0',                        # gpu id
        verbose=True                    # control the verbosity of the output
    )
    pprint(vars(args))
    
    $ Preparing args..
      {'aa': None,
      'addins': [{'hook': [['adapt', 'post']],
                  'location': [['blocks', 0, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 1, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 2, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 3, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 4, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 5, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 6, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 7, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 8, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 9, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 10, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 11, 'attn', 'qkv']],
                  'name': 'LoRA'}],
      'amp': False,
      'amp_dtype': 'float16',
      'amp_impl': 'native',
      'aot_autograd': False,
      'aug_repeats': 0,
      'aug_splits': 0,
      'batch_size': 64,
      'bce_loss': False,
      ...
      'warmup_epochs': 5,
      'warmup_lr': 1e-05,
      'warmup_prefix': False,
      'wd': 5e-05,
      'weight_decay': 2e-05,
      'worker_seeding': 'all'}
    
  • Next, run the following code block to configure the GPU:

    assert torch.cuda.is_available()
    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
    torch.cuda.set_device(int(args.gpu))
    
  • Run the following to get the pre-trained model, which includes the additional add-in modules that have been attached:

    model, model_args, device = get_model(args)
    
  • Run the following to get the dataloader:

    train_loader, val_loader, num_classes = prepare_vision_dataloader(args, model_args)
    
    $ Log level set to: INFO
      Log files are recorded in: your/log/directory/0718-15-17-52-580
      Trainable/total parameters of the model: 0.37M / 86.17M (0.43148%)
    
  • Run the following to prepare the optimizer, learning rate scheduler and loss function

    For more customization options, please see TODO

    optimizer = optim.Adam(
        model.parameters(),
        lr=args.lr,
        weight_decay=args.wd
    )
    lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(
        optimizer,
        args.max_epoch,
        eta_min=args.eta_min
    )
    criterion = nn.CrossEntropyLoss()
    
  • Run the following to initialize the trainer, ready to start training:

    trainer = prepare_trainer(
        args,
        model=model, model_args=model_args, device=device,
        train_loader=train_loader, val_loader=val_loader, num_classes=num_classes,
        optimizer=optimizer, lr_scheduler=lr_scheduler, criterion=criterion
    )
    
  • Run the following to train and test with ZhiJian:

    trainer.fit()
    trainer.test()
    
    $       Epoch   GPU Mem.       Time       Loss         LR
              1/5      7.16G     0.3105      4.629      0.001: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.66batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/5      7.16G     0.1188      3.334      14.02: 100%|██████████| 157/157 [00:18<00:00, 8.35batch/s]
      ***   Best results: [Acc@1: 3.3339968152866244], [Acc@5: 14.022691082802547]
    
            Epoch   GPU Mem.       Time       Loss         LR
              2/5      7.16G     0.2883      4.255 0.00090451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              2/5      7.16G     0.1182       4.22      16.28: 100%|██████████| 157/157 [00:18<00:00, 8.37batch/s]
      ***   Best results: [Acc@1: 4.219745222929936], [Acc@5: 16.28184713375796]
    
            Epoch   GPU Mem.       Time       Loss         LR
              3/5      7.16G      0.296      4.026 0.00065451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              3/5      7.16G     0.1197      5.255      17.71: 100%|██████████| 157/157 [00:18<00:00, 8.28batch/s]
      ***   Best results: [Acc@1: 5.254777070063694], [Acc@5: 17.70501592356688]
    
            Epoch   GPU Mem.       Time       Loss         LR
              4/5      7.16G     0.2983       3.88 0.00034549: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.87batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              4/5      7.16G     0.1189      5.862      19.06: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s]
      ***   Best results: [Acc@1: 5.8618630573248405], [Acc@5: 19.058519108280255]
    
            Epoch   GPU Mem.       Time       Loss         LR
              5/5      7.16G     0.2993      3.811 9.5492e-05: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.90batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              5/5      7.16G      0.119      5.723      19.39: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s]
      ***   Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/1      7.16G     0.1192      5.723      19.39: 100%|██████████| 157/157 [00:18<00:00, 8.30batch/s]
      ***   Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236]
    

Config with ~1 Line Blitz

🌱

ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.

  • What & Why Reuse?

    • Performing downstream tasks with the help of pre-trained models, including their structures, weights, or other derived rules.

    • Significantly accelerating convergence and improving downstream performance.

In ZhiJian, adding the LoRA module to the pre-trained model and adjusting which part of the parameters to fine-tune require only about one line of code.

Overview

In the following example, we show how ZhiJian:

  • Represent the modules of the pre-trained model

  • Config the extended add-in module with entry points

Modules of the Pre-trained Model in a One-Line Description

In the Architect module, to facilitate the modification of model structures, additional adaptive structures are incorporated into pre-trained models. ZhiJian accepts a one-line serialized representation of the base pre-trained model, as exemplified in the Vision Transformer model from the timm library in the following manner:

_images/tutorial_one_line_config.png

The modules within the parentheses () represent the base pre-trained model, and the dot . is used as an access operator.

The arrows -> indicate the connections between modules, and ellipsis ... represents default modules. Partial structures can be connected with arrows.

Extended Add-in Module with Entry Points

We use () to denote an additional adaptive structure, where the part after the dot . represents the main forward function of the extra structure. The data flows into the module and primarily passes through this method.

We use {} to indicate the entry points of the extra structure into the pre-trained model, encompassing the entry of source model features and the return points of features after the added structure is processed.

With the aforementioned configuration, ZhiJian seamlessly supports the modification of pre-trained model structures. It automatically recognizes the additional structures defined in zhijian.models.addin, enabling the construction of pre-trained models.
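
As a concrete reading of this syntax, here is the LoRA configuration string from the earlier tutorial, annotated piece by piece (the string itself comes from the example above; the comments simply restate the rules of this section):

    config_blitz = '(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...'
    # (LoRA.adapt)             the add-in structure LoRA, entered through its main forward function adapt
    # ...->                    default (unmodified) modules before the target position
    # (blocks[0:12].attn.qkv)  base-model modules, addressed with the dot . access operator
    # {inout1}                 entry point: features flow into the add-in here and its output returns here
    # ->...                    default modules after the target position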

Customize Pre-trained Model

🛠️

ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.

Overview

In the following example, we show how to customize your own pre-trained model with a new target structure in ZhiJian.

Feel free to deploy model reusability technology on any pre-trained model, loading it in the conventional PyTorch style.

Construct Custom Model

Let’s begin with a three-layer Multilayer Perceptron (MLP).

_images/tutorials_mlp.png

Custom Multilayer Perceptron (MLP) Architecture

Although a multi-layer perceptron is not a good image learner, we can quickly get started with it. For other custom networks, we can also make similar designs and modifications by analogy.

  • Run the code block below to customize the model:

import torch.nn as nn

class MLP(nn.Module):
    """
    MLP Class
    ==============

    Multilayer Perceptron (MLP) model for image (224x224) classification tasks.

    Args:
        args (object): Custom arguments or configurations.
        num_classes (int): Number of output classes.
    """
    def __init__(self, args, num_classes):
        super(MLP, self).__init__()
        self.args = args
        self.image_size = 224
        self.fc1 = nn.Linear(self.image_size * self.image_size * 3, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output logits.
        """
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = nn.ReLU()(x)
        x = self.fc2(x)
        x = nn.ReLU()(x)
        x = self.fc3(x)
        return x
  • Next, run the code block below to configure the GPU and the model:

    model = MLP(args, DATASET2NUM_CLASSES[args.dataset.replace('VTAB.','')])
    model = ModelWrapper(model)
    model_args = dict2args({'hidden_size': 512})
    
  • Now, run the code block below to prepare the trainer, passing in the model parameter:

    trainer = prepare_trainer(
        args,
        model=model,
        model_args=model_args,
        device=device,
        ...
    )
    
    trainer.fit()
    trainer.test()
    
    $ Log level set to: INFO
      Log files are recorded in: your/log/directory/0718-19-52-36-748
      Trainable/total parameters of the model: 0.03M / 38.64M (0.08843%)
    
            Epoch   GPU Mem.       Time       Loss         LR
              1/5     0.589G     0.1355      4.602      0.001: 100%|██████████| 16.0/16.0 [00:01<00:00, 12.9batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/5     0.629G    0.03114      1.871      7.932: 100%|██████████| 157/157 [00:05<00:00, 30.9batch/s]
      ***   Best results: [Acc@1: 1.8710191082802548], [Acc@5: 7.931926751592357]
    
            Epoch   GPU Mem.       Time       Loss         LR
              2/5     0.784G     0.1016      4.538 0.00090451: 100%|██████████| 16.0/16.0 [00:00<00:00, 19.4batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              2/5     0.784G    0.02669      2.498      9.504: 100%|██████████| 157/157 [00:04<00:00, 35.9batch/s]
      ***   Best results: [Acc@1: 2.4980095541401273], [Acc@5: 9.504378980891719]
    
            Epoch   GPU Mem.       Time       Loss         LR
              3/5     0.784G    0.09631      4.488 0.00065451: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.6batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              3/5     0.784G    0.02688      2.379      10.16: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s]
      ***   Best results: [Acc@1: 2.3785828025477707], [Acc@5: 10.161226114649681]
    
            Epoch   GPU Mem.       Time       Loss         LR
              4/5     0.784G    0.09126       4.45 0.00034549: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.2batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              4/5     0.784G    0.02644      2.468      10.29: 100%|██████████| 157/157 [00:04<00:00, 36.2batch/s]
      ***   Best results: [Acc@1: 2.468152866242038], [Acc@5: 10.290605095541402]
    
            Epoch   GPU Mem.       Time       Loss         LR
              5/5     0.784G     0.0936      4.431 9.5492e-05: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.5batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              5/5     0.784G    0.02706      2.558      10.43: 100%|██████████| 157/157 [00:04<00:00, 35.8batch/s]
      ***   Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/5     0.784G    0.02667      2.558      10.43: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s]
      ***   Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484]
    

Customize Dataloader

📂

ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.

Overview

In the following example, we show how to customize your own dataloader for a new target dataset in ZhiJian.

Feel free to deploy model reusability technology on any dataset, loading it in the conventional PyTorch style.

Prepare Custom Dataset

  • No dataset configuration file is needed; simply organize the custom dataset in the following structure:

    • within the your/dataset/dir directory

    • create a separate folder for each category

    • store all the data corresponding to each category within its respective folder

      /your/dataset/directory
      ├── train
      │   ├── class_1
      │      ├── train_class_1_img_1.jpg
      │      ├── train_class_1_img_2.jpg
      │      ├── train_class_1_img_3.jpg
      │      └── ...
      │   ├── class_2
      │      ├── train_class_2_img_1.jpg
      │      └── ...
      │   ├── class_3
      │      └── ...
      │   ├── class_4
      │      └── ...
      │   ├── class_5
      │      └── ...
      └── test
          ├── class_1
             ├── test_class_1_img_1.jpg
             ├── test_class_1_img_2.jpg
             ├── test_class_1_img_3.jpg
             └── ...
          ├── class_2
             ├── test_class_2_img_1.jpg
             └── ...
          ├── class_3
             └── ...
          ├── class_4
             └── ...
          └── class_5
              └── ...
      
  • Set up the custom dataset:

    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        )
    ])
    val_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        )
    ])
    
    train_dataset = ImageFolder(root='/your/dataset/directory/train', transform=train_transform)
    val_dataset = ImageFolder(root='/your/dataset/directory/test', transform=val_transform)
    
  • Implement the corresponding loader:

    train_loader = torch.utils.data.DataLoader(
            train_dataset,
            batch_size=args.batch_size,
            num_workers=args.num_workers,
            pin_memory=True,
            shuffle=True
        )
    val_loader = torch.utils.data.DataLoader(
            val_dataset,
            batch_size=args.batch_size,
            num_workers=args.num_workers,
            pin_memory=True,
            shuffle=False
        )
    num_classes = len(train_dataset.classes)
    
  • Now, set up the trainer, passing in the train_loader and val_loader parameters:

    trainer = prepare_trainer(
        args,
        model=model, model_args=model_args, device=device,
        train_loader=train_loader,
        val_loader=val_loader,
        num_classes=num_classes,
        optimizer=optimizer,
        lr_scheduler=lr_scheduler,
        criterion=criterion
    )
    
    trainer.fit()
    trainer.test()
    
    $ Log level set to: INFO
      Log files are recorded in: your/log/directory/0718-20-10-57-792
      Trainable/total parameters of the model: 0.30M / 86.10M (0.34700%)
    
          Epoch   GPU Mem.       Time       Loss         LR
              1/5      5.48G      1.686       1.73      0.001: 100%|██████████| 1.00/1.00 [00:01<00:00, 1.22s/batch]
    
          Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/5      5.48G     0.3243         16        100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.39batch/s]
      ***   Best results: [Acc@1: 16.0], [Acc@5: 100.0]
    
          Epoch   GPU Mem.       Time       Loss         LR
              2/5       5.6G      1.093      1.448 0.00090451: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.52batch/s]
    
          Epoch   GPU Mem.       Time      Acc@1      Acc@5
              2/5       5.6G     0.2647         12        100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.58batch/s]
      ***   Best results: [Acc@1: 12.0], [Acc@5: 100.0]
    
          Epoch   GPU Mem.       Time       Loss         LR
              3/5       5.6G      1.088      1.369 0.00065451: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.54batch/s]
    
          Epoch   GPU Mem.       Time      Acc@1      Acc@5
              3/5       5.6G     0.2899         12        100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.54batch/s]
      ***   Best results: [Acc@1: 12.0], [Acc@5: 100.0]
    
          Epoch   GPU Mem.       Time       Loss         LR
              4/5       5.6G      1.067      1.403 0.00034549: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.53batch/s]
    
          Epoch   GPU Mem.       Time      Acc@1      Acc@5
              4/5       5.6G     0.2879         16        100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.42batch/s]
      ***   Best results: [Acc@1: 16.0], [Acc@5: 100.0]
    
          Epoch   GPU Mem.       Time       Loss         LR
              5/5       5.6G      1.077      1.342 9.5492e-05: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.55batch/s]
    
          Epoch   GPU Mem.       Time      Acc@1      Acc@5
              5/5       5.6G      0.246         16        100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.79batch/s]
      ***   Best results: [Acc@1: 16.0], [Acc@5: 100.0]
    
          Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/1       5.6G     0.2901         16        100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.52batch/s]
      ***   Best results: [Acc@1: 16.0], [Acc@5: 100.0]
    

Fine-tune a Pre-trained ViT from timm

👓

ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.

Overview

In the following example, we show how ZhiJian:

  • Construct a Pre-Trained Vision Transformer from timm

  • Tune with supervision on CIFAR-100 dataset

  • Infer to evaluate the performance

The figure below shows the three stages of our example. To run the following code, please click [Open In Colab].

_images/tutorials_get_started_vit.png

Prepare Dataset and Model

ZhiJian provides loading interfaces for the 19 datasets of the VTAB benchmark, which span several domains including general objects, animals and plants, food and daily necessities, medicine, remote sensing, and so on. To customize your own dataset, please see here.

  • For better prompting, we first import a tool function that guides the input:

    from zhijian.models.utils import select_from_input
    
  • Now, run the following code block, select the target dataset (CIFAR-100) and corresponding directory to be deployed:

    available_datasets = [
        'VTAB-1k.CIFAR-100', 'VTAB-1k.CLEVR-Count', 'VTAB-1k.CLEVR-Distance', 'VTAB-1k.Caltech101', 'VTAB-1k.DTD',
        'VTAB-1k.Diabetic-Retinopathy', 'VTAB-1k.Dmlab', 'VTAB-1k.EuroSAT', 'VTAB-1k.KITTI', 'VTAB-1k.Oxford-Flowers-102',
        'VTAB-1k.Oxford-IIIT-Pet', 'VTAB-1k.PatchCamelyon', 'VTAB-1k.RESISC45', 'VTAB-1k.SUN397', 'VTAB-1k.SVHN',
        'VTAB-1k.dSprites-Location', 'VTAB-1k.dSprites-Orientation', 'VTAB-1k.smallNORB-Azimuth', 'VTAB-1k.smallNORB-Elevation'
    ] # dataset options.
    dataset     = select_from_input('dataset', available_datasets)  # user input about dataset
    dataset_dir = input(f"Please input your dataset directory: ")   # user input about dataset directory
    
    $ Please input a dataset, type 'help' to show the options: help
    $ Available dataset(s):
              [1] VTAB-1k.CIFAR-100
              [2] VTAB-1k.CLEVR-Count
              [3] VTAB-1k.CLEVR-Distance
              [4] VTAB-1k.Caltech101
              [5] VTAB-1k.DTD
              [6] VTAB-1k.Diabetic-Retinopathy
              [7] VTAB-1k.Dmlab
              [8] VTAB-1k.EuroSAT
              [9] VTAB-1k.KITTI
              [10] VTAB-1k.Oxford-Flowers-102
              [11] VTAB-1k.Oxford-IIIT-Pet
              [12] VTAB-1k.PatchCamelyon
              [13] VTAB-1k.RESISC45
              [14] VTAB-1k.SUN397
              [15] VTAB-1k.SVHN
              [16] VTAB-1k.dSprites-Location
              [17] VTAB-1k.dSprites-Orientation
              [18] VTAB-1k.smallNORB-Azimuth
              [19] VTAB-1k.smallNORB-Elevation
    $ Please input a dataset, type 'help' to show the options: 1
    $ Your selection: [1] VTAB-1k.CIFAR-100
    
    $ Please input your dataset directory: your/dataset/directory
    

Next, we will construct a pre-trained Vision Transformer from the timm library.

  • Seamlessly modifying the structure is possible. ZhiJian welcomes any base model and any additional modifications. The base part supports a variety of pre-trained backbones.

Adapting the Vision Transformer structure requires just 1~3 lines of code. To customize your own pre-trained model, please see here.

  • Now, run the following code block, select the model architecture (Vision Transformer as below):

    available_example_models = {
        'timm.vit_base_patch16_224_in21k': {
            'LoRA': '(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...',
            'Adapter': '(Adapter.adapt): ...->(blocks[0:12].drop_path1){inout1}->...',
            'Convpass': ('(Convpass.adapt): ...->(blocks[0:12].norm1){in1}->(blocks[0:11].drop_path1){in2}->...,' # follow the next line
                        '(Convpass.adapt): ...->{in1}(blocks[0:11].norm2)->(blocks[0:12].drop_path2){in2}->...'),
            'None': None
        }
    } # model options, Dict(model name: Dict(add-in structure name: add-in blitz configuration)).
    
    model = select_from_input('model', list(available_example_models.keys())) # user input about model
    
    $ Please input a model, type 'help' to show the options: help
    $ Available model(s):
                    [1] timm.vit_base_patch16_224_in21k
    $ Please input a model, type 'help' to show the options: 1
    $ Your selection: [1] timm.vit_base_patch16_224_in21k
    

Deploy Training and Test Process

ZhiJian lets you specify which part of the parameters to fine-tune via args.reuse_key; for example, assigning blocks[6:8] tunes only model.blocks[6], model.blocks[7], and their sub-modules.

  • Setting training_mode to finetune, we next configure the parameters

    For the rest of the training configuration with more customization options, please see here

    training_mode = 'finetune'
    args = get_args(
        dataset=dataset,                # dataset
        dataset_dir=dataset_dir,        # dataset directory
        model=model,                    # backbone network
        config_blitz=config_blitz,      # addin blitz configuration
        training_mode=training_mode,    # training mode
        optimizer='adam',               # optimizer
        lr=1e-2,                        # learning rate
        wd=1e-5,                        # weight decay
        gpu='0',                        # gpu id
        verbose=True                    # control the verbosity of the output
    )
    pprint(vars(args))
    
    $ {
      'addins': [{'hook': [['adapt', 'post']],
                  'location': [['blocks', 0, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 1, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 2, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 3, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 4, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 5, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 6, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 7, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 8, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 9, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 10, 'attn', 'qkv']],
                  'name': 'LoRA'},
                  {'hook': [['adapt', 'post']],
                  'location': [['blocks', 11, 'attn', 'qkv']],
                  'name': 'LoRA'}],
      'amp': False,
      'amp_dtype': 'float16',
      'amp_impl': 'native',
      'aot_autograd': False,
      'aug_repeats': 0,
      'aug_splits': 0,
      'batch_size': 64,
      'bce_loss': False,
      ...
      'warmup_epochs': 5,
      'warmup_lr': 1e-05,
      'warmup_prefix': False,
      'wd': 5e-05,
      'weight_decay': 2e-05,
      'worker_seeding': 'all'}
    
  • Next, run the following code block to configure the GPU:

    assert torch.cuda.is_available()
    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
    torch.cuda.set_device(int(args.gpu))
    
  • Run the following to get the pre-trained model, which includes the additional add-in modules that have been attached:

    model, model_args, device = get_model(args)
    
  • Run the following to get the dataloader:

    train_loader, val_loader, num_classes = prepare_vision_dataloader(args, model_args)
    
    $ Log level set to: INFO
      Log files are recorded in: your/log/directory/0718-15-17-52-580
      Trainable/total parameters of the model: 0.37M / 86.17M (0.43148%)
    
  • Run the following to prepare the optimizer, learning rate scheduler and loss function

    optimizer = optim.Adam(
        model.parameters(),
        lr=args.lr,
        weight_decay=args.wd
    )
    lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(
        optimizer,
        args.max_epoch,
        eta_min=args.eta_min
    )
    criterion = nn.CrossEntropyLoss()
    
  • Run the following to initialize the trainer, ready to start training:

    trainer = prepare_trainer(
        args,
        model=model, model_args=model_args, device=device,
        train_loader=train_loader, val_loader=val_loader, num_classes=num_classes,
        optimizer=optimizer, lr_scheduler=lr_scheduler, criterion=criterion
    )
    
  • Run the following to train and test with ZhiJian:

    trainer.fit()
    trainer.test()
    
    $       Epoch   GPU Mem.       Time       Loss         LR
              1/5      7.16G     0.3105      4.629      0.001: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.66batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/5      7.16G     0.1188      3.334      14.02: 100%|██████████| 157/157 [00:18<00:00, 8.35batch/s]
      ***   Best results: [Acc@1: 3.3339968152866244], [Acc@5: 14.022691082802547]
    
            Epoch   GPU Mem.       Time       Loss         LR
              2/5      7.16G     0.2883      4.255 0.00090451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              2/5      7.16G     0.1182       4.22      16.28: 100%|██████████| 157/157 [00:18<00:00, 8.37batch/s]
      ***   Best results: [Acc@1: 4.219745222929936], [Acc@5: 16.28184713375796]
    
            Epoch   GPU Mem.       Time       Loss         LR
              3/5      7.16G      0.296      4.026 0.00065451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              3/5      7.16G     0.1197      5.255      17.71: 100%|██████████| 157/157 [00:18<00:00, 8.28batch/s]
      ***   Best results: [Acc@1: 5.254777070063694], [Acc@5: 17.70501592356688]
    
            Epoch   GPU Mem.       Time       Loss         LR
              4/5      7.16G     0.2983       3.88 0.00034549: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.87batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              4/5      7.16G     0.1189      5.862      19.06: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s]
      ***   Best results: [Acc@1: 5.8618630573248405], [Acc@5: 19.058519108280255]
    
            Epoch   GPU Mem.       Time       Loss         LR
              5/5      7.16G     0.2993      3.811 9.5492e-05: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.90batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              5/5      7.16G      0.119      5.723      19.39: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s]
      ***   Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/1      7.16G     0.1192      5.723      19.39: 100%|██████████| 157/157 [00:18<00:00, 8.30batch/s]
      ***   Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236]
    

Fine-tune a Custom Pre-Trained Model

🕶️

ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.

Overview

In the following example, we show how ZhiJian:

  • Construct a custom MLP

  • Tune with supervision on a custom dataset

  • Infer to evaluate the performance

The figure below shows the three stages of our example. To run the following code, please click [Open In Colab].

_images/tutorials_get_started_mlp.png

Construct Custom Model

We begin with a three-layer Multilayer Perceptron (MLP).

_images/tutorials_mlp.png

Custom Multilayer Perceptron (MLP) Architecture

Although a multi-layer perceptron is not a good image learner, we can quickly get started with it. For other custom networks, we can also make similar designs and modifications by analogy.

  • Run the code block below to customize the model:

import torch.nn as nn

class MLP(nn.Module):
    """
    MLP Class
    ==============

    Multilayer Perceptron (MLP) model for image (224x224) classification tasks.

    Args:
        args (object): Custom arguments or configurations.
        num_classes (int): Number of output classes.
    """
    def __init__(self, args, num_classes):
        super(MLP, self).__init__()
        self.args = args
        self.image_size = 224
        self.fc1 = nn.Linear(self.image_size * self.image_size * 3, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output logits.
        """
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = nn.ReLU()(x)
        x = self.fc2(x)
        x = nn.ReLU()(x)
        x = self.fc3(x)
        return x
  • Next, run the code block below to configure the GPU and the model:

    model = MLP(args, DATASET2NUM_CLASSES[args.dataset.replace('VTAB.','')])
    model = ModelWrapper(model)
    model_args = dict2args({'hidden_size': 512})
    
  • Now, run the code block below to prepare the trainer, passing in the model parameter:

    trainer = prepare_trainer(
        args,
        model=model,
        model_args=model_args,
        device=device,
        ...
    )
    
    trainer.fit()
    trainer.test()
    

Prepare Custom Dataset

  • No dataset configuration file is needed; simply organize the custom dataset in the following structure:

    • within the your/dataset/dir directory

    • create a separate folder for each category

    • store all the data corresponding to each category within its respective folder

      /your/dataset/directory
      ├── train
      │   ├── class_1
      │      ├── train_class_1_img_1.jpg
      │      ├── train_class_1_img_2.jpg
      │      ├── train_class_1_img_3.jpg
      │      └── ...
      │   ├── class_2
      │      ├── train_class_2_img_1.jpg
      │      └── ...
      │   ├── class_3
      │      └── ...
      │   ├── class_4
      │      └── ...
      │   ├── class_5
      │      └── ...
      └── test
          ├── class_1
             ├── test_class_1_img_1.jpg
             ├── test_class_1_img_2.jpg
             ├── test_class_1_img_3.jpg
             └── ...
          ├── class_2
             ├── test_class_2_img_1.jpg
             └── ...
          ├── class_3
             └── ...
          ├── class_4
             └── ...
          └── class_5
              └── ...
      
  • Set up the custom dataset:

    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        )
    ])
    val_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        )
    ])
    
    train_dataset = ImageFolder(root='/your/dataset/directory/train', transform=train_transform)
    val_dataset = ImageFolder(root='/your/dataset/directory/test', transform=val_transform)
    
  • Implement the corresponding loader:

    train_loader = torch.utils.data.DataLoader(
            train_dataset,
            batch_size=args.batch_size,
            num_workers=args.num_workers,
            pin_memory=True,
            shuffle=True
        )
    val_loader = torch.utils.data.DataLoader(
            val_dataset,
            batch_size=args.batch_size,
            num_workers=args.num_workers,
            pin_memory=True,
            shuffle=False
        )
    num_classes = len(train_dataset.classes)
    

Advanced: Extended Structure

🛠️

ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.

Overview

In the following example, we show how ZhiJian:

  • Customize your own pre-trained model to explore new structural ideas

  • Tailor and integrate any extra add-in module into a large pre-trained model with lightning speed

_images/tutorials_addin_overview.png

This chapter may involve more advanced configuration.

Introduce the Custom Model

Let’s begin with a three-layer Multilayer Perceptron (MLP).

  • Run the code block below to customize the model:

import torch.nn as nn

class MLP(nn.Module):
    """
    MLP Class
    ==============

    Multilayer Perceptron (MLP) model for image (224x224) classification tasks.

    Args:
        args (object): Custom arguments or configurations.
        num_classes (int): Number of output classes.
    """
    def __init__(self, args, num_classes):
        super(MLP, self).__init__()
        self.args = args
        self.image_size = 224
        self.fc1 = nn.Linear(self.image_size * self.image_size * 3, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output logits.
        """
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = nn.ReLU()(x)
        x = self.fc2(x)
        x = nn.ReLU()(x)
        x = self.fc3(x)
        return x
_images/tutorials_mlp.png

Custom Multilayer Perceptron (MLP) Architecture

Now, extend the model whenever inspiration strikes; do as you please.

We will customize and modify the network structure with a few lines of ZhiJian code. These additional structures are also implemented with the PyTorch framework and inherit from the base class AddinBase, which provides some basic methods for data access.

  • In the following paragraphs, we introduce the components of the extended structure; they are:

    • 1. The main forward function.

    • 2. Entry points to guide inputs.

    • 3. Configuration syntax for entry points.

Design Additional Add-in Modules

  • Run the code block below to customize add-in modules and entry points for the model.

class MLPAddin(AddinBase):
    """
    MLPAddin Class
    ==============

    Multilayer Perceptron (MLP) add-in.

    Args:
        config (object): Custom configuration or arguments.
        model_config (object): Configuration specific to the model.
    """
    def __init__(self, config, model_config):
        super(MLPAddin, self).__init__()

        self.config = config
        self.embed_dim = model_config.hidden_size

        self.reduction_dim = 16

        self.fc1 = nn.Linear(self.embed_dim, self.reduction_dim)
        if config.mlp_addin_output_size is not None:
            self.fc2 = nn.Linear(self.reduction_dim, config.mlp_addin_output_size)
        else:
            self.fc2 = nn.Linear(self.reduction_dim, self.embed_dim)

    def forward(self, x):
        """
        Forward pass of the MLP add-in.

        Args:
            x (tensor): Input tensor.

        Returns:
            tensor: Output tensor after passing through the MLP add-in.
        """
        identity = x
        out = self.fc1(identity)
        out = nn.ReLU()(out)
        out = self.fc2(out)

        return out

    def adapt_input(self, module, inputs):
        """
        Hook function to adapt the input data before it enters the module.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).

        Returns:
            tensor: Adapted input tensor after passing through the MLP add-in.
        """
        x = inputs[0]
        return self.forward(x)

    def adapt_output(self, module, inputs, outputs):
        """
        Hook function to adapt the output data after it leaves the module.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).
            outputs (tensor): Outputs after the module.

        Returns:
            tensor: Adapted output tensor after passing through the MLP add-in.
        """
        return self.forward(outputs)

    def adapt_across_input(self, module, inputs):
        """
        Hook function to adapt the data across the modules.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).

        Returns:
            tensor: Adapted input tensor after adding the MLP add-in output to the subsequent module.
        """
        x = inputs[0]
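        # Note: self.inputs_cache is expected to hold the features captured at the
        # {inX} point; ZhiJian fills it via a preceding hook (e.g. the base class's
        # get_pre method, registered as a 'pre' hook) before this function runs.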
        x = x + self.forward(self.inputs_cache)
        return x

    def adapt_across_output(self, module, inputs, outputs):
        """
        Hook function to adapt the data across the modules.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).
            outputs (tensor): Outputs after the module.

        Returns:
            tensor: Adapted output tensor after adding the MLP add-in output to the output of the hooked module.
        """
        outputs = outputs + self.forward(self.inputs_cache)
        return outputs
Main forward function

In the extended auxiliary structure MLPAddin above, we add a low-rank bottleneck (two linear layers with a reduced dimension in between), inspired by parameter-efficient methods such as Adapter and LoRA.

We define this structure in the __init__ function and implement its computation in the forward function: data that enters the add-in is processed by forward.
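As a quick sanity check, the bottleneck can be exercised on its own. The snippet below is a minimal sketch, assuming the MLPAddin class above is available and using types.SimpleNamespace as a hypothetical stand-in for the real config objects:

    from types import SimpleNamespace

    import torch

    # Stand-in configs: only the attributes read by MLPAddin.__init__ are provided.
    config = SimpleNamespace(mlp_addin_output_size=None)   # None -> project back to embed_dim
    model_config = SimpleNamespace(hidden_size=256)        # embed_dim of the toy MLP above

    addin = MLPAddin(config, model_config)

    x = torch.randn(4, 256)        # a batch of 4 feature vectors
    print(addin(x).shape)          # torch.Size([4, 256]): the bottleneck preserves the dimension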

_images/tutorials_addin_structure.png

Additional Auxiliary Structure Example

Entry points to guide inputs

As shown above, the hook methods starting with adapt_ are our entry points (functions) to guide the input data. They serve as hooks to attach the extended modules to the base model.

They fall roughly into two categories:

  • entry points that guide the data before it enters a module

  • entry points that adapt the data after it leaves a module

These entry points are closely tied to the forward function: the data enters the extended structure through them. We explain their roles further in the Config Syntax section below.

Config Syntax of Entry Points

We aim to customize our model by inter-layer insertion and cross-layer concatenation of the auxiliary structures at different positions within the base model (such as the custom MLP mentioned earlier). When configuring the insertion or concatenation positions, ZhiJian provides a minimalistic one-line configuration syntax.

The syntax for configuring an add-in module into the base model is as follows. We start with one or two examples and then walk through the meaning of each part of the configuration.

  • Inter-layer Insertion:

    >>> (MLPAddin.adapt_input): ...->{inout1}(fc2)->...
    
    _images/tutorials_mlp_addin_1.png

    Additional Add-in Structure - Inter-layer Insertion 1

    >>> (MLPAddin.adapt_input): ...->(fc2){inout1}->...
    
    _images/tutorials_mlp_addin_2.png

    Additional Add-in Structure - Inter-layer Insertion 2

  • Cross-layer Concatenation:

    >>> (MLPAddin.adapt_across_input): ...->(fc1){in1}->...->{out1}(fc3)->...
    
    _images/tutorials_mlp_addin_3.png

    Additional Add-in Structure - Cross-layer Concatenation

Base Module: ->(fc1)

Consider a base model implemented based on the PyTorch framework, where the representation of each layer and module in the model is straightforward:

  • As shown in the figure, the print command outputs the defined names of the model's structure (a sketch of the expected output for our custom MLP follows this list):

    print(model)
    
  • The structure of some classic backbones can be represented as follows:

    • MLP:

      >>> input->(fc1)->(fc2)->(fc3)->output
      
    • ViT block[i]:

      >>> input->...->(block[i].norm1)->
            (block[i].attn.qkv)->(block[i].attn.attn_drop)->(block[i].attn.proj)->(block[i].attn.proj_drop)->
              (block[i].ls1)->(block[i].drop_path1)->
                (block[i].norm2)->
                  (block[i].mlp.fc1)->(block[i].mlp.act)->(block[i].mlp.drop1)->(block[i].mlp.fc2)->(block[i].mlp.drop2)->
                    (block[i].ls2)->(block[i].drop_path2)->...->output
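
For the custom MLP defined earlier, print(model) would show roughly the following module names; these are exactly the identifiers used in the configuration syntax (out_features of fc3 depends on num_classes, shown here as a hypothetical 100):

    print(model)

    # MLP(
    #   (fc1): Linear(in_features=150528, out_features=256, bias=True)
    #   (fc2): Linear(in_features=256, out_features=256, bias=True)
    #   (fc3): Linear(in_features=256, out_features=100, bias=True)
    # )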
      
Default Module: ...

In the configuration syntax of ZhiJian, the ... can be used to represent the default layer or module.

  • For example, when we only focus on the (fc2) module in MLP and the (block[i].mlp.fc2) module in ViT:

    • MLP:

      >>> ...->(fc2)->...
      
    • ViT:

      >>> ...->(block[i].mlp.fc2)->...
      
Insertion & Concatenation Function: ():

For the custom auxiliary structure MLPAddin mentioned above, the functions whose names start with adapt_ serve as the processing centers that are inserted into, or concatenated with, the base model.

  • There are primarily two types of parameter passing methods:

    def adapt_input(self, module, inputs):
        """
        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).
        """
        ...
    
    def adapt_output(self, module, inputs, outputs):
        """
        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).
            outputs (tensor): Outputs after the module.
        """
        ...
    

    where

    • adapt_input(self, module, inputs) is attached before the module and is called before the data enters it, so it can process and, if needed, replace the module's input.

    • adapt_output(self, module, inputs, outputs) is attached after the module and is called after the data leaves it, so it can process and, if needed, replace the module's output.

These functions are “hooked” into the base model according to the configuration string, serving as the key connectors between the base model and the auxiliary structure.
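As a rough mental model, these entry points correspond to standard PyTorch hooks. ZhiJian performs this wiring automatically from the configuration string (via prepare_hook, used later in this tutorial), so the manual version below is purely illustrative; model, config and model_args are assumed to be the objects set up later in this tutorial:

    # Each registration below corresponds to one example configuration;
    # you would normally use one or the other, not both.
    addin = MLPAddin(config, model_args)

    # "(MLPAddin.adapt_input): ...->{inout1}(fc2)->..."
    # adapt_input runs as a forward pre-hook: its return value replaces fc2's input.
    model.fc2.register_forward_pre_hook(addin.adapt_input)

    # "(MLPAddin.adapt_output): ...->(fc1){inout1}->..."
    # adapt_output runs as a forward hook: its return value replaces fc1's output.
    model.fc1.register_forward_hook(addin.adapt_output)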

Insertion & Concatenation Point: {}

Consider an independent extended auxiliary structure (such as the MLPAddin mentioned above): its insertion or concatenation points with the base network consist of a “Data Input” and a “Data Output”, where:

  • “Data Input” refers to the network features input into the extended auxiliary structure.

  • “Data Output” refers to the adapted features output from the auxiliary structure back to the base network.

Next, let’s use some configuration examples of MLP to illustrate the syntax and functionality of ZhiJian for module integration:

Inter-layer Insertion: inout
  • As shown in the above Fig. 5, the configuration expression is:

    >>> (MLPAddin.adapt_input): ...->{inout1}(fc2)->...
    

    where

    • {inout1} refers to the position that takes the base model's features (the output of any layer or module).

      It denotes the “Data Input” and “Data Output”. The configuration can be {inoutx}, where x represents the xth integration point. For example, {inout1} represents the first integration point.

    • In the example above, this inter-layer insertion configuration truncates the features that would enter the fc2 module, passes them through the add-in, and then returns the result to the fc2 module. The original features no longer enter fc2 directly.

Cross-layer Concatenation: in, out
  • As shown in the above Fig. 7, the configuration expression is:

    >>> (MLPAddin.adapt_across_input): ...->(fc1){in1}->...->{out1}(fc3)->...
    

    where

    • {in1}: represents the integration point where the base network features (or output, at any layer or module) enter the additional add-in structure.

      It denotes the “Data Input”. The configuration can be {inx}, where x represents the xth integration point. For example, {in1} represents the first integration point.

    • {out1}: represents the integration point where the features processed by the additional add-in structure are returned to the base network.

      It denotes the “Data Output”. The configuration can be {outx}, where x represents the xth integration point. For example, {out1} represents the first integration point.

    • This cross-layer concatenation configuration extracts the output features of the fc1 module, passes them into the auxiliary structure, and adds the result back to the base network as a residual just before the fc3 module (a plain-PyTorch sketch of this mechanism is shown below).
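
    In plain PyTorch terms, this concatenation amounts to caching features at {in1} and injecting the add-in's result at {out1}. The sketch below is illustrative only and assumes the base class's inputs_cache convention; ZhiJian builds the equivalent hooks itself from the configuration string:

    addin = MLPAddin(config, model_args)

    def cache_fc1_output(module, inputs, outputs):
        # {in1}: remember fc1's output so the add-in can reuse it later.
        addin.inputs_cache = outputs

    model.fc1.register_forward_hook(cache_fc1_output)

    # {out1}: just before fc3 runs, adapt_across_input adds the add-in's
    # transformation of the cached features to fc3's incoming input (residual add).
    model.fc3.register_forward_pre_hook(addin.adapt_across_input)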

  • To make the selection easier, let's first create a helper function that prompts for user input:

    def select_from_input(prompt_for_select, valid_selections):
        selections2print = '\n\t'.join([f'[{idx + 1}] {i}' for idx, i in enumerate(valid_selections)])
        while True:
            selected = input(f"Please input a {prompt_for_select}, type 'help' to show the options: ")
    
            if selected == 'help':
                print(f"Available {prompt_for_select}(s):\n\t{selections2print}")
            elif selected.isdigit() and int(selected) >= 1 and int(selected) <= len(valid_selections):
                selected = valid_selections[int(selected) - 1]
                break
            elif selected in valid_selections:
                break
            else:
                print("Sorry, input not support.")
                print(f"Available {prompt_for_select}(s):\n\t{selections2print}")
    
        return selected
    
    available_example_config_blitzs = {
        'Insert between `fc1` and `fc2` layer (performed before `fc2`)': "(MLPAddin.adapt_input): ...->{inout1}(fc2)->...",
        'Insert between `fc1` and `fc2` layer (performed after `fc1`)': "(MLPAddin.adapt_output): ...->(fc1){inout1}->...",
        'Splice across `fc2` layer (performed before `fc2` and `fc3`)': "(MLPAddin.adapt_across_input): ...->{in1}(fc2)->{out1}(fc3)->...",
        'Splice across `fc2` layer (performed after `fc1` and before `fc3`)': "(MLPAddin.adapt_across_input): ...->(fc1){in1}->...->{in2}(fc3)->...",
        'Splice across `fc2` layer (performed before and after `fc2`)': "(MLPAddin.adapt_across_output): ...->{in1}(fc2){in2}->...",
        'Splice across `fc2` layer (performed after `fc1` and `fc2`)': "(MLPAddin.adapt_across_output): ...->(fc1){in1}->(fc2){in2}->...",
    }
    
    selected = select_from_input('add-in structure', list(available_example_config_blitzs.keys()))  # user input about the add-in structure
    config_blitz = available_example_config_blitzs[selected]                                        # map the chosen description to its config string
    
    
  • Next, we will configure the parameters and proceed with model training and testing:

    args = get_args(
        model='timm.vit_base_patch16_224_in21k',    # backbone network
        config_blitz=config_blitz,                  # addin blitz configuration
        dataset='VTAB.cifar',                       # dataset
        dataset_dir='your/dataset/directory',       # dataset directory
        training_mode='finetune',                   # training mode
        optimizer='adam',                           # optimizer
        lr=1e-2,                                    # learning rate
        wd=1e-5,                                    # weight decay
        verbose=True                                # control the verbosity of the output
    )
    pprint(vars(args))
    
    $ {'aa': None,
       'addins': [{'hook': [['get_pre', 'pre'], ['adapt_across_output', 'post']],
                   'location': [['fc2'], ['fc2']],
                   'name': 'MLPAddin'}],
       'amp': False,
       'amp_dtype': 'float16',
       'amp_impl': 'native',
       'aot_autograd': False,
       'aug_repeats': 0,
       'aug_splits': 0,
       'batch_size': 64,
       'bce_loss': False,
       ...
       'warmup_epochs': 5,
       'warmup_lr': 1e-05,
       'warmup_prefix': False,
       'wd': 5e-05,
       'weight_decay': 2e-05,
       'worker_seeding': 'all'}
    
  • Run the code block below to configure the GPU and the model (excluding additional auxiliary structures):

    assert torch.cuda.is_available()
    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
    torch.cuda.set_device(int(args.gpu))
    
    model = MLP(args, DATASET2NUM_CLASSES[args.dataset.replace('VTAB.','')])
    model = ModelWrapper(model)
    model_args = dict2args({'hidden_size': 256})  # matches the 256-dim features that flow between fc1, fc2 and fc3
    
  • Run the code block below to configure additional auxiliary structures:

    args.mlp_addin_output_size = 256
    addins, fixed_params = prepare_addins(args, model_args, addin_classes=[MLPAddin])
    
    prepare_hook(args.addins, addins, model, 'addin')
    prepare_gradient(args.reuse_keys, model)
    device = prepare_cuda(model)
    
  • Run the code block below to configure the dataset, optimizer, loss function, and other settings:

    train_loader, val_loader, num_classes = prepare_vision_dataloader(args, model_args)
    
    optimizer = optim.Adam(
        model.parameters(),
        lr=args.lr,
        weight_decay=args.wd
    )
    lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(
        optimizer,
        args.max_epoch,
        eta_min=args.eta_min
    )
    criterion = nn.CrossEntropyLoss()
    
  • Run the code block below to prepare the trainer object and start training and testing:

    trainer = prepare_trainer(
        args,
        model=model,
        model_args=model_args,
        device=device,
        train_loader=train_loader,
        val_loader=val_loader,
        num_classes=num_classes,
        optimizer=optimizer,
        lr_scheduler=lr_scheduler,
        criterion=criterion
    )
    
    trainer.fit()
    trainer.test()
    
    $ Log level set to: INFO
      Log files are recorded in: your/log/directory/0718-19-52-36-748
      Trainable/total parameters of the model: 0.03M / 38.64M (0.08843%)
    
            Epoch   GPU Mem.       Time       Loss         LR
              1/5     0.589G     0.1355      4.602      0.001: 100%|██████████| 16.0/16.0 [00:01<00:00, 12.9batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/5     0.629G    0.03114      1.871      7.932: 100%|██████████| 157/157 [00:05<00:00, 30.9batch/s]
      ***   Best results: [Acc@1: 1.8710191082802548], [Acc@5: 7.931926751592357]
    
            Epoch   GPU Mem.       Time       Loss         LR
              2/5     0.784G     0.1016      4.538 0.00090451: 100%|██████████| 16.0/16.0 [00:00<00:00, 19.4batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              2/5     0.784G    0.02669      2.498      9.504: 100%|██████████| 157/157 [00:04<00:00, 35.9batch/s]
      ***   Best results: [Acc@1: 2.4980095541401273], [Acc@5: 9.504378980891719]
    
            Epoch   GPU Mem.       Time       Loss         LR
              3/5     0.784G    0.09631      4.488 0.00065451: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.6batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              3/5     0.784G    0.02688      2.379      10.16: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s]
      ***   Best results: [Acc@1: 2.3785828025477707], [Acc@5: 10.161226114649681]
    
            Epoch   GPU Mem.       Time       Loss         LR
              4/5     0.784G    0.09126       4.45 0.00034549: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.2batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              4/5     0.784G    0.02644      2.468      10.29: 100%|██████████| 157/157 [00:04<00:00, 36.2batch/s]
      ***   Best results: [Acc@1: 2.468152866242038], [Acc@5: 10.290605095541402]
    
            Epoch   GPU Mem.       Time       Loss         LR
              5/5     0.784G     0.0936      4.431 9.5492e-05: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.5batch/s]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              5/5     0.784G    0.02706      2.558      10.43: 100%|██████████| 157/157 [00:04<00:00, 35.8batch/s]
      ***   Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484]
    
            Epoch   GPU Mem.       Time      Acc@1      Acc@5
              1/5     0.784G    0.02667      2.558      10.43: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s]
      ***   Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484]
    

Advanced: Knowledge Transfer

🛠️

Advanced: Model Merging

🛠️

zhijian.args

Base Args

Preprocess

Args of Pre-trained Model

Args of Architect

Args of Tuner

Args of Merger

zhijian.models

Prepare Pre-trained Model

Switch to GPU

Adjust Which Part of the Parameters to Fine-tune

zhijian.data

Dataset

zhijian.trainer

Base Trainer

Architect Module

Prepare External Structure

Add External Structure to Pre-trained Model

Tuner Module

Knowledge Matching and Transfer

Regularization Constraints

Merger Module

Merge Trained Parameters

Contributing to ZhiJian

To submit a Pull Request (PR) to the ZhiJian project, follow these simple steps:

  1. Fork the project by clicking the “Fork” button at the top right corner of the page.

  2. Modify the necessary files in your fork and commit your changes.

  3. Submit your Pull Request:

    • Go to your Forked project page and click the “Pull Request” button.

    • Select the branch with your changes and the main project’s branch on the comparison page.

    • Provide a brief description and any additional details about your changes.

    • Click the “Create Pull Request” button to submit your PR.

If you encounter any issues or need further assistance, please feel free to contact us at yumzhangyk@gmail.com.

Thank you for your contribution! We will review your PR and provide feedback as soon as possible.

Contributors

We sincerely appreciate and encourage contributions to enhance the development of ZhiJian. Presented below is a partial list of our esteemed contributors (for a more comprehensive list, please refer to here).
