Welcome to ZhiJian!¶

ZhiJian (执简驭繁) is a comprehensive and user-friendly PyTorch-based toolbox for leveraging foundation pre-trained models and their fine-tuned counterparts to extract knowledge and expedite learning on real-world tasks, i.e., for serving model reuse tasks.
The rapid progress in deep learning has led to the emergence of numerous open-source Pre-Trained Models (PTMs) on platforms like PyTorch, TensorFlow, and HuggingFace Transformers. Leveraging these PTMs for specific tasks empowers them to handle those objectives effectively, creating valuable resources for the machine-learning community. Reusing PTMs is vital for enhancing target models' capabilities and efficiency, achieved through adapting the architecture, customizing learning on target data, or devising optimized inference strategies to leverage PTM knowledge.
To facilitate a holistic consideration of various model reuse strategies, ZhiJian categorizes model reuse methods into three sequential modules: Architect, Tuner, and Merger, aligning with the stages of model preparation, model learning, and model inference on the target task, respectively. The provided interface methods include:
Architect Module
The Architect module modifies the pre-trained model to fit the target task, reusing certain parts of the pre-trained model while introducing new learnable parameters with specialized structures (an illustrative sketch follows the method list below).
Linear Probing & Partial-k, How transferable are features in deep neural networks? In: NeurIPS’14. [Paper]
Adapter, Parameter-Efficient Transfer Learning for NLP. In: ICML’19. [Paper]
Diff Pruning, Parameter-Efficient Transfer Learning with Diff Pruning. In: ACL’21. [Paper]
LoRA, LoRA: Low-Rank Adaptation of Large Language Models. In: ICLR’22. [Paper]
Visual Prompt Tuning / Prefix, Visual Prompt Tuning. In: ECCV’22. [Paper]
Head2Toe, Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning. In: ICML’22. [Paper]
Scaling & Shifting, Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning. In: NeurIPS’22. [Paper]
AdaptFormer, AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition. In: NeurIPS’22. [Paper]
BitFit, BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. In: ACL’22. [Paper]
Convpass, Convolutional Bypasses Are Better Vision Transformer Adapters. In: Tech Report 07-2022. [Paper]
Fact-Tuning, FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer. In: AAAI’23. [Paper]
VQT, Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning. In: CVPR’23. [Paper]
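Most Architect-style methods share the same pattern: freeze the pre-trained weights and attach a small trainable structure. Below is a minimal, illustrative LoRA-style module in plain PyTorch; it is a sketch of the idea, not ZhiJian's internal implementation, and the dimensions and rank are arbitrary example values.
# Illustrative LoRA-style adapter (plain PyTorch, not ZhiJian's internals):
# a frozen pre-trained linear layer augmented with a trainable low-rank branch.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained weights
            p.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # start as a zero (identity-preserving) update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_B(self.lora_A(x))

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))              # only lora_A / lora_B receive gradients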
Tuner Module
The Tuner module focuses on training the target model with guidance from pre-trained model knowledge to expedite the optimization process, e.g., by adjusting objectives, optimizers, or regularizers (an illustrative sketch follows the method list below).
Knowledge Transfer and Matching, NeC4.5: neural ensemble based C4.5. In: IEEE Trans. Knowl. Data Eng. 2004. [Paper]
FitNet, FitNets: Hints for Thin Deep Nets. In: ICLR’15. [Paper]
LwF, Learning without Forgetting. In: ECCV’16. [Paper]
FSP, A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In: CVPR’17. [Paper]
NST, Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. In: CVPR’17. [Paper]
RKD, Relational Knowledge Distillation. In: CVPR’19. [Paper]
SPKD, Similarity-Preserving Knowledge Distillation. In: CVPR’19. [Paper]
CRD, Contrastive Representation Distillation. In: ICLR’20. [Paper]
REFILLED, Distilling Cross-Task Knowledge via Relationship Matching. In: CVPR’20. [Paper]
WiSE-FT, Robust fine-tuning of zero-shot models. In: CVPR’22. [Paper]
L2 penalty / L2 SP, Explicit Inductive Bias for Transfer Learning with Convolutional Networks. In: ICML’18. [Paper]
Spectral Norm, Spectral Normalization for Generative Adversarial Networks. In: ICLR’18. [Paper]
BSS, Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning. In: NeurIPS’19. [Paper]
DELTA, DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks. In: ICLR’19. [Paper]
DeiT, Training data-efficient image transformers & distillation through attention. In: ICML’21. [Paper]
DIST, Knowledge Distillation from A Stronger Teacher. In: NeurIPS’22. [Paper]
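Many Tuner-style methods adjust the training objective so that a pre-trained (teacher) model guides the target (student) model. A minimal, generic knowledge-distillation loss is sketched below in plain PyTorch; it illustrates the idea and is not ZhiJian's interface, and the temperature and weighting are arbitrary example values.
# Illustrative knowledge-distillation objective (plain PyTorch, not ZhiJian's API):
# combine a soft loss against the teacher's logits with the usual hard-label loss.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                     # soft targets from the teacher
    hard = F.cross_entropy(student_logits, labels)  # hard targets from the labels
    return alpha * soft + (1 - alpha) * hard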
Merger Module
The Merger module influences the inference phase by either reusing pre-trained features or incorporating adapted logits from the pre-trained model (an illustrative sketch follows the method list below).
Logits Ensemble, Ensemble Methods: Foundations and Algorithms. 2012. [Book]
Nearest Class Mean, Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost. In: IEEE Trans. Pattern Anal. Mach. Intell. 2013. [Paper]
SimpleShot, SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning. In: CVPR’19. [Paper]
via Optimal Transport, Model Fusion via Optimal Transport. In: NeurIPS’20. [Paper]
Model Soup, Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In: ICML’22. [Paper]
Fisher Merging, Merging Models with Fisher-Weighted Averaging. In: NeurIPS’22. [Paper]
Deep Model Reassembly, Deep Model Reassembly. In: NeurIPS’22. [Paper]
REPAIR, REPAIR: REnormalizing Permuted Activations for Interpolation Repair. In: ICLR’23. [Paper]
Git Re-Basin, Git Re-Basin: Merging Models modulo Permutation Symmetries. In: ICLR’23. [Paper]
ZipIt, ZipIt! Merging Models from Different Tasks without Training. [Paper]
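The simplest Merger-style strategy combines predictions at inference time. The snippet below sketches a plain logits ensemble in PyTorch (an illustration only, not ZhiJian's interface); weight-space methods such as Model Soup or Fisher Merging instead average the parameters of the models themselves.
# Illustrative logits ensemble (plain PyTorch, not ZhiJian's API):
# average the logits of several models that share the same label space.
import torch

@torch.no_grad()
def ensemble_logits(models, x):
    for m in models:
        m.eval()
    return torch.stack([m(x) for m in models]).mean(dim=0)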
💡 ZhiJian also has the following highlights:
Support for reusing various pre-trained model zoos, including:
PyTorch Torchvision; OpenAI CLIP; 🤗 Hugging Face PyTorch Image Models (timm) and Transformers
Other popular projects, e.g., vit-pytorch (stars 14k)
Extremely easy to get started and customize
Get started with a 10 minute blitz [Open In Colab]
Customize datasets and pre-trained models with step-by-step instructions [Open In Colab]
Feel free to create a novel approach for reusing pre-trained model [Open In Colab]
Concise yet powerful
Only ~5,000 lines of base code, with methods incorporated like building LEGO blocks
State-of-the-art results on the VTAB benchmark from approximately 10k experiments [here]
Friendly guidelines and comprehensive documentation for customizing datasets and pre-trained models [here]
🔥 The Naming of ZhiJian: In Chinese, “ZhiJian-YuFan” (执简驭繁) means handling complexity with concise and efficient methods. Given the variations among pre-trained models and the deployment overhead of full-parameter fine-tuning, ZhiJian represents a solution that is easily reusable, maintains high accuracy, and maximizes the potential of pre-trained models. “Fan” (繁) stands for the large variety of, the differences among, and the deployment difficulty of existing pre-trained models and reuse methods; “ZhiJian” (执简) expresses that, with this toolkit, model reuse methods can be handled with ease: easy to get started, quick to reuse, and stable in accuracy, awakening the knowledge of pre-trained models to the greatest extent.
🕹️ Quick Start¶
An environment with Python 3.7+ from conda, venv, or virtualenv.
Install ZhiJian using pip:
$ pip install zhijian
For more details, please see the installation instructions.
[Optional] Install the newest version from GitHub:
$ pip install git+https://github.com/zhangyikaii/lamda-zhijian.git@main --upgrade
Open your python console and type:
import zhijian
print(zhijian.__version__)
If no error occurs, you have successfully installed ZhiJian.
📚 Documentation¶
The tutorials and API documentation are hosted on zhijian.readthedocs.io
中文文档位于 zhijian.readthedocs.io/zh
Why ZhiJian?¶
Related Library | Stars | # of Alg. | # of Model | # of Dataset | # of Fields | LLM Supp. | Docs.
---|---|---|---|---|---|---|---
 | 8k+ | 6 | ~15 | –(3) | 1 (a) | ✔️ | ✔️
 | 1k+ | 10 | ~15 | –(3) | 1 (a) | ❌ | ✔️
 | 2k+ | 4 | 5 | ~20 | 1 (a) | ✔️ | ❌
 | 1k+ | 20 | 2 | 2 | 1 (b) | ❌ | ❌
 | 608 | 10 | 3 | 2 | 1 (c) | ❌ | ❌
 | 255 | 3 | 3 | 5 | 1 (d) | ❌ | ❌
 | 410 | 3 | 5 | 4 | 1 (d) | ❌ | ❌
ZhiJian (Ours) | | 30+ | ~50 | 19 | 1 (a,b,c,d) | ✔️ | ✔️
Get Started¶
👋🏼 ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.
What & Why Reuse?
Performing downstream tasks with the help of pre-trained models, including their model structures, weights, or other derived rules.
Significantly accelerating convergence and improving downstream performance.
The recent booming development of deep learning techniques has resulted in a mushrooming of open-source pre-trained models, with significant contributions from PyTorch, TensorFlow, and HuggingFace Transformers.
These PTMs stem from the de-facto paradigm of “pre-training then full-parameter fine-tuning”, where one of the most fundamental and representative approaches is to initialize the target model with the pre-trained weights. Recently, advanced methods have been developed to harness PTM knowledge from diverse perspectives. These approaches include expanding model structures, applying constraints on weight initialization, or seeking guidance from the source hypothesis space. They are applicable in scenarios where target task data accumulates dynamically or exhibits distribution shifts.
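For reference, a minimal sketch of this “pre-training then fine-tuning” paradigm, assuming the timm package is installed and using the backbone name from the tutorials below; the number of target classes is an arbitrary example value.
# Minimal "initialize from pre-trained weights, then fine-tune" sketch (assumes timm is installed).
import timm
import torch.nn as nn

model = timm.create_model('vit_base_patch16_224_in21k', pretrained=True)  # pre-trained initialization
model.head = nn.Linear(model.head.in_features, 100)                       # new head, e.g. for 100 target classes
# ...standard supervised training of all (or a subset of) parameters follows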
To better categorize and summarize reuse methods, we consider the Architect, Tuner, and Merger pipeline. ZhiJian offers a modular design, easy-to-use interfaces, and rich custom configuration, empowering deep learning practitioners to seamlessly switch between and combine diverse reuse methods. Furthermore, it facilitates the creation of novel reuse methods tailored to the current target task.
Overview¶
In the following example, we show how ZhiJian:
Construct a pre-trained Vision Transformer from timm with a custom LoRA module
Tune with supervision on CIFAR-100 dataset
Infer to evaluate the performance
The figure below shows the three stages of our example. To run the following code, please click [Open In Colab].

All in just 10 minutes:
1 min to install zhijian
2 mins to select the dataset
3 mins to construct the Vision Transformer from timm with a custom LoRA module
4 mins to deploy the supervised fine-tuning and test process
🚀 Let’s get started!
Install ZhiJian¶
$ pip install zhijian
After installation, open your Python console and type:
import zhijian
print(zhijian.__version__)
If no error occurs, you have successfully installed ZhiJian.
Select Dataset¶
ZhiJian provides loading interfaces for the 19 datasets of the VTAB benchmark, which span several domains including general objects, animals and plants, food and daily necessities, medicine, remote sensing, and so on. To customize your own dataset, please see here.
For better prompting, we first import a tool function that guides the input:
from zhijian.models.utils import select_from_input
Now, run the following code block, select the target dataset (CIFAR-100) and corresponding directory to be deployed:
available_datasets = [
    'VTAB-1k.CIFAR-100', 'VTAB-1k.CLEVR-Count', 'VTAB-1k.CLEVR-Distance', 'VTAB-1k.Caltech101',
    'VTAB-1k.DTD', 'VTAB-1k.Diabetic-Retinopathy', 'VTAB-1k.Dmlab', 'VTAB-1k.EuroSAT',
    'VTAB-1k.KITTI', 'VTAB-1k.Oxford-Flowers-102', 'VTAB-1k.Oxford-IIIT-Pet', 'VTAB-1k.PatchCamelyon',
    'VTAB-1k.RESISC45', 'VTAB-1k.SUN397', 'VTAB-1k.SVHN', 'VTAB-1k.dSprites-Location',
    'VTAB-1k.dSprites-Orientation', 'VTAB-1k.smallNORB-Azimuth', 'VTAB-1k.smallNORB-Elevation'
]  # dataset options.
dataset = select_from_input('dataset', available_datasets)     # user input about dataset
dataset_dir = input("Please input your dataset directory: ")   # user input about dataset directory
$ Please input a dataset, type 'help' to show the options: help $ Available dataset(s): [1] VTAB-1k.CIFAR-100 [2] VTAB-1k.CLEVR-Count [3] VTAB-1k.CLEVR-Distance [4] VTAB-1k.Caltech101 [5] VTAB-1k.DTD [6] VTAB-1k.Diabetic-Retinopathy [7] VTAB-1k.Dmlab [8] VTAB-1k.EuroSAT [9] VTAB-1k.KITTI [10] VTAB-1k.Oxford-Flowers-102 [11] VTAB-1k.Oxford-IIIT-Pet [12] VTAB-1k.PatchCamelyon [13] VTAB-1k.RESISC45 [14] VTAB-1k.SUN397 [15] VTAB-1k.SVHN [16] VTAB-1k.dSprites-Location [17] VTAB-1k.dSprites-Orientation [18] VTAB-1k.smallNORB-Azimuth [19] VTAB-1k.smallNORB-Elevation $ Please input a dataset, type 'help' to show the options: 1 $ Your selection: [1] VTAB-1k.CIFAR-100 $ Please input your dataset directory: your/dataset/directory
Construct Pre-trained Model¶
Next, we will construct a pre-trained Vision Transformer from the timm library, with a custom LoRA module.
Seamlessly modifying the structure is possible. ZhiJian welcomes any base model and any additional modifications. The base part supports:
🤗 Hugging Face series — PyTorch Image Models (timm), Transformers, PyTorch series — Torchvision, and OpenAI series — CLIP.
Other popular projects, e.g., vit-pytorch (stars 14k) and any custom architecture.
Large Language Models, including baichuan (7B), LLaMA (7B/13B), and BLOOM (560M/1.1B/1.7B/3B/7.1B).
ZhiJian also includes assembling additional tuning structures, similar to building LEGO bricks. For more detailed customization of each part, please see here.
Adapting the Vision Transformer structure requires just 1–3 lines of code.
Now, run the following code block, select the model architecture (Vision Transformer as below):
available_example_models = {
    'timm.vit_base_patch16_224_in21k': {
        'LoRA': '(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...',
        'Adapter': '(Adapter.adapt): ...->(blocks[0:12].drop_path1){inout1}->...',
        'Convpass': ('(Convpass.adapt): ...->(blocks[0:12].norm1){in1}->(blocks[0:11].drop_path1){in2}->...,'  # follow the next line
                     '(Convpass.adapt): ...->{in1}(blocks[0:11].norm2)->(blocks[0:12].drop_path2){in2}->...'),
        'None': None
    }
}  # model options, Dict(model name: Dict(add-in structure name: add-in blitz configuration)).
model = select_from_input('model', list(available_example_models.keys()))  # user input about model
$ Please input a model, type 'help' to show the options: help $ Available model(s): [1] timm.vit_base_patch16_224_in21k $ Please input a model, type 'help' to show the options: 1 $ Your selection: [1] timm.vit_base_patch16_224_in21k
Next, run the following code block, select the additional add-in structure (LoRA as below):
availables = available_example_models[model]
config_blitz = availables[select_from_input('add-in structure', list(availables.keys()))]  # user input about add-in structure
$ Please input a add-in structure, type 'help' to show the options: help $ Available add-in structure(s): [1] LoRA [2] Adapter [3] Convpass [4] None $ Please input a add-in structure, type 'help' to show the options: 1 $ Your selection: [1] LoRA
Deploy Training and Test Process¶
ZhiJian enables customizing which part of the parameters to fine-tune via args.reuse_keys; for example, assigning blocks[6:8] tunes only model.blocks[6], model.blocks[7], and their sub-modules.
Now, run the following code block and select which part of the parameters to fine-tune (the rest are frozen):
available_example_reuse_modules = {
    'timm.vit_base_patch16_224_in21k': {
        'linear layer only': 'addin,head,fc_norm',
        'the last block and the linear layer (Partial-1)': 'addin,blocks[11],head,fc_norm',
        'the last two blocks and the linear layer (Partial-2)': 'addin,blocks[10:12],head,fc_norm',
        'the last four blocks and the linear layer (Partial-4)': 'addin,blocks[8:12],head,fc_norm',
        'all parameters': ''
    }
}
availables = available_example_reuse_modules[model]
reuse_modules_blitz = availables[select_from_input('reuse module', list(availables.keys()))]  # user input about reuse modules
$ Please input a reuse module, type 'help' to show the options: help $ Available reuse modules(s): [1] add-ins and linear layer [2] add-ins and the last block and the linear layer (Partial-1) [3] add-ins and the last two blocks and the linear layer (Partial-2) [4] add-ins and the last four blocks and the linear layer (Partial-4) $ Please input a reuse module, type 'help' to show the options: 1 $ Your selection: [1] add-ins and linear layer
Taking the training_mode as finetune, we next configure the parameters. For the rest of the training configuration and more customization options, please see here.
training_mode = 'finetune'
args = get_args(
    dataset=dataset,              # dataset
    dataset_dir=dataset_dir,      # dataset directory
    model=model,                  # backbone network
    config_blitz=config_blitz,    # add-in blitz configuration
    training_mode=training_mode,  # training mode
    optimizer='adam',             # optimizer
    lr=1e-2,                      # learning rate
    wd=1e-5,                      # weight decay
    gpu='0',                      # gpu id
    verbose=True                  # control the verbosity of the output
)
pprint(vars(args))
$ Preparing args.. {'aa': None, 'addins': [{'hook': [['adapt', 'post']], 'location': [['blocks', 0, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 1, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 2, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 3, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 4, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 5, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 6, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 7, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 8, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 9, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 10, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 11, 'attn', 'qkv']], 'name': 'LoRA'}], 'amp': False, 'amp_dtype': 'float16', 'amp_impl': 'native', 'aot_autograd': False, 'aug_repeats': 0, 'aug_splits': 0, 'batch_size': 64, 'bce_loss': False, ... 'warmup_epochs': 5, 'warmup_lr': 1e-05, 'warmup_prefix': False, 'wd': 5e-05, 'weight_decay': 2e-05, 'worker_seeding': 'all'}
Next, run the following code block to configure the GPU:
assert torch.cuda.is_available()
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
torch.cuda.set_device(int(args.gpu))
Run the following to get the pre-trained model, with the additional add-in modules attached:
model, model_args, device = get_model(args)
Run the following to get the dataloader:
train_loader, val_loader, num_classes = prepare_vision_dataloader(args, model_args)
$ Log level set to: INFO Log files are recorded in: your/log/directory/0718-15-17-52-580 Trainable/total parameters of the model: 0.37M / 86.17M (0.43148%)
Run the following to prepare the optimizer, learning rate scheduler, and loss function. For more customization options, please see TODO.
optimizer = optim.Adam(
    model.parameters(),
    lr=args.lr,
    weight_decay=args.wd
)
lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    args.max_epoch,
    eta_min=args.eta_min
)
criterion = nn.CrossEntropyLoss()
Run the following to initialize the trainer, ready to start training:
trainer = prepare_trainer(
    args,
    model=model,
    model_args=model_args,
    device=device,
    train_loader=train_loader,
    val_loader=val_loader,
    num_classes=num_classes,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    criterion=criterion
)
Run the following to train and test with ZhiJian:
trainer.fit()
trainer.test()
$ Epoch GPU Mem. Time Loss LR 1/5 7.16G 0.3105 4.629 0.001: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.66batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 1/5 7.16G 0.1188 3.334 14.02: 100%|██████████| 157/157 [00:18<00:00, 8.35batch/s] *** Best results: [Acc@1: 3.3339968152866244], [Acc@5: 14.022691082802547] Epoch GPU Mem. Time Loss LR 2/5 7.16G 0.2883 4.255 0.00090451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 2/5 7.16G 0.1182 4.22 16.28: 100%|██████████| 157/157 [00:18<00:00, 8.37batch/s] *** Best results: [Acc@1: 4.219745222929936], [Acc@5: 16.28184713375796] Epoch GPU Mem. Time Loss LR 3/5 7.16G 0.296 4.026 0.00065451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 3/5 7.16G 0.1197 5.255 17.71: 100%|██████████| 157/157 [00:18<00:00, 8.28batch/s] *** Best results: [Acc@1: 5.254777070063694], [Acc@5: 17.70501592356688] Epoch GPU Mem. Time Loss LR 4/5 7.16G 0.2983 3.88 0.00034549: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.87batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 4/5 7.16G 0.1189 5.862 19.06: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s] *** Best results: [Acc@1: 5.8618630573248405], [Acc@5: 19.058519108280255] Epoch GPU Mem. Time Loss LR 5/5 7.16G 0.2993 3.811 9.5492e-05: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.90batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 5/5 7.16G 0.119 5.723 19.39: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s] *** Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236] Epoch GPU Mem. Time Acc@1 Acc@5 1/1 7.16G 0.1192 5.723 19.39: 100%|██████████| 157/157 [00:18<00:00, 8.30batch/s] *** Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236]
Config with ~1 Line Blitz¶
🌱 ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.
What & Why Reuse?
Performing downstream tasks with the help of pre-trained models, including their model structures, weights, or other derived rules.
Significantly accelerating convergence and improving downstream performance.
In ZhiJian, adding the LoRA module to a pre-trained model and adjusting which part of the parameters to fine-tune each require about one line of code.
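For concreteness, the corresponding one-line configurations used later in these tutorials look like the following (both strings are taken from the examples in this documentation; the variable names here are only for illustration).
# Attach LoRA to the qkv projection of every Transformer block (add-in blitz configuration) ...
config_blitz = '(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...'
# ... and fine-tune only the add-ins plus the classification head (reuse keys)
reuse_keys = 'addin,head,fc_norm'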
Overview¶
In the following example, we show how ZhiJian:
Represent the modules of the pre-trained model
Config the extended add-in module with entry points
Modules of the Pre-trained Model in a One-Line Description¶
In the Architect module, to facilitate the modification of model structures, additional adaptive structures are incorporated into pre-trained models. ZhiJian accepts a one-line serialized representation of the base pre-trained model, as exemplified for the Vision Transformer model from the timm library in the following manner:

The modules within the parentheses () represent the base pre-trained model, and the dot . is used as an access operator. The arrows -> indicate the connections between modules, and the ellipsis ... represents default modules. Partial structures can be connected with arrows.
Extended Add-in Module with Entry Points¶
We use (): to denote an additional adaptive structure, where the part after the dot . represents the main forward function of the extra structure. The data flows into the module and primarily passes through this method.
We use {} to indicate the entry points of the extra structure into the pre-trained model, encompassing the entry of source-model features and the return points of features after the added structure has processed them.
With the aforementioned configuration, ZhiJian seamlessly supports the modification of pre-trained model structures. It automatically recognizes the additional structures defined in zhijian.models.addin, enabling the construction of pre-trained models.
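Putting the pieces together, the LoRA configuration from the earlier tutorials can be read as follows (the annotations are ours):
# (LoRA.adapt)             -> the extra structure and its main forward function
# ...                      -> default, unmodified modules of the base model
# (blocks[0:12].attn.qkv)  -> base-model modules addressed with the dot operator
# {inout1}                 -> entry point: features flow into the add-in here and the
#                             adapted features are returned at the same position
config_blitz = '(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...'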
Customize Pre-trained Model¶
🛠️ ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.
Overview¶
In the following example, we show how to customize your own pre-trained model with a new target structure in ZhiJian.
Feel free to deploy model reusability technology on any pre-trained model, with loading in the conventional PyTorch style.
Construct Custom Model¶
Let’s begin with a three-layer Multilayer Perceptron (MLP).

Custom Multilayer Perceptron (MLP) Architecture¶
Although a multi-layer perceptron is not a good image learner, we can quickly get started with it. For other custom networks, we can also make similar designs and modifications by analogy.
Run the code block below to customize the model:
import torch.nn as nn

class MLP(nn.Module):
    """
    MLP Class
    ==============
    Multilayer Perceptron (MLP) model for image (224x224) classification tasks.

    Args:
        args (object): Custom arguments or configurations.
        num_classes (int): Number of output classes.
    """
    def __init__(self, args, num_classes):
        super(MLP, self).__init__()
        self.args = args
        self.image_size = 224
        self.fc1 = nn.Linear(self.image_size * self.image_size * 3, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output logits.
        """
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = nn.ReLU()(x)
        x = self.fc2(x)
        x = nn.ReLU()(x)
        x = self.fc3(x)
        return x
Next, run the code block below to configure the GPU and the model:
model = MLP(args, DATASET2NUM_CLASSES[args.dataset.replace('VTAB.','')])
model = ModelWrapper(model)
model_args = dict2args({'hidden_size': 512})
Now, run the code block below to prepare the trainer, passing in the parameter model:
trainer = prepare_trainer(
    args,
    model=model,
    model_args=model_args,
    device=device,
    ...
)
trainer.fit()
trainer.test()
$ Log level set to: INFO Log files are recorded in: your/log/directory/0718-19-52-36-748 Trainable/total parameters of the model: 0.03M / 38.64M (0.08843%) Epoch GPU Mem. Time Loss LR 1/5 0.589G 0.1355 4.602 0.001: 100%|██████████| 16.0/16.0 [00:01<00:00, 12.9batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 1/5 0.629G 0.03114 1.871 7.932: 100%|██████████| 157/157 [00:05<00:00, 30.9batch/s] *** Best results: [Acc@1: 1.8710191082802548], [Acc@5: 7.931926751592357] Epoch GPU Mem. Time Loss LR 2/5 0.784G 0.1016 4.538 0.00090451: 100%|██████████| 16.0/16.0 [00:00<00:00, 19.4batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 2/5 0.784G 0.02669 2.498 9.504: 100%|██████████| 157/157 [00:04<00:00, 35.9batch/s] *** Best results: [Acc@1: 2.4980095541401273], [Acc@5: 9.504378980891719] Epoch GPU Mem. Time Loss LR 3/5 0.784G 0.09631 4.488 0.00065451: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.6batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 3/5 0.784G 0.02688 2.379 10.16: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s] *** Best results: [Acc@1: 2.3785828025477707], [Acc@5: 10.161226114649681] Epoch GPU Mem. Time Loss LR 4/5 0.784G 0.09126 4.45 0.00034549: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.2batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 4/5 0.784G 0.02644 2.468 10.29: 100%|██████████| 157/157 [00:04<00:00, 36.2batch/s] *** Best results: [Acc@1: 2.468152866242038], [Acc@5: 10.290605095541402] Epoch GPU Mem. Time Loss LR 5/5 0.784G 0.0936 4.431 9.5492e-05: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.5batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 5/5 0.784G 0.02706 2.558 10.43: 100%|██████████| 157/157 [00:04<00:00, 35.8batch/s] *** Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484] Epoch GPU Mem. Time Acc@1 Acc@5 1/5 0.784G 0.02667 2.558 10.43: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s] *** Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484]
Customize Dataloader¶
📂 ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.
Overview¶
In the following example, we show how to customize your own dataloader for a new target dataset in ZhiJian.
Feel free to deploy model reusability technology on any dataset, with loading in the conventional PyTorch style.
Prepare Custom Dataset¶
No separate dataset configuration is required; simply organize the custom dataset in the following structure:
within the your/dataset/dir directory, create a separate folder for each category
store all the data corresponding to each category within its respective folder
/your/dataset/directory
├── train
│   ├── class_1
│   │   ├── train_class_1_img_1.jpg
│   │   ├── train_class_1_img_2.jpg
│   │   ├── train_class_1_img_3.jpg
│   │   └── ...
│   ├── class_2
│   │   ├── train_class_2_img_1.jpg
│   │   └── ...
│   ├── class_3
│   │   └── ...
│   ├── class_4
│   │   └── ...
│   ├── class_5
│   │   └── ...
└── test
    ├── class_1
    │   ├── test_class_1_img_1.jpg
    │   ├── test_class_1_img_2.jpg
    │   ├── test_class_1_img_3.jpg
    │   └── ...
    ├── class_2
    │   ├── test_class_2_img_1.jpg
    │   └── ...
    ├── class_3
    │   └── ...
    ├── class_4
    │   └── ...
    └── class_5
        └── ...
Set up the custom dataset:
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
train_dataset = ImageFolder(root='/your/dataset/directory/train', transform=train_transform)
val_dataset = ImageFolder(root='/your/dataset/directory/test', transform=val_transform)
Implement the corresponding loader:
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=args.batch_size,
    num_workers=args.num_workers,
    pin_memory=True,
    shuffle=True
)
val_loader = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=args.batch_size,
    num_workers=args.num_workers,
    pin_memory=True,
    shuffle=False
)
num_classes = len(train_dataset.classes)
Now, set up the trainer, passing in the parameters train_loader and val_loader:
trainer = prepare_trainer(
    args,
    model=model,
    model_args=model_args,
    device=device,
    train_loader=train_loader,
    val_loader=val_loader,
    num_classes=num_classes,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    criterion=criterion
)
trainer.fit()
trainer.test()
$ Log level set to: INFO Log files are recorded in: your/log/directory/0718-20-10-57-792 Trainable/total parameters of the model: 0.30M / 86.10M (0.34700%) Epoch GPU Mem. Time Loss LR 1/5 5.48G 1.686 1.73 0.001: 100%|██████████| 1.00/1.00 [00:01<00:00, 1.22s/batch] Epoch GPU Mem. Time Acc@1 Acc@5 1/5 5.48G 0.3243 16 100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.39batch/s] *** Best results: [Acc@1: 16.0], [Acc@5: 100.0] Epoch GPU Mem. Time Loss LR 2/5 5.6G 1.093 1.448 0.00090451: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.52batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 2/5 5.6G 0.2647 12 100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.58batch/s] *** Best results: [Acc@1: 12.0], [Acc@5: 100.0] Epoch GPU Mem. Time Loss LR 3/5 5.6G 1.088 1.369 0.00065451: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.54batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 3/5 5.6G 0.2899 12 100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.54batch/s] *** Best results: [Acc@1: 12.0], [Acc@5: 100.0] Epoch GPU Mem. Time Loss LR 4/5 5.6G 1.067 1.403 0.00034549: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.53batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 4/5 5.6G 0.2879 16 100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.42batch/s] *** Best results: [Acc@1: 16.0], [Acc@5: 100.0] Epoch GPU Mem. Time Loss LR 5/5 5.6G 1.077 1.342 9.5492e-05: 100%|██████████| 1.00/1.00 [00:00<00:00, 1.55batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 5/5 5.6G 0.246 16 100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.79batch/s] *** Best results: [Acc@1: 16.0], [Acc@5: 100.0] Epoch GPU Mem. Time Acc@1 Acc@5 1/1 5.6G 0.2901 16 100: 100%|██████████| 1.00/1.00 [00:00<00:00, 2.52batch/s] *** Best results: [Acc@1: 16.0], [Acc@5: 100.0]
Fine-tune a Pre-trained ViT from timm¶
👓 ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.
Overview¶
In the following example, we show how ZhiJian:
Construct a pre-trained Vision Transformer from timm
Tune with supervision on CIFAR-100 dataset
Infer to evaluate the performance
The figure below shows the three stages of our example. To run the following code, please click [Open In Colab].

Prepare Dataset and Model¶
ZhiJian provides loading interfaces for the 19 datasets of the VTAB benchmark, which span several domains including general objects, animals and plants, food and daily necessities, medicine, remote sensing, and so on. To customize your own dataset, please see here.
For better prompting, we first import a tool function that guides the input:
from zhijian.models.utils import select_from_input
Now, run the following code block, select the target dataset (CIFAR-100) and corresponding directory to be deployed:
available_datasets = [
    'VTAB-1k.CIFAR-100', 'VTAB-1k.CLEVR-Count', 'VTAB-1k.CLEVR-Distance', 'VTAB-1k.Caltech101',
    'VTAB-1k.DTD', 'VTAB-1k.Diabetic-Retinopathy', 'VTAB-1k.Dmlab', 'VTAB-1k.EuroSAT',
    'VTAB-1k.KITTI', 'VTAB-1k.Oxford-Flowers-102', 'VTAB-1k.Oxford-IIIT-Pet', 'VTAB-1k.PatchCamelyon',
    'VTAB-1k.RESISC45', 'VTAB-1k.SUN397', 'VTAB-1k.SVHN', 'VTAB-1k.dSprites-Location',
    'VTAB-1k.dSprites-Orientation', 'VTAB-1k.smallNORB-Azimuth', 'VTAB-1k.smallNORB-Elevation'
]  # dataset options.
dataset = select_from_input('dataset', available_datasets)     # user input about dataset
dataset_dir = input("Please input your dataset directory: ")   # user input about dataset directory
$ Please input a dataset, type 'help' to show the options: help $ Available dataset(s): [1] VTAB-1k.CIFAR-100 [2] VTAB-1k.CLEVR-Count [3] VTAB-1k.CLEVR-Distance [4] VTAB-1k.Caltech101 [5] VTAB-1k.DTD [6] VTAB-1k.Diabetic-Retinopathy [7] VTAB-1k.Dmlab [8] VTAB-1k.EuroSAT [9] VTAB-1k.KITTI [10] VTAB-1k.Oxford-Flowers-102 [11] VTAB-1k.Oxford-IIIT-Pet [12] VTAB-1k.PatchCamelyon [13] VTAB-1k.RESISC45 [14] VTAB-1k.SUN397 [15] VTAB-1k.SVHN [16] VTAB-1k.dSprites-Location [17] VTAB-1k.dSprites-Orientation [18] VTAB-1k.smallNORB-Azimuth [19] VTAB-1k.smallNORB-Elevation $ Please input a dataset, type 'help' to show the options: 1 $ Your selection: [1] VTAB-1k.CIFAR-100 $ Please input your dataset directory: your/dataset/directory
Next, we will construct a pre-trained Vision Transformer from the timm library.
Seamlessly modifying the structure is possible. ZhiJian welcomes any base model and any additional modifications. The base part supports:
🤗 Hugging Face series — PyTorch Image Models (timm), Transformers, PyTorch series — Torchvision, and OpenAI series — CLIP.
Other popular projects, e.g., vit-pytorch (stars 14k) and any custom architecture.
Large Language Models, including baichuan (7B), LLaMA (7B/13B), and BLOOM (560M/1.1B/1.7B/3B/7.1B).
Adapting the Vision Transformer structure requires just 1–3 lines of code. To customize your own pre-trained model, please see here.
Now, run the following code block, select the model architecture (Vision Transformer as below):
available_example_models = {
    'timm.vit_base_patch16_224_in21k': {
        'LoRA': '(LoRA.adapt): ...->(blocks[0:12].attn.qkv){inout1}->...',
        'Adapter': '(Adapter.adapt): ...->(blocks[0:12].drop_path1){inout1}->...',
        'Convpass': ('(Convpass.adapt): ...->(blocks[0:12].norm1){in1}->(blocks[0:11].drop_path1){in2}->...,'  # follow the next line
                     '(Convpass.adapt): ...->{in1}(blocks[0:11].norm2)->(blocks[0:12].drop_path2){in2}->...'),
        'None': None
    }
}  # model options, Dict(model name: Dict(add-in structure name: add-in blitz configuration)).
model = select_from_input('model', list(available_example_models.keys()))  # user input about model
$ Please input a model, type 'help' to show the options: help $ Available model(s): [1] timm.vit_base_patch16_224_in21k $ Please input a model, type 'help' to show the options: 1 $ Your selection: [1] timm.vit_base_patch16_224_in21k
Deploy Training and Test Process¶
ZhiJian enables customizing which part of the parameters to fine-tune via args.reuse_keys; for example, assigning blocks[6:8] tunes only model.blocks[6], model.blocks[7], and their sub-modules.
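As a reference, the reuse-key strings used in the Get Started example look like this (illustrative assignments; the values are taken from that example).
# Tune only the add-ins and the linear head ...
reuse_keys = 'addin,head,fc_norm'
# ... or additionally the last two Transformer blocks (Partial-2)
reuse_keys = 'addin,blocks[10:12],head,fc_norm'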
Taking the training_mode as finetune, we next configure the parameters. For the rest of the training configuration and more customization options, please see here.
training_mode = 'finetune'
args = get_args(
    dataset=dataset,              # dataset
    dataset_dir=dataset_dir,      # dataset directory
    model=model,                  # backbone network
    config_blitz=config_blitz,    # add-in blitz configuration
    training_mode=training_mode,  # training mode
    optimizer='adam',             # optimizer
    lr=1e-2,                      # learning rate
    wd=1e-5,                      # weight decay
    gpu='0',                      # gpu id
    verbose=True                  # control the verbosity of the output
)
pprint(vars(args))
$ { 'addins': [{'hook': [['adapt', 'post']], 'location': [['blocks', 0, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 1, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 2, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 3, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 4, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 5, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 6, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 7, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 8, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 9, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 10, 'attn', 'qkv']], 'name': 'LoRA'}, {'hook': [['adapt', 'post']], 'location': [['blocks', 11, 'attn', 'qkv']], 'name': 'LoRA'}], 'amp': False, 'amp_dtype': 'float16', 'amp_impl': 'native', 'aot_autograd': False, 'aug_repeats': 0, 'aug_splits': 0, 'batch_size': 64, 'bce_loss': False, ... 'warmup_epochs': 5, 'warmup_lr': 1e-05, 'warmup_prefix': False, 'wd': 5e-05, 'weight_decay': 2e-05, 'worker_seeding': 'all'}
Next, run the following code block to configure the GPU:
assert torch.cuda.is_available()
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
torch.cuda.set_device(int(args.gpu))
Run the following to get the pre-trained model, with the additional add-in modules attached:
model, model_args, device = get_model(args)
Run the following to get the dataloader:
train_loader, val_loader, num_classes = prepare_vision_dataloader(args, model_args)
$ Log level set to: INFO Log files are recorded in: your/log/directory/0718-15-17-52-580 Trainable/total parameters of the model: 0.37M / 86.17M (0.43148%)
Run the following to prepare the optimizer, learning rate scheduler, and loss function:
optimizer = optim.Adam(
    model.parameters(),
    lr=args.lr,
    weight_decay=args.wd
)
lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    args.max_epoch,
    eta_min=args.eta_min
)
criterion = nn.CrossEntropyLoss()
Run the following to initialize the trainer, ready to start training:
trainer = prepare_trainer(
    args,
    model=model,
    model_args=model_args,
    device=device,
    train_loader=train_loader,
    val_loader=val_loader,
    num_classes=num_classes,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    criterion=criterion
)
Run the following to train and test with ZhiJian:
trainer.fit()
trainer.test()
$ Epoch GPU Mem. Time Loss LR 1/5 7.16G 0.3105 4.629 0.001: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.66batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 1/5 7.16G 0.1188 3.334 14.02: 100%|██████████| 157/157 [00:18<00:00, 8.35batch/s] *** Best results: [Acc@1: 3.3339968152866244], [Acc@5: 14.022691082802547] Epoch GPU Mem. Time Loss LR 2/5 7.16G 0.2883 4.255 0.00090451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 2/5 7.16G 0.1182 4.22 16.28: 100%|██████████| 157/157 [00:18<00:00, 8.37batch/s] *** Best results: [Acc@1: 4.219745222929936], [Acc@5: 16.28184713375796] Epoch GPU Mem. Time Loss LR 3/5 7.16G 0.296 4.026 0.00065451: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.96batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 3/5 7.16G 0.1197 5.255 17.71: 100%|██████████| 157/157 [00:18<00:00, 8.28batch/s] *** Best results: [Acc@1: 5.254777070063694], [Acc@5: 17.70501592356688] Epoch GPU Mem. Time Loss LR 4/5 7.16G 0.2983 3.88 0.00034549: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.87batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 4/5 7.16G 0.1189 5.862 19.06: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s] *** Best results: [Acc@1: 5.8618630573248405], [Acc@5: 19.058519108280255] Epoch GPU Mem. Time Loss LR 5/5 7.16G 0.2993 3.811 9.5492e-05: 100%|██████████| 16.0/16.0 [00:04<00:00, 3.90batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 5/5 7.16G 0.119 5.723 19.39: 100%|██████████| 157/157 [00:18<00:00, 8.33batch/s] *** Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236] Epoch GPU Mem. Time Acc@1 Acc@5 1/1 7.16G 0.1192 5.723 19.39: 100%|██████████| 157/157 [00:18<00:00, 8.30batch/s] *** Best results: [Acc@1: 5.722531847133758], [Acc@5: 19.386942675159236]
Fine-tune a Custom Pre-Trained Model¶
🕶️ ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.
Overview¶
In the following example, we show how ZhiJian:
Construct a custom MLP
Tune with supervision on a custom dataset
Infer to evaluate the performance
The figure below shows the three stages of our example. To run the following code, please click [Open In Colab].

Construct Custom Model¶
We first begin with a three-layer Multilayer Perceptron (MLP).

Custom Multilayer Perceptron (MLP) Architecture¶
Although a multi-layer perceptron is not a good image learner, we can quickly get started with it. For other custom networks, we can also make similar designs and modifications by analogy.
Run the code block below to customize the model:
import torch.nn as nn

class MLP(nn.Module):
    """
    MLP Class
    ==============
    Multilayer Perceptron (MLP) model for image (224x224) classification tasks.

    Args:
        args (object): Custom arguments or configurations.
        num_classes (int): Number of output classes.
    """
    def __init__(self, args, num_classes):
        super(MLP, self).__init__()
        self.args = args
        self.image_size = 224
        self.fc1 = nn.Linear(self.image_size * self.image_size * 3, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output logits.
        """
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = nn.ReLU()(x)
        x = self.fc2(x)
        x = nn.ReLU()(x)
        x = self.fc3(x)
        return x
Next, run the code block below to configure the GPU and the model:
model = MLP(args, DATASET2NUM_CLASSES[args.dataset.replace('VTAB.','')])
model = ModelWrapper(model)
model_args = dict2args({'hidden_size': 512})
Now, run the code block below to prepare the trainer, passing in the parameter model:
trainer = prepare_trainer(
    args,
    model=model,
    model_args=model_args,
    device=device,
    ...
)
trainer.fit()
trainer.test()
Prepare Custom Dataset¶
No separate dataset configuration is required; simply organize the custom dataset in the following structure:
within the your/dataset/dir directory, create a separate folder for each category
store all the data corresponding to each category within its respective folder
/your/dataset/directory
├── train
│   ├── class_1
│   │   ├── train_class_1_img_1.jpg
│   │   ├── train_class_1_img_2.jpg
│   │   ├── train_class_1_img_3.jpg
│   │   └── ...
│   ├── class_2
│   │   ├── train_class_2_img_1.jpg
│   │   └── ...
│   ├── class_3
│   │   └── ...
│   ├── class_4
│   │   └── ...
│   ├── class_5
│   │   └── ...
└── test
    ├── class_1
    │   ├── test_class_1_img_1.jpg
    │   ├── test_class_1_img_2.jpg
    │   ├── test_class_1_img_3.jpg
    │   └── ...
    ├── class_2
    │   ├── test_class_2_img_1.jpg
    │   └── ...
    ├── class_3
    │   └── ...
    ├── class_4
    │   └── ...
    └── class_5
        └── ...
Set up the custom dataset:
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
train_dataset = ImageFolder(root='/your/dataset/directory/train', transform=train_transform)
val_dataset = ImageFolder(root='/your/dataset/directory/test', transform=val_transform)
Implement the corresponding loader:
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=args.batch_size,
    num_workers=args.num_workers,
    pin_memory=True,
    shuffle=True
)
val_loader = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=args.batch_size,
    num_workers=args.num_workers,
    pin_memory=True,
    shuffle=False
)
num_classes = len(train_dataset.classes)
Advanced: Extended Structure¶
🛠️ ZhiJian is a unifying and rapidly deployable toolbox for pre-trained model reuse.
Overview¶
In the following example, we show how ZhiJian:
Customize your own pre-trained model for new ideas of structure
Tailor and integrate any add-in extra module within the vast pre-trained model with lightning speed

This chapter may involve more advanced configuration.
Introduce the Custom Model¶
Let’s begin with a three-layer Multilayer Perceptron (MLP).
Run the code block below to customize the model:
import torch.nn as nn

class MLP(nn.Module):
    """
    MLP Class
    ==============
    Multilayer Perceptron (MLP) model for image (224x224) classification tasks.

    Args:
        args (object): Custom arguments or configurations.
        num_classes (int): Number of output classes.
    """
    def __init__(self, args, num_classes):
        super(MLP, self).__init__()
        self.args = args
        self.image_size = 224
        self.fc1 = nn.Linear(self.image_size * self.image_size * 3, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output logits.
        """
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = nn.ReLU()(x)
        x = self.fc2(x)
        x = nn.ReLU()(x)
        x = self.fc3(x)
        return x

Custom Multilayer Perceptron (MLP) Architecture¶
Now, expand the model from a moment of inspiration and do as you please.
We will customize and modify the network structure with a few lines of ZhiJian code. These additional structures are also implemented on top of the PyTorch framework and inherit from the base class AddinBase, which integrates some basic methods for data access.
In the following paragraphs, we introduce the components of the extended structure. They are:
1. The main forward function
2. Entry points to guide inputs
3. The configuration syntax for entry points
Design Additional Add-in Modules¶
Run the code block below to customize add-in modules and entry points for the model.
class MLPAddin(AddinBase):
    """
    MLPAddin Class
    ==============
    Multilayer Perceptron (MLP) add-in.

    Args:
        config (object): Custom configuration or arguments.
        model_config (object): Configuration specific to the model.
    """
    def __init__(self, config, model_config):
        super(MLPAddin, self).__init__()
        self.config = config
        self.embed_dim = model_config.hidden_size
        self.reduction_dim = 16

        self.fc1 = nn.Linear(self.embed_dim, self.reduction_dim)
        if config.mlp_addin_output_size is not None:
            self.fc2 = nn.Linear(self.reduction_dim, config.mlp_addin_output_size)
        else:
            self.fc2 = nn.Linear(self.reduction_dim, self.embed_dim)

    def forward(self, x):
        """
        Forward pass of the MLP add-in.

        Args:
            x (tensor): Input tensor.

        Returns:
            tensor: Output tensor after passing through the MLP add-in.
        """
        identity = x
        out = self.fc1(identity)
        out = nn.ReLU()(out)
        out = self.fc2(out)
        return out

    def adapt_input(self, module, inputs):
        """
        Hook function to adapt the input data before it enters the module.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).

        Returns:
            tensor: Adapted input tensor after passing through the MLP add-in.
        """
        x = inputs[0]
        return self.forward(x)

    def adapt_output(self, module, inputs, outputs):
        """
        Hook function to adapt the output data after it leaves the module.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).
            outputs (tensor): Outputs after the module.

        Returns:
            tensor: Adapted output tensor after passing through the MLP add-in.
        """
        return self.forward(outputs)

    def adapt_across_input(self, module, inputs):
        """
        Hook function to adapt the data across the modules.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).

        Returns:
            tensor: Adapted input tensor after adding the MLP add-in output to the subsequent module.
        """
        x = inputs[0]
        x = x + self.forward(self.inputs_cache)
        return x

    def adapt_across_output(self, module, inputs, outputs):
        """
        Hook function to adapt the data across the modules.

        Args:
            module (nn.Module): The module being hooked.
            inputs (tuple): (Inputs before the module,).
            outputs (tensor): Outputs after the module.

        Returns:
            tensor: Adapted output tensor after adding the MLP add-in output to the previous module.
        """
        outputs = outputs + self.forward(self.inputs_cache)
        return outputs
Main forward function¶
In the extended auxiliary structure MLPAddin mentioned above, we add a low-rank bottleneck (consisting of two linear layers with a reduced dimension in the middle), inspired by parameter-efficient methods like Adapter or LoRA.
We define and implement this in the __init__ and forward functions. The data goes through this structure via the forward function.

Additional Auxiliary Structure Example¶
Entry points to guide inputs¶
As shown above, the hook methods starting with adapt_ are our entry points (functions) to guide the input data. They serve as hooks to attach the extended modules to the base model.
They are roughly divided into two categories:
guide data input before the modules
direct data output after the modules
These are generally closely associated with the forward function, and the data enters the extended structures through these entry points. We will further explain their roles in the following Configuration Syntax section.
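Conceptually, these entry points behave like standard PyTorch forward hooks. The sketch below uses plain PyTorch (not ZhiJian's internals) to show the two flavors on an arbitrary linear layer:
# Plain-PyTorch analogy for the two kinds of entry points (illustration only).
import torch
import torch.nn as nn

layer = nn.Linear(8, 8)

def before(module, inputs):                 # like adapt_input: runs before the module;
    x, = inputs                             # the returned tuple replaces the module's input
    return (x * 2,)

def after(module, inputs, output):          # like adapt_output: runs after the module;
    return output + 1                       # the returned tensor replaces the module's output

layer.register_forward_pre_hook(before)
layer.register_forward_hook(after)
_ = layer(torch.randn(1, 8))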
Config Syntax of Entry Points¶
We aim to customize our model by inter-layer insertion and cross-layer concatenation of the auxiliary structures at different positions within the base model (such as the custom MLP mentioned earlier). When configuring the insertion or concatenation positions, ZhiJian provides a minimalistic one-line configuration syntax.
The syntax for configuring an add-in module into the base model is as follows. We will start with one or two examples and gradually explain the meaning of each configuration part.
Inter-layer Insertion:
>>> (MLPAddin.adapt_input): ...->{inout1}(fc2)->...
Additional Add-in Structure - Inter-layer Insertion 1¶
>>> (MLPAddin.adapt_input): ...->(fc2){inout1}->...
Additional Add-in Structure - Inter-layer Insertion 2¶
Cross-layer Concatenation:
>>> (MLPAddin.adapt_across_input): ...->(fc1){in1}->...->{out1}(fc3)->...
Additional Add-in Structure - Cross-layer Concatenation¶
Base Module: ->(fc1)¶
Consider a base model implemented based on the PyTorch framework, where the representation of each layer and module in the model is straightforward:
As shown in the figure, the print command can output the defined names of the model structure:
print(model)
The structure of some classic backbones can be represented as follows:
MLP:
>>> input->(fc1)->(fc2)->(fc3)->output
ViT block[i]:
>>> input->...->(block[i].norm1)-> (block[i].attn.qkv)->(block[i].attn.attn_drop)->(block[i].attn.proj)->(block[i].attn.proj_drop)-> (block[i].ls1)->(block[i].drop_path1)-> (block[i].norm2)-> (block[i].mlp.fc1)->(block[i].mlp.act)->(block[i].mlp.drop1)->(block[i].mlp.fc2)->(block[i].mlp.drop2)-> (block[i].ls2)->(block[i].drop_path2)->...->output
Default Module: ...¶
In the configuration syntax of ZhiJian, the ... can be used to represent the default layers or modules.
For example, when we only focus on the (fc2) module in MLP and the (block[i].mlp.fc2) module in ViT:
MLP:
>>> ...->(fc2)->...
ViT:
>>> ...->(block[i].mlp.fc2)->...
Insertion & Concatenation Function: ():¶
Considering the custom auxiliary structure MLPAddin mentioned above, the functions starting with adapt_ serve as the processing centers that are inserted into and concatenated with the base model. There are primarily two types of parameter-passing methods:
def adapt_input(self, module, inputs):
    """
    Args:
        module (nn.Module): The module being hooked.
        inputs (tuple): (Inputs before the module,).
    """
    ...

def adapt_output(self, module, inputs, outputs):
    """
    Args:
        module (nn.Module): The module being hooked.
        inputs (tuple): (Inputs before the module,).
        outputs (tensor): Outputs after the module.
    """
    ...
where adapt_input(self, module, inputs) is generally hooked before the module and is called before the data enters it, to process and intercept the input; adapt_output(self, module, inputs, outputs) is generally hooked after the module and is called after it has run, to process and intercept the output.
These functions will be “hooked” into the base model in the main method of configuring the module, serving as key connectors between the base model and the auxiliary structure.
Insertion & Concatenation Point: {}¶
Consider an independent extended auxiliary structure (such as the MLPAddin mentioned above); its insertion or concatenation points with the base network must consist of “Data Input” and “Data Output”, where:
“Data Input” refers to the network features input into the extended auxiliary structure.
“Data Output” refers to the adapted features output from the auxiliary structure back to the base network.
Next, let’s use some configuration examples of MLP to illustrate the syntax and functionality of ZhiJian for module integration:
Inter-layer Insertion: inout¶
As shown in the above Fig. 5, the configuration expression is:
>>> (MLPAddin.adapt_input): ...->{inout1}(fc2)->...
where {inout1} refers to the position at which the base-model features (or output, at any layer or module) are taken. It denotes both the “Data Input” and the “Data Output”. The configuration can be {inoutx}, where x represents the x-th integration point; for example, {inout1} represents the first integration point.
In the example above, this inter-layer insertion configuration intercepts the features entering the fc2 module, passes them through the add-in, and then returns them to the fc2 module. At this point, the original fc2 features no longer enter directly.
Cross-layer Concatenation: in, out¶
As shown in the above Fig. 7, the configuration expression is:
>>> (MLPAddin.adapt_across_input): ...->(fc1){in1}->...->{out1}(fc3)->...
where {in1} represents the integration point where the base-network features (or output, at any layer or module) enter the additional add-in structure. It denotes the “Data Input”. The configuration can be {inx}, where x represents the x-th integration point; for example, {in1} represents the first integration point.
{out1} represents the integration point where the features processed by the additional add-in structure are returned to the base network. It denotes the “Data Output”. The configuration can be {outx}, where x represents the x-th integration point; for example, {out1} represents the first integration point.
This cross-layer concatenation configuration extracts the output features of the fc1 module, passes them into the auxiliary structure, and then returns them to the base network before the fc3 module in the form of a residual addition.
For a better prompt, let’s create a tool function that guides the input first:
def select_from_input(prompt_for_select, valid_selections):
    selections2print = '\n\t'.join([f'[{idx + 1}] {i}' for idx, i in enumerate(valid_selections)])
    while True:
        selected = input(f"Please input a {prompt_for_select}, type 'help' to show the options: ")
        if selected == 'help':
            print(f"Available {prompt_for_select}(s):\n\t{selections2print}")
        elif selected.isdigit() and int(selected) >= 1 and int(selected) <= len(valid_selections):
            selected = valid_selections[int(selected) - 1]
            break
        elif selected in valid_selections:
            break
        else:
            print("Sorry, input not support.")
            print(f"Available {prompt_for_select}(s):\n\t{selections2print}")
    return selected

available_example_config_blitzs = {
    'Insert between `fc1` and `fc2` layer (performed before `fc2`)': "(MLPAddin.adapt_input): ...->{inout1}(fc2)->...",
    'Insert between `fc1` and `fc2` layer (performed after `fc1`)': "(MLPAddin.adapt_output): ...->(fc1){inout1}->...",
    'Splice across `fc2` layer (performed before `fc2` and `fc3`)': "(MLPAddin.adapt_across_input): ...->{in1}(fc2)->{out1}(fc3)->...",
    'Splice across `fc2` layer (performed after `fc1` and before `fc3`)': "(MLPAddin.adapt_across_input): ...->(fc1){in1}->...->{in2}(fc3)->...",
    'Splice across `fc2` layer (performed before and after `fc2`)': "(MLPAddin.adapt_across_output): ...->{in1}(fc2){in2}->...",
    'Splice across `fc2` layer (performed after `fc1` and `fc2`)': "(MLPAddin.adapt_across_output): ...->(fc1){in1}->(fc2){in2}->...",
}
config_blitz = select_from_input('add-in structure', list(available_example_config_blitzs.keys()))  # user input about add-in structure
$ Available dataset(s): [1] VTAB-1k.CIFAR-100 [2] VTAB-1k.CLEVR-Count [3] VTAB-1k.CLEVR-Distance [4] VTAB-1k.Caltech101 [5] VTAB-1k.DTD [6] VTAB-1k.Diabetic-Retinopathy [7] VTAB-1k.Dmlab [8] VTAB-1k.EuroSAT [9] VTAB-1k.KITTI [10] VTAB-1k.Oxford-Flowers-102 [11] VTAB-1k.Oxford-IIIT-Pet [12] VTAB-1k.PatchCamelyon [13] VTAB-1k.RESISC45 [14] VTAB-1k.SUN397 [15] VTAB-1k.SVHN [16] VTAB-1k.dSprites-Location [17] VTAB-1k.dSprites-Orientation [18] VTAB-1k.smallNORB-Azimuth [19] VTAB-1k.smallNORB-Elevation Your selection: VTAB-1k.CIFAR-100 Your dataset directory: /data/zhangyk/data/zhijian
Next, we will configure the parameters and proceed with model training and testing:
args = get_args(
    model='timm.vit_base_patch16_224_in21k',  # backbone network
    config_blitz=config_blitz,                # add-in blitz configuration
    dataset='VTAB.cifar',                     # dataset
    dataset_dir='your/dataset/directory',     # dataset directory
    training_mode='finetune',                 # training mode
    optimizer='adam',                         # optimizer
    lr=1e-2,                                  # learning rate
    wd=1e-5,                                  # weight decay
    verbose=True                              # control the verbosity of the output
)
pprint(vars(args))
$ {'aa': None, 'addins': [{'hook': [['get_pre', 'pre'], ['adapt_across_output', 'post']], 'location': [['fc2'], ['fc2']], 'name': 'MLPAddin'}], 'amp': False, 'amp_dtype': 'float16', 'amp_impl': 'native', 'aot_autograd': False, 'aug_repeats': 0, 'aug_splits': 0, 'batch_size': 64, 'bce_loss': False, ... 'warmup_epochs': 5, 'warmup_lr': 1e-05, 'warmup_prefix': False, 'wd': 5e-05, 'weight_decay': 2e-05, 'worker_seeding': 'all'}
Run the code block below to configure the GPU and the model (excluding additional auxiliary structures):
assert torch.cuda.is_available()
os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
torch.cuda.set_device(int(args.gpu))

model = MLP(args, DATASET2NUM_CLASSES[args.dataset.replace('VTAB.','')])
model = ModelWrapper(model)
model_args = dict2args({'hidden_size': 512})
Run the code block below to configure additional auxiliary structures:
args.mlp_addin_output_size = 256
addins, fixed_params = prepare_addins(args, model_args, addin_classes=[MLPAddin])
prepare_hook(args.addins, addins, model, 'addin')
prepare_gradient(args.reuse_keys, model)
device = prepare_cuda(model)
Run the code block below to configure the dataset, optimizer, loss function, and other settings:
train_loader, val_loader, num_classes = prepare_vision_dataloader(args, model_args)

optimizer = optim.Adam(
    model.parameters(),
    lr=args.lr,
    weight_decay=args.wd
)
lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    args.max_epoch,
    eta_min=args.eta_min
)
criterion = nn.CrossEntropyLoss()
Run the code block below to prepare the trainer object and start training and testing:
trainer = prepare_trainer(
    args,
    model=model,
    model_args=model_args,
    device=device,
    train_loader=train_loader,
    val_loader=val_loader,
    num_classes=num_classes,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    criterion=criterion
)
trainer.fit()
trainer.test()
$ Log level set to: INFO Log files are recorded in: your/log/directory/0718-19-52-36-748 Trainable/total parameters of the model: 0.03M / 38.64M (0.08843%) Epoch GPU Mem. Time Loss LR 1/5 0.589G 0.1355 4.602 0.001: 100%|██████████| 16.0/16.0 [00:01<00:00, 12.9batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 1/5 0.629G 0.03114 1.871 7.932: 100%|██████████| 157/157 [00:05<00:00, 30.9batch/s] *** Best results: [Acc@1: 1.8710191082802548], [Acc@5: 7.931926751592357] Epoch GPU Mem. Time Loss LR 2/5 0.784G 0.1016 4.538 0.00090451: 100%|██████████| 16.0/16.0 [00:00<00:00, 19.4batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 2/5 0.784G 0.02669 2.498 9.504: 100%|██████████| 157/157 [00:04<00:00, 35.9batch/s] *** Best results: [Acc@1: 2.4980095541401273], [Acc@5: 9.504378980891719] Epoch GPU Mem. Time Loss LR 3/5 0.784G 0.09631 4.488 0.00065451: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.6batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 3/5 0.784G 0.02688 2.379 10.16: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s] *** Best results: [Acc@1: 2.3785828025477707], [Acc@5: 10.161226114649681] Epoch GPU Mem. Time Loss LR 4/5 0.784G 0.09126 4.45 0.00034549: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.2batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 4/5 0.784G 0.02644 2.468 10.29: 100%|██████████| 157/157 [00:04<00:00, 36.2batch/s] *** Best results: [Acc@1: 2.468152866242038], [Acc@5: 10.290605095541402] Epoch GPU Mem. Time Loss LR 5/5 0.784G 0.0936 4.431 9.5492e-05: 100%|██████████| 16.0/16.0 [00:00<00:00, 20.5batch/s] Epoch GPU Mem. Time Acc@1 Acc@5 5/5 0.784G 0.02706 2.558 10.43: 100%|██████████| 157/157 [00:04<00:00, 35.8batch/s] *** Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484] Epoch GPU Mem. Time Acc@1 Acc@5 1/5 0.784G 0.02667 2.558 10.43: 100%|██████████| 157/157 [00:04<00:00, 36.0batch/s] *** Best results: [Acc@1: 2.557722929936306], [Acc@5: 10.429936305732484]
Advanced: Knowledge Transfer¶
🛠️
Advanced: Model Merging¶
🛠️
zhijian.args¶
Base Args¶
Preprocess¶
Args of Pre-trained Model¶
Args of Architect¶
Args of Tuner¶
Args of Merger¶
zhijian.models¶
Prepare Pre-trained Model¶
Switch to GPU¶
Adjust Which Part of the Parameters to Fine-tune¶
zhijian.data¶
Dataset¶
zhijian.trainer¶
Base Trainer¶
Architect Module¶
Prepare External Structure¶
Add External Structure to Pre-trained Model¶
Tuner Module¶
Knowledge Matching and Transfer¶
Regularization Constraints¶
Merger Module¶
Merge Trained Parameters¶
Contributing to ZhiJian¶
To submit a Pull Request (PR) to the ZhiJian project, follow these simple steps:
Fork the project by clicking the “Fork” button at the top right corner of the page.
Upload and modify the necessary files.
Submit your Pull Request:
Go to your Forked project page and click the “Pull Request” button.
Select the branch with your changes and the main project’s branch on the comparison page.
Provide a brief description and any additional details about your changes.
Click the “Create Pull Request” button to submit your PR.
If you encounter any issues or need further assistance, please feel free to contact us at yumzhangyk@gmail.com.
Thank you for your contribution! We will review your PR and provide feedback as soon as possible.
Contributors¶
We sincerely appreciate and encourage contributions to enhance the development of ZhiJian. Presented below is a partial list of our esteemed contributors (for a more comprehensive list, please refer to here).