Update Logs and Known Issues
Version 0.3.2
We improve the docs.
We support BMTrain to accelerate training and to parallelize the training of models that are hard to fit on a single GPU. Check tutorial/2_with_bmtrain.py.
We add functionality to inspect the optimizer: the user can see the number of trainable parameters in the optimizer and verify that OpenDelta is being used correctly (see the sketch below).
We move the functions for inspecting delta models into inspect.py.
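For illustration only, here is a minimal plain-PyTorch sketch of the kind of check this inspection performs; the helper name below is hypothetical and is not the actual OpenDelta API.

```python
import torch

def count_optimizer_trainable_params(optimizer: torch.optim.Optimizer) -> int:
    """Hypothetical helper: count the parameters the optimizer will actually update."""
    return sum(
        param.numel()
        for group in optimizer.param_groups
        for param in group["params"]
        if param.requires_grad
    )

# After attaching a delta model and freezing the backbone, the count should cover
# only the delta modules (and any unfrozen task head), e.g.:
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
# print(f"Trainable parameters in optimizer: {count_optimizer_trainable_params(optimizer):,}")
```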
Version 0.3.1
We update must_try.py as a simple introduction to the core functionality of OpenDelta (a sketch of this workflow follows below).
Thanks to Weilin Zhao, we merge the long-developed parallel_adapter branch into the main branch.
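For reference, a minimal sketch of this core workflow, assuming a BERT sequence-classification backbone; the module names are specific to this backbone, and the snippet is not the exact content of must_try.py.

```python
from transformers import AutoModelForSequenceClassification
from opendelta import LoraModel

# Load a backbone model from Hugging Face Transformers.
backbone = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Attach LoRA modules to the query/value projections of the attention layers.
delta_model = LoraModel(
    backbone_model=backbone,
    modified_modules=["attention.self.query", "attention.self.value"],
)

# Freeze everything except the delta modules and the classification head.
delta_model.freeze_module(exclude=["deltas", "classifier"], set_state_dict=True)

# Print the model structure and the trainable-parameter statistics.
delta_model.log()
```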
Version 0.3.0
Updates:
Add this changelog for a granular record of updates.
The default configuration of delta models can be applied to more wrapped models.
There is less need to configure modified_modules for wrapped models such as BertForSequenceClassification or even OpenMatch.DRModel, as long as the wrapper contains a backbone model for which we support a default configuration. Note that if you customize modified_modules yourself, most PyTorch models are supported (see the sketch after this list).
LoRA and BitFit models no longer need pseudo data to instantiate the model.
BitFit models now support Conv1D with the default configuration.
Improve type hint for AutoDeltaModel.
Fix bugs in documentation.
Fix small bugs when saving a model without a config attribute.
Make the default modified modules of adapter-like methods more accurate: the adapter-like modules are now attached after the output of the attention layer and after the second feed-forward layer, in both cases before the layer norm layers.
A simple unit test folder containing development-time tests has been added for interested users.
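As noted above, wrapped models with a supported backbone can rely on the default configuration, while modified_modules can still be customized by hand. A minimal sketch under that assumption; the module names in the commented-out variant are illustrative for a BERT backbone.

```python
from transformers import BertForSequenceClassification
from opendelta import AdapterModel, LoraModel

backbone = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Wrapped model: no modified_modules needed; the default configuration of the
# inner BERT backbone is found and applied automatically.
delta_model = AdapterModel(backbone_model=backbone)

# Alternatively, customize modified_modules yourself; this works for most PyTorch models.
# delta_model = LoraModel(
#     backbone_model=backbone,
#     modified_modules=["attention.self.query", "attention.self.value"],
# )

delta_model.log()
```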
Known Issues
SoftPrompt is still not supported for a wrapped model if the model has no get_input_embeddings attribute.
Prefix Tuning is still limited to T5, GPT2, Bart, Bert, and Roberta.
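One possible workaround for the first issue is to expose get_input_embeddings on the wrapped model by delegating to the backbone, so that SoftPrompt can locate the input embedding layer. A hypothetical wrapper for illustration:

```python
import torch.nn as nn
from transformers import BertModel

class MyWrappedModel(nn.Module):
    """Hypothetical wrapper around a Hugging Face backbone."""

    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.last_hidden_state[:, 0])

    def get_input_embeddings(self):
        # Delegate to the backbone so that SoftPrompt can find the embedding layer.
        return self.bert.get_input_embeddings()
```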
Version 0.2.4
Updates
examples/examples_seq2seq and examples/examples_text-classification are deprecated and moved to legacy.
Thanks to Zhen Zhang, we provide examples_prompt, a cleaner and more general framework that unifies the delta-tuning and prompt-tuning paradigms. It is still based on Hugging Face Trainer. In this example framework, the running pipeline is a unified script; the differences between tasks, models, delta tuning methods, and even prompt-tuning paradigms are more modular and independent. Please try it out!