Fine Tuning Axolotl – API
# Axolotl - API
**Pages:** 150
---
## cli.cloud.modal_
**URL:** https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html
**Contents:**
- cli.cloud.modal_
- Classes
- ModalCloud
- Functions
- run_cmd
Modal Cloud support from CLI
Modal Cloud implementation.
Run a command inside a folder, with Modal Volume reloading before and commit on success.
**Examples:**
Example 1 (python):
```python
cli.cloud.modal_.ModalCloud(config, app=None)
```
Example 2 (python):
```python
cli.cloud.modal_.run_cmd(cmd, run_folder, volumes=None)
```
---
## core.trainers.base
**URL:** https://docs.axolotl.ai/docs/api/core.trainers.base.html
**Contents:**
- core.trainers.base
- Classes
- AxolotlTrainer
- Methods
- log
- Parameters
- push_to_hub
- store_metrics
- Parameters
Module for customized trainers
Extend the base Trainer for axolotl helpers
Log logs on the various objects watching training, including stored metrics.
Overwrite the push_to_hub method in order to force-add the tags when pushing the model on the Hub. Please refer to ~transformers.Trainer.push_to_hub for more details.
Store metrics with specified reduction type.
**Examples:**
Example 1 (python):
```python
core.trainers.base.AxolotlTrainer(
*_args,
bench_data_collator=None,
eval_data_collator=None,
dataset_tags=None,
**kwargs,
)
```
Example 2 (python):
```python
core.trainers.base.AxolotlTrainer.log(logs, start_time=None)
```
Example 3 (python):
```python
core.trainers.base.AxolotlTrainer.push_to_hub(*args, **kwargs)
```
Example 4 (python):
```python
core.trainers.base.AxolotlTrainer.store_metrics(
metrics,
train_eval='train',
reduction='mean',
)
```
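The interplay between `store_metrics` and `log` is easiest to see in code. The sketch below is illustrative only: it assumes `trainer` is an AxolotlTrainer that was already built by axolotl's trainer setup (constructing one needs a full config, model, and tokenizer, so that is not repeated here).

```python
# Illustrative sketch; `trainer` is assumed to be an already-built AxolotlTrainer.
def log_reward_metrics(trainer) -> None:
    # Metrics stored with store_metrics() accumulate until the next log() call,
    # where they are reduced (here: mean) and emitted alongside the regular logs.
    trainer.store_metrics({"rewards/accuracy": 0.75}, train_eval="train", reduction="mean")
    trainer.store_metrics({"rewards/accuracy": 0.50}, train_eval="train", reduction="mean")
    trainer.log({"loss": 1.23})  # also reports the stored mean, rewards/accuracy = 0.625
```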
---
## prompt_strategies.input_output
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html
**Contents:**
- prompt_strategies.input_output
- Classes
- RawInputOutputPrompter
- RawInputOutputStrategy
prompt_strategies.input_output
Module for plain input/output prompt pairs
prompter for raw i/o data
Prompt Strategy class for input/output pairs
**Examples:**
Example 1 (python):
```python
prompt_strategies.input_output.RawInputOutputPrompter()
```
Example 2 (python):
```python
prompt_strategies.input_output.RawInputOutputStrategy(
*args,
eos_token=None,
**kwargs,
)
```
---
## prompt_strategies.completion
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html
**Contents:**
- prompt_strategies.completion
- Classes
- CompletionPromptTokenizingStrategy
- CompletionPrompter
prompt_strategies.completion
Basic completion text
Tokenizing strategy for Completion prompts.
Prompter for completion
**Examples:**
Example 1 (python):
```python
prompt_strategies.completion.CompletionPromptTokenizingStrategy(
*args,
max_length=None,
**kwargs,
)
```
Example 2 (python):
```python
prompt_strategies.completion.CompletionPrompter()
```
---
## utils.collators.core
**URL:** https://docs.axolotl.ai/docs/api/utils.collators.core.html
**Contents:**
- utils.collators.core
basic shared collator constants
---
## monkeypatch.data.batch_dataset_fetcher
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html
**Contents:**
- monkeypatch.data.batch_dataset_fetcher
- Functions
- apply_multipack_dataloader_patch
- patch_fetchers
- patched_worker_loop
- remove_multipack_dataloader_patch
monkeypatch.data.batch_dataset_fetcher
Monkey patches for the dataset fetcher to handle batches of packed indexes.
This patch allows DataLoader to correctly process batches that contain multiple bins of packed sequences.
Apply patches to PyTorch’s DataLoader components.
Worker loop that ensures patches are applied in worker processes.
Remove the monkeypatch and restore original PyTorch DataLoader behavior.
**Examples:**
Example 1 (python):
```python
monkeypatch.data.batch_dataset_fetcher.apply_multipack_dataloader_patch()
```
Example 2 (python):
```python
monkeypatch.data.batch_dataset_fetcher.patch_fetchers()
```
Example 3 (python):
```python
monkeypatch.data.batch_dataset_fetcher.patched_worker_loop(*args, **kwargs)
```
Example 4 (python):
```python
monkeypatch.data.batch_dataset_fetcher.remove_multipack_dataloader_patch()
```
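A minimal usage sketch for the patch pair above; it assumes the installed package exposes these helpers under the `axolotl.monkeypatch.data.batch_dataset_fetcher` path (the docs list modules without the top-level `axolotl.` prefix).

```python
# A minimal sketch, assuming the package-qualified import path shown below.
from axolotl.monkeypatch.data.batch_dataset_fetcher import (
    apply_multipack_dataloader_patch,
    remove_multipack_dataloader_patch,
)

apply_multipack_dataloader_patch()  # patch PyTorch's fetchers to accept bins of packed indexes
try:
    ...  # build DataLoader(s) with a multipack batch sampler and run training here
finally:
    remove_multipack_dataloader_patch()  # restore original PyTorch DataLoader behavior
```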
---
## core.datasets.chat
**URL:** https://docs.axolotl.ai/docs/api/core.datasets.chat.html
**Contents:**
- core.datasets.chat
- Classes
- TokenizedChatDataset
Tokenized chat dataset
**Examples:**
Example 1 (python):
```python
core.datasets.chat.TokenizedChatDataset(
data,
model_transform,
*args,
message_transform=None,
formatter=None,
process_count=None,
keep_in_memory=False,
**kwargs,
)
```
---
## utils.freeze
**URL:** https://docs.axolotl.ai/docs/api/utils.freeze.html
**Contents:**
- utils.freeze
- Classes
- LayerNamePattern
- Methods
- match
- Functions
- freeze_layers_except
module to freeze/unfreeze parameters by name
Represents a regex pattern for layer names, potentially including a parameter index range.
Checks if the given layer name matches the regex pattern.
Parameters: - name (str): The layer name to check.
Returns: - bool: True if the layer name matches the pattern, False otherwise.
Freezes all layers of the given model except for the layers that match given regex patterns. Periods in the patterns are treated as literal periods, not as wildcard characters.
Parameters:
- model (nn.Module): The PyTorch model to be modified.
- regex_patterns (list of str): List of regex patterns to match layer names to keep unfrozen. Note that you cannot use a dot as a wildcard character in the patterns since it is reserved for separating layer names. Also, to match the entire layer name, the pattern should start with "^" and end with "$", otherwise it will match any part of the layer name. The range pattern part is optional and is not compiled as a regex pattern, which means you must put "$" before the range pattern if you want to match the entire layer name. E.g., ["^model.embed_tokens.weight$[:32000]", "layers.2[0-9]+.block_sparse_moe.gate.[a-z]+$"]
Returns: None; the model is modified in place.
**Examples:**
Example 1 (python):
```python
utils.freeze.LayerNamePattern(pattern)
```
Example 2 (python):
```python
utils.freeze.LayerNamePattern.match(name)
```
Example 3 (python):
```python
utils.freeze.freeze_layers_except(model, regex_patterns)
```
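A small, runnable sketch of `freeze_layers_except` on a toy model; it assumes the package-qualified import path `axolotl.utils.freeze` (the docs list the module as `utils.freeze`), and the toy layer names are purely illustrative.

```python
# A minimal sketch using a toy model; import path assumed as described above.
import torch.nn as nn

from axolotl.utils.freeze import freeze_layers_except

model = nn.ModuleDict({
    "embed_tokens": nn.Embedding(128, 16),
    "lm_head": nn.Linear(16, 128),
})

# Keep only the embedding weights trainable; every other parameter is frozen in place.
freeze_layers_except(model, ["^embed_tokens.weight$"])

for name, param in model.named_parameters():
    print(name, param.requires_grad)  # embed_tokens.weight -> True, lm_head.* -> False
```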
---
## monkeypatch.unsloth_
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html
**Contents:**
- monkeypatch.unsloth_
module for patching with unsloth optimizations
---
## utils.schemas.datasets
**URL:** https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html
**Contents:**
- utils.schemas.datasets
- Classes
- DPODataset
- KTODataset
- PretrainingDataset
- SFTDataset
- Methods
- handle_legacy_message_fields
- StepwiseSupervisedDataset
- UserDefinedDPOType
utils.schemas.datasets
Pydantic models for datasets-related configuration
DPO configuration subset
KTO configuration subset
Pretraining dataset configuration subset
SFT configuration subset
Handle backwards compatibility between legacy message field mapping and new property mapping system.
Stepwise supervised dataset configuration subset
User defined typing for DPO
User defined typing for KTO
Structure for user defined prompt types
**Examples:**
Example 1 (python):
```python
utils.schemas.datasets.DPODataset()
```
Example 2 (python):
```python
utils.schemas.datasets.KTODataset()
```
Example 3 (python):
```python
utils.schemas.datasets.PretrainingDataset()
```
Example 4 (python):
```python
utils.schemas.datasets.SFTDataset()
```
---
## core.chat.format.llama3x
**URL:** https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html
**Contents:**
- core.chat.format.llama3x
core.chat.format.llama3x
Llama 3.x chat formatting functions for MessageContents
---
## datasets
**URL:** https://docs.axolotl.ai/docs/api/datasets.html
**Contents:**
- datasets
- Classes
- TokenizedPromptDataset
- Parameters
Module containing dataset functionality.
We want this to be a wrapper for an existing dataset that we have loaded. Let's use the concept of middlewares to wrap each dataset. We'll use the collators later on to pad the datasets.
Dataset that returns tokenized prompts from a stream of text files.
**Examples:**
Example 1 (python):
```python
datasets.TokenizedPromptDataset(
prompt_tokenizer,
dataset,
process_count=None,
keep_in_memory=False,
**kwargs,
)
```
---
## prompt_strategies.bradley_terry.llama3
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html
**Contents:**
- prompt_strategies.bradley_terry.llama3
- Functions
- icr
prompt_strategies.bradley_terry.llama3
chatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template
chatml transforms for datasets with system, input, chosen, rejected ex. https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs
**Examples:**
Example 1 (python):
```python
prompt_strategies.bradley_terry.llama3.icr(cfg, **kwargs)
```
---
## common.datasets
**URL:** https://docs.axolotl.ai/docs/api/common.datasets.html
**Contents:**
- common.datasets
- Classes
- TrainDatasetMeta
- Functions
- load_datasets
- Parameters
- Returns
- load_preference_datasets
- Parameters
- Returns
Dataset loading utilities.
Dataclass with fields for training and validation datasets and metadata.
Loads one or more training or evaluation datasets, calling axolotl.utils.data.prepare_datasets. Optionally, logs out debug information.
Loads one or more training or evaluation datasets for RL training using paired preference data, calling axolotl.utils.data.rl.prepare_preference_datasets. Optionally, logs out debug information.
Randomly sample num_samples samples with replacement from dataset.
**Examples:**
Example 1 (python):
```python
common.datasets.TrainDatasetMeta(
train_dataset,
eval_dataset=None,
total_num_steps=None,
)
```
Example 2 (python):
```python
common.datasets.load_datasets(cfg, cli_args=None, debug=False)
```
Example 3 (python):
```python
common.datasets.load_preference_datasets(cfg, cli_args=None)
```
Example 4 (python):
```python
common.datasets.sample_dataset(dataset, num_samples)
```
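A minimal sketch of loading a config and the datasets it declares; the fully-qualified import paths (`axolotl.cli.config`, `axolotl.common.datasets`) are assumed, since the docs drop the top-level package name, and the config path is hypothetical.

```python
# A minimal sketch; import paths and the config file path are assumptions.
from axolotl.cli.config import load_cfg
from axolotl.common.datasets import load_datasets

cfg = load_cfg("examples/llama-3/lora-1b.yml")  # hypothetical config file
dataset_meta = load_datasets(cfg=cfg)           # returns a TrainDatasetMeta
print(len(dataset_meta.train_dataset), dataset_meta.total_num_steps)
```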
---
## cli.train
**URL:** https://docs.axolotl.ai/docs/api/cli.train.html
**Contents:**
- cli.train
- Functions
- do_cli
- Parameters
- do_train
- Parameters
CLI to run training on a model.
Parses axolotl config, CLI args, and calls do_train.
Trains a transformers model by first loading the dataset(s) specified in the axolotl config, and then calling axolotl.train.train. Also runs the plugin manager’s post_train_unload once training completes.
**Examples:**
Example 1 (python):
```python
cli.train.do_cli(config=Path('examples/'), **kwargs)
```
Example 2 (python):
```python
cli.train.do_train(cfg, cli_args)
```
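The programmatic equivalent of running `axolotl train config.yml` roughly follows the sketch below; it assumes package-qualified imports (`axolotl.cli.*`), a valid config file, and that `TrainerCliArgs` can be default-constructed.

```python
# A minimal sketch; import paths, the config path, and default CLI args are assumptions.
from axolotl.cli.args import TrainerCliArgs
from axolotl.cli.config import load_cfg
from axolotl.cli.train import do_train

cfg = load_cfg("examples/llama-3/lora-1b.yml")  # hypothetical path
do_train(cfg, cli_args=TrainerCliArgs())        # loads datasets, then calls axolotl.train.train
```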
---
## cli.utils.fetch
**URL:** https://docs.axolotl.ai/docs/api/cli.utils.fetch.html
**Contents:**
- cli.utils.fetch
- Functions
- fetch_from_github
- Parameters
Utilities for axolotl fetch CLI command.
Sync files from a specific directory in the GitHub repository. Only downloads files that don’t exist locally or have changed.
**Examples:**
Example 1 (python):
```python
cli.utils.fetch.fetch_from_github(dir_prefix, dest_dir=None, max_workers=5)
```
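A minimal usage sketch; it assumes the package-qualified path `axolotl.cli.utils.fetch`, and the destination directory is just an example.

```python
# A minimal sketch; import path and destination directory are assumptions.
from axolotl.cli.utils.fetch import fetch_from_github

# Download the repository's example configs into ./my_examples, skipping files
# that already exist locally and have not changed.
fetch_from_github("examples/", dest_dir="my_examples")
```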
---
## utils.tokenization
**URL:** https://docs.axolotl.ai/docs/api/utils.tokenization.html
**Contents:**
- utils.tokenization
- Functions
- color_token_for_rl_debug
- process_tokens_for_rl_debug
Module for tokenization utilities
Helper function to color tokens based on their type.
Helper function to process and color tokens.
**Examples:**
Example 1 (python):
```python
utils.tokenization.color_token_for_rl_debug(
decoded_token,
encoded_token,
color,
text_only,
)
```
Example 2 (python):
```python
utils.tokenization.process_tokens_for_rl_debug(
tokens,
color,
tokenizer,
text_only,
)
```
---
## core.trainers.grpo.sampler
**URL:** https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html
**Contents:**
- core.trainers.grpo.sampler
- Classes
- SequenceParallelRepeatRandomSampler
- Parameters
- Methods
- set_epoch
- Parameters
core.trainers.grpo.sampler
Repeat random sampler (similar to the one implemented in https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py) that adds sequence parallelism functionality; i.e., duplicating data across ranks in the same sequence parallel group.
Sampler for GRPO training with sequence parallelism.
This sampler ensures:
- Ranks in the same sequence parallel (SP) group receive identical data.
- Each index is repeated multiple times for sampling different completions.
- Entire batches are repeated for reuse in multiple updates.
- Data is properly distributed across SP groups.
In the table below, the values represent dataset indices. Each SP group has context_parallel_size = 2 GPUs working together on the same data. There are 2 SP groups (SP0 and SP1), with world_size = 4 total GPUs.
    global_step  step        SP0             SP1
         0        0    [0 0 0 1 1 1]   [2 2 2 3 3 3]   <- SP groups get different data
         0        1    [0 0 0 1 1 1]   [2 2 2 3 3 3]   <- Same data for each GPU in an SP group
         1        2    [0 0 0 1 1 1]   [2 2 2 3 3 3]   <- Repeat same indices for num_iterations=2 iterations
         1        3    [0 0 0 1 1 1]   [2 2 2 3 3 3]   <- When using gradient accumulation (grad_accum=2)
Sets the epoch for this sampler.
**Examples:**
Example 1 (python):
```python
core.trainers.grpo.sampler.SequenceParallelRepeatRandomSampler(
dataset,
mini_repeat_count,
world_size,
rank,
batch_size=1,
repeat_count=1,
context_parallel_size=1,
shuffle=True,
seed=0,
drop_last=False,
)
```
Example 2 (unknown):
```unknown
Sequence Parallel Groups
             |      SP0      |      SP1      |
             | GPU 0 | GPU 1 | GPU 2 | GPU 3 |
global_step  step     (mini_repeat_count=3, batch_size=2 per SP group)
     2        4       [4 4 4 5 5 5]   [6 6 6 7 7 7]
```
---
## kernels.quantize
**URL:** https://docs.axolotl.ai/docs/api/kernels.quantize.html
**Contents:**
- kernels.quantize
- Functions
- dequantize
**Examples:**
Example 1 (python):
```python
kernels.quantize.dequantize(W, quant_state=None, out=None)
```
---
## integrations.spectrum.args
**URL:** https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html
**Contents:**
- integrations.spectrum.args
- Classes
- SpectrumArgs
integrations.spectrum.args
Module for handling Spectrum input arguments.
Input args for Spectrum.
**Examples:**
Example 1 (python):
```python
integrations.spectrum.args.SpectrumArgs()
```
---
## prompt_strategies.alpaca_chat
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html
**Contents:**
- prompt_strategies.alpaca_chat
- Classes
- AlpacaChatPrompter
- AlpacaConcisePrompter
- AlpacaQAPromptTokenizingStrategy
- CamelAIPromptTokenizingStrategy
- NoSystemPrompter
prompt_strategies.alpaca_chat
Module for Alpaca prompt strategy classes
Alpaca Chat Prompter extending the system prompt to ask for chat-instruct answers
Alpaca Prompter extending the system prompt to ask for concise chat-instruct answers
Tokenizing strategy for AlpacaQA
Tokenizing strategy for CamelAI datasets
Null Prompter with no system prompts
**Examples:**
Example 1 (python):
```python
prompt_strategies.alpaca_chat.AlpacaChatPrompter()
```
Example 2 (python):
```python
prompt_strategies.alpaca_chat.AlpacaConcisePrompter(
prompt_style=PromptStyle.INSTRUCT.value,
)
```
Example 3 (python):
```python
prompt_strategies.alpaca_chat.AlpacaQAPromptTokenizingStrategy(
prompter,
tokenizer,
train_on_inputs=False,
sequence_len=2048,
)
```
Example 4 (python):
```python
prompt_strategies.alpaca_chat.CamelAIPromptTokenizingStrategy(
prompter,
tokenizer,
train_on_inputs=False,
sequence_len=2048,
)
```
---
## utils.collators.mamba
**URL:** https://docs.axolotl.ai/docs/api/utils.collators.mamba.html
**Contents:**
- utils.collators.mamba
- Classes
- MambaDataCollator
utils.collators.mamba
Collator for State Space Models (Mamba)
**Examples:**
Example 1 (python):
```python
utils.collators.mamba.MambaDataCollator(tokenizer)
```
---
## prompt_strategies.messages.chat
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html
**Contents:**
- prompt_strategies.messages.chat
- Classes
- ChatMessageDatasetWrappingStrategy
prompt_strategies.messages.chat
Chat dataset wrapping strategy for new internal messages representations
Chat dataset wrapping strategy for new internal messages representations
**Examples:**
Example 1 (python):
```python
prompt_strategies.messages.chat.ChatMessageDatasetWrappingStrategy(
processor,
message_transform=None,
formatter=None,
**kwargs,
)
```
---
## train
**URL:** https://docs.axolotl.ai/docs/api/train.html
**Contents:**
- train
- Functions
- create_model_card
- Parameters
- execute_training
- Parameters
- handle_untrained_tokens_fix
- Parameters
- save_initial_configs
- Parameters
Prepare and train a model on a dataset. Can also infer from a model or merge lora
Create a model card for the trained model if needed.
Execute the training process with appropriate SDP kernel configurations.
Apply fixes for untrained tokens if configured.
Save initial configurations before training.
Save the trained model according to configuration and training setup.
Load the tokenizer, processor (for multimodal models), and model based on configuration.
Load model, tokenizer, trainer, etc. Helper function to encapsulate the full trainer setup.
Set up the Axolotl badge and add the Axolotl config to the model card if available.
Set up the reference model for RL training if needed.
Set up signal handler for graceful termination.
Train a model on the given dataset.
**Examples:**
Example 1 (python):
```python
train.create_model_card(cfg, trainer)
```
Example 2 (python):
```python
train.execute_training(cfg, trainer, resume_from_checkpoint)
```
Example 3 (python):
```python
train.handle_untrained_tokens_fix(
cfg,
model,
tokenizer,
train_dataset,
safe_serialization,
)
```
Example 4 (python):
```python
train.save_initial_configs(cfg, tokenizer, model, peft_config, processor)
```
---
## cli.utils.load
**URL:** https://docs.axolotl.ai/docs/api/cli.utils.load.html
**Contents:**
- cli.utils.load
- Functions
- load_model_and_tokenizer
- Parameters
- Returns
Utilities for model, tokenizer, etc. loading.
Helper function for loading a model, tokenizer, and processor specified in the given axolotl config.
**Examples:**
Example 1 (python):
```python
cli.utils.load.load_model_and_tokenizer(cfg, inference=False)
```
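A minimal usage sketch; the package-qualified paths `axolotl.cli.config` and `axolotl.cli.utils.load` and the config path are assumptions.

```python
# A minimal sketch; import paths and the config path are assumptions.
from axolotl.cli.config import load_cfg
from axolotl.cli.utils.load import load_model_and_tokenizer

cfg = load_cfg("path/to/config.yml")  # hypothetical path
loaded = load_model_and_tokenizer(cfg=cfg, inference=True)
# `loaded` bundles the model and tokenizer (plus a processor for multimodal configs),
# ready for generation or evaluation.
```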
---
## loaders.model
**URL:** https://docs.axolotl.ai/docs/api/loaders.model.html
**Contents:**
- loaders.model
- Classes
- ModelLoader
- The loading process includes
- Attributes
- Methods
- load
- Returns
Model loader class implementation for loading, configuring, and patching various models.
Manages model configuration, initialization and application of patches during model loading.
This class orchestrates the entire process of loading a model from configuration to final preparation. It handles device mapping, quantization, attention mechanisms, adapter integration, and various optimizations.
Load and prepare the model with all configurations and patches.
**Examples:**
Example 1 (python):
```python
loaders.model.ModelLoader(
cfg,
tokenizer,
*,
inference=False,
reference_model=False,
**kwargs,
)
```
Example 2 (python):
```python
loaders.model.ModelLoader.load()
```
---
## utils.distributed
**URL:** https://docs.axolotl.ai/docs/api/utils.distributed.html
**Contents:**
- utils.distributed
- Functions
- barrier
- cleanup_distributed
- compute_and_broadcast
- gather_from_all_ranks
- gather_scalar_from_all_ranks
- is_distributed
- is_main_process
- Returns
Utilities for distributed functionality.
Acts as a barrier to wait for all processes. This ensures that all processes reach the barrier before proceeding further.
Destroy process group if torch distributed is initialized. Called in training early termination or when training successfully completes.
Compute a value using the function ‘fn’ only on the specified rank (default is 0). The value is then broadcasted to all other ranks.
Args: - fn (callable): A function that computes the value. This should not have any side effects. - rank (int, optional): The rank that computes the value. Default is 0.
Returns: - The computed value (int or float).
Run a callable ‘fn’ on all ranks and gather the results on the specified rank.
Args: - fn (callable): A function that computes the value. This should not have any side effects. - rank (int, optional): The rank that gathers the values. Default is 0. - world_size (int, optional): Total number of processes in the current distributed setup.
Returns: - A list of computed values from all ranks if on the gathering rank, otherwise None.
Run a callable ‘fn’ on all ranks and gather the results on the specified rank.
Args: - fn (callable): A function that computes the value. This should not have any side effects. - rank (int, optional): The rank that gathers the values. Default is 0. - world_size (int, optional): Total number of processes in the current distributed setup.
Returns: - A list of computed values from all ranks if on the gathering rank, otherwise None.
Check if distributed training is initialized.
Check if the current process is the main process. If not in distributed mode, always return True.
We use a simpler logic when the distributed state is not initialized: we just log on the 0-th local rank.
Run a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’, and then broadcast the reduced result to all ranks.
Args: - fn1 (callable): A function that computes the value on each rank. - fn2 (callable): A reduction function that takes a list of values and returns a single value. - world_size (int, optional): Total number of processes in the current distributed setup.
Returns: - The reduced and broadcasted value.
runs the wrapped context so that rank 0 runs first before other ranks
**Examples:**
Example 1 (python):
```python
utils.distributed.barrier()
```
Example 2 (python):
```python
utils.distributed.cleanup_distributed()
```
Example 3 (python):
```python
utils.distributed.compute_and_broadcast(fn)
```
Example 4 (python):
```python
utils.distributed.gather_from_all_ranks(fn, world_size=1)
```
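A small sketch of the most common pattern with these helpers, letting rank 0 do one-off work while other ranks wait; the package-qualified path `axolotl.utils.distributed` is assumed.

```python
# A minimal sketch; import path assumed. Per the docs above, is_main_process()
# returns True when torch.distributed is not initialized.
from axolotl.utils.distributed import barrier, is_distributed, is_main_process

if is_main_process():
    # Only rank 0 does one-off work such as downloading or preprocessing data.
    print("main process: doing the one-off work")

if is_distributed():
    barrier()  # every rank waits here until rank 0 has finished
```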
---
## cli.config
**URL:** https://docs.axolotl.ai/docs/api/cli.config.html
**Contents:**
- cli.config
- Functions
- check_remote_config
- Parameters
- Returns
- Raises
- choose_config
- Parameters
- Returns
- Raises
Configuration loading and processing.
First, determines if the passed config is a valid HTTPS URL. Then, attempts to query for it and parse its content, first as JSON, then as YAML (YAML is preferred). Finally, the parsed content is written to a local file and its path is returned.
Helper method for choosing an axolotl config YAML file (considering only files ending with .yml or .yaml). If more than one config file exists in the passed path, the user is prompted to choose one.
Loads the axolotl configuration stored at config, validates it, and performs various setup.
Registers the plugins for the given configuration.
**Examples:**
Example 1 (python):
```python
cli.config.check_remote_config(config)
```
Example 2 (python):
```python
cli.config.choose_config(path)
```
Example 3 (python):
```python
cli.config.load_cfg(config=Path('examples/'), **kwargs)
```
Example 4 (python):
```python
cli.config.prepare_plugins(cfg)
```
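A minimal sketch tying these helpers together; the package-qualified path `axolotl.cli.config` is assumed, and the remote URL and local directory are hypothetical.

```python
# A minimal sketch; import path, URL, and directory are assumptions.
from axolotl.cli.config import check_remote_config, choose_config, load_cfg

# Remote HTTPS configs are downloaded, parsed (JSON or YAML), and written to a
# local file; the returned value is that local path.
local_path = check_remote_config("https://example.com/configs/lora.yml")  # hypothetical URL

# choose_config() picks a .yml/.yaml file from a directory, prompting if several exist.
chosen = choose_config("examples/")

cfg = load_cfg(chosen)  # validate the chosen config and perform setup
```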
---
## cli.checks
**URL:** https://docs.axolotl.ai/docs/api/cli.checks.html
**Contents:**
- cli.checks
- Functions
- check_accelerate_default_config
- check_user_token
- Returns
- Raises
Various checks for Axolotl CLI.
Logs at warning level if no accelerate config file is found.
Checks for HF user info. Check is skipped if HF_HUB_OFFLINE=1.
**Examples:**
Example 1 (python):
```python
cli.checks.check_accelerate_default_config()
```
Example 2 (python):
```python
cli.checks.check_user_token()
```
---
## prompt_strategies.llama2_chat
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html
**Contents:**
- prompt_strategies.llama2_chat
- Classes
- LLama2ChatTokenizingStrategy
- Llama2ChatConversation
- Methods
- append_message
- get_prompt
- Llama2ChatPrompter
prompt_strategies.llama2_chat
Prompt Strategy for fine-tuning Llama2 chat models; see also https://github.com/facebookresearch/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/generation.py#L213 for a reference implementation.
This implementation is based on the Vicuna PR and the fastchat repo, see also: https://github.com/lm-sys/FastChat/blob/cdd7730686cb1bf9ae2b768ee171bdf7d1ff04f3/fastchat/conversation.py#L847
Use dataset type: "llama2_chat" in config.yml to use this prompt style.
E.g. in the config.yml:
The dataset itself should look like the JSONL record shown in Example 2 below. The first message should be from the human, the second from gpt. For a custom system message, the first "from" can be "system" (followed by alternating "human" and "gpt" turns).
Important: Don’t use “special_tokens:” in your config.yml if you are not sure what you are doing!
Tokenizing strategy for Llama2 prompts, adapted from https://github.com/lm-sys/FastChat/blob/main/fastchat/train/train.py
A class that manages prompt templates and keeps all conversation history, copied from https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
Append a new message.
Get the prompt for generation.
A prompter that generates prompts for Llama2 models.
**Examples:**
Example 1 (unknown):
```unknown
datasets:
- path: llama_finetune_train.jsonl
type: llama2_chat
```
Example 2 (unknown):
```unknown
{'conversations':[{"from": "human", "value": "Who are you?"}, {"from": "gpt", "value": "I am Vicuna"},...]}
```
Example 3 (python):
```python
prompt_strategies.llama2_chat.LLama2ChatTokenizingStrategy(*args, **kwargs)
```
Example 4 (python):
```python
prompt_strategies.llama2_chat.Llama2ChatConversation(
name='llama2',
system="[INST] <>nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.nnIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.n<>nn",
roles=('[INST]', '[/INST]'),
messages=list(),
offset=0,
)
```
---
## cli.utils
**URL:** https://docs.axolotl.ai/docs/api/cli.utils.html
**Contents:**
- cli.utils
Init for axolotl.cli.utils module.
---
## cli.utils.args
**URL:** https://docs.axolotl.ai/docs/api/cli.utils.args.html
**Contents:**
- cli.utils.args
- Functions
- add_options_from_config
- Parameters
- Returns
- add_options_from_dataclass
- Parameters
- Returns
- filter_none_kwargs
- Parameters
Utilities for axolotl CLI args.
Create Click options from the fields of a Pydantic model.
Create Click options from the fields of a dataclass.
Wraps function to remove None-valued kwargs.
**Examples:**
Example 1 (python):
```python
cli.utils.args.add_options_from_config(config_class)
```
Example 2 (python):
```python
cli.utils.args.add_options_from_dataclass(config_class)
```
Example 3 (python):
```python
cli.utils.args.filter_none_kwargs(func)
```
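A small, runnable sketch of `filter_none_kwargs`; the package-qualified path `axolotl.cli.utils.args` is assumed, and the decorated function is purely illustrative.

```python
# A minimal sketch; import path assumed, example function hypothetical.
from axolotl.cli.utils.args import filter_none_kwargs

@filter_none_kwargs
def launch(config, learning_rate=3e-4, seed=42):
    return {"config": config, "learning_rate": learning_rate, "seed": seed}

# None-valued kwargs (typical for unset Click options) are dropped, so the function's
# own defaults win instead of being overwritten with None.
print(launch("cfg.yml", learning_rate=None, seed=1234))
# -> {'config': 'cfg.yml', 'learning_rate': 0.0003, 'seed': 1234}
```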
---
## integrations.grokfast.optimizer
**URL:** https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html
**Contents:**
- integrations.grokfast.optimizer
integrations.grokfast.optimizer
---
## core.builders.causal
**URL:** https://docs.axolotl.ai/docs/api/core.builders.causal.html
**Contents:**
- core.builders.causal
- Classes
- HFCausalTrainerBuilder
Builder for causal trainers
Build the HuggingFace training args/trainer for causal models and reward modeling using TRL.
**Examples:**
Example 1 (python):
```python
core.builders.causal.HFCausalTrainerBuilder(
cfg,
model,
tokenizer,
processor=None,
)
```
---
## prompt_strategies.dpo.user_defined
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html
**Contents:**
- prompt_strategies.dpo.user_defined
prompt_strategies.dpo.user_defined
User-defined DPO strategies
---
## cli.evaluate
**URL:** https://docs.axolotl.ai/docs/api/cli.evaluate.html
**Contents:**
- cli.evaluate
- Functions
- do_cli
- Parameters
- do_evaluate
- Parameters
CLI to run evaluation on a model.
Parses axolotl config, CLI args, and calls do_evaluate.
Evaluates a transformers model by first loading the dataset(s) specified in the axolotl config, and then calling axolotl.evaluate.evaluate, which computes evaluation metrics on the given dataset(s) and writes them to disk.
**Examples:**
Example 1 (python):
```python
cli.evaluate.do_cli(config=Path('examples/'), **kwargs)
```
Example 2 (python):
```python
cli.evaluate.do_evaluate(cfg, cli_args)
```
---
## utils.schemas.utils
**URL:** https://docs.axolotl.ai/docs/api/utils.schemas.utils.html
**Contents:**
- utils.schemas.utils
- Functions
- handle_legacy_message_fields_logic
- Parameters
- Returns
- Raises
Utilities for Axolotl Pydantic models
Handle backwards compatibility between legacy message field mapping and new property mapping system.
Previously, the config only supported mapping 'role' and 'content' fields via dedicated config options:
- message_field_role: Mapped to the role field
- message_field_content: Mapped to the content field
The new system uses message_property_mappings to support arbitrary field mappings, e.g. role: source_role_field, content: source_content_field, additional_field: source_field.
**Examples:**
Example 1 (python):
```python
utils.schemas.utils.handle_legacy_message_fields_logic(data)
```
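A rough sketch of the migration helper in action; the package-qualified path `axolotl.utils.schemas.utils` is assumed, and the legacy dataset config dict is illustrative.

```python
# A rough sketch; import path and dict keys' handling are assumptions.
from axolotl.utils.schemas.utils import handle_legacy_message_fields_logic

legacy_dataset_cfg = {
    "path": "my/dataset",              # hypothetical dataset path
    "message_field_role": "from",      # legacy role mapping
    "message_field_content": "value",  # legacy content mapping
}

# The legacy fields are folded into the new message_property_mappings system,
# so downstream code only has to handle one mapping format.
migrated = handle_legacy_message_fields_logic(legacy_dataset_cfg)
print(migrated)
```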
---
## prompt_strategies.alpaca_instruct
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html
**Contents:**
- prompt_strategies.alpaca_instruct
prompt_strategies.alpaca_instruct
Module loading the AlpacaInstructPromptTokenizingStrategy class
---
## utils.callbacks.lisa
**URL:** https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html
**Contents:**
- utils.callbacks.lisa
Adapted from https://github.com/OptimalScale/LMFlow/pull/701 for HF transformers & Axolotl. Arxiv: https://arxiv.org/abs/2403.17919. License: Apache 2.0.
---
## models.mamba.modeling_mamba
**URL:** https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html
**Contents:**
- models.mamba.modeling_mamba
models.mamba.modeling_mamba
---
## prompt_strategies.metharme
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html
**Contents:**
- prompt_strategies.metharme
- Classes
- MetharmePromptTokenizingStrategy
- MetharmePrompter
prompt_strategies.metharme
Module containing the MetharmePromptTokenizingStrategy and MetharmePrompter classes
Tokenizing strategy for the Metharme models
Prompter for the Metharme models.
**Examples:**
Example 1 (python):
```python
prompt_strategies.metharme.MetharmePromptTokenizingStrategy(
prompter,
tokenizer,
train_on_inputs=False,
sequence_len=2048,
)
```
Example 2 (python):
```python
prompt_strategies.metharme.MetharmePrompter(*args, **kwargs)
```
---
## core.trainers.mamba
**URL:** https://docs.axolotl.ai/docs/api/core.trainers.mamba.html
**Contents:**
- core.trainers.mamba
- Classes
- AxolotlMambaTrainer
Module for mamba trainer
Mamba specific trainer to handle loss calculation
**Examples:**
Example 1 (python):
```python
core.trainers.mamba.AxolotlMambaTrainer(
*_args,
bench_data_collator=None,
eval_data_collator=None,
dataset_tags=None,
**kwargs,
)
```
---
## utils.ctx_managers.sequence_parallel
**URL:** https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html
**Contents:**
- utils.ctx_managers.sequence_parallel
- Classes
- AllGatherWithGrad
- Methods
- backward
- Parameters
- Returns
- forward
- Parameters
- Returns
utils.ctx_managers.sequence_parallel
Module for Axolotl trainer sequence parallelism manager and utilities
Custom autograd function for all-gather to preserve gradients.
Backward pass for all-gather operation.
Extracts the gradient slice corresponding to this rank’s original input from the full gradient tensor.
Forward pass of all-gather of data with sequence dimension.
Context manager for sequence parallelism operations.
This class provides a context that will automatically apply sequence parallelism during model forward passes using a pre-forward hook, and gather outputs from across the sequence parallelism group using a post-forward hook.
Apply sequence parallelism slicing to a batch.
Special handling is implemented for integer logits_to_keep, which indicates to only keep the last N tokens in the sequence during generation.
**Examples:**
Example 1 (python):
```python
utils.ctx_managers.sequence_parallel.AllGatherWithGrad()
```
Example 2 (python):
```python
utils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(
ctx,
grad_output,
)
```
Example 3 (python):
```python
utils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(
ctx,
input_tensor,
group,
)
```
Example 4 (python):
```python
utils.ctx_managers.sequence_parallel.SequenceParallelContextManager(
models,
context_parallel_size,
gradient_accumulation_steps,
ring_attn_func,
heads_k_stride,
gather_outputs,
device_mesh=None,
)
```
---
## utils.callbacks.qat
**URL:** https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html
**Contents:**
- utils.callbacks.qat
- Classes
- QATCallback
- Functions
- toggle_fake_quant
- Parameters
QAT Callback for HF Causal Trainer
Callback to toggle fake quantization for the model.
Toggle fake quantization for any fake quantized linear or embedding layers in the model.
**Examples:**
Example 1 (python):
```python
utils.callbacks.qat.QATCallback(cfg)
```
Example 2 (python):
```python
utils.callbacks.qat.toggle_fake_quant(mod, enable)
```
---
## prompt_strategies.dpo.zephyr
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html
**Contents:**
- prompt_strategies.dpo.zephyr
prompt_strategies.dpo.zephyr
DPO strategies for zephyr
---
## kernels.utils
**URL:** https://docs.axolotl.ai/docs/api/kernels.utils.html
**Contents:**
- kernels.utils
Utilities for axolotl.kernels submodules.
---
## monkeypatch.multipack
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html
**Contents:**
- monkeypatch.multipack
monkeypatch.multipack
multipack patching for v2 of sample packing
---
## cli.main
**URL:** https://docs.axolotl.ai/docs/api/cli.main.html
**Contents:**
- cli.main
- Functions
- cli
- evaluate
- Parameters
- fetch
- Parameters
- inference
- Parameters
- merge_lora
Click CLI definitions for various axolotl commands.
Axolotl CLI - Train and fine-tune large language models
Fetch example configs or other resources.
Available directories:
- examples: Example configuration files
- deepspeed_configs: DeepSpeed configuration files
Run inference with a trained model.
Merge trained LoRA adapters into a base model.
Merge sharded FSDP model weights.
Preprocess datasets before training.
Train or fine-tune a model.
**Examples:**
Example 1 (python):
```python
cli.main.cli()
```
Example 2 (python):
```python
cli.main.evaluate(ctx, config, launcher, **kwargs)
```
Example 3 (python):
```python
cli.main.fetch(directory, dest)
```
Example 4 (python):
```python
cli.main.inference(ctx, config, launcher, gradio, **kwargs)
```
---
## core.trainers.mixins.optimizer
**URL:** https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html
**Contents:**
- core.trainers.mixins.optimizer
- Classes
- OptimizerInitMixin
- OptimizerMixin
core.trainers.mixins.optimizer
Module for Axolotl trainer optimizer mixin
Mixin to handle common optimizer initialization logic for Trainers (mostly TRL) that do not accept optimizer_cls_and_kwargs as kwarg in constructor.
Mixin class for shared handling of building custom optimizers
**Examples:**
Example 1 (python):
```python
core.trainers.mixins.optimizer.OptimizerInitMixin(*args, **kwargs)
```
Example 2 (python):
```python
core.trainers.mixins.optimizer.OptimizerMixin()
```
---
## integrations.kd.trainer
**URL:** https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html
**Contents:**
- integrations.kd.trainer
- Classes
- AxolotlKDTrainer
- Methods
- compute_loss
integrations.kd.trainer
Custom trainer subclass for Knowledge Distillation (KD)
How the loss is computed by Trainer. By default, all models return the loss in the first element.
Subclass and override for custom behavior.
**Examples:**
Example 1 (python):
```python
integrations.kd.trainer.AxolotlKDTrainer(*args, **kwargs)
```
Example 2 (python):
```python
integrations.kd.trainer.AxolotlKDTrainer.compute_loss(
model,
inputs,
return_outputs=False,
num_items_in_batch=None,
)
```
---
## integrations.lm_eval.args
**URL:** https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html
**Contents:**
- integrations.lm_eval.args
- Classes
- LMEvalArgs
integrations.lm_eval.args
Module for handling lm eval harness input arguments.
Input args for lm eval harness
**Examples:**
Example 1 (python):
```python
integrations.lm_eval.args.LMEvalArgs()
```
---
## integrations.cut_cross_entropy.args
**URL:** https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html
**Contents:**
- integrations.cut_cross_entropy.args
- Classes
- CutCrossEntropyArgs
integrations.cut_cross_entropy.args
Module for handling Cut Cross Entropy input arguments.
Input args for Cut Cross Entropy.
**Examples:**
Example 1 (python):
```python
integrations.cut_cross_entropy.args.CutCrossEntropyArgs()
```
---
## monkeypatch.mistral_attn_hijack_flash
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html
**Contents:**
- monkeypatch.mistral_attn_hijack_flash
monkeypatch.mistral_attn_hijack_flash
Flash attention monkey patch for mistral model
---
## loaders.constants
**URL:** https://docs.axolotl.ai/docs/api/loaders.constants.html
**Contents:**
- loaders.constants
Shared constants for axolotl.loaders module
---
## utils.bench
**URL:** https://docs.axolotl.ai/docs/api/utils.bench.html
**Contents:**
- utils.bench
- Functions
- check_cuda_device
Benchmarking and measurement utilities
Wraps a function and returns the default value instead of running the wrapped function if CUDA isn't available or the device is set to "auto".
**Examples:**
Example 1 (python):
```python
utils.bench.check_cuda_device(default_value)
```
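A small usage sketch of the decorator above; the package-qualified path `axolotl.utils.bench` is assumed, and the decorated function is illustrative.

```python
# A minimal sketch; import path assumed, example function hypothetical.
import torch

from axolotl.utils.bench import check_cuda_device

@check_cuda_device(0.0)
def gpu_memory_allocated_gib(device=0):
    return torch.cuda.memory_allocated(device) / 1024**3

# On a CUDA machine this reports allocated GiB; without CUDA (or with device="auto")
# the wrapper short-circuits and returns the default value 0.0 instead.
print(gpu_memory_allocated_gib(device=0))
```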
---
## utils.trainer
**URL:** https://docs.axolotl.ai/docs/api/utils.trainer.html
**Contents:**
- utils.trainer
- Functions
- add_pose_position_ids
- add_position_ids
- drop_long_seq
- setup_trainer
- Parameters
- Returns
Module containing the Trainer class and related functions
Use the PoSE technique to extend the context length by randomly skipping positions in the context. We only want to skip right before tokens in the split_on_token_ids list. We should attempt to randomly distribute the skips, but we don't need the final position_ids to cover the full context_len. There may be multiple turns in the context, so we want to make sure we take into account the maximum possible number of skips remaining in each sample.
Handle both single-example and batched data:
- single example: sample['input_ids'] is a list[int]
- batched data: sample['input_ids'] is a list[list[int]]
Drop samples whose sequence length is either too long (> sequence_len) or too short (< min_sequence_len).
Works for both single-example (list[int]) or batched (list[list[int]]).
Helper method for instantiating and building a (causal or RLHF) trainer.
**Examples:**
Example 1 (python):
```python
utils.trainer.add_pose_position_ids(
sample,
max_context_len=32768,
split_on_token_ids=None,
chunks=2,
)
```
Example 2 (python):
```python
utils.trainer.add_position_ids(sample)
```
Example 3 (python):
```python
utils.trainer.drop_long_seq(sample, sequence_len=2048, min_sequence_len=2)
```
Example 4 (python):
```python
utils.trainer.setup_trainer(
cfg,
train_dataset,
eval_dataset,
model,
tokenizer,
processor,
total_num_steps,
model_ref=None,
peft_config=None,
)
```
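A minimal sketch of the two per-sample helpers; it assumes the package-qualified path `axolotl.utils.trainer` and a plain single-example dict, which matches the single-example shape described above.

```python
# A minimal sketch; import path assumed, toy sample illustrative.
from axolotl.utils.trainer import add_position_ids, drop_long_seq

sample = {"input_ids": [101, 7592, 2088, 102]}

# Length-based predicate over the sample (2 <= len <= 2048 for this 4-token example).
print(drop_long_seq(sample, sequence_len=2048, min_sequence_len=2))

# Attaches the position_ids field used by the packed-sequence collators.
print(add_position_ids(sample))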
---
## utils.schemas.config
**URL:** https://docs.axolotl.ai/docs/api/utils.schemas.config.html
**Contents:**
- utils.schemas.config
- Classes
- AxolotlConfigWCapabilities
- AxolotlInputConfig
Module with Pydantic models for configuration.
Wrapper to validate GPU capabilities with the configured options
Wrapper of all config options.
**Examples:**
Example 1 (python):
```python
utils.schemas.config.AxolotlConfigWCapabilities()
```
Example 2 (python):
```python
utils.schemas.config.AxolotlInputConfig()
```
---
## cli.args
**URL:** https://docs.axolotl.ai/docs/api/cli.args.html
**Contents:**
- cli.args
- Classes
- EvaluateCliArgs
- InferenceCliArgs
- PreprocessCliArgs
- QuantizeCliArgs
- TrainerCliArgs
- VllmServeCliArgs
Module for axolotl CLI command arguments.
Dataclass with CLI arguments for axolotl evaluate command.
Dataclass with CLI arguments for axolotl inference command.
Dataclass with CLI arguments for axolotl preprocess command.
Dataclass with CLI arguments for axolotl quantize command.
Dataclass with CLI arguments for axolotl train command.
Dataclass with CLI arguments for axolotl vllm-serve command.
**Examples:**
Example 1 (python):
```python
cli.args.EvaluateCliArgs(
debug=False,
debug_text_only=False,
debug_num_examples=0,
)
```
Example 2 (python):
```python
cli.args.InferenceCliArgs(prompter=None)
```
Example 3 (python):
```python
cli.args.PreprocessCliArgs(
debug=False,
debug_text_only=False,
debug_num_examples=1,
prompter=None,
download=True,
iterable=False,
)
```
Example 4 (python):
```python
cli.args.QuantizeCliArgs(
base_model=None,
weight_dtype=None,
activation_dtype=None,
quantize_embedding=None,
group_size=None,
output_dir=None,
hub_model_id=None,
)
```
---
## common.architectures
**URL:** https://docs.axolotl.ai/docs/api/common.architectures.html
**Contents:**
- common.architectures
Common architecture specific constants
---
## cli.merge_sharded_fsdp_weights
**URL:** https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html
**Contents:**
- cli.merge_sharded_fsdp_weights
- Classes
- BFloat16CastPlanner
- Functions
- do_cli
- Parameters
- merge_fsdp_weights
- Parameters
- Raises
cli.merge_sharded_fsdp_weights
CLI to merge sharded FSDP model checkpoints into a single combined checkpoint.
A custom planner to cast tensors to bfloat16 on the fly during loading.
Parses axolotl config, CLI args, and calls merge_fsdp_weights.
Merge the weights from sharded FSDP model checkpoints into a single combined checkpoint. Should be used if SHARDED_STATE_DICT was used for the model. Weights will be saved to {output_path}/model.safetensors if safe_serialization else pytorch_model.bin.
Note: this is a CPU-bound process.
**Examples:**
Example 1 (python):
```python
cli.merge_sharded_fsdp_weights.BFloat16CastPlanner()
```
Example 2 (python):
```python
cli.merge_sharded_fsdp_weights.do_cli(config=Path('examples/'), **kwargs)
```
Example 3 (python):
```python
cli.merge_sharded_fsdp_weights.merge_fsdp_weights(
checkpoint_dir,
output_path,
safe_serialization=False,
remove_checkpoint_dir=False,
)
```
---
## utils.data.streaming
**URL:** https://docs.axolotl.ai/docs/api/utils.data.streaming.html
**Contents:**
- utils.data.streaming
Data handling specific to streaming datasets.
---
## core.chat.format.chatml
**URL:** https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html
**Contents:**
- core.chat.format.chatml
core.chat.format.chatml
ChatML transformation functions for MessageContents
---
## prompt_strategies.kto.chatml
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html
**Contents:**
- prompt_strategies.kto.chatml
- Functions
- argilla_chat
- intel
- ultra
prompt_strategies.kto.chatml
KTO strategies for chatml
for argilla/kto-mix-15k conversations
For Intel Orca KTO ex: argilla/distilabel-intel-orca-kto
for ultrafeedback binarized conversations ex: argilla/ultrafeedback-binarized-preferences-cleaned-kto
**Examples:**
Example 1 (python):
```python
prompt_strategies.kto.chatml.argilla_chat(cfg, **kwargs)
```
Example 2 (python):
```python
prompt_strategies.kto.chatml.intel(cfg, **kwargs)
```
Example 3 (python):
```python
prompt_strategies.kto.chatml.ultra(cfg, **kwargs)
```
---
## utils.schemas.trl
**URL:** https://docs.axolotl.ai/docs/api/utils.schemas.trl.html
**Contents:**
- utils.schemas.trl
- Classes
- TRLConfig
Pydantic models for TRL trainer configuration
**Examples:**
Example 1 (python):
```python
utils.schemas.trl.TRLConfig()
```
---
## monkeypatch.llama_attn_hijack_xformers
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html
**Contents:**
- monkeypatch.llama_attn_hijack_xformers
monkeypatch.llama_attn_hijack_xformers
Directly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments
---
## kernels.geglu
**URL:** https://docs.axolotl.ai/docs/api/kernels.geglu.html
**Contents:**
- kernels.geglu
- Functions
- geglu_backward
- Parameters
- Returns
- Note
- geglu_forward
- Parameters
- Returns
Module for definition of GEGLU Triton kernels.
See “GLU Variants Improve Transformer” (https://arxiv.org/abs/2002.05202).
Credit to unsloth (https://unsloth.ai/) for inspiration for this implementation.
GEGLU backward pass using in-place operations.
This function modifies its input tensors in-place to store results.
**Examples:**
Example 1 (python):
```python
kernels.geglu.geglu_backward(grad_output, gate, up)
```
Example 2 (python):
```python
kernels.geglu.geglu_forward(gate, up)
```
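A minimal sketch of the forward kernel; it assumes the package-qualified path `axolotl.kernels.geglu`, and since these are Triton kernels a CUDA device is required.

```python
# A minimal sketch; import path assumed, requires a CUDA device for the Triton kernel.
import torch

from axolotl.kernels.geglu import geglu_forward

gate = torch.randn(2, 16, 512, device="cuda", dtype=torch.float16)
up = torch.randn_like(gate)

out = geglu_forward(gate, up)  # roughly GELU(gate) * up, computed by the Triton kernel
print(out.shape)               # torch.Size([2, 16, 512])
```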
---
## utils.callbacks.profiler
**URL:** https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html
**Contents:**
- utils.callbacks.profiler
- Classes
- PytorchProfilerCallback
utils.callbacks.profiler
HF Trainer callback for creating pytorch profiling snapshots
PyTorch Profiler callback to create snapshots of GPU memory usage at specified steps.
**Examples:**
Example 1 (python):
```python
utils.callbacks.profiler.PytorchProfilerCallback(
steps_to_profile=5,
profiler_steps_start=0,
)
```
---
## kernels.lora
**URL:** https://docs.axolotl.ai/docs/api/kernels.lora.html
**Contents:**
- kernels.lora
- Classes
- LoRA_MLP
- Methods
- backward
- Parameters
- Returns
- forward
- Parameters
- Returns
Module for definition of Low-Rank Adaptation (LoRA) Triton kernels.
See “LoRA: Low-Rank Adaptation of Large Language Models” (https://arxiv.org/abs/2106.09685).
Credit to unsloth (https://unsloth.ai/) for inspiration for this implementation.
Optimized LoRA MLP implementation.
Performs backward pass computation for LoRA MLP.
Forward pass for LoRA MLP.
Optimized LoRA implementation for output projection.
Backward pass computing gradients for LoRA output projection.
Forward pass for output projection with LoRA.
Optimized LoRA QKV implementation with quantization support.
Implements efficient computation of query, key, value projections with LoRA, supporting quantization and memory optimization.
Backward pass computing gradients for LoRA QKV.
Forward pass computing Q, K, V projections with LoRA.
Applies LoRA to MLP layer with GEGLU activation.
Applies LoRA to MLP layer with SwiGLU activation.
Applies LoRA to output projection layer.
Applies LoRA to compute Query, Key, Value projections.
Gets LoRA parameters from a projection module.
Efficient fused matmul + LoRA computation.
**Examples:**
Example 1 (python):
```python
kernels.lora.LoRA_MLP()
```
Example 2 (python):
```python
kernels.lora.LoRA_MLP.backward(ctx, grad_output)
```
Example 3 (python):
```python
kernels.lora.LoRA_MLP.forward(
ctx,
X,
gate_weight,
gate_bias,
gate_quant,
gate_A,
gate_B,
gate_scale,
up_weight,
up_bias,
up_quant,
up_A,
up_B,
up_scale,
down_weight,
down_bias,
down_quant,
down_A,
down_B,
down_scale,
activation_fn,
activation_fn_backward,
inplace=True,
)
```
Example 4 (python):
```python
kernels.lora.LoRA_O()
```
---
## monkeypatch.trainer_fsdp_optim
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html
**Contents:**
- monkeypatch.trainer_fsdp_optim
- Functions
- patch_training_loop_for_fsdp
monkeypatch.trainer_fsdp_optim
Fix for FSDP optimizer save in trainer with transformers 4.47.0
monkeypatch for fixing the training loop for fsdp with optimizer save
**Examples:**
Example 1 (python):
```python
monkeypatch.trainer_fsdp_optim.patch_training_loop_for_fsdp()
```
---
## utils.schemas.multimodal
**URL:** https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html
**Contents:**
- utils.schemas.multimodal
- Classes
- MultiModalConfig
- Methods
- convert_image_resize_algorithm
utils.schemas.multimodal
Pydantic models for multimodal-related configuration
Multi-modal configuration subset
Convert the image resize algorithm to a PIL.Image.Resampling enum.
**Examples:**
Example 1 (python):
```python
utils.schemas.multimodal.MultiModalConfig()
```
Example 2 (python):
```python
utils.schemas.multimodal.MultiModalConfig.convert_image_resize_algorithm(
image_resize_algorithm,
)
```
---
## prompt_strategies.dpo.llama3
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html
**Contents:**
- prompt_strategies.dpo.llama3
- Functions
- argilla_chat
- icr
- intel
- ultra
prompt_strategies.dpo.llama3
DPO strategies for llama-3 chat template
for argilla/dpo-mix-7k conversations
chatml transforms for datasets with system, input, chosen, rejected ex. https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs
For Intel Orca DPO Pairs
for ultrafeedback binarized conversations
**Examples:**
Example 1 (python):
```python
prompt_strategies.dpo.llama3.argilla_chat(cfg, **kwargs)
```
Example 2 (python):
```python
prompt_strategies.dpo.llama3.icr(cfg, **kwargs)
```
Example 3 (python):
```python
prompt_strategies.dpo.llama3.intel(cfg, **kwargs)
```
Example 4 (python):
```python
prompt_strategies.dpo.llama3.ultra(cfg, **kwargs)
```
---
## core.chat.format.shared
**URL:** https://docs.axolotl.ai/docs/api/core.chat.format.shared.html
**Contents:**
- core.chat.format.shared
core.chat.format.shared
shared functions for format transforms
---
## monkeypatch.llama_expand_mask
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html
**Contents:**
- monkeypatch.llama_expand_mask
monkeypatch.llama_expand_mask
expands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf
---
## core.chat.messages
**URL:** https://docs.axolotl.ai/docs/api/core.chat.messages.html
**Contents:**
- core.chat.messages
- Classes
- ChatFormattedChats
- Chats
- MessageContentTypes
- MessageContents
- MessageRoles
- Messages
- PreferenceChats
- SpecialToken
internal message representations of chat messages
Chat formatted chats with formatter and optional train on inputs
top level data structure for chat conversations
Message content types for text, image, audio, tool calls, and tool responses
Message contents with type, value, metadata, weight, newline, and end of contents
Message roles for the system, user, assistant, and tools
Messages with role, content, metadata, weight, and chat formatting
representation for preference data for chat
Special tokens for beginning of string and end of string
Tool with description, function, and parameters
Tool call contents with name, arguments, and optional id
Tool call function with name and arguments
Tool response contents with name, content, and optional id
**Examples:**
Example 1 (python):
```python
core.chat.messages.ChatFormattedChats()
```
Example 2 (python):
```python
core.chat.messages.Chats()
```
Example 3 (python):
```python
core.chat.messages.MessageContentTypes()
```
Example 4 (python):
```python
core.chat.messages.MessageContents()
```
---
## core.datasets.transforms.chat_builder
**URL:** https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html
**Contents:**
- core.datasets.transforms.chat_builder
- Functions
- chat_message_transform_builder
- Parameters
- Returns
core.datasets.transforms.chat_builder
This module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.
Builds a transform that takes a row from the dataset and converts it to a Chat
**Examples:**
Example 1 (python):
```python
core.datasets.transforms.chat_builder.chat_message_transform_builder(
train_on_inputs=False,
conversations_field='messages',
message_field_role=None,
message_field_content=None,
message_field_training=None,
)
```
---
## utils.chat_templates
**URL:** https://docs.axolotl.ai/docs/api/utils.chat_templates.html
**Contents:**
- utils.chat_templates
This module provides functionality for selecting chat templates based on user choices. These templates are used for formatting messages in a conversation.
---
## core.trainers.dpo.trainer
**URL:** https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html
**Contents:**
- core.trainers.dpo.trainer
- Classes
- AxolotlDPOTrainer
- Methods
- push_to_hub
core.trainers.dpo.trainer
DPO trainer for axolotl
Extend the base DPOTrainer for axolotl helpers.
Overwrite the push_to_hub method in order to force-add the tags when pushing the model on the Hub. Please refer to ~transformers.Trainer.push_to_hub for more details.
**Examples:**
Example 1 (python):
```python
core.trainers.dpo.trainer.AxolotlDPOTrainer(*args, dataset_tags=None, **kwargs)
```
Example 2 (python):
```python
core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub(*args, **kwargs)
```
---
## monkeypatch.gradient_checkpointing.offload_disk
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html
**Contents:**
- monkeypatch.gradient_checkpointing.offload_disk
- Classes
- Disco
- Methods
- backward
- forward
- get_instance
- DiskOffloadManager
- Methods
- cleanup
monkeypatch.gradient_checkpointing.offload_disk
DISCO - DIsk-based Storage and Checkpointing with Optimized prefetching
Disco: DIsk-based Storage and Checkpointing with Optimized prefetching. Advanced disk-based gradient checkpointer with prefetching.
Backward pass that loads activations from disk with prefetching
Forward pass that offloads activations to disk asynchronously
Get or create the offload manager
Manages offloaded tensors and handles prefetching in a separate thread. Includes synchronization to prevent race conditions.
Clean up all temp files and stop prefetch thread with proper synchronization
Clean up a specific tensor file after it’s been used
Load tensor from disk or prefetch cache with proper synchronization
Save tensor to disk asynchronously and return file path with thread-safe operations
Trigger prefetching of the next N tensors with proper synchronization
Wait for a tensor to be saved to disk
**Examples:**
Example 1 (python):
```python
monkeypatch.gradient_checkpointing.offload_disk.Disco()
```
Example 2 (python):
```python
monkeypatch.gradient_checkpointing.offload_disk.Disco.backward(
ctx,
*grad_outputs,
)
```
Example 3 (python):
```python
monkeypatch.gradient_checkpointing.offload_disk.Disco.forward(
ctx,
forward_function,
hidden_states,
*args,
prefetch_size=1,
prefetch_to_gpu=True,
save_workers=4,
)
```
Example 4 (python):
```python
monkeypatch.gradient_checkpointing.offload_disk.Disco.get_instance(
prefetch_size=1,
prefetch_to_gpu=True,
save_workers=4,
)
```
---
## utils.samplers.multipack
**URL:** https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html
**Contents:**
- utils.samplers.multipack
- Classes
- MultipackBatchSampler
- Methods
- efficiency
- gather_efficiency
- Returns
- gather_len_batches
- generate_batches
- Parameters
utils.samplers.multipack
Multipack Batch Sampler - An efficient batch sampler for packing variable-length sequences into fixed-capacity batches to optimize memory usage and training throughput.
Batch sampler class for efficient packing of variable-length sequences
This sampler packs sequences into fixed-capacity bins (batches) to maximize GPU memory utilization and training throughput by reducing padding.
It supports both parallel packing (using FFD algorithm) and sequential packing (preserving original sequence order).
Calculate the packing efficiency (ratio of tokens used to total token slots). Higher is better - 1.0 would mean perfect packing with no wasted space.
Gather and synchronize packing efficiency estimates across all distributed ranks.
Gather and synchronize batch counts across all distributed ranks. Returns the minimum number of batches available on any rank.
Generate packed batches for training.
Set the epoch number, used for reproducible shuffling across epochs
Sequential allocator that preserves example order.
First-fit-decreasing bin packing algorithm check.
Checks if sequences with the given lengths could fit in the specified number of bins.
Pack a group of sequences into bins using First-Fit Decreasing algorithm.
Pack sequences into bins using parallel processing.
Returns: List of bins, where each bin contains indices of sequences assigned to it.
**Examples:**
Example 1 (python):
```python
utils.samplers.multipack.MultipackBatchSampler(
sampler,
batch_size,
batch_max_len,
lengths,
packing_efficiency_estimate=1.0,
drop_last=True,
num_count_samples=4,
sequential=False,
group_size=100000,
bin_size=200,
num_processes=None,
safe_mode=True,
mp_start_method='fork',
**kwargs,
)
```
Example 2 (python):
```python
utils.samplers.multipack.MultipackBatchSampler.efficiency()
```
Example 3 (python):
```python
utils.samplers.multipack.MultipackBatchSampler.gather_efficiency()
```
Example 4 (python):
```python
utils.samplers.multipack.MultipackBatchSampler.gather_len_batches(num)
```
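For intuition, a minimal first-fit-decreasing (FFD) packing sketch is shown below. It is not MultipackBatchSampler itself, which adds grouping, multiprocessing, and distributed synchronization on top, but it shows the core bin-packing step and the efficiency ratio the descriptions above refer to.
```python
def pack_ffd(lengths: list[int], bin_capacity: int) -> list[list[int]]:
    """Pack sequence indices into bins of at most bin_capacity total tokens."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: list[list[int]] = []
    free: list[int] = []  # remaining capacity per bin
    for idx in order:
        for b, space in enumerate(free):
            if lengths[idx] <= space:
                bins[b].append(idx)
                free[b] -= lengths[idx]
                break
        else:
            bins.append([idx])
            free.append(bin_capacity - lengths[idx])
    return bins


lengths = [1800, 900, 700, 400, 300]
bins = pack_ffd(lengths, bin_capacity=2048)
print(bins)  # [[0], [1, 2, 3], [4]]

# Packing efficiency: ratio of tokens used to total token slots (1.0 is perfect).
print(sum(lengths) / (len(bins) * 2048))  # ~0.67
```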
---
## core.trainers.mixins.scheduler
**URL:** https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html
**Contents:**
- core.trainers.mixins.scheduler
- Classes
- SchedulerMixin
- Methods
- create_scheduler
- Parameters
core.trainers.mixins.scheduler
Module for Axolotl trainer scheduler mixin
Mixin class for scheduler setup in CausalTrainer.
Set up the scheduler. The optimizer of the trainer must have been set up either before this method is called or passed as an argument.
**Examples:**
Example 1 (python):
```python
core.trainers.mixins.scheduler.SchedulerMixin()
```
Example 2 (python):
```python
core.trainers.mixins.scheduler.SchedulerMixin.create_scheduler(
num_training_steps,
optimizer=None,
)
```
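As a rough illustration of what a create_scheduler-style helper does (not Axolotl's actual dispatch logic, which also supports custom schedulers driven by the config), the transformers get_scheduler utility builds a standard learning-rate schedule from an already-created optimizer:
```python
import torch
from transformers import get_scheduler

model = torch.nn.Linear(8, 8)
# The optimizer must exist before the scheduler is created (or be passed in),
# mirroring the requirement described above.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

lr_scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=100,
    num_training_steps=1000,
)
```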
---
## utils.collators.batching
**URL:** https://docs.axolotl.ai/docs/api/utils.collators.batching.html
**Contents:**
- utils.collators.batching
- Classes
- BatchSamplerDataCollatorForSeq2Seq
- DataCollatorForSeq2Seq
- Parameters
- PretrainingBatchSamplerDataCollatorForSeq2Seq
- V2BatchSamplerDataCollatorForSeq2Seq
utils.collators.batching
Data collators for axolotl to pad labels and position_ids for packed sequences
Collator for multipack, specific to using the BatchSampler
Data collator that will dynamically pad the inputs received, as well as the labels and position_ids
Collator for multipack, specific to using the BatchSampler
Collator for multipack, specific to using the BatchSampler
**Examples:**
Example 1 (python):
```python
utils.collators.batching.BatchSamplerDataCollatorForSeq2Seq(
tokenizer,
model=None,
padding=True,
max_length=None,
pad_to_multiple_of=None,
label_pad_token_id=-100,
position_pad_token_id=0,
return_tensors='pt',
)
```
Example 2 (python):
```python
utils.collators.batching.DataCollatorForSeq2Seq(
tokenizer,
model=None,
padding=True,
max_length=None,
pad_to_multiple_of=None,
label_pad_token_id=-100,
position_pad_token_id=0,
return_tensors='pt',
)
```
Example 3 (python):
```python
utils.collators.batching.PretrainingBatchSamplerDataCollatorForSeq2Seq(
*args,
multipack_attn=True,
**kwargs,
)
```
Example 4 (python):
```python
utils.collators.batching.V2BatchSamplerDataCollatorForSeq2Seq(
tokenizer,
model=None,
padding=True,
max_length=None,
pad_to_multiple_of=None,
label_pad_token_id=-100,
position_pad_token_id=0,
return_tensors='pt',
squash_position_ids=False,
)
```
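A minimal sketch of the padding behaviour described above (not the actual collator): labels are padded with label_pad_token_id (-100) so padded positions are ignored by the loss, and position_ids are padded with position_pad_token_id (0). The pad value 0 used for input_ids below stands in for the tokenizer's real pad token id.
```python
import torch


def pad_packed_features(features, label_pad_token_id=-100, position_pad_token_id=0):
    """Pad input_ids, labels, and position_ids to the longest example in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "labels": [], "position_ids": []}
    for f in features:
        pad = max_len - len(f["input_ids"])
        # 0 stands in for the tokenizer's real pad token id in this sketch.
        batch["input_ids"].append(f["input_ids"] + [0] * pad)
        batch["labels"].append(f["labels"] + [label_pad_token_id] * pad)
        batch["position_ids"].append(f["position_ids"] + [position_pad_token_id] * pad)
    return {k: torch.tensor(v) for k, v in batch.items()}


features = [
    {"input_ids": [1, 2, 3, 4], "labels": [1, 2, 3, 4], "position_ids": [0, 1, 2, 3]},
    {"input_ids": [5, 6], "labels": [5, 6], "position_ids": [0, 1]},
]
print(pad_packed_features(features)["labels"])
# tensor([[   1,    2,    3,    4],
#         [   5,    6, -100, -100]])
```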
---
## prompt_strategies.orcamini
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html
**Contents:**
- prompt_strategies.orcamini
- Classes
- OrcaMiniPrompter
prompt_strategies.orcamini
Prompt Strategy for finetuning Orca Mini (v2) models; see https://huggingface.co/psmathur/orca_mini_v2_7b for more information.
Use dataset type: orcamini in config.yml to use this prompt style.
Compared to the alpaca_w_system.open_orca dataset type, this one specifies the system prompt with “### System:”.
Not suited/tested for multiple-turn conversations without further adjustments.
Adjusted Prompter for Orca Mini (v2) datasets
**Examples:**
Example 1 (python):
```python
prompt_strategies.orcamini.OrcaMiniPrompter(
prompt_style=PromptStyle.INSTRUCT.value,
)
```
---
## prompt_strategies.dpo.chat_template
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html
**Contents:**
- prompt_strategies.dpo.chat_template
- Functions
- argilla_chat
- Parameters
- Returns
- Dataset format
prompt_strategies.dpo.chat_template
DPO prompt strategies for using tokenizer chat templates.
DPO chat template strategy for argilla-style datasets.
For argilla-style datasets where chosen/rejected contain full conversations instead of single response messages. Extracts the conversation history from the chosen field and formats both chosen/rejected responses using the configured chat template.
The expected record layout is shown in the example below.
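A Python rendering of that record layout (contents are placeholders; both fields hold a full conversation rather than a single response message):
```python
record = {
    "chosen": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ],
    "rejected": [
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ],
}
```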
**Examples:**
Example 1 (python):
```python
prompt_strategies.dpo.chat_template.argilla_chat(cfg, dataset_idx=0, **kwargs)
```
---
## monkeypatch.relora
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.relora.html
**Contents:**
- monkeypatch.relora
- Classes
- ReLoRACallback
Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.
Callback to merge LoRA weights into the base model and save full-weight checkpoints
**Examples:**
Example 1 (python):
```python
monkeypatch.relora.ReLoRACallback(cfg)
```
---
## monkeypatch.transformers_fa_utils
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html
**Contents:**
- monkeypatch.transformers_fa_utils
- Functions
- fixed_fa_peft_integration_check
- Parameters
monkeypatch.transformers_fa_utils
see https://github.com/huggingface/transformers/pull/35834
PEFT usually casts the layer norms to float32 for training stability, so the input hidden states get silently cast to float32. Hence, we need to cast them back to float16 / bfloat16 to make sure everything works as expected. This might slow down training and inference, so it is recommended not to cast the LayerNorms.
**Examples:**
Example 1 (python):
```python
monkeypatch.transformers_fa_utils.fixed_fa_peft_integration_check(
query,
key,
value,
target_dtype=None,
preferred_dtype=None,
)
```
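An illustrative sketch of that dtype fix-up (not the patched function itself): if float32 layer norms have upcast the hidden states, cast query/key/value back to the flash-attention compute dtype before the attention call.
```python
import torch


def cast_qkv_for_flash_attn(query, key, value, target_dtype=torch.bfloat16):
    """Cast q/k/v back to the compute dtype if they were upcast to float32."""
    if query.dtype == torch.float32:
        query, key, value = (t.to(target_dtype) for t in (query, key, value))
    return query, key, value


q = torch.randn(2, 8, 16, dtype=torch.float32)
k, v = torch.randn_like(q), torch.randn_like(q)
q, k, v = cast_qkv_for_flash_attn(q, k, v)
print(q.dtype)  # torch.bfloat16
```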
---
## utils.collators.mm_chat
**URL:** https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html
**Contents:**
- utils.collators.mm_chat
- Classes
- MultiModalChatDataCollator
utils.collators.mm_chat
Collators for multi-modal chat messages and packing
Collator for multi-modal chat messages
**Examples:**
Example 1 (python):
```python
utils.collators.mm_chat.MultiModalChatDataCollator(
tokenizer,
processing_strategy,
packing=False,
return_tensors='pt',
padding=True,
pad_to_multiple_of=None,
)
```
---
## utils.lora
**URL:** https://docs.axolotl.ai/docs/api/utils.lora.html
**Contents:**
- utils.lora
- Functions
- get_lora_merged_state_dict
- Parameters
- Returns
module to get the state dict of a merged lora model
Create and return a state_dict that has the LoRA deltas merged into the base model's weights, without modifying the model in place.
**Examples:**
Example 1 (python):
```python
utils.lora.get_lora_merged_state_dict(model)
```
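As a minimal illustration of what "merging LoRA deltas" means for a single linear layer (pure tensor math, not the actual Axolotl/PEFT code), the merged weight is W + (B @ A) * (alpha / r), and the base weight itself is left untouched:
```python
import torch

out_features, in_features, r, alpha = 16, 32, 4, 8
base_weight = torch.randn(out_features, in_features)
lora_A = torch.randn(r, in_features)    # down-projection
lora_B = torch.zeros(out_features, r)   # up-projection (zero-init, so no delta yet)

# Merged weight for one layer; base_weight itself is never modified.
merged = base_weight + (lora_B @ lora_A) * (alpha / r)
state_dict = {"linear.weight": merged}
print(torch.equal(merged, base_weight))  # True, because lora_B is still zero
```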
---
## utils.model_shard_quant
**URL:** https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html
**Contents:**
- utils.model_shard_quant
- Functions
- load_and_quantize
utils.model_shard_quant
module to handle loading model on cpu/meta device for FSDP
Loads the value tensor into a submodule of module, optionally skipping names in skip_names and converting to dtype.
Quantizes Params4bit on device, then places it on "cpu" if to_cpu=True or "meta" if to_meta=True.
**Examples:**
Example 1 (python):
```python
utils.model_shard_quant.load_and_quantize(
module,
name,
value,
device=None,
dtype=None,
skip_names=None,
to_cpu=False,
to_meta=False,
verbose=False,
quant_method='bnb',
)
```
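A rough sketch of the "load a value tensor into a named submodule" part described above, using only plain PyTorch; the 4-bit quantization with Params4bit is omitted, and the helper name below is hypothetical:
```python
import torch


def load_param(module, name, value, dtype=None, skip_names=(), device="cpu"):
    """Place `value` onto the parameter `name` inside `module`, unless skipped."""
    if any(skip in name for skip in skip_names):
        return
    submodule_path, _, param_name = name.rpartition(".")
    submodule = module.get_submodule(submodule_path) if submodule_path else module
    if dtype is not None:
        value = value.to(dtype)
    setattr(submodule, param_name, torch.nn.Parameter(value.to(device), requires_grad=False))


model = torch.nn.Sequential(torch.nn.Linear(4, 4))
load_param(model, "0.weight", torch.ones(4, 4), dtype=torch.float16)
print(model[0].weight.dtype)  # torch.float16
```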
---
## monkeypatch.gradient_checkpointing.offload_cpu
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html
**Contents:**
- monkeypatch.gradient_checkpointing.offload_cpu
- Classes
- CPU_Offloaded_Gradient_Checkpointer
monkeypatch.gradient_checkpointing.offload_cpu
CPU offloaded checkpointing
Saves VRAM by offloading activations to RAM, with only a small performance hit since the transfers are hidden behind non-blocking calls.
**Examples:**
Example 1 (python):
```python
monkeypatch.gradient_checkpointing.offload_cpu.CPU_Offloaded_Gradient_Checkpointer(
)
```
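A hedged sketch of the idea (not the actual implementation): in the forward pass the activation is copied to CPU with non_blocking=True so the transfer can overlap with compute, and in the backward pass it is copied back and the forward is recomputed to rebuild the graph.
```python
import torch


class CpuOffloadCheckpoint(torch.autograd.Function):
    """Illustrative sketch; usage: out = CpuOffloadCheckpoint.apply(fn, hidden_states)."""

    @staticmethod
    def forward(ctx, forward_fn, hidden_states):
        ctx.forward_fn = forward_fn
        # Non-blocking device->host copy (ideally into pinned memory) so the
        # transfer can overlap with ongoing compute.
        ctx.saved_hidden = hidden_states.detach().to("cpu", non_blocking=True)
        with torch.no_grad():
            return forward_fn(hidden_states)

    @staticmethod
    def backward(ctx, *grad_outputs):
        # Bring the activation back and recompute the forward to rebuild the graph.
        hidden = ctx.saved_hidden.to(grad_outputs[0].device, non_blocking=True)
        hidden.requires_grad_(True)
        with torch.enable_grad():
            output = ctx.forward_fn(hidden)
        torch.autograd.backward(output, grad_outputs)
        return None, hidden.grad
```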
---
## core.builders.base
**URL:** https://docs.axolotl.ai/docs/api/core.builders.base.html
**Contents:**
- core.builders.base
- Classes
- TrainerBuilderBase
- Methods
- get_post_trainer_create_callbacks
Base class for trainer builder
Base class for trainer builder.
Callbacks added after the trainer is created, usually because they need access to the trainer
**Examples:**
Example 1 (python):
```python
core.builders.base.TrainerBuilderBase(cfg, model, tokenizer, processor=None)
```
Example 2 (python):
```python
core.builders.base.TrainerBuilderBase.get_post_trainer_create_callbacks(trainer)
```
---
## core.builders.rl
**URL:** https://docs.axolotl.ai/docs/api/core.builders.rl.html
**Contents:**
- core.builders.rl
- Classes
- HFRLTrainerBuilder
Builder for RLHF trainers
Trainer factory class for TRL-based RLHF trainers (e.g. DPO)
**Examples:**
Example 1 (python):
```python
core.builders.rl.HFRLTrainerBuilder(cfg, model, tokenizer, processor=None)
```
---
## utils.schemas.integrations
**URL:** https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html
**Contents:**
- utils.schemas.integrations
- Classes
- CometConfig
- GradioConfig
- LISAConfig
- MLFlowConfig
- OpenTelemetryConfig
- RayConfig
- WandbConfig
utils.schemas.integrations
Pydantic models for Axolotl integrations
Comet configuration subset
Gradio configuration subset
LISA configuration subset
MLFlow configuration subset
OpenTelemetry configuration subset
Ray launcher configuration subset
Wandb configuration subset
**Examples:**
Example 1 (python):
```python
utils.schemas.integrations.CometConfig()
```
Example 2 (python):
```python
utils.schemas.integrations.GradioConfig()
```
Example 3 (python):
```python
utils.schemas.integrations.LISAConfig()
```
Example 4 (python):
```python
utils.schemas.integrations.MLFlowConfig()
```
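For orientation, a "configuration subset" here is a small Pydantic model whose fields map onto keys in the Axolotl YAML config. A hedged sketch follows; the class and field names below are illustrative, not the actual schema:
```python
from typing import Optional

from pydantic import BaseModel


class ExampleWandbSubset(BaseModel):
    # Hypothetical fields for illustration; see the real WandbConfig for the
    # actual schema.
    wandb_project: Optional[str] = None
    wandb_entity: Optional[str] = None
    wandb_run_name: Optional[str] = None


cfg = ExampleWandbSubset(wandb_project="my-finetune")
print(cfg.wandb_project)
```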
---
## utils.data.sft
**URL:** https://docs.axolotl.ai/docs/api/utils.data.sft.html
**Contents:**
- utils.data.sft
- Functions
- prepare_datasets
- Parameters
- Returns
Data handling specific to SFT.
Prepare training and evaluation datasets based on configuration.
**Examples:**
Example 1 (python):
```python
utils.data.sft.prepare_datasets(cfg, tokenizer, processor=None)
```
---
## integrations.liger.args
**URL:** https://docs.axolotl.ai/docs/api/integrations.liger.args.html
**Contents:**
- integrations.liger.args
- Classes
- LigerArgs
integrations.liger.args
Module for handling LIGER input arguments.
Input args for LIGER.
**Examples:**
Example 1 (python):
```python
integrations.liger.args.LigerArgs()
```
---
## monkeypatch.mixtral
**URL:** https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html
**Contents:**
- monkeypatch.mixtral
Patches to support multipack for mixtral
---
## cli.preprocess
**URL:** https://docs.axolotl.ai/docs/api/cli.preprocess.html
**Contents:**
- cli.preprocess
- Functions
- do_cli
- Parameters
- do_preprocess
- Parameters
CLI to run preprocessing of a dataset.
Parses axolotl config, CLI args, and calls do_preprocess.
Preprocesses dataset specified in axolotl config.
**Examples:**
Example 1 (python):
```python
cli.preprocess.do_cli(config=Path('examples/'), **kwargs)
```
Example 2 (python):
```python
cli.preprocess.do_preprocess(cfg, cli_args)
```
---
## prompt_strategies.kto.llama3
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html
**Contents:**
- prompt_strategies.kto.llama3
- Functions
- argilla_chat
- intel
- ultra
prompt_strategies.kto.llama3
KTO strategies for llama-3 chat template
For argilla/kto-mix-15k conversations.
For Intel Orca KTO datasets, e.g. argilla/distilabel-intel-orca-kto.
For ultrafeedback-binarized conversations, e.g. argilla/ultrafeedback-binarized-preferences-cleaned-kto.
**Examples:**
Example 1 (python):
```python
prompt_strategies.kto.llama3.argilla_chat(cfg, **kwargs)
```
Example 2 (python):
```python
prompt_strategies.kto.llama3.intel(cfg, **kwargs)
```
Example 3 (python):
```python
prompt_strategies.kto.llama3.ultra(cfg, **kwargs)
```
---
## prompt_strategies.orpo.chat_template
**URL:** https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html
**Contents:**
- prompt_strategies.orpo.chat_template
- Classes
- Message
- MessageList
- ORPODatasetParsingStrategy
- Methods
- get_chosen_conversation_thread
- get_prompt
- get_rejected_conversation_thread
- ORPOPrompter
prompt_strategies.orpo.chat_template
chatml prompt tokenization strategy for ORPO
Strategy to parse a chosen/rejected dataset into a MessageList
Dataset structure mappings
Map the data to extract everything up to the last turn
Dataset structure mappings
Single Turn prompter for ORPO
Keys: rejected_input_ids, input_ids, rejected_attention_mask, attention_mask, rejected_labels, labels
chatml transforms for datasets with system, input, chosen, and rejected fields
**Examples:**
Example 1 (python):
```python
prompt_strategies.orpo.chat_template.Message()
```
Example 2 (python):
```python
prompt_strategies.orpo.chat_template.MessageList()
```
Example 3 (python):
```python
prompt_strategies.orpo.chat_template.ORPODatasetParsingStrategy()
```
Example 4 (python):
```python
prompt_strategies.orpo.chat_template.ORPODatasetParsingStrategy.get_chosen_conversation_thread(
prompt,
)
```
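Example record shape implied by the transform description above (field names come from the "system, input, chosen, rejected" wording; contents are placeholders):
```python
record = {
    "system": "You are a helpful assistant.",
    "input": "...",
    "chosen": "...",
    "rejected": "...",
}
```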
---
## loaders.processor
**URL:** https://docs.axolotl.ai/docs/api/loaders.processor.html
**Contents:**
- loaders.processor
Processor loading functionality for multi-modal models
---
## utils.callbacks.comet_
**URL:** https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html
**Contents:**
- utils.callbacks.comet_
- Classes
- SaveAxolotlConfigtoCometCallback
utils.callbacks.comet_
Comet module for trainer callbacks
Callback to save axolotl config to comet
**Examples:**
Example 1 (python):
```python
utils.callbacks.comet_.SaveAxolotlConfigtoCometCallback(axolotl_config_path)
```
---
Source: claude-code-templates (MIT).