# Artifacts & Model Registry Guide
Complete guide to data versioning and model management with W&B Artifacts.
## Table of Contents
- What are Artifacts
- Creating Artifacts
- Using Artifacts
- Model Registry
- Versioning & Lineage
- Best Practices
## What are Artifacts
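An artifact is a versioned collection of files, such as a dataset, a model checkpoint, or evaluation output, that W&B tracks together with its lineage: the runs that produced it and the runs that consumed it.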
**Key Features:**
- Automatic versioning (v0, v1, v2...)
- Lineage tracking (which runs produced/used artifacts)
- Efficient storage (deduplication)
- Collaboration (team-wide access)
- Aliases (latest, best, production)
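To make the deduplication point concrete, here is a conceptual sketch of content-addressed storage: a file is identified by a hash of its bytes, so a new version only needs to store content that has not been seen before. This illustrates the general idea only and is not W&B's internal implementation; `content_digest` and `new_blobs` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Set


def content_digest(data: bytes) -> str:
    """Hex SHA-256 digest that identifies a blob by its content."""
    return hashlib.sha256(data).hexdigest()


def new_blobs(version_files: Dict[str, bytes], stored: Set[str]) -> List[str]:
    """Return the file names whose content is not already in storage.

    version_files maps file name -> file bytes. stored holds the digests that
    were already uploaded and is updated in place.
    """
    to_upload = []
    for name, data in sorted(version_files.items()):
        digest = content_digest(data)
        if digest not in stored:
            stored.add(digest)
            to_upload.append(name)
    return to_upload
```

In this sketch, logging a second version in which only one file changed would report just that file as needing storage.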
**Common Use Cases:**
- Dataset versioning
- Model checkpoints
- Preprocessed data
- Evaluation results
- Configuration files
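Versions and aliases are both addressed by appending them to the artifact name after a colon, for example `training-data:v2` or `training-data:latest`. The sketch below builds and splits such references; `make_ref`, `split_ref`, and `is_version` are hypothetical helpers for illustration, not part of the wandb API.

```python
from typing import Tuple


def make_ref(name: str, alias: str = "latest") -> str:
    """Build a 'name:alias' string such as 'training-data:v2'."""
    if ":" in name:
        raise ValueError(f"artifact name must not contain ':': {name!r}")
    return f"{name}:{alias}"


def split_ref(ref: str) -> Tuple[str, str]:
    """Split 'name:alias' into (name, alias); the colon is required here."""
    name, sep, alias = ref.rpartition(":")
    if not sep or not name or not alias:
        raise ValueError(f"expected 'name:alias', got {ref!r}")
    return name, alias


def is_version(alias: str) -> bool:
    """True for automatic version labels (v0, v1, ...), False for named aliases."""
    return alias.startswith("v") and alias[1:].isdigit()
```

The string returned by `make_ref` is the kind of value you would pass to `run.use_artifact(...)`.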
## Creating Artifacts
### Basic Dataset Artifact
```python
import wandb

run = wandb.init(project="my-project")

# Create artifact
dataset = wandb.Artifact(
    name='training-data',
    type='dataset',
    description='ImageNet training split with augmentations',
    metadata={
        'size': '1.2M images',
        'format': 'JPEG',
        'resolution': '224x224'
    }
)

# Add files
dataset.add_file('data/train.csv')         # Single file
dataset.add_dir('data/images')             # Entire directory
dataset.add_reference('s3://bucket/data')  # Cloud reference

# Log artifact
run.log_artifact(dataset)
wandb.finish()
```
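The metadata in the dataset example is written by hand, so it can drift from the files that are actually added. As one possible approach, the sketch below derives a few fields from the directory itself. `describe_dir` and its dictionary keys are hypothetical; the result is a plain dict that could be passed as `metadata=` when constructing the artifact.

```python
from collections import Counter
from pathlib import Path


def describe_dir(root: str) -> dict:
    """Summarise a directory for use as artifact metadata (illustrative helper)."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    # Count files per lowercase extension, e.g. {'.jpg': 1200, '.csv': 1}
    extensions = Counter(p.suffix.lower() or "<none>" for p in files)
    return {
        "num_files": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "extensions": dict(extensions),
    }
```

For example, `describe_dir('data/images')` could be merged into the hand-written fields so that counts and sizes always reflect what is on disk.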
### Model Artifact
```python
import torch
import wandb

run = wandb.init(project="my-project")

# Train model (train_model is a placeholder for your own training code)
model = train_model()

# Save model weights
torch.save(model.state_dict(), 'model.pth')

# Create model artifact
model_artifact = wandb.Artifact(
    name='resnet50-classifier',
    type='model',
    description='ResNet50 trained on ImageNet',
    metadata={
        'architecture': 'ResNet50',
        'accuracy': 0.95,
        'loss': 0.15,
        'epochs': 50,
        'framework': 'PyTorch'
    }
)

# Add model file
model_artifact.add_file('model.pth')

# Add config (config.yaml must already exist on disk)
model_artifact.add_file('config.yaml')

# Log with aliases
run.log_artifact(model_artifact, aliases=['latest', 'best'])
wandb.finish()
```
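Attaching `best` to every logged version would make the alias meaningless. A sketch of one way to decide the alias list before calling `run.log_artifact`, assuming you track the best accuracy seen so far yourself; `choose_aliases` is a hypothetical helper, not a wandb function.

```python
from typing import List, Optional


def choose_aliases(accuracy: float, best_so_far: Optional[float]) -> List[str]:
    """Return the aliases to attach to a newly logged model version."""
    aliases = ["latest"]
    # Only promote to 'best' when this checkpoint strictly improves on the record.
    if best_so_far is None or accuracy > best_so_far:
        aliases.append("best")
    return aliases
```

The result would replace the hard-coded list, for example `run.log_artifact(model_artifact, aliases=choose_aliases(acc, prev_best))`, where `acc` and `prev_best` are values your training loop would supply.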
### Preprocessed Data Artifact
python
import pandas as pd
import wandb
run = wandb.init(project="nlp-project")
# Preprocess data
df = pd.read_csv('raw_data.csv')
df_processed = preprocess(df)
df_processed.to_csv('processed_data.csv', index=False)
# Create artifact
processed_data = wandb.Artifact(
name='processed-text-data',
type='dataset',
metadata={
'rows': len(df_processed),
'columns': list(df_processed.columns),