AI Skill Report Card
Building ML Systems
Quick Start (15 / 15)
```python
# End-to-end ML pipeline template
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
import mlflow
import wandb

class ProductionModel(nn.Module):
    def __init__(self, base_model, num_classes):
        super().__init__()
        self.base = AutoModel.from_pretrained(base_model)
        self.classifier = nn.Linear(self.base.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.base(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.pooler_output)

# Training with experiment tracking
wandb.init(project="ml-system")
model = ProductionModel("bert-base-uncased", num_classes=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```
Recommendation: Replace the abstract workflow checklist with more specific, actionable steps that include actual commands or code snippets.
Workflow (13 / 15)
Progress:
- Problem definition and data analysis
- Architecture selection and baseline implementation
- Experiment setup with tracking (W&B/MLflow)
- Training with distributed setup if needed
- Model optimization (quantization/pruning)
- Validation and bias testing
- Deployment pipeline setup
- Monitoring and maintenance
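The checklist above can be sketched as a minimal pipeline driver. Every stage function below is a hypothetical stub standing in for the real work (the snippets later in this section); the point is the stage ordering and how each stage's output feeds the next, not the implementations.

```python
# Minimal pipeline driver mirroring the workflow checklist (stub stages)
def run_pipeline(config: dict) -> dict:
    report = {}
    report["data"] = analyze_data(config)            # 1. problem definition / EDA
    report["baseline"] = train_baseline(config)      # 2. architecture + baseline
    report["best"] = tune(config)                    # 3-4. tracked experiments, training
    report["optimized"] = optimize(report["best"])   # 5. quantization / pruning
    report["checks"] = validate(report["optimized"]) # 6. validation + bias testing
    return report

# Stubs: each returns the kind of artifact the real stage would produce
def analyze_data(config): return {"rows": 1000, "classes": 3}
def train_baseline(config): return {"acc": 0.71}
def tune(config): return {"acc": 0.84}
def optimize(best): return {"acc": best["acc"] - 0.01, "size_mb": 25}
def validate(opt): return {"passes_bias_check": True}

report = run_pipeline({"data_path": "data/train.csv"})
```

Steps 7–8 (deployment and monitoring) typically live outside the training driver, in the serving pipeline shown under Example 2.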
1. Architecture Design
```python
# Modern transformer architecture
import torch.nn as nn

class CustomTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        # PositionalEncoding is assumed to be defined elsewhere; it adds
        # position information to the embedded sequence
        self.pos_encoding = PositionalEncoding(d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)

    def forward(self, x, mask=None):
        # Embed token ids first, then add positional information
        x = self.pos_encoding(self.embedding(x))
        return self.transformer(x, src_key_padding_mask=mask)
```
2. Distributed Training Setup
```python
# Multi-GPU training with DDP
# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model):
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(), device_ids=[local_rank])
```
3. Model Optimization
```python
# Quantization for deployment
import torch
import torch.nn as nn
import torch.nn.functional as F

# Post-training dynamic quantization of all Linear layers
model_quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Knowledge distillation: blend a soft loss against the teacher's
# temperature-scaled distribution with the usual hard-label loss
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, alpha=0.5):
    soft_loss = nn.KLDivLoss(reduction="batchmean")(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
    ) * (temperature ** 2)
    hard_loss = nn.CrossEntropyLoss()(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```
Recommendation: Reduce redundant explanations and focus on unique ML-system challenges rather than general ML concepts.
Examples (18 / 20)
Example 1: LLM Fine-tuning
Input: Custom domain text classification task
Output:
```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/DialoGPT-medium", num_labels=3
)
# DialoGPT's GPT-2 tokenizer has no pad token; reuse EOS so batching works
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    dataloader_num_workers=4,
    fp16=True,
)

# train_dataset / eval_dataset are prepared elsewhere
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
```
Example 2: MLOps Pipeline
Input: Production deployment requirements
Output:
```yaml
# kubeflow_pipeline.yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: ml-pipeline
spec:
  templates:
    - name: train-model
      container:
        image: ml-training:latest
        command: [python, train.py]
        resources:
          limits:
            nvidia.com/gpu: 4
    - name: deploy-model
      container:
        image: ml-serving:latest
        command: [python, deploy.py]
```
Recommendation: Add more concrete input/output examples showing different ML system scenarios (computer vision, time series, etc.).
Best Practices
Experiment Management
- Log hyperparameters, metrics, and artifacts systematically
- Use reproducible seeds and version control for data
- Implement early stopping and checkpointing
- Track computational costs and carbon footprint
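The logging points above can be captured with a tiny file-based stand-in for W&B/MLflow. The `ExperimentLog` class and its `run.json` layout are illustrative, not a real tracker API; swapping its methods for `wandb.log` or `mlflow.log_metric` keeps the same shape.

```python
# Minimal file-based run tracker: params, metrics, and the seed, in one record
import json
import tempfile
import time
from pathlib import Path

class ExperimentLog:
    def __init__(self, run_dir, seed=42):
        self.run_dir = Path(run_dir)
        self.run_dir.mkdir(parents=True, exist_ok=True)
        self.record = {"started_at": time.time(), "seed": seed,
                       "params": {}, "metrics": []}

    def log_params(self, **params):
        self.record["params"].update(params)

    def log_metric(self, name, value, step):
        self.record["metrics"].append({"name": name, "value": value, "step": step})

    def finish(self):
        out = self.run_dir / "run.json"
        out.write_text(json.dumps(self.record, indent=2))
        return out

# Usage
log = ExperimentLog(tempfile.mkdtemp())
log.log_params(lr=2e-5, batch_size=16)
log.log_metric("val_acc", 0.91, step=1)
saved = json.loads(log.finish().read_text())
```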
Model Development
```python
# Proper validation setup
import numpy as np
from sklearn.model_selection import StratifiedKFold

def robust_validation(model, X, y, cv=5):
    skf = StratifiedKFold(n_splits=cv, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        # evaluate_fold (defined elsewhere) trains on the fold's training
        # split and scores on its held-out split
        score = evaluate_fold(model, X[train_idx], y[train_idx],
                              X[val_idx], y[val_idx])
        scores.append(score)
    return np.mean(scores), np.std(scores)
```
Responsible AI
```python
# Bias detection
from fairlearn.metrics import demographic_parity_difference
from lime import lime_tabular

def check_fairness(model, X_test, y_test, sensitive_features):
    y_pred = model.predict(X_test)
    bias_score = demographic_parity_difference(
        y_test, y_pred, sensitive_features=sensitive_features
    )
    return bias_score

# Model explainability
explainer = lime_tabular.LimeTabularExplainer(X_train)
explanation = explainer.explain_instance(X_test[0], model.predict_proba)
```
Performance Optimization
- Use mixed precision training (FP16)
- Implement gradient checkpointing for memory efficiency
- Profile GPU utilization and optimize data loading
- Consider model parallelism for large models
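The first two bullets combine into one training step. A minimal sketch, assuming a CUDA device for fp16 (the same pattern falls back to bf16 on CPU, where the grad scaler is a no-op); the toy `nn.Sequential` model is only there to make the step runnable:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # no-op on CPU

x = torch.randn(16, 64, device=device)
y = torch.randint(0, 8, (16,), device=device)

# Mixed precision: fp16 on GPU, bf16 on CPU
with torch.autocast(device_type=device,
                    dtype=torch.float16 if device == "cuda" else torch.bfloat16):
    # Gradient checkpointing: recompute this block in backward
    # instead of storing its activations
    h = checkpoint(model[0], x, use_reentrant=False)
    logits = model[2](model[1](h))

loss = F.cross_entropy(logits.float(), y)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```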
Common Pitfalls
- Data leakage: Ensure proper train/validation/test splits, especially with time series
- Overfitting to validation: Use nested cross-validation for hyperparameter tuning
- Ignoring class imbalance: Use stratified sampling and appropriate metrics
- Poor scaling: Don't forget to normalize/standardize features
- Memory issues: Use gradient accumulation instead of large batch sizes
- Reproducibility: Always set seeds for random operations
- Deployment mismatch: Ensure training and serving environments are consistent
- Monitoring gaps: Track data drift and model performance degradation in production
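For the reproducibility pitfall, a minimal seed-setting helper (`set_seed` is an illustrative name, not a library function); the final two draws demonstrate that re-seeding makes random operations repeatable:

```python
import os
import random
import numpy as np

def set_seed(seed=42):
    # Seed every RNG the pipeline touches; when torch is in play,
    # also call torch.manual_seed(seed) and torch.cuda.manual_seed_all(seed)
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

set_seed(123)
first = np.random.rand(3)
set_seed(123)
second = np.random.rand(3)  # identical to `first` after re-seeding
```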