AI Skill Report Card

Building ML Systems

Grade: B+ · Score: 78 · Mar 10, 2026 · Source: Web
Quick Start · 15 / 15
Python
# End-to-end ML pipeline template
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
import mlflow
import wandb

class ProductionModel(nn.Module):
    def __init__(self, base_model, num_classes):
        super().__init__()
        self.base = AutoModel.from_pretrained(base_model)
        self.classifier = nn.Linear(self.base.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.base(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.pooler_output)

# Training with experiment tracking
wandb.init(project="ml-system")
model = ProductionModel("bert-base-uncased", num_classes=3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
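
A minimal sketch of the training loop that would follow this setup. A plain `nn.Linear` stands in for `ProductionModel` so the snippet runs without downloading pretrained weights; the input dimension of 16 is an arbitrary placeholder for the encoder's pooled output size.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for ProductionModel so the loop runs offline;
# the real model consumes tokenized input_ids/attention_mask instead.
model = nn.Linear(16, 3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 16)         # batch of 8 fake pooled embeddings
y = torch.randint(0, 3, (8,))  # 3-class labels

model.train()
for _ in range(3):             # a few optimization steps
    optimizer.zero_grad()
    logits = model(x)
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()
```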
Recommendation
Replace the abstract workflow checklist with specific, actionable steps that include actual commands or code snippets.
Workflow · 13 / 15

Progress:

  • Problem definition and data analysis
  • Architecture selection and baseline implementation
  • Experiment setup with tracking (W&B/MLflow)
  • Training with distributed setup if needed
  • Model optimization (quantization/pruning)
  • Validation and bias testing
  • Deployment pipeline setup
  • Monitoring and maintenance
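
The tracking step in the checklist above can be made concrete. `ExperimentLogger` here is a hypothetical stand-in that mirrors the log-params/log-metric/log-artifact pattern shared by W&B and MLflow; it is not either library's actual API.

```python
import json
import time

class ExperimentLogger:
    """Hypothetical stand-in for a W&B/MLflow run: records params, metrics, artifacts."""
    def __init__(self, project):
        self.run = {"project": project, "start": time.time(),
                    "params": {}, "metrics": [], "artifacts": []}

    def log_params(self, **params):
        self.run["params"].update(params)

    def log_metric(self, name, value, step):
        self.run["metrics"].append({"name": name, "value": value, "step": step})

    def log_artifact(self, path):
        self.run["artifacts"].append(path)

    def finish(self):
        # Serialize hyperparameters so the run is reproducible later
        return json.dumps(self.run["params"])

logger = ExperimentLogger("ml-system")
logger.log_params(lr=2e-5, batch_size=16, seed=42)
logger.log_metric("val_loss", 0.41, step=1)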

1. Architecture Design

Python
# Modern transformer architecture
# Assumes a standard sinusoidal PositionalEncoding module that returns the
# encodings for x's sequence length (not x + encodings, which would double x).
import torch.nn as nn

class CustomTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoding = PositionalEncoding(d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)

    def forward(self, x, mask=None):
        x = self.embedding(x) + self.pos_encoding(x)
        return self.transformer(x, src_key_padding_mask=mask)
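
A quick shape sanity check for an encoder stack like the one above, using a toy configuration (vocab 100, d_model 32) and omitting positional encodings for brevity:

```python
import torch
import torch.nn as nn

# Toy-sized version of the encoder stack to verify tensor shapes
vocab_size, d_model, nhead, num_layers = 100, 32, 4, 2
embedding = nn.Embedding(vocab_size, d_model)
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
transformer = nn.TransformerEncoder(encoder_layer, num_layers)

tokens = torch.randint(0, vocab_size, (2, 10))  # (batch, seq_len)
out = transformer(embedding(tokens))            # (batch, seq_len, d_model)
```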

2. Distributed Training Setup

Python
# Multi-GPU training with DistributedDataParallel (launch with torchrun)
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model):
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(), device_ids=[local_rank])

3. Model Optimization

Python
# Quantization for deployment
import torch
import torch.nn as nn
import torch.nn.functional as F

# Post-training dynamic quantization of the Linear layers
model_quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Knowledge distillation
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, alpha=0.5):
    soft_loss = nn.KLDivLoss(reduction='batchmean')(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1)
    ) * (temperature ** 2)
    hard_loss = nn.CrossEntropyLoss()(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
Recommendation
Reduce redundant explanations and focus on challenges unique to ML systems rather than general ML concepts.
Examples · 18 / 20

Example 1: LLM Fine-tuning
Input: Custom-domain text classification task
Output:

Python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-based models define no pad token
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/DialoGPT-medium", num_labels=3
)
model.config.pad_token_id = tokenizer.pad_token_id

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    dataloader_num_workers=4,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
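
The `Trainer` call above defines no evaluation metrics. A minimal `compute_metrics` sketch (accuracy only, assuming 3-class logits) that could be passed as `Trainer(..., compute_metrics=compute_metrics)`:

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair as supplied by the HF Trainer
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

# Quick sanity check on fake logits
logits = np.array([[2.0, 0.1, 0.1], [0.1, 3.0, 0.2]])
labels = np.array([0, 1])
metrics = compute_metrics((logits, labels))
```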

Example 2: MLOps Pipeline
Input: Production deployment requirements
Output:

YAML
# kubeflow_pipeline.yaml (Argo Workflow spec)
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: ml-pipeline
spec:
  entrypoint: train-model  # required: the template to run first
  templates:
    - name: train-model
      container:
        image: ml-training:latest
        command: [python, train.py]
        resources:
          limits:
            nvidia.com/gpu: 4
    - name: deploy-model
      container:
        image: ml-serving:latest
        command: [python, deploy.py]
Recommendation
Add more concrete input/output examples showing different ML system scenarios (computer vision, time series, etc.)

Experiment Management

  • Log hyperparameters, metrics, and artifacts systematically
  • Use reproducible seeds and version control for data
  • Implement early stopping and checkpointing
  • Track computational costs and carbon footprint
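
The seeding and early-stopping bullets above can be sketched without any framework. `EarlyStopping` below is a generic implementation, not any specific library's class; a real pipeline would also seed numpy and torch in `set_seed`.

```python
import random

def set_seed(seed=42):
    """Seed Python's RNG; in a real pipeline also seed numpy and torch."""
    random.seed(seed)

class EarlyStopping:
    """Stop when the monitored loss hasn't improved for `patience` checks."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_checks = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_checks = val_loss, 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience  # True => stop training

set_seed(42)
stopper = EarlyStopping(patience=2)
history = [0.9, 0.7, 0.71, 0.72, 0.73]  # val loss plateaus after epoch 1
stopped_at = next(i for i, loss in enumerate(history) if stopper.step(loss))
```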

Model Development

Python
# Proper validation setup
import numpy as np
from sklearn.model_selection import StratifiedKFold

def robust_validation(model, X, y, cv=5):
    skf = StratifiedKFold(n_splits=cv, shuffle=True, random_state=42)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        # evaluate_fold: train on the fold's train split, score on its val split
        score = evaluate_fold(model, X[train_idx], y[train_idx],
                              X[val_idx], y[val_idx])
        scores.append(score)
    return np.mean(scores), np.std(scores)

Responsible AI

Python
# Bias detection
from fairlearn.metrics import demographic_parity_difference
from lime import lime_tabular

def check_fairness(model, X_test, y_test, sensitive_features):
    y_pred = model.predict(X_test)
    bias_score = demographic_parity_difference(
        y_test, y_pred, sensitive_features=sensitive_features
    )
    return bias_score

# Model explainability (X_train/X_test assumed from the validation setup above)
explainer = lime_tabular.LimeTabularExplainer(X_train)
explanation = explainer.explain_instance(X_test[0], model.predict_proba)
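
Demographic parity difference is simply the gap in positive-prediction rates across groups. A dependency-free sketch of the binary case (the fairlearn call above handles the general case):

```python
import numpy as np

def demographic_parity_diff(y_pred, sensitive):
    """Max gap in positive-prediction rate across sensitive-feature groups."""
    rates = [y_pred[sensitive == g].mean() for g in np.unique(sensitive)]
    return float(max(rates) - min(rates))

y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
gap = demographic_parity_diff(y_pred, groups)  # rates 0.75 vs 0.25 -> gap 0.5
```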

Performance Optimization

  • Use mixed precision training (FP16)
  • Implement gradient checkpointing for memory efficiency
  • Profile GPU utilization and optimize data loading
  • Consider model parallelism for large models

Common Pitfalls

  • Data leakage: Ensure proper train/validation/test splits, especially with time series
  • Overfitting to validation: Use nested cross-validation for hyperparameter tuning
  • Ignoring class imbalance: Use stratified sampling and appropriate metrics
  • Poor scaling: Don't forget to normalize/standardize features
  • Memory issues: Use gradient accumulation instead of large batch sizes
  • Reproducibility: Always set seeds for random operations
  • Deployment mismatch: Ensure training and serving environments are consistent
  • Monitoring gaps: Track data drift and model performance degradation in production
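
The mixed-precision and gradient-accumulation points above can be combined in one loop. A toy sketch with a linear model, enabling AMP only when a GPU is present so it also runs on CPU; sizes and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 3).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

accum_steps = 4  # effective batch = micro-batch size * accum_steps
optimizer.zero_grad()
for step in range(8):
    x = torch.randn(4, 16, device=device)          # micro-batch of 4
    y = torch.randint(0, 3, (4,), device=device)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = loss_fn(model(x), y) / accum_steps  # scale loss for accumulation
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:              # update every accum_steps
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```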
Grade: B+
AI Skill Framework
Scorecard
Criteria Breakdown

  • Quick Start: 15/15
  • Workflow: 13/15
  • Examples: 18/20
  • Completeness: 15/20
  • Format: 15/15
  • Conciseness: 12/15
12/15