AI Skill Report Card
Optimizing Web ML Inference with LiteRT.js
Quick Start: 15 / 15
```javascript
// Install first: npm install @litertjs/core

// Basic model inference
import {loadLiteRt, loadAndCompile, Tensor} from '@litertjs/core';

await loadLiteRt('https://cdn.jsdelivr.net/npm/@litertjs/core/wasm/');
const model = await loadAndCompile('/path/to/model.tflite', {
  accelerator: 'webgpu', // or 'wasm' for CPU
});

// Run inference
const inputData = new Float32Array(224 * 224 * 3).fill(0);
const inputTensor = new Tensor(inputData, [1, 3, 224, 224]);
const outputs = await model.run(inputTensor);

// Clean up the input, read the output, then free it
inputTensor.delete();
const result = await outputs[0].data();
outputs[0].delete();
```
Recommendation:
Add performance benchmarks or timing comparisons between WebGPU and WASM to quantify the benefits
Workflow: 15 / 15
Progress:
- Install LiteRT.js dependencies
- Convert existing model to .tflite format
- Set up LiteRT.js runtime with appropriate accelerator
- Load and compile the model
- Create input tensors with correct shape
- Run inference and handle outputs
- Clean up tensor resources
- Test performance vs existing solution
1. Installation and Setup
```bash
npm install @litertjs/core
# For TensorFlow.js integration:
npm install @litertjs/tfjs-interop
```
Copy wasm files from node_modules/@litertjs/core/wasm/ to your server.
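A minimal copy step might look like the sketch below; the paths are assumptions (adjust `public/` to wherever your server serves static files):

```shell
# Copy the LiteRT.js wasm assets next to your static files.
# "public/" is an assumed static root -- adjust for your setup.
mkdir -p public/wasm
if [ -d node_modules/@litertjs/core/wasm ]; then
  cp -r node_modules/@litertjs/core/wasm/. public/wasm/
else
  echo "run 'npm install @litertjs/core' first"
fi
```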
2. Model Conversion (PyTorch)
```python
import torch
import torchvision
import litert_torch

# Load your model
model = torchvision.models.resnet18(pretrained=True)
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert to LiteRT
edge_model = litert_torch.convert(model.eval(), sample_inputs)
edge_model.export('model.tflite')
```
3. Runtime Configuration
```javascript
import {loadLiteRt} from '@litertjs/core';

// Load the runtime from a CDN...
await loadLiteRt('https://cdn.jsdelivr.net/npm/@litertjs/core/wasm/');
// ...or from local hosting (call only one of the two):
// await loadLiteRt('your/path/to/wasm/');
```
4. Model Loading and Inference
```javascript
const model = await loadAndCompile('/model.tflite', {
  accelerator: 'webgpu', // best performance on Chromium browsers
});

// Multiple input formats supported:
const outputs1 = await model.run(inputTensor);
const outputs2 = await model.run([inputTensor1, inputTensor2]);
const outputs3 = await model.run({'input_name': inputTensor});
```
Recommendation:
Include a troubleshooting section with common error messages and their solutions
Examples: 18 / 20
Example 1: Image Classification
```javascript
// Input: 224x224 RGB image
const imageData = new Float32Array(224 * 224 * 3);
// ... fill with normalized pixel values
const inputTensor = new Tensor(imageData, [1, 3, 224, 224]);
const outputs = await model.run(inputTensor);
const predictions = await outputs[0].data();
// Output: Float32Array of class probabilities
```
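Filling `imageData` with normalized pixel values can be done with a small helper like the sketch below, which converts RGBA canvas pixels (NHWC, 0-255) into the NCHW layout the tensor shape above expects. The ImageNet mean/std constants are assumptions; use whatever your model was trained with.

```javascript
// Sketch: RGBA pixels (e.g. from canvas getImageData) -> normalized
// NCHW Float32Array matching the shape [1, 3, height, width].
function rgbaToNchw(pixels, width, height) {
  const mean = [0.485, 0.456, 0.406]; // assumed ImageNet statistics
  const std = [0.229, 0.224, 0.225];
  const out = new Float32Array(3 * width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const p = (y * width + x) * 4; // RGBA stride; alpha is dropped
      for (let c = 0; c < 3; c++) {
        // channel-major layout: all R values, then all G, then all B
        out[c * width * height + y * width + x] =
          (pixels[p + c] / 255 - mean[c]) / std[c];
      }
    }
  }
  return out;
}
```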
Example 2: TensorFlow.js Pipeline Integration
```javascript
import {runWithTfjsTensors} from '@litertjs/tfjs-interop';
import * as tf from '@tensorflow/tfjs';

// Use existing TensorFlow.js tensors
const tfInput = tf.randomNormal([1, 224, 224, 3]);
const tfOutputs = await runWithTfjsTensors(liteRtModel, [tfInput]);
// Returns TensorFlow.js tensors compatible with the existing pipeline
```
Example 3: Multiple Accelerator Fallback
```javascript
// Declare outside the try block so the model is usable afterwards
let model;
try {
  model = await loadAndCompile('/model.tflite', {accelerator: 'webgpu'});
} catch (e) {
  console.log('WebGPU not available, falling back to WASM');
  model = await loadAndCompile('/model.tflite', {accelerator: 'wasm'});
}
```
Recommendation:
Provide a complete end-to-end example with actual model files and expected outputs for easier testing
Best Practices
Performance Optimization:
- Use WebGPU accelerator for best performance on Chromium browsers
- Host WASM files locally rather than CDN for production
- Pre-warm models with dummy inputs for consistent latency
- Batch multiple inputs when possible
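The warm-up advice above can be folded into a small timing harness; `timeInference` and `runOnce` are hypothetical names, where `runOnce` is any async function wrapping `model.run` for a given accelerator:

```javascript
// Sketch: time the mean per-run latency of an async inference function.
// Warm-up runs amortize one-off compilation/upload costs before timing.
async function timeInference(runOnce, { warmup = 3, iters = 20 } = {}) {
  for (let i = 0; i < warmup; i++) await runOnce();
  const start = performance.now();
  for (let i = 0; i < iters; i++) await runOnce();
  return (performance.now() - start) / iters; // mean ms per run
}
```

Running it once with a 'webgpu' model and once with a 'wasm' model gives a rough accelerator comparison on the target device.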
Resource Management:
- Always call `tensor.delete()` after use to prevent memory leaks
- Clean up model outputs immediately after extracting data
- Use `model.getInputDetails()` and `model.getOutputDetails()` for debugging
TensorFlow.js Integration:
- Keep pre/post-processing in TensorFlow.js, only replace model inference
- Use `runWithTfjsTensors` for seamless tensor compatibility
- Test input/output tensor shapes after conversion; they may change
Model Conversion:
- Provide representative sample inputs during PyTorch conversion
- Test converted model outputs match original before deployment
- Use the debugging tools included with the LiteRT.js conversion path
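A simple way to check that converted outputs match the original is an element-wise comparison within a tolerance (e.g. against reference outputs exported from PyTorch as JSON). `outputsMatch` and the default tolerance are illustrative assumptions:

```javascript
// Sketch: compare two numeric arrays (e.g. Float32Array outputs)
// element-wise within an absolute tolerance.
function outputsMatch(expected, actual, atol = 1e-3) {
  if (expected.length !== actual.length) return false;
  for (let i = 0; i < expected.length; i++) {
    if (Math.abs(expected[i] - actual[i]) > atol) return false;
  }
  return true;
}
```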
Common Pitfalls
Tensor Shape Mismatches:
- Input dimensions may be reordered during conversion (NCHW vs NHWC)
- Use `model.getInputDetails()` to verify the expected input shape
- Transpose tensors if the layout changed: `tf.transpose(input, [0, 3, 1, 2])`
Memory Leaks:
- Forgetting to call `delete()` on tensors causes memory accumulation
- Not cleaning up output tensors after extracting data
- Creating tensors in loops without proper cleanup
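One way to avoid leaks in loops is a helper that guarantees cleanup even when inference throws. `withTensors` is a hypothetical wrapper, assuming tensors expose the `delete()` method described above:

```javascript
// Sketch: run an async function over tensors, then delete every tensor
// in a finally block so cleanup happens even on error.
async function withTensors(tensors, fn) {
  try {
    return await fn(tensors);
  } finally {
    for (const t of tensors) t.delete();
  }
}
```

Usage: `const result = await withTensors([inputTensor], (ts) => model.run(ts[0]));` (output tensors still need their own cleanup after extracting data).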
Accelerator Issues:
- WebGPU is not available in all browsers; always have a WASM fallback
- Not importing TensorFlow.js WebGPU backend when using interop
- Assuming WebGPU performance gains apply to all model types equally
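A feature check before choosing an accelerator can avoid relying on the try/catch fallback alone. `pickAccelerator` is a hypothetical helper keyed off `navigator.gpu`, the standard WebGPU entry point:

```javascript
// Sketch: choose 'webgpu' only when the WebGPU entry point exists.
// Pass the global navigator object; in environments without WebGPU
// (or without a navigator at all) this falls back to 'wasm'.
function pickAccelerator(nav) {
  return nav && nav.gpu ? 'webgpu' : 'wasm';
}
```

Usage: `const model = await loadAndCompile('/model.tflite', {accelerator: pickAccelerator(navigator)});` (note that `navigator.gpu` existing does not guarantee a usable adapter, so keep the try/catch fallback too).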
Integration Problems:
- Named inputs may have different names after conversion
- Model input/output order might change during conversion
- Mixing LiteRT.js Tensors with TensorFlow.js operations without interop layer