AI Skill Report Card

Detecting File Types

Grade: A- (92) · Apr 25, 2026 · Source: Extension-page

File Type Detection with Magika

Quick Start: 15 / 15
Python
from magika import Magika

# Initialize once (loads the model)
m = Magika()

# Detect from bytes
result = m.identify_bytes(b'function log(msg) {console.log(msg);}')
print(result.output.label)  # javascript

# Detect from file path
result = m.identify_path('./document.pdf')
print(result.output.label)  # pdf

# Detect from stream
with open('./file.bin', 'rb') as f:
    result = m.identify_stream(f)
    print(result.output.label)
Recommendation
Add installation instructions (pip install magika) to the Quick Start so it can be run immediately.
Workflow: 14 / 15

Single File Detection

  1. Initialize Magika - One-time setup loads the ML model
  2. Choose input method - bytes, file path, or stream
  3. Call identify method - Returns result with confidence score
  4. Extract information - label, MIME type, description, extensions
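
The four steps above can be sketched as follows. This is a minimal sketch, not the skill's own code: it assumes a Magika 0.6-style result whose `output` exposes `label`, `mime_type`, `description`, and `extensions` (attribute names may differ in other versions). The helper names `summarize` and `detect_bytes` are hypothetical.

```python
def summarize(result):
    """Step 4: collect the commonly used fields from a detection result.

    Assumes a Magika 0.6-style result object (an assumption, see lead-in).
    """
    out = result.output
    return {
        "label": out.label,
        "mime_type": out.mime_type,
        "description": out.description,
        "extensions": list(out.extensions),
    }


def detect_bytes(detector, data):
    """Steps 2-3: use the bytes input method, then call identify."""
    return summarize(detector.identify_bytes(data))


# Usage (requires `pip install magika`):
#   from magika import Magika
#   m = Magika()  # step 1: initialize once
#   print(detect_bytes(m, b"function log(msg) {console.log(msg);}"))
```

Passing the detector in as an argument keeps the one-time initialization in the caller's hands.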

Batch Processing

Python
from magika import Magika

# Process multiple files with a single Magika instance
m = Magika()
files = ['file1.bin', 'file2.txt', 'file3.unknown']

for file_path in files:
    result = m.identify_path(file_path)
    print(f"{file_path}: {result.output.label} ({result.output.description})")

Progress checklist for large batches:

  • Initialize Magika once (not per file)
  • Collect file paths
  • Process in loop with error handling
  • Log results for verification
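
The checklist above might translate into a loop like the one below. It is a sketch under assumptions: `detect_batch` is a hypothetical helper, the detector is any object with an `identify_path` method (such as a `Magika` instance), and failures are caught broadly because how Magika reports a missing or unreadable file (an exception or a non-ok result) can depend on the version.

```python
import logging

logger = logging.getLogger("detect_batch")


def detect_batch(detector, paths):
    """Return {path: label or None}, reusing one detector for every file."""
    results = {}
    for path in paths:
        key = str(path)
        try:
            result = detector.identify_path(path)
            results[key] = result.output.label
        except Exception as exc:  # e.g. missing or unreadable file
            logger.warning("detection failed for %s: %s", key, exc)
            results[key] = None
        else:
            logger.info("%s -> %s", key, results[key])
    return results


# Usage (requires `pip install magika`):
#   from magika import Magika
#   m = Magika()  # initialize once, outside the loop
#   labels = detect_batch(m, ["file1.bin", "file2.txt", "file3.unknown"])
```

Recording `None` for failed files keeps one bad path from aborting the batch, and the log gives a trail for verification.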
Recommendation
Include a short template section showing a complete file-processing script.
Examples: 20 / 20

Example 1: JavaScript Detection
Input: b'function log(msg) {console.log(msg);}'
Output: javascript (label), text/javascript (MIME), score: 0.997

Example 2: Unknown Binary File
Input: Random binary data
Output: unknown (label), application/octet-stream (MIME)

Example 3: Command Line Bulk Processing

Bash
# Recursive directory scan
magika -r /path/to/files --json > results.json

# Process with confidence scores
magika file1.bin file2.txt --output-score

Example 4: Stream Processing

Python
import io

from magika import Magika

m = Magika()
data = b'<!DOCTYPE html><html><body>Hello</body></html>'
stream = io.BytesIO(data)
result = m.identify_stream(stream)
# Output: html
Recommendation
Add more edge-case examples, such as permission errors, network files, or corrupted files.

Performance:

  • Initialize Magika once per application/script
  • Model loading is a one-time overhead
  • Inference takes about 5 ms per file, largely independent of file size
  • Use batch processing for multiple files

Confidence Handling:

Python
# Check confidence before trusting the result
# (in Magika 0.6+ the score is exposed on the result itself)
if result.score > 0.8:
    file_type = result.output.label
else:
    file_type = "unknown_low_confidence"

Error Handling:

Python
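def detect_label(m, file_path):
    try:
        result = m.identify_path(file_path)
    except Exception:
        return "detection_failed"
    # A non-ok status (e.g. file not found) also counts as a failure
    return result.output.label if result.ok else "detection_failed"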

Memory Efficiency:

  • Use identify_stream() for large files
  • Magika only reads a subset of file content
  • Memory usage is constant regardless of file size

Don't reinstantiate Magika repeatedly:

Python
# BAD - loads the model each time
for file in files:
    m = Magika()  # Expensive!
    result = m.identify_path(file)

# GOOD - reuse the instance
m = Magika()  # Load once
for file in files:
    result = m.identify_path(file)

Don't ignore confidence scores:

  • Scores below 0.5 may indicate generic detection
  • Very low scores suggest truly unknown formats
  • Use appropriate thresholds for your use case

Don't assume file extensions match content:

  • Magika detects actual content, not extension
  • Use for validation: detected != expected_from_extension
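
One way to run this validation is to compare the file's suffix with the extensions Magika reports for the detected type. The sketch below assumes a Magika 0.6-style `result.output.extensions` list; `extension_matches` and `check_file` are hypothetical helpers.

```python
from pathlib import Path


def extension_matches(path, detected_extensions):
    """True if the file's suffix is among the extensions of the detected type."""
    suffix = Path(path).suffix.lower().lstrip(".")
    expected = {ext.lower().lstrip(".") for ext in detected_extensions}
    return suffix in expected


def check_file(detector, path):
    """Return (label, matches) for one file; `detector` is a Magika-like object."""
    result = detector.identify_path(path)
    return result.output.label, extension_matches(path, result.output.extensions)


# Usage (requires `pip install magika`):
#   from magika import Magika
#   m = Magika()
#   label, ok = check_file(m, "./document.pdf")
#   if not ok:
#       print(f"extension does not match detected type: {label}")
```

A mismatch is only a signal worth reviewing: a file with no suffix, or with an unusual but legitimate one, will also be flagged.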

Don't process empty files without checking:

Python
import os

# Handle edge cases
if os.path.getsize(file_path) == 0:
    file_type = "empty_file"
Grade A-
AI Skill Framework

Scorecard

Criteria Breakdown

  • Quick Start: 15/15
  • Workflow: 14/15
  • Examples: 20/20
  • Completeness: 14/20
  • Format: 15/15
  • Conciseness: 14/15