AI Skill Report Card

Web Scraping

B- · 72 · Jun 9, 2026 · Source: Extension-page
Quick Start: 15 / 15
Python
from bs4 import BeautifulSoup
import requests

# Fetch and parse a web page
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract specific elements
title = soup.find('title').text
links = [a['href'] for a in soup.find_all('a', href=True)]
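To make the pattern above concrete, the following minimal sketch runs the same extraction against an inline HTML string instead of a live URL, so both the input and the extracted result are visible. The snippet and the name `sample_html` are illustrative assumptions, not part of the original skill.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input standing in for response.content
sample_html = """
<html>
  <head><title>Example Domain</title></head>
  <body>
    <a href="/about">About</a>
    <a href="https://example.com/contact">Contact</a>
    <a name="top">Anchor without href</a>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Same extraction as the quick-start snippet
title = soup.find("title").text
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title)  # Example Domain
print(links)  # ['/about', 'https://example.com/contact']
```

The third anchor has no `href` attribute, so the `href=True` filter leaves it out of the result.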
Recommendation
Add concrete input/output examples showing actual HTML snippets and extracted results rather than just code patterns
Workflow: 12 / 15
  1. Send HTTP Request - Fetch the web page content
  2. Parse HTML - Use BeautifulSoup to create a navigable tree
  3. Extract Data - Target specific elements using selectors
  4. Clean Data - Remove unwanted characters and normalize text
  5. Store Results - Save to file or database
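The five steps above can be sketched end to end as follows. This is one possible arrangement rather than the skill's own implementation: the function names `fetch`, `extract_headings`, and `to_csv`, the choice of `h2` elements as the target, and the sample HTML are all assumptions made for illustration. The demonstration at the bottom uses an inline snippet so it runs without network access.

```python
import csv
import io
import re

import requests
from bs4 import BeautifulSoup


def fetch(url: str) -> str:
    """Step 1: send the HTTP request and return the page body as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response.text


def extract_headings(html: str) -> list[str]:
    """Steps 2-4: parse the HTML, pick out elements, and clean their text."""
    soup = BeautifulSoup(html, "html.parser")           # step 2: parse
    raw = [h.get_text() for h in soup.find_all("h2")]   # step 3: extract
    # Step 4: collapse runs of whitespace and trim both ends
    return [re.sub(r"\s+", " ", text).strip() for text in raw]


def to_csv(headings: list[str]) -> str:
    """Step 5: serialize the results as CSV (into a string here)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["heading"])
    writer.writerows([h] for h in headings)
    return buffer.getvalue()


# Offline demonstration: a hypothetical snippet in place of fetch(url)
sample_html = "<h2>  First\n section </h2><p>body</p><h2>Second\tsection</h2>"
headings = extract_headings(sample_html)
csv_text = to_csv(headings)
print(headings)  # ['First section', 'Second section']
```

Against a real page, `extract_headings(fetch(url))` would chain the first step in; writing `csv_text` to a file or database completes step 5.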

Progress:

  • Identify target elements
  • Write CSS selectors
  • Test extraction logic
  • Handle edge cases
  • Validate output
Recommendation
Include error handling templates and common HTTP status code scenarios in the workflow
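One possible shape for such an error-handling template is sketched below. The function name `fetch_with_retries`, the set of status codes treated as retryable, and the linear backoff policy are assumptions chosen for illustration; a real scraper may need different rules per site.

```python
import time

import requests

# Status codes treated as transient here; this set is an assumption
RETRYABLE = {429, 500, 502, 503, 504}


def fetch_with_retries(url, session=None, max_attempts=3, backoff=1.0):
    """Return the page text, or None if the page cannot be fetched."""
    if session is None:
        session = requests.Session()

    for attempt in range(1, max_attempts + 1):
        try:
            response = session.get(url, timeout=10)
        except requests.RequestException:
            # Network-level failure (DNS error, connection reset, timeout)
            response = None

        if response is not None:
            if response.status_code == 200:
                return response.text
            if response.status_code not in RETRYABLE:
                # e.g. 403 Forbidden or 404 Not Found: retrying will not help
                return None

        if attempt < max_attempts:
            time.sleep(backoff * attempt)  # simple linear backoff

    return None
```

Returning `None` instead of raising keeps the calling loop simple; code that needs to distinguish a missing page from a network failure would want to raise or return the status code instead.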
Examples: 15 / 20

Example 1: Extract All Links
Input: HTML page with multiple anchor tags
Output:

Python
links = soup.find_all('a')
urls = [link.get('href') for link in links if link.get('href')]

Example 2: Extract Text Content
Input: <div class="content">Hello World</div>
Output:

Python
content = soup.find('div', class_='content').text.strip()
# Result: "Hello World"

Example 3: Extract Table Data
Input: HTML table
Output:

Python
table = soup.find('table')
rows = [
    [cell.text.strip() for cell in row.find_all(['td', 'th'])]
    for row in table.find_all('tr')
]
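Example 3 does not show what the input table or the extracted rows look like. The sketch below fills that in with a small hypothetical table; the column names, the values, and the extra step that turns rows into dictionaries are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical input table
sample_html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td> Widget </td><td>4.50</td></tr>
  <tr><td>Gadget</td><td> 12.00 </td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table")

rows = []
if table is not None:  # guard against pages without a table
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        rows.append(cells)

# Treat the first row as the header (assumes at least one row was found)
header, *body = rows
records = [dict(zip(header, row)) for row in body]

print(rows)
# [['Name', 'Price'], ['Widget', '4.50'], ['Gadget', '12.00']]
print(records)
# [{'Name': 'Widget', 'Price': '4.50'}, {'Name': 'Gadget', 'Price': '12.00'}]
```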
Recommendation
Provide a complete working example that demonstrates the full pipeline from URL to cleaned data output
  • Respect robots.txt - Check site's crawling policies
  • Add delays - Use time.sleep() between requests
  • Handle errors - Wrap requests in try/except blocks
  • Use headers - Set User-Agent to avoid blocking
  • Parse incrementally - Process large pages in chunks
  • Cache responses - Store HTML locally to avoid re-fetching
  • Don't scrape without checking Terms of Service
  • Don't ignore rate limiting - sites may block aggressive scrapers
  • Don't assume elements exist - .find() returns None when nothing matches, so check the result before accessing it
  • Don't ignore encoding issues - specify encoding when parsing
  • Don't scrape JavaScript-rendered content with plain HTTP requests - use a browser automation tool such as Selenium instead
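
Several of the practices above can be illustrated together in a short sketch: checking robots.txt rules, identifying the scraper with a User-Agent header, pacing requests, guarding against missing elements, and stating the encoding when parsing raw bytes. The helper names (`allowed_by_robots`, `paced`, `safe_text`), the `USER_AGENT` value, and the sample data are hypothetical and chosen only for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

from bs4 import BeautifulSoup

USER_AGENT = "example-scraper/0.1"     # hypothetical identifier
HEADERS = {"User-Agent": USER_AGENT}   # pass as requests.get(url, headers=HEADERS)


def allowed_by_robots(robots_txt: str, url: str) -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


def paced(urls, delay_seconds=1.0):
    """Yield URLs one at a time, sleeping between them to limit request rate."""
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)
        yield url


def safe_text(soup, name, default="", **attrs):
    """Return the stripped text of the first match, or a default if absent."""
    element = soup.find(name, **attrs)
    return element.get_text(strip=True) if element is not None else default


# Hypothetical sample data
robots_txt = "User-agent: *\nDisallow: /private/\n"
html_bytes = "<div class='content'>Caf\u00e9</div>".encode("utf-8")

# from_encoding tells BeautifulSoup how to decode the raw bytes
soup = BeautifulSoup(html_bytes, "html.parser", from_encoding="utf-8")

print(allowed_by_robots(robots_txt, "https://example.com/private/data"))  # False
print(safe_text(soup, "div", class_="content"))                           # Café
print(safe_text(soup, "h1", default="(missing)"))                         # (missing)
```

In a real run the robots.txt text would come from the target site, and each URL yielded by `paced` would be fetched with the `HEADERS` shown above.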
Grade B-
AI Skill Framework
Scorecard
Criteria Breakdown
Quick Start: 15/15
Workflow: 12/15
Examples: 15/20
Completeness: 15/20
Format: 15/15
Conciseness: 12/15