AI Skill Report Card

Web Scraping Analysis

B+ · 78 · Jun 14, 2026 · Source: Web

Web Scraping and Analysis

15 / 15
Python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Basic web scraping
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Extract specific elements
titles = soup.find_all('h2', class_='title')
data = [title.get_text().strip() for title in titles]
print(data)
Recommendation
Add error handling patterns and retry mechanisms in the workflow section
13 / 15

Progress:

  • Identify target website and data requirements
  • Choose scraping method (requests vs selenium)
  • Inspect HTML structure and identify selectors
  • Handle rate limiting and headers
  • Extract and clean data
  • Store results in structured format
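
The "Handle rate limiting and headers" item can start with a robots.txt check before any page is requested. The following is a minimal sketch using only the standard library; the robots.txt content is a made-up sample inlined so the code runs offline, and `build_parser`, `is_allowed`, and `polite_delay` are hypothetical helper names, not part of the original skill.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without a network.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent="*"):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser, user_agent="*", default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://example.com/products"))
print(is_allowed(parser, "https://example.com/private/data"))
print(polite_delay(parser))
```

Against a live site, the same parser can be pointed at the real file with `set_url(...)` followed by `read()`, and the value from `polite_delay` passed to `time.sleep` between requests.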

Step-by-step Process:

  1. Setup and Headers

    Python
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    session = requests.Session()
    session.headers.update(headers)
  2. Handle Dynamic Content (if needed)

    Python
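    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait up to 10 seconds for the target element to be present
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "target-class"))
        )
    finally:
        driver.quit()  # Always release the browser process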
  3. Extract and Structure Data

    Python
    results = []
    for item in soup.find_all('div', class_='item'):
        data = {
            'title': item.find('h3').get_text().strip(),
            'price': item.find('span', class_='price').get_text(),
            'url': item.find('a')['href']
        }
        results.append(data)
    df = pd.DataFrame(results)
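
Step 3 assumes every item contains an `h3`, a price `span`, and a link; if one is missing, `find` returns `None` and the chained call raises. The sketch below shows one possible defensive variant together with a concrete input and the structure it yields. The HTML is invented sample markup, and `text_or_none` and `extract_items` are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second item deliberately has no price element.
SAMPLE_HTML = """
<div class="item"><h3> Widget </h3><span class="price">$9.99</span><a href="/widget">View</a></div>
<div class="item"><h3>Gadget</h3><a href="/gadget">View</a></div>
"""


def text_or_none(node):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return node.get_text(strip=True) if node is not None else None


def extract_items(html):
    """Build one dict per item, using None for any field that is absent."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.find_all("div", class_="item"):
        link = item.find("a")
        results.append({
            "title": text_or_none(item.find("h3")),
            "price": text_or_none(item.find("span", class_="price")),
            "url": link.get("href") if link is not None else None,
        })
    return results


items = extract_items(SAMPLE_HTML)
print(items)
# [{'title': 'Widget', 'price': '$9.99', 'url': '/widget'},
#  {'title': 'Gadget', 'price': None, 'url': '/gadget'}]
```

The resulting list of dicts can be passed to `pd.DataFrame` as in step 3; missing fields then appear as empty cells instead of aborting the run.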
Recommendation
Include concrete input/output examples showing actual scraped data structure rather than just code snippets
15 / 20

Example 1:
Input: Scrape product listings from an e-commerce site
Output:

Python
products = []
for product in soup.find_all('div', class_='product-item'):
    products.append({
        'name': product.find('h4').text.strip(),
        'price': product.find('span', class_='price').text,
        'rating': len(product.find_all('i', class_='star-filled'))
    })

Example 2:
Input: Extract news headlines with timestamps
Output:

Python
articles = []
for article in soup.select('article.news-item'):
    articles.append({
        'headline': article.select_one('h2').text.strip(),
        'timestamp': article.select_one('time')['datetime'],
        'summary': article.select_one('.summary').text.strip()
    })
Recommendation
Add templates for common scraping patterns (pagination, form submission, authentication) to improve completeness
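
Of the patterns this recommendation names, pagination is probably the simplest to template. The sketch below is one possible shape, not part of the original skill: it assumes each page links to its successor through an `a.next` element (a hypothetical selector), and it takes the download step as a `fetch_html` callable so the loop can be exercised without a network.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

NEXT_SELECTOR = "a.next"  # assumption: adjust to the target site's markup


def crawl_pages(start_url, fetch_html, max_pages=50):
    """Yield (url, soup) for each page, following "next" links until none remain."""
    url = start_url
    seen = set()
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guards against pages that link back to themselves
        soup = BeautifulSoup(fetch_html(url), "html.parser")
        yield url, soup
        next_link = soup.select_one(NEXT_SELECTOR)
        href = next_link.get("href") if next_link is not None else None
        # Resolve relative links against the current page URL
        url = urljoin(url, href) if href else None
```

With a `requests` session, `fetch_html` could be something like `lambda u: session.get(u, timeout=10).text`, with a delay added between iterations to respect rate limits.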
Best practices:

  • Always check robots.txt and respect rate limits
  • Use sessions for multiple requests to the same domain
  • Implement exponential backoff for failed requests
  • Cache responses when possible to avoid redundant requests
  • Use CSS selectors for precise element targeting
  • Handle encoding issues with proper charset detection
  • Store raw HTML for debugging complex parsing issues

Common pitfalls:

  • Don't scrape too aggressively; implement delays between requests
  • Don't ignore HTTP status codes; handle 404s and 403s explicitly
  • Don't assume HTML structure is consistent across pages
  • Don't forget to close selenium drivers to avoid memory leaks
  • Don't hardcode selectors without fallback options
  • Don't ignore JavaScript-rendered content when present
  • Don't scrape without checking if an API exists first
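
The advice on exponential backoff and on handling HTTP status codes can be combined in one small helper. What follows is a minimal sketch under stated assumptions: `fetch_with_backoff` and `RETRYABLE_STATUS` are hypothetical names, the set of retryable codes is a common convention and not something the report specifies, and the request itself is passed in as a `send` callable so the retry logic stays independent of any HTTP library.

```python
import time

# Assumption: treat rate limiting and transient server errors as retryable.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def fetch_with_backoff(send, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(url) until the status is not retryable or attempts run out.

    Non-retryable statuses such as 403 and 404 are returned immediately so the
    caller can handle them explicitly instead of hammering the server.
    """
    for attempt in range(max_attempts):
        response = send(url)
        if response.status_code not in RETRYABLE_STATUS:
            return response
        if attempt < max_attempts - 1:
            # Delay doubles on each retry: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return response  # last response, still carrying a retryable status
```

With `requests`, `send` could be `lambda u: session.get(u, timeout=10)`. Connection errors raise exceptions and are not covered by this sketch; a fuller version would catch them and retry on the same schedule.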
Grade B+
AI Skill Framework

Scorecard

Criteria Breakdown

  • Quick Start: 15/15
  • Workflow: 13/15
  • Examples: 15/20
  • Completeness: 8/20
  • Format: 15/15
  • Conciseness: 12/15