AI Skill Report Card

Generated Skill

B- · 70 · Apr 19, 2026 · Source: Web
YAML
---
name: extracting-itau-tax-data
description: Extracts data from Itaú income statements to generate structured tables with Type, Code, Company Name, CNPJ and Value columns. Use when processing Brazilian tax documents or Itaú bank statements.
---

Extracting Itaú Tax Data

Python
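import re

import pandas as pd


def extract_itau_data(text):
    data = []
    # Section 1.1 mapping. The capture group takes the whole amount token
    # (e.g. "5.234,67"), so the thousands separator is not read as a decimal point.
    if match := re.search(r'1\.1.*?Valor Líquido Total.*?(\d+(?:[.,]\d+)*)', text, re.DOTALL):
        data.append({
            'Tipo': 'Rendimentos Sujeitos à Tributação Exclusiva/Definitiva',
            'Código': '10',
            # extract_company_name, extract_cnpj and clean_value are helpers
            # that must be defined separately; this snippet does not include them.
            'Razão Social': extract_company_name(text),
            'CNPJ': extract_cnpj(text),
            'Valor': clean_value(match.group(1)),
        })
    return pd.DataFrame(data)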
Recommendation
Consider adding more specific examples

Progress:

  • Extract text from PDF/document
  • Identify sections 1.1, 1.2, and 3
  • Map each section to corresponding Type and Code
  • Extract company details (Razão Social, CNPJ)
  • Clean and format values
  • Generate final table

Step 1: Section Identification

Look for these specific patterns:

  • 1.1 - Creditados e Pagos → Type: 'Rendimentos Sujeitos à Tributação Exclusiva/Definitiva', Code: '10'
  • 1.2 - Juros sobre capital próprio a receber → Type: 'Bens e Direitos', Code: '99-07'
  • 3 - Rendimentos isentos e não tributáveis → Type: 'Rendimentos Isentos e Não Tributáveis', Code: '09'
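
The mapping above can be held in a small lookup table. The following is a minimal sketch, not the skill's actual implementation: the names `SECTION_MAP`, `SECTION_HEADER`, and `find_sections` are hypothetical, and it assumes each section header starts a line in the form "number - title", as in the patterns listed above.

```python
import re

# Section number -> (Tipo, Código), copied from the list above.
SECTION_MAP = {
    "1.1": ("Rendimentos Sujeitos à Tributação Exclusiva/Definitiva", "10"),
    "1.2": ("Bens e Direitos", "99-07"),
    "3": ("Rendimentos Isentos e Não Tributáveis", "09"),
}

# Assumption: a header is the section number at the start of a line,
# followed by a hyphen and the section title.
SECTION_HEADER = re.compile(r"^\s*(1\.1|1\.2|3)\s*-\s*\S", re.MULTILINE)


def find_sections(text):
    """Return the section numbers found in the text, in document order."""
    return [m.group(1) for m in SECTION_HEADER.finditer(text)]
```

Calling `find_sections` first makes it easy to see which of the three sections a given document actually contains before any values are extracted.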

Step 2: Value Extraction

  • Section 1.1: Extract "Valor Líquido Total"
  • Section 1.2: Extract "Valor Líquido"
  • Section 3: Extract "Valor Total"
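
As an illustration of the label-per-section rule above, here is a hedged sketch of value extraction. `VALUE_LABELS`, `AMOUNT`, and `extract_value` are hypothetical names, and this `clean_value` is one possible implementation of the helper of the same name. It assumes the amount follows its label directly (optionally after a colon and "R$") and uses Brazilian formatting; returning `Decimal` rather than `float` is a choice made for this sketch.

```python
import re
from decimal import Decimal

# Label to look for in each section, as listed above.
VALUE_LABELS = {
    "1.1": "Valor Líquido Total",
    "1.2": "Valor Líquido",
    "3": "Valor Total",
}

# Brazilian amount: optional thousands dots, a decimal comma, two decimals.
AMOUNT = r"(\d{1,3}(?:\.\d{3})*,\d{2}|\d+,\d{2})"


def clean_value(raw):
    """Convert '5.234,67' to Decimal('5234.67')."""
    return Decimal(raw.replace(".", "").replace(",", "."))


def extract_value(section_text, section):
    """Return the amount that follows the section's label, or None."""
    label = re.escape(VALUE_LABELS[section])
    match = re.search(label + r"\s*:?\s*(?:R\$\s*)?" + AMOUNT, section_text)
    return clean_value(match.group(1)) if match else None
```

Because the amount must follow the label immediately, "Valor Líquido" does not match a "Valor Líquido Total" line, and "Valor Bruto" lines are ignored.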

Step 3: Company Data

Extract from document header:

  • Razão Social: Usually after "Banco Itaú" or institution identifier
  • CNPJ: Format XX.XXX.XXX/XXXX-XX
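
The CNPJ format above translates directly into a regular expression. This sketch shows one way to extract and format-check a CNPJ; `CNPJ_PATTERN` and `is_valid_cnpj_format` are hypothetical names, and this `extract_cnpj` is one possible implementation of the helper of the same name. It checks the punctuation layout only, not the check digits. The Razão Social is left out because the header layout is not specified precisely enough to sketch.

```python
import re

# XX.XXX.XXX/XXXX-XX, digits only in the X positions.
CNPJ_PATTERN = re.compile(r"\b\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\b")


def extract_cnpj(text):
    """Return the first formatted CNPJ in the text, or None."""
    match = CNPJ_PATTERN.search(text)
    return match.group(0) if match else None


def is_valid_cnpj_format(value):
    """True only if the whole string is a CNPJ in XX.XXX.XXX/XXXX-XX form."""
    return CNPJ_PATTERN.fullmatch(value) is not None
```

For example, `extract_cnpj("Banco Itaú Unibanco S.A. CNPJ: 60.701.190/0001-04")` returns the formatted number, while an unpunctuated 14-digit string fails `is_valid_cnpj_format`.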
Recommendation
Include edge cases

Example 1:
Input: Document with section "1.1 - Creditados e Pagos" showing "Valor Líquido Total: R$ 5.234,67"
Output:

| Tipo | Código | Razão Social | CNPJ | Valor |
|------|--------|--------------|------|-------|
| Rendimentos Sujeitos à Tributação Exclusiva/Definitiva | 10 | Banco Itaú Unibanco S.A. | 60.701.190/0001-04 | 5234.67 |

Example 2:
Input: Section "3 - Rendimentos isentos e não tributáveis" with "Valor Total: R$ 1.500,00"
Output:

| Tipo | Código | Razão Social | CNPJ | Valor |
|------|--------|--------------|------|-------|
| Rendimentos Isentos e Não Tributáveis | 09 | Banco Itaú Unibanco S.A. | 60.701.190/0001-04 | 1500.00 |

  • Always validate CNPJ format before processing
  • Convert comma decimal separators to dots for numeric operations
  • Handle missing sections gracefully (empty rows or skip)
  • Preserve original currency formatting in display
  • Use regex with DOTALL flag for multi-line section matching
  • Don't confuse "Valor Bruto" with "Valor Líquido" - always use the specified value type
  • Don't assume all three sections exist in every document
  • Don't hardcode company names - extract from document
  • Don't ignore decimal formatting differences (comma vs dot)
  • Don't process documents without proper section headers
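
Several of the points above (missing sections, "Valor Bruto" versus "Valor Líquido", documents without section headers) can be combined in one pass. The sketch below is an assumption-laden illustration rather than the skill's implementation: `SECTIONS`, `HEADER`, `AMOUNT`, and `build_rows` are hypothetical names, headers are assumed to start a line as "number - title", and a section whose expected label is absent is skipped rather than emitted as an empty row. The amount is kept as the original string; numeric conversion would be a separate step.

```python
import re

# Section number -> (Tipo, Código, value label), from the steps above.
SECTIONS = {
    "1.1": ("Rendimentos Sujeitos à Tributação Exclusiva/Definitiva", "10",
            "Valor Líquido Total"),
    "1.2": ("Bens e Direitos", "99-07", "Valor Líquido"),
    "3": ("Rendimentos Isentos e Não Tributáveis", "09", "Valor Total"),
}
HEADER = re.compile(r"^\s*(1\.1|1\.2|3)\s*-\s*\S", re.MULTILINE)
AMOUNT = r"(\d{1,3}(?:\.\d{3})*,\d{2}|\d+,\d{2})"


def build_rows(text):
    """Build one row per section that is present and has its expected value."""
    headers = list(HEADER.finditer(text))
    if not headers:
        # Refuse documents without any recognised section header.
        raise ValueError("no recognised section headers found")
    rows = []
    for i, header in enumerate(headers):
        # A section body runs until the next header or the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.start():end]
        tipo, codigo, label = SECTIONS[header.group(1)]
        found = re.search(re.escape(label) + r"\s*:?\s*(?:R\$\s*)?" + AMOUNT, body)
        if found is None:
            continue  # expected label missing in this section: skip it
        rows.append({"Tipo": tipo, "Código": codigo, "Valor": found.group(1)})
    return rows
```

Limiting each search to its own section body keeps a value from one section from being attributed to another when a section is present but incomplete.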
Grade: B-
AI Skill Framework
Scorecard
Criteria Breakdown

| Criterion | Score |
|-----------|-------|
| Quick Start | 11/15 |
| Workflow | 11/15 |
| Examples | 15/20 |
| Completeness | 15/20 |
| Format | 11/15 |
| Conciseness | 11/15 |