AI Skill Report Card
Generated Skill
```yaml
---
name: extracting-itau-tax-data
description: Extracts data from Itaú income statements to generate structured tables with Type, Code, Company Name, CNPJ and Value columns. Use when processing Brazilian tax documents or Itaú bank statements.
---
```
Extracting Itaú Tax Data
Quick Start
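```python
import re

import pandas as pd


def extract_itau_data(text):
    """Build the output table. Only section 1.1 is handled in this quick start."""
    # extract_company_name, extract_cnpj and clean_value are helpers that this
    # snippet calls but does not define.
    data = []
    # Section 1.1 mapping. The group captures the whole numeric token
    # (e.g. "5.234,67"), not just the digits around the first separator.
    pattern = r'1\.1.*?Valor Líquido Total.*?(\d[\d.,]*\d)'
    if match := re.search(pattern, text, re.DOTALL):
        data.append({
            'Tipo': 'Rendimentos Sujeitos à Tributação Exclusiva/Definitiva',
            'Código': '10',
            'Razão Social': extract_company_name(text),
            'CNPJ': extract_cnpj(text),
            'Valor': clean_value(match.group(1)),
        })
    return pd.DataFrame(data)
```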
Recommendation: Consider adding more specific examples.
Workflow
Progress:
- Extract text from PDF/document
- Identify sections 1.1, 1.2, and 3
- Map each section to corresponding Type and Code
- Extract company details (Razão Social, CNPJ)
- Clean and format values
- Generate final table
Step 1: Section Identification
Look for these specific patterns:
- 1.1 - Creditados e Pagos → Type: 'Rendimentos Sujeitos à Tributação Exclusiva/Definitiva', Code: '10'
- 1.2 - Juros sobre capital próprio a receber → Type: 'Bens e Direitos', Code: '99-07'
- 3 - Rendimentos isentos e não tributáveis → Type: 'Rendimentos Isentos e Não Tributáveis', Code: '09'
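The section-to-Type/Code mapping above can be kept in a small lookup table. The sketch below is one way to do that; `SECTION_MAP` and `find_sections` are hypothetical names, and the heading layout `<number> - <heading>` is an assumption taken from the patterns listed in this step.

```python
import re

# Section number -> (heading text, Type, Code), as listed in Step 1.
SECTION_MAP = {
    "1.1": ("Creditados e Pagos",
            "Rendimentos Sujeitos à Tributação Exclusiva/Definitiva", "10"),
    "1.2": ("Juros sobre capital próprio a receber", "Bens e Direitos", "99-07"),
    "3": ("Rendimentos isentos e não tributáveis",
          "Rendimentos Isentos e Não Tributáveis", "09"),
}


def find_sections(text):
    """Return the section numbers whose heading appears in the text."""
    found = []
    for number, (heading, _type, _code) in SECTION_MAP.items():
        # Assumed layout: "<number> - <heading>" with flexible spacing.
        # The lookbehind keeps "3" from matching inside e.g. "1.3".
        pattern = rf"(?<![\d.]){re.escape(number)}\s*-\s*{re.escape(heading)}"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(number)
    return found
```

Keeping the mapping as data rather than as separate `if` branches means a new section only needs a new dictionary entry.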
Step 2: Value Extraction
- Section 1.1: Extract "Valor Líquido Total"
- Section 1.2: Extract "Valor Líquido"
- Section 3: Extract "Valor Total"
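A minimal sketch of the per-section label lookup follows. It assumes the amount sits right after the label, optionally preceded by `:` and `R$`, and uses Brazilian formatting (dots for thousands, comma for decimals), as in the Examples section. `VALUE_LABELS` and `extract_value` are hypothetical names, and `clean_value` is only one possible shape for the helper the Quick Start calls. How the text of a single section is sliced out of the document is not covered here.

```python
import re
from decimal import Decimal

# Value label per section, from Step 2.
VALUE_LABELS = {
    "1.1": "Valor Líquido Total",
    "1.2": "Valor Líquido",
    "3": "Valor Total",
}

# Assumed amount shape: digits with optional thousands dots, comma, two decimals.
AMOUNT = r"(\d[\d.]*,\d{2})"


def clean_value(raw):
    """Convert a Brazilian-formatted amount such as '5.234,67' to a Decimal."""
    return Decimal(raw.replace(".", "").replace(",", "."))


def extract_value(section_text, section):
    """Return the labelled amount found in one section's text, or None."""
    label = re.escape(VALUE_LABELS[section])
    # Only ':', whitespace and 'R$' may sit between label and amount, so
    # 'Valor Líquido' does not accidentally match 'Valor Líquido Total'.
    match = re.search(rf"{label}\s*:?\s*(?:R\$)?\s*{AMOUNT}", section_text)
    return clean_value(match.group(1)) if match else None
```

`Decimal` is used instead of `float` so that monetary values are not subject to binary rounding. Switching to `float` is a one-line change if the downstream table expects it.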
Step 3: Company Data
Extract from document header:
- Razão Social: Usually after "Banco Itaú" or institution identifier
- CNPJ: Format XX.XXX.XXX/XXXX-XX
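The CNPJ format above translates directly into a regular expression. The sketch below is one possible shape for the `extract_cnpj` helper used in the Quick Start; `CNPJ_PATTERN` and `is_cnpj_format` are hypothetical names. It checks the punctuated shape only, not the check digits, and it does not attempt Razão Social extraction because the header layout is not specified here.

```python
import re

# CNPJ in the punctuated form XX.XXX.XXX/XXXX-XX.
CNPJ_PATTERN = re.compile(r"\b\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2}\b")


def extract_cnpj(text):
    """Return the first CNPJ-shaped string in the text, or None."""
    match = CNPJ_PATTERN.search(text)
    return match.group(0) if match else None


def is_cnpj_format(value):
    """True if the whole string has the XX.XXX.XXX/XXXX-XX shape."""
    return CNPJ_PATTERN.fullmatch(value) is not None
```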
Recommendation: Include edge cases.
Examples
Example 1:
Input: Document with section "1.1 - Creditados e Pagos" showing "Valor Líquido Total: R$ 5.234,67"
Output:
| Tipo | Código | Razão Social | CNPJ | Valor |
|------|--------|--------------|------|-------|
| Rendimentos Sujeitos à Tributação Exclusiva/Definitiva | 10 | Banco Itaú Unibanco S.A. | 60.701.190/0001-04 | 5234.67 |
Example 2:
Input: Section "3 - Rendimentos isentos e não tributáveis" with "Valor Total: R$ 1.500,00"
Output:
| Tipo | Código | Razão Social | CNPJ | Valor |
|------|--------|--------------|------|-------|
| Rendimentos Isentos e Não Tributáveis | 09 | Banco Itaú Unibanco S.A. | 60.701.190/0001-04 | 1500.00 |
Best Practices
- Always validate CNPJ format before processing
- Convert comma decimal separators to dots for numeric operations
- Handle missing sections gracefully (empty rows or skip)
- Preserve original currency formatting in display
- Use regex with DOTALL flag for multi-line section matching
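Two of the practices above, skipping missing sections and keeping the original currency formatting for display, can be sketched together. `format_brl` and `build_rows` are hypothetical helpers, and the `section`, `value` and `display` keys are illustrative rather than part of the required output table.

```python
from decimal import Decimal


def format_brl(value):
    """Render a numeric value in Brazilian currency style, e.g. 'R$ 5.234,67'."""
    # Format with US separators first, then swap ',' and '.'.
    us_style = f"{Decimal(str(value)):,.2f}"
    swapped = us_style.replace(",", "X").replace(".", ",").replace("X", ".")
    return "R$ " + swapped


def build_rows(values_by_section):
    """Skip sections with no value instead of emitting incomplete rows."""
    return [
        {"section": section, "value": value, "display": format_brl(value)}
        for section, value in values_by_section.items()
        if value is not None
    ]
```

Storing the numeric value and the display string separately keeps arithmetic on dot-decimal numbers while the table can still show the amount as it appeared in the statement.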
Common Pitfalls
- Don't confuse "Valor Bruto" with "Valor Líquido" - always use the specified value type
- Don't assume all three sections exist in every document
- Don't hardcode company names - extract from document
- Don't ignore decimal formatting differences (comma vs dot)
- Don't process documents without proper section headers