AI Skill Report Card

Converting T-SQL to Spark SQL

Grade: A- (82/100) · May 7, 2026 · Source: Web


Quick Start: 15 / 15
SQL
-- T-SQL Input
SELECT TOP 100
    c.CustomerID,
    c.CompanyName,
    ISNULL(o.OrderCount, 0) AS OrderCount
FROM Customers c WITH (NOLOCK)
LEFT JOIN (
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM Orders
    WHERE OrderDate >= DATEADD(month, -6, GETDATE())
    GROUP BY CustomerID
) o ON c.CustomerID = o.CustomerID
ORDER BY o.OrderCount DESC

-- Spark SQL Output
SELECT
    c.CustomerID,
    c.CompanyName,
    COALESCE(o.OrderCount, 0) AS OrderCount
FROM customers c
LEFT JOIN (
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM orders
    WHERE OrderDate >= ADD_MONTHS(CURRENT_DATE(), -6)
    GROUP BY CustomerID
) o ON c.CustomerID = o.CustomerID
ORDER BY o.OrderCount DESC
LIMIT 100
Recommendation
Add handling for complex scenarios like recursive CTEs, MERGE statements, or stored procedure conversions
Workflow: 14 / 15

Progress:

  • Replace T-SQL specific syntax (TOP, NOLOCK, etc.)
  • Convert date functions and data types
  • Handle window functions and CTEs
  • Replace proprietary functions with Spark equivalents
  • Optimize for Spark execution patterns
  • Test and validate results

Step-by-step Process:

  1. Remove SQL Server hints: Strip WITH (NOLOCK), WITH (INDEX=...)
  2. Replace TOP with LIMIT: TOP n → LIMIT n (LIMIT goes at the end of the query)
  3. Convert date functions: GETDATE() → CURRENT_TIMESTAMP(); DATEADD(day, n, d) → DATE_ADD(d, n); DATEADD(month, n, d) → ADD_MONTHS(d, n)
  4. Update NULL handling: ISNULL() → COALESCE() or IFNULL()
  5. Convert string functions: LEN() → LENGTH(), CHARINDEX() → LOCATE()
  6. Handle data types: VARCHAR(MAX) → STRING, DATETIME → TIMESTAMP
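The six steps above can be sketched as an ordered set of text rewrites. This is only an illustration, not the report's tooling: the `RULES` table and `convert` helper are hypothetical names, and a production converter should use a real SQL parser, since regexes will miss edge cases such as string literals, comments, and nested queries.

```python
import re

# Steps 1-6 as ordered regex rewrites (illustrative sketch only; a real
# converter should parse the SQL rather than pattern-match it).
RULES = [
    (r"\s*WITH\s*\(NOLOCK\)", ""),              # step 1: strip table hints
    (r"\bGETDATE\(\)", "CURRENT_TIMESTAMP()"),  # step 3: date functions
    (r"\bISNULL\(", "COALESCE("),               # step 4: NULL handling
    (r"\bLEN\(", "LENGTH("),                    # step 5: string functions
    (r"\bCHARINDEX\(", "LOCATE("),
    (r"\bVARCHAR\(MAX\)", "STRING"),            # step 6: data types
    (r"\bDATETIME\b", "TIMESTAMP"),
]

def convert(tsql: str) -> str:
    out = tsql
    # Step 2: remove TOP n and append LIMIT n at the end of the query.
    m = re.search(r"\bTOP\s+(\d+)\s*", out, flags=re.IGNORECASE)
    if m:
        out = (out[:m.start()] + out[m.end():]).rstrip() + f" LIMIT {m.group(1)}"
    for pattern, replacement in RULES:
        out = re.sub(pattern, replacement, out, flags=re.IGNORECASE)
    return out
```

For example, `convert("SELECT TOP 5 Name FROM Customers WITH (NOLOCK)")` applies steps 1 and 2 in one pass.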
Recommendation
Include performance optimization patterns specific to Spark (broadcast joins, partitioning considerations)
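One such pattern is Spark SQL's broadcast-join hint, written as `/*+ BROADCAST(alias) */` immediately after `SELECT`, which tells the optimizer to broadcast a small dimension table instead of shuffling the large fact table. The `add_broadcast_hint` helper below is a hypothetical sketch that injects the hint by string rewriting; in real code you would typically set the hint in the query by hand or use the DataFrame API.

```python
import re

def add_broadcast_hint(sql: str, small_table: str) -> str:
    """Inject a Spark broadcast-join hint after the first SELECT.

    BROADCAST(...) is a standard Spark SQL join hint; the string rewrite
    here is illustrative only and ignores nested subqueries.
    """
    return re.sub(
        r"(?i)\bSELECT\b",
        f"SELECT /*+ BROADCAST({small_table}) */",
        sql,
        count=1,
    )
```

Applied to a fact/dimension join, this yields a query Spark will plan as a broadcast hash join when the hinted table fits in memory.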
Examples: 15 / 20

Example 1: Window Functions

Input:

SQL
SELECT CustomerID, OrderAmount, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) as rn FROM Orders

Output:

SQL
SELECT CustomerID, OrderAmount, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) as rn FROM orders

Example 2: Date Operations

Input:

SQL
WHERE OrderDate >= DATEADD(day, -30, GETDATE()) AND YEAR(OrderDate) = 2024

Output:

SQL
WHERE OrderDate >= DATE_SUB(CURRENT_DATE(), 30) AND YEAR(OrderDate) = 2024

Example 3: String Operations

Input:

SQL
SELECT SUBSTRING(ProductName, 1, 10) as ShortName, LEN(RTRIM(LTRIM(Description))) as CleanLength

Output:

SQL
SELECT SUBSTRING(ProductName, 1, 10) as ShortName, LENGTH(TRIM(Description)) as CleanLength
Recommendation
Add a troubleshooting section for common conversion errors and validation techniques
Do:

  • Use lowercase table names (the Hive metastore stores table names in lowercase)
  • Prefer COALESCE() over IFNULL() for multiple null checks
  • Replace CASE WHEN x IS NULL patterns with COALESCE()
  • Use DATE_ADD()/DATE_SUB() instead of DATEADD()
  • Convert STUFF() to CONCAT() + SUBSTRING() combinations
  • Replace CROSS APPLY with LATERAL VIEW when working with arrays
Don't:

  • Don't use TOP - use LIMIT at the end of the query
  • Don't use table hints like WITH (NOLOCK) - they're ignored
  • Don't assume SQL Server's case-insensitive collation carries over - Spark string comparisons are case-sensitive, so apply LOWER() where needed
  • Don't use ISNULL() with more than 2 arguments - use COALESCE()
  • Don't use GETDATE() - use CURRENT_TIMESTAMP() or CURRENT_DATE()
  • Don't use square brackets [table] - use backticks or remove
  • Don't use VARCHAR(MAX) - use STRING type
  • Don't use @@ROWCOUNT - Spark has no affected-row session variable; use COUNT(*) or the DataFrame API's count() instead
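For the "test and validate results" step, one simple validation technique is an order-insensitive comparison of the result sets produced by the original and converted queries, with a tolerance for floating-point drift between engines. The `results_match` helper below is a hypothetical sketch that assumes rows arrive as tuples (e.g., from two database cursors); it is not part of the original workflow.

```python
from collections import Counter

def results_match(sqlserver_rows, spark_rows, float_tol=1e-9):
    """Order-insensitive comparison of two query result sets.

    Rows are tuples. Floats are bucketed by the tolerance before
    comparison, so minor numeric drift between engines does not
    register as a false mismatch.
    """
    def normalize(rows):
        return Counter(
            tuple(round(v / float_tol) if isinstance(v, float) else v
                  for v in row)
            for row in rows
        )
    return normalize(sqlserver_rows) == normalize(spark_rows)
```

Comparing multisets rather than lists matters because neither engine guarantees row order without an ORDER BY.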
Grade: A- · AI Skill Framework

Scorecard: Criteria Breakdown

  • Quick Start: 15/15
  • Workflow: 14/15
  • Examples: 15/20
  • Completeness: 10/20
  • Format: 15/15
  • Conciseness: 13/15