AI Skill Report Card
Converting T-SQL to Spark SQL
Quick Start: 15 / 15

```sql
-- T-SQL Input
SELECT TOP 100
       c.CustomerID,
       c.CompanyName,
       ISNULL(o.OrderCount, 0) as OrderCount
FROM Customers c WITH (NOLOCK)
LEFT JOIN (
    SELECT CustomerID, COUNT(*) as OrderCount
    FROM Orders
    WHERE OrderDate >= DATEADD(month, -6, GETDATE())
    GROUP BY CustomerID
) o ON c.CustomerID = o.CustomerID
ORDER BY o.OrderCount DESC

-- Spark SQL Output
SELECT c.CustomerID,
       c.CompanyName,
       COALESCE(o.OrderCount, 0) as OrderCount
FROM customers c
LEFT JOIN (
    SELECT CustomerID, COUNT(*) as OrderCount
    FROM orders
    WHERE OrderDate >= ADD_MONTHS(CURRENT_DATE(), -6)
    GROUP BY CustomerID
) o ON c.CustomerID = o.CustomerID
ORDER BY o.OrderCount DESC
LIMIT 100
```
Recommendation: Add handling for complex scenarios such as recursive CTEs, MERGE statements, or stored procedure conversions.
Workflow: 14 / 15
Progress:
- Replace T-SQL specific syntax (TOP, NOLOCK, etc.)
- Convert date functions and data types
- Handle window functions and CTEs
- Replace proprietary functions with Spark equivalents
- Optimize for Spark execution patterns
- Test and validate results
Step-by-step Process:
- Remove SQL Server hints: strip `WITH (NOLOCK)`, `WITH (INDEX=...)`
- Replace row-limiting syntax: `TOP n` → `LIMIT n` (moved to the end of the query)
- Convert date functions: `GETDATE()` → `CURRENT_TIMESTAMP()`; `DATEADD(day, ...)` → `DATE_ADD()`, `DATEADD(month, ...)` → `ADD_MONTHS()`
- Update NULL handling: `ISNULL()` → `COALESCE()` or `IFNULL()`
- Convert string functions: `LEN()` → `LENGTH()`, `CHARINDEX()` → `LOCATE()`
- Handle data types: `VARCHAR(MAX)` → `STRING`, `DATETIME` → `TIMESTAMP`
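The substitution steps above can be sketched as a small rule-based pass. This is a hypothetical helper for the simple cases only, not part of any real converter: a production tool needs a SQL parser to handle string literals, comments, and nested queries correctly.

```python
import re

# Regex rules mirroring the step-by-step substitutions above.
# Each pair is (T-SQL pattern, Spark SQL replacement).
RULES = [
    (r"\bWITH\s*\(\s*NOLOCK\s*\)", ""),               # strip table hints
    (r"\bGETDATE\s*\(\s*\)", "CURRENT_TIMESTAMP()"),  # date functions
    (r"\bISNULL\s*\(", "COALESCE("),                  # NULL handling
    (r"\bLEN\s*\(", "LENGTH("),                       # string functions
    (r"\bCHARINDEX\s*\(", "LOCATE("),
]

def convert(tsql: str) -> str:
    """Apply the rule list, then move a leading TOP n to a trailing LIMIT n."""
    out = tsql
    for pattern, repl in RULES:
        out = re.sub(pattern, repl, out, flags=re.IGNORECASE)
    top = re.search(r"\bSELECT\s+TOP\s+(\d+)\b", out, flags=re.IGNORECASE)
    if top:
        # Drop TOP n from the SELECT clause and append LIMIT n at the end.
        out = out[:top.start()] + "SELECT" + out[top.end():]
        out = out.rstrip().rstrip(";") + f" LIMIT {top.group(1)}"
    return re.sub(r"\s+", " ", out).strip()
```

For example, `convert("SELECT TOP 5 a FROM t WITH (NOLOCK)")` yields `SELECT a FROM t LIMIT 5`.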
Recommendation: Include performance optimization patterns specific to Spark (broadcast joins, partitioning considerations).
Examples: 15 / 20
Example 1: Window Functions

Input:
```sql
SELECT CustomerID, OrderAmount,
       ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) as rn
FROM Orders
```

Output:
```sql
SELECT CustomerID, OrderAmount,
       ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) as rn
FROM orders
```
Example 2: Date Operations

Input:
```sql
WHERE OrderDate >= DATEADD(day, -30, GETDATE())
  AND YEAR(OrderDate) = 2024
```

Output:
```sql
WHERE OrderDate >= DATE_SUB(CURRENT_DATE(), 30)
  AND YEAR(OrderDate) = 2024
```

Note that `CURRENT_DATE()` drops the time-of-day component that `GETDATE()` carries; use `CURRENT_TIMESTAMP()` if the cutoff must preserve the exact time.
Example 3: String Operations

Input:
```sql
SELECT SUBSTRING(ProductName, 1, 10) as ShortName,
       LEN(RTRIM(LTRIM(Description))) as CleanLength
```

Output:
```sql
SELECT SUBSTRING(ProductName, 1, 10) as ShortName,
       LENGTH(TRIM(Description)) as CleanLength
```
Recommendation: Add a troubleshooting section for common conversion errors and validation techniques.
Best Practices
- Use lowercase table names; Spark SQL resolves identifiers case-insensitively by default (`spark.sql.caseSensitive` is `false`), but the Hive metastore stores names in lowercase and file-backed tables can be case-sensitive
- Prefer `COALESCE()` over `IFNULL()` for multiple null checks
- Replace `CASE WHEN x IS NULL` patterns with `COALESCE()`
- Use `DATE_ADD()`/`DATE_SUB()` instead of `DATEADD()`
- Convert `STUFF()` to `CONCAT()` + `SUBSTRING()` combinations
- Replace `CROSS APPLY` with `LATERAL VIEW` when working with arrays
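The `STUFF()` conversion above is easy to get wrong because of T-SQL's 1-based indexing. A minimal sketch of the equivalent `CONCAT(SUBSTRING(...), replacement, SUBSTRING(...))` logic, expressed in Python for illustration (the function name and signature are hypothetical):

```python
def stuff(s: str, start: int, length: int, repl: str) -> str:
    """Mimic T-SQL STUFF: delete `length` chars at 1-based `start`, insert `repl`.

    Equivalent Spark SQL pattern:
        CONCAT(SUBSTRING(s, 1, start - 1), repl, SUBSTRING(s, start + length))
    """
    return s[:start - 1] + repl + s[start - 1 + length:]
```

For example, `stuff("abcdef", 2, 3, "ijklmn")` returns `"aijklmnef"`, matching T-SQL's `STUFF('abcdef', 2, 3, 'ijklmn')`.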
Common Pitfalls
- Don't use `TOP`; use `LIMIT` at the end of the query
- Don't carry over table hints like `WITH (NOLOCK)`; Spark SQL does not accept this syntax, so remove them (Spark uses `/*+ ... */` hint comments instead)
- Don't assume SQL Server's case-insensitive collation carries over; string comparisons in Spark SQL are case-sensitive by default
- Don't translate `ISNULL(a, b)` literally; Spark's `isnull()` is a one-argument null test, so use `COALESCE()` instead
- Don't use `GETDATE()`; use `CURRENT_TIMESTAMP()` or `CURRENT_DATE()`
- Don't use square brackets around identifiers (`[table]`); use backticks or remove them
- Don't use `VARCHAR(MAX)`; use the `STRING` type
- Don't expect `@@ROWCOUNT`; Spark SQL has no equivalent session variable, so count result rows explicitly (e.g., with `COUNT(*)` or a DataFrame `count()`)
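The workflow's final step, "Test and validate results", can start with something as simple as an order-independent fingerprint of both result sets. This is a hypothetical sketch (the helper names and the lists-of-tuples result representation are assumptions, not part of any Spark API):

```python
def fingerprint(rows, key_col=0):
    """Order-independent fingerprint: row count plus a hash sum over one column.

    `rows` is assumed to be the fetched result set as a list of tuples.
    """
    return len(rows), sum(hash(r[key_col]) for r in rows)

def results_match(tsql_rows, spark_rows, key_col=0):
    """Cheap first-pass check that the converted query returns the same rows."""
    return fingerprint(tsql_rows, key_col) == fingerprint(spark_rows, key_col)
```

Because the fingerprint ignores row order, it tolerates the different (unspecified) ordering the two engines may return; a full validation would still diff mismatching rows column by column.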