Common-table expressions are a very useful new feature in SQL Server 2005. You can use them for recursive queries,
removing duplicates, and even simple looping procedures.
With some crafty TSQL, this is a relatively easy task to do when a primary key defined on the table. Luckily, the
new CTE feature in SQL Server 2005 makes it very easy to remove these duplicates, with or without a primary key.
The script below defines my CTE. I am using a windowing function named DENSE_RANK to group the records together
based on the Product, SaleDate, and SalePrice fields, and assign them a sequential value randomly. This means that
if I have two records with the exact same Product, SaleDate, and SalePrice values, the first record will be ranked
as 1, the second as 2, and so on.
WITH SalesCTE(Product, SaleDate, SalePrice, Ranking)
AS
(
SELECT Product, SaleDate, SalePrice,Ranking = DENSE_RANK() OVER(PARTITION BY Product, SaleDate, SalePrice ORDER BY
NEWID() ASC)
FROM SalesHistory
)
DELETE FROM SalesCTE WHERE Ranking > 1
Because a CTE acts as a virtual table, I am able to process data modification statements against it, and the
underlying table will be affected. In this case, I am removing any record from the SalesCTE that is ranked higher
than 1. This will remove all of my duplicate records.
To verify my duplicates have been removed, I can review the data in the table, which should now contain 8 records,
rather than the previous 10.
SELECT *FROM SalesHistory
No comments:
Post a Comment