Given a single table:
tblAddresses,
with records Counted at 800,000 and very
clean,
and with these column definitions,
inAddress_PSK INTEGER
IDENTITY PK (1,1) (No Nulls, No Doops, PK)
vchAddressLineA VARCHAR(30)
(Not Null Doops OK)
vchAddressLineB VARCHAR(20)
(Nulls OK, Doops OK)
vchCityName
VARCHAR(25) (No Nulls, Doops OK)
vchStateAbbr
VARCHAR(2) (No Nulls, Doops
OK)
vchZipCode VARCHAR(10)
(Nulls OK, Doops OK)
I wrote the following T-SQL on SQL 2000
SP4
SET ANSI_NULLS
OFF
GO
SELECT DISTINCT
vchAddressLineA vchTAddressLineA,
vchAddressLineB vchTAddressLineB,
vchCityName vchTCityName,
vchStateAbbr vchTStateAbbr
INTO #tmpNoDoops
FROM tblAddresses
GO
I get 480,000 distinct addresses, all well and
good. Upon inspection, they look really good.
BUT, why does
the following query only return 16,520 records???
SET ANSI_NULLS OFF
GO
SELECT * FROM tblAddresses JOIN #tmpNoDoops
ON
vchAddressLineA = vchTAddressLineA
AND vchAddressLineB =
vchAddressLineB
AND vchCityName = vchTCityName
AND vchStateAbbr =
vchTStateAbbr
IT SHOULD GIVE ME EXACTLY 480,000
Records.
I think it has to do with the NULLS in the
second address Line, but doesn't ANSI Nulls OFF give me effective INNER
JOINS?
If I put a WHERE vchAddressLineB IS NOT NULL, I get an exact match, but then I
have to do two queries to tie back the records in Temp to the records in
Source. NOT DESIRABLE.
Anyone?
John
Smith