I needed to get my head around dynamically searching all fields in a given set of database tables for PCI Cardholder Data. I had previously come up with a rudimentary query to search for credit card numbers:
http://learningpcs.blogspot.com/2009/03/credit-cards.htmlThis time, however, I wanted to try and handle it all via SQL. For starters, I needed to determine which fields I wanted to work with. My initial query (from the link above) was:
SELECT SO.name, SC.name
FROM sysobjects AS SO
JOIN syscolumns AS SC
ON SO.ID = SC.ID
WHERE (SO.xtype = 'U') AND (SC.name LIKE '%_SUFFIX')
I needed something more powerful. I started with this link:http://codesnippets.joyent.com/posts/show/337The basic query provided here was:
SELECT table_name=sysobjects.name,
column_name=syscolumns.name,
datatype=systypes.name,
length=syscolumns.length
FROM sysobjects
JOIN syscolumns ON sysobjects.id = syscolumns.id
JOIN systypes ON syscolumns.xtype=systypes.xtype
WHERE sysobjects.xtype='U'
ORDER BY sysobjects.name,syscolumns.colid
Taking this to the next level on my system I looked as some specific fields and focused uniquely on field lengths at least 12 characters long since the shortest possible credit card number is 12 characters:USE MyDatabase
GO
SELECT [Table Name] = sysobjects.name
,[Column Name] = syscolumns.name
FROM sysobjects
JOIN syscolumns ON sysobjects.id = syscolumns.id
JOIN systypes ON syscolumns.xtype = systypes.xtype
WHERE sysobjects.xtype='U' -- retrieve tables only
AND syscolumns.name like 'Prefix_[a-z]%' -- retrieve only data columns
AND sysobjects.name like 'NRS[0-9]%' -- retrieve only NRS tables
AND substring(sysobjects.name,2,4) IN (SELECT FolderName FROM MyDatabase.dbo.Folders WHERE FolderName <> 'Secondary')
AND syscolumns.length > 11 -- exclude anything possibly shorter than shortest CC Number
ORDER BY sysobjects.name, syscolumns.colid
I am working some more to try and use this as a the basis of a query that fills a temp table so I can then run a set-based query with a while loop to actually search each of the fields I cull with this query and store hits in a second temp table for review later. The query I originally used in my old post works fine but it has a lot of assumptions. I wanted to refine it some more with tweaks on:- using a function to test each field that is a potential hit
- examine the hits with Luhn validity (assuming users entered the value correctly); see this SQL Team post
- use the Issuer Identification Number (IIN) with a case
- assume the use of multiple potential delimiters
Combining all of these to formulate a secondary level of testing, above and beyond the simple formula outlined in the old link, adds several levels of complication. For instance, we still use a lot of SQL Server 2000 engines, so, CLR integration is not much of an option. Hence, regular expression will be more difficult. Second, I don't want to totally tank the server's memory running an obnoxiously large query. I will probably hit the TSQL forums over at MSDN/Technet to get some assistance as much of the advanced functionality here (Luhn and regular expressions) will require folks with some more insight into the real guts of SQL Server than I have at this point. I'll put more posts down the road as I get a little more work done on this hypothetical function. A well-developed function to identify credit card numbers is something I suspect lots of folk would benefit from having worked out.
0 komentar:
Posting Komentar