Yes, another string splitting UDF from a guy who’s obvioiusly become obsessed with TSQL string splitting. This time we delve into a mysterious world that I call, “Tokenization.”
So what is Tokenization? It’s a word I made up for this problem.
But what is it, really? It’s splitting up a string based on a delimiter — in this case, a comma — but being wary of substring delimiters. In this case, that’s a pair of apostrophes, because that’s what TSQL uses for strings.
I think this is best illustrated with an example string:
DECLARE @Tokens VARCHAR(50) SET @Tokens = 'a, ''b'', ''''c'', ''d'', ''e'''', f, ''1,2,3,4'''
The basic split string function that you can find will produce the following output:
SELECT * FROM dbo.SplitString(@Tokens, ',') OutParam ------------- a 'b' ''c' 'd' 'e'' f '1 2 3 4'
Well, that’s wrong. Because what I want to do is maintain the substrings (or, “tokens,” as I like to call them — thus, Tokenization!)
The output I desire is:
Token -------- a 'b' ''c', 'd', 'e'' f '1,2,3,4'
Notice that substrings — delimited with apostrophes — should be maintained.
And here’s how I’ve solved this problem…
CREATE FUNCTION dbo.Tokenize ( @Input NVARCHAR(2000) ) RETURNS @Tokens TABLE ( TokenNum INT IDENTITY(1,1), Token NVARCHAR(2000) ) AS BEGIN DECLARE @i INT SET @i = 0 DECLARE @StartChar INT SET @StartChar = 1 DECLARE @Quote INT SET @Quote = 0 DECLARE @Chars TABLE ( CharNum INT IDENTITY(1,1) NOT NULL PRIMARY KEY, TheChar CHAR(1), TheCount INT, StartChar INT ) SET @Input = ' , ' + @Input + ' , ' INSERT @Chars (TheChar) SELECT SUBSTRING(@Input, n.Number, 1) FROM Numbers n WHERE n.Number > 0 AND n.Number <= LEN(@Input) ORDER BY n.Number UPDATE Chars SET @i = Chars.TheCount = CASE WHEN Chars1.TheChar = ',' AND @Quote % 2 = 0 THEN 0 ELSE @i + 1 END, @Quote = CASE WHEN Chars1.TheChar = '''' THEN @Quote + 1 WHEN @i = 0 THEN 0 ELSE @Quote END, @StartChar = Chars.StartChar = CASE WHEN @i = 1 THEN Chars1.CharNum - 1 WHEN @i = 0 THEN @StartChar + 1 ELSE @StartChar END FROM @Chars Chars JOIN @Chars Chars1 ON Chars1.CharNum = Chars.CharNum + 1 INSERT @Tokens(Token) SELECT RTRIM(LTRIM(SUBSTRING(@Input, StartChar, CharNum - StartChar + 1))) FROM ( SELECT StartChar, CharNum FROM @Chars WHERE TheCount = 0 UNION ALL SELECT MAX ( CASE TheCount WHEN 0 THEN CharNum ELSE 0 END ) + 1, MAX(CharNum) FROM @Chars ) x WHERE RTRIM(LTRIM(SUBSTRING(@Input, StartChar, CharNum - StartChar + 1))) NOT IN ('', ',') ORDER BY x.StartChar RETURN END
A word of warning: This UDF uses the undocumented — and unsupported — “aggregate update” functionality. I’ve tested thoroughly in this case and believe it works perfectly (and it sure is handy!), but I would advise you to not use it in your own projects without extensive testing! MS doesn’t support this one, so handle with care.
And by the way, you need a numbers table to use this thing. Of course.
As for using this thing, it’s pretty easy:
DECLARE @Tokens VARCHAR(50) SET @Tokens = 'a, ''b'', ''''c'', ''d'', ''e'''', f, ''1,2,3,4''' SELECT Token FROM dbo.Tokenize(@Tokens) Token -------- a 'b' ''c', 'd', 'e'' f '1,2,3,4'
… and it even appears to work properly!
Enjoy… and application for this and other strange things I’ve been posting recently coming very, very soon.