Home Uncategorized Splitting a string of unlimited length

Splitting a string of unlimited length

12

There are many techniques for splitting a string in T-SQL (in other words, taking a character-delimited string and producing a table of the values), the best of which are encapsulated in Erland Sommarskog’s famous article. My favorite of his string splitting techniques is adapted from a previous example that was created by Anith Sen.

Both of these resources are excellent, but sometimes you just need a little bit more. None of their solutions will split a string larger than 8000 characters in length. So I’ve produced my own modified version. If you’re splitting smaller strings then by all means, use theirs! This is certainly slower, but I do believe it’s the fastest way in SQL Server 2000 to split a very long string. You will require a numbers table, so make sure you have it on hand…

Anyway, the code:

CREATE FUNCTION dbo.SplitString
(
	@List TEXT,
	@Delimiter CHAR(1)
)
RETURNS @ReturnTbl TABLE 
(
	OutParam VARCHAR(20)
)
WITH SCHEMABINDING
AS
BEGIN
	DECLARE @LeftSplit VARCHAR(7998)
	DECLARE @SplitStart INT SET @SplitStart = 0
	DECLARE @SplitEnd INT
	SET @SplitEnd = 7997

	SELECT 
		@SplitEnd = MAX(Number)
	FROM dbo.Numbers
	WHERE 
		(
			REPLACE(SUBSTRING(@List, Number, 1), ' ', CHAR(255)) = 
				REPLACE(@Delimiter, ' ', CHAR(255))
			OR Number = DATALENGTH(@List) + 1
		)
		AND Number BETWEEN @SplitStart AND @SplitEnd

	WHILE @SplitStart < @SplitEnd
	BEGIN
		SET @LeftSplit = 
			@Delimiter + 
			SUBSTRING(@List, @SplitStart, @SplitEnd - @SplitStart) + 
			@Delimiter

		INSERT @ReturnTbl 
		(
			OutParam
		)
		SELECT 
			LTRIM
			(
				RTRIM
				(
					SUBSTRING
					(
						@LeftSplit, 
						Number + 1,
	                    CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1
					)
				)
			) AS Value
		FROM dbo.Numbers
		WHERE  
			Number <= LEN(@LeftSplit) - 1
			AND REPLACE(SUBSTRING(@LeftSplit, Number, 1), ' ', CHAR(255)) = 
				REPLACE(@Delimiter, ' ', CHAR(255))
			AND '' <>
				SUBSTRING
				(
					@LeftSplit, 
					Number + 1, 
					CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1
				)

		SET @SplitStart = @SplitEnd + 1
		SET @SplitEnd = @SplitEnd + 7997

		SELECT 
			@SplitEnd = MAX(Number) + @SplitStart
		FROM dbo.Numbers
		WHERE 
			(
				REPLACE(SUBSTRING(@List, Number + @SplitStart, 1), ' ', CHAR(255)) = 
					REPLACE(@Delimiter, ' ', CHAR(255))
				OR Number + @SplitStart = DATALENGTH(@List) + 1
			)
			AND Number BETWEEN 1 AND @SplitEnd - @SplitStart
	END

	RETURN
END
GO

This code does nothing more than chunk up the string, using the delimiter as boundries, into small enough pieces (< 8000 characters) that CHARINDEX can be used on them. Then the same algorithm from Erland’s split string is applied.

Note that this function does not gracefully handle situations such as passing in the wrong delimiter. So please carefully test your code before deploying this!

Using the function is quite simple:

SELECT * 
FROM dbo.SplitString('something, something else, etc, etc, etc...', ',')


Update, February 15, 2005: Fixed so that only 8000 numbers are needed in the Numbers table for this to work. Previously required as many numbers as were present in the string to be split.

Update, February 23, 2009: Four years since the last update and a new bug is discovered by the insightful Craig Hathaway! Fixed the function so that it now handles space as a delimiter. Thanks, Craig!

SHARE
Previous articleControlling Stored Procedure Caching with … Dyanmic SQL?!?
Next articleCounting occurrences of a substring within a string
Adam Machanic helps companies get the most out of their SQL Server databases. He creates solid architectural foundations for high performance databases and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has contributed to numerous books on SQL Server development. A long-time Microsoft MVP for SQL Server, he speaks and trains at IT conferences across North America and Europe.

12 COMMENTS

  1. Well it says "Using the function is quite simple"
    But im having problems (errors) even while tring to create the function…

  2. Hi Adam this is a great function but I have got string with such as ("hello","sdfsfs",….)
    My problem is how can I pass two parremeters such as , and "".
    and how can I insert the values into a table
    Thanks

  3. Had to change the last section of the code to get it to work beyond 8,000 characters.  Great approach though.
    _________________________________
    –Start next chunk
    SET @SplitStart = @SplitEnd + 1
    SET @SplitEnd = @SplitEnd + 7998
    IF ( @SplitEnd >= DATALENGTH(@List)   )
      SET @SplitEnd = DATALENGTH(@List) +1
    ELSE
    BEGIN
     SELECT @SplitEnd = @SplitStart + MAX(Number)
     FROM dbo.Numbers
     WHERE (SUBSTRING(@List, Number, 1) = @Delimiter
     OR Number = DATALENGTH(@List) + 1
      )
    AND Number BETWEEN @SplitStart AND @SplitEnd
    END

  4. Please ignore the previous post – that didn’t fix it.  The issue seemed to be with parsing the last number in the last chunk.  I added
        AND CHARINDEX(‘,’, @LeftSplit, Number + 1) > 0
    to the key SELECT statement.

  5. Hi, this function times out when the delimiter is the <space> character – for splitting sentences into words (etc). I suspect thsi has to do with trimming…
    E.g.
    SELECT *
    FROM dbo.[SplitString]( ‘test one, two, three’,’ ‘)

  6. Hi Craig,
    Busy day at the office?  Nothing better to do than rip apart my old blog posts? <g>
    The function has been fixed… Thanks for reporting this!

  7. I altered Erland’s function to include wordposition and separator position (maybe in a highly inefficient way, but it seems to work correctly).  
    To test it on large strings, you can run something like:
    SELECT * FROM dbo.tfnSplitString(REPLICATE(CAST(‘,AB,CD,EFG,HI,MNZ,QR,UVW,YZ1′ AS nvarchar(max)), 6000), DEFAULT) ORDER BY WordPos;
    (Note that my numbers table is called "Number" and the number column in it is "Num.")
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE FUNCTION [dbo].[tfnSplitString] (
    @StringToSplit NVARCHAR(MAX),
    @Separator NCHAR(1) = N’,’
    )
    RETURNS @t TABLE (Word nvarchar(4000) NOT NULL, WordPos int not null, SepPos int not null) AS
    BEGIN
      DECLARE @slices TABLE (
    slice nvarchar(4000) NOT NULL,
    slicenum int NOT NULL)
      DECLARE @slice nvarchar(4000),
              @textpos int = 1,
              @maxlen  int = 4000-2,
              @stoppos int,
              @slicenum int = 0
      WHILE datalength(@StringToSplit) / 2 – (@textpos – 1) >= @maxlen
      BEGIN;
     SET @slice = substring(@StringToSplit, @textpos, @maxlen);
     SET @stoppos = @maxlen –
    charindex(@Separator COLLATE Latin1_General_100_BIN2,
    reverse(@slice));
     INSERT @slices (slice, slicenum)
    VALUES (@Separator + left(@slice, @stoppos) + @Separator, @slicenum);
     SET @textpos = @textpos – 1 + @stoppos + 2;
     SET @slicenum +=1;
     — On the other side of the comma.
      END;
      INSERT @slices (slice, slicenum)
          VALUES (@Separator + substring(@StringToSplit, @textpos, @maxlen) + @Separator, @slicenum);
      WITH wordget (Word, SepPos) AS (
         SELECT ltrim(rtrim(substring(s.slice, N.Num + 1,
                   charindex(@Separator COLLATE Latin1_General_100_BIN2,
                             s.slice, N.Num + 1) –
                   N.Num – 1))),                
                   charindex(@Separator COLLATE Latin1_General_100_BIN2,
                             s.slice, N.Num + 1)+(slicenum*@maxlen)-1
          FROM  Number N
          JOIN  @slices s
            ON  N.Num <= len(s.slice) – 1
           AND  substring(s.slice, N.Num, 1) = @Separator COLLATE Latin1_General_100_BIN2
      )
      INSERT @t (Word, WordPos, SepPos)
         SELECT Word, ROW_NUMBER() OVER (ORDER BY SepPos) AS WordPos, SepPos FROM wordget;
      RETURN;
    END

  8. I made a one-off error (–not the first time):
    replace "(slicenum*@maxlen)-1"
    with
    "(slicenum*(@maxlen-1))-1"
    (the mistake only occurs once, or only the ‘zeroth’ time, depending on how you count.)

  9. Sorry, recorrected (one hopes for the last time)–it wasn’t a one-off error, though it looked like it.  (–also, there may be a more efficient approach to adding separator position.)
    NOTE: assumes the existence of a numbers table called ‘Number’ with an integer primary key column ‘Num’ that has consecutive integers (requires 4,000).
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE FUNCTION [dbo].[tfnSplitString] (
    @StringToSplit NVARCHAR(MAX),
    @Separator NCHAR(1) = N’,’
    )
    RETURNS @t TABLE (Word nvarchar(4000) NOT NULL, WordPos int not null, SepPos int not null) AS
    BEGIN
    /*
    assumes the existence of a numbers table called ‘Number’ with an integer primary key column ‘Num’.
    */
      DECLARE @slices TABLE (
    slice nvarchar(4000) NOT NULL,
    RunningSliceLen int NOT NULL)
      DECLARE @slice nvarchar(4000),
              @textpos int = 1,
              @maxlen  int = 4000-2,
              @stoppos int,
              @RunningSliceLen int = -1
      WHILE datalength(@StringToSplit) / 2 – (@textpos – 1) >= @maxlen
      BEGIN;
     SET @slice = substring(@StringToSplit, @textpos, @maxlen);
     SET @stoppos = @maxlen –
    charindex(@Separator COLLATE Latin1_General_100_BIN2,
    reverse(@slice));
     INSERT @slices (slice, RunningSliceLen)
    VALUES (@Separator + left(@slice, @stoppos) + @Separator, @RunningSliceLen);
     SET @textpos = @textpos – 1 + @stoppos + 2;
     SET @RunningSliceLen = @RunningSliceLen + DATALENGTH(@Separator + left(@slice, @stoppos))/2;
     — On the other side of the comma.
      END;
      INSERT @slices (slice, RunningSliceLen)
          VALUES (@Separator + substring(@StringToSplit, @textpos, @maxlen) + @Separator, @RunningSliceLen);
      WITH wordget (Word, SepPos) AS (
         SELECT ltrim(rtrim(substring(s.slice, N.Num + 1,
                   charindex(@Separator COLLATE Latin1_General_100_BIN2,
                             s.slice, N.Num + 1) –
                   N.Num – 1))),                
                   charindex(@Separator COLLATE Latin1_General_100_BIN2,
                             s.slice, N.Num+1)+(s.RunningSliceLen)
          FROM  Number N
          JOIN  @slices s
            ON  N.Num <= len(s.slice) – 1
           AND  substring(s.slice, N.Num, 1) = @Separator COLLATE Latin1_General_100_BIN2
      )
      INSERT @t (Word, WordPos, SepPos)
         SELECT Word, ROW_NUMBER() OVER (ORDER BY SepPos) AS WordPos, SepPos FROM wordget;
      RETURN;
    END

LEAVE A REPLY

Please enter your comment!
Please enter your name here