Home Uncategorized Splitting a string of unlimited length

    Splitting a string of unlimited length

    867
    12

    There are many techniques for splitting a string in T-SQL (in other words, taking a character-delimited string and producing a table of the values), the best of which are encapsulated in Erland Sommarskog’s famous article. My favorite of his string splitting techniques is adapted from a previous example that was created by Anith Sen.

    Both of these resources are excellent, but sometimes you just need a little bit more. None of their solutions will split a string larger than 8000 characters in length. So I’ve produced my own modified version. If you’re splitting smaller strings then by all means, use theirs! This is certainly slower, but I do believe it’s the fastest way in SQL Server 2000 to split a very long string. You will require a numbers table, so make sure you have it on hand…

    Anyway, the code:

    CREATE FUNCTION dbo.SplitString
    (
    	@List TEXT,
    	@Delimiter CHAR(1)
    )
    RETURNS @ReturnTbl TABLE 
    (
    	OutParam VARCHAR(20)
    )
    WITH SCHEMABINDING
    AS
    BEGIN
    	DECLARE @LeftSplit VARCHAR(7998)
    	DECLARE @SplitStart INT SET @SplitStart = 0
    	DECLARE @SplitEnd INT
    	SET @SplitEnd = 7997
    
    	SELECT 
    		@SplitEnd = MAX(Number)
    	FROM dbo.Numbers
    	WHERE 
    		(
    			REPLACE(SUBSTRING(@List, Number, 1), ' ', CHAR(255)) = 
    				REPLACE(@Delimiter, ' ', CHAR(255))
    			OR Number = DATALENGTH(@List) + 1
    		)
    		AND Number BETWEEN @SplitStart AND @SplitEnd
    
    	WHILE @SplitStart < @SplitEnd
    	BEGIN
    		SET @LeftSplit = 
    			@Delimiter + 
    			SUBSTRING(@List, @SplitStart, @SplitEnd - @SplitStart) + 
    			@Delimiter
    
    		INSERT @ReturnTbl 
    		(
    			OutParam
    		)
    		SELECT 
    			LTRIM
    			(
    				RTRIM
    				(
    					SUBSTRING
    					(
    						@LeftSplit, 
    						Number + 1,
    	                    CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1
    					)
    				)
    			) AS Value
    		FROM dbo.Numbers
    		WHERE  
    			Number <= LEN(@LeftSplit) - 1
    			AND REPLACE(SUBSTRING(@LeftSplit, Number, 1), ' ', CHAR(255)) = 
    				REPLACE(@Delimiter, ' ', CHAR(255))
    			AND '' <>
    				SUBSTRING
    				(
    					@LeftSplit, 
    					Number + 1, 
    					CHARINDEX(@Delimiter, @LeftSplit, Number + 1) - Number - 1
    				)
    
    		SET @SplitStart = @SplitEnd + 1
    		SET @SplitEnd = @SplitEnd + 7997
    
    		SELECT 
    			@SplitEnd = MAX(Number) + @SplitStart
    		FROM dbo.Numbers
    		WHERE 
    			(
    				REPLACE(SUBSTRING(@List, Number + @SplitStart, 1), ' ', CHAR(255)) = 
    					REPLACE(@Delimiter, ' ', CHAR(255))
    				OR Number + @SplitStart = DATALENGTH(@List) + 1
    			)
    			AND Number BETWEEN 1 AND @SplitEnd - @SplitStart
    	END
    
    	RETURN
    END
    GO
    

    This code does nothing more than chunk up the string, using the delimiter as boundries, into small enough pieces (< 8000 characters) that CHARINDEX can be used on them. Then the same algorithm from Erland’s split string is applied.

    Note that this function does not gracefully handle situations such as passing in the wrong delimiter. So please carefully test your code before deploying this!

    Using the function is quite simple:

    SELECT * 
    FROM dbo.SplitString('something, something else, etc, etc, etc...', ',')
    

    Update, February 15, 2005: Fixed so that only 8000 numbers are needed in the Numbers table for this to work. Previously required as many numbers as were present in the string to be split.

    Update, February 23, 2009: Four years since the last update and a new bug is discovered by the insightful Craig Hathaway! Fixed the function so that it now handles space as a delimiter. Thanks, Craig!

    Previous articleControlling Stored Procedure Caching with … Dyanmic SQL?!?
    Next articleCounting occurrences of a substring within a string
    Adam Machanic helps companies get the most out of their SQL Server databases. He creates solid architectural foundations for high performance databases and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has contributed to numerous books on SQL Server development. A long-time Microsoft MVP for SQL Server, he speaks and trains at IT conferences across North America and Europe.

    12 COMMENTS

    1. Well it says "Using the function is quite simple"
      But im having problems (errors) even while tring to create the function…

    2. It works fine. but when I use it in a Stored Procedure it gives error:
      Incorrect syntax near ‘SplitString’.

    3. Hi Adam this is a great function but I have got string with such as ("hello","sdfsfs",….)
      My problem is how can I pass two parremeters such as , and "".
      and how can I insert the values into a table
      Thanks

    4. Had to change the last section of the code to get it to work beyond 8,000 characters.  Great approach though.
      _________________________________
      –Start next chunk
      SET @SplitStart = @SplitEnd + 1
      SET @SplitEnd = @SplitEnd + 7998
      IF ( @SplitEnd >= DATALENGTH(@List)   )
        SET @SplitEnd = DATALENGTH(@List) +1
      ELSE
      BEGIN
       SELECT @SplitEnd = @SplitStart + MAX(Number)
       FROM dbo.Numbers
       WHERE (SUBSTRING(@List, Number, 1) = @Delimiter
       OR Number = DATALENGTH(@List) + 1
        )
      AND Number BETWEEN @SplitStart AND @SplitEnd
      END

    5. Please ignore the previous post – that didn’t fix it.  The issue seemed to be with parsing the last number in the last chunk.  I added
          AND CHARINDEX(‘,’, @LeftSplit, Number + 1) > 0
      to the key SELECT statement.

    6. Hi, this function times out when the delimiter is the <space> character – for splitting sentences into words (etc). I suspect thsi has to do with trimming…
      E.g.
      SELECT *
      FROM dbo.[SplitString]( ‘test one, two, three’,’ ‘)

    7. Hi Craig,
      Busy day at the office?  Nothing better to do than rip apart my old blog posts? <g>
      The function has been fixed… Thanks for reporting this!

    8. I altered Erland’s function to include wordposition and separator position (maybe in a highly inefficient way, but it seems to work correctly).  
      To test it on large strings, you can run something like:
      SELECT * FROM dbo.tfnSplitString(REPLICATE(CAST(‘,AB,CD,EFG,HI,MNZ,QR,UVW,YZ1′ AS nvarchar(max)), 6000), DEFAULT) ORDER BY WordPos;
      (Note that my numbers table is called "Number" and the number column in it is "Num.")
      SET ANSI_NULLS ON
      GO
      SET QUOTED_IDENTIFIER ON
      GO
      CREATE FUNCTION [dbo].[tfnSplitString] (
      @StringToSplit NVARCHAR(MAX),
      @Separator NCHAR(1) = N’,’
      )
      RETURNS @t TABLE (Word nvarchar(4000) NOT NULL, WordPos int not null, SepPos int not null) AS
      BEGIN
        DECLARE @slices TABLE (
      slice nvarchar(4000) NOT NULL,
      slicenum int NOT NULL)
        DECLARE @slice nvarchar(4000),
                @textpos int = 1,
                @maxlen  int = 4000-2,
                @stoppos int,
                @slicenum int = 0
        WHILE datalength(@StringToSplit) / 2 – (@textpos – 1) >= @maxlen
        BEGIN;
       SET @slice = substring(@StringToSplit, @textpos, @maxlen);
       SET @stoppos = @maxlen –
      charindex(@Separator COLLATE Latin1_General_100_BIN2,
      reverse(@slice));
       INSERT @slices (slice, slicenum)
      VALUES (@Separator + left(@slice, @stoppos) + @Separator, @slicenum);
       SET @textpos = @textpos – 1 + @stoppos + 2;
       SET @slicenum +=1;
       — On the other side of the comma.
        END;
        INSERT @slices (slice, slicenum)
            VALUES (@Separator + substring(@StringToSplit, @textpos, @maxlen) + @Separator, @slicenum);
        WITH wordget (Word, SepPos) AS (
           SELECT ltrim(rtrim(substring(s.slice, N.Num + 1,
                     charindex(@Separator COLLATE Latin1_General_100_BIN2,
                               s.slice, N.Num + 1) –
                     N.Num – 1))),                
                     charindex(@Separator COLLATE Latin1_General_100_BIN2,
                               s.slice, N.Num + 1)+(slicenum*@maxlen)-1
            FROM  Number N
            JOIN  @slices s
              ON  N.Num <= len(s.slice) – 1
             AND  substring(s.slice, N.Num, 1) = @Separator COLLATE Latin1_General_100_BIN2
        )
        INSERT @t (Word, WordPos, SepPos)
           SELECT Word, ROW_NUMBER() OVER (ORDER BY SepPos) AS WordPos, SepPos FROM wordget;
        RETURN;
      END

    9. I made a one-off error (–not the first time):
      replace "(slicenum*@maxlen)-1"
      with
      "(slicenum*(@maxlen-1))-1"
      (the mistake only occurs once, or only the ‘zeroth’ time, depending on how you count.)

    10. Sorry, recorrected (one hopes for the last time)–it wasn’t a one-off error, though it looked like it.  (–also, there may be a more efficient approach to adding separator position.)
      NOTE: assumes the existence of a numbers table called ‘Number’ with an integer primary key column ‘Num’ that has consecutive integers (requires 4,000).
      SET ANSI_NULLS ON
      GO
      SET QUOTED_IDENTIFIER ON
      GO
      CREATE FUNCTION [dbo].[tfnSplitString] (
      @StringToSplit NVARCHAR(MAX),
      @Separator NCHAR(1) = N’,’
      )
      RETURNS @t TABLE (Word nvarchar(4000) NOT NULL, WordPos int not null, SepPos int not null) AS
      BEGIN
      /*
      assumes the existence of a numbers table called ‘Number’ with an integer primary key column ‘Num’.
      */
        DECLARE @slices TABLE (
      slice nvarchar(4000) NOT NULL,
      RunningSliceLen int NOT NULL)
        DECLARE @slice nvarchar(4000),
                @textpos int = 1,
                @maxlen  int = 4000-2,
                @stoppos int,
                @RunningSliceLen int = -1
        WHILE datalength(@StringToSplit) / 2 – (@textpos – 1) >= @maxlen
        BEGIN;
       SET @slice = substring(@StringToSplit, @textpos, @maxlen);
       SET @stoppos = @maxlen –
      charindex(@Separator COLLATE Latin1_General_100_BIN2,
      reverse(@slice));
       INSERT @slices (slice, RunningSliceLen)
      VALUES (@Separator + left(@slice, @stoppos) + @Separator, @RunningSliceLen);
       SET @textpos = @textpos – 1 + @stoppos + 2;
       SET @RunningSliceLen = @RunningSliceLen + DATALENGTH(@Separator + left(@slice, @stoppos))/2;
       — On the other side of the comma.
        END;
        INSERT @slices (slice, RunningSliceLen)
            VALUES (@Separator + substring(@StringToSplit, @textpos, @maxlen) + @Separator, @RunningSliceLen);
        WITH wordget (Word, SepPos) AS (
           SELECT ltrim(rtrim(substring(s.slice, N.Num + 1,
                     charindex(@Separator COLLATE Latin1_General_100_BIN2,
                               s.slice, N.Num + 1) –
                     N.Num – 1))),                
                     charindex(@Separator COLLATE Latin1_General_100_BIN2,
                               s.slice, N.Num+1)+(s.RunningSliceLen)
            FROM  Number N
            JOIN  @slices s
              ON  N.Num <= len(s.slice) – 1
             AND  substring(s.slice, N.Num, 1) = @Separator COLLATE Latin1_General_100_BIN2
        )
        INSERT @t (Word, WordPos, SepPos)
           SELECT Word, ROW_NUMBER() OVER (ORDER BY SepPos) AS WordPos, SepPos FROM wordget;
        RETURN;
      END

    Comments are closed.