Home Uncategorized SQLCLR String Splitting Part 2: Even Faster, Even More Scalable

SQLCLR String Splitting Part 2: Even Faster, Even More Scalable

43

Two days ago, after posting what I thought was a pretty solid SQLCLR string splitting method, I received a comment telling me about a big thread on SQLServerCentral dedicated to the question of how best to split strings. So I jumped in, and went back and forth, and back and forth, and back… and forth…

Many, many messages and several revisions and re-revisions later, I am happy to present the newer, better, more bug-free SQLCLR string splitting function. Not only is it faster than the previous one I posted, but it also handles multi-character delimiters, ignores consecutive delimiters as well as pre and post delimiters, and best of all, it won’t get itself stuck in an infinite loop in certain cases (always a really great stability feature).

This version walks the SqlChars character array rather than using the IndexOf method on the string, which we found to be a somewhat faster technique–and it’s certainly the most scalable and memory efficient method I can imagine.

Special thanks to SQLServerCentral member Florian Reischl, who was the main person keeping the thread going with me during the last couple of days, and who managed to re-write my versions and eke out even better performance by modifying the algorithms.  Great stuff–it was definitely the most fun I’ve had on a technical forum in quite a long time.

But without further ado, the code:

using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
       FillRowMethodName = "FillRow_Multi",
       TableDefinition = "item nvarchar(4000)"
       )
    ]
    public static IEnumerator SplitString_Multi(
      [SqlFacet(MaxSize = -1)]
      SqlChars Input,
      [SqlFacet(MaxSize = 255)]
      SqlChars Delimiter
       )
    {
        return (
            (Input.IsNull || Delimiter.IsNull) ?
            new SplitStringMulti(new char[0], new char[0]) :
            new SplitStringMulti(Input.Value, Delimiter.Value));
    }

    public static void FillRow_Multi(object obj, out SqlString item)
    {
        item = new SqlString((string)obj);
    }

    public class SplitStringMulti : IEnumerator
    {
        public SplitStringMulti(char[] TheString, char[] Delimiter)
        {
            theString = TheString;
            stringLen = TheString.Length;
            delimiter = Delimiter;
            delimiterLen = (byte)(Delimiter.Length);
            isSingleCharDelim = (delimiterLen == 1);

            lastPos = 0;
            nextPos = delimiterLen * -1;
        }

        #region IEnumerator Members

        public object Current
        {
            get
            {
                return new string(theString, lastPos, nextPos - lastPos);
            }
        }

        public bool MoveNext()
        {
            if (nextPos >= stringLen)
                return false;
            else
            {
                lastPos = nextPos + delimiterLen;

                for (int i = lastPos; i < stringLen; i++)
                {
                    bool matches = true;

                    //Optimize for single-character delimiters
                    if (isSingleCharDelim)
                    {
                        if (theString[i] != delimiter[0])
                            matches = false;
                    }
                    else
                    {
                        for (byte j = 0; j < delimiterLen; j++)
                        {
                            if (((i + j) >= stringLen) || (theString[i + j] != delimiter[j]))
                            {
                                matches = false;
                                break;
                            }
                        }
                    }

                    if (matches)
                    {
                        nextPos = i;

                        //Deal with consecutive delimiters
                        if ((nextPos - lastPos) > 0)
                            return true;
                        else
                        {
                            i += (delimiterLen-1);
                            lastPos += delimiterLen;
                        }
                    }
                }

                lastPos = nextPos + delimiterLen;
                nextPos = stringLen;

                if ((nextPos - lastPos) > 0)
                    return true;
                else
                    return false;
            }
        }

        public void Reset()
        {
            lastPos = 0;
            nextPos = delimiterLen * -1;
        }

        #endregion

        private int lastPos;
        private int nextPos;

        private readonly char[] theString;
        private readonly char[] delimiter;
        private readonly int stringLen;
        private readonly byte delimiterLen;
        private readonly bool isSingleCharDelim;
    }
};

Enjoy!

SHARE
Previous articleFaster, More Scalable SQLCLR String Splitting
Next articleThe Hidden Costs of INSERT EXEC
Adam Machanic helps companies get the most out of their SQL Server databases. He creates solid architectural foundations for high performance databases and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has contributed to numerous books on SQL Server development. A long-time Microsoft MVP for SQL Server, he speaks and trains at IT conferences across North America and Europe.

43 COMMENTS

  1. I think you should make skipping consecutive delimiters an option as there are plenty of cases where you are looking for the x value, and you have to include blank ones in your count.

  2. Was wondering if you considered including a "position" element in the output table along with the item element.  Also, what about making the item element nvarchar(max) rather than 4000?  And did you compare the performance difference between 4000 and max parameters?  In other senarios, I’ve created a few SQL functions to call the same C# code, so you can call a faster SQL function when you know your inputs/outputs will be limited to 4000 characters, or call the slower SQL function when you need unlimited length inputs/outputs.

  3. Hi there. Maybe I’m not entirely in the know here, so please pardon me if my remark is stupid and disregard it altogether in that case… Would it not be easier and better (faster???) to use regexes for splitting strings? I remember that there are built-in options in regex’s methods that will allow one to split strings based not only on concrete chars but also on patterns and to automatically get rid of empty entries as well; if we compile the regex object, it could be even faster, even though it’ll incur some start-up penalty (this shouldn’t matter if there are millions of rows to process, though). And, of course, the IEnumerable is then for free! I have not checked the difference, though, between the code presented here and the regex-based solution that I propose performance-wise. But I think the regexes are a viable solution that gives even more flexibility than the (beautiful) code above.

  4. Hi Darek,
    You are correct; you can use RegEx. It’s not faster, but it certainly is much easier to implement. Unfortunately it has a major downside: the entire collection of substrings will be allocated upfront. It will be sitting in memory the entire time you’re processing, using up precious server resources. And if you have a lot of elements in that collection, it will go on the LOB heap and performance across the entire AppDomain could be severely impacted. All in all, not the kind of solution I want inside my very expensive SQL Server instance.
    –Adam

  5. Hi Adam.
    Very right you are in what you say. My first comment was misfired. After I have read the whole conversation and other bits and pieces here and there, I get the point and agree that the above solution would be the ultimate. It’s a pity that one cannot in this case split on regexes… but I guess one cannot have a cake and eat it 🙂
    Best,
    Darek

  6. I know this post is really old, but if you are calling Input.Value, are you then not doing the same pre-allocation you are trying to avoid?
    Should we not find some way to stream the chars in using SqlChars.Read and stream it back out using SqlChars.Write?
    Something like (pseudo, single-char delim):
       SqlChars currentResult = new SqlChars();
       long inputOffset = 0, outputOffset = 0;
       while (true) {
           int read = Input.Read(inputOffset, Input.Buffer, 0, Input.Buffer.Length);
           if (read > 0) {
               for (int i = 0; i < read; i++) {
                   if (Input.Buffer[i] == delimChar) {
                       // iterate the current string and create the next one
                       yield return currentResult;
                       currentResult = new SqlChars();
                       outputOffset = 0;
                   }
                   else {
                       // append the char to the current string
                       currentResult.Write(outputOffset++, Input.Buffer, i, 1);
                   }
               }
               offset += read;
           }
           else break;    // no more characters in input
       }
    I don’t know for certain whether the calls to Read and Write are correct, as I’m still learning SQLCLR (in particular how to handle LOBs — I came here looking for a possible means to split more than 4000 chars and return more than 4000 chars)…

  7. (There are also clearly some bugs in my above pseudo-code, too: `offset += read;` should be `inputOffset += read;`, for one; and another `yield return currentResult;` is required to spool the last result — but you should get the gist)

  8. jimbobmcgee: YES, we should find a way to stream. Unfortunately, Microsoft broke the streaming interface for LOBs in SQL Server 2008. (Here is a Connect item describing the issue: https://connect.microsoft.com/SQLServer/feedback/details/382235/clr-error-system-invalidoperationexception-accessing-members-of-an-object-from-a-wrong-thread) … Apparently it is now fixed, but I’ve not tried it out yet.
    That said, bringing in the string is very different from holding a reference to all of the sub-strings. The latter case is much more expensive due to pointer overhead.

  9. Hi Adam
    Fantastic article.  I’ve used a slightly modified version of your code in my application and seen tremendous performance gains over native SQL attempts at the same thing.
    I hope you don’t mind, but because I struggled to create the actual DLL and integrate it (first time I’ve written a CLR UDF), I’ve written a how-to on my blog using your code above as an example.
    Due credit is given in your direction obviously. Hope that’s okay?
    Keep up the good work.

  10. Hi,
    I was very impressed by the performance of this function.
    I would need this Split function to return empty strings if two delimiters are consecutive. Which changes would be necessary ?
    Thanks

  11. Hi there Adam,
    Thanks for posting this, I’ve created my first CLR from this code and it works like a charm.  I read the original SQL Server Central thread and a number of solutions also returned a counter as well in the table definition for each row returned.
    So table definition would be the following:
    TableDefinition = "counter int, item nvarchar(4000)"
    I’m no C# guru but I want to add this counter to the code but I’m not sure how to do this.  I’ve tried a few things but I can’t get my new output parameter from the SplitStringMulti class into the FillRow_Multi object, i.e:
    public static void FillRow_Multi(object obj, out SqlInt32 counter, out SqlString item)
    {
       counter = 1; // I don’t know???
       item = new SqlString((string)obj);
    }
    If you have a moment are you able to advise please?
    Kind regards,
    Chris

  12. Uh, never mind, I just read page 50 of that thread on SQL Server Central and I see you’ve added the row number there =)
    Thanks again!

  13. Oh, actually your code in this blog post and the code on page 50 (which includes the row numbers) is quite different.
    I thought I’d try to modify your code on this blog with the row number code but I’m not having much luck.  So my original request still stands, are you able to assist please?
    Kind regards,
    Chris

  14. Hi there,
    Sorry for all of the comments.  I think I’ve managed to do this.  I’ve also modified the code as I’m only ever interested in a one character delimiter.  I’m still a C# newb so may have missed things.
    Essentially the change returns an additional column which shows the place of the delimited field within the original record.  So it’s not meant to be a unique integer value.
    Thank you so much again for posting this on your blog.
    using System;
    using System.Collections;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    public partial class UserDefinedFunctions
    {
       [Microsoft.SqlServer.Server.SqlFunction(
           FillRowMethodName = "FillRow_Multi",
           TableDefinition = "row int, item nvarchar(4000)"
           )
       ]
       public static IEnumerator SplitString_Multi(
           [SqlFacet(MaxSize = -1)] SqlChars Input,
           [SqlFacet(MaxSize = 1)] SqlChars Delimiter
           )
       {
           return ((Input.IsNull || Delimiter.IsNull) ? new SplitStringMulti(new char[0], new char[0]) : new SplitStringMulti(Input.Value, Delimiter.Value));
       }
       public static void FillRow_Multi(object obj, out SqlInt32 row, out SqlString item)
       {
           ResultSet rs = (ResultSet)obj;
           row = new SqlInt32((int)rs.get_row);
           item = new SqlString((string)rs.get_value);
       }
       public class ResultSet
       {
           private int row;
           private string value;
           private int start;
           private int end;
           private char[] theString;
           public ResultSet(int input_row, char[] input_value, int input_start, int input_end)
           {
               row = input_row;
               value = new string(input_value, input_start, input_end);
           }
           public object get_row
           {
               get
               {
                   return row;
               }
           }
           public object get_value
           {
               get
               {
                   return value;
               }
           }
       }
       public class SplitStringMulti : IEnumerator
       {
           public SplitStringMulti(char[] TheString, char[] Delimiter)
           {
               theString = TheString;
               stringLen = TheString.Length;
               delimiter = Delimiter;
               lastPos = 0;
               nextPos = 1 * -1;
           }
           public object Current
           {
               get
               {
                   ResultSet results = new ResultSet(row, theString, lastPos, nextPos – lastPos);
                   return results;
               }
           }
           public bool MoveNext()
           {
               if (nextPos >= stringLen)
               {
                   return false;
               }
               else
               {
                   lastPos = nextPos + 1;
                   for (int i = lastPos; i < stringLen; i++)
                   {
                       bool matches = true;
                       if (theString[i] != delimiter[0])
                       {
                           matches = false;
                       }
                       else
                       {
                           nextPos = i;
                           row++;
                           return true;
                       }
                   }
                   lastPos = nextPos + 1;
                   nextPos = stringLen;
                   if ((nextPos – lastPos) > 0)
                   {
                       return true;
                   }
                   else
                   {
                       return false;
                   }
               }
           }
           public void Reset()
           {
               lastPos = 0;
               nextPos = 1 * -1;
               row = 0;
           }
           private int lastPos;
           private int nextPos;
           private int row = 0;
           private readonly char[] theString;
           private readonly char[] delimiter;
           private readonly int stringLen;
       }
    };

  15. Hi Chris,
    Glad you got it working, but the code in the SSC thread would have been totally fine to use — I’ve had it in a production environment since a couple of days after I posted there, with no problems at all.
    That said, probably good that you got a chance to play with some C#. Always nice to learn new things.
    –Adam

  16. Hi Chris,
    I am a newbie myself but IMHO considering that the Enumeration will run only once for a given string, I suppose you can define your row number with just
    public class ResultSet
    {
    private readonly int row;
    private readonly string value;
    public ResultSet(int row, string value)
    {
    this.row = row;
    this.value = value;
    }
    }
    in the SplitStringMulti class keep a field
    private int row;
    initalise row in "Constructor" and during "Reset" set row to 0 by adding
    row = 0;
    then in the iterator you can use
    public object Current
    {
    get
    {
    return = new ResultSet(++row, theString( lastPos, nextPos – lastPos));
    }
    }
    and use ResultSet in the  FillRow_Multi like
    public static void FillRow_Multi(object obj, out SqlInt32 row, out SqlString item)
    {
    ResultSet rs = (ResultSet)obj;
    row = new SqlInt32(rs.row);
    item = new SqlString(rs.value);
    }

  17. Hi,
    I am new to SQL CLR.
    Is there a usage example or even how to use this class in VS2012 / SQL2008R2?
    The usage scenario I have is that I have a pipe (|) and comma delimited string.  The comma separates the "records/rows" and the pipe separates the "columns" (all string values).
    So would like to pass in my string and have it (in SQL2008R2) spit out a table with the columns and rows.  
    Is that possible?

  18. Good day,
    If I may offer a small streamlining.
    Since you are changing the code from time to time and you add improvements. It is worth using the AssemblyVersion attribute.
    This will allow the user to update the assembly directly by using "ALTER ASSEMBLY…"
    all that need to be done is add this to the code:
    using System.Reflection;
    [assembly: AssemblyVersion("1.0.0.0")]
    and you can add attributes like those if you want:
    [assembly: AssemblyFileVersion("1.0.0.0")]
    [assembly: AssemblyDescription("Adam Machanic Split CLR")]
    [assembly: AssemblyCompany("Free Open Source Code")]

  19. It would be interesting to have an example of how to run the function in SSMS. If I just call it there is an error that says it fails because is a table valued object function object.

  20. It will be nice if the post is updated (in order to be sure it is working fine with SQL Server 2012/2014). Also, is it working with nvarchar(max) input?

  21. It will be nice if the post is updated (in order to be sure it is working fine with SQL Server 2012/2014). Also, is it working with nvarchar(max) input?

  22. Adam, I found a way to make you function completely hang. To cause it to hang, all I did was pass in a empty string for my delimiter like so:
    SELECT xs.* FROM dbo.xfnSplit(N’This is a test of the emergency broadcast system’, ”) xs
    Simple fix of changing the return statement worked it out however:
           return (
               Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
               new SplitStringMulti(new char[0], new char[0]) :
               new SplitStringMulti(Input.Value, Delimiter.Value));

  23. @Tim
    Interesting! I must have found this at some point myself, as I just checked my local (somewhat modified) version, and it doesn’t hang. Although I have no recollection of when or why I did so, I seem to have fixed it by changing the following line in the SplitStringMulti constructor:
    delimiter = Delimiter;
    … to …
    delimiter = (Delimiter.Length == 0 ? ‘ ‘ : Delimiter[0]);
    … I think I prefer your solution, because what does an empty delimiter actually mean..?
    –Adam

  24. Adam, made a small change to your original code that should not make any impact. I added a rowid to the output table. VERY NICE FUNCTION BTW! Would love to take a peek at the rest of your clr libs.
       [SqlFunction(Name = "xfnSplit", FillRowMethodName = "FillRow_Multi", TableDefinition = "[RowID] int, [Item] nvarchar(4000)")]
       public static IEnumerator SplitString_Multi(
         [SqlFacet(MaxSize = -1)]
         SqlChars Input,
         [SqlFacet(MaxSize = 255)]
         SqlChars Delimiter)
       {
           return (
               Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
               new SplitStringMulti(new char[0], new char[0]) :
               new SplitStringMulti(Input.Value, Delimiter.Value));
       }
       public static void FillRow_Multi(object obj, out SqlInt32 rowid, out SqlString item)
       {
           SplitStringRow row = (SplitStringRow)obj;
           rowid = row.RowId;
           item = row.Value;
       }
       public class SplitStringMulti : IEnumerator
       {
           public SplitStringMulti(char[] TheString, char[] Delimiter)
           {
               theString = TheString;
               stringLen = TheString.Length;
               delimiter = Delimiter;
               delimiterLen = (byte)(Delimiter.Length);
               isSingleCharDelim = (delimiterLen == 1);
               lastPos = 0;
               nextPos = delimiterLen * -1;
           }
           #region IEnumerator Members
           public object Current
           {
               get
               {
                   return new SplitStringRow(rowid++, new string(theString, lastPos, nextPos – lastPos));
               }
           }
           public bool MoveNext()
           {
               if (nextPos >= stringLen)
                   return false;
               else
               {
                   lastPos = nextPos + delimiterLen;
                   for (int i = lastPos; i < stringLen; i++)
                   {
                       bool matches = true;
                       //Optimize for single-character delimiters
                       if (isSingleCharDelim)
                       {
                           if (theString[i] != delimiter[0])
                               matches = false;
                       }
                       else
                       {
                           for (byte j = 0; j < delimiterLen; j++)
                           {
                               if (((i + j) >= stringLen) || (theString[i + j] != delimiter[j]))
                               {
                                   matches = false;
                                   break;
                               }
                           }
                       }
                       if (matches)
                       {
                           nextPos = i;
                           //Deal with consecutive delimiters
                           if ((nextPos – lastPos) > 0)
                               return true;
                           else
                           {
                               i += (delimiterLen – 1);
                               lastPos += delimiterLen;
                           }
                       }
                   }
                   lastPos = nextPos + delimiterLen;
                   nextPos = stringLen;
                   if ((nextPos – lastPos) > 0)
                       return true;
                   else
                       return false;
               }
           }
           public void Reset()
           {
               rowid = 1;
               lastPos = 0;
               nextPos = delimiterLen * -1;
           }
           #endregion
           private int rowid = 1;
           private int lastPos;
           private int nextPos;
           private readonly char[] theString;
           private readonly char[] delimiter;
           private readonly int stringLen;
           private readonly byte delimiterLen;
           private readonly bool isSingleCharDelim;
       }
       internal class SplitStringRow
       {
           private int _rowid;
           public int RowId { get { return _rowid; } }
           private string _value;
           public string Value { get { return _value; } }
           public SplitStringRow(int rowid, string value)
           {
               _rowid = rowid;
               _value = value;
           }
       }

  25. @Tim
    I’m not a fan of that particular method, as it will create a new object for every substring. That’s going to have a measurable (and considerable) impact. Try to come up with another way to do it. (Hint: Create a reusable, pass-through object.)
    –Adam

  26. Ok, made the changes you suggested Adam, as well as added a new split for ints. This way a delimited string of ints can be passed in and used in a join or a where clause without type conversion occurring. The only way I could think of to accomplish passing an object across was to use a delimeted string. Hopefully I did not fail your test at guessing your hint. 🙂
       #region http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/  Adam Machanic: SQLCLR String Splitting Part 2: Even Faster, Even More Scalable
       /// <summary>
       /// Splits the string into a table of integers.
       /// </summary>
       /// <param name="Input">The input.</param>
       /// <param name="Delimiter">The delimiter.</param>
       /// <returns></returns>
       /// <remarks>Author: Adam Machanic http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/</remarks>
       [SqlFunction(Name = "xfnSplitInts", FillRowMethodName = "FillRow_MultiInt32", TableDefinition = "[Item] int")]
       public static IEnumerator SplitInt(
           [SqlFacet(MaxSize = -1)]
           SqlChars Input,
           [SqlFacet(MaxSize = 255)]
           SqlChars Delimiter)
       {
           return (
               Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
               new SplitStringMulti(new char[0], new char[0]) :
               new SplitStringMulti(Input.Value, Delimiter.Value, SplitType.Int32));
       }
       /// <summary>
       /// Splits the string into a table of nvarchars.
       /// </summary>
       /// <param name="Input">The input.</param>
       /// <param name="Delimiter">The delimiter.</param>
       /// <returns></returns>
       /// <remarks>Author: Adam Machanic http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/</remarks>
       [SqlFunction(Name = "xfnSplit", FillRowMethodName = "FillRow_MultiString", TableDefinition = "[RowId] int, [Item] nvarchar(4000)")]
       public static IEnumerator SplitString(
           [SqlFacet(MaxSize = -1)]
           SqlChars Input,
           [SqlFacet(MaxSize = 255)]
           SqlChars Delimiter)
       {
           return (
               Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
               new SplitStringMulti(new char[0], new char[0]) :
               new SplitStringMulti(Input.Value, Delimiter.Value, SplitType.String));
       }
       public static void FillRow_MultiInt32(object obj, out SqlInt32 item)
       {
           int tmp = 0;
           if (int.TryParse(Convert.ToString(obj), out tmp))
           {
               item = tmp;
           }
           else
           {
               item = SqlInt32.Null;
           }
       }
       public static void FillRow_MultiString(object obj, out SqlInt32 rowid, out SqlString item)
       {
           string[] row = Convert.ToString(obj).Split(new string[] { "<#>" }, StringSplitOptions.RemoveEmptyEntries);
           if (row.Length == 2)
           {
               rowid = Convert.ToInt32(row[0]);
               item = row[1];
           }
           else
           {
               throw new FormatException("Not able to property split row.");
           }
       }
       public class SplitStringMulti : IEnumerator
       {
           public SplitStringMulti(char[] TheString, char[] Delimiter, SplitType type = SplitType.String)
           {
               this.type = type;
               theString = TheString;
               stringLen = TheString.Length;
               delimiter = Delimiter;
               delimiterLen = (byte)(Delimiter.Length);
               isSingleCharDelim = (delimiterLen == 1);
               lastPos = 0;
               nextPos = delimiterLen * -1;
           }
           #region IEnumerator Members
           public object Current
           {
               get
               {
                   string tmp = new string(theString, lastPos, nextPos – lastPos);
                   if (type == SplitType.String)
                   {
                       return String.Format("{0}<#>{1}", rowid++, tmp);
                   }
                   else if (type == SplitType.Int32)
                   {
                       return tmp;
                   }
                   else
                   {
                       throw new NotSupportedException("This type of split is not implemented yet.");
                   }
               }
           }
           public bool MoveNext()
           {
               if (nextPos >= stringLen)
               {
                   return false;
               }
               else
               {
                   lastPos = nextPos + delimiterLen;
                   for (int i = lastPos; i < stringLen; i++)
                   {
                       bool matches = true;
                       //Optimize for single-character delimiters
                       if (isSingleCharDelim)
                       {
                           if (theString[i] != delimiter[0])
                           {
                               matches = false;
                           }
                       }
                       else
                       {
                           for (byte j = 0; j < delimiterLen; j++)
                           {
                               if (((i + j) >= stringLen) || (theString[i + j] != delimiter[j]))
                               {
                                   matches = false;
                                   break;
                               }
                           }
                       }
                       if (matches)
                       {
                           nextPos = i;
                           //Deal with consecutive delimiters
                           if ((nextPos – lastPos) > 0)
                           {
                               return true;
                           }
                           else
                           {
                               i += (delimiterLen – 1);
                               lastPos += delimiterLen;
                           }
                       }
                   }
                   lastPos = nextPos + delimiterLen;
                   nextPos = stringLen;
                   if ((nextPos – lastPos) > 0)
                   {
                       return true;
                   }
                   else
                   {
                       return false;
                   }
               }
           }
           public void Reset()
           {
               rowid = 1;
               lastPos = 0;
               nextPos = delimiterLen * -1;
           }
           #endregion
           private SplitType type = SplitType.String;
           private int rowid = 1;
           private int lastPos;
           private int nextPos;
           private readonly char[] theString;
           private readonly char[] delimiter;
           private readonly int stringLen;
           private readonly byte delimiterLen;
           private readonly bool isSingleCharDelim;
       }
       public enum SplitType
       {
           String,
           Int32
       }
       //internal class SplitStringRow
       //{
       //    private int _rowid;
       //    public int RowId { get { return _rowid; } }
       //    private string _value;
       //    public string Value { get { return _value; } }
       //    public SplitStringRow(int rowid, string value)
       //    {
       //        _rowid = rowid;
       //        _value = value;
       //    }
       //}
       #endregion

  27. @Tim:
    In my version the stringSplit class internally creates an instance of another class, results, which looks like this:
           public class results
           {
               public results(stringSplit ss)
               {
                   this.ss = ss;
               }
               private readonly stringSplit ss;
               public int RowNumber
               {
                   get
                   {
                       return (ss.rowNumber);
                   }
               }
               public string OutputString
               {
                   get
                   {
                       return (ss.result);
                   }
               }
           }
    Results is effectively a singleton (though not enforced). When it’s time to return a row, stringSplit increments the rowNumber, populates the result string, and then returns the same exact instance of results, every single time.
    The FillRowMethod then simply casts the input object as results and uses the properties to output the requisite data.
    –Adam

  28. All right Adam, I am not going to spam your comments section with code again, but I made the changes you suggested. I also removed the enum for my split type. I just handle the results class differently in the fillrow methods. This way the same enumerator can be reused for both quite seamlessly.
    TY for the help!

  29. How could a person change the code here to include NULL or empty values for cases when there are two delimiters in a row? I’m thinking of a scenario where I’ve been handed 10 column values all delimited….if column 3 is NULL, I need to make sure there are still 10 rows in my output, not 9 but this CLR splitter code above will just ignore that nullness.
    Has someone handled this case and I’ve just missed the update in the many ridiculously long and varied blog posts and articles about this? 🙂

  30. @John #2
    I specifically wrote this code to ignore that case. But I think it’s easy to change.
    Look for the comment marked "//Deal with consecutive delimiters" — and simply remove the "if" condition there. (i.e. always return true)
    I would be very interested in understanding more about your scenario, as I’ve never seen one where consecutive delimiters were anything other than a data error.
    –Adam

  31. @db Yup, it’s C#; it’s a SQLCLR user-defined function. You need to get Visual Studio (i.e. SQL Server Data Tools) and compile it. Then you can deploy the DLL to your server.
    I wrote a very in-depth chapter on how it’s done in the book "T-SQL Querying" that was released earlier this year. Check it out!
    –Adam

  32. First of all, I have learned QUITE a lot from your blogs as well as your PASS Summit Pre-con & con sessions, so THANK YOU!
    This post is over 6 years old now and throughout the many comments you have mentioned several updates that either you have made or have suggested being made.
    What do you say about a new post… "SQLCLR String Splitting Part 3: All the Bells and Whistles"?!?!?!  
    Again, I thank you for sharing your information and normally I would look forward to attending your PASS Summit 2015 sessions, but I don’t see your name in the list of speakers.  🙁
    http://www.sqlpass.org/summit/2015/Sessions/Speakers.aspx

  33. @J. Morgan:
    Thanks for the suggestion – I’ll certainly consider it!
    I’m skipping PASS this year for personal reasons, but it’s extremely likely I’ll return in 2016.
    –Adam

  34. Hello all,
    Please guide me, i m new in CLR funtion
    I have blow tow question
    –1) I need to return only int in lieu of String  how can I ?
    EXTERNAL NAME CLRUtilities.UserDefinedFunctions.SplitString_Multi;
    GO
    CREATE ASSEMBLY CLRUtilities FROM ‘c:DLLsCLRUtilities.dll’
     WITH PERMISSION_SET = SAFE;
    GO
    CREATE FUNCTION dbo.SplitStrings_CLR
    (
      @List      NVARCHAR(MAX),
      @Delimiter NVARCHAR(255)
    )
    RETURNS TABLE (
    Item Int   —  Question (1)?  
    —Item NVARCHAR(4000)
    )
    EXTERNAL NAME CLRUtilities.UserDefinedFunctions.SplitString_Multi;
    GO
    —2) I need to return identity column along with string column in return.
    CREATE ASSEMBLY CLRUtilities FROM ‘c:DLLsCLRUtilities.dll’
     WITH PERMISSION_SET = SAFE;
    GO
    CREATE FUNCTION dbo.SplitStrings_CLR
    (
      @List      NVARCHAR(MAX),
      @Delimiter NVARCHAR(255)
    )
    RETURNS TABLE ( Item NVARCHAR(4000)
    ——  Question (2) ?     need to genrate id identity column
    )

  35. Hey Adam,
    Just a quick note to say thanks and to ask a quick question…
    What’s the best way to make this function return a final null row if the last char is a delimiter?
    My use case: I am breaking up some condensed audit data from a third party DBMS and the order and volume of the columns in each audit record will be different from record to record.  I need to treat every value before and after a delimiter as a ‘column’ of the audit record so I can’t afford to lose columns even if they contain null values.
    And thanks for the post!  The function scales brilliantly!

LEAVE A REPLY

Please enter your comment!
Please enter your name here