Home Uncategorized SQLCLR String Splitting Part 2: Even Faster, Even More Scalable

SQLCLR String Splitting Part 2: Even Faster, Even More Scalable

April 28, 2009

5850

Two days ago, after posting what I thought was a pretty solid SQLCLR string splitting method, I received a comment telling me about a big thread on SQLServerCentral dedicated to the question of how best to split strings. So I jumped in, and went back and forth, and back and forth, and back… and forth…

Many, many messages and several revisions and re-revisions later, I am happy to present the newer, better, more bug-free SQLCLR string splitting function. Not only is it faster than the previous one I posted, but it also handles multi-character delimiters, ignores consecutive delimiters as well as pre and post delimiters, and best of all, it won’t get itself stuck in an infinite loop in certain cases (always a really great stability feature).

This version walks the SqlChars character array rather than using the IndexOf method on the string, which we found to be a somewhat faster technique–and it’s certainly the most scalable and memory efficient method I can imagine.

Special thanks to SQLServerCentral member Florian Reischl, who was the main person keeping the thread going with me during the last couple of days, and who managed to re-write my versions and eke out even better performance by modifying the algorithms. Great stuff–it was definitely the most fun I’ve had on a technical forum in quite a long time.

But without further ado, the code:

using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
       FillRowMethodName = "FillRow_Multi",
       TableDefinition = "item nvarchar(4000)"
       )
    ]
    public static IEnumerator SplitString_Multi(
      [SqlFacet(MaxSize = -1)]
      SqlChars Input,
      [SqlFacet(MaxSize = 255)]
      SqlChars Delimiter
       )
    {
        return (
            (Input.IsNull || Delimiter.IsNull) ?
            new SplitStringMulti(new char[0], new char[0]) :
            new SplitStringMulti(Input.Value, Delimiter.Value));
    }

    public static void FillRow_Multi(object obj, out SqlString item)
    {
        item = new SqlString((string)obj);
    }

    public class SplitStringMulti : IEnumerator
    {
        public SplitStringMulti(char[] TheString, char[] Delimiter)
        {
            theString = TheString;
            stringLen = TheString.Length;
            delimiter = Delimiter;
            delimiterLen = (byte)(Delimiter.Length);
            isSingleCharDelim = (delimiterLen == 1);

            lastPos = 0;
            nextPos = delimiterLen * -1;
        }

        #region IEnumerator Members

        public object Current
        {
            get
            {
                return new string(theString, lastPos, nextPos - lastPos);
            }
        }

        public bool MoveNext()
        {
            if (nextPos >= stringLen)
                return false;
            else
            {
                lastPos = nextPos + delimiterLen;

                for (int i = lastPos; i < stringLen; i++)
                {
                    bool matches = true;

                    //Optimize for single-character delimiters
                    if (isSingleCharDelim)
                    {
                        if (theString[i] != delimiter[0])
                            matches = false;
                    }
                    else
                    {
                        for (byte j = 0; j < delimiterLen; j++)
                        {
                            if (((i + j) >= stringLen) || (theString[i + j] != delimiter[j]))
                            {
                                matches = false;
                                break;
                            }
                        }
                    }

                    if (matches)
                    {
                        nextPos = i;

                        //Deal with consecutive delimiters
                        if ((nextPos - lastPos) > 0)
                            return true;
                        else
                        {
                            i += (delimiterLen-1);
                            lastPos += delimiterLen;
                        }
                    }
                }

                lastPos = nextPos + delimiterLen;
                nextPos = stringLen;

                if ((nextPos - lastPos) > 0)
                    return true;
                else
                    return false;
            }
        }

        public void Reset()
        {
            lastPos = 0;
            nextPos = delimiterLen * -1;
        }

        #endregion

        private int lastPos;
        private int nextPos;

        private readonly char[] theString;
        private readonly char[] delimiter;
        private readonly int stringLen;
        private readonly byte delimiterLen;
        private readonly bool isSingleCharDelim;
    }
};

Enjoy!

49 COMMENTS

Paul White April 29, 2009 At 2:20 pm

I hope people read the background to appreciate what a work of art that code now is. Awesome work.
- Daniel Adeniji September 12, 2022 At 11:24 am
  
  Paul White:-
  
  Nothing like seeing Paul White encouraging somebody.
  
  Top Shelf.
  
  So ahead of everybody. Yet so encouraging.
  
  God couldn’t have found someone else better to entrust all he gloried you with.
  
  Daniel
Michael May 10, 2009 At 12:43 pm

I think you should make skipping consecutive delimiters an option as there are plenty of cases where you are looking for the x value, and you have to include blank ones in your count.
Matt November 26, 2009 At 6:03 pm

Was wondering if you considered including a "position" element in the output table along with the item element. Also, what about making the item element nvarchar(max) rather than 4000? And did you compare the performance difference between 4000 and max parameters? In other senarios, I’ve created a few SQL functions to call the same C# code, so you can call a faster SQL function when you know your inputs/outputs will be limited to 4000 characters, or call the slower SQL function when you need unlimited length inputs/outputs.
Darek Spiewak February 27, 2012 At 6:43 am

Hi there. Maybe I’m not entirely in the know here, so please pardon me if my remark is stupid and disregard it altogether in that case… Would it not be easier and better (faster???) to use regexes for splitting strings? I remember that there are built-in options in regex’s methods that will allow one to split strings based not only on concrete chars but also on patterns and to automatically get rid of empty entries as well; if we compile the regex object, it could be even faster, even though it’ll incur some start-up penalty (this shouldn’t matter if there are millions of rows to process, though). And, of course, the IEnumerable is then for free! I have not checked the difference, though, between the code presented here and the regex-based solution that I propose performance-wise. But I think the regexes are a viable solution that gives even more flexibility than the (beautiful) code above.
Adam Machanic February 27, 2012 At 6:11 pm

Hi Darek,
You are correct; you can use RegEx. It’s not faster, but it certainly is much easier to implement. Unfortunately it has a major downside: the entire collection of substrings will be allocated upfront. It will be sitting in memory the entire time you’re processing, using up precious server resources. And if you have a lot of elements in that collection, it will go on the LOB heap and performance across the entire AppDomain could be severely impacted. All in all, not the kind of solution I want inside my very expensive SQL Server instance.
–Adam
Darek Spiewak March 1, 2012 At 10:12 pm

Hi Adam.
Very right you are in what you say. My first comment was misfired. After I have read the whole conversation and other bits and pieces here and there, I get the point and agree that the above solution would be the ultimate. It’s a pity that one cannot in this case split on regexes… but I guess one cannot have a cake and eat it 🙂
Best,
Darek
jimbobmcgee May 9, 2012 At 11:29 pm

I know this post is really old, but if you are calling Input.Value, are you then not doing the same pre-allocation you are trying to avoid?
Should we not find some way to stream the chars in using SqlChars.Read and stream it back out using SqlChars.Write?
Something like (pseudo, single-char delim):
SqlChars currentResult = new SqlChars();
long inputOffset = 0, outputOffset = 0;
while (true) {
int read = Input.Read(inputOffset, Input.Buffer, 0, Input.Buffer.Length);
if (read > 0) {
for (int i = 0; i < read; i++) {
if (Input.Buffer[i] == delimChar) {
// iterate the current string and create the next one
yield return currentResult;
currentResult = new SqlChars();
outputOffset = 0;
}
else {
// append the char to the current string
currentResult.Write(outputOffset++, Input.Buffer, i, 1);
}
}
offset += read;
}
else break; // no more characters in input
}
I don’t know for certain whether the calls to Read and Write are correct, as I’m still learning SQLCLR (in particular how to handle LOBs — I came here looking for a possible means to split more than 4000 chars and return more than 4000 chars)…
jimbobmcgee May 9, 2012 At 11:38 pm

(There are also clearly some bugs in my above pseudo-code, too: `offset += read;` should be `inputOffset += read;`, for one; and another `yield return currentResult;` is required to spool the last result — but you should get the gist)
Adam Machanic May 11, 2012 At 7:05 am

jimbobmcgee: YES, we should find a way to stream. Unfortunately, Microsoft broke the streaming interface for LOBs in SQL Server 2008. (Here is a Connect item describing the issue: https://connect.microsoft.com/SQLServer/feedback/details/382235/clr-error-system-invalidoperationexception-accessing-members-of-an-object-from-a-wrong-thread) … Apparently it is now fixed, but I’ve not tried it out yet.
That said, bringing in the string is very different from holding a reference to all of the sub-strings. The latter case is much more expensive due to pointer overhead.
Dave July 20, 2012 At 4:37 pm

Hi Adam
Fantastic article. I’ve used a slightly modified version of your code in my application and seen tremendous performance gains over native SQL attempts at the same thing.
I hope you don’t mind, but because I struggled to create the actual DLL and integrate it (first time I’ve written a CLR UDF), I’ve written a how-to on my blog using your code above as an example.
Due credit is given in your direction obviously. Hope that’s okay?
Keep up the good work.
Dave July 20, 2012 At 4:38 pm

Sorry should have added… http://www.codepal.co.uk/show/How_to_integrate_a_NET_CLR_UDF_into_SQL_Server
Duh…
Patrick August 29, 2012 At 11:57 am

Hi,
I was very impressed by the performance of this function.
I would need this Split function to return empty strings if two delimiters are consecutive. Which changes would be necessary ?
Thanks
Adam Machanic August 29, 2012 At 7:25 pm

Patrick,
See the part of the code with the following comment:
//Deal with consecutive delimiters
🙂
Chris Rutherford December 5, 2012 At 12:12 am

Hi there Adam,
Thanks for posting this, I’ve created my first CLR from this code and it works like a charm. I read the original SQL Server Central thread and a number of solutions also returned a counter as well in the table definition for each row returned.
So table definition would be the following:
TableDefinition = "counter int, item nvarchar(4000)"
I’m no C# guru but I want to add this counter to the code but I’m not sure how to do this. I’ve tried a few things but I can’t get my new output parameter from the SplitStringMulti class into the FillRow_Multi object, i.e:
public static void FillRow_Multi(object obj, out SqlInt32 counter, out SqlString item)
{
counter = 1; // I don’t know???
item = new SqlString((string)obj);
}
If you have a moment are you able to advise please?
Kind regards,
Chris
Chris Rutherford December 5, 2012 At 12:27 am

Uh, never mind, I just read page 50 of that thread on SQL Server Central and I see you’ve added the row number there =)
Thanks again!
Chris Rutherford December 5, 2012 At 12:45 am

Oh, actually your code in this blog post and the code on page 50 (which includes the row numbers) is quite different.
I thought I’d try to modify your code on this blog with the row number code but I’m not having much luck. So my original request still stands, are you able to assist please?
Kind regards,
Chris
Chris Rutherford December 5, 2012 At 3:43 am

Hi there,
Sorry for all of the comments. I think I’ve managed to do this. I’ve also modified the code as I’m only ever interested in a one character delimiter. I’m still a C# newb so may have missed things.
Essentially the change returns an additional column which shows the place of the delimited field within the original record. So it’s not meant to be a unique integer value.
Thank you so much again for posting this on your blog.
using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction(
FillRowMethodName = "FillRow_Multi",
TableDefinition = "row int, item nvarchar(4000)"
)
]
public static IEnumerator SplitString_Multi(
[SqlFacet(MaxSize = -1)] SqlChars Input,
[SqlFacet(MaxSize = 1)] SqlChars Delimiter
)
{
return ((Input.IsNull || Delimiter.IsNull) ? new SplitStringMulti(new char[0], new char[0]) : new SplitStringMulti(Input.Value, Delimiter.Value));
}
public static void FillRow_Multi(object obj, out SqlInt32 row, out SqlString item)
{
ResultSet rs = (ResultSet)obj;
row = new SqlInt32((int)rs.get_row);
item = new SqlString((string)rs.get_value);
}
public class ResultSet
{
private int row;
private string value;
private int start;
private int end;
private char[] theString;
public ResultSet(int input_row, char[] input_value, int input_start, int input_end)
{
row = input_row;
value = new string(input_value, input_start, input_end);
}
public object get_row
{
get
{
return row;
}
}
public object get_value
{
get
{
return value;
}
}
}
public class SplitStringMulti : IEnumerator
{
public SplitStringMulti(char[] TheString, char[] Delimiter)
{
theString = TheString;
stringLen = TheString.Length;
delimiter = Delimiter;
lastPos = 0;
nextPos = 1 * -1;
}
public object Current
{
get
{
ResultSet results = new ResultSet(row, theString, lastPos, nextPos – lastPos);
return results;
}
}
public bool MoveNext()
{
if (nextPos >= stringLen)
{
return false;
}
else
{
lastPos = nextPos + 1;
for (int i = lastPos; i < stringLen; i++)
{
bool matches = true;
if (theString[i] != delimiter[0])
{
matches = false;
}
else
{
nextPos = i;
row++;
return true;
}
}
lastPos = nextPos + 1;
nextPos = stringLen;
if ((nextPos – lastPos) > 0)
{
return true;
}
else
{
return false;
}
}
}
public void Reset()
{
lastPos = 0;
nextPos = 1 * -1;
row = 0;
}
private int lastPos;
private int nextPos;
private int row = 0;
private readonly char[] theString;
private readonly char[] delimiter;
private readonly int stringLen;
}
};
- Mike C August 24, 2020 At 7:38 pm
  
  Chris,
  I too was looking for a solution that included an index or row number and was happy to see your code. But, if I put in a string like 12345-123-456-0-0, the last two items have the same row number. Any idea why that is happening?
  Thanks!
  Mike
Adam Machanic December 5, 2012 At 5:50 pm

Hi Chris,
Glad you got it working, but the code in the SSC thread would have been totally fine to use — I’ve had it in a production environment since a couple of days after I posted there, with no problems at all.
That said, probably good that you got a chance to play with some C#. Always nice to learn new things.
–Adam
Maurice December 21, 2012 At 4:19 pm

Great job, got it running just fine. Thanks !!
Surya Pratap December 26, 2012 At 4:56 pm

Hi Chris,
I am a newbie myself but IMHO considering that the Enumeration will run only once for a given string, I suppose you can define your row number with just
public class ResultSet
{
private readonly int row;
private readonly string value;
public ResultSet(int row, string value)
{
this.row = row;
this.value = value;
}
}
in the SplitStringMulti class keep a field
private int row;
initalise row in "Constructor" and during "Reset" set row to 0 by adding
row = 0;
then in the iterator you can use
public object Current
{
get
{
return = new ResultSet(++row, theString( lastPos, nextPos – lastPos));
}
}
and use ResultSet in the FillRow_Multi like
public static void FillRow_Multi(object obj, out SqlInt32 row, out SqlString item)
{
ResultSet rs = (ResultSet)obj;
row = new SqlInt32(rs.row);
item = new SqlString(rs.value);
}
sheir September 13, 2013 At 10:16 pm

Hi,
I am new to SQL CLR.
Is there a usage example or even how to use this class in VS2012 / SQL2008R2?
The usage scenario I have is that I have a pipe (|) and comma delimited string. The comma separates the "records/rows" and the pipe separates the "columns" (all string values).
So would like to pass in my string and have it (in SQL2008R2) spit out a table with the columns and rows.
Is that possible?
Ronen Ariely - pituach November 17, 2013 At 9:42 pm

Good day,
If I may offer a small streamlining.
Since you are changing the code from time to time and you add improvements. It is worth using the AssemblyVersion attribute.
This will allow the user to update the assembly directly by using "ALTER ASSEMBLY…"
all that need to be done is add this to the code:
using System.Reflection;
[assembly: AssemblyVersion("1.0.0.0")]
and you can add attributes like those if you want:
[assembly: AssemblyFileVersion("1.0.0.0")]
[assembly: AssemblyDescription("Adam Machanic Split CLR")]
[assembly: AssemblyCompany("Free Open Source Code")]
FY April 26, 2014 At 4:27 am

It would be interesting to have an example of how to run the function in SSMS. If I just call it there is an error that says it fails because is a table valued object function object.
gotqn October 10, 2014 At 4:11 pm

It will be nice if the post is updated (in order to be sure it is working fine with SQL Server 2012/2014). Also, is it working with nvarchar(max) input?
gotqn October 10, 2014 At 4:11 pm

It will be nice if the post is updated (in order to be sure it is working fine with SQL Server 2012/2014). Also, is it working with nvarchar(max) input?
Adam Machanic October 10, 2014 At 6:48 pm

@gotqn:
It works fine in 2012/2014; no need for an update. And yes, it is designed to handle large documents.
–Adam
Tim Cartwright April 15, 2015 At 8:28 pm

Adam, I found a way to make you function completely hang. To cause it to hang, all I did was pass in a empty string for my delimiter like so:
SELECT xs.* FROM dbo.xfnSplit(N’This is a test of the emergency broadcast system’, ”) xs
Simple fix of changing the return statement worked it out however:
return (
Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
new SplitStringMulti(new char[0], new char[0]) :
new SplitStringMulti(Input.Value, Delimiter.Value));
Adam Machanic April 15, 2015 At 9:46 pm

@Tim
Interesting! I must have found this at some point myself, as I just checked my local (somewhat modified) version, and it doesn’t hang. Although I have no recollection of when or why I did so, I seem to have fixed it by changing the following line in the SplitStringMulti constructor:
delimiter = Delimiter;
… to …
delimiter = (Delimiter.Length == 0 ? ‘ ‘ : Delimiter[0]);
… I think I prefer your solution, because what does an empty delimiter actually mean..?
–Adam
Tim Cartwright April 16, 2015 At 6:30 pm

Adam, made a small change to your original code that should not make any impact. I added a rowid to the output table. VERY NICE FUNCTION BTW! Would love to take a peek at the rest of your clr libs.
[SqlFunction(Name = "xfnSplit", FillRowMethodName = "FillRow_Multi", TableDefinition = "[RowID] int, [Item] nvarchar(4000)")]
public static IEnumerator SplitString_Multi(
[SqlFacet(MaxSize = -1)]
SqlChars Input,
[SqlFacet(MaxSize = 255)]
SqlChars Delimiter)
{
return (
Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
new SplitStringMulti(new char[0], new char[0]) :
new SplitStringMulti(Input.Value, Delimiter.Value));
}
public static void FillRow_Multi(object obj, out SqlInt32 rowid, out SqlString item)
{
SplitStringRow row = (SplitStringRow)obj;
rowid = row.RowId;
item = row.Value;
}
public class SplitStringMulti : IEnumerator
{
public SplitStringMulti(char[] TheString, char[] Delimiter)
{
theString = TheString;
stringLen = TheString.Length;
delimiter = Delimiter;
delimiterLen = (byte)(Delimiter.Length);
isSingleCharDelim = (delimiterLen == 1);
lastPos = 0;
nextPos = delimiterLen * -1;
}
#region IEnumerator Members
public object Current
{
get
{
return new SplitStringRow(rowid++, new string(theString, lastPos, nextPos – lastPos));
}
}
public bool MoveNext()
{
if (nextPos >= stringLen)
return false;
else
{
lastPos = nextPos + delimiterLen;
for (int i = lastPos; i < stringLen; i++)
{
bool matches = true;
//Optimize for single-character delimiters
if (isSingleCharDelim)
{
if (theString[i] != delimiter[0])
matches = false;
}
else
{
for (byte j = 0; j < delimiterLen; j++)
{
if (((i + j) >= stringLen) || (theString[i + j] != delimiter[j]))
{
matches = false;
break;
}
}
}
if (matches)
{
nextPos = i;
//Deal with consecutive delimiters
if ((nextPos – lastPos) > 0)
return true;
else
{
i += (delimiterLen – 1);
lastPos += delimiterLen;
}
}
}
lastPos = nextPos + delimiterLen;
nextPos = stringLen;
if ((nextPos – lastPos) > 0)
return true;
else
return false;
}
}
public void Reset()
{
rowid = 1;
lastPos = 0;
nextPos = delimiterLen * -1;
}
#endregion
private int rowid = 1;
private int lastPos;
private int nextPos;
private readonly char[] theString;
private readonly char[] delimiter;
private readonly int stringLen;
private readonly byte delimiterLen;
private readonly bool isSingleCharDelim;
}
internal class SplitStringRow
{
private int _rowid;
public int RowId { get { return _rowid; } }
private string _value;
public string Value { get { return _value; } }
public SplitStringRow(int rowid, string value)
{
_rowid = rowid;
_value = value;
}
}
Adam Machanic April 17, 2015 At 2:24 am

@Tim
I’m not a fan of that particular method, as it will create a new object for every substring. That’s going to have a measurable (and considerable) impact. Try to come up with another way to do it. (Hint: Create a reusable, pass-through object.)
–Adam
Tim Cartwright April 17, 2015 At 6:25 pm

Ok, made the changes you suggested Adam, as well as added a new split for ints. This way a delimited string of ints can be passed in and used in a join or a where clause without type conversion occurring. The only way I could think of to accomplish passing an object across was to use a delimeted string. Hopefully I did not fail your test at guessing your hint. 🙂
#region http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/ Adam Machanic: SQLCLR String Splitting Part 2: Even Faster, Even More Scalable
/// <summary>
/// Splits the string into a table of integers.
/// </summary>
/// <param name="Input">The input.</param>
/// <param name="Delimiter">The delimiter.</param>
/// <returns></returns>
/// <remarks>Author: Adam Machanic http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/</remarks>
[SqlFunction(Name = "xfnSplitInts", FillRowMethodName = "FillRow_MultiInt32", TableDefinition = "[Item] int")]
public static IEnumerator SplitInt(
[SqlFacet(MaxSize = -1)]
SqlChars Input,
[SqlFacet(MaxSize = 255)]
SqlChars Delimiter)
{
return (
Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
new SplitStringMulti(new char[0], new char[0]) :
new SplitStringMulti(Input.Value, Delimiter.Value, SplitType.Int32));
}
/// <summary>
/// Splits the string into a table of nvarchars.
/// </summary>
/// <param name="Input">The input.</param>
/// <param name="Delimiter">The delimiter.</param>
/// <returns></returns>
/// <remarks>Author: Adam Machanic http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/</remarks>
[SqlFunction(Name = "xfnSplit", FillRowMethodName = "FillRow_MultiString", TableDefinition = "[RowId] int, [Item] nvarchar(4000)")]
public static IEnumerator SplitString(
[SqlFacet(MaxSize = -1)]
SqlChars Input,
[SqlFacet(MaxSize = 255)]
SqlChars Delimiter)
{
return (
Input.IsNull || Delimiter.IsNull || Input.Value.Length == 0 || Delimiter.Value.Length == 0 ?
new SplitStringMulti(new char[0], new char[0]) :
new SplitStringMulti(Input.Value, Delimiter.Value, SplitType.String));
}
public static void FillRow_MultiInt32(object obj, out SqlInt32 item)
{
int tmp = 0;
if (int.TryParse(Convert.ToString(obj), out tmp))
{
item = tmp;
}
else
{
item = SqlInt32.Null;
}
}
public static void FillRow_MultiString(object obj, out SqlInt32 rowid, out SqlString item)
{
string[] row = Convert.ToString(obj).Split(new string[] { "<#>" }, StringSplitOptions.RemoveEmptyEntries);
if (row.Length == 2)
{
rowid = Convert.ToInt32(row[0]);
item = row[1];
}
else
{
throw new FormatException("Not able to property split row.");
}
}
public class SplitStringMulti : IEnumerator
{
public SplitStringMulti(char[] TheString, char[] Delimiter, SplitType type = SplitType.String)
{
this.type = type;
theString = TheString;
stringLen = TheString.Length;
delimiter = Delimiter;
delimiterLen = (byte)(Delimiter.Length);
isSingleCharDelim = (delimiterLen == 1);
lastPos = 0;
nextPos = delimiterLen * -1;
}
#region IEnumerator Members
public object Current
{
get
{
string tmp = new string(theString, lastPos, nextPos – lastPos);
if (type == SplitType.String)
{
return String.Format("{0}<#>{1}", rowid++, tmp);
}
else if (type == SplitType.Int32)
{
return tmp;
}
else
{
throw new NotSupportedException("This type of split is not implemented yet.");
}
}
}
public bool MoveNext()
{
if (nextPos >= stringLen)
{
return false;
}
else
{
lastPos = nextPos + delimiterLen;
for (int i = lastPos; i < stringLen; i++)
{
bool matches = true;
//Optimize for single-character delimiters
if (isSingleCharDelim)
{
if (theString[i] != delimiter[0])
{
matches = false;
}
}
else
{
for (byte j = 0; j < delimiterLen; j++)
{
if (((i + j) >= stringLen) || (theString[i + j] != delimiter[j]))
{
matches = false;
break;
}
}
}
if (matches)
{
nextPos = i;
//Deal with consecutive delimiters
if ((nextPos – lastPos) > 0)
{
return true;
}
else
{
i += (delimiterLen – 1);
lastPos += delimiterLen;
}
}
}
lastPos = nextPos + delimiterLen;
nextPos = stringLen;
if ((nextPos – lastPos) > 0)
{
return true;
}
else
{
return false;
}
}
}
public void Reset()
{
rowid = 1;
lastPos = 0;
nextPos = delimiterLen * -1;
}
#endregion
private SplitType type = SplitType.String;
private int rowid = 1;
private int lastPos;
private int nextPos;
private readonly char[] theString;
private readonly char[] delimiter;
private readonly int stringLen;
private readonly byte delimiterLen;
private readonly bool isSingleCharDelim;
}
public enum SplitType
{
String,
Int32
}
//internal class SplitStringRow
//{
// private int _rowid;
// public int RowId { get { return _rowid; } }
// private string _value;
// public string Value { get { return _value; } }
// public SplitStringRow(int rowid, string value)
// {
// _rowid = rowid;
// _value = value;
// }
//}
#endregion
Adam Machanic April 17, 2015 At 6:42 pm

@Tim:
In my version the stringSplit class internally creates an instance of another class, results, which looks like this:
public class results
{
public results(stringSplit ss)
{
this.ss = ss;
}
private readonly stringSplit ss;
public int RowNumber
{
get
{
return (ss.rowNumber);
}
}
public string OutputString
{
get
{
return (ss.result);
}
}
}
Results is effectively a singleton (though not enforced). When it’s time to return a row, stringSplit increments the rowNumber, populates the result string, and then returns the same exact instance of results, every single time.
The FillRowMethod then simply casts the input object as results and uses the properties to output the requisite data.
–Adam
Tim Cartwright April 17, 2015 At 6:55 pm

I see said the blind man! 😉 TYVM.
Tim Cartwright April 17, 2015 At 9:50 pm

All right Adam, I am not going to spam your comments section with code again, but I made the changes you suggested. I also removed the enum for my split type. I just handle the results class differently in the fillrow methods. This way the same enumerator can be reused for both quite seamlessly.
TY for the help!
- John Hardman December 1, 2017 At 11:21 am
  
  Hi Tim, I am a complete newb to C# and it appears you were trying to create exactly what i am now trying to create. Did you ever post your final code anywhere?
- Eddie Horbino October 1, 2022 At 2:43 am
  
  Could you please post your final code. Thanks.
John Nelson #2 May 19, 2015 At 9:58 am

How could a person change the code here to include NULL or empty values for cases when there are two delimiters in a row? I’m thinking of a scenario where I’ve been handed 10 column values all delimited….if column 3 is NULL, I need to make sure there are still 10 rows in my output, not 9 but this CLR splitter code above will just ignore that nullness.
Has someone handled this case and I’ve just missed the update in the many ridiculously long and varied blog posts and articles about this? 🙂
Adam Machanic May 19, 2015 At 4:16 pm

@John #2
I specifically wrote this code to ignore that case. But I think it’s easy to change.
Look for the comment marked "//Deal with consecutive delimiters" — and simply remove the "if" condition there. (i.e. always return true)
I would be very interested in understanding more about your scenario, as I’ve never seen one where consecutive delimiters were anything other than a data error.
–Adam
db August 5, 2015 At 3:48 pm

How does a newb use this? It looks like C# code not t-sql.
Adam Machanic August 5, 2015 At 4:22 pm

@db Yup, it’s C#; it’s a SQLCLR user-defined function. You need to get Visual Studio (i.e. SQL Server Data Tools) and compile it. Then you can deploy the DLL to your server.
I wrote a very in-depth chapter on how it’s done in the book "T-SQL Querying" that was released earlier this year. Check it out!
–Adam
J. Morgan Smith September 10, 2015 At 7:11 pm

First of all, I have learned QUITE a lot from your blogs as well as your PASS Summit Pre-con & con sessions, so THANK YOU!
This post is over 6 years old now and throughout the many comments you have mentioned several updates that either you have made or have suggested being made.
What do you say about a new post… "SQLCLR String Splitting Part 3: All the Bells and Whistles"?!?!?!
Again, I thank you for sharing your information and normally I would look forward to attending your PASS Summit 2015 sessions, but I don’t see your name in the list of speakers. 🙁
http://www.sqlpass.org/summit/2015/Sessions/Speakers.aspx
Adam Machanic September 11, 2015 At 6:53 pm

@J. Morgan:
Thanks for the suggestion – I’ll certainly consider it!
I’m skipping PASS this year for personal reasons, but it’s extremely likely I’ll return in 2016.
–Adam
Tuomo Kämäräinen December 30, 2015 At 2:02 pm

Hey. Thank you for the source code. Could you please add instructions on how to create the actual CLR in SQL Server?
Nikhilesh August 4, 2016 At 11:24 am

Hello all,
Please guide me, i m new in CLR funtion
I have blow tow question
–1) I need to return only int in lieu of String how can I ?
EXTERNAL NAME CLRUtilities.UserDefinedFunctions.SplitString_Multi;
GO
CREATE ASSEMBLY CLRUtilities FROM ‘c:DLLsCLRUtilities.dll’
WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.SplitStrings_CLR
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE (
Item Int — Question (1)?
—Item NVARCHAR(4000)
)
EXTERNAL NAME CLRUtilities.UserDefinedFunctions.SplitString_Multi;
GO
—2) I need to return identity column along with string column in return.
CREATE ASSEMBLY CLRUtilities FROM ‘c:DLLsCLRUtilities.dll’
WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.SplitStrings_CLR
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE ( Item NVARCHAR(4000)
—— Question (2) ? need to genrate id identity column
)
Martyn March 8, 2017 At 5:22 pm

Hey Adam,
Just a quick note to say thanks and to ask a quick question…
What’s the best way to make this function return a final null row if the last char is a delimiter?
My use case: I am breaking up some condensed audit data from a third party DBMS and the order and volume of the columns in each audit record will be different from record to record. I need to treat every value before and after a delimiter as a ‘column’ of the audit record so I can’t afford to lose columns even if they contain null values.
And thanks for the post! The function scales brilliantly!
John Hardman December 1, 2017 At 11:22 am

Hi Tim, I am a complete newb to C# and it appears you were trying to create exactly what i am now trying to create. Did you ever post your final code anywhere?
Mattias Woldu March 4, 2020 At 10:10 am

Thank you for this. Have saved me in a project when performance of STRING_SPLIT went of a cliff for varchar(max)/nvarchar(max).