Home Uncategorized Pattern-based split string

    Pattern-based split string

    706
    9

    “hickymanz” asked in the SQL Server Central forums for a method of counting unique words in a text column. Wayne Lawton recommended using a string split function, which was a good idea, but not quite adequate for the job in my opinion.

    Typical string split functions, like this one that I wrote can handle only a single delimiter, e.g. a comma. But in the case of splitting for unique words you want all sorts of possible delimiters — punctuation marks, white space including spaces, possibly numerics, etc.

    I believe I’ve seen other, similar requests for getting unique words before, so I decided to solve the problem. I modified that split string function to accept a pattern of valid characters for the second argument. Anything NOT in the pattern will be treated as a delimiter. I figured that would be a bit more flexible than the other way around, so that people don’t have to type in (or figure out) every single possible white space and/or punctuation character.

    Anyway, I think the modification is pretty straightforward if you understand the previous function, so check it out and if you have any questions feel free to ask. Here is the function:

    CREATE FUNCTION dbo.SplitStringPattern
    (
    	@List TEXT,
    	@Pattern VARCHAR(50)
    )
    RETURNS @ReturnTbl TABLE (OutParam VARCHAR(20))
    WITH SCHEMABINDING
    AS
    BEGIN
    	DECLARE @LeftSplit VARCHAR(7998)
    	DECLARE @SplitStart INT SET @SplitStart = 0
    	DECLARE @SplitEnd INT
    	SET @SplitEnd = 7998
    
    	SET @Pattern = '%' + '[^' + RIGHT(@Pattern, LEN(@Pattern) - 1) + '%'
    
    	DECLARE @Delimiter CHAR(1)
    	SELECT @Delimiter = CHAR(MAX(Number))
    	FROM dbo.Numbers
    	WHERE CHAR(Number) LIKE @Pattern
    
    	SELECT @SplitEnd = MAX(Number)
    	FROM dbo.Numbers
    	WHERE (SUBSTRING(@List, Number, 1) LIKE @Pattern
    			OR Number = DATALENGTH(@List) + 1)
    		AND Number BETWEEN @SplitStart AND @SplitEnd
    
    	WHILE @SplitStart < DATALENGTH(@List) - 1
    	BEGIN
    		SET @LeftSplit = @Delimiter + SUBSTRING(@List, @SplitStart, @SplitEnd - @SplitStart) + @Delimiter
    
    		INSERT @ReturnTbl (OutParam)
    		SELECT LTRIM(RTRIM(SUBSTRING(@LeftSplit, Number + 1,
    	                    PATINDEX(@Pattern, SUBSTRING(@LeftSplit, Number + 1, LEN(@LeftSplit))) - 1)))
    	                AS Value
    		FROM   dbo.Numbers
    		WHERE  Number <= LEN(@LeftSplit) - 1
    			AND SUBSTRING(@LeftSplit, Number, 1) LIKE @Pattern
    			AND SUBSTRING(@LeftSplit, Number + 1,
    	                    	PATINDEX(@Pattern, SUBSTRING(@LeftSplit, Number + 1, LEN(@LeftSplit))) - 1) <> ''
    
    		SET @SplitStart = @SplitEnd + 1
    		SET @SplitEnd = @SplitEnd + 7998
    
    		SELECT @SplitEnd = MAX(Number) + @SplitStart
    		FROM dbo.Numbers
    		WHERE (SUBSTRING(@List, Number + @SplitStart, 1) LIKE @Pattern
    				OR Number+@SplitStart = DATALENGTH(@List) + 1)
    			AND Number BETWEEN 1 AND @SplitEnd - @SplitStart
    	END
    
    	RETURN
    END
    

    A note on the pattern for input: The pattern should be single-character based, and must be delimited with [ ]. Also, you must type in the actual characters in the pattern. I do not recommend using ranges; ranges cause strange side-effects because they tend to include (depending on collation) characters that you wouldn’t expect to show up (e.g. characters with umlauts over them). So unless that’s what you really want, don’t use ranges.

    Here’s a usage example:

    SELECT OutParam
    FROM dbo.SplitStringPattern('This is a test.', '[abcdefghijklmnopqrstuvwxyz]')
    
    
    OutParam             
    -------------
    This
    is
    a
    test
    

    Enjoy!

    Previous articleBitmask Handling, part 4: Left-shift and right-shift
    Next articleScript out PKs/UNIQUE constraints and referencing FKs
    Adam Machanic helps companies get the most out of their SQL Server databases. He creates solid architectural foundations for high performance databases and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has contributed to numerous books on SQL Server development. A long-time Microsoft MVP for SQL Server, he speaks and trains at IT conferences across North America and Europe.

    9 COMMENTS

    1. Hi Adam,
      I have two tables within a SQL database. The 1st table has an identified column and column which lists one of more email identifers for a second table,
      e.g.
      ID     Email
      —     ———-
      1      AS1 AS11
      2      AS2 AS3 AS4 AS5
      3      AS6 AS7
      The second table has a column which has an email identifier and another column which lists one email address for that particular identifier, e.g.
      ID      EmailAddress
      —     ——————
      AS1      abcstu@emc.com
      AS2      abcstu2@emc.com
      AS3      abcstu3@emc.com
      AS4      abcstu4@em.com
      AS5      abcstu5@emc.com
      AS6      abcstu6@emc.com
      AS7      abcstu7@emc.com
      AS11     abcstu8@emc.com
      I need to create a stored procedure or function that:
      1. Selects an Email from the first table, based on a valid ID,
      2. Splits the Email field of the first table (using the space separator) so that there is an array of Emails and then,
      3. Selects the relevant EmailAddress value from the second table, based on a valid Email stored in the array
      Is there any way that this can be done directly within SQL Server using a stored procedure/function without having to use cursors?
      Many Thanks,
      Chris Law
      probetatester@yahoo.com

    2. Hi Adam this is a great function but I have got string with such as ("hello","sdfsfs",….)
      My problem is how can I pass two parremeters such as , and "".
      and how can I insert the values into a table
      Thanks

    3. Hi adam i am doing a search engine for my job portal… if user wnters a search string i have to give him related records using rank technology.. can you suggest me .how to pass a full string to a querry it should be matched by all columns and give me results..
      ex: search string is .. i want jobs on c in alabama
      i have to sarch this string in posting table
      can you give me example please

    4. Hi Chris Law,
      I had a similar requirement,
      The query I used was
      select
      t2.userprofile_empcompid,
      t2.userprofile_empid,
      t1.email,
      case when CHARINDEX ( ‘@’ ,t1.email ) >0 then
      left(t1.email,(CHARINDEX ( ‘@’ ,t1.email  )-1)) else t1.email end a  ,
      t2.userprofile_emailadd
      from EPHR_Email t1
      INNER JOIN userprofile t2
      ON t1. CompId = t2.userprofile_empcompid
      AND t1.EmpId = t2.userprofile_empid
      hope it helps.

    5. Please help me, how to insert the result of split function?
      Eg.
      The result of split function is :
      Item
      =====
      001
      Items2
      55
      I wanna insert the result above to table tbl_data(code,name,qty).
      How can I do that?
      Thanx

    6. Hi Ngl,
      You’ll have to pivot the results–using either the PIVOT function or CASE expressions. Google "sql crosstab" for a number of solutions.

    7. Hi NGL,
      My requirement is how to split the strings in sqlserver 2005.
      for example:
      Vtable[Table Name]
      Fname[column Name varchar(60)]
      Aamina Yuusuf Dudub Bashiir—-> Split this word as follows
                                      Aamina
                                      Yuusuf
                                      Dudub
                                      Bashiir

    Comments are closed.