
    Pattern-based replacement UDF


    As a personal challenge, I decided to write a UDF that works just like T-SQL’s REPLACE() function, but accepts patterns (the wildcard syntax used by LIKE and PATINDEX) as input.

    The first question: How does REPLACE() handle overlapping patterns?

    SELECT REPLACE('babab', 'bab', 'c')
    
    --------------------------------------------------
    cab
    
    (1 row(s) affected)
    
    SELECT REPLACE('bababab', 'bab', 'c')
    
    --------------------------------------------------
    cac
    
    (1 row(s) affected)
    

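    It appears that SQL Server scans the input string from left to right, replaces the first occurrence of the search string, and then resumes scanning just after that occurrence. Overlapping occurrences are therefore not replaced: in 'babab' the second 'bab' shares its first 'b' with the first one, so only the first is replaced.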

    Next question: how do you perform the replacement on a pattern? As it turns out, this is somewhat trickier than I initially thought. A replacement requires a starting point (easy to find using PATINDEX) and an end point. But there is no function that returns the last character of a pattern match, so the UDF loops character by character, testing PATINDEX against successively longer substrings, to find where the match ends. This matters in situations like:

    SELECT dbo.PatternReplace('baaa', 'ba%', 'c')
    
    -- We know that the match starts at character 1... but where does it end?
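
    To see why the end point needs a loop, the standalone probe below tests every prefix length, starting at the match position, for the input 'baaa' and the pattern 'ba%'. It is a sketch that assumes SQL Server 2008 or later (for the inline DECLARE initializers); @Probe and ProbeLength are illustrative names, not part of the UDF. Length 1 fails, while lengths 2, 3 and 4 all satisfy the pattern, so there is no single "end" for PATINDEX to report. The UDF settles this by extending the match for as long as it keeps matching, which is why dbo.PatternReplace('baaa', 'ba%', 'c') replaces the whole string and returns c.

    ```sql
    -- Standalone probe: which prefix lengths, starting at the match, satisfy the pattern?
    DECLARE @Input   VARCHAR(100) = 'baaa';
    DECLARE @Pattern VARCHAR(100) = 'ba%';
    -- PATINDEX finds where the first match starts (position 1 here)
    DECLARE @Start   INT = PATINDEX('%' + @Pattern + '%', @Input);
    DECLARE @Length  INT = 1;
    DECLARE @Probe TABLE (ProbeLength INT, Candidate VARCHAR(100), IsMatch BIT);

    -- Try every length that still fits inside the input string
    WHILE @Start + @Length - 1 <= LEN(@Input)
    BEGIN
        INSERT INTO @Probe (ProbeLength, Candidate, IsMatch)
        SELECT @Length,
               SUBSTRING(@Input, @Start, @Length),
               CASE WHEN PATINDEX(@Pattern, SUBSTRING(@Input, @Start, @Length)) > 0
                    THEN 1 ELSE 0 END;
        SET @Length = @Length + 1;
    END;

    SELECT ProbeLength, Candidate, IsMatch FROM @Probe ORDER BY ProbeLength;
    ```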
    

    Anyway, enough background, here’s the code:

    CREATE FUNCTION dbo.PatternReplace
    (
       @InputString VARCHAR(4000),
       @Pattern VARCHAR(100),
       @ReplaceText VARCHAR(4000)
    )
    RETURNS VARCHAR(4000)
    AS
    BEGIN
       DECLARE @Result VARCHAR(4000) SET @Result = ''
       -- First character in a match
       DECLARE @First INT
       -- Next character to start search on
       DECLARE @Next INT SET @Next = 1
       -- Length of the total string -- 8001 if @InputString is NULL
       DECLARE @Len INT SET @Len = COALESCE(LEN(@InputString), 8001)
       -- End of a pattern
       DECLARE @EndPattern INT
     
       WHILE (@Next <= @Len) 
       BEGIN
          SET @First = PATINDEX('%' + @Pattern + '%', SUBSTRING(@InputString, @Next, @Len))
          IF COALESCE(@First, 0) = 0 -- no further match: append the rest and stop
          BEGIN
             SET @Result = @Result + 
                CASE --return NULL, just like REPLACE, if inputs are NULL
                   WHEN  @InputString IS NULL
                         OR @Pattern IS NULL
                         OR @ReplaceText IS NULL THEN NULL
                   ELSE SUBSTRING(@InputString, @Next, @Len)
                END
             BREAK
          END
          ELSE
          BEGIN
             -- Concatenate characters before the match to the result
             SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
             SET @Next = @Next + @First - 1
     
             SET @EndPattern = 1
             -- Find the shortest length at which the text starting at @Next matches @Pattern
             WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
                SET @EndPattern = @EndPattern + 1
             -- Keep extending while the longer text still matches and stays within the string
             WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
                   AND @Len >= (@Next + @EndPattern - 1)
                SET @EndPattern = @EndPattern + 1
    
             -- @EndPattern is now one more than the number of characters matched
             SET @Result = @Result + @ReplaceText
             SET @Next = @Next + @EndPattern - 1
          END
       END
       RETURN(@Result)
    END
    

    … And here’s how you run it, with some sample outputs showing that it does, indeed, appear to work:

    SELECT dbo.PatternReplace('babab', 'bab', 'c')
    
    --------------------------------------------------
    cab
    
    (1 row(s) affected)
    
    SELECT dbo.PatternReplace('babab', 'b_b', 'c')
    
    --------------------------------------------------
    cab
    
    (1 row(s) affected)
    
    SELECT dbo.PatternReplace('bababe', 'b%b', 'c')
    
    --------------------------------------------------
    cabe
    
    (1 row(s) affected)
    

    Hopefully this will help someone, somewhere. I haven’t found any use for it yet 🙂

    Thanks to Steve Kass for posting some single-character replacement code which I based this UDF on.


    Update, January 10, 2005: Thanks to Frank Kalis, I’ve tracked down some problems with the original UDF. The version posted here has been fixed and should now behave identically to the T-SQL REPLACE function when NULLs or non-pattern arguments are passed in. Each of the following pairs should return the same value (and, at this point, they do):

    SELECT dbo.PatternReplace(NULL, '', 'abc')
    SELECT REPLACE(NULL, '', 'abc')
    
    SELECT dbo.PatternReplace('abc', '', NULL)
    SELECT REPLACE('abc', '', NULL)
    
    SELECT dbo.PatternReplace('abc', NULL, '')
    SELECT REPLACE('abc', NULL, '')
    
    SELECT dbo.PatternReplace('abc', 'b', '')
    SELECT REPLACE('abc', 'b', '')
    
    SELECT dbo.PatternReplace('adc', 'b', '')
    SELECT REPLACE('adc', 'b', '')
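
    Later comments on this post report that a space immediately after a match gets swallowed, and that switching to NVARCHAR avoids it. With non-Unicode arguments, PATINDEX (like LIKE) does not treat trailing blanks as significant, so the end-of-match loop sees "match + space" as still matching. Below is a minimal sketch of that change, not a tested replacement: dbo.PatternReplaceN is a hypothetical name, and computing the length with DATALENGTH()/2 instead of LEN() is an additional assumption, since LEN() also ignores trailing spaces and would otherwise drop spaces at the very end of the input.

    ```sql
    -- Hypothetical NVARCHAR variant of dbo.PatternReplace (name is illustrative).
    -- With Unicode arguments, PATINDEX treats trailing blanks as significant, so
    -- the end-of-match loop no longer absorbs a following space.
    CREATE FUNCTION dbo.PatternReplaceN
    (
       @InputString NVARCHAR(4000),
       @Pattern NVARCHAR(100),
       @ReplaceText NVARCHAR(4000)
    )
    RETURNS NVARCHAR(4000)
    AS
    BEGIN
       DECLARE @Result NVARCHAR(4000) SET @Result = N''
       DECLARE @First INT
       DECLARE @Next INT SET @Next = 1
       -- Assumption: DATALENGTH counts trailing spaces (LEN does not), and
       -- NVARCHAR uses 2 bytes per character. 8001 if @InputString is NULL.
       DECLARE @Len INT SET @Len = COALESCE(DATALENGTH(@InputString) / 2, 8001)
       DECLARE @EndPattern INT

       WHILE (@Next <= @Len)
       BEGIN
          SET @First = PATINDEX(N'%' + @Pattern + N'%', SUBSTRING(@InputString, @Next, @Len))
          IF COALESCE(@First, 0) = 0 -- no further match: append the rest and stop
          BEGIN
             SET @Result = @Result +
                CASE WHEN @InputString IS NULL
                          OR @Pattern IS NULL
                          OR @ReplaceText IS NULL THEN NULL -- mirror REPLACE on NULLs
                     ELSE SUBSTRING(@InputString, @Next, @Len)
                END
             BREAK
          END
          -- Copy the text that precedes the match
          SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
          SET @Next = @Next + @First - 1
          -- Shortest length at which the text starting at @Next matches
          SET @EndPattern = 1
          WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
             SET @EndPattern = @EndPattern + 1
          -- Keep extending while the longer text still matches
          WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
                AND @Len >= (@Next + @EndPattern - 1)
             SET @EndPattern = @EndPattern + 1
          -- @EndPattern - 1 characters were matched; emit the replacement
          SET @Result = @Result + @ReplaceText
          SET @Next = @Next + @EndPattern - 1
       END
       RETURN(@Result)
    END
    ```

    If this works as intended, dbo.PatternReplaceN(N'Chris[Blah] Columbus', N'[[]%]', N'') should return Chris Columbus with its space intact.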
    
    Adam Machanic helps companies get the most out of their SQL Server databases. He creates solid architectural foundations for high performance databases and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has contributed to numerous books on SQL Server development. A long-time Microsoft MVP for SQL Server, he speaks and trains at IT conferences across North America and Europe.

    21 COMMENTS

    1. Thanks a lot for this function!
      Would you mind if I use it in a function to clean SQL-injected databases? It’s very useful for finding script tags and removing them.

    2. You can do whatever you want with it, short of selling it without giving me a chunk of the profit 🙂

    3. Hi Adam,
      I have a similar kind of requirement for my quiz project, but in a different way. In my DB, I have texts similar to the one below:
      "You selected the option as [QuizResponse#1001]. Do you need to go to [QuizResponse#1002]?"
      After the markup "[QuizResponse#", the next four digits are the quiz ID, followed by the character "]". I need to extract those quiz IDs; there can be any number of them in a text. Once I have the IDs, I need to replace each markup with the response entered for that ID.
      Any help would be appreciated.
      Thanks,
      Sharmin

    4. Nice code, thank you!
      Here’s a twist: the pattern I’m looking to replace is "anything in brackets, including the brackets". An example would be:
      Chris[BLAH] Columbus
      Is changed to:
      Chris Columbus
      I tried
      select dbo.PatternReplace('Chris[Blah] Columbus', '[%]', '')
      but it doesn’t seem to work.
      Is there a way to get around this?
      thanks!

    5. Hi Chris,
      The problem is that square brackets in a LIKE pattern define a set of single characters to match. So your pattern actually says: find every occurrence of the character "%".
      To fix this, we have to escape the opening bracket, which can be done using the QUOTENAME function:
      SELECT QUOTENAME('[')
      --[[]
      … and then you can pass it to the function using the following pattern:
      select dbo.PatternReplace('Chris[Blah] Columbus', '[[]%]', '')
      … there does seem to be a bug here; the space is getting matched and thrown out. I’ll take a look and see if I can fix it… I actually thought I already had looked at this previously, but maybe I’m thinking of something else.

    6. Great function with one observation:
      -- executing this…
      SELECT dbo.PatternReplace(UPPER('6/24/1976 5320305 C101/262'), '[0-9A-Z]', 'x')
      -- produces this (replacing embedded spaces with 'x')…
      x/xx/xxxxxxxxxxxxxxx/xxx
      -- rather than this…
      x/xx/xxxx xxxxxxx xxxx/xxx
      -- unless I change all your VARCHARs to NVARCHAR
      -- FYI

    7. Hi Rob,
      Seems to be the same issue that Chris had. Glad to hear that you’ve found a workaround. I don’t recall if I did any research on it back in April of 2010 but given that I didn’t follow up here I’m thinking that I didn’t. So now we have the answer. Thanks for sharing!

    8. What if I want to replace all characters up to a particular special character, in my case the underscore "_"? For example, given "87hdisuh_989", I want only the remainder after the "_".

    9. Joe,
      I think you’d want to use the built-in RIGHT function for this, in conjunction with LEN and CHARINDEX:
      SELECT RIGHT('87hdisuh_989', LEN('87hdisuh_989') - (CHARINDEX('_', '87hdisuh_989')))

    10. Another option is to use SUBSTRING, which doesn’t require finding the length of the string. It would be interesting to test and see which is faster:
      SELECT SUBSTRING('87hdisuh_989', CHARINDEX('_', '87hdisuh_989') + 1, 2147483647)

    11. Table name: Answers
      Column name: Answer
      144-TRIAL-Telnet 216
      281-TRIAL-ROM 198
      122-TRIAL-SRAM 51
      121-TRIAL-DRAM 299
      257-TRIAL-PRAM 276
      …
      (and so on, for 1,722 records)
      I am supposed to remove the [1, 2 or 3 digits]-TRIAL- prefix and the [space][2 or 3 digits] suffix using a SQL statement.
      The string in between should be retained as it is.
      For example, in the row 144-TRIAL-Telnet 216, I should retain the text Telnet.
      Any help writing a SQL statement that retains only that string in the above column would be appreciated.

    12. Thanks for the help.
      I’m a beginner, so please check whether the statement below is correct. If not, please help me:
      select left(SUBSTRING(substring('144-TRIAL-Telnet
      216',CHARINDEX('-','144-TRIAL-Telnet 216')+1,100),
      CHARINDEX('-',substring('144-TRIAL-Telnet 216',CHARINDEX('-','144-TRIAL-Telnet
      216')+1,100))+1,100),CHARINDEX(' ',SUBSTRING(substring('144-TRIAL-Telnet
      216',CHARINDEX('-','144-TRIAL-Telnet 216')+1,100),
      CHARINDEX('-',substring('144-TRIAL-Telnet 216',CHARINDEX('-','144-TRIAL-Telnet
      216')+1,100))+1,100)))

    13. @Joshna:
      Yours returns an empty string. Not sure what’s up with that. Here’s what I had in mind:

      DECLARE @t varchar(50) = '144-TRIAL-Telnet 216'
      SELECT SUBSTRING(@t, 11, CHARINDEX(' ', @t) - 11)

    14. If the character following the replacement pattern is a space, it replaces that as well as the pattern.  Was this intentional?

    15. @Ken:
      Nope, not intentional – but if you read the prior comments you’ll see the issue:
      These two things produce the same result:
      select patindex ('cd', 'cd')
      select patindex ('cd', 'cd ')
      Changing the function to use NVARCHAR fixes things:
      select patindex (N'cd', N'cd ')
      –Adam
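
    The query in comment 13 hard-codes 11 as the start position, which only works when the numeric prefix has exactly three digits, while the request in comment 11 allows one to three. Below is a minimal sketch that locates both ends instead, assuming SQL Server 2008 or later and assuming every value follows the <digits>-TRIAL-<text> <digits> format. The @Answers table variable stands in for the real Answers table, and the last row is a hypothetical one-digit example, not data from the comment.

    ```sql
    DECLARE @Answers TABLE (Answer VARCHAR(100));
    INSERT INTO @Answers (Answer)
    VALUES ('144-TRIAL-Telnet 216'), ('281-TRIAL-ROM 198'), ('122-TRIAL-SRAM 51'),
           ('121-TRIAL-DRAM 299'), ('257-TRIAL-PRAM 276'),
           ('7-TRIAL-Flash 42');  -- hypothetical row with a one-digit prefix

    SELECT a.Answer,
           SUBSTRING(a.Answer, p.TextStart, p.LastSpace - p.TextStart) AS Retained
    FROM @Answers AS a
    CROSS APPLY (
        SELECT
            -- first character after "-TRIAL-", however long the prefix is
            CHARINDEX('-TRIAL-', a.Answer) + LEN('-TRIAL-') AS TextStart,
            -- position of the last space, found by searching the reversed string
            LEN(a.Answer) - CHARINDEX(' ', REVERSE(a.Answer)) + 1 AS LastSpace
    ) AS p;
    ```

    Searching for the last space rather than the first also keeps any spaces inside the retained text. Rows that lack "-TRIAL-" or the trailing space would need extra handling.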

    Comments are closed.