Counting occurrences of a substring within a string

January 11, 2005

I have absolutely no idea why anyone wants to do this, but I keep answering the same question in forums: “How do I count the occurrences of a substring [note: usually comma] within a string?”

In an effort to thwart carpal tunnel syndrome, I have created the Ultimate Substring Occurrence Counting UDF.

… And here it is:

CREATE FUNCTION dbo.GetSubstringCount
(
	@InputString TEXT, 
	@SubString VARCHAR(200),
	@NoisePattern VARCHAR(20)
)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
	RETURN 
	(
		SELECT COUNT(*)
		FROM dbo.Numbers N
		WHERE
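			-- The substring starts at this position
			SUBSTRING(@InputString, N.Number, LEN(@SubString)) = @SubString
			-- The character after the match must not match the noise pattern.
			-- An empty pattern skips the check, because PATINDEX('', ' ')
			-- returns 1 on SQL Server 2005 and later.
			AND 0 = 
				CASE 
					WHEN @NoisePattern = '' THEN 0
					ELSE PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number + LEN(@SubString), 1))
				END
			-- The same check for the character before the match
			AND 0 = 
				CASE 
					WHEN @NoisePattern = '' THEN 0
					ELSE PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number - 1, 1))
				END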
	)
END

First note: You need (regular readers, you guessed it) a numbers table.
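
If you don't already have one, a numbers table can be sketched roughly like this. The names dbo.Numbers and Number come from the function above; the 8,000-row size and the use of sys.all_objects as a row source are assumptions (and need SQL Server 2005 or later), so size the table to cover your longest input string. Because the function is schema-bound, the table has to exist before the function is created.

```sql
-- A minimal numbers table: one row per integer, 1 through 8000.
-- The upper bound is an assumption; it must be at least as large
-- as the longest string you plan to pass to the function.
CREATE TABLE dbo.Numbers
(
	Number INT NOT NULL PRIMARY KEY
);

-- Cross-joining a system catalog view with itself is just a cheap
-- way to get enough rows to number; the view's contents don't matter.
INSERT INTO dbo.Numbers (Number)
SELECT TOP (8000)
	ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
```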

Okay, so what’s it do? Simply put, it returns the number of times @SubString appears within @InputString. But wait! — Act now and you will receive an additional bonus feature at no extra cost! Can you feel the love?

The @NoisePattern parameter allows the user to put the UDF into “exact match” mode.

For instance, let’s say you have a big string containing some text about automobile manufacturers, and for some reason (again, I have no clue why people need this functionality; fill me in if you do!) you want to count the number of occurrences of the word “auto”, but not the number of occurrences of other forms of the word, e.g. “automobile” or … some word that ends in “auto” (if such a word exists).

By specifying a pattern for @NoisePattern of characters that shouldn’t be adjacent to your word, you’re telling the UDF that any other characters are safe. Leaving the parameter empty means that all occurrences of the substring will be counted. Examples:

SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')
-- Returns 2

SELECT dbo.GetSubstringCount('Autos are fun.  I like to drive my auto.', 'auto', '')
-- Also returns 2

SELECT dbo.GetSubstringCount('Autos are fun.  I like to drive my auto.', 'auto', '%[a-z]%')
-- Only returns 1 -- The exact match must not have adjacent alphabetic characters
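
To see why the last example drops one match, it can help to look at the neighboring characters the function inspects. The query below is only an illustrative sketch: it assumes the dbo.Numbers table exists and a case-insensitive collation, and the column aliases are made up for the example.

```sql
DECLARE @InputString VARCHAR(100), @SubString VARCHAR(20), @NoisePattern VARCHAR(20);
SET @InputString = 'Autos are fun.  I like to drive my auto.';
SET @SubString = 'auto';
SET @NoisePattern = '%[a-z]%';

-- One row per position where the substring starts, along with the
-- characters on either side and the PATINDEX result for each.
-- A nonzero NoiseBefore or NoiseAfter means the match is rejected.
SELECT
	N.Number AS MatchPosition,
	SUBSTRING(@InputString, N.Number - 1, 1) AS CharBefore,
	SUBSTRING(@InputString, N.Number + LEN(@SubString), 1) AS CharAfter,
	PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number - 1, 1)) AS NoiseBefore,
	PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number + LEN(@SubString), 1)) AS NoiseAfter
FROM dbo.Numbers AS N
WHERE SUBSTRING(@InputString, N.Number, LEN(@SubString)) = @SubString;
```

The leading "Auto" is followed by "s", so its NoiseAfter is nonzero and it is excluded. The final "auto" sits between a space and a period, so both checks return 0 and it is counted.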

23 COMMENTS

beckham May 7, 2007 At 7:59 am

Good one..
Jereme Guenther June 17, 2008 At 6:12 pm

I am looking for this functionality, however I would prefer to do it without a Numbers table.
My need is that of a search engine sproc. I need to be able to weight the returned records and one of the items in the weighting system is to check how many times the search pattern occurs in the record.
Pickles July 17, 2008 At 1:28 am

I need this too. I have a column that basically contains sql and I need to pull out field_ids from it, but each row has a different number of field_ids in the sql. Is there a different(better) solution?
Adam Machanic July 23, 2008 At 9:30 pm

I don’t know why you would want to do it without the numbers table. It makes the solution an order of magnitude faster.
Craig Hathaway February 23, 2009 At 5:06 am

Hi Adam, I have tried this code but even the included examples do not work!
SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')
-- returns 0 <-- should be 2
SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '')
-- returns 0 <-- should be 2
SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '%[a-z]%')
-- returns 1 <-- should be 1
Is there a SQL setting / version dependency that I could be missing?
Craig
Adam Machanic February 23, 2009 At 7:18 pm

Hi Craig,
Thanks for pointing this out. It is indeed a version issue. I wrote this back in the bad old days of SQL Server 2000, and apparently there was a change to the way PATINDEX works between 2000 and 2005. I just tested:
SELECT PATINDEX('', ' ')
SQL Server 2000 returns 0, whereas SQL Server 2005 returns 1. This is breaking the third predicate in the WHERE clause within the function, which checks to see if the target string is prepended by anything that matches the input pattern.
I’ll have to think about how to fix this, but as a temporary workaround if you don’t want to use exact-match mode, you could pass in some character that you know can’t possibly exist in the target string:
SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', CHAR(255))
-- Returns 2, even in SQL Server 2005 or 2008
In the meantime, I’m wondering if this "new" behavior makes sense? I’m not sure, but I’m leaning towards SQL Server 2000’s answer. An empty pattern shouldn’t, in my opinion, match on anything at all…
Funmarkaz October 6, 2009 At 8:25 am

Great work dude!
Can i use it on MyISAM?
Adam Machanic October 6, 2009 At 9:27 pm

Funmarkaz, yes, it should work with a bit of modification; MySQL’s CREATE FUNCTION syntax isn’t quite the same as SQL Server’s.
Mike Schafer December 29, 2009 At 8:51 pm

I found this post because I was searching for a function for a project where I need to find out how many delimiters exist in a string. This implementation does not require a numbers table or anything other than the function itself. There is no "additional bonus" feature in this version but it will count occurrences without any additional db objects. Happy Querying!
CREATE FUNCTION dbo.GetSubStringCount (
	@InputString NVARCHAR(4000),
	@SearchString VARCHAR(255)
)
RETURNS INT WITH SCHEMABINDING AS
BEGIN
	DECLARE @occurences AS BIGINT
		,@position AS BIGINT
	SET @occurences = 0
	SET @position = 0
	WHILE @position < LEN(@InputString)
	BEGIN
		IF CHARINDEX(@SearchString, @InputString, @position) > 0
		BEGIN
			SET @occurences = @occurences + 1
			SET @position = CHARINDEX(@SearchString, @InputString, @position)
		END
		SET @position = @position+1
	END
	RETURN @occurences
END
Adam Machanic December 30, 2009 At 7:47 pm

Mike,
Thanks for sharing. A simpler and more efficient way to solve the problem (if you don’t want the “bonus” feature) is to do:
SELECT (LEN(@InputString) - LEN(REPLACE(@InputString, @SearchString, ''))) / LEN(@SearchString)
I suspect that the numbers table will provide better performance than a WHILE loop, and both will be less efficient than the above solution, but I’ll leave that testing as an exercise for anyone interested in taking this a bit further. Even better would be to inline the Numbers table version (search my blog for my post on that topic), and a SQLCLR solution would probably be fastest of all. I would personally definitely keep the “bonus” around as it’s been quite useful in a few projects I’ve worked on.
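
As a self-contained sketch of the length-difference approach: the variable names and sample values below are just for illustration, and swapping LEN for DATALENGTH is an assumption of this example, not something from the thread. LEN ignores trailing spaces, so a search string such as ', ' would otherwise be measured one character short.

```sql
DECLARE @InputString VARCHAR(100), @SearchString VARCHAR(20);
SET @InputString = 'We like the autos, the autos that go boom.';
SET @SearchString = 'auto';

-- Removing every occurrence shortens the string by
-- (occurrences * length of the search string), so dividing the
-- difference by that length gives the count.
-- DATALENGTH counts bytes, so both variables should share a type
-- (both VARCHAR or both NVARCHAR) for the ratio to hold.
-- NULLIF avoids a divide-by-zero error for an empty search string;
-- the result is NULL in that case.
SELECT
	(DATALENGTH(@InputString) - DATALENGTH(REPLACE(@InputString, @SearchString, '')))
	/ NULLIF(DATALENGTH(@SearchString), 0) AS OccurrenceCount;
-- Returns 2
```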
Brian Lewis April 28, 2010 At 11:57 pm

My reason for wanting this functionality: to count line breaks in sys.syscomments in order to measure how many lines of T-SQL there are in the project. The bonus feature will allow blank lines to be excluded from the count.
Adam Machanic May 1, 2010 At 6:49 pm

Okay, the 2005/2008 bug is fixed.
Jason June 10, 2011 At 11:47 pm

Adam, thanks for posting this. Very handy use of the numbers table.
I’m troubleshooting a system that is experiencing tempdb meta-data contention and trying to identify queries that are creating temp objects within ad-hoc sql statements. I have traces of this activity and am using this to identify the ‘worst offenders’. Initially I was just counting each query where TextData was like ‘%table%’ until I found that some batches created 10-15 table variables, so I needed a way to count them.
Works great. thx.
Ron February 19, 2013 At 4:35 pm

Adam,
Is there a modification I can make to get:
SELECT dbo.GetSubstringCount('ROUTINE MEDICAL EXAMINATION', 'NORMAL ROUTINE HISTORY AND PHYSICAL', '%[a-z]%')
to return 1 (ROUTINE)
and
SELECT dbo.GetSubstringCount('REFLUX, ESOPHAGEAL', 'ESOPHAGEAL REFLUX', '%[a-z]%')
to return 2 (ESOPHAGEAL & REFLUX)
Adam Machanic February 19, 2013 At 5:50 pm

Ron: Sure, it’s doable, but it would be a completely different function. What you want to do is split both strings on any non-alpha character, then intersect the results. You can search my blog for a string splitter (the CLR version would probably be best), and then just use the INTERSECT operator to get your final answer.
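
A rough sketch of the split-and-intersect idea, under some loud assumptions: it uses the built-in STRING_SPLIT function (SQL Server 2016 or later, compatibility level 130) in place of a custom or CLR splitter, and it treats only spaces and commas as separators rather than every non-alpha character.

```sql
DECLARE @A VARCHAR(200), @B VARCHAR(200);
SET @A = 'REFLUX, ESOPHAGEAL';
SET @B = 'ESOPHAGEAL REFLUX';

-- Turn commas into spaces, split on spaces, drop the empty pieces
-- left by doubled separators, then count the words both sides share.
-- INTERSECT returns distinct values, so a repeated word counts once.
SELECT COUNT(*) AS SharedWords
FROM
(
	SELECT value FROM STRING_SPLIT(REPLACE(@A, ',', ' '), ' ') WHERE value <> ''
	INTERSECT
	SELECT value FROM STRING_SPLIT(REPLACE(@B, ',', ' '), ' ') WHERE value <> ''
) AS Shared;
-- Returns 2 (ESOPHAGEAL and REFLUX)
```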
–Adam
Rayliner October 29, 2014 At 2:20 am

I’m not sure if it is faster or slower, but how about this: a single line, no external tables needed.
String: 'The cat and the dog broke the plate before the farmer got home.'
find the length of the string (63)
upshift the entire string (and the search string), and replace the search string with nothing (''). Find the length of that new string (51)
Subtract the new length from the old length and divide by the length of the search string. The result will be the number of occurrences of the search string in the original line (here, (63 - 51) / 3 = 4).
select
length('The cat and the dog broke the plate before the farmer got home.') as orig_length,
length(replace(upper('The cat and the dog broke the plate before the farmer got home.'),upper('the'),'')) as newlength
,length('the') as substringlength
,(length('The cat and the dog broke the plate before the farmer got home.') - length(replace(upper('The cat and the dog broke the plate before the farmer got home.'),upper('the'),'')))/length('the') as NumberOfOccurrences
from dual
or with a table that has a line of chars in colC, and you need to search for 'x':
select (length(colC) - length(replace(upper(colC),upper('x'))))/length('x') as countByLine
from myTable;
Rayliner October 29, 2014 At 2:46 am

Oops, looks like Adam mentioned this in one of the posts already. Sorry to duplicate your solution. I was interested in having the count so that when we were looking for a variable name throughout our code, and looking to see how it was used, we wouldn’t need to walk through the entire line if it only existed once, or we would know to look for the extra occurrences. (We also bolded each occurrence so they would stand out, but the occurrence counter in addition to the bold helped out).
SAM January 21, 2015 At 7:56 pm

SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')
Does not work. Try searching "the", it returns 0 occurrences
Adam Machanic January 22, 2015 At 5:14 pm

@SAM
See my response to Craig from February 23, 2009.
–Adam
haxer February 10, 2015 At 2:21 pm

best ive ever seen.
ALTER FUNCTION [CountSubStrings]
( @String VARCHAR(8000), @SubString VARCHAR(100) )
RETURNS INT
BEGIN
RETURN (LEN(@String) -
LEN(REPLACE(@String, @SubString, ''))) /
LEN(@SubString)
END
Felix Pamittan February 17, 2015 At 11:25 am

Awesome!
Mycroft April 21, 2015 At 7:38 am

Well done all… @Haxer the most compact counter I have seen. Great stuff..
Robert Bondy April 3, 2018 At 2:06 pm

Occurrence_Count = LENGTH(REPLACE(string_to_search,string_to_find,'~')) - LENGTH(REPLACE(string_to_search,string_to_find,''))

This solution is a bit cleaner than many that I have seen, especially with no divisor.
You can turn this into a function or use within a Select.
No variables required.
I use tilde as a replacement character, but any character that is not in the dataset will work.