Home Uncategorized Bitmask Handling, part 3: Logical operators

    Bitmask Handling, part 3: Logical operators

    483
    4

    It’s been longer than I hoped since my last installment on bitmask / big number handling. Life caught up with me and I’ve had many thankless tasks to catch up on. But that’s over now and I’m back to the general slacking that typifies my days, so welcome to Part 3, handling logical operators.

    I’ll be discussing four operators in this post. AND, OR, XOR, and NOT. The first three are extremely easy given the framework already built out in the previous two posts. The last one has some problems — so I’ll discuss those first.

    I haven’t been able to find any mathematical or computer science texts that discuss how to deal with variable-length bitmasks, which is what I’m attempting to define here. Most texts keep the discussion to, at most, two or four bytes, and even then those two or four bytes are stable. But the problem with using a variable number is that it creates some inconsistencies with regards to signing. Take this example:

    SELECT 1 AS Oops
    WHERE 
    	-1 <> 
    	CONVERT(INT, CONVERT(VARBINARY, CONVERT(SMALLINT, -1)))
    
    
    Oops
    ----
    1
    

    So what’s happening here? The -1 on the right side of the comparison is being cast into a 2-byte SMALLINT, then converted to VARBINARY (which produces 0xFFFF). Then it’s being converted back to INT. But guess what..?

    SELECT CONVERT(INT, 0xFFFF) AS Arrgh
    
    
    Arrgh
    -----
    65535
    

    So what is SQL Server telling us? That a small -1 is not the same as a bigger -1. -1 <> -1!!!

    You’re probably asking yourself, “what is this guy talking about, and what does any of this have to do with the topic at hand, implementing a binary NOT operation?” And if that’s what you were asking yourself, then good job, because you’ve asked the correct question. So what do they have to do with each other?

    The truth-table for NOT is quite simple:

    Input Result
    0 1
    1 0

    But let’s delve a little deeper. The SMALLINT representation of the decimal number 1 in hex is 0x0001. In binary, that’s 0000000000000001. If you run that through a logical NOT, the result is 1111111111111110, or 0xFFFE. That’s -2. But remember, that’s only -2 if you’re a SMALLINT. If you’re either an INT or a BIGINT, it’s 65534. And that’s just not consistent. I want to know that any equivalent number into my function yeilds an equivalent number on the way out. So NOT(0x000001) should yeild the same result as NOT(0x000000000001).

    … and that result, in the function I provide, will be: 0xFE. One byte in, one byte out. Similar to some prison gang mottos, but that’s a topic for a later post.

    You’ll notice that this is similar to the way I handled the bitmask re-constitution in the last post. So I feel pretty good about this. Sparsity is a good thing.

    But this has a very important side-effect. These numbers are now officially and permanently un-signed. We can’t deal with sign because we can’t know which byte corresponds to the highest byte — that’s variable. And since we can’t determine the highest byte, we also can’t determine the highest bit in that byte, and so can’t know whether or not our number is negative.

    But I can live with that. And I hope you can, too. And if you can’t, write a solution and send it to me and I’ll post it.

    So on that note, and without further ado, here’s how you figure out which bit positions should be output by a NOT operation:

    DECLARE @Bitmask VARBINARY(4096)
    SET @Bitmask = 0x01
    
    SELECT x.Number 
    FROM BitmaskNumbers x
    LEFT JOIN dbo.SplitBitmask(@Bitmask) y ON y.Number = x.Number
    WHERE y.Number IS NULL
    	AND x.Byte <= 
    		(SELECT MAX(Byte)
    		FROM BitmaskNumbers z
    		WHERE z.Number =
    			(SELECT MAX(Number)
    			FROM dbo.SplitBitmask(@Bitmask)))
    
    
    Number
    -------
    2
    3
    4
    5
    6
    7
    8
    

    Pretty simple, really: Split the bitmask and take any numbers within the same byte range that aren’t in the bitmask. To reconstitute it, simply modify the reconstitution pattern a bit, stuff it all into a function, and you get:

    CREATE FUNCTION bitwiseNot
    (
    	@Bitmask VARBINARY(4096)
    )
    RETURNS VARBINARY(4096)
    AS
    BEGIN
    	DECLARE @BitsInBitmask TABLE(Number SMALLINT)
    	INSERT @BitsInBitmask
    	SELECT Number
    	FROM dbo.splitBitmask(@Bitmask)
    	
    	SET @Bitmask = 0x
    	
    	SELECT @Bitmask = @Bitmask +
    		CONVERT(VARBINARY(1), 
    			SUM(CASE
    				WHEN x.Number IS NOT NULL THEN 0
    				ELSE BitmaskNumbers.BitValue
    				END)
    			)
    	FROM @BitsInBitmask x
    	RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
    	WHERE BitmaskNumbers.Byte <=
    		(SELECT
    			CASE MAX(Number) % 8
    				WHEN 0 THEN (MAX(Number) - 1) / 8
    				ELSE  MAX(Number) / 8
    			END + 1
    		FROM @BitsInBitmask)
    	GROUP BY BitmaskNumbers.Byte
    	ORDER BY BitmaskNumbers.Byte DESC
    
    	RETURN(@Bitmask)
    END
    GO
    
    
    SELECT dbo.BitwiseNot(0x01) AS Not01
    
    
    Not01
    -----
    0xFE
    

    And of course, given the properties I described above:

    SELECT dbo.BitwiseNot(0x0000000001) AS Not0000000001
    
    
    Not0000000001
    -------------
    0xFE
    

    And now on to the other three logical operations, which are much simpler…

    The easiest is OR, which has the following truth table:

    + 0 1
    0 0 1
    1 1 1

    And what is that similar to, in relational parlance..? A UNION, perhaps?

    SELECT Number
    FROM
    (
    	SELECT Number
    	FROM dbo.splitBitmask(0x01)
    
    	UNION
    
    	SELECT Number
    	FROM dbo.splitBitmask(0x03)
    ) x
    

    … and how about exclusive OR (XOR)?

    + 0 1
    0 0 1
    1 1 0

    Similar to the UNION, but we only want intersections with exactly one bitmask position… Luckily, SQL is equipped for that:

    SELECT Number
    FROM
    (
    	SELECT Number FROM dbo.SplitBitmask(0x01)
    
    	UNION ALL
    
    	SELECT Number FROM dbo.SplitBitmask(0x02)
    ) x
    GROUP BY Number
    HAVING COUNT(*) = 1
    

    Finally, the AND operation:

    + 0 1
    0 0 0
    1 0 1

    Just like XOR, but you need exactly two bit positions in each intersection:

    SELECT Number
    FROM
    (
    	SELECT Number FROM dbo.SplitBitmask(0x01)
    
    	UNION ALL
    
    	SELECT Number FROM dbo.SplitBitmask(0x02)
    ) x
    GROUP BY Number
    HAVING COUNT(*) = 2
    

    Putting it all together, I present the following OR, XOR, and AND UDFs:

    OR

    CREATE FUNCTION bitwiseOr
    (
    	@Bitmask1 VARBINARY(4096),
    	@Bitmask2 VARBINARY(4096)
    )
    RETURNS VARBINARY(4096)
    AS
    BEGIN
    	DECLARE @BitsInBitmask TABLE(Number SMALLINT)
    	INSERT @BitsInBitmask
    	SELECT Number
    	FROM
    	(
    		SELECT Number
    		FROM dbo.splitBitmask(@Bitmask1)
    
    		UNION
    
    		SELECT Number
    		FROM dbo.splitBitmask(@Bitmask2)
    	) x
    	
    	SET @Bitmask1 = 0x
    	
    	SELECT @Bitmask1 = @Bitmask1 +
    		CONVERT(VARBINARY(1), 
    			SUM(CASE
    				WHEN x.Number IS NULL THEN 0
    				ELSE BitmaskNumbers.BitValue
    				END)
    			)
    	FROM @BitsInBitmask x
    	RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
    	WHERE BitmaskNumbers.Byte <=
    		(SELECT
    			CASE MAX(Number) % 8
    				WHEN 0 THEN (MAX(Number) - 1) / 8
    				ELSE  MAX(Number) / 8
    			END + 1
    		FROM @BitsInBitmask)
    	GROUP BY BitmaskNumbers.Byte
    	ORDER BY BitmaskNumbers.Byte DESC
    
    	RETURN(@Bitmask1)
    END
    
    
    SELECT dbo.bitwiseOr(0x01, 0x03) AS Or_01_03
    
    
    Or_01_03
    --------
    0x03
    

    XOR

    CREATE FUNCTION bitwiseXOr
    (
    	@Bitmask1 VARBINARY(4096),
    	@Bitmask2 VARBINARY(4096)
    )
    RETURNS VARBINARY(4096)
    AS
    BEGIN
    	DECLARE @BitsInBitmask TABLE(Number SMALLINT)
    	INSERT @BitsInBitmask
    	SELECT Number
    	FROM
    	(
    		SELECT Number
    		FROM dbo.splitBitmask(@Bitmask1)
    
    		UNION ALL
    
    		SELECT Number
    		FROM dbo.splitBitmask(@Bitmask2)
    	) x
    	GROUP BY Number
    	HAVING COUNT(*) = 1
    	
    	SET @Bitmask1 = 0x
    	
    	SELECT @Bitmask1 = @Bitmask1 +
    		CONVERT(VARBINARY(1), 
    			SUM(CASE
    				WHEN x.Number IS NULL THEN 0
    				ELSE BitmaskNumbers.BitValue
    				END)
    			)
    	FROM @BitsInBitmask x
    	RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
    	WHERE BitmaskNumbers.Byte <=
    		(SELECT
    			CASE MAX(Number) % 8
    				WHEN 0 THEN (MAX(Number) - 1) / 8
    				ELSE  MAX(Number) / 8
    			END + 1
    		FROM @BitsInBitmask)
    	GROUP BY BitmaskNumbers.Byte
    	ORDER BY BitmaskNumbers.Byte DESC
    
    	RETURN(@Bitmask1)
    END
    
    
    SELECT dbo.bitwiseXOr(0x01, 0x03) AS XOr_01_03
    
    
    XOr_01_03
    --------
    0x02
    

    … And finally, AND

    CREATE FUNCTION bitwiseAnd
    (
    	@Bitmask1 VARBINARY(4096),
    	@Bitmask2 VARBINARY(4096)
    )
    RETURNS VARBINARY(4096)
    AS
    BEGIN
    	DECLARE @BitsInBitmask TABLE(Number SMALLINT)
    	INSERT @BitsInBitmask
    	SELECT Number
    	FROM
    	(
    		SELECT Number
    		FROM dbo.splitBitmask(@Bitmask1)
    
    		UNION ALL
    
    		SELECT Number
    		FROM dbo.splitBitmask(@Bitmask2)
    	) x
    	GROUP BY Number
    	HAVING COUNT(*) = 2
    	
    	SET @Bitmask1 = 0x
    	
    	SELECT @Bitmask1 = @Bitmask1 +
    		CONVERT(VARBINARY(1), 
    			SUM(CASE
    				WHEN x.Number IS NULL THEN 0
    				ELSE BitmaskNumbers.BitValue
    				END)
    			)
    	FROM @BitsInBitmask x
    	RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
    	WHERE BitmaskNumbers.Byte <=
    		(SELECT
    			CASE MAX(Number) % 8
    				WHEN 0 THEN (MAX(Number) - 1) / 8
    				ELSE  MAX(Number) / 8
    			END + 1
    		FROM @BitsInBitmask)
    	GROUP BY BitmaskNumbers.Byte
    	ORDER BY BitmaskNumbers.Byte DESC
    
    	RETURN(@Bitmask1)
    END
    
    
    SELECT dbo.bitwiseAnd(0x01, 0x03) AS And_01_03
    
    
    And_01_03
    --------
    0x01
    

    … And that’s enough for today’s installment. Enjoy..!

    Previous articleBitmask Handling, part 2: Bitmask reconstitution
    Next articleBitmask Handling, part 4: Left-shift and right-shift
    Adam Machanic helps companies get the most out of their SQL Server databases. He creates solid architectural foundations for high performance databases and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has contributed to numerous books on SQL Server development. A long-time Microsoft MVP for SQL Server, he speaks and trains at IT conferences across North America and Europe.

    4 COMMENTS

    1. Hello,it appears to be a small bug in your logical or funtion, bitwiseor.
      The last order by "ORDER BY BitmaskNumbers.Byte DESC"
      cannot be order by desc, because if you use the function in a nestled manner it will reverse the order of the bytes causing problematic behaviour.
      Otherwise the parts i’ve tried works fine.
      Cheers!

    2. Great article, making code for bitwise operation with large binary data easy to read !
      But when i tested it, i noticed performances issues.
      It is lot more efficient to work on slices of 2bytes and iterate.
      The algo is like that :
      – calculate the max length of data and number of iteration
      – loop the number of slices (number of slice = CEILING(maxLength/CAST(2 AS FLOAT)))
      – Work on slice of data : SET @bin = CAST(SUBSTRING(@Source, DATALENGTH(@Source) + 1 – (@sliceIndex + 1) * @sliceBytes, 2) AS SMALLINT)
      I can post more if needed

    3. Etienne:
      If you feel like it, you’re more than welcome to share your more optimized code. Perhaps someone will be able to benefit. Personally I’ve never had a use case for any of this stuff. I just did it for fun 🙂

    4. Thank you for the interesting articles.
      Could I ask a question.
      create table t(id int, b varbinary(128))
      Every bit in binary means something and I need to have possibility to check if one of bits from list (i1, i2, .., iN) checked.
      I use some of your functions, all is ok. But naturally when table is large then checking subset is slow.
      For example simply saying:
      select * from t inner join idlist on t.id=idlist.id and substring(b, n, 1)&mask<>0
      Thank you
      How I can optimize this kind of tables and queries?

    Comments are closed.