Maximum number of bits

I have a requirement to find the maximum number of consecutive zeros in a 
row in a 32 bit binary number. For example
1111 1111 1111 1000 1111 1111 0000 1111    =   4
1111 1110 1111 1110 1111 1111 1110 1111    =   1

I am actually storing the number in a BIGINT for simplicity. I can do this 
in a udf with a look fairly easily but was wondering if there is something 
more sql that will do the job.

Cheers,
Michael 


0
Michael
3/18/2010 11:28:35 PM
sqlserver.programming 1873 articles. 0 followers. Follow

28 Replies
1067 Views

Similar Articles

[PageSpeed] 42

That's an interesting question.  Does it include leading zeroes?

0000 0000 0111 1000 1111 1111 0000 1111 = 9?

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
----------------

"Michael C" <mike@nospam.com> wrote in message 
news:%23jNl8KvxKHA.5132@TK2MSFTNGP05.phx.gbl...
>I have a requirement to find the maximum number of consecutive zeros in a 
>row in a 32 bit binary number. For example
> 1111 1111 1111 1000 1111 1111 0000 1111    =   4
> 1111 1110 1111 1110 1111 1111 1110 1111    =   1
>
> I am actually storing the number in a BIGINT for simplicity. I can do this 
> in a udf with a look fairly easily but was wondering if there is something 
> more sql that will do the job.
>
> Cheers,
> Michael
> 

0
Michael
3/19/2010 3:37:39 AM
One set based solution (not necessarily faster than a loop)

Declare @TestTable Table (TestValue bigint);
Insert @TestTable (TestValue)
Select 0x07060504
Union All Select 0xFFF8FF0F
Union All Select 0xFEFEFFEF;

With Nbrs_3( n ) As ( Select 1 Union All Select 0 ),
Nbrs_2( n ) As ( Select 1 From Nbrs_3 n1 Cross Join Nbrs_3 n2 ),
Nbrs_1( n ) As ( Select 1 From Nbrs_2 n1 Cross Join Nbrs_2 n2 ),
Nbrs_0  ( n ) As ( Select 1 From Nbrs_1 n1 Cross Join Nbrs_1 n2 ),
Nbrs As (Select n-1 As n
  From ( Select Row_Number() Over (Order By n)
  From Nbrs_0 ) D ( n )
  Where n <= 32),
BitValues As (Select t.TestValue, n,
  Case When t.TestValue&Power(Cast (2 As bigint), n) > 0 Then 1 Else 0 End 
As BitValue
  From Nbrs n
  Cross Join @TestTable t
  Union All Select -1, -1, 1),
BitSequences As (Select b1.TestValue, b1.n,
  b1.BitValue,
  b1.n - (Select Max(b2.n) From BitValues b2 Where b1.n >= b2.n And 
b2.BitValue = 1 And b1.TestValue = b2.TestValue) As Sequence
  From BitValues b1
  Where b1.BitValue = 0)
Select TestValue,
 Cast (TestValue As varbinary(4)) As TestValueBinary,
 Max(Sequence) As MaxNbrOfSeqZeroes
From BitSequences
Group By TestValue;

Tom

"Michael C" <mike@nospam.com> wrote in message 
news:%23jNl8KvxKHA.5132@TK2MSFTNGP05.phx.gbl...
>I have a requirement to find the maximum number of consecutive zeros in a 
>row in a 32 bit binary number. For example
> 1111 1111 1111 1000 1111 1111 0000 1111    =   4
> 1111 1110 1111 1110 1111 1111 1110 1111    =   1
>
> I am actually storing the number in a BIGINT for simplicity. I can do this 
> in a udf with a look fairly easily but was wondering if there is something 
> more sql that will do the job.
>
> Cheers,
> Michael
> 

0
Tom
3/19/2010 4:44:37 AM
On 2010-03-19 0:28, Michael C wrote:
> I have a requirement to find the maximum number of consecutive zeros in a
> row in a 32 bit binary number. For example
> 1111 1111 1111 1000 1111 1111 0000 1111    =   4
> 1111 1110 1111 1110 1111 1111 1110 1111    =   1
>
> I am actually storing the number in a BIGINT for simplicity. I can do this
> in a udf with a look fairly easily but was wondering if there is something
> more sql that will do the job.
>
If you're going to do this a lot, writing a CLR function for this is 
probably a good idea, because no SQL-based solution is going to have very 
good performance or maintainability. There is no simple arithmetic 
manipulation for this operation. Set-based solutions are certainly possible, 
as Tom Cooper has demonstrated, but they're pretty strained.

-- 
J.
0
Jeroen
3/19/2010 5:54:20 AM
Here is another set based solution:

CREATE TABLE Foo (
  bin BIGINT NOT NULL PRIMARY KEY);

INSERT INTO Foo VALUES(4294508303); --11111111111110001111111100001111
INSERT INTO Foo VALUES(4278124527); --11111110111111101111111111101111

WITH N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y),
Nums (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY n) - 1 FROM N4),
Bin AS (
SELECT bin, CAST('1' + bin_string + '1' AS CHAR(34)) AS bin_string,
        CAST(bin_string AS CHAR(32)) AS bin_string_org
FROM Foo AS A
CROSS APPLY (SELECT CASE WHEN B.bin & POWER(CAST(2 AS BIGINT), n) > 0
                          THEN '1'
                          ELSE '0'
                     END
              FROM Foo AS B
              CROSS JOIN Nums
              WHERE B.bin = A.bin
                AND n < 32
              ORDER BY n DESC
              FOR XML PATH('')) AS C(bin_string)),
Parsed AS (
SELECT B.bin, bin_string_org,
        SUBSTRING(bin_string, CHARINDEX('10', bin_string, n) + 1,
                              CHARINDEX('01', bin_string, n) -
                              CHARINDEX('10', bin_string, n)) AS bin_section, n
FROM Bin AS B
JOIN Nums
   ON n BETWEEN 1 AND LEN(B.bin_string)
  AND CHARINDEX('10', bin_string, n) <
      CHARINDEX('01', bin_string, n)
  AND CHARINDEX('10', bin_string, n) > 0)
SELECT bin, bin_string_org, MAX(LEN(bin_section)) AS max_len_zero
FROM Parsed
GROUP BY bin, bin_string_org;

/*

bin                  bin_string_org                   max_len_zero
-------------------- -------------------------------- ------------
4278124527           11111110111111101111111111101111 1
4294508303           11111111111110001111111100001111 4

*/

-- 
Plamen Ratchev
http://www.SQLStudio.com
0
Plamen
3/19/2010 4:56:10 PM
Interesting question.

Here is another solution.

  -- auxiliary table
  CREATE TABLE Numbers(n int)
  INSERT INTO Numbers VALUES (1)
  INSERT INTO Numbers SELECT n+1 FROM Numbers
  INSERT INTO Numbers SELECT n+2 FROM Numbers
  INSERT INTO Numbers SELECT n+4 FROM Numbers
  INSERT INTO Numbers SELECT n+8 FROM Numbers
  INSERT INTO Numbers SELECT n+16 FROM Numbers
  INSERT INTO Numbers VALUES (0)
  
  -- data table
  CREATE TABLE Foo (
    bin BIGINT NOT NULL PRIMARY KEY);
  
  INSERT INTO Foo VALUES(4294508303); --11111111111110001111111100001111
  INSERT INTO Foo VALUES(4278124527); --11111110111111101111111111101111
  insert into foo values(4294967295); -- all 1s
  insert into foo values(0);          -- all 0s
  
  ;WITH zeros AS (
    SELECT bin,n,CASE WHEN bin & POWER(CAST(2 as bigint),n) = 0 THEN 1
ELSE -32 END AS zero
    FROM Foo
    CROSS JOIN Numbers
    WHERE n BETWEEN 0 AND 31
  ),   zeros2 AS (
    SELECT Z3.bin, Z1.n AS nmin, Z2.n AS nmax, SUM(Z3.zero) AS score
    FROM zeros Z1
    JOIN zeros Z2
      ON  Z1.bin =  Z2.bin
      AND Z1.n   <= Z2.n
    JOIN zeros Z3
      ON  Z3.bin =  Z1.bin
      AND Z3.n BETWEEN Z1.n AND Z2.n
    GROUP BY Z3.bin, Z1.n, Z2.n
  )
  SELECT bin, COALESCE(NULLIF(MAX(score),-32),0)
  FROM zeros2
  GROUP BY bin
  
  -- cleanup
  DROP TABLE Foo
  DROP TABLE Numbers

-- 
Gert-Jan


Michael C wrote:
> 
> I have a requirement to find the maximum number of consecutive zeros in a
> row in a 32 bit binary number. For example
> 1111 1111 1111 1000 1111 1111 0000 1111    =   4
> 1111 1110 1111 1110 1111 1111 1110 1111    =   1
> 
> I am actually storing the number in a BIGINT for simplicity. I can do this
> in a udf with a look fairly easily but was wondering if there is something
> more sql that will do the job.
> 
> Cheers,
> Michael
0
Gert
3/19/2010 7:05:00 PM
Gert-Jan,

Very clever!

-- 
Plamen Ratchev
http://www.SQLStudio.com
0
Plamen
3/19/2010 8:26:04 PM
Thanks. I'm wondering who I borrowed this from, but I can't remember.

The original problem for which this was a solution was the problem of
people getting on and off the bus, one row for each transition. The
query can be used to determine how many people are on the bus at any
moment. Each "on" would have the value 1, each "off" the value -1. The
sum will give the current number of people.

Similarly this type of query allows you to get the maximum number; the
maximum number of people on the bus, the maximum number of concurrent
users, etc. In this case, the maximum number of consecutive zeros.

-- 
Gert-Jan


Plamen Ratchev wrote:
> 
> Gert-Jan,
> 
> Very clever!
> 
> --
> Plamen Ratchev
> http://www.SQLStudio.com
0
Gert
3/19/2010 9:03:57 PM
Here's another possibility.  For this one I treated it as a gaps/islands 
problem.  Each consecutive string of 1 bits is treated as an island, then 
the distances between them (the zero bits) are treated as gaps.  I had to 
add some logic to shift the number left one bit (*2) and set the 1st and 
33rd bits to 1 ("| 8589934593") to force leading and trailing zeroes to be 
treated as gaps.

DECLARE @num BIGINT = 4294508303;

WITH Powers
AS
(
 SELECT 0 AS id,
  CAST(1 AS BIGINT) AS pwr

 UNION ALL

 SELECT p.id + 1,
  POWER(CAST(2 AS BIGINT), p.id + 1)
 FROM Powers p
 WHERE id < 33
),
Islands
AS
(
 SELECT ROW_NUMBER() OVER (ORDER BY id) AS num,
  p.id,
  p.pwr
 FROM Powers p
 WHERE p.pwr & ((@num * 2 ) | CAST(8589934593 AS BIGINT)) <> 0
)
SELECT MAX(n.id - c.id - 1) AS zerobitstringlength
FROM Islands AS c
INNER JOIN Islands AS n
  ON n.num = c.num + 1
    WHERE n.id - c.id > 1;

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
----------------

"Michael Coles" <admin@geocodenet.com> wrote in message 
news:%23aBLqWxxKHA.5364@TK2MSFTNGP05.phx.gbl...
> That's an interesting question.  Does it include leading zeroes?
>
> 0000 0000 0111 1000 1111 1111 0000 1111 = 9?
>
> -- 
> Thanks
>
> Michael Coles
> SQL Server MVP
> Author, "Expert SQL Server 2008 Encryption" 
> (http://www.apress.com/book/view/1430224649)
> ----------------
>
> "Michael C" <mike@nospam.com> wrote in message 
> news:%23jNl8KvxKHA.5132@TK2MSFTNGP05.phx.gbl...
>>I have a requirement to find the maximum number of consecutive zeros in a 
>>row in a 32 bit binary number. For example
>> 1111 1111 1111 1000 1111 1111 0000 1111    =   4
>> 1111 1110 1111 1110 1111 1111 1110 1111    =   1
>>
>> I am actually storing the number in a BIGINT for simplicity. I can do 
>> this in a udf with a look fairly easily but was wondering if there is 
>> something more sql that will do the job.
>>
>> Cheers,
>> Michael
>>
> 

0
Michael
3/21/2010 2:33:54 AM
I think this puzzle might have been in Bentley's PROGRAMMING PEARLS
books and/or columns, in the form of an array in a procedural
language. I did it as blocks of sequential theater seats with a status
of "available" rather than "sold" years ago.

CREATE TABLE Theater
(seat_seq INTEGER NOT NULL PRIMARY KEY,
 seat_status CHAR(1) DEFAULT 'A' NOT NULL
    CHECK (seat_status IN ('A', 'S'));

SQL is not really a language for low-level bit fiddling.
0
CELKO
3/21/2010 11:41:28 PM
"Michael C" <mike@nospam.com> wrote in message 
news:%23jNl8KvxKHA.5132@TK2MSFTNGP05.phx.gbl...
>I have a requirement to find the maximum number of consecutive zeros in a 
>row in a 32 bit binary number. For example
> 1111 1111 1111 1000 1111 1111 0000 1111    =   4
> 1111 1110 1111 1110 1111 1111 1110 1111    =   1
>
> I am actually storing the number in a BIGINT for simplicity. I can do this 
> in a udf with a look fairly easily but was wondering if there is something 
> more sql that will do the job.

Hi Guys,

Thanks for all the replies and sorry for the delays, it took me a while to 
get around to looking at each solution. Mike Coles, I have to say I like 
your solution the best as you had the simplest method of generating powers 
of 2 and everything was nicely encapsulated. Although other solutions did 
probably have a speed advantage working on multiple values in the one 
statement. The only thing I would say is that you can avoid using the POWER 
function by just substituting pwr * 2 into the second part of your CTE.

Gert, what the heck was yours doing? I couldn't work it out. :-)

It looks like any sort of set based solution isn't going to be quick (yep, I 
know I asked for it). The looping code executed a loop of 10,000 numbers 
instantly where the fastest set based solution was 16 seconds. The award 
goes to Mike Coles for the fastest code which is suprising (I wrote the 
above before doing the speed testing).

Michael 


0
Michael
3/24/2010 3:20:50 AM
Yet another solution, based on bit shifting and LOG2.

  Declare @num bigint
  Set @num = 4294508303
  
  ;WITH permutations
  AS (
    SELECT 0 AS id, @num & 0xFFFFFFFF AS test
    
    UNION ALL
    
    SELECT p.id + 1, (p.test * 2 + 1) & 0xFFFFFFFF
    FROM   permutations p
    WHERE  p.id + 1 < 32
  )
  SELECT 32 - COALESCE(CEILING(LOG(NULLIF(MIN(test),0))/LOG(2)),0)
  FROM permutations


I would expect this to run very efficiently!

If you need more bits, then alter "32" and "0xFFFFFFFF". I needed to add
the whole COALESCE(... NULLIF() ) stuff because it is illegal to execute
LOG(0).

I am sure that this time, you will be able to figure out what it does
:-)

-- 
Gert-Jan



Michael C wrote:
> 
> "Michael C" <mike@nospam.com> wrote in message
> news:%23jNl8KvxKHA.5132@TK2MSFTNGP05.phx.gbl...
> >I have a requirement to find the maximum number of consecutive zeros in a
> >row in a 32 bit binary number. For example
> > 1111 1111 1111 1000 1111 1111 0000 1111    =   4
> > 1111 1110 1111 1110 1111 1111 1110 1111    =   1
> >
> > I am actually storing the number in a BIGINT for simplicity. I can do this
> > in a udf with a look fairly easily but was wondering if there is something
> > more sql that will do the job.
> 
> Hi Guys,
> 
> Thanks for all the replies and sorry for the delays, it took me a while to
> get around to looking at each solution. Mike Coles, I have to say I like
> your solution the best as you had the simplest method of generating powers
> of 2 and everything was nicely encapsulated. Although other solutions did
> probably have a speed advantage working on multiple values in the one
> statement. The only thing I would say is that you can avoid using the POWER
> function by just substituting pwr * 2 into the second part of your CTE.
> 
> Gert, what the heck was yours doing? I couldn't work it out. :-)
> 
> It looks like any sort of set based solution isn't going to be quick (yep, I
> know I asked for it). The looping code executed a loop of 10,000 numbers
> instantly where the fastest set based solution was 16 seconds. The award
> goes to Mike Coles for the fastest code which is suprising (I wrote the
> above before doing the speed testing).
> 
> Michael
0
Gert
3/24/2010 7:16:42 PM
Oh you could always replace the CTE with a lookup table and avoid the 
calculation altogether :)

I was originally playing around with pre-calculating the longest string of 
bits for each number and shoving it in a table, but 4 billion rows is a 
little ugggh.  I was considering splitting it up into 8 bit bytes and 
treating each one separately, but that's overly complex.  Just play around 
with it, and have fun :)

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
----------------

"Michael C" <mike@nospam.com> wrote in message 
news:OwQbCEwyKHA.1236@TK2MSFTNGP06.phx.gbl...
> "Michael C" <mike@nospam.com> wrote in message 
> news:%23jNl8KvxKHA.5132@TK2MSFTNGP05.phx.gbl...
>>I have a requirement to find the maximum number of consecutive zeros in a 
>>row in a 32 bit binary number. For example
>> 1111 1111 1111 1000 1111 1111 0000 1111    =   4
>> 1111 1110 1111 1110 1111 1111 1110 1111    =   1
>>
>> I am actually storing the number in a BIGINT for simplicity. I can do 
>> this in a udf with a look fairly easily but was wondering if there is 
>> something more sql that will do the job.
>
> Hi Guys,
>
> Thanks for all the replies and sorry for the delays, it took me a while to 
> get around to looking at each solution. Mike Coles, I have to say I like 
> your solution the best as you had the simplest method of generating powers 
> of 2 and everything was nicely encapsulated. Although other solutions did 
> probably have a speed advantage working on multiple values in the one 
> statement. The only thing I would say is that you can avoid using the 
> POWER function by just substituting pwr * 2 into the second part of your 
> CTE.
>
> Gert, what the heck was yours doing? I couldn't work it out. :-)
>
> It looks like any sort of set based solution isn't going to be quick (yep, 
> I know I asked for it). The looping code executed a loop of 10,000 numbers 
> instantly where the fastest set based solution was 16 seconds. The award 
> goes to Mike Coles for the fastest code which is suprising (I wrote the 
> above before doing the speed testing).
>
> Michael
> 

0
Michael
3/25/2010 12:24:48 AM
"Michael Coles" <admin@geocodenet.com> wrote in message 
news:F933C5EA-C148-4006-A810-9F1E794F65AC@microsoft.com...
> Oh you could always replace the CTE with a lookup table and avoid the 
> calculation altogether :)

Yeah but that would likely be slower. When I replaced the power function 
with pwr * 2 it didn't appear to make much difference but somehow I think 
such functions should be avoided when doing bit fiddling. It just doesn't 
seem right to me to use a complex function when trying to do something so 
simple. Everything should be possible with & | + - and bit shifts.

> I was originally playing around with pre-calculating the longest string of 
> bits for each number and shoving it in a table, but 4 billion rows is a 
> little ugggh.  I was considering splitting it up into 8 bit bytes and 
> treating each one separately, but that's overly complex.  Just play around 
> with it, and have fun :)

I've had my fun and the procedural solution has been implemented. Thanks for 
your replies.

Michael 


0
Michael
3/25/2010 3:44:31 AM
"Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message 
news:4BAA651A.20DDD1C1@xs4all.nl...
> Yet another solution, based on bit shifting and LOG2.
>
>  Declare @num bigint
>  Set @num = 4294508303
>
>  ;WITH permutations
>  AS (
>    SELECT 0 AS id, @num & 0xFFFFFFFF AS test
>
>    UNION ALL
>
>    SELECT p.id + 1, (p.test * 2 + 1) & 0xFFFFFFFF
>    FROM   permutations p
>    WHERE  p.id + 1 < 32
>  )
>  SELECT 32 - COALESCE(CEILING(LOG(NULLIF(MIN(test),0))/LOG(2)),0)
>  FROM permutations
>
>
> I would expect this to run very efficiently!
>
> If you need more bits, then alter "32" and "0xFFFFFFFF". I needed to add
> the whole COALESCE(... NULLIF() ) stuff because it is illegal to execute
> LOG(0).
>
> I am sure that this time, you will be able to figure out what it does
> :-)

Thanks Gert but you didn't explain your first solution. I like your little 
dig at me asking but I'm confident enough in my abilities to ask how 
something works.

Michael 


0
Michael
3/25/2010 3:51:18 AM
> Yeah but that would likely be slower. When I replaced the power function 
> with pwr * 2 it didn't appear to make much difference but somehow I think 
> such functions should be avoided when doing bit fiddling. It just doesn't 
> seem right to me to use a complex function when trying to do something so 
> simple.

SQL is optimized for querying, not bit fiddling (hence no << and >> 
operators).  C is optimized for bit fiddling and loops.  When you try to do 
procedural bit fiddling in a declarative language with minimal purpose-built 
operators the most efficient solution in that language is not necessarily 
going to be a loop.

> Everything should be possible with & | + - and bit shifts.

....and variables ...and assignment statements ...and a loop ...and etc....

Is this a computation you have to perform for large sets of numbers?  Or is 
this a one-off deal?

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
---------------- 

0
Michael
3/25/2010 4:15:16 AM
"Michael Coles" <admin@geocodenet.com> wrote in message 
news:uKsPbH9yKHA.5936@TK2MSFTNGP04.phx.gbl...
> SQL is optimized for querying, not bit fiddling (hence no << and >> 
> operators).  C is optimized for bit fiddling and loops.  When you try to 
> do procedural bit fiddling in a declarative language with minimal 
> purpose-built operators the most efficient solution in that language is 
> not necessarily going to be a loop.

The procedural method was 1000s of times faster than any of the set based 
solutions.

> ...and variables ...and assignment statements ...and a loop ...and etc....

Yep, but they are all things that translate directly to cpu instructions. 
Power is possibly hundreds of instructions assuming it's not optimised for 
powers of 2. I know it didn't make any speed difference but it just feels 
wrong to me. It just doesn't seem right to base a simpler function on a more 
complex instruction. Power isn't that big a deal, using floating point stuff 
like log would be much worse.

> Is this a computation you have to perform for large sets of numbers?  Or 
> is this a one-off deal?

This is for a large frame relay network where customers can purchase 
timeslots from 1 to 32 on a line. If we are moving a customer from one 
location to another then we need to know that they will fit. As an example, 
a customer with 5 timeslots will need 5 *consecutive* timeslots in the new 
location. We need to know for any of the moves coming up if we've got a 
problem. This could be a few thousand rows where I am currently using it, 
but there is potential for 100,000+ rows if my function was reused in other 
reports that worked across the network. As an example, someone might ask 
which ports have less than 4 consecutive timeslots.

Michael 


0
Michael
3/25/2010 5:10:19 AM
"Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message 
news:4BAA651A.20DDD1C1@xs4all.nl...
> I would expect this to run very efficiently!

But not accurately. :-) It fails on all values with a single bit in them up 
to 32768 and then at 2147483648 


0
Michael
3/25/2010 5:21:27 AM
Michael C wrote:
> 
> "Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message
> news:4BAA651A.20DDD1C1@xs4all.nl...
> > I would expect this to run very efficiently!
> 
> But not accurately. :-) It fails on all values with a single bit in them up
> to 32768 and then at 2147483648

I see. Yes, there was a logical error / bug. Sorry about that.

Here is a correct version. Also, it is set based.

  CREATE TABLE Foo (bin BIGINT NOT NULL PRIMARY KEY);
  
  INSERT INTO Foo VALUES(4294508303); --11111111111110001111111100001111
  INSERT INTO Foo VALUES(4278124527); --11111110111111101111111111101111
  INSERT INTO Foo VALUES(4294967295); -- all 1s
  INSERT INTO Foo VALUES(0);          -- all 0s
  INSERT INTO Foo VALUES(32768)
  INSERT INTO Foo VALUES(2147483648)
  INSERT INTO Foo VALUES(0x7FFFFFFF)
  
  ;WITH permutations
  AS (
    SELECT bin, 0 AS id, bin AS test
    FROM Foo
    
    UNION ALL
    
    SELECT bin, p.id + 1, (p.test * 2 + 1)
    FROM   permutations p
    WHERE  p.id + 1 < 32
  )
  SELECT bin
  ,      CAST(bin AS binary(4)) AS HexRepresentation
  ,      32 - COALESCE(FLOOR(LOG(NULLIF(MIN(test &
0xFFFFFFFF),0))/LOG(2)) + 1,0) AS MaxSequenceOf0Bits
  FROM permutations
  GROUP BY bin

  DROP TABLE Foo


It works as follows:
1. The CTE will take a sample for the original value, and each bit shift
to the left. After each bit shift to the left (*2) the least significant
bit is set to 1 (+1).

2. Bits 32 and higher are filtered out (& 0xFFFFFFFF). After that, the
minimum (MIN(test)) indicates the sample with the most leading zeros,
which is the one we want.

3. The Log2 value (LOG(...)/LOG(2)) returns an indication of the bit
number of the most significant 1-bit. The integer part indicates the bit
number. The remainder is irrelevant, and therefore removed (FLOOR). Then
the bit number (zero-based) is converted to a bit count (+1). The rest
of the 32 bits must be the maximum sequence of 0 bits (32 - ...)

4. Handling the value 0 (sequence of 32 0-bits) requires special
attention. NULLIF will "escape" the value to NULL. The COALESCE will
specify the correct result for that situation, which is no 1-bits (0).

-- 
Gert-Jan
0
Gert
3/25/2010 7:11:56 PM
Michael,

Here a short explanation of this solution. This is just FYI, because I
think the other solution that I posted is much better.

1. The CTE "zeros" determines a very specific value for each individual
bit in the "bin". A 0-bit gets the number 1, a 1-bit gets the number
-32. This number -32 is the number of (relevant) bits in "bin".

2. In CTE "zeros2", the cross join between Z1 and Z2 will generate all
combinations of "from" (nmin) and "to" (nmax) bits in a "bin".

3. In CTE "zeros3", the Z3 cross join will "traverse" between the each
"from" and "to" and SUM the values of step 1. The SUM indicates the
number of 0-bits. If there is a 1-bit in the range, its value of -32
will immediately result in a negative value.

4. The MAX(score) selects the range with the most 0-bits and no 1-bits.

5. If there is no such value in step 4, so if all bits are 1, then the
"MAX" score will be -32.

-- 
Gert-Jan



Gert-Jan Strik wrote:
> 
> Interesting question.
> 
> Here is another solution.
> 
>   -- auxiliary table
>   CREATE TABLE Numbers(n int)
>   INSERT INTO Numbers VALUES (1)
>   INSERT INTO Numbers SELECT n+1 FROM Numbers
>   INSERT INTO Numbers SELECT n+2 FROM Numbers
>   INSERT INTO Numbers SELECT n+4 FROM Numbers
>   INSERT INTO Numbers SELECT n+8 FROM Numbers
>   INSERT INTO Numbers SELECT n+16 FROM Numbers
>   INSERT INTO Numbers VALUES (0)
> 
>   -- data table
>   CREATE TABLE Foo (
>     bin BIGINT NOT NULL PRIMARY KEY);
> 
>   INSERT INTO Foo VALUES(4294508303); --11111111111110001111111100001111
>   INSERT INTO Foo VALUES(4278124527); --11111110111111101111111111101111
>   insert into foo values(4294967295); -- all 1s
>   insert into foo values(0);          -- all 0s
> 
>   ;WITH zeros AS (
>     SELECT bin,n,CASE WHEN bin & POWER(CAST(2 as bigint),n) = 0 THEN 1
> ELSE -32 END AS zero
>     FROM Foo
>     CROSS JOIN Numbers
>     WHERE n BETWEEN 0 AND 31
>   ),   zeros2 AS (
>     SELECT Z3.bin, Z1.n AS nmin, Z2.n AS nmax, SUM(Z3.zero) AS score
>     FROM zeros Z1
>     JOIN zeros Z2
>       ON  Z1.bin =  Z2.bin
>       AND Z1.n   <= Z2.n
>     JOIN zeros Z3
>       ON  Z3.bin =  Z1.bin
>       AND Z3.n BETWEEN Z1.n AND Z2.n
>     GROUP BY Z3.bin, Z1.n, Z2.n
>   )
>   SELECT bin, COALESCE(NULLIF(MAX(score),-32),0)
>   FROM zeros2
>   GROUP BY bin
> 
>   -- cleanup
>   DROP TABLE Foo
>   DROP TABLE Numbers
> 
> --
> Gert-Jan
> 
> Michael C wrote:
> >
> > I have a requirement to find the maximum number of consecutive zeros in a
> > row in a 32 bit binary number. For example
> > 1111 1111 1111 1000 1111 1111 0000 1111    =   4
> > 1111 1110 1111 1110 1111 1111 1110 1111    =   1
> >
> > I am actually storing the number in a BIGINT for simplicity. I can do this
> > in a udf with a look fairly easily but was wondering if there is something
> > more sql that will do the job.
> >
> > Cheers,
> > Michael
0
Gert
3/25/2010 7:21:13 PM
"Michael C" <mike@nospam.com> wrote in message 
news:%23STq4l9yKHA.928@TK2MSFTNGP05.phx.gbl...
> "Michael Coles" <admin@geocodenet.com> wrote in message 
> news:uKsPbH9yKHA.5936@TK2MSFTNGP04.phx.gbl...
>> SQL is optimized for querying, not bit fiddling (hence no << and >> 
>> operators).  C is optimized for bit fiddling and loops.  When you try to 
>> do procedural bit fiddling in a declarative language with minimal 
>> purpose-built operators the most efficient solution in that language is 
>> not necessarily going to be a loop.
>
> The procedural method was 1000s of times faster than any of the set based 
> solutions.

Personally I was just getting started experimenting with set-based methods 
when you called game over.

>> ...and variables ...and assignment statements ...and a loop ...and 
>> etc....
>
> Yep, but they are all things that translate directly to cpu instructions. 
> Power is possibly hundreds of instructions assuming it's not optimised for 
> powers of 2. I know it didn't make any speed difference but it just feels 
> wrong to me. It just doesn't seem right to base a simpler function on a 
> more complex instruction. Power isn't that big a deal, using floating 
> point stuff like log would be much worse.
>
>> Is this a computation you have to perform for large sets of numbers?  Or 
>> is this a one-off deal?
>
> This is for a large frame relay network where customers can purchase 
> timeslots from 1 to 32 on a line. If we are moving a customer from one 
> location to another then we need to know that they will fit. As an 
> example, a customer with 5 timeslots will need 5 *consecutive* timeslots 
> in the new location. We need to know for any of the moves coming up if 
> we've got a problem. This could be a few thousand rows where I am 
> currently using it, but there is potential for 100,000+ rows if my 
> function was reused in other reports that worked across the network. As an 
> example, someone might ask which ports have less than 4 consecutive 
> timeslots.

That being the case, if performance is critical, I'll throw my vote in with 
the folks who think you should do it in a procedural language that's 
optimized for bit fiddling.

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
---------------- 

0
Michael
3/26/2010 1:50:51 AM
"Michael Coles" <admin@geocodenet.com> wrote in message 
news:B5EB3741-4323-4F4D-89E7-A15C11A06BF8@microsoft.com...
> Personally I was just getting started experimenting with set-based methods 
> when you called game over.

Sorry, I certainly wasn't calling game over, just stating the facts. It was 
me who asked for a set based solution.

> That being the case, if performance is critical, I'll throw my vote in 
> with the folks who think you should do it in a procedural language that's 
> optimized for bit fiddling.

It didn't seem like it was necessary to me. Procedural sql was fast enough.

Michael 


0
Michael
3/26/2010 5:39:19 AM
"Michael C" <mike@nospam.com> wrote in message 
news:%23JirxaKzKHA.4156@TK2MSFTNGP06.phx.gbl...
> "Michael Coles" <admin@geocodenet.com> wrote in message 
> news:B5EB3741-4323-4F4D-89E7-A15C11A06BF8@microsoft.com...
>> Personally I was just getting started experimenting with set-based 
>> methods when you called game over.
>
> Sorry, I certainly wasn't calling game over, just stating the facts. It 
> was me who asked for a set based solution.
>
>> That being the case, if performance is critical, I'll throw my vote in 
>> with the folks who think you should do it in a procedural language that's 
>> optimized for bit fiddling.
>
> It didn't seem like it was necessary to me. Procedural sql was fast 
> enough.

At any rate my initial thought was to create a lookup table, which would 
probably be considerably faster than the procedural alternative.  Of course 
a lookup table that covers 4 billion entries (32 bits) will take a long time 
to create, eat up a whole bunch of storage space, and probably just won't be 
the most efficient method in general.  My idea to counter this was to look 
for simple patterns that would make calculation of the number of contiguous 
zero bits simple.  Consider the following for 32 bit numbers from 0 to 
65535:

First Number - Last Number, Number of Consecutive Zero Bits, Number of 
Entries in Range

0  -  0 , 32 , 1
1  -  1 , 31 , 1
2  -  3 , 30 , 2
4  -  7 , 29 , 4
8  -  15 , 28 , 8
16  -  31 , 27 , 16
32  -  63 , 26 , 32
64  -  127 , 25 , 64
128  -  255 , 24 , 128
256  -  511 , 23 , 256
512  -  1023 , 22 , 512
1024  -  2047 , 21 , 1024
2048  -  4095 , 20 , 2048
4096  -  8191 , 19 , 4096
8192  -  16383 , 18 , 8192
16384  -  32767 , 17 , 16384
32768  -  65535 , 16 , 32768

Of course these rely on the fact that all the bits to the left are leading 
zero bits, which makes for a very simple pattern.  The complexity comes in 
when you have numbers with larger runs of zero bit that lie between 1 bits. 
My thinking was that there might be a useful pattern that could be derived 
for all 32 bit numbers to create a small and efficient reference table or a 
simple formula that would not require looping and bit manipulations.  Is it 
possible?  Who knows.  But I'd bet you dollars to doughnuts that a lookup 
against a properly indexed reference table would outperform procedural code 
in T-SQL.

--
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption"
(http://www.apress.com/book/view/1430224649)
---------------- 

0
Michael
3/27/2010 7:30:37 PM
Michael Coles wrote: [snipped]
> But I'd bet you dollars to doughnuts that a lookup
> against a properly indexed reference table would outperform procedural code
> in T-SQL.

If I were a betting man, I would take that bet. IMO, you greatly
underestimate the cost of lookups. They incur locking, lots of memory
management and physical I/O (unless you have 40 GB of RAM dedicated to
this lookup table). The alternative is just a few mathematical
operations. From a performance perspective, the lookup table will not
even come close.

-- 
Gert-Jan
0
Gert
3/27/2010 10:01:44 PM
"Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message 
news:4BAE8048.897B7EC8@xs4all.nl...
> Michael Coles wrote: [snipped]
>> But I'd bet you dollars to doughnuts that a lookup
>> against a properly indexed reference table would outperform procedural 
>> code
>> in T-SQL.
>
> If I were a betting man, I would take that bet. IMO, you greatly
> underestimate the cost of lookups. They incur locking, lots of memory
> management and physical I/O (unless you have 40 GB of RAM dedicated to
> this lookup table). The alternative is just a few mathematical
> operations. From a performance perspective, the lookup table will not
> even come close.

You're on.  40 GB of RAM?  Maybe... if you declare all your columns 
varbinary(max) and pad them with excessive zeroes.  Just for reference how 
many GB of RAM do you think need to be dedicated to the 17 row lookup table 
in my previous post?

This particular lookup table would not ever need to be updated ever after it 
is initially built, so you can avoid data locks with the NOLOCK query hint 
against this table.

The "few mathematical operations" you mention are going to be performed 
against large numbers of rows, loops nested within loops (O(n^2)).

Of course all of this is just conjecture until you do an actual performance 
comparison.

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
---------------- 

0
Michael
3/28/2010 4:05:24 PM
Oops, that's O(m*n) not O(n^2)

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
----------------

"Michael Coles" <admin@geocodenet.com> wrote in message 
news:7D733D71-6A8F-497C-9BBD-A7A2E8908F67@microsoft.com...
> "Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message 
> news:4BAE8048.897B7EC8@xs4all.nl...
>> Michael Coles wrote: [snipped]
>>> But I'd bet you dollars to doughnuts that a lookup
>>> against a properly indexed reference table would outperform procedural 
>>> code
>>> in T-SQL.
>>
>> If I were a betting man, I would take that bet. IMO, you greatly
>> underestimate the cost of lookups. They incur locking, lots of memory
>> management and physical I/O (unless you have 40 GB of RAM dedicated to
>> this lookup table). The alternative is just a few mathematical
>> operations. From a performance perspective, the lookup table will not
>> even come close.
>
> You're on.  40 GB of RAM?  Maybe... if you declare all your columns 
> varbinary(max) and pad them with excessive zeroes.  Just for reference how 
> many GB of RAM do you think need to be dedicated to the 17 row lookup 
> table in my previous post?
>
> This particular lookup table would not ever need to be updated ever after 
> it is initially built, so you can avoid data locks with the NOLOCK query 
> hint against this table.
>
> The "few mathematical operations" you mention are going to be performed 
> against large numbers of rows, loops nested within loops (O(n^2)).
>
> Of course all of this is just conjecture until you do an actual 
> performance comparison.
>
> -- 
> Thanks
>
> Michael Coles
> SQL Server MVP
> Author, "Expert SQL Server 2008 Encryption" 
> (http://www.apress.com/book/view/1430224649)
> ---------------- 
> 

0
Michael
3/28/2010 4:57:28 PM
Michael,

You can't have it both ways. Either you have 4 billion row lookup table
(which is what I assumed with my 40 GB RAM remark), or you have some
kind of solution (that either does or does not require an auxiliary
table). As long as that solution is not there, we can't discuss it, can
we?

As for your remark about "nested loops within loops": if that is a
problem (in perception or performance), then it is easy to rewrite my
solution to a UDF, or to write it as a scalar subquery as below.

  CREATE TABLE Foo (bin BIGINT NOT NULL PRIMARY KEY);
  
  INSERT INTO Foo VALUES(4294508303); --11111111111110001111111100001111
  INSERT INTO Foo VALUES(4278124527); --11111110111111101111111111101111
  INSERT INTO Foo VALUES(4294967295); -- all 1s
  INSERT INTO Foo VALUES(0);          -- all 0s
  INSERT INTO Foo VALUES(32768)
  INSERT INTO Foo VALUES(2147483648)
  INSERT INTO Foo VALUES(0x7FFFFFFF)
  INSERT INTO Foo VALUES(0xFFFFFFFE)
  
  SELECT bin
  ,      CAST(bin AS binary(4)) AS HexRepresentation
  ,( SELECT 32 - COALESCE(FLOOR(LOG(NULLIF(MIN(test),0))/LOG(2)) + 1,0)
     FROM (      SELECT bin AS test
       UNION ALL SELECT bin*0x00000002 & 0xFFFFFFFF | 0x00000001
       UNION ALL SELECT bin*0x00000004 & 0xFFFFFFFF | 0x00000003
       UNION ALL SELECT bin*0x00000008 & 0xFFFFFFFF | 0x00000007
       UNION ALL SELECT bin*0x00000010 & 0xFFFFFFFF | 0x0000000F
       UNION ALL SELECT bin*0x00000020 & 0xFFFFFFFF | 0x0000001F
       UNION ALL SELECT bin*0x00000040 & 0xFFFFFFFF | 0x0000003F
       UNION ALL SELECT bin*0x00000080 & 0xFFFFFFFF | 0x0000007F
       UNION ALL SELECT bin*0x00000100 & 0xFFFFFFFF | 0x000000FF
       UNION ALL SELECT bin*0x00000200 & 0xFFFFFFFF | 0x000001FF
       UNION ALL SELECT bin*0x00000400 & 0xFFFFFFFF | 0x000003FF
       UNION ALL SELECT bin*0x00000800 & 0xFFFFFFFF | 0x000007FF
       UNION ALL SELECT bin*0x00001000 & 0xFFFFFFFF | 0x00000FFF
       UNION ALL SELECT bin*0x00002000 & 0xFFFFFFFF | 0x00001FFF
       UNION ALL SELECT bin*0x00004000 & 0xFFFFFFFF | 0x00003FFF
       UNION ALL SELECT bin*0x00008000 & 0xFFFFFFFF | 0x00007FFF
       UNION ALL SELECT bin*0x00010000 & 0xFFFFFFFF | 0x0000FFFF
       UNION ALL SELECT bin*0x00020000 & 0xFFFFFFFF | 0x0001FFFF
       UNION ALL SELECT bin*0x00040000 & 0xFFFFFFFF | 0x0003FFFF
       UNION ALL SELECT bin*0x00080000 & 0xFFFFFFFF | 0x0007FFFF
       UNION ALL SELECT bin*0x00100000 & 0xFFFFFFFF | 0x000FFFFF
       UNION ALL SELECT bin*0x00200000 & 0xFFFFFFFF | 0x001FFFFF
       UNION ALL SELECT bin*0x00400000 & 0xFFFFFFFF | 0x003FFFFF
       UNION ALL SELECT bin*0x00800000 & 0xFFFFFFFF | 0x007FFFFF
       UNION ALL SELECT bin*0x01000000 & 0xFFFFFFFF | 0x00FFFFFF
       UNION ALL SELECT bin*0x02000000 & 0xFFFFFFFF | 0x01FFFFFF
       UNION ALL SELECT bin*0x04000000 & 0xFFFFFFFF | 0x03FFFFFF
       UNION ALL SELECT bin*0x08000000 & 0xFFFFFFFF | 0x07FFFFFF
       UNION ALL SELECT bin*0x10000000 & 0xFFFFFFFF | 0x0FFFFFFF
       UNION ALL SELECT bin*0x20000000 & 0xFFFFFFFF | 0x1FFFFFFF
       UNION ALL SELECT bin*0x40000000 & 0xFFFFFFFF | 0x3FFFFFFF
       UNION ALL SELECT bin*0x80000000 & 0xFFFFFFFF | 0x7FFFFFFF
   ) T ) AS MaxSequenceOf0Bits
  FROM Foo
  
  DROP TABLE Foo

-- 
Gert-Jan


Michael Coles wrote:
> 
> "Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message
> news:4BAE8048.897B7EC8@xs4all.nl...
> > Michael Coles wrote: [snipped]
> >> But I'd bet you dollars to doughnuts that a lookup
> >> against a properly indexed reference table would outperform procedural
> >> code
> >> in T-SQL.
> >
> > If I were a betting man, I would take that bet. IMO, you greatly
> > underestimate the cost of lookups. They incur locking, lots of memory
> > management and physical I/O (unless you have 40 GB of RAM dedicated to
> > this lookup table). The alternative is just a few mathematical
> > operations. From a performance perspective, the lookup table will not
> > even come close.
> 
> You're on.  40 GB of RAM?  Maybe... if you declare all your columns
> varbinary(max) and pad them with excessive zeroes.  Just for reference how
> many GB of RAM do you think need to be dedicated to the 17 row lookup table
> in my previous post?
> 
> This particular lookup table would not ever need to be updated ever after it
> is initially built, so you can avoid data locks with the NOLOCK query hint
> against this table.
> 
> The "few mathematical operations" you mention are going to be performed
> against large numbers of rows, loops nested within loops (O(n^2)).
> 
> Of course all of this is just conjecture until you do an actual performance
> comparison.
> 
> --
> Thanks
> 
> Michael Coles
> SQL Server MVP
> Author, "Expert SQL Server 2008 Encryption"
> (http://www.apress.com/book/view/1430224649)
> ----------------
0
Gert
3/28/2010 8:17:30 PM
Gert,

If you read my previous message then you understand where I am going.  I 
posted contents of a lookup table with 17 rows to cover the first set of 
combinations.  As I also stated, the key is to determine if there is some 
pattern here.  Don't worry, we will discuss it soon enough.

I'm not overly concerned with the performance of your solution.  The "nested 
loops within loops" refers to a procedural looping solution.  For instance, 
a solution might use a de Bruijn sequence to count the number of right-most 
zero bits, which can be done in a loop combined with bit shifts to answer 
the OP's question.  Shove in the next number. Wash. Rinse. Repeat.  That's a 
loop within a loop.

-- 
Thanks

Michael Coles
SQL Server MVP
Author, "Expert SQL Server 2008 Encryption" 
(http://www.apress.com/book/view/1430224649)
----------------

"Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message 
news:4BAFB95A.536A5639@xs4all.nl...
> Michael,
>
> You can't have it both ways. Either you have 4 billion row lookup table
> (which is what I assumed with my 40 GB RAM remark), or you have some
> kind of solution (that either does or does not require an auxiliary
> table). As long as that solution is not there, we can't discuss it, can
> we?
>
> As for your remark about "nested loops within loops": if that is a
> problem (in perception or performance), then it is easy to rewrite my
> solution to a UDF, or to write it as a scalar subquery as below.
>
>  CREATE TABLE Foo (bin BIGINT NOT NULL PRIMARY KEY);
>
>  INSERT INTO Foo VALUES(4294508303); --11111111111110001111111100001111
>  INSERT INTO Foo VALUES(4278124527); --11111110111111101111111111101111
>  INSERT INTO Foo VALUES(4294967295); -- all 1s
>  INSERT INTO Foo VALUES(0);          -- all 0s
>  INSERT INTO Foo VALUES(32768)
>  INSERT INTO Foo VALUES(2147483648)
>  INSERT INTO Foo VALUES(0x7FFFFFFF)
>  INSERT INTO Foo VALUES(0xFFFFFFFE)
>
>  SELECT bin
>  ,      CAST(bin AS binary(4)) AS HexRepresentation
>  ,( SELECT 32 - COALESCE(FLOOR(LOG(NULLIF(MIN(test),0))/LOG(2)) + 1,0)
>     FROM (      SELECT bin AS test
>       UNION ALL SELECT bin*0x00000002 & 0xFFFFFFFF | 0x00000001
>       UNION ALL SELECT bin*0x00000004 & 0xFFFFFFFF | 0x00000003
>       UNION ALL SELECT bin*0x00000008 & 0xFFFFFFFF | 0x00000007
>       UNION ALL SELECT bin*0x00000010 & 0xFFFFFFFF | 0x0000000F
>       UNION ALL SELECT bin*0x00000020 & 0xFFFFFFFF | 0x0000001F
>       UNION ALL SELECT bin*0x00000040 & 0xFFFFFFFF | 0x0000003F
>       UNION ALL SELECT bin*0x00000080 & 0xFFFFFFFF | 0x0000007F
>       UNION ALL SELECT bin*0x00000100 & 0xFFFFFFFF | 0x000000FF
>       UNION ALL SELECT bin*0x00000200 & 0xFFFFFFFF | 0x000001FF
>       UNION ALL SELECT bin*0x00000400 & 0xFFFFFFFF | 0x000003FF
>       UNION ALL SELECT bin*0x00000800 & 0xFFFFFFFF | 0x000007FF
>       UNION ALL SELECT bin*0x00001000 & 0xFFFFFFFF | 0x00000FFF
>       UNION ALL SELECT bin*0x00002000 & 0xFFFFFFFF | 0x00001FFF
>       UNION ALL SELECT bin*0x00004000 & 0xFFFFFFFF | 0x00003FFF
>       UNION ALL SELECT bin*0x00008000 & 0xFFFFFFFF | 0x00007FFF
>       UNION ALL SELECT bin*0x00010000 & 0xFFFFFFFF | 0x0000FFFF
>       UNION ALL SELECT bin*0x00020000 & 0xFFFFFFFF | 0x0001FFFF
>       UNION ALL SELECT bin*0x00040000 & 0xFFFFFFFF | 0x0003FFFF
>       UNION ALL SELECT bin*0x00080000 & 0xFFFFFFFF | 0x0007FFFF
>       UNION ALL SELECT bin*0x00100000 & 0xFFFFFFFF | 0x000FFFFF
>       UNION ALL SELECT bin*0x00200000 & 0xFFFFFFFF | 0x001FFFFF
>       UNION ALL SELECT bin*0x00400000 & 0xFFFFFFFF | 0x003FFFFF
>       UNION ALL SELECT bin*0x00800000 & 0xFFFFFFFF | 0x007FFFFF
>       UNION ALL SELECT bin*0x01000000 & 0xFFFFFFFF | 0x00FFFFFF
>       UNION ALL SELECT bin*0x02000000 & 0xFFFFFFFF | 0x01FFFFFF
>       UNION ALL SELECT bin*0x04000000 & 0xFFFFFFFF | 0x03FFFFFF
>       UNION ALL SELECT bin*0x08000000 & 0xFFFFFFFF | 0x07FFFFFF
>       UNION ALL SELECT bin*0x10000000 & 0xFFFFFFFF | 0x0FFFFFFF
>       UNION ALL SELECT bin*0x20000000 & 0xFFFFFFFF | 0x1FFFFFFF
>       UNION ALL SELECT bin*0x40000000 & 0xFFFFFFFF | 0x3FFFFFFF
>       UNION ALL SELECT bin*0x80000000 & 0xFFFFFFFF | 0x7FFFFFFF
>   ) T ) AS MaxSequenceOf0Bits
>  FROM Foo
>
>  DROP TABLE Foo
>
> -- 
> Gert-Jan
>
>
> Michael Coles wrote:
>>
>> "Gert-Jan Strik" <sorrytoomuchspamalready@xs4all.nl> wrote in message
>> news:4BAE8048.897B7EC8@xs4all.nl...
>> > Michael Coles wrote: [snipped]
>> >> But I'd bet you dollars to doughnuts that a lookup
>> >> against a properly indexed reference table would outperform procedural
>> >> code
>> >> in T-SQL.
>> >
>> > If I were a betting man, I would take that bet. IMO, you greatly
>> > underestimate the cost of lookups. They incur locking, lots of memory
>> > management and physical I/O (unless you have 40 GB of RAM dedicated to
>> > this lookup table). The alternative is just a few mathematical
>> > operations. From a performance perspective, the lookup table will not
>> > even come close.
>>
>> You're on.  40 GB of RAM?  Maybe... if you declare all your columns
>> varbinary(max) and pad them with excessive zeroes.  Just for reference 
>> how
>> many GB of RAM do you think need to be dedicated to the 17 row lookup 
>> table
>> in my previous post?
>>
>> This particular lookup table would not ever need to be updated ever after 
>> it
>> is initially built, so you can avoid data locks with the NOLOCK query 
>> hint
>> against this table.
>>
>> The "few mathematical operations" you mention are going to be performed
>> against large numbers of rows, loops nested within loops (O(n^2)).
>>
>> Of course all of this is just conjecture until you do an actual 
>> performance
>> comparison.
>>
>> --
>> Thanks
>>
>> Michael Coles
>> SQL Server MVP
>> Author, "Expert SQL Server 2008 Encryption"
>> (http://www.apress.com/book/view/1430224649)
>> ---------------- 

0
Michael
3/28/2010 10:05:07 PM
"Michael Coles" <admin@geocodenet.com> wrote in message 
news:77267149-94D0-4BD0-9DD8-0A36C484EE1A@microsoft.com...
> Gert,
>
> If you read my previous message then you understand where I am going.  I 
> posted contents of a lookup table with 17 rows to cover the first set of 
> combinations.  As I also stated, the key is to determine if there is some 
> pattern here.  Don't worry, we will discuss it soon enough.
>
> I'm not overly concerned with the performance of your solution.  The 
> "nested loops within loops" refers to a procedural looping solution.  For 
> instance, a solution might use a de Bruijn sequence to count the number of 
> right-most zero bits, which can be done in a loop combined with bit shifts 
> to answer the OP's question.  Shove in the next number. Wash. Rinse. 
> Repeat.  That's a loop within a loop.

Hi Michael and Gert,

I think we can take another piece of code I wrote as an example here. As 
part of this problem I also needed something to find the number of bits 
turned on in a 32 bit number. I tested both a set based and procedural based 
method.

Set based method used a lookup table with 32 rows with the values of 
1,2,4,8,16,32 etc. I wrote a query to simply select count(*) from this 
lookup table where the bitmask and my number was not zero.

The procedural method just incremented a value if my number & 1 was 1 and 
then divided the number by 2 until it was zero. This was placed in a udf. 
This wasn't particularly smart in any way and could have been optimised 
further.

The result was that the unoptimised procedural method was 6 times faster 
than the set based method. Also, I timed every solution provided in this 
thread and compared it to my procedural method with similar results. The 
procedural method run 100,000 iterations in 9 seconds where the quickest set 
based method took 58 seconds. This isn't as big a difference as I expected 
but still quite significant.

The thing was when I optimised the procedural method to remove the loop then 
it executed 10 times faster again making it 60 times faster than the set 
based method. This was the function I used to count the number of bits:

ALTER FUNCTION [dbo].[udfBitCount32](@Value BIGINT)
RETURNS INT
AS
BEGIN
        SET @Value = @Value - ((@Value / 2) & 1431655765)
        SET @Value = (@Value & 858993459) + ((@Value / 4) & 858993459)
        RETURN (((@Value + (@Value / 16) & 252645135) * 16843009) & 
CAST(4294967295 AS BIGINT)) / 16777216
END

Interesting, C# did around 150,000,000 bitcounts per second so was around 
1350 times faster. 


0
Michael
3/29/2010 1:00:48 AM
Reply:

Similar Artilces: