I am working on a program to calculate primes. It's just for fun, not for
work or school, if that matters here. I'm trying to optimize it and I'll
have some more questions about that later.
Today I changed the CArray used to hold the primes to a CList based on a
compile time selection, figuring I would save the time of reallocating and
copying the CArray every time I went over the maximum allocated size. I'm
aware of the pros and cons of CArray and CList in general. A disadvantage is
that the CList of primes takes (about) four time as much memory as the
CArray holding the same number of primes. The primes are 32 bit ints so I
don't know why it's four times instead of three times (previous pointer,
prime and next pointer). What else is in there?
I tried something that I wasn't sure would work but it did. I serialized a
CArray to disk and then later I serialized it from disk to a CList. It
worked. It's not a total surprise, because it wouldn't make sense to store
the next and previous CList pointers into a serialize archive. My question
here is, does anyone know if these were made intentionally compatible and if
I can count on that working in all cases?
Thanks...
|
|
0
|
|
|
|
Reply
|
Bill
|
3/15/2010 3:06:24 PM |
|
See below...
On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam> wrote:
>I am working on a program to calculate primes. It's just for fun, not for
>work or school, if that matters here. I'm trying to optimize it and I'll
>have some more questions about that later.
>
>Today I changed the CArray used to hold the primes to a CList based on a
>compile time selection, figuring I would save the time of reallocating and
>copying the CArray every time I went over the maximum allocated size. I'm
>aware of the pros and cons of CArray and CList in general.
****
Your choice of CArray is probably a Bad Idea. First, you should SetSize and set an
explicit reallocation quantum (the second parameter of SetSize) to minimize the amount of
reallocation that is done. Or use std::vector which does this implicitly, and had far,
far better performance than CArray.
****
>A disadvantage is
>that the CList of primes takes (about) four time as much memory as the
>CArray holding the same number of primes. The primes are 32 bit ints so I
>don't know why it's four times instead of three times (previous pointer,
>prime and next pointer). What else is in there?
****
you have made a fundaemental failure in thinking here. You think, for example, that the
size of any elmeent matters, which is the wrong way to think about the problem. What
matter is the memory footprint, and the induced paging, which is potentially far worse in
a CLIst. Note that most prime algorithms are variants of the Seive of Eastosthenes, which
means they don't scal to large integer (such as are using for encryption) and you can find
enumerations of all the primes in the range of 0..2**32-1 on the Web and in reference
books. LIss are nearly always less efficient than arrays due to locality-of-reference
issues that affect cache behavior and paging.
*****
>
>I tried something that I wasn't sure would work but it did. I serialized a
>CArray to disk and then later I serialized it from disk to a CList. It
>worked. It's not a total surprise, because it wouldn't make sense to store
>the next and previous CList pointers into a serialize archive. My question
>here is, does anyone know if these were made intentionally compatible and if
>I can count on that working in all cases?
****
I wouldn't touch MFC serialization as far as I could throw a mainframe. I would not use
it for anything I cared about.
joe
****
>
>Thanks...
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
3/15/2010 11:41:42 PM
|
|
On Mar 15, 4:06=A0pm, "Bill Brehm" <don't want spam> wrote:
> I tried something that I wasn't sure would work but it did. I serialized =
a
> CArray to disk and then later I serialized it from disk to a CList. It
> worked. It's not a total surprise, because it wouldn't make sense to stor=
e
> the next and previous CList pointers into a serialize archive. My questio=
n
> here is, does anyone know if these were made intentionally compatible and=
if
> I can count on that working in all cases?
Yes, but I doubt that was intentional. The reason is really this: both
CList<> and CArray<> simply serialize the number of elements, then
each element, from first to last. If you think about it, what else
should they do? So this will work for lists and arrays, and I would
guess that you can also altercate between e.g. CArray<BYTE> and
CByteArray and such, __EXCEPT__ if you use "shift" operators and
serialize pointers to concrete class (that is, if you use ar <<
pToCByteArray, you will have trouble using ar >> pToTemplateByteArray
later on).
Goran.
|
|
0
|
|
|
|
Reply
|
Goran
|
3/16/2010 7:56:04 AM
|
|
"Your choice of CArray is probably a Bad Idea."
It IS turning out to be a bad idea. This was one of the things I said I
would get around to asking. If I give a growby of 10000, the program runs at
a reasonable speed. When I set it much higher, like 100,000,000, thinking
that i'll have fewer reallocations, it runs really slowly. I would have
expected the opposite until 1e8 primes were calculated, then a large delay
during reallocation. But it's slow from the start. I did some
PerformanceQueries and found that adding something to the CArray (when it
grew by 10000) was ten times slow than doing a division as part of the prime
check. I don't know if it's a paging issue or some inefficiency in CArray
itself. I guess it's much worse with a large growby but i don't know why and
i didn't measure to check.
Would the release version be much faster than the debug version (less
checking for bounds and other stuff)? Is there a way (with task manager
maybe) to tell if the CArray is staying in memory or being swapped out to
disk? Is there a way to force the CArray to stay in memory and not swap out?
Any guess why it might be so slow?
"you have made a fundaemental failure in thinking here"
Maybe so, but it turns out that the CList is much faster than the CArray, in
my program at least. Too bad it takes four times the memory and I run out
before I can reach 2^32. Also, it takes much longer to clear out the CList
than the CArray, which is logical since it's made up of many tiny parts
instead of a few large parts.
I found another issue with CList which I think I would consider a bug. When
I Serialize back from disk into a CArray, anything in the array is wiped out
and overwritten. When I do that into the CList, the disk data is appended to
the end of what is in the list. I didn't see that described in the help but
I also didn't find any mention of this "bug" in a quick Google search.
"I wouldn't touch MFC serialization as far as I could throw a mainframe."
Ha ha, how far can you throw a mainframe? I am learning to agree with you on
this. I don't think I ever used it before. I started to use it this time
because it is much faster and uses less space than saving and loading in
text format. But if it's not trustworthy then it's not worth it.
"Note that most prime algorithms are variants of the Seive of Eastosthenes"
Yes, I'm using that. My interest in playing with primes was triggered by
trying to learn about encryption and modular arithmetic used in RSA. But my
first program ever was a prime number generator running on some HP Basic
computer back in the early 70's, so it's interesting to me to bring that up
to date. My program also can run the old way by dividing by primes until the
sqrt of the number under test. I think my original Basic program divided by
all odd numbers because there wasn't enough memory to store the array of
primes on that old computer.
Actually the sieve technique should scale to large integers (if you mean the
64 bit type and not arbitrarily large) once one has an array of primes to
2^32 and if the sieving is done in segments.
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
> See below...
> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam> wrote:
>
>>I am working on a program to calculate primes. It's just for fun, not for
>>work or school, if that matters here. I'm trying to optimize it and I'll
>>have some more questions about that later.
>>
>>Today I changed the CArray used to hold the primes to a CList based on a
>>compile time selection, figuring I would save the time of reallocating and
>>copying the CArray every time I went over the maximum allocated size. I'm
>>aware of the pros and cons of CArray and CList in general.
> ****
> Your choice of CArray is probably a Bad Idea. First, you should SetSize
> and set an
> explicit reallocation quantum (the second parameter of SetSize) to
> minimize the amount of
> reallocation that is done. Or use std::vector which does this implicitly,
> and had far,
> far better performance than CArray.
> ****
>>A disadvantage is
>>that the CList of primes takes (about) four time as much memory as the
>>CArray holding the same number of primes. The primes are 32 bit ints so I
>>don't know why it's four times instead of three times (previous pointer,
>>prime and next pointer). What else is in there?
> ****
> you have made a fundaemental failure in thinking here. You think, for
> example, that the
> size of any elmeent matters, which is the wrong way to think about the
> problem. What
> matter is the memory footprint, and the induced paging, which is
> potentially far worse in
> a CLIst. Note that most prime algorithms are variants of the Seive of
> Eastosthenes, which
> means they don't scal to large integer (such as are using for encryption)
> and you can find
> enumerations of all the primes in the range of 0..2**32-1 on the Web and
> in reference
> books. LIss are nearly always less efficient than arrays due to
> locality-of-reference
> issues that affect cache behavior and paging.
> *****
>>
>>I tried something that I wasn't sure would work but it did. I serialized a
>>CArray to disk and then later I serialized it from disk to a CList. It
>>worked. It's not a total surprise, because it wouldn't make sense to store
>>the next and previous CList pointers into a serialize archive. My question
>>here is, does anyone know if these were made intentionally compatible and
>>if
>>I can count on that working in all cases?
> ****
> I wouldn't touch MFC serialization as far as I could throw a mainframe. I
> would not use
> it for anything I cared about.
> joe
> ****
>>
>>Thanks...
>>
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Bill
|
3/16/2010 1:38:46 PM
|
|
"Bill Brehm" <don't want spam> ha scritto nel messaggio
news:e9or$3QxKHA.1796@TK2MSFTNGP02.phx.gbl...
> If I give a growby of 10000, the program runs at a reasonable speed. When
> I set it much higher, like 100,000,000, thinking that i'll have fewer
> reallocations, it runs really slowly. I would have expected the opposite
> until 1e8 primes were calculated, then a large delay during reallocation.
One of the problems of CArray is its reallocation policy. IIRC, CArray uses
an arithmetic growth policy, which is not smart.
Instead, std::vector uses a geometric growth policy (using a 1.5x factor for
increasing capacity, IIRC).
In any case, I think that if you set CArray initial size big enough, you
won't have speed problems.
Did you try something like SetSize(100000000) ?
Then you can use a separated integer index to identify the first free slot
in the array, to add new found primes.
Giovanni
|
|
0
|
|
|
|
Reply
|
Giovanni
|
3/16/2010 4:26:22 PM
|
|
See below...
On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam> wrote:
>"Your choice of CArray is probably a Bad Idea."
>
>It IS turning out to be a bad idea. This was one of the things I said I
>would get around to asking. If I give a growby of 10000, the program runs at
>a reasonable speed. When I set it much higher, like 100,000,000, thinking
>that i'll have fewer reallocations, it runs really slowly. I would have
>expected the opposite until 1e8 primes were calculated, then a large delay
>during reallocation. But it's slow from the start.
***
Someone one complained that they could only add about 5000 elements/second to CArray. I
tried the same experiment, and found that indeed that was the number I was getting, in
Debug mode. They were adding 1.5M elements to the CArray. I set the expansion quantum to
500,000 and measured in Release mode (which is much faster) and got 15M elements/second.
CArray without using SetSize to modify the quantum is horrendous performance,
approximately exponential in time with the number of elements. std::vector is very close
to linear (I did the measurements, but never published them).
*****
>I did some
>PerformanceQueries and found that adding something to the CArray (when it
>grew by 10000) was ten times slow than doing a division as part of the prime
>check. I don't know if it's a paging issue or some inefficiency in CArray
>itself. I guess it's much worse with a large growby but i don't know why and
>i didn't measure to check.
****
Add 1 element: allocate a new array of size+1; copy size elements from the old array, add
the new element to the end. Expoential peformance; the time goes up as approximately the
square of the number of elements (do the arithmetic for the copy operation). Never
believe ANYTHING you measure in Debug mode; it is irrelevant and will always be horrible.
This is because there are bounds checks and new-storage-iniitailization that take time.
Most paging performance is irrelevant unless your vector gets into the hundreds of
megabytes in size, at which point you definitely have the wrong representation.
****
>
>Would the release version be much faster than the debug version (less
>checking for bounds and other stuff)?
****
Absolutely! See previous comment; you CANNOT rely on "Debug" measurements for true
performance guarantees.
****
>Is there a way (with task manager
>maybe) to tell if the CArray is staying in memory or being swapped out to
>disk? Is there a way to force the CArray to stay in memory and not swap out?
>Any guess why it might be so slow?
****
Watch for increases in paging faults using the perfmon tool. A substantial increase in
paging frequency is a Bad Sign. Note that mking your algorithm cache-aware can generally
improve performance by 10X, but you need to know the cache parameters for each computer
(which you can discover) to write code like this, and it doesn't always work due to the
nature of the particular algorithm involved.
****
>
>"you have made a fundaemental failure in thinking here"
>
>Maybe so, but it turns out that the CList is much faster than the CArray, in
>my program at least. Too bad it takes four times the memory and I run out
>before I can reach 2^32. Also, it takes much longer to clear out the CList
>than the CArray, which is logical since it's made up of many tiny parts
>instead of a few large parts.
****
But tha fundamental failure in thinking is thinking that the extra space actually matters.
It may not. Note that while adding to an array with SetSize(..., 0) (the default), is
exponential in time, adding to a list is constant time, so burning the extra space would
not matter. Unless you exceed the working set size, at which point, it becomes a paging
problem, or if the elements are widely scattered in memory, or the storage allocator
becomes the performance bottleneck (which it can). If you anticipate 1,000,000 elements,
do a SetSize(0, 1000000); when the array is empty; then the array append is constant time
up to 1,000,000 elements also, and at 1,000,001 there is a blip i n the performance as the
array is reallocated to size 2,000,000 and then time is constant again to 2,000,000. IF
you are not doing the SetSize, then your assumption are all broken and you will get
exponential cost. So you cannot
(a) predict performance if you are in Debug mode
(b) predict performance if you have not forced preallocation (SetSize)
which is handled "by magic" in std::vector
You have apparently committed both sins here, so your choices don't have a sound basis.
joe
****
>
>I found another issue with CList which I think I would consider a bug. When
>I Serialize back from disk into a CArray, anything in the array is wiped out
>and overwritten. When I do that into the CList, the disk data is appended to
>the end of what is in the list. I didn't see that described in the help but
>I also didn't find any mention of this "bug" in a quick Google search.
****
No, serialization is intended to replace the contents, so you are seeing the specified
behavior. Appending was never considered as a part of the design. In fact, it would have
been considered a bug by the designers if serialization appended.
****
>
>"I wouldn't touch MFC serialization as far as I could throw a mainframe."
>
>Ha ha, how far can you throw a mainframe? I am learning to agree with you on
>this. I don't think I ever used it before. I started to use it this time
>because it is much faster and uses less space than saving and loading in
>text format. But if it's not trustworthy then it's not worth it.
****
Given that I'm actually physically handicapped, and probably couldn't throw a mere laptop
very far, throwing a 1-ton mainframe isn't likely. Probably measured in nanometers. The
real problem is schema evolution, a problem I had to address when I designed what is a lot
like XML back in 1977 (except our design was far better). Today, I use XML for my data
representation.
*****
>
>"Note that most prime algorithms are variants of the Seive of Eastosthenes"
>
>Yes, I'm using that. My interest in playing with primes was triggered by
>trying to learn about encryption and modular arithmetic used in RSA. But my
>first program ever was a prime number generator running on some HP Basic
>computer back in the early 70's, so it's interesting to me to bring that up
>to date. My program also can run the old way by dividing by primes until the
>sqrt of the number under test. I think my original Basic program divided by
>all odd numbers because there wasn't enough memory to store the array of
>primes on that old computer.
****
Note that RSA wants to use massive prime numbers (which are almost impossible to factor).
I have been talking with a colleague who is an encryption guy and we are talking about
developing a course in encryption when we would use RSA-8 (8-bit primes) so students could
work out the encryption with just a calculator or even pencil and paper and then show why
short key are vulnerable by asking them to work out encrypted data (since 8-bit primes
would be easy to crack), to give some real feel for the algorithms. He has taken a course
from Ron Rivest (the R of RSA) and is really into this stuff.
****
>
>Actually the sieve technique should scale to large integers (if you mean the
>64 bit type and not arbitrarily large) once one has an array of primes to
>2^32 and if the sieving is done in segments.
****
It doesn't scale; note that modern encryption might use 1024, 2048 or 4096-bit primes, and
facroting these with moden computers and moden algorithms takes intervals such that the
Sun is likely to expand into a Red Giant and engulf the Earth before the algorithm
terminates.
joe
****
>
>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
>> See below...
>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam> wrote:
>>
>>>I am working on a program to calculate primes. It's just for fun, not for
>>>work or school, if that matters here. I'm trying to optimize it and I'll
>>>have some more questions about that later.
>>>
>>>Today I changed the CArray used to hold the primes to a CList based on a
>>>compile time selection, figuring I would save the time of reallocating and
>>>copying the CArray every time I went over the maximum allocated size. I'm
>>>aware of the pros and cons of CArray and CList in general.
>> ****
>> Your choice of CArray is probably a Bad Idea. First, you should SetSize
>> and set an
>> explicit reallocation quantum (the second parameter of SetSize) to
>> minimize the amount of
>> reallocation that is done. Or use std::vector which does this implicitly,
>> and had far,
>> far better performance than CArray.
>> ****
>>>A disadvantage is
>>>that the CList of primes takes (about) four time as much memory as the
>>>CArray holding the same number of primes. The primes are 32 bit ints so I
>>>don't know why it's four times instead of three times (previous pointer,
>>>prime and next pointer). What else is in there?
>> ****
>> you have made a fundaemental failure in thinking here. You think, for
>> example, that the
>> size of any elmeent matters, which is the wrong way to think about the
>> problem. What
>> matter is the memory footprint, and the induced paging, which is
>> potentially far worse in
>> a CLIst. Note that most prime algorithms are variants of the Seive of
>> Eastosthenes, which
>> means they don't scal to large integer (such as are using for encryption)
>> and you can find
>> enumerations of all the primes in the range of 0..2**32-1 on the Web and
>> in reference
>> books. LIss are nearly always less efficient than arrays due to
>> locality-of-reference
>> issues that affect cache behavior and paging.
>> *****
>>>
>>>I tried something that I wasn't sure would work but it did. I serialized a
>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>worked. It's not a total surprise, because it wouldn't make sense to store
>>>the next and previous CList pointers into a serialize archive. My question
>>>here is, does anyone know if these were made intentionally compatible and
>>>if
>>>I can count on that working in all cases?
>> ****
>> I wouldn't touch MFC serialization as far as I could throw a mainframe. I
>> would not use
>> it for anything I cared about.
>> joe
>> ****
>>>
>>>Thanks...
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer@flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
3/16/2010 5:50:47 PM
|
|
Bill Brehm wrote:
> "Your choice of CArray is probably a Bad Idea."
>
> It IS turning out to be a bad idea. This was one of the things I said I
> would get around to asking. If I give a growby of 10000, the program runs at
> a reasonable speed. When I set it much higher, like 100,000,000, thinking
> that i'll have fewer reallocations, it runs really slowly. I would have
> expected the opposite until 1e8 primes were calculated, then a large delay
> during reallocation. But it's slow from the start. I did some
> PerformanceQueries and found that adding something to the CArray (when it
> grew by 10000) was ten times slow than doing a division as part of the prime
> check. I don't know if it's a paging issue or some inefficiency in CArray
> itself. I guess it's much worse with a large growby but i don't know why and
> i didn't measure to check.
Bill, your solution is a memory map. Find yourself a good "CFileMap"
class that allows you to an element type to it and handles it as a
array. We have our own that allows us to do this:
struct TArrayItems {
int x;
int y;
};
CFileMap<TArrayItems> myArray;
myArray.Open("some file name even temp");
myArray[6].x = 1;
myArray[6].y = 2;
....
myArray.Delete(10);
myArray.Close();
etc.
Such a good class will use the CArray or any other collection class as
a base class for file mapping.
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
3/16/2010 8:41:24 PM
|
|
"Someone one complained that they could only add about 5000 elements/second
to CArray"
I am well aware of the growby parameter of SetSize() and have had it in my
program from the start. Yet CArray has an odd behavior in my program that
doesn't follow what one would expect knowing how the growby parameter is
supposed to make CArray work.
I did some tests with my program using both debug mode and release mode.
Also I varied the growby parameter. Lastly, I can store the primes in a
CArray, a CList or a preallocated array of unsigned ints (needs just over
200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for all
timing results.
Results for debug mode:
int[] - around 43000 primes per second.
CList - around 39000 primes per second. The slightly lower speed is probably
due to run time allocation of the memory for new elements.
CArray, growby 100,000 - start at around 35000 pps then tapers down to 16000
pps after 400,000 primes have been calculated. suddenly drops to 8000 pps at
around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
calculated. one might expect it to slow down as it must reallocate and copy
an ever growing array ten times during this test. I don't understand the
sudden drop at 500,000. I know that displaying the results on the GUI takes
a lot of time, so I only display every 1000th prime. The timing will include
the reallocation time but only once in every 100,000 primes generated and
therefore only once in ever 100 display updates.
CArray, growby 100 - start at around 40000 pps then tapers down to 8000 pps
after 500,000 primes have been calculated then tapers down to 2700 pps
around 1,000,000 primes calculated. Not surprising that it's slower with
more reallocations.
Here where it gets weird.
CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow to
wait for 1,000,000 primes to be generated to see the speed there. At 245,000
primes, the speed is the same. There is a longer delay at the start to
allocate the memory.
The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
primes in the first few passes. The sieve runs very quickly. The time being
measured at such low speeds is basically the time to copy data into the
CArray.
Results for release mode:
int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
start and still at 1,000,000 primes and even at 3,000,000 primes. CList
takes a long time to give back the memory when I remove all the elements
from it.
CArray with growby 100 - around 57,000 pps to start and tapers down to
13,000 at 1,000,000 primes. This is not surprising as there will be 10
reallocations for every time the timing update is displayed and so the
reallocation time is included in the timing calculation.
So I would conclude that in release mode, everything works logically. In
debug mode, I might have blamed page faults as a reason to slow down with a
large CArray growby, but then why doesn't it affect the int[] which is the
same size and allocated in one big chunk like the CArray is. I suspect there
is something funny (too much overhead) in CArray in debug mode for large
growby sizes.
"But tha fundamental failure in thinking is thinking that the extra space
actually matters. It may not."
It does in my case because I reach the 2GB limit before I reach the end of
the primes that I want to calculate up to 2^32. There are 200,000,000+
primes in that set and at 4 bytes per prime for CArray or int[], I can fit
in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16 bytes
per prime.
"It doesn't scale; note that modern encryption might use 1024, 2048 or
4096-bit primes"
So you were referring to arbitrarily large numbers. when you said 'large
integer', I thought you might be referring to LARGE_INTEGER which is only 64
bits. Using the primes up to 2^32, I could use the sieve to calculate all
the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
space (on my PC) to hold them because there are somewhere between 10^15 and
10^16 of them.
"No, serialization is intended to replace the contents, so you are seeing
the specified
behavior. Appending was never considered as a part of the design. In fact,
it would have
been considered a bug by the designers if serialization appended."
In my case it IS appending. If I start out with one, two or three elements
in the CList, after I serialize from disk, those original elements are still
in there. I don't know why but I think and agree with you that they should
not be.
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:fofvp5tejd5101f2pu7974u9upb2qmim9t@4ax.com...
> See below...
> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam> wrote:
>
>>"Your choice of CArray is probably a Bad Idea."
>>
>>It IS turning out to be a bad idea. This was one of the things I said I
>>would get around to asking. If I give a growby of 10000, the program runs
>>at
>>a reasonable speed. When I set it much higher, like 100,000,000, thinking
>>that i'll have fewer reallocations, it runs really slowly. I would have
>>expected the opposite until 1e8 primes were calculated, then a large delay
>>during reallocation. But it's slow from the start.
> ***
> Someone one complained that they could only add about 5000 elements/second
> to CArray. I
> tried the same experiment, and found that indeed that was the number I was
> getting, in
> Debug mode. They were adding 1.5M elements to the CArray. I set the
> expansion quantum to
> 500,000 and measured in Release mode (which is much faster) and got 15M
> elements/second.
> CArray without using SetSize to modify the quantum is horrendous
> performance,
> approximately exponential in time with the number of elements.
> std::vector is very close
> to linear (I did the measurements, but never published them).
> *****
>>I did some
>>PerformanceQueries and found that adding something to the CArray (when it
>>grew by 10000) was ten times slow than doing a division as part of the
>>prime
>>check. I don't know if it's a paging issue or some inefficiency in CArray
>>itself. I guess it's much worse with a large growby but i don't know why
>>and
>>i didn't measure to check.
> ****
> Add 1 element: allocate a new array of size+1; copy size elements from the
> old array, add
> the new element to the end. Expoential peformance; the time goes up as
> approximately the
> square of the number of elements (do the arithmetic for the copy
> operation). Never
> believe ANYTHING you measure in Debug mode; it is irrelevant and will
> always be horrible.
> This is because there are bounds checks and new-storage-iniitailization
> that take time.
> Most paging performance is irrelevant unless your vector gets into the
> hundreds of
> megabytes in size, at which point you definitely have the wrong
> representation.
> ****
>>
>>Would the release version be much faster than the debug version (less
>>checking for bounds and other stuff)?
> ****
> Absolutely! See previous comment; you CANNOT rely on "Debug" measurements
> for true
> performance guarantees.
> ****
>>Is there a way (with task manager
>>maybe) to tell if the CArray is staying in memory or being swapped out to
>>disk? Is there a way to force the CArray to stay in memory and not swap
>>out?
>>Any guess why it might be so slow?
> ****
> Watch for increases in paging faults using the perfmon tool. A
> substantial increase in
> paging frequency is a Bad Sign. Note that mking your algorithm
> cache-aware can generally
> improve performance by 10X, but you need to know the cache parameters for
> each computer
> (which you can discover) to write code like this, and it doesn't always
> work due to the
> nature of the particular algorithm involved.
> ****
>>
>>"you have made a fundaemental failure in thinking here"
>>
>>Maybe so, but it turns out that the CList is much faster than the CArray,
>>in
>>my program at least. Too bad it takes four times the memory and I run out
>>before I can reach 2^32. Also, it takes much longer to clear out the CList
>>than the CArray, which is logical since it's made up of many tiny parts
>>instead of a few large parts.
> ****
> But tha fundamental failure in thinking is thinking that the extra space
> actually matters.
> It may not. Note that while adding to an array with SetSize(..., 0) (the
> default), is
> exponential in time, adding to a list is constant time, so burning the
> extra space would
> not matter. Unless you exceed the working set size, at which point, it
> becomes a paging
> problem, or if the elements are widely scattered in memory, or the storage
> allocator
> becomes the performance bottleneck (which it can). If you anticipate
> 1,000,000 elements,
> do a SetSize(0, 1000000); when the array is empty; then the array append
> is constant time
> up to 1,000,000 elements also, and at 1,000,001 there is a blip i n the
> performance as the
> array is reallocated to size 2,000,000 and then time is constant again to
> 2,000,000. IF
> you are not doing the SetSize, then your assumption are all broken and you
> will get
> exponential cost. So you cannot
> (a) predict performance if you are in Debug mode
> (b) predict performance if you have not forced preallocation (SetSize)
> which is handled "by magic" in std::vector
> You have apparently committed both sins here, so your choices don't have a
> sound basis.
> joe
> ****
>>
>>I found another issue with CList which I think I would consider a bug.
>>When
>>I Serialize back from disk into a CArray, anything in the array is wiped
>>out
>>and overwritten. When I do that into the CList, the disk data is appended
>>to
>>the end of what is in the list. I didn't see that described in the help
>>but
>>I also didn't find any mention of this "bug" in a quick Google search.
> ****
> No, serialization is intended to replace the contents, so you are seeing
> the specified
> behavior. Appending was never considered as a part of the design. In
> fact, it would have
> been considered a bug by the designers if serialization appended.
> ****
>>
>>"I wouldn't touch MFC serialization as far as I could throw a mainframe."
>>
>>Ha ha, how far can you throw a mainframe? I am learning to agree with you
>>on
>>this. I don't think I ever used it before. I started to use it this time
>>because it is much faster and uses less space than saving and loading in
>>text format. But if it's not trustworthy then it's not worth it.
> ****
> Given that I'm actually physically handicapped, and probably couldn't
> throw a mere laptop
> very far, throwing a 1-ton mainframe isn't likely. Probably measured in
> nanometers. The
> real problem is schema evolution, a problem I had to address when I
> designed what is a lot
> like XML back in 1977 (except our design was far better). Today, I use
> XML for my data
> representation.
> *****
>>
>>"Note that most prime algorithms are variants of the Seive of
>>Eastosthenes"
>>
>>Yes, I'm using that. My interest in playing with primes was triggered by
>>trying to learn about encryption and modular arithmetic used in RSA. But
>>my
>>first program ever was a prime number generator running on some HP Basic
>>computer back in the early 70's, so it's interesting to me to bring that
>>up
>>to date. My program also can run the old way by dividing by primes until
>>the
>>sqrt of the number under test. I think my original Basic program divided
>>by
>>all odd numbers because there wasn't enough memory to store the array of
>>primes on that old computer.
> ****
> Note that RSA wants to use massive prime numbers (which are almost
> impossible to factor).
> I have been talking with a colleague who is an encryption guy and we are
> talking about
> developing a course in encryption when we would use RSA-8 (8-bit primes)
> so students could
> work out the encryption with just a calculator or even pencil and paper
> and then show why
> short key are vulnerable by asking them to work out encrypted data (since
> 8-bit primes
> would be easy to crack), to give some real feel for the algorithms. He
> has taken a course
> from Ron Rivest (the R of RSA) and is really into this stuff.
> ****
>>
>>Actually the sieve technique should scale to large integers (if you mean
>>the
>>64 bit type and not arbitrarily large) once one has an array of primes to
>>2^32 and if the sieving is done in segments.
> ****
> It doesn't scale; note that modern encryption might use 1024, 2048 or
> 4096-bit primes, and
> facroting these with moden computers and moden algorithms takes intervals
> such that the
> Sun is likely to expand into a Red Giant and engulf the Earth before the
> algorithm
> terminates.
> joe
> ****
>>
>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
>>> See below...
>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>> wrote:
>>>
>>>>I am working on a program to calculate primes. It's just for fun, not
>>>>for
>>>>work or school, if that matters here. I'm trying to optimize it and I'll
>>>>have some more questions about that later.
>>>>
>>>>Today I changed the CArray used to hold the primes to a CList based on a
>>>>compile time selection, figuring I would save the time of reallocating
>>>>and
>>>>copying the CArray every time I went over the maximum allocated size.
>>>>I'm
>>>>aware of the pros and cons of CArray and CList in general.
>>> ****
>>> Your choice of CArray is probably a Bad Idea. First, you should SetSize
>>> and set an
>>> explicit reallocation quantum (the second parameter of SetSize) to
>>> minimize the amount of
>>> reallocation that is done. Or use std::vector which does this
>>> implicitly,
>>> and had far,
>>> far better performance than CArray.
>>> ****
>>>>A disadvantage is
>>>>that the CList of primes takes (about) four time as much memory as the
>>>>CArray holding the same number of primes. The primes are 32 bit ints so
>>>>I
>>>>don't know why it's four times instead of three times (previous pointer,
>>>>prime and next pointer). What else is in there?
>>> ****
>>> you have made a fundaemental failure in thinking here. You think, for
>>> example, that the
>>> size of any elmeent matters, which is the wrong way to think about the
>>> problem. What
>>> matter is the memory footprint, and the induced paging, which is
>>> potentially far worse in
>>> a CLIst. Note that most prime algorithms are variants of the Seive of
>>> Eastosthenes, which
>>> means they don't scal to large integer (such as are using for
>>> encryption)
>>> and you can find
>>> enumerations of all the primes in the range of 0..2**32-1 on the Web and
>>> in reference
>>> books. LIss are nearly always less efficient than arrays due to
>>> locality-of-reference
>>> issues that affect cache behavior and paging.
>>> *****
>>>>
>>>>I tried something that I wasn't sure would work but it did. I serialized
>>>>a
>>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>store
>>>>the next and previous CList pointers into a serialize archive. My
>>>>question
>>>>here is, does anyone know if these were made intentionally compatible
>>>>and
>>>>if
>>>>I can count on that working in all cases?
>>> ****
>>> I wouldn't touch MFC serialization as far as I could throw a mainframe.
>>> I
>>> would not use
>>> it for anything I cared about.
>>> joe
>>> ****
>>>>
>>>>Thanks...
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer@flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Bill
|
3/18/2010 1:21:28 AM
|
|
Sorry, because I can't see clearly, I think I sent a reply by hitting the wrong keystroke
combinattion....
What I had said was that allocation requires extra overhead. What I was about to say was
that the most efficient representation would be a bit vector of bits that were 1 for a
prime and 0 for a non-prime; this would require 2^32 bits, or 2^29 bytes, If you further
apply RLE (run length encoding) to this bit vector, you get an even more compact
representation which can be optimized by treating the special case of a 1-bit surrounded
by a lot of 0 bits with a separate encoding. I would consider examining this over using a
vector of primes.
joe
More below...
On Thu, 18 Mar 2010 09:21:28 +0800, "Bill Brehm" <don't want spam> wrote:
>"Someone one complained that they could only add about 5000 elements/second
>to CArray"
>
>I am well aware of the growby parameter of SetSize() and have had it in my
>program from the start. Yet CArray has an odd behavior in my program that
>doesn't follow what one would expect knowing how the growby parameter is
>supposed to make CArray work.
>
>I did some tests with my program using both debug mode and release mode.
>Also I varied the growby parameter. Lastly, I can store the primes in a
>CArray, a CList or a preallocated array of unsigned ints (needs just over
>200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for all
>timing results.
>
>Results for debug mode:
>
>int[] - around 43000 primes per second.
>CList - around 39000 primes per second. The slightly lower speed is probably
>due to run time allocation of the memory for new elements.
>CArray, growby 100,000 - start at around 35000 pps then tapers down to 16000
>pps after 400,000 primes have been calculated. suddenly drops to 8000 pps at
>around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
>calculated. one might expect it to slow down as it must reallocate and copy
>an ever growing array ten times during this test. I don't understand the
>sudden drop at 500,000. I know that displaying the results on the GUI takes
>a lot of time, so I only display every 1000th prime. The timing will include
>the reallocation time but only once in every 100,000 primes generated and
>therefore only once in ever 100 display updates.
>CArray, growby 100 - start at around 40000 pps then tapers down to 8000 pps
>after 500,000 primes have been calculated then tapers down to 2700 pps
>around 1,000,000 primes calculated. Not surprising that it's slower with
>more reallocations.
>Here where it gets weird.
>CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow to
>wait for 1,000,000 primes to be generated to see the speed there. At 245,000
>primes, the speed is the same. There is a longer delay at the start to
>allocate the memory.
***
In case the last message didn't make it, what I indicated here was the first element
involves adding 10,000,000 elements, which is going to badly skew your first average. If
you add "1" or "3" outside the measurement interval to factor that startup transient out
of your measurements you might get a more interesting (and less biased) result. It all
depends on what you are trying to measure.
****
>
>The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
>primes in the first few passes. The sieve runs very quickly. The time being
>measured at such low speeds is basically the time to copy data into the
>CArray.
>
>Results for release mode:
>
>int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
>start and still at 1,000,000 primes and even at 3,000,000 primes. CList
>takes a long time to give back the memory when I remove all the elements
>from it.
>CArray with growby 100 - around 57,000 pps to start and tapers down to
>13,000 at 1,000,000 primes. This is not surprising as there will be 10
>reallocations for every time the timing update is displayed and so the
>reallocation time is included in the timing calculation.
>
>So I would conclude that in release mode, everything works logically. In
>debug mode, I might have blamed page faults as a reason to slow down with a
>large CArray growby, but then why doesn't it affect the int[] which is the
>same size and allocated in one big chunk like the CArray is. I suspect there
>is something funny (too much overhead) in CArray in debug mode for large
>growby sizes.
****
There is a lot of initialization that goes on in debug mode that you don't get in release
mode. This can result in a lot of paging and in a lot of data copies of the
initialization values. A big blip in the radar. Measure the time it takes to add the
first element to that huge growby array and you might want to factor that out.
****
>
>"But tha fundamental failure in thinking is thinking that the extra space
>actually matters. It may not."
>
>It does in my case because I reach the 2GB limit before I reach the end of
>the primes that I want to calculate up to 2^32. There are 200,000,000+
>primes in that set and at 4 bytes per prime for CArray or int[], I can fit
>in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16 bytes
>per prime.
****
Plus storage allocator overhead plus allocation quantum roundup, you may be allocating
between 32 and 72 bytes per storage element. ALlocation in debug mode is much larger.
****
>
>"It doesn't scale; note that modern encryption might use 1024, 2048 or
>4096-bit primes"
>
>So you were referring to arbitrarily large numbers. when you said 'large
>integer', I thought you might be referring to LARGE_INTEGER which is only 64
>bits. Using the primes up to 2^32, I could use the sieve to calculate all
>the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
>space (on my PC) to hold them because there are somewhere between 10^15 and
>10^16 of them.
****
LARGE_INTEGER, which is only 64 bits, is too small for effective encryption. It is
possible to crack 64-bit encryption in hours to days, making it ineffective. 256-bit to
4096-bit encryption is considered state-of-the-art for PKI today. Really secure
encryption uses one-time pads generated by truly random processes (such as radioactive
decay)
****
>
>"No, serialization is intended to replace the contents, so you are seeing
>the specified
>behavior. Appending was never considered as a part of the design. In fact,
>it would have
>been considered a bug by the designers if serialization appended."
>
>In my case it IS appending. If I start out with one, two or three elements
>in the CList, after I serialize from disk, those original elements are still
>in there. I don't know why but I think and agree with you that they should
>not be.
*****
one of the reasons I don't trust serialization is that it has too many "unspecified"
features. I can write serialization code in seconds that does what I want, and I know the
specs.
joe
*****
>
>
>
>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>news:fofvp5tejd5101f2pu7974u9upb2qmim9t@4ax.com...
>> See below...
>> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam> wrote:
>>
>>>"Your choice of CArray is probably a Bad Idea."
>>>
>>>It IS turning out to be a bad idea. This was one of the things I said I
>>>would get around to asking. If I give a growby of 10000, the program runs
>>>at
>>>a reasonable speed. When I set it much higher, like 100,000,000, thinking
>>>that i'll have fewer reallocations, it runs really slowly. I would have
>>>expected the opposite until 1e8 primes were calculated, then a large delay
>>>during reallocation. But it's slow from the start.
>> ***
>> Someone one complained that they could only add about 5000 elements/second
>> to CArray. I
>> tried the same experiment, and found that indeed that was the number I was
>> getting, in
>> Debug mode. They were adding 1.5M elements to the CArray. I set the
>> expansion quantum to
>> 500,000 and measured in Release mode (which is much faster) and got 15M
>> elements/second.
>> CArray without using SetSize to modify the quantum is horrendous
>> performance,
>> approximately exponential in time with the number of elements.
>> std::vector is very close
>> to linear (I did the measurements, but never published them).
>> *****
>>>I did some
>>>PerformanceQueries and found that adding something to the CArray (when it
>>>grew by 10000) was ten times slow than doing a division as part of the
>>>prime
>>>check. I don't know if it's a paging issue or some inefficiency in CArray
>>>itself. I guess it's much worse with a large growby but i don't know why
>>>and
>>>i didn't measure to check.
>> ****
>> Add 1 element: allocate a new array of size+1; copy size elements from the
>> old array, add
>> the new element to the end. Expoential peformance; the time goes up as
>> approximately the
>> square of the number of elements (do the arithmetic for the copy
>> operation). Never
>> believe ANYTHING you measure in Debug mode; it is irrelevant and will
>> always be horrible.
>> This is because there are bounds checks and new-storage-iniitailization
>> that take time.
>> Most paging performance is irrelevant unless your vector gets into the
>> hundreds of
>> megabytes in size, at which point you definitely have the wrong
>> representation.
>> ****
>>>
>>>Would the release version be much faster than the debug version (less
>>>checking for bounds and other stuff)?
>> ****
>> Absolutely! See previous comment; you CANNOT rely on "Debug" measurements
>> for true
>> performance guarantees.
>> ****
>>>Is there a way (with task manager
>>>maybe) to tell if the CArray is staying in memory or being swapped out to
>>>disk? Is there a way to force the CArray to stay in memory and not swap
>>>out?
>>>Any guess why it might be so slow?
>> ****
>> Watch for increases in paging faults using the perfmon tool. A
>> substantial increase in
>> paging frequency is a Bad Sign. Note that mking your algorithm
>> cache-aware can generally
>> improve performance by 10X, but you need to know the cache parameters for
>> each computer
>> (which you can discover) to write code like this, and it doesn't always
>> work due to the
>> nature of the particular algorithm involved.
>> ****
>>>
>>>"you have made a fundaemental failure in thinking here"
>>>
>>>Maybe so, but it turns out that the CList is much faster than the CArray,
>>>in
>>>my program at least. Too bad it takes four times the memory and I run out
>>>before I can reach 2^32. Also, it takes much longer to clear out the CList
>>>than the CArray, which is logical since it's made up of many tiny parts
>>>instead of a few large parts.
>> ****
>> But tha fundamental failure in thinking is thinking that the extra space
>> actually matters.
>> It may not. Note that while adding to an array with SetSize(..., 0) (the
>> default), is
>> exponential in time, adding to a list is constant time, so burning the
>> extra space would
>> not matter. Unless you exceed the working set size, at which point, it
>> becomes a paging
>> problem, or if the elements are widely scattered in memory, or the storage
>> allocator
>> becomes the performance bottleneck (which it can). If you anticipate
>> 1,000,000 elements,
>> do a SetSize(0, 1000000); when the array is empty; then the array append
>> is constant time
>> up to 1,000,000 elements also, and at 1,000,001 there is a blip i n the
>> performance as the
>> array is reallocated to size 2,000,000 and then time is constant again to
>> 2,000,000. IF
>> you are not doing the SetSize, then your assumption are all broken and you
>> will get
>> exponential cost. So you cannot
>> (a) predict performance if you are in Debug mode
>> (b) predict performance if you have not forced preallocation (SetSize)
>> which is handled "by magic" in std::vector
>> You have apparently committed both sins here, so your choices don't have a
>> sound basis.
>> joe
>> ****
>>>
>>>I found another issue with CList which I think I would consider a bug.
>>>When
>>>I Serialize back from disk into a CArray, anything in the array is wiped
>>>out
>>>and overwritten. When I do that into the CList, the disk data is appended
>>>to
>>>the end of what is in the list. I didn't see that described in the help
>>>but
>>>I also didn't find any mention of this "bug" in a quick Google search.
>> ****
>> No, serialization is intended to replace the contents, so you are seeing
>> the specified
>> behavior. Appending was never considered as a part of the design. In
>> fact, it would have
>> been considered a bug by the designers if serialization appended.
>> ****
>>>
>>>"I wouldn't touch MFC serialization as far as I could throw a mainframe."
>>>
>>>Ha ha, how far can you throw a mainframe? I am learning to agree with you
>>>on
>>>this. I don't think I ever used it before. I started to use it this time
>>>because it is much faster and uses less space than saving and loading in
>>>text format. But if it's not trustworthy then it's not worth it.
>> ****
>> Given that I'm actually physically handicapped, and probably couldn't
>> throw a mere laptop
>> very far, throwing a 1-ton mainframe isn't likely. Probably measured in
>> nanometers. The
>> real problem is schema evolution, a problem I had to address when I
>> designed what is a lot
>> like XML back in 1977 (except our design was far better). Today, I use
>> XML for my data
>> representation.
>> *****
>>>
>>>"Note that most prime algorithms are variants of the Seive of
>>>Eastosthenes"
>>>
>>>Yes, I'm using that. My interest in playing with primes was triggered by
>>>trying to learn about encryption and modular arithmetic used in RSA. But
>>>my
>>>first program ever was a prime number generator running on some HP Basic
>>>computer back in the early 70's, so it's interesting to me to bring that
>>>up
>>>to date. My program also can run the old way by dividing by primes until
>>>the
>>>sqrt of the number under test. I think my original Basic program divided
>>>by
>>>all odd numbers because there wasn't enough memory to store the array of
>>>primes on that old computer.
>> ****
>> Note that RSA wants to use massive prime numbers (which are almost
>> impossible to factor).
>> I have been talking with a colleague who is an encryption guy and we are
>> talking about
>> developing a course in encryption when we would use RSA-8 (8-bit primes)
>> so students could
>> work out the encryption with just a calculator or even pencil and paper
>> and then show why
>> short key are vulnerable by asking them to work out encrypted data (since
>> 8-bit primes
>> would be easy to crack), to give some real feel for the algorithms. He
>> has taken a course
>> from Ron Rivest (the R of RSA) and is really into this stuff.
>> ****
>>>
>>>Actually the sieve technique should scale to large integers (if you mean
>>>the
>>>64 bit type and not arbitrarily large) once one has an array of primes to
>>>2^32 and if the sieving is done in segments.
>> ****
>> It doesn't scale; note that modern encryption might use 1024, 2048 or
>> 4096-bit primes, and
>> facroting these with moden computers and moden algorithms takes intervals
>> such that the
>> Sun is likely to expand into a Red Giant and engulf the Earth before the
>> algorithm
>> terminates.
>> joe
>> ****
>>>
>>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
>>>> See below...
>>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>>> wrote:
>>>>
>>>>>I am working on a program to calculate primes. It's just for fun, not
>>>>>for
>>>>>work or school, if that matters here. I'm trying to optimize it and I'll
>>>>>have some more questions about that later.
>>>>>
>>>>>Today I changed the CArray used to hold the primes to a CList based on a
>>>>>compile time selection, figuring I would save the time of reallocating
>>>>>and
>>>>>copying the CArray every time I went over the maximum allocated size.
>>>>>I'm
>>>>>aware of the pros and cons of CArray and CList in general.
>>>> ****
>>>> Your choice of CArray is probably a Bad Idea. First, you should SetSize
>>>> and set an
>>>> explicit reallocation quantum (the second parameter of SetSize) to
>>>> minimize the amount of
>>>> reallocation that is done. Or use std::vector which does this
>>>> implicitly,
>>>> and had far,
>>>> far better performance than CArray.
>>>> ****
>>>>>A disadvantage is
>>>>>that the CList of primes takes (about) four time as much memory as the
>>>>>CArray holding the same number of primes. The primes are 32 bit ints so
>>>>>I
>>>>>don't know why it's four times instead of three times (previous pointer,
>>>>>prime and next pointer). What else is in there?
>>>> ****
>>>> you have made a fundaemental failure in thinking here. You think, for
>>>> example, that the
>>>> size of any elmeent matters, which is the wrong way to think about the
>>>> problem. What
>>>> matter is the memory footprint, and the induced paging, which is
>>>> potentially far worse in
>>>> a CLIst. Note that most prime algorithms are variants of the Seive of
>>>> Eastosthenes, which
>>>> means they don't scal to large integer (such as are using for
>>>> encryption)
>>>> and you can find
>>>> enumerations of all the primes in the range of 0..2**32-1 on the Web and
>>>> in reference
>>>> books. LIss are nearly always less efficient than arrays due to
>>>> locality-of-reference
>>>> issues that affect cache behavior and paging.
>>>> *****
>>>>>
>>>>>I tried something that I wasn't sure would work but it did. I serialized
>>>>>a
>>>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>>store
>>>>>the next and previous CList pointers into a serialize archive. My
>>>>>question
>>>>>here is, does anyone know if these were made intentionally compatible
>>>>>and
>>>>>if
>>>>>I can count on that working in all cases?
>>>> ****
>>>> I wouldn't touch MFC serialization as far as I could throw a mainframe.
>>>> I
>>>> would not use
>>>> it for anything I cared about.
>>>> joe
>>>> ****
>>>>>
>>>>>Thanks...
>>>>>
>>>> Joseph M. Newcomer [MVP]
>>>> email: newcomer@flounder.com
>>>> Web: http://www.flounder.com
>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer@flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
3/18/2010 5:43:23 PM
|
|
Hi Joe,
I had considered using the bit vector idea idea, but didn't proceed with it
because I was concerned about the amount of time it would take to find for
the next prime compared to just indexing into an array. But the bit vector
needs only 500MB for primes up to 2^32 while the array of unsigned ints
needs ~800MB. Furthermore, the bit vector of boolean bits can serve as the
sieve itself too, meaning I don't need two separate memory structures and
don't need to copy the found primes from the sieve over to the store.What I
hadn't thought of is compressing with RLE. That's a good idea and I might
just play around with both of these later.
I've acheived the goal of calculating the primes up to 2^32. What I started
out with, using division, using CArray and using debug mode, would have
taken months to complete the calculations. What I have now, using the sieve,
using unsigned int[] and, well, still using debug mode because it's fast
enough after the other two bottlenecks were found, can do the whole job in
an hour or so. Thanks for the tips, ideas, discussions and help in finding
those bottlenecks.
I think I'll move on to writing my own class to handle arbitrarily large
integers and see where I want to go from there. I won't be trying to
calculate all the primes up to any number anymore, but I can use it for
other prime, modulo abd encryption experiments.
There was one more question related to speed that I haven't asked yet. When
the program is running full speed, the task manager shows it's only using
49 - 51 percent of the CPU. The idle process is using most of the rest. I
did have a YieldTime() call in the loop to peek and process Windows messages
to keep the GUI active. But when I remove that it went to to 39%
utilization. Setting the priority to real time to high or realtime didn't
help. Do you know why it's not making full use of the CPU and how to force
that? My processor does have two CPUs, but the affinity is set to both and
all the CPU usage numbers add up to 100%, not 200%, so I believe the 50% is
not due to only running on one of the two processors.
Thanks,
Bill
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:vro4q5t923qa256ej597fbkl88fg0dot6t@4ax.com...
> Sorry, because I can't see clearly, I think I sent a reply by hitting the
> wrong keystroke
> combinattion....
>
> What I had said was that allocation requires extra overhead. What I was
> about to say was
> that the most efficient representation would be a bit vector of bits that
> were 1 for a
> prime and 0 for a non-prime; this would require 2^32 bits, or 2^29 bytes,
> If you further
> apply RLE (run length encoding) to this bit vector, you get an even more
> compact
> representation which can be optimized by treating the special case of a
> 1-bit surrounded
> by a lot of 0 bits with a separate encoding. I would consider examining
> this over using a
> vector of primes.
> joe
>
> More below...
>
> On Thu, 18 Mar 2010 09:21:28 +0800, "Bill Brehm" <don't want spam> wrote:
>
>>"Someone one complained that they could only add about 5000
>>elements/second
>>to CArray"
>>
>>I am well aware of the growby parameter of SetSize() and have had it in my
>>program from the start. Yet CArray has an odd behavior in my program that
>>doesn't follow what one would expect knowing how the growby parameter is
>>supposed to make CArray work.
>>
>>I did some tests with my program using both debug mode and release mode.
>>Also I varied the growby parameter. Lastly, I can store the primes in a
>>CArray, a CList or a preallocated array of unsigned ints (needs just over
>>200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for
>>all
>>timing results.
>>
>>Results for debug mode:
>>
>>int[] - around 43000 primes per second.
>>CList - around 39000 primes per second. The slightly lower speed is
>>probably
>>due to run time allocation of the memory for new elements.
>>CArray, growby 100,000 - start at around 35000 pps then tapers down to
>>16000
>>pps after 400,000 primes have been calculated. suddenly drops to 8000 pps
>>at
>>around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
>>calculated. one might expect it to slow down as it must reallocate and
>>copy
>>an ever growing array ten times during this test. I don't understand the
>>sudden drop at 500,000. I know that displaying the results on the GUI
>>takes
>>a lot of time, so I only display every 1000th prime. The timing will
>>include
>>the reallocation time but only once in every 100,000 primes generated and
>>therefore only once in ever 100 display updates.
>>CArray, growby 100 - start at around 40000 pps then tapers down to 8000
>>pps
>>after 500,000 primes have been calculated then tapers down to 2700 pps
>>around 1,000,000 primes calculated. Not surprising that it's slower with
>>more reallocations.
>>Here where it gets weird.
>>CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow
>>to
>>wait for 1,000,000 primes to be generated to see the speed there. At
>>245,000
>>primes, the speed is the same. There is a longer delay at the start to
>>allocate the memory.
> ***
> In case the last message didn't make it, what I indicated here was the
> first element
> involves adding 10,000,000 elements, which is going to badly skew your
> first average. If
> you add "1" or "3" outside the measurement interval to factor that startup
> transient out
> of your measurements you might get a more interesting (and less biased)
> result. It all
> depends on what you are trying to measure.
> ****
>>
>>The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
>>primes in the first few passes. The sieve runs very quickly. The time
>>being
>>measured at such low speeds is basically the time to copy data into the
>>CArray.
>>
>>Results for release mode:
>>
>>int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
>>start and still at 1,000,000 primes and even at 3,000,000 primes. CList
>>takes a long time to give back the memory when I remove all the elements
>>from it.
>>CArray with growby 100 - around 57,000 pps to start and tapers down to
>>13,000 at 1,000,000 primes. This is not surprising as there will be 10
>>reallocations for every time the timing update is displayed and so the
>>reallocation time is included in the timing calculation.
>>
>>So I would conclude that in release mode, everything works logically. In
>>debug mode, I might have blamed page faults as a reason to slow down with
>>a
>>large CArray growby, but then why doesn't it affect the int[] which is the
>>same size and allocated in one big chunk like the CArray is. I suspect
>>there
>>is something funny (too much overhead) in CArray in debug mode for large
>>growby sizes.
> ****
> There is a lot of initialization that goes on in debug mode that you don't
> get in release
> mode. This can result in a lot of paging and in a lot of data copies of
> the
> initialization values. A big blip in the radar. Measure the time it
> takes to add the
> first element to that huge growby array and you might want to factor that
> out.
> ****
>>
>>"But tha fundamental failure in thinking is thinking that the extra space
>>actually matters. It may not."
>>
>>It does in my case because I reach the 2GB limit before I reach the end of
>>the primes that I want to calculate up to 2^32. There are 200,000,000+
>>primes in that set and at 4 bytes per prime for CArray or int[], I can fit
>>in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16
>>bytes
>>per prime.
> ****
> Plus storage allocator overhead plus allocation quantum roundup, you may
> be allocating
> between 32 and 72 bytes per storage element. ALlocation in debug mode is
> much larger.
> ****
>>
>>"It doesn't scale; note that modern encryption might use 1024, 2048 or
>>4096-bit primes"
>>
>>So you were referring to arbitrarily large numbers. when you said 'large
>>integer', I thought you might be referring to LARGE_INTEGER which is only
>>64
>>bits. Using the primes up to 2^32, I could use the sieve to calculate all
>>the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
>>space (on my PC) to hold them because there are somewhere between 10^15
>>and
>>10^16 of them.
> ****
> LARGE_INTEGER, which is only 64 bits, is too small for effective
> encryption. It is
> possible to crack 64-bit encryption in hours to days, making it
> ineffective. 256-bit to
> 4096-bit encryption is considered state-of-the-art for PKI today. Really
> secure
> encryption uses one-time pads generated by truly random processes (such as
> radioactive
> decay)
> ****
>>
>>"No, serialization is intended to replace the contents, so you are seeing
>>the specified
>>behavior. Appending was never considered as a part of the design. In
>>fact,
>>it would have
>>been considered a bug by the designers if serialization appended."
>>
>>In my case it IS appending. If I start out with one, two or three elements
>>in the CList, after I serialize from disk, those original elements are
>>still
>>in there. I don't know why but I think and agree with you that they should
>>not be.
> *****
> one of the reasons I don't trust serialization is that it has too many
> "unspecified"
> features. I can write serialization code in seconds that does what I
> want, and I know the
> specs.
> joe
> *****
>>
>>
>>
>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>news:fofvp5tejd5101f2pu7974u9upb2qmim9t@4ax.com...
>>> See below...
>>> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam>
>>> wrote:
>>>
>>>>"Your choice of CArray is probably a Bad Idea."
>>>>
>>>>It IS turning out to be a bad idea. This was one of the things I said I
>>>>would get around to asking. If I give a growby of 10000, the program
>>>>runs
>>>>at
>>>>a reasonable speed. When I set it much higher, like 100,000,000,
>>>>thinking
>>>>that i'll have fewer reallocations, it runs really slowly. I would have
>>>>expected the opposite until 1e8 primes were calculated, then a large
>>>>delay
>>>>during reallocation. But it's slow from the start.
>>> ***
>>> Someone one complained that they could only add about 5000
>>> elements/second
>>> to CArray. I
>>> tried the same experiment, and found that indeed that was the number I
>>> was
>>> getting, in
>>> Debug mode. They were adding 1.5M elements to the CArray. I set the
>>> expansion quantum to
>>> 500,000 and measured in Release mode (which is much faster) and got 15M
>>> elements/second.
>>> CArray without using SetSize to modify the quantum is horrendous
>>> performance,
>>> approximately exponential in time with the number of elements.
>>> std::vector is very close
>>> to linear (I did the measurements, but never published them).
>>> *****
>>>>I did some
>>>>PerformanceQueries and found that adding something to the CArray (when
>>>>it
>>>>grew by 10000) was ten times slow than doing a division as part of the
>>>>prime
>>>>check. I don't know if it's a paging issue or some inefficiency in
>>>>CArray
>>>>itself. I guess it's much worse with a large growby but i don't know why
>>>>and
>>>>i didn't measure to check.
>>> ****
>>> Add 1 element: allocate a new array of size+1; copy size elements from
>>> the
>>> old array, add
>>> the new element to the end. Expoential peformance; the time goes up as
>>> approximately the
>>> square of the number of elements (do the arithmetic for the copy
>>> operation). Never
>>> believe ANYTHING you measure in Debug mode; it is irrelevant and will
>>> always be horrible.
>>> This is because there are bounds checks and new-storage-iniitailization
>>> that take time.
>>> Most paging performance is irrelevant unless your vector gets into the
>>> hundreds of
>>> megabytes in size, at which point you definitely have the wrong
>>> representation.
>>> ****
>>>>
>>>>Would the release version be much faster than the debug version (less
>>>>checking for bounds and other stuff)?
>>> ****
>>> Absolutely! See previous comment; you CANNOT rely on "Debug"
>>> measurements
>>> for true
>>> performance guarantees.
>>> ****
>>>>Is there a way (with task manager
>>>>maybe) to tell if the CArray is staying in memory or being swapped out
>>>>to
>>>>disk? Is there a way to force the CArray to stay in memory and not swap
>>>>out?
>>>>Any guess why it might be so slow?
>>> ****
>>> Watch for increases in paging faults using the perfmon tool. A
>>> substantial increase in
>>> paging frequency is a Bad Sign. Note that mking your algorithm
>>> cache-aware can generally
>>> improve performance by 10X, but you need to know the cache parameters
>>> for
>>> each computer
>>> (which you can discover) to write code like this, and it doesn't always
>>> work due to the
>>> nature of the particular algorithm involved.
>>> ****
>>>>
>>>>"you have made a fundaemental failure in thinking here"
>>>>
>>>>Maybe so, but it turns out that the CList is much faster than the
>>>>CArray,
>>>>in
>>>>my program at least. Too bad it takes four times the memory and I run
>>>>out
>>>>before I can reach 2^32. Also, it takes much longer to clear out the
>>>>CList
>>>>than the CArray, which is logical since it's made up of many tiny parts
>>>>instead of a few large parts.
>>> ****
>>> But tha fundamental failure in thinking is thinking that the extra space
>>> actually matters.
>>> It may not. Note that while adding to an array with SetSize(..., 0)
>>> (the
>>> default), is
>>> exponential in time, adding to a list is constant time, so burning the
>>> extra space would
>>> not matter. Unless you exceed the working set size, at which point, it
>>> becomes a paging
>>> problem, or if the elements are widely scattered in memory, or the
>>> storage
>>> allocator
>>> becomes the performance bottleneck (which it can). If you anticipate
>>> 1,000,000 elements,
>>> do a SetSize(0, 1000000); when the array is empty; then the array append
>>> is constant time
>>> up to 1,000,000 elements also, and at 1,000,001 there is a blip i n the
>>> performance as the
>>> array is reallocated to size 2,000,000 and then time is constant again
>>> to
>>> 2,000,000. IF
>>> you are not doing the SetSize, then your assumption are all broken and
>>> you
>>> will get
>>> exponential cost. So you cannot
>>> (a) predict performance if you are in Debug mode
>>> (b) predict performance if you have not forced preallocation (SetSize)
>>> which is handled "by magic" in std::vector
>>> You have apparently committed both sins here, so your choices don't have
>>> a
>>> sound basis.
>>> joe
>>> ****
>>>>
>>>>I found another issue with CList which I think I would consider a bug.
>>>>When
>>>>I Serialize back from disk into a CArray, anything in the array is wiped
>>>>out
>>>>and overwritten. When I do that into the CList, the disk data is
>>>>appended
>>>>to
>>>>the end of what is in the list. I didn't see that described in the help
>>>>but
>>>>I also didn't find any mention of this "bug" in a quick Google search.
>>> ****
>>> No, serialization is intended to replace the contents, so you are seeing
>>> the specified
>>> behavior. Appending was never considered as a part of the design. In
>>> fact, it would have
>>> been considered a bug by the designers if serialization appended.
>>> ****
>>>>
>>>>"I wouldn't touch MFC serialization as far as I could throw a
>>>>mainframe."
>>>>
>>>>Ha ha, how far can you throw a mainframe? I am learning to agree with
>>>>you
>>>>on
>>>>this. I don't think I ever used it before. I started to use it this time
>>>>because it is much faster and uses less space than saving and loading in
>>>>text format. But if it's not trustworthy then it's not worth it.
>>> ****
>>> Given that I'm actually physically handicapped, and probably couldn't
>>> throw a mere laptop
>>> very far, throwing a 1-ton mainframe isn't likely. Probably measured in
>>> nanometers. The
>>> real problem is schema evolution, a problem I had to address when I
>>> designed what is a lot
>>> like XML back in 1977 (except our design was far better). Today, I use
>>> XML for my data
>>> representation.
>>> *****
>>>>
>>>>"Note that most prime algorithms are variants of the Seive of
>>>>Eastosthenes"
>>>>
>>>>Yes, I'm using that. My interest in playing with primes was triggered by
>>>>trying to learn about encryption and modular arithmetic used in RSA. But
>>>>my
>>>>first program ever was a prime number generator running on some HP Basic
>>>>computer back in the early 70's, so it's interesting to me to bring that
>>>>up
>>>>to date. My program also can run the old way by dividing by primes until
>>>>the
>>>>sqrt of the number under test. I think my original Basic program divided
>>>>by
>>>>all odd numbers because there wasn't enough memory to store the array of
>>>>primes on that old computer.
>>> ****
>>> Note that RSA wants to use massive prime numbers (which are almost
>>> impossible to factor).
>>> I have been talking with a colleague who is an encryption guy and we are
>>> talking about
>>> developing a course in encryption when we would use RSA-8 (8-bit primes)
>>> so students could
>>> work out the encryption with just a calculator or even pencil and paper
>>> and then show why
>>> short key are vulnerable by asking them to work out encrypted data
>>> (since
>>> 8-bit primes
>>> would be easy to crack), to give some real feel for the algorithms. He
>>> has taken a course
>>> from Ron Rivest (the R of RSA) and is really into this stuff.
>>> ****
>>>>
>>>>Actually the sieve technique should scale to large integers (if you mean
>>>>the
>>>>64 bit type and not arbitrarily large) once one has an array of primes
>>>>to
>>>>2^32 and if the sieving is done in segments.
>>> ****
>>> It doesn't scale; note that modern encryption might use 1024, 2048 or
>>> 4096-bit primes, and
>>> facroting these with moden computers and moden algorithms takes
>>> intervals
>>> such that the
>>> Sun is likely to expand into a Red Giant and engulf the Earth before the
>>> algorithm
>>> terminates.
>>> joe
>>> ****
>>>>
>>>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
>>>>> See below...
>>>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>>>> wrote:
>>>>>
>>>>>>I am working on a program to calculate primes. It's just for fun, not
>>>>>>for
>>>>>>work or school, if that matters here. I'm trying to optimize it and
>>>>>>I'll
>>>>>>have some more questions about that later.
>>>>>>
>>>>>>Today I changed the CArray used to hold the primes to a CList based on
>>>>>>a
>>>>>>compile time selection, figuring I would save the time of reallocating
>>>>>>and
>>>>>>copying the CArray every time I went over the maximum allocated size.
>>>>>>I'm
>>>>>>aware of the pros and cons of CArray and CList in general.
>>>>> ****
>>>>> Your choice of CArray is probably a Bad Idea. First, you should
>>>>> SetSize
>>>>> and set an
>>>>> explicit reallocation quantum (the second parameter of SetSize) to
>>>>> minimize the amount of
>>>>> reallocation that is done. Or use std::vector which does this
>>>>> implicitly,
>>>>> and had far,
>>>>> far better performance than CArray.
>>>>> ****
>>>>>>A disadvantage is
>>>>>>that the CList of primes takes (about) four time as much memory as the
>>>>>>CArray holding the same number of primes. The primes are 32 bit ints
>>>>>>so
>>>>>>I
>>>>>>don't know why it's four times instead of three times (previous
>>>>>>pointer,
>>>>>>prime and next pointer). What else is in there?
>>>>> ****
>>>>> you have made a fundaemental failure in thinking here. You think, for
>>>>> example, that the
>>>>> size of any elmeent matters, which is the wrong way to think about the
>>>>> problem. What
>>>>> matter is the memory footprint, and the induced paging, which is
>>>>> potentially far worse in
>>>>> a CLIst. Note that most prime algorithms are variants of the Seive of
>>>>> Eastosthenes, which
>>>>> means they don't scal to large integer (such as are using for
>>>>> encryption)
>>>>> and you can find
>>>>> enumerations of all the primes in the range of 0..2**32-1 on the Web
>>>>> and
>>>>> in reference
>>>>> books. LIss are nearly always less efficient than arrays due to
>>>>> locality-of-reference
>>>>> issues that affect cache behavior and paging.
>>>>> *****
>>>>>>
>>>>>>I tried something that I wasn't sure would work but it did. I
>>>>>>serialized
>>>>>>a
>>>>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>>>store
>>>>>>the next and previous CList pointers into a serialize archive. My
>>>>>>question
>>>>>>here is, does anyone know if these were made intentionally compatible
>>>>>>and
>>>>>>if
>>>>>>I can count on that working in all cases?
>>>>> ****
>>>>> I wouldn't touch MFC serialization as far as I could throw a
>>>>> mainframe.
>>>>> I
>>>>> would not use
>>>>> it for anything I cared about.
>>>>> joe
>>>>> ****
>>>>>>
>>>>>>Thanks...
>>>>>>
>>>>> Joseph M. Newcomer [MVP]
>>>>> email: newcomer@flounder.com
>>>>> Web: http://www.flounder.com
>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer@flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Bill
|
3/19/2010 5:48:25 AM
|
|
Another feature I had on my primes GUI was a CListCtrl with two columns
showing all the primes and their index or sequence number. I had diabled it
though because I found it taking a lot of time to add items into. I am using
the LVS_OWNERDATA style and handling the LVN_GETDISPINFO notification to
provide the display data. So scrolling up and down the array is no problem.
Is there any way that I haven't been able to find that will allow me to just
tell it how many items it has, rather than adding them one at a time? Adding
them one at a time seems to be time consuming like adding one element at a
time to a CArray. I really don't need the control to do anything for me
except know the number of items and handle the scrolling and ask me for the
data it needs to display. (I'm testing the timing now in debug mode by
inserting the 200million plus items into the CListCtrl anfter loading them
from disk and I think it's going to take longer than calculating the primes.
I'm also trying in release mode but after several minutes it's still adding
and I can't break the program to see the progress made so far.)
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:vro4q5t923qa256ej597fbkl88fg0dot6t@4ax.com...
> Sorry, because I can't see clearly, I think I sent a reply by hitting the
> wrong keystroke
> combinattion....
>
> What I had said was that allocation requires extra overhead. What I was
> about to say was
> that the most efficient representation would be a bit vector of bits that
> were 1 for a
> prime and 0 for a non-prime; this would require 2^32 bits, or 2^29 bytes,
> If you further
> apply RLE (run length encoding) to this bit vector, you get an even more
> compact
> representation which can be optimized by treating the special case of a
> 1-bit surrounded
> by a lot of 0 bits with a separate encoding. I would consider examining
> this over using a
> vector of primes.
> joe
>
> More below...
>
> On Thu, 18 Mar 2010 09:21:28 +0800, "Bill Brehm" <don't want spam> wrote:
>
>>"Someone one complained that they could only add about 5000
>>elements/second
>>to CArray"
>>
>>I am well aware of the growby parameter of SetSize() and have had it in my
>>program from the start. Yet CArray has an odd behavior in my program that
>>doesn't follow what one would expect knowing how the growby parameter is
>>supposed to make CArray work.
>>
>>I did some tests with my program using both debug mode and release mode.
>>Also I varied the growby parameter. Lastly, I can store the primes in a
>>CArray, a CList or a preallocated array of unsigned ints (needs just over
>>200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for
>>all
>>timing results.
>>
>>Results for debug mode:
>>
>>int[] - around 43000 primes per second.
>>CList - around 39000 primes per second. The slightly lower speed is
>>probably
>>due to run time allocation of the memory for new elements.
>>CArray, growby 100,000 - start at around 35000 pps then tapers down to
>>16000
>>pps after 400,000 primes have been calculated. suddenly drops to 8000 pps
>>at
>>around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
>>calculated. one might expect it to slow down as it must reallocate and
>>copy
>>an ever growing array ten times during this test. I don't understand the
>>sudden drop at 500,000. I know that displaying the results on the GUI
>>takes
>>a lot of time, so I only display every 1000th prime. The timing will
>>include
>>the reallocation time but only once in every 100,000 primes generated and
>>therefore only once in ever 100 display updates.
>>CArray, growby 100 - start at around 40000 pps then tapers down to 8000
>>pps
>>after 500,000 primes have been calculated then tapers down to 2700 pps
>>around 1,000,000 primes calculated. Not surprising that it's slower with
>>more reallocations.
>>Here where it gets weird.
>>CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow
>>to
>>wait for 1,000,000 primes to be generated to see the speed there. At
>>245,000
>>primes, the speed is the same. There is a longer delay at the start to
>>allocate the memory.
> ***
> In case the last message didn't make it, what I indicated here was the
> first element
> involves adding 10,000,000 elements, which is going to badly skew your
> first average. If
> you add "1" or "3" outside the measurement interval to factor that startup
> transient out
> of your measurements you might get a more interesting (and less biased)
> result. It all
> depends on what you are trying to measure.
> ****
>>
>>The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
>>primes in the first few passes. The sieve runs very quickly. The time
>>being
>>measured at such low speeds is basically the time to copy data into the
>>CArray.
>>
>>Results for release mode:
>>
>>int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
>>start and still at 1,000,000 primes and even at 3,000,000 primes. CList
>>takes a long time to give back the memory when I remove all the elements
>>from it.
>>CArray with growby 100 - around 57,000 pps to start and tapers down to
>>13,000 at 1,000,000 primes. This is not surprising as there will be 10
>>reallocations for every time the timing update is displayed and so the
>>reallocation time is included in the timing calculation.
>>
>>So I would conclude that in release mode, everything works logically. In
>>debug mode, I might have blamed page faults as a reason to slow down with
>>a
>>large CArray growby, but then why doesn't it affect the int[] which is the
>>same size and allocated in one big chunk like the CArray is. I suspect
>>there
>>is something funny (too much overhead) in CArray in debug mode for large
>>growby sizes.
> ****
> There is a lot of initialization that goes on in debug mode that you don't
> get in release
> mode. This can result in a lot of paging and in a lot of data copies of
> the
> initialization values. A big blip in the radar. Measure the time it
> takes to add the
> first element to that huge growby array and you might want to factor that
> out.
> ****
>>
>>"But tha fundamental failure in thinking is thinking that the extra space
>>actually matters. It may not."
>>
>>It does in my case because I reach the 2GB limit before I reach the end of
>>the primes that I want to calculate up to 2^32. There are 200,000,000+
>>primes in that set and at 4 bytes per prime for CArray or int[], I can fit
>>in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16
>>bytes
>>per prime.
> ****
> Plus storage allocator overhead plus allocation quantum roundup, you may
> be allocating
> between 32 and 72 bytes per storage element. ALlocation in debug mode is
> much larger.
> ****
>>
>>"It doesn't scale; note that modern encryption might use 1024, 2048 or
>>4096-bit primes"
>>
>>So you were referring to arbitrarily large numbers. when you said 'large
>>integer', I thought you might be referring to LARGE_INTEGER which is only
>>64
>>bits. Using the primes up to 2^32, I could use the sieve to calculate all
>>the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
>>space (on my PC) to hold them because there are somewhere between 10^15
>>and
>>10^16 of them.
> ****
> LARGE_INTEGER, which is only 64 bits, is too small for effective
> encryption. It is
> possible to crack 64-bit encryption in hours to days, making it
> ineffective. 256-bit to
> 4096-bit encryption is considered state-of-the-art for PKI today. Really
> secure
> encryption uses one-time pads generated by truly random processes (such as
> radioactive
> decay)
> ****
>>
>>"No, serialization is intended to replace the contents, so you are seeing
>>the specified
>>behavior. Appending was never considered as a part of the design. In
>>fact,
>>it would have
>>been considered a bug by the designers if serialization appended."
>>
>>In my case it IS appending. If I start out with one, two or three elements
>>in the CList, after I serialize from disk, those original elements are
>>still
>>in there. I don't know why but I think and agree with you that they should
>>not be.
> *****
> one of the reasons I don't trust serialization is that it has too many
> "unspecified"
> features. I can write serialization code in seconds that does what I
> want, and I know the
> specs.
> joe
> *****
>>
>>
>>
>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>news:fofvp5tejd5101f2pu7974u9upb2qmim9t@4ax.com...
>>> See below...
>>> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam>
>>> wrote:
>>>
>>>>"Your choice of CArray is probably a Bad Idea."
>>>>
>>>>It IS turning out to be a bad idea. This was one of the things I said I
>>>>would get around to asking. If I give a growby of 10000, the program
>>>>runs
>>>>at
>>>>a reasonable speed. When I set it much higher, like 100,000,000,
>>>>thinking
>>>>that i'll have fewer reallocations, it runs really slowly. I would have
>>>>expected the opposite until 1e8 primes were calculated, then a large
>>>>delay
>>>>during reallocation. But it's slow from the start.
>>> ***
>>> Someone one complained that they could only add about 5000
>>> elements/second
>>> to CArray. I
>>> tried the same experiment, and found that indeed that was the number I
>>> was
>>> getting, in
>>> Debug mode. They were adding 1.5M elements to the CArray. I set the
>>> expansion quantum to
>>> 500,000 and measured in Release mode (which is much faster) and got 15M
>>> elements/second.
>>> CArray without using SetSize to modify the quantum is horrendous
>>> performance,
>>> approximately exponential in time with the number of elements.
>>> std::vector is very close
>>> to linear (I did the measurements, but never published them).
>>> *****
>>>>I did some
>>>>PerformanceQueries and found that adding something to the CArray (when
>>>>it
>>>>grew by 10000) was ten times slow than doing a division as part of the
>>>>prime
>>>>check. I don't know if it's a paging issue or some inefficiency in
>>>>CArray
>>>>itself. I guess it's much worse with a large growby but i don't know why
>>>>and
>>>>i didn't measure to check.
>>> ****
>>> Add 1 element: allocate a new array of size+1; copy size elements from
>>> the
>>> old array, add
>>> the new element to the end. Expoential peformance; the time goes up as
>>> approximately the
>>> square of the number of elements (do the arithmetic for the copy
>>> operation). Never
>>> believe ANYTHING you measure in Debug mode; it is irrelevant and will
>>> always be horrible.
>>> This is because there are bounds checks and new-storage-iniitailization
>>> that take time.
>>> Most paging performance is irrelevant unless your vector gets into the
>>> hundreds of
>>> megabytes in size, at which point you definitely have the wrong
>>> representation.
>>> ****
>>>>
>>>>Would the release version be much faster than the debug version (less
>>>>checking for bounds and other stuff)?
>>> ****
>>> Absolutely! See previous comment; you CANNOT rely on "Debug"
>>> measurements
>>> for true
>>> performance guarantees.
>>> ****
>>>>Is there a way (with task manager
>>>>maybe) to tell if the CArray is staying in memory or being swapped out
>>>>to
>>>>disk? Is there a way to force the CArray to stay in memory and not swap
>>>>out?
>>>>Any guess why it might be so slow?
>>> ****
>>> Watch for increases in paging faults using the perfmon tool. A
>>> substantial increase in
>>> paging frequency is a Bad Sign. Note that mking your algorithm
>>> cache-aware can generally
>>> improve performance by 10X, but you need to know the cache parameters
>>> for
>>> each computer
>>> (which you can discover) to write code like this, and it doesn't always
>>> work due to the
>>> nature of the particular algorithm involved.
>>> ****
>>>>
>>>>"you have made a fundaemental failure in thinking here"
>>>>
>>>>Maybe so, but it turns out that the CList is much faster than the
>>>>CArray,
>>>>in
>>>>my program at least. Too bad it takes four times the memory and I run
>>>>out
>>>>before I can reach 2^32. Also, it takes much longer to clear out the
>>>>CList
>>>>than the CArray, which is logical since it's made up of many tiny parts
>>>>instead of a few large parts.
>>> ****
>>> But tha fundamental failure in thinking is thinking that the extra space
>>> actually matters.
>>> It may not. Note that while adding to an array with SetSize(..., 0)
>>> (the
>>> default), is
>>> exponential in time, adding to a list is constant time, so burning the
>>> extra space would
>>> not matter. Unless you exceed the working set size, at which point, it
>>> becomes a paging
>>> problem, or if the elements are widely scattered in memory, or the
>>> storage
>>> allocator
>>> becomes the performance bottleneck (which it can). If you anticipate
>>> 1,000,000 elements,
>>> do a SetSize(0, 1000000); when the array is empty; then the array append
>>> is constant time
>>> up to 1,000,000 elements also, and at 1,000,001 there is a blip i n the
>>> performance as the
>>> array is reallocated to size 2,000,000 and then time is constant again
>>> to
>>> 2,000,000. IF
>>> you are not doing the SetSize, then your assumption are all broken and
>>> you
>>> will get
>>> exponential cost. So you cannot
>>> (a) predict performance if you are in Debug mode
>>> (b) predict performance if you have not forced preallocation (SetSize)
>>> which is handled "by magic" in std::vector
>>> You have apparently committed both sins here, so your choices don't have
>>> a
>>> sound basis.
>>> joe
>>> ****
>>>>
>>>>I found another issue with CList which I think I would consider a bug.
>>>>When
>>>>I Serialize back from disk into a CArray, anything in the array is wiped
>>>>out
>>>>and overwritten. When I do that into the CList, the disk data is
>>>>appended
>>>>to
>>>>the end of what is in the list. I didn't see that described in the help
>>>>but
>>>>I also didn't find any mention of this "bug" in a quick Google search.
>>> ****
>>> No, serialization is intended to replace the contents, so you are seeing
>>> the specified
>>> behavior. Appending was never considered as a part of the design. In
>>> fact, it would have
>>> been considered a bug by the designers if serialization appended.
>>> ****
>>>>
>>>>"I wouldn't touch MFC serialization as far as I could throw a
>>>>mainframe."
>>>>
>>>>Ha ha, how far can you throw a mainframe? I am learning to agree with
>>>>you
>>>>on
>>>>this. I don't think I ever used it before. I started to use it this time
>>>>because it is much faster and uses less space than saving and loading in
>>>>text format. But if it's not trustworthy then it's not worth it.
>>> ****
>>> Given that I'm actually physically handicapped, and probably couldn't
>>> throw a mere laptop
>>> very far, throwing a 1-ton mainframe isn't likely. Probably measured in
>>> nanometers. The
>>> real problem is schema evolution, a problem I had to address when I
>>> designed what is a lot
>>> like XML back in 1977 (except our design was far better). Today, I use
>>> XML for my data
>>> representation.
>>> *****
>>>>
>>>>"Note that most prime algorithms are variants of the Seive of
>>>>Eastosthenes"
>>>>
>>>>Yes, I'm using that. My interest in playing with primes was triggered by
>>>>trying to learn about encryption and modular arithmetic used in RSA. But
>>>>my
>>>>first program ever was a prime number generator running on some HP Basic
>>>>computer back in the early 70's, so it's interesting to me to bring that
>>>>up
>>>>to date. My program also can run the old way by dividing by primes until
>>>>the
>>>>sqrt of the number under test. I think my original Basic program divided
>>>>by
>>>>all odd numbers because there wasn't enough memory to store the array of
>>>>primes on that old computer.
>>> ****
>>> Note that RSA wants to use massive prime numbers (which are almost
>>> impossible to factor).
>>> I have been talking with a colleague who is an encryption guy and we are
>>> talking about
>>> developing a course in encryption when we would use RSA-8 (8-bit primes)
>>> so students could
>>> work out the encryption with just a calculator or even pencil and paper
>>> and then show why
>>> short key are vulnerable by asking them to work out encrypted data
>>> (since
>>> 8-bit primes
>>> would be easy to crack), to give some real feel for the algorithms. He
>>> has taken a course
>>> from Ron Rivest (the R of RSA) and is really into this stuff.
>>> ****
>>>>
>>>>Actually the sieve technique should scale to large integers (if you mean
>>>>the
>>>>64 bit type and not arbitrarily large) once one has an array of primes
>>>>to
>>>>2^32 and if the sieving is done in segments.
>>> ****
>>> It doesn't scale; note that modern encryption might use 1024, 2048 or
>>> 4096-bit primes, and
>>> facroting these with moden computers and moden algorithms takes
>>> intervals
>>> such that the
>>> Sun is likely to expand into a Red Giant and engulf the Earth before the
>>> algorithm
>>> terminates.
>>> joe
>>> ****
>>>>
>>>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
>>>>> See below...
>>>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>>>> wrote:
>>>>>
>>>>>>I am working on a program to calculate primes. It's just for fun, not
>>>>>>for
>>>>>>work or school, if that matters here. I'm trying to optimize it and
>>>>>>I'll
>>>>>>have some more questions about that later.
>>>>>>
>>>>>>Today I changed the CArray used to hold the primes to a CList based on
>>>>>>a
>>>>>>compile time selection, figuring I would save the time of reallocating
>>>>>>and
>>>>>>copying the CArray every time I went over the maximum allocated size.
>>>>>>I'm
>>>>>>aware of the pros and cons of CArray and CList in general.
>>>>> ****
>>>>> Your choice of CArray is probably a Bad Idea. First, you should
>>>>> SetSize
>>>>> and set an
>>>>> explicit reallocation quantum (the second parameter of SetSize) to
>>>>> minimize the amount of
>>>>> reallocation that is done. Or use std::vector which does this
>>>>> implicitly,
>>>>> and had far,
>>>>> far better performance than CArray.
>>>>> ****
>>>>>>A disadvantage is
>>>>>>that the CList of primes takes (about) four time as much memory as the
>>>>>>CArray holding the same number of primes. The primes are 32 bit ints
>>>>>>so
>>>>>>I
>>>>>>don't know why it's four times instead of three times (previous
>>>>>>pointer,
>>>>>>prime and next pointer). What else is in there?
>>>>> ****
>>>>> you have made a fundaemental failure in thinking here. You think, for
>>>>> example, that the
>>>>> size of any elmeent matters, which is the wrong way to think about the
>>>>> problem. What
>>>>> matter is the memory footprint, and the induced paging, which is
>>>>> potentially far worse in
>>>>> a CLIst. Note that most prime algorithms are variants of the Seive of
>>>>> Eastosthenes, which
>>>>> means they don't scal to large integer (such as are using for
>>>>> encryption)
>>>>> and you can find
>>>>> enumerations of all the primes in the range of 0..2**32-1 on the Web
>>>>> and
>>>>> in reference
>>>>> books. LIss are nearly always less efficient than arrays due to
>>>>> locality-of-reference
>>>>> issues that affect cache behavior and paging.
>>>>> *****
>>>>>>
>>>>>>I tried something that I wasn't sure would work but it did. I
>>>>>>serialized
>>>>>>a
>>>>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>>>store
>>>>>>the next and previous CList pointers into a serialize archive. My
>>>>>>question
>>>>>>here is, does anyone know if these were made intentionally compatible
>>>>>>and
>>>>>>if
>>>>>>I can count on that working in all cases?
>>>>> ****
>>>>> I wouldn't touch MFC serialization as far as I could throw a
>>>>> mainframe.
>>>>> I
>>>>> would not use
>>>>> it for anything I cared about.
>>>>> joe
>>>>> ****
>>>>>>
>>>>>>Thanks...
>>>>>>
>>>>> Joseph M. Newcomer [MVP]
>>>>> email: newcomer@flounder.com
>>>>> Web: http://www.flounder.com
>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer@flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Bill
|
3/19/2010 7:13:42 AM
|
|
Bill wrote:
> There was one more question related to speed that I haven't asked yet. When
> the program is running full speed, the task manager shows it's only using
> 49 - 51 percent of the CPU. The idle process is using most of the rest.
>... My processor does have two CPUs, but the affinity is set to both and
> all the CPU usage numbers add up to 100%, not 200%, so I believe the 50% is
> not due to only running on one of the two processors.
I think you are seeing your calculations being done single-threaded (i
e, on only one of the cores). You might want to modify your pgm to be
multi-threaded (2 worker threads, for example). It is an interesting
question how to divide the work when there is a single bit vector that
both threads need to access.
As far as the memory, using a bit vector should only require 2^32/8/2
bytes, since you know there are no even primes >2.
|
|
0
|
|
|
|
Reply
|
Woody
|
3/19/2010 7:25:09 AM
|
|
Woody,
"only require 2^32/8/2"
You are right. I thought of that after sending when I was thinking what to
do next in the project, but didn't think it worth making another post to
correct.
About the 50%. If you are right then it's already running at maximum. I
should find a quad core and see if it says 25% only. But if you are right,
then why wouldn't all the numbers add up to 200% instead of 100%?
I hust tried running two copies. At first they started out sharing the ~50%,
but eventually they settled out sharing ~100%. Four copies settled out to
sharing 100% too - about 25% each. But system idle time went down to zero,
so the total still adds up to 100%, not 100% for each core.
I'm not sure what to conclude.
Bill
"Woody" <ols6000@sbcglobal.net> wrote in message
news:5c3c5b21-0bcb-4588-8da9-459ff368e082@z35g2000yqd.googlegroups.com...
> Bill wrote:
>> There was one more question related to speed that I haven't asked yet.
>> When
>> the program is running full speed, the task manager shows it's only using
>> 49 - 51 percent of the CPU. The idle process is using most of the rest.
>>... My processor does have two CPUs, but the affinity is set to both and
>> all the CPU usage numbers add up to 100%, not 200%, so I believe the 50%
>> is
>> not due to only running on one of the two processors.
>
> I think you are seeing your calculations being done single-threaded (i
> e, on only one of the cores). You might want to modify your pgm to be
> multi-threaded (2 worker threads, for example). It is an interesting
> question how to divide the work when there is a single bit vector that
> both threads need to access.
>
> As far as the memory, using a bit vector should only require 2^32/8/2
> bytes, since you know there are no even primes >2.
|
|
0
|
|
|
|
Reply
|
Bill
|
3/19/2010 9:21:20 AM
|
|
--snip--
The total system utilization can only be up to 100% regardless of cores.
Thats the way system defines it. In quad core one core delivers 25% of
performance.
ismo
|
|
0
|
|
|
|
Reply
|
Ismo
|
3/19/2010 11:44:33 AM
|
|
Ther are two ways to handle this:
locking the entire structure (the naive approach)
Use the InterlockedOr opertion to set a bit in a single DWORD in a multiprocessor-safe
fashion.
I'd use the latter method because it doesn't require any thread blocking and is handled by
the hardware. So once I figure out the DWORD in the array that needed to be changed (and
it has to be DWORD-aligned) and come up with the bit mask for the bit within that DWORD,
the InterlockedOr will safely allow multiple threads to set bits in the same DWORD without
any problem.
joe
On Fri, 19 Mar 2010 00:25:09 -0700 (PDT), Woody <ols6000@sbcglobal.net> wrote:
>Bill wrote:
>> There was one more question related to speed that I haven't asked yet. When
>> the program is running full speed, the task manager shows it's only using
>> 49 - 51 percent of the CPU. The idle process is using most of the rest.
>>... My processor does have two CPUs, but the affinity is set to both and
>> all the CPU usage numbers add up to 100%, not 200%, so I believe the 50% is
>> not due to only running on one of the two processors.
>
>I think you are seeing your calculations being done single-threaded (i
>e, on only one of the cores). You might want to modify your pgm to be
>multi-threaded (2 worker threads, for example). It is an interesting
>question how to divide the work when there is a single bit vector that
>both threads need to access.
>
>As far as the memory, using a bit vector should only require 2^32/8/2
>bytes, since you know there are no even primes >2.
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
3/19/2010 3:55:23 PM
|
|
The way loading is computed is to take the % of each core and divide by the number of
cores, so a task that uses 100% of a single core of a quad-core system, you will get "25%"
usage. If you see 100%, this means that each of the 4 cores is running 100%
(25%+25%+25+25%=100%). SO you will see "100%" only if you are using 100% of your total
computing capabilitiy, that is, 100% of each of 4 cores. And yes, the idle time goes to
0, no big surprise there.
joe
On Fri, 19 Mar 2010 17:21:20 +0800, "Bill" <<don't want more spam>> wrote:
>Woody,
>
>"only require 2^32/8/2"
>
>You are right. I thought of that after sending when I was thinking what to
>do next in the project, but didn't think it worth making another post to
>correct.
>
>About the 50%. If you are right then it's already running at maximum. I
>should find a quad core and see if it says 25% only. But if you are right,
>then why wouldn't all the numbers add up to 200% instead of 100%?
>
>I hust tried running two copies. At first they started out sharing the ~50%,
>but eventually they settled out sharing ~100%. Four copies settled out to
>sharing 100% too - about 25% each. But system idle time went down to zero,
>so the total still adds up to 100%, not 100% for each core.
>
>I'm not sure what to conclude.
>
>Bill
>
>"Woody" <ols6000@sbcglobal.net> wrote in message
>news:5c3c5b21-0bcb-4588-8da9-459ff368e082@z35g2000yqd.googlegroups.com...
>> Bill wrote:
>>> There was one more question related to speed that I haven't asked yet.
>>> When
>>> the program is running full speed, the task manager shows it's only using
>>> 49 - 51 percent of the CPU. The idle process is using most of the rest.
>>>... My processor does have two CPUs, but the affinity is set to both and
>>> all the CPU usage numbers add up to 100%, not 200%, so I believe the 50%
>>> is
>>> not due to only running on one of the two processors.
>>
>> I think you are seeing your calculations being done single-threaded (i
>> e, on only one of the cores). You might want to modify your pgm to be
>> multi-threaded (2 worker threads, for example). It is an interesting
>> question how to divide the work when there is a single bit vector that
>> both threads need to access.
>>
>> As far as the memory, using a bit vector should only require 2^32/8/2
>> bytes, since you know there are no even primes >2.
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
3/19/2010 3:58:49 PM
|
|
Bsically, you have implemented a "virtual CListCtrl". The only cost involved is adding
the value to the CArray, which we've already discussed. Any representation that lets you
derive your display information is valid, so the bitmap can work also.
joe
On Fri, 19 Mar 2010 15:13:42 +0800, "Bill" <<don't want more spam>> wrote:
>Another feature I had on my primes GUI was a CListCtrl with two columns
>showing all the primes and their index or sequence number. I had diabled it
>though because I found it taking a lot of time to add items into. I am using
>the LVS_OWNERDATA style and handling the LVN_GETDISPINFO notification to
>provide the display data. So scrolling up and down the array is no problem.
>Is there any way that I haven't been able to find that will allow me to just
>tell it how many items it has, rather than adding them one at a time? Adding
>them one at a time seems to be time consuming like adding one element at a
>time to a CArray. I really don't need the control to do anything for me
>except know the number of items and handle the scrolling and ask me for the
>data it needs to display. (I'm testing the timing now in debug mode by
>inserting the 200million plus items into the CListCtrl anfter loading them
>from disk and I think it's going to take longer than calculating the primes.
>I'm also trying in release mode but after several minutes it's still adding
>and I can't break the program to see the progress made so far.)
>
>
>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>news:vro4q5t923qa256ej597fbkl88fg0dot6t@4ax.com...
>> Sorry, because I can't see clearly, I think I sent a reply by hitting the
>> wrong keystroke
>> combinattion....
>>
>> What I had said was that allocation requires extra overhead. What I was
>> about to say was
>> that the most efficient representation would be a bit vector of bits that
>> were 1 for a
>> prime and 0 for a non-prime; this would require 2^32 bits, or 2^29 bytes,
>> If you further
>> apply RLE (run length encoding) to this bit vector, you get an even more
>> compact
>> representation which can be optimized by treating the special case of a
>> 1-bit surrounded
>> by a lot of 0 bits with a separate encoding. I would consider examining
>> this over using a
>> vector of primes.
>> joe
>>
>> More below...
>>
>> On Thu, 18 Mar 2010 09:21:28 +0800, "Bill Brehm" <don't want spam> wrote:
>>
>>>"Someone one complained that they could only add about 5000
>>>elements/second
>>>to CArray"
>>>
>>>I am well aware of the growby parameter of SetSize() and have had it in my
>>>program from the start. Yet CArray has an odd behavior in my program that
>>>doesn't follow what one would expect knowing how the growby parameter is
>>>supposed to make CArray work.
>>>
>>>I did some tests with my program using both debug mode and release mode.
>>>Also I varied the growby parameter. Lastly, I can store the primes in a
>>>CArray, a CList or a preallocated array of unsigned ints (needs just over
>>>200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for
>>>all
>>>timing results.
>>>
>>>Results for debug mode:
>>>
>>>int[] - around 43000 primes per second.
>>>CList - around 39000 primes per second. The slightly lower speed is
>>>probably
>>>due to run time allocation of the memory for new elements.
>>>CArray, growby 100,000 - start at around 35000 pps then tapers down to
>>>16000
>>>pps after 400,000 primes have been calculated. suddenly drops to 8000 pps
>>>at
>>>around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
>>>calculated. one might expect it to slow down as it must reallocate and
>>>copy
>>>an ever growing array ten times during this test. I don't understand the
>>>sudden drop at 500,000. I know that displaying the results on the GUI
>>>takes
>>>a lot of time, so I only display every 1000th prime. The timing will
>>>include
>>>the reallocation time but only once in every 100,000 primes generated and
>>>therefore only once in ever 100 display updates.
>>>CArray, growby 100 - start at around 40000 pps then tapers down to 8000
>>>pps
>>>after 500,000 primes have been calculated then tapers down to 2700 pps
>>>around 1,000,000 primes calculated. Not surprising that it's slower with
>>>more reallocations.
>>>Here where it gets weird.
>>>CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow
>>>to
>>>wait for 1,000,000 primes to be generated to see the speed there. At
>>>245,000
>>>primes, the speed is the same. There is a longer delay at the start to
>>>allocate the memory.
>> ***
>> In case the last message didn't make it, what I indicated here was the
>> first element
>> involves adding 10,000,000 elements, which is going to badly skew your
>> first average. If
>> you add "1" or "3" outside the measurement interval to factor that startup
>> transient out
>> of your measurements you might get a more interesting (and less biased)
>> result. It all
>> depends on what you are trying to measure.
>> ****
>>>
>>>The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
>>>primes in the first few passes. The sieve runs very quickly. The time
>>>being
>>>measured at such low speeds is basically the time to copy data into the
>>>CArray.
>>>
>>>Results for release mode:
>>>
>>>int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
>>>start and still at 1,000,000 primes and even at 3,000,000 primes. CList
>>>takes a long time to give back the memory when I remove all the elements
>>>from it.
>>>CArray with growby 100 - around 57,000 pps to start and tapers down to
>>>13,000 at 1,000,000 primes. This is not surprising as there will be 10
>>>reallocations for every time the timing update is displayed and so the
>>>reallocation time is included in the timing calculation.
>>>
>>>So I would conclude that in release mode, everything works logically. In
>>>debug mode, I might have blamed page faults as a reason to slow down with
>>>a
>>>large CArray growby, but then why doesn't it affect the int[] which is the
>>>same size and allocated in one big chunk like the CArray is. I suspect
>>>there
>>>is something funny (too much overhead) in CArray in debug mode for large
>>>growby sizes.
>> ****
>> There is a lot of initialization that goes on in debug mode that you don't
>> get in release
>> mode. This can result in a lot of paging and in a lot of data copies of
>> the
>> initialization values. A big blip in the radar. Measure the time it
>> takes to add the
>> first element to that huge growby array and you might want to factor that
>> out.
>> ****
>>>
>>>"But tha fundamental failure in thinking is thinking that the extra space
>>>actually matters. It may not."
>>>
>>>It does in my case because I reach the 2GB limit before I reach the end of
>>>the primes that I want to calculate up to 2^32. There are 200,000,000+
>>>primes in that set and at 4 bytes per prime for CArray or int[], I can fit
>>>in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16
>>>bytes
>>>per prime.
>> ****
>> Plus storage allocator overhead plus allocation quantum roundup, you may
>> be allocating
>> between 32 and 72 bytes per storage element. ALlocation in debug mode is
>> much larger.
>> ****
>>>
>>>"It doesn't scale; note that modern encryption might use 1024, 2048 or
>>>4096-bit primes"
>>>
>>>So you were referring to arbitrarily large numbers. when you said 'large
>>>integer', I thought you might be referring to LARGE_INTEGER which is only
>>>64
>>>bits. Using the primes up to 2^32, I could use the sieve to calculate all
>>>the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
>>>space (on my PC) to hold them because there are somewhere between 10^15
>>>and
>>>10^16 of them.
>> ****
>> LARGE_INTEGER, which is only 64 bits, is too small for effective
>> encryption. It is
>> possible to crack 64-bit encryption in hours to days, making it
>> ineffective. 256-bit to
>> 4096-bit encryption is considered state-of-the-art for PKI today. Really
>> secure
>> encryption uses one-time pads generated by truly random processes (such as
>> radioactive
>> decay)
>> ****
>>>
>>>"No, serialization is intended to replace the contents, so you are seeing
>>>the specified
>>>behavior. Appending was never considered as a part of the design. In
>>>fact,
>>>it would have
>>>been considered a bug by the designers if serialization appended."
>>>
>>>In my case it IS appending. If I start out with one, two or three elements
>>>in the CList, after I serialize from disk, those original elements are
>>>still
>>>in there. I don't know why but I think and agree with you that they should
>>>not be.
>> *****
>> one of the reasons I don't trust serialization is that it has too many
>> "unspecified"
>> features. I can write serialization code in seconds that does what I
>> want, and I know the
>> specs.
>> joe
>> *****
>>>
>>>
>>>
>>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>>news:fofvp5tejd5101f2pu7974u9upb2qmim9t@4ax.com...
>>>> See below...
>>>> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam>
>>>> wrote:
>>>>
>>>>>"Your choice of CArray is probably a Bad Idea."
>>>>>
>>>>>It IS turning out to be a bad idea. This was one of the things I said I
>>>>>would get around to asking. If I give a growby of 10000, the program
>>>>>runs
>>>>>at
>>>>>a reasonable speed. When I set it much higher, like 100,000,000,
>>>>>thinking
>>>>>that i'll have fewer reallocations, it runs really slowly. I would have
>>>>>expected the opposite until 1e8 primes were calculated, then a large
>>>>>delay
>>>>>during reallocation. But it's slow from the start.
>>>> ***
>>>> Someone one complained that they could only add about 5000
>>>> elements/second
>>>> to CArray. I
>>>> tried the same experiment, and found that indeed that was the number I
>>>> was
>>>> getting, in
>>>> Debug mode. They were adding 1.5M elements to the CArray. I set the
>>>> expansion quantum to
>>>> 500,000 and measured in Release mode (which is much faster) and got 15M
>>>> elements/second.
>>>> CArray without using SetSize to modify the quantum is horrendous
>>>> performance,
>>>> approximately exponential in time with the number of elements.
>>>> std::vector is very close
>>>> to linear (I did the measurements, but never published them).
>>>> *****
>>>>>I did some
>>>>>PerformanceQueries and found that adding something to the CArray (when
>>>>>it
>>>>>grew by 10000) was ten times slow than doing a division as part of the
>>>>>prime
>>>>>check. I don't know if it's a paging issue or some inefficiency in
>>>>>CArray
>>>>>itself. I guess it's much worse with a large growby but i don't know why
>>>>>and
>>>>>i didn't measure to check.
>>>> ****
>>>> Add 1 element: allocate a new array of size+1; copy size elements from
>>>> the
>>>> old array, add
>>>> the new element to the end. Expoential peformance; the time goes up as
>>>> approximately the
>>>> square of the number of elements (do the arithmetic for the copy
>>>> operation). Never
>>>> believe ANYTHING you measure in Debug mode; it is irrelevant and will
>>>> always be horrible.
>>>> This is because there are bounds checks and new-storage-iniitailization
>>>> that take time.
>>>> Most paging performance is irrelevant unless your vector gets into the
>>>> hundreds of
>>>> megabytes in size, at which point you definitely have the wrong
>>>> representation.
>>>> ****
>>>>>
>>>>>Would the release version be much faster than the debug version (less
>>>>>checking for bounds and other stuff)?
>>>> ****
>>>> Absolutely! See previous comment; you CANNOT rely on "Debug"
>>>> measurements
>>>> for true
>>>> performance guarantees.
>>>> ****
>>>>>Is there a way (with task manager
>>>>>maybe) to tell if the CArray is staying in memory or being swapped out
>>>>>to
>>>>>disk? Is there a way to force the CArray to stay in memory and not swap
>>>>>out?
>>>>>Any guess why it might be so slow?
>>>> ****
>>>> Watch for increases in paging faults using the perfmon tool. A
>>>> substantial increase in
>>>> paging frequency is a Bad Sign. Note that mking your algorithm
>>>> cache-aware can generally
>>>> improve performance by 10X, but you need to know the cache parameters
>>>> for
>>>> each computer
>>>> (which you can discover) to write code like this, and it doesn't always
>>>> work due to the
>>>> nature of the particular algorithm involved.
>>>> ****
>>>>>
>>>>>"you have made a fundaemental failure in thinking here"
>>>>>
>>>>>Maybe so, but it turns out that the CList is much faster than the
>>>>>CArray,
>>>>>in
>>>>>my program at least. Too bad it takes four times the memory and I run
>>>>>out
>>>>>before I can reach 2^32. Also, it takes much longer to clear out the
>>>>>CList
>>>>>than the CArray, which is logical since it's made up of many tiny parts
>>>>>instead of a few large parts.
>>>> ****
>>>> But tha fundamental failure in thinking is thinking that the extra space
>>>> actually matters.
>>>> It may not. Note that while adding to an array with SetSize(..., 0)
>>>> (the
>>>> default), is
>>>> exponential in time, adding to a list is constant time, so burning the
>>>> extra space would
>>>> not matter. Unless you exceed the working set size, at which point, it
>>>> becomes a paging
>>>> problem, or if the elements are widely scattered in memory, or the
>>>> storage
>>>> allocator
>>>> becomes the performance bottleneck (which it can). If you anticipate
>>>> 1,000,000 elements,
>>>> do a SetSize(0, 1000000); when the array is empty; then the array append
>>>> is constant time
>>>> up to 1,000,000 elements also, and at 1,000,001 there is a blip i n the
>>>> performance as the
>>>> array is reallocated to size 2,000,000 and then time is constant again
>>>> to
>>>> 2,000,000. IF
>>>> you are not doing the SetSize, then your assumption are all broken and
>>>> you
>>>> will get
>>>> exponential cost. So you cannot
>>>> (a) predict performance if you are in Debug mode
>>>> (b) predict performance if you have not forced preallocation (SetSize)
>>>> which is handled "by magic" in std::vector
>>>> You have apparently committed both sins here, so your choices don't have
>>>> a
>>>> sound basis.
>>>> joe
>>>> ****
>>>>>
>>>>>I found another issue with CList which I think I would consider a bug.
>>>>>When
>>>>>I Serialize back from disk into a CArray, anything in the array is wiped
>>>>>out
>>>>>and overwritten. When I do that into the CList, the disk data is
>>>>>appended
>>>>>to
>>>>>the end of what is in the list. I didn't see that described in the help
>>>>>but
>>>>>I also didn't find any mention of this "bug" in a quick Google search.
>>>> ****
>>>> No, serialization is intended to replace the contents, so you are seeing
>>>> the specified
>>>> behavior. Appending was never considered as a part of the design. In
>>>> fact, it would have
>>>> been considered a bug by the designers if serialization appended.
>>>> ****
>>>>>
>>>>>"I wouldn't touch MFC serialization as far as I could throw a
>>>>>mainframe."
>>>>>
>>>>>Ha ha, how far can you throw a mainframe? I am learning to agree with
>>>>>you
>>>>>on
>>>>>this. I don't think I ever used it before. I started to use it this time
>>>>>because it is much faster and uses less space than saving and loading in
>>>>>text format. But if it's not trustworthy then it's not worth it.
>>>> ****
>>>> Given that I'm actually physically handicapped, and probably couldn't
>>>> throw a mere laptop
>>>> very far, throwing a 1-ton mainframe isn't likely. Probably measured in
>>>> nanometers. The
>>>> real problem is schema evolution, a problem I had to address when I
>>>> designed what is a lot
>>>> like XML back in 1977 (except our design was far better). Today, I use
>>>> XML for my data
>>>> representation.
>>>> *****
>>>>>
>>>>>"Note that most prime algorithms are variants of the Seive of
>>>>>Eastosthenes"
>>>>>
>>>>>Yes, I'm using that. My interest in playing with primes was triggered by
>>>>>trying to learn about encryption and modular arithmetic used in RSA. But
>>>>>my
>>>>>first program ever was a prime number generator running on some HP Basic
>>>>>computer back in the early 70's, so it's interesting to me to bring that
>>>>>up
>>>>>to date. My program also can run the old way by dividing by primes until
>>>>>the
>>>>>sqrt of the number under test. I think my original Basic program divided
>>>>>by
>>>>>all odd numbers because there wasn't enough memory to store the array of
>>>>>primes on that old computer.
>>>> ****
>>>> Note that RSA wants to use massive prime numbers (which are almost
>>>> impossible to factor).
>>>> I have been talking with a colleague who is an encryption guy and we are
>>>> talking about
>>>> developing a course in encryption when we would use RSA-8 (8-bit primes)
>>>> so students could
>>>> work out the encryption with just a calculator or even pencil and paper
>>>> and then show why
>>>> short key are vulnerable by asking them to work out encrypted data
>>>> (since
>>>> 8-bit primes
>>>> would be easy to crack), to give some real feel for the algorithms. He
>>>> has taken a course
>>>> from Ron Rivest (the R of RSA) and is really into this stuff.
>>>> ****
>>>>>
>>>>>Actually the sieve technique should scale to large integers (if you mean
>>>>>the
>>>>>64 bit type and not arbitrarily large) once one has an array of primes
>>>>>to
>>>>>2^32 and if the sieving is done in segments.
>>>> ****
>>>> It doesn't scale; note that modern encryption might use 1024, 2048 or
>>>> 4096-bit primes, and
>>>> facroting these with moden computers and moden algorithms takes
>>>> intervals
>>>> such that the
>>>> Sun is likely to expand into a Red Giant and engulf the Earth before the
>>>> algorithm
>>>> terminates.
>>>> joe
>>>> ****
>>>>>
>>>>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>>>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
>>>>>> See below...
>>>>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>>>>> wrote:
>>>>>>
>>>>>>>I am working on a program to calculate primes. It's just for fun, not
>>>>>>>for
>>>>>>>work or school, if that matters here. I'm trying to optimize it and
>>>>>>>I'll
>>>>>>>have some more questions about that later.
>>>>>>>
>>>>>>>Today I changed the CArray used to hold the primes to a CList based on
>>>>>>>a
>>>>>>>compile time selection, figuring I would save the time of reallocating
>>>>>>>and
>>>>>>>copying the CArray every time I went over the maximum allocated size.
>>>>>>>I'm
>>>>>>>aware of the pros and cons of CArray and CList in general.
>>>>>> ****
>>>>>> Your choice of CArray is probably a Bad Idea. First, you should
>>>>>> SetSize
>>>>>> and set an
>>>>>> explicit reallocation quantum (the second parameter of SetSize) to
>>>>>> minimize the amount of
>>>>>> reallocation that is done. Or use std::vector which does this
>>>>>> implicitly,
>>>>>> and had far,
>>>>>> far better performance than CArray.
>>>>>> ****
>>>>>>>A disadvantage is
>>>>>>>that the CList of primes takes (about) four time as much memory as the
>>>>>>>CArray holding the same number of primes. The primes are 32 bit ints
>>>>>>>so
>>>>>>>I
>>>>>>>don't know why it's four times instead of three times (previous
>>>>>>>pointer,
>>>>>>>prime and next pointer). What else is in there?
>>>>>> ****
>>>>>> you have made a fundaemental failure in thinking here. You think, for
>>>>>> example, that the
>>>>>> size of any elmeent matters, which is the wrong way to think about the
>>>>>> problem. What
>>>>>> matter is the memory footprint, and the induced paging, which is
>>>>>> potentially far worse in
>>>>>> a CLIst. Note that most prime algorithms are variants of the Seive of
>>>>>> Eastosthenes, which
>>>>>> means they don't scal to large integer (such as are using for
>>>>>> encryption)
>>>>>> and you can find
>>>>>> enumerations of all the primes in the range of 0..2**32-1 on the Web
>>>>>> and
>>>>>> in reference
>>>>>> books. LIss are nearly always less efficient than arrays due to
>>>>>> locality-of-reference
>>>>>> issues that affect cache behavior and paging.
>>>>>> *****
>>>>>>>
>>>>>>>I tried something that I wasn't sure would work but it did. I
>>>>>>>serialized
>>>>>>>a
>>>>>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>>>>store
>>>>>>>the next and previous CList pointers into a serialize archive. My
>>>>>>>question
>>>>>>>here is, does anyone know if these were made intentionally compatible
>>>>>>>and
>>>>>>>if
>>>>>>>I can count on that working in all cases?
>>>>>> ****
>>>>>> I wouldn't touch MFC serialization as far as I could throw a
>>>>>> mainframe.
>>>>>> I
>>>>>> would not use
>>>>>> it for anything I cared about.
>>>>>> joe
>>>>>> ****
>>>>>>>
>>>>>>>Thanks...
>>>>>>>
>>>>>> Joseph M. Newcomer [MVP]
>>>>>> email: newcomer@flounder.com
>>>>>> Web: http://www.flounder.com
>>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>>>
>>>> Joseph M. Newcomer [MVP]
>>>> email: newcomer@flounder.com
>>>> Web: http://www.flounder.com
>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer@flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
3/19/2010 4:08:02 PM
|
|
I got it. One copy running goes up to 50% because it's running on one core
of two and taking up 100% of that one. When I run another copy, it might
start out in the same core but eventually Windows gets smart and switches it
to the other core and they both run at almost 100% within their respective
cores.
Thanks...
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:ul77q5l1f7kvbv6q1hlaf8hkps9pj7vlvd@4ax.com...
> The way loading is computed is to take the % of each core and divide by
> the number of
> cores, so a task that uses 100% of a single core of a quad-core system,
> you will get "25%"
> usage. If you see 100%, this means that each of the 4 cores is running
> 100%
> (25%+25%+25+25%=100%). SO you will see "100%" only if you are using 100%
> of your total
> computing capabilitiy, that is, 100% of each of 4 cores. And yes, the
> idle time goes to
> 0, no big surprise there.
> joe
>
> On Fri, 19 Mar 2010 17:21:20 +0800, "Bill" <<don't want more spam>> wrote:
>
>>Woody,
>>
>>"only require 2^32/8/2"
>>
>>You are right. I thought of that after sending when I was thinking what to
>>do next in the project, but didn't think it worth making another post to
>>correct.
>>
>>About the 50%. If you are right then it's already running at maximum. I
>>should find a quad core and see if it says 25% only. But if you are right,
>>then why wouldn't all the numbers add up to 200% instead of 100%?
>>
>>I hust tried running two copies. At first they started out sharing the
>>~50%,
>>but eventually they settled out sharing ~100%. Four copies settled out to
>>sharing 100% too - about 25% each. But system idle time went down to zero,
>>so the total still adds up to 100%, not 100% for each core.
>>
>>I'm not sure what to conclude.
>>
>>Bill
>>
>>"Woody" <ols6000@sbcglobal.net> wrote in message
>>news:5c3c5b21-0bcb-4588-8da9-459ff368e082@z35g2000yqd.googlegroups.com...
>>> Bill wrote:
>>>> There was one more question related to speed that I haven't asked yet.
>>>> When
>>>> the program is running full speed, the task manager shows it's only
>>>> using
>>>> 49 - 51 percent of the CPU. The idle process is using most of the rest.
>>>>... My processor does have two CPUs, but the affinity is set to both and
>>>> all the CPU usage numbers add up to 100%, not 200%, so I believe the
>>>> 50%
>>>> is
>>>> not due to only running on one of the two processors.
>>>
>>> I think you are seeing your calculations being done single-threaded (i
>>> e, on only one of the cores). You might want to modify your pgm to be
>>> multi-threaded (2 worker threads, for example). It is an interesting
>>> question how to divide the work when there is a single bit vector that
>>> both threads need to access.
>>>
>>> As far as the memory, using a bit vector should only require 2^32/8/2
>>> bytes, since you know there are no even primes >2.
>>
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Bill
|
3/19/2010 4:19:48 PM
|
|
After I made my post, I googled CListCtrl and discovered there is a
SetItemCount() function. (Unfortunately, it is not next to the
GetItemCount() function in the help so I overlooked it.) I tested and at
first it seemed to do nothing. So I thought it was just to preallocate some
memory to speed up the InsertItem() calls. But then I discovered that with
few enough items, like one million, I was actually getting double the number
of items than I wanted on the control. So I tested SetItemCount() again and
it worked. But when I tried to set to 200 million, I only got one. I
narrowed down and found the maximum number of items is (arbitrarily) set at
100 million. 100 million and one does not work - you get only one item in
the control (or maybe none, because I think I started out with one item in
there). After setting to 100,000,000 items, I can use InsertItem to add one
more, but no more than one. Subsequent inserts don't insert anything.
So to conclude, the speed problem can be solved by using SetItemCount()
instead ot InsertItem() many times. But the upper limit on the number of
items in a CListCtrl is one hundred million (plus one). Memory is not the
issue, because task manager shows the memory usage is the same before and
after adding the 100 million items with SetItemCount(). So the limit seems
to be arbitrary.
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:4587q5l3ousm9fhs9rv5soc0m85utfnsq5@4ax.com...
> Bsically, you have implemented a "virtual CListCtrl". The only cost
> involved is adding
> the value to the CArray, which we've already discussed. Any
> representation that lets you
> derive your display information is valid, so the bitmap can work also.
> joe
>
> On Fri, 19 Mar 2010 15:13:42 +0800, "Bill" <<don't want more spam>> wrote:
>
>>Another feature I had on my primes GUI was a CListCtrl with two columns
>>showing all the primes and their index or sequence number. I had diabled
>>it
>>though because I found it taking a lot of time to add items into. I am
>>using
>>the LVS_OWNERDATA style and handling the LVN_GETDISPINFO notification to
>>provide the display data. So scrolling up and down the array is no
>>problem.
>>Is there any way that I haven't been able to find that will allow me to
>>just
>>tell it how many items it has, rather than adding them one at a time?
>>Adding
>>them one at a time seems to be time consuming like adding one element at a
>>time to a CArray. I really don't need the control to do anything for me
>>except know the number of items and handle the scrolling and ask me for
>>the
>>data it needs to display. (I'm testing the timing now in debug mode by
>>inserting the 200million plus items into the CListCtrl anfter loading them
>>from disk and I think it's going to take longer than calculating the
>>primes.
>>I'm also trying in release mode but after several minutes it's still
>>adding
>>and I can't break the program to see the progress made so far.)
>>
>>
>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>news:vro4q5t923qa256ej597fbkl88fg0dot6t@4ax.com...
>>> Sorry, because I can't see clearly, I think I sent a reply by hitting
>>> the
>>> wrong keystroke
>>> combinattion....
>>>
>>> What I had said was that allocation requires extra overhead. What I was
>>> about to say was
>>> that the most efficient representation would be a bit vector of bits
>>> that
>>> were 1 for a
>>> prime and 0 for a non-prime; this would require 2^32 bits, or 2^29
>>> bytes,
>>> If you further
>>> apply RLE (run length encoding) to this bit vector, you get an even more
>>> compact
>>> representation which can be optimized by treating the special case of a
>>> 1-bit surrounded
>>> by a lot of 0 bits with a separate encoding. I would consider examining
>>> this over using a
>>> vector of primes.
>>> joe
>>>
>>> More below...
>>>
>>> On Thu, 18 Mar 2010 09:21:28 +0800, "Bill Brehm" <don't want spam>
>>> wrote:
>>>
>>>>"Someone one complained that they could only add about 5000
>>>>elements/second
>>>>to CArray"
>>>>
>>>>I am well aware of the growby parameter of SetSize() and have had it in
>>>>my
>>>>program from the start. Yet CArray has an odd behavior in my program
>>>>that
>>>>doesn't follow what one would expect knowing how the growby parameter is
>>>>supposed to make CArray work.
>>>>
>>>>I did some tests with my program using both debug mode and release mode.
>>>>Also I varied the growby parameter. Lastly, I can store the primes in a
>>>>CArray, a CList or a preallocated array of unsigned ints (needs just
>>>>over
>>>>200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for
>>>>all
>>>>timing results.
>>>>
>>>>Results for debug mode:
>>>>
>>>>int[] - around 43000 primes per second.
>>>>CList - around 39000 primes per second. The slightly lower speed is
>>>>probably
>>>>due to run time allocation of the memory for new elements.
>>>>CArray, growby 100,000 - start at around 35000 pps then tapers down to
>>>>16000
>>>>pps after 400,000 primes have been calculated. suddenly drops to 8000
>>>>pps
>>>>at
>>>>around 500,000 primes and tapers down to 4000 pps around 1,000,000
>>>>primes
>>>>calculated. one might expect it to slow down as it must reallocate and
>>>>copy
>>>>an ever growing array ten times during this test. I don't understand the
>>>>sudden drop at 500,000. I know that displaying the results on the GUI
>>>>takes
>>>>a lot of time, so I only display every 1000th prime. The timing will
>>>>include
>>>>the reallocation time but only once in every 100,000 primes generated
>>>>and
>>>>therefore only once in ever 100 display updates.
>>>>CArray, growby 100 - start at around 40000 pps then tapers down to 8000
>>>>pps
>>>>after 500,000 primes have been calculated then tapers down to 2700 pps
>>>>around 1,000,000 primes calculated. Not surprising that it's slower with
>>>>more reallocations.
>>>>Here where it gets weird.
>>>>CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow
>>>>to
>>>>wait for 1,000,000 primes to be generated to see the speed there. At
>>>>245,000
>>>>primes, the speed is the same. There is a longer delay at the start to
>>>>allocate the memory.
>>> ***
>>> In case the last message didn't make it, what I indicated here was the
>>> first element
>>> involves adding 10,000,000 elements, which is going to badly skew your
>>> first average. If
>>> you add "1" or "3" outside the measurement interval to factor that
>>> startup
>>> transient out
>>> of your measurements you might get a more interesting (and less biased)
>>> result. It all
>>> depends on what you are trying to measure.
>>> ****
>>>>
>>>>The sieve size is 1,000,000 odd numbers which generates 100,000 to
>>>>150,000
>>>>primes in the first few passes. The sieve runs very quickly. The time
>>>>being
>>>>measured at such low speeds is basically the time to copy data into the
>>>>CArray.
>>>>
>>>>Results for release mode:
>>>>
>>>>int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
>>>>start and still at 1,000,000 primes and even at 3,000,000 primes. CList
>>>>takes a long time to give back the memory when I remove all the elements
>>>>from it.
>>>>CArray with growby 100 - around 57,000 pps to start and tapers down to
>>>>13,000 at 1,000,000 primes. This is not surprising as there will be 10
>>>>reallocations for every time the timing update is displayed and so the
>>>>reallocation time is included in the timing calculation.
>>>>
>>>>So I would conclude that in release mode, everything works logically. In
>>>>debug mode, I might have blamed page faults as a reason to slow down
>>>>with
>>>>a
>>>>large CArray growby, but then why doesn't it affect the int[] which is
>>>>the
>>>>same size and allocated in one big chunk like the CArray is. I suspect
>>>>there
>>>>is something funny (too much overhead) in CArray in debug mode for large
>>>>growby sizes.
>>> ****
>>> There is a lot of initialization that goes on in debug mode that you
>>> don't
>>> get in release
>>> mode. This can result in a lot of paging and in a lot of data copies of
>>> the
>>> initialization values. A big blip in the radar. Measure the time it
>>> takes to add the
>>> first element to that huge growby array and you might want to factor
>>> that
>>> out.
>>> ****
>>>>
>>>>"But tha fundamental failure in thinking is thinking that the extra
>>>>space
>>>>actually matters. It may not."
>>>>
>>>>It does in my case because I reach the 2GB limit before I reach the end
>>>>of
>>>>the primes that I want to calculate up to 2^32. There are 200,000,000+
>>>>primes in that set and at 4 bytes per prime for CArray or int[], I can
>>>>fit
>>>>in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16
>>>>bytes
>>>>per prime.
>>> ****
>>> Plus storage allocator overhead plus allocation quantum roundup, you may
>>> be allocating
>>> between 32 and 72 bytes per storage element. ALlocation in debug mode
>>> is
>>> much larger.
>>> ****
>>>>
>>>>"It doesn't scale; note that modern encryption might use 1024, 2048 or
>>>>4096-bit primes"
>>>>
>>>>So you were referring to arbitrarily large numbers. when you said 'large
>>>>integer', I thought you might be referring to LARGE_INTEGER which is
>>>>only
>>>>64
>>>>bits. Using the primes up to 2^32, I could use the sieve to calculate
>>>>all
>>>>the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or
>>>>disk
>>>>space (on my PC) to hold them because there are somewhere between 10^15
>>>>and
>>>>10^16 of them.
>>> ****
>>> LARGE_INTEGER, which is only 64 bits, is too small for effective
>>> encryption. It is
>>> possible to crack 64-bit encryption in hours to days, making it
>>> ineffective. 256-bit to
>>> 4096-bit encryption is considered state-of-the-art for PKI today.
>>> Really
>>> secure
>>> encryption uses one-time pads generated by truly random processes (such
>>> as
>>> radioactive
>>> decay)
>>> ****
>>>>
>>>>"No, serialization is intended to replace the contents, so you are
>>>>seeing
>>>>the specified
>>>>behavior. Appending was never considered as a part of the design. In
>>>>fact,
>>>>it would have
>>>>been considered a bug by the designers if serialization appended."
>>>>
>>>>In my case it IS appending. If I start out with one, two or three
>>>>elements
>>>>in the CList, after I serialize from disk, those original elements are
>>>>still
>>>>in there. I don't know why but I think and agree with you that they
>>>>should
>>>>not be.
>>> *****
>>> one of the reasons I don't trust serialization is that it has too many
>>> "unspecified"
>>> features. I can write serialization code in seconds that does what I
>>> want, and I know the
>>> specs.
>>> joe
>>> *****
>>>>
>>>>
>>>>
>>>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>>>news:fofvp5tejd5101f2pu7974u9upb2qmim9t@4ax.com...
>>>>> See below...
>>>>> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam>
>>>>> wrote:
>>>>>
>>>>>>"Your choice of CArray is probably a Bad Idea."
>>>>>>
>>>>>>It IS turning out to be a bad idea. This was one of the things I said
>>>>>>I
>>>>>>would get around to asking. If I give a growby of 10000, the program
>>>>>>runs
>>>>>>at
>>>>>>a reasonable speed. When I set it much higher, like 100,000,000,
>>>>>>thinking
>>>>>>that i'll have fewer reallocations, it runs really slowly. I would
>>>>>>have
>>>>>>expected the opposite until 1e8 primes were calculated, then a large
>>>>>>delay
>>>>>>during reallocation. But it's slow from the start.
>>>>> ***
>>>>> Someone one complained that they could only add about 5000
>>>>> elements/second
>>>>> to CArray. I
>>>>> tried the same experiment, and found that indeed that was the number I
>>>>> was
>>>>> getting, in
>>>>> Debug mode. They were adding 1.5M elements to the CArray. I set the
>>>>> expansion quantum to
>>>>> 500,000 and measured in Release mode (which is much faster) and got
>>>>> 15M
>>>>> elements/second.
>>>>> CArray without using SetSize to modify the quantum is horrendous
>>>>> performance,
>>>>> approximately exponential in time with the number of elements.
>>>>> std::vector is very close
>>>>> to linear (I did the measurements, but never published them).
>>>>> *****
>>>>>>I did some
>>>>>>PerformanceQueries and found that adding something to the CArray (when
>>>>>>it
>>>>>>grew by 10000) was ten times slow than doing a division as part of the
>>>>>>prime
>>>>>>check. I don't know if it's a paging issue or some inefficiency in
>>>>>>CArray
>>>>>>itself. I guess it's much worse with a large growby but i don't know
>>>>>>why
>>>>>>and
>>>>>>i didn't measure to check.
>>>>> ****
>>>>> Add 1 element: allocate a new array of size+1; copy size elements from
>>>>> the
>>>>> old array, add
>>>>> the new element to the end. Expoential peformance; the time goes up
>>>>> as
>>>>> approximately the
>>>>> square of the number of elements (do the arithmetic for the copy
>>>>> operation). Never
>>>>> believe ANYTHING you measure in Debug mode; it is irrelevant and will
>>>>> always be horrible.
>>>>> This is because there are bounds checks and
>>>>> new-storage-iniitailization
>>>>> that take time.
>>>>> Most paging performance is irrelevant unless your vector gets into the
>>>>> hundreds of
>>>>> megabytes in size, at which point you definitely have the wrong
>>>>> representation.
>>>>> ****
>>>>>>
>>>>>>Would the release version be much faster than the debug version (less
>>>>>>checking for bounds and other stuff)?
>>>>> ****
>>>>> Absolutely! See previous comment; you CANNOT rely on "Debug"
>>>>> measurements
>>>>> for true
>>>>> performance guarantees.
>>>>> ****
>>>>>>Is there a way (with task manager
>>>>>>maybe) to tell if the CArray is staying in memory or being swapped out
>>>>>>to
>>>>>>disk? Is there a way to force the CArray to stay in memory and not
>>>>>>swap
>>>>>>out?
>>>>>>Any guess why it might be so slow?
>>>>> ****
>>>>> Watch for increases in paging faults using the perfmon tool. A
>>>>> substantial increase in
>>>>> paging frequency is a Bad Sign. Note that mking your algorithm
>>>>> cache-aware can generally
>>>>> improve performance by 10X, but you need to know the cache parameters
>>>>> for
>>>>> each computer
>>>>> (which you can discover) to write code like this, and it doesn't
>>>>> always
>>>>> work due to the
>>>>> nature of the particular algorithm involved.
>>>>> ****
>>>>>>
>>>>>>"you have made a fundaemental failure in thinking here"
>>>>>>
>>>>>>Maybe so, but it turns out that the CList is much faster than the
>>>>>>CArray,
>>>>>>in
>>>>>>my program at least. Too bad it takes four times the memory and I run
>>>>>>out
>>>>>>before I can reach 2^32. Also, it takes much longer to clear out the
>>>>>>CList
>>>>>>than the CArray, which is logical since it's made up of many tiny
>>>>>>parts
>>>>>>instead of a few large parts.
>>>>> ****
>>>>> But tha fundamental failure in thinking is thinking that the extra
>>>>> space
>>>>> actually matters.
>>>>> It may not. Note that while adding to an array with SetSize(..., 0)
>>>>> (the
>>>>> default), is
>>>>> exponential in time, adding to a list is constant time, so burning the
>>>>> extra space would
>>>>> not matter. Unless you exceed the working set size, at which point,
>>>>> it
>>>>> becomes a paging
>>>>> problem, or if the elements are widely scattered in memory, or the
>>>>> storage
>>>>> allocator
>>>>> becomes the performance bottleneck (which it can). If you anticipate
>>>>> 1,000,000 elements,
>>>>> do a SetSize(0, 1000000); when the array is empty; then the array
>>>>> append
>>>>> is constant time
>>>>> up to 1,000,000 elements also, and at 1,000,001 there is a blip i n
>>>>> the
>>>>> performance as the
>>>>> array is reallocated to size 2,000,000 and then time is constant again
>>>>> to
>>>>> 2,000,000. IF
>>>>> you are not doing the SetSize, then your assumption are all broken and
>>>>> you
>>>>> will get
>>>>> exponential cost. So you cannot
>>>>> (a) predict performance if you are in Debug mode
>>>>> (b) predict performance if you have not forced preallocation (SetSize)
>>>>> which is handled "by magic" in std::vector
>>>>> You have apparently committed both sins here, so your choices don't
>>>>> have
>>>>> a
>>>>> sound basis.
>>>>> joe
>>>>> ****
>>>>>>
>>>>>>I found another issue with CList which I think I would consider a bug.
>>>>>>When
>>>>>>I Serialize back from disk into a CArray, anything in the array is
>>>>>>wiped
>>>>>>out
>>>>>>and overwritten. When I do that into the CList, the disk data is
>>>>>>appended
>>>>>>to
>>>>>>the end of what is in the list. I didn't see that described in the
>>>>>>help
>>>>>>but
>>>>>>I also didn't find any mention of this "bug" in a quick Google search.
>>>>> ****
>>>>> No, serialization is intended to replace the contents, so you are
>>>>> seeing
>>>>> the specified
>>>>> behavior. Appending was never considered as a part of the design. In
>>>>> fact, it would have
>>>>> been considered a bug by the designers if serialization appended.
>>>>> ****
>>>>>>
>>>>>>"I wouldn't touch MFC serialization as far as I could throw a
>>>>>>mainframe."
>>>>>>
>>>>>>Ha ha, how far can you throw a mainframe? I am learning to agree with
>>>>>>you
>>>>>>on
>>>>>>this. I don't think I ever used it before. I started to use it this
>>>>>>time
>>>>>>because it is much faster and uses less space than saving and loading
>>>>>>in
>>>>>>text format. But if it's not trustworthy then it's not worth it.
>>>>> ****
>>>>> Given that I'm actually physically handicapped, and probably couldn't
>>>>> throw a mere laptop
>>>>> very far, throwing a 1-ton mainframe isn't likely. Probably measured
>>>>> in
>>>>> nanometers. The
>>>>> real problem is schema evolution, a problem I had to address when I
>>>>> designed what is a lot
>>>>> like XML back in 1977 (except our design was far better). Today, I
>>>>> use
>>>>> XML for my data
>>>>> representation.
>>>>> *****
>>>>>>
>>>>>>"Note that most prime algorithms are variants of the Seive of
>>>>>>Eastosthenes"
>>>>>>
>>>>>>Yes, I'm using that. My interest in playing with primes was triggered
>>>>>>by
>>>>>>trying to learn about encryption and modular arithmetic used in RSA.
>>>>>>But
>>>>>>my
>>>>>>first program ever was a prime number generator running on some HP
>>>>>>Basic
>>>>>>computer back in the early 70's, so it's interesting to me to bring
>>>>>>that
>>>>>>up
>>>>>>to date. My program also can run the old way by dividing by primes
>>>>>>until
>>>>>>the
>>>>>>sqrt of the number under test. I think my original Basic program
>>>>>>divided
>>>>>>by
>>>>>>all odd numbers because there wasn't enough memory to store the array
>>>>>>of
>>>>>>primes on that old computer.
>>>>> ****
>>>>> Note that RSA wants to use massive prime numbers (which are almost
>>>>> impossible to factor).
>>>>> I have been talking with a colleague who is an encryption guy and we
>>>>> are
>>>>> talking about
>>>>> developing a course in encryption when we would use RSA-8 (8-bit
>>>>> primes)
>>>>> so students could
>>>>> work out the encryption with just a calculator or even pencil and
>>>>> paper
>>>>> and then show why
>>>>> short key are vulnerable by asking them to work out encrypted data
>>>>> (since
>>>>> 8-bit primes
>>>>> would be easy to crack), to give some real feel for the algorithms.
>>>>> He
>>>>> has taken a course
>>>>> from Ron Rivest (the R of RSA) and is really into this stuff.
>>>>> ****
>>>>>>
>>>>>>Actually the sieve technique should scale to large integers (if you
>>>>>>mean
>>>>>>the
>>>>>>64 bit type and not arbitrarily large) once one has an array of primes
>>>>>>to
>>>>>>2^32 and if the sieving is done in segments.
>>>>> ****
>>>>> It doesn't scale; note that modern encryption might use 1024, 2048 or
>>>>> 4096-bit primes, and
>>>>> facroting these with moden computers and moden algorithms takes
>>>>> intervals
>>>>> such that the
>>>>> Sun is likely to expand into a Red Giant and engulf the Earth before
>>>>> the
>>>>> algorithm
>>>>> terminates.
>>>>> joe
>>>>> ****
>>>>>>
>>>>>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>>>>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a@4ax.com...
>>>>>>> See below...
>>>>>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>I am working on a program to calculate primes. It's just for fun,
>>>>>>>>not
>>>>>>>>for
>>>>>>>>work or school, if that matters here. I'm trying to optimize it and
>>>>>>>>I'll
>>>>>>>>have some more questions about that later.
>>>>>>>>
>>>>>>>>Today I changed the CArray used to hold the primes to a CList based
>>>>>>>>on
>>>>>>>>a
>>>>>>>>compile time selection, figuring I would save the time of
>>>>>>>>reallocating
>>>>>>>>and
>>>>>>>>copying the CArray every time I went over the maximum allocated
>>>>>>>>size.
>>>>>>>>I'm
>>>>>>>>aware of the pros and cons of CArray and CList in general.
>>>>>>> ****
>>>>>>> Your choice of CArray is probably a Bad Idea. First, you should
>>>>>>> SetSize
>>>>>>> and set an
>>>>>>> explicit reallocation quantum (the second parameter of SetSize) to
>>>>>>> minimize the amount of
>>>>>>> reallocation that is done. Or use std::vector which does this
>>>>>>> implicitly,
>>>>>>> and had far,
>>>>>>> far better performance than CArray.
>>>>>>> ****
>>>>>>>>A disadvantage is
>>>>>>>>that the CList of primes takes (about) four time as much memory as
>>>>>>>>the
>>>>>>>>CArray holding the same number of primes. The primes are 32 bit ints
>>>>>>>>so
>>>>>>>>I
>>>>>>>>don't know why it's four times instead of three times (previous
>>>>>>>>pointer,
>>>>>>>>prime and next pointer). What else is in there?
>>>>>>> ****
>>>>>>> you have made a fundaemental failure in thinking here. You think,
>>>>>>> for
>>>>>>> example, that the
>>>>>>> size of any elmeent matters, which is the wrong way to think about
>>>>>>> the
>>>>>>> problem. What
>>>>>>> matter is the memory footprint, and the induced paging, which is
>>>>>>> potentially far worse in
>>>>>>> a CLIst. Note that most prime algorithms are variants of the Seive
>>>>>>> of
>>>>>>> Eastosthenes, which
>>>>>>> means they don't scal to large integer (such as are using for
>>>>>>> encryption)
>>>>>>> and you can find
>>>>>>> enumerations of all the primes in the range of 0..2**32-1 on the Web
>>>>>>> and
>>>>>>> in reference
>>>>>>> books. LIss are nearly always less efficient than arrays due to
>>>>>>> locality-of-reference
>>>>>>> issues that affect cache behavior and paging.
>>>>>>> *****
>>>>>>>>
>>>>>>>>I tried something that I wasn't sure would work but it did. I
>>>>>>>>serialized
>>>>>>>>a
>>>>>>>>CArray to disk and then later I serialized it from disk to a CList.
>>>>>>>>It
>>>>>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>>>>>store
>>>>>>>>the next and previous CList pointers into a serialize archive. My
>>>>>>>>question
>>>>>>>>here is, does anyone know if these were made intentionally
>>>>>>>>compatible
>>>>>>>>and
>>>>>>>>if
>>>>>>>>I can count on that working in all cases?
>>>>>>> ****
>>>>>>> I wouldn't touch MFC serialization as far as I could throw a
>>>>>>> mainframe.
>>>>>>> I
>>>>>>> would not use
>>>>>>> it for anything I cared about.
>>>>>>> joe
>>>>>>> ****
>>>>>>>>
>>>>>>>>Thanks...
>>>>>>>>
>>>>>>> Joseph M. Newcomer [MVP]
>>>>>>> email: newcomer@flounder.com
>>>>>>> Web: http://www.flounder.com
>>>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>>>>
>>>>> Joseph M. Newcomer [MVP]
>>>>> email: newcomer@flounder.com
>>>>> Web: http://www.flounder.com
>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer@flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Bill
|
3/20/2010 2:02:21 AM
|
|
.... and sure enough, when I tried it on a single core processor, it
approaches 100%. Thanks all.
"Bill Brehm >" <<don't want spam> wrote in message
news:efmxgB4xKHA.4752@TK2MSFTNGP04.phx.gbl...
>I got it. One copy running goes up to 50% because it's running on one core
>of two and taking up 100% of that one. When I run another copy, it might
>start out in the same core but eventually Windows gets smart and switches
>it to the other core and they both run at almost 100% within their
>respective cores.
>
> Thanks...
>
>
> "Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
> news:ul77q5l1f7kvbv6q1hlaf8hkps9pj7vlvd@4ax.com...
>> The way loading is computed is to take the % of each core and divide by
>> the number of
>> cores, so a task that uses 100% of a single core of a quad-core system,
>> you will get "25%"
>> usage. If you see 100%, this means that each of the 4 cores is running
>> 100%
>> (25%+25%+25+25%=100%). SO you will see "100%" only if you are using 100%
>> of your total
>> computing capabilitiy, that is, 100% of each of 4 cores. And yes, the
>> idle time goes to
>> 0, no big surprise there.
>> joe
>>
>> On Fri, 19 Mar 2010 17:21:20 +0800, "Bill" <<don't want more spam>>
>> wrote:
>>
>>>Woody,
>>>
>>>"only require 2^32/8/2"
>>>
>>>You are right. I thought of that after sending when I was thinking what
>>>to
>>>do next in the project, but didn't think it worth making another post to
>>>correct.
>>>
>>>About the 50%. If you are right then it's already running at maximum. I
>>>should find a quad core and see if it says 25% only. But if you are
>>>right,
>>>then why wouldn't all the numbers add up to 200% instead of 100%?
>>>
>>>I hust tried running two copies. At first they started out sharing the
>>>~50%,
>>>but eventually they settled out sharing ~100%. Four copies settled out to
>>>sharing 100% too - about 25% each. But system idle time went down to
>>>zero,
>>>so the total still adds up to 100%, not 100% for each core.
>>>
>>>I'm not sure what to conclude.
>>>
>>>Bill
>>>
>>>"Woody" <ols6000@sbcglobal.net> wrote in message
>>>news:5c3c5b21-0bcb-4588-8da9-459ff368e082@z35g2000yqd.googlegroups.com...
>>>> Bill wrote:
>>>>> There was one more question related to speed that I haven't asked yet.
>>>>> When
>>>>> the program is running full speed, the task manager shows it's only
>>>>> using
>>>>> 49 - 51 percent of the CPU. The idle process is using most of the
>>>>> rest.
>>>>>... My processor does have two CPUs, but the affinity is set to both
>>>>>and
>>>>> all the CPU usage numbers add up to 100%, not 200%, so I believe the
>>>>> 50%
>>>>> is
>>>>> not due to only running on one of the two processors.
>>>>
>>>> I think you are seeing your calculations being done single-threaded (i
>>>> e, on only one of the cores). You might want to modify your pgm to be
>>>> multi-threaded (2 worker threads, for example). It is an interesting
>>>> question how to divide the work when there is a single bit vector that
>>>> both threads need to access.
>>>>
>>>> As far as the memory, using a bit vector should only require 2^32/8/2
>>>> bytes, since you know there are no even primes >2.
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer@flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
>
|
|
0
|
|
|
|
Reply
|
Bill
|
3/20/2010 2:39:22 AM
|
|
>So to conclude, the speed problem can be solved by using SetItemCount()
>instead ot InsertItem() many times. But the upper limit on the number of
>items in a CListCtrl is one hundred million (plus one). Memory is not the
>issue, because task manager shows the memory usage is the same before and
>after adding the 100 million items with SetItemCount(). So the limit seems
>to be arbitrary.
Bill,
Which OS is this on?
It might be worthwhile adding a comment to the online MSDN
documentation to record your findings.
Dave
|
|
0
|
|
|
|
Reply
|
David
|
3/20/2010 10:11:08 AM
|
|
Dave,
I'm running WinXP, but probably more importantly, MSVC++ 6.0.
I'll look for this online MSDN and see what I can find / record.
Bill
"David Lowndes" <DavidL@example.invalid> wrote in message
news:um79q51bl781o4rntmmfrt0bbfmm8fjc8r@4ax.com...
> >So to conclude, the speed problem can be solved by using SetItemCount()
>>instead ot InsertItem() many times. But the upper limit on the number of
>>items in a CListCtrl is one hundred million (plus one). Memory is not the
>>issue, because task manager shows the memory usage is the same before and
>>after adding the 100 million items with SetItemCount(). So the limit seems
>>to be arbitrary.
>
> Bill,
>
> Which OS is this on?
>
> It might be worthwhile adding a comment to the online MSDN
> documentation to record your findings.
>
> Dave
|
|
0
|
|
|
|
Reply
|
Bill
|
3/20/2010 2:30:36 PM
|
|
"Bill Brehm" <don't want spam> ha scritto nel messaggio
news:uZ1EiF9xKHA.2436@TK2MSFTNGP04.phx.gbl...
> So to conclude, the speed problem can be solved by using SetItemCount()
> instead ot InsertItem() many times. But the upper limit on the number of
> items in a CListCtrl is one hundred million (plus one).
Would it be possible to you to test using SetItemCountEx instead of
SetItemCount?
http://msdn.microsoft.com/en-us/library/xb95w29a(VS.80).aspx
http://msdn.microsoft.com/en-us/library/bb775095(VS.85).aspx
<quote>
If the list-view control was created with the LVS_OWNERDATA style, this
macro sets the virtual number of items that the control contains.
If the list-view control was created without the LVS_OWNERDATA style, the
ListView_SetItemCount macro should be used.
</quote>
Giovanni
|
|
0
|
|
|
|
Reply
|
Giovanni
|
3/23/2010 11:09:08 AM
|
|
I did try SetItemCountEx() as well and got the same result. I do have
LVS_OWNERDATA set in the properties of the control in the resource view. I
shuold point out that I'm using MSVC++ version 6. Maybe in newer versions
the limitation has been removed???
"Giovanni Dicanio" <giovanniDOTdicanio@REMOVEMEgmail.com> wrote in message
news:ObPlElnyKHA.2644@TK2MSFTNGP04.phx.gbl...
> "Bill Brehm" <don't want spam> ha scritto nel messaggio
> news:uZ1EiF9xKHA.2436@TK2MSFTNGP04.phx.gbl...
>
>> So to conclude, the speed problem can be solved by using SetItemCount()
>> instead ot InsertItem() many times. But the upper limit on the number of
>> items in a CListCtrl is one hundred million (plus one).
>
> Would it be possible to you to test using SetItemCountEx instead of
> SetItemCount?
>
> http://msdn.microsoft.com/en-us/library/xb95w29a(VS.80).aspx
>
> http://msdn.microsoft.com/en-us/library/bb775095(VS.85).aspx
>
> <quote>
> If the list-view control was created with the LVS_OWNERDATA style, this
> macro sets the virtual number of items that the control contains.
> If the list-view control was created without the LVS_OWNERDATA style, the
> ListView_SetItemCount macro should be used.
> </quote>
>
> Giovanni
>
>
|
|
0
|
|
|
|
Reply
|
Bill
|
3/24/2010 12:03:22 AM
|
|
"Bill Brehm" <don't want spam> ha scritto nel messaggio
news:#ScAsVuyKHA.2552@TK2MSFTNGP04.phx.gbl...
> I did try SetItemCountEx() as well and got the same result. I do have
> LVS_OWNERDATA set in the properties of the control in the resource view. I
> shuold point out that I'm using MSVC++ version 6. Maybe in newer versions
> the limitation has been removed???
I tried a quick test building a dialog-based MFC app with VS2010 RC.
Basically, the dialog hosts a virtual list control, that has two columns:
the first one shows the index value (e.g. 0,1,2,3...,<count-1>), the second
one is just left empty.
I built the app in 64-bit and run the test: my results on Windows 7 x64 are
that the virtual list control displays properly for count <= 100,000,000.
Instead, for count > 100,000,000 an empty list is displayed.
Weird...
Giovanni
|
|
0
|
|
|
|
Reply
|
Giovanni
|
3/24/2010 12:19:26 PM
|
|
"Giovanni Dicanio" <giovanniDOTdicanio@REMOVEMEgmail.com> ha scritto nel
messaggio news:#mTWCx0yKHA.1236@TK2MSFTNGP06.phx.gbl...
> I tried a quick test building a dialog-based MFC app with VS2010 RC.
> [...]
I can repro it also using a raw Win32 C++ app (no MFC).
Giovanni
|
|
0
|
|
|
|
Reply
|
Giovanni
|
3/24/2010 2:19:16 PM
|
|
There are worse things. I'm putting a scrollbar in another project and I
just discovered (maybe again, can't recall) that it has a limit of 32767
positions. I'm thinking of using two, one for course and one for fine or
even three, but it's not what people are used to seeing.
Thanks for the update that even ten years later, CListCtrl hasn't been
enhanced. No need to upgrade for that reason.
"Giovanni Dicanio" <giovanniDOTdicanio@REMOVEMEgmail.com> wrote in message
news:%23nAIA01yKHA.4492@TK2MSFTNGP05.phx.gbl...
> "Giovanni Dicanio" <giovanniDOTdicanio@REMOVEMEgmail.com> ha scritto nel
> messaggio news:#mTWCx0yKHA.1236@TK2MSFTNGP06.phx.gbl...
>
>> I tried a quick test building a dialog-based MFC app with VS2010 RC.
>> [...]
>
> I can repro it also using a raw Win32 C++ app (no MFC).
>
> Giovanni
>
>
>
|
|
0
|
|
|
|
Reply
|
Bill
|
3/24/2010 2:55:27 PM
|
|
"Bill Brehm" <don't want spam> ha scritto nel messaggio
news:u3rSJI2yKHA.1796@TK2MSFTNGP02.phx.gbl...
> Thanks for the update that even ten years later, CListCtrl hasn't been
> enhanced.
I asked about that to some Microsoft persons, and I'll let you know if I
have any news about that.
> No need to upgrade for that reason.
VS2010 has several advantages if compared to VC6, like better C++ standard
compliance, better STL implementation, parallel libraries, etc.
Giovanni
|
|
0
|
|
|
|
Reply
|
Giovanni
|
3/24/2010 4:11:25 PM
|
|
"Giovanni Dicanio" <giovanniDOTdicanio@REMOVEMEgmail.com> ha scritto nel
messaggio news:#mTWCx0yKHA.1236@TK2MSFTNGP06.phx.gbl...
> I built the app in 64-bit and run the test: my results on Windows 7 x64
> are that the virtual list control displays properly for count <=
> 100,000,000.
> Instead, for count > 100,000,000 an empty list is displayed.
FWIW, I tried also a simple WPF test with a listview, and the situation
seems to be far worse than native.
In fact, even with 100,000 items, the program takes longer to start and then
hangs...
Giovanni
|
|
0
|
|
|
|
Reply
|
Giovanni
|
3/24/2010 5:16:00 PM
|
|
On Mar 24, 7:55=A0am, "Bill Brehm" <don't want spam> wrote:
> There are worse things. I'm putting a scrollbar in another project and I
> just discovered (maybe again, can't recall) that it has a limit of 32767
> positions.
According to VS2005 docs, the scroll bar functions support 32-bit
position data; it is only the messages WM_xSCROLL that are limited to
16 bits (32767). GetScrollInfo has an example of how to use the 32-bit
positions.
|
|
0
|
|
|
|
Reply
|
Woody
|
3/25/2010 8:01:15 AM
|
|
Brilliant! Thanks very much Woody. It's documented in version 6 too.
"Woody" <ols6000@sbcglobal.net> wrote in message
news:561c1965-1a60-447a-a8f2-2c8b775657ac@x12g2000yqx.googlegroups.com...
On Mar 24, 7:55 am, "Bill Brehm" <don't want spam> wrote:
> There are worse things. I'm putting a scrollbar in another project and I
> just discovered (maybe again, can't recall) that it has a limit of 32767
> positions.
According to VS2005 docs, the scroll bar functions support 32-bit
position data; it is only the messages WM_xSCROLL that are limited to
16 bits (32767). GetScrollInfo has an example of how to use the 32-bit
positions.
|
|
0
|
|
|
|
Reply
|
Bill
|
3/26/2010 7:00:54 AM
|
|
|
30 Replies
257 Views
(page loaded in 1.476 seconds)
|