I am using this routine to replace characters in a large text string.
I would like to know how many replacements have been made.
This is a generalized routine for experimenting.
If I set lReplCount = 1, I can get a count but it takes over one minute to
run.
If I set lReplCount = -1, it takes about 20 mSec but I get no count (except
1).
Is there any way to get speed and the replace count?
I would be happy with a few seconds and an accurate count.
Do
If InStr(1, sText, vbCr & vbCr) > 0 Then
lChangeCount = lChangeCount + 1
sText = Replace(sText, vbCr & vbCr, vbCr, , lReplCount)
If lChangeCount Mod 100 = 0 Then
DoEvents
Sleep 0&
End If
Else
Exit Do
End If
Loop While True
|
|
0
|
|
|
|
Reply
|
Utf
|
12/24/2009 1:55:01 AM |
|
"Bee" <Bee@discussions.microsoft.com> skrev i meddelandet
news:C10F4FCA-4D8E-428B-9332-DCE5B6766329@microsoft.com...
>I am using this routine to replace characters in a large text string.
> I would like to know how many replacements have been made.
> This is a generalized routine for experimenting.
> If I set lReplCount = 1, I can get a count but it takes over one minute to
> run.
> If I set lReplCount = -1, it takes about 20 mSec but I get no count
> (except
> 1).
> Is there any way to get speed and the replace count?
> I would be happy with a few seconds and an accurate count.
>
> Do
> If InStr(1, sText, vbCr & vbCr) > 0 Then
> lChangeCount = lChangeCount + 1
> sText = Replace(sText, vbCr & vbCr, vbCr, , lReplCount)
> If lChangeCount Mod 100 = 0 Then
> DoEvents
> Sleep 0&
> End If
> Else
> Exit Do
> End If
> Loop While True
>
From MSDN "Optional. Number of substring substitutions to perform. If
omitted, the default value is -1, which means make all possible
substitutions.".
If you set IReplaceCount to -1 the loop is executed only one time, replacing
all (vbCr & vbCr) at once.
/Henning
|
|
0
|
|
|
|
Reply
|
Henning
|
12/24/2009 3:32:43 AM
|
|
"Bee" <Bee@discussions.microsoft.com> wrote in message
news:C10F4FCA-4D8E-428B-9332-DCE5B6766329@microsoft.com...
> I am using this routine to replace characters in a large text string.
> I would like to know how many replacements have been made.
> This is a generalized routine for experimenting.
> If I set lReplCount = 1, I can get a count but it takes over one minute to
> run.
> If I set lReplCount = -1, it takes about 20 mSec but I get no count
(except
> 1).
> Is there any way to get speed and the replace count?
> I would be happy with a few seconds and an accurate count.
>
> Do
> If InStr(1, sText, vbCr & vbCr) > 0 Then
> lChangeCount = lChangeCount + 1
> sText = Replace(sText, vbCr & vbCr, vbCr, , lReplCount)
> If lChangeCount Mod 100 = 0 Then
> DoEvents
> Sleep 0&
> End If
> Else
> Exit Do
> End If
> Loop While True
>
Something like this ...
Dim lCount As Long
Dim pos As Integer: pos = 1
Dim sText As String
Dim sFind As String: sFind = vbCr & vbCr
Do While pos <> 0
pos = InStr(pos + 1, sText, sFind)
lCount = lCount + 1
If lCount Mod 100 = 0 Then DoEvents()
Loop
sText = Replace(sText, sFind, vbCr)
Note: you seldom need to follow DoEvents() with Sleep() as DoEvents calls
Sleep 0. So you can skip that extra click. There are occasions when you may
need a longer sleep than 0, but not here.
You may be able to do this faster if you also do the replace at the point of
discovery. For this you could use another 'string' buffer copying the new
string as you go along. But I'm just guessing - I didn't test it.
-ralph
|
|
0
|
|
|
|
Reply
|
Ralph
|
12/24/2009 6:05:13 AM
|
|
> If you set IReplaceCount to -1 the loop is executed only one time,
> replacing all (vbCr & vbCr) at once.
Given the OP's code, he is attempting to collapse all multiple contiguous
carriage returns down to a single carriage return... your suggestion won't
do that if there are more than two contiguous carriage returns as the
Replace function is not reentrant (that is, it will **not** reduce, as a
small example, four carriage returns down to two and then go back to replace
the new double carriage return that is left... it will just leave the double
carriage return), so the loop the OP is using is required.
--
Rick (MVP - Excel)
"Henning" <computer_hero@coldmail.com> wrote in message
news:%23nDoDnEhKHA.5644@TK2MSFTNGP05.phx.gbl...
>
> "Bee" <Bee@discussions.microsoft.com> skrev i meddelandet
> news:C10F4FCA-4D8E-428B-9332-DCE5B6766329@microsoft.com...
>>I am using this routine to replace characters in a large text string.
>> I would like to know how many replacements have been made.
>> This is a generalized routine for experimenting.
>> If I set lReplCount = 1, I can get a count but it takes over one minute
>> to
>> run.
>> If I set lReplCount = -1, it takes about 20 mSec but I get no count
>> (except
>> 1).
>> Is there any way to get speed and the replace count?
>> I would be happy with a few seconds and an accurate count.
>>
>> Do
>> If InStr(1, sText, vbCr & vbCr) > 0 Then
>> lChangeCount = lChangeCount + 1
>> sText = Replace(sText, vbCr & vbCr, vbCr, , lReplCount)
>> If lChangeCount Mod 100 = 0 Then
>> DoEvents
>> Sleep 0&
>> End If
>> Else
>> Exit Do
>> End If
>> Loop While True
>>
>
> From MSDN "Optional. Number of substring substitutions to perform. If
> omitted, the default value is -1, which means make all possible
> substitutions.".
>
> If you set IReplaceCount to -1 the loop is executed only one time,
> replacing all (vbCr & vbCr) at once.
>
> /Henning
>
>
|
|
0
|
|
|
|
Reply
|
Rick
|
12/24/2009 8:29:06 AM
|
|
"Bee" <Bee@discussions.microsoft.com> wrote in message
news:C10F4FCA-4D8E-428B-9332-DCE5B6766329@microsoft.com...
> I am using this routine to replace characters in a large text
> string. I would like to know how many replacements have
> been made. This is a generalized routine for experimenting.
> If I set lReplCount = 1, I can get a count but it takes over
> one minute to run. If I set lReplCount = -1, it takes about
> 20 mSec but I get no count (except 1). Is there any way to
> get speed and the replace count? I would be happy with a
> few seconds and an accurate count.
If you want to know the number of replacements required then it would be
faster to run through the string first using InString in a loop and
modifying the Start parameter within the loop to each time point to the
first character after the previously found substring. Then use a single
Replace function to peform the actual replacements. That would be very
quick. There are other more generalised solutions though, although before I
come to that perhaps a few notes about your existing code might be in order.
Firstly, in your code you are replacing multiple vbCr characters whereas
most standard Windows .txt files (at least most that I've come across) use
the character pair vbCrLf instead of just a vbCr character, so presumably
you are working with a text file that does use just vbCr to signify a new
line, of which of course there are many, but you do need to know (or include
code to test) which kind of text file you are working with.
Secondly, your existing loop code as it stands (although it is not a good
way to do it) will replace each set of multiple vbCr with just a single
vbCr, regardless of whether there are two of them together or three or four
or five of them together or whatever, and the count you get in your code
would reflect that. In fact that is probably what you want to do anyway, but
you should bear in mind generally that a single use of the VB6 Replace
function (not inside your loop) to replace vbCr & vbCr with vbCr will
replace each pair of vbCr it comes across with a single vbCr, which is not
the same thing. For example, using a single call to the VB6 Replace function
on a string containing a consecutive group of five vbCr characters will
replace the first pair in that set with just one and the second pair in that
set with just one, leaving the fifth vbCr of the set alone because it is not
itself part of another pair. In other words, a consecutive group of five
vbCr will replaced by a consecutive group of three vbCr. A second call to
the VB6 replace function will replace just the first pair of that group of
three with a single vbCr, thereby effectively replacing the group of three
with a group of two. So if, for example, five conecutive vbCr was the
largest "group of vbCr" in the string then you would need to call the VB6
Replace function three times (you can of course write code to do that for
you so that it knows how many times it must call the Replace function). I've
used vbCr in this example simply because that is what you are using
yourself, but the same reasoning applies regardless of the string you are
replacing.
Anyway, regarding your specific request, here is some code I wrote a long
time ago when I had VB5 (because VB5 does not have a native Replace
function). It does a similar job to the VB6 Replace function, but it returns
a count of the number of replacements it has made, so it should be fine for
your own purposes. And of course it does it purely in code, because it does
not actually use the VB6 Replace function (it wasn't there in VB5). You'll
find that it is still pretty fast though, often as fast if not faster than a
call to the VB6 Replace function, especially when run as a standard native
code compiled exe, although I have not included anything to deal with
strings containing stuff other than standard English text. There are even
faster ways of doing it, especially once you start messing with array
pointers and stuff. In fact, as far as I recall, some time after that there
was a bit of a "speed test" in the VB newsgroup (clbvm I think) in which
there was a sort of competition going to write the fastest Replace function.
The Google Groups archive isn't what it used to be, but you may still find
it out there somewhere. I think you'll find my own outine easily fast enough
for your needs though. It will certainly kill the "few seconds" you said you
would be happy with, that's for sure! By the way, the small Button Click
example showing you how to call the function uses a stest string containing
vbCr (as it appears you are using yourself in your own text string)
although, as I have said, vbCrLf is more common in standard Windows .txt
files.
Mike
Private Function ReplaceAll(ByRef txtWork As String, _
ByVal txtFind As String, ByVal txtReplace As String, _
ByVal compare As Long, ByRef replacecount As Long) _
As String
Dim txtNew As String, Ltxtnew As Long
Dim Start As Long, nCounter As Long, n As Long
Dim Lfind As Long, Lreplace As Long
Dim OldPointer As Long, NewPointer As Long
Dim txtTemp As String
replacecount = 0
Lfind = Len(txtFind)
If Lfind < 1 Then
ReplaceAll = txtWork
Exit Function
End If
Lreplace = Len(txtReplace)
' the following temporary strings avoid the
' need to use vbTextCompare (which is very slow)
If compare = vbTextCompare Then
txtTemp = UCase(txtWork)
txtFind = UCase(txtFind)
Else
txtTemp = txtWork
End If
'
Start = 1 - Lfind
Do
Start = InStr(Start + Lfind, txtTemp, txtFind, 0)
If Start <> 0 Then
nCounter = nCounter + 1
Else
Exit Do
End If
Loop
If nCounter < 1 Then
ReplaceAll = txtWork
Exit Function
Else
Ltxtnew = Len(txtWork) + nCounter * (Lreplace - Lfind)
txtNew = Space$(Ltxtnew)
Start = 1 - Lfind
OldPointer = 1
NewPointer = 1
On Error Resume Next
For n = 1 To nCounter
Start = InStr(Start + Lfind, txtTemp, txtFind, 0)
Mid$(txtNew, NewPointer, Start - OldPointer) = _
Mid$(txtWork, OldPointer, Start - OldPointer)
NewPointer = NewPointer + Start - OldPointer + Lreplace
Mid$(txtNew, NewPointer - Lreplace, Lreplace) = txtReplace
OldPointer = Start + Lfind
Next n
On Error GoTo 0
If OldPointer <= Len(txtWork) Then
Mid$(txtNew, NewPointer) = Mid$(txtWork, OldPointer)
End If
End If
ReplaceAll = txtNew
replacecount = nCounter
End Function
Private Sub Command2_Click()
Dim s1 As String, nCount As Long
s1 = "A small" & vbCr & vbCr & "test string"
s1 = ReplaceAll(s1, vbCr & vbCr, vbCr, vbBinaryCompare, nCount)
Print nCount & " replacements made"
End Sub
|
|
0
|
|
|
|
Reply
|
Mike
|
12/24/2009 8:34:15 AM
|
|
I would think the number of replacements you are after would be the total
number of carriage returns at the beginning of your routine minus the number
remaining after the routine. So, something like this maybe...
StartingNumberOfCRs = Len(sText) - Len(Replace(sText, vbCr, ""))
Do While InStr(1, sText, vbCr & vbCr) > 0
sText = Replace(sText, vbCr & vbCr, vbCr)
Loop
EndingNumberOfCRs = Len(sText) - Len(Replace(sText, vbCr, ""))
NumberOfReplacements = StartingNumberOfCRs - EndingNumberOfCRs
--
Rick (MVP - Excel)
"Bee" <Bee@discussions.microsoft.com> wrote in message
news:C10F4FCA-4D8E-428B-9332-DCE5B6766329@microsoft.com...
>I am using this routine to replace characters in a large text string.
> I would like to know how many replacements have been made.
> This is a generalized routine for experimenting.
> If I set lReplCount = 1, I can get a count but it takes over one minute to
> run.
> If I set lReplCount = -1, it takes about 20 mSec but I get no count
> (except
> 1).
> Is there any way to get speed and the replace count?
> I would be happy with a few seconds and an accurate count.
>
> Do
> If InStr(1, sText, vbCr & vbCr) > 0 Then
> lChangeCount = lChangeCount + 1
> sText = Replace(sText, vbCr & vbCr, vbCr, , lReplCount)
> If lChangeCount Mod 100 = 0 Then
> DoEvents
> Sleep 0&
> End If
> Else
> Exit Do
> End If
> Loop While True
>
|
|
0
|
|
|
|
Reply
|
Rick
|
12/24/2009 8:35:03 AM
|
|
It was just a try to explain why he when setting IReplaceCount to -1 just
got a count of one (1) in 20 ms. Compared to setting IReplaceCount to 1 and
letting the loop replace all occurances.
/Henning
"Rick Rothstein" <rick.newsNO.SPAM@NO.SPAMverizon.net> skrev i meddelandet
news:%23$sykMHhKHA.1536@TK2MSFTNGP06.phx.gbl...
>> If you set IReplaceCount to -1 the loop is executed only one time,
>> replacing all (vbCr & vbCr) at once.
>
> Given the OP's code, he is attempting to collapse all multiple contiguous
> carriage returns down to a single carriage return... your suggestion won't
> do that if there are more than two contiguous carriage returns as the
> Replace function is not reentrant (that is, it will **not** reduce, as a
> small example, four carriage returns down to two and then go back to
> replace the new double carriage return that is left... it will just leave
> the double carriage return), so the loop the OP is using is required.
>
> --
> Rick (MVP - Excel)
>
>
> "Henning" <computer_hero@coldmail.com> wrote in message
> news:%23nDoDnEhKHA.5644@TK2MSFTNGP05.phx.gbl...
>>
>> "Bee" <Bee@discussions.microsoft.com> skrev i meddelandet
>> news:C10F4FCA-4D8E-428B-9332-DCE5B6766329@microsoft.com...
>>>I am using this routine to replace characters in a large text string.
>>> I would like to know how many replacements have been made.
>>> This is a generalized routine for experimenting.
>>> If I set lReplCount = 1, I can get a count but it takes over one minute
>>> to
>>> run.
>>> If I set lReplCount = -1, it takes about 20 mSec but I get no count
>>> (except
>>> 1).
>>> Is there any way to get speed and the replace count?
>>> I would be happy with a few seconds and an accurate count.
>>>
>>> Do
>>> If InStr(1, sText, vbCr & vbCr) > 0 Then
>>> lChangeCount = lChangeCount + 1
>>> sText = Replace(sText, vbCr & vbCr, vbCr, , lReplCount)
>>> If lChangeCount Mod 100 = 0 Then
>>> DoEvents
>>> Sleep 0&
>>> End If
>>> Else
>>> Exit Do
>>> End If
>>> Loop While True
>>>
>>
>> From MSDN "Optional. Number of substring substitutions to perform. If
>> omitted, the default value is -1, which means make all possible
>> substitutions.".
>>
>> If you set IReplaceCount to -1 the loop is executed only one time,
>> replacing all (vbCr & vbCr) at once.
>>
>> /Henning
>>
>>
>
|
|
0
|
|
|
|
Reply
|
Henning
|
12/24/2009 4:36:12 PM
|
|
I have found that cutting and pasting some large text files from the web
sometimes have the vbCR vbCR vbLF sequence throughout. So I am trying to
quickly clean up the text files so that they will justify correctly in a
richtextbox. This sequence, though not of itself, causes problems with the
next phase of cleanup so I need to do this first.
I ran your code and was suprised how fast it ran. Very nice.
My next is to try to adapt yopur code to the next phase which is to do
single replacements of sequencese within the text.
In my test text there are over 7000 of the vbCR vbCR vbLF groups that need
to be replace followed by cleaning up vbCR vbLF that are NOT at the end of a
sentence.
I am hoping that your RepalceAll can be modified to allow single replacements.
It seems that the most slowness is concatenating strings.
I expect to have to replace over 3000 vbCR vbLF pairs with a Trim$(Left)
space Trim$(Right) where there is no end of sentence. I do not need
supersonic speed but I think your method will significantly improve what I
now have in this area too.
This is all part of a generalized cleanup for text. Also will be removing
non-printing unnecessary characters and extended characters except for
accented characters.
Thanks again for coming to the rescue!
"Mike Williams" wrote:
> "Bee" <Bee@discussions.microsoft.com> wrote in message
> news:C10F4FCA-4D8E-428B-9332-DCE5B6766329@microsoft.com...
>
> > I am using this routine to replace characters in a large text
> > string. I would like to know how many replacements have
> > been made. This is a generalized routine for experimenting.
> > If I set lReplCount = 1, I can get a count but it takes over
> > one minute to run. If I set lReplCount = -1, it takes about
> > 20 mSec but I get no count (except 1). Is there any way to
> > get speed and the replace count? I would be happy with a
> > few seconds and an accurate count.
>
> If you want to know the number of replacements required then it would be
> faster to run through the string first using InString in a loop and
> modifying the Start parameter within the loop to each time point to the
> first character after the previously found substring. Then use a single
> Replace function to peform the actual replacements. That would be very
> quick. There are other more generalised solutions though, although before I
> come to that perhaps a few notes about your existing code might be in order.
>
> Firstly, in your code you are replacing multiple vbCr characters whereas
> most standard Windows .txt files (at least most that I've come across) use
> the character pair vbCrLf instead of just a vbCr character, so presumably
> you are working with a text file that does use just vbCr to signify a new
> line, of which of course there are many, but you do need to know (or include
> code to test) which kind of text file you are working with.
>
> Secondly, your existing loop code as it stands (although it is not a good
> way to do it) will replace each set of multiple vbCr with just a single
> vbCr, regardless of whether there are two of them together or three or four
> or five of them together or whatever, and the count you get in your code
> would reflect that. In fact that is probably what you want to do anyway, but
> you should bear in mind generally that a single use of the VB6 Replace
> function (not inside your loop) to replace vbCr & vbCr with vbCr will
> replace each pair of vbCr it comes across with a single vbCr, which is not
> the same thing. For example, using a single call to the VB6 Replace function
> on a string containing a consecutive group of five vbCr characters will
> replace the first pair in that set with just one and the second pair in that
> set with just one, leaving the fifth vbCr of the set alone because it is not
> itself part of another pair. In other words, a consecutive group of five
> vbCr will replaced by a consecutive group of three vbCr. A second call to
> the VB6 replace function will replace just the first pair of that group of
> three with a single vbCr, thereby effectively replacing the group of three
> with a group of two. So if, for example, five conecutive vbCr was the
> largest "group of vbCr" in the string then you would need to call the VB6
> Replace function three times (you can of course write code to do that for
> you so that it knows how many times it must call the Replace function). I've
> used vbCr in this example simply because that is what you are using
> yourself, but the same reasoning applies regardless of the string you are
> replacing.
>
> Anyway, regarding your specific request, here is some code I wrote a long
> time ago when I had VB5 (because VB5 does not have a native Replace
> function). It does a similar job to the VB6 Replace function, but it returns
> a count of the number of replacements it has made, so it should be fine for
> your own purposes. And of course it does it purely in code, because it does
> not actually use the VB6 Replace function (it wasn't there in VB5). You'll
> find that it is still pretty fast though, often as fast if not faster than a
> call to the VB6 Replace function, especially when run as a standard native
> code compiled exe, although I have not included anything to deal with
> strings containing stuff other than standard English text. There are even
> faster ways of doing it, especially once you start messing with array
> pointers and stuff. In fact, as far as I recall, some time after that there
> was a bit of a "speed test" in the VB newsgroup (clbvm I think) in which
> there was a sort of competition going to write the fastest Replace function.
> The Google Groups archive isn't what it used to be, but you may still find
> it out there somewhere. I think you'll find my own outine easily fast enough
> for your needs though. It will certainly kill the "few seconds" you said you
> would be happy with, that's for sure! By the way, the small Button Click
> example showing you how to call the function uses a stest string containing
> vbCr (as it appears you are using yourself in your own text string)
> although, as I have said, vbCrLf is more common in standard Windows .txt
> files.
>
> Mike
>
> Private Function ReplaceAll(ByRef txtWork As String, _
> ByVal txtFind As String, ByVal txtReplace As String, _
> ByVal compare As Long, ByRef replacecount As Long) _
> As String
> Dim txtNew As String, Ltxtnew As Long
> Dim Start As Long, nCounter As Long, n As Long
> Dim Lfind As Long, Lreplace As Long
> Dim OldPointer As Long, NewPointer As Long
> Dim txtTemp As String
> replacecount = 0
> Lfind = Len(txtFind)
> If Lfind < 1 Then
> ReplaceAll = txtWork
> Exit Function
> End If
> Lreplace = Len(txtReplace)
> ' the following temporary strings avoid the
> ' need to use vbTextCompare (which is very slow)
> If compare = vbTextCompare Then
> txtTemp = UCase(txtWork)
> txtFind = UCase(txtFind)
> Else
> txtTemp = txtWork
> End If
> '
> Start = 1 - Lfind
> Do
> Start = InStr(Start + Lfind, txtTemp, txtFind, 0)
> If Start <> 0 Then
> nCounter = nCounter + 1
> Else
> Exit Do
> End If
> Loop
> If nCounter < 1 Then
> ReplaceAll = txtWork
> Exit Function
> Else
> Ltxtnew = Len(txtWork) + nCounter * (Lreplace - Lfind)
> txtNew = Space$(Ltxtnew)
> Start = 1 - Lfind
> OldPointer = 1
> NewPointer = 1
> On Error Resume Next
> For n = 1 To nCounter
> Start = InStr(Start + Lfind, txtTemp, txtFind, 0)
> Mid$(txtNew, NewPointer, Start - OldPointer) = _
> Mid$(txtWork, OldPointer, Start - OldPointer)
> NewPointer = NewPointer + Start - OldPointer + Lreplace
> Mid$(txtNew, NewPointer - Lreplace, Lreplace) = txtReplace
> OldPointer = Start + Lfind
> Next n
> On Error GoTo 0
> If OldPointer <= Len(txtWork) Then
> Mid$(txtNew, NewPointer) = Mid$(txtWork, OldPointer)
> End If
> End If
> ReplaceAll = txtNew
> replacecount = nCounter
> End Function
>
> Private Sub Command2_Click()
> Dim s1 As String, nCount As Long
> s1 = "A small" & vbCr & vbCr & "test string"
> s1 = ReplaceAll(s1, vbCr & vbCr, vbCr, vbBinaryCompare, nCount)
> Print nCount & " replacements made"
> End Sub
>
>
>
>
>
>
> .
>
|
|
0
|
|
|
|
Reply
|
Utf
|
12/24/2009 4:43:02 PM
|
|
"Bee" <Bee@discussions.microsoft.com> wrote in message
news:CDBCC689-764E-4B77-874E-DA14C1B411F7@microsoft.com...
> I ran your code and was suprised how fast it ran.
> Very nice. My next is to try to adapt your code
> to the next phase which is to do single replacements
> of sequencese within the text.
Yes, it is easy to modify it so that it can do that, and to do other things
as well, but the VB6 Replace function already offers that specific
functionality if you should ever actually need it. Unless there is something
rather unusual you wish to do that might not be so easily achieved with the
VB6 Replace function then I would use the VB6 Replace function to do the
job, counting replacements if necessary in a fast separate Instr loop, as I
had mentioned in the very first part of my response and as was mentioned in
the code examples posted by Ralph and Rick. If you do end up wanting to do
unusual things that cannot easily be achieved with the VB6 Replace function
then you could modify my ReplaceAll function to very quickly do the job, but
I wouldn't really bother unless you have a pressing need to do so. The only
reason I offered my ReplaceAll function was because you asked for a fast way
of getting the VB6 Replace function to return a count of how many
replacements it actually made when asked to replace them all (and you also
said you wanted to do some experimenting) and so I offered my own ReplaceAll
function, which I wrote when I used VB5, partly because it is just as fast
as the VB6 Replace function and partly because it does actually return the
number of replacements it makes.
> It seems that the most slowness is concatenating strings
It certainly is, which is why my ReplaceAll function does not use
concatenation and why you would be well advised not to use concatenation in
any of your own code, at least not in cases where speed is of the essence.
> In my test text there are over 7000 of the vbCR vbCR vbLF
> groups that need to be replace
Not sure what you mean by that? Do you mean that in some of your text files
there actually are lots of the three character groupings vbCr vbCr vbLf, or
is that just a "typo" in your response? Or do you mean something else? All
things are possible of course (except the impossible!) but it isn't worth
elaborating until we know exactly what you have there. By the way, that
reminds me of a very nice advertising slogan I once heard somewhere, "The
improbable we do at once, the impossible takes a little longer!"
> . . . followed by cleaning up vbCR vbLF that are
> NOT at the end of a sentence.
I know what you mean. I've seen lots of text files in which either the
author has pressed the Enter key each time he gets close to the normal word
wrap position of the window he is typing into, or where the utility he is
using to type his stuff has annoyingly done that for him (many applications
do that, including Outlook Express news messages at typical standard
settings). However, whilst determing whether a vbCrLf does not occur
immediately after a full stop ("period" in Americanese) is in many cases a
reasonable indication that it is not at the end of a sentence, things are
not quite as simple as that and you need a bit of thought to get the code to
reliably detect whether a vbCrLf needs to be removed or not. It can be done
of course with fairly good reliability, as long as you take everything into
account.
I'm not sure what the precise contents of your text files are, nor precisely
how you wish to modify them, but personally if I were doing it I would be
inclined to leave alone all instances of where vbCrLf occurs twice in
sucession, because people often do that when writing text files when they
finish one paragraph and start another (such a pair of vbCrLf would occur at
the end of the word "account" in the above paragraph, for example, and it is
intentional and meant to be left that way. My own view would be that you
should leave such pairs alone, and replace only those where there are three
or more vbCrLf in succession (to remove possibly unwanted extra blank lines
between paragraphs).
I would then remove all single occurrences of vbCrLf where it appeared that
it was a vbCrLf which was annoyingly placed there by the application used to
originally write the text file (those that the application has used to "hard
wrap" a long sentence to the editing window). As I've said, determining
whether a vbCrLf is one of these that needs removing, or whether it is one
that the author actually intended to be there as a meaningful part of the
text (the end of a paragraph for example) needs a bit more code than a
simple "does it immediately follow a full stop" test, but in no case does it
require you to repeatedly set the number of replacements to just one in your
Replace function, and neither does it require you to repeatedly concatenate
substrings, either of which would slow your code down.
Instead, taking the overly simplified "is it immediately preceeded by a full
stop" method as a simple example, you can run through the entire text string
just once using an Instr loop looking for occurrences of vbCrLf and
examining the immediately preceeding character each time it finds one to
check whether it is a full stop or exclamation mark or whatever, and if it
is not a full stop then replace the vbCrLf with something else of the same
character length that would not normally appear in a text file, such as two
zeros, the "two character string" Chr(0) Chr(0) for example, using the Mid$
statement. The Mid$ statement (as opposed to the Mid$ function) is extremely
fast, because it alters specific characters in a string without altering the
length of the string and without moving it anywhere else. You could run
through a very large text file using an Instr loop in such a way in a small
number of milliseconds. Then all you need to do is use the standard VB6
Replace function to replace all occurrences of Chr(0) Chr(0) with a space
character, which is also extremely fast. Anyway, that's just the sort of
thing I would suggest and I offer it purely as food for thought.
Mike
|
|
0
|
|
|
|
Reply
|
Mike
|
12/24/2009 7:17:57 PM
|
|
<snip>
> character, which is also extremely fast. Anyway, that's just the sort of
> thing I would suggest and I offer it purely as food for thought.
>
I don't know about Bee, but I'm stuffed! ;-)
Good stuff Mike..
> Mike
>
|
|
0
|
|
|
|
Reply
|
Mike
|
12/24/2009 7:55:13 PM
|
|
"Mike B" <mDotByerley@VerizonDottieNettie> wrote in message
news:ueO6BMNhKHA.1648@TK2MSFTNGP05.phx.gbl...
> I don't know about Bee, but I'm stuffed! ;-) Good stuff Mike..
Thanks Mike. Nice to be appreciated. By the way, I realise that you don't
live in Australia but you sound like a well travelled person and I wonder if
you happen to know anything about Aboriginal Voodoo Rituals? The reason I
ask is that some time ago I "took the Mickey" out of Bill McCarthy by
publicly noting that he was probably almost thirty years younger than myself
and yet according to his MVP photo he was going prematurely bald, whereas
despite my age I had a reasonably complete head of hair, despite the fact
that the individual strands are quite thin. However, since that time my hair
has rapidly begun to disappear and it is now getting to the state where it
is easily possible to see right through it in a number of places, with a
suddenly fast receding hairline at the forehead and the temples. All this in
less than twelve months! This feels weird to me. I've managed to get to 66
years of age with a reasonably intact head of hair and yet suddenly, almost
immediately after "taking the Mickey" out of Bill McCarthy, it has started
to rapidly thin out and recede.
Is there a connection? Is Australian Aboriginal Voodo a real phenomenon, and
if so do you think that McCarthy knows how to do it?
Mike
p.s. As you might have gathered, it is Christmas Eve here in England at the
moment and I'm slightly inebriated, but the impending and apparently rapidly
approaching baldness after all these years is real, and I'm sure that
McCarthy is at the root of it! Incidentally, can any "baldies" here offer me
any words of encouragement, to ease my pain?
|
|
0
|
|
|
|
Reply
|
Mike
|
12/24/2009 9:03:28 PM
|
|
Not a typo. There are over 7000 of these vbCR vbCR vbLF groups. As copied
from a webpage. It seems that some webpages are loaded with this grouping.
Yes, I am filtering all non-printing characters.
And yes, appending is a drag.
So Replace is a better way.
I have also found some other API methods that might be worth a few hours of
fiddling but it looks like your ReplaceAll and VB Replace will do for now.
Just in case, when I get to a CRLF I back up until I find either a character
or a punctuation mark . ! or ? then call that an end of sentence. This after
filtering all non-printing characters (except for vbTab vbCR vbLF etc).
I may try moding the ReplaceAll and see what happens. Just so I have a
common Call.
Thanks again.
Merry and Happy, but not too much.
|
|
0
|
|
|
|
Reply
|
Utf
|
12/24/2009 10:02:01 PM
|
|
There is a Windows API call that produces output with vbCr, vbCr, vbLf
triplets like that... it is used to break the contents of a TextBox into the
individually displayed lines in it (at the word wrap positions). Perhaps
that is where your text came from. I would think just replacing these
triplets with a space might get you to a normally readable file. Give this
code a try.(it loads the entire file into a String variable and then
replaces all those triplets)...
Dim FileNum As Long, TotalFile As String, PathAndFileName As String
FileNum = FreeFile
Open PathAndFileName For Binary As #FileNum
TotalFile = Space(LOF(FileNum))
Get #FileNum, , TotalFile
Close #FileNum
TotalFile = Replace(TotalFile, vbCr & vbCrLf, " ")
FileNum = FreeFile
Open NewPathAndFileName For Output As #FileNum
Print #FileNum, TotalFile
Close #FileNum
Just assign the correct path and filenames to the PathAndFileName and
NewPathAndFileName variables and run this code... at the end, you will have
a new txt file that should read correctly. Give it a try and see.
--
Rick (MVP - Excel)
"Bee" <Bee@discussions.microsoft.com> wrote in message
news:4FBC36FA-945A-43A5-98BB-B893ADBB0985@microsoft.com...
> Not a typo. There are over 7000 of these vbCR vbCR vbLF groups. As
> copied
> from a webpage. It seems that some webpages are loaded with this
> grouping.
> Yes, I am filtering all non-printing characters.
> And yes, appending is a drag.
> So Replace is a better way.
> I have also found some other API methods that might be worth a few hours
> of
> fiddling but it looks like your ReplaceAll and VB Replace will do for now.
>
> Just in case, when I get to a CRLF I back up until I find either a
> character
> or a punctuation mark . ! or ? then call that an end of sentence. This
> after
> filtering all non-printing characters (except for vbTab vbCR vbLF etc).
>
> I may try moding the ReplaceAll and see what happens. Just so I have a
> common Call.
>
> Thanks again.
> Merry and Happy, but not too much.
>
|
|
0
|
|
|
|
Reply
|
Rick
|
12/24/2009 10:29:44 PM
|
|
|
12 Replies
285 Views
(page loaded in 0.281 seconds)
|