Replace Better

  • Follow


Vb6 Replace is fine for what it does and is fast but it has a strange 
operation.

If I specify Start then the string that is returned now begins at the start 
parameter.
Appending it way to slow.
So what fast way is there to do a replace at a specified location in a string?
The replace may be a vbNullstring, or more or fewer characters than the 
single or multiple character find string.

ReplaceStr(expression, find, replace[, start[, count[, compare]]])

Always returns the beginning and changed part and end part of the expression 
when start is supplied or not.

0
Reply Utf 12/25/2009 12:11:01 AM

"Bee" <Bee@discussions.microsoft.com> wrote in message
news:33DAF12E-CD12-4848-8198-688E59613373@microsoft.com...
> Vb6 Replace is fine for what it does and is fast but it has a strange
> operation.
>
> If I specify Start then the string that is returned now begins at the
start
> parameter.
> Appending it way to slow.
> So what fast way is there to do a replace at a specified location in a
string?
> The replace may be a vbNullstring, or more or fewer characters than the
> single or multiple character find string.
>
> ReplaceStr(expression, find, replace[, start[, count[, compare]]])
>
> Always returns the beginning and changed part and end part of the
expression
> when start is supplied or not.
>

Don't have code immediately at hand, but these hints should help.

Create a second temporary buffer - a byte array.
Make it the same size as the file (FileLen) plus a little extra.
Use Mike's suggestion to step through the original text looking for changes.
When found, and a substitute is determined, write the new string of
characters to the Temporary buffer using Memcopy.

Write out the temporary buffer as the new file.

-ralph


0
Reply Ralph 12/25/2009 1:02:09 AM

"Bee" <Bee@discussions.microsoft.com> schrieb im Newsbeitrag 
news:33DAF12E-CD12-4848-8198-688E59613373@microsoft.com...
> Vb6 Replace is fine for what it does and is fast but it has a strange
> operation.
>
> If I specify Start then the string that is returned now begins at the 
> start
> parameter.

Check at VBspeed. All functions here work the way you like it, and are 
faster than Vb6 Replace:
http://www.xbeat.net/vbspeed/c_Replace.htm

Don 

0
Reply Donald 12/25/2009 7:42:16 PM

The Mid *statement* is very fast. Tokenizing is also
surprisingly fast. Here's an "air code" sample that points
an integer array at an incoming string and runs a tokenizing
routine, building a new string with Mid as it goes. The sample
just looks for "***" and replaces it with "**". As you can see
from the code, you can run any number of replacement operations
based on character codes. The "***" can be editied to
replace CrCrLf with CrLf. At the same time you could replace
"ball" with "hat", "1" with "2", etc.

   Also note: The code as written allows for a new
string 1.5 times the size of the original. If you're
doing a lot of operations you might need to keep an eye
on the loop to make sure your new string doesn't need
to be reallocated.

'--------------------------- module code:

Private Type SAFEARRAY1
    cDims           As Integer
    fFeatures       As Integer
    cbElements      As Long
    cLocks          As Long
    pvData          As Long
    cElements       As Long
    lLbound         As Long
End Type
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr"
(ptr() As Any) As Long
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory"
(Destination As Any, Source As Any, ByVal length As Long)
Private Declare Sub ZeroMemory Lib "kernel32" Alias "RtlZeroMemory" (dst As
Any, ByVal NBytes As Long)

Public Function SReplace(StrToRep As String) As String
 Dim Lens As Long, LNew As Long, iPos As Long, iPosNew As Long, iPosStart As
Long, Len2 As Long
 Dim SA1 As SAFEARRAY1
 Dim iChar As Integer
 Dim A1() As Integer

  Lens = Len(StrToRep)
  LNew = Lens * 1.5

  SReplace = String$(LNew, 0)
     With SA1  '-- set up array for t1.text string.
       .cbElements = 2
       .cElements = Lens + 1
       .cDims = 1
       .pvData = StrPtr(StrToRep)
     End With
       CopyMemory ByVal VarPtrArray(A1), VarPtr(SA1), 4

   iPosNew = 1 '-- offset into new string
   iPos = 0

   Do While iPos < Lens
     iPosStart = iPos + 1
     iChar = A1(iPos)

     Select Case iChar
      Case 0
        Exit Do

      Case 42 'vbCR
        If A1(iPos + 1) = 42 Then
          If A1(iPos + 2) = 42 Then
             Mid(SReplace, iPosNew) = "**"
             iPosNew = iPosNew + 2
             iPos = iPos + 2
          End If
        End If

      Case Else
        Mid(SReplace, iPosNew) = Mid(StrToRep, (iPosStart), 1)
        iPosNew = iPosNew + 1
     End Select

     iPos = iPos + 1
  Loop

  ZeroMemory ByVal VarPtrArray(A1), 4
   Len2 = InStr(1, SReplace, Chr$(0))
   If Len2 > 1 Then SReplace = Left$(SReplace, (Len2 - 1))
End Function

'-------------------------- form code:
'--t1 is a multiline textbox

Private Sub Command1_Click()
 Dim s As String
Dim i2 As Long

  s = String$(10000, "A")
  For i2 = 3 To 100
    Mid(s, (i2 * 50)) = "***"
  Next

T1.Text = s
End Sub

Private Sub Command2_Click()
  Dim s As String, s2 As String
    s = T1.Text
    s2 = SReplace(s)
    T1.Text = s2
End Sub



0
Reply mayayana 12/25/2009 9:22:34 PM

Thanks all.  I now have several things to try.

"mayayana" wrote:

> The Mid *statement* is very fast. Tokenizing is also
> surprisingly fast. Here's an "air code" sample that points
> an integer array at an incoming string and runs a tokenizing
> routine, building a new string with Mid as it goes. The sample
> just looks for "***" and replaces it with "**". As you can see
> from the code, you can run any number of replacement operations
> based on character codes. The "***" can be editied to
> replace CrCrLf with CrLf. At the same time you could replace
> "ball" with "hat", "1" with "2", etc.
> 
>    Also note: The code as written allows for a new
> string 1.5 times the size of the original. If you're
> doing a lot of operations you might need to keep an eye
> on the loop to make sure your new string doesn't need
> to be reallocated.
> 
> '--------------------------- module code:
> 
> Private Type SAFEARRAY1
>     cDims           As Integer
>     fFeatures       As Integer
>     cbElements      As Long
>     cLocks          As Long
>     pvData          As Long
>     cElements       As Long
>     lLbound         As Long
> End Type
> Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr"
> (ptr() As Any) As Long
> Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory"
> (Destination As Any, Source As Any, ByVal length As Long)
> Private Declare Sub ZeroMemory Lib "kernel32" Alias "RtlZeroMemory" (dst As
> Any, ByVal NBytes As Long)
> 
> Public Function SReplace(StrToRep As String) As String
>  Dim Lens As Long, LNew As Long, iPos As Long, iPosNew As Long, iPosStart As
> Long, Len2 As Long
>  Dim SA1 As SAFEARRAY1
>  Dim iChar As Integer
>  Dim A1() As Integer
> 
>   Lens = Len(StrToRep)
>   LNew = Lens * 1.5
> 
>   SReplace = String$(LNew, 0)
>      With SA1  '-- set up array for t1.text string.
>        .cbElements = 2
>        .cElements = Lens + 1
>        .cDims = 1
>        .pvData = StrPtr(StrToRep)
>      End With
>        CopyMemory ByVal VarPtrArray(A1), VarPtr(SA1), 4
> 
>    iPosNew = 1 '-- offset into new string
>    iPos = 0
> 
>    Do While iPos < Lens
>      iPosStart = iPos + 1
>      iChar = A1(iPos)
> 
>      Select Case iChar
>       Case 0
>         Exit Do
> 
>       Case 42 'vbCR
>         If A1(iPos + 1) = 42 Then
>           If A1(iPos + 2) = 42 Then
>              Mid(SReplace, iPosNew) = "**"
>              iPosNew = iPosNew + 2
>              iPos = iPos + 2
>           End If
>         End If
> 
>       Case Else
>         Mid(SReplace, iPosNew) = Mid(StrToRep, (iPosStart), 1)
>         iPosNew = iPosNew + 1
>      End Select
> 
>      iPos = iPos + 1
>   Loop
> 
>   ZeroMemory ByVal VarPtrArray(A1), 4
>    Len2 = InStr(1, SReplace, Chr$(0))
>    If Len2 > 1 Then SReplace = Left$(SReplace, (Len2 - 1))
> End Function
> 
> '-------------------------- form code:
> '--t1 is a multiline textbox
> 
> Private Sub Command1_Click()
>  Dim s As String
> Dim i2 As Long
> 
>   s = String$(10000, "A")
>   For i2 = 3 To 100
>     Mid(s, (i2 * 50)) = "***"
>   Next
> 
> T1.Text = s
> End Sub
> 
> Private Sub Command2_Click()
>   Dim s As String, s2 As String
>     s = T1.Text
>     s2 = SReplace(s)
>     T1.Text = s2
> End Sub
> 
> 
> 
> .
> 
0
Reply Utf 12/25/2009 10:43:01 PM

A bit of an edit to make my air code slightly
less airy:

Public Function SReplace(StrToRep As String) As String
 Dim Lens As Long, LNew As Long, iPos As Long, iPosNew As Long, iPosStart As
Long, Len2 As Long
 Dim SA1 As SAFEARRAY1
 Dim iChar As Integer
 Dim A1() As Integer
 Dim Boo1 As Boolean

  Lens = Len(StrToRep)
  LNew = Lens * 1.5

  SReplace = String$(LNew, 0)
     With SA1  '-- set up array for t1.text string.
       .cbElements = 2
       .cElements = Lens + 1
       .cDims = 1
       .pvData = StrPtr(StrToRep)
     End With
       CopyMemory ByVal VarPtrArray(A1), VarPtr(SA1), 4

   iPosNew = 1 '-- offset into new string
   iPos = 0

   Do While iPos < Lens
     iPosStart = iPos + 1
     iChar = A1(iPos)

     Select Case iChar
      Case 0
        Exit Do

      Case 42
        If A1(iPos + 1) = 42 Then
          If A1(iPos + 2) = 42 Then
             Mid$(SReplace, iPosNew) = "**"
             iPosNew = iPosNew + 2
             iPos = iPos + 2
             Boo1 = True
          End If
        End If
          If Boo1 = False Then
             Mid$(SReplace, iPosNew) = Mid$(StrToRep, (iPosStart), 1)
             iPosNew = iPosNew + 1
          End If
        Boo1 = False
      Case Else
        Mid$(SReplace, iPosNew) = Mid$(StrToRep, (iPosStart), 1)
        iPosNew = iPosNew + 1
     End Select

     iPos = iPos + 1
  Loop

  ZeroMemory ByVal VarPtrArray(A1), 4
   Len2 = InStr(1, SReplace, Chr$(0))
   If Len2 > 1 Then SReplace = Left$(SReplace, (Len2 - 1))
End Function

'-------------------------------------

1) Added $ to all Mid calls. I've never been sure
about the Mid statement. It works as Mid$ but
is only documented as Mid. On the other hand,
the VB object browser doesn't show the Mid statement
at all. And MSDN, while listing the Mid and Mid$ function
but only the Mid statement, actually only has one page
pointed to by both Mid Function and Mid$ Function.
It's strange that the VB docs are so cavalier about
distinguishing between data types. I haven't tested
whether Mid$ statement might be slower than Mid
staement for some reason. It works, so I figure it's
the safer option for optimization.

2) At "Case 42" I realized that if a match was not
found then the character still needed to be dealt with.
My first version would have dropped out any * not
followed by two asterisks.

 (It can get tricky as the function gets more complex
and customized, but a simple Replace function would
avoid that problem. Also, with this method the incoming
string can be LCase-d to do a text-compare operation.
And of course it would be simple to add a counter.)

  I don't know if the method I'm using is the fastest,
but it was the best option I came up with when I
needed to write a complex string-rebuilding function.
By pointing an array at the incoming string one avoids
Asc/Chr conversions. Of course the new string could also
be built as an integer array, but that quickly becomes
unwieldy if there are long strings being replaced. In
other words, if you want to replace "a" with "b" it would
be easy, but if you want to replace "here" with "anywhere
in the world" it gets awkward.

  The VBSpeed samples are interesting, and Olaf's
code seems to be the clear winner there, but they
seem to be only testing short strings. And the results
vary so much for different calls that I wonder what
value the tests have. In real-world usage a few
microseconds difference is an absurd measure, especially
given that the function is seldom called more than a few
times. I don't see how those samples could be useful
unless they were repeated on a typical string of perhaps
100 KB.



0
Reply mayayana 12/26/2009 3:05:09 AM

"mayayana" <mayaXXyana@rcXXn.com> schrieb im Newsbeitrag
news:OHhzDgdhKHA.1236@TK2MSFTNGP04.phx.gbl...

>   The VBSpeed samples are interesting, and Olaf's
> code seems to be the clear winner there, but they
> seem to be only testing short strings. And the results
> vary so much for different calls that I wonder what
> value the tests have. In real-world usage a few
> microseconds difference is an absurd measure, especially
> given that the function is seldom called more than a few
> times. I don't see how those samples could be useful
> unless they were repeated on a typical string of perhaps
> 100 KB.

The practical reason I wrote this highly optimized
Replace-routine was serverside code for dynamically
"glued together" WebPage-snippets (templates), some
of them below 1 or 2kByte, but some of them larger
(10-30kByte).
Building up such an "ready to post back" Webpage involved
quite an amount of such replacements - and this was a few
years back in time, where a "fast CPU" was one of the
Pentium III-class (with only about 500MHz-1GHz clock-freq).

And the performance gain, compared with the builtin
VB6-Replace-function was really worth it.

Nowadays the faster CPUs with their larger caches
seem to make its usage obsolete, and you're right,
the Mid-statement is not all that bad in comparison,
just look at Jost Schwiders entries into the list - though
there's a factor 2-4 anyways (on average), compared with the
higher optimized (Integer-Array-mapped) string routines
which follow his contributions.

From a short test, I can measure about the same difference
of factor 3-4, if I time the performance of your posted
routine with my approach (corrected and tuned somewhat
by Guido Beckmann).

Factor 4, when tested on my older Pentium III 500MHz,
and reduced to about factor 3, when compared on a
modern CPU.

And that is comparing a generically working Replace-
routine with a currently "somewhat specialized"
code, which acts on a "limited input-range" currently.

Since you already work with the Integer-array-mapping,
it would be consequent, to work "entirely within array-space",
also for the "copy-over" of the appropriate parts.

You could speedup your routine by a reasonable amount,
doing so.

Here comes a Class (based on Guidos latest corrections),
which I've just adapted a bit, to work without the Typelib
he was using - so that the code is ready for copy and paste:

'***Into a Class
Option Explicit

Private Type SafeArray1D
  cDims       As Integer
  fFeatures   As Integer
  cbElements  As Long
  cLocks      As Long
  pvData      As Long
  cElements1d   As Long
  lLBound     As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, PSrc&, Optional ByVal cb& = 4)
Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, Optional PSrc& = 0, Optional ByVal cb& = 4)
Private Declare Sub RtlMoveMemory Lib "kernel32" _
  (dst As Any, src As Any, ByVal nBytes&)

Private Declare Function CharLowerBuffA& Lib "user32" _
  (lpsz As Any, ByVal cchLength&)
Private Declare Function CharLowerBuffW& Lib "user32" _
  (lpsz As Any, ByVal cchLength&)

Private aSrc%(), saSrc As SafeArray1D
Private aNew%(), saNew As SafeArray1D
Private aOld%(), saOld As SafeArray1D
Private aDst%(), saDst As SafeArray1D
Private aPosFnd&(), ubPosFnd&
Private aLowChars%(&H8000 To &H7FFF)

Friend Function Replace(Text As String, sOld As String, sNew As String, _
    Optional ByVal Start As Long = 1, _
    Optional ByVal Count As Long = 2147483647, _
    Optional ByVal Compare As VbCompareMethod = vbBinaryCompare _
  ) As String

Dim c&, i&, j&, cntCpy&
Dim cntFnd&, ptrSrc&, ptrDst&
Dim lenFnd&, lenSrc&, lenNew&, lenNewB&
Dim posFnd&, posOut&, posIn&
Dim ubFnd&, Fnd0%, fSameLen As Boolean

    lenSrc = Len(Text)
    lenNew = Len(sNew)
    lenFnd = Len(sOld)
    ubFnd = lenFnd - 1
    ptrSrc = StrPtr(Text)

    If lenSrc = 0 Then Exit Function
    If lenFnd = 0 Then Replace = Text: Exit Function
    If Start > 0 Then i = Start - 1

    saSrc.pvData = ptrSrc
    saOld.pvData = StrPtr(sOld)
    saNew.pvData = StrPtr(sNew)

    If lenFnd = lenNew Then
        fSameLen = True
        Replace = Text

        saDst.pvData = StrPtr(Replace)
'        ptrDst = StrPtr(Replace11)
'        saDst.pvData = ptrDst
    End If

    c = lenSrc - lenFnd
    If Compare = vbBinaryCompare Then

      Fnd0 = aOld(0)

      For i = i To c
          'Inline-Cascading for first Char
          If aSrc(i) <> Fnd0 Then i = i + 1: _
              If aSrc(i) <> Fnd0 Then i = i + 1: _
              If aSrc(i) <> Fnd0 Then i = i + 1: _
              If aSrc(i) <> Fnd0 Then i = i + 1: _
              If aSrc(i) <> Fnd0 Then i = i + 1: _
              If aSrc(i) <> Fnd0 Then i = i + 1: _
              If aSrc(i) <> Fnd0 Then i = i + 1: _
              If aSrc(i) <> Fnd0 Then GoTo loopNxt

          If i > c Then Exit For

          'Search all others
          j = ubFnd
          Do While j
              If aSrc(i + j) <> aOld(j) Then GoTo loopNxt
              j = j - 1
          Loop

          cntFnd = cntFnd + 1
          'Found at Position i (0 based)
          If fSameLen Then
            j = lenNew
            Do: j = j - 1: aDst(i + j) = aNew(j): Loop While j
          Else
            If cntFnd > ubPosFnd Then
              ubPosFnd = ubPosFnd + 512: ReDim Preserve aPosFnd(ubPosFnd)
            End If
            aPosFnd(cntFnd) = i * 2
          End If

          If cntFnd = Count Then Exit For
          i = i + ubFnd
loopNxt: Next i


    Else 'vbStringCompare

      Fnd0 = aLowChars(aOld(0))

      For i = i To c
          If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
              If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
              If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
              If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
              If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
              If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
              If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
              If aLowChars(aSrc(i)) <> Fnd0 Then GoTo loopNxt2

          If i > c Then Exit For

          'Search all others
          j = ubFnd
          Do While j
              If aLowChars(aSrc(i + j)) <> aLowChars(aOld(j)) Then
                GoTo loopNxt2
              End If
              j = j - 1
          Loop
          'Found at Position i (0 based)
          cntFnd = cntFnd + 1
          If fSameLen Then
            j = lenNew
            Do: j = j - 1: aDst(i + j) = aNew(j): Loop While j
          Else
            If cntFnd > ubPosFnd Then
              ubPosFnd = ubPosFnd + 512: ReDim Preserve aPosFnd(ubPosFnd)
            End If
            aPosFnd(cntFnd) = i * 2
          End If
          If cntFnd = Count Then Exit For
          i = i + ubFnd
loopNxt2: Next i
    End If

    'Generate Output
    If fSameLen Then Exit Function

    If cntFnd = 0 Then
      Replace = Text
    Else
      c = lenSrc + (lenNew - lenFnd) * cntFnd
      Replace = Space(c)
      ptrDst = StrPtr(Replace)
      saDst.pvData = ptrDst

      lenFnd = lenFnd * 2
      If lenNew Then
        lenNewB = lenNew * 2
        For i = 1 To cntFnd
          posFnd = aPosFnd(i)
          cntCpy = posFnd - posIn

          If cntCpy > 50 Then
            RtlMoveMemory ByVal saDst.pvData, ByVal saSrc.pvData, cntCpy
            saDst.pvData = saDst.pvData + cntCpy
          ElseIf cntCpy > 0 Then
            j = cntCpy \ 2
            Do: j = j - 1: aDst(j) = aSrc(j): Loop While j
            saDst.pvData = saDst.pvData + cntCpy
          End If

          posIn = posFnd + lenFnd
          saSrc.pvData = ptrSrc + posIn

          If lenNew > 50 Then
            RtlMoveMemory ByVal saDst.pvData, ByVal saNew.pvData, lenNewB
          Else
            j = lenNew
            Do: j = j - 1: aDst(j) = aNew(j): Loop While j
          End If
          saDst.pvData = saDst.pvData + lenNewB
        Next i
      Else
        For i = 1 To cntFnd
          posFnd = aPosFnd(i)
          cntCpy = posFnd - posIn

          If cntCpy > 50 Then
            RtlMoveMemory ByVal saDst.pvData, ByVal saSrc.pvData, cntCpy
            saDst.pvData = saDst.pvData + cntCpy
          ElseIf cntCpy > 0 Then
            j = cntCpy \ 2
            Do: j = j - 1: aDst(j) = aSrc(j): Loop While j
            saDst.pvData = saDst.pvData + cntCpy
          End If

          posIn = posFnd + lenFnd
          saSrc.pvData = ptrSrc + posIn
        Next i
      End If

      c = lenSrc * 2 - posIn
      If c > 50 Then
        RtlMoveMemory aDst(0), aSrc(0), c
      ElseIf c > 0 Then
        c = c \ 2
        Do: c = c - 1: aDst(c) = aSrc(c): Loop While c
      End If
    End If
End Function


Private Sub Class_Initialize()
    Dim c&

    ubPosFnd = 512: ReDim aPosFnd(ubPosFnd)

    saSrc.cDims = 1
    saSrc.cbElements = 2
    saSrc.cElements1d = &H7FFFFFFF

    saNew = saSrc
    saOld = saSrc
    saDst = saSrc

    BindArray aSrc, VarPtr(saSrc)
    BindArray aNew, VarPtr(saNew)
    BindArray aOld, VarPtr(saOld)
    BindArray aDst, VarPtr(saDst)

    For c = -32768 To 32767: aLowChars(c) = c: Next c
    If CharLowerBuffW(aLowChars(-32768), &H10000) = 0 Then
      CharLowerBuffA aLowChars(65), (223 - 65) * 2
    End If

    ' added by donald, 20011210
    ' patch the stooges
    ' S 138/352   s 154/353
    ' O 140/338   o 156/339
    ' Z 142/381   z 158/382
    ' Y 159/376   � 255/255
    aLowChars(352) = 353
    aLowChars(338) = 339
    aLowChars(381) = 382
    aLowChars(376) = 255
End Sub

Private Sub Class_Terminate()
    ReleaseArray aSrc
    ReleaseArray aNew
    ReleaseArray aOld
    ReleaseArray aDst
End Sub




0
Reply Schmidt 12/26/2009 6:53:59 AM

  Thank you. That's very informative and good to know
for future reference.




0
Reply mayayana 12/26/2009 2:43:34 PM

"Donald Lessau" <don@oflex.com> wrote:

>Check at VBspeed. All functions here work the way you like it, and are 
>faster than Vb6 Replace:

It's funny.  Last night I tried several methods of replacing unwanted 
characters in a text file.  I used Replace and InStr/Mid$, both as text and 
using the Asc function.  I also tried doing a Select Case routine.

What I found was that a For Next loop using Replace was the fastest for 
eliminating characters in a text file.

What I did was write a bunch of routines all with the same basic things such 
as starting a timer and adding to a list box.  Except for the string 
replacement part all were identical.  I called them from a Command button and 
the results showed on my screen as they ran.  The extra Dims were in there 
because they were used in other routines and I wanted to be sure I was 
doing the exact same thing each time in each routine except for the actual 
replacement.

NOTE:  I replaced characters < "A" only because I wanted to replace a bunch of 
stuff in a typical text file, not because I plan to use this exact replacement 
scenario.  The Text1.Text box held 28k of text I copied from a text file. 

I think the most important parts are to put the entire text into a variable 
and to use Replace to replace the contents of that variable, that is b$ = the 
new b$.  I'm assuming that VB is not creating a new memory location for b$, 
but is reusing the space.  This probably accounts for the speed.  

This was the winning routine:

Sub MyReplace()
    Dim i%, j%, a$, b$, timex
    timex = Timer
    b$ = Text1.Text
    For j% = 1 To 64
        b$ = Replace(b$, Chr$(j%), "")
    Next
    For j% = 128 To 255
        b$ = Replace(b$, Chr$(j%), "")
    Next
    List1.AddItem CDec(Timer - timex)
End Sub

There was one thing faster and that was a line by line Replace function for 
each and every character.  That was so fast that the timer couldn't even 
register a change.

0
Reply sfdavidkaye2 12/27/2009 12:28:46 AM

"David Kaye" <sfdavidkaye2@yahoo.com> wrote

> I think the most important parts are to put the entire text into a variable
> and to use Replace to replace the contents of that variable, that is b$ = the
> new b$.  I'm assuming that VB is not creating a new memory location for b$,
> but is reusing the space.  This probably accounts for the speed.

That couldn't happen if the replacement was larger than what was replaced.

You have to remember that Replace is to be used in a wide variety of situations.
If the scope of your problem is very limited, you should be able to beat
VB's Replace statement hands down using knowlege of the particular situation.

See what you get for this method which does the same job as your post:

Function Scrub(Text As String) As String
Dim inc() As Byte
Dim txt() As Byte
Dim src As Long, dst As Long

  txt = Text
  inc = StrConv(String$(32, 0) & String$(96, 2) & Chr$(0), vbFromUnicode)
  Do While src < UBound(txt)
    txt(dst) = txt(src)
    src = src + 2
    dst = dst + inc(txt(dst))
  Loop
  Do While dst < UBound(txt)
    txt(dst) = 32
    dst = dst + 2
  Loop
  Scrub = Trim$(txt)
End Function



0
Reply Larry 12/27/2009 7:30:23 AM

"Larry Serflaten" <serflaten@usinternet.com> wrote:

>That couldn't happen if the replacement was larger than what was replaced.
>

Okay, I changed things a bit.  Instead of copying from Text1.Text, I opened a 
file for read access and input a blob of 80k from a text file.  So, now b$ is 
80k instead of 28k.  Using the routine I showed originally, it takes 0.091+ 
seconds (that is 9/100ths of a second) to replace unwanted characters with 
empty strings ("").

Interesting.  Well, I tried replacing chr$(j%) with chr$(j%) & chr$(j%), 
thereby adding one byte to each replacement, and one would think forcing VB 
to make another copy of b$ somewhere.  Oddly enough, the speed does not go up 
significantly. It changes to 0.093+.  Amazing.  

In my situation, Replace is not slow at all.

Let me try your example and see how it works.

0
Reply sfdavidkaye2 12/27/2009 8:14:34 AM

"Larry Serflaten" <serflaten@usinternet.com> wrote:

>See what you get for this method which does the same job as your post:
>
>Function Scrub(Text As String) As String
>Dim inc() As Byte
>Dim txt() As Byte
>Dim src As Long, dst As Long
etc


The function takes about half the time of my Replace function, but it's not 
doing the exact same thing.  I'll try it doing exactly the same and see if 
it's faster.  It does show promise, though.

0
Reply sfdavidkaye2 12/27/2009 8:26:36 AM

"Larry Serflaten" <serflaten@usinternet.com> wrote:

>See what you get for this method which does the same job as your post:
[....]
>  inc = StrConv(String$(32, 0) & String$(96, 2) & Chr$(0), vbFromUnicode)
[....]

I'm a bit confused about what the above string conversion does.  It appears to 
define a variable called "inc" as a Unicode string containing 32 bytes of a 
null string, 96 bytes of ASCII 2, and a single byte of the nullstring and 
convert the whole mess into plain ASCII.  

But then "inc" appears to be used later as a function.  

Can you explain this to me?

0
Reply sfdavidkaye2 12/27/2009 9:31:21 AM

"David Kaye" <sfdavidkaye2@yahoo.com> wrote in message 
news:hh79h9$rn9$3@news.eternal-september.org...
> "Larry Serflaten" <serflaten@usinternet.com> wrote:
>
>>See what you get for this method which does the same job as your post:
> [....]
>>  inc = StrConv(String$(32, 0) & String$(96, 2) & Chr$(0), vbFromUnicode)
> [....]
>
> I'm a bit confused about what the above string conversion does.  It 
> appears to
> define a variable called "inc" as a Unicode string containing 32 bytes of 
> a
> null string, 96 bytes of ASCII 2, and a single byte of the nullstring and
> convert the whole mess into plain ASCII.
>
> But then "inc" appears to be used later as a function.
>
> Can you explain this to me?

"inc" is a byte array. See the declaration. It's short for "increment". 
Larry used it to specify increment values, so for byte value 0-31, nothing 
is incremented so characters in that range are ignored, and for byte value 
32 to 127(96 characters), the count is incremented by 2 bytes, which is the 
size of one Unicode character. Essentially, it's another way of writing this 
code:

If x >= 32 And x <=128 Then
    dst = dst + 2
End If

The code doesn't accept byte values 128 to 255, and a runtime error would be 
generated in this case. You can avoid that by changing 96 to 224.


0
Reply Nobody 12/27/2009 10:39:21 AM

"David Kaye" <sfdavidkaye2@yahoo.com> schrieb im Newsbeitrag
news:hh69nu$ugl$1@news.eternal-september.org...

[String-Cleanup (advanced Trim-functionality)]
> NOTE:  I replaced characters < "A" only because
> I wanted to replace a bunch of stuff in a typical text file, ...

Careful with the range below Asc("A") - since you would also
cleanup all "Number-Digits" and many "wanted punctuations"
.... not sure, if that was your intent.

> This was the winning routine:
>
> Sub MyReplace()
>     Dim i%, j%, a$, b$, timex
>     timex = Timer
>     b$ = Text1.Text
>     For j% = 1 To 64
>         b$ = Replace(b$, Chr$(j%), "")
>     Next
>     For j% = 128 To 255
>         b$ = Replace(b$, Chr$(j%), "")
>     Next
>     List1.AddItem CDec(Timer - timex)
> End Sub

Repeated calls to Replace (scanning the string
over and over again, to replace different single-chars)
is "horribly inefficient" ... ;-)
Larrys approach with the lookup-array for the increments
is a nice (and much better) idea - but as Nobody also pointed
out - a direct checking per If ... or defining the replace-char-ranges
in a "Select Case Line" works also very well and fast - and can be
adapted easily as well to other ranges.

Here comes a Sub, which does some cleanup (all chars
between Asc 0-32 are removed) - directly on (within) the
given Input-String - thereby avoiding *any* additional
memory-allocations in the routine, which is, what Larrys
routine slows down somewhat (about factor 3 in comparison).

You will need to compile it to native-code (all advanced options
set), to get "full-speed" - best to place such stuff in a Dll-Class -
in that case (having the Routine in a Class) - you could move
the aSrc/saSrc array-mapping-pair out of the routine - and
declare it at the class-level (Binding and releasing could then
be done in Class-Initialize/Terminate).


'***Into a Form, then click the Form
Option Explicit

Private Type SafeArray1D
  cDims       As Integer
  fFeatures   As Integer
  cbElements  As Long
  cLocks      As Long
  pvData      As Long
  cElements1d As Long
  lLBound1d   As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, PSrc&, Optional ByVal cb& = 4)
Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, Optional PSrc& = 0, Optional ByVal cb& = 4)
Private Declare Sub RtlMoveMemory Lib "kernel32" _
  (dst As Any, src As Any, ByVal nBytes&)


Private Declare Function QueryPerformanceFrequency& Lib "kernel32" (x@)
Private Declare Function QueryPerformanceCounter& Lib "kernel32" (x@)


Private Sub Form_Click()
Dim i&, T#, S$
  S = "   abc" & vbTab & "123" & vbCrLf & "ABC" & Chr(0) & "123  "
  For i = 1 To 16
    S = S & S 'results in an about 1.2MB test-string
  Next i

  T = HPTimer
     Cleanup S
  Caption = CLng((HPTimer - T) * 1000)
End Sub

Sub Cleanup(Text As String)
Dim i&, j&, aSrc%(), saSrc As SafeArray1D
  saSrc.cDims = 1
  saSrc.cbElements = 2 'the width of an 16Bit-Integer
  saSrc.cElements1d = Len(Text) + 2 'two more, to reflect the LBound
  saSrc.lLBound1d = -2 'include the 4 Len-Info-Bytes of the BSTR
  saSrc.pvData = StrPtr(Text) - 4 'adapt to the real start of the BSTR

  If saSrc.cElements1d = 2 Then Exit Sub 'nothing to replace

  BindArray aSrc, VarPtr(saSrc)

    For i = 0 To UBound(aSrc)
      Select Case aSrc(i)
        Case 0 To 32 '<-- define the ignored Char-ranges here...
        Case Else: aSrc(j) = aSrc(i): j = j + 1
      End Select
    Next i
    RtlMoveMemory aSrc(-2), CLng(j + j), 4 'adjust the new Len-Info

  ReleaseArray aSrc
End Sub

Private Function HPTimer#()
Dim x@: Static Frq@
  If Frq = 0 Then QueryPerformanceFrequency Frq
  If QueryPerformanceCounter(x) Then HPTimer = x / Frq
End Function

Olaf


0
Reply Schmidt 12/27/2009 12:42:29 PM

"David Kaye" <sfdavidkaye2@yahoo.com> wrote in message 
news:hh751a$rn9$1@news.eternal-september.org...

> [1] I think the most important parts are to put the entire text
> into a variable and to use Replace to replace the contents of
> that variable, that is b$ = the new b$.  I'm assuming that VB
> is not creating a new memory location for b$, but is reusing
> the space.
> [2] Well, I tried replacing chr$(j%) with chr$(j%) & chr$(j%),
> thereby adding one byte to each replacement, and one would
> think forcing VB to make another copy of b$ somewhere.
> Oddly enough, the speed does not go up significantly.
> It changes to 0.093+.  Amazing.

Actually I think VB itself almost always creates a new string in a new area 
of memory, even for simple things. For example, if you have:

    Dim s1 As String, s2 As String
    s1 = Space$(10000000)

.. . . and if you then time a simple assignment you will probably find that 
s1 = s1 takes the same time as s2 = s1.

Mike



0
Reply Mike 12/27/2009 1:02:27 PM

"Schmidt" <sss@online.de> wrote in message 
news:uisoAOvhKHA.5608@TK2MSFTNGP05.phx.gbl...

> RtlMoveMemory aSrc(-2), CLng(j + j), 4 'adjust the new Len-Info

Hi Olaf. I'd often thought of doing that myself (reducing the recorded 
String length behind VB's back) for various jobs I've tackled but I've 
always worried about any detrimental effects it might have when VB performs 
its garbage collection and I wondered whether there is a risk of leaving 
chunks of memory unused but unavailable. Do you know if there are actually 
any problems in that area, or if it is actually okay to do it?

Mike



0
Reply Mike 12/27/2009 1:27:33 PM

"Mike Williams" <Mike@WhiskyAndCoke.com> schrieb im Newsbeitrag
news:%23lNvThvhKHA.1824@TK2MSFTNGP04.phx.gbl...
>
> "Schmidt" <sss@online.de> wrote in message
> news:uisoAOvhKHA.5608@TK2MSFTNGP05.phx.gbl...
>
> > RtlMoveMemory aSrc(-2), CLng(j + j), 4 'adjust the new Len-Info
>
> I'd often thought of doing that myself (reducing the recorded
> String length behind VB's back) for various jobs I've tackled
> but I've always worried about any detrimental effects
> it might have when VB performs its garbage collection and
> I wondered whether there is a risk of leaving chunks of memory
> unused but unavailable. Do you know if there are actually
> any problems in that area, or if it is actually okay to do it?

Good you asked that - since the above line really looks
"suspicious" and needs explanation - but after all, all the
VB-Strings are allocated on the heap - and freed after going
"out of scope" with the normal heap-pointer-based freeing-
methods of the system (StrPtr(S) - 4 is the heap-pointer).
These freeing-methods have not the slightest interests,
regarding what is placed as the LenB-info in the first 4Bytes
of a (formerly) allocated "heap-area" (the BString).

The only mistake one can make is, to put a LenB-info
into the first 4 bytes, which exceeds the allocated range
(whilst working with such a manipulated string).

Aside from that, one can use that "trick" for example,
to write a very fast word-parser, passing "normal looking"
String-Parameters into SubRoutines as e.g.:
ProcessFoundWord(NewWord As String)

....where NewWord is such a "large enough defined"
String (defined as e.g. 16kByte String-buffer at class-level,
having an integer-array-mapping on it),
which is then "shortened" dynamically, by setting the
aMapIntegerArr(-2) member, to the LenB of the currently
found word (after copying the few char-integers which
make up this word, over from the input-source-array).
Meaning, one can "translate" from "array-space" into
"string-space" very fast, using this technique on a
predefined, large enough StringBuffer (declared in a *.bas
or at Class-Level) - thereby avoiding all these many
small String-allocations, to feed eventual "SubParsing-routines"
with "VB-String-parts".

I'm using that for a long time now, and never found any
"weirdness".

Oh, and BTW - VB does not have a garbage-collector
(in the "decoupled  working" sense) - VB-"Heap-Variables"
(be it Objects or Strings or Arrays) are released (heap-freed)
as soon as they go out of their defined scope ...
e.g. on procedure-exit (if declared at routine-level).

And another note with regards to the SafeArray-mapping-
technique, which is useful not only on BSTR-content...

Most of these "optimized routines", be it on Pixel-Content
or on String-content, are compiled with "all advanced"
compiler options, to fulfill the "speed-promise" which one
expects by using this stuff.

Checking *all* advanced compiler-options works well with
that in nearly all cases (don't panic <g>), the critical point
comes in, if one is using a "dynamic switching" of the
safeArray.pvData member *within* a routine (to span the
array in question over another memory-range of a different
"heap-variable").

In that case (switching the pvData-pointer within the
same routine to other areas) one needs to deactivate (uncheck)
the first advanced compiler-option (the aliasing-option), to
be on the safe side with the "safe-array-stuff".

Maybe as general rule: as soon as safe-array-mappings are used,
always leave the first option deactivated - this will cost only
3-5% performance in some routines (not in each and every one).

Olaf



0
Reply Schmidt 12/27/2009 2:30:56 PM

"Schmidt" <sss@online.de> wrote in message 
news:O3dVnKwhKHA.1824@TK2MSFTNGP04.phx.gbl...

> Good you asked that - since the above line really
> look "suspicious" and needs explanation - but after
> all, all the VB-Strings are allocated on the heap - and
> freed after going "out of scope" with the normal
> heap-pointer-based freeing- methods of the system
> (StrPtr(S) - 4 is the heap-pointer). These freeing-methods
> have not the slightest interests, regarding what is placed
> as the LenB-info in the first 4Bytes of a (formerly)
> allocated "heap-area" (the BString).

Thanks Olaf. That's cleared up my concerns regarding reducing the length 
info.

Mike



0
Reply Mike 12/27/2009 3:49:51 PM

"David Kaye" <sfdavidkaye2@yahoo.com> wrote
> "Larry Serflaten" <serflaten@usinternet.com> wrote:
>
> >See what you get for this method which does the same job as your post:
> [....]
> >  inc = StrConv(String$(32, 0) & String$(96, 2) & Chr$(0), vbFromUnicode)
> [....]
>
> I'm a bit confused about what the above string conversion does.  It appears to
> define a variable called "inc" as a Unicode string containing 32 bytes of a
> null string, 96 bytes of ASCII 2, and a single byte of the nullstring and
> convert the whole mess into plain ASCII.
>
> But then "inc" appears to be used later as a function.
>
> Can you explain this to me?


I just wanted to keep the code short.  As shown inc was declared as a
dynamic Byte array.  The part that StrConv replaces would look similar to:

ReDim inc(0 To 255) As Byte
For idx = 32 to 126
    inc(idx) = 2
Next

Inc is basically a second array to indicate which characters are acceptable
and which are not. For those characters out of the 32 to 126 range, the
dst 'pointer' would not be incremented:

    txt(dst) = txt(src)
    src = src + 2
    dst = dst + inc(txt(dst))

The input text was converted to a dynamic Byte array (which for very large
strings can be a significant delay) assigned to txt. The src variable visits
every other element (because strings are Unicode), and assigns the value
back to the txt array at the dst location.  [ txt(dst) = txt(src) ]

src is then incremented by 2 in preparation for the next loop, and dst
is either left where it is or is incremented by 2 depending on the value
in the inc array.  [ dst = dst + inc(txt(dst)) ]

inc has 0 at character offsets that are ignored, and 2 at character offsets
that are accepted such that dst is only increment when txt(dst) is an
acceptable character.

You were correct to say that is did not do the exact same thing, your
routine convered Asc values from 0 to 255 while my earlier post only
covered Asc values from 0 to 127.  Nobody posted a fix that would
extend the inc array out to cover those I left out.  For example:

  inc = StrConv(String$(32, 0) & String$(96, 2) & String(128, 0), vbFromUnicode)

Which that change, it should reproduce the results you get with your routine.

LFS


0
Reply Larry 12/27/2009 5:25:40 PM

Based on what I have learned from this discussion, for my current needs, it 
seems that working with a byte array is faster to do the scan and replace 
operations.
So now I am looking for a fast byte array replace that will replace longer 
(expand byte array) or shorter (shorten byte array) routines.
Converting from string to byte array and back is done only at the beginning 
and end of the scan and replace op that I need to do.  And those conversion 
ops seem plenty fast like 2 mSec on my PC.

Thank you for the great learning experience.
This has been highly educational.


"Schmidt" wrote:

> 
> "mayayana" <mayaXXyana@rcXXn.com> schrieb im Newsbeitrag
> news:OHhzDgdhKHA.1236@TK2MSFTNGP04.phx.gbl...
> 
> >   The VBSpeed samples are interesting, and Olaf's
> > code seems to be the clear winner there, but they
> > seem to be only testing short strings. And the results
> > vary so much for different calls that I wonder what
> > value the tests have. In real-world usage a few
> > microseconds difference is an absurd measure, especially
> > given that the function is seldom called more than a few
> > times. I don't see how those samples could be useful
> > unless they were repeated on a typical string of perhaps
> > 100 KB.
> 
> The practical reason I wrote this highly optimized
> Replace-routine was serverside code for dynamically
> "glued together" WebPage-snippets (templates), some
> of them below 1 or 2kByte, but some of them larger
> (10-30kByte).
> Building up such an "ready to post back" Webpage involved
> quite an amount of such replacements - and this was a few
> years back in time, where a "fast CPU" was one of the
> Pentium III-class (with only about 500MHz-1GHz clock-freq).
> 
> And the performance gain, compared with the builtin
> VB6-Replace-function was really worth it.
> 
> Nowadays the faster CPUs with their larger caches
> seem to make its usage obsolete, and you're right,
> the Mid-statement is not all that bad in comparison,
> just look at Jost Schwiders entries into the list - though
> there's a factor 2-4 anyways (on average), compared with the
> higher optimized (Integer-Array-mapped) string routines
> which follow his contributions.
> 
> From a short test, I can measure about the same difference
> of factor 3-4, if I time the performance of your posted
> routine with my approach (corrected and tuned somewhat
> by Guido Beckmann).
> 
> Factor 4, when tested on my older Pentium III 500MHz,
> and reduced to about factor 3, when compared on a
> modern CPU.
> 
> And that is comparing a generically working Replace-
> routine with a currently "somewhat specialized"
> code, which acts on a "limited input-range" currently.
> 
> Since you already work with the Integer-array-mapping,
> it would be consequent, to work "entirely within array-space",
> also for the "copy-over" of the appropriate parts.
> 
> You could speedup your routine by a reasonable amount,
> doing so.
> 
> Here comes a Class (based on Guidos latest corrections),
> which I've just adapted a bit, to work without the Typelib
> he was using - so that the code is ready for copy and paste:
> 
> '***Into a Class
> Option Explicit
> 
> Private Type SafeArray1D
>   cDims       As Integer
>   fFeatures   As Integer
>   cbElements  As Long
>   cLocks      As Long
>   pvData      As Long
>   cElements1d   As Long
>   lLBound     As Long
> End Type
> 
> Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
>   (PArr() As Any, PSrc&, Optional ByVal cb& = 4)
> Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _
>   (PArr() As Any, Optional PSrc& = 0, Optional ByVal cb& = 4)
> Private Declare Sub RtlMoveMemory Lib "kernel32" _
>   (dst As Any, src As Any, ByVal nBytes&)
> 
> Private Declare Function CharLowerBuffA& Lib "user32" _
>   (lpsz As Any, ByVal cchLength&)
> Private Declare Function CharLowerBuffW& Lib "user32" _
>   (lpsz As Any, ByVal cchLength&)
> 
> Private aSrc%(), saSrc As SafeArray1D
> Private aNew%(), saNew As SafeArray1D
> Private aOld%(), saOld As SafeArray1D
> Private aDst%(), saDst As SafeArray1D
> Private aPosFnd&(), ubPosFnd&
> Private aLowChars%(&H8000 To &H7FFF)
> 
> Friend Function Replace(Text As String, sOld As String, sNew As String, _
>     Optional ByVal Start As Long = 1, _
>     Optional ByVal Count As Long = 2147483647, _
>     Optional ByVal Compare As VbCompareMethod = vbBinaryCompare _
>   ) As String
> 
> Dim c&, i&, j&, cntCpy&
> Dim cntFnd&, ptrSrc&, ptrDst&
> Dim lenFnd&, lenSrc&, lenNew&, lenNewB&
> Dim posFnd&, posOut&, posIn&
> Dim ubFnd&, Fnd0%, fSameLen As Boolean
> 
>     lenSrc = Len(Text)
>     lenNew = Len(sNew)
>     lenFnd = Len(sOld)
>     ubFnd = lenFnd - 1
>     ptrSrc = StrPtr(Text)
> 
>     If lenSrc = 0 Then Exit Function
>     If lenFnd = 0 Then Replace = Text: Exit Function
>     If Start > 0 Then i = Start - 1
> 
>     saSrc.pvData = ptrSrc
>     saOld.pvData = StrPtr(sOld)
>     saNew.pvData = StrPtr(sNew)
> 
>     If lenFnd = lenNew Then
>         fSameLen = True
>         Replace = Text
> 
>         saDst.pvData = StrPtr(Replace)
> '        ptrDst = StrPtr(Replace11)
> '        saDst.pvData = ptrDst
>     End If
> 
>     c = lenSrc - lenFnd
>     If Compare = vbBinaryCompare Then
> 
>       Fnd0 = aOld(0)
> 
>       For i = i To c
>           'Inline-Cascading for first Char
>           If aSrc(i) <> Fnd0 Then i = i + 1: _
>               If aSrc(i) <> Fnd0 Then i = i + 1: _
>               If aSrc(i) <> Fnd0 Then i = i + 1: _
>               If aSrc(i) <> Fnd0 Then i = i + 1: _
>               If aSrc(i) <> Fnd0 Then i = i + 1: _
>               If aSrc(i) <> Fnd0 Then i = i + 1: _
>               If aSrc(i) <> Fnd0 Then i = i + 1: _
>               If aSrc(i) <> Fnd0 Then GoTo loopNxt
> 
>           If i > c Then Exit For
> 
>           'Search all others
>           j = ubFnd
>           Do While j
>               If aSrc(i + j) <> aOld(j) Then GoTo loopNxt
>               j = j - 1
>           Loop
> 
>           cntFnd = cntFnd + 1
>           'Found at Position i (0 based)
>           If fSameLen Then
>             j = lenNew
>             Do: j = j - 1: aDst(i + j) = aNew(j): Loop While j
>           Else
>             If cntFnd > ubPosFnd Then
>               ubPosFnd = ubPosFnd + 512: ReDim Preserve aPosFnd(ubPosFnd)
>             End If
>             aPosFnd(cntFnd) = i * 2
>           End If
> 
>           If cntFnd = Count Then Exit For
>           i = i + ubFnd
> loopNxt: Next i
> 
> 
>     Else 'vbStringCompare
> 
>       Fnd0 = aLowChars(aOld(0))
> 
>       For i = i To c
>           If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
>               If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
>               If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
>               If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
>               If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
>               If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
>               If aLowChars(aSrc(i)) <> Fnd0 Then i = i + 1: _
>               If aLowChars(aSrc(i)) <> Fnd0 Then GoTo loopNxt2
> 
>           If i > c Then Exit For
> 
>           'Search all others
>           j = ubFnd
>           Do While j
>               If aLowChars(aSrc(i + j)) <> aLowChars(aOld(j)) Then
>                 GoTo loopNxt2
>               End If
>               j = j - 1
>           Loop
>           'Found at Position i (0 based)
>           cntFnd = cntFnd + 1
>           If fSameLen Then
>             j = lenNew
>             Do: j = j - 1: aDst(i + j) = aNew(j): Loop While j
>           Else
>             If cntFnd > ubPosFnd Then
>               ubPosFnd = ubPosFnd + 512: ReDim Preserve aPosFnd(ubPosFnd)
>             End If
>             aPosFnd(cntFnd) = i * 2
>           End If
>           If cntFnd = Count Then Exit For
>           i = i + ubFnd
> loopNxt2: Next i
>     End If
> 
>     'Generate Output
>     If fSameLen Then Exit Function
> 
>     If cntFnd = 0 Then
>       Replace = Text
>     Else
>       c = lenSrc + (lenNew - lenFnd) * cntFnd
>       Replace = Space(c)
>       ptrDst = StrPtr(Replace)
>       saDst.pvData = ptrDst
> 
>       lenFnd = lenFnd * 2
>       If lenNew Then
>         lenNewB = lenNew * 2
>         For i = 1 To cntFnd
>           posFnd = aPosFnd(i)
>           cntCpy = posFnd - posIn
> 
>           If cntCpy > 50 Then
>             RtlMoveMemory ByVal saDst.pvData, ByVal saSrc.pvData, cntCpy
>             saDst.pvData = saDst.pvData + cntCpy
>           ElseIf cntCpy > 0 Then
>             j = cntCpy \ 2
>             Do: j = j - 1: aDst(j) = aSrc(j): Loop While j
>             saDst.pvData = saDst.pvData + cntCpy
>           End If
> 
>           posIn = posFnd + lenFnd
>           saSrc.pvData = ptrSrc + posIn
> 
>           If lenNew > 50 Then
>             RtlMoveMemory ByVal saDst.pvData, ByVal saNew.pvData, lenNewB
>           Else
>             j = lenNew
>             Do: j = j - 1: aDst(j) = aNew(j): Loop While j
>           End If
>           saDst.pvData = saDst.pvData + lenNewB
>         Next i
>       Else
>         For i = 1 To cntFnd
>           posFnd = aPosFnd(i)
>           cntCpy = posFnd - posIn
> 
>           If cntCpy > 50 Then
>             RtlMoveMemory ByVal saDst.pvData, ByVal saSrc.pvData, cntCpy
>             saDst.pvData = saDst.pvData + cntCpy
>           ElseIf cntCpy > 0 Then
>             j = cntCpy \ 2
>             Do: j = j - 1: aDst(j) = aSrc(j): Loop While j
>             saDst.pvData = saDst.pvData + cntCpy
>           End If
> 
>           posIn = posFnd + lenFnd
>           saSrc.pvData = ptrSrc + posIn
>         Next i
>       End If
> 
>       c = lenSrc * 2 - posIn
>       If c > 50 Then
>         RtlMoveMemory aDst(0), aSrc(0), c
>       ElseIf c > 0 Then
>         c = c \ 2
>         Do: c = c - 1: aDst(c) = aSrc(c): Loop While c
>       End If
>     End If
> End Function
> 
> 
> Private Sub Class_Initialize()
>     Dim c&
> 
>     ubPosFnd = 512: ReDim aPosFnd(ubPosFnd)
> 
>     saSrc.cDims = 1
>     saSrc.cbElements = 2
>     saSrc.cElements1d = &H7FFFFFFF
> 
>     saNew = saSrc
>     saOld = saSrc
>     saDst = saSrc
> 
>     BindArray aSrc, VarPtr(saSrc)
>     BindArray aNew, VarPtr(saNew)
>     BindArray aOld, VarPtr(saOld)
>     BindArray aDst, VarPtr(saDst)
> 
>     For c = -32768 To 32767: aLowChars(c) = c: Next c
>     If CharLowerBuffW(aLowChars(-32768), &H10000) = 0 Then
>       CharLowerBuffA aLowChars(65), (223 - 65) * 2
>     End If
> 
>     ' added by donald, 20011210
0
Reply Utf 12/27/2009 6:34:01 PM

"Bee" <Bee@discussions.microsoft.com> schrieb im Newsbeitrag
news:68BAA7A1-BF2E-4042-8FA3-F9A4845FC29B@microsoft.com...

> Based on what I have learned from this discussion,
> for my current needs, ...
Would be good, to have a better description of what your
needs really are - is it the same "vbCr-replacement"-problem
you described earlier - or some new stuff?
With a bit more background better suggestions are possible.

> ...it seems that working with a byte array is faster to do
> the scan and replace operations.
Mostly yes, but one has to decide, if the effort worth it,
to write each and every "specialized string-routine" based
on Byte- or 16Bit-Integer-Arrays.
IMO one does not need to "go there" that often - VBs builtin
stuff is sufficient in nearly all cases.


> So now I am looking for a fast byte array replace that will replace
> longer (expand byte array) or shorter (shorten byte array) routines.
This again raises the question, what you're doing (trying) exactly -
and why you're looking for more performance ... maybe the
bottleneck is at some other end...

> Converting from string to byte array and back is done only at
> the beginning and end of the scan and replace op that I need
> to do.  And those conversion ops seem plenty fast like 2 mSec
> on my PC.

The Demos, which use/show the SafeArray-mapping, avoid
explicit conversions to and from ByteArrays - they
span something like a "virtual array" over an already
existing String-Content.
That said, in case one does it explicitely (as copy) before/after
processing a "heavy routine" (especially if the passed string-
parameter is a larger one) -  then the overhead, compared with
safearray-mapping is not all that large.

The decision, if you should work with ByteArrays throughout
the whole process depends on your real problem, e.g. if you
start on FileContent, which was safed as ANSI ... but also on
how comfortable you want to code (also reflecting a bit on
"code-readability" later on) - and of course which performance
requirements need to be met.

The Mapping-approach is nice insofar as you could switch
into (virtual) array-mode only *within* a few (bottleneck)
routines maybe - and (aside from these few exceptions)
always work with normally passed VB-String-Parameters
through the larger process in question.

Olaf


0
Reply Schmidt 12/27/2009 8:00:14 PM

"Schmidt" <sss@online.de> wrote:

>Careful with the range below Asc("A") - since you would also
>cleanup all "Number-Digits" and many "wanted punctuations"
>.... not sure, if that was your intent.

You don't understand.  I specifically said in my post that my test was not the 
final search/replace I was going to use.  I think I also said that I was using 
it to have enough characters to search/replace for a test.  

>Repeated calls to Replace (scanning the string
>over and over again, to replace different single-chars)
>is "horribly inefficient" ... ;-)

My experience wasn't like that at all, which was my point.  An 80k file with 
about 5k of replacements or deletions took only 0.009 or 9/100ths of a second. 
 To me that's fast enough, not as fast as the byte array increment thing, but 
plenty fast.  

0
Reply sfdavidkaye2 12/27/2009 9:15:32 PM

"Bee" <Bee@discussions.microsoft.com> wrote in message 
news:68BAA7A1-BF2E-4042-8FA3-F9A4845FC29B@microsoft.com...

> Based on what I have learned from this discussion, for my
> current needs, it seems that working with a byte array is faster
> to do the scan and replace operations. So now I am looking
> for a fast byte array replace that will replace longer (expand
> byte array) or shorter (shorten byte array) routines.

Reading between the lines of your various posts in this and other related 
threads it appears that you want to perform a large number of different 
search and replace operations on a very large block of text data and that 
you are considering performing these operations one at a time, each time 
returning a suitably modified (and possibly changed in size) copy of the 
data before then performing the next operation on it, and in order to speed 
up the entire operation you are attempting to squeeze as much speed as 
possible out of your "search and replace" operation in the hope that your 
repeated calls to it for the different tasks will speed up the overall 
process. At least that's what I gather from the things you have posted. If 
that is the case then perhaps it is time to think about different ways of 
doing it because no matter how fast you make the search and replace 
operation the overall job will be slowed down a lot by your repeated calls 
to it and the repeated return of large mofified blocks of data. It would be 
far better to use even a relatively slow but flexible search and replace 
operation to start with, but to use it in such a way that it performs all of 
your otherwise separate operations (or at least as many of them as possible) 
all at the same time, returning the modifed copy of the data just once at 
the end. You can't always do this entirely of course, but in general you 
should aim for code that overall reduces the number of times it returns a 
modified copy of the data to a minimum.

Mike



0
Reply Mike 12/27/2009 9:19:07 PM

Thank you for your response.
I am loading a notepad "compatible" file from disk (shows extra control 
characters as boxes, etc).
It may or may not be totally pure printable text.
I need to clean out all non-printing characters other than the Tab, CR and LF.
I need to make proper paragraphs.
So I look for non-end-of-sentence with a CRLF near after and remove the CRLF 
and other whitespace and replace with a space if necessary.
So I scan forward then back up through the text and do a replace as necessary.
I also look for other characters that I need to change or delete.

Converting to and from a byte array is very fast.
I think this is legal.
  Dim aByte() as byte
  aByte=sString       ' to byte array
  work on the byte array
  sString = StrConv(aByte, vbUnicode)  ' back to string

Looping looking through a byte array is much faster so far for that part.
Instr (documented?) works with a byte array as does Mid or the right side.
A$=Mid(aByte,l,n)   ' interesting  
All I think I need now is a replace that is fast that works with the byte 
array.
I seem to remember some CopyMemory rotines that can work for a replace.
I need to find some that will work here.

Currently, with a very fast InString and Replace string routine the 1M text 
file takes over a minute to process.  The loop looks at a string. This is OK 
but I really would like to stretch my knowledge.  So for now I have a fast 
Instring and Replace for strings; even a fast Like.
I timed parts of the process and find that the For Next loop looking at sChr 
was very slow.  Doing the same on a byte arrays was quite fast.
So I am hoping that combining the byte array For Next scan and the other 
byte array ops needed will be as fast as I can get.
My nerons need greasing.

0
Reply Utf 12/27/2009 9:41:01 PM

"Bee" <Bee@discussions.microsoft.com> wrote in message 
news:5F3E878D-D34E-456B-9CB1-18AAA879136C@microsoft.com...

> Converting to and from a byte array is very fast.
> I think this is legal.
>  Dim aByte() as byte
>  aByte=sString       ' to byte array
>  work on the byte array
>  sString = StrConv(aByte, vbUnicode)  ' back to string

Well it's legal, but it doesn't do what you appear to think it does. Rather 
than explain what is going on I think it might be more instructive to just 
show you the result and then allow you to work out for yourself what is 
going on (remembering that a VB String has two bytes per character), so that 
you can post again if you can't work it out. Try this, which is your above 
code with an actual test string included:

Dim sString As String
sString = "abcd"
Dim aByte() As Byte
Print sString, Len(sString)
aByte = sString     ' to byte array
'work on the byte array
sString = StrConv(aByte, vbUnicode)  ' back to string
Print sString, Len(sString)

Mike



0
Reply Mike 12/27/2009 11:22:18 PM

"David Kaye" <sfdavidkaye2@yahoo.com> schrieb im Newsbeitrag
news:hh8ipi$42m$1@news.eternal-september.org...

> >Repeated calls to Replace (scanning the string
> >over and over again, to replace different single-chars)
> >is "horribly inefficient" ... ;-)
>
> My experience wasn't like that at all, which was my point.
> An 80k file with about 5k of replacements or deletions took
> only 0.009 or 9/100ths of a second.

You probably mean "0.090 or 9/100ths of a second".

And yes, that matches with the first results in the following
test-routines (on a modern CPU) - nonetheless the
"repeated replaces approach" is already about 50 times
slower than Larrys routine on these smaller Input-Lenghts
(as your ~80kByte). But 90msec is already near the
human "i can feel it"-barrier (of 1/10ths of a second)...

And on larger Input in the MegaByte-range, the influence
of the OLE-BSTR-caching has a much lesser effect ...
on a ~2.5MB input-string your routine is then already
about 160 times slower, needing about 13 seconds...
so be a bit patient, until the Demo below has finished.

Just test this yourself with the below copy and paste-code.
(compile to native code with all advanced options please,
 to see the real difference).

'***Into a Form
Option Explicit

Private Type SafeArray1D
  cDims       As Integer
  fFeatures   As Integer
  cbElements  As Long
  cLocks      As Long
  pvData      As Long
  cElements1d As Long
  lLBound1d   As Long
End Type

Private Declare Sub BindArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, PSrc&, Optional ByVal cb& = 4)
Private Declare Sub ReleaseArray Lib "kernel32" Alias "RtlMoveMemory" _
  (PArr() As Any, Optional PSrc& = 0, Optional ByVal cb& = 4)
Private Declare Sub RtlMoveMemory Lib "kernel32" _
  (dst As Any, src As Any, ByVal nBytes&)


Private Declare Function QueryPerformanceFrequency& Lib "kernel32" (x@)
Private Declare Function QueryPerformanceCounter& Lib "kernel32" (x@)


Private Sub Form_Load()
  AutoRedraw = True
  Caption = "Click the Form"
End Sub

Private Sub Form_Click()
Dim i&, T#, S$, S1$, S2$, S3$

  S = "   abc" & vbTab & "123" & vbCrLf & "ABC" & Chr(0) & "123  "
  For i = 1 To 12
    S = S & S 'results in an about 80kByte test-string
  Next i

  S1 = S
  S2 = S
  S3 = S

  Print "InputLen:", Len(S), vbCrLf

  T = HPTimer
    ScrubUsingReplace S1
  Print "ScrubUsingReplace", Round((HPTimer - T) * 1000, 2), Len(S1)

  T = HPTimer
    ScrubUsingLookupTable S2
  Print "ScrubUsingLookupTable", Round((HPTimer - T) * 1000, 2), Len(S2)

  T = HPTimer
    ScrubUsingSafeArray S3
  Print "ScrubUsingSafeArray", Round((HPTimer - T) * 1000, 2), Len(S3)


  '******* and the same thing again with larger Input ********
  For i = 13 To 17
    S = S & S 'results in an about 2.5MByte test-string
  Next i

  S1 = S
  S2 = S
  S3 = S

  Print
  Print "Now we leave the efficiency-range of the OLE-BSTR-cache..."
  Print "InputLen:", Len(S), vbCrLf

  T = HPTimer
    ScrubUsingReplace S1
  Print "ScrubUsingReplace", Round((HPTimer - T) * 1000, 2), Len(S1)

  T = HPTimer
    ScrubUsingLookupTable S2
  Print "ScrubUsingLookupTable", Round((HPTimer - T) * 1000, 2), Len(S2)

  T = HPTimer
    ScrubUsingSafeArray S3
  Print "ScrubUsingSafeArray", Round((HPTimer - T) * 1000, 2), Len(S3)

  Print
End Sub

Private Sub ScrubUsingReplace(Text As String)
Dim j%
  For j% = 0 To 64
    Text$ = Replace(Text$, Chr$(j%), "")
  Next
  For j% = 128 To 255
    Text$ = Replace(Text$, Chr$(j%), "")
  Next
End Sub

Private Sub ScrubUsingLookupTable(Text As String)
Dim inc() As Byte, txt() As Byte
Dim i As Long, src As Long, dst As Long

  txt = Text

  ReDim inc(255)
  For i = 0 To UBound(inc)
    If i > 64 And i < 128 Then inc(i) = 2
  Next i

  Do While src < UBound(txt)
    txt(dst) = txt(src)
    src = src + 2
    dst = dst + inc(txt(dst))
  Loop
  Do While dst < UBound(txt)
    txt(dst) = 32
    dst = dst + 2
  Loop
  Text = Trim$(txt)
End Sub

Private Sub ScrubUsingSafeArray(Text As String)
Dim i&, j&, aSrc%(), saSrc As SafeArray1D
  saSrc.cDims = 1
  saSrc.cbElements = 2 'the width of an 16Bit-Integer
  saSrc.cElements1d = Len(Text) + 2 'two more, to reflect the LBound
  saSrc.lLBound1d = -2 'include the 4 Len-Info-Bytes of the BSTR
  saSrc.pvData = StrPtr(Text) - 4 'adapt to the real start of the BSTR

  If saSrc.cElements1d = 2 Then Exit Sub 'nothing to replace

  BindArray aSrc, VarPtr(saSrc)

    For i = 0 To UBound(aSrc)
      Select Case aSrc(i)
        Case Is < 65, Is > 127 '<-- define the scrubbed Char-ranges here...
        Case Else: aSrc(j) = aSrc(i): j = j + 1
      End Select
    Next i
    RtlMoveMemory aSrc(-2), CLng(j + j), 4 'adjust to the new Len-Info

  ReleaseArray aSrc
End Sub

Private Function HPTimer#()
Dim x@: Static Frq@
  If Frq = 0 Then QueryPerformanceFrequency Frq
  If QueryPerformanceCounter(x) Then HPTimer = x / Frq
End Function

Olaf


0
Reply Schmidt 12/28/2009 1:30:22 AM

"Bee" <Bee@discussions.microsoft.com> schrieb im Newsbeitrag
news:5F3E878D-D34E-456B-9CB1-18AAA879136C@microsoft.com...

> I am loading a notepad "compatible" file from disk (shows extra
> control characters as boxes, etc).
> It may or may not be totally pure printable text.
> I need to clean out all non-printing characters other than
> the Tab, CR and LF.
> I need to make proper paragraphs.
> So I look for non-end-of-sentence with a CRLF near after
> and remove the CRLF and other whitespace and replace
> with a space if necessary.
> So I scan forward then back up through the text and do
> a replace as necessary.
> I also look for other characters that I need to change or delete.
For such "pretty formatting" tasks it is difficult to give
concrete advise - cant you just post your current code?

> I think this is legal.
>   Dim aByte() as byte
>   aByte=sString       ' to byte array
>   work on the byte array
>   sString = StrConv(aByte, vbUnicode)  ' back to string

Nope, as Mike already pointed out, the correct pairing
would either be:
aByte = sString 'two Bytes per char (no ANSI-conversion)
....
sString = aByte 'two Bytes per Char back-conversion

or ANSI-based (ByteArray-StepWidth = 1)
aByte = StrConv(sString, vbFromUnicode)
....
sString = StrConv(aByte, vbUnicode)

> Currently, with a very fast InString and Replace string
> routine the 1M text file takes over a minute to process.
That's pretty much for an 1MB-input, yes.
As said, please post some code, regarding what you
currently do - would be easier than "guessing".
Aside from that, I'd probably split that up into two
scans - the first one doing all single-char-cleanups
(replacements with "nothing", using Larrys Lookup-
approach).
And in the second run over the already roughly cleaned
up String, I'd try to ensure your "pretty formatting-stuff".

Olaf


0
Reply Schmidt 12/28/2009 1:54:10 AM

I have something working now using strings.
It is now taking about 40 secs for the 1M file.
It used to be many many minutes.

I am working on the ReplaceInByteArray() routine.
I will post that tomorrow as a new start post.
I have everything except for this ReplaceInByteArray().
So I will let you tear it apart.  But be gentle.
And thanks to both of you for sticking with me on this.

I plan on this:

(1)Use very fast string replace to do the easy stuff.
I have a very fast Like search and replace now.
(2)Convert to a Byte Array and do the hard looping stuff.
(3)Then convert back.

the code is too large and convoluted for easy study and I think I have it 
down to just this one ReplaceInByteArray routine 'cause all else works.

and as I said, if all else fails, the sub is all working correctly with 
strings.

0
Reply Utf 12/28/2009 4:03:01 AM

28 Replies
200 Views

(page loaded in 0.538 seconds)

Similiar Articles:































7/27/2012 9:10:12 AM


Reply: