fast search in file

Hi,
quick questions, what would be the best choice for
fastest caching method (to the file)? I need fast inserting and retrieving
data (searching) from that file.
thanks


0
alladyn (21)
8/23/2005 9:16:25 PM
vc.mfc 33608 articles. 0 followers. Follow

8 Replies
705 Views

Similar Articles

[PageSpeed] 58

You may want to investigate a database format like .mdb or other ISAM 
methods.  If you are just searching through text you may want to use 
CStdioFile to search line by line through the file.

Tom

"alladyn" <alladyn@yahoo.com> wrote in message 
news:OgSrseCqFHA.3104@TK2MSFTNGP12.phx.gbl...
> Hi,
> quick questions, what would be the best choice for
> fastest caching method (to the file)? I need fast inserting and retrieving
> data (searching) from that file.
> thanks
>
> 


0
tserface (3861)
8/23/2005 11:21:56 PM
Depends on what you mean by "fast insert".  Insert will always have performance
limitations unless you do something that is nonlinear (e.g., a file which is a linked list
of text), in which case the fast search is going to be slower. unless you don't care about
things like line boundaries.  The old IBM ISAM file methodology worked well for fast
insertion and deletion (we're talking 30-year-old technology here...); it was basically a
linked list of file blocks with internal reuse of blocks.  I built an equivalent of this
in about an afternoon (actually, it took a good chunk of one afternoon and a couple hours
the next night), so it isn't too hard, but the file can only be read by the application
that understands its organization

For string searching, Boyer-Moore and some of the later fast-string-search algorithms have
excellent performance, but don't work well if the file gets badly-fragmented by the scheme
I just gave, unless you are willing to accept out-of-order results and sort the result
matches.

"fastest caching method" is a bit odd as a question; the file system cache really isn't
under your control.  And while you can do certain tricks to try to optimize L1/L2 cache
hit performance, it usually only applies to a limited number of chipsets (e.g., something
that is great for a Xeon might suck on a Centrino, or vice-versa).  So the first thing you
need to come to terms with is whether you want fast insert or fast search.  Search
technology, especially for large files, has been the core of several significant works
(historically, Don Knuth's "Searching and Sorting" book was a great compendium of the
state of the art of its day; but much more work has been done since.  What are you
searching for (arbitrary strings? strings within a field of a record?) What are your
search critiera (exact? initial substring? anchored vs. unanchored? case-folded? wildcard?
SOUNDEX?)  What is the size of your data (fractions of megabytes? tens of megabytes?
hundreds of megabytes? gigabytes? terabytes?) Can you afford to create an index? If so,
what kind? (B-tree? perfect hash? balanced tree?) Is there already an ordering imposed?
Can you take advantage of it in your search? Can you use hints to help optimize the
search?

Essentially, your question is far too vague to elicit any one answer.  If you can answer
most of the questions above, the solution space can probably be narrowed down to a
reasonably small set of solutions.
				joe

On Tue, 23 Aug 2005 17:16:25 -0400, "alladyn" <alladyn@yahoo.com> wrote:

>Hi,
>quick questions, what would be the best choice for
>fastest caching method (to the file)? I need fast inserting and retrieving
>data (searching) from that file.
>thanks
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
0
newcomer (15975)
8/23/2005 11:55:23 PM
Thanks for long elaborate :)
Basically I need fast search for a exact string, insert is not that 
important.
I just wander why stdio FILE or CFile have no methods for string search...


"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message 
news:94dng1dd0cllohhs9gog427d2te4v0usgf@4ax.com...
> Depends on what you mean by "fast insert".  Insert will always have 
> performance
> limitations unless you do something that is nonlinear (e.g., a file which 
> is a linked list
> of text), in which case the fast search is going to be slower. unless you 
> don't care about
> things like line boundaries.  The old IBM ISAM file methodology worked 
> well for fast
> insertion and deletion (we're talking 30-year-old technology here...); it 
> was basically a
> linked list of file blocks with internal reuse of blocks.  I built an 
> equivalent of this
> in about an afternoon (actually, it took a good chunk of one afternoon and 
> a couple hours
> the next night), so it isn't too hard, but the file can only be read by 
> the application
> that understands its organization
>
> For string searching, Boyer-Moore and some of the later fast-string-search 
> algorithms have
> excellent performance, but don't work well if the file gets 
> badly-fragmented by the scheme
> I just gave, unless you are willing to accept out-of-order results and 
> sort the result
> matches.
>
> "fastest caching method" is a bit odd as a question; the file system cache 
> really isn't
> under your control.  And while you can do certain tricks to try to 
> optimize L1/L2 cache
> hit performance, it usually only applies to a limited number of chipsets 
> (e.g., something
> that is great for a Xeon might suck on a Centrino, or vice-versa).  So the 
> first thing you
> need to come to terms with is whether you want fast insert or fast search. 
> Search
> technology, especially for large files, has been the core of several 
> significant works
> (historically, Don Knuth's "Searching and Sorting" book was a great 
> compendium of the
> state of the art of its day; but much more work has been done since.  What 
> are you
> searching for (arbitrary strings? strings within a field of a record?) 
> What are your
> search critiera (exact? initial substring? anchored vs. unanchored? 
> case-folded? wildcard?
> SOUNDEX?)  What is the size of your data (fractions of megabytes? tens of 
> megabytes?
> hundreds of megabytes? gigabytes? terabytes?) Can you afford to create an 
> index? If so,
> what kind? (B-tree? perfect hash? balanced tree?) Is there already an 
> ordering imposed?
> Can you take advantage of it in your search? Can you use hints to help 
> optimize the
> search?
>
> Essentially, your question is far too vague to elicit any one answer.  If 
> you can answer
> most of the questions above, the solution space can probably be narrowed 
> down to a
> reasonably small set of solutions.
> joe
>
> On Tue, 23 Aug 2005 17:16:25 -0400, "alladyn" <alladyn@yahoo.com> wrote:
>
>>Hi,
>>quick questions, what would be the best choice for
>>fastest caching method (to the file)? I need fast inserting and retrieving
>>data (searching) from that file.
>>thanks
>>
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm 


0
alladyn (21)
8/24/2005 5:18:09 PM
Depending on the size of the file you could easily read the whole thing (if 
it is smaller) into a CString and use the various find functions in there.

Tom

"alladyn" <alladyn@yahoo.com> wrote in message 
news:%23JSIP%23MqFHA.620@TK2MSFTNGP15.phx.gbl...
> Thanks for long elaborate :)
> Basically I need fast search for a exact string, insert is not that 
> important.
> I just wander why stdio FILE or CFile have no methods for string search...


0
tserface (3861)
8/24/2005 6:13:04 PM
On Wed, 24 Aug 2005 13:18:09 -0400, "alladyn" <alladyn@yahoo.com> wrote:

>Thanks for long elaborate :)
>Basically I need fast search for a exact string, insert is not that 
>important.

Provided you can efficiently back up during the scan, Boyer-Moore is
probably your best bet. Just make sure the setup time for the algorithm
doesn't dominate the run time.

>I just wander why stdio FILE or CFile have no methods for string search...

You could ask the same question about any number of applications. The idea
behind an abstraction such as FILE is to provide a minimal interface that
captures what is necessary to use a stream and is sufficient for building
higher-level abstractions.

-- 
Doug Harrison
VC++ MVP
0
dsh (2498)
8/24/2005 6:41:00 PM
Is that Boyer-Moore algoritm imlemented somewhere or just need to
write by myself?
thanks

"Doug Harrison [MVP]" <dsh@mvps.org> wrote in message 
news:a9fpg1ll39k1ngag4ikjdjngj8ul7ns9qk@4ax.com...
> On Wed, 24 Aug 2005 13:18:09 -0400, "alladyn" <alladyn@yahoo.com> wrote:
>
>>Thanks for long elaborate :)
>>Basically I need fast search for a exact string, insert is not that
>>important.
>
> Provided you can efficiently back up during the scan, Boyer-Moore is
> probably your best bet. Just make sure the setup time for the algorithm
> doesn't dominate the run time.
>
>>I just wander why stdio FILE or CFile have no methods for string search...
>
> You could ask the same question about any number of applications. The idea
> behind an abstraction such as FILE is to provide a minimal interface that
> captures what is necessary to use a stream and is sufficient for building
> higher-level abstractions.
>
> -- 
> Doug Harrison
> VC++ MVP 


0
alladyn (21)
8/24/2005 7:26:05 PM
http://www-sr.informatik.uni-tuebingen.de/~buehler/BM/BM.html
http://www.codeproject.com/csharp/bmsearch.asp

Tom

"alladyn" <alladyn@yahoo.com> wrote in message 
news:%23QFqoFOqFHA.3204@TK2MSFTNGP10.phx.gbl...
> Is that Boyer-Moore algoritm imlemented somewhere or just need to
> write by myself?
> thanks
>
> "Doug Harrison [MVP]" <dsh@mvps.org> wrote in message 
> news:a9fpg1ll39k1ngag4ikjdjngj8ul7ns9qk@4ax.com...
>> On Wed, 24 Aug 2005 13:18:09 -0400, "alladyn" <alladyn@yahoo.com> wrote:
>>
>>>Thanks for long elaborate :)
>>>Basically I need fast search for a exact string, insert is not that
>>>important.
>>
>> Provided you can efficiently back up during the scan, Boyer-Moore is
>> probably your best bet. Just make sure the setup time for the algorithm
>> doesn't dominate the run time.
>>
>>>I just wander why stdio FILE or CFile have no methods for string 
>>>search...
>>
>> You could ask the same question about any number of applications. The 
>> idea
>> behind an abstraction such as FILE is to provide a minimal interface that
>> captures what is necessary to use a stream and is sufficient for building
>> higher-level abstractions.
>>
>> -- 
>> Doug Harrison
>> VC++ MVP
>
> 


0
tserface (3861)
8/24/2005 8:44:30 PM
thanks a lot!

"Tom Serface" <tserface@msn.com> wrote in message 
news:u5bohyOqFHA.616@TK2MSFTNGP15.phx.gbl...
> http://www-sr.informatik.uni-tuebingen.de/~buehler/BM/BM.html
> http://www.codeproject.com/csharp/bmsearch.asp
>
> Tom
>
> "alladyn" <alladyn@yahoo.com> wrote in message 
> news:%23QFqoFOqFHA.3204@TK2MSFTNGP10.phx.gbl...
>> Is that Boyer-Moore algoritm imlemented somewhere or just need to
>> write by myself?
>> thanks
>>
>> "Doug Harrison [MVP]" <dsh@mvps.org> wrote in message 
>> news:a9fpg1ll39k1ngag4ikjdjngj8ul7ns9qk@4ax.com...
>>> On Wed, 24 Aug 2005 13:18:09 -0400, "alladyn" <alladyn@yahoo.com> wrote:
>>>
>>>>Thanks for long elaborate :)
>>>>Basically I need fast search for a exact string, insert is not that
>>>>important.
>>>
>>> Provided you can efficiently back up during the scan, Boyer-Moore is
>>> probably your best bet. Just make sure the setup time for the algorithm
>>> doesn't dominate the run time.
>>>
>>>>I just wander why stdio FILE or CFile have no methods for string 
>>>>search...
>>>
>>> You could ask the same question about any number of applications. The 
>>> idea
>>> behind an abstraction such as FILE is to provide a minimal interface 
>>> that
>>> captures what is necessary to use a stream and is sufficient for 
>>> building
>>> higher-level abstractions.
>>>
>>> -- 
>>> Doug Harrison
>>> VC++ MVP
>>
>>
>
> 


0
alladyn (21)
8/24/2005 9:52:51 PM
Reply:

Similar Artilces:

file format needs to be changed
I have an excel file which comprises of a template for Identity card with a jpg file embedded in it. I would like to save this file in a jpeg format so as to get a photo printout of the same. How is this possible? If you scan the document, that would put the file into an image format. Then you could convert the file to whatever image file format you wanted. Just a thought... Highlight the area you want printed then hold SHIFT key and select Edit>Copy Picture. Open your favorite graphics editor or MS Paint and paste to there. Save as *.jpg Gord Dibben Excel MVP On Tue, 14 Jun 200...

What does it mean with read whole files atomically
Hi! Here is the complete sentence. "The file class can be used to open files, create new files, read whole files atomically, and even write files". So what does the test mean with read whole files atomically ? //Tony Tony Johansson wrote: > Hi! > > Here is the complete sentence. "The file class can be used to open files, > create new files, read whole files atomically, and even write files". > > So what does the test mean with read whole files atomically ? They must be talking about the ReadAllText() method, and using the word...

embed xls file
How do you embed a xls 97 file into word 97 and only icon appear and after double-click it, it will open up? Please advise. Thanks. > How do you embed a xls 97 file into word 97 and only icon appear and after > double-click it, it will open up? Please advise. Thanks. This works in Word 2K, hopefully in 97. In Word, Insert -- Object -- Create from file. Once you've browsed, check "Display as icon" & OK. Rgds, Andy Exactly, Thanks. "Andy Brown" <andy.j.brown@ntlworld.com> wrote in message news:%236VmyQ6REHA.3052@TK2MSFTNGP12.phx.gbl... > >...

How do I parse an XML file
This is a multi-part message in MIME format. ------=_NextPart_000_001A_01C69A10.8C752700 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable The XML file looks something like this... --- Start XML --- <?xml version=3D"1.0" ?> <VULNERABILITIES> <PROGRAM_VERSION>0.0.1</PROGRAM_VERSION> <CONTROL_VERSION>0.0.1</CONTROL_VERSION> <VULNERABILITY> <V_SHORT_NAME>SOMETHING</V_SHORT_NAME> <V_LONG_NAME>SOMETHING</V_LONG_NAME> <V_DISCUSSION&...

convert AUTOCAD file to a shape
How do i convert a autocad file to a shape in visio process engineering shart. When i try the created stencil that is supposed to contain my new shape, is emty. Anyone know how to solve this? Hi Isabella, Did you use the Convert CAD Library add-on? Its located under Tools > Add-ons > Visio Extras. -- Hope this helps, Chris Roth Visio MVP Free Visio shapes: http://www.visguy.com/category/shapes Visio programming info: http://www.visguy.com/category/programming/ Other Visio resources: http://www.visguy.com/visio-links/ "Isabella" <Isabella@discussions.microsof...

File did not load completely
Hi there, I hope someone can give me an idea. I have quite a few file that are in the csv format. I can open them in excel. The problem is that they are very large and I get a message saying the the file did not load completely. How do I get the file to load completely or break the file up or split, or what ever I have to do. Please help. Will wait to hear. Thanks for any and all help you can be. Thanks Jackie Hi Jackie, > How do I get the file to load completely or break the file up or > split, or what ever I have to do. Please help. Will wait to hear. > Thanks for any and all h...

MSFile file transfer speed
Hello All: I have two PXA 270 based board, both has same Intel Flsh memory, one is running with WinCE 5.0 and PSM4.0 file system, and the other one is running with WinCE 6.0 and MSFile sytem, I transfer file to both unt using active sysnc, file size is 4 MB, ofcourse i am transfering file to onboard flash file area, NOT in RAM area, I notice transfer rate in WinCE 6.0 device is 50% slower than WinCE 5.0 platfrom with PSM4.0. My question is "Are there any setup in WionCE 6.0, that will change the speed of the file read write cycle, so the file transfer will be faster" Th...

EDB file to PST file?
Is there a utility (FREE) that I can convert an edb file to PST? I have some EDB files I want to take out some mail out of but not restore the whol thing. I test some COSTLY utilies but on need this for one time only. Thanks Mike What version of Exchange? Perhaps you could build a recovery server and do a disaster recovery scenario? -- Martin Blackstone MVP - Exchange http://www.swinc.com/resource/exchange.htm http://www.swinc.com/resource/e2kfaq_appxc.htm "Mike" <massey@rmci.net> wrote in message news:psst001mj15vod5fphqpfjc7hk39975d5l@4ax.com... > Is there a utilit...

Critical OE/Outlook Backup Files
Greetings. Would one of you knowledgable MPV's kindly tell me which OE/Outlook files should be backed up so I don't lose any of my address book, folders, or data from both programs. I am in need of reinstalling my OS (XP) and don't want to lose any of my important names, folders, documents, etc. Your assistance is much appreciated. Outlook: http://www.slipstick.com/config/backup.htm Outlook Express: http://insideoe.tomsterdam.com/backup/index.htm -- PATRICK REED [Outlook - MVP]~~~~~~ -Microsoft Certified Professional (MCP) -Have you checked http://www.slipstick.com? -P...

reducing file size
In an excel file, I just deleted a picture. When saving the fil again, I noticed that the file size grew instead of being reduced. Does anyway know if excel "save" things you delete, and if so how can you permantly delete things to reduce file size? Your used range has probalby gotten messed up. Look for any sheet where the scroll bars take you well past the end of the data. Hitting Ctrl + End on these sheets will also take you out the the data range. Here is a link on fixing the used range. http://www.contextures.com/xlfaqApp.html#Unused -- HTH... Jim Thomlins...

Deleted excel file
Hi all, I work a helpdesk and have a user that inadvertently deleted an Excel spreadsheet that she had just created (no backup). Since it is not in the recycle bin, is there a way to recover this file or is it lost? Thanks in advance. Kent Hi Kent, If the file is not listed in the Recycle Bin or elsewhere on the user's system, it is likely not recoverable; however, the user may want to check with third-party utilities such as Norton or Mace Utilities for possible recovery options. Hope this helps! -- Tom Moore [MSFT] This posting is provided "AS IS" with no warranties...

How to map a file with the memory
Hai, I want to map a file with the system memory. I am trying using CreateFileMapping but I am not able to map the file with the memory. Can any one help me. Regards, S.Vinodh Noel Bangalore. http://msdn2.microsoft.com/en-us/library/tzdxd4x0.aspx Tom "chinthamani" <vinodh_noel@yahoo.com> wrote in message news:d0efc199ebd7b8875325a7c997d8e598@localhost.talkaboutsoftware.com... > Hai, > I want to map a file with the system memory. I am trying using > CreateFileMapping but I am not able to map the file with the memory. Can > any one help me. > > Regar...

Search for mail in multiple PST files
I have about 90 pst files that I need to search for mail with about a dozen terms and phrases. Outlook will only allow you to search one pst at a time, and I've tried Google Desktop and MS Desktop Search and they are not effectively indexing all the emails, but even if they did I cannot effectively export the results to a file. Does anyone know of a utility or method that would work best for this? http://office.microsoft.com/en-us/outlook/HA011415261033.aspx or http://zyfind.com/products_technology/productsheets/ZyFIND%20for%20Outlook.pdf -- John Oliver, Jr MCSE, MCT, CCNA Exchange...

File not opening #2
I've got the same problem that I've been reading here and on the ExcelTips website. When I try to open 2 of the spreadsheets on my system, I get a " 'C:\Documents and Settings\........\spreadsheet.xls' could not be found. Check the spelling of the file name, and ..... (you know the rest)" I have tried the: excel /unregserver excel /regserver I have tried the: uncheck the ignore other applications I have tried the: regedit - remove dups (there were none) The directory with this spreadsheet (on my PC) is shared with 2 other computers (separate network - ...

File info from Windows Explorer to Excel?
I need to create a list of the file information from Windows Explorer to MS Excel. I used to have a dos command to do this. Please advise how this can be done. Thank you Regards I assume you mean the Dir command. This is still possible from the command prompt. Go to Start|Run and enter cmd. -- Ian -- "Wordgeek" <Wordgeek@discussions.microsoft.com> wrote in message news:ACE32060-668D-4A76-B901-1AEB70925A47@microsoft.com... >I need to create a list of the file information from Windows Explorer to MS > Excel. I used to have a dos command to do this. Please advise ho...

adding header file problem
Does anyone know why or how to prevent the contents (structures / functions) of a header file from being displayed in the workspace's class window, after it has been added to a project with the "Add to Project" then "Files" from the "Project" menu button. Or it this not the correct way to add header files to a project. THere might be some directive to do this, but it is that nature of the class view to display this information, so I doubt there is a way to suppress it. joe On Wed, 29 Nov 2006 17:52:29 GMT, "cdg" <anyone@anywhere.com...

Second window for second excel file
How to open second excel fine in seperate window Hi goto 'Tools - Options - View' and check 'Windows in taskbar' -- Regards Frank Kabel Frankfurt, Germany Murthy wrote: > How to open second excel fine in seperate window ...

Pulling data from individual files to master list
Hello, I've just been entering the world of Excel for the past few months, as I started a new job last year and my main duty is to bring the company into the 21st century (or even the late 20th, at this point). What I'd like to do sounds a little backwards, but I think it's the way to go, if it's possible: I'm creating individual sheets for our products, so that all the relevant info for, say, product A001 is shown on one sheet named "A001.xls". But I would also like to create a "master" list. I say "master" in quotes because it's not r...

Publisher 2007 open files problem with Server 2003
I have a very small network using Server 2003 and when we used Publisher 2003 It downloaded files from the Server to the Client Computers very fast. Now It takes 2-3 min. for small files and 3-5 min. for large files. There are a lot of pictures in these files but the same files opened in pub 2003 with no problems. Is there a solution for this issue. Thanks for your help. Maybe this article is your answer An Office program is slow or may appear to stop responding (hang) when you open a file from a network location http://support.microsoft.com/kb/833041 -- Mary Sauer http://m...

personal.xls file
Where is the personal.xls file located in Excel 12? ( I have no hidden files.) Thanks In article <2XYBk.41210$PK.37276@newsfe04.iad>, "Nemo" <n2moffat@charter.net> wrote: >Where is the personal.xls file located in Excel 12? ( I have no hidden >files.) Check your settings (tools/options/general for XL 11 SP3). I think you can put it wherever you like ... but it's usually in the 'startup' directory IIRC. HTH. The most likely place is here under Windows XP: C:\Documents and Settings\<username>\Application Data\Microsoft\Excel\XLSTART ...

When moving pst files into 2003 I get extra personal folders
I've recently upgraded to Outlook 2003 and after moving files from two previous versions of Outlook, I keep getting 2 extra personal folders added to my folder list. I can get rid of them using /cleanpst and /cleanprofile. But when I restart Outlook again the 2 personal folders are back. What to do? Your advice is appreciated. Thanks, Ken Hapa Molowa wrote: > I've recently upgraded to Outlook 2003 and after moving files from > two previous versions of Outlook, I keep getting 2 extra personal > folders added to my folder list. I can get rid of them using /cleanpst > and ...

How do I print save as files?
Someone sent me an excel spreadsheet. I made some corrections and saved it in the save as option. Now its in OLKD and I can not print it or send it back to the customer with the corrections because it can not open. I am totally new at this.Please can someone help me open it for review and to print and maybe move it? Thanks in Advance.Molly You didn't do a Save As and change the directory. It was a temp directory and may or may not be available. If there's any hope, this is it: http://www.gmayor.com/outlook_attachments.htm -- JoAnn Paules MVP Microsoft [Publisher] ~~~~~ How...

Formula calculation in a Shared Excel File
I have a Shared excel workbook which has data entered though the month by multiple people, in this file I have a number of formulas producing data about the entries. The issue I am facing is that when any of the users open or save the new entries the file takes quite a while to re-calculate the formulas. I know I can use the Calculation tab in options to manually calculate the fields but this only works at an individual level rather than in the actual workbook and it also will apply to evey excel file we work on. Does anyone know a solution to have a selected range of cells with formua...

Compare two file/colunms, hide row not does not equal list.
Excel 2000 -have two files. File "A" has 1 through 4155 records (rows with three columns) in numerical order. File "B", has a list of almost 200 rows/numbers (one column) from file "A". How can I hide the rows in file "A", not are listed in file "B"? Thanks, Jerry Not sure whether you are still monitoring this post, Jerry. Anyway, here's some thoughts ventured .. Conceptually, you should be able to achieve this via setting up a helper col and then autofiltering on the helper. Let's start by simplifying the scenario by having bo...

Save chart (that's on a chart sheet) to html file?
In Excel 2007, is it no longer possible to publish a chart that is on object in a worksheet to an html file? In Excel 2003, you had the option to choose a chart as the item to publish. That no longer seems to be available. ...