Re: reading/parsing large files

To: REALbasic NUG <realbasic-nug@lists.realsoftware.com>
Subject: Re: reading/parsing large files
From: Michael Diehr <md03@xochi.com>
Date: Mon, 29 Sep 2008 19:44:09 -0700

Have you tried Spotlight? It's built into RB now and is very fast, but Mac-only, of course.

On Sep 29, 2008, at 7:32 PM, Carlo wrote:

Hello,

I have to search for particular words in a folder containing more than
50000 files located in 6000 sub-folders. File sizes range from 40k to
12M.

Here follow the stats of a typical search (Mac Intel):

time needed: 9 minutes
hits: 3650
max CPU: 102%
avg CPU:  55%
max RAM: 999M
avg RAM: 554M

I would like to know whether these values (especially the CPU and RAM values)
are acceptable and, if they are too high, whether there is a way to
decrease them.

The code is essentially a loop: skipping certain files (audio-visual and
non-visible files), I process all the others in this way:

//open the file as binary, false
source = defineEncoding(b.read(b.length),nil)
//parse it with Joe's TextUtilities
mnumber = countB(Source, wordToBeSearched)
//as soon as the word is found exit countB
//add a row to a listbox (file name, location etc.)

During the search both window and listbox are hidden.
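
For illustration, the loop described above can be sketched as follows. This is a minimal, language-neutral sketch in Python rather than REALbasic, and it is not the poster's actual code: the names `SKIP_EXTENSIONS`, `candidate_files` and `files_containing` are hypothetical, and the extension list is only a guess at what "audio-visual" covers. It compares raw bytes, so it assumes the word and the files share an encoding, and it is case-sensitive. Hits are collected in a plain list so that the listbox can be filled once the search is done.

```python
import os

# Hypothetical skip list; the post only says "audio-visual" files.
SKIP_EXTENSIONS = {".mp3", ".wav", ".aiff", ".mov", ".mp4", ".avi"}


def candidate_files(root):
    """Yield file paths under root, skipping hidden entries and media files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden folders in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith("."):
                continue  # non-visible file
            if os.path.splitext(name)[1].lower() in SKIP_EXTENSIONS:
                continue  # audio-visual file
            yield os.path.join(dirpath, name)


def files_containing(root, word):
    """Return the paths of all candidate files that contain word."""
    needle = word.encode("utf-8")
    hits = []
    for path in candidate_files(root):
        with open(path, "rb") as f:
            # The 'in' test stops at the first match instead of counting all.
            if needle in f.read():
                hits.append(path)
    return hits
```

Something like `files_containing("/path/to/folder", "word")` would then return the list of matching paths. Note that this version still reads each file whole, as the original outline does.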

I tried using a thread, but apart from the time needed, which obviously
increased (18 minutes), the CPU and memory values didn't change very much.
app.doEvents is not used; #pragmas are widely used.

Since many files are larger than 4M, I tried splitting them into chunks
of 2M each, but the results did not vary considerably.
Therefore I was thinking of using memoryblocks, but alas, I don't know how
to deal with them. If using memoryblocks would help reduce both CPU/memory
values and search time, could some good soul tell me how to do it?

Thanks for any advice,

Carlo
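
On the chunking idea mentioned above: one detail that is easy to miss is that a word can straddle the boundary between two chunks, so each new chunk has to be searched together with the last few bytes of the previous one. Below is a minimal sketch of that technique in Python (not REALbasic, and not taken from the thread). The function name `file_contains` and the 2M default for `chunk_size` are assumptions for illustration; the same loop shape should in principle carry over to repeated fixed-size reads from a binary stream. It matches raw bytes, so encoding and case handling are left out.

```python
def file_contains(path, needle, chunk_size=2 * 1024 * 1024):
    """Return True as soon as needle (bytes) is found in the file at path.

    Only one chunk plus a small overlap is held in memory at a time.
    """
    if not needle:
        return True
    overlap = len(needle) - 1  # bytes to carry over between chunks
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # end of file, no match
            window = tail + chunk
            if needle in window:
                return True  # early exit on the first hit
            # Keep the end of this window so that a match spanning the
            # chunk boundary is still seen on the next pass.
            tail = window[-overlap:] if overlap else b""
```

With this shape, per-file memory stays near `chunk_size` regardless of file size, and the search stops reading as soon as the word turns up. Whether it also lowers CPU use or total time would need to be measured.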


_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives:
<http://support.realsoftware.com/listarchives/lists.html>