Re: Splitting Files into Multiple Folders for Read Performance
- From: dbtjreid@xxxxxxxxx
- Date: Thu, 8 May 2008 11:02:30 -0700 (PDT)
On May 8, 9:50 am, "Pegasus (MVP)" <I....@xxxxxxxxxx> wrote:
"BlackStarPro" <jr...@xxxxxxxxxxxxxxxxxxx> wrote in message
news:5122c9c8-82d9-4bb6-b8ce-0d940cab5eb0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi,
I am a .NET developer for a company that has been producing upward of
100,000 PDF reports per month for several years, which has now turned
into a huge problem. We are releasing a new reporting system that
allows for dynamic reports and custom dashboards; however, they want
ALL their legacy PDF reports to still be available through our new
reporting system.
We decided the best thing would be to build a simple app that iterates
through the millions of files and folders and reorganizes them onto a
different (better performing) HD array. Right now the files are in a
very elaborate folder structure; the names of the folders are
essentially parameters for organizing the files. Our idea was to have
our application pull each file out, break its original path up into
parameters, and write those parameters into a database, then rename
the PDF file to something unique and store it in a simpler folder
structure, also keeping track of its new location.
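As a rough illustration of the path-to-parameters step described above, here is a minimal Python sketch. The root folder, the example path, and the record field names are all hypothetical, and using a UUID as the unique file name is an assumption; the post does not say how unique names would be generated or what the database schema looks like.

```python
import uuid
from pathlib import PureWindowsPath


def split_legacy_path(legacy_path: str, root: str) -> list[str]:
    """Return the folder names between `root` and the file, in order.

    These folder names are the "parameters" encoded in the old structure.
    """
    relative = PureWindowsPath(legacy_path).relative_to(PureWindowsPath(root))
    return list(relative.parts[:-1])  # drop the file name itself


def make_record(legacy_path: str, root: str) -> dict:
    """Build a row that could be written to a database (schema is hypothetical)."""
    return {
        "parameters": split_legacy_path(legacy_path, root),
        "original_name": PureWindowsPath(legacy_path).name,
        # Assumption: a random UUID serves as the new unique file name.
        "new_name": f"{uuid.uuid4().hex}.pdf",
    }


if __name__ == "__main__":
    # Hypothetical legacy path, for illustration only.
    print(make_record(r"D:\reports\ClientA\2007\03\summary.pdf", r"D:\reports"))
```

The record would then be extended with the file's new location once it has been copied to the new array.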
I had heard that for read performance reasons a folder shouldn't have
more than 2000 files in it. Can anyone verify this? If it is true, we
could have main folders incrementally numbered (1, 2, 3, 4, etc.),
each containing 2000 uniquely named files.
example:
c:\1\ (would hold files 1-2000.pdf)
c:\2\ (would hold files 2001-4000.pdf) etc.
Our app would just keep track of the old folder parameters (structure)
and a reference to its new location.
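The numbered-folder scheme above comes down to one integer division. The following Python sketch shows the arithmetic; the function names are made up for illustration, and the 2000-file limit is simply the figure from the question, not a verified threshold.

```python
from pathlib import PureWindowsPath

FILES_PER_FOLDER = 2000  # figure from the question; adjust after testing


def folder_for(file_number: int, files_per_folder: int = FILES_PER_FOLDER) -> int:
    """Map a 1-based file number to its 1-based folder number."""
    if file_number < 1:
        raise ValueError("file numbers start at 1")
    return (file_number - 1) // files_per_folder + 1


def new_location(file_number: int, root: str = "c:\\") -> PureWindowsPath:
    """Build the new path, e.g. file 2001 goes to c:\\2\\2001.pdf."""
    return PureWindowsPath(root) / str(folder_for(file_number)) / f"{file_number}.pdf"


if __name__ == "__main__":
    for n in (1, 2000, 2001, 4000, 4001):
        print(n, new_location(n))
```

Because the folder can be derived from the file number, the database only strictly needs to store the number; storing the full new path as well is a convenience, not a requirement.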
Currently we are seeing major performance issues when we try to deal
with the millions of files and folders that are in there right now.
Can anyone recommend a better solution or concur with the above
solution? Any suggestions would help.
My own experience says that performance under NTFS is fine up
to 5000 files per folder and that the number should not exceed 10,000.
Seeing that this is a major project, I would run some tests beforehand.
Here is what I would do:
1. Create the test folder c:\Speed1.
2. Populate it with x files (replace x with a number), using this command
at a Command Prompt:
for /L %a in (1,1,x) do echo. > c:\Speed1\Test%a.txt
3. Log off, then log on (or, more reliably, reboot) in order to clear the disk cache.
4. Create the batch file c:\SpeedTest.bat:
@echo off
if exist c:\Speed2 rd /s /q c:\Speed2
md c:\Speed2
for /L %%i in (1,1,100) do call :Test
goto :eof
:Test
rem Pick a random file number from 1 to x (replace x with the number used in step 2).
set /a r=%random% * x / 32768 + 1
copy /y c:\Speed1\Test%r%.txt c:\Speed2 > nul
5. Invoke c:\SpeedTest.bat like so:
timethis c:\SpeedTest.bat
This test will copy 100 randomly selected files from
c:\Speed1 to c:\Speed2. It will then tell you how long
the process took. Select different values for x to see
how many files you can safely store in each folder.
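For anyone who prefers a script to a batch file, the same experiment can be sketched in Python. This is an approximation of the test described above, not a replacement for it: the function names are invented for illustration, it runs in a temporary directory rather than c:\Speed1, and it does nothing to clear the disk cache, so a first run right after populating the folder will mostly measure cached reads.

```python
import random
import shutil
import tempfile
import time
from pathlib import Path


def populate(folder: Path, count: int) -> None:
    """Create Test1.txt .. Test<count>.txt in `folder`."""
    folder.mkdir(parents=True, exist_ok=True)
    for i in range(1, count + 1):
        (folder / f"Test{i}.txt").write_text("\n")


def time_random_copies(source: Path, dest: Path, file_count: int, copies: int = 100) -> float:
    """Copy `copies` randomly chosen files and return the elapsed seconds."""
    if dest.exists():
        shutil.rmtree(dest)
    dest.mkdir(parents=True)
    start = time.perf_counter()
    for _ in range(copies):
        n = random.randint(1, file_count)  # inclusive on both ends: 1..file_count
        shutil.copy(source / f"Test{n}.txt", dest)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "Speed1", Path(tmp) / "Speed2"
        x = 500  # vary this to compare folder sizes
        populate(src, x)
        print(f"{x} files: {time_random_copies(src, dst, x):.3f} s for 100 copies")
```

Running it with several values of x, ideally after a reboot each time, gives a comparable set of timings.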
Nice! Thank you very much for the response. I will do these tests and
will post the results. Thanks again.