Why is opening multiple files taking so long?


Recently, I asked the following question on SO:

I have a folder foo with thousands of .html files, about 300 KB each.

Here is what I do to open them:

 import os
 import time

 folder_name = 'foo'
 for file_name in os.listdir(folder_name):
     t = time.time()
     # Time only the open() call; the with block closes f when it exits.
     with open(os.path.join(folder_name, file_name)) as f:
         print(time.time() - t, 'seconds to open', file_name)

And here is the output I get:

 1.6057319641113281 seconds to open 1.html
 1.3181514739990234 seconds to open 2.html
 1.1490132808685303 seconds to open 3.html
 1.2970092296600342 seconds to open 4.html
 1.0074846744537354 seconds to open 5.html
 1.5122349262237549 seconds to open 6.html
 1.1730327606201172 seconds to open 7.html
 1.9992561340332031 seconds to open 8.html

 etc.

I have an SSD and am quite surprised that it takes over a second to open a small file.

Is this normal? If not, what can be done to speed it up?

I mistakenly thought that my problem was Python-specific. I have now tried it on another PC, and there it takes milliseconds (as it should).

Moreover, zipping many small files also takes about 1 second per file. So the problem seems to be that Windows 10 takes over 1 second to open a file.

Is there anything I can do about it, other than reinstalling the whole system?

Leo

Posted 2017-12-01T10:57:24.387

Reputation: 113

Could it be real-time anti-virus protection that is delaying file opening? Try disabling AV temporarily while you rerun the test. Otherwise, you'll need to use a performance monitor to find out where the program is spending its time. By the way, I don't know Python, but should you not close each file after opening, or is this implied by reusing the same f on each pass? Thousands of open file handles would be bound to have a performance impact. – AFH – 2017-12-01T11:14:02.780

@AFH: the with statement closes the file when execution leaves its block. – gronostaj – 2017-12-01T11:28:12.790

@gronostaj - Thanks, I was only guessing: from my experience of earlier languages, like Fortran, PL/1 and C, overwriting a file pointer was a good way to leak handles. – AFH – 2017-12-01T11:33:56.297
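To illustrate gronostaj's point, here is a minimal sketch (the temporary file is just a stand-in for one of the .html files) showing that leaving a with block closes the file even though the name f is still bound afterwards, so reusing f on each pass does not leak handles:

```python
import os
import tempfile

# Create a throwaway file to open; its name and contents are arbitrary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name

with open(path) as f:
    closed_inside = f.closed  # the file is still open inside the block

# Leaving the with block called f.close(), although f is still bound.
closed_after = f.closed
os.remove(path)

print(closed_inside, closed_after)  # False True
```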

@AFH: I disabled Windows Defender's "real-time protection" and the problem is gone. Many thanks! Please post it as an answer so I can accept it. It's still weird, though: the other PC I tried it on also had "real-time protection" turned on, but it didn't cause any problems there. – Leo – 2017-12-01T11:51:00.290

Answers


It could be that real-time anti-virus protection is delaying file opening. You can test whether this is the cause by disabling AV temporarily while you rerun the test.
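A minimal sketch of such a before/after test, assuming Python 3 and the folder from the question; the helper names time_opens and summarize are hypothetical. It uses time.perf_counter(), which has finer resolution than time.time(), and reports the median and worst case so two runs are easy to compare:

```python
import os
import statistics
import time


def time_opens(folder):
    """Return the open() time in seconds for each regular file in folder."""
    timings = []
    for entry in os.scandir(folder):
        if not entry.is_file():
            continue  # skip subfolders and other non-file entries
        start = time.perf_counter()  # high-resolution clock for intervals
        with open(entry.path, 'rb'):
            pass  # open and immediately close; nothing is read
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Condense one run so runs with AV on and off can be compared."""
    return {
        'files': len(timings),
        'median_s': statistics.median(timings),  # needs at least one file
        'max_s': max(timings),
        'total_s': sum(timings),
    }
```

Usage: call summarize(time_opens('foo')) once with real-time protection on and once with it off. If the median drops from around a second to milliseconds, the real-time scanner is the likely cause.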

Given your comments, I cannot explain why two machines running the same AV software would behave differently unless their settings differ, including any folder or file-type exclusions.

If settings differences aren't the cause, you'll need to use a performance monitor to find out where the AV real-time checker is spending its time.

AFH

Posted 2017-12-01T10:57:24.387

Reputation: 15 470