Serious performance issues with Ubuntu 18.04 and large file numbers

3

I am using our university network for my research work and was given a new PC with Ubuntu 18.04 and an Intel i7-7700. I work a lot with machine learning applications, so I am constantly moving huge datasets (~30-50 GB) made up of relatively small files. On Linux I run into severe performance problems when doing so, especially when using Nautilus.

For example, when working with the AVA dataset (255,530 .jpg images), Nautilus takes FOREVER (~3 minutes?) simply to open the folder and display its contents! Note that I am using the list layout, with no thumbnails to render or anything like that: just the filenames, and that's it. On my Windows machine, opening similar image folders takes only about 2-3 seconds. Am I doing something wrong, or is this a general Linux-specific problem?

Help and suggestions greatly appreciated! :)

masterBroesel

Posted 2019-09-16T12:18:41.780

Reputation: 31

Question was closed 2019-09-24T16:20:13.127

Windows is probably benefiting from file indexing here? Not sure if there's a similar thing in Ubuntu? – Smock – 2019-09-16T12:24:37.840

Use shell commands; once you have trained yourself for some time you will be whizzing along. (Tip: www.tldp.org contains at least two "guides" on bash.) In the process of learning bash you will end up automating all your tasks to a reasonable level. – Hannu – 2019-09-16T16:43:36.153

I have tried the same thing from the command line: the standard `ls` command also takes forever, as it has to stat and sort each file (multiple times). Removing the stat and sort steps as described in https://unix.stackexchange.com/questions/120077/the-ls-command-is-not-working-for-a-directory-with-a-huge-number-of-files helped a bit.
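For the record, the trick from that link boils down to `ls -f`, which turns off sorting (and, in GNU ls, the per-file stat() calls that `-l` or `--color` would trigger). A minimal sketch, using a throwaway scratch directory with made-up file names:

```shell
# Build a scratch directory with many empty files to compare against.
demo=$(mktemp -d)
for i in $(seq 1 2000); do : > "$demo/img_$i.jpg"; done

# Plain ls sorts the whole listing before printing anything:
ls "$demo" | wc -l

# With -f, ls skips sorting (and implies -a), so entries stream out in raw
# directory order; on huge directories it starts printing almost immediately.
# Note it also counts the . and .. entries:
ls -f "$demo" | wc -l

rm -r "$demo"
```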

– masterBroesel – 2019-09-17T06:21:25.113

Answers

0

In its attempt to be user-friendly, Nautilus sacrifices performance. In this case it is probably checking every file to decide how to display information about it, so you are waiting for Nautilus to read the header of every file in that directory. At the risk of incurring the wrath of the powers that be, I would recommend you pursue alternative software for this task, rather than attempting to force Nautilus into a role it is ill-suited for.

Some more performant alternatives you might try:

Gary

Posted 2019-09-16T12:18:41.780

Reputation: 166

I would prefer a solution without additional software, but if this is the only way, so be it. – masterBroesel – 2019-09-16T13:12:29.973

You can add Dolphin to that list. – xenoid – 2019-09-16T15:13:07.000

On a GNOME desktop, Dolphin would pull in another ~40 dependencies, including most of the core KDE frameworks. I wouldn't want to recommend that. – Gary – 2019-09-16T20:17:23.273

0

Two things come to mind that may help you:

  1. Since you are working in machine learning, odds are you already use Python. In some of my work where I needed to organise large numbers of files, I wrote short, dedicated Python scripts to perform the organising tasks. Python's implementation of, e.g., getting a list of all files in a directory is said to be quite efficient. This way you would rely less on a GUI like Nautilus or on shell commands, both of which you say are not working well for you.

  2. Which file system are you using on your new PC? Different file systems show different performance characteristics when working with large numbers of files, due to their internal organisation (e.g. binary trees vs. linear lists); see e.g. https://unix.stackexchange.com/questions/63250/storing-thousands-of-files-in-one-directory?noredirect=1&lq=1. Opinions and benchmark results vary greatly in this area, so maybe you can repartition your hard drive or add an external SSD to try a few different file systems with your tasks, e.g. Btrfs vs. XFS vs. ...
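To illustrate point 1, here is a minimal sketch (the scratch directory and file names are invented): Python's `os.scandir` yields directory entries lazily, without sorting and without an extra stat() call per file, so even a short script stays fast on folders with hundreds of thousands of entries.

```shell
# Build a scratch "dataset" directory, then count its .jpg files from Python.
demo=$(mktemp -d)
for i in $(seq 1 500); do : > "$demo/img_$i.jpg"; done

# os.scandir streams entries straight out of the directory; filtering by
# name needs no stat() at all.
python3 - "$demo" <<'EOF'
import os, sys

target = sys.argv[1]
jpgs = [e.name for e in os.scandir(target) if e.name.endswith(".jpg")]
print(len(jpgs), "jpg files found")
EOF

rm -r "$demo"
```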

Eradian

Posted 2019-09-16T12:18:41.780

Reputation: 66