From: Jerome Jutteau
Date: Tue, 5 Jul 2022 11:46:16 +0000 (+0200)
Subject: [TASK] Disabling file deduplication by default.
X-Git-Tag: 4.5.0~13
X-Git-Url: https://git.p6c8.net/jirafeau_project.git/commitdiff_plain/b0d7e17277d6b5ec5b9110542ec7945848c1241a?ds=sidebyside;hp=aec88112ff1290657dbbdbc715ccff6d88b821cd

[TASK] Disabling file deduplication by default.

File hashing is a fairly intensive task for both CPU and disk.
This feature does not seem to be a must-have and can produce unwanted side effects.
Very large files can take a long time to hash at the end of a transfer, producing potential timeouts and user frustration.
Users can still enable the file deduplication feature by setting `file_hash` to `md5`.

Signed-off-by: Jerome Jutteau
---

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1f8488c..c1c29d3 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,7 +13,8 @@
 # version 4.5.0
 
 - Support for dark theme
-- Fix side effects of setting too high values in php configuration.
+- Fix side effects of setting too high values in php configuration
+- Change default `file_hash` option to `random`
 
 New configuration items:
 - `max_upload_chunk_size_bytes` option
diff --git a/README.md b/README.md
index dff22cb..9105c7f 100644
--- a/README.md
+++ b/README.md
@@ -273,7 +273,7 @@ Check [issues](https://gitlab.com/mojo42/Jirafeau/issues) to check open bugs and
 
 ### What about this file deduplication thing?
 
-Jirafeau uses a very simple file level deduplication for storage optimization.
+Jirafeau can use a very simple file level deduplication for storage optimization.
 This mean that if some people upload several times the same file, this will
 only store one time the file and increment a counter.
 
@@ -283,9 +283,11 @@ When the counter falls to zero, the file is destroyed.
 In order to know if a newly uploaded file already exist, Jirafeau will hash the file
 using md5 by default but other methods are available (see `file_hash` documentation in `lib/config.original.php`).
 
+This feature is disabled by default and can be enabled through the `file_hash` option.
+
 ### What is the difference between "delete link" and "delete file and links" in admin interface?
 
-As explained in the previous question, files with the same hash are not duplicated and a reference counter stores the number of links pointing to a single file.
+When file deduplication feature is enabled, files with the same hash are not duplicated and a reference counter stores the number of links pointing to a single file.
 So:
 - The button "delete link" will delete the reference to the file but might not destroy the file.
 - The button "delete file and links" will delete all references pointing to the file and will destroy the file.
diff --git a/docker/README.md b/docker/README.md
index b1ed8a7..daf5302 100644
--- a/docker/README.md
+++ b/docker/README.md
@@ -36,7 +36,7 @@ Available options:
 - `ADMIN_PASSWORD`: setup a specific admin password. If not set, a random password will be generated.
 - `WEB_ROOT`: setup a specific domain to point at when generating links (e.g. 'jirafeau.mydomain.com/').
 - `VAR_ROOT`: setup a specific path where to place files. default: '/data'.
-- `FILE_HASH`: can be set to `md5` (default), `partial_md5` or `random`.
+- `FILE_HASH`: can be set to `md5`, `partial_md5` or `random` (default).
 - `PREVIEW`: set to 1 or 0 to enable or disable preview.
 - `TITLE`: set Jirafeau instance title.
 - `ORGANISATION`: set organisation (in ToS).
diff --git a/lib/config.original.php b/lib/config.original.php
index 96dd6f3..1f2e8b7 100644
--- a/lib/config.original.php
+++ b/lib/config.original.php
@@ -158,7 +158,7 @@ $cfg['proxy_ip'] = array();
 
 /* File hash
  * In order to make file deduplication work, files can be hashed through different methods.
- * By default, files are hashed through md5 but other methods are available.
+ * To enable file deduplication feature, set this option to `md5`.
  *
  * Possible values are 'md5', 'md5_outside' and 'random'.
  *
@@ -168,9 +168,9 @@ $cfg['proxy_ip'] = array();
  * - md5 of the last part of the file and
  * - file's size.
  * This method offer file deduplication at minimal cost but can be dangerous as files with the same partial hash can be mistaken.
- * With 'random' option, file hash is set to a random value and file deduplication cannot work anymore but it is fast and safe.
+ * With 'random' option, file hash is set to a random value and file deduplication cannot work but it is fast and safe.
  */
-$cfg['file_hash'] = 'md5';
+$cfg['file_hash'] = 'random';
 
 /* Work around that LiteSpeed truncates large files when downloading.
  * Only for use with the LiteSpeed web server!
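For instances that still want deduplication after this change, a minimal sketch of the corresponding local override (assuming the usual Jirafeau setup where a local config file such as `lib/config.local.php` overrides the defaults shipped in `lib/config.original.php`):

<?php
// Sketch of a local configuration override (assumed file: lib/config.local.php).
// Setting file_hash back to 'md5' re-enables file deduplication: the whole file
// is hashed on upload, so identical uploads share one stored copy.
// The new default 'random' skips hashing entirely and disables deduplication.
$cfg['file_hash'] = 'md5';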