Discussion:
[gentoo-portage-dev] [PATCH 1/3] ecompress: Fix crash on duplicate compressed & uncompressed files
Michał Górny
2018-09-25 19:18:42 UTC
Permalink
Fix crash due to race condition in handling the same file being present
both in compressed and uncompressed variants. If that is the case,
just queue the uncompressed variant for compression, and ignore
the other compressed variants.

Bug: https://bugs.gentoo.org/667072
Signed-off-by: Michał Górny <***@gentoo.org>
---
bin/ecompress | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/bin/ecompress b/bin/ecompress
index 36bdb585b..d5ff3796c 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -49,6 +49,13 @@ while [[ $# -gt 0 ]] ; do
find_args+=( -size "+${PORTAGE_DOCOMPRESS_SIZE_LIMIT}c" )

while IFS= read -d '' -r path; do
+ # if both compressed and uncompressed variant exists,
+ # skip the compressed variants (bug #667072)
+ case ${path} in
+ *.Z|*.gz|*.bz2|*.lzma|*.xz)
+ [[ -s ${path%.*} ]] && continue
+ ;;
+ esac
"${path}.ecompress" || die
done < <(find "${find_args[@]}" -print0 || die)
fi
--
2.19.0
Michał Górny
2018-09-25 19:18:43 UTC
Permalink
Fix the pre-compressed file warning to be reported only once, now that
files are processed in parallel.

Signed-off-by: Michał Górny <***@gentoo.org>
---
bin/ecompress | 6 ++++++
bin/ecompress-file | 13 +++----------
2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/bin/ecompress b/bin/ecompress
index d5ff3796c..e4fdc3934 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -150,6 +150,12 @@ find "${ED}" -name '*.ecompress' -delete -print0 |
___parallel_xargs -0 "${PORTAGE_BIN_PATH}"/ecompress-file
ret=${?}

+if [[ -f ${T}/.ecompress_had_precompressed ]]; then
+ eqawarn "One or more compressed files were found in docompress-ed directories."
+ eqawarn "Please fix the ebuild not to install compressed files (manpages,"
+ eqawarn "documentation) when automatic compression is used."
+fi
+
fix_symlinks
: $(( ret |= ${?} ))
[[ $ret -ne 0 ]] && __helpers_die "${0##*/} failed"
diff --git a/bin/ecompress-file b/bin/ecompress-file
index 18269c91b..27695da1b 100755
--- a/bin/ecompress-file
+++ b/bin/ecompress-file
@@ -14,7 +14,6 @@ compress_file() {
set +f
mask_ext_re="^(${mask_ext_re:1})\$"
local filtered_args=()
- local had_precompressed=
for x in "$@" ; do
[[ ${x##*.} =~ $mask_ext_re ]] && continue
[[ -s ${x} ]] || continue
@@ -24,15 +23,15 @@ compress_file() {
*.gz|*.Z)
gunzip -f "${x}" || __helpers_die "gunzip failed"
x=${x%.*}
- had_precompressed=1;;
+ touch "${T}/.ecompress_had_precompressed";;
*.bz2)
bunzip2 -f "${x}" || __helpers_die "bunzip2 failed"
x=${x%.bz2}
- had_precompressed=1;;
+ touch "${T}/.ecompress_had_precompressed";;
*.lzma|*.xz)
unxz -f "${x}" || __helpers_die "unxz failed"
x=${x%.*}
- had_precompressed=1;;
+ touch "${T}/.ecompress_had_precompressed";;
esac

filtered_args+=( "$x" )
@@ -40,12 +39,6 @@ compress_file() {
[[ ${#filtered_args[@]} -eq 0 ]] && return 0
set -- "${filtered_args[@]}"

- if [[ ${had_precompressed} ]]; then
- eqawarn "One or more compressed files were found in docompress-ed directories."
- eqawarn "Please fix the ebuild not to install compressed files (manpages,"
- eqawarn "documentation) when automatic compression is used."
- fi
-
# If a compressed version of the file already exists, simply
# delete it so that the compressor doesn't whine (bzip2 will
# complain and skip, gzip will prompt for input)
--
2.19.0
Michał Górny
2018-09-25 19:18:44 UTC
Permalink
---
bin/ecompress | 1 +
bin/ecompress-file | 9 +++------
2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/bin/ecompress b/bin/ecompress
index e4fdc3934..4a2c8a99a 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -53,6 +53,7 @@ while [[ $# -gt 0 ]] ; do
# skip the compressed variants (bug #667072)
case ${path} in
*.Z|*.gz|*.bz2|*.lzma|*.xz)
+ touch "${T}"/.ecompress_had_precompressed
[[ -s ${path%.*} ]] && continue
;;
esac
diff --git a/bin/ecompress-file b/bin/ecompress-file
index 27695da1b..bc8fe5451 100755
--- a/bin/ecompress-file
+++ b/bin/ecompress-file
@@ -22,16 +22,13 @@ compress_file() {
case ${x} in
*.gz|*.Z)
gunzip -f "${x}" || __helpers_die "gunzip failed"
- x=${x%.*}
- touch "${T}/.ecompress_had_precompressed";;
+ x=${x%.*};;
*.bz2)
bunzip2 -f "${x}" || __helpers_die "bunzip2 failed"
- x=${x%.bz2}
- touch "${T}/.ecompress_had_precompressed";;
+ x=${x%.bz2};;
*.lzma|*.xz)
unxz -f "${x}" || __helpers_die "unxz failed"
- x=${x%.*}
- touch "${T}/.ecompress_had_precompressed";;
+ x=${x%.*};;
esac

filtered_args+=( "$x" )
--
2.19.0
Zac Medico
2018-09-25 19:49:03 UTC
Permalink
Post by Michał Górny
---
bin/ecompress | 1 +
bin/ecompress-file | 9 +++------
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/bin/ecompress b/bin/ecompress
index e4fdc3934..4a2c8a99a 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -53,6 +53,7 @@ while [[ $# -gt 0 ]] ; do
# skip the compressed variants (bug #667072)
case ${path} in
*.Z|*.gz|*.bz2|*.lzma|*.xz)
+ touch "${T}"/.ecompress_had_precompressed
While we're here, for efficiency we should use the builtin >> operator
instead of touch (also touch has broken in the past,
https://bugs.gentoo.org/348640).
Post by Michał Górny
[[ -s ${path%.*} ]] && continue
;;
esac
diff --git a/bin/ecompress-file b/bin/ecompress-file
index 27695da1b..bc8fe5451 100755
--- a/bin/ecompress-file
+++ b/bin/ecompress-file
@@ -22,16 +22,13 @@ compress_file() {
case ${x} in
*.gz|*.Z)
gunzip -f "${x}" || __helpers_die "gunzip failed"
- x=${x%.*}
- touch "${T}/.ecompress_had_precompressed";;
+ x=${x%.*};;
*.bz2)
bunzip2 -f "${x}" || __helpers_die "bunzip2 failed"
- x=${x%.bz2}
- touch "${T}/.ecompress_had_precompressed";;
+ x=${x%.bz2};;
*.lzma|*.xz)
unxz -f "${x}" || __helpers_die "unxz failed"
- x=${x%.*}
- touch "${T}/.ecompress_had_precompressed";;
+ x=${x%.*};;
esac
filtered_args+=( "$x" )
--
Thanks,
Zac
Zac Medico
2018-09-25 19:55:19 UTC
Permalink
Post by Michał Górny
Fix crash due to race condition in handling the same file being present
both in compressed and uncompressed variants. If that is the case,
just queue the uncompressed variant for compression, and ignore
the other compressed variants.
Bug: https://bugs.gentoo.org/667072
---
bin/ecompress | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/bin/ecompress b/bin/ecompress
index 36bdb585b..d5ff3796c 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -49,6 +49,13 @@ while [[ $# -gt 0 ]] ; do
find_args+=( -size "+${PORTAGE_DOCOMPRESS_SIZE_LIMIT}c" )
while IFS= read -d '' -r path; do
+ # if both compressed and uncompressed variant exists,
+ # skip the compressed variants (bug #667072)
+ case ${path} in
+ *.Z|*.gz|*.bz2|*.lzma|*.xz)
+ [[ -s ${path%.*} ]] && continue
+ ;;
+ esac
In theory, we'd still have a problem if the file existed with muliple
compressions, right? Maybe a good solution is to strip the compression
extension here, and then have ecompress-file check for duplicates?
Post by Michał Górny
"${path}.ecompress" || die
fi
--
Thanks,
Zac
Zac Medico
2018-09-25 19:58:11 UTC
Permalink
Post by Zac Medico
Post by Michał Górny
Fix crash due to race condition in handling the same file being present
both in compressed and uncompressed variants. If that is the case,
just queue the uncompressed variant for compression, and ignore
the other compressed variants.
Bug: https://bugs.gentoo.org/667072
---
bin/ecompress | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/bin/ecompress b/bin/ecompress
index 36bdb585b..d5ff3796c 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -49,6 +49,13 @@ while [[ $# -gt 0 ]] ; do
find_args+=( -size "+${PORTAGE_DOCOMPRESS_SIZE_LIMIT}c" )
while IFS= read -d '' -r path; do
+ # if both compressed and uncompressed variant exists,
+ # skip the compressed variants (bug #667072)
+ case ${path} in
+ *.Z|*.gz|*.bz2|*.lzma|*.xz)
+ [[ -s ${path%.*} ]] && continue
+ ;;
+ esac
In theory, we'd still have a problem if the file existed with muliple
compressions, right? Maybe a good solution is to strip the compression
extension here, and then have ecompress-file check for duplicates?
Alternatively, we could uncompress the pre-compressed files right here.
--
Thanks,
Zac
Michał Górny
2018-09-25 20:07:43 UTC
Permalink
Post by Zac Medico
Post by Zac Medico
Post by Michał Górny
Fix crash due to race condition in handling the same file being present
both in compressed and uncompressed variants. If that is the case,
just queue the uncompressed variant for compression, and ignore
the other compressed variants.
Bug: https://bugs.gentoo.org/667072
---
bin/ecompress | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/bin/ecompress b/bin/ecompress
index 36bdb585b..d5ff3796c 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -49,6 +49,13 @@ while [[ $# -gt 0 ]] ; do
find_args+=( -size "+${PORTAGE_DOCOMPRESS_SIZE_LIMIT}c" )
while IFS= read -d '' -r path; do
+ # if both compressed and uncompressed variant exists,
+ # skip the compressed variants (bug #667072)
+ case ${path} in
+ *.Z|*.gz|*.bz2|*.lzma|*.xz)
+ [[ -s ${path%.*} ]] && continue
+ ;;
+ esac
In theory, we'd still have a problem if the file existed with muliple
compressions, right? Maybe a good solution is to strip the compression
extension here, and then have ecompress-file check for duplicates?
Alternatively, we could uncompress the pre-compressed files right here.
That would cause them to be decompressed even if the path is eventually
exclude via 'docompress -x'.
--
Best regards,
Michał Górny
Zac Medico
2018-09-25 20:29:02 UTC
Permalink
Post by Michał Górny
Post by Zac Medico
Post by Zac Medico
Post by Michał Górny
Fix crash due to race condition in handling the same file being present
both in compressed and uncompressed variants. If that is the case,
just queue the uncompressed variant for compression, and ignore
the other compressed variants.
Bug: https://bugs.gentoo.org/667072
---
bin/ecompress | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/bin/ecompress b/bin/ecompress
index 36bdb585b..d5ff3796c 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -49,6 +49,13 @@ while [[ $# -gt 0 ]] ; do
find_args+=( -size "+${PORTAGE_DOCOMPRESS_SIZE_LIMIT}c" )
while IFS= read -d '' -r path; do
+ # if both compressed and uncompressed variant exists,
+ # skip the compressed variants (bug #667072)
+ case ${path} in
+ *.Z|*.gz|*.bz2|*.lzma|*.xz)
+ [[ -s ${path%.*} ]] && continue
+ ;;
+ esac
In theory, we'd still have a problem if the file existed with muliple
compressions, right? Maybe a good solution is to strip the compression
extension here, and then have ecompress-file check for duplicates?
Alternatively, we could uncompress the pre-compressed files right here.
That would cause them to be decompressed even if the path is eventually
exclude via 'docompress -x'.
I suppose we could record the extensions in a shared .ecompress file,
and then ecompress-file could read the existing extensions from that file.
--
Thanks,
Zac
Michał Górny
2018-09-25 20:36:48 UTC
Permalink
Post by Zac Medico
Post by Michał Górny
Post by Zac Medico
Post by Zac Medico
Post by Michał Górny
Fix crash due to race condition in handling the same file being present
both in compressed and uncompressed variants. If that is the case,
just queue the uncompressed variant for compression, and ignore
the other compressed variants.
Bug: https://bugs.gentoo.org/667072
---
bin/ecompress | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/bin/ecompress b/bin/ecompress
index 36bdb585b..d5ff3796c 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -49,6 +49,13 @@ while [[ $# -gt 0 ]] ; do
find_args+=( -size "+${PORTAGE_DOCOMPRESS_SIZE_LIMIT}c" )
while IFS= read -d '' -r path; do
+ # if both compressed and uncompressed variant exists,
+ # skip the compressed variants (bug #667072)
+ case ${path} in
+ *.Z|*.gz|*.bz2|*.lzma|*.xz)
+ [[ -s ${path%.*} ]] && continue
+ ;;
+ esac
In theory, we'd still have a problem if the file existed with muliple
compressions, right? Maybe a good solution is to strip the compression
extension here, and then have ecompress-file check for duplicates?
Alternatively, we could uncompress the pre-compressed files right here.
That would cause them to be decompressed even if the path is eventually
exclude via 'docompress -x'.
I suppose we could record the extensions in a shared .ecompress file,
and then ecompress-file could read the existing extensions from that file.
Could we please solve the problem here instead of inventing super-
complex solutions to a non-existing theoretical problem that are
eventually going to cause more issues than real gain? Just like this
problem wouldn't have occurred if we haven't added entirely pointless
parallel compression of tiny files.
--
Best regards,
Michał Górny
Zac Medico
2018-09-25 20:56:58 UTC
Permalink
Post by Michał Górny
Post by Zac Medico
Post by Michał Górny
Post by Zac Medico
Post by Zac Medico
Post by Michał Górny
Fix crash due to race condition in handling the same file being present
both in compressed and uncompressed variants. If that is the case,
just queue the uncompressed variant for compression, and ignore
the other compressed variants.
Bug: https://bugs.gentoo.org/667072
---
bin/ecompress | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/bin/ecompress b/bin/ecompress
index 36bdb585b..d5ff3796c 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -49,6 +49,13 @@ while [[ $# -gt 0 ]] ; do
find_args+=( -size "+${PORTAGE_DOCOMPRESS_SIZE_LIMIT}c" )
while IFS= read -d '' -r path; do
+ # if both compressed and uncompressed variant exists,
+ # skip the compressed variants (bug #667072)
+ case ${path} in
+ *.Z|*.gz|*.bz2|*.lzma|*.xz)
+ [[ -s ${path%.*} ]] && continue
+ ;;
+ esac
In theory, we'd still have a problem if the file existed with muliple
compressions, right? Maybe a good solution is to strip the compression
extension here, and then have ecompress-file check for duplicates?
Alternatively, we could uncompress the pre-compressed files right here.
That would cause them to be decompressed even if the path is eventually
exclude via 'docompress -x'.
I suppose we could record the extensions in a shared .ecompress file,
and then ecompress-file could read the existing extensions from that file.
Could we please solve the problem here instead of inventing super-
complex solutions to a non-existing theoretical problem that are
eventually going to cause more issues than real gain? Just like this
problem wouldn't have occurred if we haven't added entirely pointless
parallel compression of tiny files.
Yeah, for now let's just revert "ecompress: Run compression in parallel":

https://gitweb.gentoo.org/proj/portage.git/commit/?id=289d9a17dc9d9287e5dcb75f84b38ad0388e5fde
--
Thanks,
Zac
Loading...