Fixed unit test failures for `test_terminal_output_response_charset_detection` and `test_terminal_output_request_charset_detection` by jautung · Pull Request #1855 · httpie/cli

jautung · 2026-05-24T17:12:27Z

These two unit tests were failing because the Big5-encoded test string 卷首卷首卷首卷首卷卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首 was detected as johab instead of big5. The test string is ambiguous: charset_normalizer could not reliably distinguish its Big5 bytes from Johab.


Added (temporary, not committed) debugging logs to encoding.py:detect_encoding:

def detect_encoding(content: ContentBytes) -> str:
    ...
    if len(content) > TOO_SMALL_SEQUENCE:
        # Temporary debugging output; run detection once and reuse the results.
        results = from_bytes(bytes(content))
        match = results.best()
        print()
        print('content', content)
        print()
        print('bytes(content)', bytes(content))
        print()
        print('from_bytes(bytes(content))._results', results._results)
        print()
        print('match', match)
        print()
        # best() returns None when nothing matches, so guard the attribute access.
        print('match.encoding', match.encoding if match else None)
        print()
        if match:
            encoding = match.encoding
    return encoding

With the original test string, 卷首卷首卷首卷首卷卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首卷首, the output was:

content bytearray(b'\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba')

bytes(content) b'\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba\xa8\xf7\xad\xba'

from_bytes(bytes(content))._results [<CharsetMatch 'johab' fp(-1782770569132810705)>, <CharsetMatch 'big5' fp(9095422849593591809)>, <CharsetMatch 'shift_jis_2004' fp(3898380262017389457)>]

match 뻥솤뻥솤뻥솤뻥솤뻥뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤뻥솤

match.encoding johab

The best match was johab, and big5 was the second best match in the list. Some byte sequences are genuinely ambiguous between encodings like Big5, Johab, and Shift-JIS because they share overlapping byte ranges, so this makes sense.
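The overlap can be checked without charset_normalizer at all. The following is a minimal sketch that uses only Python's built-in codecs; the helper name `decodable_as` is hypothetical and not part of httpie or charset_normalizer. It takes one repetition of the original test string's bytes from the debug output above and reports which codecs decode it without error.

```python
# One repetition of the original test string, copied from the debug output above.
SAMPLE = b'\xa8\xf7\xad\xba'


def decodable_as(data, codecs):
    """Return {codec: decoded_text} for every codec that decodes `data` cleanly.

    Hypothetical helper for illustration only.
    """
    decoded = {}
    for name in codecs:
        try:
            decoded[name] = data.decode(name)
        except UnicodeDecodeError:
            # This codec rejects the byte sequence, so it is not a candidate.
            continue
    return decoded


candidates = decodable_as(SAMPLE, ['big5', 'johab', 'utf-8'])
for name, text in candidates.items():
    print(name, repr(text))
```

Both big5 and johab accept the same four bytes and produce different text (卷首 versus 뻥솤, matching the debug output), while utf-8 rejects them. Byte-level validity alone therefore cannot settle the encoding, and a detector has to fall back on heuristics to rank the candidates.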

Fix: updated the test string to 你好世界。你好世界。你好世界。你好世界。你好世界。你好世界。你好世界。, which charset_normalizer detects unambiguously as big5:

content bytearray(b' \xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C')

bytes(content) b' \xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C\xa7A\xa6n\xa5@\xac\xc9\xa1C'

from_bytes(bytes(content))._results [<CharsetMatch 'big5' fp(320475358053722554)>]

match  你好世界。你好世界。你好世界。你好世界。你好世界。你好世界。你好世界。

match.encoding big5

All tests are passing now.

codecov-commenter · 2026-05-24T17:15:25Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.11%. Comparing base (4d7d6b6) to head (fac60b8).
⚠️ Report is 383 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1855      +/-   ##
==========================================
- Coverage   97.28%   94.11%   -3.18%     
==========================================
  Files          67      113      +46     
  Lines        4235     7694    +3459     
==========================================
+ Hits         4120     7241    +3121     
- Misses        115      453     +338


done

fac60b8

jautung mentioned this pull request May 24, 2026

downloads: don't use Content-Length as progress total when Content-Encoding is set #1816

Open

jautung wants to merge 1 commit into httpie:master from jautung:update-big5-detection-test
