Unable to update Unicode version to 16.0.0 #138

@adithyaov

I'm trying to upgrade Unicode to 16.0.0.
On running ./ucd generate (after ./ucd/download), I get this error:

Too many script extensions: 271

Interesting code blocks to look at:

UCD2Haskell/Modules/ScriptExtensions.hs:150

        encodedExtensions :: Map.Map (NE.NonEmpty BS.ShortByteString) Word8
        encodedExtensions = let len = length extensionsList in if len > 0xff
            then error ("Too many script extensions: " <> show len)
            else Map.fromList (zip extensionsList [0..])
        -- Encode single script as their script value
        extensionsList = singleScriptExtensions
                      <> Set.toList multiScriptExtensions

It looks like
length (singleScriptExtensions <> Set.toList multiScriptExtensions)
is 271, which exceeds the 0xff (255) limit.

UCD2Haskell/Modules/ScriptExtensions.hs:141

        singleScriptExtensions = pure . getScriptAbbr <$> scripts
        singleScriptExtensionsSet = Set.fromList singleScriptExtensions
        multiScriptExtensions :: Set.Set (NE.NonEmpty BS.ShortByteString)
        multiScriptExtensions = Set.fromList (Map.elems extensions)
                                Set.\\ singleScriptExtensionsSet

singleScriptExtensionsSet cannot have more than 204 elements, since:

$ cat data/16.0.0/ucd/ScriptExtensions.txt | grep -v '^\s*#' | grep -v '^\s*$' | wc -l
204

So is multiScriptExtensions what pushes the total past the limit?
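
To see which part contributes what, the two definitions quoted above can be reduced to a small counting helper. This is only a sketch: `extensionCounts` and the `Abbr` alias are hypothetical names, `String` stands in for the generator's `ShortByteString`, and the `Int` map key is an assumed stand-in for whatever key `extensions` really uses. Going by the quoted code, the single part has one entry per script, while the multi part has one entry per distinct value in `extensions` that is not already a single-script entry.

```haskell
import qualified Data.List.NonEmpty as NE
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-in for the script abbreviation type
-- (the quoted generator code uses ShortByteString).
type Abbr = String

-- | Sizes of the two parts of the extensions list: (single, multi, total).
extensionCounts
    :: [Abbr]                          -- ^ all script abbreviations
    -> Map.Map Int (NE.NonEmpty Abbr)  -- ^ assumed: key -> script extensions
    -> (Int, Int, Int)
extensionCounts scripts extensions = (nSingle, nMulti, nSingle + nMulti)
  where
    -- One singleton entry per script, as in singleScriptExtensions.
    single :: [NE.NonEmpty Abbr]
    single = pure <$> scripts
    singleSet = Set.fromList single
    -- Distinct extension values that are not already singletons.
    multi = Set.fromList (Map.elems extensions) Set.\\ singleSet
    nSingle = length single
    nMulti = Set.size multi
```

Running the same computation on the real `scripts` and `extensions` values would show how the 271 splits between the two parts.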

I've not investigated this further. We use genEnumBitmapShamochu to generate
the bitmap, and Shamochu restricts the number of elements to at most 0xff. So
by design, we don't allow extensionsList to have more than 0xff elements.

Either this is a bug, if there SHOULD NOT be more than 0xff extensions, or it
is a LIMITATION of the generator, if more than 0xff extensions are legitimate
and we just don't support them.
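
For reference, the size check quoted from ScriptExtensions.hs can be isolated as a total function. This is a sketch, not the generator's code: `encodeIndices` is a hypothetical name, and returning `Either` instead of calling `error` is an illustrative choice. It shows that 271 entries trip the `len > 0xff` guard whatever the entries are.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word8)

-- | Assign each element a Word8 index, or report that there are too many.
-- Mirrors the `len > 0xff` guard from the quoted code (hypothetical helper).
encodeIndices :: Ord a => [a] -> Either String (Map.Map a Word8)
encodeIndices xs
    | len > 0xff = Left ("Too many script extensions: " <> show len)
    | otherwise  = Right (Map.fromList (zip xs [0..]))
  where
    len = length xs
```

With this shape, `encodeIndices [1 .. 271 :: Int]` yields the same message as the one reported above, so the failure depends only on the element count.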

I'm not sure how to fix this as I don't fully understand the problem.

@wismill could you please suggest how I should proceed?
