unicode-width-workaround {cli}    R Documentation

Working around the bad Unicode character widths

Description

R 3.6.2, and also the upcoming 3.6.3 and 4.0.0 versions, use the Unicode 8 standard to calculate the display width of Unicode characters. Unfortunately, the widths of most emojis are incorrect in this standard: a width of 1 is reported instead of the correct value of 2.
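A quick way to see what the running R session reports is to ask nchar() for the display width of a single emoji. This is only an illustrative check; the choice of U+1F600 as the probe character is an assumption made for this example.

```r
# U+1F600 (grinning face), written as an escape so the source stays ASCII.
emoji <- "\U0001F600"

# Display width as computed by this R session.
# Affected R versions report 1; the correct value is 2.
w <- nchar(emoji, type = "width")
cat("reported width:", w, "\n")
```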

Details

See more about this here: https://github.com/brodieG/fansi/issues/62

cli implements a workaround for this. The package includes a table of all Unicode ranges that contain wide characters (display width 2).

On first use of one of the workaround wrappers (strwrap2_fixed(), etc.) we check what width the running version of R reports for these characters, and then create a regex that matches the ones that R is wrong about (re_bad_char_width).
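The detection step could be sketched roughly as follows. This is not cli's actual code: the two-row wide_ranges table, the helper names r_width() and build_bad_width_regex(), and the shortcut of probing only the first code point of each range are all assumptions made for illustration.

```r
# Hypothetical, heavily abbreviated table of wide ranges
# (inclusive code points). The real table in cli is much longer.
wide_ranges <- data.frame(
  first = c(0x1F300L, 0x1F600L),
  last  = c(0x1F320L, 0x1F64FL)
)

# Width as reported by the running R session.
r_width <- function(ch) nchar(ch, type = "width")

# Build a PCRE character class covering the ranges that are mis-measured.
# 'width_fn' is a parameter only so the logic can be exercised with a
# fake width function; by default the real nchar() result is used.
build_bad_width_regex <- function(ranges, width_fn = r_width) {
  # Simplification: probe one representative character per range.
  probes <- vapply(ranges$first, intToUtf8, character(1))
  bad <- width_fn(probes) != 2L
  if (!any(bad)) return(NA_character_)  # nothing to work around
  classes <- sprintf("\\x{%X}-\\x{%X}", ranges$first[bad], ranges$last[bad])
  paste0("[", paste(classes, collapse = ""), "]")
}

re <- build_bad_width_regex(wide_ranges)
```

The result is meant to be used with perl = TRUE. On an R version that already reports the correct widths, this sketch returns NA and no rewriting is needed.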

Then we use this regex to duplicate all of the problematic characters in the input string passed to the wrapper function, before calling the real string manipulation function (nchar(), strwrap(), etc.). At the end we undo the duplication before we return the result.
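A minimal sketch of the duplicate-then-undo idea is shown below. The regex re_bad is hard-coded to the emoticon block purely for illustration, and double_bad(), undo_double(), nchar_width_sketch() and strwrap_sketch() are hypothetical names, not functions exported by cli.

```r
# Assumed for illustration: R mis-measures the emoticon block.
re_bad <- "[\\x{1F600}-\\x{1F64F}]"

# Write every problematic character twice, so that two width-1
# characters occupy the two columns the original really needs.
double_bad <- function(x) {
  gsub(paste0("(", re_bad, ")"), "\\1\\1", x, perl = TRUE)
}

# Collapse each doubled pair back into a single character.
undo_double <- function(x) {
  gsub(paste0("(", re_bad, ")\\1"), "\\1", x, perl = TRUE)
}

# nchar() returns a number, so there is nothing to undo.
nchar_width_sketch <- function(x) {
  nchar(double_bad(x), type = "width")
}

# strwrap() returns text, so the doubling is undone on the way out.
strwrap_sketch <- function(x, ...) {
  undo_double(strwrap(double_bad(x), ...))
}

x <- "a\U0001F600b"
doubled <- double_bad(x)      # the emoji now appears twice
restored <- undo_double(doubled)
```

Because strwrap() only breaks lines at whitespace, a doubled pair is never split across lines, which is what makes the undo step safe there.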

This workaround is fine for nchar() and strwrap() (and their counterparts in fansi). It is potentially not fine for substr(), but we don't currently use that...
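The concern with substr() can be illustrated with a small example: substr() indexes by character, so duplicating a character shifts every position after it. As before, re_bad and double_bad() are hypothetical stand-ins used only for this sketch.

```r
# Assumed for illustration: R mis-measures the emoticon block.
re_bad <- "[\\x{1F600}-\\x{1F64F}]"
double_bad <- function(x) {
  gsub(paste0("(", re_bad, ")"), "\\1\\1", x, perl = TRUE)
}

x <- "\U0001F600abc"

# Characters 2 to 3 of the original string.
direct <- substr(x, 2, 3)

# The same positions in the doubled string start one character earlier
# in the original text: the extra emoji copy has shifted everything.
via_doubling <- substr(double_bad(x), 2, 3)

# The two results differ, so the naive wrapper would be wrong here.
same <- identical(direct, via_doubling)
```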


[Package cli version 2.2.0 Index]