
2-, 3-, and 4-byte character sequences at the end of a file

description

I believe I have found a bug in IsValid() in Utf8Checker.cs. The checks for 2-, 3-, and 4-byte character sequences fail when a sequence sits at the very end of the file, because the bounds checks use >= where > is needed.
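
To show the off-by-one concretely, here is a minimal Python sketch. It is not the code in Utf8Checker.cs, which isn't shown here. The function name `is_valid_utf8` and the `buggy` flag are hypothetical. It only checks lead/continuation byte structure, with no overlong or surrogate checks. A sequence of `length` bytes starting at index `i` fits only if `i + length <= n`, so it must be rejected when `i + length > n`. Using `>=` instead also rejects a sequence whose last byte is the last byte of the input.

```python
def is_valid_utf8(data: bytes, buggy: bool = False) -> bool:
    """Hypothetical structural UTF-8 check illustrating the bounds test.

    Not the Utf8Checker.cs implementation: it ignores overlong encodings
    and surrogates and only validates lead/continuation byte structure.
    """
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            length = 1
        elif 0xC0 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF7:
            length = 4
        else:
            return False  # stray continuation byte or invalid lead byte

        # The sequence occupies data[i] .. data[i + length - 1], so it fits
        # only if i + length <= n. The buggy variant uses >=, which also
        # rejects a sequence ending exactly at the last byte of the input.
        if buggy:
            out_of_bounds = i + length >= n
        else:
            out_of_bounds = i + length > n
        if length > 1 and out_of_bounds:
            return False

        # Every byte after the lead byte must be a continuation byte (10xxxxxx).
        for j in range(i + 1, i + length):
            if data[j] & 0xC0 != 0x80:
                return False
        i += length
    return True
```

With `buggy=True`, `b"caf\xc3\xa9"` ("café" in UTF-8) is rejected, while `b"\xc3\xa9x"` is accepted, because the failure only shows up when the multi-byte sequence is at the end of the input.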
 
I discovered this by taking ansi.txt, which is part of the test data, and re-saving it as a new file with UTF-8 encoding. Utf8Checker rejects the converted file. If I change the >= checks to >, I get the expected behavior: ansi.txt fails, but the UTF-8 copy of it is considered valid.
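
The same reproduction can be expressed as a regression check against a reference validator. Below is a sketch that uses Python's strict UTF-8 decoder as the oracle. The sample string is a hypothetical stand-in for ansi.txt, whose contents aren't shown here, and "ANSI" is assumed to mean the Windows-1252 code page. The sample deliberately ends in a non-ASCII character so that the UTF-8 copy ends with a multi-byte sequence.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Reference check: True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical stand-in for the contents of ansi.txt.
sample = "Grüße aus Köln, café"
ansi_bytes = sample.encode("cp1252")  # assumed "ANSI" code page
utf8_bytes = sample.encode("utf-8")   # the file re-saved as UTF-8
```

The expected results are that `ansi_bytes` is rejected and `utf8_bytes` is accepted, even though `utf8_bytes` ends with the 2-byte sequence for "é".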

comments