I tried cpdgui.bat from pmd-bin-5.1.2.zip on Windows 7 to parse C++ code,
and I got many TokenMgrError exceptions.
My console output looks like this:

net.sourceforge.pmd.lang.ast.TokenMgrError: Lexical error in file C:\work\common\app\wrapper\a.cpp at line 1, column 1. Encountered: "\ufffd" (65533), after : ""
at net.sourceforge.pmd.lang.cpp.ast.CppParserTokenManager.getNextToken(CppParserTokenManager.java:1648)
at net.sourceforge.pmd.lang.cpp.CppTokenManager.getNextToken(CppTokenManager.java:23)
at net.sourceforge.pmd.cpd.CPPTokenizer.tokenize(CPPTokenizer.java:27)
at net.sourceforge.pmd.cpd.CPD.addAndThrowLexicalError(CPD.java:144)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:139)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:104)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:66)
at net.sourceforge.pmd.cpd.CPD.addDirectory(CPD.java:76)
at net.sourceforge.pmd.cpd.CPD.addRecursively(CPD.java:61)
at net.sourceforge.pmd.cpd.GUI.go(GUI.java:605)
at net.sourceforge.pmd.cpd.GUI.access$500(GUI.java:70)
at net.sourceforge.pmd.cpd.GUI$GoListener$1.run(GUI.java:189)
at java.lang.Thread.run(Thread.java:724)
Skipping C:\work\common\app\wrapper\a.cpp due to parse error

Best Regards,

Discussion

U+FFFD is the "replacement character", which is substituted when a character can't be mapped.
It seems your cpp file is in a different encoding than the one CPD uses to read the file.
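
To illustrate how a wrong decoder produces U+FFFD, here is a minimal Python sketch. It is not PMD code, and it is only an assumption that CPD's reader behaves analogously: the bytes of a UTF-16 file that starts with a BOM are decoded as UTF-8, and the unmappable bytes come out as the replacement character (65533).

```python
# Bytes of a little-endian UTF-16 file: a BOM followed by the text "int".
utf16_bytes = "\ufeffint".encode("utf-16-le")

# Decoding as UTF-8 cannot map the BOM bytes 0xFF 0xFE,
# so each of them is substituted with U+FFFD.
decoded_wrong = utf16_bytes.decode("utf-8", errors="replace")

# Decoding with the matching encoding consumes the BOM and yields the text.
decoded_right = utf16_bytes.decode("utf-16")

print(repr(decoded_wrong[0]), ord(decoded_wrong[0]))
print(repr(decoded_right))
```

The first character of the wrongly decoded text is what a lexer would see at line 1, column 1.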

Try setting the file encoding in the input box. Maybe you need to use "windows-1252" or "ISO-8859-15" instead of "UTF-8".

Let me know if this fixes the problem. If not, please attach the problematic file to this issue. Thanks.

Hi Andreas,

Thank you for the response.
I tried the suggested encodings such as "windows-1252", "ISO-8859-15", "MS932" and "UTF-8", but none of them works.
My source code is actually in "UTF-8".
I have attached a file that reproduces the problem.

Best Regards,

How do you know that his patch is not working? For me, it works with the attached wrapper.cpp file.

While it's true that the byte sequence of the BOM differs between UTF-16 and UTF-8, the Unicode code point is still the same: U+FEFF.
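
A small Python sketch of that point, for illustration only (it uses the constants from Python's standard `codecs` module, not anything from PMD): the BOM bytes differ per encoding, but each decodes to the single code point U+FEFF.

```python
import codecs

bom_utf8 = codecs.BOM_UTF8          # 3 bytes: EF BB BF
bom_utf16_le = codecs.BOM_UTF16_LE  # 2 bytes: FF FE
bom_utf16_be = codecs.BOM_UTF16_BE  # 2 bytes: FE FF

# These codecs do not strip the BOM, so the decoded text still contains it.
decoded = {
    "utf-8": bom_utf8.decode("utf-8"),
    "utf-16-le": bom_utf16_le.decode("utf-16-le"),
    "utf-16-be": bom_utf16_be.decode("utf-16-be"),
}

for name, text in decoded.items():
    print(name, "U+%04X" % ord(text))
```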

If this is still an issue for you, please provide an example file.

Thanks,
Andreas

Hi Andreas. Sorry for the late reply. I have attached an example file. PMD 5.2.2 can't parse this file.

Error:
net.sourceforge.pmd.lang.ast.TokenMgrError: Lexical error in file /home/Lukasz/Dokumenty/cpd_issue/example.cpp at line 5, column 23. Encountered: "\u0105" (261), after : "\""
at net.sourceforge.pmd.lang.cpp.ast.CppParserTokenManager.getNextToken(CppParserTokenManager.java:1650)
at net.sourceforge.pmd.lang.cpp.CppTokenManager.getNextToken(CppTokenManager.java:27)
at net.sourceforge.pmd.cpd.CPPTokenizer.tokenize(CPPTokenizer.java:63)
at net.sourceforge.pmd.cpd.CPD.addAndThrowLexicalError(CPD.java:144)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:139)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:104)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:66)
at net.sourceforge.pmd.cpd.CPD.addDirectory(CPD.java:76)
at net.sourceforge.pmd.cpd.CPD.addRecursively(CPD.java:61)
at net.sourceforge.pmd.cpd.GUI.go(GUI.java:609)
at net.sourceforge.pmd.cpd.GUI.access$500(GUI.java:70)
at net.sourceforge.pmd.cpd.GUI$GoListener$1.run(GUI.java:193)
at java.lang.Thread.run(Thread.java:744)

Best regards,
Lukasz

This issue is not fixed in versions 5.3.3 and 5.3.4. I still get the following errors:

net.sourceforge.pmd.lang.ast.TokenMgrError: Lexical error in file C:\MySource.cpp at line 1, column 1. Encountered: "\ufffd" (65533), after : ""
at net.sourceforge.pmd.lang.cpp.ast.CppParserTokenManager.getNextToken(CppParserTokenManager.java:1623)
at net.sourceforge.pmd.lang.cpp.CppTokenManager.getNextToken(CppTokenManager.java:27)
at net.sourceforge.pmd.cpd.CPPTokenizer.tokenize(CPPTokenizer.java:63)
at net.sourceforge.pmd.cpd.CPD.addAndThrowLexicalError(CPD.java:144)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:139)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:104)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:66)
at net.sourceforge.pmd.cpd.CPD.addDirectory(CPD.java:76)
at net.sourceforge.pmd.cpd.CPD.addRecursively(CPD.java:61)
at net.sourceforge.pmd.cpd.CPDCommandLineInterface.addSourcesFilesToCPD(CPDCommandLineInterface.java:103)
at net.sourceforge.pmd.cpd.CPDCommandLineInterface.main(CPDCommandLineInterface.java:80)
at net.sourceforge.pmd.cpd.CPD.main(CPD.java:180)
Skipping C:\MySource.cpp due to parse error

Thanks for the info. Would you mind sending me your file "MySource.cpp", which shows this problem? Otherwise I won't be able to reproduce it.
Thanks!
Andreas

I can reproduce the problem if I create an invalid C file...

E.g. the following C file reproduces the error:

�int main()
    return 0;

However, this file cannot be compiled:

gcc /tmp/c-with-replacementcharacter.c -o test
/tmp/c-with-replacementcharacter.c:1:1: error: stray ‘\357’ in program
 �int main()
/tmp/c-with-replacementcharacter.c:1:1: error: stray ‘\277’ in program
/tmp/c-with-replacementcharacter.c:1:1: error: stray ‘\275’ in program

Does your file MySource.cpp actually compile?

I'm hesitant to add a workaround that skips the first Unicode character if it is U+FFFD, since I can't compile such files myself...
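
For reference, the three stray bytes gcc reports are the UTF-8 encoding of U+FFFD written as octal escapes. A quick Python check, given purely as an illustration:

```python
replacement = "\ufffd"

# U+FFFD takes three bytes in UTF-8.
utf8_bytes = replacement.encode("utf-8")

# Format each byte the way gcc prints stray bytes: a backslash plus octal digits.
octal_escapes = ["\\%o" % byte for byte in utf8_bytes]

print(utf8_bytes.hex())
print(" ".join(octal_escapes))
```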

Alright, yes: the problem is that the file is UTF-16 encoded and CPD is not aware of this. Just call CPD with the --encoding argument to set the source encoding (which defaults to whatever your platform default is).

E.g. a complete command line for CPD:

run.sh cpd --minimum-tokens 100 --files utf16 --language cpp --format text --encoding utf16

Then CPD swallows the file without complaints :)
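
If it is unclear which encoding a file uses, looking at its first bytes for a BOM can help pick a value to try for --encoding. The following Python sketch is only a rough aid under stated assumptions: `guess_encoding_from_bom` is a hypothetical helper, not part of PMD, and the names it returns are Python codec names, so whether CPD accepts the same spelling is an assumption to verify.

```python
import codecs

# Checked in order: the UTF-32 LE BOM starts with the same two bytes as the
# UTF-16 LE BOM, so the UTF-32 entries must come first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(data: bytes, default: str = "utf-8") -> str:
    """Return an encoding name based on a leading BOM, or `default` if none is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


# Example: the first bytes of a UTF-16 (little-endian) source file with a BOM.
sample = "\ufeffint main()".encode("utf-16-le")
print(guess_encoding_from_bom(sample))
```

A file without a BOM gives no such hint, so the default returned here is only a guess.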