I tried cpdgui.bat from pmd-bin-5.1.2.zip on Windows 7 to parse C++ code
and got many TokenMgrError exceptions.
My console output looks like this:
net.sourceforge.pmd.lang.ast.TokenMgrError: Lexical error in file C:\work\common\app\wrapper\a.cpp at line 1, column 1. Encountered: "\ufffd" (65533), after : ""
at net.sourceforge.pmd.lang.cpp.ast.CppParserTokenManager.getNextToken(CppParserTokenManager.java:1648)
at net.sourceforge.pmd.lang.cpp.CppTokenManager.getNextToken(CppTokenManager.java:23)
at net.sourceforge.pmd.cpd.CPPTokenizer.tokenize(CPPTokenizer.java:27)
at net.sourceforge.pmd.cpd.CPD.addAndThrowLexicalError(CPD.java:144)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:139)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:104)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:66)
at net.sourceforge.pmd.cpd.CPD.addDirectory(CPD.java:76)
at net.sourceforge.pmd.cpd.CPD.addRecursively(CPD.java:61)
at net.sourceforge.pmd.cpd.GUI.go(GUI.java:605)
at net.sourceforge.pmd.cpd.GUI.access$500(GUI.java:70)
at net.sourceforge.pmd.cpd.GUI$GoListener$1.run(GUI.java:189)
at java.lang.Thread.run(Thread.java:724)
Skipping C:\work\common\app\wrapper\a.cpp due to parse error
Best Regards,
Discussion
U+FFFD is the "replacement character", which is used when a character couldn't be mapped.
It seems your cpp file is in a different encoding than the one CPD uses to read the file.
Try setting the file encoding in the input box. Maybe you need to use "windows-1252" or "ISO-8859-15" instead of "UTF-8".
Let me know if this fixes the problem. If not, please attach the problematic file to this issue. Thanks.
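To illustrate the encoding mismatch described above, here is a minimal sketch. It assumes a hypothetical file that is UTF-16 (little-endian, with a BOM) but is read as UTF-8, and it uses Python's decoder as a stand-in for whatever reader CPD uses internally, which is an assumption rather than a description of PMD's code.

```python
import codecs

# Hypothetical file content: UTF-16 little-endian with a byte order mark.
raw = codecs.BOM_UTF16_LE + "int main()".encode("utf-16-le")

# Decoded with the wrong charset: the BOM bytes 0xFF and 0xFE are never
# valid in UTF-8, so each one is replaced by U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoded with the right charset: the BOM is consumed and the text survives.
as_utf16 = raw.decode("utf-16")

print(repr(as_utf8[0]), ord(as_utf8[0]))
print(repr(as_utf16))
```

The first character of the wrongly decoded text is the same "\ufffd" (65533) that the lexer reports at line 1, column 1.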
Hi Andreas,
Thank you for the response.
I tried the suggested encodings such as "windows-1252", "ISO-8859-15", "MS932" and "UTF-8", but none of them worked.
Actually, I use "UTF-8" in my source code.
I have attached a file that reproduces this problem.
Best Regards,
How do you know that his patch is not working? For me it works with the attached wrapper.cpp file.
While it's true that the byte sequence of the BOM is different for UTF-16 and UTF-8, the Unicode code point is still the same: U+FEFF.
If this is still an issue for you, please provide an example file.
Thanks,
Andreas
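The point about the BOM can be checked with a short sketch. It only uses the BOM constants from Python's standard `codecs` module; nothing here is taken from PMD itself.

```python
import codecs

# The BOM byte sequences differ per encoding...
boms = {
    "utf-8": codecs.BOM_UTF8,         # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE, # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE, # FE FF
}

# ...but each one decodes to the same single code point, U+FEFF.
decoded = {name: bom.decode(name) for name, bom in boms.items()}

for name, bom in boms.items():
    print(name, bom.hex(), f"U+{ord(decoded[name]):04X}")
```

So a check for U+FEFF on the decoded text covers all three cases, provided the file was decoded with the correct charset in the first place.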
Hi Andreas. Sorry for the late reply. I have attached an example file. PMD 5.2.2 can't parse this file.
Error:
net.sourceforge.pmd.lang.ast.TokenMgrError: Lexical error in file /home/Lukasz/Dokumenty/cpd_issue/example.cpp at line 5, column 23. Encountered: "\u0105" (261), after : "\""
at net.sourceforge.pmd.lang.cpp.ast.CppParserTokenManager.getNextToken(CppParserTokenManager.java:1650)
at net.sourceforge.pmd.lang.cpp.CppTokenManager.getNextToken(CppTokenManager.java:27)
at net.sourceforge.pmd.cpd.CPPTokenizer.tokenize(CPPTokenizer.java:63)
at net.sourceforge.pmd.cpd.CPD.addAndThrowLexicalError(CPD.java:144)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:139)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:104)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:66)
at net.sourceforge.pmd.cpd.CPD.addDirectory(CPD.java:76)
at net.sourceforge.pmd.cpd.CPD.addRecursively(CPD.java:61)
at net.sourceforge.pmd.cpd.GUI.go(GUI.java:609)
at net.sourceforge.pmd.cpd.GUI.access$500(GUI.java:70)
at net.sourceforge.pmd.cpd.GUI$GoListener$1.run(GUI.java:193)
at java.lang.Thread.run(Thread.java:744)
Best regards,
Lukasz
This issue is not fixed in versions 5.3.3 and 5.3.4. I still get the following errors:
net.sourceforge.pmd.lang.ast.TokenMgrError: Lexical error in file C:\MySource.cpp at line 1, column 1. Encountered: "\ufffd" (65533), after : ""
at net.sourceforge.pmd.lang.cpp.ast.CppParserTokenManager.getNextToken(CppParserTokenManager.java:1623)
at net.sourceforge.pmd.lang.cpp.CppTokenManager.getNextToken(CppTokenManager.java:27)
at net.sourceforge.pmd.cpd.CPPTokenizer.tokenize(CPPTokenizer.java:63)
at net.sourceforge.pmd.cpd.CPD.addAndThrowLexicalError(CPD.java:144)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:139)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:104)
at net.sourceforge.pmd.cpd.CPD.add(CPD.java:66)
at net.sourceforge.pmd.cpd.CPD.addDirectory(CPD.java:76)
at net.sourceforge.pmd.cpd.CPD.addRecursively(CPD.java:61)
at net.sourceforge.pmd.cpd.CPDCommandLineInterface.addSourcesFilesToCPD(CPDCommandLineInterface.java:103)
at net.sourceforge.pmd.cpd.CPDCommandLineInterface.main(CPDCommandLineInterface.java:80)
at net.sourceforge.pmd.cpd.CPD.main(CPD.java:180)
Skipping C:\MySource.cpp due to parse error
Thanks for the info. Would you mind posting your file "MySource.cpp", which shows this problem? Otherwise I won't be able to reproduce it.
Thanks!
Andreas
I can reproduce the problem if I create an invalid C file.
For example, the following C file reproduces the error:
�int main()
return 0;
However, this file cannot be compiled:
gcc /tmp/c-with-replacementcharacter.c -o test
/tmp/c-with-replacementcharacter.c:1:1: error: stray ‘\357’ in program
�int main()
/tmp/c-with-replacementcharacter.c:1:1: error: stray ‘\277’ in program
/tmp/c-with-replacementcharacter.c:1:1: error: stray ‘\275’ in program
Does your file MySource.cpp actually compile?
I'm hesitant to add a workaround that skips the first Unicode character if it is U+FFFD, since I can't compile such files myself...
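As a side note on the gcc output above, the three "stray" values are the octal forms of the bytes that make up U+FFFD in UTF-8. A small sketch to verify this:

```python
# U+FFFD encoded as UTF-8 is the three bytes EF BF BD.
encoded = "\ufffd".encode("utf-8")

# gcc reports stray bytes in octal, so convert each byte to octal.
octal = [format(byte, "o") for byte in encoded]

print(encoded.hex(), octal)
```

This suggests the test file literally contains an already-substituted replacement character, rather than the original bytes of a BOM.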
Alright, yes: the problem is that the file is UTF-16 encoded and CPD is not aware of this. Just call CPD with the --encoding argument to set the source encoding (which defaults to whatever your platform default is).
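If it is unclear which encoding a file uses, a BOM check can give a hint before choosing a value for --encoding. The following is only a sketch: `guess_encoding_from_bom` is a hypothetical helper, the returned names are illustrative labels (whether CPD accepts exactly these names is an assumption to verify), and files without a BOM cannot be identified this way.

```python
import codecs
from typing import Optional

# UTF-32 must be checked before UTF-16, because the UTF-32 LE BOM
# (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(data: bytes) -> Optional[str]:
    """Return an encoding label if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Hypothetical sample: the first bytes of a UTF-16 LE source file.
sample = codecs.BOM_UTF16_LE + "int main()".encode("utf-16-le")
print(guess_encoding_from_bom(sample))
```

In practice one would read the first few bytes of the file in binary mode, pass them to the helper, and use the result as the value for --encoding.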