CookCC
0.4.3
Input Encoding Detection
Detection Methods
Encoding detection is mostly guesswork.
A BOM (byte order mark), when present, is the most reliable indicator of an input stream's encoding.
XML has an encoding declaration.
HTML has an encoding sniffing algorithm.
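The first two methods can be layered: a BOM, when present, takes precedence, and the XML declaration is consulted only when there is no BOM. The sketch below assumes Java 8+ and uses a hypothetical `EncodingSniffer` class that is not part of CookCC. It recognizes only UTF-8/16/32 BOMs and only XML declarations in ASCII-compatible input, and it does not attempt the HTML sniffing algorithm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of CookCC): BOM first, then XML declaration, else a default. */
public final class EncodingSniffer {

    // encoding="..." or encoding='...' inside a <?xml ... ?> declaration at the very start.
    private static final Pattern XML_DECL = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    private EncodingSniffer() {}

    /** Returns the charset implied by a leading byte order mark, if any. */
    public static Optional<Charset> fromBom(byte[] b) {
        // UTF-32 first: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return Optional.of(Charset.forName("UTF-32BE"));
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return Optional.of(Charset.forName("UTF-32LE"));
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return Optional.of(StandardCharsets.UTF_8);
        if (startsWith(b, 0xFE, 0xFF)) return Optional.of(StandardCharsets.UTF_16BE);
        if (startsWith(b, 0xFF, 0xFE)) return Optional.of(StandardCharsets.UTF_16LE);
        return Optional.empty();
    }

    /** Reads the encoding pseudo-attribute of an XML declaration (ASCII-compatible input only). */
    public static Optional<Charset> fromXmlDeclaration(byte[] b) {
        // ISO-8859-1 maps every byte to exactly one char, so an ASCII prolog survives intact.
        String head = new String(b, 0, Math.min(b.length, 256), StandardCharsets.ISO_8859_1);
        Matcher m = XML_DECL.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Optional.of(Charset.forName(m.group(1)));
        }
        return Optional.empty();
    }

    /** BOM wins; otherwise the XML declaration; otherwise the caller's fallback. */
    public static Charset detect(byte[] b, Charset fallback) {
        return fromBom(b).orElseGet(() -> fromXmlDeclaration(b).orElse(fallback));
    }

    // Compares the first bytes of b (as unsigned values) against the given prefix.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] plain = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(withBom, StandardCharsets.US_ASCII)); // UTF-8
        System.out.println(detect(xml, StandardCharsets.UTF_8));        // ISO-8859-1
        System.out.println(detect(plain, StandardCharsets.UTF_8));      // UTF-8 (fallback)
    }
}
```

A detected BOM still has to be skipped before decoding: Java's UTF-8 decoder, for example, passes a leading BOM through as U+FEFF instead of stripping it.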
Java Libraries
juniversalchardet
jChardet
cpdetector
ICU4J
Apache Tika (uses a combination of the above libraries)
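When adding one of these dependencies is not an option, a much cruder JDK-only heuristic is to check whether the input decodes cleanly as strict UTF-8 and to fall back to a legacy charset otherwise. This is not how the libraries above work (they rely largely on statistical analysis of byte patterns across many candidate encodings); it is only a minimal stand-in, and `Utf8OrLegacy` is a hypothetical name used for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only fallback: UTF-8 if the bytes decode strictly, else a legacy charset. */
public final class Utf8OrLegacy {

    private Utf8OrLegacy() {}

    /** True if the whole input is well-formed UTF-8 (truncated trailing sequences fail too). */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes)); // throws on the first malformed sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Guess: UTF-8 when valid, otherwise the caller's legacy charset (e.g. ISO-8859-1). */
    public static Charset guess(byte[] bytes, Charset legacy) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : legacy;
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);        // 63 61 66 C3 A9
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // 63 61 66 E9
        System.out.println(guess(utf8, StandardCharsets.ISO_8859_1));   // UTF-8
        System.out.println(guess(latin1, StandardCharsets.ISO_8859_1)); // ISO-8859-1
    }
}
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of UTF-8. Short legacy-encoded text can occasionally happen to be valid UTF-8 as well, so the result remains a guess.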