Inconsistent quoting of header columns #26
Comments
The code uses simple heuristics to determine whether quoting might be needed, and conservatively quotes Strings it thinks may require quoting. I am open to improvement ideas if this is a problem; for example, forcing quoting for all header names. But the streaming nature of the API makes it difficult to add limitations like "all-or-nothing". |
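As a rough illustration of the "all-or-nothing" idea for header names only, here is a minimal standalone sketch. It assumes the full list of header names is available before the header line is written, which sidesteps the streaming concern for headers but not for data rows. `HeaderQuotePolicy` and its methods are hypothetical names, not part of Jackson, and the separator and quote character are hard-coded to `,` and `"` for brevity.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not a Jackson class: quotes either every header name or none.
final class HeaderQuotePolicy {
    // True if the name contains the separator, the quote character, or a line break.
    static boolean needsQuotes(String name) {
        return name.indexOf(',') >= 0 || name.indexOf('"') >= 0
            || name.indexOf('\n') >= 0 || name.indexOf('\r') >= 0;
    }

    // Wraps the name in quotes, doubling any embedded quote characters.
    static String quote(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // If any name needs quoting, quote all of them; otherwise quote none.
    static String headerLine(List<String> names) {
        boolean quoteAll = names.stream().anyMatch(HeaderQuotePolicy::needsQuotes);
        return names.stream()
            .map(n -> quoteAll ? quote(n) : n)
            .collect(Collectors.joining(","));
    }
}
```

With this policy a header is never a mix of quoted and unquoted names, at the cost of one extra pass over the names.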
I'm noticing this as well, but it doesn't seem specific to headers. I see some headers and some values whose content would not require quotes, yet they are quoted anyway. The resulting output is unnecessarily ugly. I'll look around and see if there is anything I can help with. |
Nothing to escape here but...
protected final boolean _mayNeedQuotes(String value, int length)
{
    // 21-Mar-2014, tatu: If quoting disabled, don't quote
    if (_cfgQuoteCharacter < 0) {
        return false;
    }
    // let's not bother checking long Strings, just quote already:
    if (length > MAX_QUOTE_CHECK) {
        return true;
    }
    for (int i = 0; i < length; ++i) {
        if (value.charAt(i) < _cfgMinSafeChar) {
            return true;
        }
    }
    return false;
}
You can take a look at beanIO library for parsing files. It's slower than jackson but with valid escaping and quotes. |
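To see why the `_mayNeedQuotes` check quoted above can quote values that contain nothing to escape, here is a minimal standalone sketch of the same heuristic. Two things are assumptions made for illustration and are not confirmed by the snippet: that the threshold is one above the largest of the separator, quote, and line-break characters, and that the length cutoff is 24. `ConservativeQuoteCheck` is a hypothetical name, not the library's class.

```java
// Standalone sketch of a threshold-based heuristic; not the library's actual code.
final class ConservativeQuoteCheck {
    static final char SEPARATOR = ',';
    static final char QUOTE = '"';
    static final int MAX_QUOTE_CHECK = 24; // assumed cutoff for "long" Strings

    // Assumed threshold: every character below this value triggers quoting.
    // With ',' (44) as the largest special character, this evaluates to 45.
    static final int MIN_SAFE_CHAR =
        Math.max(Math.max(SEPARATOR, QUOTE), Math.max('\n', '\r')) + 1;

    static boolean mayNeedQuotes(String value) {
        int length = value.length();
        // Long values are quoted without inspection.
        if (length > MAX_QUOTE_CHECK) {
            return true;
        }
        // Any character below the threshold counts as "may need quotes",
        // even if it is harmless (space, '+', '$', ...).
        for (int i = 0; i < length; ++i) {
            if (value.charAt(i) < MIN_SAFE_CHAR) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String header : new String[] {"id", "firstName", "first name", "e-mail", "a+b"}) {
            System.out.println(header + " -> " + mayNeedQuotes(header));
        }
    }
}
```

Under these assumptions, `first name` and `a+b` are flagged because space and `+` sort below the comma, while `id`, `firstName`, and `e-mail` are not. That would produce exactly the kind of mixed header line reported in this issue.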
I can easily remove final from there (as a design principle, it's one of two choices, white vs blacklisting). But the bigger question is this: should there be different logic for handling header fields? As I have commented on multiple entries, the choice of whether to quote or not can be made in two ways:
Jackson does (2). I am perfectly fine with changing this, but want to do it in a way that balances overhead against benefits. To me, for example, a mix of quoted and unquoted works just fine; for others, less so. What I do not like is getting complaints about how things suck without suggestions of what could be done or proposals for improved handling. On that note, how about:
How does this sound? |
Thank you for your great work! I think handling of quotes must be the same everywhere (both headers and data). I've pasted a sample of text, and the quoting is inconsistent. Sorry, I have never worked with GitHub. To my mind
is better and the final decision is up to you. Here is the implementation for escaping from beanIO library.
/**
 * Returns <tt>true</tt> if the given field must be quoted.
 * @param cs the field to test
 * @return <tt>true</tt> if the given field must be quoted
 */
private boolean mustQuote(char [] cs) {
    for (char c : cs) {
        if (c == delim)
            return true;
        if (c == quote)
            return true;
        if (c == '\n')
            return true;
        if (c == '\r')
            return true;
    }
    return false;
} |
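For comparison, a strict check in the style of the `mustQuote` method above can be tried out with a small standalone sketch. `StrictQuoteCheck` and `encode` are hypothetical names. Doubling embedded quote characters inside a quoted field follows the common CSV convention and is an assumption here, not something taken from beanIO or Jackson.

```java
// Standalone sketch: quote a field only when it contains a character that requires it.
final class StrictQuoteCheck {
    // Same test as mustQuote above, with delimiter and quote passed in explicitly.
    static boolean mustQuote(String field, char delim, char quote) {
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == delim || c == quote || c == '\n' || c == '\r') {
                return true;
            }
        }
        return false;
    }

    // Returns the field unchanged if no quoting is needed; otherwise wraps it
    // in quotes and doubles any embedded quote characters (assumed convention).
    static String encode(String field, char delim, char quote) {
        if (!mustQuote(field, delim, quote)) {
            return field;
        }
        StringBuilder sb = new StringBuilder(field.length() + 2).append(quote);
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == quote) {
                sb.append(quote);
            }
            sb.append(c);
        }
        return sb.append(quote).toString();
    }

    public static void main(String[] args) {
        String[] header = {"id", "first name", "a,b", "say \"hi\""};
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < header.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(encode(header[i], ',', '"'));
        }
        System.out.println(line); // prints: id,first name,"a,b","say ""hi"""
    }
}
```

The trade-off is that every character of every field is inspected, whereas the threshold heuristic does a single comparison per character and skips long values entirely.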
Added |
I instructed the writer to write the schema as a header line. Why are two columns quoted and the other three not? This is with 2.1.1.