StreamReadConstraints: add maxStringLen #864
Conversation
@cowtowncoder I think it would be tidier to add a constructor to TextBuffer where a StreamReadConstraints can be supplied. The methods that currently take a StreamReadConstraints parameter would then be changed to no longer take one; otherwise we could end up with virtually every method on TextBuffer taking a StreamReadConstraints as input. For 'write' use cases like SegmentedStringWriter, we would support creating a TextBuffer with no StreamReadConstraints, in which case no limits would be applied by TextBuffer.
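(For illustration, a minimal sketch of the constructor-injection idea being proposed here; the class name, field names, and signatures are hypothetical stand-ins, not the final jackson-core API.)

```java
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.core.util.BufferRecycler;

// Hypothetical shape of the proposal: constraints are supplied once at
// construction time, so individual methods no longer need a parameter.
public class ConstrainedTextBuffer {
    private final BufferRecycler _allocator;
    private final StreamReadConstraints _constraints; // null => no limits applied

    public ConstrainedTextBuffer(BufferRecycler allocator,
            StreamReadConstraints constraints) {
        _allocator = allocator;
        _constraints = constraints;
    }

    // "Write" use cases like SegmentedStringWriter would use this variant,
    // so the buffer enforces no limits.
    public ConstrainedTextBuffer(BufferRecycler allocator) {
        this(allocator, null);
    }
}
```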
I really do not like the idea of …
[Resolved review thread on src/main/java/com/fasterxml/jackson/core/io/SegmentedStringWriter.java]
Ok so yeah, thank you for starting with this! I think the place to check for exceeding limit(s) is when the caller asks for a new segment to use (usually as a side effect of "finishing" the current segment). But the question of whether to construct … It may be necessary to see what to do with …
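(A hedged sketch of the "check when a new segment is requested" idea; this toy class and its names are illustrative, not TextBuffer's actual internals.)

```java
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.core.exc.StreamConstraintsException;

// Toy buffer illustrating a length check at segment boundaries: the limit
// is enforced as a side effect of "finishing" the current segment.
class SegmentedCharBuffer {
    private final StreamReadConstraints constraints;
    private int segmentSize;                        // chars in finished segments
    private char[] currentSegment = new char[8000]; // unfinished last segment

    SegmentedCharBuffer(StreamReadConstraints constraints) {
        this.constraints = constraints;
    }

    // Called when currentSegment is full and the caller needs a new one.
    char[] finishCurrentSegment() throws StreamConstraintsException {
        int newTotal = segmentSize + currentSegment.length;
        constraints.validateStringLength(newTotal); // throws if over the limit
        segmentSize = newTotal;
        currentSegment = new char[currentSegment.length]; // real code would grow this
        return currentSegment;
    }
}
```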
@cowtowncoder I had a look at the jackson-dataformat-csv code. It has its own TextBuffer class, which was hard to debug because it has no 'append' methods; the appending is done by leaking the internal segment state into CsvDecoder. I haven't investigated whether there is equivalent code in jackson-core, but if there is, then checking in the append methods is not enough. I suspect that I don't know enough about how the TextBuffer is used.
@pjfanning I don't remember details of …
@cowtowncoder I did some testing today with the latest code here. The tests are in jackson-databind and test very large strings. I found that I had to check for the length in a lot of places, because of the different paths you can hit going through the TextBuffer code. Of all the checks, the ones in the append methods seem the least useful. The checks in contentsAsString seem to be needed for the edge case where segmentLen + currentLen is just a little over the limit.
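(Sketch of that edge case: even if every segment boundary is checked, the unfinished last segment can push the total just over the limit, so materializing the String needs its own check. This would be a method on the toy buffer sketched a couple of comments up; names remain illustrative.)

```java
// segmentSize alone may be under the limit while segmentSize + currentLen
// (chars in the unfinished last segment) is just over it -- this is the
// edge case a check in contentsAsString catches.
String contentsAsString(int currentLen) throws StreamConstraintsException {
    constraints.validateStringLength(segmentSize + currentLen);
    // ... concatenate finished segments plus the current one into a String ...
    return new String(currentSegment, 0, currentLen); // simplified
}
```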
As I've mentioned …
The new contentsAsString() checks could be removed; the impact is that strings that are a bit bigger than the supposed max can get through. It may be possible to track down somewhere else where this can be remedied. The cost of the PR change in contentsAsString is one small extra if check (is the length greater than the limit?). Is this something to worry about?
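(The check itself is cheap -- roughly this shape, a simplified version of what StreamReadConstraints.validateStringLength ends up doing; the exact message text and field name here are illustrative.)

```java
public void validateStringLength(int length) throws StreamConstraintsException {
    // single branch, only taken on over-limit input
    if (length > _maxStringLen) {
        throw new StreamConstraintsException(
                "String length (" + length + ") exceeds the maximum length ("
                + _maxStringLen + ")");
    }
}
```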
I am not sure I am supportive of this approach: … I will actually propose a somewhat different way to go about this for numeric checks too, trying instead to reduce work. Looking at …
@cowtowncoder is this PR/issue worth pursuing for v2.15.0?
@pjfanning I haven't had time to think more about this, but I would like to keep this open still; maybe this is the way to go. I think my main concern has always been that it would be better to tackle length constraints before final aggregation -- probably at the spot where the next buffer is being requested (there are a couple of such methods). I think the split into a length-constrained variant and a regular one actually makes sense; the only (?) concern is whether …
@cowtowncoder I'm not sure whether your biggest objection is to having the checks in TextBuffer at all, or specifically to having them in TextBuffer.contentsAsString. In the latest iteration of this PR, I have removed the check in TextBuffer.contentsAsString. This leads to a failure in FasterXML/jackson-databind#3704 (a PR in jackson-databind that relies on this PR, using …).

I'm happy enough if we decide that the max limit is an approximate limit that errs a little (proportionally) on the side of allowing strings sized a little over the limit.
@pjfanning First of all, apologies once again for being very slow to follow up here. Having said that: to me an approximate check is perfectly reasonable, so the limit is a sort of minimum -- parsing will not fail below that level. Relative accuracy also matters more for bigger Strings; there's no need (IMO) to worry about super-low limits like keeping things to 1k or something. Those should, I think, be handled by something else; at this level we worry about service-protection kinds of things. My only concern about relying on …

I hope to get back to checking this out soon, sometime this week. And I will try to keep thinking about the other dimension: that of nesting. It's even more performance-sensitive, as the best place to add limits would be within StreamReadContext / StreamWriteContext. But changes to that interface get... tricky.
[Resolved review thread on src/main/java/com/fasterxml/jackson/core/StreamReadConstraints.java]
Ok, this is looking pretty good, and I think I am starting to feel better about having a length-limited TextBuffer. A couple of questions, thoughts: …
I would ignore CSV for the moment. jackson-databind and other dataformat modules would need to be updated to add tests; we could circle back to the CSV module later. As part of adding a test, we can see whether it is easier to recreate the new feature in CsvTextBuffer or to change the CSV module to use TextBuffer. I did try the latter before and it didn't look straightforward; at this stage, CsvTextBuffer is something of its own beast. I can look at changing the 1m default to a 10m default.
@@ -128,6 +128,34 @@ public void testLongUnicodeStrings() throws IOException
         _testStrings(f, input, data, 1, 1);
     }

+    public void testLongAsciiStringsSmallLimit() throws IOException
@cowtowncoder the test I added passes when run by itself, but fails when run alongside other tests. It seems that running other tests with the default StreamReadConstraints leaves behind state -- maybe in the BufferRecyclers -- that means the validateStringLength checks don't get called.
Yes, due to buffer recycling it is possible that a recycled buffer has already reached a size of up to 64k chars. So the test should probably be made much less strict: set the maximum length to, say, 100k, produce a 200k String, and verify that it gets caught.

Another thing we could consider: making MAX_SEGMENT_LEN no higher than maxStringLength. Then again... maybe not worth bothering with yet.
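(A sketch of the relaxed test along those lines -- a 100k limit with a 200k String, so a recycled buffer of up to 64k chars cannot mask the check. The test name and assertion style here are illustrative; the StreamReadConstraints and JsonFactory builder calls are the 2.15 API shape.)

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.core.exc.StreamConstraintsException;

public void testVeryLongStringCaught() throws Exception {
    JsonFactory f = JsonFactory.builder()
            .streamReadConstraints(StreamReadConstraints.builder()
                    .maxStringLength(100_000).build())
            .build();
    // Build a 200k-char String value, well past the configured limit
    StringBuilder sb = new StringBuilder(200_010);
    for (int i = 0; i < 200_000; ++i) {
        sb.append('a');
    }
    String doc = "[\"" + sb + "\"]";
    try (JsonParser p = f.createParser(doc)) {
        p.nextToken(); // START_ARRAY
        p.nextToken(); // VALUE_STRING
        p.getText();   // check may fire here or already during nextToken()
        throw new AssertionError("Should have failed on over-limit String");
    } catch (StreamConstraintsException e) {
        // expected: length exceeds configured maxStringLength
    }
}
```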
Just realized that I didn't mean to suggest 10 meg as the default necessarily, so much as a data point: if the max document is 10 megs, then perhaps a max value length of 1 meg is reasonable.
@pjfanning Ok, I am almost ready to merge. And now I think that I actually agree with you on also checking length in contentsAsString(). Doing this would make the tests pass, I think, since buffer size wouldn't matter for the eventual check's accuracy; we would temporarily handle longer content but catch it when the String is actually accessed.
I added back some checks to contentsAsString and contentsAsArray. I think the checks are strategically placed so that we don't validate too often, but it would be worth going through them again to make sure that none are superfluous.
I think this looks good -- we could possibly avoid some checks (resetWithString()), but nothing that seems like it would have a measurable performance impact.
If you can change this from draft to a regular PR, I think you could merge it.
There's probably a need to start discussing actual limits to go with on the mailing lists, but let's start with 1 megachars/bytes and see where we end up.
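(For reference, this is roughly how an application would override whatever default is chosen -- e.g. raising the limit for a service that legitimately handles large values; the 20-million figure is just an example.)

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

// Configure a JsonFactory with a larger-than-default max String length
JsonFactory f = JsonFactory.builder()
        .streamReadConstraints(StreamReadConstraints.builder()
                .maxStringLength(20_000_000) // example override, not a recommendation
                .build())
        .build();
```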
In my testing, the resetWithString check was needed; it shouldn't add much overhead. I've marked this as ready for review.
Ok, phew! It finally went in. Thank you again @pjfanning! Also, I decided I should add you as co-author for …
Relates to #863