
StreamReadConstraints: add maxStringLen #864

Merged
cowtowncoder merged 24 commits into FasterXML:2.15 on Feb 14, 2023

Conversation

@pjfanning (Member) commented Dec 15, 2022

Relates to #863

  • Open for discussion
  • The TextBuffer append methods have been updated to police the string length. This will lead to more timely exceptions than waiting until the contentsAsString method is called (a minimal sketch of the idea follows this list).
  • A new class, ReadConstrainedTextBuffer, has been created. The legacy TextBuffer can still be used in cases where you don't want to apply the read constraints.
  • Needs tests
  • Needs uptake in jackson-databind and jackson-dataformat* libs
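
To make the append-time policing concrete, here is a minimal, self-contained sketch of the idea (hypothetical class, not the PR's actual ReadConstrainedTextBuffer; plain IOException stands in for whatever exception type the PR settles on):

```java
import java.io.IOException;

// Hypothetical sketch: police accumulated length on every append so oversized
// input fails fast, instead of only when the final String is assembled.
final class LengthPolicedBuffer {
    private final StringBuilder buf = new StringBuilder();
    private final int maxStringLength; // e.g. from StreamReadConstraints

    LengthPolicedBuffer(int maxStringLength) {
        this.maxStringLength = maxStringLength;
    }

    void append(char[] chars, int offset, int len) throws IOException {
        if (buf.length() + len > maxStringLength) { // check before buffering more
            throw new IOException("String length exceeds maximum allowed (" + maxStringLength + ")");
        }
        buf.append(chars, offset, len);
    }

    String contentsAsString() throws IOException {
        if (buf.length() > maxStringLength) { // exact backstop check on assembly
            throw new IOException("String length exceeds maximum allowed (" + maxStringLength + ")");
        }
        return buf.toString();
    }
}
```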

@pjfanning pjfanning marked this pull request as draft December 15, 2022 14:09
@pjfanning (Member Author)

@cowtowncoder I think it would be tidier to add a constructor to TextBuffer where a streamReadConstraints can be supplied. Then all the methods that currently take a streamReadConstraints parameter would be changed to no longer take it. Otherwise we could end up with virtually every method on TextBuffer taking streamReadConstraints as input.

For 'write' use cases like SegmentedStringWriter, we would support creating TextBuffer with no streamReadConstraints, and no limits would be applied by TextBuffer.
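
A rough sketch of the two construction paths being proposed (the two-argument constructor is hypothetical here; the merged API may differ):

```java
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.core.util.BufferRecycler;
import com.fasterxml.jackson.core.util.TextBuffer;

final class TextBufferFactories {
    // Read path: constraints supplied once at construction, enforced internally
    static TextBuffer forReading(BufferRecycler br, StreamReadConstraints constraints) {
        return new TextBuffer(br, constraints); // hypothetical constructor shape
    }

    // Write path (e.g. SegmentedStringWriter): no constraints, so no limits applied
    static TextBuffer forWriting(BufferRecycler br) {
        return new TextBuffer(br);
    }
}
```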

@cowtowncoder (Member)

> @cowtowncoder I think it would be tidier to add a constructor to TextBuffer where a streamReadConstraints can be supplied. Then all the methods that currently take a streamReadConstraints parameter would be changed to no longer take it. Otherwise we could end up with virtually every method on TextBuffer taking streamReadConstraints as input.
>
> For 'write' use cases like SegmentedStringWriter, we would support creating TextBuffer with no streamReadConstraints, and no limits would be applied by TextBuffer.

I really do not like the idea of TextBuffer being tied to StreamReadConstraints tho...

@cowtowncoder (Member)

Ok so yeah, thank you for starting with this! I think the place to check for exceeding limit(s) is when the caller asks for a new segment to use (usually as a side effect of "finishing" the current segment).

But the question of whether to construct TextBuffer with constraints, or always pass them in, is... I can see why this is challenging.

It may be necessary to see what to do with SegmentedStringWriter, which (as you noticed) should not be affected by limits.

@pjfanning (Member Author)

@cowtowncoder I had a look at the jackson-dataformat-csv code. It has its own TextBuffer class. It was hard to debug because it has no 'append' methods; the appending is done by leaking the internal segment state into CsvDecoder. I haven't investigated whether there is equivalent code in jackson-core, but if there is, then checking in the append methods is not enough. I suspect I don't know enough about how the TextBuffer is used.

@cowtowncoder (Member)

@pjfanning I don't remember the details of CsvTextBuffer, but I am sure we can figure that out if we can handle the standard TextBuffer. I'll see if I can find a bit more to remind myself why there is the discrepancy; I thought it had to do with buffer recycling (probably not a great reason to have a custom one).

@pjfanning (Member Author)

@cowtowncoder I did some testing today with the latest code here. The tests are in jackson-databind and exercise very large strings. I found that I had to check the length in a lot of places, because of the different paths you can hit going through the TextBuffer code.

Of all the checks, the append ones seem like they are not useful. The checks in contentsAsString seem to be needed for the edge case where segmentLen + currentLen is just a little over the limit.

@cowtowncoder (Member)

> @cowtowncoder I did some testing today with the latest code here. The tests are in jackson-databind and exercise very large strings. I found that I had to check the length in a lot of places, because of the different paths you can hit going through the TextBuffer code.
>
> Of all the checks, the append ones seem like they are not useful. The checks in contentsAsString seem to be needed for the edge case where segmentLen + currentLen is just a little over the limit.

As I've mentioned, contentsAsString() is not something I want to use. But I can accept that the various append()s are probably not worth worrying about.

@pjfanning (Member Author)

The new contentsAsString() checks could be removed; the impact is that strings a bit bigger than the supposed max can get through. It may be possible to track down somewhere else where this can be remedied.

The impact of the PR change in contentsAsString is one small extra if check (is the length greater than the limit). Is this something to worry about?

@cowtowncoder (Member)

I am not sure I am supportive of this approach: I will actually propose a somewhat different way to go about numeric checks too, trying instead to reduce the work TextBuffer has to do, not increase it.

Looking at ParserBase.resetInt() and .resetFloat(), I think I'd favor making checks from those points (for the most part) and reducing validation done in TextBuffer where possible.
I will also see if methods like TextBuffer.contentsAsDecimal() are actually needed at all.

@pjfanning (Member Author)

@cowtowncoder is this PR/issue worth pursuing for v2.15.0?

@cowtowncoder (Member)

@pjfanning I haven't had time to think more about this, but I would like to keep this open still; maybe this is the way to go. I think my main concern has always been that it'd be better to tackle length constraints before final aggregation -- probably in the spot where the next buffer is being requested (there are a couple of methods).
Do you think that'd be possible? We could also limit the new buffer size there to the maximum allowed wrt the limits, so the check would only fail when requesting the next buffer, the last one being full (see the sketch below).
(LMK if my explanation doesn't make sense, I can try to elaborate)
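
To illustrate the idea above (a hypothetical method, not jackson-core source): cap each newly requested segment by the remaining length budget, so the check only has to run at segment-allocation time, and only fails once a full buffer asks for more.

```java
import java.io.IOException;

final class SegmentAllocSketch {
    // Allocate the next segment, capped so total capacity never exceeds the limit.
    static char[] nextSegment(int totalSoFar, int preferredLen, int maxStringLength)
            throws IOException {
        if (totalSoFar >= maxStringLength) { // existing segments already hold the max
            throw new IOException("String length exceeds maximum allowed (" + maxStringLength + ")");
        }
        return new char[Math.min(preferredLen, maxStringLength - totalSoFar)];
    }
}
```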

I think the split into a length-constrained buffer and a regular one actually makes sense; the only (?) concern is whether TextBuffer recycling could be problematic. I don't think so -- BufferRecycler only handles the underlying char[], not TextBuffer?

@pjfanning (Member Author) commented Feb 3, 2023

@cowtowncoder I'm not sure whether your biggest objection is to having the checks in TextBuffer at all, or specifically in TextBuffer.contentsAsString.

In the latest iteration of this PR, I have removed the check in TextBuffer.contentsAsString. This leads to a failure in FasterXML/jackson-databind#3704 (a jackson-databind PR that relies on this one: build this branch with ./mvnw install, then run jackson-databind with the extra tests in that PR). The reason is that checking only at the buffer expansion point means we miss cases where the total size is a little bigger than the limit. Checking in contentsAsString means we keep the benefit of early failures at the buffer expansion point while also applying the limit exactly in contentsAsString. The test in PR 3704 that fails is TestBigStrings.testBigString, where the string is 1,001,000 chars and the limit is 1,000,000 chars; the test fails because we do not reject the slightly-too-large string.

I'm happy enough if we decide that the max limit is an approximate limit that errs a little (proportionally) on the side of allowing strings sized a little over the limit.

@cowtowncoder (Member)

@pjfanning First of all, apologies once again for being very slow to follow up here.

Having said that, to me an approximate check is perfectly reasonable, so the limit is sort of a minimum -- it will not fail below that level. Also, relative accuracy matters more for bigger Strings; there's no need to worry about super low limits (IMO) like keeping things to 1k or something. Those should, I think, be handled by something else. At this level we worry about service-protection kinds of things.

My only concern about relying on contentsAsString() is if it's the only check and we could, f.ex., read 400k even if the limit was 10k or so. I do not mind also having the check there if that makes sense.

I hope to get back to checking this out soon, sometime this week.

And I will try to keep thinking about the other dimension: nesting. It's even more performance sensitive, as the best place to add limits would be within StreamReadContext / StreamWriteContext. But changes to that interface get... tricky.

@cowtowncoder (Member)

Ok, this is looking pretty good, and I think I am starting to feel better about having a length-limited TextBuffer sub-class.
So I think this might be the way to go!

Couple of questions, thoughts:

  1. CSV module has CsvTextBuffer; I wonder if we could somehow make the CSV module use the standard one(s) instead -- I don't quite remember why I did the "fork" here :-/
  2. Although 1M characters sounds reasonable, I have a nagging feeling someone somewhere might still use slightly bigger sizes. So I wonder if there'd be an alternative bigger limit that makes sense. I think 10 megs is quite common for maximum uploadable file/doc size, fwiw.

@pjfanning (Member Author)

I would ignore CSV for the moment. jackson-databind and the other dataformat modules would need to be updated to add tests; we could circle back to the CSV module later. As part of adding a test, we can see whether it is easier to recreate the new feature in CsvTextBuffer or to change the CSV module to use TextBuffer. I tried the latter before and it didn't look straightforward. At this stage, CsvTextBuffer is something of its own beast.

I can look at changing the 1m default to a 10m default.
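
For reference, the limit discussed here is user-configurable through the builder APIs, roughly along these lines (a sketch against the 2.15-style API):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

public class ConstraintsConfig {
    // Raise the per-string limit from the default to 10M chars for one factory
    static JsonFactory bigStringFactory() {
        return JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder()
                        .maxStringLength(10_000_000)
                        .build())
                .build();
    }
}
```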

```diff
@@ -128,6 +128,34 @@ public void testLongUnicodeStrings() throws IOException
         _testStrings(f, input, data, 1, 1);
     }

+    public void testLongAsciiStringsSmallLimit() throws IOException
```
@pjfanning (Member Author)

@cowtowncoder the test I added passes when it is run by itself but fails when run with other tests. It seems that running other tests with the default StreamReadConstraints leaves behind state -- maybe in the BufferRecyclers -- that means the validateStringLength checks don't get called.

@cowtowncoder (Member)

Yes, due to buffer recycling it is possible that a recycled buffer has already reached a size of up to 64k chars.

So the test should probably be made much less strict: set the maximum length to, say, 100k, produce a 200k String, and verify that gets caught (sketched after this comment).

Another thing we could consider: making MAX_SEGMENT_LEN no higher than maxStringLength.
Then again... maybe not worth bothering yet.
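
A sketch of the less-strict test shape suggested above (hypothetical test class, assuming the 2.15-style builder API and StreamConstraintsException):

```java
import java.util.Arrays;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.core.exc.StreamConstraintsException;

public class BigStringLimitSketch {
    // Limit (100k) is well above the ~64k recycled-buffer size, so buffer
    // recycling cannot mask the check; the 200k value must get caught.
    public void testStringOverLimit() throws Exception {
        JsonFactory f = JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder()
                        .maxStringLength(100_000)
                        .build())
                .build();
        char[] chars = new char[200_000];
        Arrays.fill(chars, 'a');
        String doc = "[\"" + new String(chars) + "\"]";
        try (JsonParser p = f.createParser(doc)) {
            p.nextToken(); // START_ARRAY
            p.nextToken(); // VALUE_STRING
            p.getText();   // should trip the length check
            throw new AssertionError("Expected string-length constraint failure");
        } catch (StreamConstraintsException e) {
            // expected: 200k chars exceeds the 100k limit
        }
    }
}
```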

@cowtowncoder (Member)

> I can look at changing the 1m default to a 10m default.

Just realized that I didn't mean to suggest 10 megs as the default necessarily, so much as a data point: if the max document is 10 megs, then perhaps a max value length of 1 meg is reasonable.

@cowtowncoder (Member)

@pjfanning Ok, I am almost ready to merge.

And now I think that I actually agree with you on also checking the length in contentsAsString(): we may get the check to fail earlier on buffer allocation, but if not, this is where we get the exact check that users expect.

Doing this would make the tests pass, I think, since buffer size wouldn't affect the eventual check accuracy; we would temporarily hold longer content but catch it when the String is actually accessed.

@pjfanning (Member Author)

> @pjfanning Ok, I am almost ready to merge.
>
> And now I think that I actually agree with you on also checking the length in contentsAsString(): we may get the check to fail earlier on buffer allocation, but if not, this is where we get the exact check that users expect.
>
> Doing this would make the tests pass, I think, since buffer size wouldn't affect the eventual check accuracy; we would temporarily hold longer content but catch it when the String is actually accessed.

I added back some checks to contentsAsString and contentsAsArray. I think the checks are strategically placed so that we don't validate too often, but it would be worth going through them again to make sure none are superfluous.

@cowtowncoder (Member) left a comment

I think this looks good -- we could possibly avoid some checks (resetWithString()), but nothing that seems like it'd have a measurable performance impact.

If you change this from draft to a regular PR, I think you can merge it.

There's probably a need to start discussing the actual limits to go with, on the mailing lists, but let's start with 1 megachars/bytes and see where we end up.

@pjfanning pjfanning marked this pull request as ready for review February 14, 2023 00:17
@pjfanning (Member Author)

In my testing, the resetWithString check was needed; it shouldn't add much overhead.

I've marked this as ready for review.

@pjfanning pjfanning changed the title WIP: add maxStringLen StreamReadConstraints: add maxStringLen Feb 14, 2023
@cowtowncoder cowtowncoder added this to the 2.15.0 milestone Feb 14, 2023
@cowtowncoder cowtowncoder merged commit b14ec4a into FasterXML:2.15 Feb 14, 2023
@cowtowncoder (Member)

Ok, phew! It finally went in. Thank you again @pjfanning!

Also, I decided I should add you as co-author for jackson-core due to your massive number of contributions; I modified release-notes/CREDITS-2.x to state that.

@pjfanning pjfanning deleted the max-string-size branch February 14, 2023 10:23