
2.15.0-rc1 and processing-limit default values #958

Closed
carterkozak opened this issue Mar 21, 2023 · 23 comments

@carterkozak
Contributor

In testing 2.15.0-rc1 I've found that several tests with large inputs begin to fail unexpectedly due to processing limits introduced in #827.

While I'm strongly in favor of granular control over limits, it's difficult to see a path by which I can safely adopt 2.15 without causing production failures. The initial rollout will be tremendously difficult to do effectively without a version range that provides the ability to set limits before defaults are enforced; otherwise, anything that upgrades transitive Jackson dependencies may unexpectedly produce runtime failures. This is compounded by the ~1 MB limit being large enough not to hinder most test inputs, yet small enough to be exceeded in certain production scenarios that rely on large string values.

I suspect there are more general constraints that I'm not aware of, and there may not be a particularly clean path forward that satisfies all of them. It may be worthwhile to run through the thought experiment of releasing a 2.15.0 with processing-limit configuration but no default values, followed in relatively short succession by a 2.16.0 that enforces default limits, allowing libraries that rely on Jackson to encode their expectations before the defaults are enforced.

In summary, I'm excited to have finer grained control over deserialization inputs and cannot overstate my appreciation for your work to improve security posture, but I'd like to explore options to reduce the risk of unexpected runtime failures upon upgrade.

Thank you!

@pjfanning
Member

The code allows you to change the limits. If you can't modify code to set limits that suit you, you can choose not to upgrade Jackson.
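
For example, with the 2.15 API the limits can be raised when building a JsonFactory. A minimal sketch (the values shown are placeholders, not recommendations):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class LimitsExample {
    public static ObjectMapper newMapper() {
        // Raise the per-token limits above the 2.15 defaults, for this factory only.
        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder()
                        .maxStringLength(5_000_000) // max chars in a single String value
                        .maxNumberLength(1_000)     // max digits in a single number
                        .maxNestingDepth(1_000)     // max depth of nested Objects/Arrays
                        .build())
                .build();
        return new ObjectMapper(factory);
    }
}
```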

@pjfanning
Member

To be clear, we have a large number of users pressing for a secure by default version of Jackson. If we don't make 2.15.0 that release, they will keep pressing until we do a 2.16.0 release.

If you use Jackson via a 3rd party lib, warn the lib maintainer that they should expose some way to configure the Jackson Mapper instances that they create on your behalf.

All the main build tools have mechanisms to allow you to stop libs dragging in versions of transitive dependencies that you don't like.

@carterkozak
Contributor Author

The code allows you to change the limits.

Yes, I understand that; however, the issue that I'm trying to describe is one of ordering: library upgrades cannot always be adopted in a single atomic action.
If I upgrade my library to use 2.15 and set correct limits, then when consumers of my library upgrade, regardless of their previous Jackson dependency version, their build system will more than likely resolve requests for an older Jackson version up to the 2.15.0 dependency from my library. They must also be aware of the processing limits, and configure their own limits at precisely that moment (which may be an automated PR from something like Dependabot, or version resolution due to deeply nested transitive dependencies). If the consumer project upgrades first, and I create an ObjectMapper in my library, there's no way for my library to configure limits on the JsonFactory instances it creates without a great deal of reflection hackery, which I'd prefer to avoid.

The safe path forward involves making my library aware of processing limits, and ensuring all my consumers upgrade my library at precisely the same time that they upgrade to 2.15 and configure their limits properly. Given that many libraries depend on Jackson, they must all be upgraded at precisely the same time, or they risk runtime failures. Such coordinated upgrades across dozens of libraries, which may or may not release frequently, are difficult for consumers, library maintainers, and ultimately users. Perhaps that risk is perceived as lower than the risk of not setting defaults; I certainly don't have as much context as you do on the values themselves, but I'm confident that they will break some of my users if I'm not able to ensure a carefully coordinated rollout.

@carterkozak
Contributor Author

To be clear, we have a large number of users pressing for a secure by default version of Jackson. If we don't make 2.15.0 that release, they will keep pressing until we do a 2.16.0 release.

I think we both want the same thing ❤️
The phrase "perfect is the enemy of good" comes to mind -- I understand that we want a perfect version as soon as possible, but it's not helpful if users cannot safely upgrade to it.

Is there a world where 2.15.0 and 2.16.0 are released at the same time, where the only change introduced into 2.16.0 is strict default values for processing-limits? This would be sufficient for my case where I could adopt 2.15.0 everywhere, and set limits based on expected value sizes, and after a short incubation period, upgrade to 2.16.0 where limits are strictly enforced.

All the main build tools have mechanisms to allow you to stop libs dragging in versions of transitive dependencies that you don't like.

This works when the library remains ABI-compatible with previous releases, but that's more complex to validate upon upgrades. We can rely on build-time linkage checkers and compare diffs across upgrades, but such tools aren't terribly common in the Java ecosystem these days. Once a dependency begins to interact with the new limit APIs, the transitive dependency can no longer be held back.

If you use Jackson via a 3rd party lib, warn the lib maintainer that they should expose some way to configure the Jackson Mapper instances that they create on your behalf.

This is an assertion that Jackson should never be an implementation detail of a library, but must be considered part of the library's API surface. While I agree that it's helpful to expose it in many types of applications (e.g. spring-boot), there's some value in protecting internal ser/de details in purpose-built libraries. Consider a library which acts as a client for a very specific service, using its own domain types and wire-format guarantees: it would be dangerous to allow users to mutate and break the wire format, and exposing the mapper removes the possibility of changing the serialization provider later (not that that's something I'm considering, but loose coupling is a tremendous feature).

@pjfanning
Member

Let's put it this way: you want an insecure version of Jackson. There is already 2.14.2 that fits your needs. Stick with it.

Allow other users to upgrade.

@cowtowncoder
Member

@pjfanning I actually agree with @carterkozak's concerns here: the problem is NOT the direct dependencies declared by projects but the transitive ones. It is often not possible to make coordinated upgrades in an easy manner, and we cannot solve all of these cases with "just don't upgrade". Conversely, a forced update via transitive dependencies can break downstream users, at least temporarily.
At the same time I agree with your point wrt us not being in a position to wait until 2.16, and with the challenges of bootstrapping these changes.

To me, it seems like:

  1. We do want to start with limits, and not just offer configurability, BUT
  2. Start with very lenient limits to reduce the likelihood of actual breakage (we cannot eliminate it, fundamentally, but we can avoid cases like the ones SnakeYAML hit)

But as to specific references to 1 megabyte -- two quick notes:

  1. This is for the maximum TOKEN (String) value, and NOT for the maximum input document. We do not (yet, at least) have default limits for the whole input (and if we did, it'd be a much bigger value)
  2. We can definitely increase this default before the 2.15.0 release; there has already been some discussion of the value to use. If 1M characters is too low (is it?), we could go to 2, 5, or even 10 million, all of which are stricter than the current "only limited by heap size"

Same goes for other limits too.

@pjfanning
Member

pjfanning commented Mar 21, 2023

  • we have not had any complaints about jackson-dataformat-yaml and its low SnakeYAML-imposed limits - jackson-dataformat-yaml v2.14.0 has low processing limits but a mechanism to change the defaults
  • no one has supplied real-world examples where the new jackson-databind limits cause trouble - the limits are generous - we can all theorise about edge cases, but most users don't deliberately use messages that would come near the new Jackson limits
  • the 2.15 / 2.16 release idea may make the rollout slightly easier, but at the end of the day, users who upgrade libs and go straight to production without proper testing will hit issues - there is little we can do to help them if they won't help themselves
  • I have in the past suggested making the defaults configurable via system properties or via config files (I really like https://github.com/lightbend/config) - I think this approach is better than having an insecure-but-configurable 2.15 followed by a secure 2.16 release (a rough sketch of the system-property idea follows this list)
  • we cannot claim to have fixed the Sonatype issue that everyone complains about until we are secure by default - and a number of the OSS-Fuzz scenarios that we fixed will start failing again if we remove the limits
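
To sketch the system-property idea from the list above (purely illustrative: Jackson itself does not read any such property, and the property name and default below are hypothetical), a library could resolve its constraints at factory-construction time:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

// Illustrative only: the property name "jackson.maxStringLength" is hypothetical,
// and nothing in Jackson reads it automatically -- the factory builder below does.
public final class ConfigurableConstraints {
    private ConfigurableConstraints() {}

    public static JsonFactory newFactory() {
        int maxStringLength = Integer.getInteger("jackson.maxStringLength", 5_000_000);
        return JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder()
                        .maxStringLength(maxStringLength)
                        .build())
                .build();
    }
}
```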

@pjfanning
Member

If people can provide real-world cases where the new defaults cause issues, we might be able to increase the defaults to suit those cases, but not to the extent that the limits become a low bar for malicious agents to jump straight over. A 10,000-char number is much more dangerous than a 1,000-char number. But we might be able to split the difference at something like 2,000 chars.

@carterkozak
Contributor Author

we have not had any complaints about jackson-dataformat-yaml and its low snakeyaml imposed limits - jackson-dataformat-yaml v2.14.0 has low processing limits but a mechanism to change the defaults

The SnakeYAML upgrade caused several production failures across a handful of products that I work on; however, the impacted products are not open source, so I am unable to provide direct artifacts. By the time that occurred, releases were already out, and work was in progress in Jackson to expose configuration points. We had to do the best we could with the tools available, which weren't enough to completely prevent further failures (though we were able to reduce their rate of occurrence with reflection hackery in the meantime). In this case, I want to do anything and everything I can to get ahead of the problem while the release is in an RC state, rather than help the teams I support remediate failures reactively.

I think there are a few open questions that are currently very hard to answer:

  1. What is the correct limit to add to existing code such that denial-of-service attacks are prevented, but existing users are not broken?
  2. Once I know the correct limits, how do I configure those which are higher than the proposed defaults without breaking other components which share dependency constraints?

I suspect I'll find there's not always a good answer to 1, and that it will help me find code which unnecessarily buffers large strings on heap, leading to performance improvements when such code is migrated to a streaming API. I want to reiterate: this is a win, and in an entirely new project it would be precisely what I want. However, when we're limiting the behavior of systems that have existed for longer, it's important to have observability into the impact of my limits before they cause failures. I'd love to register a component into my JsonFactory instance which reports the size of each string and numeric sequence, something that can tell me when I'm nearing a limit, prior to breaching it and failing, so that I have time to fully investigate the root cause as values grow over time. I can investigate what such an interface might look like tomorrow; ideally I can make something work with 2.14 as well and pull metrics from production systems.
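
A rough sketch of the kind of wrapper I have in mind, built on JsonParserDelegate (the metrics sink is just a LongConsumer here, and the wiring of the parser into a factory is left out):

```java
import java.io.IOException;
import java.util.function.LongConsumer;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.util.JsonParserDelegate;

// Records the length of every String value read through getText(), so limits
// can be tuned from production data before they are enforced.
public final class StringLengthReportingParser extends JsonParserDelegate {
    private final LongConsumer stringLengthSink;

    public StringLengthReportingParser(JsonParser delegate, LongConsumer stringLengthSink) {
        super(delegate);
        this.stringLengthSink = stringLengthSink;
    }

    @Override
    public String getText() throws IOException {
        String text = super.getText();
        if (text != null && currentToken() == JsonToken.VALUE_STRING) {
            stringLengthSink.accept(text.length());
        }
        return text;
    }
}
```

(Databind reads string values through several methods, e.g. getValueAsString() and getTextCharacters(), so a complete version would need to cover those as well.)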

@cowtowncoder
Member

My understanding, too, is that the SnakeYAML document limit definitely caused issues. Not so much for jackson-dataformat-yaml itself, but transitively.

While I think that, in principle, getting actual information on upper limits would be the right way to go, my past experience suggests there is huge latency in getting that information, and that it tends to arrive in the form of bug reports.

As to actual limits, my view is that:

  1. Maximum number length limit (1024 digits) is probably fine
  2. Maximum nesting limit (1000) is likewise fine, and close enough to what we sort of need (for various deserializers the maximum on default JDKs would be no more than 5x - 10x higher)
  3. String value length is the only thing where I could see a need for >1M characters in a non-trivial number of cases.

Further, wrt (3), I think there's plenty of room to increase the maximum -- this is not an asymmetric processing case like, say, number length (1): an attacker must actually provide the full String.

Given this, I am thinking that raising the limit to 5M characters (taking 10 megs of memory as char[]) would be fine. Or if not 5, then 2 megs.

I know this is just speculation, too, but to be honest I don't see attacks being much more likely with a 10 meg limit than a 1 meg limit -- anyone worried about overly long Strings likely wants to scale the limit back to 64k or 8k or whatever anyway.
Put another way, 1M is already high enough that for many use cases there's a need to tune it down.
But even 5M is still useful compared to no limit whatsoever.

@cowtowncoder
Member

FWIW, many other parsers do implement far stricter limits.

Another thing, then, would be to try to solicit opinions. I have tried but I am not very effective with that (Twitter, mailing lists).
Maybe others could get more information?

@pjfanning
Member

pjfanning commented Mar 21, 2023

@cowtowncoder is there any way that you would consider having an API that lets users inject a global StreamReadConstraints to override the default?

@cowtowncoder
Member

cowtowncoder commented Mar 22, 2023

@pjfanning I don't think I would want to allow that, due to the way Jackson is commonly used as a transitive dependency by multiple other libraries. The whole concept of global overrides does not work well with this embedded usage in mind.
This is a pretty big philosophical obstacle and has come up a few times wrt various features (often security-related ones).

EDIT: although, if they are true defaults and would NOT override whatever something else set... I am not sure.
It would then go to the question of actual use case. I can see defaulting for at least 2 things:

  1. Wanting to specify a default baseline for anything that does not explicitly set constraints. I could see this use case as somewhat legit. Although, then again, if multiple things (frameworks) want to change this, we have a race condition (as in, which one's settings win).
  2. Wanting to forcibly override constraints globally. This I find much less appealing.

@pjfanning
Member

Users and lib maintainers would be strongly encouraged not to use the global override API. The API would be there just to allow users to set the defaults in the case where they are indirect users of Jackson. Imagine someone who uses Jersey for REST services, and Jersey does not expose a way to control the number-size limit. That user is then unable to receive JSON with massive numbers, even though their use case might require it.

The API approach to setting the global defaults breaks down if lib maintainers start using that API. The values they set could affect other Jackson based code running in a user's JVM.

We could use System properties to achieve something similar but again lib maintainers could start using System.setProperty in a way that could affect other Jackson based code running in a user's JVM.

If we could add a dependency on https://github.com/lightbend/config - that lib has an elegant solution. Lib maintainers use a file called reference.conf to provide their config defaults, and users can override them in their application.conf. The config lib will ignore application.conf files in libs. This approach does not fully solve the issue of a lib maintainer setting Jackson configs in their reference.conf, but users can force everything in the JVM to use their preferred settings by explicitly setting them in their application.conf.
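
To make the shape of that global-default API concrete, a rough sketch (the static override method below is the proposal itself, not an existing Jackson API, and the value is a placeholder):

```java
import com.fasterxml.jackson.core.StreamReadConstraints;

// Hypothetical: a process-wide default, set once by the application (never by
// libraries) before any JsonFactory is built. The override method name is part
// of the proposal, not a confirmed Jackson API.
public final class ApplicationJacksonDefaults {
    private ApplicationJacksonDefaults() {}

    public static void install() {
        StreamReadConstraints.overrideDefaultStreamReadConstraints(
                StreamReadConstraints.builder()
                        .maxStringLength(20_000_000) // app-wide ceiling for indirect Jackson users
                        .build());
    }
}
```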

@pjfanning
Member

@carterkozak is there any chance you would consider bundling a shaded version of the jackson libs you need in your library?

That way your lib is unaffected by a user who uses your lib but also uses Jackson directly or via another lib, which could otherwise leave your lib running against a newer version of Jackson that it is not ready for yet.

You can upgrade your shaded version of jackson when you are happy that you are ready.

@cowtowncoder
Member

Yeah, I do not think I want to take Jackson in this direction at this point, @pjfanning.
I think configuration files are great for frameworks that are basically singletons (they control application etc) but do not fit well with more library-style packages like Jackson.

I have nothing against extensions that would do this, though, if there were ways to offer that somehow. But at a basic level I do not want to (have to) add reading of configuration files, figure out precedences, and deal with the inevitable conflicts that result. It is another level of complexity that comes with that territory.

And in particular I would not consider adding Lightbend/config -- while it is a very powerful thing, it's... like another order of magnitude of complexity on top of (or maybe under) Jackson. So that won't be something I'd use.

@carterkozak
Contributor Author

@pjfanning You're correct that shading could solve the transitive-dependency problem from the perspective of a single fully encapsulated library; however, my scenario is perhaps more complex than I let on. I maintain a core set of libraries that enable teams to quickly and easily build and deploy services (think spring-boot or dropwizard, with a strong bias toward our environments), as well as tooling to help teams build and maintain their own libraries. In many of these libraries, Jackson is a fully encapsulated implementation detail, and in others, it's part of our API. Shading can be used for libraries where the shaded target is not used at all in the API, but it presents new security problems: when a new RCE vulnerability is discovered, it is that much harder to remediate all impacted services, because each library that shades a copy of Jackson must additionally be rebuilt (I recall some tricky instances involving aws-sdk jars).

I have added some instrumentation to one of our more common ObjectMapper factories (here if you're interested), and data has begun to flow, but it may be a few hours before adoption is broad enough to make meaningful observations.

@pjfanning
Member

@carterkozak would #962 fix the problem for you? So far, it seems unlikely that there will be agreement to leave the limits unbounded in Jackson 2.15, but the aim is to make it easier to set them.

@cowtowncoder
Member

At this point it sounds like data collection can help determine if there are issues and we can proceed with that.

@carterkozak Data on maximum field/property name lengths would be very valuable as well: I think we should add limits for those in 2.16 (too late for 2.15). And for various reasons they should be significantly lower than the limits for String values.
And this makes it important to have some idea of what kinds of things are seen "in the wild".

@carterkozak
Contributor Author

Rollout of the aforementioned metrics to internal services is ongoing, but at this point I have over half a billion data points (individual parsed strings above the 64-character reporting threshold), no values larger than the 5 MB limit, and a small handful between 1 and 2 MB. The limit increase from 1 MB to 5 MB is definitely sufficient to avoid introducing failures in these instances upon upgrade; I really appreciate the change and expect it to make the upgrade substantially easier. I will continue to collect data, ideally finding ways to avoid these large strings within structured JSON where possible.

I can add some instrumentation on my end to capture field name/map key length as well.

I would also like to determine whether we can add something to Jackson to make it easier to collect these metrics in future versions, allowing other users to dial in limits without relying exclusively on the signal from exceptions (I'd be happy to contribute; I'd want consensus around a design first). The JsonFactory/JsonParser implementations I've constructed to collect metrics aren't the cleanest thing in the world and risk subtle breakage when the classes I've extended change.

Once the limits are rolled out to the products I support with Jackson 2.15, my next goal will be to safely reduce them as much as possible. The per-factory or per-stream limits make sense as defaults, but security-conscious users may be able to reduce them much further by providing separate context-based limits for individual fields which are expected to be larger than the default (e.g. allowing parser.getTextWithMaxLength(limit) instead of parser.getText()). Such a design would be complex and involved, likely impacting jackson-annotations to allow databind to read length-limit values from annotations and provide them to the parser. It would need to consider whether other options should be offered when strings are too long (always throw, or provide the option to truncate, transform, etc.). I apologize if this part has been discussed elsewhere already; I'd be happy to move this tangent somewhere more appropriate.
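
To illustrate the general direction (getTextWithMaxLength does not exist today; this sketch just approximates the check with the existing JsonParser API, and only after the token is already buffered, so it is a usability aid rather than a memory guardrail):

```java
import java.io.IOException;

import com.fasterxml.jackson.core.JsonParser;

// Sketch only: approximates a per-field length limit using existing APIs. A real
// getTextWithMaxLength(limit) would ideally fail before buffering the full value.
public final class ContextualLimits {
    private ContextualLimits() {}

    public static String getTextWithMaxLength(JsonParser parser, int maxLength) throws IOException {
        int length = parser.getTextLength();
        if (length > maxLength) {
            throw new IOException("String value length (" + length
                    + ") exceeds the limit for this field (" + maxLength + ")");
        }
        return parser.getText();
    }
}
```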

@cowtowncoder
Member

cowtowncoder commented Mar 24, 2023

Thank you for doing the investigation here @carterkozak! It is good to get some validation of the starting limits being applied.
I think that "small" tweaks now can save a lot of headache; and the fact that there are a couple of >1 meg text values actually reminds me of my findings at previous companies wrt logging frameworks, where a typical log event (not a logger like log4j, but more like event generation) might be ~4k, and yet outliers ran up to 1.5 megs. And invariably, truncation of longer events managed to cause issues even when the affected events were rare -- just not rare enough.
Some downstream system would get a truncated entry, croak on the poison pill, etc.

So it all makes sense & is part of how my views are formed. I know the distribution of data element sizes is not as uniform as one might initially assume.

@cowtowncoder
Member

One thing I forgot to mention wrt @carterkozak's idea on limits: I think there is a sort of natural ordering of limits, so that at the low level (StreamReadConstraints, f.ex) there are looser limits, but ones that can efficiently prevent the biggest reads and memory usage, as a sort of last line of defense. More granular settings can then operate within those guardrails: they might not prevent reads or allocations, but they would make it easier to tune user- or externally-visible limitations as needed.

But in general it is not necessary to try to make all limits go through StreamReadConstraints (and, likely, StreamWriteConstraints in the near future), which can be kept relatively simple.
More advanced, contextual limits can then use a more sophisticated setup at a possibly higher level (databinding, f.ex).

@cowtowncoder
Member

I think this issue has been addressed for now: looking forward to 2.15.0-rc3 (final release candidate I hope).
If anyone has further concerns please file a new issue (with link back to here if that makes sense).
