2.15.0-rc1 and processing-limit default values #958
Comments
The code allows you to change the limits. If you can't modify code to set limits that suit you, you can choose not to upgrade Jackson.
To be clear, we have a large number of users pressing for a secure-by-default version of Jackson. If we don't make 2.15.0 that release, they will keep pressing until we do a 2.16.0 release. If you use Jackson via a 3rd-party lib, warn the lib maintainer that they should expose some way to configure the Jackson Mapper instances that they create on your behalf. All the main build tools have mechanisms that let you stop libs from dragging in versions of transitive dependencies that you don't like.
Yes, I understand that; however, the issue I'm trying to describe is one of ordering: library upgrades cannot always be adopted in a single atomic action. The safe path forward involves making my library aware of processing limits, and ensuring all my consumers upgrade my library at precisely the same time that they upgrade to 2.15 and configure their limits properly. Given that many libraries depend on jackson, they must all be upgraded at precisely the same time; otherwise they risk runtime failures. Such coordinated upgrades between dozens of libraries, which may or may not release frequently, are difficult for consumers, library maintainers, and ultimately users. Perhaps that risk is perceived as lower than the risk of failing to set defaults (I certainly don't have as much context as you do on the values themselves), but I'm confident that they will break some of my users if I'm not able to ensure a carefully coordinated rollout.
I think we both want the same thing ❤️ Is there a world where 2.15.0 and 2.16.0 are released at the same time, where the only change introduced into 2.16.0 is strict default values for processing limits? This would be sufficient for my case: I could adopt 2.15.0 everywhere, set limits based on expected value sizes, and after a short incubation period upgrade to 2.16.0, where limits are strictly enforced.
This works when the library remains ABI-compatible with previous releases, but that's more complex to validate upon upgrades. We can rely on build-time linkage checkers and compare diffs across upgrades, but such tools aren't terribly common in the Java ecosystem. Once a dependency begins to interact with the new limit APIs, the transitive dependency can no longer be held back.
This is an assertion that jackson should never be an implementation detail of a library, but must be considered part of the library's API surface. While I agree that exposing it is helpful in many types of applications (e.g. spring-boot), there's some value in protecting internal ser/de details in purpose-built libraries. Consider a library which acts as a client for a very specific service, using its own domain types and wire-format guarantees: it would be dangerous to allow users to mutate and break the wire format, and doing so removes the possibility of changing the serialization provider (not that that's something I'm considering, but loose coupling is a tremendous feature).
Let's put it this way: you want an insecure version of Jackson. There is already 2.14.2 that fits your needs. Stick with it. Allow other users to upgrade.
@pjfanning I actually agree with @carterkozak's concerns here: the problem is NOT direct dependencies by projects but transitive dependencies. It is often not possible to make coordinated upgrades in an easy manner, and we cannot solve all of these with "just don't upgrade". Conversely, a forced update via transitive dependencies can break downstream users, at least temporarily. To me, it seems like:
But as to specific references to 1 megabyte -- two quick notes:
Same goes for other limits too.
If people can provide real-world cases where the new defaults cause issues, we might be able to increase the defaults to suit those cases, but not to the extent that the limits become a low bar for malicious agents to jump straight over. A 10000-char number is much more dangerous than a 1000-char number. But we might be able to split the difference at something like 2000 chars.
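As a rough illustration (mine, not from the thread) of why a 10x longer number is treated as far more than 10x as dangerous: converting an n-digit decimal string into a numeric value costs roughly quadratic work in n with the classic conversion algorithm, so 10x the digits means on the order of 100x the CPU. A minimal, self-contained sketch:

```java
import java.math.BigDecimal;

// Illustration: decimal-to-binary conversion of an n-digit number is roughly
// O(n^2) with the classic algorithm, so 10x the digits costs ~100x the CPU.
// This asymmetry is what makes long numbers attractive to malicious agents.
public class NumberParseCost {
    public static void main(String[] args) {
        for (int digits : new int[] {1_000, 10_000, 100_000}) {
            String s = "1".repeat(digits);
            long start = System.nanoTime();
            new BigDecimal(s); // the work a JSON parser must do for a huge number
            System.out.printf("%,d digits: %,d us%n",
                    digits, (System.nanoTime() - start) / 1_000);
        }
    }
}
```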
The snakeyaml upgrade caused several production failures across a handful of products that I work on; however, the products which were impacted are not open-source, so I am unable to provide direct artifacts. At the point when that occurred, releases were already out, and work was in progress in jackson to expose configuration points. We had to do the best we could with the tools we had available, which weren't enough to completely prevent further failures (though we were able to reduce their rate of occurrence with reflection hackery in the meantime). In this case, I want to do anything and everything I can to get ahead of the problem while the release is in an RC state, rather than help the teams I support remediate failures reactively. I think there are a few open questions that are currently very hard to answer:
I suspect I'll find there's not always a good answer to 1, and it will help me find code which buffers large strings on heap unnecessarily, leading to performance improvements when such code is migrated to a streaming API. I want to reiterate: this is a win, and in an entirely new project, it would be precisely what I want. However, when we're limiting the behavior of systems that have existed for longer, it's important to have observability into the impact of my limits before they cause failures. I'd love to register a component into my JsonFactory instance which reports the size of each string and numeric sequence, something that can tell me when I'm nearing the limit but prior to breaching it and failing, so that I have time to fully investigate the root cause when values grow over time. I can investigate what such an interface might look like tomorrow; ideally I can make something work with 2.14 as well and pull metrics from production systems.
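For illustration, a minimal sketch of that kind of hook built on jackson-core's existing JsonParserDelegate (which exists in 2.14 as well); the class name and the IntConsumer callback are inventions for this example, not a proposed or existing API:

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.util.JsonParserDelegate;

import java.io.IOException;
import java.util.function.IntConsumer;

// Sketch: wrap a parser so every decoded string value reports its length,
// giving limit-tuning signal from production traffic instead of exceptions.
public final class StringSizeReportingParser extends JsonParserDelegate {
    private final IntConsumer lengthObserver; // e.g. feeds a histogram metric

    public StringSizeReportingParser(JsonParser d, IntConsumer lengthObserver) {
        super(d);
        this.lengthObserver = lengthObserver;
    }

    @Override
    public String getText() throws IOException {
        String text = delegate.getText();
        if (text != null) {
            lengthObserver.accept(text.length());
        }
        return text;
    }
}
```

Usage might look like `new StringSizeReportingParser(factory.createParser(src), histogram::record)`; to also catch values flowing through ObjectMapper, the wrapping has to happen inside a JsonFactory subclass's parser-creation methods, which is exactly the fragile extension point described later in this thread.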
My understanding, too, is that the SnakeYAML document limit definitely caused issues. Not so much for …

While I think that in principle getting actual information on upper limits would be the right way to go, my past experience suggests there is huge latency in getting that information, and that it comes in the form of bug reports.

As to actual limits, my view is that:
Further, wrt (3), I think there's plenty of room to increase the maximum -- this is not an asymmetric processing case like, say, number length (1). An attacker must actually provide the full String. Given this, I am thinking that raising the limit to 5M (taking 10 megs of memory for the backing char[], at two bytes per char) …

I know this is just speculation, too, but to be honest I don't see attacks being much more likely with a 10 meg limit than a 1 meg limit -- anyone worried about too-long Strings is likely to want to scale the limit back to 64k or 8k or whatever anyway.
FWIW, many other parsers do implement way stricter limits: …

Another thing, then, would be to try to solicit opinions. I have tried, but I am not very effective at that (Twitter, mailing lists).
@cowtowncoder is there any way that you would consider having an API that lets users inject a global StreamReadConstraints to override the default?
@pjfanning I don't think I would want to allow that, due to the way Jackson is commonly used as a transitive dependency by multiple other libraries. The whole concept of global overrides does not work well with this embedded usage in mind. EDIT: although, if they are true defaults and would NOT override whatever something else set... I am not sure.
|
Users and lib maintainers would be strongly encouraged not to use the global override API. The API would be there just to allow users to set the defaults in the case where they are indirect users of Jackson.

Imagine someone who uses Jersey for REST services. Imagine if Jersey does not expose a way to control the number size limit. Then the user of Jersey is unable to receive JSON with massive numbers, and their use case might require it.

The API approach to setting the global defaults breaks down if lib maintainers start using that API. The values they set could affect other Jackson-based code running in a user's JVM. We could use System properties to achieve something similar, but again lib maintainers could start using them.

If we could add a dependency on https://github.com/lightbend/config, that lib has an elegant solution: lib maintainers use a file called reference.conf for their defaults, and end users can override those values in application.conf.
@carterkozak is there any chance you would consider bundling a shaded version of the jackson libs you need in your library? That way your lib is unaffected by a user who uses your lib but also uses Jackson directly or via another lib, which would otherwise leave your lib on a newer version of jackson that it is not ready to use yet. You can upgrade your shaded version of jackson when you are happy that you are ready.
Yeah, I do not think I want to take Jackson in this direction @pjfanning, at this point. I have nothing against extensions that would do this, tho, if there were ways to offer that somehow. But at a basic level I do not want to (have to) add reading of configuration files, figuring out precedences, and dealing with the inevitable conflicts that result. It is another level of complexity that comes with that territory. And in particular I would not consider adding Lightbend/config -- while it is a very powerful thing, it's... like another magnitude of complexity on top of (or under, maybe) Jackson. So that won't be something I'd use.
@pjfanning You're correct that shading could solve the transitive dependency problem from the perspective of a single fully encapsulated library, however my scenario is perhaps more complex than I let on; I maintain a core set of libraries that enable teams to quickly and easily build and deploy services (think spring-boot or dropwizard, with a strong bias toward our environments), as well as tooling to help teams build and maintain their own libraries. In many of these libraries, jackson is a fully encapsulated implementation detail, and in others, it's part of our API. Shading can be used for libraries where the shaded target is not used at all in the API, but presents new security problems. When a new RCE vulnerability is discovered, it would be that much harder to remediate all impacted services due to the additionally rebuilding each library which shades a copy of jackson (I recall some tricky instances involving aws-sdk jars). I have added some instrumentation to one of our more common ObjectMapper factories (here if you're interested), and data has begun to flow, but it may be a few hours before adoption is broad enough to make meaningful observations. |
@carterkozak would #962 fix the problem for you? So far, it seems unlikely that there will be agreement to leave the limits unbounded in jackson 2.15, but the aim is to make it easier to set the limits.
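For context, a sketch of how a "defaults only" hook of this kind would be used; the method and builder names below match the StreamReadConstraints API shape jackson-core was converging on for 2.15, but treat the exact signatures as assumptions:

```java
import com.fasterxml.jackson.core.StreamReadConstraints;

// Sketch: an application (not a library) adjusting the process-wide default
// constraints once at startup, before any JsonFactory is constructed.
// Factories that explicitly set their own constraints are unaffected.
public final class JacksonDefaults {
    private JacksonDefaults() {}

    public static void apply() {
        StreamReadConstraints.overrideDefaultStreamReadConstraints(
                StreamReadConstraints.builder()
                        .maxStringLength(10_000_000) // illustrative, not a recommendation
                        .maxNumberLength(1_000)
                        .build());
    }
}
```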
At this point it sounds like data collection can help determine if there are issues, and we can proceed with that. @carterkozak Data on maximum field/property name lengths would be very valuable as well: I think we should add limits for those in 2.16 (too late for 2.15). And for various reasons they should be significantly lower than limits for String values.
Rollout of the aforementioned metrics to internal services is ongoing, but at this point I have over half a billion datapoints (individual parsed strings above the 64-character reporting threshold) and no values larger than the 5 MB limit, with a small handful between 1 and 2 MB. The limit increase from 1 MB to 5 MB is definitely sufficient to avoid introducing failures in these instances upon upgrade; I really appreciate the change and I expect it to make the upgrade substantially easier. I will continue to collect data, ideally finding ways to avoid these large strings within structured json where possible.

I can add some instrumentation on my end to capture field name/map key length as well.

I would also like to determine if we can add something to Jackson to make it easier to collect these metrics in future versions, allowing other users to dial in limits without relying exclusively on signal from exceptions (I'd be happy to contribute; I'd want consensus around a design first). The JsonFactory/JsonParser implementations I've constructed to collect metrics aren't the cleanest thing in the world and risk subtle breaks when the classes I've extended change.

Once the limits are rolled out to the products I support with jackson 2.15, my next goal will be to safely tighten them as much as possible. The per-factory or per-stream limits make sense as defaults, but security-conscious users may be able to reduce defaults much further by providing separate context-based limits for individual fields which are expected to be larger than the default (e.g. allowing …); one way to approximate this is sketched below.
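Jackson exposes no per-field constraints API (which is what makes this an open idea), so a sketch under that assumption: the closest approximation today is separate factories with different limits, the looser one reserved for the few call sites known to carry large values:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch of "context-based limits": a strict mapper for general traffic and
// a deliberately looser one reserved for the few fields/endpoints known to
// carry large string values. All limit values are illustrative.
public final class Mappers {
    private Mappers() {}

    public static final ObjectMapper STRICT = new ObjectMapper(
            JsonFactory.builder()
                    .streamReadConstraints(StreamReadConstraints.builder()
                            .maxStringLength(64_000) // tight default
                            .build())
                    .build());

    public static final ObjectMapper LARGE_VALUES = new ObjectMapper(
            JsonFactory.builder()
                    .streamReadConstraints(StreamReadConstraints.builder()
                            .maxStringLength(5_000_000) // known-large payloads only
                            .build())
                    .build());
}
```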
Thank you for doing the investigation here @carterkozak! It is good to get some validation of the starting limits being applied. It all makes sense & is part of how my views are formed. I know the distribution of data element sizes is not as uniform as one might initially assume.
One thing I forgot to mention wrt @carterkozak's idea on limits: I think there is a sort of natural ordering of limits, so that at the low level (…)

But in general it is not necessary to try to make all limits go through …
I think this issue has been addressed for now: looking forward to …
In testing 2.15.0-rc1 I've found that several tests with large inputs begin to fail unexpectedly due to processing limits introduced in #827.
While I'm strongly in favor of granular control of limits, it's difficult to imagine a path through which I can safely adopt 2.15 without causing production failures. The initial rollout will be tremendously difficult to do effectively without a version range which provides the ability to set limits before defaults are enforced; otherwise, anything that upgrades transitive jackson dependencies may unexpectedly produce runtime failures. This is compounded by the ~1 MB limits being large enough not to hinder most test inputs, but small enough that larger values are common in certain production scenarios reliant on string values.
I suspect there are more general constraints that I'm not aware of, and there may not be a particularly clean path forward that satisfies all of them. It may be worthwhile to run through the thought experiment of releasing a 2.15.0 with processing-limit configuration but no default values, followed in relatively short succession by a 2.16.0 which enforces default limits, allowing libraries which rely on jackson to encode their expectations before defaults are enforced.
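To make the thought experiment concrete, a sketch (assuming a constraints-builder API along the lines of what the 2.15 RC exposes) of a library encoding its expectations during the configuration-only release, so the later default-enforcing release becomes a no-op for its parsers:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

// Sketch: a library pinning the limits its wire format actually needs, so a
// later Jackson release with stricter defaults cannot change its behavior.
// The 2 MB figure is illustrative (e.g. derived from observed payload sizes).
final class InternalJson {
    static final JsonFactory FACTORY = JsonFactory.builder()
            .streamReadConstraints(StreamReadConstraints.builder()
                    .maxStringLength(2_000_000)
                    .build())
            .build();

    private InternalJson() {}
}
```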
In summary, I'm excited to have finer-grained control over deserialization inputs and cannot overstate my appreciation for your work to improve security posture, but I'd like to explore options to reduce the risk of unexpected runtime failures upon upgrade.
Thank you!