Later I remember discussions on Twitter about using the “bilateral grid” (really influential older work from my ex-colleague Jiawen “Kevin” Chen) to prevent some halos, and last year Jasmin Patry gave an amazing presentation about tonemapping, color, and many other topics in “Ghost of Tsushima” that uses a mix of bilateral grid and Gaussian blurs. I’ll get back to it later.
What I didn’t know when I originally wrote my post was that a year later, I would join Google and work with some of the best image processing scientists and engineers on HDR+ – a computational photography pipeline that includes many components, but one of its key signature ones is local tonemapping – the magic behind the “HDR+ look”, praised by both reviewers and users.
The LTM solution is a fairly advanced pipeline, the result of the labor and love of my colleague Ryan (you might know his early graphics work if you ever used Winamp!) with excellent engineering, ideas, and endless patience for tweaks and tunings – a true “secret sauce”. But the original inspiration and the core skeleton is a fairly simple algorithm, “Exposure Fusion”, which can both deliver excellent results and be implemented very easily – and I’m going to cover it and discuss its use for real-time rendering.
This post comes with a web demo in Javascript and obviously with source code. To be honest, it was an excuse for me to learn some Javascript and WebGL (I might write a quick “getting started” post about my experience), but I’m happy with how it turned out.
So, let’s jump in!
Some disclaimers
In this post I will be using an HDRI by Greg Zaal.
Throughout the post I will be using the ACES global tonemapper – I know it has a bad reputation (especially the per-channel fits) and you might want to use something different, but its three.js implementation is pretty good and has some of the proper desaturation properties. Important to note: in this post I don’t assume anything about the global tonemapper used or its curves or shapes – most will work as long as they reduce dynamic range. So in a way, this technique is orthogonal to the global tonemapping look and the treatment of colors and tones.
Finally, I know how many people – including myself – hate the “toxic HDR” look that was popular in the early noughties – and while it is possible to achieve it with the described method, it doesn’t have to be used that way; it’s a matter of tuning parameters.
Note that if you ever use Lightroom or Adobe Camera Raw to open RAW files in Photoshop, you are using a local tonemapper! It’s simply that their implementation is very good and has subtle, reasonable defaults.
Localized exposure, local tonemapping – recap of the problem
In my old post, I went quite deep into why one might want to use a localized exposure / localized tonemapping solution. I encourage you to read it if you haven’t, but I’ll quickly summarize the problem here as well.
This problem occurs in photography, but we can look at it from a graphics perspective.
We render a scene with a physically correct pipeline, using physical values for lighting, and have an HDR representation of the scene. To display it, we need to adjust the exposure and tonemap it.
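In shader terms, this is just a per-frame exposure multiplier followed by the tonemapping curve. A minimal sketch, using a simple Reinhard curve as a stand-in for the global tonemapper (the post itself uses ACES; the uniform names here are hypothetical):

```glsl
#version 300 es
precision highp float;

uniform sampler2D u_hdrScene;  // linear HDR scene color
uniform float u_exposureEV;    // one global exposure value for the whole frame

in vec2 v_uv;
out vec4 fragColor;

// Stand-in global tonemapper (simple Reinhard); any curve that reduces
// dynamic range works here.
vec3 globalTonemap(vec3 c) { return c / (1.0 + c); }

void main() {
  vec3 exposed = texture(u_hdrScene, v_uv).rgb * exp2(u_exposureEV);
  fragColor = vec4(pow(globalTonemap(exposed), vec3(1.0 / 2.2)), 1.0);
}
```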
If your scene has a lot of dynamic range (simplest case – a day scene with harsh sunlight and parts of the scene in shadows), picking the right exposure is challenging.
If we pick a “medium” exposure, exposing for the midtones, we get something “reasonable”:
On the other hand, details in the sunlit areas are washed out and barely visible, and information in the shadows is completely dark and barely visible.
You might want to render a scene like this – high contrast can be a desired outcome with some artistic intent. In other cases it might not be – and this is especially important in video games, where visuals are not just pure art but also need to serve gameplay and interactive purposes: very often, you cannot afford to have important detail be invisible.
If we try to reduce the contrast, we can start seeing all the details, but everything looks washed out and ugly:
My previous post described some solutions used in photography, filmography, and video games. Typically it involves manually brightening/darkening some areas (the famous dodge/burn aspects of Ansel Adams’ zone system), but it can also start much earlier, before taking the picture – by inserting artificial lights that reduce the contrast of the scene, or by using tarps, reflectors, and diffusers.
In video games it is way easier to “fake” it and break all physicality completely – from lights through materials to post FX – and while it’s a useful tool, it reduces the ease and potential of physical consistency and the ability to use references and real-life models. Once you hack your sky lighting model, the sun or artificial lights at sunset will not look correct…
Or you could brighten the character albedos – but then the character art director will be upset that while their characters might be visible, they look too chalky and have no proper rim specular. (Yes, this is a real anecdote from my working experience. 🙂 )
In practice you want to do as much as you can for artistic purposes – fill lights and character lights are amazing tools for shaping and conveying the mood of a scene. You don’t want to waste them, their artistic expressive power, or performance budgets on “fighting” with the tonemapper…
Blending exposures
So we have three exposures – and we’d want to blend them, deciding per region which exposure to take.
There are a few ways to go about it; let’s go through them and some of their problems.
Per-pixel blending
The simplest option is very simple indeed – just deciding per pixel how to blend the three exposures depending on how well exposed they are.
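As a concrete illustration, here is a minimal sketch of such a naive per-pixel blend as a fragment shader. It is not the demo’s actual code – the uniform names and the Gaussian “well-exposedness” weight (distance of lightness from mid-gray, in the spirit of the Mertens et al. exposure metric) are my assumptions:

```glsl
#version 300 es
precision highp float;

// Three globally tonemapped "synthetic exposures" (dark / medium / bright).
uniform sampler2D u_exposureDark;
uniform sampler2D u_exposureMid;
uniform sampler2D u_exposureBright;
uniform float u_sigma; // how strongly we prefer lightness close to mid-gray

in vec2 v_uv;
out vec4 fragColor;

float lightness(vec3 rgb) {
  // Crude approximation: Rec. 709 luma of the already tonemapped image.
  return dot(rgb, vec3(0.2126, 0.7152, 0.0722));
}

// "Well-exposedness": Gaussian falloff around mid-gray.
float weight(vec3 rgb) {
  float d = lightness(rgb) - 0.5;
  return exp(-d * d / (2.0 * u_sigma * u_sigma));
}

void main() {
  vec3 cDark   = texture(u_exposureDark,   v_uv).rgb;
  vec3 cMid    = texture(u_exposureMid,    v_uv).rgb;
  vec3 cBright = texture(u_exposureBright, v_uv).rgb;

  float wDark   = weight(cDark);
  float wMid    = weight(cMid);
  float wBright = weight(cBright);
  float wSum = wDark + wMid + wBright + 1e-6;

  // Purely per-pixel blend - simple, but it leaks across edges and halos.
  vec3 blended = (wDark * cDark + wMid * cMid + wBright * cBright) / wSum;
  fragColor = vec4(blended, 1.0);
}
```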
This doesn’t work very well:
Bright regions trying to bring the dark ones down will leak onto the medium-exposure and dark regions, darkening them further – and vice versa, the strong influence of dark regions will leak onto their surroundings. On a still image it can look acceptable, but with a moving video game camera it is visible and distracting…
Bilateral blending
Given that we would like to prevent bleeding over edges, one might try to use some form of edge-preserving or edge-stopping filter, like the bilateral filter. It’s not a bad idea, but it comes with its own problems – gradient reversals and edge ringing.
In his Desmos calculator, Jasmin Patry demonstrates the problem on a simple 1D edge:
His proposed solution to this problem (blending the bilateral result with a Gaussian one) is great and offers a balance between halos and the edge/gradient reversals and ringing that can occur with bilateral filters.
But we can do even better and reduce the problem further with Exposure Fusion. Before we do, however, let’s first look at a reasonable (but also flawed) alternative.
I’ll refer you to my guided filter post – and we will actually be using the guided filter later for speeding up the processing, so it might be worth refreshing.
If we try to use a guided filter to transfer exposure information from a low-resolution / blurry image with blended exposures to full resolution, we end up with a result like this:
Exposure fusion
“Exposure Fusion” by Mertens et al. tackles blending multiple globally tonemapped exposures (which can be “synthetic” in the case of rendering) in a way that preserves detail and edges and minimizes halos.
It starts with the observation that, depending on the region of the image and the presence of details, sometimes you want a wide blending radius and sometimes a very sharp one.
Anytime you have a sharp edge, you want the blending to happen over a small area to avoid a halo. Anytime you have a relatively flat region, you want the blending to happen over a large area, so it smooths out and becomes imperceptible.
The way the authors propose to achieve it is by blending different frequency information with a different radius.
This might be somewhat surprising and it’s hard to visualize, but let me attempt it. Here we change the exposure rapidly over a small horizontal line section:
Notice how the change is not perceivable at the top of the image – it could be a normal picture – while at the bottom it is very harsh. Why? Because at the top, the high-frequency change correlates with high-frequency information and image content changes, while at the bottom it is applied to low-frequency information.
The key insight here is that you want the frequency of the change to correlate with the frequency content of the image.
In low-frequency, flat regions, we are going to use a very wide blending radius. In areas with edges and textures, we are going to make it steeper and stop around them. Changes in brightness are either hidden by the edges or smoothed out over large, edgeless regions!
The authors propose a simple approach: construct Laplacian pyramids for each blended image – and blend those Laplacians. The blending radius at each level is proportional to that level’s radius – and it can be achieved trivially by building a Gaussian pyramid of the blending weights.
Here is a figure from the paper that shows how simple (and brilliant) the idea is:
I will describe some GPU implementation details and the parameters used to make it behave well later on, but first let’s have a look at the results:
I highly encourage you to read the paper, but here is a short description of all of the steps:
Create “synthetic exposures” by tonemapping your image with different exposure settings. In general the more, the better, but three is a pretty good starting choice, allowing for separate control of “shadows” and “highlights”.
Compute the “lightness” / “brightness” image from each synthetic exposure. Gamma-mapped luminance is an extremely crude and wrong approximation, but this is what I used in the demo to simplify it a lot.
Create a Laplacian pyramid of each lightness image up to some level – more about it later. The last level will be just a Gaussian-blurred, low-resolution version of the given exposure; all the others will be Laplacians – differences between two Gaussian levels.
Assign a per-pixel weight to each high-resolution lightness image. The authors propose three metrics – contrast, saturation, and exposure (closeness to gray). In practice, if you want to avoid some of the over-saturated, over-contrasty look, I recommend using just the exposure.
Create a Gaussian pyramid of the weights.
On some selected coarse level (like the 5th or 6th mip-map), blend the coarsest Gaussian lightness mip-maps with the Gaussian weights.
Go towards the finer pyramid levels / resolutions – and on each level, blend the Laplacians using the given level’s Gaussian weights and add them to the accumulated result.
Transfer the lightness to the full target image and do the rest of your tonemapping and color grading shenanigans.
Voila! We have an image in which edges (Laplacians) are blended at different scales with different radii.
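To make the steps above concrete, here is a minimal sketch of a single per-level blend pass – roughly the heart of the pipeline. It is not the demo’s actual shader: the texture layout (lightness of the three exposures packed into the RGB channels of one mip-mapped texture, weights in a second mip-mapped RGB texture standing in for the Gaussian weight pyramid) and all names are my assumptions. The host is assumed to first blend the coarsest Gaussian level with its weights, then run this pass from coarse to fine, feeding each result back in as the accumulated coarser image:

```glsl
#version 300 es
precision highp float;

// Lightness of the three synthetic exposures packed into RGB, with mips
// standing in for their Gaussian pyramids.
uniform sampler2D u_lightnessPyramid;
// Per-pixel exposure weights (also packed into RGB), with mips standing in
// for the Gaussian pyramid of the weights.
uniform sampler2D u_weightPyramid;
// Result accumulated so far at the coarser level (upsampled via bilinear).
uniform sampler2D u_coarserAccum;
uniform float u_level; // current pyramid level

in vec2 v_uv;
out vec4 fragColor;

void main() {
  // Gaussian-blurred lightness of each exposure at this level and the next.
  vec3 gaussCur    = textureLod(u_lightnessPyramid, v_uv, u_level).rgb;
  vec3 gaussCoarse = textureLod(u_lightnessPyramid, v_uv, u_level + 1.0).rgb;
  // Laplacian = difference between two adjacent Gaussian levels.
  vec3 laplacian = gaussCur - gaussCoarse;

  // Gaussian-blurred blending weights at this level, normalized per pixel.
  vec3 w = textureLod(u_weightPyramid, v_uv, u_level).rgb;
  w /= (w.r + w.g + w.b + 1e-6);

  // Blend the three Laplacians and add them to the coarser accumulated result.
  float blended = dot(w, laplacian);
  float accum = texture(u_coarserAccum, v_uv).r;
  fragColor = vec4(accum + blended, 0.0, 0.0, 1.0);
}
```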
In practice, getting this algorithm to look good in many different conditions (it has some unintuitive behaviors) requires some tricks and tuning – but in a nutshell it’s very simple!
Optional – local contrast boost
One interesting option that the algorithm gives us is boosting the local contrast. It can be done in a few ways, but one that is pretty subtle and that I like is to include the Laplacian magnitudes when deciding on the blending weights: the effective weight becomes the Gaussian-blurred per-pixel weight multiplied by the absolute value of the Laplacian. Note that this is again done per scale – so each scale’s weights are picked separately.
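In the per-level blend pass sketched earlier, this could look roughly as follows (my assumption about how the weights get combined, not the demo’s exact formula):

```glsl
// Inside the per-level blend pass: scale each exposure's weight by the
// magnitude of its Laplacian at this scale, so exposures carrying more
// local detail at this frequency win the blend.
vec3 wBoosted = w * (abs(laplacian) + 1e-4);
wBoosted /= (wBoosted.r + wBoosted.g + wBoosted.b + 1e-6);
float blended = dot(wBoosted, laplacian);
```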
The resulting local contrast boost can be visually pleasing:
I will describe the algorithm / implementation parameters below, but I couldn’t resist producing some “extreme toxic HDR” look as well – mostly to show that the algorithm is capable of it (if this is your aesthetic preference – and for some cases like architectural visualization it seems to be…).
The first parameter is the coarsest pyramid level. When we decide up to which level we construct the pyramids, we decide which frequencies will be blended together (anything at that mip level and above). Setting the mip level to 0 is equivalent to full per-pixel weighting and blending; setting it to the maximum blends every level as a Laplacian.
The lower the coarsest level, the more dynamic range compression there is – but also the more washed-out, fake-HDR the look becomes.
Here are mip levels 0, 5, 9:
The final parameter is the strength of the exposure preference – it describes how “strong” the weighting preference is based on the closeness of the lightness to 0.5. It affects the overall strength of the effect – with zero providing almost no LTM (all exposure weights are the same!), and with extreme settings producing artifacts and an overcompressed look (pixels getting contributions from only a single exposure, with some discontinuities):
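In terms of the “well-exposedness” weight from the earlier sketches, this parameter controls how sharply the weight falls off around a lightness of 0.5. One possible parameterization (the demo’s exact formula may differ):

```glsl
// strength = 0     -> all exposures get equal weight (almost no LTM);
// strength too big -> winner-takes-all blending, artifacts, overcompression.
float exposureWeight(float lightness, float strength) {
  float d = lightness - 0.5;
  return exp(-strength * d * d);
}
```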
Overall, I like the algorithm a lot and find it excellent and able to produce great results.
The biggest problem is its counterintuitive behavior that depends on the image frequency content. Images with strong edges will compress differently, with a different look than images with weak edges. Even the frequency of the detail (fine scale like foliage vs. medium scale like textures) matters and will affect the look. This is a problem, as a certain value of “shadows” will brighten the actual shadows of scenes differently depending on their contents. This is where artist control and tuning come into play. While there are many sweet spots, getting them perfectly right for every scenario (like casual smartphone photography) requires a lot of work. Luckily, in games, where it can be adjusted per scene, it is much easier.
The second issue is that when pushed to the extreme, the algorithm will produce artifacts. One way to get around it is to increase the number of synthetic exposures, but this increases the cost.
Note that neither of those problems occurs if you use it in a more “subtle” way, as one tool in combination with the other tools of the trade.
Finally, the algorithm is designed for blending LDR exposures and producing an LDR image. I think it should work with HDR pipelines as well, but some modifications might be needed.
GPU implementation
The algorithm is very straightforward to implement on the GPU. See my 300loc implementation, where those 300 lines include the GUI, loading, etc.!
All synthetic exposures can be packed together in a single texture. One doesn’t need to create Laplacian pyramids and allocate memory for them – they can be constructed as the difference between Gaussian pyramid mip-maps. Creation of the pyramids can be as simple as generating mips, or as complicated and fast as using compute shaders to produce multiple levels in one pass.
The biggest cost is in tonemapping the synthetic exposures (as expensive as your global tonemapping operator) – but one could use a simplified “proxy” operator, as we care only about the correlation with the final lightness here.
Other than this, every level is just a little bit of ALU and 3 texture fetches from very low resolution textures!
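A sketch of the pass that produces that packed texture – one proxy lightness per synthetic exposure, written into the RGB channels of a single render target. The exposure multipliers, names, and the crude Reinhard-plus-gamma proxy are my assumptions, not the demo’s code:

```glsl
#version 300 es
precision highp float;

uniform sampler2D u_hdrScene; // linear HDR scene color
uniform vec3 u_exposures;     // dark / medium / bright exposure multipliers

in vec2 v_uv;
out vec4 fragColor;

// Crude "proxy" lightness: Reinhard-ish curve plus gamma-mapped luminance.
// It only needs to correlate with the lightness of the real tonemapper.
float proxyLightness(vec3 hdr) {
  float lum = dot(hdr, vec3(0.2126, 0.7152, 0.0722));
  return pow(lum / (1.0 + lum), 1.0 / 2.2);
}

void main() {
  vec3 hdr = texture(u_hdrScene, v_uv).rgb;
  // One synthetic exposure per channel; generating mips for this texture
  // then builds the (approximate) Gaussian pyramids of all three at once.
  fragColor = vec4(proxyLightness(hdr * u_exposures.x),
                   proxyLightness(hdr * u_exposures.y),
                   proxyLightness(hdr * u_exposures.z), 1.0);
}
```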
How fast is it? I don’t have a way to profile it now (coding on my laptop), but I believe that if implemented correctly, it should definitely be under 1ms even on previous-generation consoles.
But… we can make it even faster and make most computations happen at low resolution!
Guided upsampling
While it is doable (and possibly not too expensive) to compute the exposure fusion in full resolution, why not just compute it at lower resolution and transfer the information to the full resolution with a joint bilateral or a guided filter?
As long as we don’t filter out too much of the local contrast (in the low-resolution representation, sharp edges and details are missing), the results can be excellent. This idea was used in many Google research projects/products: HDRNet, Portrait mode, and for the tonemapping (again a fairly complex and sophisticated variation of this simple algorithm, designed and implemented by my colleague Dillon Sharlet).
I’ll refer you to my guided filter post for the details, but so far every single result that I have shown in this post was produced at ¼ x ¼ resolution and guided-upsampled!
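For intuition, here is a minimal sketch of the simpler of the two options mentioned above – a joint bilateral upsample with a constant spatial weight over a 3x3 low-resolution footprint (the results in this post use guided upsampling instead; the single-channel layout and all names are my assumptions):

```glsl
#version 300 es
precision highp float;

// Low-resolution fused lightness produced by exposure fusion.
uniform sampler2D u_lowResFused;
// Low-resolution lightness used as the guide during fusion.
uniform sampler2D u_lowResGuide;
// Full-resolution lightness of the frame we are upsampling into.
uniform sampler2D u_fullResGuide;
uniform vec2 u_lowResTexel;  // 1.0 / low resolution texture size
uniform float u_rangeSigma;  // how strongly edges stop the filter

in vec2 v_uv;
out vec4 fragColor;

void main() {
  float guideHi = texture(u_fullResGuide, v_uv).r;

  float accum = 0.0;
  float wSum = 0.0;
  // 3x3 neighborhood in the low-resolution image.
  for (int y = -1; y <= 1; ++y) {
    for (int x = -1; x <= 1; ++x) {
      vec2 offset = vec2(float(x), float(y)) * u_lowResTexel;
      float guideLo = texture(u_lowResGuide, v_uv + offset).r;
      float d = guideHi - guideLo;
      // Range weight: taps whose low-res lightness differs from the
      // full-res pixel contribute less, so edges are preserved.
      float w = exp(-d * d / (2.0 * u_rangeSigma * u_rangeSigma));
      accum += w * texture(u_lowResFused, v_uv + offset).r;
      wSum += w;
    }
  }
  fragColor = vec4(accum / max(wSum, 1e-6), 0.0, 0.0, 1.0);
}
```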
Here is a comparison of the computation at full, half, and quarter resolution, as a gif, as differences are impossible to see side-by-side:
I also had to crop as otherwise the difference was not noticeable – you can see it on the ground texture (high frequency dots) as well as a subtle smudge artifact on the chair back.
I encourage you to play with it in the demo app – you can adjust the “display_mip” parameter.
Summary
To conclude, in this post I came back to the topic of localized tonemapping and its general ideas, and described the “Exposure Fusion” algorithm with a simple, GPU-friendly implementation – suitable for a simple WebGL demo (I release it to the public domain; feel free to use or modify as much of it as you want).
It feels good to close the loop after six years, knowing much more about the topic. 🙂
And a personal perspective – after those years, I am now even more convinced that having some LTM is a must, as it’s an invaluable tool. I hope this post convinced you of that, and inspired you to experiment with it and maybe implement it in your engine/game.