“Simply put, Deep Fusion is an advanced form of computational photography that works through multiple exposures to pick the best shot for you”
Apple has a penchant for hyperbole, so much so that no Apple event feels complete without phrases such as “absolutely amazing”, “nothing like anything before” and “truly incredible”. Customers and the technology press alike have grown used to Apple patting itself on the back — after all, even though the brand has a tendency to flaunt rehashed versions of existing technology as revolutionary, it still just works. At the recently concluded iPhone 11 event, however, the Cupertino giant was a bit unlike itself, moving from one launch to the next at a brisk pace without labouring over anything. That, though, did not stop Phil Schiller from labelling the new camera technology “computational photography mad science”. What we are talking about here is Apple’s Deep Fusion technology, and if you’re wondering what it is and why you should care, read on.
What is it?
In every conceivable way, Deep Fusion falls well within the established definition of computational photography. To jog your memory, computational photography describes a system in which the algorithms of a processing unit are largely responsible for producing the final photograph once a shot is taken. Typically, a smartphone’s image signal processor (ISP) enhances the image captured by the sensor and lens through myriad mathematical computations, producing an image that would otherwise be impossible with the optical setup alone.
With Deep Fusion, Apple is taking computational photography a step further. Instead of relying solely on the ISP, Apple’s big gamble is to shift most of the responsibility to the Neural Engine, or neural processing unit (NPU), bringing more computing power to bear when a photograph is taken. As a result, any image shot on Apple’s new iPhones is actually captured as multiple shots, from which Deep Fusion then selects and enhances elements to create a photo with the best possible balance of colour, detail, texture, white balance and noise. Given that Apple has typically used top-notch camera hardware, the result, at least on paper, should be stellar.
How does it work?
In the new iPhone 11, iPhone 11 Pro and iPhone 11 Pro Max, a set of algorithms springs the imaging system into action the moment you open the camera application. This sets up the key bit of the operation, wherein the camera is ready to take a bunch of shots of your subject before you actually press the shutter button. The mainstay here is the ISP itself, which is programmed to take a total of nine shots — four fast-exposure photos, four secondary photos and a single long-exposure photo.
The first eight shots are actually captured before the shutter button is pressed; upon tapping the shutter, the iPhone takes a long-exposure photograph of your subject. It is important to note that “long exposure” here is relative to the standard shutter speed of an average smartphone photograph, and will likely be in the region of 1/6th to 1/2 of a second. Once these images are taken, the new Neural Engine on the Apple A13 Bionic SoC kicks in — this is the Deep Fusion part of the technology.
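How can a camera have shots from before the shutter was pressed? The usual trick is a rolling buffer: the camera captures continuously while the app is open, and the shutter press simply freezes the most recent frames. Apple has not published its pipeline, so the following is only a toy sketch of that idea; the class name and frame count of eight follow this article’s description, not any documented API.

```python
from collections import deque

class PreShutterBuffer:
    """Toy model of zero-lag capture: frames stream into a small ring
    buffer, and the shutter press freezes the most recent ones."""

    def __init__(self, size=8):
        self.ring = deque(maxlen=size)   # oldest frames fall off automatically

    def on_new_frame(self, frame):
        # Called continuously while the camera app is open.
        self.ring.append(frame)

    def on_shutter_press(self, capture_long_exposure):
        # The eight pre-shutter shots, plus the ninth (long-exposure)
        # frame taken at the moment of the press.
        pre_frames = list(self.ring)
        return pre_frames + [capture_long_exposure()]
```

The `deque` with `maxlen` does the heavy lifting: no matter how long the camera has been open, only the latest eight frames are retained, so the shutter press costs nothing extra.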
Here, the Neural Engine builds the overall frame from a combined 24 megapixels, or 24 million pixels, of data, selecting the best of the nine shots in every section of the photograph. In even more granular terms, the Neural Engine’s algorithms are trained to pick the best pixel out of the nine images shot, for each pixel in the final photograph. Even by modern system-on-chip standards, that is a rather large amount of data processed within the span of just over one second — from before the shutter is pressed to the moment the photograph appears.
Since the basic mandate of Deep Fusion is to vary the exposure times of the shots taken, the Neural Engine has a wide range of candidates to choose from for every pixel. Essentially, once you take a shot, the A13 SoC’s Neural Engine can decide the best contrast, sharpness, fidelity, dynamic range, colour accuracy, white balance, brightness and every other factor associated with an image. With the camera having a wider range of variables to play with, the Deep Fusion technique should ideally produce images with higher dynamic range, intricate detail, excellent low-light performance, low noise and accurate colours.
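The per-pixel selection described above can be sketched in a few lines of NumPy. This is purely illustrative: Apple has not disclosed how its Neural Engine scores pixels, so a crude Laplacian “sharpness” measure stands in for whatever learned metrics the real pipeline uses, and the frames are assumed to be already aligned.

```python
import numpy as np

def local_sharpness(frames):
    """Crude per-pixel sharpness proxy: magnitude of the discrete
    Laplacian of each frame. frames: (N, H, W) grayscale stack."""
    lap = (np.roll(frames, 1, axis=1) + np.roll(frames, -1, axis=1)
           + np.roll(frames, 1, axis=2) + np.roll(frames, -1, axis=2)
           - 4 * frames)
    return np.abs(lap)

def fuse_best_pixels(frames, scores):
    """For each pixel position, keep the pixel from whichever of the
    N frames scores highest there. frames, scores: (N, H, W)."""
    best = np.argmax(scores, axis=0)             # (H, W) winning-frame index
    h, w = best.shape
    return frames[best, np.arange(h)[:, None], np.arange(w)[None, :]]
```

Calling `fuse_best_pixels(frames, local_sharpness(frames))` on a stack of nine aligned exposures yields one image whose every pixel came from the frame judged best at that position — the core idea, if not the sophistication, of what the article describes.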
Another area that should benefit is low-light photography, which Apple advertised with its own take on Night Mode. In low light, Apple will combine the brightness advantage of the long-exposure image with the stability, detail and accuracy of the short-exposure images to create one super-image of sorts — one that illuminates subjects in the dark while avoiding blur. It is in such conditions that Deep Fusion will likely show its powers the most; in daylight, much will depend on whether Apple’s algorithms judge colours correctly.
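One simple way to think about that long/short combination is as a frequency split: take overall brightness (low frequencies) from the long exposure and fine detail (high frequencies) from the averaged short exposures. The sketch below is an assumption about the general technique, not Apple’s actual method; it uses a cheap box blur as the low-pass filter and assumes aligned frames with values in [0, 1].

```python
import numpy as np

def blend_low_light(short_avg, long_exp):
    """Toy long/short exposure blend: brightness from the long-exposure
    frame, fine detail from the averaged short exposures."""
    def blur(img):
        # Cheap 3x3 box blur used as a low-pass filter.
        out = img.copy()
        for ax in (0, 1):
            out = (np.roll(out, 1, axis=ax) + out
                   + np.roll(out, -1, axis=ax)) / 3
        return out

    detail = short_avg - blur(short_avg)   # high-frequency content (edges, texture)
    base = blur(long_exp)                  # bright, low-noise base illumination
    return np.clip(base + detail, 0.0, 1.0)
```

The appeal of this kind of split is that the long exposure’s motion blur mostly lives in the low frequencies it contributes anyway, while the sharp-but-noisy short exposures only contribute the detail layer.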
How does it differ from other techniques?
To be honest, Deep Fusion is not entirely pathbreaking, and not even new. Three years ago, Sony introduced predictive shooting on its erstwhile flagship, the Sony Xperia Z5, which took a total of four shots — three before the shutter was tapped, and one upon pressing it. The idea was to help users capture stable, steady photographs of fast-moving objects, or while moving quickly themselves — a speeding race car on track, say, or a striking roadside tree seen from a fast-moving car.
Sony’s famously faltering interface meant that the feature was sluggish and buggy, and more often than not would freeze the camera app when the shutter was pressed. Hence, despite being one of the first to bring computational photography to smartphones, Sony failed to capitalise on it, just as it failed to with ultra-slow-motion video at 480 and 960fps. Google, on the other hand, played it smarter with burst HDR imaging, using a lightweight interface, a resource-friendly camera app and an excellent ISP to offer 15-shot HDR+ computational photography on the Google Pixel 3. This does pretty much the same “mad science” that Schiller claimed is unique to the iPhone 11 — from before the shutter is pressed to viewing the photograph, it all takes about one second, in which the Pixel 3 camera takes 15 images of the same subject in HDR+ and combines the best pixels from each of the 15 frames into the final photograph.
Frankly speaking, Deep Fusion’s only unique play, from what we can see so far, lies in the additional long-exposure image it captures. The long-exposure frame can give the Neural Engine decent leverage in selecting the correct brightness and luminance levels, especially in dim settings. That said, it will certainly be interesting to see whether Apple’s computational photography really has an edge over what Google does with its algorithms, in terms of shooting speed, overall detail and more.
So, is it a big deal?
For Apple, Deep Fusion is its answer to the much-vaunted Google Pixel camera. The makers of Android have aced computational photography over the past few generations of Pixel smartphones, and while many reports have stated that Apple has fallen behind, the onus is now on the iPhone to show what its Neural Engine and software can do.
For now, however, it is hard to tell whether Deep Fusion is truly a technology worthy of being called “mad science”, or whether it will fade away as another of Apple’s hyperboles.