# Rendering in UE4
Presented at the Gnomon School of VFX in January 2018, part two of the class offers an in-depth look at the rendering pipeline in Unreal Engine, its terminology and best practices for rendering scenes in real-time. This course also presents guidelines and profiling techniques that improve the debugging process for both CPU and GPU performance.

## Index
* 1.Intro

* 2.Before Rendering

* 3.Geometry Rendering

* 4.Rasterizing and Gbuffer

* 5.Dynamic Lighting/Shadows

* 6.Static Lighting/Shadows

* 7.Post Processing

## 1.INTRO


* Everything needs to be as efficient as possible
* Adjust pipelines to engine and hardware restrictions
* Try to offload parts to pre-calculations
* Use the engine’s pool of techniques to achieve quality at suitable cost
* CPU and GPU handle different parts of the rendering calculations
* They are interdependent and can bottleneck each other
* Know how the load is distributed between the two

* The engine renders not only high-quality still images but also interactive, dynamic scenes
* Quality, features, and performance have to be traded off against each other

**Shading techniques**
* Real-time rendering techniques are different from offline rendering
* Expensive ray-tracing features are approximated or pre-calculated
* Depends on projection (rasterization)
* Shading/lighting is mainly done through either deferred or forward shading; UE4 supports both

**Deferred Shading**

1.Composition based using the GBuffer

2.Shading happens in deferred passes

3.Good at rendering dynamic lighting

4.More flexible when it comes to disabling features, less flexible when it comes to surface attributes





**CPU-Game Thread**

Calculate all logic and transforms

* 1.Animations

* 2.Position of models and objects

* 3.Physics

* 4.AI

* 5.Spawn and destroy, hide and unhide

Anything that causes the position of objects to change



**CPU-Draw Thread**

Before we can use the transforms to render the image, we need to know what to include in the rendering

Ignoring this question might make rendering expensive on the GPU

Occlusion process: builds up a list of all visible models/objects

Happens per object, not per triangle

Staged process, in order of execution

* 1.Distance Culling

* 2.Frustum Culling

* 3.Precomputed Visibility

* 4.Occlusion Culling
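
The four stages above can be sketched as a chain of per-object filters. This is only an illustration of the ordering: the `Obj` fields are hypothetical stand-ins for the real tests (bounds against the frustum, a baked visibility lookup, an occlusion query), not engine API.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    distance: float            # distance from the camera
    in_frustum: bool           # stand-in for a bounds-vs-frustum test
    precomputed_visible: bool  # stand-in for a baked visibility lookup
    occluded: bool             # stand-in for an occlusion query result

def visible_objects(objects, max_draw_distance):
    """Run the stages in order; each stage only sees the survivors of the previous one."""
    stages = [
        lambda o: o.distance <= max_draw_distance,  # 1. distance culling
        lambda o: o.in_frustum,                     # 2. frustum culling
        lambda o: o.precomputed_visible,            # 3. precomputed visibility
        lambda o: not o.occluded,                   # 4. occlusion culling
    ]
    survivors = list(objects)
    for keep in stages:
        survivors = [o for o in survivors if keep(o)]
    return [o.name for o in survivors]
```

Note that the work is per object: the more objects enter the chain, the more tests the CPU has to run.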




**Occlusion Performance Implications**

UE4 has a list of models to render

* 1.Set up manual culling (i.e. distance culling, precomputed visibility)

* 2.Even things like particles occlude

* 3.Many small objects cause more stress on the CPU for culling

* 4.Large models will rarely be occluded and thus increase the GPU load

* 5.Know your world and balance objects size vs count






**GPU-Prepass/Early z pass**

The GPU now has a list of models and transforms, but if we just rendered this information out we could cause a lot of redundant pixel rendering

Similar to excluding objects, we need to exclude pixels

We need to figure out which pixels are occluded

To do this, we generate a depth pass and use it to determine whether a given pixel is in front and visible

In short, the z pass decides which pixels get rendered; occluded pixels are skipped.
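
A minimal sketch of the idea, assuming fragments are given as `(x, y, depth, color)` tuples and that a smaller depth means closer to the camera. The first pass only records the nearest depth per pixel; the second pass runs the (notionally expensive) shading only for fragments that match it.

```python
def depth_prepass(fragments, width, height):
    """Record the nearest depth per pixel without shading anything."""
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z, _ in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
    return depth

def shade_with_prepass(fragments, width, height):
    """Shade only the front-most fragment of each pixel."""
    depth = depth_prepass(fragments, width, height)
    image = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, color in fragments:
        if z == depth[y][x]:     # occluded fragments fail this test
            image[y][x] = color  # stand-in for a costly pixel shader
            shaded += 1
    return image, shaded
```

With three fragments of which two land on the same pixel, only two shading operations run instead of three.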





The GPU renders drawcall by drawcall, not triangle by triangle

A drawcall is a group of triangles sharing the same properties

Drawcalls are prepared by the CPU (Draw) thread

This means distilling the rendering info for objects into a GPU state ready for submission

In short, the GPU renders objects per drawcall rather than per triangle, and the CPU stage packages each drawcall into a GPU state for submission.
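
As a rough sketch of "a group of triangles sharing the same properties", the snippet below groups triangles by a `(mesh, material)` key. The choice of key is an assumption made for illustration; the properties an engine actually groups by are not listed in these notes.

```python
from collections import defaultdict

def build_drawcalls(triangles):
    """triangles: list of (mesh, material, tri_id). Returns {(mesh, material): [tri_id, ...]}."""
    drawcalls = defaultdict(list)
    for mesh, material, tri_id in triangles:
        drawcalls[(mesh, material)].append(tri_id)
    return dict(drawcalls)
```

The number of keys in the result, not the number of triangles, is what the CPU has to prepare.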


**UE4 with current gen high-end PCs**

2000-3000 drawcalls is reasonable

More than 5000 is getting high

More than 10000 is probably a problem

On mobile this number is far lower (a few hundred at most)

Drawcall count is determined by the visible objects

Measure with “stat RHI”


**Drawcalls have a huge impact on the CPU(Draw) thread**

Drawcalls have a high overhead for preparing the GPU state

Usually we hit issues with drawcalls well before issues with triangle count

So drawcall problems should be addressed before triangle-count problems.


**Drawcalls Performance Implications**

1. Render your triangles with as few drawcalls as possible

2. Depending on scene setup (drawcalls), 50,000 triangles can run worse than 50 million

3. When optimizing a scene, know your bottleneck (drawcalls vs triangle count)






**Optimizing Drawcalls (Merging objects)**

To lower the drawcall count, it is better to use fewer, larger models than many small ones

You cannot take that too far, because it negatively impacts other things

* a. Occlusion
* b. Lightmapping
* c. Collision calculation
* d. Memory

A good balance between size and count is the best strategy



**Optimizing Drawcalls (Merging guidelines)**

1. Target low poly objects

2. Merge only meshes within the same area

3. Merge only meshes sharing the same material

4. Meshes with no or simple collision are better for merging

5. Distant geometry is usually great to merge (fine with culling)
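
One way to read these guidelines is as a filter followed by a grouping step. The sketch below is hypothetical: "same area" is approximated by a square grid cell of side `cell_size`, and `max_tris` is an arbitrary threshold for "low poly". Neither comes from the talk.

```python
from collections import defaultdict

def merge_candidates(meshes, cell_size, max_tris):
    """meshes: list of (name, material, (x, y), tri_count, has_complex_collision).

    Returns groups of two or more meshes that share a material and a grid cell.
    """
    groups = defaultdict(list)
    for name, material, (x, y), tris, complex_collision in meshes:
        if tris > max_tris or complex_collision:
            continue  # guidelines 1 and 4: skip dense meshes and complex collision
        cell = (int(x // cell_size), int(y // cell_size))  # guideline 2: same area
        groups[(material, cell)].append(name)              # guideline 3: same material
    return {key: names for key, names in groups.items() if len(names) > 1}
```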








**Optimizing Drawcalls (HLODs)**

Hierarchical Level of Detail

* a.Regular LODs mean a model becomes lower poly in the distance
* b.Essentially swaps one object for another, simpler object (fewer materials)
* c.Hierarchical LOD (HLOD) is a bigger version: it merges objects together in the distance to lower the drawcalls

With HLOD, a single combined static mesh replaces multiple static meshes at far viewing distances, lowering the number of drawcalls per frame to improve performance.
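
A small sketch of both ideas, using plain distance thresholds as a simplifying assumption (the metric the engine actually uses to switch LODs is not covered in these notes). `select_lod` picks a per-object LOD index; `drawcalls_for_cluster` shows why an HLOD proxy helps, assuming one drawcall per member mesh and one for the merged proxy.

```python
def select_lod(distance, lod_distances):
    """lod_distances: ascending distances at which LOD1, LOD2, ... take over."""
    lod = 0
    for threshold in lod_distances:
        if distance >= threshold:
            lod += 1
    return lod

def drawcalls_for_cluster(distance, member_count, hlod_distance):
    """Beyond hlod_distance the whole cluster is drawn as a single merged proxy."""
    return 1 if distance >= hlod_distance else member_count
```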

**Instanced Rendering**

* a.Groups objects together into single drawcalls
* b.Grouping needs to be done manually


The strategy is to mix all previous solutions

Some merged content (materials merged)

Some modular content (instanced)

And swappable LODs and HLODs



**Vertex Processing**

The first step in processing a drawcall

The vertex shader takes care of this process

A vertex shader is a small program specialized in vertex processing

Vertex shaders run completely on the GPU, so they are fast

The input is vertex data in 3D space; the output is vertex data in screen space

**Vertex-Shaders-Common tasks**

It converts local vertex positions to world positions

It handles vertex shading/coloring

It can apply additional offsets to vertex positions






Practical examples of world position offset vertex shaders are:


2.Water displacement

3.Foliage wind animation
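
To make the wind example concrete, here is a CPU-side sketch of what such a vertex shader computes per vertex. It assumes a translation-only object transform and a z-up convention, and the sine-based sway is an illustrative formula, not the engine's foliage wind implementation.

```python
import math

def vertex_shader(local_pos, object_pos, time, wind_strength=0.0):
    """Return the world-space position of one vertex, with an optional wind offset."""
    # Local -> world: translation only here; a real transform also rotates and scales.
    x, y, z = (l + o for l, o in zip(local_pos, object_pos))
    # World position offset: sway along x, scaled by local height so the base stays put.
    x += wind_strength * local_pos[2] * math.sin(time)
    return (x, y, z)
```

The offset exists only in the returned position; nothing is written back to the object itself.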



**Vertex Shaders – Drawback**

Vertex shaders do not modify the actual object or affect the scene state; the result is purely a visual effect

The CPU is not aware of what the vertex shaders do

Thus things like physics or collisions will not take it into account




**Vertex shader Performance Implications**
* 1.The more complex the animation performed the slower
* 2.The more vertices affected the slower
* 3.Disable complex vertex shader effects on distant geometry






The GPU is now ready to render pixels

Determining which pixels should be shaded is called rasterizing

Done drawcall by drawcall, then triangle by triangle

Pixel shaders are responsible for calculating the pixel color

Input is generally interpolated vertex data and texture samplers

**Rasterizing inefficiency**

When dense meshes are rasterized at a distance, they converge to only a few pixels

That is a waste of vertex processing

A 100k-triangle object seen from so far away that it is 1 pixel big will only show 1 pixel of its closest triangle!




Due to hardware design, the GPU always uses a 2×2 pixel quad for processing

If a triangle is very small or very thin, it might process 4 pixels while only 1 pixel is actually filled
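
The quad cost can be illustrated with a tiny software rasterizer. This is a sketch under simple assumptions: a pixel counts as covered when its center lies inside the triangle (tested with edge functions), and every 2×2 quad touched by at least one covered pixel costs four shader invocations.

```python
def edge(a, b, p):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def covered_pixels(tri, width, height):
    """Return the set of pixels whose centers fall inside the triangle."""
    a, b, c = tri
    pixels = set()
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:
                pixels.add((x, y))
    return pixels

def quad_overshading(pixels):
    """Return (pixels actually filled, pixels processed in whole 2x2 quads)."""
    quads = {(x // 2, y // 2) for x, y in pixels}
    return len(pixels), 4 * len(quads)
```

A single covered pixel yields `(1, 4)`, and a one-pixel-high strip of four pixels yields `(4, 8)`, which is the overshading that thin triangles cause.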


**Rasterization and Overshading Performance Implications**

1. Triangles are more expensive to render at high density
2. When seen at a distance, the density increases
3. Thus reducing triangle count at a distance (LODs/culling) is critical
4. Very thin triangles are inefficient because they pass through many 2×2 pixel quads yet only fill a fraction of them
5. The more complex the pixel shader is, the more expensive it is to run

In summary: dense triangles are costly, density grows with distance, so reduce the triangle count wherever possible and avoid thin triangles.


Results are written out to:

Multiple GBuffers in the case of deferred shading

A shaded buffer in the case of forward shading


**GBuffer Performance Implications**

The GBuffer takes up a lot of memory and bandwidth, and thus there is a limit on how many different GBuffer images you can render out

GBuffer memory is resolution dependent
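
The resolution dependence is simple arithmetic. The sketch below assumes every GBuffer image uses the same number of bytes per pixel; the count of 5 images at 4 bytes each used in the example is a made-up figure for illustration, not UE4's actual layout.

```python
def gbuffer_bytes(width, height, target_count, bytes_per_pixel=4):
    """Total memory for target_count full-resolution GBuffer images."""
    return width * height * target_count * bytes_per_pixel
```

Under these assumptions, `gbuffer_bytes(1920, 1080, 5)` is about 41.5 MB, and going to 3840×2160 quadruples it, since the pixel count is four times higher.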



Two approaches for lighting and shadows
* Dynamic
* Static

**Lighting(Deferred Shading)**

Is calculated and applied using pixel shaders

Dynamic point lights are rendered as spheres

The spheres act like a mask

Anything within the sphere receives a pixel shader operation to blend in the dynamic light



Light calculation requires position

The depth buffer is used to get each pixel's position in 3D

The normal buffer is used to apply shading, based on the direction between the normal and the light

In short, the depth buffer and the normal buffer are used together to compute the lighting.
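
A per-pixel sketch of that calculation for one point light. It assumes the world position has already been reconstructed from the depth buffer and is passed in directly, and it uses a simple quadratic falloff chosen for illustration rather than the engine's attenuation curve.

```python
import math

def shade_pixel(position, normal, base_color, light_pos, light_color, radius):
    """Diffuse contribution of one point light to one GBuffer pixel."""
    to_light = [l - p for l, p in zip(light_pos, position)]
    dist = math.sqrt(sum(c * c for c in to_light))
    if dist >= radius or dist == 0.0:
        return (0.0, 0.0, 0.0)  # outside the light's sphere (or degenerate): no contribution
    direction = [c / dist for c in to_light]
    # Angle between the surface normal and the direction to the light.
    n_dot_l = max(0.0, sum(n * d for n, d in zip(normal, direction)))
    falloff = (1.0 - dist / radius) ** 2  # assumed falloff, not UE4's curve
    return tuple(b * lc * n_dot_l * falloff for b, lc in zip(base_color, light_color))
```

Pixels outside the sphere return early, which is why a small radius keeps a dynamic light cheap.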






A common technique for rendering shadows is shadow maps

Check for each pixel whether it is visible to the given light or not

Requires rendering depth from the light's point of view

That is, the shadow map is rendered in light view space.
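
A toy version of the technique, assuming a directional light pointing straight down the z axis and integer `(x, y)` cells so that the light-space projection is trivial. The `bias` value is an arbitrary small number used to avoid self-shadowing.

```python
def build_shadow_map(occluder_points, size):
    """Render depth from the light: keep the highest z (closest to the light) per cell."""
    shadow_map = [[float("-inf")] * size for _ in range(size)]
    for x, y, z in occluder_points:
        shadow_map[y][x] = max(shadow_map[y][x], z)
    return shadow_map

def in_shadow(point, shadow_map, bias=1e-3):
    """A point is shadowed if something closer to the light covers its cell."""
    x, y, z = point
    return z < shadow_map[y][x] - bias
```

Every shadow-casting light needs its own depth render, and every triangle of the casters goes through it, which is where the cost comes from.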

**Process Pros/Cons**
* Pros
  * Is rendered in real time using the GBuffer
  * Lights can be changed, moved, or added
  * Does not need any special model preparation
* Cons
  * Shadows in particular are performance heavy



**Quality Pros/Cons**
1. Shadows are heavy on performance, so render quality is usually reduced to compensate
2. Does not do radiosity/global illumination for the majority of content
3. Dynamic soft shadows are very hard to do well; dynamic shadows often look sharp or blocky



**Dynamic Lighting Performance Implications**

1. A small dynamic light is relatively cheap in a deferred renderer
2. The cost comes down to the pixel shader operations, so the more pixels covered, the slower it is
3. The radius must be as small as possible
4. Prevent excessive and regular overlap




**Dynamic Shadows Performance Implications**

1. Turn off shadow casting if not needed
2. The triangle count of the geometry affects shadow performance
3. Fade or toggle off shadows when far away


Dynamic lights and shadows are expensive

Thus part of the work is offloaded to pre-calculation/pre-rendering

This is referred to as static lights and shadows

Lighting data is stored mainly in lightmaps


A lightmap is a texture with the lighting and shadows baked into it

An object usually requires UV lightmap coordinates for this to work

This texture is then multiplied on top of the basecolor
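
The multiply step can be sketched as follows, assuming the lightmap is a 2D list of RGB tuples, UVs are in the 0 to 1 range, and a nearest-texel lookup stands in for real texture filtering.

```python
def apply_lightmap(base_color, lightmap, uv):
    """Multiply the base color by the baked lighting found at the lightmap UV."""
    height, width = len(lightmap), len(lightmap[0])
    u, v = uv
    tx = min(int(u * width), width - 1)    # clamp so u == 1.0 stays in range
    ty = min(int(v * height), height - 1)
    light = lightmap[ty][tx]
    return tuple(b * l for b, l in zip(base_color, light))
```

At runtime this is just one texture read and one multiply per pixel, which is why baked lighting costs the same regardless of how many lights went into the bake.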






Lightmass is a standalone application that handles light rendering, baking to lightmaps, and integrating them into materials

Raytracer supporting GI

Supports distributed rendering over a network

Bake quality is determined by Light Build Quality as well as settings in the Lightmass section of each level

It is better to have a Lightmass Importance Volume around part of the scene



**Process Pros/Cons**

1. Super fast at runtime, but increases memory use
2. Takes a long time to pre-calculate the lighting
3. Each time something is changed, the lighting must be rebuilt
4. Models require lightmap UVs, an additional prep step that takes time



**Quality Pros/Cons**
1. Handles Radiosity and Global Illumination
2. Renders realistic shadows including soft shadows
3. Quality is dependent on lightmap resolution and UV layout
4. May have seams in the lighting due to the UV layout



**Static Lighting Performance Implications**
1. Static Lighting always renders at the same speed
2. Lightmap resolution affects memory and filesize,not framerate
3. Bake times are increased by:
* Lightmap resolutions
* Number of models/lights
* Higher quality settings
* Lights with a large attenuation radius or source radius




Visual effects applied at the very end of the rendering process

Uses the GBuffers to calculate its effects

Once more relies heavily on Pixel Shaders


* Light bloom
* Depth of Field/Blurring
* Some types of lens flares
* Light Shafts
* Vignette
* Tonemapping/Color correction
* Exposure
* Motion Blur


**Post Processing Performance Implications**

Affected directly by final resolution

Affected by shader complexity

Affected by parameters (e.g. DoF blur radius)
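
To show why resolution matters so directly, here is a sketch of a full-screen pass that runs once per pixel. It uses the Reinhard operator `x / (1 + x)` as a simple stand-in for tonemapping; this is not the tonemapper UE4 ships with, and the image is reduced to a 2D list of luminance values for brevity.

```python
def post_process(image, exposure=1.0):
    """Apply exposure and Reinhard tonemapping to every pixel.

    Returns (result, shader_invocations).
    """
    invocations = 0
    result = []
    for row in image:
        out_row = []
        for value in row:
            exposed = value * exposure
            out_row.append(exposed / (1.0 + exposed))  # Reinhard tonemapping
            invocations += 1
        result.append(out_row)
    return result, invocations
```

The invocation count equals the pixel count, so doubling both dimensions of the final image quadruples the work of every post-process pass.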



### Reference video:
Gnomon Masterclass Part II: Rendering in UE4 | Event Coverage | Unreal Engine


