To keep pace with the times and follow Epic's official footsteps, starting from this article:

The UE source code analyzed is upgraded to 4.26.2, no longer 4.25!

The UE source code analyzed is upgraded to 4.26.2, no longer 4.25!!

The UE source code analyzed is upgraded to 4.26.2, no longer 4.25!!!

Important things deserve saying three times. Readers who follow along with the source code, please update accordingly.

In fact, section 4.3.9 PostProcessing of 剖析虚幻渲染体系(04)- 延迟渲染管线 already gave a rough introduction to post-processing; this article explains the post-processing flow and its main techniques in much greater depth and detail.

More concretely, this article covers the following aspects of UE:

  • The main post-processing flow
  • The post-processing sequence
  • Key post-processing techniques

Still, it is recommended to read 剖析虚幻渲染体系(04)- 延迟渲染管线 before this article, to keep a step-by-step pace in learning UE's rendering system. It is also recommended to first read the author's article 2.1 色彩理论; with a clear understanding of color spaces, color theory, linear space, and gamma correction, the transition into this article will be much smoother.

The source code covered in this article mainly lives in:

  • Engine\Source\Runtime\Renderer\Private\PostProcess\
  • Engine\Shaders\Private\

Some readers may wonder: isn't post-processing just manipulating the finished image after rendering? Is it really that tightly coupled with graphics rendering? Can one skip learning it?

To answer these questions, and to show the importance of post-processing and its role and standing in UE and in graphics rendering in general, this small section was written.

Take UE's default scene as an example; it looks like this:

Now disable all post-processing with the following console command:

  ShowFlag.PostProcessing 0

The picture then becomes this:

Capturing a frame with RenderDoc reveals that the image above still has the gamma correction of the post-processing stage applied. Fine, turn that off as well, which yields a picture with no post-processing at all:

Compare it with the first image. See the difference? The brightness, contrast, colors, and even the aliasing all differ, don't they?

This confirms that even if you never touch any post-processing setting, UE still runs many post-processing passes by default; only then does the rendered image finally show up correctly on screen.

From this we can see how important post-processing is to rendering and to UE. With post-processing we gain wings: it is the finishing touch that makes the picture more believable, vivid, and interesting.

In fact, the applications of post-processing go far beyond this. Combined with screen-space information such as depth and normals, a much wider and richer world of magic opens up.

 

This chapter covers basic post-processing concepts and how to operate them in UE.

With the post-processing effects that Unreal Engine provides, artists and designers can adjust the overall look and feel of a scene.

By default, UE enables post-processing such as anti-aliasing, auto exposure, bloom, tone mapping, and gamma correction:

Of course, individual post-processing effects can be toggled at runtime via the viewport's Show/Post Processing menu, to observe the effect and changes each one produces in the scene.

They can also be toggled with the console commands mentioned earlier.

For artists, a more common and convenient approach is to drag a Post Processing Volume into the scene, which gives precise control over post-processing effects and parameters.

A post-process volume involves a large number of types and parameters; below are its property categories:

Among them, Lens holds the lens-related effects, including Bloom, Exposure, Flares, and DOF. Color Grading is color grading, including White Balance plus Global, Shadows, Midtones, and Highlights adjustments. Film is the filmic tone mapping, with curve parameters such as Slope, Toe, Black Clip, Shoulder, and White Clip. Rendering Features contains effects tied to the rendering pipeline: post-process materials, ambient cubemap, AO, ray-tracing features, GI, motion blur, LPV, reflections, SSR, translucency, path tracing, and screen percentage. Finally, Post Process Volume Settings lets you specify the priority, the blend weight, and whether the volume has infinite extent (picture above).

A scene may contain multiple post-process volumes at once, but for performance and maintainability, a scene should keep only one global volume (Infinite Extent) and make all the others local in scope.
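These volume-level controls are plain fields on APostProcessVolume and can also be set from C++. A minimal sketch, assuming Volume is a valid APostProcessVolume* obtained elsewhere:

  // Minimal sketch, assuming Volume is a valid APostProcessVolume*:
  Volume->bUnbound = false;    // local volume; true makes it infinite extent (keep only one such global volume)
  Volume->Priority = 10.0f;    // higher priority wins where volumes overlap
  Volume->BlendWeight = 1.0f;  // 0..1, how strongly this volume's settings apply
  Volume->Settings.bOverride_BloomIntensity = true; // each setting requires its bOverride_* flag
  Volume->Settings.BloomIntensity = 2.0f;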

Although post-process volumes provide many built-in effects, rendering needs vary endlessly and the built-ins can never satisfy every real requirement. UE's Post Process Material fills the customization gap: custom post-processing effects can be implemented with the material editor.

Adding a post-process material is straightforward: create a new material and set its Material Domain to Post Process, at which point only the Emissive Color pin remains active:

And the Post Process Material property panel becomes editable:

These parameters mean the following:

  • Blendable Location: where the material blends into the pipeline. Options are After Tonemapping, Before Tonemapping, Before Translucency, Replacing Tonemapping, and SSR Input; the default is After Tonemapping.
  • Output Alpha: whether to output alpha. If enabled, the alpha channel of Emissive Color must be handled and output correctly. Off by default.
  • Blendable Priority: blend priority. Higher values render later (i.e. lower priorities render first). Default is 0.
  • Is Blendable: whether the material is blendable. If so, the parameters of all materials (or material instances) sharing the same parent material are pre-interpolated and blended on the C++ side. On by default.
  • Enable Stencil Test: whether to enable the stencil test. If enabled, the comparison function and reference value can be set. Off by default.

After the post-process material is authored, apply it to the scene by adding it to the Post Process Materials list under the Rendering Features category of a post-process volume:

The blend weight and order of each material can also be adjusted (drag the dot grid to the left of the weight).
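The same list can also be filled from C++ via FPostProcessSettings::WeightedBlendables. A minimal sketch, assuming Volume is a valid APostProcessVolume* and Material is a post-process UMaterialInterface*:

  // Minimal sketch: register the material with a blend weight of 0.75.
  Volume->Settings.WeightedBlendables.Array.Add(FWeightedBlendable(0.75f, Material));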

Note that in a post-process material, the SceneColor of the SceneTexture material node cannot be accessed; doing so raises an error:

Accessing SceneColor in a post-process material fails, with a message that SceneColor can only be used in the Surface material domain.

The fix is to select the SceneTexture node and choose PostProcessInput0 as the Scene Texture Id in the property panel:

Besides PostProcessInput0, many other screen-space buffers (the GBuffer) can be read by post-process materials:

However, in most post-processing passes, PostProcessInput1~PostProcessInput6 are empty textures.

 

This chapter dives into UE's post-processing code.

The main entry point of post-processing is AddPostProcessingPasses, invoked near the end of FDeferredShadingSceneRenderer::Render:

void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
{
    (......)

    RenderTranslucency(RHICmdList, ...);

    (......)

    // Post-processing stage.
    if (ViewFamily.bResolveScene)
    {
        GRenderTargetPool.AddPhaseEvent(TEXT("PostProcessing"));

        (......)

        // Input parameters for post-processing.
        FPostProcessingInputs PostProcessingInputs;
        PostProcessingInputs.ViewFamilyTexture = ViewFamilyTexture;
        PostProcessingInputs.SeparateTranslucencyTextures = &SeparateTranslucencyTextures;
        PostProcessingInputs.SceneTextures = SceneTextures;

        (......)

        {
            // Iterate over all views and add post-processing passes for each of them.
            for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
            {
                FViewInfo& View = Views[ViewIndex];
                // Add the post-processing passes.
                AddPostProcessingPasses(GraphBuilder, View, PostProcessingInputs);
            }
        }

        // Clear the scene color texture of the scene render target context.
        AddPass(GraphBuilder, [this, &SceneContext](FRHICommandListImmediate&)
        {
            SceneContext.SetSceneColor(nullptr);
        });
    }

    (......)
}

AddPostProcessingPasses handles UE's built-in post-processing sequence. It involves a large amount of code, so let's first analyze its main flow:

// Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessing.cpp

void AddPostProcessingPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessingInputs& Inputs)
{
    Inputs.Validate();

    // Fetch texture and view data.
    const FIntRect PrimaryViewRect = View.ViewRect;
    const FSceneTextureParameters SceneTextureParameters = GetSceneTextureParameters(GraphBuilder, Inputs.SceneTextures);
    const FScreenPassRenderTarget ViewFamilyOutput = FScreenPassRenderTarget::CreateViewFamilyOutput(Inputs.ViewFamilyTexture, View);
    const FScreenPassTexture SceneDepth(SceneTextureParameters.SceneDepthTexture, PrimaryViewRect);
    const FScreenPassTexture SeparateTranslucency(Inputs.SeparateTranslucencyTextures->GetColorForRead(GraphBuilder), PrimaryViewRect);
    const FScreenPassTexture CustomDepth((*Inputs.SceneTextures)->CustomDepthTexture, PrimaryViewRect);
    const FScreenPassTexture Velocity(SceneTextureParameters.GBufferVelocityTexture, PrimaryViewRect);
    const FScreenPassTexture BlackDummy(GSystemTextures.GetBlackDummy(GraphBuilder));

    // Scene color.
    FScreenPassTexture SceneColor((*Inputs.SceneTextures)->SceneColorTexture, PrimaryViewRect);
    FScreenPassTexture SceneColorBeforeTonemap;
    FScreenPassTexture SceneColorAfterTonemap;
    const FScreenPassTexture OriginalSceneColor = SceneColor;

    // Initialize eye-adaptation and histogram textures.
    const FEyeAdaptationParameters EyeAdaptationParameters = GetEyeAdaptationParameters(View, ERHIFeatureLevel::SM5);
    FRDGTextureRef LastEyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);
    FRDGTextureRef EyeAdaptationTexture = LastEyeAdaptationTexture;
    FRDGTextureRef HistogramTexture = BlackDummy.Texture;

    // Resolve the post-processing enable flags.
    const FEngineShowFlags& EngineShowFlags = View.Family->EngineShowFlags;
    const bool bVisualizeHDR = EngineShowFlags.VisualizeHDR;
    const bool bViewFamilyOutputInHDR = GRHISupportsHDROutput && IsHDREnabled();
    const bool bVisualizeGBufferOverview = IsVisualizeGBufferOverviewEnabled(View);
    const bool bVisualizeGBufferDumpToFile = IsVisualizeGBufferDumpToFileEnabled(View);
    const bool bVisualizeGBufferDumpToPIpe = IsVisualizeGBufferDumpToPipeEnabled(View);
    const bool bOutputInHDR = IsPostProcessingOutputInHDR();
    const FPaniniProjectionConfig PaniniConfig(View);

    // The dedicated post-processing passes.
    enum class EPass : uint32
    {
        MotionBlur,                          // Motion blur
        Tonemap,                             // Tone mapping
        FXAA,                                // FXAA anti-aliasing
        PostProcessMaterialAfterTonemapping, // Post-process materials after tone mapping
        VisualizeDepthOfField,
        VisualizeStationaryLightOverlap,
        VisualizeLightCulling,
        SelectionOutline,
        EditorPrimitive,
        VisualizeShadingModels,
        VisualizeGBufferHints,
        VisualizeSubsurface,
        VisualizeGBufferOverview,
        VisualizeHDR,
        PixelInspector,
        HMDDistortion,
        HighResolutionScreenshotMask,
        PrimaryUpscale,                      // Primary upscale
        SecondaryUpscale,                    // Secondary upscale
        MAX
    };

    (......)

    // Names of the dedicated post-processing passes.
    const TCHAR* PassNames[] =
    {
        TEXT("MotionBlur"),
        TEXT("Tonemap"),
        TEXT("FXAA"),
        TEXT("PostProcessMaterial (AfterTonemapping)"),
        TEXT("VisualizeDepthOfField"),
        TEXT("VisualizeStationaryLightOverlap"),
        TEXT("VisualizeLightCulling"),
        TEXT("SelectionOutline"),
        TEXT("EditorPrimitive"),
        TEXT("VisualizeShadingModels"),
        TEXT("VisualizeGBufferHints"),
        TEXT("VisualizeSubsurface"),
        TEXT("VisualizeGBufferOverview"),
        TEXT("VisualizeHDR"),
        TEXT("PixelInspector"),
        TEXT("HMDDistortion"),
        TEXT("HighResolutionScreenshotMask"),
        TEXT("PrimaryUpscale"),
        TEXT("SecondaryUpscale")
    };

    static_assert(static_cast<uint32>(EPass::MAX) == UE_ARRAY_COUNT(PassNames), "EPass does not match PassNames.");

    // Declare the PassSequence instance for the post-processing sequence.
    TOverridePassSequence<EPass> PassSequence(ViewFamilyOutput);
    PassSequence.SetNames(PassNames, UE_ARRAY_COUNT(PassNames));

    // Enable or disable the relevant passes.
    PassSequence.SetEnabled(EPass::VisualizeStationaryLightOverlap, EngineShowFlags.StationaryLightOverlap);
    PassSequence.SetEnabled(EPass::VisualizeLightCulling, EngineShowFlags.VisualizeLightCulling);
    PassSequence.SetEnabled(EPass::SelectionOutline, false);
    PassSequence.SetEnabled(EPass::EditorPrimitive, false);
    PassSequence.SetEnabled(EPass::VisualizeShadingModels, EngineShowFlags.VisualizeShadingModels);
    PassSequence.SetEnabled(EPass::VisualizeGBufferHints, EngineShowFlags.GBufferHints);
    PassSequence.SetEnabled(EPass::VisualizeSubsurface, EngineShowFlags.VisualizeSSS);
    PassSequence.SetEnabled(EPass::VisualizeGBufferOverview, bVisualizeGBufferOverview || bVisualizeGBufferDumpToFile || bVisualizeGBufferDumpToPIpe);
    PassSequence.SetEnabled(EPass::VisualizeHDR, EngineShowFlags.VisualizeHDR);
    PassSequence.SetEnabled(EPass::PixelInspector, false);
    PassSequence.SetEnabled(EPass::HMDDistortion, EngineShowFlags.StereoRendering && EngineShowFlags.HMDDistortion);
    PassSequence.SetEnabled(EPass::HighResolutionScreenshotMask, IsHighResolutionScreenshotMaskEnabled(View));
    PassSequence.SetEnabled(EPass::PrimaryUpscale, PaniniConfig.IsEnabled() || (View.PrimaryScreenPercentageMethod == EPrimaryScreenPercentageMethod::SpatialUpscale && PrimaryViewRect.Size() != View.GetSecondaryViewRectSize()));
    PassSequence.SetEnabled(EPass::SecondaryUpscale, View.RequiresSecondaryUpscale());

    (......)

    if (IsPostProcessingEnabled(View)) // The view has post-processing enabled.
    {
        const EStereoscopicPass StereoPass = View.StereoPass;

        // Prepare data and flags.
        const bool bPrimaryView = IStereoRendering::IsAPrimaryView(View);
        const bool bHasViewState = View.ViewState != nullptr;
        const bool bDepthOfFieldEnabled = DiaphragmDOF::IsEnabled(View);
        const bool bVisualizeDepthOfField = bDepthOfFieldEnabled && EngineShowFlags.VisualizeDOF;
        const bool bVisualizeMotionBlur = IsVisualizeMotionBlurEnabled(View);

        const EAutoExposureMethod AutoExposureMethod = GetAutoExposureMethod(View);
        const EAntiAliasingMethod AntiAliasingMethod = !bVisualizeDepthOfField ? View.AntiAliasingMethod : AAM_None;
        const EDownsampleQuality DownsampleQuality = GetDownsampleQuality();
        const EPixelFormat DownsampleOverrideFormat = PF_FloatRGB;

        const bool bMotionBlurEnabled = !bVisualizeMotionBlur && IsMotionBlurEnabled(View);
        const bool bTonemapEnabled = !bVisualizeMotionBlur;
        const bool bTonemapOutputInHDR = View.Family->SceneCaptureSource == SCS_FinalColorHDR || View.Family->SceneCaptureSource == SCS_FinalToneCurveHDR || bOutputInHDR || bViewFamilyOutputInHDR;
        const bool bEyeAdaptationEnabled = bHasViewState && bPrimaryView;
        const bool bHistogramEnabled = bVisualizeHDR || (bEyeAdaptationEnabled && AutoExposureMethod == EAutoExposureMethod::AEM_Histogram && View.FinalPostProcessSettings.AutoExposureMinBrightness < View.FinalPostProcessSettings.AutoExposureMaxBrightness);
        const bool bBloomEnabled = View.FinalPostProcessSettings.BloomIntensity > 0.0f;

        // Post-process materials blended after tone mapping.
        const FPostProcessMaterialChain PostProcessMaterialAfterTonemappingChain = GetPostProcessMaterialChain(View, BL_AfterTonemapping);

        PassSequence.SetEnabled(EPass::MotionBlur, bVisualizeMotionBlur || bMotionBlurEnabled);
        PassSequence.SetEnabled(EPass::Tonemap, bTonemapEnabled);
        PassSequence.SetEnabled(EPass::FXAA, AntiAliasingMethod == AAM_FXAA);
        PassSequence.SetEnabled(EPass::PostProcessMaterialAfterTonemapping, PostProcessMaterialAfterTonemappingChain.Num() != 0);
        PassSequence.SetEnabled(EPass::VisualizeDepthOfField, bVisualizeDepthOfField);

        // Plugin post-processing callbacks.
        for (int32 ViewExt = 0; ViewExt < View.Family->ViewExtensions.Num(); ++ViewExt)
        {
            for (int32 SceneViewPassId = 0; SceneViewPassId != static_cast<int>(ISceneViewExtension::EPostProcessingPass::MAX); SceneViewPassId++)
            {
                ISceneViewExtension::EPostProcessingPass SceneViewPass = static_cast<ISceneViewExtension::EPostProcessingPass>(SceneViewPassId);
                EPass PostProcessingPass = TranslatePass(SceneViewPass);

                View.Family->ViewExtensions[ViewExt]->SubscribeToPostProcessingPass(
                    SceneViewPass,
                    PassSequence.GetAfterPassCallbacks(PostProcessingPass),
                    PassSequence.IsEnabled(PostProcessingPass));
            }
        }

        // Enabling/disabling of the post-processing sequence is complete.
        PassSequence.Finalize();

        // Post-process material chain - Before Translucency.
        {
            const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTranslucency);

            if (MaterialChain.Num())
            {
                SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
            }
        }

        // Diaphragm DOF.
        {
            FRDGTextureRef LocalSceneColorTexture = SceneColor.Texture;

            if (bDepthOfFieldEnabled)
            {
                LocalSceneColorTexture = DiaphragmDOF::AddPasses(GraphBuilder, SceneTextureParameters, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);
            }

            if (LocalSceneColorTexture == SceneColor.Texture)
            {
                LocalSceneColorTexture = AddSeparateTranslucencyCompositionPass(GraphBuilder, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);
            }

            SceneColor.Texture = LocalSceneColorTexture;
        }

        // Post-process material chain - Before Tonemapping.
        {
            const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTonemapping);

            if (MaterialChain.Num())
            {
                SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
            }
        }

        FScreenPassTexture HalfResolutionSceneColor;

        // Secondary view rect, initialized to the primary view rect.
        FIntRect SecondaryViewRect = PrimaryViewRect;

        // Temporal anti-aliasing (TAA).
        if (AntiAliasingMethod == AAM_TemporalAA)
        {
            // Whether downsampling of the scene color is allowed.
            const bool bAllowSceneDownsample =
                IsTemporalAASceneDownsampleAllowed(View) &&
                // We can only merge if the normal downsample pass would happen immediately after.
                !bMotionBlurEnabled && !bVisualizeMotionBlur &&
                // TemporalAA is only able to match the low quality mode (box filter).
                GetDownsampleQuality() == EDownsampleQuality::Low;

            int32 UpscaleMode = ITemporalUpscaler::GetTemporalUpscalerMode();

            const ITemporalUpscaler* DefaultTemporalUpscaler = ITemporalUpscaler::GetDefaultTemporalUpscaler();
            const ITemporalUpscaler* UpscalerToUse = (UpscaleMode == 0 || !View.Family->GetTemporalUpscalerInterface()) ? DefaultTemporalUpscaler : View.Family->GetTemporalUpscalerInterface();

            const TCHAR* UpscalerName = UpscalerToUse->GetDebugName();

            (......)

            ITemporalUpscaler::FPassInputs UpscalerPassInputs;

            UpscalerPassInputs.bAllowDownsampleSceneColor = bAllowSceneDownsample;
            UpscalerPassInputs.DownsampleOverrideFormat = DownsampleOverrideFormat;
            UpscalerPassInputs.SceneColorTexture = SceneColor.Texture;
            UpscalerPassInputs.SceneDepthTexture = SceneDepth.Texture;
            UpscalerPassInputs.SceneVelocityTexture = Velocity.Texture;
            UpscalerPassInputs.EyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);

            // Add the TAA passes.
            UpscalerToUse->AddPasses(
                GraphBuilder,
                View,
                UpscalerPassInputs,
                &SceneColor.Texture,
                &SecondaryViewRect,
                &HalfResolutionSceneColor.Texture,
                &HalfResolutionSceneColor.ViewRect);
        }
        // Screen-space reflections (SSR).
        else if (ShouldRenderScreenSpaceReflections(View))
        {
            if (!View.bStatePrevViewInfoIsReadOnly)
            {
                check(View.ViewState);
                FTemporalAAHistory& OutputHistory = View.ViewState->PrevFrameViewInfo.TemporalAAHistory;
                GraphBuilder.QueueTextureExtraction(SceneColor.Texture, &OutputHistory.RT[0]);

                FTAAPassParameters TAAInputs(View);
                TAAInputs.SceneColorInput = SceneColor.Texture;
                TAAInputs.SetupViewRect(View);
                OutputHistory.ViewportRect = TAAInputs.OutputViewRect;
                OutputHistory.ReferenceBufferSize = TAAInputs.GetOutputExtent() * TAAInputs.ResolutionDivisor;
            }
        }

        // The scene color's view rect becomes the secondary view rect.
        SceneColor.ViewRect = SecondaryViewRect;

        // Post-process material chain - SSR Input.
        if (View.ViewState && !View.bStatePrevViewInfoIsReadOnly)
        {
            const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_SSRInput);

            if (MaterialChain.Num())
            {
                // Save the SSR post-process output for use in the next frame.
                FScreenPassTexture PassOutput = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);
                GraphBuilder.QueueTextureExtraction(PassOutput.Texture, &View.ViewState->PrevFrameViewInfo.CustomSSRInput);
            }
        }

        // Motion blur.
        if (PassSequence.IsEnabled(EPass::MotionBlur))
        {
            FMotionBlurInputs PassInputs;
            PassSequence.AcceptOverrideIfLastPass(EPass::MotionBlur, PassInputs.OverrideOutput);
            PassInputs.SceneColor = SceneColor;
            PassInputs.SceneDepth = SceneDepth;
            PassInputs.SceneVelocity = Velocity;
            PassInputs.Quality = GetMotionBlurQuality();
            PassInputs.Filter = GetMotionBlurFilter();

            if (bVisualizeMotionBlur)
            {
                SceneColor = AddVisualizeMotionBlurPass(GraphBuilder, View, PassInputs);
            }
            else
            {
                SceneColor = AddMotionBlurPass(GraphBuilder, View, PassInputs);
            }
        }

        SceneColor = AddAfterPass(EPass::MotionBlur, SceneColor);

        // If TAA did not downsample the scene color, do it here at half resolution.
        if (!HalfResolutionSceneColor.Texture)
        {
            FDownsamplePassInputs PassInputs;
            PassInputs.Name = TEXT("HalfResolutionSceneColor");
            PassInputs.SceneColor = SceneColor;
            PassInputs.Quality = DownsampleQuality;
            PassInputs.FormatOverride = DownsampleOverrideFormat;

            HalfResolutionSceneColor = AddDownsamplePass(GraphBuilder, View, PassInputs);
        }

        // Store the half-resolution scene color in the history.
        extern int32 GSSRHalfResSceneColor;
        if (ShouldRenderScreenSpaceReflections(View) && !View.bStatePrevViewInfoIsReadOnly && GSSRHalfResSceneColor)
        {
            check(View.ViewState);
            GraphBuilder.QueueTextureExtraction(HalfResolutionSceneColor.Texture, &View.ViewState->PrevFrameViewInfo.HalfResTemporalAAHistory);
        }

        FSceneDownsampleChain SceneDownsampleChain;

        // Histogram.
        if (bHistogramEnabled)
        {
            HistogramTexture = AddHistogramPass(GraphBuilder, View, EyeAdaptationParameters, HalfResolutionSceneColor, LastEyeAdaptationTexture);
        }

        // Eye adaptation (auto exposure).
        if (bEyeAdaptationEnabled)
        {
            const bool bBasicEyeAdaptationEnabled = bEyeAdaptationEnabled && (AutoExposureMethod == EAutoExposureMethod::AEM_Basic);

            if (bBasicEyeAdaptationEnabled)
            {
                const bool bLogLumaInAlpha = true;
                SceneDownsampleChain.Init(GraphBuilder, View, EyeAdaptationParameters, HalfResolutionSceneColor, DownsampleQuality, bLogLumaInAlpha);

                // Use the alpha channel in the last downsample (smallest) to compute eye adaptations values.
                EyeAdaptationTexture = AddBasicEyeAdaptationPass(GraphBuilder, View, EyeAdaptationParameters, SceneDownsampleChain.GetLastTexture(), LastEyeAdaptationTexture);
            }
            // Add histogram eye adaptation pass even if no histogram exists to support the manual clamping mode.
            else
            {
                EyeAdaptationTexture = AddHistogramEyeAdaptationPass(GraphBuilder, View, EyeAdaptationParameters, HistogramTexture);
            }
        }

        FScreenPassTexture Bloom;

        // Bloom.
        if (bBloomEnabled)
        {
            FSceneDownsampleChain BloomDownsampleChain;

            FBloomInputs PassInputs;
            PassInputs.SceneColor = SceneColor;

            const bool bBloomThresholdEnabled = View.FinalPostProcessSettings.BloomThreshold > -1.0f;

            // Reuse the main scene downsample chain if a threshold isn't required for bloom.
            if (SceneDownsampleChain.IsInitialized() && !bBloomThresholdEnabled)
            {
                PassInputs.SceneDownsampleChain = &SceneDownsampleChain;
            }
            else
            {
                FScreenPassTexture DownsampleInput = HalfResolutionSceneColor;

                if (bBloomThresholdEnabled)
                {
                    const float BloomThreshold = View.FinalPostProcessSettings.BloomThreshold;

                    FBloomSetupInputs SetupPassInputs;
                    SetupPassInputs.SceneColor = DownsampleInput;
                    SetupPassInputs.EyeAdaptationTexture = EyeAdaptationTexture;
                    SetupPassInputs.Threshold = BloomThreshold;

                    DownsampleInput = AddBloomSetupPass(GraphBuilder, View, SetupPassInputs);
                }

                const bool bLogLumaInAlpha = false;
                BloomDownsampleChain.Init(GraphBuilder, View, EyeAdaptationParameters, DownsampleInput, DownsampleQuality, bLogLumaInAlpha);

                PassInputs.SceneDownsampleChain = &BloomDownsampleChain;
            }

            FBloomOutputs PassOutputs = AddBloomPass(GraphBuilder, View, PassInputs);
            SceneColor = PassOutputs.SceneColor;
            Bloom = PassOutputs.Bloom;

            FScreenPassTexture LensFlares = AddLensFlaresPass(GraphBuilder, View, Bloom, *PassInputs.SceneDownsampleChain);

            if (LensFlares.IsValid())
            {
                Bloom = LensFlares;
            }
        }

        if (!Bloom.IsValid())
        {
            Bloom = BlackDummy;
        }

        SceneColorBeforeTonemap = SceneColor;

        // Tone mapping.
        if (PassSequence.IsEnabled(EPass::Tonemap))
        {
            const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_ReplacingTonemapper);

            if (MaterialChain.Num())
            {
                const UMaterialInterface* HighestPriorityMaterial = MaterialChain[0];

                FPostProcessMaterialInputs PassInputs;
                PassSequence.AcceptOverrideIfLastPass(EPass::Tonemap, PassInputs.OverrideOutput);
                PassInputs.SetInput(EPostProcessMaterialInput::SceneColor, SceneColor);
                PassInputs.SetInput(EPostProcessMaterialInput::SeparateTranslucency, SeparateTranslucency);
                PassInputs.SetInput(EPostProcessMaterialInput::CombinedBloom, Bloom);
                PassInputs.SceneTextures = GetSceneTextureShaderParameters(Inputs.SceneTextures);
                PassInputs.CustomDepthTexture = CustomDepth.Texture;

                SceneColor = AddPostProcessMaterialPass(GraphBuilder, View, PassInputs, HighestPriorityMaterial);
            }
            else
            {
                FRDGTextureRef ColorGradingTexture = nullptr;

                if (bPrimaryView)
                {
                    ColorGradingTexture = AddCombineLUTPass(GraphBuilder, View);
                }
                // We can re-use the color grading texture from the primary view.
                else if (View.GetTonemappingLUT())
                {
                    ColorGradingTexture = TryRegisterExternalTexture(GraphBuilder, View.GetTonemappingLUT());
                }
                else
                {
                    const FViewInfo* PrimaryView = static_cast<const FViewInfo*>(View.Family->Views[0]);
                    ColorGradingTexture = TryRegisterExternalTexture(GraphBuilder, PrimaryView->GetTonemappingLUT());
                }

                FTonemapInputs PassInputs;
                PassSequence.AcceptOverrideIfLastPass(EPass::Tonemap, PassInputs.OverrideOutput);
                PassInputs.SceneColor = SceneColor;
                PassInputs.Bloom = Bloom;
                PassInputs.EyeAdaptationTexture = EyeAdaptationTexture;
                PassInputs.ColorGradingTexture = ColorGradingTexture;
                PassInputs.bWriteAlphaChannel = AntiAliasingMethod == AAM_FXAA || IsPostProcessingWithAlphaChannelSupported();
                PassInputs.bOutputInHDR = bTonemapOutputInHDR;

                SceneColor = AddTonemapPass(GraphBuilder, View, PassInputs);
            }
        }

        SceneColor = AddAfterPass(EPass::Tonemap, SceneColor);

        SceneColorAfterTonemap = SceneColor;

        // FXAA anti-aliasing.
        if (PassSequence.IsEnabled(EPass::FXAA))
        {
            FFXAAInputs PassInputs;
            PassSequence.AcceptOverrideIfLastPass(EPass::FXAA, PassInputs.OverrideOutput);
            PassInputs.SceneColor = SceneColor;
            PassInputs.Quality = GetFXAAQuality();

            SceneColor = AddFXAAPass(GraphBuilder, View, PassInputs);
        }

        SceneColor = AddAfterPass(EPass::FXAA, SceneColor);

        // Post-process material chain - After Tonemapping.
        if (PassSequence.IsEnabled(EPass::PostProcessMaterialAfterTonemapping))
        {
            FPostProcessMaterialInputs PassInputs = GetPostProcessMaterialInputs(SceneColor);
            PassSequence.AcceptOverrideIfLastPass(EPass::PostProcessMaterialAfterTonemapping, PassInputs.OverrideOutput);
            PassInputs.SetInput(EPostProcessMaterialInput::PreTonemapHDRColor, SceneColorBeforeTonemap);
            PassInputs.SetInput(EPostProcessMaterialInput::PostTonemapHDRColor, SceneColorAfterTonemap);
            PassInputs.SceneTextures = GetSceneTextureShaderParameters(Inputs.SceneTextures);

            SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, PassInputs, PostProcessMaterialAfterTonemappingChain);
        }

        (......)

        SceneColor = AddAfterPass(EPass::VisualizeDepthOfField, SceneColor);
    }
    else // Post-processing disabled for the view: minimize the sequence to translucency composition and gamma correction only.
    {
        PassSequence.SetEnabled(EPass::MotionBlur, false);
        PassSequence.SetEnabled(EPass::Tonemap, true);
        PassSequence.SetEnabled(EPass::FXAA, false);
        PassSequence.SetEnabled(EPass::PostProcessMaterialAfterTonemapping, false);
        PassSequence.SetEnabled(EPass::VisualizeDepthOfField, false);
        PassSequence.Finalize();

        SceneColor.Texture = AddSeparateTranslucencyCompositionPass(GraphBuilder, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);

        SceneColorBeforeTonemap = SceneColor;

        if (PassSequence.IsEnabled(EPass::Tonemap))
        {
            FTonemapInputs PassInputs;
            PassSequence.AcceptOverrideIfLastPass(EPass::Tonemap, PassInputs.OverrideOutput);
            PassInputs.SceneColor = SceneColor;
            PassInputs.EyeAdaptationTexture = EyeAdaptationTexture;
            PassInputs.bOutputInHDR = bViewFamilyOutputInHDR;
            PassInputs.bGammaOnly = true;

            SceneColor = AddTonemapPass(GraphBuilder, View, PassInputs);
        }

        SceneColor = AddAfterPass(EPass::Tonemap, SceneColor);

        SceneColorAfterTonemap = SceneColor;
    }

    // Visualization post-processing passes.
    if (PassSequence.IsEnabled(EPass::VisualizeStationaryLightOverlap))
    {
        (......)

        SceneColor = AddVisualizeComplexityPass(GraphBuilder, View, PassInputs);
    }

    (......) // Editor-only and visualization code omitted.

    // Primary upscale pass.
    if (PassSequence.IsEnabled(EPass::PrimaryUpscale))
    {
        FUpscaleInputs PassInputs;
        PassSequence.AcceptOverrideIfLastPass(EPass::PrimaryUpscale, PassInputs.OverrideOutput);
        PassInputs.SceneColor = SceneColor;
        PassInputs.Method = GetUpscaleMethod();
        PassInputs.Stage = PassSequence.IsEnabled(EPass::SecondaryUpscale) ? EUpscaleStage::PrimaryToSecondary : EUpscaleStage::PrimaryToOutput;

        // Panini projection is handled by the primary upscale pass.
        PassInputs.PaniniConfig = PaniniConfig;

        SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
    }

    // Secondary upscale pass.
    if (PassSequence.IsEnabled(EPass::SecondaryUpscale))
    {
        FUpscaleInputs PassInputs;
        PassSequence.AcceptOverrideIfLastPass(EPass::SecondaryUpscale, PassInputs.OverrideOutput);
        PassInputs.SceneColor = SceneColor;
        PassInputs.Method = View.Family->SecondaryScreenPercentageMethod == ESecondaryScreenPercentageMethod::LowerPixelDensitySimulation ? EUpscaleMethod::SmoothStep : EUpscaleMethod::Nearest;
        PassInputs.Stage = EUpscaleStage::SecondaryToOutput;

        SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
    }
}

AddPostProcessingPasses first declares a TOverridePassSequence instance and enables or disables its passes as needed. It then branches on whether the view has post-processing enabled: if it does, each post-processing effect is processed in sequence order; if not, only a minimal post-processing sequence remains, containing just the translucency composition and gamma correction.

Whether a view has post-processing enabled is decided by the following function:

bool IsPostProcessingEnabled(const FViewInfo& View)
{
    if (View.GetFeatureLevel() >= ERHIFeatureLevel::SM5) // Devices at SM5 or above.
    {
        return
            // The view family has post-processing enabled,
            View.Family->EngineShowFlags.PostProcessing &&
            // and no visualization/debug mode is active.
            !View.Family->EngineShowFlags.VisualizeDistanceFieldAO &&
            !View.Family->EngineShowFlags.VisualizeShadingModels &&
            !View.Family->EngineShowFlags.VisualizeMeshDistanceFields &&
            !View.Family->EngineShowFlags.VisualizeGlobalDistanceField &&
            !View.Family->EngineShowFlags.ShaderComplexity;
    }
    // Devices below SM5.
    else
    {
        // Post-processing is enabled, shader complexity visualization is off, and the mobile HDR pipeline is in use.
        return View.Family->EngineShowFlags.PostProcessing && !View.Family->EngineShowFlags.ShaderComplexity && IsMobileHDR();
    }
}

These dedicated post-processing passes all share a uniform shape: input textures, input parameters, and an output texture. The inputs always include SceneColor, the output is usually SceneColor as well, and the SceneColor output of one pass becomes the SceneColor input of the next, which is how the different post-processing effects stack.
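A condensed sketch of that chaining (the Add*Pass calls are taken from AddPostProcessingPasses above; the MotionBlurInputs/TonemapInputs/FXAAInputs variables abbreviate the FMotionBlurInputs/FTonemapInputs/FFXAAInputs structs built in the real code):

// Condensed chaining pattern: each pass consumes the previous SceneColor and returns a new one.
FScreenPassTexture SceneColor((*Inputs.SceneTextures)->SceneColorTexture, PrimaryViewRect);
SceneColor = AddMotionBlurPass(GraphBuilder, View, MotionBlurInputs); // reads the raw scene color
SceneColor = AddTonemapPass(GraphBuilder, View, TonemapInputs);       // reads the motion-blurred color
SceneColor = AddFXAAPass(GraphBuilder, View, FXAAInputs);             // reads the tonemapped color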

Also note that some (screen-space) post-processing effects do not appear in PassSequence at all; they are instead inserted at specific points in the post-processing pipeline.

Because post-processing effects accumulate on top of each other, their blending (processing) order matters; an improper order produces unexpected results.

TOverridePassSequence takes an enum type and manages and executes all passes in order according to special rules. It is defined as follows:

// Engine\Source\Runtime\Renderer\Private\OverridePassSequence.h

template <typename EPass>
class TOverridePassSequence final
{
public:
    TOverridePassSequence(const FScreenPassRenderTarget& InOverrideOutput)
        : OverrideOutput(InOverrideOutput)
    {}

    ~TOverridePassSequence();

    // Set pass names.
    void SetName(EPass Pass, const TCHAR* Name);
    void SetNames(const TCHAR* const* Names, uint32 NameCount);

    // Enable or disable the given pass.
    void SetEnabled(EPass Pass, bool bEnabled);
    bool IsEnabled(EPass Pass) const;

    // Whether the given pass is the last one.
    bool IsLastPass(EPass Pass) const;

    // Accept a pass; errors if passes are accepted out of order.
    void AcceptPass(EPass Pass)
    {
#if RDG_ENABLE_DEBUG
        const int32 PassIndex = (int32)Pass;

        check(bFinalized);
        checkf(NextPass == Pass, TEXT("Pass was accepted out of order: %s. Expected %s."), Passes[PassIndex].Name, Passes[(int32)NextPass].Name);
        checkf(Passes[PassIndex].bEnabled, TEXT("Only accepted passes can be enabled: %s."), Passes[PassIndex].Name);

        Passes[PassIndex].bAccepted = true;

        // Walk the remaining passes until we hit one that's enabled. This will be the next pass to add.
        for (int32 NextPassIndex = int32(NextPass) + 1; NextPassIndex < PassCountMax; ++NextPassIndex)
        {
            if (Passes[NextPassIndex].bEnabled)
            {
                NextPass = EPass(NextPassIndex);
                break;
            }
        }
#endif
    }

    // If the pass is the last one, accept the override render target.
    bool AcceptOverrideIfLastPass(EPass Pass, FScreenPassRenderTarget& OutTargetToOverride, const TOptional<int32>& AfterPassCallbackIndex = TOptional<int32>())
    {
        bool bLastAfterPass = AfterPass[(int32)Pass].Num() == 0;

        if (AfterPassCallbackIndex)
        {
            bLastAfterPass = AfterPassCallbackIndex.GetValue() == AfterPass[(int32)Pass].Num() - 1;
        }
        else
        {
            // Display debug information for a Pass unless it is an after pass.
            AcceptPass(Pass);
        }

        // We need to override output only if this is the last pass and the last after pass.
        if (IsLastPass(Pass) && bLastAfterPass)
        {
            OutTargetToOverride = OverrideOutput;
            return true;
        }

        return false;
    }

    // Finish the enabling/disabling of passes.
    void Finalize()
    {
#if RDG_ENABLE_DEBUG
        check(!bFinalized);
        bFinalized = true;

        for (int32 PassIndex = 0; PassIndex < PassCountMax; ++PassIndex)
        {
            checkf(Passes[PassIndex].bAssigned, TEXT("Pass was not assigned to enabled or disabled: %s."), Passes[PassIndex].Name);
        }
#endif

        bool bFirstPass = true;

        for (int32 PassIndex = 0; PassIndex < PassCountMax; ++PassIndex)
        {
            if (Passes[PassIndex].bEnabled)
            {
                if (bFirstPass)
                {
#if RDG_ENABLE_DEBUG
                    NextPass = (EPass)PassIndex;
#endif
                    bFirstPass = false;
                }
                LastPass = (EPass)PassIndex;
            }
        }
    }

    FAfterPassCallbackDelegateArray& GetAfterPassCallbacks(EPass Pass);

private:
    static const int32 PassCountMax = (int32)EPass::MAX;

    struct FPassInfo
    {
#if RDG_ENABLE_DEBUG
        const TCHAR* Name = nullptr;
        bool bAssigned = false;
        bool bAccepted = false;
#endif
        bool bEnabled = false;
    };

    FScreenPassRenderTarget OverrideOutput;
    TStaticArray<FPassInfo, PassCountMax> Passes;
    TStaticArray<FAfterPassCallbackDelegateArray, PassCountMax> AfterPass;
    EPass LastPass = EPass::MAX;

#if RDG_ENABLE_DEBUG
    EPass NextPass = EPass(0);
    bool bFinalized = false;
#endif
};

With TOverridePassSequence, an ordered group of post-processing effects can be implemented, managed, and executed conveniently. A few caveats apply:

  • After all passes have been enabled or disabled, Finalize must be called once manually on PassSequence, otherwise development builds raise an error.

  • TOverridePassSequence requires passes to be explicitly enabled or disabled; if the enabled state disagrees with the passes actually added, an error is raised. The following cases spell out whether PassSequence errors (in development builds):

    • Pass A is enabled, no pass is added to GraphBuilder, and AcceptOverrideIfLastPass is not called: error.
    • Pass A is enabled, a pass is added to GraphBuilder, but AcceptOverrideIfLastPass is not called: error.
    • Pass A is enabled, no pass is added to GraphBuilder, but AcceptOverrideIfLastPass is called: no error. PassSequence cannot detect this anomaly!!
    • Pass A is disabled, a pass is added to GraphBuilder, and AcceptOverrideIfLastPass is called: error.
    • Pass A is disabled, a pass is added to GraphBuilder, and AcceptOverrideIfLastPass is not called: no error. PassSequence cannot detect this anomaly!!
    • Passes A and B are both enabled, but B adds its pass to GraphBuilder and calls AcceptOverrideIfLastPass before A does: error.

    As a concrete example, consider the following code:

    // Disable FXAA in the pass sequence.
    PassSequence.SetEnabled(EPass::FXAA, false);
    (......)
    // Build the FXAA input parameters.
    FFXAAInputs PassInputs;
    // Accept the pass.
    PassSequence.AcceptOverrideIfLastPass(EPass::FXAA, PassInputs.OverrideOutput);
    PassInputs.SceneColor = SceneColor;
    PassInputs.Quality = GetFXAAQuality();
    // Add the FXAA pass.
    SceneColor = AddFXAAPass(GraphBuilder, View, PassInputs);

    Since the code above has already disabled the FXAA pass in PassSequence, yet still tries to add the pass and call AcceptOverrideIfLastPass for it, development builds raise the following error:

    It means the pass was not accepted in the order of the enabled passes; the check fires inside AcceptOverrideIfLastPass (via AcceptPass).
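    For contrast, the correct pattern (condensed from AddPostProcessingPasses above) keeps SetEnabled consistent with the pass actually added, finalizes once, and accepts passes in order:

    // Enable state matches the pass that is actually added below.
    PassSequence.SetEnabled(EPass::FXAA, AntiAliasingMethod == AAM_FXAA);
    // ... enable/disable every other pass ...
    PassSequence.Finalize(); // called exactly once, after all SetEnabled calls

    if (PassSequence.IsEnabled(EPass::FXAA)) // only add the pass when it is enabled
    {
        FFXAAInputs PassInputs;
        PassSequence.AcceptOverrideIfLastPass(EPass::FXAA, PassInputs.OverrideOutput);
        PassInputs.SceneColor = SceneColor;
        PassInputs.Quality = GetFXAAQuality();
        SceneColor = AddFXAAPass(GraphBuilder, View, PassInputs);
    }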

BlendableLocation is the blend position of a post-process material. It is defined as follows:

enum EBlendableLocation
{
    // After tone mapping.
    BL_AfterTonemapping,
    // Before tone mapping.
    BL_BeforeTonemapping,
    // Before translucency composition.
    BL_BeforeTranslucency,
    // Replace the tonemapper.
    BL_ReplacingTonemapper,
    // SSR input.
    BL_SSRInput,

    BL_MAX,
};

EBlendableLocation can be specified in the material editor's property panel:

The default blend stage is After Tonemapping, but the blend position can be changed to achieve different effects. For example, if a post-processing effect needs the scene color before tone mapping, change it to Before Tonemapping; to implement a custom tone mapping algorithm replacing UE's default, use Replacing the Tonemapper; if the effect should not affect translucent objects, use Before Translucency; to implement a custom SSR algorithm, use SSR Input.

To clarify the role and handling of BlendableLocation in the post-processing pipeline, here are the related types, code, and steps:

// Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.h

using FPostProcessMaterialChain = TArray<const UMaterialInterface*, TInlineAllocator<10>>;

FPostProcessMaterialChain GetPostProcessMaterialChain(const FViewInfo& View, EBlendableLocation Location);

// Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.cpp

FPostProcessMaterialChain GetPostProcessMaterialChain(const FViewInfo& View, EBlendableLocation Location)
{
    if (!IsPostProcessMaterialsEnabledForView(View))
    {
        return {};
    }

    const FSceneViewFamily& ViewFamily = *View.Family;

    TArray<FPostProcessMaterialNode, TInlineAllocator<10>> Nodes;
    FBlendableEntry* Iterator = nullptr;

    (......)

    // Iterate over the view's post-process settings and collect all post-process material nodes.
    // Note that the iterator is already constrained to Location, so only materials at Location get added to Nodes.
    while (FPostProcessMaterialNode* Data = IteratePostProcessMaterialNodes(View.FinalPostProcessSettings, Location, Iterator))
    {
        check(Data->GetMaterialInterface());
        Nodes.Add(*Data);
    }

    if (!Nodes.Num())
    {
        return {};
    }

    // Sort by priority.
    ::Sort(Nodes.GetData(), Nodes.Num(), FPostProcessMaterialNode::FCompare());

    FPostProcessMaterialChain OutputChain;
    OutputChain.Reserve(Nodes.Num());

    // Add the materials to the output list.
    for (const FPostProcessMaterialNode& Node : Nodes)
    {
        OutputChain.Add(Node.GetMaterialInterface());
    }

    return OutputChain;
}

// Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessing.cpp

void AddPostProcessingPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessingInputs& Inputs)
{
    (......)

    const FPostProcessMaterialChain PostProcessMaterialAfterTonemappingChain = GetPostProcessMaterialChain(View, BL_AfterTonemapping);

    (......)

    // Post-process material chain - Before Translucency.
    const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTranslucency);
    SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);

    (......)

    // Composite the separate translucency texture into the scene color texture.
    LocalSceneColorTexture = AddSeparateTranslucencyCompositionPass(GraphBuilder, View, SceneColor.Texture, *Inputs.SeparateTranslucencyTextures);

    (......)

    // Post-process material chain - Before Tonemapping.
    const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_BeforeTonemapping);
    SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, GetPostProcessMaterialInputs(SceneColor), MaterialChain);

    (......)

    // Post-process material chain - SSR Input.
    const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_SSRInput);
    GraphBuilder.QueueTextureExtraction(PassOutput.Texture, &View.ViewState->PrevFrameViewInfo.CustomSSRInput);

    (......)

    // Tone mapping.
    if (PassSequence.IsEnabled(EPass::Tonemap))
    {
        const FPostProcessMaterialChain MaterialChain = GetPostProcessMaterialChain(View, BL_ReplacingTonemapper);

        // If a material replacing UE's default tonemapper exists, run it...
        if (MaterialChain.Num())
        {
            SceneColor = AddPostProcessMaterialPass(GraphBuilder, View, PassInputs, HighestPriorityMaterial);
        }
        // ...otherwise run UE's default tone mapping.
        else
        {
            SceneColor = AddTonemapPass(GraphBuilder, View, PassInputs);
        }
    }

    // Post-process material chain - After Tonemapping.
    SceneColor = AddPostProcessMaterialChain(GraphBuilder, View, PassInputs, PostProcessMaterialAfterTonemappingChain);

    (......)
}

As the code shows, BlendableLocation lives up to its name: the literal meaning of each location already tells where it runs. The most important anchor is tone mapping; custom post-process materials can run before it, in place of it, and after it, which contributes greatly to the engine's extensibility.

PostProcessMaterial is the post-process material class that handles and renders BlendableLocation materials. Its definition and related types are:

// Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.h

// Input slots of a post-process material.
enum class EPostProcessMaterialInput : uint32
{
    SceneColor = 0,           // Scene color, always active. Comes from the output of the previous post-process.
    SeparateTranslucency = 1, // Separate translucency texture, always active.
    CombinedBloom = 2,        // Combined bloom.

    // Only used when running the visualizer.
    PreTonemapHDRColor = 2,
    PostTonemapHDRColor = 3,

    // Velocity.
    Velocity = 4
};

// Uniform buffer of the post-process material.
BEGIN_SHADER_PARAMETER_STRUCT(FPostProcessMaterialParameters, )
    SHADER_PARAMETER_STRUCT_REF(FViewUniformShaderParameters, View)
    SHADER_PARAMETER_STRUCT_INCLUDE(FSceneTextureShaderParameters, SceneTextures)
    SHADER_PARAMETER_STRUCT(FScreenPassTextureViewportParameters, PostProcessOutput)
    SHADER_PARAMETER_STRUCT_ARRAY(FScreenPassTextureInput, PostProcessInput, [kPostProcessMaterialInputCountMax])
    SHADER_PARAMETER_SAMPLER(SamplerState, PostProcessInput_BilinearSampler)
    SHADER_PARAMETER_RDG_TEXTURE(Texture2D, MobileCustomStencilTexture)
    SHADER_PARAMETER_SAMPLER(SamplerState, MobileCustomStencilTextureSampler)
    SHADER_PARAMETER_RDG_TEXTURE(Texture2D, EyeAdaptationTexture)
    SHADER_PARAMETER_SRV(Buffer<float4>, EyeAdaptationBuffer)
    SHADER_PARAMETER(int32, MobileStencilValueRef)
    SHADER_PARAMETER(uint32, bFlipYAxis)
    SHADER_PARAMETER(uint32, bMetalMSAAHDRDecode)
    RENDER_TARGET_BINDING_SLOTS()
END_SHADER_PARAMETER_STRUCT()

// Inputs of a post-process material.
struct FPostProcessMaterialInputs
{
    inline void SetInput(EPostProcessMaterialInput Input, FScreenPassTexture Texture)
    {
        Textures[(uint32)Input] = Texture;
    }

    inline FScreenPassTexture GetInput(EPostProcessMaterialInput Input) const
    {
        return Textures[(uint32)Input];
    }

    // Validate the textures.
    inline void Validate() const
    {
        ValidateInputExists(EPostProcessMaterialInput::SceneColor);
        ValidateInputExists(EPostProcessMaterialInput::SeparateTranslucency);

        // Either override output format is valid or the override output texture is; not both.
        if (OutputFormat != PF_Unknown)
        {
            check(OverrideOutput.Texture == nullptr);
        }
        if (OverrideOutput.Texture)
        {
            check(OutputFormat == PF_Unknown);
        }

        check(SceneTextures.SceneTextures || SceneTextures.MobileSceneTextures);
    }

    inline void ValidateInputExists(EPostProcessMaterialInput Input) const
    {
        const FScreenPassTexture Texture = GetInput(EPostProcessMaterialInput::SceneColor);
        check(Texture.IsValid());
    }

    // Optional render target to draw into. If absent, a new texture is created.
    FScreenPassRenderTarget OverrideOutput;
    // Texture list.
    TStaticArray<FScreenPassTexture, kPostProcessMaterialInputCountMax> Textures;
    // Output render target format.
    EPixelFormat OutputFormat = PF_Unknown;
    // Custom depth texture.
    FRDGTextureRef CustomDepthTexture = nullptr;
    // The scene's GBuffer.
    FSceneTextureShaderParameters SceneTextures;
    // Whether to flip the Y axis.
    bool bFlipYAxis = false;
    // Whether the input scene color is allowed to be used as the output.
    bool bAllowSceneColorInputAsOutput = true;
    // Special flag for Metal MSAA.
    bool bMetalMSAAHDRDecode = false;
};

// Add a single post-process material pass.
FScreenPassTexture AddPostProcessMaterialPass(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessMaterialInputs& Inputs, const UMaterialInterface* MaterialInterface);

// Add a chain of post-process materials.
FScreenPassTexture AddPostProcessMaterialChain(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessMaterialInputs& Inputs, const FPostProcessMaterialChain& MaterialChain);

Next, let's continue with AddPostProcessMaterialChain, which the post-processing pipeline calls:

// Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessMaterial.cpp

FScreenPassTexture AddPostProcessMaterialChain(
    FRDGBuilder& GraphBuilder,
    const FViewInfo& View,
    const FPostProcessMaterialInputs& InputsTemplate,
    const FPostProcessMaterialChain& Materials)
{
    // Initialize the output with the scene color input.
    FScreenPassTexture Outputs = InputsTemplate.GetInput(EPostProcessMaterialInput::SceneColor);

    (......)

    // Iterate over the material chain and add one pass per material.
    for (const UMaterialInterface* MaterialInterface : Materials)
    {
        FPostProcessMaterialInputs Inputs = InputsTemplate;
        Inputs.SetInput(EPostProcessMaterialInput::SceneColor, Outputs);

        (......)

        // Unless this is the last material, don't apply the output override.
        if (MaterialInterface != Materials.Last())
        {
            Inputs.OverrideOutput = FScreenPassRenderTarget();
            Inputs.bFlipYAxis = false;
        }

        // Add a single post-process material pass (analyzed below).
        Outputs = AddPostProcessMaterialPass(GraphBuilder, View, Inputs, MaterialInterface);
    }

    return Outputs;
}

// Add a single post-process material pass.
FScreenPassTexture AddPostProcessMaterialPass(
    FRDGBuilder& GraphBuilder,
    const FViewInfo& View,
    const FPostProcessMaterialInputs& Inputs,
    const UMaterialInterface* MaterialInterface)
{
    // Validate the inputs.
    Inputs.Validate();

    // Initialize the input data.
    const FScreenPassTexture SceneColor = Inputs.GetInput(EPostProcessMaterialInput::SceneColor);
    const ERHIFeatureLevel::Type FeatureLevel = View.GetFeatureLevel();

    const FMaterial* Material = nullptr;
    const FMaterialRenderProxy* MaterialRenderProxy = nullptr;
    const FMaterialShaderMap* MaterialShaderMap = nullptr;
    GetMaterialInfo(MaterialInterface, FeatureLevel, Inputs.OutputFormat, Material, MaterialRenderProxy, MaterialShaderMap);

    FRHIDepthStencilState* DefaultDepthStencilState = FScreenPassPipelineState::FDefaultDepthStencilState::GetRHI();
    FRHIDepthStencilState* DepthStencilState = DefaultDepthStencilState;

    FRDGTextureRef DepthStencilTexture = nullptr;

    // Allocate custom depth stencil texture(s) and depth stencil state.
    const ECustomDepthPolicy CustomStencilPolicy = GetMaterialCustomDepthPolicy(Material, FeatureLevel);

    if (CustomStencilPolicy == ECustomDepthPolicy::Enabled)
    {
        check(Inputs.CustomDepthTexture);
        DepthStencilTexture = Inputs.CustomDepthTexture;
        DepthStencilState = GetMaterialStencilState(Material);
    }

    // Blend state.
    FRHIBlendState* DefaultBlendState = FScreenPassPipelineState::FDefaultBlendState::GetRHI();
    FRHIBlendState* BlendState = DefaultBlendState;

    if (IsMaterialBlendEnabled(Material))
    {
        BlendState = GetMaterialBlendState(Material);
    }

    // Resolve the various flags.
    const bool bCompositeWithInput = DepthStencilState != DefaultDepthStencilState || BlendState != DefaultBlendState;
    const bool bPrimeOutputColor = bCompositeWithInput || !View.IsFirstInFamily();
    const bool bBackbufferWithDepthStencil = (DepthStencilTexture != nullptr && !GRHISupportsBackBufferWithCustomDepthStencil && Inputs.OverrideOutput.IsValid());
    const bool bCompositeWithInputAndFlipY = bCompositeWithInput && Inputs.bFlipYAxis;
    const bool bCompositeWithInputAndDecode = Inputs.bMetalMSAAHDRDecode && bCompositeWithInput;
    const bool bForceIntermediateTarget = bBackbufferWithDepthStencil || bCompositeWithInputAndFlipY || bCompositeWithInputAndDecode;

    // Render output.
    FScreenPassRenderTarget Output = Inputs.OverrideOutput;

    // Use the scene color itself as the output...
    if (!Output.IsValid() && !MaterialShaderMap->UsesSceneTexture(PPI_PostProcessInput0) && bPrimeOutputColor && !bForceIntermediateTarget && Inputs.bAllowSceneColorInputAsOutput)
    {
        Output = FScreenPassRenderTarget(SceneColor, ERenderTargetLoadAction::ELoad);
    }
    else
    {
        // ...or create a new texture as the output.
        if (!Output.IsValid() || bForceIntermediateTarget)
        {
            FRDGTextureDesc OutputDesc = SceneColor.Texture->Desc;
            OutputDesc.Reset();
            if (Inputs.OutputFormat != PF_Unknown)
            {
                OutputDesc.Format = Inputs.OutputFormat;
            }
            OutputDesc.ClearValue = FClearValueBinding(FLinearColor::Black);
            OutputDesc.Flags |= GFastVRamConfig.PostProcessMaterial;

            Output = FScreenPassRenderTarget(GraphBuilder.CreateTexture(OutputDesc, TEXT("PostProcessMaterial")), SceneColor.ViewRect, View.GetOverwriteLoadAction());
        }

        if (bPrimeOutputColor || bForceIntermediateTarget)
        {
            // Copy existing contents to new output and use load-action to preserve untouched pixels.
            if (Inputs.bMetalMSAAHDRDecode)
            {
                AddMobileMSAADecodeAndDrawTexturePass(GraphBuilder, View, SceneColor, Output);
            }
            else
            {
                AddDrawTexturePass(GraphBuilder, View, SceneColor, Output);
            }
            Output.LoadAction = ERenderTargetLoadAction::ELoad;
        }
    }

    const FScreenPassTextureViewport SceneColorViewport(SceneColor);
    const FScreenPassTextureViewport OutputViewport(Output);

    RDG_EVENT_SCOPE(GraphBuilder, "PostProcessMaterial %dx%d Material=%s", SceneColorViewport.Rect.Width(), SceneColorViewport.Rect.Height(), *Material->GetFriendlyName());

    const uint32 MaterialStencilRef = Material->GetStencilRefValue();

    const bool bMobilePlatform = IsMobilePlatform(View.GetShaderPlatform());

    // Fill in the post-process material parameters.
    FPostProcessMaterialParameters* PostProcessMaterialParameters = GraphBuilder.AllocParameters<FPostProcessMaterialParameters>();
    PostProcessMaterialParameters->SceneTextures = Inputs.SceneTextures;
    PostProcessMaterialParameters->View = View.ViewUniformBuffer;
    if (bMobilePlatform)
    {
        PostProcessMaterialParameters->EyeAdaptationBuffer = GetEyeAdaptationBuffer(View);
    }
    else
    {
        PostProcessMaterialParameters->EyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);
    }
    PostProcessMaterialParameters->PostProcessOutput = GetScreenPassTextureViewportParameters(OutputViewport);
    PostProcessMaterialParameters->MobileCustomStencilTexture = DepthStencilTexture;
    PostProcessMaterialParameters->MobileCustomStencilTextureSampler = TStaticSamplerState<SF_Point, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
    PostProcessMaterialParameters->MobileStencilValueRef = MaterialStencilRef;
    PostProcessMaterialParameters->RenderTargets[0] = Output.GetRenderTargetBinding();
    PostProcessMaterialParameters->bMetalMSAAHDRDecode = Inputs.bMetalMSAAHDRDecode ? 1 : 0;

    // Handle the depth-stencil buffer.
    if (DepthStencilTexture && !bMobilePlatform)
    {
        PostProcessMaterialParameters->RenderTargets.DepthStencil = FDepthStencilBinding(
            DepthStencilTexture,
            ERenderTargetLoadAction::ELoad,
            ERenderTargetLoadAction::ELoad,
            FExclusiveDepthStencil::DepthRead_StencilRead);
    }
    else if (!DepthStencilTexture && bMobilePlatform && Material->IsStencilTestEnabled())
    {
        PostProcessMaterialParameters->MobileCustomStencilTexture = GSystemTextures.GetBlackDummy(GraphBuilder);

        switch (Material->GetStencilCompare())
        {
        case EMaterialStencilCompare::MSC_Less:
            PostProcessMaterialParameters->MobileStencilValueRef = -1;
            break;
        case EMaterialStencilCompare::MSC_LessEqual:
        case EMaterialStencilCompare::MSC_GreaterEqual:
        case EMaterialStencilCompare::MSC_Equal:
            PostProcessMaterialParameters->MobileStencilValueRef = 0;
            break;
        case EMaterialStencilCompare::MSC_Greater:
        case EMaterialStencilCompare::MSC_NotEqual:
            PostProcessMaterialParameters->MobileStencilValueRef = 1;
            break;
        case EMaterialStencilCompare::MSC_Always:
            PostProcessMaterialParameters->MobileStencilValueRef = 256;
            break;
        default:
            break;
        }
    }

    // System textures and samplers.
    PostProcessMaterialParameters->PostProcessInput_BilinearSampler = TStaticSamplerState<SF_Bilinear, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
    const FScreenPassTexture BlackDummy(GSystemTextures.GetBlackDummy(GraphBuilder));
    GraphBuilder.RemoveUnusedTextureWarning(BlackDummy.Texture);
    FRHISamplerState* PointClampSampler = TStaticSamplerState<SF_Point, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();

    // Fill the material's post-process input slots (PostProcessInput0~PostProcessInput4).
    for (uint32 InputIndex = 0; InputIndex < kPostProcessMaterialInputCountMax; ++InputIndex)
    {
        FScreenPassTexture Input = Inputs.GetInput((EPostProcessMaterialInput)InputIndex);

        // If the slot's input texture does not exist, or the material does not use the slot, bind the black dummy instead.
        if (!Input.Texture || !MaterialShaderMap->UsesSceneTexture(PPI_PostProcessInput0 + InputIndex))
        {
            Input = BlackDummy;
        }

        PostProcessMaterialParameters->PostProcessInput[InputIndex] = GetScreenPassTextureInput(Input, PointClampSampler);
    }

    const bool bIsMobile = FeatureLevel <= ERHIFeatureLevel::ES3_1;
    PostProcessMaterialParameters->bFlipYAxis = Inputs.bFlipYAxis && !bForceIntermediateTarget;

    // Fetch the material's vertex and pixel shaders.
    FPostProcessMaterialShader::FPermutationDomain PermutationVector;
    PermutationVector.Set<FPostProcessMaterialShader::FMobileDimension>(bIsMobile);

    TShaderRef<FPostProcessMaterialVS> VertexShader = MaterialShaderMap->GetShader<FPostProcessMaterialVS>(PermutationVector);
    TShaderRef<FPostProcessMaterialPS> PixelShader = MaterialShaderMap->GetShader<FPostProcessMaterialPS>(PermutationVector);

    ClearUnusedGraphResources(VertexShader, PixelShader, PostProcessMaterialParameters);

    EScreenPassDrawFlags ScreenPassFlags = EScreenPassDrawFlags::AllowHMDHiddenAreaMask;

    if (PostProcessMaterialParameters->bFlipYAxis)
    {
        ScreenPassFlags |= EScreenPassDrawFlags::FlipYAxis;
    }

    // Add the full-screen draw.
    AddDrawScreenPass(
        GraphBuilder,
        RDG_EVENT_NAME("PostProcessMaterial"),
        View,
        OutputViewport,
        SceneColorViewport,
        FScreenPassPipelineState(VertexShader, PixelShader, BlendState, DepthStencilState),
        PostProcessMaterialParameters,
        ScreenPassFlags,
        [&View, VertexShader, PixelShader, MaterialRenderProxy, PostProcessMaterialParameters, MaterialStencilRef](FRHICommandListImmediate& RHICmdList)
    {
        FPostProcessMaterialVS::SetParameters(RHICmdList, VertexShader, View, MaterialRenderProxy, *PostProcessMaterialParameters);
        FPostProcessMaterialPS::SetParameters(RHICmdList, PixelShader, View, MaterialRenderProxy, *PostProcessMaterialParameters);
        RHICmdList.SetStencilRef(MaterialStencilRef);
    });

    // Handle Y-flipping and the output override.
    if (bForceIntermediateTarget && !bCompositeWithInputAndDecode)
    {
        if (!Inputs.bFlipYAxis)
        {
            // We shouldn't get here unless we had an override target.
            check(Inputs.OverrideOutput.IsValid());
            AddDrawTexturePass(GraphBuilder, View, Output.Texture, Inputs.OverrideOutput.Texture);
            Output = Inputs.OverrideOutput;
        }
        else
        {
            FScreenPassRenderTarget TempTarget = Output;
            if (Inputs.OverrideOutput.IsValid())
            {
                Output = Inputs.OverrideOutput;
            }
            else
            {
                Output = FScreenPassRenderTarget(SceneColor, ERenderTargetLoadAction::ENoAction);
            }

            AddCopyAndFlipTexturePass(GraphBuilder, View, TempTarget.Texture, Output.Texture);
        }
    }

    return MoveTemp(Output);
}

Note that PostProcessInput0~PostProcessInput4, handled in the post-process material, correspond to the PostProcessInput entries of the SceneTexture node in the material editor (picture below).

Aside from PostProcessInput0, which is occupied by SceneColor, the other slots can carry custom textures for access in the material editor. Example:

for (uint32 InputIndex = 0; InputIndex < kPostProcessMaterialInputCountMax; ++InputIndex)
{
    FScreenPassTexture Input = Inputs.GetInput((EPostProcessMaterialInput)InputIndex);

    if (!Input.Texture || !MaterialShaderMap->UsesSceneTexture(PPI_PostProcessInput0 + InputIndex))
    {
        // If the custom input texture is valid, put it into slot 4.
        if (MyInput.Texture && InputIndex == 4)
        {
            Input = MyInput;
        }
        else
        {
            Input = BlackDummy;
        }
    }

    PostProcessMaterialParameters->PostProcessInput[InputIndex] = GetScreenPassTextureInput(Input, PointClampSampler);
}

This carries the custom texture through. The approach is somewhat invasive, though; a more elegant way would be to extend EPostProcessMaterialInput and modify the related code.
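A hypothetical sketch of that extension (the MyCustomInput identifier is invented for illustration; kPostProcessMaterialInputCountMax and the shader-side bindings would have to grow in step):

// Hypothetical extension (sketch only; MyCustomInput is not engine code):
enum class EPostProcessMaterialInput : uint32
{
    SceneColor = 0,
    SeparateTranslucency = 1,
    CombinedBloom = 2,
    PreTonemapHDRColor = 2,
    PostTonemapHDRColor = 3,
    Velocity = 4,
    MyCustomInput = 5, // new slot; kPostProcessMaterialInputCountMax must be raised to 6 to match
};

// The new slot would then be filled like any built-in one:
// Inputs.SetInput(EPostProcessMaterialInput::MyCustomInput, MyInput);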

 

This chapter describes common post-processing techniques built into UE, including some screen-space rendering techniques.

Readers familiar with graphics or PBR will know that modern graphics engines have both a linear-space rendering pipeline and the traditional sRGB-space (gamma) pipeline; the difference is shown below:

Top: gamma-space pipeline. No linear conversion is applied to texture colors before or after rendering.

Bottom: linear-space pipeline. Gamma correction is removed at the start of the shader and restored at the end.

Why does the linear pipeline remove gamma correction early and add it back late?

This is a legacy of history.

Early televisions used CRT tubes, whose displayed luminance is not proportional to the input voltage but roughly follows a power curve with an exponent of about 2.2. To compensate, gamma correction with an exponent of about 0.45 (1/2.2) was introduced, boosting the image data so that perceived brightness ends up linear with respect to the original signal. Over time, many hardware devices, color spaces (such as sRGB), file formats (such as jpeg and png), and DCC tools (such as Photoshop) adopted gamma correction by default, and it has remained in use ever since.

Although today's LCD monitors no longer need gamma correction, it must be kept for compatibility with the widely deployed gamma-corrected color spaces and standards. Hence the shader still restores gamma correction at the end, so that images display correctly on screen.

The two pipelines produce markedly different lighting results:

The top half is linear space: under varying light intensities it yields more physically correct results. The bottom half is gamma-space computation: it reacts too strongly to light intensity, producing images that are too dark or blown out.

Gamma and linear colors can be converted into each other with a simple power operation:

\[
c' = f(c) = c^n = \operatorname{pow}(c, n)
\]

where \(c\) is the input color, \(c'\) the output color, and \(n\) the gamma exponent. The figure below shows the gamma correction curves for \(n\) equal to 0.45, 1.0, and 2.2:

1.0 is linear space, where output equals input. 0.45 and 2.2 are gamma curves: colors in those spaces are brightened or darkened, and since \(0.45 \cdot 2.2 \approx 1.0\), applying the two gamma corrections in succession returns to linear space:

\[
c' = f_{\text{gamma2.2}}(f_{\text{gamma0.45}}(c)) = (c^{0.45})^{2.2} = c^{0.45 \cdot 2.2} = c^{0.99} \approx c
\]

By default, UE's rendering pipeline works in linear space, meaning all textures and colors must stay linear throughout shader computation; just before presentation to the screen, gamma correction converts the result back to sRGB (this can be skipped if the display supports HDR or linear output).
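To make the conversion concrete, here is a minimal sketch using UE's FMath, assuming the simple power-law approximation above rather than the exact piecewise sRGB transfer function:

// Minimal sketch: simple power-law gamma (2.2 / 0.45), not the exact piecewise sRGB transfer.
float GammaToLinear(float C) { return FMath::Pow(C, 2.2f); }        // decode: gamma -> linear
float LinearToGamma(float C) { return FMath::Pow(C, 1.0f / 2.2f); } // encode: linear -> gamma (0.45)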

Normally, UE already converts a texture into linear space when the source asset is imported:

After importing an sRGB image, UE treats it as sRGB by default and converts it into linear space.
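The texture's sRGB flag can also be toggled from C++; a minimal sketch, assuming Texture is a valid UTexture2D*:

// Minimal sketch, assuming Texture is a valid UTexture2D*:
Texture->SRGB = false;     // the texel data is already linear; skip sRGB decoding when sampling
Texture->UpdateResource(); // recreate the RHI resource so the change takes effect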

If the imported image is already in linear space, the sRGB checkbox must be unticked. To convert gamma dynamically inside the material editor, material nodes like the following can be used:

Of course, in the vast majority of cases we need not care about gamma/linear conversion in the material editor, because UE already handles it behind the scenes. Normally, restoring gamma is folded into tone mapping.

HDR (High Dynamic Range) offers higher contrast and a wider gamut. It comes in two flavors: software-based post-process HDR and hardware-based display HDR.

Its counterpart is LDR (Low Dynamic Range).

Since UE and other modern engines render in linear space, lighting computation can produce brightness hundreds or thousands of times greater than plain white (a color value of 1.0). Without a curve to compress such values into a reasonable range, many devices would display them incorrectly.

Tone mapping adjusts these excessive color values into a color range compatible with the display device.
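For intuition, a classic operator (not what UE uses) is Reinhard, which compresses each channel from \([0, \infty)\) into \([0, 1)\):

// For intuition only: the classic Reinhard operator, x / (1 + x) per channel.
// This is NOT UE's tonemapper; FLinearColor is UE's linear color type.
FLinearColor ReinhardToneMap(const FLinearColor& HDR)
{
    return FLinearColor(HDR.R / (1.0f + HDR.R),
                        HDR.G / (1.0f + HDR.G),
                        HDR.B / (1.0f + HDR.B),
                        HDR.A);
}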

UE supports a physically based tone mapping technique known as the ACES Tonemapper, which maps linear-space colors with the following curve:

A compact implementation:

float3 ACESToneMapping(float3 color, float adapted_lum)
{
    const float A = 2.51f;
    const float B = 0.03f;
    const float C = 2.43f;
    const float D = 0.59f;
    const float E = 0.14f;

    color *= adapted_lum;
    return (color * (A * color + B)) / (color * (C * color + D) + E);
}

Compared with the old tonemapper, the ACES Tonemapper produces results closer to physical reality:

The top image uses UE's old tone mapping; the bottom uses ACES. With the new tonemapper, colors turn white once the emissive intensity is large enough, which is more physically plausible.

UE's actual implementation is far more complex than the snippet above; the corresponding C++ code is:

  1. // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessTonemap.cpp
  2. FScreenPassTexture AddTonemapPass(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FTonemapInputs& Inputs)
  3. {
  4. const FSceneViewFamily& ViewFamily = *(View.Family);
  5. const FPostProcessSettings& PostProcessSettings = View.FinalPostProcessSettings;
  6. const bool bIsEyeAdaptationResource = (View.GetFeatureLevel() >= ERHIFeatureLevel::SM5) ? Inputs.EyeAdaptationTexture != nullptr : Inputs.EyeAdaptationBuffer != nullptr;
  7. const bool bEyeAdaptation = ViewFamily.EngineShowFlags.EyeAdaptation && bIsEyeAdaptationResource;
  8. const FScreenPassTextureViewport SceneColorViewport(Inputs.SceneColor);
  9. FScreenPassRenderTarget Output = Inputs.OverrideOutput;
  10. // 创建输出纹理.
  11. if (!Output.IsValid())
  12. {
  13. FRDGTextureDesc OutputDesc = Inputs.SceneColor.Texture->Desc;
  14. OutputDesc.Reset();
  15. OutputDesc.Flags |= View.bUseComputePasses ? TexCreate_UAV : TexCreate_RenderTargetable;
  16. OutputDesc.Flags |= GFastVRamConfig.Tonemap;
  17. // RGB is the color in LDR, A is the luminance for PostprocessAA
  18. OutputDesc.Format = Inputs.bOutputInHDR ? GRHIHDRDisplayOutputFormat : PF_B8G8R8A8;
  19. OutputDesc.ClearValue = FClearValueBinding(FLinearColor(0, 0, 0, 0));
  20. const FTonemapperOutputDeviceParameters OutputDeviceParameters = GetTonemapperOutputDeviceParameters(*View.Family);
  21. const ETonemapperOutputDevice OutputDevice = static_cast<ETonemapperOutputDevice>(OutputDeviceParameters.OutputDevice);
  22. if (OutputDevice == ETonemapperOutputDevice::LinearEXR)
  23. {
  24. OutputDesc.Format = PF_A32B32G32R32F;
  25. }
  26. if (OutputDevice == ETonemapperOutputDevice::LinearNoToneCurve || OutputDevice == ETonemapperOutputDevice::LinearWithToneCurve)
  27. {
  28. OutputDesc.Format = PF_FloatRGBA;
  29. }
  30. Output = FScreenPassRenderTarget(
  31. GraphBuilder.CreateTexture(OutputDesc, TEXT("Tonemap")),
  32. Inputs.SceneColor.ViewRect,
  33. ERenderTargetLoadAction::EClear);
  34. }
  35. const FScreenPassTextureViewport OutputViewport(Output);
  36. FRHITexture* BloomDirtMaskTexture = GBlackTexture->TextureRHI;
  37. if (PostProcessSettings.BloomDirtMask && PostProcessSettings.BloomDirtMask->Resource)
  38. {
  39. BloomDirtMaskTexture = PostProcessSettings.BloomDirtMask->Resource->TextureRHI;
  40. }
  41. // 采样器.
  42. FRHISamplerState* BilinearClampSampler = TStaticSamplerState<SF_Bilinear, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
  43. FRHISamplerState* PointClampSampler = TStaticSamplerState<SF_Point, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
  44. const float DefaultEyeExposure = bEyeAdaptation ? 0.0f : GetEyeAdaptationFixedExposure(View);
  45. const float SharpenDiv6 = FMath::Clamp(CVarTonemapperSharpen.GetValueOnRenderThread(), 0.0f, 10.0f) / 6.0f;
  46. // 处理色差参数.
  47. FVector4 ChromaticAberrationParams;
  48. {
  49. // 处理场景颜色边缘
  50. // 从百分比到分数
  51. float Offset = 0.0f;
  52. float StartOffset = 0.0f;
  53. float Multiplier = 1.0f;
  54. if (PostProcessSettings.ChromaticAberrationStartOffset < 1.0f - KINDA_SMALL_NUMBER)
  55. {
  56. Offset = PostProcessSettings.SceneFringeIntensity * 0.01f;
  57. StartOffset = PostProcessSettings.ChromaticAberrationStartOffset;
  58. Multiplier = 1.0f / (1.0f - StartOffset);
  59. }
  60. // 基色的波长,单位是纳米.
  61. const float PrimaryR = 611.3f;
  62. const float PrimaryG = 549.1f;
  63. const float PrimaryB = 464.3f;
  64. // 简单透镜的色差在波长上大致是线性的.
  65. float ScaleR = 0.007f * (PrimaryR - PrimaryB);
  66. float ScaleG = 0.007f * (PrimaryG - PrimaryB);
  67. ChromaticAberrationParams = FVector4(Offset * ScaleR * Multiplier, Offset * ScaleG * Multiplier, StartOffset, 0.f);
  68. }
  69. // 处理色调映射参数.
  70. FTonemapParameters CommonParameters;
  71. CommonParameters.View = View.ViewUniformBuffer;
  72. CommonParameters.FilmGrain = GetFilmGrainParameters(View);
  73. CommonParameters.OutputDevice = GetTonemapperOutputDeviceParameters(ViewFamily);
  74. CommonParameters.Color = GetScreenPassTextureViewportParameters(SceneColorViewport);
  75. if (Inputs.Bloom.Texture)
  76. {
  77. const FScreenPassTextureViewport BloomViewport(Inputs.Bloom);
  78. CommonParameters.Bloom = GetScreenPassTextureViewportParameters(BloomViewport);
  79. CommonParameters.ColorToBloom = GetScreenPassTextureViewportTransform(CommonParameters.Color, CommonParameters.Bloom);
  80. }
  81. CommonParameters.Output = GetScreenPassTextureViewportParameters(OutputViewport);
  82. CommonParameters.ColorTexture = Inputs.SceneColor.Texture;
  83. CommonParameters.BloomTexture = Inputs.Bloom.Texture;
  84. CommonParameters.EyeAdaptationTexture = Inputs.EyeAdaptationTexture;
  85. CommonParameters.ColorGradingLUT = Inputs.ColorGradingTexture;
  86. CommonParameters.BloomDirtMaskTexture = BloomDirtMaskTexture;
  87. CommonParameters.ColorSampler = BilinearClampSampler;
  88. CommonParameters.BloomSampler = BilinearClampSampler;
  89. CommonParameters.ColorGradingLUTSampler = BilinearClampSampler;
  90. CommonParameters.BloomDirtMaskSampler = BilinearClampSampler;
  91. CommonParameters.ColorScale0 = PostProcessSettings.SceneColorTint;
  92. CommonParameters.ColorScale1 = FLinearColor::White * PostProcessSettings.BloomIntensity;
  93. CommonParameters.BloomDirtMaskTint = PostProcessSettings.BloomDirtMaskTint * PostProcessSettings.BloomDirtMaskIntensity;
  94. CommonParameters.ChromaticAberrationParams = ChromaticAberrationParams;
  95. CommonParameters.TonemapperParams = FVector4(PostProcessSettings.VignetteIntensity, SharpenDiv6, 0.0f, 0.0f);
  96. CommonParameters.SwitchVerticalAxis = Inputs.bFlipYAxis;
  97. CommonParameters.DefaultEyeExposure = DefaultEyeExposure;
  98. CommonParameters.EditorNITLevel = EditorNITLevel;
  99. CommonParameters.bOutputInHDR = ViewFamily.bIsHDR;
  100. CommonParameters.LensPrincipalPointOffsetScale = View.LensPrincipalPointOffsetScale;
  101. CommonParameters.LensPrincipalPointOffsetScaleInverse.X = -View.LensPrincipalPointOffsetScale.X / View.LensPrincipalPointOffsetScale.Z;
  102. CommonParameters.LensPrincipalPointOffsetScaleInverse.Y = -View.LensPrincipalPointOffsetScale.Y / View.LensPrincipalPointOffsetScale.W;
  103. CommonParameters.LensPrincipalPointOffsetScaleInverse.Z = 1.0f / View.LensPrincipalPointOffsetScale.Z;
  104. CommonParameters.LensPrincipalPointOffsetScaleInverse.W = 1.0f / View.LensPrincipalPointOffsetScale.W;
  105. CommonParameters.EyeAdaptationBuffer = Inputs.EyeAdaptationBuffer;
  106. // 处理桌面版色调映射的排列.
  107. TonemapperPermutation::FDesktopDomain DesktopPermutationVector;
  108. {
  109. TonemapperPermutation::FCommonDomain CommonDomain = TonemapperPermutation::BuildCommonPermutationDomain(View, Inputs.bGammaOnly, Inputs.bFlipYAxis, Inputs.bMetalMSAAHDRDecode);
  110. DesktopPermutationVector.Set<TonemapperPermutation::FCommonDomain>(CommonDomain);
  111. if (!CommonDomain.Get<TonemapperPermutation::FTonemapperGammaOnlyDim>())
  112. {
  113. // 量化颗粒.
  114. {
  115. static TConsoleVariableData<int32>* CVar = IConsoleManager::Get().FindTConsoleVariableDataInt(TEXT("r.Tonemapper.GrainQuantization"));
  116. const int32 Value = CVar->GetValueOnRenderThread();
  117. DesktopPermutationVector.Set<TonemapperPermutation::FTonemapperGrainQuantizationDim>(Value > 0);
  118. }
  119. DesktopPermutationVector.Set<TonemapperPermutation::FTonemapperColorFringeDim>(PostProcessSettings.SceneFringeIntensity > 0.01f);
  120. }
  121. DesktopPermutationVector.Set<TonemapperPermutation::FTonemapperOutputDeviceDim>(ETonemapperOutputDevice(CommonParameters.OutputDevice.OutputDevice));
  122. DesktopPermutationVector = TonemapperPermutation::RemapPermutation(DesktopPermutationVector, View.GetFeatureLevel());
  123. }
  124. const bool bComputePass = (Output.Texture->Desc.Flags & TexCreate_UAV) == TexCreate_UAV ? View.bUseComputePasses : false;
  125. if (bComputePass) // 启用CS.
  126. {
  127. FTonemapCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FTonemapCS::FParameters>();
  128. PassParameters->Tonemap = CommonParameters;
  129. PassParameters->RWOutputTexture = GraphBuilder.CreateUAV(Output.Texture);
  130. FTonemapCS::FPermutationDomain PermutationVector;
  131. PermutationVector.Set<TonemapperPermutation::FDesktopDomain>(DesktopPermutationVector);
  132. PermutationVector.Set<TonemapperPermutation::FTonemapperEyeAdaptationDim>(bEyeAdaptation);
  133. TShaderMapRef<FTonemapCS> ComputeShader(View.ShaderMap, PermutationVector);
  134. FComputeShaderUtils::AddPass(
  135. GraphBuilder,
  136. RDG_EVENT_NAME("Tonemap %dx%d (CS GammaOnly=%d)", OutputViewport.Rect.Width(), OutputViewport.Rect.Height(), Inputs.bGammaOnly),
  137. ComputeShader,
  138. PassParameters,
  139. FComputeShaderUtils::GetGroupCount(OutputViewport.Rect.Size(), FIntPoint(GTonemapComputeTileSizeX, GTonemapComputeTileSizeY)));
  140. }
  141. else // 启用PS
  142. {
  143. FTonemapPS::FParameters* PassParameters = GraphBuilder.AllocParameters<FTonemapPS::FParameters>();
  144. PassParameters->Tonemap = CommonParameters;
  145. PassParameters->RenderTargets[0] = Output.GetRenderTargetBinding();
  146. FTonemapVS::FPermutationDomain VertexPermutationVector;
  147. VertexPermutationVector.Set<TonemapperPermutation::FTonemapperSwitchAxis>(Inputs.bFlipYAxis);
  148. VertexPermutationVector.Set<TonemapperPermutation::FTonemapperEyeAdaptationDim>(bEyeAdaptation);
  149. TShaderMapRef<FTonemapVS> VertexShader(View.ShaderMap, VertexPermutationVector);
  150. TShaderMapRef<FTonemapPS> PixelShader(View.ShaderMap, DesktopPermutationVector);
  151. const bool bIsStereo = IStereoRendering::IsStereoEyeView(View);
  152. FRHIBlendState* BlendState = Inputs.bWriteAlphaChannel || bIsStereo ? FScreenPassPipelineState::FDefaultBlendState::GetRHI() : TStaticBlendStateWriteMask<CW_RGB>::GetRHI();
  153. FRHIDepthStencilState* DepthStencilState = FScreenPassPipelineState::FDefaultDepthStencilState::GetRHI();
  154. EScreenPassDrawFlags DrawFlags = EScreenPassDrawFlags::AllowHMDHiddenAreaMask;
  155. // 绘制全屏纹理.
  156. AddDrawScreenPass(
  157. GraphBuilder,
  158. RDG_EVENT_NAME("Tonemap %dx%d (PS GammaOnly=%d)", OutputViewport.Rect.Width(), OutputViewport.Rect.Height(), Inputs.bGammaOnly),
  159. View,
  160. OutputViewport,
  161. SceneColorViewport,
  162. FScreenPassPipelineState(VertexShader, PixelShader, BlendState, DepthStencilState),
  163. PassParameters,
  164. DrawFlags,
  165. [VertexShader, PixelShader, PassParameters](FRHICommandList& RHICmdList)
  166. {
  167. SetShaderParameters(RHICmdList, VertexShader, VertexShader.GetVertexShader(), PassParameters->Tonemap);
  168. SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
  169. });
  170. }
  171. return MoveTemp(Output);
  172. }

由于在笔者的PC上运行的是PS分支的色调映射,下面就分析其使用的PS及相关代码:

  1. // Engine\Shaders\Private\PostProcessTonemap.usf
  2. // PS入口.
  3. void MainPS(
  4. in noperspective float2 UV : TEXCOORD0,
  5. in noperspective float3 InExposureScaleVignette : TEXCOORD1,
  6. in noperspective float4 GrainUV : TEXCOORD2,
  7. in noperspective float2 ScreenPos : TEXCOORD3,
  8. in noperspective float2 FullViewUV : TEXCOORD4,
  9. float4 SvPosition : SV_POSITION, // after all interpolators
  10. out float4 OutColor : SV_Target0
  11. )
  12. {
  13. OutColor = TonemapCommonPS(UV, InExposureScaleVignette, GrainUV, ScreenPos, FullViewUV, SvPosition);
  14. }

PS主入口会调用TonemapCommonPS。由于TonemapCommonPS存在大量宏定义,影响主流程的分析,下面直接给出用RenderDoc截帧得到的简化版代码:

  1. // 注意是RenderDoc截帧得到的简化版代码, 非原版代码.
  2. float4 TonemapCommonPS(
  3. float2 UV,
  4. float3 ExposureScaleVignette,
  5. float4 GrainUV,
  6. float2 ScreenPos,
  7. float2 FullViewUV,
  8. float4 SvPosition
  9. )
  10. {
  11. float4 OutColor = 0;
  12. const float OneOverPreExposure = View_OneOverPreExposure;
  13. float Grain = GrainFromUV(GrainUV.zw);
  14. float2 SceneUV = UV.xy;
  15. // 获取场景颜色
  16. float4 SceneColor = SampleSceneColor(SceneUV);
  17. SceneColor.rgb *= OneOverPreExposure;
  18. float ExposureScale = ExposureScaleVignette.x;
  19. float SharpenMultiplierDiv6 = TonemapperParams.y;
  20. float3 LinearColor = SceneColor.rgb * ColorScale0.rgb;
  21. // Bloom
  22. float2 BloomUV = ColorToBloom_Scale * UV + ColorToBloom_Bias;
  23. BloomUV = clamp(BloomUV, Bloom_UVViewportBilinearMin, Bloom_UVViewportBilinearMax);
  24. float4 CombinedBloom = Texture2DSample(BloomTexture, BloomSampler, BloomUV);
  25. CombinedBloom.rgb *= OneOverPreExposure;
  26. // 镜头污垢(Dirt Mask)参数.
  27. float2 DirtLensUV = ConvertScreenViewportSpaceToLensViewportSpace(ScreenPos) * float2(1.0f, -1.0f);
  28. float3 BloomDirtMaskColor = Texture2DSample(BloomDirtMaskTexture, BloomDirtMaskSampler, DirtLensUV * .5f + .5f).rgb * BloomDirtMaskTint.rgb;
  29. LinearColor += CombinedBloom.rgb * (ColorScale1.rgb + BloomDirtMaskColor);
  30. LinearColor *= ExposureScale;
  31. // 暗角.
  32. LinearColor.rgb *= ComputeVignetteMask( ExposureScaleVignette.yz, TonemapperParams.x );
  33. // LUT
  34. float3 OutDeviceColor = ColorLookupTable(LinearColor);
  35. // 颗粒.
  36. float LuminanceForPostProcessAA = dot(OutDeviceColor, float3 (0.299f, 0.587f, 0.114f));
  37. float GrainQuantization = 1.0/256.0;
  38. float GrainAdd = (Grain * GrainQuantization) + (-0.5 * GrainQuantization);
  39. OutDeviceColor.rgb += GrainAdd;
  40. OutColor = float4(OutDeviceColor, saturate(LuminanceForPostProcessAA));
  41. // HDR输出.
  42. [branch]
  43. if(bOutputInHDR)
  44. {
  45. OutColor.rgb = ST2084ToLinear(OutColor.rgb);
  46. OutColor.rgb = OutColor.rgb / EditorNITLevel;
  47. OutColor.rgb = LinearToPostTonemapSpace(OutColor.rgb);
  48. }
  49. return OutColor;
  50. }

色调映射阶段处理并组合了颗粒、暗角、Bloom、曝光、LUT、HDR输出等效果。不过,这里有点奇怪:为什么没有找到色调映射相关的代码?

结合RenderDoc截帧,可以发现端倪,原来答案就藏在ColorLookupTable:这里的LUT查找不是简单的Color Grading之类的效果,而是已经把色调映射烘焙进了LUT。下面进入ColorLookupTable:

  1. Texture3D ColorGradingLUT;
  2. SamplerState ColorGradingLUTSampler;
  3. static const float LUTSize = 32;
  4. float3 LinToLog( float3 LinearColor )
  5. {
  6. const float LinearRange = 14;
  7. const float LinearGrey = 0.18;
  8. const float ExposureGrey = 444;
  9. // 使用精简的“纯对数”公式, 以灰点和覆盖的动态范围作为参数.
  10. float3 LogColor = log2(LinearColor) / LinearRange - log2(LinearGrey) / LinearRange + ExposureGrey / 1023.0;
  11. LogColor = saturate( LogColor );
  12. return LogColor;
  13. }
  14. float3 ColorLookupTable( float3 LinearColor )
  15. {
  16. float3 LUTEncodedColor;
  17. // 线性转Log空间.
  18. LUTEncodedColor = LinToLog( LinearColor + LogToLin( 0 ) );
  19. // 缩放和偏移, 将颜色值对齐到LUT纹素中心.
  20. float3 UVW = LUTEncodedColor * ((LUTSize - 1) / LUTSize) + (0.5f / LUTSize);
  21. // 采样3D的ColorGradingLUT纹理.
  22. float3 OutDeviceColor = Texture3DSample( ColorGradingLUT, ColorGradingLUTSampler, UVW ).rgb;
  23. return OutDeviceColor * 1.05;
  24. }
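上面UVW的缩放和偏移,是把[0, 1]的颜色值重映射到LUT纹素中心覆盖的范围,避免在边界纹素处插值出界。以LUTSize = 32为例:

\[UVW = c \cdot \frac{LUTSize - 1}{LUTSize} + \frac{0.5}{LUTSize} = \frac{31c}{32} + \frac{1}{64}
\]

即\(c=0\)时采样第一个纹素的中心(0.5/32),\(c=1\)时采样最后一个纹素的中心(31.5/32)。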

下面简单分析ColorGradingLUT的生成过程。从截帧数据可以看到ColorGradingLUT是在Tonemap之前的CombineLUT生成的:

它是一个32x32x32的3D纹理,下图是切片0的颜色值(放大8倍):

需要注意的是,ColorGradingLUT会每帧动态生成,根据场景颜色动态调整,不是生成一次之后就缓存起来。下面直接进入其使用的PS代码(RenderDoc截帧的简化版本):

  1. // Engine\Shaders\Private\PostProcessCombineLUTs.usf
  2. // 校正颜色.
  3. float3 ColorCorrect( float3 WorkingColor,
  4. float4 ColorSaturation,
  5. float4 ColorContrast,
  6. float4 ColorGamma,
  7. float4 ColorGain,
  8. float4 ColorOffset )
  9. {
  10. float Luma = dot( WorkingColor, AP1_RGB2Y );
  11. WorkingColor = max( 0, lerp( Luma.xxx, WorkingColor, ColorSaturation.xyz*ColorSaturation.w ) );
  12. WorkingColor = pow( WorkingColor * (1.0 / 0.18), ColorContrast.xyz*ColorContrast.w ) * 0.18;
  13. WorkingColor = pow( WorkingColor, 1.0 / (ColorGamma.xyz*ColorGamma.w) );
  14. WorkingColor = WorkingColor * (ColorGain.xyz * ColorGain.w) + (ColorOffset.xyz + ColorOffset.w);
  15. return WorkingColor;
  16. }
  17. // 对颜色的阴影/中调/高调执行校正.
  18. float3 ColorCorrectAll( float3 WorkingColor )
  19. {
  20. float Luma = dot( WorkingColor, AP1_RGB2Y );
  21. float3 CCColorShadows = ColorCorrect(...);
  22. float CCWeightShadows = 1- smoothstep(0, ColorCorrectionShadowsMax, Luma);
  23. float3 CCColorHighlights = ColorCorrect(...);
  24. float CCWeightHighlights = smoothstep(ColorCorrectionHighlightsMin, 1, Luma);
  25. float3 CCColorMidtones = ColorCorrect(...);
  26. float CCWeightMidtones = 1 - CCWeightShadows - CCWeightHighlights;
  27. float3 WorkingColorSMH = CCColorShadows*CCWeightShadows + CCColorMidtones*CCWeightMidtones + CCColorHighlights*CCWeightHighlights;
  28. return WorkingColorSMH;
  29. }
  30. float BlueCorrection;
  31. float ExpandGamut;
  32. float ToneCurveAmount;
  33. float4 CombineLUTsCommon(float2 InUV, uint InLayerIndex)
  34. {
  35. // 计算自然色彩.
  36. float4 Neutral;
  37. {
  38. float2 UV = InUV - float2(0.5f / LUTSize, 0.5f / LUTSize);
  39. Neutral = float4(UV * LUTSize / (LUTSize - 1), InLayerIndex / (LUTSize - 1), 0);
  40. }
  41. float4 OutColor = 0;
  42. // 初始化颜色转换系数.
  43. const float3x3 sRGB_2_AP1 = mul( XYZ_2_AP1_MAT, mul( D65_2_D60_CAT, sRGB_2_XYZ_MAT ) );
  44. const float3x3 AP1_2_sRGB = mul( XYZ_2_sRGB_MAT, mul( D60_2_D65_CAT, AP1_2_XYZ_MAT ) );
  45. const float3x3 AP0_2_AP1 = mul( XYZ_2_AP1_MAT, AP0_2_XYZ_MAT );
  46. const float3x3 AP1_2_AP0 = mul( XYZ_2_AP0_MAT, AP1_2_XYZ_MAT );
  47. const float3x3 AP1_2_Output = OuputGamutMappingMatrix( OutputGamut );
  48. float3 LUTEncodedColor = Neutral.rgb;
  49. float3 LinearColor;
  50. if (GetOutputDevice() >= 3)
  51. LinearColor = ST2084ToLinear(LUTEncodedColor) * LinearToNitsScaleInverse;
  52. else
  53. LinearColor = LogToLin( LUTEncodedColor ) - LogToLin( 0 );
  54. // 色彩平衡.
  55. float3 BalancedColor = WhiteBalance( LinearColor );
  56. // 计算颜色调整系数.
  57. float3 ColorAP1 = mul( sRGB_2_AP1, BalancedColor );
  58. if (!bUseMobileTonemapper)
  59. {
  60. float LumaAP1 = dot( ColorAP1, AP1_RGB2Y );
  61. float3 ChromaAP1 = ColorAP1 / LumaAP1;
  62. float ChromaDistSqr = dot( ChromaAP1 - 1, ChromaAP1 - 1 );
  63. float ExpandAmount = ( 1 - exp2( -4 * ChromaDistSqr ) ) * ( 1 - exp2( -4 * ExpandGamut * LumaAP1*LumaAP1 ) );
  64. const float3x3 Wide_2_XYZ_MAT =
  65. {
  66. 0.5441691, 0.2395926, 0.1666943,
  67. 0.2394656, 0.7021530, 0.0583814,
  68. -0.0023439, 0.0361834, 1.0552183,
  69. };
  70. const float3x3 Wide_2_AP1 = mul( XYZ_2_AP1_MAT, Wide_2_XYZ_MAT );
  71. const float3x3 ExpandMat = mul( Wide_2_AP1, AP1_2_sRGB );
  72. float3 ColorExpand = mul( ExpandMat, ColorAP1 );
  73. ColorAP1 = lerp( ColorAP1, ColorExpand, ExpandAmount );
  74. }
  75. // 校正颜色的高中低调.
  76. ColorAP1 = ColorCorrectAll( ColorAP1 );
  77. float3 GradedColor = mul( AP1_2_sRGB, ColorAP1 );
  78. // 蓝色校正.
  79. const float3x3 BlueCorrect =
  80. {
  81. 0.9404372683, -0.0183068787, 0.0778696104,
  82. 0.0083786969, 0.8286599939, 0.1629613092,
  83. 0.0005471261, -0.0008833746, 1.0003362486
  84. };
  85. const float3x3 BlueCorrectInv =
  86. {
  87. 1.06318, 0.0233956, -0.0865726,
  88. -0.0106337, 1.20632, -0.19569,
  89. -0.000590887, 0.00105248, 0.999538
  90. };
  91. const float3x3 BlueCorrectAP1 = mul( AP0_2_AP1, mul( BlueCorrect, AP1_2_AP0 ) );
  92. const float3x3 BlueCorrectInvAP1 = mul( AP0_2_AP1, mul( BlueCorrectInv, AP1_2_AP0 ) );
  93. ColorAP1 = lerp( ColorAP1, mul( BlueCorrectAP1, ColorAP1 ), BlueCorrection );
  94. // Film色调映射.
  95. float3 ToneMappedColorAP1 = FilmToneMap( ColorAP1 );
  96. ColorAP1 = lerp(ColorAP1, ToneMappedColorAP1, ToneCurveAmount);
  97. ColorAP1 = lerp( ColorAP1, mul( BlueCorrectInvAP1, ColorAP1 ), BlueCorrection );
  98. float3 FilmColor = max(0, mul( AP1_2_sRGB, ColorAP1 ));
  99. FilmColor = ColorCorrection( FilmColor );
  100. float3 FilmColorNoGamma = lerp( FilmColor * ColorScale, OverlayColor.rgb, OverlayColor.a );
  101. GradedColor = lerp(GradedColor * ColorScale, OverlayColor.rgb, OverlayColor.a);
  102. FilmColor = pow( max(0, FilmColorNoGamma), InverseGamma.y );
  103. float3 OutDeviceColor = 0;
  104. // 根据输出设备的类型调用不同的色彩处理, 默认是0.
  105. if( GetOutputDevice() == 0 )
  106. {
  107. // 高阶(原始)颜色是FilmColor.
  108. float3 OutputGamutColor = FilmColor;
  109. // 线性空间转到sRGB.
  110. OutDeviceColor = LinearToSrgb( OutputGamutColor );
  111. }
  112. else if( GetOutputDevice() == 1 )
  113. {
  114. float3 OutputGamutColor = mul( AP1_2_Output, mul( sRGB_2_AP1, FilmColor ) );
  115. OutDeviceColor = LinearTo709Branchless( OutputGamutColor );
  116. }
  117. else if( GetOutputDevice() == 3 || GetOutputDevice() == 5 )
  118. {
  119. float3 ODTColor = ACESOutputTransforms1000( GradedColor );
  120. ODTColor = mul( AP1_2_Output, ODTColor );
  121. OutDeviceColor = LinearToST2084( ODTColor );
  122. }
  123. else if( GetOutputDevice() == 4 || GetOutputDevice() == 6 )
  124. {
  125. float3 ODTColor = ACESOutputTransforms2000( GradedColor );
  126. ODTColor = mul( AP1_2_Output, ODTColor );
  127. OutDeviceColor = LinearToST2084( ODTColor );
  128. }
  129. else if( GetOutputDevice() == 7 )
  130. {
  131. float3 OutputGamutColor = mul( AP1_2_Output, mul( sRGB_2_AP1, GradedColor ) );
  132. OutDeviceColor = LinearToST2084( OutputGamutColor );
  133. }
  134. else if( GetOutputDevice() == 8 )
  135. {
  136. OutDeviceColor = GradedColor;
  137. }
  138. else if (GetOutputDevice() == 9)
  139. {
  140. float3 OutputGamutColor = mul(AP1_2_Output, mul(sRGB_2_AP1, FilmColorNoGamma));
  141. OutDeviceColor = OutputGamutColor;
  142. }
  143. else
  144. {
  145. float3 OutputGamutColor = mul( AP1_2_Output, mul( sRGB_2_AP1, FilmColor ) );
  146. OutDeviceColor = pow( OutputGamutColor, InverseGamma.z );
  147. }
  148. OutColor.rgb = OutDeviceColor / 1.05;
  149. OutColor.a = 0;
  150. return OutColor;
  151. }
  152. // PS主入口.
  153. void MainPS(FWriteToSliceGeometryOutput Input, out float4 OutColor : SV_Target0)
  154. {
  155. OutColor = CombineLUTsCommon(Input.Vertex.UV, Input.LayerIndex);
  156. }

上面密密麻麻布满了颜色的系数计算、空间转换,着实让人眼花,不过我们只需要重点关注FilmToneMap:

  1. // Engine\Shaders\Private\TonemapCommon.ush
  2. // 在后处理体积中编辑得到, 然后由C++传入.
  3. float FilmSlope;
  4. float FilmToe;
  5. float FilmShoulder;
  6. float FilmBlackClip;
  7. float FilmWhiteClip;
  8. half3 FilmToneMap( half3 LinearColor )
  9. {
  10. const float3x3 sRGB_2_AP0 = mul( XYZ_2_AP0_MAT, mul( D65_2_D60_CAT, sRGB_2_XYZ_MAT ) );
  11. const float3x3 sRGB_2_AP1 = mul( XYZ_2_AP1_MAT, mul( D65_2_D60_CAT, sRGB_2_XYZ_MAT ) );
  12. const float3x3 AP0_2_sRGB = mul( XYZ_2_sRGB_MAT, mul( D60_2_D65_CAT, AP0_2_XYZ_MAT ) );
  13. const float3x3 AP1_2_sRGB = mul( XYZ_2_sRGB_MAT, mul( D60_2_D65_CAT, AP1_2_XYZ_MAT ) );
  14. const float3x3 AP0_2_AP1 = mul( XYZ_2_AP1_MAT, AP0_2_XYZ_MAT );
  15. const float3x3 AP1_2_AP0 = mul( XYZ_2_AP0_MAT, AP1_2_XYZ_MAT );
  16. float3 ColorAP1 = mul( sRGB_2_AP1, LinearColor ); float3 ColorAP0 = mul( AP1_2_AP0, ColorAP1 ); // 将线性sRGB颜色转换到ACES的AP1和AP0色彩空间.
  17. #if 1
  18. // 发光模块常数
  19. const float RRT_GLOW_GAIN = 0.05;
  20. const float RRT_GLOW_MID = 0.08;
  21. float saturation = rgb_2_saturation( ColorAP0 );
  22. float ycIn = rgb_2_yc( ColorAP0 );
  23. float s = sigmoid_shaper( (saturation - 0.4) / 0.2);
  24. float addedGlow = 1 + glow_fwd( ycIn, RRT_GLOW_GAIN * s, RRT_GLOW_MID);
  25. ColorAP0 *= addedGlow;
  26. #endif
  27. #if 1
  28. // --- 红色修改系数 --- //
  29. const float RRT_RED_SCALE = 0.82;
  30. const float RRT_RED_PIVOT = 0.03;
  31. const float RRT_RED_HUE = 0;
  32. const float RRT_RED_WIDTH = 135;
  33. float hue = rgb_2_hue( ColorAP0 );
  34. float centeredHue = center_hue( hue, RRT_RED_HUE );
  35. float hueWeight = Square( smoothstep( 0, 1, 1 - abs( 2 * centeredHue / RRT_RED_WIDTH ) ) );
  36. ColorAP0.r += hueWeight * saturation * (RRT_RED_PIVOT - ColorAP0.r) * (1. - RRT_RED_SCALE);
  37. #endif
  38. // 使用ACEScg基色(primaries)作为工作空间.
  39. float3 WorkingColor = mul( AP0_2_AP1_MAT, ColorAP0 );
  40. WorkingColor = max( 0, WorkingColor );
  41. // 准备降低饱和度.
  42. WorkingColor = lerp( dot( WorkingColor, AP1_RGB2Y ), WorkingColor, 0.96 );
  43. const half ToeScale = 1 + FilmBlackClip - FilmToe;
  44. const half ShoulderScale = 1 + FilmWhiteClip - FilmShoulder;
  45. const float InMatch = 0.18;
  46. const float OutMatch = 0.18;
  47. float ToeMatch;
  48. if( FilmToe > 0.8 )
  49. {
  50. // 0.18 will be on straight segment
  51. ToeMatch = ( 1 - FilmToe - OutMatch ) / FilmSlope + log10( InMatch );
  52. }
  53. else
  54. {
  55. // 0.18 will be on toe segment
  56. // Solve for ToeMatch such that input of InMatch gives output of OutMatch.
  57. const float bt = ( OutMatch + FilmBlackClip ) / ToeScale - 1;
  58. ToeMatch = log10( InMatch ) - 0.5 * log( (1+bt)/(1-bt) ) * (ToeScale / FilmSlope);
  59. }
  60. float StraightMatch = ( 1 - FilmToe ) / FilmSlope - ToeMatch;
  61. float ShoulderMatch = FilmShoulder / FilmSlope - StraightMatch;
  62. half3 LogColor = log10( WorkingColor );
  63. half3 StraightColor = FilmSlope * ( LogColor + StraightMatch );
  64. half3 ToeColor = ( -FilmBlackClip ) + (2 * ToeScale) / ( 1 + exp( (-2 * FilmSlope / ToeScale) * ( LogColor - ToeMatch ) ) );
  65. half3 ShoulderColor = ( 1 + FilmWhiteClip ) - (2 * ShoulderScale) / ( 1 + exp( ( 2 * FilmSlope / ShoulderScale) * ( LogColor - ShoulderMatch ) ) );
  66. ToeColor = LogColor < ToeMatch ? ToeColor : StraightColor;
  67. ShoulderColor = LogColor > ShoulderMatch ? ShoulderColor : StraightColor;
  68. half3 t = saturate( ( LogColor - ToeMatch ) / ( ShoulderMatch - ToeMatch ) );
  69. t = ShoulderMatch < ToeMatch ? 1 - t : t;
  70. t = (3-2*t)*t*t;
  71. half3 ToneColor = lerp( ToeColor, ShoulderColor, t );
  72. // 后置降饱和度
  73. ToneColor = lerp( dot( float3(ToneColor), AP1_RGB2Y ), ToneColor, 0.93 );
  74. return max( 0, ToneColor );
  75. }

可知UE的Film Tonemapping(电影色调映射)除了常规的色彩空间转换和曲线映射,还增加了Slope(斜率)、Toe(脚趾)、Shoulder(肩部)、Black Clip(黑色裁剪)、White Clip(白色裁剪)等不同色阶的调整,以便艺术家精确地控制画面效果。

它们在后处理体积中可以编辑:

上图是UE的默认值,但实际上在代码中,UE还给出了不同游戏和配置的参考系数:

  1. // Default settings
  2. Slope = 0.88;
  3. Toe = 0.55;
  4. Shoulder = 0.26;
  5. BlackClip= 0;
  6. WhiteClip = 0.04;
  7. // Uncharted settings
  8. Slope = 0.63;
  9. Toe = 0.55;
  10. Shoulder = 0.47;
  11. BlackClip= 0;
  12. WhiteClip = 0.01;
  13. // HP settings
  14. Slope = 0.65;
  15. Toe = 0.63;
  16. Shoulder = 0.45;
  17. BlackClip = 0;
  18. WhiteClip = 0;
  19. // Legacy settings
  20. Slope = 0.98;
  21. Toe = 0.3;
  22. Shoulder = 0.22;
  23. BlackClip = 0;
  24. WhiteClip = 0.025;
  25. // ACES settings
  26. Slope = 0.91;
  27. Toe = 0.53;
  28. Shoulder = 0.23;
  29. BlackClip = 0;
  30. WhiteClip = 0.035;

更加具体的参数含义和效果变化参见官方文档:Color Grading and Filmic Tonemapper。

值得一提的是,前述代码隐含了大量的色彩空间转换、色调曲线映射等知识点,如果没有接触过这类知识,将会云里雾里。幸好,它们可以在这篇文献Tone Mapping找到理论依据和参考实现,值得一读。通篇理解之后,将会豁然开朗,之前的很多疑团将被解开!

参考文献Tone Mapping展示XYZ色彩空间转换到sRGB的过程和公式。

UE存在屏幕百分比(Screen Percentage)技术,用于以比显示屏幕更低的分辨率进行渲染,然后上采样到指定的屏幕分辨率。它有两种屏幕百分比:Primary Screen Percentage(主屏幕百分比)和Secondary Screen Percentage(次屏幕百分比)。

Primary Screen Percentage是用户可以设置和修改的分辨率比例,其思想是以较低的分辨率渲染帧,然后在绘制用户界面(UI)之前将其上采样。Secondary Screen Percentage是在Primary Screen Percentage之后执行的最后一次分辨率上采样通道,它不可在运行时动态修改,主要用于高DPI但性能较弱的设备,使其以较低的分辨率渲染,再上采样到高DPI分辨率。

也就是说,渲染和大部分后处理效果在较低的分辨率下进行,随后由主屏幕百分比(PrimaryUpscale)上采样,最后由次屏幕百分比(SecondaryUpscale)放大到适配屏幕的最终分辨率,之后才绘制UI。

在场景视图中可以调整主屏幕百分比:

在后处理体积中也可以设置主屏幕百分比:

当然,还可以通过控制台命令来更改:

  1. r.ScreenPercentage 100
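此外,次屏幕百分比虽然不能在运行时按帧动态调整,但也有对应的控制台变量可供设置(以4.26源码中的定义为准),例如:

  1. r.SecondaryScreenPercentage.GameViewport 75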

它们的实现在后处理渲染管线的最后阶段:

  1. void AddPostProcessingPasses(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FPostProcessingInputs& Inputs)
  2. {
  3. (......)
  4. // 主放大
  5. if (PassSequence.IsEnabled(EPass::PrimaryUpscale))
  6. {
  7. FUpscaleInputs PassInputs;
  8. PassSequence.AcceptOverrideIfLastPass(EPass::PrimaryUpscale, PassInputs.OverrideOutput);
  9. PassInputs.SceneColor = SceneColor;
  10. PassInputs.Method = GetUpscaleMethod();
  11. PassInputs.Stage = PassSequence.IsEnabled(EPass::SecondaryUpscale) ? EUpscaleStage::PrimaryToSecondary : EUpscaleStage::PrimaryToOutput;
  12. // Panini projection is handled by the primary upscale pass.
  13. PassInputs.PaniniConfig = PaniniConfig;
  14. SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
  15. }
  16. // 次放大
  17. if (PassSequence.IsEnabled(EPass::SecondaryUpscale))
  18. {
  19. FUpscaleInputs PassInputs;
  20. PassSequence.AcceptOverrideIfLastPass(EPass::SecondaryUpscale, PassInputs.OverrideOutput);
  21. PassInputs.SceneColor = SceneColor;
  22. PassInputs.Method = View.Family->SecondaryScreenPercentageMethod == ESecondaryScreenPercentageMethod::LowerPixelDensitySimulation ? EUpscaleMethod::SmoothStep : EUpscaleMethod::Nearest;
  23. PassInputs.Stage = EUpscaleStage::SecondaryToOutput;
  24. SceneColor = AddUpscalePass(GraphBuilder, View, PassInputs);
  25. }
  26. }

主放大和次放大都调用了AddUpscalePass

  1. // Engine\Source\Runtime\Renderer\Private\PostProcess\PostProcessUpscale.cpp
  2. FScreenPassTexture AddUpscalePass(FRDGBuilder& GraphBuilder, const FViewInfo& View, const FUpscaleInputs& Inputs)
  3. {
  4. FScreenPassRenderTarget Output = Inputs.OverrideOutput;
  5. // 创建新的输出纹理.
  6. if (!Output.IsValid())
  7. {
  8. FRDGTextureDesc OutputDesc = Inputs.SceneColor.Texture->Desc;
  9. OutputDesc.Reset();
  10. if (Inputs.Stage == EUpscaleStage::PrimaryToSecondary)
  11. {
  12. const FIntPoint SecondaryViewRectSize = View.GetSecondaryViewRectSize();
  13. QuantizeSceneBufferSize(SecondaryViewRectSize, OutputDesc.Extent);
  14. Output.ViewRect.Min = FIntPoint::ZeroValue;
  15. Output.ViewRect.Max = SecondaryViewRectSize;
  16. }
  17. else
  18. {
  19. OutputDesc.Extent = View.UnscaledViewRect.Max;
  20. Output.ViewRect = View.UnscaledViewRect;
  21. }
  22. OutputDesc.Flags |= GFastVRamConfig.Upscale;
  23. Output.Texture = GraphBuilder.CreateTexture(OutputDesc, TEXT("Upscale"));
  24. Output.LoadAction = ERenderTargetLoadAction::EClear;
  25. }
  26. const FScreenPassTextureViewport InputViewport(Inputs.SceneColor);
  27. const FScreenPassTextureViewport OutputViewport(Output);
  28. // Panini投影.
  29. FPaniniProjectionConfig PaniniConfig = Inputs.PaniniConfig;
  30. PaniniConfig.Sanitize();
  31. const bool bUsePaniniProjection = PaniniConfig.IsEnabled();
  32. // 上采样参数.
  33. FUpscaleParameters* PassParameters = GraphBuilder.AllocParameters<FUpscaleParameters>();
  34. PassParameters->RenderTargets[0] = Output.GetRenderTargetBinding();
  35. PassParameters->Input = GetScreenPassTextureViewportParameters(InputViewport);
  36. PassParameters->Output = GetScreenPassTextureViewportParameters(OutputViewport);
  37. PassParameters->SceneColorTexture = Inputs.SceneColor.Texture;
  38. PassParameters->SceneColorSampler = TStaticSamplerState<SF_Bilinear, AM_Border, AM_Border, AM_Border>::GetRHI();
  39. PassParameters->PointSceneColorTexture = Inputs.SceneColor.Texture;
  40. PassParameters->PointSceneColorTextureArray = Inputs.SceneColor.Texture;
  41. PassParameters->PointSceneColorSampler = TStaticSamplerState<SF_Point, AM_Border, AM_Border, AM_Border>::GetRHI();
  42. PassParameters->Panini = GetPaniniProjectionParameters(PaniniConfig, View);
  43. PassParameters->UpscaleSoftness = FMath::Clamp(CVarUpscaleSoftness.GetValueOnRenderThread(), 0.0f, 1.0f);
  44. PassParameters->View = View.ViewUniformBuffer;
  45. // 处理FUpscalePS.
  46. FUpscalePS::FPermutationDomain PixelPermutationVector;
  47. PixelPermutationVector.Set<FUpscalePS::FMethodDimension>(Inputs.Method);
  48. TShaderMapRef<FUpscalePS> PixelShader(View.ShaderMap, PixelPermutationVector);
  49. const TCHAR* const StageNames[] = { TEXT("PrimaryToSecondary"), TEXT("PrimaryToOutput"), TEXT("SecondaryToOutput") };
  50. static_assert(UE_ARRAY_COUNT(StageNames) == static_cast<uint32>(EUpscaleStage::MAX), "StageNames does not match EUpscaleStage");
  51. const TCHAR* StageName = StageNames[static_cast<uint32>(Inputs.Stage)];
  52. GraphBuilder.AddPass(
  53. RDG_EVENT_NAME("Upscale (%s) %dx%d", StageName, Output.ViewRect.Width(), Output.ViewRect.Height()),
  54. PassParameters,
  55. ERDGPassFlags::Raster,
  56. [&View, bUsePaniniProjection, PixelShader, PassParameters, InputViewport, OutputViewport](FRHICommandList& RHICmdList)
  57. {
  58. RHICmdList.SetViewport(OutputViewport.Rect.Min.X, OutputViewport.Rect.Min.Y, 0.0f, OutputViewport.Rect.Max.X, OutputViewport.Rect.Max.Y, 1.0f);
  59. TShaderRef<FShader> VertexShader;
  60. // Panini投影使用特殊的VS. 亦即在VS里处理Panini投影.
  61. if (bUsePaniniProjection)
  62. {
  63. TShaderMapRef<FUpscaleVS> TypedVertexShader(View.ShaderMap);
  64. SetScreenPassPipelineState(RHICmdList, FScreenPassPipelineState(TypedVertexShader, PixelShader));
  65. SetShaderParameters(RHICmdList, TypedVertexShader, TypedVertexShader.GetVertexShader(), *PassParameters);
  66. VertexShader = TypedVertexShader;
  67. }
  68. else
  69. {
  70. TShaderMapRef<FScreenPassVS> TypedVertexShader(View.ShaderMap);
  71. SetScreenPassPipelineState(RHICmdList, FScreenPassPipelineState(TypedVertexShader, PixelShader));
  72. VertexShader = TypedVertexShader;
  73. }
  74. check(VertexShader.IsValid());
  75. SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
  76. // 全屏绘制.
  77. DrawRectangle(
  78. RHICmdList,
  79. // Output Rect (RHI viewport relative).
  80. 0, 0, OutputViewport.Rect.Width(), OutputViewport.Rect.Height(),
  81. // Input Rect
  82. InputViewport.Rect.Min.X, InputViewport.Rect.Min.Y, InputViewport.Rect.Width(), InputViewport.Rect.Height(),
  83. OutputViewport.Rect.Size(),
  84. InputViewport.Extent,
  85. VertexShader,
  86. // Panini投影使用曲面细分.
  87. bUsePaniniProjection ? EDRF_UseTesselatedIndexBuffer : EDRF_UseTriangleOptimization);
  88. });
  89. return MoveTemp(Output);
  90. }

以上可知,根据是否启用Panini投影,会使用不同的VS,但PS是一样的,都是FUpscalePS。下面分析FUpscalePS的shader代码:

  1. // Engine\Shaders\Private\PostProcessUpscale.usf
  2. void MainPS(noperspective float4 UVAndScreenPos : TEXCOORD0, float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
  3. {
  4. OutColor = 0;
  5. // 最近点上采样.(不会模糊, 但有块状)
  6. #if METHOD == UPSCALE_METHOD_NEAREST
  7. #if ES3_1_PROFILE
  8. #if MOBILE_MULTI_VIEW
  9. OutColor = Texture2DArraySample(PointSceneColorTextureArray, PointSceneColorSampler, float3(UVAndScreenPos.xy,0));
  10. #else
  11. OutColor = Texture2DSample(PointSceneColorTexture, PointSceneColorSampler, UVAndScreenPos.xy);
  12. #endif
  13. #else
  14. #if MOBILE_MULTI_VIEW
  15. OutColor = PointSceneColorTextureArray.SampleLevel(PointSceneColorSampler, vec3(UVAndScreenPos.xy,0), 0, int2(0, 0));
  16. #else
  17. OutColor = PointSceneColorTexture.SampleLevel(PointSceneColorSampler, UVAndScreenPos.xy, 0, int2(0, 0));
  18. #endif
  19. #endif
  20. // 双线性上采样.(快, 但有锯齿)
  21. #elif METHOD == UPSCALE_METHOD_BILINEAR
  22. OutColor.rgb = SampleSceneColorRGB(UVAndScreenPos.xy);
  23. // 定向模糊上采样, 使用了不锐利蒙版(Unsharp Mask).
  24. #elif METHOD == UPSCALE_METHOD_DIRECTIONAL
  25. float2 UV = UVAndScreenPos.xy;
  26. float X = 0.5;
  27. float3 ColorNW = SampleSceneColorRGB(UV + float2(-X, -X) * Input_ExtentInverse);
  28. float3 ColorNE = SampleSceneColorRGB(UV + float2( X, -X) * Input_ExtentInverse);
  29. float3 ColorSW = SampleSceneColorRGB(UV + float2(-X, X) * Input_ExtentInverse);
  30. float3 ColorSE = SampleSceneColorRGB(UV + float2( X, X) * Input_ExtentInverse);
  31. OutColor.rgb = (ColorNW * 0.25) + (ColorNE * 0.25) + (ColorSW * 0.25) + (ColorSE * 0.25);
  32. float LumaNW = Luma(ColorNW);
  33. float LumaNE = Luma(ColorNE);
  34. float LumaSW = Luma(ColorSW);
  35. float LumaSE = Luma(ColorSE);
  36. float2 IsoBrightnessDir;
  37. float DirSWMinusNE = LumaSW - LumaNE;
  38. float DirSEMinusNW = LumaSE - LumaNW;
  39. IsoBrightnessDir.x = DirSWMinusNE + DirSEMinusNW;
  40. IsoBrightnessDir.y = DirSWMinusNE - DirSEMinusNW;
  41. // avoid NaN on zero vectors by adding 2^-24 (float ulp when length==1, and also minimum representable half)
  42. IsoBrightnessDir = IsoBrightnessDir * (0.125 * rsqrt(dot(IsoBrightnessDir, IsoBrightnessDir) + 6e-8));
  43. float3 ColorN = SampleSceneColorRGB(UV - IsoBrightnessDir * Input_ExtentInverse);
  44. float3 ColorP = SampleSceneColorRGB(UV + IsoBrightnessDir * Input_ExtentInverse);
  45. float UnsharpMask = 0.25;
  46. OutColor.rgb = (ColorN + ColorP) * ((UnsharpMask + 1.0) * 0.5) - (OutColor.rgb * UnsharpMask);
  47. // 双立方的Catmull-Rom上采样, 每像素使用5个采样点.
  48. #elif METHOD == UPSCALE_METHOD_CATMULL_ROM
  49. FCatmullRomSamples Samples = GetBicubic2DCatmullRomSamples(UVAndScreenPos.xy, Input_Extent, Input_ExtentInverse);
  50. for (uint i = 0; i < Samples.Count; i++)
  51. {
  52. OutColor.rgb += SampleSceneColorRGB(Samples.UV[i]) * Samples.Weight[i];
  53. }
  54. OutColor *= Samples.FinalMultiplier;
  55. // LANCZOS上采样.
  56. #elif METHOD == UPSCALE_METHOD_LANCZOS
  57. {
  58. // Lanczos 3
  59. float2 UV = UVAndScreenPos.xy * Input_Extent;
  60. float2 tc = floor(UV - 0.5) + 0.5;
  61. float2 f = UV - tc + 2;
  62. // compute at f, f-1, f-2, f-3, f-4, and f-5 using trig angle addition
  63. float2 fpi = f*PI, fpi3 = f * (PI / 3.0);
  64. float2 sinfpi = sin(fpi), sinfpi3 = sin(fpi3), cosfpi3 = cos(fpi3);
  65. const float r3 = sqrt(3.0);
  66. float2 w0 = ( sinfpi * sinfpi3 ) / ( f * f );
  67. float2 w1 = (-sinfpi * ( sinfpi3 - r3*cosfpi3)) / ((f - 1.0)*(f - 1.0));
  68. float2 w2 = ( sinfpi * ( -sinfpi3 - r3*cosfpi3)) / ((f - 2.0)*(f - 2.0));
  69. float2 w3 = (-sinfpi * (-2.0*sinfpi3 )) / ((f - 3.0)*(f - 3.0));
  70. float2 w4 = ( sinfpi * ( -sinfpi3 + r3*cosfpi3)) / ((f - 4.0)*(f - 4.0));
  71. float2 w5 = (-sinfpi * ( sinfpi3 + r3*cosfpi3)) / ((f - 5.0)*(f - 5.0));
  72. // use bilinear texture weights to merge center two samples in each dimension
  73. float2 Weight[5];
  74. Weight[0] = w0;
  75. Weight[1] = w1;
  76. Weight[2] = w2 + w3;
  77. Weight[3] = w4;
  78. Weight[4] = w5;
  79. float2 Sample[5];
  80. Sample[0] = Input_ExtentInverse * (tc - 2);
  81. Sample[1] = Input_ExtentInverse * (tc - 1);
  82. Sample[2] = Input_ExtentInverse * (tc + w3 / Weight[2]);
  83. Sample[3] = Input_ExtentInverse * (tc + 2);
  84. Sample[4] = Input_ExtentInverse * (tc + 3);
  85. OutColor = 0;
  86. // 5x5 footprint with corners dropped to give 13 texture taps
  87. OutColor += float4(SampleSceneColorRGB(float2(Sample[0].x, Sample[2].y)), 1) * Weight[0].x * Weight[2].y;
  88. OutColor += float4(SampleSceneColorRGB(float2(Sample[1].x, Sample[1].y)), 1) * Weight[1].x * Weight[1].y;
  89. OutColor += float4(SampleSceneColorRGB(float2(Sample[1].x, Sample[2].y)), 1) * Weight[1].x * Weight[2].y;
  90. OutColor += float4(SampleSceneColorRGB(float2(Sample[1].x, Sample[3].y)), 1) * Weight[1].x * Weight[3].y;
  91. OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[0].y)), 1) * Weight[2].x * Weight[0].y;
  92. OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[1].y)), 1) * Weight[2].x * Weight[1].y;
  93. OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[2].y)), 1) * Weight[2].x * Weight[2].y;
  94. OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[3].y)), 1) * Weight[2].x * Weight[3].y;
  95. OutColor += float4(SampleSceneColorRGB(float2(Sample[2].x, Sample[4].y)), 1) * Weight[2].x * Weight[4].y;
  96. OutColor += float4(SampleSceneColorRGB(float2(Sample[3].x, Sample[1].y)), 1) * Weight[3].x * Weight[1].y;
  97. OutColor += float4(SampleSceneColorRGB(float2(Sample[3].x, Sample[2].y)), 1) * Weight[3].x * Weight[2].y;
  98. OutColor += float4(SampleSceneColorRGB(float2(Sample[3].x, Sample[3].y)), 1) * Weight[3].x * Weight[3].y;
  99. OutColor += float4(SampleSceneColorRGB(float2(Sample[4].x, Sample[2].y)), 1) * Weight[4].x * Weight[2].y;
  100. OutColor /= OutColor.w;
  101. }
  102. // 高斯上采样.
  103. #elif METHOD == UPSCALE_METHOD_GAUSSIAN
  104. {
  105. // Gaussian filtered unsharp mask
  106. float2 UV = UVAndScreenPos.xy * Input_Extent;
  107. float2 tc = floor(UV) + 0.5;
  108. // estimate pixel value and derivatives
  109. OutColor = 0;
  110. float3 Laplacian = 0;
  111. UNROLL for (int i = -3; i <= 2; ++i)
  112. {
  113. UNROLL for (int j = -3; j <= 2; ++j)
  114. {
  115. float2 TexelOffset = float2(i, j) + 0.5;
  116. // skip corners: eliminated entirely by UNROLL
  117. if (dot(TexelOffset, TexelOffset) > 9) continue;
  118. float2 Texel = tc + TexelOffset;
  119. float2 Offset = UV - Texel;
  120. float OffsetSq = 2 * dot(Offset, Offset); // texel loop is optimized for variance = 0.5
  121. float Weight = exp(-0.5 * OffsetSq);
  122. float4 Sample = Weight * float4(SampleSceneColorRGB(Texel * Input_ExtentInverse), 1);
  123. OutColor += Sample;
  124. Laplacian += Sample.rgb * (OffsetSq - 2);
  125. }
  126. }
  127. OutColor /= OutColor.a;
  128. Laplacian /= OutColor.a;
  129. float UnsharpScale = UpscaleSoftness * (1 - Input_Extent.x * Input_Extent.y * Output_ViewportSizeInverse.x * Output_ViewportSizeInverse.y);
  130. OutColor.rgb -= UnsharpScale * Laplacian;
  131. }
  132. // 平滑采样.
  133. #elif METHOD == UPSCALE_METHOD_SMOOTHSTEP
  134. OutColor.rgb = SampleSceneColorRGB(GetSmoothstepUV(UVAndScreenPos.xy, Input_Extent, Input_ExtentInverse));
  135. #endif
  136. }

上面涉及到了部分纹理过滤和采样技术:最近点、双线性、双立方、Lanczos等,其中部分采样曲线示意图如下:

部分曲线的效果对比图如下:

这里说一下最复杂的Lanczos采样算法。Lanczos卷积核的一般形式如下:

\[L(x) = \begin{cases}
1 & \text{if}\ x = 0, \\
\dfrac{a \sin(\pi x) \sin(\pi x / a)}{\pi^2 x^2} & \text{if}\ -a \leq x < a \ \text{and}\ x \neq 0, \\
0 & \text{otherwise}.
\end{cases}
\]

其中\(a\)是正整数,通常是2或3,表示卷积核的尺寸。当\(a=2\)和\(a=3\)时,卷积核曲线如下所示:

利用卷积核公式,可以获得Lanczos的插值(采样)公式:

\[S(x) = \sum_{i=\lfloor x \rfloor - a + 1}^{\lfloor x \rfloor + a} s_{i} L(x - i)
\]

其中\(x\)是采样位置,\(a\)是过滤尺寸大小,\(s_i\)是位置\(i\)处的样本,\(\lfloor x \rfloor\)是floor函数(向下取整)。

不过,上面PS的shader代码并没有完全按照公式实现,而是对三角函数运算和循环语句做了优化。
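作为对照,下面给出直接按上述公式实现的Lanczos卷积核的C++示意(未做任何三角函数优化,非UE源码):

  1. #include <cmath>
  2. // 直接按公式计算Lanczos卷积核权重, a为核的尺寸(通常取2或3).
  3. float LanczosKernel(float x, int a)
  4. {
  5. const float Pi = 3.14159265358979f;
  6. if (x == 0.0f) return 1.0f;
  7. if (x < -a || x >= a) return 0.0f; // 对应公式中 -a <= x < a 之外的区间.
  8. const float PiX = Pi * x;
  9. return a * std::sin(PiX) * std::sin(PiX / a) / (PiX * PiX);
  10. }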

此外,上面的C++代码中涉及到了Panini Projection,它是用来校正广角FOV的透视畸变。

上:未采用Panini Projection,画面产生了明显的畸变;下:采用了Panini Projection,画面恢复正常。

UE内置的抗锯齿算法有MSAA、FXAA、TAA:MSAA是主要用于前向渲染的基于硬件的抗锯齿算法,TAA是主要用于延迟渲染的时间性抗锯齿算法,而FXAA是基于后处理的抗锯齿算法。它们的比较如下表:

| 算法 | 适用管线 | 效果 | 消耗 | 其它描述 |
| --- | --- | --- | --- | --- |
| MSAA | 前向 | 清晰度高,抗锯齿好 | 带宽中,显存中 | 需要额外记录采样覆盖数据 |
| FXAA | 前向,延迟 | 清晰度较高,抗锯齿较好 | 带宽低,显存低 | 不需要额外记录数据,计算量较大 |
| TAA | 延迟 | 清晰度较低,存在延时、闪烁、鬼影等,但静态画面抗锯齿非常好 | 带宽高,显存高 | 需要速度缓冲和额外记录历史帧数据 |

UE4.26延迟渲染管线下的TAA和FXAA对比图。上面是TAA,下面是FXAA。

FXAA全称Fast approXimate Anti-Aliasing,是就职于NVIDIA的Timothy Lottes首次提出的一种快速近似MSAA的后处理抗锯齿方法,他分别在2009年、2011年发表了文献FXAA和Filtering Approaches for Real-Time Anti-Aliasing。

FXAA的核心算法如下图和文字所示(序号和图片一一对应):

1、输入一副没有抗锯齿的sRGB颜色空间的纹理,在Shader逻辑中,它会在内部被转换为亮度的估计标量值。

2、检查局部对比度以避免处理非边缘的部分。检测到的边缘数据放到R通道,用向黄色混合来表示检测到的子像素锯齿量。

这一步实现中做了优化,会对低对比度(非边缘)的像素进行早期返回(Early Exit)。

3、通过局部对比度检测的像素被分类为水平(金色)或垂直(蓝色)。

4、给定边缘方向,选择与边缘成90度的最高对比度的像素作为一对(Pair),用蓝/绿表示。

5、在边缘的正负(红/蓝)方向上搜索边缘末端(end-of-edge)。检查沿边缘高对比度像素对(Pair)的平均亮度的显著变化。

6、给定边缘末端,将边缘上的像素位置转换为与边缘垂直(成90度)的子像素偏移,以减少锯齿。其中,红/蓝是减少/增加水平偏移,金色/天空蓝是减少/增加垂直偏移。

7、给定子像素偏移量,对输入纹理重新采样。

8、最后根据检测到的子像素锯齿量加入一个低通滤波器。

关于FXAA需要补充几点说明:

  • 由于FXAA不需要额外的纹理数据,输入和输出纹理只有一张,所以可以在一个Pass处理完,带宽和显存消耗低。
  • 要求输入纹理是sRGB,如果是XYZ或线性空间的颜色将得不到预想的效果。
  • 由于FXAA需要进行多次步骤的计算,因此计算消耗理论上要比MSAA高,相当于时间换空间。
  • FXAA是基于屏幕空间的后处理算法,不需要用到法线、深度等GBuffer数据。
  • 由于FXAA只根据颜色的亮度来查找边缘,所以效果有限,无法检测出深度边缘和曲面边缘。
  • UE4.26的实现正是基于第二篇文献的算法,有很多种预设(Preset),它们是针对不同平台和质量等级执行的优化和适配。

下面分析UE的FXAA在PC平台的算法,其它平台核心算法类似,此文不涉及。下面代码涉及很多方位后缀,它们的含义如下图:

上图的各个缩写含义如下:

  • M:Middle,中心像素。
  • N:North,M上面的像素。
  • S:South,M下面的像素。
  • W:West,M左边的像素。
  • E:East,M右边的像素。
  • NW:Northwest,M左上角的像素。
  • NE:Northeast,M右上角的像素。
  • SW:Southwest,M左下角的像素。
  • SE:Southeast,M右下角的像素。
  1. // Engine\Shaders\Private\FXAAShader.usf
  2. // 包含NVIDIA的FXAA实现文件.
  3. #include "Fxaa3_11.ush"
  4. // FXAA的PS主入口.
  5. void FxaaPS(noperspective float2 TexCenter : TEXCOORD0, noperspective float4 TexCorners : TEXCOORD1, out float4 OutColor : SV_Target0)
  6. {
  7. FxaaTex TextureAndSampler;
  8. TextureAndSampler.tex = Input_Texture;
  9. TextureAndSampler.smpl = Input_Sampler;
  10. TextureAndSampler.UVMinMax = float4(Input_UVViewportBilinearMin, Input_UVViewportBilinearMax);
  11. OutColor = FxaaPixelShader(
  12. TexCenter, TexCorners,
  13. TextureAndSampler,
  14. TextureAndSampler,
  15. TextureAndSampler,
  16. Input_ExtentInverse,
  17. fxaaConsoleRcpFrameOpt,
  18. fxaaConsoleRcpFrameOpt2,
  19. fxaaConsole360RcpFrameOpt2,
  20. fxaaQualitySubpix,
  21. fxaaQualityEdgeThreshold,
  22. fxaaQualityEdgeThresholdMin,
  23. fxaaConsoleEdgeSharpness,
  24. fxaaConsoleEdgeThreshold,
  25. fxaaConsoleEdgeThresholdMin,
  26. fxaaConsole360ConstDir);
  27. #if (POST_PROCESS_ALPHA != 2)
  28. OutColor.a = 1.0;
  29. #endif
  30. }

下面直接分析PC平台的FxaaPixelShader

  1. // Engine\Shaders\Private\Fxaa3_11.ush
  2. #if (FXAA_PC == 1) // 表明是PC平台
  3. FxaaFloat4 FxaaPixelShader(
  4. FxaaFloat2 pos, // 是像素中心, 这里使用非透视插值 (关闭透视插值).
  5. FxaaFloat4 fxaaConsolePosPos, // 只用于Console平台.
  6. FxaaTex tex, // 输入纹理
  7. FxaaTex fxaaConsole360TexExpBiasNegOne, // 只用于360平台.
  8. FxaaTex fxaaConsole360TexExpBiasNegTwo, // 只用于360平台.
  9. FxaaFloat2 fxaaQualityRcpFrame, // 只用于FXAA的质量, 必须是constant/uniform, {x_} = 1.0/screenWidthInPixels, {_y} = 1.0/screenHeightInPixels
  10. FxaaFloat4 fxaaConsoleRcpFrameOpt, // 只用于360平台.
  11. FxaaFloat4 fxaaConsoleRcpFrameOpt2, // 只用于360平台.
  12. FxaaFloat4 fxaaConsole360RcpFrameOpt2, // 只用于360平台.
  13. FxaaFloat fxaaQualitySubpix, // 只用于FXAA的质量. 控制锐利度.
  14. FxaaFloat fxaaQualityEdgeThreshold, // 边缘阈值. 只用于FXAA的质量.
  15. FxaaFloat fxaaQualityEdgeThresholdMin, // 最小边缘阈值. 只用于FXAA的质量.
  16. FxaaFloat fxaaConsoleEdgeSharpness, // 只用于360平台.
  17. FxaaFloat fxaaConsoleEdgeThreshold,
  18. FxaaFloat fxaaConsoleEdgeThresholdMin,
  19. FxaaFloat4 fxaaConsole360ConstDir
  20. ) {
  21. FxaaFloat2 posM;
  22. posM.x = pos.x;
  23. posM.y = pos.y;
  24. // 从输入纹理采样数据, 计算亮度值.
  25. #if (FXAA_GATHER4_ALPHA == 1)
  26. #if (FXAA_DISCARD == 0)
  27. FxaaFloat4 rgbyM = FxaaTexTop(tex, posM);
  28. #if (FXAA_GREEN_AS_LUMA == 0)
  29. #define lumaM rgbyM.w
  30. #else
  31. #define lumaM rgbyM.y
  32. #endif
  33. #endif
  34. #if (FXAA_GREEN_AS_LUMA == 0)
  35. FxaaFloat4 luma4A = FxaaTexAlpha4(tex, posM);
  36. FxaaFloat4 luma4B = FxaaTexOffAlpha4(tex, posM, FxaaInt2(-1, -1));
  37. #else
  38. FxaaFloat4 luma4A = FxaaTexGreen4(tex, posM);
  39. FxaaFloat4 luma4B = FxaaTexOffGreen4(tex, posM, FxaaInt2(-1, -1));
  40. #endif
  41. #if (FXAA_DISCARD == 1)
  42. #define lumaM luma4A.w
  43. #endif
  44. #define lumaE luma4A.z
  45. #define lumaS luma4A.x
  46. #define lumaSE luma4A.y
  47. #define lumaNW luma4B.w
  48. #define lumaN luma4B.z
  49. #define lumaW luma4B.x
  50. #else
  51. FxaaFloat4 rgbyM = FxaaTexTop(tex, posM);
  52. #if (FXAA_GREEN_AS_LUMA == 0)
  53. #define lumaM rgbyM.w
  54. #else
  55. #define lumaM rgbyM.y
  56. #endif
  57. FxaaFloat lumaS = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 0, 1), fxaaQualityRcpFrame.xy));
  58. FxaaFloat lumaE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 1, 0), fxaaQualityRcpFrame.xy));
  59. FxaaFloat lumaN = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 0,-1), fxaaQualityRcpFrame.xy));
  60. FxaaFloat lumaW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1, 0), fxaaQualityRcpFrame.xy));
  61. #endif
  62. /*--------------------------------------------------------------------------*/
  63. // 计算各个方向上的亮度最大最小值, 检测是否可提前退出.
  64. FxaaFloat maxSM = max(lumaS, lumaM);
  65. FxaaFloat minSM = min(lumaS, lumaM);
  66. FxaaFloat maxESM = max(lumaE, maxSM);
  67. FxaaFloat minESM = min(lumaE, minSM);
  68. FxaaFloat maxWN = max(lumaN, lumaW);
  69. FxaaFloat minWN = min(lumaN, lumaW);
  70. FxaaFloat rangeMax = max(maxWN, maxESM);
  71. FxaaFloat rangeMin = min(minWN, minESM);
  72. FxaaFloat rangeMaxScaled = rangeMax * fxaaQualityEdgeThreshold;
  73. FxaaFloat range = rangeMax - rangeMin;
  74. FxaaFloat rangeMaxClamped = max(fxaaQualityEdgeThresholdMin, rangeMaxScaled);
  75. FxaaBool earlyExit = range < rangeMaxClamped;
  76. /*--------------------------------------------------------------------------*/
  77. if(earlyExit)
  78. #if (FXAA_DISCARD == 1)
  79. FxaaDiscard;
  80. #else
  81. return rgbyM;
  82. #endif
  83. /*--------------------------------------------------------------------------*/
  84. // 计算对角方向的亮度值.
  85. #if (FXAA_GATHER4_ALPHA == 0)
  86. FxaaFloat lumaNW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1,-1), fxaaQualityRcpFrame.xy));
  87. FxaaFloat lumaSE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 1, 1), fxaaQualityRcpFrame.xy));
  88. FxaaFloat lumaNE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2( 1,-1), fxaaQualityRcpFrame.xy));
  89. FxaaFloat lumaSW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1, 1), fxaaQualityRcpFrame.xy));
  90. #else
  91. FxaaFloat lumaNE = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(1, -1), fxaaQualityRcpFrame.xy));
  92. FxaaFloat lumaSW = FxaaLuma(FxaaTexOff(tex, posM, FxaaInt2(-1, 1), fxaaQualityRcpFrame.xy));
  93. #endif
  94. /*--------------------------------------------------------------------------*/
  95. // 下面计算各个方向的边缘在水平和竖直的值, 以及子像素的值.
  96. /*--------------------------------------------------------------------------*/
  97. FxaaFloat lumaNS = lumaN + lumaS;
  98. FxaaFloat lumaWE = lumaW + lumaE;
  99. FxaaFloat subpixRcpRange = 1.0/range;
  100. FxaaFloat subpixNSWE = lumaNS + lumaWE;
  101. FxaaFloat edgeHorz1 = (-2.0 * lumaM) + lumaNS;
  102. FxaaFloat edgeVert1 = (-2.0 * lumaM) + lumaWE;
  103. /*--------------------------------------------------------------------------*/
  104. FxaaFloat lumaNESE = lumaNE + lumaSE;
  105. FxaaFloat lumaNWNE = lumaNW + lumaNE;
  106. FxaaFloat edgeHorz2 = (-2.0 * lumaE) + lumaNESE;
  107. FxaaFloat edgeVert2 = (-2.0 * lumaN) + lumaNWNE;
  108. /*--------------------------------------------------------------------------*/
  109. FxaaFloat lumaNWSW = lumaNW + lumaSW;
  110. FxaaFloat lumaSWSE = lumaSW + lumaSE;
  111. FxaaFloat edgeHorz4 = (abs(edgeHorz1) * 2.0) + abs(edgeHorz2);
  112. FxaaFloat edgeVert4 = (abs(edgeVert1) * 2.0) + abs(edgeVert2);
  113. FxaaFloat edgeHorz3 = (-2.0 * lumaW) + lumaNWSW;
  114. FxaaFloat edgeVert3 = (-2.0 * lumaS) + lumaSWSE;
  115. FxaaFloat edgeHorz = abs(edgeHorz3) + edgeHorz4;
  116. FxaaFloat edgeVert = abs(edgeVert3) + edgeVert4;
  117. /*--------------------------------------------------------------------------*/
  118. FxaaFloat subpixNWSWNESE = lumaNWSW + lumaNESE;
  119. FxaaFloat lengthSign = fxaaQualityRcpFrame.x;
  120. // 如果水平方向的边缘长度>竖直边缘长度, 说明是水平方向的边缘.
  121. FxaaBool horzSpan = edgeHorz >= edgeVert;
  122. FxaaFloat subpixA = subpixNSWE * 2.0 + subpixNWSWNESE;
  123. /*--------------------------------------------------------------------------*/
  124. // 如果不是水平边缘, 则将N和S换成W和E.(这样后面就避免了重复的代码)
  125. if(!horzSpan) lumaN = lumaW;
  126. if(!horzSpan) lumaS = lumaE;
  127. if(horzSpan) lengthSign = fxaaQualityRcpFrame.y;
  128. FxaaFloat subpixB = (subpixA * (1.0/12.0)) - lumaM;
  129. /*--------------------------------------------------------------------------*/
  130. // 根据梯度计算配对.
  131. FxaaFloat gradientN = lumaN - lumaM;
  132. FxaaFloat gradientS = lumaS - lumaM;
  133. FxaaFloat lumaNN = lumaN + lumaM;
  134. FxaaFloat lumaSS = lumaS + lumaM;
  135. FxaaBool pairN = abs(gradientN) >= abs(gradientS);
  136. FxaaFloat gradient = max(abs(gradientN), abs(gradientS));
  137. if(pairN) lengthSign = -lengthSign;
  138. FxaaFloat subpixC = FxaaSat(abs(subpixB) * subpixRcpRange);
  139. /*--------------------------------------------------------------------------*/
  140. // 计算偏移.
  141. FxaaFloat2 posB;
  142. posB.x = posM.x;
  143. posB.y = posM.y;
  144. FxaaFloat2 offNP;
  145. offNP.x = (!horzSpan) ? 0.0 : fxaaQualityRcpFrame.x;
  146. offNP.y = ( horzSpan) ? 0.0 : fxaaQualityRcpFrame.y;
  147. if(!horzSpan) posB.x += lengthSign * 0.5;
  148. if( horzSpan) posB.y += lengthSign * 0.5;
  149. /*--------------------------------------------------------------------------*/
  150. // 计算偏移后的位置.
  151. // 上面的像素位置.
  152. FxaaFloat2 posN;
  153. posN.x = posB.x - offNP.x * FXAA_QUALITY__P0;
  154. posN.y = posB.y - offNP.y * FXAA_QUALITY__P0;
  155. // 下面的像素位置.
  156. FxaaFloat2 posP;
  157. posP.x = posB.x + offNP.x * FXAA_QUALITY__P0;
  158. posP.y = posB.y + offNP.y * FXAA_QUALITY__P0;
  159. FxaaFloat subpixD = ((-2.0)*subpixC) + 3.0;
  160. FxaaFloat lumaEndN = FxaaLuma(FxaaTexTop(tex, posN));
  161. FxaaFloat subpixE = subpixC * subpixC;
  162. FxaaFloat lumaEndP = FxaaLuma(FxaaTexTop(tex, posP));
  163. /*--------------------------------------------------------------------------*/
  164. if(!pairN) lumaNN = lumaSS;
  165. // 梯度缩放.
  166. FxaaFloat gradientScaled = gradient * 1.0/4.0;
  167. FxaaFloat lumaMM = lumaM - lumaNN * 0.5;
  168. FxaaFloat subpixF = subpixD * subpixE;
  169. FxaaBool lumaMLTZero = lumaMM < 0.0;
  170. /*--------------------------------------------------------------------------*/
  171. // 第1次边缘末端查找.
  172. lumaEndN -= lumaNN * 0.5;
  173. lumaEndP -= lumaNN * 0.5;
  174. FxaaBool doneN = abs(lumaEndN) >= gradientScaled;
  175. FxaaBool doneP = abs(lumaEndP) >= gradientScaled;
  176. if(!doneN) posN.x -= offNP.x * FXAA_QUALITY__P1;
  177. if(!doneN) posN.y -= offNP.y * FXAA_QUALITY__P1;
  178. FxaaBool doneNP = (!doneN) || (!doneP);
  179. if(!doneP) posP.x += offNP.x * FXAA_QUALITY__P1;
  180. if(!doneP) posP.y += offNP.y * FXAA_QUALITY__P1;
  181. /*--------------------------------------------------------------------------*/
  182. // 第2次边缘末端查找.
  183. if(doneNP) {
  184. if(!doneN) lumaEndN = FxaaLuma(FxaaTexTop(tex, posN.xy));
  185. if(!doneP) lumaEndP = FxaaLuma(FxaaTexTop(tex, posP.xy));
  186. if(!doneN) lumaEndN = lumaEndN - lumaNN * 0.5;
  187. if(!doneP) lumaEndP = lumaEndP - lumaNN * 0.5;
  188. doneN = abs(lumaEndN) >= gradientScaled;
  189. doneP = abs(lumaEndP) >= gradientScaled;
  190. if(!doneN) posN.x -= offNP.x * FXAA_QUALITY__P2;
  191. if(!doneN) posN.y -= offNP.y * FXAA_QUALITY__P2;
  192. doneNP = (!doneN) || (!doneP);
  193. if(!doneP) posP.x += offNP.x * FXAA_QUALITY__P2;
  194. if(!doneP) posP.y += offNP.y * FXAA_QUALITY__P2;
  195. /*--------------------------------------------------------------------------*/
  196. // 第3次边缘末端查找.
  197. #if (FXAA_QUALITY__PS > 3)
  198. if(doneNP) {
  199. if(!doneN) lumaEndN = FxaaLuma(FxaaTexTop(tex, posN.xy));
  200. if(!doneP) lumaEndP = FxaaLuma(FxaaTexTop(tex, posP.xy));
  201. if(!doneN) lumaEndN = lumaEndN - lumaNN * 0.5;
  202. if(!doneP) lumaEndP = lumaEndP - lumaNN * 0.5;
  203. doneN = abs(lumaEndN) >= gradientScaled;
  204. doneP = abs(lumaEndP) >= gradientScaled;
  205. if(!doneN) posN.x -= offNP.x * FXAA_QUALITY__P3;
  206. if(!doneN) posN.y -= offNP.y * FXAA_QUALITY__P3;
  207. doneNP = (!doneN) || (!doneP);
  208. if(!doneP) posP.x += offNP.x * FXAA_QUALITY__P3;
  209. if(!doneP) posP.y += offNP.y * FXAA_QUALITY__P3;
  210. /*--------------------------------------------------------------------------*/
  211. #if (FXAA_QUALITY__PS > 4)
  212. (......) // 最多到12个以上的采样像素.
  213. #endif
  214. /*--------------------------------------------------------------------------*/
  215. }
  216. #endif
  217. /*--------------------------------------------------------------------------*/
  218. }
  219. /*--------------------------------------------------------------------------*/
  220. FxaaFloat dstN = posM.x - posN.x;
  221. FxaaFloat dstP = posP.x - posM.x;
  222. if(!horzSpan) dstN = posM.y - posN.y;
  223. if(!horzSpan) dstP = posP.y - posM.y;
  224. /*--------------------------------------------------------------------------*/
  225. FxaaBool goodSpanN = (lumaEndN < 0.0) != lumaMLTZero;
  226. FxaaFloat spanLength = (dstP + dstN);
  227. FxaaBool goodSpanP = (lumaEndP < 0.0) != lumaMLTZero;
  228. FxaaFloat spanLengthRcp = 1.0/spanLength;
  229. /*--------------------------------------------------------------------------*/
  230. FxaaBool directionN = dstN < dstP;
  231. FxaaFloat dstMin = min(dstN, dstP);
  232. FxaaBool goodSpan = directionN ? goodSpanN : goodSpanP;
  233. FxaaFloat subpixG = subpixF * subpixF;
  234. FxaaFloat pixelOffset = (dstMin * (-spanLengthRcp)) + 0.5;
  235. FxaaFloat subpixH = subpixG * fxaaQualitySubpix;
  236. /*--------------------------------------------------------------------------*/
  237. // 计算最终的采样位置并从输入纹理中采样.
  238. FxaaFloat pixelOffsetGood = goodSpan ? pixelOffset : 0.0;
  239. FxaaFloat pixelOffsetSubpix = max(pixelOffsetGood, subpixH);
  240. if(!horzSpan) posM.x += pixelOffsetSubpix * lengthSign;
  241. if( horzSpan) posM.y += pixelOffsetSubpix * lengthSign;
  242. // 注意FxaaTexTop使用了纹理的双线性采样, 所以才能呈现出混合过渡的效果.
  243. #if ((FXAA_DISCARD == 1) || (POST_PROCESS_ALPHA == 2))
  244. return FxaaTexTop(tex, posM);
  245. #else
  246. return FxaaFloat4(FxaaTexTop(tex, posM).xyz, lumaM);
  247. #endif
  248. }
  249. #endif

由以上代码可知,最终计算出来的是当前像素偏移后的采样位置,并且使用了双线性采样,所以才能达到混合过渡的抗锯齿效果。

FXAA在延迟渲染管线的表现不甚理想,综合抗锯齿效果不如TAA或SMAA。不过SMAA需要额外集成或实现,SMAA在UE的实现可以在GitHub上找到:https://github.com/inequation/UnrealEngine/tree/SMAA-4.12。

FXAA、TAA和SMAA的效果对比图。

TAA全称是Temporal Anti-Aliasing,通常被翻译成时间抗锯齿(或时域抗锯齿)。它是Epic Games的Brian Karis(很熟悉的名字吧,本系列文章数次提及他)实现并提出的UE当家抗锯齿技术,并且在SIGGRAPH 2014发表了演讲High Quality Temporal Supersampling。2016年,NVIDIA的研究员Marco Salvi发表了改进篇An Excursion in Temporal Supersampling。

TAA的核心思想是将MSAA在同一帧的空间采样分摊到时间轴上的多帧采样,然后按照某种权重混合它们:

由于要在多帧中生成采样偏移量(被称为Jitter),可以通过修改投影矩阵达成:
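一个示意性的做法如下(假设的简化实现,非UE源码,函数名是假设的;UE实际通过FViewMatrices的类似接口叠加抖动,符号约定可能不同):

  1. // 示意: 将以像素为单位的子像素偏移(SampleX/SampleY)叠加到投影矩阵的平移项.
  2. // 假设Proj为行主序4x4透视投影矩阵, M[2][0]/M[2][1]对应裁剪空间的XY平移.
  3. void AddTemporalAAJitter(float Proj[4][4], float SampleX, float SampleY,
  4.                          float ViewportWidth, float ViewportHeight)
  5. {
  6. // 1个像素对应NDC空间中2/宽(高)的距离, 故先把像素偏移换算到裁剪空间.
  7. Proj[2][0] += SampleX * 2.0f / ViewportWidth;
  8. Proj[2][1] += SampleY * 2.0f / ViewportHeight;
  9. }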

当然,Jitter也可以通过特殊的Pattern或Halton等低差异序列算法生成:
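以Halton低差异序列为例,其生成基于radical inverse(反基数)原理,代码很简单。下面是一个典型实现(与后文UE源码中调用的Halton函数的行为一致):

  1. // 计算Halton低差异序列的第Index个元素, Base为质数基(如2、3), 返回[0,1)内的值.
  2. float Halton(int Index, int Base)
  3. {
  4. float Result = 0.0f;
  5. float InvBase = 1.0f / Base;
  6. float Fraction = InvBase;
  7. while (Index > 0)
  8. {
  9. Result += (Index % Base) * Fraction;
  10. Index /= Base;
  11. Fraction *= InvBase;
  12. }
  13. return Result;
  14. }
  15. // 用法示例(与后文源码一致): SampleX = Halton(i + 1, 2) - 0.5f; SampleY = Halton(i + 1, 3) - 0.5f;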

对于移动平均数(Moving Average),最简单的做法是直接混合多帧历史数据和当前帧数据,但样本数受限于保留的历史帧数量,帧数太多会导致带宽和显存暴增。改用指数型移动平均数(Exponential Moving Average),则能以固定的存储模拟几乎无穷多的样本:

当当前帧的混合权重\(\alpha\)(UE默认是0.04)足够小时,指数型移动平均可以近似等于简单平均的方法(这样就只需保留一帧历史数据):
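用公式表示,指数型移动平均的混合方式大致如下(\(\alpha\)为当前帧的混合权重):

\[c_{history}' = \alpha \cdot c_{current} + (1 - \alpha) \cdot c_{history}
\]

当\(\alpha = 0.04\)时,当前帧只贡献4%的颜色,其余96%来自不断累积的历史帧,因此只需保留一张历史纹理即可近似大量样本的平均。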

但是,以上方法只适用于静态场景;对于动态场景,需要配合某些方法(如速度缓冲、深度、材质索引、颜色差别、法线变化等)丢弃无效的采样样本(如突然出现或消失在屏幕中的像素)。

UE使用的是结合了速度缓冲的重投影(Reprojection)和邻居截取(Neighborhood Clamping)的技术。其中速度缓冲和运动模糊使用的一致,重投影时需要根据速度缓冲找回历史样本并移除Jitter:

这对速度缓冲的精度要求非常高,需要记录所有像素的运动向量,因此需要R16G16精度的速度缓冲;对于程序化的动画、滚动的纹理以及半透明物体,仍可能产生瑕疵:

UE为了防止鬼影(Ghosting),采用了邻居截取(Neighborhood Clamping):先对当前像素及其周围数个像素的颜色建立一个颜色空间的Bounding Box,然后在使用历史颜色样本之前,先将历史颜色Clamp到这个包围盒的区域内。直观的理解就是:当历史样本的颜色和当前帧的颜色差别特别大时,用Clamp将其尽量拉回到与当前帧该像素周围相近的颜色。

需要注意的是,Clamping使用的颜色空间是YCoCg。相比直接在RGB空间用min和max构建AABB,在YCoCg空间中可以将Box的朝向对齐到亮度方向上:

关于YCoCg颜色空间

也被称为YCgCo,其中Y是亮度分量,Cg是绿色色度分量,Co是橙色色度分量。它可以和sRGB空间互转,公式如下:

\[\begin{bmatrix} Y \\ Co \\ Cg \end{bmatrix}
=
\begin{bmatrix} \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\
\frac{1}{2} & 0 & -\frac{1}{2} \\
-\frac{1}{4} & \frac{1}{2} & -\frac{1}{4}\end{bmatrix}
\cdot
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]

\[\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & -1 \\
1 & 0 & 1 \\
1 & -1 & -1\end{bmatrix}
\cdot
\begin{bmatrix} Y \\ Co \\ Cg \end{bmatrix}
\]

下图是正常图片被分解成YCoCg空间的图例(从上到下依次是正常、亮度Y、绿色Cg、橙色Co):

有了YCoCg的AABB的Box,便可以裁剪历史数据到这个Box的边缘:
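对应后文ClampHistory中默认的Neighborhood Clamping分支,裁剪操作即逐分量的clamp:

\[\text{History}' = \text{clamp}(\text{History},\ \text{NeighborMin},\ \text{NeighborMax})
\]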

NVIDIA版本提出一种改进的方式,叫方差裁剪(Variance Clipping)。先计算邻域颜色的前两阶矩(Moment),用来建立一个改进的AABB;利用这两个矩可以计算出平均值和标准差,进一步计算出新的极值minc和maxc,用它们来代替旧的AABB:
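与后文ComputeNeighborhoodBoundingbox中Variance Clipping分支的实现对应,设邻域有\(N\)个样本\(c_i\),则:

\[m_1 = \frac{1}{N}\sum_{i=1}^{N} c_i, \quad m_2 = \frac{1}{N}\sum_{i=1}^{N} c_i^2, \quad \sigma = \sqrt{\left| m_2 - m_1^2 \right|}
\]

\[\text{min}_c = m_1 - \gamma \sigma, \quad \text{max}_c = m_1 + \gamma \sigma
\]

其中UE的实现中\(\gamma\)取1.25。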

根据新的AABB计算出一个Gaussian模型,然后用它可以生成更加紧凑的包围盒:

理论部分到此为止,接下来分析UE的具体实现代码。首先分析TAA的Jitter生成,它位于InitViews阶段的PreVisibilityFrameSetup:

  1. // Engine\Source\Runtime\Renderer\Private\SceneVisibility.cpp
  2. void FSceneRenderer::PreVisibilityFrameSetup(FRHICommandListImmediate& RHICmdList)
  3. {
  4. RHICmdList.BeginScene();
  5. (......)
  6. for(int32 ViewIndex = 0;ViewIndex < Views.Num();ViewIndex++)
  7. {
  8. FViewInfo& View = Views[ViewIndex];
  9. FSceneViewState* ViewState = View.ViewState;
  10. (......)
  11. // TAA子像素采样数量.
  12. int32 CVarTemporalAASamplesValue = CVarTemporalAASamples.GetValueOnRenderThread();
  13. bool bTemporalUpsampling = View.PrimaryScreenPercentageMethod == EPrimaryScreenPercentageMethod::TemporalUpscale;
  14. // 计算视图的TAA子像素偏移量(Jitter).
  15. if (View.AntiAliasingMethod == AAM_TemporalAA && ViewState && (CVarTemporalAASamplesValue > 0 || bTemporalUpsampling) && View.bAllowTemporalJitter)
  16. {
  17. float EffectivePrimaryResolutionFraction = float(View.ViewRect.Width()) / float(View.GetSecondaryViewRectSize().X);
  18. // 计算TAA采样数量.
  19. int32 TemporalAASamples = CVarTemporalAASamplesValue;
  20. {
  21. if (Scene->GetFeatureLevel() < ERHIFeatureLevel::SM5)
  22. {
  23. // 移动端只能使用2采样数量.
  24. TemporalAASamples = 2;
  25. }
  26. else if (bTemporalUpsampling)
  27. {
  28. // 屏幕百分比<100%的TAA上采样需要额外的时间采样数量, 为最终输出纹理获得稳定的时间采样密度, 避免输出像素对齐的收敛问题.
  29. TemporalAASamples = float(TemporalAASamples) * FMath::Max(1.f, 1.f / (EffectivePrimaryResolutionFraction * EffectivePrimaryResolutionFraction));
  30. }
  31. else if (CVarTemporalAASamplesValue == 5)
  32. {
  33. TemporalAASamples = 4;
  34. }
  35. TemporalAASamples = FMath::Clamp(TemporalAASamples, 1, 255);
  36. }
  37. // 计算在时间序列的采样点的索引.
  38. int32 TemporalSampleIndex = ViewState->TemporalAASampleIndex + 1;
  39. if(TemporalSampleIndex >= TemporalAASamples || View.bCameraCut)
  40. {
  41. TemporalSampleIndex = 0;
  42. }
  43. // 更新view state.
  44. if (!View.bStatePrevViewInfoIsReadOnly && !bFreezeTemporalSequences)
  45. {
  46. ViewState->TemporalAASampleIndex = TemporalSampleIndex;
  47. ViewState->TemporalAASampleIndexUnclamped = ViewState->TemporalAASampleIndexUnclamped+1;
  48. }
  49. // 在时间序列上选择一个子像素采样坐标.
  50. float SampleX, SampleY;
  51. if (Scene->GetFeatureLevel() < ERHIFeatureLevel::SM5)
  52. {
  53. float SamplesX[] = { -8.0f/16.0f, 0.0/16.0f };
  54. float SamplesY[] = { /* - */ 0.0f/16.0f, 8.0/16.0f };
  55. check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
  56. SampleX = SamplesX[ TemporalSampleIndex ];
  57. SampleY = SamplesY[ TemporalSampleIndex ];
  58. }
  59. else if (View.PrimaryScreenPercentageMethod == EPrimaryScreenPercentageMethod::TemporalUpscale)
  60. {
  61. // 均匀分布时域Jitter在[-0.5, 0.5],因为不再有任何输入和输出像素对齐. 注意此处用的Halton序列.
  62. SampleX = Halton(TemporalSampleIndex + 1, 2) - 0.5f;
  63. SampleY = Halton(TemporalSampleIndex + 1, 3) - 0.5f;
  64. View.MaterialTextureMipBias = -(FMath::Max(-FMath::Log2(EffectivePrimaryResolutionFraction), 0.0f) ) + CVarMinAutomaticViewMipBiasOffset.GetValueOnRenderThread();
  65. View.MaterialTextureMipBias = FMath::Max(View.MaterialTextureMipBias, CVarMinAutomaticViewMipBias.GetValueOnRenderThread());
  66. }
  67. else if( CVarTemporalAASamplesValue == 2 )
  68. {
  69. // 2xMSAA
  70. // Pattern docs: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx
  71. // N.
  72. // .S
  73. float SamplesX[] = { -4.0f/16.0f, 4.0/16.0f };
  74. float SamplesY[] = { -4.0f/16.0f, 4.0/16.0f };
  75. check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
  76. SampleX = SamplesX[ TemporalSampleIndex ];
  77. SampleY = SamplesY[ TemporalSampleIndex ];
  78. }
  79. else if( CVarTemporalAASamplesValue == 3 )
  80. {
  81. // 3xMSAA
  82. // A..
  83. // ..B
  84. // .C.
  85. // Rolling circle pattern (A,B,C).
  86. float SamplesX[] = { -2.0f/3.0f, 2.0/3.0f, 0.0/3.0f };
  87. float SamplesY[] = { -2.0f/3.0f, 0.0/3.0f, 2.0/3.0f };
  88. check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
  89. SampleX = SamplesX[ TemporalSampleIndex ];
  90. SampleY = SamplesY[ TemporalSampleIndex ];
  91. }
  92. else if( CVarTemporalAASamplesValue == 4 )
  93. {
  94. // 4xMSAA
  95. // Pattern docs: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx
  96. // .N..
  97. // ...E
  98. // W...
  99. // ..S.
  100. // Rolling circle pattern (N,E,S,W).
  101. float SamplesX[] = { -2.0f/16.0f, 6.0/16.0f, 2.0/16.0f, -6.0/16.0f };
  102. float SamplesY[] = { -6.0f/16.0f, -2.0/16.0f, 6.0/16.0f, 2.0/16.0f };
  103. check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
  104. SampleX = SamplesX[ TemporalSampleIndex ];
  105. SampleY = SamplesY[ TemporalSampleIndex ];
  106. }
  107. else if( CVarTemporalAASamplesValue == 5 )
  108. {
  109. // Compressed 4 sample pattern on same vertical and horizontal line (less temporal flicker).
  110. // Compressed 1/2 works better than correct 2/3 (reduced temporal flicker).
  111. // . N .
  112. // W . E
  113. // . S .
  114. // Rolling circle pattern (N,E,S,W).
  115. float SamplesX[] = { 0.0f/2.0f, 1.0/2.0f, 0.0/2.0f, -1.0/2.0f };
  116. float SamplesY[] = { -1.0f/2.0f, 0.0/2.0f, 1.0/2.0f, 0.0/2.0f };
  117. check(TemporalAASamples == UE_ARRAY_COUNT(SamplesX));
  118. SampleX = SamplesX[ TemporalSampleIndex ];
  119. SampleY = SamplesY[ TemporalSampleIndex ];
  120. }
  121. else // 大于5采样数, 则使用Halton序列.
  122. {
  123. float u1 = Halton( TemporalSampleIndex + 1, 2 );
  124. float u2 = Halton( TemporalSampleIndex + 1, 3 );
  125. // 生成正态分布的样本.
  126. // exp( x^2 / Sigma^2 )
  127. static auto CVar = IConsoleManager::Get().FindConsoleVariable(TEXT("r.TemporalAAFilterSize"));
  128. float FilterSize = CVar->GetFloat();
  129. // 缩放分布以设置非单位方差.
  130. // Variance = Sigma^2
  131. float Sigma = 0.47f * FilterSize;
  132. float OutWindow = 0.5f;
  133. float InWindow = FMath::Exp( -0.5 * FMath::Square( OutWindow / Sigma ) );
  134. // Box-Muller变换
  135. float Theta = 2.0f * PI * u2;
  136. float r = Sigma * FMath::Sqrt( -2.0f * FMath::Loge( (1.0f - u1) * InWindow + u1 ) );
  137. SampleX = r * FMath::Cos( Theta );
  138. SampleY = r * FMath::Sin( Theta );
  139. }
  140. // 保存采样数据到View.
  141. View.TemporalJitterSequenceLength = TemporalAASamples;
  142. View.TemporalJitterIndex = TemporalSampleIndex;
  143. View.TemporalJitterPixels.X = SampleX;
  144. View.TemporalJitterPixels.Y = SampleY;
  145. View.ViewMatrices.HackAddTemporalAAProjectionJitter(FVector2D(SampleX * 2.0f / View.ViewRect.Width(), SampleY * -2.0f / View.ViewRect.Height()));
  146. }
  147. (......)
  148. }

由于UE在PC平台默认的采样数量是8,所以默认使用Halton序列来采样子像素。其中Halton的生成序列如下图所示:

相比随机采样,Halton获得的采样序列更加均匀,且可以获得没有上限的样本数(UE默认限制在8以内)。除此之外,还有Sobol、Niederreiter、Kronecker等低差异序列算法,它们的比较如下图:

对于UE的默认设置,利用Halton生成的前8个序列计算出来的SampleX和SampleY分别是:

  1. 0: (-0.163972363, 0.284008324)
  2. 1: (-0.208000556, -0.360267729)
  3. 2: (0.172162965, 0.144461900)
  4. 3: (-0.430473328, 0.156679258)
  5. 4: (0.0485312343, -0.275233328)
  6. 5: (0.0647613853, 0.367280841)
  7. 6: (-0.147184864, -0.0535709597)
  8. 7: (0.366960347, -0.307915747)

由于有正数有负数,且处于[-0.5, 0.5]之间,说明是基于像素中心(0.5, 0.5)的子像素偏移量。图形化后的坐标点如下:
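作为补充,下面给出Halton序列(radical inverse)的一个最小C++实现草图(笔者的示意性写法,并非UE源码),它对应上面代码中Halton(Index, Base)的调用方式。注意TemporalUpscale分支直接用u-0.5作为Jitter,而默认的8采样路径会把u1、u2再经过Box-Muller变换:

  1. #include <cstdio>
  2. // Halton序列(radical inverse): 将整数Index按Base进制逐位反射到小数部分,
  3. // 得到[0, 1)内的低差异样本.
  4. float Halton(int Index, int Base)
  5. {
  6. float Result = 0.0f;
  7. float Fraction = 1.0f / float(Base);
  8. while (Index > 0)
  9. {
  10. Result += float(Index % Base) * Fraction;
  11. Index /= Base;
  12. Fraction /= float(Base);
  13. }
  14. return Result;
  15. }
  16. int main()
  17. {
  18. // 与上面代码一致: 基数2和3分别生成u1/u2.
  19. for (int Index = 0; Index < 8; ++Index)
  20. {
  21. float U1 = Halton(Index + 1, 2);
  22. float U2 = Halton(Index + 1, 3);
  23. printf("%d: u1=%f, u2=%f\n", Index, U1, U2);
  24. }
  25. return 0;
  26. }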

解析完了Jitter,继续分析TAA是如何在C++组织绘制逻辑的(UE4.26支持第4代和第5代TAA,此处以第4代为分析对象):

  1. // Engine\Source\Runtime\Renderer\Private\PostProcess\TemporalAA.cpp
  2. FTAAOutputs AddTemporalAAPass(
  3. FRDGBuilder& GraphBuilder,
  4. const FViewInfo& View,
  5. const FTAAPassParameters& Inputs,
  6. const FTemporalAAHistory& InputHistory,
  7. FTemporalAAHistory* OutputHistory)
  8. {
  9. // 记录标记等.
  10. const bool bSupportsAlpha = IsPostProcessingWithAlphaChannelSupported();
  11. const int32 IntputTextureCount = (IsDOFTAAConfig(Inputs.Pass) && bSupportsAlpha) ? 2 : 1;
  12. const bool bIsMainPass = IsMainTAAConfig(Inputs.Pass);
  13. const bool bCameraCut = !InputHistory.IsValid() || View.bCameraCut;
  14. const FIntPoint OutputExtent = Inputs.GetOutputExtent();
  15. // 记录输入区域.
  16. const FIntRect SrcRect = Inputs.InputViewRect;
  17. const FIntRect DestRect = Inputs.OutputViewRect;
  18. const FIntRect PracticableSrcRect = FIntRect::DivideAndRoundUp(SrcRect, Inputs.ResolutionDivisor);
  19. const FIntRect PracticableDestRect = FIntRect::DivideAndRoundUp(DestRect, Inputs.ResolutionDivisor);
  20. const uint32 PassIndex = static_cast<uint32>(Inputs.Pass);
  21. const TCHAR* PassName = kTAAPassNames[PassIndex];
  22. // 输出纹理.
  23. FTAAOutputs Outputs;
  24. // 当前帧的历史纹理.
  25. TStaticArray<FRDGTextureRef, FTemporalAAHistory::kRenderTargetCount> NewHistoryTexture;
  26. // 创建输出和历史帧纹理.
  27. {
  28. EPixelFormat HistoryPixelFormat = PF_FloatRGBA;
  29. if (bIsMainPass && Inputs.bUseFast && !bSupportsAlpha && CVarTAAR11G11B10History.GetValueOnRenderThread())
  30. {
  31. HistoryPixelFormat = PF_FloatR11G11B10;
  32. }
  33. FRDGTextureDesc SceneColorDesc = FRDGTextureDesc::Create2D(
  34. OutputExtent,
  35. HistoryPixelFormat,
  36. FClearValueBinding::Black,
  37. TexCreate_ShaderResource | TexCreate_UAV);
  38. if (Inputs.bOutputRenderTargetable)
  39. {
  40. SceneColorDesc.Flags |= TexCreate_RenderTargetable;
  41. }
  42. const TCHAR* OutputName = kTAAOutputNames[PassIndex];
  43. for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
  44. {
  45. NewHistoryTexture[i] = GraphBuilder.CreateTexture(
  46. SceneColorDesc,
  47. OutputName,
  48. ERDGTextureFlags::MultiFrame);
  49. }
  50. NewHistoryTexture[0] = Outputs.SceneColor = NewHistoryTexture[0];
  51. if (IntputTextureCount == 2)
  52. {
  53. Outputs.SceneMetadata = NewHistoryTexture[1];
  54. }
  55. if (Inputs.bDownsample)
  56. {
  57. const FRDGTextureDesc HalfResSceneColorDesc = FRDGTextureDesc::Create2D(
  58. SceneColorDesc.Extent / 2,
  59. Inputs.DownsampleOverrideFormat != PF_Unknown ? Inputs.DownsampleOverrideFormat : Inputs.SceneColorInput->Desc.Format,
  60. FClearValueBinding::Black,
  61. TexCreate_ShaderResource | TexCreate_UAV | GFastVRamConfig.Downsample);
  62. Outputs.DownsampledSceneColor = GraphBuilder.CreateTexture(HalfResSceneColorDesc, TEXT("SceneColorHalfRes"));
  63. }
  64. }
  65. RDG_GPU_STAT_SCOPE(GraphBuilder, TAA);
  66. TStaticArray<bool, FTemporalAAHistory::kRenderTargetCount> bUseHistoryTexture;
  67. // 处理FTAAStandaloneCS参数.
  68. {
  69. FTAAStandaloneCS::FPermutationDomain PermutationVector;
  70. PermutationVector.Set<FTAAStandaloneCS::FTAAPassConfigDim>(Inputs.Pass);
  71. PermutationVector.Set<FTAAStandaloneCS::FTAAFastDim>(Inputs.bUseFast);
  72. PermutationVector.Set<FTAAStandaloneCS::FTAADownsampleDim>(Inputs.bDownsample);
  73. PermutationVector.Set<FTAAStandaloneCS::FTAAUpsampleFilteredDim>(true);
  74. if (IsTAAUpsamplingConfig(Inputs.Pass))
  75. {
  76. const bool bUpsampleFiltered = CVarTemporalAAUpsampleFiltered.GetValueOnRenderThread() != 0 || Inputs.Pass != ETAAPassConfig::MainUpsampling;
  77. PermutationVector.Set<FTAAStandaloneCS::FTAAUpsampleFilteredDim>(bUpsampleFiltered);
  78. // 根据屏幕百分比设置排列.
  79. if (SrcRect.Width() > DestRect.Width() ||
  80. SrcRect.Height() > DestRect.Height())
  81. {
  82. PermutationVector.Set<FTAAStandaloneCS::FTAAScreenPercentageDim>(2);
  83. }
  84. else if (SrcRect.Width() * 100 < 50 * DestRect.Width() &&
  85. SrcRect.Height() * 100 < 50 * DestRect.Height() &&
  86. Inputs.Pass == ETAAPassConfig::MainSuperSampling)
  87. {
  88. PermutationVector.Set<FTAAStandaloneCS::FTAAScreenPercentageDim>(3);
  89. }
  90. else if (SrcRect.Width() * 100 < 71 * DestRect.Width() &&
  91. SrcRect.Height() * 100 < 71 * DestRect.Height())
  92. {
  93. PermutationVector.Set<FTAAStandaloneCS::FTAAScreenPercentageDim>(1);
  94. }
  95. }
  96. FTAAStandaloneCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FTAAStandaloneCS::FParameters>();
  97. // 设置通用的着色器参数.
  98. const FIntPoint InputExtent = Inputs.SceneColorInput->Desc.Extent;
  99. const FIntRect InputViewRect = Inputs.InputViewRect;
  100. const FIntRect OutputViewRect = Inputs.OutputViewRect;
  101. if (!IsTAAUpsamplingConfig(Inputs.Pass))
  102. {
  103. SetupSampleWeightParameters(PassParameters, Inputs, View.TemporalJitterPixels);
  104. }
  105. const float ResDivisor = Inputs.ResolutionDivisor;
  106. const float ResDivisorInv = 1.0f / ResDivisor;
  107. PassParameters->ViewUniformBuffer = View.ViewUniformBuffer;
  108. PassParameters->CurrentFrameWeight = CVarTemporalAACurrentFrameWeight.GetValueOnRenderThread();
  109. PassParameters->bCameraCut = bCameraCut;
  110. PassParameters->SceneDepthTexture = Inputs.SceneDepthTexture;
  111. PassParameters->GBufferVelocityTexture = Inputs.SceneVelocityTexture;
  112. PassParameters->SceneDepthTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
  113. PassParameters->GBufferVelocityTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
  114. PassParameters->StencilTexture = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateWithPixelFormat(Inputs.SceneDepthTexture, PF_X24_G8));
  115. // 速度缓冲.
  116. if (!PassParameters->GBufferVelocityTexture)
  117. {
  118. PassParameters->GBufferVelocityTexture = GraphBuilder.RegisterExternalTexture(GSystemTextures.BlackDummy);;
  119. }
  120. // 输入缓冲着色器参数.
  121. {
  122. PassParameters->InputSceneColorSize = FVector4(
  123. InputExtent.X,
  124. InputExtent.Y,
  125. 1.0f / float(InputExtent.X),
  126. 1.0f / float(InputExtent.Y));
  127. PassParameters->InputMinPixelCoord = PracticableSrcRect.Min;
  128. PassParameters->InputMaxPixelCoord = PracticableSrcRect.Max - FIntPoint(1, 1);
  129. PassParameters->InputSceneColor = Inputs.SceneColorInput;
  130. PassParameters->InputSceneColorSampler = TStaticSamplerState<SF_Point>::GetRHI();
  131. PassParameters->InputSceneMetadata = Inputs.SceneMetadataInput;
  132. PassParameters->InputSceneMetadataSampler = TStaticSamplerState<SF_Point>::GetRHI();
  133. }
  134. PassParameters->OutputViewportSize = FVector4(
  135. PracticableDestRect.Width(), PracticableDestRect.Height(), 1.0f / float(PracticableDestRect.Width()), 1.0f / float(PracticableDestRect.Height()));
  136. PassParameters->OutputViewportRect = FVector4(PracticableDestRect.Min.X, PracticableDestRect.Min.Y, PracticableDestRect.Max.X, PracticableDestRect.Max.Y);
  137. PassParameters->OutputQuantizationError = ComputePixelFormatQuantizationError(NewHistoryTexture[0]->Desc.Format);
  138. // 设置历史着色器参数.
  139. {
  140. FRDGTextureRef BlackDummy = GraphBuilder.RegisterExternalTexture(GSystemTextures.BlackDummy);
  141. if (bCameraCut)
  142. {
  143. PassParameters->ScreenPosToHistoryBufferUV = FVector4(1.0f, 1.0f, 1.0f, 1.0f);
  144. PassParameters->ScreenPosAbsMax = FVector2D(0.0f, 0.0f);
  145. PassParameters->HistoryBufferUVMinMax = FVector4(0.0f, 0.0f, 0.0f, 0.0f);
  146. PassParameters->HistoryBufferSize = FVector4(1.0f, 1.0f, 1.0f, 1.0f);
  147. for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
  148. {
  149. PassParameters->HistoryBuffer[i] = BlackDummy;
  150. }
  151. // Remove dependency of the velocity buffer on camera cut, given it's going to be ignored by the shader.
  152. PassParameters->GBufferVelocityTexture = BlackDummy;
  153. }
  154. else
  155. {
  156. FIntPoint ReferenceViewportOffset = InputHistory.ViewportRect.Min;
  157. FIntPoint ReferenceViewportExtent = InputHistory.ViewportRect.Size();
  158. FIntPoint ReferenceBufferSize = InputHistory.ReferenceBufferSize;
  159. float InvReferenceBufferSizeX = 1.f / float(InputHistory.ReferenceBufferSize.X);
  160. float InvReferenceBufferSizeY = 1.f / float(InputHistory.ReferenceBufferSize.Y);
  161. PassParameters->ScreenPosToHistoryBufferUV = FVector4(
  162. ReferenceViewportExtent.X * 0.5f * InvReferenceBufferSizeX,
  163. -ReferenceViewportExtent.Y * 0.5f * InvReferenceBufferSizeY,
  164. (ReferenceViewportExtent.X * 0.5f + ReferenceViewportOffset.X) * InvReferenceBufferSizeX,
  165. (ReferenceViewportExtent.Y * 0.5f + ReferenceViewportOffset.Y) * InvReferenceBufferSizeY);
  166. FIntPoint ViewportOffset = ReferenceViewportOffset / Inputs.ResolutionDivisor;
  167. FIntPoint ViewportExtent = FIntPoint::DivideAndRoundUp(ReferenceViewportExtent, Inputs.ResolutionDivisor);
  168. FIntPoint BufferSize = ReferenceBufferSize / Inputs.ResolutionDivisor;
  169. PassParameters->ScreenPosAbsMax = FVector2D(1.0f - 1.0f / float(ViewportExtent.X), 1.0f - 1.0f / float(ViewportExtent.Y));
  170. float InvBufferSizeX = 1.f / float(BufferSize.X);
  171. float InvBufferSizeY = 1.f / float(BufferSize.Y);
  172. PassParameters->HistoryBufferUVMinMax = FVector4(
  173. (ViewportOffset.X + 0.5f) * InvBufferSizeX,
  174. (ViewportOffset.Y + 0.5f) * InvBufferSizeY,
  175. (ViewportOffset.X + ViewportExtent.X - 0.5f) * InvBufferSizeX,
  176. (ViewportOffset.Y + ViewportExtent.Y - 0.5f) * InvBufferSizeY);
  177. PassParameters->HistoryBufferSize = FVector4(BufferSize.X, BufferSize.Y, InvBufferSizeX, InvBufferSizeY);
  178. for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
  179. {
  180. if (InputHistory.RT[i].IsValid())
  181. {
  182. PassParameters->HistoryBuffer[i] = GraphBuilder.RegisterExternalTexture(InputHistory.RT[i]);
  183. }
  184. else
  185. {
  186. PassParameters->HistoryBuffer[i] = BlackDummy;
  187. }
  188. }
  189. }
  190. for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
  191. {
  192. PassParameters->HistoryBufferSampler[i] = TStaticSamplerState<SF_Bilinear>::GetRHI();
  193. }
  194. }
  195. PassParameters->MaxViewportUVAndSvPositionToViewportUV = FVector4(
  196. (PracticableDestRect.Width() - 0.5f * ResDivisor) / float(PracticableDestRect.Width()),
  197. (PracticableDestRect.Height() - 0.5f * ResDivisor) / float(PracticableDestRect.Height()),
  198. ResDivisor / float(DestRect.Width()),
  199. ResDivisor / float(DestRect.Height()));
  200. PassParameters->HistoryPreExposureCorrection = View.PreExposure / View.PrevViewInfo.SceneColorPreExposure;
  201. {
  202. float InvSizeX = 1.0f / float(InputExtent.X);
  203. float InvSizeY = 1.0f / float(InputExtent.Y);
  204. PassParameters->ViewportUVToInputBufferUV = FVector4(
  205. ResDivisorInv * InputViewRect.Width() * InvSizeX,
  206. ResDivisorInv * InputViewRect.Height() * InvSizeY,
  207. ResDivisorInv * InputViewRect.Min.X * InvSizeX,
  208. ResDivisorInv * InputViewRect.Min.Y * InvSizeY);
  209. }
  210. PassParameters->EyeAdaptationTexture = GetEyeAdaptationTexture(GraphBuilder, View);
  211. // 时间上采样特定的参数.
  212. {
  213. float InputViewSizeInvScale = Inputs.ResolutionDivisor;
  214. float InputViewSizeScale = 1.0f / InputViewSizeInvScale;
  215. PassParameters->TemporalJitterPixels = InputViewSizeScale * View.TemporalJitterPixels;
  216. PassParameters->ScreenPercentage = float(InputViewRect.Width()) / float(OutputViewRect.Width());
  217. PassParameters->UpscaleFactor = float(OutputViewRect.Width()) / float(InputViewRect.Width());
  218. PassParameters->InputViewMin = InputViewSizeScale * FVector2D(InputViewRect.Min.X, InputViewRect.Min.Y);
  219. PassParameters->InputViewSize = FVector4(
  220. InputViewSizeScale * InputViewRect.Width(), InputViewSizeScale * InputViewRect.Height(),
  221. InputViewSizeInvScale / InputViewRect.Width(), InputViewSizeInvScale / InputViewRect.Height());
  222. }
  223. // UAVs
  224. {
  225. for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
  226. {
  227. PassParameters->OutComputeTex[i] = GraphBuilder.CreateUAV(NewHistoryTexture[i]);
  228. }
  229. if (Outputs.DownsampledSceneColor)
  230. {
  231. PassParameters->OutComputeTexDownsampled = GraphBuilder.CreateUAV(Outputs.DownsampledSceneColor);
  232. }
  233. }
  234. // Debug UAVs
  235. {
  236. FRDGTextureDesc DebugDesc = FRDGTextureDesc::Create2D(
  237. OutputExtent,
  238. PF_FloatRGBA,
  239. FClearValueBinding::None,
  240. /* InFlags = */ TexCreate_ShaderResource | TexCreate_UAV);
  241. FRDGTextureRef DebugTexture = GraphBuilder.CreateTexture(DebugDesc, TEXT("Debug.TAA"));
  242. PassParameters->DebugOutput = GraphBuilder.CreateUAV(DebugTexture);
  243. }
  244. TShaderMapRef<FTAAStandaloneCS> ComputeShader(View.ShaderMap, PermutationVector);
  245. ClearUnusedGraphResources(ComputeShader, PassParameters);
  246. for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
  247. {
  248. bUseHistoryTexture[i] = PassParameters->HistoryBuffer[i] != nullptr;
  249. }
  250. // 增加TAA处理CS通道.
  251. FComputeShaderUtils::AddPass(
  252. GraphBuilder,
  253. RDG_EVENT_NAME("TAA %s%s %dx%d -> %dx%d",
  254. PassName, Inputs.bUseFast ? TEXT(" Fast") : TEXT(""),
  255. PracticableSrcRect.Width(), PracticableSrcRect.Height(),
  256. PracticableDestRect.Width(), PracticableDestRect.Height()),
  257. ComputeShader,
  258. PassParameters,
  259. FComputeShaderUtils::GetGroupCount(PracticableDestRect.Size(), GTemporalAATileSizeX));
  260. }
  261. // 处理历史输出数据.
  262. if (!View.bStatePrevViewInfoIsReadOnly)
  263. {
  264. OutputHistory->SafeRelease();
  265. for (int32 i = 0; i < FTemporalAAHistory::kRenderTargetCount; i++)
  266. {
  267. if (bUseHistoryTexture[i])
  268. {
  269. GraphBuilder.QueueTextureExtraction(NewHistoryTexture[i], &OutputHistory->RT[i]);
  270. }
  271. }
  272. OutputHistory->ViewportRect = DestRect;
  273. OutputHistory->ReferenceBufferSize = OutputExtent * Inputs.ResolutionDivisor;
  274. }
  275. return Outputs;
  276. } // AddTemporalAAPass()

下面分析FTAAStandaloneCS使用的CS shader:

  1. // Engine\Shaders\Private\TemporalAA\TAAStandalone.usf
  2. [numthreads(THREADGROUP_SIZEX, THREADGROUP_SIZEY, 1)]
  3. void MainCS(
  4. uint2 DispatchThreadId : SV_DispatchThreadID,
  5. uint2 GroupId : SV_GroupID,
  6. uint2 GroupThreadId : SV_GroupThreadID,
  7. uint GroupThreadIndex : SV_GroupIndex)
  8. {
  9. // 获取视口UV.
  10. float2 ViewportUV = (float2(DispatchThreadId) + 0.5f) * OutputViewportSize.zw;
  11. #if AA_LOWER_RESOLUTION
  12. {
  13. ViewportUV = (float2(DispatchThreadId) + 0.5f) * MaxViewportUVAndSvPositionToViewportUV.zw;
  14. ViewportUV = min(ViewportUV, MaxViewportUVAndSvPositionToViewportUV.xy);
  15. }
  16. #endif
  17. // 曝光缩放.
  18. float FrameExposureScale = EyeAdaptationLookup();
  19. FTAAHistoryPayload OutputPayload = TemporalAASample(GroupId, GroupThreadId, GroupThreadIndex, ViewportUV, FrameExposureScale);
  20. float4 OutColor0 = 0;
  21. float4 OutColor1 = 0;
  22. // 处理输出数据.
  23. #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC
  24. {
  25. OutColor0.rgb = OutputPayload.Color.rgb;
  26. OutColor0.a = OutputPayload.CocRadius;
  27. }
  28. #elif AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
  29. {
  30. OutColor0 = OutputPayload.Color;
  31. OutColor1.r = OutputPayload.CocRadius;
  32. }
  33. #else
  34. {
  35. OutColor0 = OutputPayload.Color;
  36. }
  37. #endif
  38. uint2 PixelPos = DispatchThreadId + OutputViewportRect.xy;
  39. if (all(PixelPos < OutputViewportRect.zw))
  40. {
  41. float4 FinalOutput0 = min(MaxHalfFloat.xxxx, OutColor0);
  42. // 随机量化采样.
  43. #if AA_ENABLE_STOCASTIC_QUANTIZATION
  44. {
  45. uint2 Random = Rand3DPCG16(int3(PixelPos, View.StateFrameIndexMod8)).xy;
  46. float2 E = Hammersley16(0, 1, Random);
  47. FinalOutput0.rgb += FinalOutput0.rgb * (E.x * OutputQuantizationError);
  48. }
  49. #endif
  50. // 存储最终的颜色输出.
  51. OutComputeTex_0[PixelPos] = FinalOutput0;
  52. #if HISTORY_RENDER_TARGETS == 2
  53. OutComputeTex_1[PixelPos] = OutColor1;
  54. #endif
  55. }
  56. // 下采样.
  57. #if TAA_DOWNSAMPLE
  58. {
  59. uint P0 = GroupThreadId.x + GroupThreadId.y * THREADGROUP_SIZEX;
  60. uint P1 = P0 + 1;
  61. uint P2 = P0 + THREADGROUP_SIZEX;
  62. uint P3 = P2 + 1;
  63. GroupSharedDownsampleArray[P0] = OutColor0;
  64. GroupMemoryBarrierWithGroupSync();
  65. if (((GroupThreadId.x | GroupThreadId.y) & 1) == 0)
  66. {
  67. OutComputeTexDownsampled[PixelPos / 2] =
  68. (OutColor0 + GroupSharedDownsampleArray[P1] + GroupSharedDownsampleArray[P2] + GroupSharedDownsampleArray[P3]) * 0.25;
  69. }
  70. }
  71. #endif //TAA_DOWNSAMPLE
  72. }

TAA的主要逻辑在TemporalAASample中:

  1. FTAAHistoryPayload TemporalAASample(uint2 GroupId, uint2 GroupThreadId, uint GroupThreadIndex, float2 ViewportUV, float FrameExposureScale)
  2. {
  3. // 设置TAA输入参数.
  4. FTAAInputParameters InputParams;
  5. // 预曝光.
  6. #if USE_PREEXPOSURE
  7. InputParams.FrameExposureScale = ToScalarMemory(FrameExposureScale * View.OneOverPreExposure);
  8. #else
  9. InputParams.FrameExposureScale = ToScalarMemory(FrameExposureScale);
  10. #endif
  11. // 逐像素设置.
  12. {
  13. InputParams.GroupId = GroupId;
  14. InputParams.GroupThreadId = GroupThreadId;
  15. InputParams.GroupThreadIndex = GroupThreadIndex;
  16. InputParams.ViewportUV = ViewportUV;
  17. InputParams.ScreenPos = ViewportUVToScreenPos(ViewportUV);
  18. InputParams.NearestBufferUV = ViewportUV * ViewportUVToInputBufferUV.xy + ViewportUVToInputBufferUV.zw;
  19. // 处理单个或多通道的响应AA(responsive AA).
  20. #if AA_SINGLE_PASS_RESPONSIVE
  21. {
  22. const uint kResponsiveStencilMask = 1 << 3;
  23. int2 SceneStencilUV = (int2)trunc(InputParams.NearestBufferUV * InputSceneColorSize.xy);
  24. uint SceneStencilRef = StencilTexture.Load(int3(SceneStencilUV, 0)) STENCIL_COMPONENT_SWIZZLE;
  25. InputParams.bIsResponsiveAAPixel = (SceneStencilRef & kResponsiveStencilMask) ? 1.f : 0.f;
  26. }
  27. #elif TAA_RESPONSIVE
  28. InputParams.bIsResponsiveAAPixel = 1.f;
  29. #else
  30. InputParams.bIsResponsiveAAPixel = 0.f;
  31. #endif
  32. // 处理上采样.
  33. #if AA_UPSAMPLE
  34. {
  35. // 像素原点坐标.
  36. float2 PPCo = ViewportUV * InputViewSize.xy + TemporalJitterPixels;
  37. // 像素中心坐标.
  38. float2 PPCk = floor(PPCo) + 0.5;
  39. // 像素左上角的中心坐标.
  40. float2 PPCt = floor(PPCo - 0.5) + 0.5;
  41. InputParams.NearestBufferUV = InputSceneColorSize.zw * (InputViewMin + PPCk);
  42. InputParams.NearestTopLeftBufferUV = InputSceneColorSize.zw * (InputViewMin + PPCt);
  43. }
  44. #endif
  45. }
  46. // 设置中间结果.
  47. FTAAIntermediaryResult IntermediaryResult = CreateIntermediaryResult();
  48. // 查找像素和最近相邻像素的运动向量.
  49. // ------------------------------------------------
  50. float3 PosN; // 本像素的位置, 但随后可能是最近的相邻像素.
  51. PosN.xy = InputParams.ScreenPos;
  52. PrecacheInputSceneDepth(InputParams);
  53. PosN.z = SampleCachedSceneDepthTexture(InputParams, int2(0, 0));
  54. // 最小深度的屏幕位置.
  55. float2 VelocityOffset = float2(0.0, 0.0);
  56. #if AA_CROSS // 在深度搜索X模式中使用的像素交叉距离。
  57. {
  58. float4 Depths;
  59. // AA_CROSS默认是2.
  60. // 左上
  61. Depths.x = SampleCachedSceneDepthTexture(InputParams, int2(-AA_CROSS, -AA_CROSS));
  62. // 右上
  63. Depths.y = SampleCachedSceneDepthTexture(InputParams, int2( AA_CROSS, -AA_CROSS));
  64. // 左下
  65. Depths.z = SampleCachedSceneDepthTexture(InputParams, int2(-AA_CROSS, AA_CROSS));
  66. // 右下
  67. Depths.w = SampleCachedSceneDepthTexture(InputParams, int2( AA_CROSS, AA_CROSS));
  68. float2 DepthOffset = float2(AA_CROSS, AA_CROSS);
  69. float DepthOffsetXx = float(AA_CROSS);
  70. #if HAS_INVERTED_Z_BUFFER
  71. // Nearest depth is the largest depth (depth surface 0=far, 1=near).
  72. if(Depths.x > Depths.y)
  73. {
  74. DepthOffsetXx = -AA_CROSS;
  75. }
  76. if(Depths.z > Depths.w)
  77. {
  78. DepthOffset.x = -AA_CROSS;
  79. }
  80. float DepthsXY = max(Depths.x, Depths.y);
  81. float DepthsZW = max(Depths.z, Depths.w);
  82. if(DepthsXY > DepthsZW)
  83. {
  84. DepthOffset.y = -AA_CROSS;
  85. DepthOffset.x = DepthOffsetXx;
  86. }
  87. float DepthsXYZW = max(DepthsXY, DepthsZW);
  88. if(DepthsXYZW > PosN.z)
  89. {
  90. VelocityOffset = DepthOffset * InputSceneColorSize.zw;
  91. PosN.z = DepthsXYZW;
  92. }
  93. #else // !HAS_INVERTED_Z_BUFFER
  94. #error Fix me!
  95. #endif // !HAS_INVERTED_Z_BUFFER
  96. }
  97. #endif // AA_CROSS
  98. // 像素或最近像素的摄像机运动(在ScreenPos空间中).
  99. bool OffScreen = false;
  100. float Velocity = 0;
  101. float HistoryBlur = 0;
  102. float2 HistoryScreenPosition = InputParams.ScreenPos;
  103. #if 1
  104. {
  105. // 当前和上一帧裁剪数据.
  106. float4 ThisClip = float4( PosN.xy, PosN.z, 1 );
  107. float4 PrevClip = mul( ThisClip, View.ClipToPrevClip );
  108. float2 PrevScreen = PrevClip.xy / PrevClip.w;
  109. float2 BackN = PosN.xy - PrevScreen;
  110. float2 BackTemp = BackN * OutputViewportSize.xy;
  111. #if AA_DYNAMIC // 动态模糊.
  112. {
  113. float4 EncodedVelocity = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV + VelocityOffset, 0);
  114. bool DynamicN = EncodedVelocity.x > 0.0;
  115. if(DynamicN)
  116. {
  117. BackN = DecodeVelocityFromTexture(EncodedVelocity).xy;
  118. }
  119. BackTemp = BackN * OutputViewportSize.xy;
  120. }
  121. #endif
  122. Velocity = sqrt(dot(BackTemp, BackTemp));
  123. #if !AA_BICUBIC
  124. // Save the amount of pixel offset of just camera motion, used later as the amount of blur introduced by history.
  125. float HistoryBlurAmp = 2.0;
  126. HistoryBlur = saturate(abs(BackTemp.x) * HistoryBlurAmp + abs(BackTemp.y) * HistoryBlurAmp);
  127. #endif
  128. // 当前像素对应的历史帧位置.
  129. HistoryScreenPosition = InputParams.ScreenPos - BackN;
  130. // 检测HistoryBufferUV是否在视口之外.
  131. OffScreen = max(abs(HistoryScreenPosition.x), abs(HistoryScreenPosition.y)) >= 1.0;
  132. }
  133. #endif
  134. // 缓存输入的颜色数据, 将它们加载到LDS中.
  135. PrecacheInputSceneColor(/* inout = */ InputParams);
  136. #if AA_UPSAMPLE_ADAPTIVE_FILTERING == 0
  137. // 过滤输入数据.
  138. FilterCurrentFrameInputSamples(
  139. InputParams,
  140. /* inout = */ IntermediaryResult);
  141. #endif
  142. // 计算邻域的包围盒.
  143. FTAAHistoryPayload NeighborMin;
  144. FTAAHistoryPayload NeighborMax;
  145. ComputeNeighborhoodBoundingbox(
  146. InputParams,
  147. /* inout = */ IntermediaryResult,
  148. NeighborMin, NeighborMax);
  149. // 采样历史数据.
  150. FTAAHistoryPayload History = SampleHistory(HistoryScreenPosition);
  151. // 是否需要忽略历史数据(历史数据在视口之外或突然出现).
  152. bool IgnoreHistory = OffScreen || bCameraCut;
  153. // 动态抗鬼影.
  154. // ---------------------
  155. #if AA_DYNAMIC_ANTIGHOST && AA_DYNAMIC && HISTORY_PAYLOAD_COMPONENTS == 3
  156. bool Dynamic4; // 判断这个点是不是运动的
  157. {
  158. #if !AA_DYNAMIC
  159. #error AA_DYNAMIC_ANTIGHOST requires AA_DYNAMIC
  160. #endif
  161. // 分别采样速度缓冲的上方(0,-1)的Dynamic1, 左边(-1,0)的Dynamic3, 自身的Dynamic4, 右边(1,0)的Dynamic5, 下方(0,1)的Dynamic7.
  162. bool Dynamic1 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2( 0, -1)).x > 0;
  163. bool Dynamic3 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2(-1, 0)).x > 0;
  164. Dynamic4 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0).x > 0;
  165. bool Dynamic5 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2( 1, 0)).x > 0;
  166. bool Dynamic7 = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, InputParams.NearestBufferUV, 0, int2( 0, 1)).x > 0;
  167. // 判断以上任意一点是否运动的
  168. bool Dynamic = Dynamic1 || Dynamic3 || Dynamic4 || Dynamic5 || Dynamic7;
  169. // 继续判断是否需要忽略历史数据(不运动且历史的alpha>0).
  170. IgnoreHistory = IgnoreHistory || (!Dynamic && History.Color.a > 0);
  171. }
  172. #endif
  173. // Clamp历史亮度之前先保存之.
  174. float LumaMin = GetSceneColorLuma4(NeighborMin.Color);
  175. float LumaMax = GetSceneColorLuma4(NeighborMax.Color);
  176. float LumaHistory = GetSceneColorLuma4(History.Color);
  177. FTAAHistoryPayload PreClampingHistoryColor = History;
  178. // Clamp历史数据.
  179. History = ClampHistory(IntermediaryResult, History, NeighborMin, NeighborMax);
  180. // 颜色Clamp之后过滤输入.
  181. #if AA_UPSAMPLE_ADAPTIVE_FILTERING == 1
  182. {
  183. #if AA_VARIANCE
  184. #error AA_VARIANCE and AA_UPSAMPLE_ADAPTIVE_FILTERING are not compatible because of circular code dependency.
  185. #endif
  186. // 忽略历史帧数据.
  187. if (IgnoreHistory)
  188. {
  189. IntermediaryResult.InvFilterScaleFactor = 0;
  190. }
  191. IntermediaryResult.InvFilterScaleFactor -= (Velocity * UpscaleFactor) * 0.1;
  192. IntermediaryResult.InvFilterScaleFactor = max(IntermediaryResult.InvFilterScaleFactor, ScreenPercentage);
  193. FilterCurrentFrameInputSamples(
  194. InputParams,
  195. /* inout = */ IntermediaryResult);
  196. }
  197. #endif
  198. // 重新添加锯齿以锐化
  199. // -------------------------------
  200. #if AA_FILTERED && !AA_BICUBIC
  201. {
  202. #if AA_UPSAMPLE
  203. #error Temporal upsample does not support sharpen.
  204. #endif
  205. // Blend in non-filtered based on the amount of sub-pixel motion.
  206. float AddAliasing = saturate(HistoryBlur) * 0.5;
  207. float LumaContrastFactor = 32.0;
  208. #if AA_YCOCG // TODO: Probably a bug arround here because using Luma4() even with YCOCG=0.
  209. // 1/4 as bright.
  210. LumaContrastFactor *= 4.0;
  211. #endif
  212. float LumaContrast = LumaMax - LumaMin;
  213. AddAliasing = saturate(AddAliasing + rcp(1.0 + LumaContrast * LumaContrastFactor));
  214. IntermediaryResult.Filtered.Color = lerp(IntermediaryResult.Filtered.Color, SampleCachedSceneColorTexture(InputParams, int2(0, 0)).Color, AddAliasing);
  215. }
  216. #endif
  217. // 计算混合因子.
  218. // --------------------
  219. float BlendFinal;
  220. {
  221. float LumaFiltered = GetSceneColorLuma4(IntermediaryResult.Filtered.Color);
  222. // CurrentFrameWeight是从c++传入的,默认为0.04f
  223. BlendFinal = IntermediaryResult.FilteredTemporalWeight * CurrentFrameWeight;
  224. // 根据速度进行插值,速度越大,则BlendFinal越大
  225. // 速度越大,历史帧越不可信
  226. BlendFinal = lerp(BlendFinal, 0.2, saturate(Velocity / 40));
  227. // 确保至少有一些小的贡献.
  228. BlendFinal = max( BlendFinal, saturate( 0.01 * LumaHistory / abs( LumaFiltered - LumaHistory ) ) );
  229. #if AA_NAN && (COMPILER_GLSL || COMPILER_METAL)
  230. // The current Metal & GLSL compilers don't handle saturate(NaN) -> 0, instead they return NaN/INF.
  231. BlendFinal = -min(-BlendFinal, 0.0);
  232. #endif
  233. // ResponsiveAA强制成新帧的1/4.
  234. BlendFinal = InputParams.bIsResponsiveAAPixel ? (1.0/4.0) : BlendFinal;
  235. #if AA_LERP
  236. BlendFinal = 1.0/float(AA_LERP);
  237. #endif
  238. // 处理DOF.
  239. #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC || AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
  240. {
  241. float BilateralWeight = ComputeBilateralWeight(IntermediaryResult.Filtered.CocRadius, History.CocRadius);
  242. BlendFinal = lerp(1, BlendFinal, BilateralWeight);
  243. }
  244. #endif
  245. // 如果是镜头切换, 当前帧强制成1.
  246. if (bCameraCut)
  247. {
  248. BlendFinal = 1.0;
  249. }
  250. }
  251. // 忽略历史帧, 重置数据.
  252. if (IgnoreHistory)
  253. {
  254. // 历史帧等于滤波后的结果.
  255. History = IntermediaryResult.Filtered;
  256. #if HISTORY_PAYLOAD_COMPONENTS == 3
  257. History.Color.a = 0.0;
  258. #endif
  259. }
  260. // 最终在历史和过滤颜色之间混合
  261. // -------------------------------------------------
  262. // 亮度权重混合.
  263. float FilterWeight = GetSceneColorHdrWeight(InputParams, IntermediaryResult.Filtered.Color.x);
  264. float HistoryWeight = GetSceneColorHdrWeight(InputParams, History.Color.x);
  265. FTAAHistoryPayload OutputPayload;
  266. {
  267. // 计算带权重的插值.
  268. float2 Weights = WeightedLerpFactors(HistoryWeight, FilterWeight, BlendFinal);
  269. // 增加输出的历史负载数据, 会进行加权, 历史帧的alpha会乘以Weights.x系数下降.
  270. OutputPayload = AddPayload(MulPayload(History, Weights.x), MulPayload(IntermediaryResult.Filtered, Weights.y));
  271. }
  272. // 调整靠近1的Alpha, 0.995 < 0.996 = 254/255
  273. if (OutputPayload.Color.a > 0.995)
  274. {
  275. OutputPayload.Color.a = 1;
  276. }
  277. // 转换颜色回到线性空间.
  278. OutputPayload.Color = TransformBackToRawLinearSceneColor(OutputPayload.Color);
  279. #if AA_NAN // 非法数据.
  280. OutputPayload.Color = -min(-OutputPayload.Color, 0.0);
  281. OutputPayload.CocRadius = isnan(OutputPayload.CocRadius) ? 0.0 : OutputPayload.CocRadius;
  282. #endif
  283. #if HISTORY_PAYLOAD_COMPONENTS == 3
  284. #if AA_DYNAMIC_ANTIGHOST && AA_DYNAMIC
  285. // 如果这一帧是运动的话,那么alpha为1,写入历史帧.
  286. OutputPayload.Color.a = Dynamic4 ? 1 : 0;
  287. #else
  288. // 不运动或非动态, Alpha为0.
  289. OutputPayload.Color.a = 0;
  290. #endif
  291. #endif
  292. return OutputPayload;
  293. }

上面TAA的主流程涉及了多个重要的接口调用,下面继续解析之:

  1. // 过滤当前帧的输入采样数据.
  2. void FilterCurrentFrameInputSamples(
  3. in FTAAInputParameters InputParams,
  4. inout FTAAIntermediaryResult IntermediaryResult)
  5. {
  6. (......)
  7. FTAAHistoryPayload Filtered;
  8. {
  9. // 上采样.
  10. #if AA_UPSAMPLE
  11. // Pixel coordinate of the center of output pixel O in the input viewport.
  12. float2 PPCo = InputParams.ViewportUV * InputViewSize.xy + TemporalJitterPixels;
  13. // Pixel coordinate of the center of the nearest input pixel K.
  14. float2 PPCk = floor(PPCo) + 0.5;
  15. // Vector in pixel between pixel K -> O.
  16. float2 dKO = PPCo - PPCk;
  17. #endif
  18. // 根据采样数量选择不同的卷积核.
  19. #if AA_SAMPLES == 9
  20. const uint SampleIndexes[9] = kSquareIndexes3x3;
  21. #elif AA_SAMPLES == 5 || AA_SAMPLES == 6
  22. const uint SampleIndexes[5] = kPlusIndexes3x3;
  23. #endif
  24. #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC || AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
  25. // Fetches center pixel's Coc for the bilateral filtering.
  26. float CenterCocRadius = SampleCachedSceneColorTexture(InputParams, int2(0, 0)).CocRadius;
  27. #endif
  28. // 计算邻居的HDR, 最终权重和颜色.
  29. float NeighborsHdrWeight = 0;
  30. float NeighborsFinalWeight = 0;
  31. float4 NeighborsColor = 0;
  32. UNROLL
  33. for (uint i = 0; i < AA_SAMPLES; i++)
  34. {
  35. // 从最近的输入像素获得样本偏移量.
  36. int2 SampleOffset;
  37. #if AA_UPSAMPLE && AA_SAMPLES == 6
  38. if (i == 5)
  39. {
  40. SampleOffset = SignFastInt(dKO);
  41. }
  42. else
  43. #endif
  44. {
  45. const uint SampleIndex = SampleIndexes[i];
  46. SampleOffset = kOffsets3x3[SampleIndex];
  47. }
  48. float2 fSampleOffset = float2(SampleOffset);
  49. // When doing Coc bilateral, the center sample is accumulated last.
  50. #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC && 0
  51. if (all(SampleOffset == 0) && (AA_SAMPLES != 6 || i != 5))
  52. {
  53. continue;
  54. }
  55. #endif
  56. // 找出这个输入样本的空间权值.
  57. #if AA_UPSAMPLE
  58. // 计算输出像素和输入像素I之间的像素增量.
  59. // 注意: abs() 不必要, 因为后面会用dot(dPP, dPP).
  60. float2 dPP = fSampleOffset - dKO;
  61. float SampleSpatialWeight = ComputeSampleWeigth(IntermediaryResult, dPP);
  62. #elif AA_SAMPLES == 9
  63. float SampleSpatialWeight = SampleWeights[i];
  64. #elif AA_SAMPLES == 5
  65. float SampleSpatialWeight = PlusWeights[i];
  66. #else
  67. #error Do not know how to compute filtering sample weight.
  68. #endif
  69. // 获取颜色采样.
  70. FTAASceneColorSample Sample = SampleCachedSceneColorTexture(InputParams, SampleOffset);
  71. // 查找采样点的HDR权重.
  72. #if AA_TONE
  73. float SampleHdrWeight = Sample.HdrWeight;
  74. #else
  75. float SampleHdrWeight = 1;
  76. #endif
  77. // 根据有效负载求出样本的双边权重.
  78. #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_COC
  79. float BilateralWeight = ComputeNeightborSampleBilateralWeight(CenterCocRadius, Sample.CocRadius);
  80. #else
  81. float BilateralWeight = 1;
  82. #endif
  83. // 计算最终采样权重.
  84. float SampleFinalWeight = SampleSpatialWeight * SampleHdrWeight * BilateralWeight;
  85. // 应用权重到采样颜色中.
  86. NeighborsColor += SampleFinalWeight * Sample.Color;
  87. NeighborsFinalWeight += SampleFinalWeight;
  88. NeighborsHdrWeight += SampleSpatialWeight * SampleHdrWeight;
  89. }
  90. (......)
  91. }
  92. IntermediaryResult.Filtered = Filtered;
  93. }
  94. // 计算用于拒绝历史记录的邻域包围盒.
  95. void ComputeNeighborhoodBoundingbox(
  96. in FTAAInputParameters InputParams,
  97. in FTAAIntermediaryResult IntermediaryResult,
  98. out FTAAHistoryPayload OutNeighborMin,
  99. out FTAAHistoryPayload OutNeighborMax)
  100. {
  101. // 相邻像素的数据.
  102. FTAAHistoryPayload Neighbors[kNeighborsCount];
  103. UNROLL
  104. for (uint i = 0; i < kNeighborsCount; i++)
  105. {
  106. Neighbors[i].Color = SampleCachedSceneColorTexture(InputParams, kOffsets3x3[i]).Color;
  107. Neighbors[i].CocRadius = SampleCachedSceneColorTexture(InputParams, kOffsets3x3[i]).CocRadius;
  108. }
  109. FTAAHistoryPayload NeighborMin;
  110. FTAAHistoryPayload NeighborMax;
  111. #if AA_HISTORY_CLAMPING_BOX == HISTORY_CLAMPING_BOX_VARIANCE
  112. // 这个就是NVIDIA版本的Variance Clipping.
  113. {
  114. #if AA_SAMPLES == 9
  115. const uint SampleIndexes[9] = kSquareIndexes3x3;
  116. #elif AA_SAMPLES == 5
  117. const uint SampleIndexes[5] = kPlusIndexes3x3;
  118. #else
  119. #error Unknown number of samples.
  120. #endif
  121. // 计算当前像素的矩(moment).
  122. float4 m1 = 0;
  123. float4 m2 = 0;
  124. for( uint i = 0; i < AA_SAMPLES; i++ )
  125. {
  126. float4 SampleColor = Neighbors[ SampleIndexes[i] ];
  127. m1 += SampleColor;
  128. m2 += Pow2( SampleColor );
  129. }
  130. m1 *= (1.0 / AA_SAMPLES);
  131. m2 *= (1.0 / AA_SAMPLES);
  132. // 标准方差.
  133. float4 StdDev = sqrt( abs(m2 - m1 * m1) );
  134. // 邻居的最大最小值.
  135. NeighborMin = m1 - 1.25 * StdDev;
  136. NeighborMax = m1 + 1.25 * StdDev;
  137. // 跟输入的过滤数据做比较, 找出最大最小值.
  138. NeighborMin = min( NeighborMin, IntermediaryResult.Filtered );
  139. NeighborMax = max( NeighborMax, IntermediaryResult.Filtered );
  140. }
  141. #elif AA_HISTORY_CLAMPING_BOX == HISTORY_CLAMPING_BOX_SAMPLE_DISTANCE
  142. // 只在某个半径内执行颜色裁剪.
  143. {
  144. float2 PPCo = InputParams.ViewportUV * InputViewSize.xy + TemporalJitterPixels;
  145. float2 PPCk = floor(PPCo) + 0.5;
  146. float2 dKO = PPCo - PPCk;
  147. // 总是考虑4个样本.
  148. NeighborMin = Neighbors[4];
  149. NeighborMax = Neighbors[4];
  150. // Upscale因子越大, 距离阈值越小, 以减少鬼影.
  151. float DistthresholdLerp = UpscaleFactor - 1;
  152. float DistThreshold = lerp(1.51, 1.3, DistthresholdLerp);
  153. #if AA_SAMPLES == 9
  154. const uint Indexes[9] = kSquareIndexes3x3;
  155. #else
  156. const uint Indexes[5] = kPlusIndexes3x3;
  157. #endif
  158. // 计算所有样本的最大最小值.
  159. UNROLL
  160. for( uint i = 0; i < AA_SAMPLES; i++ )
  161. {
  162. uint NeightborId = Indexes[i];
  163. if (NeightborId != 4)
  164. {
  165. float2 dPP = float2(kOffsets3x3[NeightborId]) - dKO;
  166. FLATTEN
  167. if (dot(dPP, dPP) < (DistThreshold * DistThreshold))
  168. {
  169. NeighborMin = MinPayload(NeighborMin, Neighbors[NeightborId]);
  170. NeighborMax = MaxPayload(NeighborMax, Neighbors[NeightborId]);
  171. }
  172. }
  173. }
  174. }
  175. #elif AA_HISTORY_CLAMPING_BOX == HISTORY_CLAMPING_BOX_MIN_MAX
  176. // 用最大最小包围盒来裁剪, 是默认的方式.
  177. {
  178. NeighborMin = MinPayload3( Neighbors[1], Neighbors[3], Neighbors[4] );
  179. NeighborMin = MinPayload3( NeighborMin, Neighbors[5], Neighbors[7] );
  180. NeighborMax = MaxPayload3( Neighbors[1], Neighbors[3], Neighbors[4] );
  181. NeighborMax = MaxPayload3( NeighborMax, Neighbors[5], Neighbors[7] );
  182. #if AA_SAMPLES == 6
  183. {
  184. float2 PPCo = InputParams.ViewportUV * InputViewSize.xy + TemporalJitterPixels;
  185. float2 PPCk = floor(PPCo) + 0.5;
  186. float2 dKO = PPCo - PPCk;
  187. int2 FifthNeighborOffset = SignFastInt(dKO);
  188. FTAAHistoryPayload FifthNeighbor;
  189. FifthNeighbor.Color = SampleCachedSceneColorTexture(InputParams, FifthNeighborOffset).Color;
  190. FifthNeighbor.CocRadius = SampleCachedSceneColorTexture(InputParams, FifthNeighborOffset).CocRadius;
  191. NeighborMin = MinPayload(NeighborMin, FifthNeighbor);
  192. NeighborMax = MaxPayload(NeighborMax, FifthNeighbor);
  193. }
  194. #elif AA_SAMPLES == 9
  195. {
  196. FTAAHistoryPayload NeighborMinPlus = NeighborMin;
  197. FTAAHistoryPayload NeighborMaxPlus = NeighborMax;
  198. NeighborMin = MinPayload3( NeighborMin, Neighbors[0], Neighbors[2] );
  199. NeighborMin = MinPayload3( NeighborMin, Neighbors[6], Neighbors[8] );
  200. NeighborMax = MaxPayload3( NeighborMax, Neighbors[0], Neighbors[2] );
  201. NeighborMax = MaxPayload3( NeighborMax, Neighbors[6], Neighbors[8] );
  202. if( AA_ROUND )
  203. {
  204. NeighborMin = AddPayload(MulPayload(NeighborMin, 0.5), MulPayload(NeighborMinPlus, 0.5));
  205. NeighborMax = AddPayload(MulPayload(NeighborMax, 0.5), MulPayload(NeighborMaxPlus, 0.5));
  206. }
  207. }
  208. #endif
  209. }
  210. #else
  211. #error Unknown history clamping box.
  212. #endif
  213. OutNeighborMin = NeighborMin;
  214. OutNeighborMax = NeighborMax;
  215. }
  216. // 采样历史数据.
  217. FTAAHistoryPayload SampleHistory(in float2 HistoryScreenPosition)
  218. {
  219. float4 RawHistory0 = 0;
  220. float4 RawHistory1 = 0;
  221. #if AA_BICUBIC // 用Catmull-Rom曲线采样历史数据, 以减少运动模糊.(默认使用)
  222. {
  223. float2 HistoryBufferUV = HistoryScreenPosition * ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
  224. // 裁剪HistoryBufferUV,避免对额外样本的计算.
  225. #if AA_MANUALLY_CLAMP_HISTORY_UV
  226. HistoryBufferUV = clamp(HistoryBufferUV, HistoryBufferUVMinMax.xy, HistoryBufferUVMinMax.zw);
  227. #endif
  228. FCatmullRomSamples Samples = GetBicubic2DCatmullRomSamples(HistoryBufferUV, HistoryBufferSize.xy, HistoryBufferSize.zw);
  229. for (uint i = 0; i < Samples.Count; i++)
  230. {
  231. float2 SampleUV = Samples.UV[i];
  232. // 将SampleUV裁剪在HistoryBufferUVMinMax内, 避免采样到视图区域之外潜在的NaN.
  233. // 可能消耗很大,但Samples.UVDir实际上是编译期常数。
  234. if (AA_MANUALLY_CLAMP_HISTORY_UV)
  235. {
  236. if (Samples.UVDir[i].x < 0)
  237. {
  238. SampleUV.x = max(SampleUV.x, HistoryBufferUVMinMax.x);
  239. }
  240. else if (Samples.UVDir[i].x > 0)
  241. {
  242. SampleUV.x = min(SampleUV.x, HistoryBufferUVMinMax.z);
  243. }
  244. if (Samples.UVDir[i].y < 0)
  245. {
  246. SampleUV.y = max(SampleUV.y, HistoryBufferUVMinMax.y);
  247. }
  248. else if (Samples.UVDir[i].y > 0)
  249. {
  250. SampleUV.y = min(SampleUV.y, HistoryBufferUVMinMax.w);
  251. }
  252. }
  253. RawHistory0 += HistoryBuffer_0.SampleLevel(HistoryBufferSampler_0, SampleUV, 0) * Samples.Weight[i];
  254. }
  255. RawHistory0 *= Samples.FinalMultiplier;
  256. }
  257. // 双线性采样历史数据.
  258. #else
  259. {
  260. // Clamp HistoryScreenPosition to be within viewport.
  261. if (AA_MANUALLY_CLAMP_HISTORY_UV)
  262. {
  263. HistoryScreenPosition = clamp(HistoryScreenPosition, -ScreenPosAbsMax, ScreenPosAbsMax);
  264. }
  265. float2 HistoryBufferUV = HistoryScreenPosition * ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
  266. RawHistory0 = HistoryBuffer_0.SampleLevel(HistoryBufferSampler_0, HistoryBufferUV, 0);
  267. }
  268. #endif
  269. #if HISTORY_RENDER_TARGETS == 2
  270. {
  271. if (AA_MANUALLY_CLAMP_HISTORY_UV)
  272. {
  273. HistoryScreenPosition = clamp(HistoryScreenPosition, -ScreenPosAbsMax, ScreenPosAbsMax);
  274. }
  275. float2 HistoryBufferUV = HistoryScreenPosition * ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
  276. RawHistory1 = HistoryBuffer_1.SampleLevel(HistoryBufferSampler_1, HistoryBufferUV, 0);
  277. }
  278. #endif
  279. // 处理和保存历史数据的结果.
  280. FTAAHistoryPayload HistoryPayload;
  281. HistoryPayload.Color = RawHistory0;
  282. #if AA_HISTORY_PAYLOAD == HISTORY_PAYLOAD_RGB_OPACITY_COC
  283. HistoryPayload.CocRadius = RawHistory1.r;
  284. #else
  285. HistoryPayload.CocRadius = RawHistory0.a;
  286. #endif
  287. #if USE_PREEXPOSURE
  288. HistoryPayload.Color.rgb *= HistoryPreExposureCorrection;
  289. #endif
  290. HistoryPayload.Color = TransformSceneColor(HistoryPayload.Color);
  291. return HistoryPayload;
  292. }
  293. // 裁剪历史数据.
  294. FTAAHistoryPayload ClampHistory(inout FTAAIntermediaryResult IntermediaryResult, FTAAHistoryPayload History, FTAAHistoryPayload NeighborMin, FTAAHistoryPayload NeighborMax)
  295. {
  296. #if !AA_CLAMP
  297. return History;
  298. #elif AA_CLIP // 使用更紧的AABB裁剪历史数据.
  299. // 裁剪历史,这使用颜色AABB相交更紧.
  300. float4 TargetColor = Filtered;
  301. // 历史裁剪.
  302. float ClipBlend = HistoryClip( HistoryColor.rgb, TargetColor.rgb, NeighborMin.rgb, NeighborMax.rgb );
  303. // 裁剪到0~1.
  304. ClipBlend = saturate( ClipBlend );
  305. // 根据混合权重插值历史和目标颜色.
  306. HistoryColor = lerp( HistoryColor, TargetColor, ClipBlend );
  307. #if AA_FORCE_ALPHA_CLAMP
  308. HistoryColor.a = clamp( HistoryColor.a, NeighborMin.a, NeighborMax.a );
  309. #endif
  310. return HistoryColor;
  311. #else //!AA_CLIP, 使用Neighborhood clamping(邻域裁剪).
  312. History.Color = clamp(History.Color, NeighborMin.Color, NeighborMax.Color);
  313. History.CocRadius = clamp(History.CocRadius, NeighborMin.CocRadius, NeighborMax.CocRadius);
  314. return History;
  315. #endif
  316. }

此外,重点说下当前帧和历史帧的权重插值WeightedLerpFactors:

  1. // Engine\Shaders\Private\TemporalAA\TAACommon.ush
  2. taa_half2 WeightedLerpFactors(taa_half WeightA, taa_half WeightB, taa_half Blend)
  3. {
  4. // 先插值获得带权重的A和B.
  5. taa_half BlendA = (taa_half(1.0) - Blend) * WeightA;
  6. taa_half BlendB = Blend * WeightB;
  7. // 计算它们和的倒数.
  8. taa_half RcpBlend = SafeRcp(BlendA + BlendB);
  9. // 用它们和的倒数归一化.
  10. BlendA *= RcpBlend;
  11. BlendB *= RcpBlend;
  12. // 输出结果.
  13. return taa_half2(BlendA, BlendB);
  14. }

上面的权重插值和线性插值不一样,关键是会将A和B的权重考虑进去,并除以它们的和来达到归一化的目的。

若用公式表达,则WeightA的插值结果\(\text{WeightA}'\)的计算公式为:

\[\text{WeightA}' = \cfrac{(1.0-\text{Blend}) \cdot \text{WeightA}}{(1.0-\text{Blend}) \cdot \text{WeightA}+\text{Blend} \cdot \text{WeightB}}
\]

WeightB的插值结果\(\text{WeightB}'\)的计算公式为:

\[\text{WeightB}' = \cfrac{\text{Blend} \cdot \text{WeightB}}{(1.0-\text{Blend}) \cdot \text{WeightA}+\text{Blend} \cdot \text{WeightB}}
\]

下面展示一下无AA、FXAA、TAA的对比效果图:

不过,不要被上面的静态对比图蒙蔽了。实际上,在运动的画面中,TAA还存在不少问题,比如细小物体的颗粒感、画面变模糊、突然消失或出现的像素有瑕疵、存在少许延时等等。

但以目前的硬件性能,在延迟管线下,TAA依然是首选的抗锯齿技术,并且可以用一些Hack方法缓解上述瑕疵。

SSR全称Screen Space Reflections(屏幕空间的反射),是在屏幕空间计算光滑表面的反射效果的技术。

UE的SSR效果一览。

SSR和Cubemap、平面反射不一样,效果和消耗都介于两者之间:

| 反射类型 | 效果 | 消耗 | 描述 |
| --- | --- | --- | --- |
| Planar Reflections | 高,动态 | 高 | 需要用镜像的摄像机多渲染一次场景 |
| Screen Space Reflections | 中,动态 | 中 | 基于屏幕空间,存在隐藏几何体和边缘裁剪的问题 |
| Cubemap Reflections | 低,静态 | 低 | 预生成,只适用于静态物体的反射 |

SSR的核心思想在于重用屏幕空间的数据:

SSR的核心算法是对每个像素执行以下步骤:

  • 计算反射射线。
  • 沿着反射射线方向追踪(可用深度缓冲;见此列表后的固定步长草图)。
  • 用交点的颜色作为反射颜色。
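对应上述步骤,下面给出固定步长RayMarch的一个最小C++草图(笔者的示意性实现,并非UE源码,仅示意核心循环;假设深度约定为0=近、1=远):

  1. struct FVec2 { float X, Y; };
  2. // DepthBuffer: 按行存储的场景深度; StartUV/StepUV: 屏幕空间起点与每步UV增量;
  3. // StartDepth/StepDepth: 射线深度的起点与每步增量.
  4. bool RayMarchFixedStep(
  5. const float* DepthBuffer, int Width, int Height,
  6. FVec2 StartUV, FVec2 StepUV, float StartDepth, float StepDepth,
  7. int MaxSteps, FVec2& OutHitUV)
  8. {
  9. FVec2 UV = StartUV;
  10. float RayDepth = StartDepth;
  11. for (int Step = 0; Step < MaxSteps; ++Step)
  12. {
  13. UV.X += StepUV.X; UV.Y += StepUV.Y;
  14. RayDepth += StepDepth;
  15. // 射线跑出屏幕则宣告失败, 这正是SSR只能反射屏幕内信息的原因.
  16. if (UV.X < 0.0f || UV.X > 1.0f || UV.Y < 0.0f || UV.Y > 1.0f)
  17. return false;
  18. const int PixelX = int(UV.X * float(Width - 1));
  19. const int PixelY = int(UV.Y * float(Height - 1));
  20. const float SceneDepth = DepthBuffer[PixelY * Width + PixelX];
  21. // 场景表面比射线当前点更近, 说明射线已穿入几何体, 视为命中.
  22. if (SceneDepth < RayDepth)
  23. {
  24. OutHitUV = UV;
  25. return true;
  26. }
  27. }
  28. return false;
  29. }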


SSR的步骤和依赖的屏幕空间数据,包含场景颜色、法线、深度和蒙版。

沿着反射射线方向追踪交点时,可用的方法有:

  • 固定步长的Raymarch:实现最简单,但效率差,追踪次数多。
  • 距离场:需预生成,可有效减少追踪次数。
  • Mesh/BVH:要用到分支,实现和数据结构复杂,非缓存一致性的内存访问。
  • Voxels:需预生成,内存消耗大,只能针对高端设备。
  • Hi-Z Buffer(Depth Mip-Map):GPU友好,不能完美覆盖空间。(下图)

在使用Hi-Z Buffer追踪之前,需要用最小最大值生成深度的MipMap:
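下面是生成一级最小值深度Mip的C++草图(笔者的示意性实现,并非UE源码;真实引擎通常由CS逐级完成,若使用反转Z则应取最大值):

  1. #include <algorithm>
  2. #include <vector>
  3. // 每个目标像素取上一级2x2区域的最小深度(约定0=近, 1=远).
  4. std::vector<float> DownsampleDepthMin(const std::vector<float>& Src, int Width, int Height)
  5. {
  6. const int DstW = Width / 2;
  7. const int DstH = Height / 2;
  8. std::vector<float> Dst(size_t(DstW) * DstH);
  9. for (int y = 0; y < DstH; ++y)
  10. {
  11. for (int x = 0; x < DstW; ++x)
  12. {
  13. const float D0 = Src[(2 * y + 0) * Width + (2 * x + 0)];
  14. const float D1 = Src[(2 * y + 0) * Width + (2 * x + 1)];
  15. const float D2 = Src[(2 * y + 1) * Width + (2 * x + 0)];
  16. const float D3 = Src[(2 * y + 1) * Width + (2 * x + 1)];
  17. Dst[size_t(y) * DstW + x] = std::min(std::min(D0, D1), std::min(D2, D3));
  18. }
  19. }
  20. return Dst;
  21. }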

对于粗糙(模糊)表面的反射方向,可以使用重要性采样(Halton序列和BRDF):

在实现过程中,可以使用符合权重条件的邻域射线来复用交点:

如何判定邻域射线可用呢?下图就是其中的一个权重算法:

另外,还可以用半分辨率来计算SSR,配合类似TAA的Jitter方法达到多采样效果:

在追踪方式上,可用稀疏(Sparse)光线追踪优化:将光线追踪从颜色解析(Resolve)中解耦出来,追踪只在半分辨率上执行,再将颜色解析提升到全分辨率,并依然使用4邻域复用交点:

在生成重要性采样的射线时,还需要对其过滤,将射线模拟成锥体(Cone),以获得更大范围的交点:

这就需要对场景颜色进行下采样获得Mipmap,模拟不同粗糙度的表面反射交点覆盖的范围。

SSR由于基于屏幕空间数据(相当于覆盖视锥体最前方的一层纸片),因此,会产生隐藏几何体的问题:

SSR的瑕疵,注意食指的倒影,明显被截掉了。

另外,还存在边缘裁剪(Edge Cutoff)的问题:

以上问题可以使用边缘过渡来缓解:
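一种常见的边缘过渡做法(笔者补充的通用形式,并非特指UE的实现)是根据命中点到屏幕边缘的距离计算衰减权重:

\[w_{edge} = \text{saturate}\left( \frac{1 - \max(|x_{ndc}|,\ |y_{ndc}|)}{d_{fade}} \right)
\]

其中\((x_{ndc}, y_{ndc})\)为命中点的NDC坐标,\(d_{fade}\)为过渡带宽度;命中点越靠近屏幕边缘,反射颜色的权重越低。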

SSR只能用于延迟渲染管线,因为要依赖GBuffer数据、场景颜色和深度的Mipmap,它在渲染管线的流程示意图如下:

SSR内部的渲染过程图示如下:

下面转向分析UE的实现。首先来看SSR渲染的入口和逻辑:

  1. void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
  2. {
  3. (......)
  4. // 渲染光源.
  5. RenderLights(GraphBuilder, ...);
  6. (......)
  7. // 渲染反射(包含SSR)和天空光.
  8. RenderDeferredReflectionsAndSkyLighting(GraphBuilder, ...);
  9. (......)
  10. }
  11. void FDeferredShadingSceneRenderer::RenderDeferredReflectionsAndSkyLighting(
  12. FRDGBuilder& GraphBuilder,
  13. TRDGUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBuffer,
  14. FRDGTextureMSAA SceneColorTexture,
  15. FRDGTextureRef DynamicBentNormalAOTexture,
  16. FRDGTextureRef VelocityTexture,
  17. FHairStrandsRenderingData* HairDatas)
  18. {
  19. (......)
  20. for (FViewInfo& View : Views)
  21. {
  22. (......)
  23. // 处理SSR.
  24. else if (bScreenSpaceReflections)
  25. {
  26. bDenoise = DenoiserMode != 0 && CVarDenoiseSSR.GetValueOnRenderThread();
  27. bTemporalFilter = !bDenoise && View.ViewState && IsSSRTemporalPassRequired(View);
  28. ESSRQuality SSRQuality;
  29. GetSSRQualityForView(View, &SSRQuality, &DenoiserConfig);
  30. RDG_EVENT_SCOPE(GraphBuilder, "ScreenSpaceReflections(Quality=%d)", int32(SSRQuality));
  31. // 渲染SSR.
  32. RenderScreenSpaceReflections(GraphBuilder, SceneTextures, SceneColorTexture.Resolve, View, SSRQuality, bDenoise, &DenoiserInputs);
  33. }
  34. (......)
  35. }
  36. // Engine\Source\Runtime\Renderer\Private\ScreenSpaceRayTracing.cpp
  37. void RenderScreenSpaceReflections(
  38. FRDGBuilder& GraphBuilder,
  39. const FSceneTextureParameters& SceneTextures,
  40. const FRDGTextureRef CurrentSceneColor,
  41. const FViewInfo& View,
  42. ESSRQuality SSRQuality,
  43. bool bDenoiser,
  44. IScreenSpaceDenoiser::FReflectionsInputs* DenoiserInputs,
  45. FTiledScreenSpaceReflection* TiledScreenSpaceReflection)
  46. {
  47. // 处理输入纹理.
  48. FRDGTextureRef InputColor = CurrentSceneColor;
  49. if (SSRQuality != ESSRQuality::VisualizeSSR)
  50. {
  51. if (View.PrevViewInfo.CustomSSRInput.IsValid())
  52. {
  53. InputColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.CustomSSRInput);
  54. }
  55. else if (GSSRHalfResSceneColor && View.PrevViewInfo.HalfResTemporalAAHistory.IsValid())
  56. {
  57. InputColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.HalfResTemporalAAHistory);
  58. }
  59. else if (View.PrevViewInfo.TemporalAAHistory.IsValid())
  60. {
  61. InputColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.TemporalAAHistory.RT[0]);
  62. }
  63. }
  64. const bool SSRStencilPrePass = CVarSSRStencil.GetValueOnRenderThread() != 0 && SSRQuality != ESSRQuality::VisualizeSSR && TiledScreenSpaceReflection == nullptr;
  65. // 为降噪分配输入.
  66. {
  67. FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
  68. FSceneRenderTargets::Get_FrameConstantsOnly().GetBufferSizeXY(),
  69. PF_FloatRGBA, FClearValueBinding(FLinearColor(0, 0, 0, 0)),
  70. TexCreate_RenderTargetable | TexCreate_ShaderResource | TexCreate_UAV);
  71. Desc.Flags |= GFastVRamConfig.SSR;
  72. DenoiserInputs->Color = GraphBuilder.CreateTexture(Desc, TEXT("ScreenSpaceReflections"));
  73. if (bDenoiser)
  74. {
  75. Desc.Format = PF_R16F;
  76. DenoiserInputs->RayHitDistance = GraphBuilder.CreateTexture(Desc, TEXT("ScreenSpaceReflectionsHitDistance"));
  77. }
  78. }
  79. IScreenSpaceDenoiser::FReflectionsRayTracingConfig RayTracingConfigs;
  80. GetSSRShaderOptionsForQuality(SSRQuality, &RayTracingConfigs);
  81. // SSR通用shader参数.
  82. FSSRCommonParameters CommonParameters;
  83. CommonParameters.SSRParams = ComputeSSRParams(View, SSRQuality, false);
  84. CommonParameters.ViewUniformBuffer = View.ViewUniformBuffer;
  85. CommonParameters.SceneTextures = SceneTextures;
  86. if (InputColor == CurrentSceneColor || !CommonParameters.SceneTextures.GBufferVelocityTexture)
  87. {
  88. CommonParameters.SceneTextures.GBufferVelocityTexture = GraphBuilder.RegisterExternalTexture(GSystemTextures.MidGreyDummy);
  89. }
  90. FRenderTargetBindingSlots RenderTargets;
  91. RenderTargets[0] = FRenderTargetBinding(DenoiserInputs->Color, ERenderTargetLoadAction::ENoAction);
  92. if (bDenoiser)
  93. {
  94. RenderTargets[1] = FRenderTargetBinding(DenoiserInputs->RayHitDistance, ERenderTargetLoadAction::ENoAction);
  95. }
  96. // SSR的模板缓冲Pass.
  97. if (SSRStencilPrePass)
  98. {
  99. // 绑定深度缓冲.
  100. RenderTargets.DepthStencil = FDepthStencilBinding(
  101. SceneTextures.SceneDepthTexture,
  102. ERenderTargetLoadAction::ENoAction,
  103. ERenderTargetLoadAction::ELoad,
  104. FExclusiveDepthStencil::DepthNop_StencilWrite);
  105. FScreenSpaceReflectionsStencilPS::FPermutationDomain PermutationVector;
  106. PermutationVector.Set<FSSROutputForDenoiser>(bDenoiser);
  107. FScreenSpaceReflectionsStencilPS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenSpaceReflectionsStencilPS::FParameters>();
  108. PassParameters->CommonParameters = CommonParameters;
  109. PassParameters->RenderTargets = RenderTargets;
  110. TShaderMapRef<FScreenSpaceReflectionsStencilPS> PixelShader(View.ShaderMap, PermutationVector);
  111. ClearUnusedGraphResources(PixelShader, PassParameters);
  112. // SSR模板Pass.
  113. GraphBuilder.AddPass(
  114. RDG_EVENT_NAME("SSR StencilSetup %dx%d", View.ViewRect.Width(), View.ViewRect.Height()),
  115. PassParameters,
  116. ERDGPassFlags::Raster,
  117. [PassParameters, &View, PixelShader](FRHICommandList& RHICmdList)
  118. {
  119. SCOPED_GPU_STAT(RHICmdList, ScreenSpaceReflections);
  120. RHICmdList.SetViewport(View.ViewRect.Min.X, View.ViewRect.Min.Y, 0.0f, View.ViewRect.Max.X, View.ViewRect.Max.Y, 1.0f);
  121. FGraphicsPipelineStateInitializer GraphicsPSOInit;
  122. FPixelShaderUtils::InitFullscreenPipelineState(RHICmdList, View.ShaderMap, PixelShader, /* out */ GraphicsPSOInit);
  123. // Clobbers the stencil for pixels that should not compute SSR
  124. GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always, true, CF_Always, SO_Replace, SO_Replace, SO_Replace>::GetRHI();
  125. SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
  126. SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
  127. RHICmdList.SetStencilRef(0x80);
  128. FPixelShaderUtils::DrawFullscreenTriangle(RHICmdList);
  129. });
  130. }
  131. // 增加SSR pass.
  132. auto SetSSRParameters = [&](auto* PassParameters)
  133. {
  134. {
  135. const FVector2D HZBUvFactor(
  136. float(View.ViewRect.Width()) / float(2 * View.HZBMipmap0Size.X),
  137. float(View.ViewRect.Height()) / float(2 * View.HZBMipmap0Size.Y));
  138. PassParameters->HZBUvFactorAndInvFactor = FVector4(
  139. HZBUvFactor.X,
  140. HZBUvFactor.Y,
  141. 1.0f / HZBUvFactor.X,
  142. 1.0f / HZBUvFactor.Y);
  143. }
  144. {
  145. FIntPoint ViewportOffset = View.ViewRect.Min;
  146. FIntPoint ViewportExtent = View.ViewRect.Size();
  147. FIntPoint BufferSize = SceneTextures.SceneDepthTexture->Desc.Extent;
  148. if (View.PrevViewInfo.TemporalAAHistory.IsValid())
  149. {
  150. ViewportOffset = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Min;
  151. ViewportExtent = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Size();
  152. BufferSize = View.PrevViewInfo.TemporalAAHistory.ReferenceBufferSize;
  153. ensure(ViewportExtent.X > 0 && ViewportExtent.Y > 0);
  154. ensure(BufferSize.X > 0 && BufferSize.Y > 0);
  155. }
  156. FVector2D InvBufferSize(1.0f / float(BufferSize.X), 1.0f / float(BufferSize.Y));
  157. PassParameters->PrevScreenPositionScaleBias = FVector4(
  158. ViewportExtent.X * 0.5f * InvBufferSize.X,
  159. -ViewportExtent.Y * 0.5f * InvBufferSize.Y,
  160. (ViewportExtent.X * 0.5f + ViewportOffset.X) * InvBufferSize.X,
  161. (ViewportExtent.Y * 0.5f + ViewportOffset.Y) * InvBufferSize.Y);
  162. PassParameters->ScreenSpaceRayTracingDebugOutput = CreateScreenSpaceRayTracingDebugUAV(GraphBuilder, DenoiserInputs->Color->Desc, TEXT("DebugSSR"), true);
  163. }
  164. PassParameters->PrevSceneColorPreExposureCorrection = InputColor != CurrentSceneColor ? View.PreExposure / View.PrevViewInfo.SceneColorPreExposure : 1.0f;
  165. PassParameters->SceneColor = InputColor;
  166. PassParameters->SceneColorSampler = GSSRHalfResSceneColor ? TStaticSamplerState<SF_Bilinear>::GetRHI() : TStaticSamplerState<SF_Point>::GetRHI();
  167. PassParameters->HZB = GraphBuilder.RegisterExternalTexture(View.HZB);
  168. PassParameters->HZBSampler = TStaticSamplerState<SF_Point>::GetRHI();
  169. };
  170. // SSR的PS参数.
  171. FScreenSpaceReflectionsPS::FPermutationDomain PermutationVector;
  172. PermutationVector.Set<FSSRQualityDim>(SSRQuality);
  173. PermutationVector.Set<FSSROutputForDenoiser>(bDenoiser);
  174. FScreenSpaceReflectionsPS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenSpaceReflectionsPS::FParameters>();
  175. PassParameters->CommonParameters = CommonParameters;
  176. SetSSRParameters(&PassParameters->SSRPassCommonParameter);
  177. PassParameters->RenderTargets = RenderTargets;
  178. TShaderMapRef<FScreenSpaceReflectionsPS> PixelShader(View.ShaderMap, PermutationVector);
  179. if (TiledScreenSpaceReflection == nullptr) // 非分块SSR(PC默认方式).
  180. {
  181. ClearUnusedGraphResources(PixelShader, PassParameters);
  182. // 增加SSR RayMarch通道.
  183. GraphBuilder.AddPass(
  184. RDG_EVENT_NAME("SSR RayMarch(Quality=%d RayPerPixel=%d%s) %dx%d",
  185. SSRQuality, RayTracingConfigs.RayCountPerPixel, bDenoiser ? TEXT(" DenoiserOutput") : TEXT(""),
  186. View.ViewRect.Width(), View.ViewRect.Height()),
  187. PassParameters,
  188. ERDGPassFlags::Raster,
  189. [PassParameters, &View, PixelShader, SSRStencilPrePass](FRHICommandList& RHICmdList)
  190. {
  191. SCOPED_GPU_STAT(RHICmdList, ScreenSpaceReflections);
  192. RHICmdList.SetViewport(View.ViewRect.Min.X, View.ViewRect.Min.Y, 0.0f, View.ViewRect.Max.X, View.ViewRect.Max.Y, 1.0f);
  193. FGraphicsPipelineStateInitializer GraphicsPSOInit;
  194. FPixelShaderUtils::InitFullscreenPipelineState(RHICmdList, View.ShaderMap, PixelShader, /* out */ GraphicsPSOInit);
  195. if (SSRStencilPrePass)
  196. {
  197. // Clobbers the stencil for pixels that should not compute SSR
  198. GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always, true, CF_Equal, SO_Keep, SO_Keep, SO_Keep>::GetRHI();
  199. }
  200. SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
  201. SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
  202. RHICmdList.SetStencilRef(0x80);
  203. // 绘制全屏幕.
  204. FPixelShaderUtils::DrawFullscreenTriangle(RHICmdList);
  205. });
  206. }
  207. else // 分块SSR.
  208. {
  209. check(TiledScreenSpaceReflection->TileSize == 8); // WORK_TILE_SIZE
  210. FScreenSpaceReflectionsTileVS::FPermutationDomain VsPermutationVector;
  211. TShaderMapRef<FScreenSpaceReflectionsTileVS> VertexShader(View.ShaderMap, VsPermutationVector);
  212. PassParameters->TileListData = TiledScreenSpaceReflection->TileListStructureBufferSRV;
  213. PassParameters->IndirectDrawParameter = TiledScreenSpaceReflection->DispatchIndirectParametersBuffer;
  214. ValidateShaderParameters(VertexShader, *PassParameters);
  215. ValidateShaderParameters(PixelShader, *PassParameters);
  216. // 增加SSR RayMarch通道.
  217. GraphBuilder.AddPass(
  218. RDG_EVENT_NAME("SSR RayMarch(Quality=%d RayPerPixel=%d%s) %dx%d",
  219. SSRQuality, RayTracingConfigs.RayCountPerPixel, bDenoiser ? TEXT(" DenoiserOutput") : TEXT(""),
  220. View.ViewRect.Width(), View.ViewRect.Height()),
  221. PassParameters,
  222. ERDGPassFlags::Raster,
  223. [PassParameters, &View, VertexShader, PixelShader, SSRStencilPrePass](FRHICommandList& RHICmdList)
  224. {
  225. SCOPED_GPU_STAT(RHICmdList, ScreenSpaceReflections);
  226. RHICmdList.SetViewport(View.ViewRect.Min.X, View.ViewRect.Min.Y, 0.0f, View.ViewRect.Max.X, View.ViewRect.Max.Y, 1.0f);
  227. FGraphicsPipelineStateInitializer GraphicsPSOInit;
  228. FPixelShaderUtils::InitFullscreenPipelineState(RHICmdList, View.ShaderMap, PixelShader, /* out */ GraphicsPSOInit);
  229. if (SSRStencilPrePass)
  230. {
  231. // Clobbers the stencil for pixels that should not compute SSR
  232. GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always, true, CF_Equal, SO_Keep, SO_Keep, SO_Keep>::GetRHI();
  233. }
  234. GraphicsPSOInit.PrimitiveType = GRHISupportsRectTopology ? PT_RectList : PT_TriangleList;
  235. GraphicsPSOInit.BoundShaderState.VertexDeclarationRHI = GEmptyVertexDeclaration.VertexDeclarationRHI;
  236. GraphicsPSOInit.BoundShaderState.VertexShaderRHI = VertexShader.GetVertexShader();
  237. GraphicsPSOInit.BoundShaderState.PixelShaderRHI = PixelShader.GetPixelShader();
  238. SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
  239. SetShaderParameters(RHICmdList, VertexShader, VertexShader.GetVertexShader(), *PassParameters);
  240. SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *PassParameters);
  241. RHICmdList.SetStencilRef(0x80);
  242. PassParameters->IndirectDrawParameter->MarkResourceAsUsed();
  243. RHICmdList.DrawPrimitiveIndirect(PassParameters->IndirectDrawParameter->GetIndirectRHICallBuffer(), 0);
  244. });
  245. }
  246. } // RenderScreenSpaceReflections()

由以上代码可知,SSR是在RenderLights之后渲染的。在笔者的PC设备中,SSR采用Quality为2、每像素射线数量为1、非分块的参数(SSR质量等级可通过控制台变量r.SSR.Quality调整):

下面继续分析SSR RayMarch使用的Shader代码:

  1. // Engine\Shaders\Private\SSRT\SSRTReflections.usf
  2. void ScreenSpaceReflections(
  3. float4 SvPosition
  4. , out float4 OutColor
  5. #if SSR_OUTPUT_FOR_DENOISER
  6. , out float4 OutClosestHitDistance
  7. #endif
  8. )
  9. {
  10. // 获取坐标.
  11. float2 UV = SvPosition.xy * View.BufferSizeAndInvSize.zw;
  12. float2 ScreenPos = ViewportUVToScreenPos((SvPosition.xy - View.ViewRectMin.xy) * View.ViewSizeAndInvSize.zw);
  13. uint2 PixelPos = (uint2)SvPosition.xy;
  14. bool bDebugPrint = all(PixelPos == uint2(View.ViewSizeAndInvSize.xy) / 2);
  15. OutColor = 0;
  16. #if SSR_OUTPUT_FOR_DENOISER
  17. OutClosestHitDistance = -2.0;
  18. #endif
  19. // 获取GBuffer.
  20. FGBufferData GBuffer = GetGBufferDataFromSceneTextures(UV);
  21. float3 N = GBuffer.WorldNormal;
  22. const float SceneDepth = GBuffer.Depth;
  23. const float3 PositionTranslatedWorld = mul( float4( ScreenPos * SceneDepth, SceneDepth, 1 ), View.ScreenToTranslatedWorld ).xyz;
  24. const float3 V = normalize(View.TranslatedWorldCameraOrigin - PositionTranslatedWorld);
  25. // 修改GGX各向异性法线粗糙度.
  26. ModifyGGXAnisotropicNormalRoughness(GBuffer.WorldTangent, GBuffer.Anisotropy, GBuffer.Roughness, N, V);
  27. float Roughness = GetRoughness(GBuffer);
  28. float RoughnessFade = GetRoughnessFade(Roughness);
  29. // 提前退出. 如果用了模板prepass, 则无用.
  30. BRANCH if( RoughnessFade <= 0.0 || GBuffer.ShadingModelID == 0 )
  31. {
  32. return;
  33. }
  34. // 初始化粗糙度, Vis, 最近交点等数据.
  35. float a = Roughness * Roughness;
  36. float a2 = a * a;
  37. float NoV = saturate( dot( N, V ) );
  38. float G_SmithV = 2 * NoV / (NoV + sqrt(NoV * (NoV - NoV * a2) + a2));
  39. float ClosestHitDistanceSqr = INFINITE_FLOAT;
  40. // 根据质量等级设置步数, 每像素光线数量, 是否光滑.
  41. #if SSR_QUALITY == 1
  42. uint NumSteps = 8;
  43. uint NumRays = 1;
  44. bool bGlossy = false;
  45. #elif SSR_QUALITY == 2
  46. uint NumSteps = 16;
  47. uint NumRays = 1;
  48. #if SSR_OUTPUT_FOR_DENOISER
  49. bool bGlossy = true;
  50. #else
  51. bool bGlossy = false;
  52. #endif
  53. #elif SSR_QUALITY == 3
  54. uint NumSteps = 8;
  55. uint NumRays = 4;
  56. bool bGlossy = true;
  57. #else // SSR_QUALITY == 4
  58. uint NumSteps = 12;
  59. uint NumRays = 12;
  60. bool bGlossy = true;
  61. #endif
  62. if( NumRays > 1 ) // 每像素射线大于1
  63. {
  64. // 计算噪点和随机数.
  65. float2 Noise;
  66. Noise.x = InterleavedGradientNoise( SvPosition.xy, View.StateFrameIndexMod8 );
  67. Noise.y = InterleavedGradientNoise( SvPosition.xy, View.StateFrameIndexMod8 * 117 );
  68. uint2 Random = Rand3DPCG16( int3( PixelPos, View.StateFrameIndexMod8 ) ).xy;
  69. // 获取当前法线在切线空间的3个正交的基向量.
  70. float3x3 TangentBasis = GetTangentBasis( N );
  71. // 切线空间的V向量.
  72. float3 TangentV = mul( TangentBasis, V );
  73. float Count = 0;
  74. // 如果粗糙度很小, 说明是光滑的表面, 修改步数和射线数量.
  75. if( Roughness < 0.1 )
  76. {
  77. NumSteps = min( NumSteps * NumRays, 24u );
  78. NumRays = 1;
  79. }
  80. // 发射NumRays条射线.
  81. LOOP for( uint i = 0; i < NumRays; i++ )
  82. {
  83. float StepOffset = Noise.x;
  84. StepOffset -= 0.5;
  85. // Hammersley低差异序列.
  86. float2 E = Hammersley16( i, NumRays, Random );
  87. // 先对E在圆盘上采样, 再结合粗糙度和切线空间的V向量做GGX重要性采样, 最后将结果变换到切线空间, 获得了切线空间的半向量H.
  88. float3 H = mul( ImportanceSampleVisibleGGX(UniformSampleDisk(E), a2, TangentV ).xyz, TangentBasis );
  89. // 计算光源方向.
  90. float3 L = 2 * dot( V, H ) * H - V;
  91. float3 HitUVz;
  92. float Level = 0;
  93. // 如果是光滑表面, 将光源方向转化成其反射向量.
  94. if( Roughness < 0.1 )
  95. {
  96. L = reflect(-V, N);
  97. }
  98. // 执行HZB射线检测.
  99. bool bHit = RayCast(
  100. HZB, HZBSampler,
  101. PositionTranslatedWorld, L, Roughness, SceneDepth,
  102. NumSteps, StepOffset,
  103. HZBUvFactorAndInvFactor,
  104. bDebugPrint,
  105. HitUVz,
  106. Level
  107. );
  108. // 如果命中, 采样场景颜色.
  109. BRANCH if( bHit )
  110. {
  111. ClosestHitDistanceSqr = min(ClosestHitDistanceSqr, ComputeRayHitSqrDistance(PositionTranslatedWorld, HitUVz));
  112. float2 SampleUV;
  113. float Vignette;
  114. // 重投影交点.
  115. ReprojectHit(PrevScreenPositionScaleBias, GBufferVelocityTexture, GBufferVelocityTextureSampler, HitUVz, SampleUV, Vignette);
  116. // 采样场景颜色.
  117. float4 SampleColor = SampleScreenColor( SceneColor, SceneColorSampler, SampleUV ) * Vignette;
  118. SampleColor.rgb *= rcp( 1 + Luminance(SampleColor.rgb) ); // Karis加权: 按亮度压暗样本, 抑制高亮噪点(firefly).
  119. OutColor += SampleColor;
  120. }
  121. }
  122. OutColor /= max( NumRays, 0.0001 );
  123. OutColor.rgb *= rcp( 1 - Luminance(OutColor.rgb) ); // 平均后做逆变换, 还原回线性亮度.
  124. }
  125. else // 每像素射线==1
  126. {
  127. float StepOffset = InterleavedGradientNoise(SvPosition.xy, View.StateFrameIndexMod8);
  128. StepOffset -= 0.5;
  129. float3 L;
  130. if (bGlossy)
  131. {
  132. float2 E = Rand1SPPDenoiserInput(PixelPos);
  133. #if SSR_OUTPUT_FOR_DENOISER
  134. {
  135. E.y *= 1 - GGX_IMPORTANT_SAMPLE_BIAS;
  136. }
  137. #endif
  138. float3x3 TangentBasis = GetTangentBasis( N );
  139. float3 TangentV = mul( TangentBasis, V );
  140. float3 H = mul( ImportanceSampleVisibleGGX(UniformSampleDisk(E), a2, TangentV ).xyz, TangentBasis );
  141. L = 2 * dot( V, H ) * H - V;
  142. }
  143. else
  144. {
  145. L = reflect( -V, N );
  146. }
  147. float3 HitUVz;
  148. float Level = 0;
  149. // HZB射线检测.
  150. bool bHit = RayCast(
  151. HZB, HZBSampler,
  152. PositionTranslatedWorld, L, Roughness, SceneDepth,
  153. NumSteps, StepOffset,
  154. HZBUvFactorAndInvFactor,
  155. bDebugPrint,
  156. HitUVz,
  157. Level
  158. );
  159. // 处理交点后的采样数据.
  160. BRANCH if( bHit )
  161. {
  162. ClosestHitDistanceSqr = ComputeRayHitSqrDistance(PositionTranslatedWorld, HitUVz);
  163. float2 SampleUV;
  164. float Vignette;
  165. ReprojectHit(PrevScreenPositionScaleBias, GBufferVelocityTexture, GBufferVelocityTextureSampler, HitUVz, SampleUV, Vignette);
  166. OutColor = SampleScreenColor(SceneColor, SceneColorSampler, SampleUV) * Vignette;
  167. }
  168. }
  169. // 颜色过渡.
  170. OutColor *= RoughnessFade;
  171. OutColor *= SSRParams.r;
  172. #if USE_PREEXPOSURE
  173. OutColor.rgb *= PrevSceneColorPreExposureCorrection;
  174. #endif
  175. // 为降噪输出最近交点的距离.
  176. #if SSR_OUTPUT_FOR_DENOISER
  177. {
  178. OutClosestHitDistance = ComputeDenoiserConfusionFactor(
  179. ClosestHitDistanceSqr > 0,
  180. length(View.TranslatedWorldCameraOrigin - PositionTranslatedWorld),
  181. sqrt(ClosestHitDistanceSqr));
  182. }
  183. #endif
  184. }
  185. // PS主入口.
  186. void ScreenSpaceReflectionsPS(
  187. float4 SvPosition : SV_POSITION
  188. , out float4 OutColor : SV_Target0
  189. #if SSR_OUTPUT_FOR_DENOISER
  190. , out float4 OutClosestHitDistance : SV_Target1
  191. #endif
  192. )
  193. {
  194. ScreenSpaceReflections(SvPosition, OutColor
  195. #if SSR_OUTPUT_FOR_DENOISER
  196. ,OutClosestHitDistance
  197. #endif
  198. );
  199. }

下面分析射线检测RayCast的主要调用逻辑:

  1. // 射线检测.
  2. bool RayCast(
  3. Texture2D Texture, SamplerState Sampler,
  4. float3 RayOriginTranslatedWorld, float3 RayDirection,
  5. float Roughness, float SceneDepth,
  6. uint NumSteps, float StepOffset,
  7. float4 HZBUvFactorAndInvFactor,
  8. bool bDebugPrint,
  9. out float3 OutHitUVz,
  10. out float Level)
  11. {
  12. FSSRTRay Ray = InitScreenSpaceRayFromWorldSpace(RayOriginTranslatedWorld, RayDirection, SceneDepth);
  13. // 检测单个屏幕空间的射线.
  14. return CastScreenSpaceRay(
  15. Texture, Sampler,
  16. Ray,
  17. Roughness, NumSteps, StepOffset,
  18. HZBUvFactorAndInvFactor, bDebugPrint,
  19. /* out */ OutHitUVz,
  20. /* out */ Level);
  21. } // RayCast()
  22. // 检测单个屏幕空间的射线.
  23. bool CastScreenSpaceRay(
  24. Texture2D Texture, SamplerState Sampler,
  25. FSSRTRay Ray,
  26. float Roughness,
  27. uint NumSteps, float StepOffset,
  28. float4 HZBUvFactorAndInvFactor,
  29. bool bDebugPrint,
  30. out float3 OutHitUVz,
  31. out float Level)
  32. {
  33. // 初始化射线的起点, 步长等.
  34. const float3 RayStartScreen = Ray.RayStartScreen;
  35. float3 RayStepScreen = Ray.RayStepScreen;
  36. float3 RayStartUVz = float3( (RayStartScreen.xy * float2( 0.5, -0.5 ) + 0.5) * HZBUvFactorAndInvFactor.xy, RayStartScreen.z );
  37. float3 RayStepUVz = float3( RayStepScreen.xy * float2( 0.5, -0.5 ) * HZBUvFactorAndInvFactor.xy, RayStepScreen.z );
  38. const float Step = 1.0 / NumSteps;
  39. float CompareTolerance = Ray.CompareTolerance * Step;
  40. float LastDiff = 0;
  41. Level = 1;
  42. RayStepUVz *= Step;
  43. float3 RayUVz = RayStartUVz + RayStepUVz * StepOffset;
  44. #if IS_SSGI_SHADER && SSGI_TRACE_CONE
  45. RayUVz = RayStartUVz;
  46. #endif
  47. float4 MultipleSampleDepthDiff;
  48. bool4 bMultipleSampleHit; // TODO: Might consumes VGPRS if bug in compiler.
  49. bool bFoundAnyHit = false;
  50. #if IS_SSGI_SHADER && SSGI_TRACE_CONE
  51. const float ConeAngle = PI / 4;
  52. const float d = 1;
  53. const float r = d * sin(0.5 * ConeAngle);
  54. const float Exp = 1.6; //(d + r) / (d - r);
  55. const float ExpLog2 = log2(Exp);
  56. const float MaxPower = exp2(log2(Exp) * (NumSteps + 1.0)) - 0.9;
  57. {
  58. //Level = 2;
  59. }
  60. #endif
  61. uint i;
  62. // 最多检测NumSteps次, 找到交点就退出循环. 每次检测SSRT_SAMPLE_BATCH_SIZE(4)个采样点.
  63. LOOP
  64. for (i = 0; i < NumSteps; i += SSRT_SAMPLE_BATCH_SIZE)
  65. {
  66. float2 SamplesUV[SSRT_SAMPLE_BATCH_SIZE];
  67. float4 SamplesZ;
  68. float4 SamplesMip;
  69. // 计算采样坐标, 深度和深度纹理Mip层级.
  70. #if IS_SSGI_SHADER && SSGI_TRACE_CONE // SSGI或锥体追踪
  71. {
  72. UNROLL_N(SSRT_SAMPLE_BATCH_SIZE)
  73. for (uint j = 0; j < SSRT_SAMPLE_BATCH_SIZE; j++)
  74. {
  75. float S = float(i + j) + StepOffset;
  76. float NormalizedPower = (exp2(ExpLog2 * S) - 0.9) / MaxPower;
  77. float Offset = NormalizedPower * NumSteps;
  78. SamplesUV[j] = RayUVz.xy + Offset * RayStepUVz.xy;
  79. SamplesZ[j] = RayUVz.z + Offset * RayStepUVz.z;
  80. }
  81. SamplesMip.xy = Level;
  82. Level += (8.0 / NumSteps) * Roughness;
  83. SamplesMip.zw = Level;
  84. Level += (8.0 / NumSteps) * Roughness;
  85. }
  86. #else // SSR执行此分支.
  87. {
  88. UNROLL_N(SSRT_SAMPLE_BATCH_SIZE)
  89. for (uint j = 0; j < SSRT_SAMPLE_BATCH_SIZE; j++)
  90. {
  91. SamplesUV[j] = RayUVz.xy + (float(i) + float(j + 1)) * RayStepUVz.xy;
  92. SamplesZ[j] = RayUVz.z + (float(i) + float(j + 1)) * RayStepUVz.z;
  93. }
  94. // 采样深度的Mip层级.
  95. SamplesMip.xy = Level;
  96. // 调整层级, 注意受粗糙度影响, 粗糙度越小, Level也越小.
  97. Level += (8.0 / NumSteps) * Roughness;
  98. SamplesMip.zw = Level;
  99. Level += (8.0 / NumSteps) * Roughness;
  100. }
  101. #endif
  102. // 采样场景深度.
  103. float4 SampleDepth;
  104. {
  105. UNROLL_N(SSRT_SAMPLE_BATCH_SIZE)
  106. for (uint j = 0; j < SSRT_SAMPLE_BATCH_SIZE; j++)
  107. {
  108. SampleDepth[j] = Texture.SampleLevel(Sampler, SamplesUV[j], SamplesMip[j]).r;
  109. }
  110. }
  111. // 计算是否相交.
  112. // 计算射线采样点深度和深度纹理的差异.
  113. MultipleSampleDepthDiff = SamplesZ - SampleDepth;
  114. // 检测是否小于深度对比容忍值.
  115. bMultipleSampleHit = abs(MultipleSampleDepthDiff + CompareTolerance) < CompareTolerance;
  116. // 4个采样点只要有1个满足就算相交.
  117. bFoundAnyHit = any(bMultipleSampleHit);
  118. // 找到交点, 退出循环.
  119. BRANCH
  120. if (bFoundAnyHit)
  121. {
  122. break;
  123. }
  124. LastDiff = MultipleSampleDepthDiff.w;
  125. } // for( uint i = 0; i < NumSteps; i += 4 )
  126. // 计算输出坐标.
  127. BRANCH
  128. if (bFoundAnyHit)
  129. {
  130. (......)
  131. #else // SSR
  132. {
  133. float DepthDiff0 = MultipleSampleDepthDiff[2];
  134. float DepthDiff1 = MultipleSampleDepthDiff[3];
  135. float Time0 = 3;
  136. FLATTEN
  137. if (bMultipleSampleHit[2])
  138. {
  139. DepthDiff0 = MultipleSampleDepthDiff[1];
  140. DepthDiff1 = MultipleSampleDepthDiff[2];
  141. Time0 = 2;
  142. }
  143. FLATTEN
  144. if (bMultipleSampleHit[1])
  145. {
  146. DepthDiff0 = MultipleSampleDepthDiff[0];
  147. DepthDiff1 = MultipleSampleDepthDiff[1];
  148. Time0 = 1;
  149. }
  150. FLATTEN
  151. if (bMultipleSampleHit[0])
  152. {
  153. DepthDiff0 = LastDiff;
  154. DepthDiff1 = MultipleSampleDepthDiff[0];
  155. Time0 = 0;
  156. }
  157. Time0 += float(i);
  158. float Time1 = Time0 + 1;
  159. // 利用线段交点找到更准确的交点.
  160. float TimeLerp = saturate(DepthDiff0 / (DepthDiff0 - DepthDiff1));
  161. float IntersectTime = Time0 + TimeLerp;
  162. OutHitUVz = RayUVz + RayStepUVz * IntersectTime;
  163. }
  164. #endif
  165. // 输出交点的数据.
  166. OutHitUVz.xy *= HZBUvFactorAndInvFactor.zw;
  167. OutHitUVz.xy = OutHitUVz.xy * float2( 2, -2 ) + float2( -1, 1 );
  168. OutHitUVz.xy = OutHitUVz.xy * View.ScreenPositionScaleBias.xy + View.ScreenPositionScaleBias.wz;
  169. }
  170. else
  171. {
  172. OutHitUVz = float3(0, 0, 0);
  173. }
  174. return bFoundAnyHit;
  175. } // CastScreenSpaceRay()

SSR的RayMarch每次循环处理4个采样点,以减少循环次数;深度的Mip层级(Level)随着采样步进而增加,且受粗糙度影响,这也符合物理直觉:越光滑的表面,反射的颜色越清晰,对应的Mip层级越小。在测试交点时允许一定范围内的深度误差,以提高命中判定的鲁棒性。
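其中相交判定`abs(MultipleSampleDepthDiff + CompareTolerance) < CompareTolerance`展开后即(记\(d = z_{ray} - z_{depth}\),\(t\)为CompareTolerance):

\[ |d + t| < t \iff -2t < d < 0 \]

即射线采样点须落在深度缓冲所代表表面的"后方"(注意UE使用reversed-Z,DeviceZ越大越近),且深度差不超过\(2t\),才算命中。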

根据RenderDoc截帧,可以发现其使用的场景颜色不是本帧的,而是上一帧的TAA数据:

这主要是因为TAA在后处理阶段执行,在SSR阶段,本帧的场景颜色还未执行后处理,还处于未抗锯齿的原始状态,直接使用势必降低SSR效果。由于SSR的RayMarch的SPP默认才1,意味着此阶段的颜色带有明显的噪点:

这就需要后续步骤进行降噪,UE是在TAA阶段处理的:

上:SSR带噪点纹理;中:TAA的历史帧纹理;下:TAA处理后的纹理。
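另外值得注意的是,SSR采样的上一帧颜色带有上一帧的预曝光(PreExposure),与本帧可能不同,因此Shader末尾的USE_PREEXPOSURE分支用PrevSceneColorPreExposureCorrection做了校正(对应前面C++侧的赋值逻辑):

\[ C' = C_{\text{prev}} \cdot \frac{\text{PreExposure}_{\text{cur}}}{\text{PreExposure}_{\text{prev}}} \]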

AO(Ambient Occlusion)的本质是计算来自环境光(间接光)的遮蔽程度,它可以通过发射很多射线来估算被遮挡的系数:
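把上述说法写成半球积分的经典形式(标准定义,供参考,并非UE专有):

\[ AO(p, \mathbf{n}) = \frac{1}{\pi}\int_{\Omega} V(p, \omega)\,(\mathbf{n}\cdot\omega)\,\mathrm{d}\omega \]

其中\(V(p,\omega)\)是可见性函数(该方向未被遮挡时为1,否则为0),发射射线正是对这个积分做蒙特卡洛估计。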

在实时渲染中,计算遮挡系数存在两种方式:物体空间和屏幕空间。物体空间的AO使用真实的几何体进行射线检测,消耗随场景复杂度急剧增长,通常比较慢,需要复杂的网格简化和空间加速结构。而屏幕空间的AO在后处理阶段完成,不需要预处理数据,不依赖场景复杂度,实现简单,消耗较低,但不是物理正确的,只能得到近似的遮挡结果。

把采样限制在表面法线形成的半球的有限半径内来近似AO,可以在封闭区域(enclosed area,如室内)较好且高效地获得遮挡效果:

SSAO(Screen Space Ambient Occlusion)使用深度缓冲来近似场景物体,给每个像素在球体范围内采样若干次,并测试深度缓冲:

如果超过一半的采样点未被遮挡(通过了深度测试),则应用AO。如果法线不可用,则需要用球体代替半球体:

这种基于屏幕空间的深度测试是存在瑕疵的,比如下图中箭头所指的点,虽然未能通过深度测试,但其实并没有被遮挡:
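为便于理解,下面给出经典SSAO核心逻辑的一段HLSL示意代码(仅为理解用的简化草图,并非UE的实现;其中KernelDir、ProjectToUV、SampleLinearDepth均为假设的辅助符号)。草图中的RangeCheck就是为了缓解上述的误遮挡问题:

  1. // 经典SSAO核心逻辑的简化示意(非UE实现).
  2. // 假设: KernelDir为预生成的单位随机向量数组; ProjectToUV/SampleLinearDepth为假设的辅助函数,
  3. // 分别把视空间坐标投影为屏幕UV, 以及读取该UV处深度缓冲的线性深度.
  4. float ComputeClassicSSAO(float3 ViewPos, float3 ViewNormal, const float3 KernelDir[16], float Radius)
  5. {
  6. float Occlusion = 0.0;
  7. [loop] for (uint i = 0; i < 16; ++i)
  8. {
  9. // 将随机方向翻转到法线半球内, 再得到球内采样点.
  10. float3 Dir = KernelDir[i];
  11. Dir = (dot(Dir, ViewNormal) < 0.0) ? -Dir : Dir;
  12. float3 SamplePos = ViewPos + Dir * Radius;
  13. // 投影到屏幕空间, 读取深度缓冲中该处表面的线性深度.
  14. float2 SampleUV = ProjectToUV(SamplePos);
  15. float SurfaceDepth = SampleLinearDepth(SampleUV);
  16. // 采样点在表面之后则视作被遮挡; RangeCheck避免深度差过大的远处表面产生误遮挡.
  17. if (SamplePos.z > SurfaceDepth)
  18. {
  19. float RangeCheck = saturate(Radius / abs(ViewPos.z - SurfaceDepth));
  20. Occlusion += RangeCheck;
  21. }
  22. }
  23. return 1.0 - Occlusion / 16.0; // 1表示完全可见, 0表示完全被遮挡.
  24. }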

如果拥有法线信息,则可以将SSAO升级成HBAO(Horizon Based Ambient Occlusion),它会在法线形成的半球内采样,在深度缓冲中近似光线追踪:

HBAO采用更加精准的计算公式:
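原文此处为公式截图,这里按HBAO论文(Bavoil等)的形式补全,供参考:

\[ A = \frac{1}{2\pi}\int_{\theta=-\pi}^{\pi}\big(\sin h(\theta)-\sin t(\theta)\big)\,W(\theta)\,\mathrm{d}\theta \]

其中\(h(\theta)\)为该方向在深度缓冲中搜索到的地平线角(horizon angle),\(t(\theta)\)为切平面对应的切线角(tangent angle),\(W(\theta)\)为随距离衰减的权重函数,最终的可见性即\(1-A\)。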

SSAO还可以采用半分辨率来加速计算,但也会在细小的高频物体(如草)产生闪烁的瑕疵。

除了SSAO和HBAO之外,还存在SSDO(Screen Space Directional Occlusion,屏幕空间方向遮挡)的技术,它和SSAO不同之处在于:SSAO会给每个像素生成多个采样点来累积遮挡系数,而SSDO会给每个像素生成多个方向来累积它们的辐射率:

SSDO在可见性测试也有所不同,过程更加复杂,结合下图加以描述。

对每个像素执行以下操作:

  • 在像素P点以半径\(r_{max}\)结合法线形成的半球内计算N个采样点(A-D),每个采样点执行以下操作:
    • 将采样点投影到屏幕空间。
    • 根据深度缓冲计算表面位置。
    • 如果采样点位于深度缓冲重建的表面之下,将视作被遮挡(如图中的A、B、D点)。
    • 如果采样点可见,则P点被来自这个方向(C)的光照照亮(采用模糊的环境图,滤波约等于\(2\pi/N\))。

对于非直接光照的计算,SSDO采用这样的方式:将每个采样点当成很小的区域光,朝向像素法线,然后给每个采样点计算到P点的形状因子(form factor),并累积贡献量,获得一次间接光反弹的近似结果:
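其中,P点从每个小区域光(采样点\(s\))接收的贡献可用经典的微面元形状因子来近似(示意性写法,并非逐字摘自原论文):

\[ F_{s \to p} \approx \frac{A_s \cos\theta_s \cos\theta_p}{\pi d^2} \]

其中\(A_s\)为采样点代表的面积,\(\theta_s\)、\(\theta_p\)分别为两点连线方向与两端法线的夹角,\(d\)为两点距离。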

相比SSAO,SSDO有方向性,且带有颜色,从而获得了类似Color Bleeding的GI效果:

除了上述的AO技术,还有HBAO+、HDAO(High Definition Ambient Occlusion)、Hybrid Ambient Occlusion、MSSAO(Multi-Resolution Screen-Space Ambient Occlusion)、VXAO(Voxel Accelerated Ambient Occlusion)、GTAO(Ground-Truth Ambient Occlusion)等AO技术。

阐述完SSAO及相关技术的理论,接下来直接进入UE的代码实现。UE的SSAO入口位于RenderBasePass和RenderLights之间:

  1. void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
  2. {
  3. (......)
  4. RenderBasePass(...);
  5. (......)
  6. // 准备光照之前的组合光照阶段,如延迟贴花、SSAO等。
  7. GCompositionLighting.Reset();
  8. if (FeatureLevel >= ERHIFeatureLevel::SM5)
  9. {
  10. (......)
  11. for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
  12. {
  13. const FViewInfo& View = Views[ViewIndex];
  14. // SSAO在此接口渲染.
  15. GCompositionLighting.ProcessAfterBasePass(GraphBuilder, Scene->UniformBuffers, View, SceneTextures);
  16. }
  17. (......)
  18. }
  19. (......)
  20. RenderLights(...);
  21. }

下面分析GCompositionLighting::ProcessAfterBasePass

  1. // Engine\Source\Runtime\Renderer\Private\CompositionLighting\CompositionLighting.cpp
  2. void FCompositionLighting::ProcessAfterBasePass(
  3. FRDGBuilder& GraphBuilder,
  4. FPersistentUniformBuffers& UniformBuffers,
  5. const FViewInfo& View,
  6. TRDGUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBuffer)
  7. {
  8. FSceneRenderTargets& SceneContext = FSceneRenderTargets::Get(GraphBuilder.RHICmdList);
  9. if (CanOverlayRayTracingOutput(View))
  10. {
  11. const FSceneViewFamily& ViewFamily = *View.Family;
  12. RDG_EVENT_SCOPE(GraphBuilder, "LightCompositionTasks_PreLighting");
  13. AddPass(GraphBuilder, [&UniformBuffers, &View](FRHICommandList&)
  14. {
  15. UniformBuffers.UpdateViewUniformBuffer(View);
  16. });
  17. (......)
  18. // Forward shading SSAO is applied before the base pass using only the depth buffer.
  19. // SSAO只需要深度缓冲而不需要GBuffer, 因此前向渲染路径下可以在BasePass之前完成.
  20. if (!IsForwardShadingEnabled(View.GetShaderPlatform()))
  21. {
  22. FScreenPassRenderTarget FinalTarget = FScreenPassRenderTarget(GraphBuilder.RegisterExternalTexture(SceneContext.ScreenSpaceAO, TEXT("AmbientOcclusionDirect")), View.ViewRect, ERenderTargetLoadAction::ENoAction);
  23. FScreenPassTexture AmbientOcclusion;
  24. // 根据SSAO级别和类型选择不同的渲染方式.
  25. const uint32 SSAOLevels = FSSAOHelper::ComputeAmbientOcclusionPassCount(View);
  26. if (SSAOLevels)
  27. {
  28. const EGTAOType GTAOType = FSSAOHelper::GetGTAOPassType(View, SSAOLevels);
  29. TUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBufferRHI = CreateSceneTextureUniformBuffer(GraphBuilder.RHICmdList, View.FeatureLevel);
  30. // 分离的GTAO.
  31. if (GTAOType == EGTAOType::EAsyncHorizonSearch || GTAOType == EGTAOType::EAsyncCombinedSpatial)
  32. {
  33. (......)
  34. AmbientOcclusion = AddPostProcessingGTAOPostAsync(GraphBuilder, View, Parameters, GTAOHorizons, FinalTarget);
  35. }
  36. else // 非分离的GTAO
  37. {
  38. if (GTAOType == EGTAOType::ENonAsync)
  39. {
  40. FGTAOCommonParameters Parameters = GetGTAOCommonParameters(GraphBuilder, View, SceneTexturesUniformBuffer, SceneTexturesUniformBufferRHI, GTAOType);
  41. AmbientOcclusion = AddPostProcessingGTAOAllPasses(GraphBuilder, View, Parameters, FinalTarget);
  42. }
  43. else // 默认情况下, UE执行此分支.
  44. {
  45. FSSAOCommonParameters Parameters = GetSSAOCommonParameters(GraphBuilder, View, SceneTexturesUniformBuffer, SceneTexturesUniformBufferRHI, SSAOLevels, true);
  46. AmbientOcclusion = AddPostProcessingAmbientOcclusion(GraphBuilder, View, Parameters, FinalTarget);
  47. }
  48. (......)
  49. }
  50. SceneContext.bScreenSpaceAOIsValid = true;
  51. }
  52. }
  53. }
  54. }

在默认情况下,UE会执行AddPostProcessingAmbientOcclusion,进入此接口分析:

  1. // @param Levels 0..3, how many different resolution levels we want to render
  2. static FScreenPassTexture AddPostProcessingAmbientOcclusion(
  3. FRDGBuilder& GraphBuilder,
  4. const FViewInfo& View,
  5. const FSSAOCommonParameters& CommonParameters,
  6. FScreenPassRenderTarget FinalTarget)
  7. {
  8. check(CommonParameters.Levels >= 0 && CommonParameters.Levels <= 3);
  9. FScreenPassTexture AmbientOcclusionInMip1;
  10. FScreenPassTexture AmbientOcclusionPassMip1;
  11. // 如果Level>=2, 则执行1~2次SetupPass, 1~2次StepPass, 1次FinalPass
  12. if (CommonParameters.Levels >= 2)
  13. {
  14. AmbientOcclusionInMip1 =
  15. AddAmbientOcclusionSetupPass(
  16. GraphBuilder,
  17. View,
  18. CommonParameters,
  19. CommonParameters.SceneDepth);
  20. FScreenPassTexture AmbientOcclusionPassMip2;
  21. if (CommonParameters.Levels >= 3)
  22. {
  23. FScreenPassTexture AmbientOcclusionInMip2 =
  24. AddAmbientOcclusionSetupPass(
  25. GraphBuilder,
  26. View,
  27. CommonParameters,
  28. AmbientOcclusionInMip1);
  29. AmbientOcclusionPassMip2 =
  30. AddAmbientOcclusionStepPass(
  31. GraphBuilder,
  32. View,
  33. CommonParameters,
  34. AmbientOcclusionInMip2,
  35. AmbientOcclusionInMip2,
  36. FScreenPassTexture(),
  37. CommonParameters.HZBInput);
  38. }
  39. AmbientOcclusionPassMip1 =
  40. AddAmbientOcclusionStepPass(
  41. GraphBuilder,
  42. View,
  43. CommonParameters,
  44. AmbientOcclusionInMip1,
  45. AmbientOcclusionInMip1,
  46. AmbientOcclusionPassMip2,
  47. CommonParameters.HZBInput);
  48. }
  49. FScreenPassTexture FinalOutput =
  50. AddAmbientOcclusionFinalPass(
  51. GraphBuilder,
  52. View,
  53. CommonParameters,
  54. CommonParameters.GBufferA,
  55. AmbientOcclusionInMip1,
  56. AmbientOcclusionPassMip1,
  57. CommonParameters.HZBInput,
  58. FinalTarget);
  59. return FinalOutput;
  60. }

上面会根据不同的Levels执行不同次数的Setup、Step、Final通道。以笔者的截帧为例,执行了一次Setup和两次PS(Step与Final):

Setup阶段主要是对法线执行下采样,以获得半分辨率的法线。下采样的PS代码如下:

  1. // Engine\Shaders\Private\PostProcessAmbientOcclusion.usf
  2. void MainSetupPS(in noperspective float4 UVAndScreenPos : TEXCOORD0, float4 SvPosition : SV_POSITION, out float4 OutColor0 : SV_Target0)
  3. {
  4. float2 ViewPortSize = AOViewport_ViewportSize;
  5. float2 InUV = UVAndScreenPos.xy;
  6. // 4个采样点.
  7. float2 UV[4];
  8. UV[0] = InUV + float2(-0.5f, -0.5f) * InputExtentInverse;
  9. UV[1] = min(InUV + float2( 0.5f, -0.5f) * InputExtentInverse, View.BufferBilinearUVMinMax.zw);
  10. UV[2] = min(InUV + float2(-0.5f, 0.5f) * InputExtentInverse, View.BufferBilinearUVMinMax.zw);
  11. UV[3] = min(InUV + float2( 0.5f, 0.5f) * InputExtentInverse, View.BufferBilinearUVMinMax.zw);
  12. float4 Samples[4];
  13. // 获取输入纹理的4个采样点数据.
  14. UNROLL for(uint i = 0; i < 4; ++i)
  15. {
  16. #if COMPUTE_SHADER || FORWARD_SHADING
  17. // Async compute and forward shading don't have access to the gbuffer.
  18. Samples[i].rgb = normalize(ReconstructNormalFromDepthBuffer(float4(UV[i] * ViewPortSize, SvPosition.zw))) * 0.5f + 0.5f;
  19. #else
  20. Samples[i].rgb = GetGBufferData(UV[i], true).WorldNormal * 0.5f + 0.5f;
  21. #endif
  22. Samples[i].a = CalcSceneDepth(UV[i]);
  23. }
  24. float MaxZ = max( max(Samples[0].a, Samples[1].a), max(Samples[2].a, Samples[3].a));
  25. // 平均颜色值, 此处采用了深度相似度来作为缩放权重.
  26. float4 AvgColor = 0.0f;
  27. if (USE_NORMALS)
  28. {
  29. AvgColor = 0.0001f;
  30. {
  31. UNROLL for(uint i = 0; i < 4; ++i)
  32. {
  33. AvgColor += float4(Samples[i].rgb, 1) * ComputeDepthSimilarity(Samples[i].a, MaxZ, ThresholdInverse);
  34. }
  35. AvgColor.rgb /= AvgColor.w;
  36. }
  37. }
  38. OutColor0 = float4(AvgColor.rgb, MaxZ / Constant_Float16F_Scale);
  39. }

上面在平均颜色值时使用了深度相似度ComputeDepthSimilarity作为颜色的缩放权重:

  1. // 0表示非常不相似, 1表示非常相似.
  2. float ComputeDepthSimilarity(float DepthA, float DepthB, float TweakScale)
  3. {
  4. return saturate(1 - abs(DepthA - DepthB) * TweakScale);
  5. }

在StepPass中执行半分辨率的AO计算,而在FinalPass中执行上采样的AO计算,两者使用相同的PS Shader代码(但参数和宏不同):

  1. void MainPS(in noperspective float4 UVAndScreenPos : TEXCOORD0, float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
  2. {
  3. MainPSandCS(UVAndScreenPos, SvPosition, OutColor);
  4. }
  5. // AO计算的主逻辑
  6. void MainPSandCS(in float4 UVAndScreenPos, float4 SvPosition, out float4 OutColor)
  7. {
  8. OutColor = 0;
  9. // 下列常量在C++层设置而来.
  10. float AmbientOcclusionPower = ScreenSpaceAOParams[0].x;
  11. float Ratio = ScreenSpaceAOParams[1].w;
  12. float AORadiusInShader = ScreenSpaceAOParams[1].z;
  13. float InvAmbientOcclusionDistance = ScreenSpaceAOParams[0].z;
  14. float AmbientOcclusionIntensity = ScreenSpaceAOParams[0].w;
  15. float2 ViewportUVToRandomUV = ScreenSpaceAOParams[1].xy;
  16. float AmbientOcclusionBias = ScreenSpaceAOParams[0].y;
  17. float ScaleFactor = ScreenSpaceAOParams[2].x;
  18. float ScaleRadiusInWorldSpace = ScreenSpaceAOParams[2].z;
  19. float2 UV = UVAndScreenPos.xy;
  20. float2 ScreenPos = UVAndScreenPos.zw;
  21. float InvTanHalfFov = ScreenSpaceAOParams[3].w;
  22. float3 FovFix = float3(InvTanHalfFov, Ratio * InvTanHalfFov, 1);
  23. float3 InvFovFix = 1.0f / FovFix;
  24. float SceneDepth = GetDepthFromAOInput(UV);
  25. float3 WorldNormal = GetWorldSpaceNormalFromAOInput(UV, SvPosition);
  26. // 如果不使用法线(!USE_NORMALS), ViewSpaceNormal可能是NaN.
  27. float3 ViewSpaceNormal = normalize(mul(WorldNormal, (float3x3)View.TranslatedWorldToView));
  28. float3 ViewSpacePosition = ReconstructCSPos(SceneDepth, ScreenPos);
  29. // 计算AO的半球体实际半径.
  30. float ActualAORadius = AORadiusInShader * lerp(SceneDepth, 1, ScaleRadiusInWorldSpace);
  31. // 修复之后增加偏移.
  32. if (USE_NORMALS)
  33. {
  34. ViewSpacePosition += AmbientOcclusionBias * SceneDepth * ScaleFactor * (ViewSpaceNormal * FovFix);
  35. }
  36. float2 WeightAccumulator = 0.0001f;
  37. // 根据采样质量选择不同的随机向量.
  38. #if AO_SAMPLE_QUALITY != 0
  39. // no SSAO in this pass, only upsampling
  40. #if AO_SAMPLE_QUALITY == 1
  41. // no 4x4 randomization
  42. float2 RandomVec = float2(0, 1) * ActualAORadius;
  43. {
  44. #elif AO_SAMPLE_QUALITY == 2
  45. // 从一个4x4重复的纹理中提取16个基向量(旋转和缩放)中的一个.
  46. float2 RandomVec = (Texture2DSample(RandomNormalTexture, RandomNormalTextureSampler, UV * ViewportUVToRandomUV).rg * 2 - 1) * ActualAORadius;
  47. {
  48. #else // AO_SAMPLE_QUALITY == 3
  49. // 从一个4x4重复的纹理中提取16个基向量(旋转和缩放)中的一个,如果启用TemporalAA,则随时间变化.
  50. // 通过多帧, 仅当TAA启用时, 每帧增加一点抖动可获得更高的质量, 但可能引发鬼影效果.
  51. const float2 TemporalOffset = ScreenSpaceAOParams[3].xy;
  52. // 调试模式.
  53. const bool bDebugLookups = DEBUG_LOOKUPS && ViewSpacePosition.x > 0;
  54. float2 RandomVec = (Texture2DSample(RandomNormalTexture, RandomNormalTextureSampler, TemporalOffset + UV * ViewportUVToRandomUV).rg * 2 - 1) * ActualAORadius;
  55. {
  56. #endif // AO_SAMPLE_QUALITY ==
  57. if(bDebugLookups && ViewSpacePosition.y > 0)
  58. {
  59. // top sample are not per pixel rotated
  60. RandomVec = float2(0, 1) * ActualAORadius;
  61. }
  62. float2 FovFixXY = FovFix.xy * (1.0f / ViewSpacePosition.z);
  63. float4 RandomBase = float4(RandomVec, -RandomVec.y, RandomVec.x) * float4(FovFixXY, FovFixXY);
  64. float2 ScreenSpacePos = ViewSpacePosition.xy / ViewSpacePosition.z;
  65. // .x意味着对于非常各向异性的视图用x来缩放.
  66. float InvHaloSize = 1.0f / (ActualAORadius * FovFixXY.x * 2);
  67. float3 ScaledViewSpaceNormal = ViewSpaceNormal;
  68. #if OPTIMIZATION_O1
  69. ScaledViewSpaceNormal *= 0.08f * lerp(SceneDepth, 1000, ScaleRadiusInWorldSpace);
  70. #endif
  71. UNROLL for(int i = 0; i < SAMPLESET_ARRAY_SIZE; ++i)
  72. {
  73. // -1..1
  74. float2 UnrotatedRandom = OcclusionSamplesOffsets[i].xy;
  75. float2 LocalRandom = (UnrotatedRandom.x * RandomBase.xy + UnrotatedRandom.y * RandomBase.zw);
  76. if (bDebugLookups)
  77. {
  78. (......)
  79. }
  80. else if (USE_NORMALS) // 有法线
  81. {
  82. float3 LocalAccumulator = 0;
  83. UNROLL for(uint step = 0; step < SAMPLE_STEPS; ++step)
  84. {
  85. // 运行时是常量.
  86. float Scale = (step + 1) / (float)SAMPLE_STEPS;
  87. // 运行时是常量(越高对纹理的缓存和性能越好, 越低对质量越好).
  88. float MipLevel = ComputeMipLevel(i, step);
  89. // 单步的采样点.
  90. float3 StepSample = WedgeWithNormal(ScreenSpacePos, Scale * LocalRandom, InvFovFix, ViewSpacePosition, ScaledViewSpaceNormal, InvHaloSize, MipLevel);
  91. // 组合水平方向的样本.
  92. LocalAccumulator = lerp(LocalAccumulator, float3(max(LocalAccumulator.xy, StepSample.xy), 1), StepSample.z);
  93. }
  94. // Square(): 用角度的二次曲线缩放面积, 获得更暗一点的效果.
  95. WeightAccumulator += float2(Square(1 - LocalAccumulator.x) * LocalAccumulator.z, LocalAccumulator.z);
  96. WeightAccumulator += float2(Square(1 - LocalAccumulator.y) * LocalAccumulator.z, LocalAccumulator.z);
  97. }
  98. else // 没有法线
  99. {
  100. (......)
  101. }
  102. }
  103. }
  104. #endif // #if AO_SAMPLE_QUALITY == 0
  105. OutColor.r = WeightAccumulator.x / WeightAccumulator.y;
  106. OutColor.gb = float2(0, 0);
  107. if(!bDebugLookups)
  108. {
  109. #if COMPUTE_SHADER || FORWARD_SHADING
  110. // In compute, Input1 and Input2 are not necessarily valid.
  111. float4 Filtered = 1;
  112. #else
  113. // 上采样.
  114. float4 Filtered = ComputeUpsampleContribution(SceneDepth, UV, WorldNormal);
  115. #endif
  116. // recombined result from multiple resolutions
  117. OutColor.r = lerp(OutColor.r, Filtered.r, ComputeLerpFactor());
  118. }
  119. #if !USE_AO_SETUP_AS_INPUT // FinalPass会执行此逻辑.
  120. if(!bDebugLookups)
  121. {
  122. // 全分辨率
  123. // 在距离上软过渡AO
  124. {
  125. float Mul = ScreenSpaceAOParams[4].x;
  126. float Add = ScreenSpaceAOParams[4].y;
  127. OutColor.r = lerp(OutColor.r, 1, saturate(SceneDepth * Mul + Add));
  128. }
  129. // 用户修改的AO
  130. OutColor.r = 1 - (1 - pow(abs(OutColor.r), AmbientOcclusionPower)) * AmbientOcclusionIntensity;
  131. // 只输出单通道
  132. OutColor = OutColor.r;
  133. }
  134. else
  135. {
  136. OutColor.r = pow(1 - OutColor.r, 16); // constant is tweaked with radius and sample count
  137. }
  138. #endif
  139. // SM4不支持ddx_fine()
  140. #if !COMPUTE_SHADER && QUAD_MESSAGE_PASSING_BLUR > 0 && FEATURE_LEVEL >= FEATURE_LEVEL_SM5
  141. {
  142. // .x: AO output, .y:SceneDepth .zw:view space normal
  143. float4 CenterPixel = float4(OutColor.r, SceneDepth, normalize(ViewSpaceNormal).xy);
  144. float4 dX = ddx_fine(CenterPixel);
  145. float4 dY = ddy_fine(CenterPixel);
  146. int2 Mod = (uint2)(SvPosition.xy) % 2;
  147. float4 PixA = CenterPixel;
  148. float4 PixB = CenterPixel - dX * (Mod.x * 2 - 1);
  149. float4 PixC = CenterPixel - dY * (Mod.y * 2 - 1);
  150. float WeightA = 1.0f;
  151. float WeightB = 1.0f;
  152. float WeightC = 1.0f;
  153. // 用法线计算权重.
  154. #if QUAD_MESSAGE_PASSING_NORMAL
  155. const float NormalTweak = 4.0f;
  156. float3 NormalA = ReconstructNormal(PixA.zw);
  157. float3 NormalB = ReconstructNormal(PixB.zw);
  158. float3 NormalC = ReconstructNormal(PixC.zw);
  159. WeightB *= saturate(pow(saturate(dot(NormalA, NormalB)), NormalTweak));
  160. WeightC *= saturate(pow(saturate(dot(NormalA, NormalC)), NormalTweak));
  161. #endif
  162. // 用深度计算权重.
  163. #if QUAD_MESSAGE_PASSING_DEPTH
  164. const float DepthTweak = 1;
  165. float InvDepth = 1.0f / PixA.y;
  166. WeightB *= 1 - saturate(abs(1 - PixB.y * InvDepth) * DepthTweak);
  167. WeightC *= 1 - saturate(abs(1 - PixC.y * InvDepth) * DepthTweak);
  168. #endif
  169. // + 1.0f to avoid div by 0
  170. float InvWeightABC = 1.0f / (WeightA + WeightB + WeightC);
  171. // 缩放权重.
  172. WeightA *= InvWeightABC;
  173. WeightB *= InvWeightABC;
  174. WeightC *= InvWeightABC;
  175. // 用权重计算最终的输出颜色.
  176. OutColor = WeightA * PixA.x + WeightB * PixB.x + WeightC * PixC.x;
  177. }
  178. #endif
  179. }

上述提供了法线和深度作为权重的算法,它们的计算公式如下(摘自MSSAO):
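(原文此处为公式截图;依据上面QUAD_MESSAGE_PASSING_NORMAL/DEPTH分支的代码,两个权重可整理为:)

\[ W_{\text{normal}} = \big(\mathrm{saturate}(\mathbf{n}_a\cdot\mathbf{n}_b)\big)^{4}, \qquad W_{\text{depth}} = 1 - \mathrm{saturate}\!\left(\left|1 - \frac{z_b}{z_a}\right|\right) \]

即法线越接近、深度越接近的相邻像素,对中心像素AO的贡献权重越大。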

UE在PC端的SSAO使用了法线,所以其在计算采样点时考虑了法线,这部分由WedgeWithNormal函数完成:

  1. float3 WedgeWithNormal(float2 ScreenSpacePosCenter, float2 InLocalRandom, float3 InvFovFix, float3 ViewSpacePosition, float3 ScaledViewSpaceNormal, float InvHaloSize, float MipLevel)
  2. {
  3. float2 ScreenSpacePosL = ScreenSpacePosCenter + InLocalRandom;
  4. float2 ScreenSpacePosR = ScreenSpacePosCenter - InLocalRandom;
  5. float AbsL = GetHZBDepth(ScreenSpacePosL, MipLevel);
  6. float AbsR = GetHZBDepth(ScreenSpacePosR, MipLevel);
  7. float3 SamplePositionL = ReconstructCSPos(AbsL, ScreenSpacePosL);
  8. float3 SamplePositionR = ReconstructCSPos(AbsR, ScreenSpacePosR);
  9. float3 DeltaL = (SamplePositionL - ViewSpacePosition) * InvFovFix;
  10. float3 DeltaR = (SamplePositionR - ViewSpacePosition) * InvFovFix;
  11. #if OPTIMIZATION_O1
  12. float InvNormAngleL = saturate(dot(DeltaL, ScaledViewSpaceNormal) / dot(DeltaL, DeltaL));
  13. float InvNormAngleR = saturate(dot(DeltaR, ScaledViewSpaceNormal) / dot(DeltaR, DeltaR));
  14. float Weight = 1;
  15. #else
  16. float InvNormAngleL = saturate(dot(DeltaL, ScaledViewSpaceNormal) * rsqrt(dot(DeltaL, DeltaL)));
  17. float InvNormAngleR = saturate(dot(DeltaR, ScaledViewSpaceNormal) * rsqrt(dot(DeltaR, DeltaR)));
  18. float Weight =
  19. saturate(1.0f - length(DeltaL) * InvHaloSize)
  20. * saturate(1.0f - length(DeltaR) * InvHaloSize);
  21. #endif
  22. return float3(InvNormAngleL, InvNormAngleR, Weight);
  23. }

利用法线构造了一个楔形(下图),在此楔形内生成采样数据和权重。主要过程是在法线周围生成屏幕空间的左右偏移量,结合HZB深度值重建出左右两个采样位置,最后输出左右角度的倒数和权重。
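对应代码的非OPTIMIZATION_O1分支,左右两个采样的输出可写为:

\[ \text{InvNormAngle}_{L/R} = \mathrm{saturate}\!\left(\frac{\Delta_{L/R}\cdot \tilde{\mathbf{n}}}{\|\Delta_{L/R}\|}\right), \qquad W = \mathrm{saturate}\!\left(1-\frac{\|\Delta_L\|}{h}\right)\mathrm{saturate}\!\left(1-\frac{\|\Delta_R\|}{h}\right) \]

其中\(\Delta_{L/R}\)为中心点到左右重建采样位置的视空间偏移,\(\tilde{\mathbf{n}}\)为缩放后的视空间法线,\(h = 1/\text{InvHaloSize}\)为光环半径,用于随距离衰减权重。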

从上面的代码分析可知,UE的SSAO在渲染流程、下采样、权重计算、AO混合等方面与MSSAO高度相似:

经过上述的步骤渲染之后,会得到如下所示的全分辨率单通道ScreenSpaceAO纹理数据:

那么ScreenSpaceAO又是在哪里、如何被应用到光照中的呢?下面继续追踪和分析。

利用RenderDoc截帧分析可以发现有几个渲染阶段都会用到ScreenSpaceAO纹理:

上图显示组合非直接光和AO、渲染延迟标准光源、反射环境和天空光的渲染阶段都会引用到ScreenSpaceAO。

其中组合非直接光和AO涉及到SSAO的代码逻辑如下:

  1. // Engine\Shaders\Private\DiffuseIndirectComposite.usf
  2. Texture2D AmbientOcclusionTexture;
  3. SamplerState AmbientOcclusionSampler;
  4. void MainPS(float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
  5. {
  6. (......)
  7. // 采样SSAO数据.
  8. float DynamicAmbientOcclusion = 1.0f;
  9. #if DIM_APPLY_AMBIENT_OCCLUSION
  10. DynamicAmbientOcclusion = AmbientOcclusionTexture.SampleLevel(AmbientOcclusionSampler, BufferUV, 0).r;
  11. #endif
  12. // 计算最终AO: 材质AO * SSAO.
  13. float FinalAmbientOcclusion = GBuffer.GBufferAO * DynamicAmbientOcclusion;
  14. (......)
  15. {
  16. float AOMask = (GBuffer.ShadingModelID != SHADINGMODELID_UNLIT);
  17. // 利用AOMask和AmbientOcclusionStaticFraction作为权重插值最终AO, 然后将AO值保存到Alpha通道, 而不是直接应用到rgb颜色中.
  18. OutColor.a = lerp(1.0f, FinalAmbientOcclusion, AOMask * AmbientOcclusionStaticFraction);
  19. }
  20. }
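也就是说,最终写入Alpha通道的AO为:

\[ AO_{\text{final}} = \mathrm{lerp}\big(1,\ AO_{\text{material}} \cdot AO_{\text{SSAO}},\ m \cdot f\big) \]

其中\(m\)为非Unlit的掩码(AOMask),\(f\)为AmbientOcclusionStaticFraction。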

渲染延迟标准光源涉及的SSAO计算代码如下:

  1. // Engine\Shaders\Private\DeferredShadingCommon.ush
  2. FScreenSpaceData GetScreenSpaceData(float2 UV, bool bGetNormalizedNormal = true)
  3. {
  4. FScreenSpaceData Out;
  5. Out.GBuffer = GetGBufferData(UV, bGetNormalizedNormal);
  6. // 采样SSAO
  7. float4 ScreenSpaceAO = Texture2DSampleLevel(SceneTexturesStruct.ScreenSpaceAOTexture, SceneTexturesStruct_ScreenSpaceAOTextureSampler, UV, 0);
  8. // 直接将SSAO作为最终AO.
  9. Out.AmbientOcclusion = ScreenSpaceAO.r;
  10. return Out;
  11. }
  12. FDeferredLightingSplit GetDynamicLightingSplit(..., float AmbientOcclusion, ...)
  13. {
  14. (......)
  15. FShadowTerms Shadow;
  16. // 直接将SSAO作为阴影的初始值, 后面还会叠加阴影的系数.
  17. Shadow.SurfaceShadow = AmbientOcclusion;
  18. (......)
  19. LightAccumulator_AddSplit( LightAccumulator, Lighting.Diffuse, Lighting.Specular, Lighting.Diffuse,
  20. // 注意, 此处利用Shadow.SurfaceShadow缩放了光源颜色.
  21. LightColor * LightMask * Shadow.SurfaceShadow,
  22. bNeedsSeparateSubsurfaceLightAccumulation );
  23. (......)
  24. }

延迟光源渲染阶段用叠加了AO和阴影的系数缩放光源颜色,从而同时影响漫反射和高光。

反射环境和天空光也类似,此处就不再冗余分析了。本节的最后放出UE的SSAO开启和关闭的效果对比图(上无下有):

此外,UE也支持GTAO的版本,本文就不解析了。

SSGI(Screen Space Global Illumination)译为屏幕空间的全局光照,是基于屏幕空间的GBuffer数据进行光线追踪的GI技术。

默认情况下,UE是关闭SSGI的,需要在工程配置里显式开启:

开启SSGI后,可以增加角落、凹槽等表面的间接光照,减少它们的漏光,提升画面可信度。

上:关闭SSGI;下:开启SSGI,桌子和椅子下方的漏光明显减少。

UE的SSGI是在DiffuseIndirectAndAO内完成的,意味着它也是位于BasePass和Lighting之间。SSGI主要分为以下几个阶段:

  • SSGI渲染阶段:
    • 第一次下采样上一帧数据(4个Mip层级)。
    • 第二次下采样上一帧数据(1个Mip层级,最低级)。
    • 计算屏幕空间非直接漫反射。
  • 屏幕空间降噪:
    • 压缩元数据。
    • 重建数据。
    • 时间累积降噪。
  • 组合SSGI和SSAO等非直接光。

下面就按照上述步骤阐述SSGI的具体实现过程。首先看SSGI渲染阶段:

  1. // Engine\Source\Runtime\Renderer\Private\ScreenSpaceRayTracing.cpp
  2. void RenderScreenSpaceDiffuseIndirect(
  3. FRDGBuilder& GraphBuilder,
  4. const FSceneTextureParameters& SceneTextures,
  5. const FRDGTextureRef CurrentSceneColor,
  6. const FViewInfo& View,
  7. IScreenSpaceDenoiser::FAmbientOcclusionRayTracingConfig* OutRayTracingConfig,
  8. IScreenSpaceDenoiser::FDiffuseIndirectInputs* OutDenoiserInputs)
  9. {
  10. // 初始化质量, 标记, 尺寸等数据.
  11. const int32 Quality = FMath::Clamp( CVarSSGIQuality.GetValueOnRenderThread(), 1, 4 );
  12. bool bHalfResolution = IsSSGIHalfRes();
  13. FIntPoint GroupSize;
  14. int32 RayCountPerPixel;
  15. GetSSRTGIShaderOptionsForQuality(Quality, &GroupSize, &RayCountPerPixel);
  16. FIntRect Viewport = View.ViewRect;
  17. if (bHalfResolution)
  18. {
  19. Viewport = FIntRect::DivideAndRoundUp(Viewport, 2);
  20. }
  21. RDG_EVENT_SCOPE(GraphBuilder, "SSGI %dx%d", Viewport.Width(), Viewport.Height());
  22. const FVector2D ViewportUVToHZBBufferUV(
  23. float(View.ViewRect.Width()) / float(2 * View.HZBMipmap0Size.X),
  24. float(View.ViewRect.Height()) / float(2 * View.HZBMipmap0Size.Y)
  25. );
  26. FRDGTexture* FurthestHZBTexture = GraphBuilder.RegisterExternalTexture(View.HZB);
  27. FRDGTexture* ClosestHZBTexture = GraphBuilder.RegisterExternalTexture(View.ClosestHZB);
  28. // 重投影和下采样上一帧的颜色.
  29. FRDGTexture* ReducedSceneColor;
  30. FRDGTexture* ReducedSceneAlpha = nullptr;
  31. {
  32. // 忽略最前面mip的数量.
  33. const int32 DownSamplingMip = 1;
  34. // mip数量.
  35. const int32 kNumMips = 5;
  36. bool bUseLeakFree = View.PrevViewInfo.ScreenSpaceRayTracingInput != nullptr;
  37. // 分配ReducedSceneColor.
  38. {
  39. FIntPoint RequiredSize = SceneTextures.SceneDepthTexture->Desc.Extent / (1 << DownSamplingMip);
  40. int32 QuantizeMultiple = 1 << (kNumMips - 1);
  41. FIntPoint QuantizedSize = FIntPoint::DivideAndRoundUp(RequiredSize, QuantizeMultiple);
  42. FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
  43. FIntPoint(QuantizeMultiple * QuantizedSize.X, QuantizeMultiple * QuantizedSize.Y),
  44. PF_FloatR11G11B10,
  45. FClearValueBinding::None,
  46. TexCreate_ShaderResource | TexCreate_UAV);
  47. Desc.NumMips = kNumMips;
  48. ReducedSceneColor = GraphBuilder.CreateTexture(Desc, TEXT("SSRTReducedSceneColor"));
  49. if (bUseLeakFree)
  50. {
  51. Desc.Format = PF_A8;
  52. ReducedSceneAlpha = GraphBuilder.CreateTexture(Desc, TEXT("SSRTReducedSceneAlpha"));
  53. }
  54. }
  55. // 处理第一次下采样Pass.(有4个mip纹理)
  56. // FSSRTPrevFrameReductionCS参数处理.
  57. FSSRTPrevFrameReductionCS::FParameters DefaultPassParameters;
  58. {
  59. DefaultPassParameters.SceneTextures = SceneTextures;
  60. DefaultPassParameters.View = View.ViewUniformBuffer;
  61. DefaultPassParameters.ReducedSceneColorSize = FVector2D(
  62. ReducedSceneColor->Desc.Extent.X, ReducedSceneColor->Desc.Extent.Y);
  63. DefaultPassParameters.ReducedSceneColorTexelSize = FVector2D(
  64. 1.0f / float(ReducedSceneColor->Desc.Extent.X), 1.0f / float(ReducedSceneColor->Desc.Extent.Y));
  65. }
  66. {
  67. FSSRTPrevFrameReductionCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FSSRTPrevFrameReductionCS::FParameters>();
  68. *PassParameters = DefaultPassParameters;
  69. FIntPoint ViewportOffset;
  70. FIntPoint ViewportExtent;
  71. FIntPoint BufferSize;
  72. // 防漏光(LeakFree)处理的分支.
  73. if (bUseLeakFree)
  74. {
  75. BufferSize = View.PrevViewInfo.ScreenSpaceRayTracingInput->GetDesc().Extent;
  76. ViewportOffset = View.ViewRect.Min; // TODO
  77. ViewportExtent = View.ViewRect.Size();
  78. PassParameters->PrevSceneColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.ScreenSpaceRayTracingInput);
  79. PassParameters->PrevSceneColorSampler = TStaticSamplerState<SF_Point>::GetRHI();
  80. PassParameters->PrevSceneDepth = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.DepthBuffer);
  81. PassParameters->PrevSceneDepthSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
  82. }
  83. else
  84. {
  85. BufferSize = View.PrevViewInfo.TemporalAAHistory.ReferenceBufferSize;
  86. ViewportOffset = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Min;
  87. ViewportExtent = View.PrevViewInfo.TemporalAAHistory.ViewportRect.Size();
  88. PassParameters->PrevSceneColor = GraphBuilder.RegisterExternalTexture(View.PrevViewInfo.TemporalAAHistory.RT[0]);
  89. PassParameters->PrevSceneColorSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
  90. }
  91. PassParameters->PrevSceneColorPreExposureCorrection = View.PreExposure / View.PrevViewInfo.SceneColorPreExposure;
  92. PassParameters->PrevScreenPositionScaleBias = FVector4(
  93. ViewportExtent.X * 0.5f / BufferSize.X,
  94. -ViewportExtent.Y * 0.5f / BufferSize.Y,
  95. (ViewportExtent.X * 0.5f + ViewportOffset.X) / BufferSize.X,
  96. (ViewportExtent.Y * 0.5f + ViewportOffset.Y) / BufferSize.Y);
  97. // 给每个mip创建输出的UAV.
  98. for (int32 MipLevel = 0; MipLevel < (PassParameters->ReducedSceneColorOutput.Num() - DownSamplingMip); MipLevel++)
  99. {
  100. PassParameters->ReducedSceneColorOutput[DownSamplingMip + MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneColor, MipLevel));
  101. if (ReducedSceneAlpha)
  102. PassParameters->ReducedSceneAlphaOutput[DownSamplingMip + MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneAlpha, MipLevel));
  103. }
  104. FSSRTPrevFrameReductionCS::FPermutationDomain PermutationVector;
  105. PermutationVector.Set<FSSRTPrevFrameReductionCS::FLowerMips>(false);
  106. PermutationVector.Set<FSSRTPrevFrameReductionCS::FLeakFree>(bUseLeakFree);
  107. // 增加CS Pass以下采样颜色和alpha.
  108. TShaderMapRef<FSSRTPrevFrameReductionCS> ComputeShader(View.ShaderMap, PermutationVector);
  109. FComputeShaderUtils::AddPass(
  110. GraphBuilder,
  111. RDG_EVENT_NAME("PrevFrameReduction(LeakFree=%i) %dx%d",
  112. bUseLeakFree ? 1 : 0,
  113. View.ViewRect.Width(), View.ViewRect.Height()),
  114. ComputeShader,
  115. PassParameters,
  116. FComputeShaderUtils::GetGroupCount(View.ViewRect.Size(), 8));
  117. }
  118. // 处理第二次下采样Pass.(只有1个mip纹理)
  119. for (int32 i = 0; i < 1; i++)
  120. {
  121. int32 SrcMip = i * 3 + 2 - DownSamplingMip;
  122. int32 StartDestMip = SrcMip + 1;
  123. int32 Divisor = 1 << (StartDestMip + DownSamplingMip);
  124. FSSRTPrevFrameReductionCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FSSRTPrevFrameReductionCS::FParameters>();
  125. *PassParameters = DefaultPassParameters;
  126. PassParameters->HigherMipTexture = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateForMipLevel(ReducedSceneColor, SrcMip));
  127. if (bUseLeakFree)
  128. {
  129. check(ReducedSceneAlpha);
  130. PassParameters->HigherMipTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
  131. PassParameters->HigherAlphaMipTexture = GraphBuilder.CreateSRV(FRDGTextureSRVDesc::CreateForMipLevel(ReducedSceneAlpha, SrcMip));
  132. PassParameters->HigherAlphaMipTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
  133. }
  134. else
  135. {
  136. PassParameters->HigherMipTextureSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
  137. }
  138. PassParameters->HigherMipDownScaleFactor = 1 << (DownSamplingMip + SrcMip);
  139. PassParameters->HigherMipBufferBilinearMax = FVector2D(
  140. (0.5f * View.ViewRect.Width() - 0.5f) / float(ReducedSceneColor->Desc.Extent.X),
  141. (0.5f * View.ViewRect.Height() - 0.5f) / float(ReducedSceneColor->Desc.Extent.Y));
  142. PassParameters->ViewportUVToHZBBufferUV = ViewportUVToHZBBufferUV;
  143. PassParameters->FurthestHZBTexture = FurthestHZBTexture;
  144. PassParameters->FurthestHZBTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
  145. for (int32 MipLevel = 0; MipLevel < PassParameters->ReducedSceneColorOutput.Num(); MipLevel++)
  146. {
  147. PassParameters->ReducedSceneColorOutput[MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneColor, StartDestMip + MipLevel));
  148. if (ReducedSceneAlpha)
  149. PassParameters->ReducedSceneAlphaOutput[MipLevel] = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(ReducedSceneAlpha, StartDestMip + MipLevel));
  150. }
  151. FSSRTPrevFrameReductionCS::FPermutationDomain PermutationVector;
  152. PermutationVector.Set<FSSRTPrevFrameReductionCS::FLowerMips>(true);
  153. PermutationVector.Set<FSSRTPrevFrameReductionCS::FLeakFree>(bUseLeakFree);
  154. // 第二次下采样Pass
  155. TShaderMapRef<FSSRTPrevFrameReductionCS> ComputeShader(View.ShaderMap, PermutationVector);
  156. FComputeShaderUtils::AddPass(
  157. GraphBuilder,
  158. RDG_EVENT_NAME("PrevFrameReduction(LeakFree=%i) %dx%d",
  159. bUseLeakFree ? 1 : 0,
  160. View.ViewRect.Width() / Divisor, View.ViewRect.Height() / Divisor),
  161. ComputeShader,
  162. PassParameters,
  163. FComputeShaderUtils::GetGroupCount(View.ViewRect.Size(), 8 * Divisor));
  164. }
  165. }
  166. {
  167. // 分配输出.
  168. {
  169. FRDGTextureDesc Desc = FRDGTextureDesc::Create2D(
  170. SceneTextures.SceneDepthTexture->Desc.Extent / (bHalfResolution ? 2 : 1),
  171. PF_FloatRGBA,
  172. FClearValueBinding::Transparent,
  173. TexCreate_ShaderResource | TexCreate_UAV);
  174. OutDenoiserInputs->Color = GraphBuilder.CreateTexture(Desc, TEXT("SSRTDiffuseIndirect"));
  175. Desc.Format = PF_R16F;
  176. Desc.Flags |= TexCreate_RenderTargetable;
  177. OutDenoiserInputs->AmbientOcclusionMask = GraphBuilder.CreateTexture(Desc, TEXT("SSRTAmbientOcclusion"));
  178. }
  179. // 处理FScreenSpaceDiffuseIndirectCS参数.
  180. FScreenSpaceDiffuseIndirectCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FScreenSpaceDiffuseIndirectCS::FParameters>();
  181. if (bHalfResolution)
  182. {
  183. PassParameters->PixelPositionToFullResPixel = 2.0f;
  184. PassParameters->FullResPixelOffset = FVector2D(0.5f, 0.5f); // TODO.
  185. }
  186. else
  187. {
  188. PassParameters->PixelPositionToFullResPixel = 1.0f;
  189. PassParameters->FullResPixelOffset = FVector2D(0.5f, 0.5f);
  190. }
  191. {
  192. PassParameters->ColorBufferScaleBias = FVector4(
  193. 0.5f * SceneTextures.SceneDepthTexture->Desc.Extent.X / float(ReducedSceneColor->Desc.Extent.X),
  194. 0.5f * SceneTextures.SceneDepthTexture->Desc.Extent.Y / float(ReducedSceneColor->Desc.Extent.Y),
  195. -0.5f * View.ViewRect.Min.X / float(ReducedSceneColor->Desc.Extent.X),
  196. -0.5f * View.ViewRect.Min.Y / float(ReducedSceneColor->Desc.Extent.Y));
  197. PassParameters->ReducedColorUVMax = FVector2D(
  198. (0.5f * View.ViewRect.Width() - 0.5f) / float(ReducedSceneColor->Desc.Extent.X),
  199. (0.5f * View.ViewRect.Height() - 0.5f) / float(ReducedSceneColor->Desc.Extent.Y));
  200. }
  201. PassParameters->FurthestHZBTexture = FurthestHZBTexture;
  202. PassParameters->FurthestHZBTextureSampler = TStaticSamplerState<SF_Point>::GetRHI();
  203. PassParameters->ColorTexture = ReducedSceneColor;
  204. PassParameters->ColorTextureSampler = TStaticSamplerState<SF_Bilinear>::GetRHI();
  205. PassParameters->HZBUvFactorAndInvFactor = FVector4(
  206. ViewportUVToHZBBufferUV.X,
  207. ViewportUVToHZBBufferUV.Y,
  208. 1.0f / ViewportUVToHZBBufferUV.X,
  209. 1.0f / ViewportUVToHZBBufferUV.Y );
  210. PassParameters->SceneTextures = SceneTextures;
  211. PassParameters->View = View.ViewUniformBuffer;
  212. PassParameters->IndirectDiffuseOutput = GraphBuilder.CreateUAV(OutDenoiserInputs->Color);
  213. PassParameters->AmbientOcclusionOutput = GraphBuilder.CreateUAV(OutDenoiserInputs->AmbientOcclusionMask);
  214. PassParameters->DebugOutput = CreateScreenSpaceRayTracingDebugUAV(GraphBuilder, OutDenoiserInputs->Color->Desc, TEXT("DebugSSGI"));
  215. PassParameters->ScreenSpaceRayTracingDebugOutput = CreateScreenSpaceRayTracingDebugUAV(GraphBuilder, OutDenoiserInputs->Color->Desc, TEXT("DebugSSGIMarshing"), true);
  216. FScreenSpaceDiffuseIndirectCS::FPermutationDomain PermutationVector;
  217. PermutationVector.Set<FScreenSpaceDiffuseIndirectCS::FQualityDim>(Quality);
  218. // 增加SSGI计算Pass.
  219. TShaderMapRef<FScreenSpaceDiffuseIndirectCS> ComputeShader(View.ShaderMap, PermutationVector);
  220. FComputeShaderUtils::AddPass(
  221. GraphBuilder,
  222. RDG_EVENT_NAME("ScreenSpaceDiffuseIndirect(Quality=%d RayPerPixel=%d) %dx%d",
  223. Quality, RayCountPerPixel, Viewport.Width(), Viewport.Height()),
  224. ComputeShader,
  225. PassParameters,
  226. FComputeShaderUtils::GetGroupCount(Viewport.Size(), GroupSize));
  227. }
  228. OutRayTracingConfig->ResolutionFraction = bHalfResolution ? 0.5f : 1.0f;
  229. OutRayTracingConfig->RayCountPerPixel = RayCountPerPixel;
  230. } // RenderScreenSpaceDiffuseIndirect()

下面分析下采样Pass使用的CS Shader:

  1. // Engine\Shaders\Private\SSRT\SSRTPrevFrameReduction.usf
  2. [numthreads(GROUP_TILE_SIZE, GROUP_TILE_SIZE, 1)]
  3. void MainCS(
  4. uint2 DispatchThreadId : SV_DispatchThreadID,
  5. uint2 GroupId : SV_GroupID,
  6. uint2 GroupThreadId : SV_GroupThreadID,
  7. uint GroupThreadIndex : SV_GroupIndex)
  8. {
  9. #if DIM_LOWER_MIPS
  10. float2 SceneBufferUV = HigherMipDownScaleFactor * (2.0 * float2(DispatchThreadId) + 1.0) * View.BufferSizeAndInvSize.zw;
  11. #else
  12. float2 SceneBufferUV = (float2(DispatchThreadId) + 0.5) * View.BufferSizeAndInvSize.zw;
  13. #endif
  14. SceneBufferUV = clamp(SceneBufferUV, View.BufferBilinearUVMinMax.xy, View.BufferBilinearUVMinMax.zw);
  15. float2 ViewportUV = BufferUVToViewportUV(SceneBufferUV);
  16. float2 ScreenPosition = ViewportUVToScreenPos(ViewportUV);
  17. float4 PrevColor;
  18. float WorldDepth;
  19. #if DIM_LOWER_MIPS // 第二次下采样Pass进入此分支.
  20. #if DIM_LEAK_FREE
  21. {
  22. {
  23. float HZBDeviceZ = FurthestHZBTexture.SampleLevel(FurthestHZBTextureSampler, ViewportUV * ViewportUVToHZBBufferUV, 2.0).r;
  24. WorldDepth = ConvertFromDeviceZ(HZBDeviceZ);
  25. }
  26. float WorldDepthToPixelWorldRadius = GetTanHalfFieldOfView().x * View.ViewSizeAndInvSize.z * 100;
  27. float WorldBluringRadius = WorldDepthToPixelWorldRadius * WorldDepth;
  28. float InvSquareWorldBluringRadius = rcp(WorldBluringRadius * WorldBluringRadius);
  29. PrevColor = 0.0;
  30. // 根据深度缓存还原当前像素及相邻4个像素的世界坐标,使用世界坐标之间的距离平方衰减作为权重缩小Color的值.
  31. UNROLL_N(4)
  32. for (uint i = 0; i < 4; i++)
  33. {
  34. const float2 TexelOffset = float2(i % 2, i / 2) - 0.5;
  35. // 采样UV
  36. float2 HZBBufferUV = (ViewportUV + TexelOffset * HigherMipDownScaleFactor * View.ViewSizeAndInvSize.zw) * ViewportUVToHZBBufferUV;
  37. // 采样的深度
  38. float SampleDeviceZ = FurthestHZBTexture.SampleLevel(FurthestHZBTextureSampler, HZBBufferUV, 1.0).r;
  39. // 当前像素和采样深度的距离.
  40. float SampleDist = WorldDepth - ConvertFromDeviceZ(SampleDeviceZ);
  41. // 采样权重.
  42. float SampleWeight = 0.25 * saturate(1 - SampleDist * SampleDist * InvSquareWorldBluringRadius);
  43. float2 SampleUV = HigherMipDownScaleFactor * (2.0 * float2(DispatchThreadId) + 1.0 + TexelOffset) * 0.5 * ReducedSceneColorTexelSize;
  44. SampleUV = min(SampleUV, HigherMipBufferBilinearMax);
  45. float4 SampleColor = float4(
  46. HigherMipTexture.SampleLevel(HigherMipTextureSampler, SampleUV, 0).rgb,
  47. Texture2DSample_A8(HigherAlphaMipTexture,HigherAlphaMipTextureSampler, SampleUV));
  48. // 累加应用权重后的颜色.
  49. PrevColor += SampleColor * SampleWeight;
  50. }
  51. }
  52. #else
  53. {
  54. float2 HigherMipUV = HigherMipDownScaleFactor * (float2(DispatchThreadId) * 1.0 + 0.5) * ReducedSceneColorTexelSize;
  55. PrevColor = float4(HigherMipTexture.SampleLevel(HigherMipTextureSampler, HigherMipUV, 0).rgb, 1);
  56. }
  57. #endif
  58. #else // 第一次下采样Pass进入此分支.
  59. {
  60. float DeviceZ = SampleDeviceZFromSceneTextures(SceneBufferUV);
  61. WorldDepth = ConvertFromDeviceZ(DeviceZ);
62. // The current pixel's camera motion vector in screen space.
  63. float4 ThisClip = float4(ScreenPosition, DeviceZ, 1);
  64. float4 PrevClip = mul(ThisClip, View.ClipToPrevClip);
  65. float2 PrevScreen = PrevClip.xy / PrevClip.w;
  66. bool bIsSky = WorldDepth > 100 * 1000;
67. // Fetch the velocity.
  68. float4 EncodedVelocity = GBufferVelocityTexture.SampleLevel(GBufferVelocityTextureSampler, SceneBufferUV, 0);
  69. if (EncodedVelocity.x > 0.0)
  70. {
  71. PrevScreen = ThisClip.xy - DecodeVelocityFromTexture(EncodedVelocity).xy;
  72. }
  73. float2 PrevFrameUV = PrevScreen.xy * PrevScreenPositionScaleBias.xy + PrevScreenPositionScaleBias.zw;
74. // Fetch the color (similar to the above)
  75. #if DIM_LEAK_FREE
  76. {
  77. float3 RefWorldPosition = ComputeTranslatedWorldPosition(ScreenPosition, WorldDepth, /* bIsPrevFrame = */ false);
  78. float NoV = dot(View.TranslatedWorldCameraOrigin - normalize(RefWorldPosition), GetGBufferDataFromSceneTextures(SceneBufferUV).WorldNormal);
  79. float WorldDepthToPixelWorldRadius = GetTanHalfFieldOfView().x * View.ViewSizeAndInvSize.z * 100;
  80. float WorldBluringRadius = WorldDepthToPixelWorldRadius * WorldDepth;
  81. float InvSquareWorldBluringRadius = rcp(WorldBluringRadius * WorldBluringRadius);
  82. {
  83. float2 SampleUV = PrevFrameUV;
  84. SampleUV = clamp(SampleUV, View.BufferBilinearUVMinMax.xy, View.BufferBilinearUVMinMax.zw);
  85. float PrevDeviceZ = PrevSceneDepth.SampleLevel(PrevSceneDepthSampler, SampleUV, 0).r;
  86. float3 SampleWorldPosition = ComputeTranslatedWorldPosition(PrevScreen.xy, ConvertFromDeviceZ(PrevDeviceZ), /* bIsPrevFrame = */ true);
  87. float SampleDistSquare = length2(RefWorldPosition - SampleWorldPosition);
  88. float SampleWeight = saturate(1 - SampleDistSquare * InvSquareWorldBluringRadius);
  89. PrevColor = float4(PrevSceneColor.SampleLevel(PrevSceneColorSampler, SampleUV, 0).rgb * SampleWeight, SampleWeight);
  90. }
  91. }
  92. #else
  93. {
  94. PrevColor = float4(PrevSceneColor.SampleLevel(PrevSceneColorSampler, PrevFrameUV, 0).rgb, 1.0);
  95. }
  96. #endif
  97. PrevColor = -min(-PrevColor, 0.0);
  98. #if CONFIG_COLOR_TILE_CLASSIFICATION
  99. {
  100. if (bIsSky)
  101. PrevColor = 0;
  102. }
  103. #endif
104. // Correct for pre-exposure.
  105. #if USE_PREEXPOSURE
  106. PrevColor.rgb *= PrevSceneColorPreExposureCorrection;
  107. #endif
108. // Apply vignetting.
  109. {
  110. float Vignette = min(ComputeHitVignetteFromScreenPos(ScreenPosition), ComputeHitVignetteFromScreenPos(PrevScreen));
  111. PrevColor *= Vignette;
  112. }
  113. (......)
  114. }
  115. #endif
116. // Output mip 0
  117. #if DIM_LOWER_MIPS
  118. {
  119. ReducedSceneColorOutput_0[DispatchThreadId] = float4(PrevColor.rgb, 0);
  120. #if DIM_LEAK_FREE
  121. ReducedSceneAlphaOutput_0[DispatchThreadId] = PrevColor.a;
  122. #endif
  123. }
  124. #endif
125. // Downsample the lower mip levels.
  126. {
127. // Store the color into LDS.
  128. {
  129. SharedMemory[GROUP_PIXEL_COUNT * 0 | GroupThreadIndex] = (f32tof16(PrevColor.r) << 0) | (f32tof16(PrevColor.g) << 16);
  130. SharedMemory[GROUP_PIXEL_COUNT * 1 | GroupThreadIndex] = (f32tof16(PrevColor.b) << 0) | (f32tof16(PrevColor.a) << 16);
  131. #if DIM_LEAK_FREE
  132. SharedFurthestDepth[GroupThreadIndex] = WorldDepth;
  133. #endif
  134. }
  135. GroupMemoryBarrierWithGroupSync();
136. // Downsample the lower mip levels.
  137. UNROLL
  138. for (uint MipLevel = 1; MipLevel < 3; MipLevel++)
  139. {
  140. const uint ReductionAmount = 1 << MipLevel;
  141. const uint NumberPixelInMip = GROUP_PIXEL_COUNT / (ReductionAmount * ReductionAmount);
  142. if (GroupThreadIndex < NumberPixelInMip)
  143. {
  144. uint2 OutputCoord = uint2(
  145. GroupThreadIndex % (GROUP_TILE_SIZE / ReductionAmount),
  146. GroupThreadIndex / (GROUP_TILE_SIZE / ReductionAmount));
147. // Ray marching against the furthest HZB must avoid self-intersection, so stay conservative here.
  148. #if DIM_LEAK_FREE
149. // Downsample depth: take the maximum depth (i.e. the depth furthest from the camera) of the 2x2 pixel block.
  150. float FurthestDepth;
  151. {
  152. UNROLL_N(2)
  153. for (uint x = 0; x < 2; x++)
  154. {
  155. UNROLL_N(2)
  156. for (uint y = 0; y < 2; y++)
  157. {
  158. uint2 Coord = OutputCoord * 2 + uint2(x, y);
  159. uint LDSIndex = Coord.x + Coord.y * ((2 * GROUP_TILE_SIZE) / ReductionAmount);
  160. float NeighborDepth = SharedFurthestDepth[LDSIndex];
  161. if (x == 0 && y == 0)
  162. FurthestDepth = NeighborDepth;
  163. else
  164. FurthestDepth = max(FurthestDepth, NeighborDepth);
  165. }
  166. }
  167. }
  168. float WorldDepthToPixelWorldRadius = GetTanHalfFieldOfView().x * View.ViewSizeAndInvSize.z * 100;
  169. float WorldBluringRadius = WorldDepthToPixelWorldRadius * FurthestDepth;
  170. float InvSquareWorldBluringRadius = rcp(WorldBluringRadius * WorldBluringRadius);
  171. #endif
172. // Downsample color, also weighting each sample by its depth distance to the block's furthest depth.
  173. float4 ReducedColor = 0;
  174. UNROLL
  175. for (uint x = 0; x < 2; x++)
  176. {
  177. UNROLL
  178. for (uint y = 0; y < 2; y++)
  179. {
  180. uint2 Coord = OutputCoord * 2 + uint2(x, y);
  181. uint LDSIndex = Coord.x + Coord.y * ((2 * GROUP_TILE_SIZE) / ReductionAmount);
  182. uint Raw0 = SharedMemory[GROUP_PIXEL_COUNT * 0 | LDSIndex];
  183. uint Raw1 = SharedMemory[GROUP_PIXEL_COUNT * 1 | LDSIndex];
  184. float4 Color;
  185. Color.r = f16tof32(Raw0 >> 0);
  186. Color.g = f16tof32(Raw0 >> 16);
  187. Color.b = f16tof32(Raw1 >> 0);
  188. Color.a = f16tof32(Raw1 >> 16);
  189. float SampleWeight = 1.0;
  190. #if DIM_LEAK_FREE
  191. {
  192. float NeighborDepth = SharedFurthestDepth[LDSIndex];
  193. float SampleDist = (FurthestDepth - NeighborDepth);
  194. SampleWeight = saturate(1 - (SampleDist * SampleDist) * InvSquareWorldBluringRadius);
  195. }
  196. #endif
  197. ReducedColor += Color * SampleWeight;
  198. }
  199. }
200. // Process and write out the result.
  201. ReducedColor *= rcp(4.0);
  202. uint2 OutputPosition = GroupId * (GROUP_TILE_SIZE / ReductionAmount) + OutputCoord;
  203. if (MipLevel == 1)
  204. {
  205. ReducedSceneColorOutput_1[OutputPosition] = float4(ReducedColor.rgb, 0);
  206. #if DIM_LEAK_FREE
  207. ReducedSceneAlphaOutput_1[OutputPosition] = ReducedColor.a;
  208. #endif
  209. }
  210. else if (MipLevel == 2)
  211. {
  212. ReducedSceneColorOutput_2[OutputPosition] = float4(ReducedColor.rgb, 0);
  213. #if DIM_LEAK_FREE
  214. ReducedSceneAlphaOutput_2[OutputPosition] = ReducedColor.a;
  215. #endif
  216. }
  217. SharedMemory[GROUP_PIXEL_COUNT * 0 | GroupThreadIndex] = (f32tof16(ReducedColor.r) << 0) | (f32tof16(ReducedColor.g) << 16);
  218. SharedMemory[GROUP_PIXEL_COUNT * 1 | GroupThreadIndex] = (f32tof16(ReducedColor.b) << 0) | (f32tof16(ReducedColor.a) << 16);
  219. #if DIM_LEAK_FREE
  220. {
  221. SharedFurthestDepth[GroupThreadIndex] = FurthestDepth;
  222. }
  223. #endif
  224. } // if (GroupThreadIndex < NumberPixelInMip)
  225. } // for (uint MipLevel = 1; MipLevel < 3; MipLevel++)
  226. }
  227. } // MainCS()

After the processing above, the pass outputs downsampled color and depth textures with 5 mip levels:
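To make the leak-free weighting in the shader above concrete (the symbols below are my own shorthand for the shader's variables): each of the four samples in the 2×2 footprint is weighted by how close its depth is to the reference depth, with a blurring radius proportional to that depth:

$$w_i = \frac{1}{4}\,\operatorname{saturate}\!\left(1 - \frac{(z_{\mathrm{ref}} - z_i)^2}{R(z_{\mathrm{ref}})^2}\right),\qquad R(z) = \tan\!\left(\frac{\theta_{\mathrm{fov}}}{2}\right)\cdot\frac{100}{W}\cdot z,\qquad \mathrm{PrevColor} = \sum_{i=0}^{3} w_i\,c_i$$

where $W$ is the viewport width (the shader's View.ViewSizeAndInvSize.z is $1/W$) and the factor 100 comes straight from WorldDepthToPixelWorldRadius. Samples whose depth differs too much from the reference contribute nothing, which is exactly what prevents indirect light from leaking across depth discontinuities.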

Next we move on to SSGI's tracing stage. This pass takes the scene depth, HZB, GBuffer, and the downsampled textures generated by the previous stage as input. Let's go straight to the CS shader it uses:

  1. // Engine\Shaders\Private\SSRT\SSRTDiffuseIndirect.usf
  2. [numthreads(TILE_PIXEL_SIZE_X, TILE_PIXEL_SIZE_Y, CONFIG_RAY_COUNT)]
  3. void MainCS(
  4. uint2 GroupId : SV_GroupID,
  5. uint GroupThreadIndex : SV_GroupIndex)
  6. {
7. // Wave index within the thread group.
  8. uint GroupWaveIndex = GroupThreadIndex / 64;
  9. FSSRTTileInfos TileInfos;
  10. {
  11. const uint BinsAddress = TILE_PIXEL_COUNT * 2;
  12. uint GroupPixelId = GroupThreadIndex % TILE_PIXEL_COUNT;
  13. uint RaySequenceId = GroupThreadIndex / TILE_PIXEL_COUNT;
14. // Compute TileCoord, making sure the compiler loads it as a scalar.
  15. uint2 TileCoord = GroupId / uint2(TILE_RES_DIVISOR / TILE_PIXEL_SIZE_X, TILE_RES_DIVISOR / TILE_PIXEL_SIZE_Y);
  16. TileInfos = LoadTileInfos(TileCoord);
17. // Store the GBuffer into LDS
  18. {
  19. BRANCH
  20. if (RaySequenceId == 0)
  21. {
  22. uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
  23. uint2 PixelPosition = ComputePixelPosition(GroupId, GroupPixelOffset);
  24. float2 BufferUV;
  25. float2 ScreenPos;
  26. UpdateLane2DCoordinateInformations(PixelPosition, /* out */ BufferUV, /* out */ ScreenPos);
  27. FGBufferData GBuffer = GetGBufferDataFromSceneTextures(BufferUV);
  28. float3 N = mul(float4(GBuffer.WorldNormal, 0), View.TranslatedWorldToView).xyz;
  29. float DeviceZ = SampleDeviceZFromSceneTextures(BufferUV);
  30. bool bTraceRay = GBuffer.ShadingModelID != SHADINGMODELID_UNLIT;
  31. SharedMemory[TILE_PIXEL_COUNT * 0 | GroupPixelId] = CompressN(N);
  32. SharedMemory[TILE_PIXEL_COUNT * 1 | GroupPixelId] = asuint(bTraceRay ? DeviceZ : -1.0);
  33. }
  34. else if (GroupWaveIndex == 1) // TODO.
  35. {
  36. // Clears the bins
  37. SharedMemory[BinsAddress | GroupPixelId] = 0;
  38. }
  39. }
  40. (......)
  41. }
  42. GroupMemoryBarrierWithGroupSync();
43. // Cast the rays
  44. {
  45. uint GroupPixelId;
  46. uint RaySequenceId;
  47. uint CompressedN;
  48. float DeviceZ;
  49. bool bTraceRay;
  50. #if CONFIG_SORT_RAYS
  51. {
  52. uint Raw0 = SharedMemory[LANE_PER_GROUPS * 0 | GroupThreadIndex];
  53. uint Raw1 = SharedMemory[LANE_PER_GROUPS * 1 | GroupThreadIndex];
54. // Unpack the ray data.
  55. RaySequenceId = Raw0 >> (24 + TILE_PIXEL_SIZE_X_LOG + TILE_PIXEL_SIZE_Y_LOG);
  56. GroupPixelId = (Raw0 >> 24) % TILE_PIXEL_COUNT;
  57. CompressedN = Raw0;
  58. DeviceZ = asfloat(Raw1);
  59. bTraceRay = asfloat(Raw1) > 0;
  60. }
  61. #else // !CONFIG_SORT_RAYS
  62. {
  63. GroupPixelId = GroupThreadIndex % TILE_PIXEL_COUNT;
  64. RaySequenceId = GroupThreadIndex / TILE_PIXEL_COUNT;
  65. uint Raw0 = SharedMemory[TILE_PIXEL_COUNT * 0 | GroupPixelId];
  66. uint Raw1 = SharedMemory[TILE_PIXEL_COUNT * 1 | GroupPixelId];
  67. CompressedN = Raw0;
  68. DeviceZ = asfloat(Raw1);
  69. bTraceRay = asfloat(Raw1) > 0;
  70. }
  71. #endif // !CONFIG_SORT_RAYS
  72. GroupMemoryBarrierWithGroupSync();
  73. #if DEBUG_RAY_COUNT
  74. float DebugRayCount = 0.0;
  75. #endif
  76. uint2 CompressedColor;
  77. BRANCH
78. if (bTraceRay) // Only cast a ray when it is actually needed
  79. {
80. // Compute the pixel position, UV, and screen position.
  81. uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
  82. uint2 PixelPosition = ComputePixelPosition(GroupId, GroupPixelOffset);
  83. float2 BufferUV;
  84. float2 ScreenPos;
  85. UpdateLane2DCoordinateInformations(PixelPosition, /* out */ BufferUV, /* out */ ScreenPos);
86. // Randomly sample the ray direction.
  87. uint2 RandomSeed = ComputeRandomSeed(PixelPosition);
  88. float2 E = Hammersley16(RaySequenceId, CONFIG_RAY_COUNT, RandomSeed);
  89. float3 L = ComputeL(DecompressN(CompressedN), E);
90. // Step offset
  91. float StepOffset = InterleavedGradientNoise(PixelPosition + 0.5, View.StateFrameIndexMod8);
  92. #if !SSGI_TRACE_CONE
  93. StepOffset -= 0.9;
  94. #endif
  95. bool bDebugPrint = all(PixelPosition == uint2(View.ViewSizeAndInvSize.xy) / 2);
96. // Initialize the ray.
  97. FSSRTRay Ray = InitScreenSpaceRay(ScreenPos, DeviceZ, L);
  98. float Level;
  99. float3 HitUVz;
  100. bool bHit;
  101. #if !CONFIG_SORT_RAYS
102. // Early out if tile classification can prove the ray cannot possibly hit anything.
  103. bool bEarlyOut = TestRayEarlyReturn(TileInfos, Ray);
  104. BRANCH
  105. if (bEarlyOut)
  106. {
  107. bHit = false;
  108. Level = 0;
  109. HitUVz = 0;
  110. }
  111. else
  112. #endif
  113. {
114. // HZB screen-space ray marching (already analyzed for SSAO)
  115. bHit = CastScreenSpaceRay(
  116. FurthestHZBTexture, FurthestHZBTextureSampler,
  117. Ray, 1, CONFIG_RAY_STEPS, StepOffset,
  118. HZBUvFactorAndInvFactor, bDebugPrint,
  119. /* out */ HitUVz,
  120. /* out */ Level);
  121. }
122. // If a hit was found, compute the weight, sample the color, apply the weight, and accumulate.
  123. BRANCH
  124. if (bHit)
  125. {
  126. float2 ReducedColorUV = HitUVz.xy * ColorBufferScaleBias.xy + ColorBufferScaleBias.zw;
  127. ReducedColorUV = min(ReducedColorUV, ReducedColorUVMax);
  128. float4 SampleColor = ColorTexture.SampleLevel(ColorTextureSampler, ReducedColorUV, Level);
  129. float SampleColorWeight = 1.0;
130. // Backface modulation at the hit surface
  131. #if CONFIG_BACKFACE_MODULATION
  132. {
  133. float3 SampleNormal = GetGBufferDataFromSceneTextures(HitUVz.xy).WorldNormal;
  134. uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
  135. uint2 PixelPosition = ComputePixelPosition(GroupId, GroupPixelOffset);
  136. uint2 RandomSeed = ComputeRandomSeed(PixelPosition);
  137. float2 E = Hammersley16(RaySequenceId, CONFIG_RAY_COUNT, RandomSeed);
  138. float3 L = ComputeL(DecompressN(CompressedN), E);
  139. SampleColorWeight *= saturate( 1 - dot( SampleNormal, L ) );
  140. }
  141. #endif
  142. #if CONFIG_RAY_COUNT > 1
  143. SampleColorWeight *= rcp( 1 + Luminance(SampleColor.rgb) );
  144. #endif
145. // Apply the weight to the color.
  146. float3 DiffuseColor = SampleColor.rgb * SampleColorWeight;
  147. float AmbientOcclusion = 1.0;
  148. #if CONFIG_COLOR_TILE_CLASSIFICATION
  149. {
  150. float Lumi = Luminance(DiffuseColor.rgb);
  151. AmbientOcclusion *= saturate(Lumi / 0.25);
  152. }
  153. #endif
154. // Pack the RGB color and the AO into the X and Y channels.
  155. CompressedColor.x = asuint(f32tof16(DiffuseColor.r) << 16 | f32tof16(DiffuseColor.g));
  156. CompressedColor.y = asuint(f32tof16(DiffuseColor.b) << 16 | f32tof16(AmbientOcclusion));
  157. }
  158. else
  159. {
  160. CompressedColor = uint2(0, 0);
  161. }
  162. }
  163. else if (!bTraceRay)
  164. {
  165. CompressedColor = uint2(0, 0);
  166. }
  167. uint DestPos = GroupPixelId + RaySequenceId * TILE_PIXEL_COUNT;
168. // Store the packed color and AO into LDS.
  169. SharedMemory[LANE_PER_GROUPS * 0 | DestPos] = CompressedColor.x;
  170. SharedMemory[LANE_PER_GROUPS * 1 | DestPos] = CompressedColor.y;
  171. }
  172. GroupMemoryBarrierWithGroupSync();
173. // Unpack the LDS data and write it to the UAVs.
  174. BRANCH
  175. if (GroupThreadIndex < TILE_PIXEL_COUNT)
  176. {
  177. const uint GroupPixelId = GroupThreadIndex;
  178. float3 DiffuseColor = 0;
  179. float AmbientOcclusion = 0;
180. // LDS holds the data of this pixel's multiple rays; unpack and accumulate them here.
  181. UNROLL
  182. for (uint RaySequenceId = 0; RaySequenceId < CONFIG_RAY_COUNT; RaySequenceId++)
  183. {
  184. uint SrcPos = GroupPixelId + RaySequenceId * TILE_PIXEL_COUNT;
  185. uint Row0 = SharedMemory[LANE_PER_GROUPS * 0 | SrcPos];
  186. uint Row1 = SharedMemory[LANE_PER_GROUPS * 1 | SrcPos];
  187. DiffuseColor.r += f16tof32(Row0 >> 16);
  188. DiffuseColor.g += f16tof32(Row0 >> 0);
  189. DiffuseColor.b += f16tof32(Row1 >> 16);
  190. AmbientOcclusion += f16tof32(Row1 >> 0);
  191. }
192. // Normalize the color, AO, and related data.
  193. #if CONFIG_RAY_COUNT > 1
  194. {
  195. DiffuseColor *= rcp(float(CONFIG_RAY_COUNT));
  196. AmbientOcclusion *= rcp(float(CONFIG_RAY_COUNT));
  197. DiffuseColor *= rcp( 1 - Luminance(DiffuseColor) );
  198. }
  199. #endif
  200. DiffuseColor *= View.IndirectLightingColorScale;
  201. AmbientOcclusion = 1 - AmbientOcclusion;
202. // Write the results to the UAVs.
  203. {
  204. uint2 GroupPixelOffset = DecodeGroupPixelOffset(GroupPixelId);
  205. uint2 OutputPixelCoordinate = ComputePixelPosition(GroupId, GroupPixelOffset);
  206. IndirectDiffuseOutput[OutputPixelCoordinate] = float4(DiffuseColor, 1.0);
  207. AmbientOcclusionOutput[OutputPixelCoordinate] = AmbientOcclusion;
  208. }
  209. } // if (GroupThreadIndex < TILE_PIXEL_COUNT)
  210. } // MainCS()

As the code shows, SSGI's sample generation, ray marching, weighting, and accumulation are quite similar to SSAO's. The computation above ultimately outputs noisy color and AO textures:
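One detail worth highlighting in MainCS above: when CONFIG_RAY_COUNT > 1, each hit color is weighted by rcp(1 + Luminance(...)) before accumulation, and the normalized average is remapped with rcp(1 - Luminance(...)) afterwards. This is the classic tonemapped (Karis-style) average:

$$\bar{c} = \frac{1}{N}\sum_{i=1}^{N}\frac{c_i}{1 + L(c_i)},\qquad c_{\mathrm{out}} = \frac{\bar{c}}{1 - L(\bar{c})}$$

where $L(\cdot)$ is the luminance. Very bright outlier hits are compressed before averaging, which trades a little energy for far fewer flickering fireflies.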

The noise arises because the per-pixel sample count is far too low, so the estimate has a large variance relative to the true value. This is what the subsequent denoising steps address. Denoising consists of three passes: metadata compression, data reconstruction, and temporal accumulation. The metadata compression pass takes GBuffer data such as depth and normals as input; its CS is analyzed below:
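The statistics behind this are plain Monte Carlo variance: with $N$ independent samples per pixel,

$$\operatorname{Var}\!\left[\frac{1}{N}\sum_{i=1}^{N}X_i\right] = \frac{\sigma^2}{N},$$

so the noise amplitude only falls off as $1/\sqrt{N}$. At the handful of rays per pixel that SSGI can afford, the error is far too large to display directly, hence the denoiser.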

  1. // Engine\Shaders\Private\ScreenSpaceDenoise\SSDCompressMetadata.usf
  2. [numthreads(TILE_PIXEL_SIZE, TILE_PIXEL_SIZE, 1)]
  3. void MainCS(uint2 DispatchThreadId : SV_DispatchThreadID)
  4. {
5. // Compute the UV and screen position.
  6. float2 SceneBufferUV = DispatchThreadId * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
  7. float2 ViewportUV = BufferUVToViewportUV(SceneBufferUV);
  8. float2 ScreenPosition = ViewportUVToScreenPos(ViewportUV);
9. // Fetch the scene metadata.
  10. FSSDSampleSceneInfos SceneMetadata = FetchCurrentSceneInfosFromGBuffer(ScreenPosition, SceneBufferUV);
11. // Compress the metadata.
  12. FSSDCompressedSceneInfos CompressedMetadata = CompressSampleSceneInfo(DIM_METADATA_LAYOUT, SceneMetadata);
13. // No need to keep DispatchThreadId while SceneBufferUV is around at the highest VGPR peak.
  14. uint2 OutputPixelPostion = uint2(SceneBufferUV * BufferUVToOutputPixelPosition);
15. // Write out the compressed metadata.
  16. BRANCH
  17. if (all(OutputPixelPostion < ViewportMax))
  18. {
  19. CompressedMetadataOutput_0[OutputPixelPostion] = CompressedMetadata.VGPR[0];
  20. }
  21. } // MainCS

What exactly is this scene metadata? The answer can be found in FetchCurrentSceneInfosFromGBuffer:

  1. FSSDSampleSceneInfos FetchCurrentSceneInfosFromGBuffer(float2 ScreenPosition, float2 BufferUV)
  2. {
  3. float DeviceZ = SampleDeviceZFromSceneTextures(BufferUV);
  4. FGBufferData GBufferData = GetGBufferDataFromSceneTextures(BufferUV);
5. // Build the scene sample infos.
  6. FSSDSampleSceneInfos Infos = CreateSampleSceneInfos();
  7. Infos.ScreenPosition = ScreenPosition;
  8. Infos.DeviceZ = DeviceZ;
  9. Infos.WorldDepth = GBufferData.Depth;
  10. Infos.WorldNormal = GBufferData.WorldNormal;
  11. Infos.Roughness = GBufferData.Roughness;
12. // Compute the translated world position.
  13. {
  14. float2 ClipPosition = ScreenPosition * (View.ViewToClip[3][3] < 1.0f ? Infos.WorldDepth : 1.0f);
  15. Infos.TranslatedWorldPosition = mul(float4(ClipPosition, Infos.WorldDepth, 1), View.ScreenToTranslatedWorld).xyz;
  16. }
17. // Compute the view-space normal.
  18. Infos.ViewNormal = mul(float4(Infos.WorldNormal, 0), View.TranslatedWorldToView).xyz;
  19. return Infos;
  20. }

In other words, the scene metadata is per-pixel information such as the screen position, device depth, world depth, world normal, roughness, translated world position, and view-space normal. As for how it is compressed, we need to step into CompressSampleSceneInfo:

  1. // Engine\Shaders\Private\ScreenSpaceDenoise\SSDMetadata.ush
  2. FSSDCompressedSceneInfos CompressSampleSceneInfo(
  3. const uint CompressedLayout,
  4. FSSDSampleSceneInfos Infos)
  5. {
  6. FSSDCompressedSceneInfos CompressedInfos = CreateCompressedSceneInfos();
  7. (......)
8. // Compress the depth and the view-space normal. (This branch runs by default.)
  9. else if (CompressedLayout == METADATA_BUFFER_LAYOUT_DEPTH_VIEWNORMAL)
  10. {
  11. CompressedInfos.VGPR[0] = CompressDevizeZAndN(Infos.DeviceZ, Infos.ViewNormal);
  12. }
  13. (......)
  14. return CompressedInfos;
  15. }

In the metadata compression pass used by SSGI, what gets compressed is the depth and the view-space normal, handled by CompressDevizeZAndN:

  1. uint CompressDevizeZAndN(float DevizeZ, float3 N)
  2. {
  3. uint FaceN;
4. // Compress the normal. (Analyzed below.)
  5. EncodeNormal(/* inout */ N, /* out */ FaceN);
6. // Convert the normal's xy from float to uint.
  7. uint2 FaceCood = uint2(clamp(round(127.0 * N.xy), 0, 127.0));
8. // Pack the normal and depth into a 32-bit uint.
  9. uint Compressed = f32tof16(DevizeZ) | (FaceN << 15) | (FaceCood.x << 18) | (FaceCood.y << 25);
  10. return Compressed;
  11. }
  12. // Engine\Shaders\Private\DeferredShadingCommon.ush
  13. void EncodeNormal( inout float3 N, out uint Face )
  14. {
15. // The normal faces the z axis by default.
  16. uint Axis = 2;
17. // If |x| is the largest component of the normal, switch the axis to x.
  18. if( abs(N.x) >= abs(N.y) && abs(N.x) >= abs(N.z) )
  19. {
  20. Axis = 0;
  21. }
22. // If |y| is the largest component of the normal, switch the axis to y.
  23. else if( abs(N.y) > abs(N.z) )
  24. {
  25. Axis = 1;
  26. }
  27. Face = Axis * 2;
28. // Swizzle the normal's components according to the chosen axis.
  29. N = Axis == 0 ? N.yzx : N;
  30. N = Axis == 1 ? N.xzy : N;
31. // Remap the normal's range to [0, 1]
  32. float MaxAbs = 1.0 / sqrt(2.0);
  33. Face += N.z > 0 ? 0 : 1;
  34. N.xy *= N.z > 0 ? 1 : -1;
  35. N.xy = N.xy * (0.5 / MaxAbs) + 0.5;
  36. }

As the code shows, metadata compression packs the scene depth and the view-space normal into a 32-bit unsigned integer: the depth occupies the low 15 bits, and the normal's face orientation, X, and Y occupy the following 3, 7, and 7 bits respectively. The output texture after metadata compression looks as follows (values adjusted for visualization):
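For illustration, here is a minimal sketch of the matching decode. This helper is hypothetical (the engine's actual decode sits in UncompressSampleSceneInfo, presumably alongside CompressSampleSceneInfo in SSDMetadata.ush), but it follows directly from the bit layout above:

// Hypothetical inverse of CompressDevizeZAndN, assuming the layout above:
// bits [0..14] = f16 depth (sign bit dropped), [15..17] = face, [18..24] = N.x, [25..31] = N.y.
void DecompressDevizeZAndN(uint Compressed, out float DevizeZ, out uint Face, out float2 FaceN01)
{
    // DeviceZ lies in [0,1], so the f16 sign bit is always 0 and masking 15 bits is lossless.
    DevizeZ = f16tof32(Compressed & 0x7fff);
    Face = (Compressed >> 15) & 0x7;                   // axis * 2 + hemisphere bit
    uint2 FaceCood = uint2((Compressed >> 18) & 0x7f,  // 7-bit quantized N.x
                           (Compressed >> 25) & 0x7f); // 7-bit quantized N.y
    FaceN01 = float2(FaceCood) / 127.0;                // back to [0,1]; EncodeNormal's remap is undone next
}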

The reconstruction pass takes the compressed metadata and the noisy color and AO textures as input. Let's go straight to the CS shader it uses (simplified with the help of a RenderDoc capture):

  1. // Engine\Shaders\Private\ScreenSpaceDenoise\SSDSpatialAccumulation.usf
  2. [numthreads(TILE_PIXEL_SIZE, TILE_PIXEL_SIZE, 1)]
  3. void MainCS(
  4. uint2 DispatchThreadId : SV_DispatchThreadID,
  5. uint2 GroupId : SV_GroupID,
  6. uint2 GroupThreadId : SV_GroupThreadID,
  7. uint GroupThreadIndex : SV_GroupIndex)
  8. {
  9. #if CONFIG_SIGNAL_INPUT_TEXTURE_TYPE == SIGNAL_TEXTURE_TYPE_FLOAT4
  10. Texture2D Signal_Textures_0 = SignalInput_Textures_0;
  11. Texture2D Signal_Textures_1 = SignalInput_Textures_1;
  12. Texture2D Signal_Textures_2 = SignalInput_Textures_2;
  13. Texture2D Signal_Textures_3 = SignalInput_Textures_3;
  14. #else
  15. (......)
  16. #endif
17. // Compute the UV.
  18. float2 SceneBufferUV = DispatchThreadId * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
  19. if (true)
  20. {
  21. SceneBufferUV = clamp(SceneBufferUV, BufferBilinearUVMinMax.xy, BufferBilinearUVMinMax.zw);
  22. }
23. // Read the relevant metadata.
  24. FSSDCompressedSceneInfos CompressedRefSceneMetadata;
  25. FSSDSampleSceneInfos RefSceneMetadata;
  26. {
  27. CompressedRefSceneMetadata = SampleCompressedSceneMetadata(
  28. /* bPrevFrame = */ false,
  29. SceneBufferUV, BufferUVToBufferPixelCoord(SceneBufferUV));
  30. float2 ScreenPosition = DenoiserBufferUVToScreenPosition(SceneBufferUV);
  31. RefSceneMetadata = UncompressSampleSceneInfo(
  32. CONFIG_METADATA_BUFFER_LAYOUT, /* bPrevFrame = */ false,
  33. ScreenPosition, CompressedRefSceneMetadata);
  34. }
35. // Sample the associated signal data.
  36. #if !CONFIG_UPSCALE || 1
  37. FSSDSignalArray RefSamples;
  38. FSSDSignalFrequencyArray RefFrequencies;
  39. SampleMultiplexedSignals(
  40. Signal_Textures_0,
  41. Signal_Textures_1,
  42. Signal_Textures_2,
  43. Signal_Textures_3,
  44. GlobalPointClampedSampler,
  45. CONFIG_SIGNAL_INPUT_LAYOUT,
  46. /* MultiplexedSampleId = */ 0,
  47. /* bNormalizeSample = */ CONFIG_NORMALIZE_INPUT != 0,
  48. SceneBufferUV,
  49. /* out */ RefSamples,
  50. /* out */ RefFrequencies);
  51. #if CONFIG_NORMALIZE_INPUT
  52. FSSDSignalArray NormalizedRefSamples = RefSamples;
  53. #else
  54. // TODO(Denoiser): Decode twice instead.
  55. FSSDSignalArray NormalizedRefSamples = NormalizeToOneSampleArray(RefSamples);
  56. #endif
  57. #endif
58. // Scale factor for the kernel spread.
  59. #if CONFIG_UPSCALE
  60. float KernelSpreadFactor = UpscaleFactor;
  61. #elif !CONFIG_CUSTOM_SPREAD_FACTOR
  62. const float KernelSpreadFactor = 1;
  63. #endif
64. // Compute the requested number of samples.
  65. float RequestedSampleCount = 1024;
  66. #if CONFIG_SAMPLE_SET == SAMPLE_SET_NONE
  67. RequestedSampleCount = 1;
  68. #elif CONFIG_SAMPLE_COUNT_POLICY == SAMPLE_COUNT_POLICY_DISABLED
  69. // NOP
  70. #elif CONFIG_SAMPLE_COUNT_POLICY == SAMPLE_COUNT_POLICY_SAMPLE_ACCUMULATION_BASED
  71. {
  72. #if CONFIG_SIGNAL_BATCH_SIZE != 1
  73. #error Unable to support more than one signal.
  74. #endif
  75. RequestedSampleCount = clamp(TARGETED_SAMPLE_COUNT / RefSamples.Array[0].SampleCount, 1, MaxSampleCount);
  76. }
  77. #else
  78. #error Unknown policy to control the number of samples.
  79. #endif
80. // Aliases of kernel members.
  81. #if (CONFIG_SAMPLE_SET == SAMPLE_SET_STACKOWIAK_4_SETS) && CONFIG_VGPR_OPTIMIZATION
  82. float2 KernelBufferUV;
  83. uint SampleTrackId;
  84. #endif
85. // Accumulate the input spatially.
  86. FSSDSignalAccumulatorArray SignalAccumulators;
  87. {
  88. FSSDKernelConfig KernelConfig = CreateKernelConfig();
  89. #if DEBUG_OUTPUT
  90. {
  91. KernelConfig.DebugPixelPosition = DispatchThreadId;
  92. KernelConfig.DebugEventCounter = 0;
  93. }
  94. #endif
95. // Fill in the kernel configuration.
  96. KernelConfig.SampleSet = CONFIG_SAMPLE_SET;
  97. KernelConfig.SampleSubSetId = CONFIG_SAMPLE_SUBSET;
  98. KernelConfig.BufferLayout = CONFIG_SIGNAL_INPUT_LAYOUT;
  99. KernelConfig.MultiplexedSignalsPerSignalDomain = CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
  100. KernelConfig.NeighborToRefComputation = NEIGHBOR_TO_REF_LOWEST_VGPR_PRESSURE;
  101. KernelConfig.bUnroll = CONFIG_SAMPLE_SET != SAMPLE_SET_STACKOWIAK_4_SETS;
  102. KernelConfig.bDescOrder = CONFIG_OUTPUT_MODE == OUTPUT_MODE_DRB;
  103. KernelConfig.BilateralDistanceComputation = CONFIG_BILATERAL_DISTANCE_COMPUTATION;
  104. KernelConfig.WorldBluringDistanceMultiplier = CONFIG_BILATERAL_DISTANCE_MULTIPLIER;
  105. KernelConfig.bNormalizeSample = CONFIG_NORMALIZE_INPUT != 0;
  106. KernelConfig.bSampleKernelCenter = CONFIG_UPSCALE;
  107. KernelConfig.bForceKernelCenterAccumulation = true;
  108. KernelConfig.bClampUVPerMultiplexedSignal = CONFIG_CLAMP_UV_PER_SIGNAL != 0;
109. // Reconstruct spherical harmonics from 1 spp.
  110. KernelConfig.bComputeSampleColorSH = DIM_STAGE == STAGE_RECONSTRUCTION && DIM_MULTI_SPP == 0;
111. // Fill in the color spaces.
  112. {
  113. UNROLL_N(SIGNAL_ARRAY_SIZE)
  114. for (uint MultiplexId = 0; MultiplexId < SIGNAL_ARRAY_SIZE; MultiplexId++)
  115. {
  116. KernelConfig.BufferColorSpace[MultiplexId] = CONFIG_INPUT_COLOR_SPACE;
  117. KernelConfig.AccumulatorColorSpace[MultiplexId] = CONFIG_ACCUMULATION_COLOR_SPACE;
  118. }
  119. }
120. // Set the bilateral filtering preset.
  121. SetBilateralPreset(CONFIG_BILATERAL_PRESET, /* inout */ KernelConfig);
  122. // SGPRs
  123. KernelConfig.BufferSizeAndInvSize = BufferSizeAndInvSize;
  124. KernelConfig.BufferBilinearUVMinMax = BufferBilinearUVMinMax;
  125. KernelConfig.KernelSpreadFactor = KernelSpreadFactor;
  126. KernelConfig.HarmonicPeriode = HarmonicPeriode;
  127. (......)
  128. // VGPRs
  129. KernelConfig.BufferUV = SceneBufferUV;
  130. {
  131. #if CONFIG_REF_METADATA_COMPRESSION == CONFIG_METADATA_BUFFER_LAYOUT
  132. // Straight up plumb down the compress layout to save any VALU.
  133. KernelConfig.CompressedRefSceneMetadata = CompressedRefSceneMetadata;
  134. #else
  135. // Recompress the reference scene metadata
  136. KernelConfig.CompressedRefSceneMetadata = CompressSampleSceneInfo(CONFIG_REF_METADATA_COMPRESSION, RefSceneMetadata);
  137. #endif
  138. KernelConfig.RefBufferUV = SceneBufferUV;
  139. KernelConfig.RefSceneMetadataLayout = CONFIG_REF_METADATA_COMPRESSION;
  140. }
  141. KernelConfig.HammersleySeed = Rand3DPCG16(int3(SceneBufferUV * BufferUVToOutputPixelPosition, View.StateFrameIndexMod8)).xy;
  142. (......)
143. // Create the uncompressed accumulators.
  144. FSSDSignalAccumulatorArray UncompressedAccumulators = CreateSignalAccumulatorArray();
145. // When not upscaling, manually force-accumulate the kernel's center sample.
  146. if (!KernelConfig.bSampleKernelCenter && !KernelConfig.bDescOrder)
  147. {
148. // SIGNAL_ARRAY_SIZE defaults to 1, i.e. 1 spp.
  149. UNROLL_N(SIGNAL_ARRAY_SIZE)
  150. for (uint SignalMultiplexId = 0; SignalMultiplexId < SIGNAL_ARRAY_SIZE; SignalMultiplexId++)
  151. {
  152. const uint BatchedSignalId = ComputeSignalBatchIdFromSignalMultiplexId(KernelConfig, SignalMultiplexId);
  153. FSSDSignalDomainKnowledge DomainKnowledge = GetSignalDomainKnowledge(BatchedSignalId);
  154. uint2 RefPixelCoord = floor(KernelConfig.BufferUV * KernelConfig.BufferSizeAndInvSize.xy);
155. // Sample the data.
  156. FSSDSignalSample CenterSample = TransformSignalSampleForAccumulation(
  157. KernelConfig,
  158. SignalMultiplexId,
  159. RefSceneMetadata,
  160. RefSamples.Array[SignalMultiplexId],
  161. RefPixelCoord);
162. // Sample accumulation infos.
  163. FSSDSampleAccumulationInfos SampleInfos;
  164. SampleInfos.Sample = CenterSample;
  165. SampleInfos.Frequency = RefFrequencies.Array[SignalMultiplexId];
  166. SampleInfos.FinalWeight = 1.0;
  167. SampleInfos.InvFrequency = GetSignalWorldBluringRadius(SampleInfos.Frequency, RefSceneMetadata, DomainKnowledge);
  168. if (KernelConfig.BilateralDistanceComputation == SIGNAL_WORLD_FREQUENCY_PRECOMPUTED_BLURING_RADIUS)
  169. {
  170. SampleInfos.InvFrequency = SampleInfos.Frequency.WorldBluringRadius;
  171. }
172. // Accumulate the sample.
  173. AccumulateSample(
  174. /* inout */ UncompressedAccumulators.Array[SignalMultiplexId],
  175. SampleInfos);
  176. }
  177. }
  178. #if CONFIG_SAMPLE_SET == SAMPLE_SET_STACKOWIAK_4_SETS
  179. {
  180. KernelConfig.SampleCount = clamp(uint(RequestedSampleCount) / kStackowiakSampleSetCount, 1, MaxSampleCount);
  181. (......)
  182. {
183. // Place the kernel center at the center of the quad; the half-pixel shift is baked into the sample offsets.
  184. KernelConfig.BufferUV = float2(DispatchThreadId | 1) * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
  185. // Id of the pixel in the quad. This is to match hard coded first samples of the sample set.
  186. KernelConfig.SampleTrackId = ((DispatchThreadId.x & 1) | ((DispatchThreadId.y & 1) << 1));
  187. }
  188. (......)
  189. }
  190. #elif CONFIG_SAMPLE_SET == SAMPLE_SET_DIRECTIONAL_RECT || CONFIG_SAMPLE_SET == SAMPLE_SET_DIRECTIONAL_ELLIPSE
  191. (......)
  192. #endif // CONFIG_SAMPLE_SET == SAMPLE_SET_DIRECTIONAL_*
  193. FSSDCompressedSignalAccumulatorArray CompressedAccumulators = CompressAccumulatorArray(UncompressedAccumulators, CONFIG_ACCUMULATOR_VGPR_COMPRESSION);
  194. if (1)
  195. {
196. // Accumulate the kernel.
  197. AccumulateKernel(
  198. KernelConfig,
  199. Signal_Textures_0,
  200. Signal_Textures_1,
  201. Signal_Textures_2,
  202. Signal_Textures_3,
  203. /* inout */ UncompressedAccumulators,
  204. /* inout */ CompressedAccumulators);
  205. }
  206. (......)
207. // When accumulating in descending order, manually sample the kernel center after all other accumulation.
  208. if (!KernelConfig.bSampleKernelCenter && KernelConfig.bDescOrder)
  209. {
  210. KernelConfig.BufferUV = SceneBufferUV;
  211. SampleAndAccumulateCenterSampleAsItsOwnCluster(
  212. KernelConfig,
  213. Signal_Textures_0,
  214. Signal_Textures_1,
  215. Signal_Textures_2,
  216. Signal_Textures_3,
  217. /* inout */ UncompressedAccumulators,
  218. /* inout */ CompressedAccumulators);
  219. }
  220. #if CONFIG_ACCUMULATOR_VGPR_COMPRESSION == ACCUMULATOR_COMPRESSION_DISABLED
  221. SignalAccumulators = UncompressedAccumulators;
  222. #else
  223. SignalAccumulators = UncompressAccumulatorArray(CompressedAccumulators, CONFIG_ACCUMULATOR_VGPR_COMPRESSION);
  224. #endif
  225. }
  226. (......)
227. // Convert the spatially accumulated signals into multiplexed signals according to the output mode.
  228. uint MultiplexCount = 1;
  229. FSSDSignalArray OutputSamples = CreateSignalArrayFromScalarValue(0.0);
  230. FSSDSignalFrequencyArray OutputFrequencies = CreateInvalidSignalFrequencyArray();
  231. {
  232. {
  233. MultiplexCount = CONFIG_SIGNAL_BATCH_SIZE;
  234. UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
  235. for (uint MultiplexId = 0; MultiplexId < CONFIG_SIGNAL_BATCH_SIZE; MultiplexId++)
  236. {
  237. UncompressSignalAccumulator(/* inout */ SignalAccumulators.Array[MultiplexId]);
  238. OutputSamples.Array[MultiplexId] = SignalAccumulators.Array[MultiplexId].Moment1;
239. // Output the smallest inverse frequency as the new world blurring radius for subsequent passes.
  240. OutputFrequencies.Array[MultiplexId] = SignalAccumulators.Array[MultiplexId].MinFrequency;
  241. }
  242. }
  243. (......)
  244. }
  245. (......)
246. // Compute the output pixel position.
  247. uint2 OutputPixelPostion;
  248. #if CONFIG_VGPR_OPTIMIZATION && !CONFIG_UPSCALE // TODO(Denoiser)
  249. {
  250. OutputPixelPostion = (uint2(KernelBufferUV * BufferUVToOutputPixelPosition) & ~0x1) | (uint2(SampleTrackId, SampleTrackId >> 1) & 0x1);
  251. (......)
  252. }
  253. #else
  254. OutputPixelPostion = ViewportMin + DispatchThreadId;
  255. #endif
  256. BRANCH
  257. if (all(OutputPixelPostion < ViewportMax))
  258. {
259. // Output the multiplexed signals.
  260. (......)
  261. {
  262. OutputMultiplexedSignal(
  263. SignalOutput_UAVs_0,
  264. SignalOutput_UAVs_1,
  265. SignalOutput_UAVs_2,
  266. SignalOutput_UAVs_3,
  267. CONFIG_SIGNAL_OUTPUT_LAYOUT,
  268. MultiplexCount,
  269. OutputPixelPostion,
  270. OutputSamples,
  271. OutputFrequencies);
  272. }
  273. }
  274. } // MainCS

After reconstruction, the noise in the output color and AO is visibly reduced:

Color and AO before and after reconstruction. The left side is the data before reconstruction, the right side after.
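One policy worth noting in the shader above (SAMPLE_COUNT_POLICY_SAMPLE_ACCUMULATION_BASED): the spatial kernel requests fewer samples where many have already been accumulated,

$$N_{\mathrm{requested}} = \operatorname{clamp}\!\left(\frac{N_{\mathrm{target}}}{N_{\mathrm{accumulated}}},\,1,\,N_{\mathrm{max}}\right)$$

which concentrates the reconstruction effort on freshly disoccluded or otherwise sample-starved pixels.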

Since the reconstructed image still contains noticeable noise, the last denoising stage is needed: temporal accumulation. Its inputs include the compressed metadata of the current and previous frames, the scene color, and the temporally accumulated history data. The CS shader it uses is as follows:
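Before diving in, the reprojection at the top of the shader can be summarized as follows: pixels with a valid velocity use the velocity buffer, everything else falls back to camera-only reprojection through View.ClipToPrevClip,

$$\mathbf{p}_{\mathrm{prev}} = \begin{cases}\mathbf{p} - \operatorname{DecodeVelocity}(v), & v_x > 0\\ \operatorname{persp}\!\left(\mathbf{M}_{\mathrm{clip}\to\mathrm{prevclip}}\,(\mathbf{p}_{xy},\,z,\,1)^{\mathsf{T}}\right), & \text{otherwise}\end{cases}$$

where persp(·) denotes the perspective divide. The history is then fetched at $\mathbf{p}_{\mathrm{prev}}$.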

  1. // Engine\Shaders\Private\ScreenSpaceDenoise\SSDTemporalAccumulation.usf
  2. void TemporallyAccumulate(
  3. uint2 DispatchThreadId : SV_DispatchThreadID,
  4. uint2 GroupId : SV_GroupID,
  5. uint2 GroupThreadId : SV_GroupThreadID,
  6. uint GroupThreadIndex : SV_GroupIndex)
  7. {
8. // Compute the buffer UV.
  9. float2 SceneBufferUV = DispatchThreadId * ThreadIdToBufferUV.xy + ThreadIdToBufferUV.zw;
  10. if (true)
  11. {
  12. SceneBufferUV = clamp(SceneBufferUV, BufferBilinearUVMinMax.xy, BufferBilinearUVMinMax.zw);
  13. }
14. // Sample the current frame's data.
  15. FSSDCompressedSceneInfos CompressedRefSceneMetadata = SampleCompressedSceneMetadata(
  16. /* bPrevFrame = */ false,
  17. SceneBufferUV, BufferUVToBufferPixelCoord(SceneBufferUV));
  18. float DeviceZ;
  19. {
  20. FSSDSampleSceneInfos RefInfo = UncompressSampleSceneInfo(
  21. CONFIG_METADATA_BUFFER_LAYOUT, /* bIsPrevFrame = */ false,
  22. DenoiserBufferUVToScreenPosition(SceneBufferUV),
  23. CompressedRefSceneMetadata);
  24. DeviceZ = RefInfo.DeviceZ;
  25. }
26. // Reproject into the previous frame.
  27. float3 HistoryScreenPosition = float3(DenoiserBufferUVToScreenPosition(SceneBufferUV), DeviceZ);
  28. bool bIsDynamicPixel = false;
  29. if (1)
  30. {
  31. float4 ThisClip = float4(HistoryScreenPosition, 1);
  32. float4 PrevClip = mul(ThisClip, View.ClipToPrevClip);
  33. float3 PrevScreen = PrevClip.xyz * rcp(PrevClip.w);
  34. float3 Velocity = HistoryScreenPosition - PrevScreen;
  35. if (1)
  36. {
  37. float4 EncodedVelocity = GBufferVelocityTexture.SampleLevel(GlobalPointClampedSampler, SceneBufferUV, 0);
  38. if (EncodedVelocity.x > 0.0)
  39. {
  40. Velocity = DecodeVelocityFromTexture(EncodedVelocity);
  41. }
  42. }
  43. HistoryScreenPosition -= Velocity;
  44. }
45. // Sample the multiplexed signals.
  46. FSSDSignalArray CurrentFrameSamples;
  47. FSSDSignalFrequencyArray CurrentFrameFrequencies;
  48. SampleMultiplexedSignals(
  49. SignalInput_Textures_0,
  50. SignalInput_Textures_1,
  51. SignalInput_Textures_2,
  52. SignalInput_Textures_3,
  53. GlobalPointClampedSampler,
  54. CONFIG_SIGNAL_INPUT_LAYOUT,
  55. /* MultiplexedSampleId = */ 0,
  56. /* bNormalizeSample = */ CONFIG_NORMALIZED_INPUT != 0,
  57. SceneBufferUV,
  58. /* out */ CurrentFrameSamples,
  59. /* out */ CurrentFrameFrequencies);
60. // Sample the history data.
  61. FSSDSignalArray HistorySamples = CreateSignalArrayFromScalarValue(0.0);
  62. {
  63. float2 HistoryBufferUV = HistoryScreenPosition.xy* ScreenPosToHistoryBufferUV.xy + ScreenPosToHistoryBufferUV.zw;
  64. float2 ClampedHistoryBufferUV = clamp(HistoryBufferUV, HistoryBufferUVMinMax.xy, HistoryBufferUVMinMax.zw);
  65. bool bIsPreviousFrameOffscreen = any(HistoryBufferUV != ClampedHistoryBufferUV);
  66. BRANCH
  67. if (!bIsPreviousFrameOffscreen)
  68. {
  69. FSSDKernelConfig KernelConfig = CreateKernelConfig();
  70. #if DEBUG_OUTPUT
  71. {
  72. KernelConfig.DebugPixelPosition = DispatchThreadId;
  73. KernelConfig.DebugEventCounter = 0;
  74. }
  75. #endif
76. // Fill in the kernel configuration.
  77. KernelConfig.SampleSet = CONFIG_HISTORY_KERNEL;
  78. KernelConfig.bSampleKernelCenter = true;
  79. KernelConfig.BufferLayout = CONFIG_SIGNAL_HISTORY_LAYOUT;
  80. KernelConfig.MultiplexedSignalsPerSignalDomain = CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
  81. KernelConfig.bUnroll = true;
  82. KernelConfig.bPreviousFrameMetadata = true;
  83. KernelConfig.BilateralDistanceComputation = SIGNAL_WORLD_FREQUENCY_MIN_METADATA;
  84. KernelConfig.bClampUVPerMultiplexedSignal = CONFIG_CLAMP_UV_PER_SIGNAL != 0;
85. // Allow a little slack when bilaterally rejecting history, to accommodate per-frame TAA jitter.
  86. KernelConfig.WorldBluringDistanceMultiplier = max(CONFIG_BILATERAL_DISTANCE_MULTIPLIER, 3.0);
87. // Set the bilateral preset.
  88. SetBilateralPreset(CONFIG_HISTORY_BILATERAL_PRESET, /* inout */ KernelConfig);
89. // SGPR kernel configuration.
  90. KernelConfig.BufferSizeAndInvSize = HistoryBufferSizeAndInvSize;
  91. KernelConfig.BufferBilinearUVMinMax = HistoryBufferUVMinMax;
  92. #if CONFIG_CLAMP_UV_PER_SIGNAL
  93. {
  94. UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
  95. for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
  96. {
  97. uint MultiplexId = BatchedSignalId / CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
  98. KernelConfig.PerSignalUVMinMax[MultiplexId] = HistoryBufferScissorUVMinMax[MultiplexId];
  99. }
  100. }
  101. #endif
102. // VGPR kernel configuration.
  103. KernelConfig.BufferUV = HistoryBufferUV + BufferUVBilinearCorrection;
  104. KernelConfig.bIsDynamicPixel = bIsDynamicPixel;
  105. #if CONFIG_PATCH_PREV_SCENE_DEPTH
  106. {
  107. KernelConfig.RefBufferUV = HistoryBufferUV;
  108. KernelConfig.RefSceneMetadataLayout = CONFIG_METADATA_BUFFER_LAYOUT;
  109. KernelConfig.bPreviousFrameRefMetadata = true;
  110. FSSDSampleSceneInfos PrevRefInfo = UncompressSampleSceneInfo(
  111. CONFIG_METADATA_BUFFER_LAYOUT, /* bIsPrevFrame = */ false,
  112. BufferUVToBufferPixelCoord(SceneBufferUV),
  113. CompressedRefSceneMetadata);
  114. PrevRefInfo.ScreenPosition = HistoryScreenPosition.xy;
  115. PrevRefInfo.DeviceZ = HistoryScreenPosition.z;
  116. PrevRefInfo.WorldDepth = ConvertFromDeviceZ(HistoryScreenPosition.z);
  117. float4 ClipPosition = float4(HistoryScreenPosition.xy * (View.ViewToClip[3][3] < 1.0f ? PrevRefInfo.WorldDepth : 1.0f), PrevRefInfo.WorldDepth, 1);
  118. PrevRefInfo.TranslatedWorldPosition = mul(ClipPosition, View.PrevScreenToTranslatedWorld).xyz + (View.PreViewTranslation.xyz - View.PrevPreViewTranslation.xyz);
  119. KernelConfig.CompressedRefSceneMetadata = CompressSampleSceneInfo(
  120. KernelConfig.RefSceneMetadataLayout,
  121. PrevRefInfo);
  122. }
  123. #else
  124. {
  125. KernelConfig.CompressedRefSceneMetadata = CompressedRefSceneMetadata;
  126. KernelConfig.RefBufferUV = SceneBufferUV;
  127. KernelConfig.RefSceneMetadataLayout = CONFIG_METADATA_BUFFER_LAYOUT;
  128. }
  129. #endif
130. // Compute the random values.
  131. ISOLATE
  132. {
  133. KernelConfig.Randoms[0] = InterleavedGradientNoise(SceneBufferUV * BufferUVToOutputPixelPosition, View.StateFrameIndexMod8);
  134. }
  135. FSSDSignalAccumulatorArray SignalAccumulators = CreateSignalAccumulatorArray();
  136. FSSDCompressedSignalAccumulatorArray UnusedCompressedAccumulators = CreateUninitialisedCompressedAccumulatorArray();
137. // Accumulate the kernel.
  138. AccumulateKernel(
  139. KernelConfig,
  140. PrevHistory_Textures_0,
  141. PrevHistory_Textures_1,
  142. PrevHistory_Textures_2,
  143. PrevHistory_Textures_3,
  144. /* inout */ SignalAccumulators,
  145. /* inout */ UnusedCompressedAccumulators);
146. // Extract the history samples from the accumulators.
  147. UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
  148. for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
  149. {
  150. HistorySamples.Array[BatchedSignalId] = SignalAccumulators.Array[BatchedSignalId].Moment1;
  151. BRANCH
  152. if (bCameraCut[BatchedSignalId])
  153. {
  154. HistorySamples.Array[BatchedSignalId] = CreateSignalSampleFromScalarValue(0.0);
  155. }
  156. }
  157. UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
  158. for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
  159. {
  160. FSSDSignalSample CurrentFrameSample = CurrentFrameSamples.Array[BatchedSignalId];
  161. FSSDSignalSample HistorySample = HistorySamples.Array[BatchedSignalId];
162. // Apply the history pre-exposure correction.
  163. #if COMPILE_SIGNAL_COLOR
  164. HistorySamples.Array[BatchedSignalId].SceneColor.rgb *= HistoryPreExposureCorrection;
  165. #endif
  166. } // for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
  167. } // if (!bIsPreviousFrameOffscreen)
  168. }
  169. #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_DIFFUSE_INDIRECT_AND_AO && 0
  170. DebugOutput[DispatchThreadId] = float4(
  171. HistorySamples.Array[0].SampleCount / 4096,
  172. 0,
  173. 0,
  174. 0);
  175. #endif
  176. const bool bPostRejectionBlending = true;
177. // History rejection.
  178. #if (CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_MINMAX_BOUNDARIES || CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_VAR_BOUNDARIES)
  179. {
  180. FSSDKernelConfig KernelConfig = CreateKernelConfig();
  181. #if DEBUG_OUTPUT
  182. {
  183. KernelConfig.DebugPixelPosition = DispatchThreadId;
  184. KernelConfig.DebugEventCounter = 0;
  185. }
  186. #endif
  187. {
  188. KernelConfig.bSampleKernelCenter = CONFIG_USE_REJECTION_BUFFER != 0;
189. // History rejection has already been blurred by the reprojection. To favor rejection stability over per-sample precision, only the reference sample's blurring distance is used, which depends on the current frame's depth and pixel size.
  190. KernelConfig.BilateralDistanceComputation = SIGNAL_WORLD_FREQUENCY_REF_METADATA_ONLY;
  191. KernelConfig.NeighborToRefComputation = NEIGHBOR_TO_REF_LOWEST_VGPR_PRESSURE;
  192. if (CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SHADOW_VISIBILITY_MASK)
  193. KernelConfig.BilateralDistanceComputation = SIGNAL_WORLD_FREQUENCY_PRECOMPUTED_BLURING_RADIUS;
  194. KernelConfig.WorldBluringDistanceMultiplier = CONFIG_BILATERAL_DISTANCE_MULTIPLIER;
  195. #if CONFIG_REJECTION_SAMPLE_SET == SAMPLE_SET_NXN
  196. {
  197. KernelConfig.SampleSet = SAMPLE_SET_NXN;
  198. KernelConfig.BoxKernelRadius = 3;
  199. KernelConfig.bUnroll = false;
  200. }
  201. #else
  202. {
  203. KernelConfig.SampleSet = CONFIG_REJECTION_SAMPLE_SET;
  204. KernelConfig.bUnroll = true;
  205. }
  206. #endif
  207. if (CONFIG_USE_REJECTION_BUFFER)
  208. {
209. // The history rejection buffer holds two moments of the denoised signal.
  210. KernelConfig.MultiplexedSignalsPerSignalDomain = 2;
  211. KernelConfig.BufferLayout = CONFIG_SIGNAL_HISTORY_REJECTION_LAYOUT;
  212. KernelConfig.bNormalizeSample = false;
  213. for (uint MultiplexId = 0; MultiplexId < SIGNAL_ARRAY_SIZE; MultiplexId++)
  214. {
  215. KernelConfig.BufferColorSpace[MultiplexId] = CONFIG_REJECTION_BUFFER_COLOR_SPACE;
  216. KernelConfig.AccumulatorColorSpace[MultiplexId] = CONFIG_HISTORY_REJECTION_COLOR_SPACE;
  217. }
218. // Force-accumulate the kernel center, since it holds the two moments that match the scene metadata.
  219. KernelConfig.bForceKernelCenterAccumulation = true;
  220. }
  221. else
  222. {
  223. KernelConfig.MultiplexedSignalsPerSignalDomain = CONFIG_MULTIPLEXED_SIGNALS_PER_SIGNAL_DOMAIN;
  224. KernelConfig.BufferLayout = CONFIG_SIGNAL_INPUT_LAYOUT;
  225. KernelConfig.bNormalizeSample = true;
  226. for (uint MultiplexId = 0; MultiplexId < SIGNAL_ARRAY_SIZE; MultiplexId++)
  227. {
  228. KernelConfig.AccumulatorColorSpace[MultiplexId] = CONFIG_HISTORY_REJECTION_COLOR_SPACE;
  229. }
  230. if (MAX_SIGNAL_BATCH_SIZE == 1)
  231. {
  232. KernelConfig.bForceAllAccumulation = CurrentFrameSamples.Array[0].SampleCount == 0;
  233. }
  234. }
  235. SetBilateralPreset(CONFIG_BILATERAL_PRESET, /* inout */ KernelConfig);
  236. }
237. // SGPR configuration.
  238. {
  239. KernelConfig.BufferSizeAndInvSize = BufferSizeAndInvSize;
  240. KernelConfig.BufferBilinearUVMinMax = BufferBilinearUVMinMax;
  241. }
242. // VGPR configuration.
  243. {
  244. KernelConfig.BufferUV = SceneBufferUV;
  245. {
  246. KernelConfig.CompressedRefSceneMetadata = CompressedRefSceneMetadata;
  247. KernelConfig.RefBufferUV = SceneBufferUV;
  248. KernelConfig.RefSceneMetadataLayout = CONFIG_METADATA_BUFFER_LAYOUT;
  249. }
  250. }
251. // Accumulate the current frame first to save unnecessary bilateral evaluations.
  252. FSSDSignalAccumulatorArray SignalAccumulators = CreateSignalAccumulatorArray();
  253. {
  254. FSSDSampleSceneInfos RefSceneMetadata = UncompressRefSceneMetadata(KernelConfig);
  255. FSSDCompressedSignalAccumulatorArray UnusedCompressedAccumulators = CreateUninitialisedCompressedAccumulatorArray();
  256. FSSDSignalArray CenterSample = CurrentFrameSamples;
  257. if (KernelConfig.bNormalizeSample)
  258. {
  259. CenterSample = NormalizeToOneSampleArray(CurrentFrameSamples);
  260. }
  261. AccumulateRefSampleAsKernelCenter(
  262. KernelConfig,
  263. /* inout */ SignalAccumulators,
  264. /* inout */ UnusedCompressedAccumulators,
  265. KernelConfig.RefBufferUV,
  266. RefSceneMetadata,
  267. CenterSample,
  268. CurrentFrameFrequencies);
  269. }
  270. {
  271. FSSDCompressedSignalAccumulatorArray UnusedCompressedAccumulators = CreateUninitialisedCompressedAccumulatorArray();
  272. #if CONFIG_USE_REJECTION_BUFFER
  273. AccumulateKernel(
  274. KernelConfig,
  275. HistoryRejectionSignal_Textures_0,
  276. HistoryRejectionSignal_Textures_1,
  277. HistoryRejectionSignal_Textures_2,
  278. HistoryRejectionSignal_Textures_3,
  279. /* inout */ SignalAccumulators,
  280. /* inout */ UnusedCompressedAccumulators);
  281. #else
  282. AccumulateKernel(
  283. KernelConfig,
  284. SignalInput_Textures_0,
  285. SignalInput_Textures_1,
  286. SignalInput_Textures_2,
  287. SignalInput_Textures_3,
  288. /* inout */ SignalAccumulators,
  289. /* inout */ UnusedCompressedAccumulators);
  290. #endif
  291. }
292. // Clamp the history data.
  293. UNROLL_N(CONFIG_SIGNAL_BATCH_SIZE)
  294. for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
  295. {
  296. FSSDSignalSample NeighborMoment1 = CreateSignalSampleFromScalarValue(0.0);
  297. FSSDSignalSample NeighborMoment2 = CreateSignalSampleFromScalarValue(0.0);
  298. #if CONFIG_REJECTION_INPUT_MODE == REJECTION_INPUT_MODE_1UNNORMALIZED
  299. {
  300. float NormalizeFactor = SafeRcp(SignalAccumulators.Array[BatchedSignalId].Moment1.SampleCount);
  301. NeighborMoment1 = MulSignal(SignalAccumulators.Array[BatchedSignalId].Moment1, NormalizeFactor);
  302. #if COMPILE_MOMENT2_ACCUMULATOR
  303. NeighborMoment2 = MulSignal(SignalAccumulators.Array[BatchedSignalId].Moment2, NormalizeFactor);
  304. #endif
  305. }
  306. #elif CONFIG_REJECTION_INPUT_MODE == REJECTION_INPUT_MODE_2PRETRANSFORMED_MOMMENTS
  307. {
  308. #if SIGNAL_ARRAY_SIZE != 2 * MAX_SIGNAL_BATCH_SIZE
  309. #error Invalid signal array size.
  310. #endif
  311. float NormalizeFactor = SafeRcp(SignalAccumulators.Array[BatchedSignalId * 2 + 0].Moment1.SampleCount);
  312. NeighborMoment1 = MulSignal(SignalAccumulators.Array[BatchedSignalId * 2 + 0].Moment1, NormalizeFactor);
  313. NeighborMoment2 = MulSignal(SignalAccumulators.Array[BatchedSignalId * 2 + 1].Moment1, NormalizeFactor);
  314. }
  315. #else
  316. #error NOrmalized samples.
  317. #endif
  318. #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_REFLECTIONS && 0
  319. FSSDSignalSample Temp = TransformSignalForPostRejection(NeighborMoment1);
  320. DebugOutput[DispatchThreadId] = float4(
  321. Temp.SceneColor.rgb,
  322. 0);
  323. #endif
  324. FSSDSignalSample CurrentFrameSample = CurrentFrameSamples.Array[BatchedSignalId];
  325. FSSDSignalSample HistorySample = HistorySamples.Array[BatchedSignalId];
326. // Clamp the history data.
  327. #if CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_VAR_BOUNDARIES
  328. {
  329. #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_AO
  330. const float StdDevMultiplier = 6.00;
  331. #else
  332. const float StdDevMultiplier = 1.25;
  333. #endif
  334. FSSDSignalSample StdDev = SqrtSignal(AbsSignal(SubtractSignal(NeighborMoment2, PowerSignal(NeighborMoment1, 2))));
  335. FSSDSignalSample NeighborMin = AddSignal(NeighborMoment1, MulSignal(StdDev, -StdDevMultiplier));
  336. FSSDSignalSample NeighborMax = AddSignal(NeighborMoment1, MulSignal(StdDev, StdDevMultiplier));
  337. if (0)
  338. {
  339. FSSDSignalSample QuantizationErrorMin = MulSignal(NeighborMoment1, 1 - SafeRcp(HistorySample.SampleCount));
  340. FSSDSignalSample QuantizationErrorMax = MulSignal(NeighborMoment1, 1 + SafeRcp(HistorySample.SampleCount));
  341. NeighborMin = MinSignal(NeighborMin, QuantizationErrorMin);
  342. NeighborMax = MaxSignal(NeighborMax, QuantizationErrorMax);
  343. }
344. // Transform the history into the right component space and normalize it for the clamping box.
  345. FSSDSignalSample NormalizedHistorySample = NormalizeToOneSample(HistorySample);
  346. FSSDSignalSample TransformedHistorySample = TransformInputBufferForPreRejection(NormalizedHistorySample);
347. // Clamp the history.
  348. FSSDSignalSample ClampedTransformedHistorySample = ClampSignal(TransformedHistorySample, NeighborMin, NeighborMax);
349. // Transform the history back into linear component space.
  350. FSSDSignalSample ClampedHistorySample = TransformSignalForPostRejection(ClampedTransformedHistorySample);
351. // Re-weight the history for anti-ghosting.
  352. {
  353. FSSDSignalSample RejectedDiff = AbsSignal(SubtractSignal(ClampedTransformedHistorySample, TransformedHistorySample));
354. // Measure how much the history was altered.
  355. float RejectionFactor = 0.0;
  356. #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_REFLECTIONS && (CONFIG_HISTORY_REJECTION_COLOR_SPACE & COLOR_SPACE_LCOCG)
  357. {
  358. #if !COMPILE_SIGNAL_COLOR
  359. #error Need to compile signal color.
  360. #endif
  361. RejectionFactor = abs(
  362. Luma_To_LumaLog(ClampedTransformedHistorySample.SceneColor.x) -
  363. Luma_To_LumaLog(TransformedHistorySample.SceneColor.x));
  364. RejectionFactor = max(RejectionFactor, 1 * max(RejectedDiff.SceneColor.y, RejectedDiff.SceneColor.z));
  365. RejectionFactor = max(RejectionFactor, 1 * RejectedDiff.SceneColor.a);
  366. }
  367. #elif CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SHADOW_VISIBILITY_MASK || CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_POLYCHROMATIC_PENUMBRA_HARMONIC || CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_AO || CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_DIFFUSE_INDIRECT_AND_AO
  368. {
  369. RejectionFactor = abs(ClampedTransformedHistorySample.MissCount - TransformedHistorySample.MissCount);
  370. }
  371. #elif CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SSGI
  372. {
  373. RejectionFactor = abs(ClampedTransformedHistorySample.MissCount - TransformedHistorySample.MissCount);
  374. }
  375. #else
  376. #error Unsupported signal rejection.
  377. #endif
378. // Compute an initial history weight as if the rejected samples had been removed.
  379. float FinalHistoryWeight = HistorySample.SampleCount * saturate(1 - RejectionFactor);
380. // When blending before rejection, make sure the input weight passes through.
  381. if (!bPostRejectionBlending)
  382. {
  383. FinalHistoryWeight = max(FinalHistoryWeight, CurrentFrameSample.SampleCount);
  384. }
385. // When upscaling, some input samples may be invalid.
  386. FinalHistoryWeight = max(FinalHistoryWeight, NeighborMoment1.SampleCount * 0.1);
387. // Apply the history weight.
  388. HistorySample = MulSignal(ClampedHistorySample, FinalHistoryWeight);
  389. HistorySample.SampleCount = FinalHistoryWeight;
  390. }
  391. }
  392. #elif CONFIG_HISTORY_REJECTION == HISTORY_REJECTION_MINMAX_BOUNDARIES
  393. {
  394. FSSDSignalSample NeighborMin = SignalAccumulators.Array[BatchedSignalId].Min;
  395. FSSDSignalSample NeighborMax = SignalAccumulators.Array[BatchedSignalId].Max;
396. // If no neighbor has any sample, the max sample count will be 0.
  397. bool bIsValid = NeighborMax.SampleCount > 0.0;
  398. float RejectedSampleCount = 0;
  399. HistorySample = MulSignal(TransformSignalForPostRejection(ClampSignal(TransformInputBufferForPreRejection(NormalizeToOneSample(HistorySample)), NeighborMin, NeighborMax)), HistorySample.SampleCount - RejectedSampleCount);
400. // The entire clamping box is invalid, so the history sample is invalid too.
  401. FLATTEN
  402. if (!bIsValid)
  403. {
  404. HistorySample = CreateSignalSampleFromScalarValue(0.0);
  405. }
  406. }
  407. #endif
408. // Widen with the smallest inverse frequency.
  409. if (1)
  410. {
  411. CurrentFrameFrequencies.Array[BatchedSignalId] = MinSignalFrequency(
  412. CurrentFrameFrequencies.Array[BatchedSignalId],
  413. SignalAccumulators.Array[BatchedSignalId].MinFrequency);
  414. }
  415. HistorySamples.Array[BatchedSignalId] = HistorySample;
  416. CurrentFrameSamples.Array[BatchedSignalId] = CurrentFrameSample;
  417. } // for (uint BatchedSignalId = 0; BatchedSignalId < CONFIG_SIGNAL_BATCH_SIZE; BatchedSignalId++)
  418. }
  419. #endif // CONFIG_HISTORY_REJECTION > 0
420. // Process and store all history samples of the current pixel.
  421. {
  422. UNROLL
  423. for (uint BatchedSignalId = 0; BatchedSignalId < MAX_SIGNAL_BATCH_SIZE; BatchedSignalId++)
  424. {
  425. FSSDSignalSample CurrentFrameSample = CurrentFrameSamples.Array[BatchedSignalId];
  426. FSSDSignalSample HistorySample = HistorySamples.Array[BatchedSignalId];
  427. FSSDSignalFrequency CurrentFrequency = CurrentFrameFrequencies.Array[BatchedSignalId];
  428. float TargetedSampleCount;
  429. {
  430. float2 ScreenPosition = DenoiserBufferUVToScreenPosition(SceneBufferUV);
  431. FSSDSampleSceneInfos RefSceneMetadata = UncompressSampleSceneInfo(
  432. CONFIG_METADATA_BUFFER_LAYOUT, /* bPrevFrame = */ false,
  433. ScreenPosition, CompressedRefSceneMetadata);
434. // Use the diameter, because that is the distance between two pixels.
  435. float PixelWorldBluringRadius = ComputeWorldBluringRadiusCausedByPixelSize(RefSceneMetadata);
  436. float WorldBluringRadius = WorldBluringRadiusToBilateralWorldDistance(PixelWorldBluringRadius);
  437. #if CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_SHADOW_VISIBILITY_MASK
  438. {
  439. float ResolutionFraction = 0.5;
  440. float ToleratedNoiseRatio = 0.25 * rcp(9 * sqrt(2));
  441. float OutputPixelRadius = CurrentFrequency.WorldBluringRadius * rcp(PixelWorldBluringRadius) * ResolutionFraction;
  442. TargetedSampleCount = clamp(OutputPixelRadius * OutputPixelRadius * (PI * ToleratedNoiseRatio), 1, TARGETED_SAMPLE_COUNT);
  443. }
  444. #elif CONFIG_SIGNAL_PROCESSING == SIGNAL_PROCESSING_REFLECTIONS
  445. {
  446. float2 NormalizedScreenMajorAxis;
  447. float InifinityMajorViewportRadius;
  448. float InifinityMinorViewportRadius;
  449. ProjectSpecularLobeToScreenSpace(
  450. RefSceneMetadata,
  451. /* out */ NormalizedScreenMajorAxis,
  452. /* out */ InifinityMajorViewportRadius,
  453. /* out */ InifinityMinorViewportRadius);
  454. InifinityMajorViewportRadius *= View.ViewSizeAndInvSize.x;
  455. InifinityMinorViewportRadius *= View.ViewSizeAndInvSize.x;
  456. TargetedSampleCount = PI * InifinityMajorViewportRadius * InifinityMinorViewportRadius;
  457. TargetedSampleCount = clamp(TargetedSampleCount, 1, TARGETED_SAMPLE_COUNT);
  458. }
  459. #else
  460. {
  461. TargetedSampleCount = TARGETED_SAMPLE_COUNT;
  462. }
  463. #endif
  464. }
  465. float PreviousFrameWeight = min(HistorySample.SampleCount, TargetedSampleCount - CurrentFrameSample.SampleCount);
  466. float PreviousFrameMultiplier = HistorySample.SampleCount > 0 ? PreviousFrameWeight / HistorySample.SampleCount : 0;
467. // Pre-transform the signals.
  468. HistorySample = TransformSignal(
  469. HistorySample,
  470. /* SrcBasis = */ STANDARD_BUFFER_COLOR_SPACE,
  471. /* DestBasis = */ CONFIG_HISTORY_BLENDING_COLOR_SPACE);
  472. CurrentFrameSample = TransformSignal(
  473. CurrentFrameSample,
  474. /* SrcBasis = */ STANDARD_BUFFER_COLOR_SPACE,
  475. /* DestBasis = */ CONFIG_HISTORY_BLENDING_COLOR_SPACE);
476. // Blend the current frame's samples with the history's.
  477. HistorySample = AddSignal(MulSignal(HistorySample, PreviousFrameMultiplier), CurrentFrameSample);
478. // Post-transform the signal.
  479. HistorySample = TransformSignal(
  480. HistorySample,
  481. /* SrcBasis = */ CONFIG_HISTORY_BLENDING_COLOR_SPACE,
  482. /* DestBasis = */ STANDARD_BUFFER_COLOR_SPACE);
  483. HistorySamples.Array[BatchedSignalId] = HistorySample;
  484. }
  485. }
486. // Whitelist what should be output, so the compiler strips everything that is ultimately unneeded.
  487. uint MultiplexCount = 1;
  488. FSSDSignalArray OutputSamples = CreateSignalArrayFromScalarValue(0.0);
  489. FSSDSignalFrequencyArray OutputFrequencies = CreateInvalidSignalFrequencyArray();
  490. {
  491. MultiplexCount = CONFIG_SIGNAL_BATCH_SIZE;
  492. UNROLL
  493. for (uint BatchedSignalId = 0; BatchedSignalId < MultiplexCount; BatchedSignalId++)
  494. {
  495. OutputSamples.Array[BatchedSignalId] = HistorySamples.Array[BatchedSignalId];
  496. OutputFrequencies.Array[BatchedSignalId] = CurrentFrameFrequencies.Array[BatchedSignalId];
  497. }
  498. }
  499. uint2 OutputPixelPostion = BufferUVToBufferPixelCoord(SceneBufferUV);
  500. BRANCH
  501. if (all(OutputPixelPostion < ViewportMax))
  502. {
  503. OutputMultiplexedSignal(
  504. SignalHistoryOutput_UAVs_0,
  505. SignalHistoryOutput_UAVs_1,
  506. SignalHistoryOutput_UAVs_2,
  507. SignalHistoryOutput_UAVs_3,
  508. CONFIG_SIGNAL_HISTORY_LAYOUT,
  509. MultiplexCount,
  510. OutputPixelPostion,
  511. OutputSamples,
  512. OutputFrequencies);
  513. }
  514. } // TemporallyAccumulate
515. // Main entry point of temporal accumulation.
  516. [numthreads(TILE_PIXEL_SIZE, TILE_PIXEL_SIZE, 1)]
  517. void MainCS(
  518. uint2 DispatchThreadId : SV_DispatchThreadID,
  519. uint2 GroupId : SV_GroupID,
  520. uint2 GroupThreadId : SV_GroupThreadID,
  521. uint GroupThreadIndex : SV_GroupIndex)
  522. {
523. // Invoke temporal accumulation.
  524. TemporallyAccumulate(DispatchThreadId, GroupId, GroupThreadId, GroupThreadIndex);
  525. }

In real-time ray tracing, there are many denoising algorithms: filtering with guided blur kernels, machine-learning-driven filters or importance sampling, improved sampling schemes built on better quasi-random sequences (such as blue noise) combined with spatio-temporal accumulation, and approximation techniques that try to quantize the result into some spatial structure (such as probes or irradiance caches).

Filtering techniques include Gaussian, bilateral, À-Trous, guided, and median filters, commonly used to filter noisy Monte Carlo renders. In particular, guided filters driven by feature buffers (such as the deferred-rendering GBuffer) and special buffers (such as first-bounce data, reprojected path length, and view position) are widely used.
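As a reference point, a GBuffer-guided joint bilateral weight in its textbook form (a generic sketch, not UE's exact code) looks like this:

// Generic joint bilateral weight guided by depth and normal -- a textbook sketch,
// not UE's implementation. Spatial Gaussian falloff, modulated by depth and normal similarity.
float JointBilateralWeight(
    float2 OffsetPixels, float SigmaPixels,    // spatial term
    float CenterDepth, float SampleDepth,      // depth guide
    float3 CenterNormal, float3 SampleNormal)  // normal guide
{
    float Spatial = exp(-dot(OffsetPixels, OffsetPixels) / (2.0 * SigmaPixels * SigmaPixels));
    float Depth   = exp(-abs(CenterDepth - SampleDepth) / max(0.1 * CenterDepth, 1e-4));
    float Normal  = pow(saturate(dot(CenterNormal, SampleNormal)), 32.0);
    return Spatial * Depth * Normal;
}

UE's SSD kernels are built on the same idea, except that the depth guide is the world-space blurring radius we saw earlier.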

Sampling techniques include TAA, spatio-temporal filters, SVGF (Spatio-Temporal Variance Guided Filtering), Adaptive SVGF (A-SVGF), BMFR (Blockwise Multi-Order Feature Regression), ReSTIR (Spatiotemporal Importance Resampling for Many-Light Ray Tracing), and others.

Approximation techniques are often used to fine-tune the behavior of various aspects of a path tracer.

There are also deep-learning-based techniques (such as DLSS). For more on denoising, see Ray Tracing Denoising.

Judging from the code analyzed above, UE's screen-space denoiser combines several of these filtering and sampling techniques: bilateral filtering, spatial convolution, temporal convolution, stochastic sampling, signals and frequencies, and so on.
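The heart of the temporal convolution is the blend at the end of TemporallyAccumulate, which in the shader's own quantities reads:

$$w_{\mathrm{prev}} = \min\!\left(N_{\mathrm{hist}},\; N_{\mathrm{target}} - N_{\mathrm{cur}}\right),\qquad H' = H\cdot\frac{w_{\mathrm{prev}}}{N_{\mathrm{hist}}} + C$$

Samples carry their own SampleCount, so $H'$ stays an unnormalized running sum whose effective length is clamped to TARGETED_SAMPLE_COUNT; this bounds the history's influence and with it the ghosting.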

After temporal accumulation, the image clearly has less, and far less noticeable, noise:

The last stage of SSGI is composition: it combines the color and AO output by the denoising stage with the GBuffer to produce the current frame's scene color with indirect lighting and AO applied:

The composition pass uses a pixel shader; the code is as follows:

  1. // Engine\Shaders\Private\DiffuseIndirectComposite.usf
  2. void MainPS(float4 SvPosition : SV_POSITION, out float4 OutColor : SV_Target0)
  3. {
  4. float2 BufferUV = SvPositionToBufferUV(SvPosition);
  5. float2 ScreenPosition = SvPositionToScreenPosition(SvPosition).xy;
6. // Sample the GBuffer.
  7. FGBufferData GBuffer = GetGBufferDataFromSceneTextures(BufferUV);
8. // Sample the dynamically generated per-frame AO.
  9. float DynamicAmbientOcclusion = 1.0f;
  10. #if DIM_APPLY_AMBIENT_OCCLUSION
  11. DynamicAmbientOcclusion = AmbientOcclusionTexture.SampleLevel(AmbientOcclusionSampler, BufferUV, 0).r;
  12. #endif
13. // Compute the final AO to apply.
  14. float FinalAmbientOcclusion = GBuffer.GBufferAO * DynamicAmbientOcclusion;
  15. OutColor.rgb = 0.0f;
  16. OutColor.a = 1.0f;
17. // Apply the diffuse indirect lighting.
  18. #if DIM_APPLY_DIFFUSE_INDIRECT
  19. {
  20. float3 DiffuseColor = GBuffer.DiffuseColor;
  21. if (UseSubsurfaceProfile(GBuffer.ShadingModelID))
  22. {
  23. DiffuseColor = GBuffer.StoredBaseColor;
  24. }
  25. OutColor.rgb += DiffuseColor * DiffuseIndirectTexture.SampleLevel(DiffuseIndirectSampler, BufferUV, 0).rgb;
  26. }
  27. #endif
28. // Apply AO to the scene color, because before deferred direct lighting all lighting in SceneColor is assumed to be indirect.
  29. {
  30. float AOMask = (GBuffer.ShadingModelID != SHADINGMODELID_UNLIT);
  31. OutColor.a = lerp(1.0f, FinalAmbientOcclusion, AOMask * AmbientOcclusionStaticFraction);
  32. }
  33. }

SSGI's composition logic is very similar to SSAO's.
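Putting the outputs together: the pass writes the indirect diffuse into RGB and the AO into alpha, so assuming the usual Dest × SrcAlpha + Src blend state for this pass (my reading, based on the shader's own comment), the composited scene color is:

$$C' = \underbrace{\mathrm{DiffuseColor}\cdot D_{\mathrm{indirect}}}_{\text{OutColor.rgb}} + C\cdot\underbrace{\operatorname{lerp}\!\left(1,\;\mathrm{AO}_{\mathrm{GBuffer}}\cdot\mathrm{AO}_{\mathrm{dyn}},\;m\cdot f\right)}_{\text{OutColor.a}}$$

where $m$ masks out unlit pixels and $f$ is AmbientOcclusionStaticFraction. Since everything in SceneColor at this point is indirect lighting, scaling it by alpha applies the AO only to the indirect terms.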

Beyond the post-processing techniques covered above, there are many more that this chapter has not touched on, for example:

• Bloom
• Depth of Field (DOF)
• Auto exposure (also known as eye adaptation)
• Vignette
• Film grain
• Color grading
• Color lookup tables (LUT)
• ……

These are left for the reader to dig into and study in the UE source code.

 

This article has covered UE's post-processing: traditional post-processing techniques such as anti-aliasing, tone mapping, Gamma correction, and screen percentage, as well as post-processing in the broader sense, such as SSR, SSAO, and SSGI.

Of course, many more post-processing techniques are not covered here; this article is only a starting point. With a basic understanding of UE in hand, readers can move on to the much wider world of post-processing techniques.

As usual, this article closes with a few small exercises to help consolidate and deepen the knowledge covered here:

• Describe the main flow and steps of the post-processing pipeline.
• Explain the role of PassSequence and the points to watch out for.
• Implement SMAA.
• Implement a post-process material with custom tone mapping to replace UE's default tone mapper.

 

• Thanks to the authors of all the references. Some images come from the references and the web; they will be removed upon request.
• This series is the author's original work, published only on 博客园 (cnblogs). Sharing the link to this article is welcome, but reposting without permission is not allowed.
• The series is still in progress; for the complete table of contents, see 内容纲目.

 

Copyright notice: this is an original article by timlly, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://www.cnblogs.com/timlly/p/15048404.html