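By simulating the motion of these particles and then connecting them, the physical motion of hair is approximated. To reduce the cost of the hair physics, strands are divided into guide strands and follow strands: the dynamics and wind simulation are computed only for guide strands, follow strands are generated from guide strands by applying an offset, and collision detection and correction are applied to all strands.

Constraints: to model the real physical behavior of hair, TressFX imposes the following constraints on strand motion, which drive the strand physics computation:

The two particles at the root of a strand stay fixed relative to the character (actor) or the hair-attached object; the remaining particles fall freely under gravity (gravitational force);
A strand has a curved shape of its own that must be preserved (global shape constraints);
When a strand follows the motion of the character or attached object, a corresponding particle motion strategy is needed (velocity shock propagation), i.e. the acceleration of the attachment affects the hair positions;
The position of each particle in a strand is influenced by its neighboring particles (local shape constraints), following a mass-spring model;
Strands in a wind field are affected by wind (wind influence);
The length of each strand segment must stay constant (length constraints);
The strand collision detection system is based on a signed distance field (SDF).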

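Dynamics computation

The hair physics simulation runs entirely in compute shaders. Before starting, here is a brief review of the compute pipeline concepts used below.

(Figure: compute pipeline system values at a glance)

TressFX only uses Dispatch(N, 1, 1) and [numthreads(THREAD_GROUP_SIZE, 1, 1)], where THREAD_GROUP_SIZE is fixed at 64 (a multiple of 32, matching the 32-thread warps of NVIDIA GPUs, and a multiple of 64, matching the 64-thread wavefronts of AMD GPUs).

That is, the configuration used by TressFX is: each Dispatch launches (N*1*1) thread groups, and each thread group contains (THREAD_GROUP_SIZE*1*1) threads.

The shader code only uses SV_GroupIndex, SV_GroupID and SV_DispatchThreadID, as in the following code: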

[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void IntegrationAndGlobalShapeConstraints(
    uint GIndex : SV_GroupIndex,
    uint3 GId : SV_GroupID,
    uint3 DTid : SV_DispatchThreadID)
{
    //...
}

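SV_GroupIndex: the unique index of the current thread within its thread group.
SV_GroupThreadID: the unique ID of the current thread within its thread group. Because [numthreads(THREAD_GROUP_SIZE, 1, 1)] is always used, only the x component is meaningful and the y and z components are always 0; the x component in turn equals GIndex, so the TressFX shader code does not pass this value at all.
SV_GroupID: the ID of the current thread's group within all dispatched thread groups. Because Dispatch(N, 1, 1) is always used, only the x component of GId is used and the y and z components are always 0.
SV_DispatchThreadID: the ID of the current thread within all dispatched thread groups, unique across the whole dispatch and equal to uint3(SV_GroupID * [numthreads] + SV_GroupThreadID). Again only the x component is meaningful; the TressFX shaders declare it as a parameter but never use it.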
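
To make the relationship between these system values concrete, here is a minimal CPU-side sketch in C++ for the 1D case used by TressFX. The names `ThreadIds` and `IdsForThread` are hypothetical and exist only for this illustration; they are not part of TressFX or Direct3D.

```cpp
#include <cstdint>

// Hypothetical CPU-side model of the compute shader system values for a 1D
// dispatch: Dispatch(N, 1, 1) with [numthreads(kThreadGroupSize, 1, 1)].
constexpr std::uint32_t kThreadGroupSize = 64;  // THREAD_GROUP_SIZE in TressFX

struct ThreadIds {
    std::uint32_t groupIndex;        // SV_GroupIndex (equals SV_GroupThreadID.x here)
    std::uint32_t groupId;           // SV_GroupID.x
    std::uint32_t dispatchThreadId;  // SV_DispatchThreadID.x
};

// flatThread enumerates all threads of the dispatch: 0 .. N * 64 - 1.
inline ThreadIds IdsForThread(std::uint32_t flatThread) {
    ThreadIds ids;
    ids.groupId = flatThread / kThreadGroupSize;
    ids.groupIndex = flatThread % kThreadGroupSize;
    // SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID
    ids.dispatchThreadId = ids.groupId * kThreadGroupSize + ids.groupIndex;
    return ids;
}
```

For example, the thread with flat index 130 sits in group 2 at in-group index 2, and its dispatch thread ID is again 130.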

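Gravity and global shape constraints

For each hair object in the scene, one pass computes gravity and the global shape constraints. Red arrows are the inputs of the current pass and green arrows are its outputs; arrows of the same color carry the same meaning in later figures.

(Figure: pass diagram for gravity and global shape constraints)

The initial strand vertex data and the strand bone data (from the .tfx file: vertex position, bone indices and bone weight), together with the vertex data output by the previous frame, are fed into this pass. Each strand consists of multiple vertices, and all vertices of a strand share one set of bone indices and bone weights, because only the root is attached to the head model: the bones of the head model act on the root vertices, and the effect is then passed on to the other vertices of the strand. After applying gravity and the global shape constraints, the pass outputs the strand vertex positions of the current frame and updates the vertex positions of the previous frame and of the frame before that.

Before looking at the algorithm itself, let's look at the constant buffer passed to this pass. The tressfxSimParameters buffer below is used by the passes for gravity and global shape constraints, velocity shock propagation, local shape constraints, length constraints and wind simulation, and follow-strand generation from guide strands (it is the one and only constant buffer). Each pass uses different parameters; I will explain each parameter where it is used.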

// constants that change frame to frame
[[vk::binding(13, 0)]] cbuffer tressfxSimParameters : register(b13, space0)
{
    float4 g_Wind;
    float4 g_Wind1;
    float4 g_Wind2;
    float4 g_Wind3;

    float4 g_Shape;       // damping, local stiffness, global stiffness, global range.
    float4 g_GravTimeTip; // gravity magnitude (assumed to be in negative y direction.)
    int4   g_SimInts;     // length iterations, local iterations, collision flag.
    int4   g_Counts;      // num strands per thread group, num follow hairs per guide hair, num verts per strand.
    float4 g_VSP;         // VSP parameters

    float g_ResetPositions;
    float g_ClampPositionDelta;
    float g_pad1;
    float g_pad2;

    row_major float4x4 g_BoneSkinningMatrix[AMD_TRESSFX_MAX_NUM_BONES]; // #define AMD_TRESSFX_MAX_NUM_BONES 512
}

#define g_NumOfStrandsPerThreadGroup      g_Counts.x
#define g_NumFollowHairsPerGuideHair      g_Counts.y
#define g_NumVerticesPerStrand            g_Counts.z // never used in the shaders: THREAD_GROUP_SIZE is fixed and g_NumOfStrandsPerThreadGroup is known,
                                                     // so (THREAD_GROUP_SIZE / g_NumOfStrandsPerThreadGroup) already yields the number of vertices per strand

#define g_NumLocalShapeMatchingIterations g_SimInts.y

#define g_GravityMagnitude                g_GravTimeTip.x
#define g_TimeStep                        g_GravTimeTip.y
#define g_TipSeparationFactor             g_GravTimeTip.z

The corresponding C++ struct is TressFXSimulationParams:

struct TressFXSimulationParams
{
    AMD::float4 m_Wind;
    AMD::float4 m_Wind1;
    AMD::float4 m_Wind2;
    AMD::float4 m_Wind3;

    AMD::float4 m_Shape;
    // float m_Damping;                           // damping
    // float m_StiffnessForLocalShapeMatching;    // local stiffness
    // float m_StiffnessForGlobalShapeMatching;   // global stiffness
    // float m_GlobalShapeMatchingEffectiveRange; // global range

    AMD::float4 m_GravTimeTip;
    // float m_GravityMagnitude; // gravity
    // float m_TimeStep;         // time step size
    // float m_TipSeparationFactor;
    // float m_velocityShockPropogation;

    AMD::sint4 m_SimInts; // 4th component unused.
    // int m_NumLengthConstraintIterations;
    // int m_NumLocalShapeMatchingIterations;
    // int m_bCollision;
    // int m_CPULocalIterations;

    AMD::sint4 m_Counts; // 4th component unused.
    // int m_NumOfStrandsPerThreadGroup;
    // int m_NumFollowHairsPerGuideHair;
    // int m_NumVerticesPerStrand; // should be 2^n (n is integer and greater than 2) and less than
                                   // or equal to TRESSFX_SIM_THREAD_GROUP_SIZE. i.e. 8, 16, 32 or 64

    AMD::float4 m_VSP;

    float g_ResetPositions;
    float g_ClampPositionDelta;

    float g_pad1;
    float g_pad2;

    AMD::float4x4 m_BoneSkinningMatrix[AMD_TRESSFX_MAX_NUM_BONES]; // #define AMD_TRESSFX_MAX_NUM_BONES 512
};

Next, look at the following buffers. They accompany almost all of the later passes (compare with the RenderPass overview diagram); TressFX stores all strand vertex data in them.

// UAVs
[[vk::binding(0, 1)]] RWStructuredBuffer<float4> g_HairVertexPositions         : register(u0, space1);  // resource <08> in the RenderPass diagram
[[vk::binding(1, 1)]] RWStructuredBuffer<float4> g_HairVertexPositionsPrev     : register(u1, space1);  // resource <09> in the RenderPass diagram
[[vk::binding(2, 1)]] RWStructuredBuffer<float4> g_HairVertexPositionsPrevPrev : register(u2, space1);  // resource <10> in the RenderPass diagram
[[vk::binding(3, 1)]] RWStructuredBuffer<float4> g_HairVertexTangents          : register(u3, space1);  // resource <13> in the RenderPass diagram
// SRVs
[[vk::binding(4, 0)]] StructuredBuffer<float4> g_InitialHairPositions          : register(t4, space0);  // resource <06> in the RenderPass diagram
[[vk::binding(12, 0)]] StructuredBuffer<BoneSkinningData> g_BoneSkinningData   : register(t12, space0); // resource <07> in the RenderPass diagram

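Now to the main topic of this section. TressFX computes gravity using a damped Verlet free-fall formula:

(Figure: damped Verlet free-fall formula)

Here DampingCoeff is the damping coefficient (range [0, 1], default 0.035), Gravity is the gravitational acceleration (direction locked to the -y axis, default 9.8), and TimeStep is the time step (the interval of one frame in seconds, about 0.01666667 at 60 FPS), which serves as the integration step in the formula. NewPos is the position for the frame being computed, CurPos is the position in the previous frame, and OldPos is the position two frames ago.

To match real physical behavior, TressFX keeps the two root particles of each strand fixed relative to the attached object, i.e. gravity is not applied to the two root particles.

(Figure: the two root particles of a strand)

The global shape constraint in TressFX applies to a number of particles right after the strand root. Because hair has some stiffness, these particles need to keep a basic curved shape, which produces an effect similar to styling.

(Figure: global shape constraint and its range)

Quantitatively, the constraint is implemented as follows: after computing the effect of gravity on a particle, a compensating position change that resists the deformation is added.

(Figure: global shape constraint formula)

Here GlobalConstraintStiffness is the stiffness of the global shape constraint (range [0, 1], default 0.01). The larger this value, the higher the stiffness and the larger the compensation that resists deformation, so the strand bends less easily and stays closer to InitialPos. InitialPos is the initial position from when the strand was modeled, CurrentPos is the position obtained from the gravity computation above, and NewPos is the output position after the global shape constraint.

(Figure: InitialPos and CurrentPos)

Because the global shape constraint only applies to a number of particles after the root, TressFX introduces a coefficient GlobalConstraintRange to quantify how many (range [0, 1], default 0.3). It is the fraction of particles subject to the global shape constraint, corresponding to the particles inside the red box in the "global shape constraint and its range" figure (two figures above).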
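
The stiffness and range described above can be sketched on the CPU as follows. This is a minimal C++ illustration, assuming the compensation is a simple pull toward InitialPos scaled by the stiffness; `InGlobalRange` and `ApplyGlobalShape` are hypothetical names introduced only for this sketch.

```cpp
#include <array>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// True when the vertex lies in the constrained part of the strand near the
// root: globalRange is a fraction in [0, 1] of the strand's vertex count.
bool InGlobalRange(unsigned localVertexIndex, unsigned numVerticesInStrand, float globalRange) {
    return static_cast<float>(localVertexIndex) <
           globalRange * static_cast<float>(numVerticesInStrand);
}

// NewPos = CurrentPos + stiffness * (InitialPos - CurrentPos)
Vec3 ApplyGlobalShape(const Vec3& currentPos, const Vec3& initialPos, float stiffness) {
    Vec3 newPos;
    for (std::size_t i = 0; i < 3; ++i)
        newPos[i] = currentPos[i] + stiffness * (initialPos[i] - currentPos[i]);
    return newPos;
}
```

With 16 vertices per strand and a range of 0.3, vertices 0 to 4 fall inside the range (4 < 4.8) and vertex 5 does not. A stiffness of 0 leaves the position untouched, and a stiffness of 1 snaps it back to InitialPos.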

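In summary, the gravity and global shape constraint pass uses the following parameters:

float4 g_Shape
g_Shape.x: DampingCoeff
g_Shape.z: GlobalConstraintStiffness
g_Shape.w: GlobalConstraintRange
float4 g_GravTimeTip
g_GravTimeTip.x -> g_GravityMagnitude: Gravity
g_GravTimeTip.y -> g_TimeStep: TimeStep
int4 g_Counts
g_Counts.x -> g_NumOfStrandsPerThreadGroup: num strands per thread group
g_Counts.y -> g_NumFollowHairsPerGuideHair: num follow hairs per guide hair
float g_ResetPositions: if not 0.0f, currentPos and oldPos are reset to initialPos
(TressFX uses a float here; an int would arguably read better. My guess is that TressFX originally intended to reset positions to this value, later switched to resetting to initialPos, and never updated the type.)
float4x4 g_BoneSkinningMatrix array: the inverse bind matrices of the bones

The shader entry point for gravity and global shape constraints is IntegrationAndGlobalShapeConstraints, which runs in parallel over the strand particles (vertices).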

//--------------------------------------------------------------------------------------
//
//  IntegrationAndGlobalShapeConstraints
//
//  Compute shader to simulate the gravitational force with integration and to maintain the
//  global shape constraints.
//
// One thread computes one vertex.
//
//--------------------------------------------------------------------------------------
[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void IntegrationAndGlobalShapeConstraints(uint GIndex : SV_GroupIndex,
    uint3 GId : SV_GroupID, uint3 DTid : SV_DispatchThreadID)
{
    uint globalStrandIndex, localStrandIndex, globalVertexIndex, localVertexIndex; // outputs of CalcIndicesInVertexLevelMaster
    uint numVerticesInTheStrand, indexForSharedMem, strandType;                    // outputs of CalcIndicesInVertexLevelMaster
    CalcIndicesInVertexLevelMaster(GIndex, GId.x,
        globalStrandIndex, localStrandIndex,
        globalVertexIndex, localVertexIndex,
        numVerticesInTheStrand, indexForSharedMem, strandType);
    ...
}

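The CalcIndicesInVertexLevelMaster function in the code above computes the indices of every vertex of every strand. It covers guide hairs only: every physics pass up to UpdateFollowHairVertices works on guide hairs alone, and UpdateFollowHairVertices then generates the follow hairs from the guide hairs by applying an offset.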

void CalcIndicesInVertexLevelMaster(
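    uint local_id, // index of the current thread within its thread group
    uint group_id, // index of the current thread's group within the whole dispatch
    inout uint globalStrandIndex, inout uint localStrandIndex,
    inout uint globalVertexIndex, inout uint localVertexIndex,
    inout uint numVerticesInTheStrand, inout uint indexForSharedMem, inout uint strandType)
{
    // indexForSharedMem: the [index of the current thread within its thread group] doubles as the [index into group-shared memory]
    // The shared-memory slot at that index belongs to the current thread, i.e. to the current strand vertex
    indexForSharedMem = local_id;

    // numVerticesInTheStrand: the number of vertices in one strand
    // THREAD_GROUP_SIZE is fixed at 64, so a strand can have at most 64 vertices
    // To keep the thread count aligned, the vertex count per strand must be a power of two, i.e. 2, 4, 8, 16, 32 or 64
    // (the C++ comment on m_NumVerticesPerStrand is stricter and lists only 8, 16, 32 or 64)
    numVerticesInTheStrand = (THREAD_GROUP_SIZE / g_NumOfStrandsPerThreadGroup); // g_NumOfStrandsPerThreadGroup is g_Counts.x, see the parameters of this pass above

    // localStrandIndex: the local index of the strand that the current vertex belongs to; "local" means within the thread group
    // Suppose a strand has 16 vertices. Then g_NumOfStrandsPerThreadGroup is 4, and the line above yields numVerticesInTheStrand=64/4=16
    // Each thread in the group (one thread per vertex) then belongs to the local strand index localStrandIndex:
    //                   +---------------------------------------------------------+
    // localStrandIndex: |00|01|02|03|00|01|02|03|00|01|02|03|00|01|02|03|...|02|03|
    //                   |---------------------------------------------------------|
    // local_id:         |00|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|...|62|63|
    //                   +---------------------------------------------------------+
    // The 16 vertices of one strand are not handled by consecutive threads: the vertices of the first strand in this group are handled by threads 00, 04, 08, 12, ..., 60
    localStrandIndex = local_id % g_NumOfStrandsPerThreadGroup;

    // globalStrandIndex: globally unique strand index = (group index * strands per group + local strand index) * (follow hairs per guide hair + 1); the last factor skips the follow strands
    globalStrandIndex = group_id * g_NumOfStrandsPerThreadGroup + localStrandIndex;
    globalStrandIndex *= (g_NumFollowHairsPerGuideHair+1);

    // localVertexIndex: local vertex index, which follows from the table above (localStrandIndex vs. local_id)
    localVertexIndex = (local_id - localStrandIndex) / g_NumOfStrandsPerThreadGroup;

    // globalVertexIndex: global vertex index, derived in the same way as globalStrandIndex
    globalVertexIndex = globalStrandIndex * numVerticesInTheStrand + localVertexIndex;

    // GetStrandType always returns 0; strandType is currently unused
    // My guess is that TressFX originally wanted a hair type per strand, so that different hair parameters could be selected by type, but tressfxSimParameters currently carries only one set of parameters
    // The feature was probably meant to let one hair object have several parameter sets, e.g. different parameters for the bangs and for the main body of hair. That would bring new problems, though:
    // it is hard to design a simple Editor UI for technical artists to tune, which makes the tool harder to use. Also, once a hair object is selected in the Editor (usually a component of some entity),
    // the strands affected by each parameter set would need to be visualized, so that the technical artist can see which strands the parameters being tuned apply to. Finally, providing a StrandType
    // would require artists to supply a type per strand when authoring the hair model, and TressFX would have to provide the corresponding tool-chain support.
    strandType = GetStrandType(globalStrandIndex);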
}
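
The index arithmetic of CalcIndicesInVertexLevelMaster can be checked on the CPU with a small C++ mirror of the same formulas. This is only a sketch: `VertexLevelIndices` and `CalcVertexLevelIndices` are hypothetical names, and the group size is hard-coded to 64 as in the text.

```cpp
struct VertexLevelIndices {
    unsigned numVerticesInTheStrand;
    unsigned localStrandIndex;
    unsigned globalStrandIndex;
    unsigned localVertexIndex;
    unsigned globalVertexIndex;
};

// Mirrors the formulas of the HLSL function for one thread of a 64-thread group.
VertexLevelIndices CalcVertexLevelIndices(unsigned localId, unsigned groupId,
                                          unsigned strandsPerGroup,
                                          unsigned followHairsPerGuideHair) {
    const unsigned kThreadGroupSize = 64;
    VertexLevelIndices r;
    r.numVerticesInTheStrand = kThreadGroupSize / strandsPerGroup;
    r.localStrandIndex = localId % strandsPerGroup;
    // Guide strands are interleaved with their follow strands in memory,
    // so consecutive guide strands are (follow + 1) strand slots apart.
    r.globalStrandIndex = (groupId * strandsPerGroup + r.localStrandIndex) *
                          (followHairsPerGuideHair + 1);
    r.localVertexIndex = (localId - r.localStrandIndex) / strandsPerGroup;
    r.globalVertexIndex = r.globalStrandIndex * r.numVerticesInTheStrand +
                          r.localVertexIndex;
    return r;
}
```

For instance, with 4 strands per group and 3 follow hairs per guide hair, thread 5 of group 2 handles vertex 1 of local strand 1, which is global strand 36 and global vertex 577.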

One of the outputs of CalcIndicesInVertexLevelMaster is indexForSharedMem. The shared memory it indexes is declared as follows:

// Group-shared variables, shared by all threads of the thread group
groupshared float4 sharedPos[THREAD_GROUP_SIZE];
groupshared float4 sharedTangent[THREAD_GROUP_SIZE];
groupshared float  sharedLength[THREAD_GROUP_SIZE];

Now to the actual work: integrating gravity.

{   // IntegrationAndGlobalShapeConstraints
    ...

    float4 initialPos = g_InitialHairPositions[globalVertexIndex];
    // Apply bone skinning to initial position
    float4 bone_quat;
    BoneSkinningData skinningData = g_BoneSkinningData[globalStrandIndex]; // g_BoneSkinningData stores the bone skinning data of all strands
    initialPos.xyz = ApplyVertexBoneSkinning(initialPos.xyz, skinningData, bone_quat);
    // g_BoneSkinningData has one entry per strand, covering both guide hairs and follow hairs
    // Follow hairs do not actually need this bone data, since they are generated by offsetting in UpdateFollowHairVertices, so the BoneSkinningData entries at follow-hair slots are all 0 (see the figure below)
    // In fact, only the bone data used by guide hairs would need to be uploaded, not that of follow hairs, which would save some bandwidth; the index would then be globalStrandIndex/(g_NumFollowHairsPerGuideHair+1) instead of globalStrandIndex
    // ApplyVertexBoneSkinning covers bone skinning and is omitted here; see: https://github.com/GPUOpen-Effects/TressFX/blob/v4.1.0/src/Shaders/TressFXSimulation.hlsl#L549-L571
    // === At this point we have: InitialPos ===

    // position when this step starts. In other words, a position from the last step.
    float4 currentPos = sharedPos[indexForSharedMem] = g_HairVertexPositions[globalVertexIndex]; // currentPos comes from the previous frame's data
    // === At this point we have: InitialPos, CurrentPos ===

    GroupMemoryBarrierWithGroupSync(); // sharedPos in group-shared memory was written above, so wait here until every thread in the group reaches this statement before continuing

    float4 oldPos;
    oldPos = g_HairVertexPositionsPrev[globalVertexIndex]; // oldPos is the frame-before-previous data that was saved during the previous frame
    // === At this point we have: InitialPos, CurrentPos, OldPos ===

    // Fetch the damping coefficient; StrandType is not used yet, so GetDamping always returns g_Shape.x
    float dampingCoeff = GetDamping(strandType);

    // reset if we got teleported
    if (g_ResetPositions != 0.0f)
    {
        currentPos = initialPos;
        oldPos = initialPos;
    }

    // The two root particles do not move. The w component of currentPos is 0.0 or 1.0, and it is 0.0 for the two root particles
    // The value comes from position.w of the vertices in g_InitialHairPositions and is already set up on the C++ side before upload
    // bool IsMovable(float4 particle)
    // {
    //     if ( particle.w > 0 )
    //         return true;
    //     return false;
    // }
    // TressFX does not really need a dedicated w component for this, because the GPU side has already worked out whether a vertex is one of the two root particles:
    // if (localVertexIndex == 0 || localVertexIndex == 1) { strand root } else { not strand root }
    // Integrate computes the new position with the damped Verlet free-fall formula
    if ( IsMovable(currentPos) )
        sharedPos[indexForSharedMem].xyz = Integrate(currentPos.xyz, oldPos.xyz, initialPos.xyz, dampingCoeff);
    else
        sharedPos[indexForSharedMem] = initialPos;

    ...
}

(Figure: the RenderDoc frame capture also shows that follow hairs do not need bone data)

// Gravity integration, i.e. the damped Verlet free-fall formula given earlier
float3 Integrate(float3 curPosition, float3 oldPosition, float3 initialPos, float dampingCoeff = 1.0f)
{
    float3 force = g_GravityMagnitude * float3(0, -1.0f, 0);
    float decay = exp(-dampingCoeff * g_TimeStep * 60.0f);
    return curPosition + decay * (curPosition - oldPosition) + force * g_TimeStep * g_TimeStep;
}
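
As a worked example of this integration step, the following C++ sketch reproduces the arithmetic of Integrate on the CPU. It passes the gravity magnitude and time step as explicit arguments instead of reading them from the constant buffer; `IntegrateDampedVerlet` is a hypothetical name for this illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// One damped Verlet step with gravity along -y, following the shader's formula:
// next = cur + exp(-damping * dt * 60) * (cur - old) + g * dt^2
Vec3 IntegrateDampedVerlet(const Vec3& cur, const Vec3& old, float dampingCoeff,
                           float gravityMagnitude, float timeStep) {
    const Vec3 force = {0.0f, -gravityMagnitude, 0.0f};
    // The factor 60 makes the decay per step equal exp(-dampingCoeff) at 60 FPS.
    const float decay = std::exp(-dampingCoeff * timeStep * 60.0f);
    Vec3 next;
    for (std::size_t i = 0; i < 3; ++i)
        next[i] = cur[i] + decay * (cur[i] - old[i]) + force[i] * timeStep * timeStep;
    return next;
}
```

A particle at rest (cur equal to old) only picks up the gravity term, so with gravity 9.8 and a 1/60 s step it drops by about 0.0027 along y in one step. With zero damping and zero gravity the previous displacement is carried over unchanged.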

Next comes the implementation of the global shape constraint.

{   // IntegrationAndGlobalShapeConstraints
    ...

    // Global Shape Constraints
    float stiffnessForGlobalShapeMatching = GetGlobalStiffness(strandType); // stiffness of the global shape constraint, g_Shape.z
    float globalShapeMatchingEffectiveRange = GetGlobalRange(strandType);   // range of the global shape constraint, g_Shape.w

    if ( stiffnessForGlobalShapeMatching > 0 && globalShapeMatchingEffectiveRange )
    {
        // This check could be replaced by: if (localVertexIndex != 0 && localVertexIndex != 1), which would avoid storing an is-root flag in the w component of the position
        if ( IsMovable(sharedPos[indexForSharedMem]) )
        {
            // Check whether localVertexIndex lies within the affected range (Range*numVerticesInTheStrand), i.e. Range is a ratio in [0, 1]
            if ( (float)localVertexIndex < globalShapeMatchingEffectiveRange * (float)numVerticesInTheStrand )
            {
                float factor = stiffnessForGlobalShapeMatching;
                float3 del = factor * (initialPos - sharedPos[indexForSharedMem]).xyz;
                sharedPos[indexForSharedMem].xyz += del; // the global shape compensation formula given earlier: CurrentPos plus a compensation toward InitialPos
            }
        }
    }

    // Write back the vertex data, including the previous-frame and frame-before-previous data
    UpdateFinalVertexPositions(currentPos, sharedPos[indexForSharedMem], globalVertexIndex, localVertexIndex, numVerticesInTheStrand);
}

void UpdateFinalVertexPositions(float4 oldPosition, float4 newPosition, int globalVertexIndex, int localVertexIndex, int numVerticesInTheStrand)
{
    g_HairVertexPositionsPrevPrev[globalVertexIndex] = g_HairVertexPositionsPrev[globalVertexIndex];
    g_HairVertexPositionsPrev[globalVertexIndex] = oldPosition;
    g_HairVertexPositions[globalVertexIndex] = newPosition;
}

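Finally, TressFX writes the data to g_HairVertexPositions, g_HairVertexPositionsPrev and g_HairVertexPositionsPrevPrev, which in turn serve as input to the subsequent passes.

Velocity shock propagation

Velocity shock propagation (translated directly from the TressFX term: propagate velocity shock (VSP)) describes the following: when the acceleration of the hair-attached object changes, the hair is dragged along with an acceleration of its own, a force acts on it, and the hair flutters as a result. For example, when a player character with long hair suddenly accelerates forward, the hair is flung up; when the player shakes their head, the hair shakes too. The change (shock) in velocity of the attached object propagates from the scalp into the hair. This step needs 2 passes per hair model.

The first pass computes the change in transform and the instantaneous acceleration of the hair-attached object. Because the two root vertices of a strand cannot move relative to the character by default, they are exactly what is needed to compute the overall transform of the whole strand, so this pass only needs to process each guide hair strand.

(Figure: VSP pass diagram 1)

The second pass applies the effect to every vertex of every strand.

(Figure: VSP pass diagram 2)

The first pass stores the computed overall transform data in g_StrandLevelData for use by the second pass.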

struct StrandLevelData
{
    // Quat = Quaternion
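    float4 skinningQuat;   // rotation from the bone transform
    float4 vspQuat;        // rotation change of the hair-attached object
    float4 vspTranslation; // translation change of the hair-attached object; the w component stores the acceleration influence factor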
};

[[vk::binding(4, 1)]] RWStructuredBuffer<StrandLevelData> g_StrandLevelData : register(u4, space1);

Let's look at the first pass first; its shader entry point is CalculateStrandLevelData.

//--------------------------------------------------------------------------------------
//
//  Calculate Strand Level Data
//
//  Propagate velocity shock resulted by attached based mesh
//
// One thread computes one strand.
//
//--------------------------------------------------------------------------------------
[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void CalculateStrandLevelData(uint GIndex : SV_GroupIndex,
    uint3 GId : SV_GroupID, uint3 DTid : SV_DispatchThreadID)
{
    uint local_id, group_id, globalStrandIndex, numVerticesInTheStrand, globalRootVertexIndex, strandType;
    CalcIndicesInStrandLevelMaster(GIndex, GId.x, globalStrandIndex, numVerticesInTheStrand, globalRootVertexIndex, strandType);
    ...
}

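As mentioned above, this pass only needs to process each guide hair strand, so it calls CalcIndicesInStrandLevelMaster (in the previous subsection we studied CalcIndicesInVertexLevelMaster, which computes indices for every vertex of every strand).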

void CalcIndicesInStrandLevelMaster(uint local_id, uint group_id,
    inout uint globalStrandIndex, inout uint numVerticesInTheStrand, inout uint globalRootVertexIndex, inout uint strandType)
{
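    // Globally unique strand index. One thread handles one strand here, so the guide strand index is THREAD_GROUP_SIZE * group_id + local_id;
    // multiplying by (g_NumFollowHairsPerGuideHair+1) then skips the follow strands, as in CalcIndicesInVertexLevelMaster
    globalStrandIndex = THREAD_GROUP_SIZE * group_id + local_id;
    globalStrandIndex *= (g_NumFollowHairsPerGuideHair+1);

    // Number of vertices per strand, computed as in CalcIndicesInVertexLevelMaster
    numVerticesInTheStrand = (THREAD_GROUP_SIZE / g_NumOfStrandsPerThreadGroup);

    // Globally unique index of the root-most vertex of the current strand
    // Note: as mentioned for CalcIndicesInVertexLevelMaster, the threads that process the vertices of one strand are not consecutive (see the comments in that function above).
    // Do not confuse the two: only the processing threads (their indices) are non-consecutive, while the vertex data of one strand is stored contiguously in GPU memory!
    // So g_HairVertexPositions[globalRootVertexIndex] is the position of the first root vertex of the current strand,
    // and g_HairVertexPositions[globalRootVertexIndex+1] is the position of the second root vertex of the current strand.
    globalRootVertexIndex = globalStrandIndex * numVerticesInTheStrand;

    // unused, always returns 0
    strandType = GetStrandType(globalStrandIndex);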
}
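
The point about interleaved threads versus contiguous storage can be verified with a short C++ sketch that mirrors the two index computations. The function names are hypothetical, and the integer division `localId / strandsPerGroup` is used as an equivalent of `(local_id - localStrandIndex) / g_NumOfStrandsPerThreadGroup`.

```cpp
constexpr unsigned kThreadGroupSize = 64;  // THREAD_GROUP_SIZE in TressFX

// Vertex-level dispatch: the global vertex index handled by one thread.
unsigned VertexIndexForThread(unsigned localId, unsigned groupId,
                              unsigned strandsPerGroup, unsigned followPerGuide) {
    const unsigned vertsPerStrand = kThreadGroupSize / strandsPerGroup;
    const unsigned localStrand = localId % strandsPerGroup;
    const unsigned globalStrand =
        (groupId * strandsPerGroup + localStrand) * (followPerGuide + 1);
    const unsigned localVertex = localId / strandsPerGroup;
    return globalStrand * vertsPerStrand + localVertex;
}

// Strand-level dispatch: the global index of the root vertex handled by one thread.
unsigned RootVertexIndexForThread(unsigned localId, unsigned groupId,
                                  unsigned strandsPerGroup, unsigned followPerGuide) {
    const unsigned vertsPerStrand = kThreadGroupSize / strandsPerGroup;
    const unsigned globalStrand =
        (kThreadGroupSize * groupId + localId) * (followPerGuide + 1);
    return globalStrand * vertsPerStrand;
}
```

With 4 strands per group and no follow hairs, vertex-level threads 0, 4 and 8 of group 0 map to vertex slots 0, 1 and 2, so the strand is contiguous in memory even though its threads are 4 apart. Thread 1 starts the next strand at slot 16, which is the same root index that strand-level thread 1 computes.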

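Next, the overall transform and the instantaneous acceleration of the hair-attached object are computed.

The overall transform is illustrated in the figure below. Suppose the character was at its t1 position in the previous frame and has moved to its t2 position in the current frame. For the strand marked in the figure, the two root vertices where the strand joins the scalp always stay fixed relative to the character. Build a vector from the first root vertex to the second: at time t1 this vector is u, and at time t2 it is v.

(Figure: overall transform)

From this, the displacement of the strand's root vertex is the translation change (vspTranslation), and the change from vector u to vector v is the rotation change (vspQuaternion), which can be recorded as a quaternion.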
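
To illustrate how a rotation from u to v and the matching translation can be obtained, here is a self-contained C++ sketch. It uses the common shortest-arc construction q = normalize(cross(u, v), 1 + dot(u, v)); this is an assumption for illustration and not necessarily how TressFX's QuatFromTwoUnitVectors and MultQuaternionAndVector are implemented. All names below are hypothetical.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;
struct Quat { float x, y, z, w; };

Vec3 Cross(const Vec3& a, const Vec3& b) {
    return Vec3{a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
float Dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Shortest-arc rotation taking unit vector u to unit vector v.
Quat QuatFromUnitVectors(const Vec3& u, const Vec3& v) {
    const float d = Dot(u, v);
    if (d < -0.999999f) {
        // u and v are opposite: rotate 180 degrees about any axis perpendicular to u.
        Vec3 axis = std::fabs(u[0]) < 0.9f ? Cross(u, Vec3{1.0f, 0.0f, 0.0f})
                                           : Cross(u, Vec3{0.0f, 1.0f, 0.0f});
        const float len = std::sqrt(Dot(axis, axis));
        return Quat{axis[0] / len, axis[1] / len, axis[2] / len, 0.0f};
    }
    const Vec3 c = Cross(u, v);
    const float w = 1.0f + d;
    const float len = std::sqrt(Dot(c, c) + w * w);
    return Quat{c[0] / len, c[1] / len, c[2] / len, w / len};
}

// Rotates p by unit quaternion q: p' = p + w * t + cross(q.xyz, t), t = 2 * cross(q.xyz, p).
Vec3 Rotate(const Quat& q, const Vec3& p) {
    const Vec3 qv = {q.x, q.y, q.z};
    Vec3 t = Cross(qv, p);
    for (float& c : t) c *= 2.0f;
    const Vec3 qt = Cross(qv, t);
    return Vec3{p[0] + q.w * t[0] + qt[0], p[1] + q.w * t[1] + qt[1], p[2] + q.w * t[2] + qt[2]};
}

// Translation that, after rotating the old root, lands on the new root.
Vec3 RigidTranslation(const Quat& rot, const Vec3& oldRoot, const Vec3& newRoot) {
    const Vec3 r = Rotate(rot, oldRoot);
    return Vec3{newRoot[0] - r[0], newRoot[1] - r[1], newRoot[2] - r[2]};
}
```

With u along +x and v along +y, the quaternion is a 90 degree rotation about +z. An old root at (1, 0, 0) rotates to (0, 1, 0), so a new root at (0, 3, 0) yields a translation of (0, 2, 0).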

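With the positions (and hence the translation changes) from the two previous frames and the current frame, an instantaneous acceleration can be computed.

(Figure: instantaneous acceleration formula)

TimeStep also appeared in the gravity computation. Because the interval between frames is small enough, this formula can be treated as equivalent to the formula for instantaneous acceleration.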
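
The acceleration estimate can be sketched as a second finite difference over three consecutive frames. The C++ below is an illustration under that assumption; the function names are hypothetical. `PseudoAcceleration` is the raw length of the difference, and `Acceleration` additionally divides by TimeStep squared to get physical units.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// Second finite difference of position: newPos - 2 * oldPos + oldOldPos.
Vec3 SecondDifference(const Vec3& newPos, const Vec3& oldPos, const Vec3& oldOldPos) {
    Vec3 d;
    for (std::size_t i = 0; i < 3; ++i)
        d[i] = newPos[i] - 2.0f * oldPos[i] + oldOldPos[i];
    return d;
}

float Length(const Vec3& v) {
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// Length of the raw difference, in position units (not divided by dt^2).
float PseudoAcceleration(const Vec3& newPos, const Vec3& oldPos, const Vec3& oldOldPos) {
    return Length(SecondDifference(newPos, oldPos, oldOldPos));
}

// Physical acceleration estimate: second difference divided by dt^2.
float Acceleration(const Vec3& newPos, const Vec3& oldPos, const Vec3& oldOldPos, float timeStep) {
    return PseudoAcceleration(newPos, oldPos, oldOldPos) / (timeStep * timeStep);
}
```

For a point moving as x(t) = 0.5 * 2 * t^2 sampled at t = 0, 0.5 and 1, the positions are 0, 0.25 and 1; the second difference is 0.5, and dividing by 0.5^2 recovers the acceleration 2.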

Next, let's look at the shader implementation in TressFX.

{   // CalculateStrandLevelData
    ...

    float4 pos_old_old[2]; // previous previous positions for vertex 0 (root) and vertex 1.
    float4 pos_old[2]; // previous positions for vertex 0 (root) and vertex 1.
    float4 pos_new[2]; // current positions for vertex 0 (root) and vertex 1.

    pos_old_old[0] = g_HairVertexPositionsPrevPrev[globalRootVertexIndex];
    pos_old_old[1] = g_HairVertexPositionsPrevPrev[globalRootVertexIndex + 1];

    pos_old[0] = g_HairVertexPositionsPrev[globalRootVertexIndex];
    pos_old[1] = g_HairVertexPositionsPrev[globalRootVertexIndex + 1];

    pos_new[0] = g_HairVertexPositions[globalRootVertexIndex];
    pos_new[1] = g_HairVertexPositions[globalRootVertexIndex + 1];

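    float3 u = normalize(pos_old[1].xyz - pos_old[0].xyz); // vector u at time t1 (previous frame)
    float3 v = normalize(pos_new[1].xyz - pos_new[0].xyz); // vector v at time t2 (current frame)

    // Compute rotation and translation which transform pos_old to pos_new.
    // Since the first two vertices are immovable, we can assume that there is no scaling during transform.
    float4 rot = QuatFromTwoUnitVectors(u, v);                                    // rotation change computed from the u and v vectors
    float3 trans = pos_new[0].xyz - MultQuaternionAndVector(rot, pos_old[0].xyz); // because of the rotation, first apply it to the previous-frame vertex position, then take the difference to the current-frame position to get the translation change

    float vspCoeff = GetVelocityShockPropogation();    // TressFX's comment translates literally as "velocity propagation coefficient"; "acceleration influence factor" is arguably a better name. The value comes from g_VSP.x, sent down from the CPU side
    float vspAccelThreshold  = GetVSPAccelThreshold(); // acceleration threshold, taken from g_VSP.y, sent down from the CPU side

    // Increase the VSP coefficient by checking pseudo-acceleration to handle over-stretching when the character moves very fast
    float accel = length(pos_new[1] - 2.0 * pos_old[1] + pos_old_old[1]); // pseudo-acceleration from the instantaneous acceleration formula: the second difference of positions, not divided by TimeStep squared

    // When the computed acceleration exceeds the given threshold, TressFX sets the acceleration influence factor to 1.0f, i.e. the acceleration has no effect at all
    // (its influence on the hair is not simulated and the hair simply follows the attachment rigidly).
    // I think the design consideration is this: suppose the scene contains a teleporter that moves the player to another position on the map, producing a very large position change.
    // The acceleration computed by the formula above would then be huge and the hair system's result would not be trustworthy; clamping to 1.0f avoids this situation.
    if (accel > vspAccelThreshold) // expose this value?
        vspCoeff = 1.0f;

    // The acceleration and the acceleration influence factor here are also a small opening for improving the effect. In the current TressFX implementation the factor is either 1.0f or
    // the value tuned by the technical artist and returned by GetVelocityShockPropogation(); the effect of the acceleration could be refined here to adjust the factor dynamically.
    // An article published by the TressFX team also adds a lower threshold:
    // if (accel < vspAccelThresholdMin) { vspCoeff = 0.9f; }
    // In that project the vspCoeff tunable by technical artists was limited to 0.0f to 0.9f. When the acceleration is too small, i.e. the character moves at constant or nearly constant speed,
    // the hair should not flutter too visibly, so the factor is set to a large value to tone the fluttering down. The exact conditions can be customized to the needs of a given project.

    // Written to g_StrandLevelData for use by the next pass
    g_StrandLevelData[globalStrandIndex].vspQuat = rot;
    g_StrandLevelData[globalStrandIndex].vspTranslation = float4(trans, vspCoeff);

    // The part below computes the rotation caused by the bone transform and also stores it in g_StrandLevelData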

    // skinning

    // Copy data into shared memory
    float4 initialPos = g_InitialHairPositions[globalRootVertexIndex]; // rest position

    // Apply bone skinning to initial position
    BoneSkinningData skinningData = g_BoneSkinningData[globalStrandIndex];

    float4 bone_quat;
    initialPos.xyz = ApplyVertexBoneSkinning(initialPos.xyz, skinningData, bone_quat);

    g_StrandLevelData[globalStrandIndex].skinningQuat = bone_quat;
}

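At this point we also know which constant buffer parameters this pass needs.

float4 g_VSP
g_VSP.x: velocity shock propagation coefficient / acceleration influence factor
g_VSP.y: acceleration threshold
int4 g_Counts
g_Counts.x -> g_NumOfStrandsPerThreadGroup: num strands per thread group
g_Counts.y -> g_NumFollowHairsPerGuideHair: num follow hairs per guide hair
float4x4 g_BoneSkinningMatrix array: the inverse bind matrices of the bones

TressFX sets the acceleration influence factor to 0.8f by default. At the end of the current pass's shader it is stored into g_StrandLevelData[globalStrandIndex].vspTranslation.w, and in the next pass we will see that the larger its value, the less easily the strand flutters.

The second pass applies the effect of the attached object's transform to every vertex of every strand; its shader entry point is VelocityShockPropagation. Apart from g_Counts, which is used in CalcIndicesInVertexLevelMaster, this pass uses no other constant buffer parameters; its parameters come from the g_StrandLevelData written by the previous pass.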

//--------------------------------------------------------------------------------------
//
//  VelocityShockPropagation
//
//  Propagate velocity shock resulted by attached based mesh
//
// One thread computes one vertex.
//
//--------------------------------------------------------------------------------------
[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void VelocityShockPropagation(uint GIndex : SV_GroupIndex,
    uint3 GId : SV_GroupID, uint3 DTid : SV_DispatchThreadID)
{
    uint globalStrandIndex, localStrandIndex, globalVertexIndex, localVertexIndex;
    uint numVerticesInTheStrand, indexForSharedMem, strandType;
    CalcIndicesInVertexLevelMaster(GIndex, GId.x,
        globalStrandIndex, localStrandIndex,
        globalVertexIndex, localVertexIndex,
        numVerticesInTheStrand, indexForSharedMem, strandType);

    if (localVertexIndex < 2)
        return;

    float4 vspQuat = g_StrandLevelData[globalStrandIndex].vspQuat;
    float4 vspTrans = g_StrandLevelData[globalStrandIndex].vspTranslation;
    float vspCoeff = vspTrans.w;

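    // The code above this line is not annotated in detail. Here we fetch the current-frame and previous-frame positions of the vertex being processed.
    float4 pos_new_n = g_HairVertexPositions[globalVertexIndex];
    float4 pos_old_n = g_HairVertexPositionsPrev[globalVertexIndex];

    // For every vertex of a strand other than the two root vertices, the new position is determined by two parts:
    // One part is: (pos_new_n.xyz)
    //  This is where the vertex should be after gravity and the global shape constraint; call this position P
    // The other part is: (MultQuaternionAndVector(vspQuat, pos_new_n.xyz) + vspTrans.xyz)
    //  This applies the rotation and translation of the hair-attached object to P; call this position Q
    // Finally, the two positions are interpolated with weights (1.f-vspCoeff) and (vspCoeff) to get the final value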
    pos_new_n.xyz = (1.f - vspCoeff) * pos_new_n.xyz + vspCoeff * (MultQuaternionAndVector(vspQuat, pos_new_n.xyz) + vspTrans.xyz);
    pos_old_n.xyz = (1.f - vspCoeff) * pos_old_n.xyz + vspCoeff * (MultQuaternionAndVector(vspQuat, pos_old_n.xyz) + vspTrans.xyz);

    g_HairVertexPositions[globalVertexIndex].xyz = pos_new_n.xyz;
    g_HairVertexPositionsPrev[globalVertexIndex].xyz = pos_old_n.xyz;
}

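As the comments in the code above explain, after this pass the new position is: (1.0f-vspCoeff) * P + vspCoeff * Q

Here P is the position of the strand when it is completely unaffected by the transform of the hair-attached object, and Q is the position when the connection between the strand and the attachment is treated as fully rigid, so the strand is completely driven by the attachment's transform. Hence the larger the acceleration influence factor vspCoeff, the more rigid the connection between strand and attachment and the less easily the strand flutters. When vspCoeff=1.0f, the weight (1.0f-vspCoeff) on the result of gravity and the global shape constraint is 0 (gravity and the global shape constraint have no effect); the strand becomes a rigid body, rigidly connected to the attachment. Note that although vspCoeff=1.0f removes the effect of gravity and the global shape constraint, so that the strand translates or rotates rigidly with the attachment, the later local shape constraint, length constraint and wind simulation still act on the strand. Judging by the final result in the engine, setting vspCoeff to 1.0f therefore does not stop the hair from fluttering entirely.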
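
The blend between P and Q is a plain linear interpolation, sketched below in C++. `BlendVsp` is a hypothetical helper for illustration; Q is assumed to have been computed beforehand by applying the attachment's rotation and translation to P.

```cpp
#include <array>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// Interpolates between the simulated position p and the rigidly transformed
// position q: result = (1 - vspCoeff) * p + vspCoeff * q.
Vec3 BlendVsp(const Vec3& p, const Vec3& q, float vspCoeff) {
    Vec3 out;
    for (std::size_t i = 0; i < 3; ++i)
        out[i] = (1.0f - vspCoeff) * p[i] + vspCoeff * q[i];
    return out;
}
```

A coefficient of 0 returns P unchanged, a coefficient of 1 returns Q (fully rigid following), and 0.5 returns the midpoint of the two.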

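Having gone through the VSP implementation, we can see that when TressFX handles the effect of a velocity change, the only quantity that ends up changing the hair vertex positions is the position change. This is not physically correct and it stretches the strands. The later length constraint mitigates this to some extent, but when the hair's stiffness and damping are low while the acceleration and the sudden change in velocity are large, noticeable stretching still occurs.

Local shape constraints

The local shape constraint computes the new position of a vertex (i.e. particle) of a strand after it has been influenced by its neighboring particles. Every strand follows a mass-spring model. Character movement changes the strand length, and in most cases this leaves the strand stretched, so the analysis below assumes the stretched state; it carries over to the compressed case.

When computing the local shape constraint, the effect of gravity is ignored (it has already been handled in the gravity step) and every vertex of a strand is assumed to have the same mass. The analysis uses the first 5 vertices from the root; the computation for later vertices follows in the same way.

At rest length, the first 2 root vertices of the strand are O1 and O2 and the next 3 vertices are A, B and C; the first two root vertices always stay rigidly attached to the character. When the character moves and the strand stretches, the length of O1O2 stays the same while O2A, AB and BC get longer. O1 and O2 are the fixed end, so we first analyze the stretch of O2A from the viewpoint of O2; once the length change of O2A is determined, we analyze the stretch of the free segments AB and BC one by one, as shown in the figure below, and so on up to the tip.

(Figure: strand vertex changes, before and after the constraint)

From the figure, O2A is stretched by x1. Because O2 and A are assumed to have the same mass, the spring extension at the O2 end and at the A end should each be half of x1; but since O2 is fixed and does not move, A should move by the full x1. In its computation, however, TressFX keeps O2 still and moves A by only half of x1, the same as for a free end. Next consider the free segment AB: it is stretched by x2, the extension at the A end and at the B end is half of x2 each, and A and B each move by half of x2. For the following segment BC, B and C each move by half of x3, and so on up to the tip.

TressFX uses one pass to compute the local shape constraint, and it also uses the overall transform data g_StrandLevelData output during the VSP computation.

(Figure: pass diagram for local shape constraints)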

//--------------------------------------------------------------------------------------
//
//  LocalShapeConstraints
//
//  Compute shader to maintain the local shape constraints.
//
// One thread computes one strand.
//
//--------------------------------------------------------------------------------------
[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void LocalShapeConstraints(uint GIndex : SV_GroupIndex,
    uint3 GId : SV_GroupID, uint3 DTid : SV_DispatchThreadID)
{
    // One thread per strand, in parallel across strands: later segments of a strand depend on earlier ones, so vertex-level parallelism is not possible
    uint local_id, group_id, globalStrandIndex, numVerticesInTheStrand, globalRootVertexIndex, strandType;
    CalcIndicesInStrandLevelMaster(GIndex, GId.x, globalStrandIndex, numVerticesInTheStrand, globalRootVertexIndex, strandType);

    // stiffness for local shape constraints
    float stiffnessForLocalShapeMatching = GetLocalStiffness(strandType); // GetLocalStiffness returns g_Shape.y

    //1.0 for stiffness makes things unstable sometimes.
    stiffnessForLocalShapeMatching = 0.5f*min(stiffnessForLocalShapeMatching, 0.95f); // the 0.5 reflects that when the particles at both ends of a spring have equal mass, both ends take the same share, each half of the segment's total extension

    //--------------------------------------------
    // Local shape constraint for bending/twisting
    //--------------------------------------------
    {
        float4 boneQuat = g_StrandLevelData[globalStrandIndex].skinningQuat; // bone transform

        // vertex 1 through n-1
        // i.e. the 2nd through the nth vertex of the strand (1-based): the loop index runs from 1 to n-2 and each step also updates vertex index+1
        for (uint localVertexIndex = 1; localVertexIndex < numVerticesInTheStrand - 1; localVertexIndex++)
        {
            uint globalVertexIndex = globalRootVertexIndex + localVertexIndex;

            float4 pos = g_HairVertexPositions[globalVertexIndex];               // position of the current vertex
            float4 pos_plus_one = g_HairVertexPositions[globalVertexIndex + 1];  // position of the next vertex toward the tip
            float4 pos_minus_one = g_HairVertexPositions[globalVertexIndex - 1]; // position of the previous vertex toward the root

            // Initial positions of the current, next and previous vertex in world space (i.e. with the bone transform applied)
            float3 bindPos = MultQuaternionAndVector(boneQuat, g_InitialHairPositions[globalVertexIndex].xyz);
            float3 bindPos_plus_one = MultQuaternionAndVector(boneQuat, g_InitialHairPositions[globalVertexIndex + 1].xyz);
            float3 bindPos_minus_one = MultQuaternionAndVector(boneQuat, g_InitialHairPositions[globalVertexIndex - 1].xyz);

            float3 lastVec = pos.xyz - pos_minus_one.xyz; // vector from the previous vertex (toward the root) to the current vertex, i.e. the strand tangent direction at the previous vertex

            float4 invBone = InverseQuaternion(boneQuat); // this variable is unused
            float3 vecBindPose = bindPos_plus_one - bindPos;      // initial strand tangent vector at the current vertex
            float3 lastVecBindPose = bindPos - bindPos_minus_one; // initial strand tangent vector at the previous vertex
            // Rotation that takes the initial tangent at the previous vertex (lastVecBindPose) to its current direction (lastVec), i.e. how far the strand has rotated/twisted relative to its initial state
            float4 rotGlobal = QuatFromTwoUnitVectors(normalize(lastVecBindPose), normalize(lastVec));

            // vecBindPose is a vector with magnitude and direction; this line is: MultQuaternionAndVector(rotGlobal, (bindPos_plus_one - bindPos)) + pos.xyz
            // orgPos_i_plus_1_InGlobalFrame is where the next vertex should be if it were unaffected by gravity and the global shape constraint and by velocity shock propagation
            float3 orgPos_i_plus_1_InGlobalFrame = MultQuaternionAndVector(rotGlobal, vecBindPose) + pos.xyz;
            // (orgPos_i_plus_1_InGlobalFrame - pos_plus_one.xyz) is the difference delta between the unaffected and the affected position
            // Multiplying by the stiffness gives the local shape correction; the factor 0.5 for equal vertex masses was already applied above
            float3 del = stiffnessForLocalShapeMatching * (orgPos_i_plus_1_InGlobalFrame - pos_plus_one.xyz);

            if (IsMovable(pos))
                pos.xyz -= del.xyz; // apply to the current vertex if it is a free (movable) vertex

            if (IsMovable(pos_plus_one))
                pos_plus_one.xyz += del.xyz; // apply to the next vertex if it is a free (movable) vertex

            g_HairVertexPositions[globalVertexIndex].xyz = pos.xyz;
            g_HairVertexPositions[globalVertexIndex + 1].xyz = pos_plus_one.xyz;
        }
    }

    return;
}

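(Figure: local shape constraint reference diagram. Vector vec points from pos to pos_plus_one and vector lastVec points from pos_minus_one to pos, both before the bone transform; vecBindPose and lastVecBindPose are computed with the bone transform applied)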

From the code we can also see which constant buffer parameters this pass needs.

float4 g_Shape
g_Shape.y: LocalConstraintStiffness, the stiffness of the local shape constraint, default 0.8f
int4 g_Counts
g_Counts.x -> g_NumOfStrandsPerThreadGroup: num strands per thread group
g_Counts.y -> g_NumFollowHairsPerGuideHair: num follow hairs per guide hair

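Length constraints (and wind computation)

Physically, the length of a strand is constant (any change is imperceptible to the eye). The preceding physics computations, however, change the strand length. This change is unintended, a side effect of the TressFX physics model, and the length constraint mitigates it.

This step uses 1 pass per hair model with iterative computation, split into CPU iterations and GPU iterations: CPU iterations issue multiple Dispatch calls, while GPU iterations repeat the computation in a loop inside the shader.

(Figure: pass diagram for length constraints)

The length constraint algorithm used by TressFX is shown in the figure below:

(Figure: length constraint algorithm)

Because every vertex of a strand is assumed to have the same mass, all w are equal, and the expression simplifies to:

(Figure: length constraint formula)

where li is the current length of the strand segment and li0 is its rest length.

This is exactly what the ApplyDistanceConstraint function in the shader code does.

Looking at the algorithm, each smallest unit of constraint computation changes the positions of two vertices. Therefore the segments have to be split into an even set and an odd set that are each processed in parallel: after one set of pairs is computed, the threads synchronize before the other set is computed.

(Figure: parallelization of the length constraint)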
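
The distance constraint and the even/odd ordering can be sketched sequentially in C++. This is an illustration under stated assumptions, not the TressFX implementation: the weights are assumed to be 0.5 each when both vertices are movable and 1 for the movable one when the other is fixed, and the two inner loops stand in for the two synchronized parallel phases. `Particle`, `ApplyDistanceConstraint` and `RelaxStrand` are names chosen for this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Particle {
    float x, y, z;
    bool movable;  // false for the two root vertices
};

// Pulls p0 and p1 along their connecting line until they are restLength apart.
void ApplyDistanceConstraint(Particle& p0, Particle& p1, float restLength) {
    const float dx = p1.x - p0.x, dy = p1.y - p0.y, dz = p1.z - p0.z;
    const float dist = std::max(std::sqrt(dx * dx + dy * dy + dz * dz), 1e-7f);
    const float stretching = 1.0f - restLength / dist;  // > 0 when stretched
    float w0 = 0.0f, w1 = 0.0f;  // assumed weights, see the lead-in
    if (p0.movable && p1.movable) { w0 = 0.5f; w1 = 0.5f; }
    else if (p0.movable) { w0 = 1.0f; }
    else if (p1.movable) { w1 = 1.0f; }
    p0.x += w0 * stretching * dx; p0.y += w0 * stretching * dy; p0.z += w0 * stretching * dz;
    p1.x -= w1 * stretching * dx; p1.y -= w1 * stretching * dy; p1.z -= w1 * stretching * dz;
}

// restLengths[i] is the rest length of the segment between vertex i and i + 1.
void RelaxStrand(std::vector<Particle>& strand, const std::vector<float>& restLengths,
                 int iterations) {
    for (int it = 0; it < iterations; ++it) {
        // Even segments (v0v1, v2v3, ...) share no vertex, so they could run in parallel.
        for (std::size_t i = 0; i + 1 < strand.size(); i += 2)
            ApplyDistanceConstraint(strand[i], strand[i + 1], restLengths[i]);
        // A barrier would go here on the GPU, then the odd segments (v1v2, v3v4, ...).
        for (std::size_t i = 1; i + 1 < strand.size(); i += 2)
            ApplyDistanceConstraint(strand[i], strand[i + 1], restLengths[i]);
    }
}
```

Two movable particles at x = 0 and x = 2 with rest length 1 end up at x = 0.5 and x = 1.5 after one call. A three-vertex strand with a fixed root at x = 0 and stretched vertices at x = 2 and x = 4 converges toward x = 1 and x = 2 as the number of iterations grows, which is why several iterations are used.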

//--------------------------------------------------------------------------------------
//
//  LengthConstriantsWindAndCollision
//
//  Compute shader to move the vertex position based on wind, maintain the length constraints
//  and handles collisions.
//  Note: the collision handling here is hair-versus-capsule collision, a very simple and rough correction; it is not the same as the SDF-based collision correction!
//
// One thread computes one vertex.
//
//--------------------------------------------------------------------------------------
[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void LengthConstriantsWindAndCollision(uint GIndex : SV_GroupIndex,
    uint3 GId : SV_GroupID, uint3 DTid : SV_DispatchThreadID)
{
    uint globalStrandIndex, localStrandIndex, globalVertexIndex, localVertexIndex, numVerticesInTheStrand, indexForSharedMem, strandType;
    CalcIndicesInVertexLevelMaster(GIndex, GId.x, globalStrandIndex, localStrandIndex, globalVertexIndex, localVertexIndex, numVerticesInTheStrand, indexForSharedMem, strandType);

    uint numOfStrandsPerThreadGroup = g_NumOfStrandsPerThreadGroup;

    //------------------------------
    // Copy data into shared memory
    //------------------------------
    sharedPos[indexForSharedMem] = g_HairVertexPositions[globalVertexIndex];
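    sharedLength[indexForSharedMem] = g_HairRestLengthSRV[globalVertexIndex]; // rest length between each pair of adjacent vertices of a strand, computed on the CPU (see the cpp code listed later). It could also be computed on the GPU from the initial vertex positions.
    GroupMemoryBarrierWithGroupSync();

    //------------
    // Wind
    //------------
    // The wind computation is covered in detail in the next section; a quick look is enough here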
    if ( g_Wind.x != 0 || g_Wind.y != 0 || g_Wind.z != 0 )
    {
        float4 force = float4(0, 0, 0, 0);

        float frame = g_Wind.w;

        if ( localVertexIndex >= 2 && localVertexIndex < numVerticesInTheStrand-1 )
        {
            // combining four winds.
            float a = ((float)(globalStrandIndex % 20))/20.0f;
            float3  w = a*g_Wind.xyz + (1.0f-a)*g_Wind1.xyz + a*g_Wind2.xyz + (1.0f-a)*g_Wind3.xyz;

            uint sharedIndex = localVertexIndex * numOfStrandsPerThreadGroup + localStrandIndex;

            float3 v = sharedPos[sharedIndex].xyz - sharedPos[sharedIndex+numOfStrandsPerThreadGroup].xyz;
            float3 force = -cross(cross(v, w), v);
            sharedPos[sharedIndex].xyz += force*g_TimeStep*g_TimeStep;
        }
    }
    GroupMemoryBarrierWithGroupSync();

    //----------------------------
    // Enforce length constraints
    //----------------------------
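    uint a = floor(numVerticesInTheStrand/2.0f);     // middle of the strand, toward the tip; also the number of even segments. With 32 vertices per strand, a = 16
    uint b = floor((numVerticesInTheStrand-1)/2.0f); // middle of the strand, toward the root; also the number of odd segments. With 32 vertices per strand, b = 15

    // Returns g_SimInts.x, the number of loop iterations on the GPU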
    int nLengthContraintIterations = GetLengthConstraintIterations();

    for ( int iterationE=0; iterationE < nLengthContraintIterations; iterationE++ )
    {
        uint sharedIndex = 2*localVertexIndex * numOfStrandsPerThreadGroup + localStrandIndex;
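        // By default TressFX uses 64 threads per THREAD_GROUP. With the 32 vertices per strand assumed above, numOfStrandsPerThreadGroup is 64/32 = 2.

        // This sharedIndex differs from the one in the wind computation: it carries an extra factor of 2.
        // The two if checks below (localVertexIndex < a and localVertexIndex < b) filter out the threads of the second half of the strand,
        // so only the threads of the first-half vertices do any work (run the if bodies); the rest skip straight through.
        // Because of the extra factor of 2, each working thread actually handles two consecutive segments of the strand, one even and one odd. It is a little convoluted.

        // Continuing the example: the 32 vertices of a strand map to 32 threads, and only the threads of the 16 vertices nearest the root enter the if bodies.
        // Each of these 16 threads handles a consecutive even/odd pair of segments, as follows:
        // Thread:      t0         t1         t2         t3      ...       t13            t14        t15
        // Segments: v0v1&v1v2  v2v3&v3v4  v4v5&v5v6  v6v7&v7v8  ...  v26v27&v27v28  v28v29&v29v30  v30v31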

        if( localVertexIndex < a )
            ApplyDistanceConstraint( // length constraint on the even segment starting at this vertex (e.g. v0v1); ApplyDistanceConstraint is analyzed below
                sharedPos[sharedIndex],                            // current vertex
                sharedPos[sharedIndex+numOfStrandsPerThreadGroup], // next vertex (toward the tip)
                sharedLength[sharedIndex].x);                      // rest length of this segment

        GroupMemoryBarrierWithGroupSync(); // must synchronize before the odd segments

        if( localVertexIndex < b )
            ApplyDistanceConstraint( // length constraint on the odd segment starting at the next vertex (e.g. v1v2)
                sharedPos[sharedIndex+numOfStrandsPerThreadGroup],
                sharedPos[sharedIndex+numOfStrandsPerThreadGroup*2],
                sharedLength[sharedIndex+numOfStrandsPerThreadGroup].x);

        GroupMemoryBarrierWithGroupSync(); // must synchronize before the next iteration
    }

    //------------------------------------------
    // Collision handling with capsule objects
    //------------------------------------------
    float4 oldPos = g_HairVertexPositionsPrev[globalVertexIndex];
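    // A simple, crude collision correction against capsules. With the SDF-based collision correction in place it is not really needed, so it is skipped here. For ResolveCapsuleCollisions, see: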
    // https://github.com/GPUOpen-Effects/TressFX/blob/v4.1.0/src/Shaders/TressFXSimulation.hlsl#L831-L863
    bool bAnyColDetected = ResolveCapsuleCollisions(sharedPos[indexForSharedMem], oldPos);
    GroupMemoryBarrierWithGroupSync();

    //-------------------
    // Compute tangent
    //-------------------
    // If this is the last vertex in the strand, we can't get tangent from subtracting from the next vertex, need to use last vertex to current
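    // Compute the tangent direction at each strand vertex; it is used as the vertex tangent when the hair is rendered. The computation is simple:
    // the next vertex position minus the current vertex position, normalized to a unit vector. The last vertex of a strand has no next vertex, so its tangent is set equal to that of the second-to-last vertex.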
    uint indexForTangent = (localVertexIndex == numVerticesInTheStrand - 1) ? indexForSharedMem - numOfStrandsPerThreadGroup : indexForSharedMem;
    float3 tangent = sharedPos[indexForTangent + numOfStrandsPerThreadGroup].xyz - sharedPos[indexForTangent].xyz;
    g_HairVertexTangents[globalVertexIndex].xyz = normalize(tangent);

    //---------------------------------------
    // clamp velocities, rewrite history
    //---------------------------------------
    float3 positionDelta = sharedPos[indexForSharedMem].xyz - oldPos;
    float speedSqr = dot(positionDelta, positionDelta);
    if (speedSqr > g_ClampPositionDelta * g_ClampPositionDelta) {
        positionDelta *= g_ClampPositionDelta * g_ClampPositionDelta / speedSqr;
        g_HairVertexPositionsPrev[globalVertexIndex].xyz = sharedPos[indexForSharedMem].xyz - positionDelta; // adjusts the previous-frame vertex position
    }

    //---------------------------------------
    // update global position buffers
    //---------------------------------------
    g_HairVertexPositions[globalVertexIndex] = sharedPos[indexForSharedMem];

    if (bAnyColDetected)
        g_HairVertexPositionsPrev[globalVertexIndex] = sharedPos[indexForSharedMem];

    return;
}

void ApplyDistanceConstraint(inout float4 pos0, inout float4 pos1, float targetDistance, float stiffness = 1.0)
// TressFX does not expose stiffness for tuning; it is always 1.0
{   // Applies the length constraint, implementing the formula given earlier
    float3 delta = pos1.xyz - pos0.xyz;
    float distance = max(length(delta), 1e-7);
    float stretching = 1 - targetDistance / distance;
    delta = stretching * delta;

    float2 multiplier = ConstraintMultiplier(pos0, pos1); // see below; easy to follow

    pos0.xyz += multiplier[0] * delta * stiffness;
    pos1.xyz -= multiplier[1] * delta * stiffness;
}

float2 ConstraintMultiplier(float4 particle0, float4 particle1)
{
    if (IsMovable(particle0))
    {
        if (IsMovable(particle1))
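            return float2(0.5, 0.5); // both endpoints can move: split the correction evenly
        else
            return float2(1, 0);     // only particle0 can move: it takes the whole correction
    }
    else
    {
        if (IsMovable(particle1))
            return float2(0, 1);     // only particle1 can move (e.g. the segment right after the pinned root vertices): it takes the whole correction
        else
            return float2(0, 0);     // neither endpoint can move: no correction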
    }
}

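ComputeRestLengths computes, on the CPU, the length between each pair of consecutive vertices on every strand. These lengths could also be computed on the GPU from the initial vertex positions, but TressFX precomputes them on the CPU and uploads them to the GPU for direct use.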

void TressFXAsset::ComputeRestLengths()
{
    Vector3* pos = (Vector3*)m_positions.data();
    float* restLen = (float*)m_restLengths.data();

    int index = 0;

    // Calculate rest lengths
    for (int i = 0; i < m_numTotalStrands; i++)
    {
        int indexRootVert = i * m_numVerticesPerStrand;

        for (int j = 0; j < m_numVerticesPerStrand - 1; j++)
        {
            restLen[index++] = (pos[indexRootVert + j] - pos[indexRootVert + j + 1]).Length();
        }

        // Since number of edges are one less than number of vertices in hair strand, below
        // line acts as a placeholder.
        restLen[index++] = 0;
    }
}

The only new constant buffer parameter this pass uses is g_SimInts.x:

float4 g_SimInts
g_SimInts.x: number of loop iterations on the GPU
int4 g_Counts
g_Counts.x -> g_NumOfStrandsPerThreadGroup: num strands per thread group
g_Counts.y -> g_NumFollowHairsPerGuideHair: num follow hairs per guide hair

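Wind field computation

The wind computation handles how hair behaves in windy conditions, such as natural wind or a hair dryer. The wind field is the four-sided pyramid shown below, which TressFX calls pyramid wind.

Pyramid wind field

We already saw the wind shader code in the previous section; here it is again on its own: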

...

if ( localVertexIndex >= 2 && localVertexIndex < numVerticesInTheStrand-1 )
{
    // combining four winds.
    float a = ((float)(globalStrandIndex % 20))/20.0f;
    float3  w = a*g_Wind.xyz + (1.0f-a)*g_Wind1.xyz + a*g_Wind2.xyz + (1.0f-a)*g_Wind3.xyz;

    uint sharedIndex = localVertexIndex * numOfStrandsPerThreadGroup + localStrandIndex;

    float3 v = sharedPos[sharedIndex].xyz - sharedPos[sharedIndex+numOfStrandsPerThreadGroup].xyz;
    float3 force = -cross(cross(v, w), v);
    sharedPos[sharedIndex].xyz += force*g_TimeStep*g_TimeStep;
}

...

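The corresponding equation is:

Wind field algorithm

To pick a wind direction "randomly and uniformly" inside the pyramid, TressFX introduces a value a, generated as an extremely simple pseudo-random number by taking the global strand index modulo 20 (the number of vertices on a strand will not be a multiple of 20). It then blends the four edge directions of the pyramid with a to obtain a wind direction within the pyramid, and uses this blended direction as the wind for the current vertex. Since a depends only on the strand index, every vertex of a strand sees the same blended wind.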

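TressFX puts the wind computation in the same pass as the length constraints. I have not figured out why the two were merged: the length constraints have an iteration count and this pass is dispatched multiple times accordingly, so placing the wind computation in this pass means the wind is also applied multiple times. Yet g_TimeStep is not divided by the iteration count, so the wind computation effectively skips frames. I think the wind computation should be split out into its own pass.

In addition, when computing the displacement of a hair vertex caused by the wind, the shader takes the cross product of the hair tangent vector v and the wind direction vector w, then crosses the result with the tangent v again to get the final displacement direction, as shown in the figure below.

Displacement direction of a strand vertex under the wind field

The resulting direction is correct, but TressFX uses the xyz components of this vector directly as the strength of the wind influence, and the component values of a cross product (note that this is not the norm) have no real physical meaning. As a result the wind field in TressFX is broken and has no visible effect. We have to separate the two, computing the wind strength and the direction separately and then combining them. The issue has been reported to the open-source repository: https://github.com/GPUOpen-Effects/TressFX/issues/47. The modified shader is shown below.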

...

if ( localVertexIndex >= 2 && localVertexIndex < numVerticesInTheStrand-1 )
{
    ...

    float3 v = sharedPos[sharedIndex].xyz - sharedPos[sharedIndex+numOfStrandsPerThreadGroup].xyz;
    float force_mul = length(v) * length(w); // FIXBUG!! extract the magnitude
    v = normalize(v); // FIXBUG!! use the normalized vector
    w = normalize(w); // FIXBUG!! use the normalized vector

    float3 force = -cross(cross(v, w), v);
    force *= force_mul; // FIXBUG!! recombine magnitude and direction
    sharedPos[sharedIndex].xyz += force*g_TimeStep*g_TimeStep; // the wind force times the squared time step is used directly as the displacement; as I understand it, this is not physically correct
}

...
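To compare the two variants numerically, the double cross product can be evaluated on the CPU. By the vector triple product identity, -cross(cross(v, w), v) equals v*dot(v, w) - w*dot(v, v), which is always perpendicular to v. The sketch below is an illustration under that identity; Vec3 and both function names are made up for the example.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline float length(Vec3 v) { return std::sqrt(dot(v, v)); }
inline Vec3 normalize(Vec3 v) { return (1.0f / length(v)) * v; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Original shader expression: force = -cross(cross(v, w), v).
// Its length scales with |v|^2 * |w| * sin(angle between v and w).
inline Vec3 windForceOriginal(Vec3 v, Vec3 w) {
    return -1.0f * cross(cross(v, w), v);
}

// The article's modified version: direction from the normalized vectors,
// magnitude |v| * |w| applied afterwards.
inline Vec3 windForceSeparated(Vec3 v, Vec3 w) {
    const float magnitude = length(v) * length(w);
    const Vec3 direction = windForceOriginal(normalize(v), normalize(w));
    return magnitude * direction;
}
```

For v = (0, 2, 0) and w = (3, 0, 0) the original expression gives (-12, 0, 0) and the separated one gives (-6, 0, 0): the same direction, with the two results differing by a factor of |v|.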

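The shader uses the four vectors W1 to W4 directly (g_Wind, g_Wind1, g_Wind2 and g_Wind3 in the shader), corresponding to the four edges of the pyramid wind field. The only controls exposed to technical artists, however, are the wind magnitude, direction and spread angle, so these four vectors are generated on the CPU from the wind direction and spread angle, as in the code below.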

//////////////////// Quaternion.cpp ////////////////////

// Sets this quaternion to the rotation of angle_radian radians about axis
void Quaternion::SetRotation(const Vector3& axis, float angle_radian)
{
    // This function assumes that the axis vector has been normalized.
    float halfAng = 0.5f * angle_radian;
    float sinHalf = sin(halfAng);
    w             = cos(halfAng);
    x = sinHalf * axis.x;
    y = sinHalf * axis.y;
    z = sinHalf * axis.z;
}

//////////////////// TressFXHairObject.cpp ////////////////////

void TressFXHairObject::SetWind(const Vector3& windDir, float windMag, int frame)
{
    float wM = windMag * (pow(sin(frame * 0.01f), 2.0f) + 0.5f);

    Vector3 windDirN(windDir);
    windDirN.Normalize();

    Vector3 XAxis(1.0f, 0, 0);
    Vector3 xCrossW = XAxis.Cross(windDirN);

    Quaternion rotFromXAxisToWindDir;
    rotFromXAxisToWindDir.SetIdentity();

    float angle = asin(xCrossW.Length());

    if (angle > 0.001)
    {
        // Compute the rotation that takes the X axis (XAxis) to the wind direction windDirN and store it in rotFromXAxisToWindDir
        rotFromXAxisToWindDir.SetRotation(xCrossW.Normalize(), angle);
    }

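    // Note: the wind spread angle determines the opening angle of the pyramid wind field and should be exposed to technical artists for tuning, but the TressFX demo hard-codes it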
    float angleToWideWindCone = DEG_TO_RAD2(40.f);

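    // Compute the four edges of the pyramid wind field and write them to the constant buffer (SetWindCorner is listed below). The idea in TressFX is:
    // apply the spread angle relative to the X axis first to get four edge vectors, then rotate those four vectors to the wind direction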
    SetWindCorner(rotFromXAxisToWindDir,
        Vector3(0, 1.0, 0),
        angleToWideWindCone,
        wM,
        m_SimCB[m_SimulationFrame % 2]->m_Wind);
    SetWindCorner(rotFromXAxisToWindDir,
        Vector3(0, -1.0, 0),
        angleToWideWindCone,
        wM,
        m_SimCB[m_SimulationFrame % 2]->m_Wind1);
    SetWindCorner(rotFromXAxisToWindDir,
        Vector3(0, 0, 1.0),
        angleToWideWindCone,
        wM,
        m_SimCB[m_SimulationFrame % 2]->m_Wind2);
    SetWindCorner(rotFromXAxisToWindDir,
        Vector3(0, 0, -1.0),
        angleToWideWindCone,
        wM,
        m_SimCB[m_SimulationFrame % 2]->m_Wind3);

    // fourth component unused. (used to store frame number, but no longer used).
}

// Wind is in a pyramid around the main wind direction.
// To add a random appearance, the shader will sample some direction
// within this cone based on the strand index.
// This function computes the vector for each edge of the pyramid.
static void SetWindCorner(Quaternion rotFromXAxisToWindDir,
                          Vector3 rotAxis,
                          float angleToWideWindCone,
                          float wM,
                          AMD::float4& outVec)
{
    static const Vector3 XAxis(1.0f, 0, 0);
    Quaternion rot(rotAxis, angleToWideWindCone);
    Vector3    newWindDir = rotFromXAxisToWindDir * rot * XAxis;
    outVec.x = newWindDir.x * wM;
    outVec.y = newWindDir.y * wM;
    outVec.z = newWindDir.z * wM;
    outVec.w = 0;  // unused.
}

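TressFX also plays a small trick with the wind magnitude: it evaluates a sin function of the current frame number so that a wind of a given strength fluctuates slightly within a range, which adds realism, as shown in the code below.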

void TressFXHairObject::SetWind(const Vector3& windDir, float windMag, int frame)
{
    float wM = windMag * (pow(sin(frame * 0.01f), 2.0f) + 0.5f); // sin squared lies in 0~1, so wM fluctuates within windMag * (0.5~1.5)

    ...
}
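The fluctuation is easy to check in isolation. The function below repeats the same expression outside of TressFXHairObject; the name windMagnitude is chosen for this example only.

```cpp
#include <cmath>

// Same expression as in SetWind: windMag * (sin^2(0.01 * frame) + 0.5).
inline float windMagnitude(float windMag, int frame) {
    const float s = std::sin(frame * 0.01f);
    return windMag * (s * s + 0.5f);
}
```

Because sin squared has period pi, the magnitude repeats every 100 * pi frames, roughly every 314 frames, and at frame 0 it starts at the lower bound 0.5 * windMag.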

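Having looked at the wind algorithm, we can tell that this approach is not physically correct; it is just a simple substitute. Normally force resolution comes early in a physics step, so in my understanding wind should be resolved right after gravity. But precisely because this is only a simple scheme, the ordering can be ignored.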

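Generating follow strands from guide strands

This step is straightforward: follow strands are generated from the guide strands to increase the strand count. The preceding computations only handle guide strands, and this step adds an offset to produce the additional follow strands.

Follow hairs (thin lines) generated from guide hairs (thick lines)

TressFX uses one pass for this generation step.

Pass diagram for generating follow strands from guide strands

Each thread processes one vertex of one guide strand.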

// One thread computes one vertex.
[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void UpdateFollowHairVertices(uint GIndex : SV_GroupIndex,
    uint3 GId : SV_GroupID, uint3 DTid : SV_DispatchThreadID)
{
    uint globalStrandIndex, localStrandIndex, globalVertexIndex, localVertexIndex;
    uint numVerticesInTheStrand, indexForSharedMem, strandType;
    CalcIndicesInVertexLevelMaster(GIndex, GId.x,
        globalStrandIndex, localStrandIndex,
        globalVertexIndex, localVertexIndex,
        numVerticesInTheStrand, indexForSharedMem, strandType);

    sharedPos[indexForSharedMem] = g_HairVertexPositions[globalVertexIndex];    // guide strand position
    sharedTangent[indexForSharedMem] = g_HairVertexTangents[globalVertexIndex]; // guide strand tangent
    GroupMemoryBarrierWithGroupSync();

    for ( uint i = 0; i < g_NumFollowHairsPerGuideHair; i++ ) // loop over each follow strand
    {
        int globalFollowVertexIndex = globalVertexIndex + numVerticesInTheStrand * (i + 1);
        int globalFollowStrandIndex = globalStrandIndex + i + 1;

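        // g_TipSeparationFactor is the artist-specified factor controlling how far the strand tips spread apart.
        // From (localVertexIndex / numVerticesInTheStrand) we can see that the closer a vertex is to the root, the smaller the effect: the first root vertex usually sits on the attached object,
        // has localVertexIndex 0 and is unaffected, and the effect grows toward the tip. The result resembles several hairs growing from one follicle (a human scalp follicle typically holds 2 to 3 hairs).
        // The added 1.0 keeps the factor at least 1.0. TipSeparationFactor is usually tuned within (0.0f, 1.0f); about 0.1f generally works for dense hair, and sparse hair can use a larger value.
        float factor = g_TipSeparationFactor*((float)localVertexIndex / (float)numVerticesInTheStrand) + 1.0f;
        // g_FollowHairRootOffset is the random, uniformly distributed offset generated on the CPU; multiplying it by the factor above gives the final offset of this vertex on this strand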
        float3 followPos = sharedPos[indexForSharedMem].xyz + factor*g_FollowHairRootOffset[globalFollowStrandIndex].xyz;
        g_HairVertexPositions[globalFollowVertexIndex].xyz = followPos;

        g_HairVertexTangents[globalFollowVertexIndex] = sharedTangent[indexForSharedMem];
    }
}

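The shader needs g_FollowHairRootOffset, a random, uniformly distributed offset generated on the CPU for each strand (m_followRootOffsets holds one entry per strand, and the shader applies it to every vertex of that strand). Follow strands generated from a guide strand must run approximately parallel to it. We therefore compute the tangent direction at the root of the guide strand, from its first two vertices, and use it to guide the generation of the follow strand offsets.

As the code below shows, g_FollowHairRootOffset is generated in the TressFXAsset::GenerateFollowHairs function.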

bool TressFXAsset::GenerateFollowHairs(int numFollowHairsPerGuideHair, float tipSeparationFactor, float maxRadiusAroundGuideHair)
{
    ...

    m_followRootOffsets.resize(m_numTotalStrands);

    // type-cast to Vector3 to handle data easily. 
    Vector3* pos = m_positions.data();
    Vector3* followOffset = m_followRootOffsets.data();

    // Generate follow hairs
    for (int i = 0; i < m_numGuideStrands; i++)
    {
        int indexGuideStrand = i * (m_numFollowStrandsPerGuide + 1);
        int indexRootVertMaster = indexGuideStrand * m_numVerticesPerStrand;

        memcpy(&pos[indexRootVertMaster], &positionsGuide[i*m_numVerticesPerStrand], sizeof(Vector3)*m_numVerticesPerStrand);
        m_strandUV[indexGuideStrand] = strandUVGuide[i];

        followOffset[indexGuideStrand].Set(0, 0, 0);                // the guide hair needs no offset
        followOffset[indexGuideStrand].w = (float)indexGuideStrand; // id of the guide strand this followRootOffset belongs to

        Vector3 v01 = pos[indexRootVertMaster + 1] - pos[indexRootVertMaster]; // tangent direction
        v01.Normalize();                                                       // unit tangent vector after normalization

        // Find two orthogonal unit tangent vectors to v01
        Vector3 t0, t1;
        GetTangentVectors(v01, t0, t1);
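        // GetTangentVectors returns two unit vectors that are orthogonal to the root tangent v01 and to each other. See:
        // https://github.com/GPUOpen-Effects/TressFX/blob/v4.1.0/src/TressFX/TressFXAsset.cpp#L37-L63
        // Only in this orthogonal frame can we express offsets that keep a follow strand parallel to the tangent direction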

        for (int j = 0; j < m_numFollowStrandsPerGuide; j++)
        {
            int indexStrandFollow = indexGuideStrand + j + 1;
            int indexRootVertFollow = indexStrandFollow * m_numVerticesPerStrand;

            m_strandUV[indexStrandFollow] = m_strandUV[indexGuideStrand];

            // offset vector from the guide strand's root vertex position
            Vector3 offset = GetRandom(-maxRadiusAroundGuideHair, maxRadiusAroundGuideHair) * t0 +
                             GetRandom(-maxRadiusAroundGuideHair, maxRadiusAroundGuideHair) * t1;
                             // GetRandom: static float GetRandom(float Min, float Max) {
                             //                return ((float(rand()) / float(RAND_MAX)) * (Max - Min)) + Min; }
            // Decomposed along the orthogonal directions u (t0) and v (t1), a random offset can be drawn independently in each direction, within ±maxRadiusAroundGuideHair,
            // so the generated follow strands stay parallel to the guide strand within a spread of maxRadiusAroundGuideHair
            followOffset[indexStrandFollow] = offset;
            followOffset[indexStrandFollow].w = (float)indexGuideStrand; // id of the guide strand this followRootOffset belongs to

            ...
        }
    }

    ...
}
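The role of t0 and t1 can be illustrated with a stand-alone sketch. The tangentBasis function below is a hypothetical stand-in for GetTangentVectors, written from its description (two unit vectors orthogonal to the tangent and to each other); the real implementation is at the URL in the listing above and may differ. The parameters r0 and r1 stand for the two GetRandom draws.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hypothetical stand-in for GetTangentVectors. n must be a unit vector.
inline void tangentBasis(Vec3 n, Vec3& t0, Vec3& t1) {
    // Pick a helper axis that is not nearly parallel to n.
    const Vec3 helper = (std::fabs(n.x) < 0.9f) ? Vec3{1.0f, 0.0f, 0.0f}
                                                : Vec3{0.0f, 1.0f, 0.0f};
    t0 = normalize(cross(n, helper)); // unit vector orthogonal to n
    t1 = cross(n, t0);                // unit vector orthogonal to both n and t0
}

// Root offset of a follow strand: a combination of t0 and t1, so it has no
// component along the root tangent of the guide strand.
inline Vec3 followRootOffset(Vec3 unitRootTangent, float r0, float r1) {
    Vec3 t0, t1;
    tangentBasis(unitRootTangent, t0, t1);
    return r0 * t0 + r1 * t1;
}
```

Because r0 and r1 are drawn independently from [-maxRadiusAroundGuideHair, maxRadiusAroundGuideHair], the offsets fill a square in the plane perpendicular to the root tangent, not a disc.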

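Postscript

While writing this article I noticed that TressFX 5.0 was released at the end of May. It is no longer published as a standalone repository; instead it ships as a UE4 patch applied directly to the UE4 source, open-sourced at https://github.com/GPUOpenSoftware/UnrealEngine/tree/TressFX5-4.27.

The highlights of this update are support for the Marschner lighting model, the addition of TAA (temporal anti-aliasing), and shadow optimizations.

When time and opportunity allow, we will dig into the new implementation and see what they did :-)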