frameIsEarly - a buffer is queued but not latched immediately

Symptom

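When analyzing dropped frames, it is common to see that BufferTx on the SurfaceFlinger side shows a pending buffer, yet the buffer is not latched right away. This article uses one such case to trace the cause and then explains what frameIsEarly is for.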

Take frame SF-3439336 from the trace frameIsEarly/Bilibili_round7 as an example.

Note the app-side vsyncId that belongs to this late-latched buffer; it is needed later.

Tracing the logic

Normally, during latchBuffers in commit, SurfaceFlinger calls latchBuffer on every layer that has a buffer.

Frame 3439336, however, has no latchBuffer call.

latchBuffers logic

bool SurfaceFlinger::latchBuffers() {
    ATRACE_CALL();
    //...
    mDrawingState.traverse([&](Layer* layer) {
        //...
        if (layer->hasReadyFrame()) {
            frameQueued = true;
            if (layer->shouldPresentNow(expectedPresentTime)) {
                // Layers that are ready get latched in the loop below
                mLayersWithQueuedFrames.emplace(layer);
            } else {
                ATRACE_NAME("!layer->shouldPresentNow()");
                layer->useEmptyDamage();
            }
        } else {
            layer->useEmptyDamage();
        }
    });
    // ...
     
    if (!mLayersWithQueuedFrames.empty()) {
        // mStateLock is needed for latchBuffer as LayerRejecter::reject()
        // writes to Layer current state. See also b/119481871
        Mutex::Autolock lock(mStateLock);
 
        for (const auto& layer : mLayersWithQueuedFrames) {
            // latchBuffer is called on each queued layer
            if (layer->latchBuffer(visibleRegions, latchTime, expectedPresentTime)) {
                mLayersPendingRefresh.push_back(layer);
                newDataLatched = true;
            }
            layer->useSurfaceDamage();
        }
    }
    //...
}

As shown above, latchBuffer is only reached when layer->hasReadyFrame() returns true (and shouldPresentNow() also passes).

hasReadyFrame

bool BufferLayer::hasReadyFrame() const {
    return hasFrameUpdate() || getSidebandStreamChanged() || getAutoRefresh();
}

In the normal case with a buffer, the first term, hasFrameUpdate(), is true and the other two are false.

hasFrameUpdate

bool BufferStateLayer::hasFrameUpdate() const {
    const State& c(getDrawingState());
    return (mDrawingStateModified || mDrawingState.modified) && (c.buffer != nullptr || c.bgColorLayer != nullptr);
}

c.buffer is set in BufferStateLayer::setBuffer:

bool BufferStateLayer::setBuffer(std::shared_ptr<renderengine::ExternalTexture>& buffer,
                                 const BufferData& bufferData, nsecs_t postTime,
                                 nsecs_t desiredPresentTime, bool isAutoTimestamp,
                                 std::optional<nsecs_t> dequeueTime,
                                 const FrameTimelineInfo& info) {
    ATRACE_CALL();
    // ...
    mDrawingState.buffer = std::move(buffer);

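setBuffer was not called

setBuffer contains an ATRACE_CALL(), so it shows up in the trace. Following the trace shows that setBuffer did not run before the point where latchBuffer would happen.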

https://cs.android.com/android/platform/superproject/+/master:frameworks/native/services/surfaceflinger/BufferStateLayer.cpp;drc=642f229422a6e85e46807e6371db2a3b5785b806;bpv=1;bpt=1;l=342?gsn=setBuffer&gs=kythe://android.googlesource.com/platform/superproject?lang=c%2B%2B?path=frameworks/native/services/surfaceflinger/BufferStateLayer.h#99VZbhu2CtbtlTJVv1JZxjVIM36WGmLi2br5ldcgVAc&gs=kythe://android.googlesource.com/platform/superproject?lang=c%2B%2B?path=frameworks/native/services/surfaceflinger/BufferStateLayer.cpp#Wn7OnL4WkT6IT9IrEO6DIBnsq-WhKO_KkmBWoupD96k

setBuffer call stack

uint32_t SurfaceFlinger::setClientStateLocked(const FrameTimelineInfo& frameTimelineInfo,
                                              ComposerState& composerState,
                                              int64_t desiredPresentTime, bool isAutoTimestamp,
                                              int64_t postTime, uint32_t permissions) {
    layer_state_t& s = composerState.state;
    // ...
    const uint64_t what = s.what;
    // ...
    if (what & layer_state_t::eBufferChanged) {
        std::shared_ptr<renderengine::ExternalTexture> buffer =
                getExternalTextureFromBufferData(*s.bufferData, layer->getDebugName());
        if (layer->setBuffer(buffer, *s.bufferData, postTime, desiredPresentTime, isAutoTimestamp,
                             dequeueBufferTimestamp, frameTimelineInfo)) {
            flags |= eTraversalNeeded;
        }
    } else if (frameTimelineInfo.vsyncId != FrameTimelineInfo::INVALID_VSYNC_ID) {
        layer->setFrameTimelineVsyncForBufferlessTransaction(frameTimelineInfo, postTime);
    }
    // ...

The eBufferChanged bit in composerState.state.what is added on the client side through the Transaction:

SurfaceComposerClient::Transaction& SurfaceComposerClient::Transaction::setBuffer(
        const sp<SurfaceControl>& sc, const sp<GraphicBuffer>& buffer,
        const std::optional<sp<Fence>>& fence, const std::optional<uint64_t>& optFrameNumber,
        ReleaseBufferCallback callback) {
    // ...
    s->what |= layer_state_t::eBufferChanged;
    // ...

The call stack of this setBuffer is as follows.

Check whether the BLASTBufferQueue::acquireNextBufferLocked side succeeded.

acquireNextBufferLocked

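A rising Buffer-TX counter already indicates that acquireBuffer succeeded. The trace slices below can also be used to confirm that acquireBuffer, including setBuffer, succeeded.

Look at the app-side acquireBuffer trace for this frame.

There is a frame= slice and no releaseBuffer slice, which means t->setBuffer was called.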

void BLASTBufferQueue::acquireNextBufferLocked(
        const std::optional<SurfaceComposerClient::Transaction*> transaction) {
    // ...
    // Key trace point
    BBQ_TRACE("frame=%" PRIu64, bufferItem.mFrameNumber);
 
    if (buffer == nullptr) {
        // This ends up in BufferQueueConsumer::releaseBuffer, which records a releaseBuffer trace
        mBufferItemConsumer->releaseBuffer(bufferItem, Fence::NO_FENCE);
        BQA_LOGE("Buffer was empty");
        return;
    }
 
    if (rejectBuffer(bufferItem)) {
        BQA_LOGE("rejecting buffer:active_size=%dx%d, requested_size=%dx%d "
                 "buffer{size=%dx%d transform=%d}",
                 mSize.width, mSize.height, mRequestedSize.width, mRequestedSize.height,
                 buffer->getWidth(), buffer->getHeight(), bufferItem.mTransform);
        mBufferItemConsumer->releaseBuffer(bufferItem, Fence::NO_FENCE);
        acquireNextBufferLocked(transaction);
        return;
    }
    // ...
    t->setBuffer(mSurfaceControl, buffer, fence, bufferItem.mFrameNumber, releaseBufferCallback);

Since buffer != nullptr at acquire time, SurfaceComposerClient::Transaction::setBuffer also runs and sets

s->what |= layer_state_t::eBufferChanged;

So acquireBuffer on the app side is fine. The likely cause is that something earlier in the SurfaceFlinger call stack, before setBuffer, was never reached.
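The client/server handshake around eBufferChanged can be sketched as below. This is only an illustration of the bit-flag pattern: the namespace, the LayerState struct, both helper functions, and the flag values are hypothetical and are not the real layer_state_t definitions.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical bit positions, for illustration only.
constexpr uint64_t ePositionChanged = 1ull << 0;
constexpr uint64_t eBufferChanged = 1ull << 1;

// Minimal stand-in for the per-layer state carried in a transaction.
struct LayerState {
    uint64_t what = 0;       // bitmask of changed fields
    bool hasBuffer = false;  // stands in for bufferData
};

// Client side: models Transaction::setBuffer marking the state as carrying a buffer.
inline void clientSetBuffer(LayerState& s) {
    s.hasBuffer = true;
    s.what |= eBufferChanged;
}

// Server side: models the `what & eBufferChanged` check in setClientStateLocked.
inline bool serverWouldCallSetBuffer(const LayerState& s) {
    return (s.what & eBufferChanged) != 0;
}

}  // namespace sketch
```

In this model the server-side check only runs if the transaction is actually applied, which is why the investigation moves on to the code that decides whether a transaction is applied at all.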

flushTransactionQueues

Look at the setBuffer call stack again.

To get straight to the point, go directly to flushTransactionQueues:

bool SurfaceFlinger::flushTransactionQueues(int64_t vsyncId) {
    // ...
            while (!mTransactionQueue.empty()) {
                auto& transaction = mTransactionQueue.front();
                const bool pendingTransactions =
                        mPendingTransactionQueues.find(transaction.applyToken) !=
                        mPendingTransactionQueues.end();
                const auto ready = [&]() REQUIRES(mStateLock) {
                    if (pendingTransactions) {
                        ATRACE_NAME("pendingTransactions");
                        return TransactionReadiness::NotReady;
                    }
                    // This return value drives the check below
                    return transactionIsReadyToBeApplied(transaction, transaction.frameTimelineInfo,
                                                         transaction.isAutoTimestamp,
                                                         transaction.desiredPresentTime,
                                                         transaction.originUid, transaction.states,
                                                         bufferLayersReadyToPresent,
                                                         transactions.size(),
                                                         /*tryApplyUnsignaled*/ false);
                }();
                ATRACE_INT("TransactionReadiness", static_cast<int>(ready));
                if (ready != TransactionReadiness::Ready) {
                    if (ready == TransactionReadiness::NotReadyBarrier) {
                        transactionsPendingBarrier++;
                    }
                    mPendingTransactionQueues[transaction.applyToken].push(std::move(transaction));
                } else {
                    // ...
                    // Added to transactions only when the result is TransactionReadiness::Ready
                    transactions.emplace_back(std::move(transaction));
                }
                mTransactionQueue.pop_front();
                ATRACE_INT("TransactionQueue", mTransactionQueue.size());
            }
            // ...
            // transactions is consumed here, which later leads to setBuffer
            return applyTransactions(transactions, vsyncId);
        }
    }
}

transactionIsReadyToBeApplied

For setBuffer to proceed normally, this function has to return TransactionReadiness::Ready.

auto SurfaceFlinger::transactionIsReadyToBeApplied(TransactionState& transaction,
        const FrameTimelineInfo& info, bool isAutoTimestamp, int64_t desiredPresentTime,
        uid_t originUid, const Vector<ComposerState>& states,
        const std::unordered_map<
            sp<IBinder>, uint64_t, SpHash<IBinder>>& bufferLayersReadyToPresent,
        size_t totalTXapplied, bool tryApplyUnsignaled) const -> TransactionReadiness {
    ATRACE_FORMAT("transactionIsReadyToBeApplied vsyncId: %" PRId64, info.vsyncId);
    // ...
    if (isAutoTimestamp && frameIsEarly(expectedPresentTime, info.vsyncId)) {
        ATRACE_NAME("frameIsEarly");
        return TransactionReadiness::NotReady;
    }
    // ...
    ATRACE_FORMAT("%s allowLatchUnsignaled=%s", layer->getName().c_str(),
                      allowLatchUnsignaled ? "true" : "false");

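There are several key trace points here. Consider what they show when setBuffer does not happen.

The app vsyncId noted earlier is 3439330. In the matching slice of the trace, the code takes the frameIsEarly branch and returns TransactionReadiness::NotReady.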

http://minio.898311.xyz:8900/blogimg/16946096593103.png

For comparison, in a SurfaceFlinger frame that composes normally, the trace reaches the "allowLatchUnsignaled=" slice.

This confirms that setBuffer and latchBuffer were skipped because the transaction was judged to be frameIsEarly.
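What happens to a transaction that is not ready can be modelled roughly as follows. This is a simplified sketch, not the SurfaceFlinger implementation: the Tx and Flusher types are hypothetical, a string stands in for the apply token, a boolean stands in for the frameIsEarly verdict, and the re-check of already pending queues on the next pass is left out.

```cpp
#include <deque>
#include <map>
#include <string>
#include <vector>

namespace sketch {

struct Tx {
    std::string applyToken;  // stands in for the sp<IBinder> apply token
    long vsyncId;
    bool early;              // stands in for the frameIsEarly verdict
};

struct Flusher {
    std::deque<Tx> incoming;                        // plays the role of mTransactionQueue
    std::map<std::string, std::deque<Tx>> pending;  // plays the role of mPendingTransactionQueues

    // One pass over the incoming queue. Returns the vsyncIds that would be
    // applied (and therefore reach setBuffer) in this pass.
    std::vector<long> flush() {
        std::vector<long> applied;
        while (!incoming.empty()) {
            Tx tx = incoming.front();
            incoming.pop_front();
            // A token that already has a parked transaction blocks later ones
            // with the same token, which keeps per-token ordering intact.
            const bool blocked = pending.count(tx.applyToken) != 0;
            if (blocked || tx.early) {
                pending[tx.applyToken].push_back(tx);
            } else {
                applied.push_back(tx.vsyncId);
            }
        }
        return applied;
    }
};

}  // namespace sketch
```

Under these assumptions, an early transaction is parked rather than discarded, and anything queued behind it with the same token waits as well. That matches what the trace shows: the buffer stays pending and is picked up by a later SurfaceFlinger frame.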

frameIsEarly logic

The logic depends on FrameTimeline:

bool SurfaceFlinger::frameIsEarly(nsecs_t expectedPresentTime, int64_t vsyncId) const {
    // The amount of time SF can delay a frame if it is considered early based
    // on the VsyncModulator::VsyncConfig::appWorkDuration
    constexpr static std::chrono::nanoseconds kEarlyLatchMaxThreshold = 100ms;
 
    const auto currentVsyncPeriod = mScheduler->getDisplayStatInfo(systemTime()).vsyncPeriod;
    // Half of the current vsync period
    const auto earlyLatchVsyncThreshold = currentVsyncPeriod / 2;
    // Fetch the predicted times from FrameTimeline
    const auto prediction = mFrameTimeline->getTokenManager()->getPredictionsForToken(vsyncId);
    if (!prediction.has_value()) {
        return false;
    }
 
    if (std::abs(prediction->presentTime - expectedPresentTime) >=
        kEarlyLatchMaxThreshold.count()) {
        return false;
    }
    // true means the frame is early, so setBuffer is not performed
    return prediction->presentTime >= expectedPresentTime &&
            prediction->presentTime - expectedPresentTime >= earlyLatchVsyncThreshold;
}

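Understanding prediction->presentTime and the frameIsEarly logic first requires understanding how FrameTimeline makes its predictions; see the Expected timeline section of "APP FrameTimeline trace logic".

A few time-related terms: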

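  • HW_VSYNC: the vsync signal emitted by the display hardware
  • present fence: signals when the current frame has actually been shown on screen; it generally lines up with HW_VSYNC
  • nextVsyncTime: the system's prediction of the next HW_VSYNC time, i.e. SW_VSYNC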
  • sfWorkDuration = timing.readyDuration = sf Expected timeline
  • appWorkDuration = timing.workDuration = app Expected timeline

  • app.prediction.startTime + appWorkDuration = app.prediction.endTime
  • app.prediction.endTime = sf.prediction.startTime
  • sf.prediction.startTime + sfWorkDuration = sf.prediction.endTime = sf.prediction.presentTime

The app and sf predictions are separate.
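The relations listed above can be written out as a small sketch. It only encodes the equalities stated in this article; the Prediction struct and the two helper functions are hypothetical names, and the real scheduler code derives these values differently in detail.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical mirror of the three predicted timestamps, in nanoseconds.
struct Prediction {
    int64_t startTime;
    int64_t endTime;
    int64_t presentTime;
};

// sf.startTime + sfWorkDuration = sf.endTime = sf.presentTime
inline Prediction sfPrediction(int64_t presentTime, int64_t sfWorkDuration) {
    return {presentTime - sfWorkDuration, presentTime, presentTime};
}

// app.startTime + appWorkDuration = app.endTime = sf.startTime,
// and app.presentTime = app.endTime + sfWorkDuration
inline Prediction appPrediction(int64_t presentTime, int64_t appWorkDuration,
                                int64_t sfWorkDuration) {
    const int64_t endTime = presentTime - sfWorkDuration;
    return {endTime - appWorkDuration, endTime, presentTime};
}

}  // namespace sketch
```

For example, with a target present time of 100 ms, an appWorkDuration of 16.6 ms and an sfWorkDuration of 10.5 ms, the app is expected to finish at 89.5 ms, which is exactly when SurfaceFlinger is expected to start.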

Back to the frameIsEarly logic.

The return value depends mainly on prediction->presentTime and expectedPresentTime.

vsyncId is the app's id, so prediction->presentTime = app.prediction.presentTime = app.prediction.endTime + sfWorkDuration

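expectedPresentTime is the expectedVsyncTime passed in by SurfaceFlinger::commit. Tracing further back, it comes from the vsync callback and ultimately equals nextVsyncTime, which matches its name: it is the expected present fence time.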

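In other words, SurfaceFlinger may start composing less than half a vsync period before the app's expected end time. If it is earlier by half a vsync period or more, SurfaceFlinger is considered too early: this is not the composition slot the app was aiming for, and the transaction is judged frameIsEarly.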
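To make the thresholds concrete, here is a standalone restatement of the same comparison using plain integers. It is a sketch for experimenting with numbers, not the SurfaceFlinger code: the function takes the predicted present time and the vsync period as parameters instead of querying FrameTimeline and the scheduler, and the parameter names are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>

namespace sketch {

// 100 ms cap, same value as kEarlyLatchMaxThreshold in the snippet above.
constexpr int64_t kEarlyLatchMaxThresholdNs = 100000000;

// All times are in nanoseconds. predictedPresentTime is empty when no
// prediction exists for the vsyncId.
inline bool frameIsEarly(int64_t expectedPresentTime,
                         std::optional<int64_t> predictedPresentTime,
                         int64_t vsyncPeriodNs) {
    assert(vsyncPeriodNs > 0);
    if (!predictedPresentTime.has_value()) {
        return false;  // no prediction: never treated as early
    }
    const int64_t delta = *predictedPresentTime - expectedPresentTime;
    if (std::abs(delta) >= kEarlyLatchMaxThresholdNs) {
        return false;  // too far apart: do not hold the frame back
    }
    // Early only if the app targets a present time at least half a period
    // later than the one this SurfaceFlinger frame is composing for.
    return delta >= 0 && delta >= vsyncPeriodNs / 2;
}

}  // namespace sketch
```

With a 60 Hz period of about 16.67 ms, a transaction whose predicted present time is one full period after expectedPresentTime is early, while one that is only 8 ms after it is not, because 8 ms is below the half-period threshold of about 8.33 ms.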

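Back to the un-latched frame from the beginning.

The app finished drawing too early, and the SurfaceFlinger frame that skipped composition was not the slot the app was targeting, so setBuffer was not performed.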

Purpose

So far it appears to serve two main purposes:

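  1. frameIsEarly matters most when the app draws only every other frame. This typically happens during non-fullscreen video playback, where the SurfaceFlinger fps and the app (video) fps differ. Take frameIsEarly/Douyin_round3 as an example, as shown in the figure:
  • Normally, app-1 goes with sf-1, app-3 with sf-3, and app-5 with sf-5, each composed on its matching vsync
  • Without frameIsEarly:
    • app-1 is latched early by sf-0
    • app-2 is not drawn, so sf-1 has no buffer and latches nothing
    • if app-3 draws slightly slowly and misses sf-2 (still within its expected timeline, so no dropped frame), it is latched on sf-3
    • app-4 is again not drawn
    • app-5 draws on time and makes sf-4, so sf-4 latches it
    • the SurfaceFlinger frames that actually compose are therefore sf-0, sf-3, sf-4
    • the interval between updates is clearly uneven (three vsyncs, then one), which shows up as visible stutter even though the app dropped no frames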

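  2. It also prevents, when there are multiple layers, one layer from updating early while another layer draws more slowly, which would leave their updates out of step. (As long as no layer drops a frame, frameIsEarly ensures that the buffers all layers drew for the same app-vsync reach SurfaceFlinger for composition together.)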
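The every-other-frame scenario from the first point can be replayed with a toy model. It is a deliberate simplification and not SurfaceFlinger behaviour in detail: frames are counted in whole vsync slots, the AppFrame struct and latchSequence function are hypothetical, and "held back by frameIsEarly" is approximated as "never latched before the targeted SurfaceFlinger frame".

```cpp
#include <algorithm>
#include <vector>

namespace sketch {

struct AppFrame {
    int targetSf;     // SurfaceFlinger frame the app frame is scheduled for (app-n -> sf-n)
    int availableSf;  // first SurfaceFlinger frame at which the buffer is already queued
};

// Returns, for each app frame, the index of the SurfaceFlinger frame that latches it.
inline std::vector<int> latchSequence(const std::vector<AppFrame>& frames,
                                      bool useFrameIsEarly) {
    std::vector<int> latched;
    for (const AppFrame& f : frames) {
        // With the check, a buffer that arrives ahead of its slot waits for it.
        // Without the check, it is latched as soon as it is available.
        const int sf = useFrameIsEarly ? std::max(f.availableSf, f.targetSf)
                                       : f.availableSf;
        latched.push_back(sf);
    }
    return latched;
}

}  // namespace sketch
```

Feeding it the example above, app-1 available at sf-0, app-3 available at sf-3 and app-5 available at sf-4, gives sf-0, sf-3, sf-4 without the check (intervals of 3 and 1 vsyncs) and sf-1, sf-3, sf-5 with it (a steady interval of 2).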

When it occurs

It generally shows up in two situations:

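  • SW-VSYNC is correcting itself, for example during a refresh rate switch or when an input event arrives
  • The previous frame was not presented. There are many possible causes: the frame before it was dropped, the app itself did not submit or draw a frame, or the system failed to have TimerDispatch trigger app-vsync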

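While SW-VSYNC is correcting itself

Take frameIsEarly/Bilibili_round7 as an example.

SW-VSYNC is still correcting itself here, so its timing and period are inconsistent. As a result, the expected composition start time in the app's Expected Timeline (endTime) and the one in SurfaceFlinger's Expected Timeline (startTime) cannot line up. This is the case analyzed above.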

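The previous frame was not presented

There are many possible causes: an earlier dropped frame, the app itself not submitting or drawing a frame, or a system issue that kept TimerDispatch from triggering app-vsync.

For an earlier dropped frame causing the next frame not to be presented, see frameIsEarly/home.perfetto-trace.

For the case where a system issue keeps TimerDispatch from triggering app-vsync, see: buffer present but not latched - app does not draw.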