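frameIsEarly - why a buffer is not latched immediately after it is produced
Symptom
When analyzing dropped frames, one often sees that the BufferTX counter on the SurfaceFlinger side shows a pending buffer, yet the buffer is not latched right away. This note uses one such case to trace the cause and to explain what the frameIsEarly check is for.
Take frame SF-3439336 in the trace frameIsEarly/Bilibili_round7 as an example.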

Note down the app vsyncId that corresponds to this late-latched buffer.

Tracing the logic
Normally, during commit, SF runs latchBuffers, which calls latchBuffer on every layer that has a buffer.

Frame 3439336, however, contains no latchBuffer call.

latchBuffers logic
bool SurfaceFlinger::latchBuffers() {
    ATRACE_CALL();
    // ...
    mDrawingState.traverse([&](Layer* layer) {
        // ...
        if (layer->hasReadyFrame()) {
            frameQueued = true;
            if (layer->shouldPresentNow(expectedPresentTime)) {
                // Layers that are ready are latched in the loop below
                mLayersWithQueuedFrames.emplace(layer);
            } else {
                ATRACE_NAME("!layer->shouldPresentNow()");
                layer->useEmptyDamage();
            }
        } else {
            layer->useEmptyDamage();
        }
    });
    // ...
    if (!mLayersWithQueuedFrames.empty()) {
        // mStateLock is needed for latchBuffer as LayerRejecter::reject()
        // writes to Layer current state. See also b/119481871
        Mutex::Autolock lock(mStateLock);
        for (const auto& layer : mLayersWithQueuedFrames) {
            // latchBuffer is called on each queued layer
            if (layer->latchBuffer(visibleRegions, latchTime, expectedPresentTime)) {
                mLayersPendingRefresh.push_back(layer);
                newDataLatched = true;
            }
            layer->useSurfaceDamage();
        }
    }
    // ...
}
As shown, a layer only reaches latchBuffer if layer->hasReadyFrame() returns true.
hasReadyFrame
bool BufferLayer::hasReadyFrame() const {
    return hasFrameUpdate() || getSidebandStreamChanged() || getAutoRefresh();
}
In the normal case where a buffer is present, the first term, hasFrameUpdate(), is true and the other two are false.
hasFrameUpdate
bool BufferStateLayer::hasFrameUpdate() const {
    const State& c(getDrawingState());
    return (mDrawingStateModified || mDrawingState.modified) &&
           (c.buffer != nullptr || c.bgColorLayer != nullptr);
}
c.buffer is set in BufferStateLayer::setBuffer:
bool BufferStateLayer::setBuffer(std::shared_ptr<renderengine::ExternalTexture>& buffer,
                                 const BufferData& bufferData, nsecs_t postTime,
                                 nsecs_t desiredPresentTime, bool isAutoTimestamp,
                                 std::optional<nsecs_t> dequeueTime,
                                 const FrameTimelineInfo& info) {
    ATRACE_CALL();
    // ...
    mDrawingState.buffer = std::move(buffer);
    // ...
}
setBuffer is not called
setBuffer is instrumented with a trace slice (ATRACE_CALL). Following it in the trace shows that setBuffer was not called before the point where latchBuffer would normally run.
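The dependency traced so far (no setBuffer, therefore no ready frame, therefore no latchBuffer) can be condensed into a toy model. This is only a sketch: ToyLayer, ToyBuffer and wouldBeConsideredForLatch are hypothetical names, the sideband and auto-refresh terms are assumed false, and bgColorLayer and mDrawingStateModified are left out.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration; these are not the real SurfaceFlinger types.
struct ToyBuffer {};

struct ToyLayer {
    std::shared_ptr<ToyBuffer> buffer;  // plays the role of mDrawingState.buffer
    bool modified = false;              // plays the role of mDrawingState.modified (assumption)

    // Plays the role of BufferStateLayer::setBuffer: store the buffer, mark the state dirty.
    void setBuffer(std::shared_ptr<ToyBuffer> b) {
        buffer = std::move(b);
        modified = true;
    }

    // Simplified hasFrameUpdate: only the buffer term is modelled.
    bool hasFrameUpdate() const { return modified && buffer != nullptr; }

    // Simplified hasReadyFrame: sideband stream and auto-refresh are assumed false.
    bool hasReadyFrame() const { return hasFrameUpdate(); }
};

// True if a latchBuffers-style pass would consider this layer at all.
bool wouldBeConsideredForLatch(const ToyLayer& layer) {
    return layer.hasReadyFrame();
}
```

In this model a layer that never received setBuffer is skipped by the latch pass, which matches what the trace shows for frame 3439336.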

setBuffer call stack

uint32_t SurfaceFlinger::setClientStateLocked(const FrameTimelineInfo& frameTimelineInfo,
                                              ComposerState& composerState,
                                              int64_t desiredPresentTime, bool isAutoTimestamp,
                                              int64_t postTime, uint32_t permissions) {
    layer_state_t& s = composerState.state;
    // ...
    const uint64_t what = s.what;
    // ...
    if (what & layer_state_t::eBufferChanged) {
        std::shared_ptr<renderengine::ExternalTexture> buffer =
                getExternalTextureFromBufferData(*s.bufferData, layer->getDebugName());
        if (layer->setBuffer(buffer, *s.bufferData, postTime, desiredPresentTime, isAutoTimestamp,
                             dequeueBufferTimestamp, frameTimelineInfo)) {
            flags |= eTraversalNeeded;
        }
    } else if (frameTimelineInfo.vsyncId != FrameTimelineInfo::INVALID_VSYNC_ID) {
        layer->setFrameTimelineVsyncForBufferlessTransaction(frameTimelineInfo, postTime);
    }
    // ...
}
The eBufferChanged bit in composerState.state.what is added on the client side by Transaction::setBuffer:
SurfaceComposerClient::Transaction& SurfaceComposerClient::Transaction::setBuffer(
        const sp<SurfaceControl>& sc, const sp<GraphicBuffer>& buffer,
        const std::optional<sp<Fence>>& fence, const std::optional<uint64_t>& optFrameNumber,
        ReleaseBufferCallback callback) {
    // ...
    s->what |= layer_state_t::eBufferChanged;
    // ...
}
The call stack of this setBuffer is shown below.

Next, check whether the BLASTBufferQueue::acquireNextBufferLocked side succeeded.
acquireNextBufferLocked
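A rise in the BufferTX counter already indicates that acquireBuffer succeeded. The trace slices described below can also be used to confirm that acquireBuffer, including the setBuffer call, succeeded.
Look at the app-side acquireBuffer trace for this frame.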

The trace contains a frame= slice and no releaseBuffer slice, which means t->setBuffer was called.
void BLASTBufferQueue::acquireNextBufferLocked(
        const std::optional<SurfaceComposerClient::Transaction*> transaction) {
    // ...
    // Key trace slice
    BBQ_TRACE("frame=%" PRIu64, bufferItem.mFrameNumber);
    if (buffer == nullptr) {
        // This eventually reaches BufferQueueConsumer::releaseBuffer, which emits a
        // releaseBuffer trace slice
        mBufferItemConsumer->releaseBuffer(bufferItem, Fence::NO_FENCE);
        BQA_LOGE("Buffer was empty");
        return;
    }
    if (rejectBuffer(bufferItem)) {
        BQA_LOGE("rejecting buffer:active_size=%dx%d, requested_size=%dx%d "
                 "buffer{size=%dx%d transform=%d}",
                 mSize.width, mSize.height, mRequestedSize.width, mRequestedSize.height,
                 buffer->getWidth(), buffer->getHeight(), bufferItem.mTransform);
        mBufferItemConsumer->releaseBuffer(bufferItem, Fence::NO_FENCE);
        acquireNextBufferLocked(transaction);
        return;
    }
    // ...
    t->setBuffer(mSurfaceControl, buffer, fence, bufferItem.mFrameNumber, releaseBufferCallback);
    // ...
}
Since buffer != nullptr at acquire time, SurfaceComposerClient::Transaction::setBuffer also executes s->what |= layer_state_t::eBufferChanged.
This shows that the app-side acquireBuffer is fine, so the likely cause is that the frames above setBuffer in the SurfaceFlinger call stack were never reached.
flushTransactionQueues
Look at the setBuffer call stack again.

Going straight to the point, look at flushTransactionQueues:
bool SurfaceFlinger::flushTransactionQueues(int64_t vsyncId) {
            // ...
            while (!mTransactionQueue.empty()) {
                auto& transaction = mTransactionQueue.front();
                const bool pendingTransactions =
                        mPendingTransactionQueues.find(transaction.applyToken) !=
                        mPendingTransactionQueues.end();
                const auto ready = [&]() REQUIRES(mStateLock) {
                    if (pendingTransactions) {
                        ATRACE_NAME("pendingTransactions");
                        return TransactionReadiness::NotReady;
                    }
                    // This return value drives the check below
                    return transactionIsReadyToBeApplied(transaction, transaction.frameTimelineInfo,
                                                         transaction.isAutoTimestamp,
                                                         transaction.desiredPresentTime,
                                                         transaction.originUid, transaction.states,
                                                         bufferLayersReadyToPresent,
                                                         transactions.size(),
                                                         /*tryApplyUnsignaled*/ false);
                }();
                ATRACE_INT("TransactionReadiness", static_cast<int>(ready));
                if (ready != TransactionReadiness::Ready) {
                    if (ready == TransactionReadiness::NotReadyBarrier) {
                        transactionsPendingBarrier++;
                    }
                    mPendingTransactionQueues[transaction.applyToken].push(std::move(transaction));
                } else {
                    // ...
                    // A transaction is added to transactions only if it is TransactionReadiness::Ready
                    transactions.emplace_back(std::move(transaction));
                }
                mTransactionQueue.pop_front();
                ATRACE_INT("TransactionQueue", mTransactionQueue.size());
            }
            // ...
            // transactions is consumed here, and this is the path that leads to setBuffer
            return applyTransactions(transactions, vsyncId);
        }
    }
}
transactionIsReadyToBeApplied
For setBuffer to run normally, this function must return TransactionReadiness::Ready.
auto SurfaceFlinger::transactionIsReadyToBeApplied(TransactionState& transaction,
        const FrameTimelineInfo& info, bool isAutoTimestamp, int64_t desiredPresentTime,
        uid_t originUid, const Vector<ComposerState>& states,
        const std::unordered_map<
            sp<IBinder>, uint64_t, SpHash<IBinder>>& bufferLayersReadyToPresent,
        size_t totalTXapplied, bool tryApplyUnsignaled) const -> TransactionReadiness {
    ATRACE_FORMAT("transactionIsReadyToBeApplied vsyncId: %" PRId64, info.vsyncId);
    // ...
    if (isAutoTimestamp && frameIsEarly(expectedPresentTime, info.vsyncId)) {
        ATRACE_NAME("frameIsEarly");
        return TransactionReadiness::NotReady;
    }
    // ...
    ATRACE_FORMAT("%s allowLatchUnsignaled=%s", layer->getName().c_str(),
                  allowLatchUnsignaled ? "true" : "false");
    // ...
}
There are several key trace slices here. Look first at the case where setBuffer is not called.
As noted earlier, the app's vsyncId is 3439330. In the corresponding trace the transaction takes the frameIsEarly branch, which returns TransactionReadiness::NotReady.

Now look at the next SF frame, which composites normally. In that case the trace contains an "allowLatchUnsignaled=" slice instead.

This confirms that setBuffer and latchBuffer were skipped here because the transaction was judged frameIsEarly.
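The deferral path traced above can be sketched as a small queue model. It is a simplification under stated assumptions: ToyFlinger, ToyTransaction and the early flag are hypothetical, the applyToken is an int rather than an IBinder, barriers and unsignaled buffers are ignored, and the retry of already-pending queues at the start of a real flush is not modelled.

```cpp
#include <deque>
#include <map>
#include <vector>

enum class Readiness { NotReady, Ready };

struct ToyTransaction {
    int applyToken;  // hypothetical integer stand-in for the IBinder applyToken
    long vsyncId;    // app vsyncId carried in the FrameTimelineInfo
    bool early;      // stand-in for the frameIsEarly result for this transaction
};

struct ToyFlinger {
    std::deque<ToyTransaction> queue;                          // like mTransactionQueue
    std::map<int, std::deque<ToyTransaction>> pendingByToken;  // like mPendingTransactionQueues

    static Readiness readiness(const ToyTransaction& t) {
        return t.early ? Readiness::NotReady : Readiness::Ready;
    }

    // One flush pass. The returned transactions are the ones that would go on to
    // applyTransactions and therefore to setBuffer.
    std::vector<ToyTransaction> flush() {
        std::vector<ToyTransaction> ready;
        while (!queue.empty()) {
            ToyTransaction t = queue.front();
            queue.pop_front();
            // Anything already pending for the same token keeps later ones pending too.
            const bool hasPending = pendingByToken.count(t.applyToken) != 0;
            if (hasPending || readiness(t) != Readiness::Ready) {
                pendingByToken[t.applyToken].push_back(t);  // deferred to a later SF frame
            } else {
                ready.push_back(t);
            }
        }
        return ready;
    }
};
```

A transaction flagged early stays in the pending map for its token, so its buffer never reaches setBuffer in that SF frame, while transactions from other tokens are unaffected.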
frameIsEarly logic
As the code shows, the check depends on FrameTimeline.
bool SurfaceFlinger::frameIsEarly(nsecs_t expectedPresentTime, int64_t vsyncId) const {
    // The amount of time SF can delay a frame if it is considered early based
    // on the VsyncModulator::VsyncConfig::appWorkDuration
    constexpr static std::chrono::nanoseconds kEarlyLatchMaxThreshold = 100ms;
    const auto currentVsyncPeriod = mScheduler->getDisplayStatInfo(systemTime()).vsyncPeriod;
    // Half of the current vsync period
    const auto earlyLatchVsyncThreshold = currentVsyncPeriod / 2;
    // Look up the times FrameTimeline predicted for this token
    const auto prediction = mFrameTimeline->getTokenManager()->getPredictionsForToken(vsyncId);
    if (!prediction.has_value()) {
        return false;
    }
    if (std::abs(prediction->presentTime - expectedPresentTime) >=
        kEarlyLatchMaxThreshold.count()) {
        return false;
    }
    // true means the frame is early, so setBuffer is not called
    return prediction->presentTime >= expectedPresentTime &&
            prediction->presentTime - expectedPresentTime >= earlyLatchVsyncThreshold;
}
To understand prediction->presentTime and the frameIsEarly logic, one first needs to understand how FrameTimeline makes its predictions. See the "Expected timeline" section of the note "APP FrameTimeline trace logic" for details.
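A few timing terms are used here:
- HW_VSYNC: the vsync signal emitted by the display (hardware)
- present fence: signals when the current frame has actually been shown on screen; it generally coincides with HW_VSYNC
- nextVsyncTime: the system's prediction of the next HW_VSYNC time, i.e. SW_VSYNC
- sfWorkDuration = timing.readyDuration = length of the sf Expected timeline
- appWorkDuration = timing.workDuration = length of the app Expected timeline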

- app.prediction.startTime + appWorkDuration = app.prediction.endTime
- app.prediction.endTime = sf.prediction.startTime
- sf.prediction.startTime + sfWorkDuration = sf.prediction.endTime = sf.prediction.presentTime
The app and sf predictions are stored separately.
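The relations listed above can be written as a small helper that works backwards from a predicted present time. This is a sketch only: ToyPrediction, appPrediction and sfPrediction are hypothetical names, and the real FrameTimeline token manager is not involved.

```cpp
#include <cstdint>

using nsecs_t = int64_t;

// Hypothetical mirror of the three predicted timestamps discussed in the text.
struct ToyPrediction {
    nsecs_t startTime;
    nsecs_t endTime;
    nsecs_t presentTime;
};

// App prediction: the app must finish sfWorkDuration before the present time,
// and it starts appWorkDuration before that.
ToyPrediction appPrediction(nsecs_t presentTime, nsecs_t appWorkDuration,
                            nsecs_t sfWorkDuration) {
    const nsecs_t endTime = presentTime - sfWorkDuration;
    return {endTime - appWorkDuration, endTime, presentTime};
}

// SF prediction: SF starts sfWorkDuration before the present time, and its
// endTime equals its presentTime.
ToyPrediction sfPrediction(nsecs_t presentTime, nsecs_t sfWorkDuration) {
    return {presentTime - sfWorkDuration, presentTime, presentTime};
}
```

For the same target present time, appPrediction(...).endTime equals sfPrediction(...).startTime, which is the handoff point that the frameIsEarly comparison ends up testing.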
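Back to the frameIsEarly logic.
The return value depends mainly on prediction->presentTime and expectedPresentTime.
vsyncId is the app's id, so prediction->presentTime = app.prediction.presentTime = app.prediction.endTime + sfWorkDuration.
expectedPresentTime is the expectedVsyncTime passed into SurfaceFlinger::commit. Tracing it back, it comes from vsyncCallback and ultimately from nextVsyncTime, which matches its name: it is the time at which the present fence is expected to signal.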

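- prediction->presentTime >= expectedPresentTime
  - app.prediction.endTime + sfWorkDuration >= sf.prediction.startTime + sfWorkDuration
  - app.prediction.endTime >= sf.prediction.startTime
- prediction->presentTime - expectedPresentTime >= earlyLatchVsyncThreshold
  - app.prediction.endTime >= sf.prediction.startTime + vsyncdur/2
  - The + vsyncdur/2 margin covers the period in which app-vsync is self-correcting and the offset between app-vsync and sf-vsync is not yet accurate
In other words, SF is allowed to start compositing less than half a vsync period before the app's expected end time. If SF starts half a vsync period or more before that time, SF has arrived too early: this is not the composition slot the app was targeting, so the frame is judged frameIsEarly.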
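To make the thresholds concrete, here is a standalone restatement of the comparison with the Scheduler and FrameTimeline lookups replaced by plain parameters. frameIsEarlySketch is a hypothetical name, and the 60 Hz period used in the note after the block is an assumed example value, not something taken from the trace.

```cpp
#include <cstdint>
#include <optional>

using nsecs_t = int64_t;

// 100ms, the same cap as kEarlyLatchMaxThreshold in the source above.
constexpr nsecs_t kEarlyLatchMaxThresholdNs = 100'000'000;

// predictedPresentTime: presentTime stored for the app's vsyncId, or empty if
// the token is unknown. vsyncPeriod replaces the Scheduler lookup.
bool frameIsEarlySketch(nsecs_t expectedPresentTime,
                        std::optional<nsecs_t> predictedPresentTime,
                        nsecs_t vsyncPeriod) {
    if (!predictedPresentTime.has_value()) {
        return false;  // no prediction: never treated as early
    }
    const nsecs_t delta = *predictedPresentTime - expectedPresentTime;
    const nsecs_t magnitude = delta < 0 ? -delta : delta;
    if (magnitude >= kEarlyLatchMaxThresholdNs) {
        return false;  // prediction is too far from this vsync to be trusted
    }
    // Early only if the app targeted a present time at least half a period later.
    return delta >= 0 && delta >= vsyncPeriod / 2;
}
```

With an assumed period of 16,666,666 ns the threshold is 8,333,333 ns: a transaction whose predicted present time is one full period after expectedPresentTime is held back, one that is only 5 ms after it is applied, and one that is 200 ms away is applied as well because it exceeds the 100 ms cap.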
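Return to the un-latched trace from the beginning.
It now makes sense: the app finished drawing too early, and the SF frame that skipped the buffer was not the composition slot the app was targeting, so setBuffer was not called.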

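Purpose
As far as can be seen so far, it serves two main purposes:
- frameIsEarly matters most when the app draws only every other frame. This typically happens during non-fullscreen video playback, where the SF frame rate and the app (video) frame rate differ. Take frameIsEarly/Douyin_round3 as an example, shown in the figure below:
  - Normally app-1 → sf-1, app-3 → sf-3, app-5 → sf-5: each app frame is composited on its matching vsync
  - Without frameIsEarly:
    - app-1 is latched early by sf-0
    - app-2 does not draw, so sf-1 has no buffer and latches nothing
    - if app-3 draws slightly slowly and misses sf-2 (it is still within its expected timeline, so no frame is dropped), it is latched by sf-3
    - app-4 again does not draw
    - app-5 draws normally and makes sf-4, so sf-4 latches it
  - The SF frames that composite new content are then sf-0, sf-3, sf-4
  - The interval between updates is clearly unstable (3 vsyncs, then 1 vsync), which produces visible stutter even though the app never dropped a frame.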

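- It also prevents inconsistent update times across layers. With several layers, one layer could otherwise be updated early while slower layers are updated later. As long as no layer drops a frame, frameIsEarly ensures that the buffers all layers draw for the same app-vsync reach SF in the same composition.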
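The every-other-frame scenario from the first bullet can be replayed with a small simulation. It is a deliberately simplified sketch: a single layer, time measured in tenths of a vsync period, SF frame k waking at k vsync periods, and the half-period gate applied to the distance between the app's target SF frame and the waking one. ToyAppFrame, sfFramesThatLatch and the finish times are hypothetical values chosen to match the bullets, not measurements from the Douyin trace.

```cpp
#include <cstddef>
#include <vector>

struct ToyAppFrame {
    int targetSf;          // index of the SF frame this app frame is predicted to land on
    int finishedAtTenths;  // when the buffer was queued, in tenths of a vsync period
};

// Returns the indices of the SF frames that latch a new buffer.
std::vector<int> sfFramesThatLatch(const std::vector<ToyAppFrame>& frames, int sfCount,
                                   bool gateEarlyFrames) {
    std::vector<int> latched;
    std::size_t next = 0;  // oldest app frame that has not been latched yet
    for (int k = 0; k < sfCount && next < frames.size(); ++k) {
        const int wake = k * 10;  // sf-k wakes at k vsync periods
        const ToyAppFrame& f = frames[next];
        const bool queued = f.finishedAtTenths <= wake;
        // Same shape as frameIsEarly: the target is at least half a period ahead.
        const bool early = (f.targetSf * 10 - wake) >= 5;
        if (queued && !(gateEarlyFrames && early)) {
            latched.push_back(k);
            ++next;
        }
    }
    return latched;
}
```

With frames {app-1 queued at -0.2, app-3 queued at 2.1, app-5 queued at 3.8} and six SF frames, the ungated run latches on sf-0, sf-3, sf-4, while the gated run latches on sf-1, sf-3, sf-5, restoring the even two-vsync cadence.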
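When it occurs
It typically shows up in two situations:
- SW-VSYNC is self-correcting, for example during a frame rate switch or when an input event arrives
- The previous frame was not presented. There are many possible causes: the frame before it was dropped, the app itself did not submit or draw a frame, or the system prevented TimerDispatch from firing app-vsync
SW-VSYNC is self-correcting
Take frameIsEarly/Bilibili_round7 as an example.
SW-VSYNC is still self-correcting here, so its timestamps and period are inconsistent. As a result, the expected composition start time in the app's Expected Timeline (endTime) and the one in SF's Expected Timeline (startTime) do not line up. This is the case analyzed above.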

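Previous frame was not presented
There are many possible causes: an earlier frame was dropped, the app itself did not submit or draw a frame, or a system issue prevented TimerDispatch from firing app-vsync.
For an earlier dropped frame causing the next frame not to be presented, see frameIsEarly/home.perfetto-trace as an example.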

For the case where a system issue prevents TimerDispatch from firing app-vsync, see the note "Buffer present but not latched - app does not draw".