Closed Bug 1076166 Opened 10 years ago Closed 10 years ago

GLContextEGL::SwapBuffers() blocks on Flame Kit-Kat on Master, but not on Silk Branch

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mchang, Assigned: mchang)

Details

Attachments

(3 files)

Silk Rebased on:
Gecko: 9e83c932d26ab77dc0684f4bc1a96a0f4d53539e
Gaia: 8f452cbfb9049ffab7adc5c9283d373fbbbedafd

On a Flame device, on Kit-Kat, I instrumented how long it takes to swapBuffers on a simple box animation here - http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp?from=CompositorOGL.cpp&case=true#1355

On Master, we block until the next vsync occurs. We block for ~10ms on master. On the Silk Branch, where we composite after a notification from the hardware composer, I see that we block on swapBuffers for ~2-3ms. Investigate why this happens.

Sotaro, any ideas? Thanks!
Flags: needinfo?(sotaro.ikeda.g)
I've been investigating and most of the time spent is somewhere in the hwc.cpp hardware composer driver. I can't find the exact replica of what we have in b2g-flame-kk, but it's in b2g-flame/hardware/qcom/display/libhwcomposer/hwc.cpp. The mostly equivalent is here - https://android.googlesource.com/platform/hardware/qcom/display/+/master/msm8960/libhwcomposer/hwc.cpp:224. Most of the time is spent in hwc_prepare_primary on mozilla-central master. For example, if we spent 11 ms in swap buffers, 9ms goes to hwc_prepare_primary. Still have to dig some more.
mchang, I do not at all about silk branch. Can you provide more information?

- Do both branchs uses same flame-kk Base image?
- How do you measure swapBuffers time?
- How do you know "On Master, we block until the next vsync occurs".
- Did you already checked b2g-v2.1 and b2g-v2.0?
- When you measure the time on master, did you enable vsync callback?
- Can you upload actual measure results?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(mchang)
(In reply to Sotaro Ikeda [:sotaro] from comment #2)
> mchang, I do not at all about silk branch. Can you provide more information?
> 
> - Do both branchs uses same flame-kk Base image?

Yes, both use the flame-kk v.180 base image.

> - How do you measure swapBuffers time?

I measured it here - http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp?from=CompositorOGL.cpp&case=true#1355. I got the mozilla::TimeStamp before and after the call to SwapBuffers. Most of the time is spent b2g-flame-kk/hardware/qcom/display/libhwcomposer/hwc.cpp::hwc_prepare_primary. I still have to dig more into that function.

> - How do you know "On Master, we block until the next vsync occurs".

After every vsync callback here - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2d.cpp&case=true#234 - I store the vsync timestamp as a mozilla::TimeStamp that is synced with the android vsync timestamp. At CompositorOGL.cpp::SwapBuffers, I compare the end timestamp to the vsync timestamp, and it is always ~17ms intervals. The CompositorOGL.cpp::SwapBuffer call usually takes ~8-10ms on master. On silk's branch, which is based off last Thursday's master branch, usually only blocks for 2-3 ms.

> - Did you already checked b2g-v2.1 and b2g-v2.0?

No, not yet. I can do that.

> - When you measure the time on master, did you enable vsync callback?

Yes. The vsync callback is enabled on both silk / master.

> - Can you upload actual measure results?

A lot of it is manual instrumentation that looks like this:

// One frame. SwapBuffer duration is in CompositorOGL. Vsync start is ms from when the vsync started. end is from after swpa buffers.
I/Gecko   ( 2848): SwapBuffer duration: 9.475313, vsync start 8.526581, end 18.001894

// UpdateFBSurface is measured right after here - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#743
I/Gecko   ( 2848): HwcComposer::UpdateFBSurface 0.868021

// Measured here - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#751
I/Gecko   ( 2848): HwcComposer::BeforePrepare 0.936198, mPrepared 0
I/Gecko   ( 2848): Calling mwcPrepare: 0.002657

// This is in QC's driver at b2g-flame/hardware/qcom/display/libhwcomposer/hwc.cpp::hwc_prepare. Each instrumentation here is in the hwc_prepare loop.
E/qdhwcomposer( 2848): HardwareComposer::lockTime: 0.000000
E/qdhwcomposer( 2848): HardwareComposer::VirtualDisplay: 0.000000
E/qdhwcomposer( 2848): HardwareComposer::ExternalDisplay : 0.000000
E/qdhwcomposer( 2848): HardwareComposer::PrimaryDisplay: 7.000000
E/qdhwcomposer( 2848): HardwareComposer::LoopTime : 7.000000
E/qdhwcomposer( 2848): HardwareComposer::hwc_prepare: 8.000000, num displays: 3

// Backup to widget/gonk/HwcComposer2D.cpp
I/Gecko   ( 2848): After mwcPrepare: 8.059688

// Before Commit() - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#769
I/Gecko   ( 2848): HwcComposer::Before 9.056979
I/Gecko   ( 2848): HwcComposer::SetFBRelease 9.579427

http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#772 - At the end of render
I/Gecko   ( 2848): HwcComposer::Render 9.723229
// End CompositorOGL::SwapFrames()


// Next Frame
I/Gecko   ( 2848): SwapBuffer duration: 9.752396, vsync start 8.248623, end 18.001019
I/Gecko   ( 2848): HwcComposer::UpdateFBSurface 0.855105
I/Gecko   ( 2848): HwcComposer::BeforePrepare 0.917188, mPrepared 0
I/Gecko   ( 2848): Calling mwcPrepare: 0.002865
E/qdhwcomposer( 2848): HardwareComposer::lockTime: 0.000000
E/qdhwcomposer( 2848): HardwareComposer::VirtualDisplay: 0.000000
E/qdhwcomposer( 2848): HardwareComposer::ExternalDisplay : 0.000000
E/qdhwcomposer( 2848): HardwareComposer::PrimaryDisplay: 5.000000
E/qdhwcomposer( 2848): HardwareComposer::LoopTime : 6.000000
E/qdhwcomposer( 2848): HardwareComposer::hwc_prepare: 6.000000, num displays: 3
I/Gecko   ( 2848): After mwcPrepare: 6.096823
I/Gecko   ( 2848): HwcComposer::Before 7.058594
I/Gecko   ( 2848): HwcComposer::SetFBRelease 7.638751
I/Gecko   ( 2848): HwcComposer::Render 7.787344
I/Gecko   ( 2848): SwapBuffer duration: 7.823021, vsync start 9.970811, end 17.793832
Flags: needinfo?(mchang)
This bug's tile could easy to misunderstand, it seems better to change the name. Comment 0 says GLContextEGL::SwapBuffers() take time. Actual eglSwapBuffers() is called in GonkDisplayJB::UpdateFBSurface(). And from Comment 3, it does not take time.

http://mxr.mozilla.org/mozilla-central/source/widget/gonk/libdisplay/GonkDisplayJB.cpp#297
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
> This bug's tile could easy to misunderstand, it seems better to change the
> name. Comment 0 says GLContextEGL::SwapBuffers() take time. Actual
> eglSwapBuffers() is called in GonkDisplayJB::UpdateFBSurface(). And from
> Comment 3, it does not take time.
> 
> http://mxr.mozilla.org/mozilla-central/source/widget/gonk/libdisplay/
> GonkDisplayJB.cpp#297

The bug titile looks like eglSwapBuffers() seems taking time.
(In reply to Mason Chang [:mchang] from comment #0)
> Silk Rebased on:
> Gecko: 9e83c932d26ab77dc0684f4bc1a96a0f4d53539e
> Gaia: 8f452cbfb9049ffab7adc5c9283d373fbbbedafd
> 
> On a Flame device, on Kit-Kat, I instrumented how long it takes to
> swapBuffers on a simple box animation here -
> http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/
> CompositorOGL.cpp?from=CompositorOGL.cpp&case=true#1355
> 
> On Master, we block until the next vsync occurs. We block for ~10ms on
> master. On the Silk Branch, where we composite after a notification from the
> hardware composer, I see that we block on swapBuffers for ~2-3ms.
> Investigate why this happens.
> 
> Sotaro, any ideas? Thanks!

Isn't it blocked by vsync? eglSwapBuffers() is not blocked by vsync, but HwComposer could be blocked by vsync. As in comment 4, GLContextEGL::SwapBuffers() and eglSwapBuffers() are different. eglSwapBuffers() is only OpenGL. GLContextEGL::SwapBuffers() is (OpenGL + HwComposer).
Summary: glSwapBuffers blocks on Flame Kit-Kat on Master, but not on Silk Branch → GLContextEGL::SwapBuffers() blocks on Flame Kit-Kat on Master, but not on Silk Branch
Almost all the time is spent in a driver call, not the fences on master. This occurs in b2g-flame/hardware/qcom/display/liboverlay/overlayMdp.h:176, which does ioctl(fd, MSMFB_OVERLAY_PREPARE, &list) < 0), which is a call into the driver. On silk branches, where we compose right after a vsync, we have the following timings from CompositorOGL:

I/Gecko   ( 1237): SwapBuffer duration: 3.368594, vsync start 2.337685, end 5.706279
I/Gecko   ( 1237): HwcComposer::UpdateFBSurface 1.547344
I/Gecko   ( 1237): HwcComposer::BeforePrepare 1.667344, mPrepared 0
I/Gecko   ( 1237): Calling mwcPrepare: 0.005052
E/qdhwcomposer( 1237): HardwareComposer::lockTime: 0.000000
E/qdhwcomposer( 1237): HardwareComposer::VirtualDisplay: 0.000000
E/qdhwcomposer( 1237): HardwareComposer::ExternalDisplay : 0.000000
E/qdhwcomposer( 1237): hwc_prepare_primary middle: 0.000000
E/qdhwcomposer( 1237): hwc_prepare_only: 0.000000
E/qdhwcomposer( 1237): FBUpdateNonSplit::prepare: 0.000000
E/qdoverlay( 1237): mdp_wraper::validateAndSet: 0.000000, count: 1
E/qdoverlay( 1237): mdPCtrl::ValidateAndSet total: 0.000000
E/qdoverlay( 1237): Overlay::ValidateAndSet 0.000000
E/qdhwcomposer( 1237): FB::prepareAndValidate: middle: 0.000000, end: 0.000000
E/qdhwcomposer( 1237): hwc_prepare_validate: 0.000000
E/qdhwcomposer( 1237): hwc_copy_prepare: 0.000000
E/qdhwcomposer( 1237): HardwareComposer::PrimaryDisplay: 0.000000
E/qdhwcomposer( 1237): HardwareComposer::LoopTime : 0.000000
E/qdhwcomposer( 1237): HardwareComposer::hwc_prepare: 0.000000, num displays: 3
I/Gecko   ( 1237): After mwcPrepare: 0.511770
I/Gecko   ( 1237): HwcComposer::Before 2.238334
I/Gecko   ( 1237): HwcComposer::SetFBRelease 2.813854
I/Gecko   ( 1237): HwcComposer::Render 2.971823

On Master:
I/Gecko   ( 1728): SwapBuffer duration: 6.527500, vsync start 11.818349, end 18.345849
I/Gecko   ( 1728): HwcComposer::UpdateFBSurface 0.930781
I/Gecko   ( 1728): HwcComposer::BeforePrepare 1.002812, mPrepared 0
I/Gecko   ( 1728): Calling mwcPrepare: 0.002396
E/qdhwcomposer( 1728): HardwareComposer::lockTime: 0.000000
E/qdhwcomposer( 1728): HardwareComposer::VirtualDisplay: 0.000000
E/qdhwcomposer( 1728): HardwareComposer::ExternalDisplay : 0.000000
E/qdhwcomposer( 1728): hwc_prepare_primary middle: 0.000000
E/qdhwcomposer( 1728): hwc_prepare_only: 0.000000
E/qdhwcomposer( 1728): FBUpdateNonSplit::prepare: 0.000000
E/qdoverlay( 1728): mdp_wraper::validateAndSet: 3.000000, count: 1
E/qdoverlay( 1728): mdPCtrl::ValidateAndSet total: 3.000000
E/qdoverlay( 1728): Overlay::ValidateAndSet 3.000000
E/qdhwcomposer( 1728): FB::prepareAndValidate: middle: 0.000000, end: 3.000000
E/qdhwcomposer( 1728): hwc_prepare_validate: 4.000000
E/qdhwcomposer( 1728): hwc_copy_prepare: 4.000000
E/qdhwcomposer( 1728): HardwareComposer::PrimaryDisplay: 4.000000
E/qdhwcomposer( 1728): HardwareComposer::LoopTime : 4.000000
E/qdhwcomposer( 1728): HardwareComposer::hwc_prepare: 4.000000, num displays: 3
I/Gecko   ( 1728): After mwcPrepare: 4.235833
I/Gecko   ( 1728): HwcComposer::Before 5.284062
I/Gecko   ( 1728): HwcComposer::SetFBRelease 5.826302
I/Gecko   ( 1728): HwcComposer::Render 5.959947

Explanations for where these are instrumented are in comment 3. I wonder if since we start composites on master close to vsync time (see 11.8 from the previous vsync or -4.8ms from the next one), its too close to vsync time that the driver has to lock its frame buffers. Digging.
Driver timings in master:

Silk:

The problem is that every time we composite, the driver has to grab a lock here - b2g-flame-kk/kernel/drivers/video/msm/mdss/mdp3_ctrl.c:1624. (mdp3_overlay_prepare). On silk, grabbing this lock is fast, usually 10-20 microseconds. On master, this lock takes ~4-6 milliseconds. This session lock is locked / unlocks at many places in the driver. I can't explicitly find out if we're locking on vsyncs as the driver just initializes a vsync callback handler, but this lock gets locked everywhere in the driver, for example grabbing an overlay. This lock is also locked whenever we change vsync state, such as enabling / disabling vsync. 

I don't want to keep digging into a hole, but it seems like it's fast to grab these locks after a vsync as the display shouldn't be doing anything. If we try to grab the lock near the vsync timer, we have some contention that doesn't exist right after a vsync.
Looks like we found the issue is located in the driver from comment 8. Closing for now.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: