Closed Bug 1076166 Opened 11 years ago Closed 11 years ago

GLContextEGL::SwapBuffers() blocks on Flame Kit-Kat on Master, but not on Silk Branch

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mchang, Assigned: mchang)

Details

Attachments

(3 files)

Silk Rebased on: Gecko: 9e83c932d26ab77dc0684f4bc1a96a0f4d53539e Gaia: 8f452cbfb9049ffab7adc5c9283d373fbbbedafd On a Flame device, on Kit-Kat, I instrumented how long it takes to swapBuffers on a simple box animation here - http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp?from=CompositorOGL.cpp&case=true#1355 On Master, we block until the next vsync occurs. We block for ~10ms on master. On the Silk Branch, where we composite after a notification from the hardware composer, I see that we block on swapBuffers for ~2-3ms. Investigate why this happens. Sotaro, any ideas? Thanks!
Flags: needinfo?(sotaro.ikeda.g)
I've been investigating and most of the time spent is somewhere in the hwc.cpp hardware composer driver. I can't find the exact replica of what we have in b2g-flame-kk, but it's in b2g-flame/hardware/qcom/display/libhwcomposer/hwc.cpp. The mostly equivalent is here - https://android.googlesource.com/platform/hardware/qcom/display/+/master/msm8960/libhwcomposer/hwc.cpp:224. Most of the time is spent in hwc_prepare_primary on mozilla-central master. For example, if we spent 11 ms in swap buffers, 9ms goes to hwc_prepare_primary. Still have to dig some more.
mchang, I do not at all about silk branch. Can you provide more information? - Do both branchs uses same flame-kk Base image? - How do you measure swapBuffers time? - How do you know "On Master, we block until the next vsync occurs". - Did you already checked b2g-v2.1 and b2g-v2.0? - When you measure the time on master, did you enable vsync callback? - Can you upload actual measure results?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(mchang)
(In reply to Sotaro Ikeda [:sotaro] from comment #2) > mchang, I do not at all about silk branch. Can you provide more information? > > - Do both branchs uses same flame-kk Base image? Yes, both use the flame-kk v.180 base image. > - How do you measure swapBuffers time? I measured it here - http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp?from=CompositorOGL.cpp&case=true#1355. I got the mozilla::TimeStamp before and after the call to SwapBuffers. Most of the time is spent b2g-flame-kk/hardware/qcom/display/libhwcomposer/hwc.cpp::hwc_prepare_primary. I still have to dig more into that function. > - How do you know "On Master, we block until the next vsync occurs". After every vsync callback here - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2d.cpp&case=true#234 - I store the vsync timestamp as a mozilla::TimeStamp that is synced with the android vsync timestamp. At CompositorOGL.cpp::SwapBuffers, I compare the end timestamp to the vsync timestamp, and it is always ~17ms intervals. The CompositorOGL.cpp::SwapBuffer call usually takes ~8-10ms on master. On silk's branch, which is based off last Thursday's master branch, usually only blocks for 2-3 ms. > - Did you already checked b2g-v2.1 and b2g-v2.0? No, not yet. I can do that. > - When you measure the time on master, did you enable vsync callback? Yes. The vsync callback is enabled on both silk / master. > - Can you upload actual measure results? A lot of it is manual instrumentation that looks like this: // One frame. SwapBuffer duration is in CompositorOGL. Vsync start is ms from when the vsync started. end is from after swpa buffers. I/Gecko ( 2848): SwapBuffer duration: 9.475313, vsync start 8.526581, end 18.001894 // UpdateFBSurface is measured right after here - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#743 I/Gecko ( 2848): HwcComposer::UpdateFBSurface 0.868021 // Measured here - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#751 I/Gecko ( 2848): HwcComposer::BeforePrepare 0.936198, mPrepared 0 I/Gecko ( 2848): Calling mwcPrepare: 0.002657 // This is in QC's driver at b2g-flame/hardware/qcom/display/libhwcomposer/hwc.cpp::hwc_prepare. Each instrumentation here is in the hwc_prepare loop. E/qdhwcomposer( 2848): HardwareComposer::lockTime: 0.000000 E/qdhwcomposer( 2848): HardwareComposer::VirtualDisplay: 0.000000 E/qdhwcomposer( 2848): HardwareComposer::ExternalDisplay : 0.000000 E/qdhwcomposer( 2848): HardwareComposer::PrimaryDisplay: 7.000000 E/qdhwcomposer( 2848): HardwareComposer::LoopTime : 7.000000 E/qdhwcomposer( 2848): HardwareComposer::hwc_prepare: 8.000000, num displays: 3 // Backup to widget/gonk/HwcComposer2D.cpp I/Gecko ( 2848): After mwcPrepare: 8.059688 // Before Commit() - http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#769 I/Gecko ( 2848): HwcComposer::Before 9.056979 I/Gecko ( 2848): HwcComposer::SetFBRelease 9.579427 http://dxr.mozilla.org/mozilla-central/source/widget/gonk/HwcComposer2D.cpp?from=HwcComposer2D.cpp&case=true#772 - At the end of render I/Gecko ( 2848): HwcComposer::Render 9.723229 // End CompositorOGL::SwapFrames() // Next Frame I/Gecko ( 2848): SwapBuffer duration: 9.752396, vsync start 8.248623, end 18.001019 I/Gecko ( 2848): HwcComposer::UpdateFBSurface 0.855105 I/Gecko ( 2848): HwcComposer::BeforePrepare 0.917188, mPrepared 0 I/Gecko ( 2848): Calling mwcPrepare: 0.002865 E/qdhwcomposer( 2848): HardwareComposer::lockTime: 0.000000 E/qdhwcomposer( 2848): HardwareComposer::VirtualDisplay: 0.000000 E/qdhwcomposer( 2848): HardwareComposer::ExternalDisplay : 0.000000 E/qdhwcomposer( 2848): HardwareComposer::PrimaryDisplay: 5.000000 E/qdhwcomposer( 2848): HardwareComposer::LoopTime : 6.000000 E/qdhwcomposer( 2848): HardwareComposer::hwc_prepare: 6.000000, num displays: 3 I/Gecko ( 2848): After mwcPrepare: 6.096823 I/Gecko ( 2848): HwcComposer::Before 7.058594 I/Gecko ( 2848): HwcComposer::SetFBRelease 7.638751 I/Gecko ( 2848): HwcComposer::Render 7.787344 I/Gecko ( 2848): SwapBuffer duration: 7.823021, vsync start 9.970811, end 17.793832
Flags: needinfo?(mchang)
This bug's tile could easy to misunderstand, it seems better to change the name. Comment 0 says GLContextEGL::SwapBuffers() take time. Actual eglSwapBuffers() is called in GonkDisplayJB::UpdateFBSurface(). And from Comment 3, it does not take time. http://mxr.mozilla.org/mozilla-central/source/widget/gonk/libdisplay/GonkDisplayJB.cpp#297
(In reply to Sotaro Ikeda [:sotaro] from comment #4) > This bug's tile could easy to misunderstand, it seems better to change the > name. Comment 0 says GLContextEGL::SwapBuffers() take time. Actual > eglSwapBuffers() is called in GonkDisplayJB::UpdateFBSurface(). And from > Comment 3, it does not take time. > > http://mxr.mozilla.org/mozilla-central/source/widget/gonk/libdisplay/ > GonkDisplayJB.cpp#297 The bug titile looks like eglSwapBuffers() seems taking time.
(In reply to Mason Chang [:mchang] from comment #0) > Silk Rebased on: > Gecko: 9e83c932d26ab77dc0684f4bc1a96a0f4d53539e > Gaia: 8f452cbfb9049ffab7adc5c9283d373fbbbedafd > > On a Flame device, on Kit-Kat, I instrumented how long it takes to > swapBuffers on a simple box animation here - > http://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/ > CompositorOGL.cpp?from=CompositorOGL.cpp&case=true#1355 > > On Master, we block until the next vsync occurs. We block for ~10ms on > master. On the Silk Branch, where we composite after a notification from the > hardware composer, I see that we block on swapBuffers for ~2-3ms. > Investigate why this happens. > > Sotaro, any ideas? Thanks! Isn't it blocked by vsync? eglSwapBuffers() is not blocked by vsync, but HwComposer could be blocked by vsync. As in comment 4, GLContextEGL::SwapBuffers() and eglSwapBuffers() are different. eglSwapBuffers() is only OpenGL. GLContextEGL::SwapBuffers() is (OpenGL + HwComposer).
Summary: glSwapBuffers blocks on Flame Kit-Kat on Master, but not on Silk Branch → GLContextEGL::SwapBuffers() blocks on Flame Kit-Kat on Master, but not on Silk Branch
Almost all the time is spent in a driver call, not the fences on master. This occurs in b2g-flame/hardware/qcom/display/liboverlay/overlayMdp.h:176, which does ioctl(fd, MSMFB_OVERLAY_PREPARE, &list) < 0), which is a call into the driver. On silk branches, where we compose right after a vsync, we have the following timings from CompositorOGL: I/Gecko ( 1237): SwapBuffer duration: 3.368594, vsync start 2.337685, end 5.706279 I/Gecko ( 1237): HwcComposer::UpdateFBSurface 1.547344 I/Gecko ( 1237): HwcComposer::BeforePrepare 1.667344, mPrepared 0 I/Gecko ( 1237): Calling mwcPrepare: 0.005052 E/qdhwcomposer( 1237): HardwareComposer::lockTime: 0.000000 E/qdhwcomposer( 1237): HardwareComposer::VirtualDisplay: 0.000000 E/qdhwcomposer( 1237): HardwareComposer::ExternalDisplay : 0.000000 E/qdhwcomposer( 1237): hwc_prepare_primary middle: 0.000000 E/qdhwcomposer( 1237): hwc_prepare_only: 0.000000 E/qdhwcomposer( 1237): FBUpdateNonSplit::prepare: 0.000000 E/qdoverlay( 1237): mdp_wraper::validateAndSet: 0.000000, count: 1 E/qdoverlay( 1237): mdPCtrl::ValidateAndSet total: 0.000000 E/qdoverlay( 1237): Overlay::ValidateAndSet 0.000000 E/qdhwcomposer( 1237): FB::prepareAndValidate: middle: 0.000000, end: 0.000000 E/qdhwcomposer( 1237): hwc_prepare_validate: 0.000000 E/qdhwcomposer( 1237): hwc_copy_prepare: 0.000000 E/qdhwcomposer( 1237): HardwareComposer::PrimaryDisplay: 0.000000 E/qdhwcomposer( 1237): HardwareComposer::LoopTime : 0.000000 E/qdhwcomposer( 1237): HardwareComposer::hwc_prepare: 0.000000, num displays: 3 I/Gecko ( 1237): After mwcPrepare: 0.511770 I/Gecko ( 1237): HwcComposer::Before 2.238334 I/Gecko ( 1237): HwcComposer::SetFBRelease 2.813854 I/Gecko ( 1237): HwcComposer::Render 2.971823 On Master: I/Gecko ( 1728): SwapBuffer duration: 6.527500, vsync start 11.818349, end 18.345849 I/Gecko ( 1728): HwcComposer::UpdateFBSurface 0.930781 I/Gecko ( 1728): HwcComposer::BeforePrepare 1.002812, mPrepared 0 I/Gecko ( 1728): Calling mwcPrepare: 0.002396 E/qdhwcomposer( 1728): HardwareComposer::lockTime: 0.000000 E/qdhwcomposer( 1728): HardwareComposer::VirtualDisplay: 0.000000 E/qdhwcomposer( 1728): HardwareComposer::ExternalDisplay : 0.000000 E/qdhwcomposer( 1728): hwc_prepare_primary middle: 0.000000 E/qdhwcomposer( 1728): hwc_prepare_only: 0.000000 E/qdhwcomposer( 1728): FBUpdateNonSplit::prepare: 0.000000 E/qdoverlay( 1728): mdp_wraper::validateAndSet: 3.000000, count: 1 E/qdoverlay( 1728): mdPCtrl::ValidateAndSet total: 3.000000 E/qdoverlay( 1728): Overlay::ValidateAndSet 3.000000 E/qdhwcomposer( 1728): FB::prepareAndValidate: middle: 0.000000, end: 3.000000 E/qdhwcomposer( 1728): hwc_prepare_validate: 4.000000 E/qdhwcomposer( 1728): hwc_copy_prepare: 4.000000 E/qdhwcomposer( 1728): HardwareComposer::PrimaryDisplay: 4.000000 E/qdhwcomposer( 1728): HardwareComposer::LoopTime : 4.000000 E/qdhwcomposer( 1728): HardwareComposer::hwc_prepare: 4.000000, num displays: 3 I/Gecko ( 1728): After mwcPrepare: 4.235833 I/Gecko ( 1728): HwcComposer::Before 5.284062 I/Gecko ( 1728): HwcComposer::SetFBRelease 5.826302 I/Gecko ( 1728): HwcComposer::Render 5.959947 Explanations for where these are instrumented are in comment 3. I wonder if since we start composites on master close to vsync time (see 11.8 from the previous vsync or -4.8ms from the next one), its too close to vsync time that the driver has to lock its frame buffers. Digging.
Driver timings in master: Silk: The problem is that every time we composite, the driver has to grab a lock here - b2g-flame-kk/kernel/drivers/video/msm/mdss/mdp3_ctrl.c:1624. (mdp3_overlay_prepare). On silk, grabbing this lock is fast, usually 10-20 microseconds. On master, this lock takes ~4-6 milliseconds. This session lock is locked / unlocks at many places in the driver. I can't explicitly find out if we're locking on vsyncs as the driver just initializes a vsync callback handler, but this lock gets locked everywhere in the driver, for example grabbing an overlay. This lock is also locked whenever we change vsync state, such as enabling / disabling vsync. I don't want to keep digging into a hole, but it seems like it's fast to grab these locks after a vsync as the display shouldn't be doing anything. If we try to grab the lock near the vsync timer, we have some contention that doesn't exist right after a vsync.
Looks like we found the issue is located in the driver from comment 8. Closing for now.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: