April 15, 2013

Best articles about Blink rendering engine according to me

It is now over a week since announcement of Blink, a rendering engine for the Chromium project.

I hope it could be useful to provide links to the best articles about it, which have good, technical contents.

Thoughts on Blink from HTML5 Test is a good summary about history of Chrome, WebKit, and puts this recent announcement in context. For even more context (nothing about Blink) you can read Paul Irish's excellent WebKit for Developers post.

Peter-Paul Koch (probably best known for quirksmode.org) has good articles about Blink: Blink and Blinkbait.

I also found it interesting to ready Krzysztof Kowalczyk's Thoughts on Blink.

Highly recommended Google+ posts by Chromium developers:
If you're interested in the technical details or want to participate in the discussions, why not follow blink-dev, the mailing list of the project?

February 9, 2013

Bumpy upgrades: udev-171 -> udev-197, iptables-1.4.13 -> iptables-1.4.16.3

I guess many people may hit similar problems, so here is my experience of the upgrades. Generally it was pretty smooth, but required paying attention to the details and some documentation/forums lookups.

udev-171 -> udev-197 upgrade

  1. Make sure you have CONFIG_DEVTMPFS=y in kernel .config, otherwise the system becomes unbootable for sure (I think the error message during boot mentions that config option, which is good).
  2. The ebuild also asks for CONFIG_BLK_DEV_BSG=y, not sure if that's strictly needed but I'm including it here for completeness.
  3. Things work fine for me without DEVTMPFS_MOUNT. I haven't tried with it enabled, I guess it's optional.
  4. I do not have a split /usr. YMMV then if you do.
  5. Make sure to run "rc-update del udev-postmount".
  6. Expect network device names to change (I guess this is a non-issue for systems with a single network card). This can really mess up things in quite surprising ways. It seems /etc/udev/rules.d/70-persistent-net.rules no longer works (bug #453494). Note that the "new way" to do the same thing (http://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames) is disabled by default in Gentoo (see /etc/udev/rules.d/80-net-name-slot.rules). For now I've adjusted my firewall and other configs, but I think I'll need to figure out the new persistent net naming system.

iptables-1.4.13 -> iptables-1.4.16.3

* Loading iptables state and starting firewall ...
WARNING: The state match is obsolete. Use conntrack instead.
iptables-restore v1.4.16.3: state: option "--state" must be specified

It can be really non-obvious what to do with this one. Change your rules from e.g. "-m state --state RELATED" to "-m conntrack --ctstate RELATED". See http://forums.gentoo.org/viewtopic-t-940302.html for more info.
  Also note that iptables-restore doesn't really provide good error messages, e.g. "iptables-restore: line 48 failed". I didn't find a way to make it say what exactly was wrong (the line in question was just a COMMIT line, it didn't actually identify the real offending line). These mysterious errors are usually caused by missing kernel support for some firewall features/targets.

two upgrades together

Actually what adds to the confusion is having these two upgrades done simultaneously. This makes it harder to identify which upgrade is responsible for which breakage. For an even smoother ride, I'd recommend upgrading iptables first, making sure the updated rules work, and then proceed with udev.

January 28, 2013

State of Chromium Open Source packages

Let me present an informal an unofficial state of Chromium Open Source packages as I see it. Note a possible bias: I'm a Chromium developer (and this post represents my views, not the projects'), and a Gentoo Linux developer (and Chromium package maintenance lead - this is a team effort, and the entire team deserves credit, especially for keeping stable and beta ebuilds up to date).

  1. Gentoo Linux - ships stable, beta and dev channels. Security updates are promptly pushed to stable. NaCl (NativeClient) is enabled, although pNaCl (Portable NaCl) is disabled. Up to 23 use_system_... gyp switches are enabled (depending on USE flags).
  2. Arch Linux - ships stable channel, promptly reacts to security updates. NaCl is enabled, following Gentoo closely - I consider that good, and I'm glad people find that code useful. :) 5 use_system_... gyp switches are enabled. A notable thing is that the PKGBUILD is one of the shortest and simplest among Chromium packages - this seems to follow from The Arch Way. There is also chromium-dev on AUR - it is more heavily based on the Gentoo package, and tracks the upstream dev channel. Uses 19 use_system_... gyp switches.
  3. FreeBSD / OpenBSD - ship stable channel, and are doing pretty well, especially when taking amount of BSD-specific patching into account. NaCl is disabled.
  4. ALT Linux - ships stable channel. NaCl seems to be disabled by default, I'm not sure what's actually shipped in compiled package. Uses 11 use_system_... gyp switches.
  5. Debian - ancient 6.x version in Squeeze, 22.x in sid at the time of this writing. This is two major milestones behind, and is missing security updates. Not recommended at this moment. :( If you are on Debian, my advice is to use Google Chrome, since official debs should work, and monitor state of the open source Chromium package. You can always return to it when it gets updated.
  6. Fedora - not in official repositories, but Tom "spot" Callaway has an unofficial repo. Note: currently the version in that repo is 23.x, one major version behind on stable. Tom wrote an article in 2009 called Chromium: Why it isn't in Fedora yet as a proper package, so there is definitely an interest to get it packaged for Fedora, which I appreciate. Many of the issues he wrote about are now fixed, and I hope to work on getting the remaining ones fixed. Please stay tuned!
This is not intended to be an exhaustive list. I'm aware of openSUSE packages, there seems to be something happening for Ubuntu, and I've heard of Slackware, Pardus, PCLinuxOS and CentOS packaging. I do not follow these closely enough though to provide a meaningful "review".

Some conclusions: different distros package Chromium differently. Pay attention to the packaging lag: with about 6 weeks upstream release cycle and each major update being a security one, this matters. Support for NativeClient is another point. There are extension and Web Store apps that use it, and when more and more sites start to use it, this will become increasingly important. Then it is interesting why on some distros some bundled libraries are used even though upstream provides an option to use a system library that is known to work on other distros.

Finally, I like how different maintainers look at each other's packages, and how patches and bugs are frequently being sent upstream.

January 4, 2013

Signal handler safety, re-entering malloc

This is a story from real-world development. From signal(7):


   Async-signal-safe functions
       A  signal  handler  function must be very careful,
       since processing elsewhere may be interrupted at some
       arbitrary point in the execution of the program.
       POSIX has the concept of "safe function".  If a signal
       interrupts the execution of an  unsafe  function,
       and handler calls an unsafe function, then the behavior
       of the program is undefined.

After that a list of safe functions follows, and one notable things is that malloc and free are async-signal-unsafe!

I hit this issue while enabling tcmalloc's debugallocation for Chromium Debug builds. We have a StackDumpSignalHandler for tests, which prints a stack trace on various crashing signals for easier debugging. It's very useful, and worked fine for a pretty long while (which means that "but it works!" is not a valid argument for doing unsafe things).

Now when I enabled debugallocation, I noticed hangs triggered by the stack trace display. In one example, this stack trace:

@0  0x00000000019c6c85 in tcmalloc::Abort () at third_party/tcmalloc/chromium/src/base/abort.cc:15
@1  0x00000000019b39c1 in LogPrintf (severity=-4,
    pat=0x32aeb18 "memory allocation/deallocation mismatch at %p: allocated with %s being deallocated with %s", ap=0x7fff52c379e8)
    at third_party/tcmalloc/chromium/src/base/logging.h:210
@2  0x00000000019b3a8b in RAW_LOG (lvl=-4,
    pat=0x32aeb18 "memory allocation/deallocation mismatch at %p: allocated with %s being deallocated with %s")
    at third_party/tcmalloc/chromium/src/base/logging.h:230
@3  0x00000000019c3fb1 in MallocBlock::CheckLocked (this=0x7fd18f143400, type=-21308287)
    at ./third_party/tcmalloc/chromium/src/debugallocation.cc:461
@4  0x00000000019c3c42 in MallocBlock::CheckAndClear (this=0x7fd18f143400, type=-21308287)
    at ./third_party/tcmalloc/chromium/src/debugallocation.cc:401
@5  0x00000000019c436a in MallocBlock::Deallocate (this=0x7fd18f143400, type=-21308287)
    at ./third_party/tcmalloc/chromium/src/debugallocation.cc:557
@6  0x00000000019c1929 in DebugDeallocate (ptr=0x7fd18f143420, type=-21308287)
    at ./third_party/tcmalloc/chromium/src/debugallocation.cc:998
@7  0x00000000028d1482 in tc_delete (p=0x7fd18f143420) at ./third_party/tcmalloc/chromium/src/debugallocation.cc:1232
@8  0x000000000097dc04 in cc::ResourceProvider::deleteResourceInternal (this=0x7fd191827da0, it=...) at cc/resource_provider.cc:242
@9  0x000000000097daaf in cc::ResourceProvider::deleteResource (this=0x7fd191827da0, id=1) at cc/resource_provider.cc:230
@10 0x00000000006f9824 in (anonymous namespace)::ResourceProviderTest_Basic_Test::TestBody (this=0x7fd18dc5abf0)
    at cc/resource_provider_unittest.cc:328
@11 0x00000000008ec801 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x7fd18dc5abf0,
    method=&virtual testing::Test::TestBody(), location=0x29463ab "the test body") at testing/gtest/src/gtest.cc:2071
@12 0x00000000008e9665 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x7fd18dc5abf0,
    method=&virtual testing::Test::TestBody(), location=0x29463ab "the test body") at testing/gtest/src/gtest.cc:2123
@13 0x00000000008dee0d in testing::Test::Run (this=0x7fd18dc5abf0) at testing/gtest/src/gtest.cc:2143
@14 0x00000000008df3ea in testing::TestInfo::Run (this=0x7fd191823020) at testing/gtest/src/gtest.cc:2319
@15 0x00000000008df8dc in testing::TestCase::Run (this=0x7fd19181f0d0) at testing/gtest/src/gtest.cc:2426
@16 0x00000000008e3eea in testing::internal::UnitTestImpl::RunAllTests (this=0x7fd19829dd60) at testing/gtest/src/gtest.cc:4249

generates SIGSEGV (tcmalloc::Abort). This is just debugallocation having stricter checks about usage of dynamically allocated memory. Now the StackDumpSignalHandler kicks in, and internally calls malloc. But we're already inside malloc code as you can see on the above stack trace (see frame @7, bold font), and re-entering it tries to take locks that are already held, resulting in a hang.

The fix required several changes:
  • no dynamic memory, and that includes std::string and std::vector, which use it internally
  • no buffered stdio or iostreams, they are not async-signal-safe (that includes fflush)
  • custom code for number-to-string conversion that doesn't need dynamically allocated memory (snprintf is not on the list of safe functions as of POSIX.1-2008; it seems to work on a glibc-2.15-based system, but as said before this is not a good assumption to make); in this code I've named it itoa_r, and it supports both base-10 and base-16 conversions, and also negative numbers for base-10
  • warming up backtrace(3): now this is really tricky, and backtrace(3) itself is not whitelisted for being safe; in fact, on the very first call it does some memory allocations; for now I've just added a call to backtrace() from a context that is safe and happens before the signal handler may be executed; implementing backtrace(3) in a known-safe way would be another fun thing to do
Note that for the above, I've also added a unit test that triggers the deadlock scenario. This will hopefully catch cases where calling backtrace(3) leads to trouble.

For more info, feel free to read the articles below:

November 30, 2012

Google Chrome / Chromium : Failed to move to new PID namespace: Cannot allocate memory

If you're seeing a message like "Failed to move to new PID namespace: Cannot allocate memory" when running Chrome, this is actually a problem with the Linux kernel.

For more context, see http://code.google.com/p/chromium/issues/detail?id=110756 . In case you wonder what the fix is, the patch is available at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=976a702ac9eeacea09e588456ab165dc06f9ee83, and it should be in Linux-3.7-rc6.

September 28, 2012

Debugging SELinux file context mismatches

I originally posted the question on gentoo-hardened ML, but Sven Vermeulen advised me to file a bug, so there it is: bug #436474.

The problem I hit is that my ~/.config/chromium/ directory should have unconfined_u:object_r:chromium_xdg_config_t context, but it has unconfined_u:object_r:xdg_config_home_t instead.

I could manually force the "right" context, but it turned out even removing the directory in question and allowing the browser to re-create it still results in wrong context. Looks like something deeper is broken (maybe just on my system), and fixing the root cause is always better. After all, other people may hit this problem too.

Here is what error messages appear on chromium launch:


$ chromium
[2557:2557:1727940797:ERROR:process_singleton_linux.cc(263)] Failed to
create /home/ph/.config/chromium/SingletonLock: Permission denied
[2557:2557:1727941544:ERROR:chrome_browser_main.cc(1552)] Failed to
create a ProcessSingleton for your profile directory. This means that
running multiple instances would start multiple browser processes rather
than opening a new window in the existing process. Aborting now to avoid
profile corruption.

And SELinux messages:

# audit2allow -d
#============= chromium_t ==============
allow chromium_t xdg_config_home_t:file create;
allow chromium_t xdg_config_home_t:lnk_file { read create };

[  107.872466] type=1400 audit(1348505952.982:67): avc:  denied  { read
} for  pid=2166 comm="chrome" name="SingletonLock" dev="sda1" ino=522327
scontext=unconfined_u:unconfined_r:chromium_t
tcontext=unconfined_u:object_r:xdg_config_home_t tclass=lnk_file
[  107.873916] type=1400 audit(1348505952.983:68): avc:  denied  {
create } for  pid=2178 comm="Chrome_FileThre"
name=".org.chromium.Chromium.ZO3dGF"
scontext=unconfined_u:unconfined_r:chromium_t
tcontext=unconfined_u:object_r:xdg_config_home_t tclass=file

If you have any ideas how to further debug it, or how to solve it, please share (e.g. comment on the bug or send me an e-mail). Thanks!

September 20, 2012

Stabilization hiccup with dev-perl/net-server-2.6.0

What happened?

Sep 13th I stabilized net-analyzer/munin-2.0.5-r1 (security bug #412881). I use automated repoman checks and USE="-ipv6", and everything was fine at the time I committed the stabilization (also, see no mention of net-server in that security bug).

Sep 14th Seraphim Mellos filed bug #434978 about munin pulling in ~arch net-server.

Sep 16th x86@ team has been re-added to security bug #412881. Meanwhile Mr_Bones_ pinged me on irc. Also, Diego Elio Pettenò (flameeyes) filed bug #435242 against repoman not catching the dependency problem.

Sep 17th I stabilized dev-perl/net-server-2.6.0 on x86, fixing the immediate problem.

Sep 18th the repoman fix has been released in portage-2.1.11.18 and 2.2.0_alpha129.

Now the only remaining thing to do is pushing the portage/repoman fix to stable. I especially like how quickly the fix for root cause (repoman check) has been produced and released.