Effective Android Profiling with Systrace

Here at Jana, we’re building an Android app with the developing world in mind. As such, we strive to make sure that our users have a great experience, even on devices with more limited processing power. So when we start seeing metrics suggesting that our users are having to wait tens of seconds to start our app, we know we need to get to the bottom of it. But how do you effectively trace call times across an app pushing 150k SLOC?

Any profiling solution would ideally meet two goals:

  • Require minimal, if any, modification to existing code
  • Support varying levels of event granulrity
  • Be fast enough that it can permanently enabled, or be simple to enable when needed and to completely remove when not desired.

If you’re a Python developer, at this point you’re probably feeling pretty smug about decorators – slap an @profile on the usual suspects and watch the timings roll in. In an ideal world, we could take the notion of python decorators and port it to Java – decorate suspect methods or classes with an annotation et voilá! Sadly, Java annotations, while accessible at runtime, don’t allow for the sort of function wrapping decorators provide, and can’t even be used for compile-time code emission, as one might do in a C-style language to automatically insert timing calls before and after functions of interest. While one could theoretically gain access to the CPP’s macro powers by running all their Java code through it, I’m sure not even Rube Goldberg would endorse such a solution. But where there’s reflection there’s a way – if we intercept class instantiation, generate a fake class and then proxy method calls at runtime, the proxy can handle our performance logging then call through to the real method! The advantage here is that instead of altering our codebase to instrument every method, we only have to change the object creation, and still get performance data for every method call on that instance.

What’s a Proxy Anyway?

Before we jump directly into the nitty gritty, let’s take a second to go over about what I’m talking about here when I say ‘proxy’. Invoking a method on an instance of a Java class can be thought of as taking a bundle of arguments and a method name and handing it off (dispatching it) to the instance. The instance then reads the name and the arguments, works out the correct code to execute, then hands a result back to the caller. What we will be doing is introducing a class that masquerades as a target, and intercepts the methods dispatched to it. It can then execute it’s own code, including calling through to the original target, and choose what to return to the original caller. In this case, we won’t be modifying the return values and side effects, but we will be adding some extra logic to record the start and stop time of the original method call.

Bunches of OATs

Java has a convenient set of tools for dealing with proxying method calls as part of the lang.reflect.Proxy library. However, this is only useful if all of the objects you are trying to track have defined Java interfaces that they implement. In order to track any method call on any object, we needed to be able to pull out methods even if they aren’t defined as part of an interface. Luckily, some smart people have already put together dexmaker, which allows us to generate proxy classes with the same methods as our existing code, compile them down to Android DEX or OAT packages and then classload them right back into the runtime. So now, for each of our helper classes we can intercept each method call and log a trace to Android’s system trace buffer. But, there’s a downside here – recompiling half your app every time you load it doesn’t really help your load time, and developers don’t like sitting around. So how do we get this to only run when we want it to?

Enter build flavours

Gradle (for all its flaws) has at least a handful of redeeming features. If you’ve worked on a larger codebase, you may be familiar with one of my favourites: build flavours. Instead of having nasty #ifdefs all through your source code, flavours let you simply have two (or more!) source trees, so that you can keep your debugging or profiling logic when you want it and NOP it out when you don’t. Here’s an example pair of build.gradle files showing how you’d connect up a module you don’t really want compiled in all the time:

That’s all pretty neat, but where’s the profiling?

Now that we have the build framework in place, it’s time to actually start working with systrace.
Systrace pulls it’s data from the system’s ftrace log, which is a kernel level low-overhead event log. In addition to all of the system metrics that get written to this log, such as CPU/GPU usage, view rendering, resource loading and all sorts of other events, we can record our own custom events using the API exposed by the android.os.Trace package. All this information can then be read out using the systrace tool into a rather fetching interactive webpage, which makes it very easy to navigate around your app’s execution timeline and examine each method call and it’s children, as well as any other trace data you may have gathered. Here’s a snippet showing the wrapper class we created for profiling our helper classes, though you can use it for any sort of class. To use it, all you need to do is take your existing class instantiations:

And wrap them:

Of course, if you’re doing this a lot for objects with the same constructor, it
makes sense to just make a convenience method:

So how does the wrapper work? There’s not all that much to it – given the target instance, we pull out it’s class, recompile it if it doesn’t exist in our dex cache1 and then return a new instance of the proxy with the InvocationHandler set to the associated ProfilingWrapper instance. When a method is called on the proxy, it passes the details to the invoke method, which emits the start/stop trace markers and handles calling through to the real method.

So this gets us tracing for a good portion of what happens on our main thread. But, as a responsible Android developer, one should never be doing too much work on the main thread. So how do we keep track of things happening in the background? Enter TraceRunnable. Like the ProfilingWrapper, all this is a simple interface that takes an existing runnable, and wraps it with some start/stop trace entries if tracing is possible.

Pretty simple. And adding it to an existing code base is simple enough that you can just about do it with sed2:

Gathering and interpreting the data

OK, so your code is instrumented to the hilt. Now how do you gather that data and make it useful? Hardcore mode would be to run adb atrace and parse the ftrace buffer output with your mind. For us mere mortals, the Android SDK ships with a systrace utility that dumps the trace then slaps it into an interactive viewer. All you need to do is run the command below (edited to your taste), then open up the generate html in your browser of choice.

Once you’ve generated a trace of the user interaction that seems slow, open it up and take a look.

There’s a lot to take in, so let’s go bit by bit.

CPU state.

What’s hogging all your cores? Useful for checking that your app is actually the bottleneck, and not just contending for cycles with another package.

UI thread timeline

The most blocking events – If you see a long block of process activity here, that almost certainly means jank for the end user.

Background threads

Everything run on a new thread / process will be listed here, identified by the PID it ran under (not super useful) but with all of the trace segments emitted as part of that thread (very useful).

As we can see here, the onResume method of our main activity is directly dependent on calls to our API! This should definitely be offloaded to a new thread so that the UI isn’t blocked on network traffic.

Slice children

Just because a slice is large doesn’t mean it’s the problem itself – is anything lower in the call graph taking longer than it should?

Slice details

Click a slice to see the detailed breakdown of real and CPU time for the slice itself (Self time) and for the slice plus it’s children. 2.5ms might seem pretty good for a network call, but this is between a phone and a computer on the same network. For phones running at EDGE speed across the atlantic, such a call would take orders of magnitude longer!

Pretty nifty, right?

I like to think so. If you feel the same way and like getting you hands dirty, get in touch! We’re always looking for people to muck in and help us find new ways to #measureit and improve the experience of our users.

1. By default, the DexMaker implementation will only maintain a cache of the proxy class for the application lifetime. To obtain better startup times, as well as less effect on the measurements from profiler overhead, it would be wise to modify the dex cache to only recompile if the source bytecode has changed. This is left as an exercise to the reader 🙂
2. If you can actually do this with sed and sed alone, I want to see it.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Pingbacks & Trackbacks

  1. Learning to let go | Jana Technology