How to Debug an ANR with traces file
ANR’s are some of the most less understood phenomenon of Android development. This article let’s you know the ways to debug an ANR when you have no options left.
But first, What is an ANR and when does it occur?
Answer : It’s all about the UI/Main thread
The Android Developer website says that an ANR occurs,
- While your activity is in the foreground, your app has not responded to an input event or
BroadcastReceiver
(such as key press or screen touch events) within 5 seconds - While you do not have an activity in the foreground, your
BroadcastReceiver
hasn't finished executing within a considerable amount of time
Once an ANR occurs, the one and only question that should ring in your mind is
“What is blocking the UI thread for more than 6 seconds?”
Sometimes, the answer is very straightforward and you may know it very well within a few minutes of debugging. But usually, ANR’s that occur in the wild are not that straight forward. It’s not easy to reproduce.
9 out of 10 times, it would be hard to reproduce ANR’s and it has to be hypothesized from the logs that we get from Play Console.
Here is a structured way of debugging ANR when you stare at logs from Play Console. Get hold of the anr traces file either from the device or from Play Console
Step 1 : Look at the status of the Main Thread
Step 2 : Look for clues that may be blocking the Main Thread. There may be other threads too. Note down the thread id
Step 3 : Search for thread with tid found in step 2
Step 4 : Inspect the status of the thread in[Step3]. Continue finding clues of what may be blocking the thread by tracking the ‘tid’ . Once you read a dead end, start hypothesizing
Step 5: Write a hypothesis and try validating the hypothesis by inspecting code
Let’s take some examples and walk through them
Get hold of the traces file. Formally called traces.txt, this is present in various names in various phones and OS’s. But usually can be found in the data/traces folder of the phone or on the Google Play Console
Example 1 : Let’s take a very simple case of ANR
Let’s assume the ANR occurred and you get the traces log from Google Play Console.
Let’s see what the main thread is up to
Thread of interest : Main Thread
Status of thread : Sleeping
File of interest : MainActivity
Any other things of interest ? : There was a click event denoted by onClick
Let’s read this together. “The Main Thread was sleeping when a click event occurred”. Let’s ask ourselves. Why is the main thread sleeping? The main thread is supposed to be active and be responding to events. Not sleeping😴. If the main thread is sleeping and not responding to events, after 6 seconds the system throws an ANR.
Let’s dig deeper. This time through code. Let’s see what’s the code in the Main Activity and see if we can get some clues over there.
Ah Gotcha ! Look at this. This click listener makes the Thread to sleep for 30 seconds. And the Thread happens to the Main Thread 😒. So when a click event happens after the thread was put to sleep, the ANR would have occurred.
And this is the cause of the ANR. This was a very simple case. Let’s see something a bit more complicated.
Example 2 :
Step 1 : What’s up with the Main Thread
Thread of interest : Main Thread
Status of thread : Blocked
File of interest : MainActivity
Any other clues of interest ?
→ There is some lock going on
→ Main thread is trying to lock a resource and that resource is being held by a thread with id 19
→ The lock happens at a method called “setNumber” in a class called “com.anr.sample.Datadump”
Let’s chase this thread with id 19. You can search the logs with the term “tid=19” and it leads to this thread shown below.
What’s happening here?
The thread is named “SampleWriterThread”. It’s sleeping. (One more reason to name every thread that you create. It would be easily traceable)
Let’s read it all together. With all the information above, we can say that, “Main Thread is trying to access a resource called setNumber in com.anr.sample.DataDump . But this resource is being held by another thread called SampleWriterThread. And SampleWriterThread is apparently sleeping. This is making the MainThread to starve. Since the MainThread is starved, it could not fulfill it’s duties hence leading to the ANR”
Now, let’s go to the code and see what’s happening. The first place to inspect is obviously the setNumber method.
Yeah ! Here you go. This method is making the accessor thread to sleep. And the method is a Synchronized method. So parallel access is not allowed. Hence MainThread would have been the second thread to access this function.
Let’s look at the code for the MainActivity.
Everything looks cool.
Let’s look at App.java
Here we can see “SampleWriterThread” being created during App Launch. This thread has been put to sleep inside setNumber(). Hence the contention for resources. A good way to solve this is to make this particular data manipulation Asynchronous. Perhaps introduce a queue that can queue up all requests and broadcast when the request is served. This way thread blocks could be avoided.
I guess this method throws some light on how to debug ANR’s. Of course there are times when there would be no clues and all that you search ends up in vain. In those cases, try to think about what all things can go wrong and work backwards. It’s hard, but you can get there.