matrix源码分析(一):TraceCanary

微信的APM Matrix开源有段时间了,我在看了源码后发现,其作为国民第一应用出品的APM,确实是精益求精,业内领先的。TraceCanary 是其中的一个工具,主要用来监控方法耗时,启动耗时, ANR、掉帧情况等。其分两部分,一部分是以插件的形式在编译期在每个方法执行前后插桩, 用于统计方法耗时;另一部分是运行时的sdk,真正的耗时、卡顿、ANR监控逻辑所在。

插桩插件

插桩主要是为了统计方法耗时。gradle插件的代码一般都有个plugin入口,很好找:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
/**
* Created by zhangshaowen on 17/6/16.
*/
class MatrixPlugin implements Plugin<Project> {
private static final String TAG = "Matrix.MatrixPlugin"

@Override
void apply(Project project) {
project.extensions.create("matrix", MatrixExtension)
project.matrix.extensions.create("trace", MatrixTraceExtension)
project.matrix.extensions.create("removeUnusedResources", MatrixDelUnusedResConfiguration)
if (!project.plugins.hasPlugin('com.android.application')) {
throw new GradleException('Matrix Plugin, Android Application plugin required')
}

project.afterEvaluate {
def android = project.extensions.android
def configuration = project.matrix
android.applicationVariants.all { variant ->

if (configuration.trace.enable) {
com.tencent.matrix.trace.transform.MatrixTraceTransform.inject(project, configuration.trace, variant.getVariantData().getScope())
}

if (configuration.removeUnusedResources.enable) {
if (Util.isNullOrNil(configuration.removeUnusedResources.variant) || variant.name.equalsIgnoreCase(configuration.removeUnusedResources.variant)) {
Log.i(TAG, "removeUnusedResources %s", configuration.removeUnusedResources)
RemoveUnusedResourcesTask removeUnusedResourcesTask = project.tasks.create("remove" + variant.name.capitalize() + "UnusedResources", RemoveUnusedResourcesTask)
removeUnusedResourcesTask.inputs.property(RemoveUnusedResourcesTask.BUILD_VARIANT, variant.name)
project.tasks.add(removeUnusedResourcesTask)
removeUnusedResourcesTask.dependsOn variant.packageApplication
variant.assemble.dependsOn removeUnusedResourcesTask
}
}

}
}
}
}

可以看到走进了MatrixTraceTransform的inject方法。这里还有个bonus,RemoveUnusedResourcesTask,顾名思义,删除无用资源的task,会在assemble之前触发,内部具体实现后续分析APK Checker的时候再细说。接着看MatrixTraceTransform的inject方法。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
public static void inject(Project project, MatrixTraceExtension extension, VariantScope variantScope) {

.......

try {
String[] hardTask = getTransformTaskName(extension.getCustomDexTransformName(), variant.getName());
for (Task task : project.getTasks()) {
for (String str : hardTask) {
if (task.getName().equalsIgnoreCase(str) && task instanceof TransformTask) {
TransformTask transformTask = (TransformTask) task;
Log.i(TAG, "successfully inject task:" + transformTask.getName());
Field field = TransformTask.class.getDeclaredField("transform");
field.setAccessible(true);
field.set(task, new MatrixTraceTransform(config, transformTask.getTransform()));
break;
}
}
}
} catch (Exception e) {
Log.e(TAG, e.toString());
}

}

private static String[] getTransformTaskName(String customDexTransformName, String buildTypeSuffix) {
if (!Util.isNullOrNil(customDexTransformName)) {
return new String[]{customDexTransformName + "For" + buildTypeSuffix};
} else {
String[] names = new String[]{
"transformClassesWithDexBuilderFor" + buildTypeSuffix,
"transformClassesWithDexFor" + buildTypeSuffix,
};
return names;
}
}

可以看到就是通过反射加代理替换transformClassesWithDexForXXX或者新版本gradle的transformClassesWithDexBuilderForXXX task里的transform,来实现在transformClassesWithDexForXXX/transformClassesWithDexBuilderForXXX执行之前执行matrix trace的插桩逻辑。为什么要选择这个时候呢?因为这是打dex的task,在这之前proguard已经执行完了,避免因为插桩代码导致本可以内联的方法无法内联导致dex方法数增多。接着来看transform里的逻辑:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
private void doTransform(TransformInvocation transformInvocation) throws ExecutionException, InterruptedException {
final boolean isIncremental = transformInvocation.isIncremental() && this.isIncremental();

/**
* step 1
*/
long start = System.currentTimeMillis();

List<Future> futures = new LinkedList<>();

final MappingCollector mappingCollector = new MappingCollector();
final AtomicInteger methodId = new AtomicInteger(0);
final ConcurrentHashMap<String, TraceMethod> collectedMethodMap = new ConcurrentHashMap<>();

futures.add(executor.submit(new ParseMappingTask(mappingCollector, collectedMethodMap, methodId)));

Map<File, File> dirInputOutMap = new ConcurrentHashMap<>();
Map<File, File> jarInputOutMap = new ConcurrentHashMap<>();
Collection<TransformInput> inputs = transformInvocation.getInputs();

for (TransformInput input : inputs) {

for (DirectoryInput directoryInput : input.getDirectoryInputs()) {
futures.add(executor.submit(new CollectDirectoryInputTask(dirInputOutMap, directoryInput, isIncremental)));
}

for (JarInput inputJar : input.getJarInputs()) {
futures.add(executor.submit(new CollectJarInputTask(inputJar, isIncremental, jarInputOutMap, dirInputOutMap)));
}
}

for (Future future : futures) {
future.get();
}
futures.clear();

Log.i(TAG, "[doTransform] Step(1)[Parse]... cost:%sms", System.currentTimeMillis() - start);


/**
* step 2
*/
start = System.currentTimeMillis();
MethodCollector methodCollector = new MethodCollector(executor, mappingCollector, methodId, config, collectedMethodMap);
methodCollector.collect(dirInputOutMap.keySet(), jarInputOutMap.keySet());
Log.i(TAG, "[doTransform] Step(2)[Collection]... cost:%sms", System.currentTimeMillis() - start);

/**
* step 3
*/
start = System.currentTimeMillis();
MethodTracer methodTracer = new MethodTracer(executor, mappingCollector, config, methodCollector.getCollectedMethodMap(), methodCollector.getCollectedClassExtendMap());
methodTracer.trace(dirInputOutMap, jarInputOutMap);
Log.i(TAG, "[doTransform] Step(3)[Trace]... cost:%sms", System.currentTimeMillis() - start);

}

三步:先收集dir/jar输入;扫描输入中的method 存在列表里;执行插桩。比较常规的插桩套路,插桩的代码在TraceMethodAdapter, 就不贴出来了。主要做了:

  • 方法执行前插入AppMethodBeat.i(id)
  • 执行后插入AppMethodBeat.o(id)
  • 对于activity的子类:
    • 如果没有重写onWindowFocusChanged方法,则插入重写的onWindowFocusChanged方法,方法内调super后再调AppMethodBeat.at(Activity activity, boolean isFocus)方法打点
    • 如果有重写onWindowFocusChanged方法,则在onWindowFocusChanged方法后调AppMethodBeat.at(Activity activity, boolean isFocus)方法打点

可见AppMethodBeat是方法耗时打点的入口。

runtime sdk

Matrix集成的官方示例如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
public class MatrixApplication extends Application {
private static final String TAG = "Matrix.Application";

private static Context sContext;

private static SQLiteLintConfig initSQLiteLintConfig() {
try {
/**
* HOOK模式下,SQLiteLint会自己去获取所有已执行的sql语句及其耗时(by hooking sqlite3_profile)
* @see 而另一个模式:SQLiteLint.SqlExecutionCallbackMode.CUSTOM_NOTIFY , 则需要调用 {@link SQLiteLint#notifySqlExecution(String, String, int)}来通知
* SQLiteLint 需要分析的、已执行的sql语句及其耗时
* @see TestSQLiteLintActivity#doTest()
*/
return new SQLiteLintConfig(SQLiteLint.SqlExecutionCallbackMode.HOOK);
} catch (Throwable t) {
return new SQLiteLintConfig(SQLiteLint.SqlExecutionCallbackMode.HOOK);
}
}

@Override
public void onCreate() {
super.onCreate();
DynamicConfigImplDemo dynamicConfig = new DynamicConfigImplDemo();
boolean matrixEnable = dynamicConfig.isMatrixEnable();
boolean fpsEnable = dynamicConfig.isFPSEnable();
boolean traceEnable = dynamicConfig.isTraceEnable();

sContext = this;
MatrixLog.i(TAG, "MatrixApplication.onCreate");

Matrix.Builder builder = new Matrix.Builder(this);
builder.patchListener(new TestPluginListener(this));

//trace
TraceConfig traceConfig = new TraceConfig.Builder()
.dynamicConfig(dynamicConfig)
.enableFPS(fpsEnable)
.enableEvilMethodTrace(traceEnable)
.enableAnrTrace(traceEnable)
.enableStartup(traceEnable)
.splashActivities("sample.tencent.matrix.SplashActivity;")
.isDebug(true)
.isDevEnv(false)
.build();

TracePlugin tracePlugin = (new TracePlugin(traceConfig));
builder.plugin(tracePlugin);

if (matrixEnable) {

//resource
Intent intent = new Intent();
ResourceConfig.DumpMode mode = ResourceConfig.DumpMode.AUTO_DUMP;
MatrixLog.i(TAG, "Dump Activity Leak Mode=%s", mode);
intent.setClassName(this.getPackageName(), "com.tencent.mm.ui.matrix.ManualDumpActivity");
ResourceConfig resourceConfig = new ResourceConfig.Builder()
.dynamicConfig(dynamicConfig)
.setAutoDumpHprofMode(mode)
// .setDetectDebuger(true) //matrix test code
.setNotificationContentIntent(intent)
.build();
builder.plugin(new ResourcePlugin(resourceConfig));
ResourcePlugin.activityLeakFixer(this);

//io
IOCanaryPlugin ioCanaryPlugin = new IOCanaryPlugin(new IOConfig.Builder()
.dynamicConfig(dynamicConfig)
.build());
builder.plugin(ioCanaryPlugin);


// prevent api 19 UnsatisfiedLinkError
//sqlite
SQLiteLintConfig sqlLiteConfig;
try {
sqlLiteConfig = new SQLiteLintConfig(SQLiteLint.SqlExecutionCallbackMode.CUSTOM_NOTIFY);
} catch (Throwable t) {
sqlLiteConfig = new SQLiteLintConfig(SQLiteLint.SqlExecutionCallbackMode.CUSTOM_NOTIFY);
}
builder.plugin(new SQLiteLintPlugin(sqlLiteConfig));
}

Matrix.init(builder.build());

//start only startup tracer, close other tracer.
tracePlugin.start();
}


public static Context getContext() {
return sContext;
}
}

Trace Canary / Resource Canary / IO Canary / SQLite Lint 这四个功能都是独立的plugin,Application初始化的时候在Matrix Builder中设置对应的plugin即可集成。在Matrix实例构建的时候会调每个plugin的init方法,来看下TracePlugin的init方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
@Override
public void init(Application app, PluginListener listener) {
super.init(app, listener);
MatrixLog.i(TAG, "trace plugin init, trace config: %s", traceConfig.toString());
if (Build.VERSION.SDK_INT < Build.VERSION_CODES.JELLY_BEAN) {
MatrixLog.e(TAG, "[FrameBeat] API is low Build.VERSION_CODES.JELLY_BEAN(16), TracePlugin is not supported");
unSupportPlugin();
return;
}

anrTracer = new AnrTracer(traceConfig);

frameTracer = new FrameTracer(traceConfig);

evilMethodTracer = new EvilMethodTracer(traceConfig);

startupTracer = new StartupTracer(traceConfig);
}

@Override
public void start() {
super.start();
if (!isSupported()) {
MatrixLog.w(TAG, "[start] Plugin is unSupported!");
return;
}
MatrixLog.w(TAG, "start!");
Runnable runnable = new Runnable() {
@Override
public void run() {

if (!UIThreadMonitor.getMonitor().isInit()) {
try {
UIThreadMonitor.getMonitor().init(traceConfig);
} catch (java.lang.RuntimeException e) {
MatrixLog.e(TAG, "[start] RuntimeException:%s", e);
return;
}
}

AppMethodBeat.getInstance().onStart();

UIThreadMonitor.getMonitor().onStart();

anrTracer.onStartTrace();

frameTracer.onStartTrace();

evilMethodTracer.onStartTrace();

startupTracer.onStartTrace();
}
};

if (Thread.currentThread() == Looper.getMainLooper().getThread()) {
runnable.run();
} else {
MatrixLog.w(TAG, "start TracePlugin in Thread[%s] but not in mainThread!", Thread.currentThread().getId());
MatrixHandlerThread.getDefaultMainHandler().post(runnable);
}

}

创建了四个tracer实例,然后调用了:AppMethodBeat单例的onStart()方法、UIThreadMonitor单例的onStart()方法以及每个tracer的onStartTrace方法。

  • anrTracer: 主要进行ANR的监控上报
  • frameTracer:掉帧数和帧率的监控上报
  • evilMethodTracer: 方法耗时的监控上报
  • startupTracer: 启动速度的监控上报

他们都继承自Tracer

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
public abstract class Tracer extends LooperObserver implements ITracer {

private volatile boolean isAlive = false;
private static final String TAG = "Matrix.Tracer";

@CallSuper
protected void onAlive() {
MatrixLog.i(TAG, "[onAlive] %s", this.getClass().getName());

}

@CallSuper
protected void onDead() {
MatrixLog.i(TAG, "[onDead] %s", this.getClass().getName());
}

@Override
final synchronized public void onStartTrace() {
if (!isAlive) {
this.isAlive = true;
onAlive();
}
}

@Override
final synchronized public void onCloseTrace() {
if (isAlive) {
this.isAlive = false;
onDead();
}
}

@Override
public void onForeground(boolean isForeground) {

}

@Override
public boolean isAlive() {
return isAlive;
}

public boolean isForeground() {
return AppActiveMatrixDelegate.INSTANCE.isAppForeground();
}
}

Tracer继承自LooperObserver,这也是Matrix自己定义的类。这里需要先介绍下Matrix监控主线程卡顿的原理,和BlockCanary一样: Looper的loop方法中会循环取MessageQueue里的Message执行,执行前后分别会打印一次特殊的log,通过设置主线程looper中的printer,拦截到特殊的log后就能知道主线程dispatch某个message的耗时了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
/**
* Run the message queue in this thread. Be sure to call
* {@link #quit()} to end the loop.
*/
public static void loop() {
final Looper me = myLooper();
if (me == null) {
throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
}
final MessageQueue queue = me.mQueue;
......
for (;;) {
Message msg = queue.next(); // might block
if (msg == null) {
// No message indicates that the message queue is quitting.
return;
}

// This must be in a local variable, in case a UI event sets the logger
final Printer logging = me.mLogging;
if (logging != null) {
logging.println(">>>>> Dispatching to " + msg.target + " " +
msg.callback + ": " + msg.what);
}
......
try {
msg.target.dispatchMessage(msg);
dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
} finally {
if (traceTag != 0) {
Trace.traceEnd(traceTag);
}
}
......
if (logging != null) {
logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
}
......
}
}

Matrix按照这个思路在dispatch Message的开始和结束通过观察者模式通知LooperObserver进行耗时统计,EvilMethodTracer、FrameTracer和AnrTracer这仨observer都注册到了UIThreadMonitor(observable)上,在主线程message dispatch的前后LooperMonitor会通知UIThreadMonitor,UIThreadMonitor通知observer进行打点。

EvilMethodTracer

EvilMethodTracer以TraceConfig中配置的方法耗时阈值(默认是700毫秒)来判断某个message dispatch时间是否过长,超过阈值则把进程信息,CPU使用率,调用栈(包含具体方法耗时),这帧input、animation、traversal耗时等信息上报。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
@Override
public void dispatchBegin(long beginMs, long cpuBeginMs, long token) {
super.dispatchBegin(beginMs, cpuBeginMs, token);
indexRecord = AppMethodBeat.getInstance().maskIndex("EvilMethodTracer#dispatchBegin");
}


@Override
public void doFrame(String focusedActivityName, long start, long end, long frameCostMs, long inputCostNs, long animationCostNs, long traversalCostNs) {
queueTypeCosts[0] = inputCostNs;
queueTypeCosts[1] = animationCostNs;
queueTypeCosts[2] = traversalCostNs;
}


@Override
public void dispatchEnd(long beginMs, long cpuBeginMs, long endMs, long cpuEndMs, long token, boolean isBelongFrame) {
super.dispatchEnd(beginMs, cpuBeginMs, endMs, cpuEndMs, token, isBelongFrame);
long start = config.isDevEnv() ? System.currentTimeMillis() : 0;
try {
long dispatchCost = endMs - beginMs;
// 超过阈值上报耗时方法信息
if (dispatchCost >= evilThresholdMs) {
long[] data = AppMethodBeat.getInstance().copyData(indexRecord);
long[] queueCosts = new long[3];
System.arraycopy(queueTypeCosts, 0, queueCosts, 0, 3);
String scene = AppMethodBeat.getVisibleScene();
MatrixHandlerThread.getDefaultHandler().post(new AnalyseTask(isForeground(), scene, data, queueCosts, cpuEndMs - cpuBeginMs, endMs - beginMs, endMs));
}
} finally {
indexRecord.release();
if (config.isDevEnv()) {
String usage = Utils.calculateCpuUsage(cpuEndMs - cpuBeginMs, endMs - beginMs);
MatrixLog.v(TAG, "[dispatchEnd] token:%s cost:%sms cpu:%sms usage:%s innerCost:%s",
token, endMs - beginMs, cpuEndMs - cpuBeginMs, usage, System.currentTimeMillis() - start);
}
}
}

// 耗时分析和上报
private class AnalyseTask implements Runnable {
long[] queueCost;
long[] data;
long cpuCost;
long cost;
long endMs;
String scene;
boolean isForeground;

AnalyseTask(boolean isForeground, String scene, long[] data, long[] queueCost, long cpuCost, long cost, long endMs) {
this.isForeground = isForeground;
this.scene = scene;
this.cost = cost;
this.cpuCost = cpuCost;
this.data = data;
this.queueCost = queueCost;
this.endMs = endMs;
}

void analyse() {

// process
int[] processStat = Utils.getProcessPriority(Process.myPid());
String usage = Utils.calculateCpuUsage(cpuCost, cost);
LinkedList<MethodItem> stack = new LinkedList();
if (data.length > 0) {
TraceDataUtils.structuredDataToStack(data, stack, true, endMs);
TraceDataUtils.trimStack(stack, Constants.TARGET_EVIL_METHOD_STACK, new TraceDataUtils.IStructuredDataFilter() {
@Override
public boolean isFilter(long during, int filterCount) {
return during < filterCount * Constants.TIME_UPDATE_CYCLE_MS;
}

@Override
public int getFilterMaxCount() {
return Constants.FILTER_STACK_MAX_COUNT;
}

@Override
public void fallback(List<MethodItem> stack, int size) {
MatrixLog.w(TAG, "[fallback] size:%s targetSize:%s stack:%s", size, Constants.TARGET_EVIL_METHOD_STACK, stack);
Iterator iterator = stack.listIterator(Math.min(size, Constants.TARGET_EVIL_METHOD_STACK));
while (iterator.hasNext()) {
iterator.next();
iterator.remove();
}
}
});
}


StringBuilder reportBuilder = new StringBuilder();
StringBuilder logcatBuilder = new StringBuilder();
long stackCost = Math.max(cost, TraceDataUtils.stackToString(stack, reportBuilder, logcatBuilder));
String stackKey = TraceDataUtils.getTreeKey(stack, stackCost);

MatrixLog.w(TAG, "%s", printEvil(scene, processStat, isForeground, logcatBuilder, stack.size(), stackKey, usage, queueCost[0], queueCost[1], queueCost[2], cost)); // for logcat

if (stackCost >= Constants.DEFAULT_ANR_INVALID || processStat[0] > 10) {
MatrixLog.w(TAG, "The checked anr task was not executed on time. "
+ "The possible reason is that the current process has a low priority. just pass this report");
return;
}

// report
try {
TracePlugin plugin = Matrix.with().getPluginByClass(TracePlugin.class);
if (null == plugin) {
return;
}
JSONObject jsonObject = new JSONObject();
jsonObject = DeviceUtil.getDeviceInfo(jsonObject, Matrix.with().getApplication());

jsonObject.put(SharePluginInfo.ISSUE_STACK_TYPE, Constants.Type.NORMAL);
jsonObject.put(SharePluginInfo.ISSUE_COST, stackCost);
jsonObject.put(SharePluginInfo.ISSUE_CPU_USAGE, usage);
jsonObject.put(SharePluginInfo.ISSUE_SCENE, scene);
jsonObject.put(SharePluginInfo.ISSUE_TRACE_STACK, reportBuilder.toString());
jsonObject.put(SharePluginInfo.ISSUE_STACK_KEY, stackKey);

Issue issue = new Issue();
issue.setTag(SharePluginInfo.TAG_PLUGIN_EVIL_METHOD);
issue.setContent(jsonObject);
plugin.onDetectIssue(issue);

} catch (JSONException e) {
MatrixLog.e(TAG, "[JSONException error: %s", e);
}

}

@Override
public void run() {
analyse();
}

private String printEvil(String scene, int[] processStat, boolean isForeground, StringBuilder stack, long stackSize, String stackKey, String usage, long inputCost,
long animationCost, long traversalCost, long allCost) {
StringBuilder print = new StringBuilder();
print.append(String.format("-\n>>>>>>>>>>>>>>>>>>>>> maybe happens Jankiness!(%sms) <<<<<<<<<<<<<<<<<<<<<\n", allCost));
print.append("|* scene: ").append(scene).append("\n");
print.append("|* [ProcessStat]").append("\n");
print.append("|*\t\tPriority: ").append(processStat[0]).append("\n");
print.append("|*\t\tNice: ").append(processStat[1]).append("\n");
print.append("|*\t\tForeground: ").append(isForeground).append("\n");
print.append("|* [CPU]").append("\n");
print.append("|*\t\tusage: ").append(usage).append("\n");
print.append("|* [doFrame]").append("\n");
print.append("|*\t\tinputCost: ").append(inputCost).append("\n");
print.append("|*\t\tanimationCost: ").append(animationCost).append("\n");
print.append("|*\t\ttraversalCost: ").append(traversalCost).append("\n");
print.append("|* [Trace]").append("\n");
print.append("|*\t\tStackSize: ").append(stackSize).append("\n");
print.append("|*\t\tStackKey: ").append(stackKey).append("\n");

if (config.isDebug()) {
print.append(stack.toString());
}

print.append("=========================================================================");
return print.toString();
}
}

为了减少频繁调用System.currentTimeMillis()带来的性能开销,Matrix启动了一个后台线程,每5毫秒更新一次时间戳,AppMethodBeat打点的时候直接用计算好的时间戳。

可能你会问,这帧input、animation、traversal耗时是怎么算出来的? Choreographer并没有给我们这样的接口。这就得说说Choreographer的源码了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
void doFrame(long frameTimeNanos, int frame) {
final long startNanos;
synchronized (mLock) {
if (!mFrameScheduled) {
return; // no work to do
}

if (DEBUG_JANK && mDebugPrintNextFrameTimeDelta) {
mDebugPrintNextFrameTimeDelta = false;
Log.d(TAG, "Frame time delta: "
+ ((frameTimeNanos - mLastFrameTimeNanos) * 0.000001f) + " ms");
}

long intendedFrameTimeNanos = frameTimeNanos;
startNanos = System.nanoTime();
final long jitterNanos = startNanos - frameTimeNanos;
if (jitterNanos >= mFrameIntervalNanos) {
final long skippedFrames = jitterNanos / mFrameIntervalNanos;
if (skippedFrames >= SKIPPED_FRAME_WARNING_LIMIT) {
Log.i(TAG, "Skipped " + skippedFrames + " frames! "
+ "The application may be doing too much work on its main thread.");
}
final long lastFrameOffset = jitterNanos % mFrameIntervalNanos;
if (DEBUG_JANK) {
Log.d(TAG, "Missed vsync by " + (jitterNanos * 0.000001f) + " ms "
+ "which is more than the frame interval of "
+ (mFrameIntervalNanos * 0.000001f) + " ms! "
+ "Skipping " + skippedFrames + " frames and setting frame "
+ "time to " + (lastFrameOffset * 0.000001f) + " ms in the past.");
}
frameTimeNanos = startNanos - lastFrameOffset;
}

if (frameTimeNanos < mLastFrameTimeNanos) {
if (DEBUG_JANK) {
Log.d(TAG, "Frame time appears to be going backwards. May be due to a "
+ "previously skipped frame. Waiting for next vsync.");
}
scheduleVsyncLocked();
return;
}

if (mFPSDivisor > 1) {
long timeSinceVsync = frameTimeNanos - mLastFrameTimeNanos;
if (timeSinceVsync < (mFrameIntervalNanos * mFPSDivisor) && timeSinceVsync > 0) {
scheduleVsyncLocked();
return;
}
}

mFrameInfo.setVsync(intendedFrameTimeNanos, frameTimeNanos);
mFrameScheduled = false;
mLastFrameTimeNanos = frameTimeNanos;
}

try {
Trace.traceBegin(Trace.TRACE_TAG_VIEW, "Choreographer#doFrame");
AnimationUtils.lockAnimationClock(frameTimeNanos / TimeUtils.NANOS_PER_MS);

mFrameInfo.markInputHandlingStart();
doCallbacks(Choreographer.CALLBACK_INPUT, frameTimeNanos);

mFrameInfo.markAnimationsStart();
doCallbacks(Choreographer.CALLBACK_ANIMATION, frameTimeNanos);

mFrameInfo.markPerformTraversalsStart();
doCallbacks(Choreographer.CALLBACK_TRAVERSAL, frameTimeNanos);

doCallbacks(Choreographer.CALLBACK_COMMIT, frameTimeNanos);
} finally {
AnimationUtils.unlockAnimationClock();
Trace.traceEnd(Trace.TRACE_TAG_VIEW);
}

if (DEBUG_FRAMES) {
final long endNanos = System.nanoTime();
Log.d(TAG, "Frame " + frame + ": Finished, took "
+ (endNanos - startNanos) * 0.000001f + " ms, latency "
+ (startNanos - frameTimeNanos) * 0.000001f + " ms.");
}
}

Choreographer负责接收和处理所有界面更新消息,在收到VSYNC信号的时候统一处理,按顺序执行CALLBACK_INPUT(input事件)、CALLBACK_ANIMATION(动画)、CALLBACK_TRAVERSAL(layout、draw)、CALLBACK_COMMIT(ActivityThread post的,执行scheduleTrimMemory)这几种callback,这几种callback分别在各自的CallbackQueue里存着,只要在对应的CallbackQueue的头插入一个callback,就能统计出input、animation、traversal这几个阶段的耗时。

具体代码在UIThreadMonitor中,首先其初始化的时候会反射拿到Choreographer里的CALLBACK_INPUT、CALLBACK_ANIMATION、CALLBACK_TRAVERSAL这三种CallbackQueue,接着反射拿到他们的addCallbackLocked方法以备调用。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
private Object callbackQueueLock;
private Object[] callbackQueues;
private Method addTraversalQueue;
private Method addInputQueue;
private Method addAnimationQueue;
private Choreographer choreographer;
private long frameIntervalNanos = 16666666;
private int[] queueStatus = new int[CALLBACK_LAST + 1];
private boolean[] callbackExist = new boolean[CALLBACK_LAST + 1]; // ABA
private long[] queueCost = new long[CALLBACK_LAST + 1];
private boolean isInit = false;

public void init(TraceConfig config) {
if (Thread.currentThread() != Looper.getMainLooper().getThread()) {
throw new AssertionError("must be init in main thread!");
}
this.config = config;
choreographer = Choreographer.getInstance();
callbackQueueLock = reflectObject(choreographer, "mLock");
callbackQueues = reflectObject(choreographer, "mCallbackQueues");

addInputQueue = reflectChoreographerMethod(callbackQueues[CALLBACK_INPUT], ADD_CALLBACK, long.class, Object.class, Object.class);
addAnimationQueue = reflectChoreographerMethod(callbackQueues[CALLBACK_ANIMATION], ADD_CALLBACK, long.class, Object.class, Object.class);
addTraversalQueue = reflectChoreographerMethod(callbackQueues[CALLBACK_TRAVERSAL], ADD_CALLBACK, long.class, Object.class, Object.class);
frameIntervalNanos = reflectObject(choreographer, "mFrameIntervalNanos");

LooperMonitor.register(new LooperMonitor.LooperDispatchListener() {
@Override
public boolean isValid() {
return isAlive;
}

@Override
public void dispatchStart() {
super.dispatchStart();
UIThreadMonitor.this.dispatchBegin();
}

@Override
public void dispatchEnd() {
super.dispatchEnd();
UIThreadMonitor.this.dispatchEnd();
}

});
......
this.isInit = true;
}

然后在其创建时候的start和每次message的dispatchEnd方法中会调用addFrameCallback(CALLBACK_INPUT, this, true);给input类型的CallbackQueue头上插入一个callback(就是UIThreadMonitor自己)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
private synchronized void addFrameCallback(int type, Runnable callback, boolean isAddHeader) {
if (callbackExist[type]) {
MatrixLog.w(TAG, "[addFrameCallback] this type %s callback has exist! isAddHeader:%s", type, isAddHeader);
return;
}

if (!isAlive && type == CALLBACK_INPUT) {
MatrixLog.w(TAG, "[addFrameCallback] UIThreadMonitor is not alive!");
return;
}
try {
synchronized (callbackQueueLock) {
Method method = null;
switch (type) {
case CALLBACK_INPUT:
method = addInputQueue;
break;
case CALLBACK_ANIMATION:
method = addAnimationQueue;
break;
case CALLBACK_TRAVERSAL:
method = addTraversalQueue;
break;
}
if (null != method) {
method.invoke(callbackQueues[type], !isAddHeader ? SystemClock.uptimeMillis() : -1, callback, null);
callbackExist[type] = true;
}
}
} catch (Exception e) {
MatrixLog.e(TAG, e.toString());
}
}

执行到UIThreadMonitor这个input callback的时候会进行打点,然后给CALLBACK_ANIMATION的CallbackQueue头插入callback,这个callback的执行就表示input执行结束,animation执行开始,从而统计出处理input阶段耗时。CALLBACK_TRAVERSAL同,不赘述。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
@Override
public void run() {
final long start = System.nanoTime();
try {
doFrameBegin(token);
doQueueBegin(CALLBACK_INPUT);

addFrameCallback(CALLBACK_ANIMATION, new Runnable() {

@Override
public void run() {
doQueueEnd(CALLBACK_INPUT);
doQueueBegin(CALLBACK_ANIMATION);
}
}, true);

addFrameCallback(CALLBACK_TRAVERSAL, new Runnable() {

@Override
public void run() {
doQueueEnd(CALLBACK_ANIMATION);
doQueueBegin(CALLBACK_TRAVERSAL);
}
}, true);

} finally {
if (config.isDevEnv()) {
MatrixLog.d(TAG, "[UIThreadMonitor#run] inner cost:%sns", System.nanoTime() - start);
}
}
}

到这里其实耗时方法统计的原理已经清楚了。

AnrTracer

ANR耗时在方法耗时的基础上进行上报,在开始dispatch的时候post一个5秒后执行的runnable,然后在结束的时候cancel这个runnable。runnable执行的时候就说明发生了ANR,进行数据上报。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
@Override
public void dispatchBegin(long beginMs, long cpuBeginMs, long token) {
super.dispatchBegin(beginMs, cpuBeginMs, token);
anrTask = new AnrHandleTask(AppMethodBeat.getInstance().maskIndex("AnrTracer#dispatchBegin"), token);
if (traceConfig.isDevEnv()) {
MatrixLog.v(TAG, "* [dispatchBegin] token:%s index:%s", token, anrTask.beginRecord.index);
}
anrHandler.postDelayed(anrTask, Constants.DEFAULT_ANR - (SystemClock.uptimeMillis() - token));
}

@Override
public void dispatchEnd(long beginMs, long cpuBeginMs, long endMs, long cpuEndMs, long token, boolean isBelongFrame) {
super.dispatchEnd(beginMs, cpuBeginMs, endMs, cpuEndMs, token, isBelongFrame);
if (traceConfig.isDevEnv()) {
MatrixLog.v(TAG, "[dispatchEnd] token:%s cost:%sms cpu:%sms usage:%s",
token, endMs - beginMs, cpuEndMs - cpuBeginMs, Utils.calculateCpuUsage(cpuEndMs - cpuBeginMs, endMs - beginMs));
}
if (null != anrTask) {
anrTask.getBeginRecord().release();
anrHandler.removeCallbacks(anrTask);
}
}

FrameTracer

FrameTracer主要统计的是掉帧数和帧率,掉帧数有时候相比帧率能更好的反映视觉上的卡顿,比如30帧每秒可能不高 但是只要帧数稳定,人眼感觉到的依然是流畅的画面,而一旦某帧执行时间过长,那卡顿感就比较明显了。掉帧数通过某帧耗时除以16.666666秒来计算。一帧的耗时就是通过上面所说的msg dispatch前后时间计算的,然后doFrame当执行到INPUT类型的callback的时候就标记这是更新界面的message。

StartupTracer

主要统计应用冷启、热启、首屏、单个activity启动等阶段耗时:

1
2
3
4
5
6
7
8
* firstMethod.i       LAUNCH_ACTIVITY   onWindowFocusChange   LAUNCH_ACTIVITY    onWindowFocusChange
* ^ ^ ^ ^ ^
* | | | | |
* |---------app---------|---|---firstActivity---|---------...---------|---careActivity---|
* |<--applicationCost-->|
* |<----firstScreenCost---->|
* |<---------------------------allCost(cold)------------------------->|
* . |<--allCost(warm)-->|

启动和首屏出现耗时的起点都是应用中的第一个方法开始执行的时间,通过前面的插桩已经可以拿到了。因为handler处理message的优先顺序是:

msg自身的callback(不向下传递) > handler自身的mCallback(如果handleMessage返回true不向下传递,false继续传给handler的handleMessage) > handler的handleMessage()

所以通过反射加代理替换掉ActivityThread的mH的mCallback,就可以进行对activity和application启动的打点。

在Android 9.0以前 activity的启动通过发送一个what是LAUNCH_ACTIVITY的msg给mH,mH收到后调用ActivityThread的handleLaunchActivity从而开始启动activity; 9.0及以后去除了activity生命周期相关的msg.what,统一使用whatEXECUTE_TRANSACTION,obj是ClientTransaction实例的msg来通知mH执行生命周期。ClientTransaction其callbacks是ClientTransactionItem list,launch activity的时候其callback是LaunchActivityItem (继承自ClientTransactionItem), 可据此判断调起activity。

application创建结束的时间计算方法是:只要activity开始launch或者service开始create或者broadcastReceiver开始创建就算application创建结束。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
 public class ActivityThreadHacker {
private static final String TAG = "Matrix.ActivityThreadHacker";
private static long sApplicationCreateBeginTime = 0L;
private static long sApplicationCreateEndTime = 0L;
private static long sLastLaunchActivityTime = 0L;
public static AppMethodBeat.IndexRecord sLastLaunchActivityMethodIndex = new AppMethodBeat.IndexRecord();
public static AppMethodBeat.IndexRecord sApplicationCreateBeginMethodIndex = new AppMethodBeat.IndexRecord();
public static int sApplicationCreateScene = -100;

public static void hackSysHandlerCallback() {
try {
sApplicationCreateBeginTime = SystemClock.uptimeMillis();
sApplicationCreateBeginMethodIndex = AppMethodBeat.getInstance().maskIndex("ApplicationCreateBeginMethodIndex");
Class<?> forName = Class.forName("android.app.ActivityThread");
Field field = forName.getDeclaredField("sCurrentActivityThread");
field.setAccessible(true);
Object activityThreadValue = field.get(forName);
Field mH = forName.getDeclaredField("mH");
mH.setAccessible(true);
Object handler = mH.get(activityThreadValue);
Class<?> handlerClass = handler.getClass().getSuperclass();
Field callbackField = handlerClass.getDeclaredField("mCallback");
callbackField.setAccessible(true);
Handler.Callback originalCallback = (Handler.Callback) callbackField.get(handler);
HackCallback callback = new HackCallback(originalCallback);
callbackField.set(handler, callback);
MatrixLog.i(TAG, "hook system handler completed. start:%s SDK_INT:%s", sApplicationCreateBeginTime, Build.VERSION.SDK_INT);
} catch (Exception e) {
MatrixLog.e(TAG, "hook system handler err! %s", e.getCause().toString());
}
}

public static long getApplicationCost() {
return ActivityThreadHacker.sApplicationCreateEndTime - ActivityThreadHacker.sApplicationCreateBeginTime;
}

public static long getEggBrokenTime() {
return ActivityThreadHacker.sApplicationCreateBeginTime;
}

public static long getLastLaunchActivityTime() {
return ActivityThreadHacker.sLastLaunchActivityTime;
}


private final static class HackCallback implements Handler.Callback {
private static final int LAUNCH_ACTIVITY = 100;
private static final int CREATE_SERVICE = 114;
private static final int RECEIVER = 113;
public static final int EXECUTE_TRANSACTION = 159; // for Android 9.0
private static boolean isCreated = false;
private static int hasPrint = 10;

private final Handler.Callback mOriginalCallback;

HackCallback(Handler.Callback callback) {
this.mOriginalCallback = callback;
}

@Override
public boolean handleMessage(Message msg) {

if (!AppMethodBeat.isRealTrace()) {
return null != mOriginalCallback && mOriginalCallback.handleMessage(msg);
}

boolean isLaunchActivity = isLaunchActivity(msg);
if (hasPrint > 0) {
MatrixLog.i(TAG, "[handleMessage] msg.what:%s begin:%s isLaunchActivity:%s", msg.what, SystemClock.uptimeMillis(), isLaunchActivity);
hasPrint--;
}
if (isLaunchActivity) {
ActivityThreadHacker.sLastLaunchActivityTime = SystemClock.uptimeMillis();
ActivityThreadHacker.sLastLaunchActivityMethodIndex = AppMethodBeat.getInstance().maskIndex("LastLaunchActivityMethodIndex");
}

if (!isCreated) {
if (isLaunchActivity || msg.what == CREATE_SERVICE || msg.what == RECEIVER) { // todo for provider
ActivityThreadHacker.sApplicationCreateEndTime = SystemClock.uptimeMillis();
ActivityThreadHacker.sApplicationCreateScene = msg.what;
isCreated = true;
}
}

return null != mOriginalCallback && mOriginalCallback.handleMessage(msg);
}

private Method method = null;

private boolean isLaunchActivity(Message msg) {
if (Build.VERSION.SDK_INT > Build.VERSION_CODES.O_MR1) {
if (msg.what == EXECUTE_TRANSACTION && msg.obj != null) {
try {
if (null == method) {
Class clazz = Class.forName("android.app.servertransaction.ClientTransaction");
method = clazz.getDeclaredMethod("getCallbacks");
method.setAccessible(true);
}
List list = (List) method.invoke(msg.obj);
if (!list.isEmpty()) {
return list.get(0).getClass().getName().endsWith(".LaunchActivityItem");
}
} catch (Exception e) {
MatrixLog.e(TAG, "[isLaunchActivity] %s", e);
}
}
return msg.what == LAUNCH_ACTIVITY;
} else {
return msg.what == LAUNCH_ACTIVITY;
}
}
}


}

单个activity从启动到展示UI的耗时是从launch activity到onWindowFocusChanged这段时间,onWindowFocusChanged(true)表示activity已经获取到焦点,可以和用户进行交互。

到这里Matrix的TraceCanary部分就分析完了,可以看出和BlockCanCary是大体一样的原理,但微信就是做的细致和完美,包括细化每帧内几部分的执行时间、掉帧数的计算、插桩性能、上报数据大小等方面都超出竞品。

其他的APK CheckerResource CanarySQLite LintIO Canary等工具后续一一来分析。