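Although this article is titled "An Analysis of the Registration and Caching Mechanism of Android System Services", it mainly records how I recently tracked down a device-specific bug. While fixing it, I gained a much deeper understanding of how Android registers and caches its system services. The core of this article is therefore the machinery behind the system function getSystemService.
I recently built a demo on top of DroidPlugin. During testing I found that on several Xiaomi phones (Mix 2, Mix 2s, Mi 5) running Android 8, loading a plugin crashed the app. The captured log was as follows:
Based on this information, I located the corresponding system source code: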
    /**
     * Do a quick check to validate if a package name belongs to a UID.
     *
     * @throws SecurityException if the package name doesn't belong to the given
     *         UID, or if ownership cannot be verified.
     */
    public void checkPackage(int uid, String packageName) {
        try {
            if (mService.checkPackage(uid, packageName) != MODE_ALLOWED) {
                throw new SecurityException(
                        "Package " + packageName + " does not belong to " + uid);
            }
        } catch (RemoteException e) {
            throw e.rethrowFromSystemServer();
        }
    }
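Combined with the error message, the cause is clear: checkPackage was asked to validate the plugin's package name, while the uid in the log (10525) belongs to the host's package, hence the error.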
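The mismatch can be reproduced outside Android with a small simulation. This is only a sketch of the contract stated in the Javadoc above, not the AOSP server-side implementation: the class PackageOwnershipCheck, its register method, and the package names are hypothetical, while the uid 10525 is the one from the log.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the ownership check; NOT the real AppOps service.
class PackageOwnershipCheck {
    private final Map<String, Integer> uidByPackage = new HashMap<>();

    // Records which uid owns a package (assumed bookkeeping for this sketch).
    void register(String packageName, int uid) {
        uidByPackage.put(packageName, uid);
    }

    // Same contract as the quoted Javadoc: throw if the package does not belong to the uid.
    void checkPackage(int uid, String packageName) {
        Integer owner = uidByPackage.get(packageName);
        if (owner == null || owner != uid) {
            throw new SecurityException(
                    "Package " + packageName + " does not belong to " + uid);
        }
    }
}
```

Registering only a host package under uid 10525 and then checking a plugin package name against that same uid throws, because the system has no record of the plugin package owning that uid.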
So I wrote a small demo; the core code is:
    AppOpsManager mAppOps = (AppOpsManager) getSystemService(Context.APP_OPS_SERVICE);
    Log.i("vimerr", "-->" + Process.myUid() + "/" + getPackageName());
    mAppOps.checkPackage(Process.myUid(), getPackageName());
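Loading it directly from the DroidPlugin source tree reproduced a similar error on any device:
At this point, the first conclusion can be drawn:
This is an inherent DroidPlugin bug: it does not correctly handle a plugin's call to checkPackage.
Based on this, the rest of the article analyzes the problem mainly around DroidPlugin's code.
Reading the code shows that DroidPlugin does in fact handle this call:
So why does it still fail? Did the installHook logic fail? I extended the test case above: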
    ClipboardManager clipboardManager = (ClipboardManager) getSystemService(Context.CLIPBOARD_SERVICE);
    AppOpsManager mAppOps = (AppOpsManager) getSystemService(Context.APP_OPS_SERVICE);
    UserManager manager = (UserManager) getSystemService(Context.USER_SERVICE);
    Log.i("vimerr", "clip -->" + clipboardManager.getPrimaryClip());
    Log.i("vimerr", "-->" + Process.myUid() + "/" + getPackageName());
    mAppOps.checkPackage(Process.myUid(), getPackageName());
and added a log statement at the entry of ServiceManagerCacheBinderHook#invoke:
    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        try {
            Log.d("vimerr", method + "/" + mServiceName);
            IBinder originService = MyServiceManager.getOriginService(mServiceName);
            ......
        }
    }
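The result was even stranger:
The clipboard service and the appops service go through the same installHook logic, so why was one hooked successfully (its calls are intercepted in invoke) while the other was not?
After many attempts, I narrowed my focus to one method (in class PluginProcessManager):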
    // Works around the problem that, when a plugin calls certain system services,
    // the service requires the call to be made under the host's package name.
    public static void fakeSystemService(Context hostContext, Context targetContext) {
        if (VERSION.SDK_INT >= VERSION_CODES.ICE_CREAM_SANDWICH_MR1
                && !TextUtils.equals(hostContext.getPackageName(), targetContext.getPackageName())) {
            long b = System.currentTimeMillis();
            fakeSystemServiceInner(hostContext, targetContext);
            Log.i(TAG, "Fake SystemService for originContext=%s context=%s,cost %s ms",
                    targetContext.getPackageName(), targetContext.getPackageName(),
                    (System.currentTimeMillis() - b));
        }
    }
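Judging from the comment, this method exists to solve exactly my problem (I was secretly delighted)! So why did it still crash? Baffled, I commented out the call to fakeSystemServiceInner, and to my surprise everything worked.
At that moment my faith was a little shaken: wasn't this supposed to solve the problem? After calming down, I decided to read the source of fakeSystemServiceInner (core part):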
    private static void fakeSystemServiceInner(Context hostContext, Context targetContext) {
        try {
            Context baseContext = getBaseContext(targetContext);
            if (mFakedContext.containsValue(baseContext)) {
                return;
            } else if (mServiceCache != null) {
                ......
            }
            Object SYSTEM_SERVICE_MAP = null;
            try {
                SYSTEM_SERVICE_MAP = FieldUtils.readStaticField(baseContext.getClass(), "SYSTEM_SERVICE_MAP");
            } catch (Exception e) {
                Log.w(TAG, "readStaticField(SYSTEM_SERVICE_MAP) from %s fail", e, baseContext.getClass());
            }
            if (SYSTEM_SERVICE_MAP == null) {
                try {
                    SYSTEM_SERVICE_MAP = FieldUtils.readStaticField(
                            Class.forName("android.app.SystemServiceRegistry"), "SYSTEM_SERVICE_FETCHERS");
                } catch (Exception e) {
                    Log.e(TAG, "readStaticField(SYSTEM_SERVICE_FETCHERS) from android.app.SystemServiceRegistry fail", e);
                }
            }
            if (SYSTEM_SERVICE_MAP != null && (SYSTEM_SERVICE_MAP instanceof Map)) {
                // If absent, create a new one.
                Map<?, ?> sSYSTEM_SERVICE_MAP = (Map<?, ?>) SYSTEM_SERVICE_MAP;
                Context originContext = getBaseContext(hostContext);
                Object mServiceCache = FieldUtils.readField(originContext, "mServiceCache");
                if (mServiceCache instanceof List) {
                    ((List) mServiceCache).clear();
                }
                for (Object key : sSYSTEM_SERVICE_MAP.keySet()) {
                    if (sSkipService.contains(key)) {
                        continue;
                    }
                    Object serviceFetcher = sSYSTEM_SERVICE_MAP.get(key);
                    try {
                        Method getService = serviceFetcher.getClass().getMethod("getService", baseContext.getClass());
                        getService.invoke(serviceFetcher, originContext);
                    } catch (InvocationTargetException e) {
                        Throwable cause = e.getCause();
                        if (cause != null) {
                            Log.w(TAG, "Fake system service faile", e);
                        } else {
                            Log.w(TAG, "Fake system service faile", e);
                        }
                    } catch (Exception e) {
                        Log.w(TAG, "Fake system service faile", e);
                    }
                }
                mServiceCache = FieldUtils.readField(originContext, "mServiceCache");
                FieldUtils.writeField(baseContext, "mServiceCache", mServiceCache);
                // for context ContentResolver
                ContentResolver cr = baseContext.getContentResolver();
                if (cr != null) {
                    Object crctx = FieldUtils.readField(cr, "mContext");
                    if (crctx != null) {
                        FieldUtils.writeField(crctx, "mServiceCache", mServiceCache);
                    }
                }
            }
            if (!mFakedContext.containsValue(baseContext)) {
                mFakedContext.put(baseContext.hashCode(), baseContext);
            }
        } catch (Exception e) {
            Log.e(TAG, "fakeSystemServiceOldAPI", e);
        }
    }
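The logic is quite convoluted. What do hostContext and targetContext each mean, and what are the reflectively read fields for?
Apparently I could only make progress once I understood what this code is trying to do.
Starting from our demo and using the Android 9 source as an example, getSystemService eventually resolves to the following: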
    @Override
    public Object getSystemService(String name) {
        return SystemServiceRegistry.getSystemService(this, name);
    }
Tracing further:
    /**
     * Manages all of the system services that can be returned by {@link Context#getSystemService}.
     * Used by {@link ContextImpl}.
     */
    final class SystemServiceRegistry {
        ......
        private static final HashMap<String, ServiceFetcher<?>> SYSTEM_SERVICE_FETCHERS =
                new HashMap<String, ServiceFetcher<?>>();
        ......
        /**
         * Gets a system service from a given context.
         */
        public static Object getSystemService(ContextImpl ctx, String name) {
            ServiceFetcher<?> fetcher = SYSTEM_SERVICE_FETCHERS.get(name);
            return fetcher != null ? fetcher.getService(ctx) : null;
        }
        ......
    }
In other words, every service has a corresponding ServiceFetcher object. These objects are created in a static initializer, as follows:
    static {
        ......
        // Clipboard service
        registerService(Context.CLIPBOARD_SERVICE, ClipboardManager.class,
                new CachedServiceFetcher<ClipboardManager>() {
            @Override
            public ClipboardManager createService(ContextImpl ctx) throws ServiceNotFoundException {
                return new ClipboardManager(ctx.getOuterContext(),
                        ctx.mMainThread.getHandler());
            }});
        ......
        // appops service
        registerService(Context.APP_OPS_SERVICE, AppOpsManager.class,
                new CachedServiceFetcher<AppOpsManager>() {
            @Override
            public AppOpsManager createService(ContextImpl ctx) throws ServiceNotFoundException {
                IBinder b = ServiceManager.getServiceOrThrow(Context.APP_OPS_SERVICE);
                IAppOpsService service = IAppOpsService.Stub.asInterface(b);
                return new AppOpsManager(ctx, service);
            }});
        ......
    }
So what exactly is CachedServiceFetcher? It is also defined in SystemServiceRegistry:
    /**
     * Override this class when the system service constructor needs a
     * ContextImpl and should be cached and retained by that context.
     */
    static abstract class CachedServiceFetcher<T> implements ServiceFetcher<T> {
        private final int mCacheIndex;

        CachedServiceFetcher() {
            // Note this class must be instantiated only by the static initializer of the
            // outer class (SystemServiceRegistry), which already does the synchronization,
            // so bare access to sServiceCacheSize is okay here.
            mCacheIndex = sServiceCacheSize++;
        }

        @Override
        @SuppressWarnings("unchecked")
        public final T getService(ContextImpl ctx) {
            final Object[] cache = ctx.mServiceCache;
            final int[] gates = ctx.mServiceInitializationStateArray;

            for (;;) {
                boolean doInitialize = false;
                synchronized (cache) {
                    // Return it if we already have a cached instance.
                    T service = (T) cache[mCacheIndex];
                    if (service != null || gates[mCacheIndex] == ContextImpl.STATE_NOT_FOUND) {
                        return service;
                    }

                    // If we get here, there's no cached instance.

                    // Grr... if gate is STATE_READY, then this means we initialized the service
                    // once but someone cleared it.
                    // We start over from STATE_UNINITIALIZED.
                    if (gates[mCacheIndex] == ContextImpl.STATE_READY) {
                        gates[mCacheIndex] = ContextImpl.STATE_UNINITIALIZED;
                    }

                    // It's possible for multiple threads to get here at the same time, so
                    // use the "gate" to make sure only the first thread will call createService().

                    // At this point, the gate must be either UNINITIALIZED or INITIALIZING.
                    if (gates[mCacheIndex] == ContextImpl.STATE_UNINITIALIZED) {
                        doInitialize = true;
                        gates[mCacheIndex] = ContextImpl.STATE_INITIALIZING;
                    }
                }

                if (doInitialize) {
                    // Only the first thread gets here.
                    T service = null;
                    @ServiceInitializationState int newState = ContextImpl.STATE_NOT_FOUND;
                    try {
                        // This thread is the first one to get here. Instantiate the service
                        // *without* the cache lock held.
                        service = createService(ctx);
                        newState = ContextImpl.STATE_READY;
                    } catch (ServiceNotFoundException e) {
                        onServiceNotFound(e);
                    } finally {
                        synchronized (cache) {
                            cache[mCacheIndex] = service;
                            gates[mCacheIndex] = newState;
                            cache.notifyAll();
                        }
                    }
                    return service;
                }
                // The other threads will wait for the first thread to call notifyAll(),
                // and go back to the top and retry.
                synchronized (cache) {
                    while (gates[mCacheIndex] < ContextImpl.STATE_READY) {
                        try {
                            cache.wait();
                        } catch (InterruptedException e) {
                            Log.w(TAG, "getService() interrupted");
                            Thread.currentThread().interrupt();
                            return null;
                        }
                    }
                }
            }
        }

        public abstract T createService(ContextImpl ctx) throws ServiceNotFoundException;
    }
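Analyzing the logic of getService shows that:
On first use, the service is created and a copy is cached.
On later calls, the cached copy is returned directly; the cache is ctx.mServiceCache.
The cache lives in the ContextImpl class: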
    /**
     * Common implementation of Context API, which provides the base
     * context object for Activity and other application components.
     */
    class ContextImpl extends Context {
        ......
        // The system service cache for the system services that are cached per-ContextImpl.
        final Object[] mServiceCache = SystemServiceRegistry.createServiceCache();
        ......
    }
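From this we can see what happens behind getSystemService:
Every ContextImpl holds an mServiceCache field from the moment it is created; it caches the service instances that the fetchers produce.
Every service has a fetcher. The service itself is constructed in createService, which is not executed up front.
The first time a service is actually requested, that is, when the fetcher's getService is triggered, createService runs and the result is cached.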
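The three points above can be condensed into a minimal sketch. It is a deliberately simplified model, not the AOSP code: MiniContext and MiniFetcher are hypothetical names, the gate array and all synchronization are left out, and it is only meant for single-threaded use.

```java
import java.util.function.Function;

// Simplified stand-in for ContextImpl: each context owns one cache slot per service.
class MiniContext {
    final Object[] serviceCache;

    MiniContext(int slots) {
        serviceCache = new Object[slots];
    }
}

// Simplified stand-in for CachedServiceFetcher (no gates, no locking).
class MiniFetcher<T> {
    private final int cacheIndex;
    private final Function<MiniContext, T> factory;
    int createCount = 0; // how many times the factory actually ran

    MiniFetcher(int cacheIndex, Function<MiniContext, T> factory) {
        this.cacheIndex = cacheIndex;
        this.factory = factory;
    }

    @SuppressWarnings("unchecked")
    T getService(MiniContext ctx) {
        T cached = (T) ctx.serviceCache[cacheIndex];
        if (cached != null) {
            return cached; // later calls: served from this context's cache
        }
        T created = factory.apply(ctx); // first call: create lazily
        createCount++;
        ctx.serviceCache[cacheIndex] = created;
        return created;
    }
}
```

One fetcher is shared by all contexts, but the cached instance belongs to the context it was created for, so two contexts asking the same fetcher get two different objects.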
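In short, this is a classic lazy-loading plus caching design, probably routine work for Google's engineers. But so far nothing distinguishes the clipboard service from the appops service! So:
Why do these two services behave differently when called from the demo?
Why does everything work once fakeSystemService is removed?
It is time to look at that code again.
Once you understand how system services are registered and cached, the earlier code becomes much easier to read. Here is the annotated version: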
    /**
     * @param hostContext   the host's context
     * @param targetContext the plugin's context
     */
    private static void fakeSystemServiceInner(Context hostContext, Context targetContext) {
        try {
            // Get the plugin's mBase, i.e. its ContextImpl object
            Context baseContext = getBaseContext(targetContext);
            if (mFakedContext.containsValue(baseContext)) {
                return;
            } else if (mServiceCache != null) {
                ......
            }
            Object SYSTEM_SERVICE_MAP = null;
            // Read the SYSTEM_SERVICE_MAP field; its keys cover all system services
            try {
                // Location on older versions
                SYSTEM_SERVICE_MAP = FieldUtils.readStaticField(baseContext.getClass(), "SYSTEM_SERVICE_MAP");
            } catch (Exception e) {
                Log.w(TAG, "readStaticField(SYSTEM_SERVICE_MAP) from %s fail", e, baseContext.getClass());
            }
            if (SYSTEM_SERVICE_MAP == null) {
                try {
                    SYSTEM_SERVICE_MAP = FieldUtils.readStaticField(
                            Class.forName("android.app.SystemServiceRegistry"), "SYSTEM_SERVICE_FETCHERS");
                } catch (Exception e) {
                    Log.e(TAG, "readStaticField(SYSTEM_SERVICE_FETCHERS) from android.app.SystemServiceRegistry fail", e);
                }
            }
            if (SYSTEM_SERVICE_MAP != null && (SYSTEM_SERVICE_MAP instanceof Map)) {
                // If absent, create a new one.
                Map<?, ?> sSYSTEM_SERVICE_MAP = (Map<?, ?>) SYSTEM_SERVICE_MAP;
                // Get the host's mBase, i.e. its ContextImpl object
                Context originContext = getBaseContext(hostContext);
                // Read the host ContextImpl's mServiceCache, i.e. its per-context system service cache
                Object mServiceCache = FieldUtils.readField(originContext, "mServiceCache");
                // Clear the cache
                if (mServiceCache instanceof List) {
                    ((List) mServiceCache).clear();
                }
                for (Object key : sSYSTEM_SERVICE_MAP.keySet()) {
                    // Skip services that do not need their package name replaced; appops clearly does
                    if (sSkipService.contains(key)) {
                        continue;
                    }
                    Object serviceFetcher = sSYSTEM_SERVICE_MAP.get(key);
                    try {
                        // Call getService with originContext (the host's ContextImpl), which fills
                        // every slot of that cache; judging from what follows, this step looks unnecessary
                        Method getService = serviceFetcher.getClass().getMethod("getService", baseContext.getClass());
                        getService.invoke(serviceFetcher, originContext);
                    } catch (InvocationTargetException e) {
                        Throwable cause = e.getCause();
                        if (cause != null) {
                            Log.w(TAG, "Fake system service faile", e);
                        } else {
                            Log.w(TAG, "Fake system service faile", e);
                        }
                    } catch (Exception e) {
                        Log.w(TAG, "Fake system service faile", e);
                    }
                }
                // Re-read the host ContextImpl's cache
                mServiceCache = FieldUtils.readField(originContext, "mServiceCache");
                // Overwrite the plugin's cache with the host's; note that the host's was cleared above
                FieldUtils.writeField(baseContext, "mServiceCache", mServiceCache);
                // for context ContentResolver
                ContentResolver cr = baseContext.getContentResolver();
                if (cr != null) {
                    Object crctx = FieldUtils.readField(cr, "mContext");
                    if (crctx != null) {
                        FieldUtils.writeField(crctx, "mServiceCache", mServiceCache);
                    }
                }
            }
            if (!mFakedContext.containsValue(baseContext)) {
                mFakedContext.put(baseContext.hashCode(), baseContext);
            }
        } catch (Exception e) {
            Log.e(TAG, "fakeSystemServiceOldAPI", e);
        }
    }
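In short, this method is meant to clear and reset the mServiceCache field of both the host's and the plugin's ContextImpl. Why would it do that?
We know that the core of DroidPlugin is intercepting system services by hooking Binder. Some services, however, start very early and have already been created and cached before our hook is installed. This logic is needed to clear the cache so that the next call recreates the service, and what gets created at that point is our hooked proxy object.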
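The timing problem can be shown with a plain java.lang.reflect.Proxy and no Android code at all. Everything in this sketch is hypothetical (the Clipboard interface, HookTimingDemo, and its fields); it only models one cache slot and one hook, and makes no claim about how DroidPlugin wires its Binder hooks internally.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface used only for this sketch.
interface Clipboard {
    String getText();
}

class HookTimingDemo {
    static final List<String> intercepted = new ArrayList<>();
    static Clipboard source = () -> "raw"; // what "the system" currently hands out
    static Clipboard cached;               // stand-in for one mServiceCache slot

    // Lazy creation plus caching, as in the fetcher pattern.
    static Clipboard getService() {
        if (cached == null) {
            cached = source;
        }
        return cached;
    }

    // Replaces the source with a logging proxy; objects already cached are unaffected.
    static void installHook() {
        Clipboard original = source;
        source = (Clipboard) Proxy.newProxyInstance(
                Clipboard.class.getClassLoader(),
                new Class<?>[] {Clipboard.class},
                (proxy, method, args) -> {
                    intercepted.add(method.getName());
                    return method.invoke(original, args);
                });
    }
}
```

If getService() runs before installHook(), later calls keep returning the unhooked object and nothing reaches the handler. Setting cached back to null forces the next getService() to pick up the proxy, which is the effect the cache reset is after.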
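If you have read the analysis above carefully, you may already have spotted the cause. DroidPlugin assumes that the mServiceCache field is a List, but in the 9.0 source it is an Object[], and that is the problem:
Because the type check fails, the cache is never cleared, and the host's cache is even written over the plugin's cache.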
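The silent failure is easy to demonstrate in isolation. The sketch below copies only the shape of the type check; CacheTypeCheck and clearIfList are hypothetical names, and the boolean return value is added purely so the outcome can be observed.

```java
import java.util.List;

class CacheTypeCheck {
    // Same shape as the check in fakeSystemServiceInner: only a List gets cleared.
    static boolean clearIfList(Object cache) {
        if (cache instanceof List) {
            ((List<?>) cache).clear();
            return true;
        }
        // An Object[] falls through here without any error or log output.
        return false;
    }
}
```

Passing an ArrayList empties it, while passing an Object[] leaves every element in place and raises no exception, which is why the problem went unnoticed when the field type changed.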
On Android 5.0, it really was a List:
    // The system service cache for the system services that are
    // cached per-ContextImpl. Package-scoped to avoid accessor
    // methods.
    final ArrayList<Object> mServiceCache = new ArrayList<Object>();
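Then why did commenting out this method also make things work?
Because in this scenario the plugin's ContextImpl had indeed not been initialized yet. The failure came purely from the host's cache (already initialized and populated) being written over it; with the call commented out, nothing is overwritten.
So the fix is to add one more check: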
    ......
    if (mServiceCache instanceof List) {
        ((List) mServiceCache).clear();
    }
    // For newer versions
    if (mServiceCache instanceof Object[]) {
        int len = ((Object[]) mServiceCache).length;
        mServiceCache = new Object[len];
        FieldUtils.writeField(originContext, "mServiceCache", mServiceCache);
    }
    ......
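A possible variation, offered only as a sketch: instead of writing a new array into a final field through reflection, the existing array can be emptied in place. ServiceCacheCleaner is a hypothetical helper that is not part of DroidPlugin, and it has not been verified on a device. It relies on the Android 9 getService code quoted earlier, whose comments say that a cleared slot with a STATE_READY gate starts over from STATE_UNINITIALIZED.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; handles both representations of mServiceCache.
class ServiceCacheCleaner {
    // Returns true if the cache type was recognized and its contents were emptied.
    static boolean clear(Object cache) {
        if (cache instanceof List) {
            ((List<?>) cache).clear();           // older versions: ArrayList<Object>
            return true;
        }
        if (cache instanceof Object[]) {
            Arrays.fill((Object[]) cache, null); // newer versions: Object[]
            return true;
        }
        return false; // unknown representation: let the caller log it
    }
}
```

Returning false for an unrecognized type gives the caller a chance to log a warning, so a future change of the field type does not fail silently again.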
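After fixing this I retried on the Xiaomi/Android 8.0 devices and found that the problem still existed in my own project, yet DroidPlugin itself now worked! That started another round of confusion. After some testing, it turned out that I was initializing the 灯塔 (Beacon) SDK before this initialization; moving it afterwards made the problem go away.
My bold guess is that this SDK caused AppOps to be created after attach but before fakeSystemService, at a point when no hook was in place yet. Unfortunately, both this SDK and MIUI are black boxes to me. Investigating checkPackage has already taught me a lot: calling this method at other, ordinary times no longer causes problems, and the Xiaomi + Android 8 issue can be solved by adjusting the initialization order. Digging further into a problem with no source code available is unlikely to pay off, so I will stop here.
This is a typical DroidPlugin-style problem:
Once you hook the system, you have to keep adapting to it, and the scariest part is that you cannot know in advance which changes in the next release you will need to accommodate.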