Friday, 21 August 2015

ProFinder - The ProGuard obfuscation tracker.



ProFinder is a tool that tracks classes across two versions of a decompiled, ProGuard-obfuscated application.

When updating XHangouts, an Xposed module, I have to change the obfuscated class names used for hooking every time a new release of Hangouts is made available. Since the underlying Hangouts code is usually the same, there is (usually) no need to change anything in the module except the actual class name that is hooked.

Between two minor revisions of a particular application, many classes are left relatively similar. Under normal circumstances the class name is the best way to track a class from one revision to the next. Unfortunately, ProGuarded applications like Google Hangouts rule out this obvious approach, since an obfuscated class could easily be named "avq" in v1.0 and, for instance, "bbr" in v1.1 despite very few changes within.

Manually finding the new class names of every class XHangouts uses is a real pain. Google Hangouts 4.0 has over 8600 classes to compare against. That's up from "just" about 6000 classes in Hangouts 3.3. ProFinder can do in 30 seconds, with a fair degree of accuracy, what could take me an hour or more by hand.

Algorithm
ProFinder employs two different methods of identifying classes. Each reflects a property that the "old" version of a class (the one we know) shares with its "new" version (the one we are searching for): signature strings and syntactical similarities.

Signature Strings
The signature string method uses literal const-strings (constant strings) embedded within a class. Strings found within a class that meet a customizable length requirement are considered signature strings. These strings help to make up a class's signature or identifier. Since individual signature strings are not guaranteed to be unique, this method is only reliable for classes that contain multiple signature strings. The more, the better (i.e. more identifying). The signature string "finding logic" is run over both the old class and every new class. The old class's signature strings are then compared against those of every new class. The new class with the largest number of intersecting (shared) signature strings is the best match. If the percentage of intersecting strings (a ratio of intersections / total) is greater than or equal to a customizable amount, the algorithm has succeeded and the best match is considered the new version of the old class.
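The signature string method can be sketched roughly as follows. This is a minimal Python illustration, not the actual PHP code from ProFinder. The function names and the regular expression are made up for the example, and it assumes that "total" in the ratio means the old class's signature string count, which is consistent with the percentages in the verbose output further down (e.g. 42.86% for a class with 35 signature strings is 15/35).

```python
import re

# Hypothetical pattern: captures the literal of smali const-string and
# const-string/jumbo instructions, e.g.  const-string v0, "Babel"
CONST_STRING = re.compile(
    r'const-string(?:/jumbo)?\s+[vp]\d+,\s*"((?:[^"\\]|\\.)*)"'
)


def signature_strings(smali_text, min_len=3):
    """Return the set of string literals that are at least min_len characters long."""
    return {s for s in CONST_STRING.findall(smali_text) if len(s) >= min_len}


def best_signature_match(old_sigs, new_classes, min_ratio=0.70, min_sigs=3):
    """Pick the new class sharing the most signature strings with the old class.

    new_classes maps a class name to its set of signature strings.
    Returns (name, ratio), or None when the old class has too few signature
    strings or no candidate reaches min_ratio.
    """
    if len(old_sigs) < min_sigs:
        return None  # not enough signature strings to be identifying
    best_name, best_ratio = None, 0.0
    for name, sigs in new_classes.items():
        # Assumption: ratio = shared strings / old class's signature strings
        ratio = len(old_sigs & sigs) / len(old_sigs)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_name is not None and best_ratio >= min_ratio:
        return best_name, best_ratio
    return None
```

The defaults mirror the --sig-len, --sig-match and --sig-occr options described under Usage. Returning None corresponds to the case where signature strings are of no help.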

Diff
The diff method is a visual algorithm that counts the line-by-line syntactical differences between the old class and every new class which meets a customizable line tolerance requirement. The core of this algorithm is really just:

Code:


diff -y --suppress-common-lines old.smali new.smali | wc -l

Since running the diff utility on thousands of classes (many of which are obviously vastly different) is a fairly time-consuming operation, the line tolerance requirement ensures that we only compare against new classes whose total line count is within a certain plus-or-minus range of the old class. For example, if our old class is 500 lines long and the line tolerance is 50, we will only diff new classes that are between 450 and 550 lines long, inclusive. Once diff has been run against all the applicable new classes, the new class with the fewest line differences is the best match and is reported as the new version of the old class.
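For illustration, here is a rough Python sketch of the tolerance filter and the difference count. It does not shell out to diff and wc the way ProFinder does. Instead it swaps in Python's difflib as an approximation of the side-by-side row count, so the numbers may not match GNU diff exactly. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def count_differences(old_lines, new_lines):
    """Approximate `diff -y --suppress-common-lines | wc -l`.

    Each changed, added, or removed row of a side-by-side diff counts once,
    so a changed block contributes the longer of its two sides.
    """
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += max(i2 - i1, j2 - j1)
    return total


def best_diff_match(old_lines, new_classes, tolerance=150):
    """Return (name, differences) for the closest new class, or None.

    new_classes maps a class name to its list of smali lines. Classes whose
    line count differs from the old class by more than `tolerance` are skipped.
    """
    best = None
    for name, new_lines in new_classes.items():
        if abs(len(new_lines) - len(old_lines)) > tolerance:
            continue  # outside the line tolerance, do not diff at all
        diffs = count_differences(old_lines, new_lines)
        if best is None or diffs < best[1]:
            best = (name, diffs)
    return best
```

The tolerance check is what keeps the run time bearable: it is a cheap length comparison that avoids a full diff against classes that cannot plausibly be the same class.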

Usage
ProFinder can be invoked directly on the CLI as long as it's executable (e.g. ./profinder.php --args). Alternatively you can use your preferred PHP binary (e.g. php profinder.php --args).

Code:


REQUIRED ARGUMENTS:
    -c, --classes            A comma-separated list of old classes to find in the new directory
    -o, --old                Path of the old version's smali directory
    -n, --new                Path of the new version's smali directory

OPTIONS:
    -l, --sig-len            The minimum number of characters a particular signature string can be (default: 3)
    -m, --sig-match          The minimum pct of matches for a successful signature class match (default: 0.70)
    -r, --sig-occr           The minimum number of signature strings an old class must have (default: 3)
    -t, --tolerance          Only apply the diff method on new classes +/- this number of lines (default: 150)

FLAGS:
    -d, --always-try-diff    Use the diff method even if signature matching was successful (default: off)
    -v, --verbose            Enable additional descriptive log output (default: off)


ProFinder by Example
Here's a great example of ProFinder in action. We're going to use the script to update XHangouts to be compatible with Google Hangouts 3.3 from 3.1. This is something I've already done manually, so it's a useful test for judging algorithmic accuracy against a known good set of values. If you'd like to follow along, nab the Hangouts 3.1 and Hangouts 3.3 APKs from your favorite APK provider. Once that's done, decompile them with Apktool. You can pass along the --no-res parameter since we only need the "smali" directory. After that we're ready to spin up ProFinder!

Code:


Klingon:APKs kmark$ ./profinder.php -o Hangouts3.1/decompiled/smali -n Hangouts3.3/decompiled/smali -c cnx,vo,wi,cmx,wq,cny,cnb,vz,cne,bev,bbn,eaf
ProFinder v1.0
-------------
cnx -> coa
vo -> vo
wi -> wi
cmx -> cna
wq -> wq
cny -> cob
cnb -> cne
vz -> vz
cne -> cnh
bev -> bex
bbn -> bbp
eaf -> efe


100% accuracy. Your mileage may vary, but this is a good example of what a naive algorithm can accomplish in a sizable real-world application.

Let's try that again, this time with verbose output enabled.




Code:


Klingon:APKs kmark$ ./profinder.php -v -o Hangouts3.1/decompiled/smali -n Hangouts3.3/decompiled/smali -c cnx,vo,wi,cmx,wq,cny,cnb,vz,cne,bev,bbn,eaf
ProFinder v1.0
 with PHP 5.6.11
-------------
Calculating old smali statistics...
Calculating new smali statistics...
5966 old classes, 6140 new classes
-------------
Finding potential matches for cnx...
This class has 35 signature strings.
Finding classes with matching signature strings...
Found 458 classes with at least one common signature.
cnx -> coa
 - 100.00% signature match
 - Next best match is acu with a 42.86% signature match.
-------------
Finding potential matches for vo...
This class has 0 signature strings.
Finding syntactically similar classes...
4503 classes fall within the line tolerance.
vo -> vo
 - 0 differences.
 - Next best match is vp with 76 differences.
-------------
Finding potential matches for wi...
This class has 4 signature strings.
Finding classes with matching signature strings...
Found 4 classes with at least one common signature.
wi -> wi
 - 100.00% signature match
 - Next best match is vz with a 50.00% signature match.
-------------
Finding potential matches for cmx...
This class has 76 signature strings.
Finding classes with matching signature strings...
Found 450 classes with at least one common signature.
cmx -> cna
 - 100.00% signature match
 - Next best match is cnc with a 36.84% signature match.
-------------
Finding potential matches for wq...
This class has 14 signature strings.
Finding classes with matching signature strings...
Found 43 classes with at least one common signature.
wq -> wq
 - 100.00% signature match
 - Next best match is eez with a 14.29% signature match.
-------------
Finding potential matches for cny...
This class has 29 signature strings.
Finding classes with matching signature strings...
Found 450 classes with at least one common signature.
cny -> cob
 - 100.00% signature match
 - Next best match is acu with a 27.59% signature match.
-------------
Finding potential matches for cnb...
This class has 0 signature strings.
Finding syntactically similar classes...
3812 classes fall within the line tolerance.
cnb -> cne
 - 1 differences.
 - Next best match is vi with 15 differences.
-------------
Finding potential matches for vz...
This class has 56 signature strings.
Finding classes with matching signature strings...
Found 11 classes with at least one common signature.
vz -> vz
 - 100.00% signature match
 - Next best match is vt with a 8.93% signature match.
-------------
Finding potential matches for cne...
This class has 220 signature strings.
Finding classes with matching signature strings...
Found 515 classes with at least one common signature.
cne -> cnh
 - 100.00% signature match
 - Next best match is adr with a 36.82% signature match.
-------------
Finding potential matches for bev...
This class has 1 signature strings.
Finding syntactically similar classes...
1790 classes fall within the line tolerance.
bev -> bex
 - 25 differences.
 - Next best match is dsc with 201 differences.
-------------
Finding potential matches for bbn...
This class has 0 signature strings.
Finding syntactically similar classes...
4020 classes fall within the line tolerance.
bbn -> bbp
 - 4 differences.
 - Next best match is jw with 29 differences.
-------------
Finding potential matches for eaf...
This class has 12 signature strings.
Finding classes with matching signature strings...
Found 2 classes with at least one common signature.
eaf -> efe
 - 100.00% signature match
 - Next best match is efd with a 8.33% signature match.





Snazzy. We can see when ProFinder falls back to the diff method when signature strings are of no help. And let's give it one more go with the --always-try-diff flag which will allow us to see what the diff method comes up with even if signature strings appeared to do the job.




Code:


Klingon:APKs kmark$ ./profinder.php -v --always-try-diff -o Hangouts3.1/decompiled/smali -n Hangouts3.3/decompiled/smali -c cnx,vo,wi,cmx,wq,cny,cnb,vz,cne,bev,bbn,eaf
ProFinder v1.0
 with PHP 5.6.11
-------------
Calculating old smali statistics...
Calculating new smali statistics...
5966 old classes, 6140 new classes
-------------
Finding potential matches for cnx...
This class has 35 signature strings.
Finding classes with matching signature strings...
Found 458 classes with at least one common signature.
cnx -> coa
 - 100.00% signature match
 - Next best match is acu with a 42.86% signature match.
Finding syntactically similar classes...
75 classes fall within the line tolerance.
cnx -> coa
 - 61 differences.
 - Next best match is drc with 932 differences.
-------------
Finding potential matches for vo...
This class has 0 signature strings.
Finding syntactically similar classes...
4503 classes fall within the line tolerance.
vo -> vo
 - 0 differences.
 - Next best match is vp with 76 differences.
-------------
Finding potential matches for wi...
This class has 4 signature strings.
Finding classes with matching signature strings...
Found 4 classes with at least one common signature.
wi -> wi
 - 100.00% signature match
 - Next best match is vz with a 50.00% signature match.
Finding syntactically similar classes...
2526 classes fall within the line tolerance.
wi -> wi
 - 0 differences.
 - Next best match is bsu with 151 differences.
-------------
Finding potential matches for cmx...
This class has 76 signature strings.
Finding classes with matching signature strings...
Found 450 classes with at least one common signature.
cmx -> cna
 - 100.00% signature match
 - Next best match is cnc with a 36.84% signature match.
Finding syntactically similar classes...
32 classes fall within the line tolerance.
cmx -> cna
 - 137 differences.
 - Next best match is ddc with 1488 differences.
-------------
Finding potential matches for wq...
This class has 14 signature strings.
Finding classes with matching signature strings...
Found 43 classes with at least one common signature.
wq -> wq
 - 100.00% signature match
 - Next best match is eez with a 14.29% signature match.
Finding syntactically similar classes...
95 classes fall within the line tolerance.
wq -> wq
 - 0 differences.
 - Next best match is bgy with 896 differences.
-------------
Finding potential matches for cny...
This class has 29 signature strings.
Finding classes with matching signature strings...
Found 450 classes with at least one common signature.
cny -> cob
 - 100.00% signature match
 - Next best match is acu with a 27.59% signature match.
Finding syntactically similar classes...
139 classes fall within the line tolerance.
cny -> cob
 - 48 differences.
 - Next best match is com/google/android/apps/hangouts/realtimechat/ShemTestingIntentService with 661 differences.
-------------
Finding potential matches for cnb...
This class has 0 signature strings.
Finding syntactically similar classes...
3812 classes fall within the line tolerance.
cnb -> cne
 - 1 differences.
 - Next best match is vi with 15 differences.
-------------
Finding potential matches for vz...
This class has 56 signature strings.
Finding classes with matching signature strings...
Found 11 classes with at least one common signature.
vz -> vz
 - 100.00% signature match
 - Next best match is vt with a 8.93% signature match.
Finding syntactically similar classes...
6 classes fall within the line tolerance.
vz -> vz
 - 0 differences.
 - Next best match is com/google/android/gms/people/accountswitcherview/SelectedAccountNavigationView with 3586 differences.
-------------
Finding potential matches for cne...
This class has 220 signature strings.
Finding classes with matching signature strings...
Found 515 classes with at least one common signature.
cne -> cnh
 - 100.00% signature match
 - Next best match is adr with a 36.82% signature match.
Finding syntactically similar classes...
3 classes fall within the line tolerance.
cne -> cnh
 - 229 differences.
 - Next best match is gju with 5615 differences.
-------------
Finding potential matches for bev...
This class has 1 signature strings.
Finding syntactically similar classes...
1790 classes fall within the line tolerance.
bev -> bex
 - 25 differences.
 - Next best match is dsc with 201 differences.
-------------
Finding potential matches for bbn...
This class has 0 signature strings.
Finding syntactically similar classes...
4020 classes fall within the line tolerance.
bbn -> bbp
 - 4 differences.
 - Next best match is jw with 29 differences.
-------------
Finding potential matches for eaf...
This class has 12 signature strings.
Finding classes with matching signature strings...
Found 2 classes with at least one common signature.
eaf -> efe
 - 100.00% signature match
 - Next best match is efd with a 8.33% signature match.
Finding syntactically similar classes...
73 classes fall within the line tolerance.
eaf -> efe
 - 117 differences.
 - Next best match is ddx with 1111 differences.





Cool. Wherever signature strings were available, the diff method did just as well (albeit drastically slower).
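The way the two methods combine in the runs above can be summarized in a few lines. This is only a sketch of the behaviour described in this post (signature strings first, diff as a fallback or when --always-try-diff is set), not ProFinder's actual control flow. The function and parameter names are hypothetical, and the two methods are passed in as plain callables.

```python
def find_class(old_name, sig_method, diff_method, always_try_diff=False):
    """Run the signature method, then the diff method if needed.

    sig_method and diff_method are callables that take the old class name
    and return the matched new class name, or None when they find nothing.
    Returns a dict with the result of each method that produced a match.
    """
    results = {}
    sig = sig_method(old_name)
    if sig is not None:
        results["signature"] = sig
    # Fall back to diff when signatures failed, or always when asked to
    if sig is None or always_try_diff:
        diff = diff_method(old_name)
        if diff is not None:
            results["diff"] = diff
    return results
```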

System Requirements
As long as your system can run a PHP >= 5.4 CLI script you're good. See below for specific Windows information. Why PHP? I wanted this sooner rather than later to assist with XHangouts support for Hangouts 4.0. I may reimplement this in Java (or something) eventually. The general concept would not be difficult to port.

Running on Windows
At the moment the diff method will not work on Windows since it requires the diff and wc utilities (plus pipes). I suppose it could be done under Cygwin but I have yet to try that. If you're OK settling for signature strings only, just open up the script and place a continue; on a new line above verbose('Finding syntactically similar classes...');

Download
Available via GitHub Gist

Licensing

Code:


Copyright 2015 Kevin Mark

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


XDA:DevDB Information
ProFinder, Tool/Utility for all devices (see above for details)

Contributors
Kevin M
Source Code: https://gist.github.com/kmark/a01c1463242e435f6cb5


Version Information
Status: Stable
Current Stable Version: v1.0
Stable Release Date: 2015-08-21

Created 2015-08-21
Last Updated 2015-08-21














Attached Files






File Type: zip profinder.php.zip (4.5 KB)






