ProFinder is a tool that identifies corresponding classes across two versions of a decompiled, ProGuard-obfuscated application.
When updating XHangouts, an Xposed module, I have to change the obfuscated class names used for hooking every time a new release of Hangouts is made available. Since the underlying Hangouts code is usually the same, there is (usually) no need to change anything in the module except the name of the class being hooked.
Between two minor revisions of a particular application, many classes are often left relatively similar. Under normal circumstances the class name is, of course, the best way to track a class from one revision to the next. Unfortunately, ProGuarded applications like Google Hangouts defeat this obvious approach: an obfuscated class could easily be named "avq" in v1.0 and, for instance, "bbr" in v1.1 despite very few changes within.
Manually finding the new names of every class XHangouts uses is a real pain. Google Hangouts 4.0 has over 8600 classes to compare against, up from "just" about 6000 classes in Hangouts 3.3. ProFinder can do in 30 seconds, with a fair degree of accuracy, what could take me an hour or more by hand.
Algorithm
ProFinder employs two methods of identifying classes, reflecting two properties that an "old" version of a class (the one we know) shares with its "new" version (the one we are searching for): signature strings and syntactical similarities.
Signature Strings
The signature string method uses the literal const-strings (constant strings) embedded within a class. Strings found within a class that meet a customizable minimum length are considered signature strings; together they make up a class's signature, or identifier. Since individual signature strings are not guaranteed to be unique, this method is only reliable for classes that contain multiple signature strings. The more, the better (i.e. the more identifying). The signature string finding logic is run over both the old class and every new class, and the old class's signature strings are compared against those of every new class. The new class with the largest number of intersecting (shared) signature strings is the best match. If the percentage of intersecting strings (the ratio of intersections to total) is greater than or equal to a customizable threshold, the algorithm has succeeded and the best match is considered the new version of the old class.
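To make the signature string idea concrete, here is a minimal sketch of that matching step in Python. It is not ProFinder's PHP code. The names CONST_STRING, signature_strings and best_signature_match are hypothetical. The parameters mirror the --sig-len, --sig-match and --sig-occr options and their documented defaults (3, 0.70, 3). Two interpretations are assumptions: "total" in the match ratio is taken to be the old class's number of signature strings, and only const-string and const-string/jumbo literals are treated as signature strings.

```python
import re

# Hypothetical helper: matches smali lines such as
#     const-string v0, "some literal"
#     const-string/jumbo v3, "another literal"
# The literal is captured raw (smali escapes like \" are left as-is), which is
# fine for comparison because both versions are escaped the same way.
CONST_STRING = re.compile(r'^\s*const-string(?:/jumbo)?\s+[vp]\d+,\s+"(.*)"\s*$')


def signature_strings(smali_text, sig_len=3):
    """Distinct const-string literals that are at least sig_len characters long."""
    sigs = set()
    for line in smali_text.splitlines():
        match = CONST_STRING.match(line)
        if match and len(match.group(1)) >= sig_len:
            sigs.add(match.group(1))
    return sigs


def best_signature_match(old_text, new_classes, sig_len=3, sig_match=0.70, sig_occr=3):
    """Name of the new class sharing the most signature strings, or None.

    new_classes maps a class name to its smali source. Assumption: the match
    ratio is shared strings / the old class's signature string count.
    """
    old_sigs = signature_strings(old_text, sig_len)
    if len(old_sigs) < sig_occr:
        return None  # too few strings to be identifying; fall back to diff
    best_name, best_shared = None, 0
    for name, text in new_classes.items():
        shared = len(old_sigs & signature_strings(text, sig_len))
        if shared > best_shared:  # ties keep the first class seen
            best_name, best_shared = name, shared
    if best_name is not None and best_shared / len(old_sigs) >= sig_match:
        return best_name
    return None
```

With the defaults, consider an old class whose three qualifying literals all reappear in a renamed class. That class scores 3/3 = 1.0 and is accepted. A class sharing only one of those literals scores about 0.33 and is rejected, which leaves the old class to the diff method.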
Diff
The diff method is a visual algorithm that measures line-by-line syntactical similarity, by counting differing lines, between the old class and every new class that meets a customizable line tolerance requirement. Since running the diff utility on thousands of classes (many of which are obviously vastly different) is a fairly time-consuming operation, the line tolerance requirement ensures that we only compare against new classes whose total line count is within a certain plus-or-minus range of the old class's. For example, if our old class is 500 lines long and the line tolerance is 50, we will only diff new classes that are between 450 and 550 lines long, inclusive. Once diff has been run against all the applicable new classes, the new class with the fewest differing lines is the best match and is reported as the new version of the old class. At its core, each comparison is really just a side-by-side diff with common lines suppressed, piped into a line count.
Code:
diff -y --suppress-common-lines old.smali new.smali | wc -l
Usage
ProFinder can be invoked directly on the CLI as long as the script is executable (e.g. ./profinder.php --args). Alternatively, you can use your preferred PHP binary (e.g. php profinder.php --args).
Code:
REQUIRED ARGUMENTS:
-c, --classes A comma-separated list of old classes to find in the new directory
-o, --old Path of the old version's smali directory
-n, --new Path of the new version's smali directory
OPTIONS:
-l, --sig-len The minimum number of characters a particular signature string can be (default: 3)
-m, --sig-match The minimum pct of matches for a successful signature class match (default: 0.70)
-r, --sig-occr The minimum number of signature strings an old class must have (default: 3)
-t, --tolerance Only apply the diff method on new classes +/- this number of lines (default: 150)
FLAGS:
-d, --always-try-diff Use the diff method even if signature matching was successful (default: off)
-v, --verbose Enable additional descriptive log output (default: off)
ProFinder by Example
Here's a great example of ProFinder in action. We're going to use the script to update XHangouts from Google Hangouts 3.1 compatibility to 3.3. This is something I've already done manually, so it's a useful test for judging algorithmic accuracy against a known good set of values. If you'd like to follow along, grab the Hangouts 3.1 and Hangouts 3.3 APKs from your favorite APK provider, then decompile them with Apktool. You can pass the --no-res parameter since we only need the "smali" directory. After that we're ready to spin up ProFinder!
Code:
Klingon:APKs kmark$ ./profinder.php -o Hangouts3.1/decompiled/smali -n Hangouts3.3/decompiled/smali -c cnx,vo,wi,cmx,wq,cny,cnb,vz,cne,bev,bbn,eaf
ProFinder v1.0
-------------
cnx -> coa
vo -> vo
wi -> wi
cmx -> cna
wq -> wq
cny -> cob
cnb -> cne
vz -> vz
cne -> cnh
bev -> bex
bbn -> bbp
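eaf -> efe
That's 100% accuracy against the known good values. Your mileage may vary, but this is a good example of what a naive algorithm can accomplish in a sizable real-world application.
Let's try that again, this time with verbose output enabled.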
Snazzy. We can see where ProFinder falls back to the diff method because signature strings were of no help. Let's give it one more go with the --always-try-diff flag, which shows what the diff method comes up with even when signature strings appear to have done the job.
Cool. Where signature strings were available, the diff method appeared to do just as well, albeit drastically more slowly.
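The 3.1 to 3.3 run is judged against a known good set of values found by hand. If you repeat this kind of check, a small script can turn ProFinder's printed "old -> new" lines into a mapping and score it. This is a hypothetical helper, not part of ProFinder. parse_results and accuracy are illustrative names. The regular expression assumes the non-verbose result format shown in the example output. The known_good dictionary is something you would build yourself.

```python
import re

# Matches ProFinder result lines such as "cnx -> coa" (non-verbose output).
# Banner lines like "ProFinder v1.0" and "-------------" do not match.
RESULT_LINE = re.compile(r'^\s*(\S+)\s+->\s+(\S+)\s*$')


def parse_results(output):
    """Map each old class name to the new name ProFinder reported for it."""
    mapping = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def accuracy(found, known_good):
    """Fraction of known-good pairs that ProFinder reproduced exactly."""
    if not known_good:
        return 0.0
    hits = sum(1 for old, new in known_good.items() if found.get(old) == new)
    return hits / len(known_good)
```

Paste the twelve result lines from the run into parse_results() and compare them with your own hand-built mapping. This gives the accuracy as a single number, which makes it easy to re-check after tweaking --sig-len, --sig-match or --tolerance.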
System Requirements
If your system can run a PHP >= 5.4 CLI script, you're good; see below for Windows-specific information. Why PHP? I wanted this sooner rather than later to help with XHangouts support for Hangouts 4.0. I may reimplement it in Java (or something else) eventually; the general concept would not be difficult to port.
Running on Windows
At the moment the diff method will not work on Windows, since it requires the diff and wc utilities (plus pipes). I suppose it could be done under Cygwin, but I have yet to try that. If you're OK settling for signature strings only, open up the script and place a continue; on a new line above verbose('Finding syntactically similar classes...');
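The diff method's reliance on diff, wc and pipes is what keeps it off Windows. One portable alternative, which ProFinder does not currently use, is to approximate the same count in pure Python with the standard library's difflib. The sketch below is hypothetical, and diff_line_count, within_tolerance and best_diff_match are illustrative names. Because difflib's matching heuristic is not the same as GNU diff's, its counts may differ slightly from diff -y --suppress-common-lines old.smali new.smali | wc -l. The sketch also includes the line tolerance window (default 150, as with --tolerance) and the fewest-differences selection described for the diff method.

```python
import difflib


def diff_line_count(old_lines, new_lines):
    """Approximate `diff -y --suppress-common-lines old new | wc -l`.

    Side-by-side output prints one row per changed line pair, so a block that
    replaces n lines with m lines contributes max(n, m) rows, and pure inserts
    or deletes contribute one row per line. autojunk=False stops difflib from
    ignoring frequently repeated lines, which are common in smali.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


def within_tolerance(old_len, new_len, tolerance=150):
    """Inclusive window: old_len=500, tolerance=50 accepts 450 through 550."""
    return abs(new_len - old_len) <= tolerance


def best_diff_match(old_text, new_classes, tolerance=150):
    """(name, differing_lines) for the closest new class in the window, or None."""
    old_lines = old_text.splitlines()
    best = None
    for name, text in new_classes.items():
        new_lines = text.splitlines()
        if not within_tolerance(len(old_lines), len(new_lines), tolerance):
            continue  # skip obviously different classes instead of diffing them
        count = diff_line_count(old_lines, new_lines)
        if best is None or count < best[1]:
            best = (name, count)
    return best
```

For example, with an old class of 500 lines and a tolerance of 50, within_tolerance() accepts candidates of 450 through 550 lines and rejects 551, matching the example given for the diff method.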
Download
Available via GitHub Gist
Licensing
Code:
Copyright 2015 Kevin Mark
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
XDA:DevDB Information
ProFinder, Tool/Utility for all devices (see above for details)
Contributors
Kevin M
Source Code: https://gist.github.com/kmark/a01c1463242e435f6cb5
Version Information
Status: Stable
Current Stable Version: v1.0
Stable Release Date: 2015-08-21
Created 2015-08-21
Last Updated 2015-08-21