Editor's Note: Keeping up with ever-changing low-level operating system internals is no small task, especially in the case of the iOS dyld cache, a key component that underpins how iOS apps interact with the OS layer. This post, by Francesco Tamagni, focuses on leveraging foundational pieces of radare2 to read and interpret the formats essential to performing mobile security investigations.
The often-overlooked backbone of our industry is the people who build and maintain the tools, along with the security researchers and pen testers who uncover critical vulnerabilities and privacy issues. Their work keeps us all safe, and Francesco's ongoing effort to educate newcomers and experienced practitioners alike enables the broader community to do its job effectively. We're proud that the NowSecure team is among those leading the charge: tackling these challenges, creating educational material, and working to keep the tools we rely on functional and robust, so that all pen testers and researchers can work effectively even as the ground shifts beneath the entire community's feet.
For the last eight years, my day-to-day job as a research engineer at NowSecure has been to help create tools that automate dynamic mobile application security testing of iOS apps and its orchestration on real devices. A big part of that happens through dynamic binary instrumentation, thanks to Frida. This requires a deep understanding of how the system works, how apps interact with it, and how data flows. That understanding can only be achieved by reverse engineering mobile apps and system components.
When I first started reversing iOS apps, I soon discovered that the system library files don't live on the file system, and certainly not at the path pointed to by the dynamic linking information. That baffled me at first, as it probably baffles most people who embark on reverse engineering Apple platforms.
It turns out that all these libraries are instead prelinked together into one huge executable file called the dyld shared library cache. The dynamic loader and linker (dyld) then maps this file into the address space of every executable running on the system.
To look inside these libraries, you need to get familiar with the dyld shared library cache (DSC) and the tools available to navigate these huge binary blobs. My tool of choice for this task is radare2, which I love, and to which I've been contributing a lot of code to support the constantly evolving structure of Apple's DSC.
This first installment of a three-part series of blog posts covers the basics: how to obtain the DSC, and how to use radare2 to open and navigate dyld shared caches, their metadata, and the code they contain.
That lays the foundation for the following posts, which will cover finding cross-references, a fundamental aspect of reverse engineering, and guide you through it with examples.
DSC from Above
The DSC lives on the device's file system. Its path, and the number of files it is split into, depend on the OS, its version and the hardware type:
macOS | One of: /System/Library/dyld/dyld_shared_cache_ or /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_ |
iOS | One of: /System/Library/Caches/com.apple.dyld/dyld_shared_cache_ or /private/preboot/Cryptexes/OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_ |
Simulators | Under ~/Library/Developer/CoreSimulator/Caches/dyld, for each installed simulator runtime. |
Starting around iOS 15.x, Apple began splitting the DSC into multiple files, which follow the same naming convention as above plus a progressive-number suffix. The number of splits varies wildly across systems and versions, from one to a few tens.
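Because the number of split files varies, it can help to separate the main file from its subcaches programmatically. The Python sketch below is illustrative only. `group_cache_files` is a hypothetical helper, and it assumes each subcache is named like the main file plus a dot and a decimal index (such as ".1" or ".01"). Any other suffix is ignored.

```python
import re

# Assumed naming scheme: subcaches reuse the main file name plus "." and a
# decimal index (".1", ".01", ...). Other suffixes are ignored by this sketch.
MAIN_RE = re.compile(r"^dyld_shared_cache_[^.]+$")
SUBCACHE_RE = re.compile(r"^(?P<main>dyld_shared_cache_[^.]+)\.(?P<index>\d+)$")


def group_cache_files(names):
    """Map each main cache file name to its subcache names, in index order."""
    groups = {name: [] for name in names if MAIN_RE.match(name)}
    for name in names:
        match = SUBCACHE_RE.match(name)
        if match and match.group("main") in groups:
            groups[match.group("main")].append((int(match.group("index")), name))
    # Sort on the integer index so that ".10" comes after ".9".
    return {main: [name for _, name in sorted(subs)] for main, subs in groups.items()}
```

A plain string sort would place ".10" before ".9", which is why the index is converted to an integer first.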
The DSC contains most of the system libraries and the bulk of the executable code shipped with the OS, so caches are large by nature, totalling between 1 and 4 GB per OS version.
How to Get the DSC Files
There are a few different ways to obtain the DSC file(s):
- The easiest and most reliable way is to use the excellent ipsw open source tool by Blacktop, which (among other great things) automates downloading the right IPSW file and extracting the dyld cache from its file system:
- Example download: ipsw download appledb --os iOS --version '17.2.1' --device iPhone15,4
- Example extract: ipsw extract iPhone15,4_17.2.1_21C66_Restore.ipsw -d
- Alternatively, you can retrieve them manually from the IPSW for the target OS (downloaded, for example, from ipsw.me):
- extract the second-largest dmg from the IPSW zip (for pre-cryptex-era versions it was the largest one)
- note that file system images in iOS 18 IPSW files are also encrypted
- mount it on macOS and fetch the files according to the paths above
- You can even extract them from a running device, but this takes extra care and effort, and probably a separate blog post, so I won't go into details now.
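An IPSW is a zip archive, so the manual route's "pick the right disk image" step can be scripted with Python's standard zipfile module. This is a minimal sketch of the size heuristic described above, not a robust extractor. `pick_os_dmg` and `pick_os_dmg_name` are hypothetical names, and the heuristic may not hold for every release.

```python
import zipfile


def pick_os_dmg_name(entries, cryptex_era=True):
    """Pick the disk image expected to hold the OS file system.

    entries: (name, uncompressed_size) pairs from the IPSW archive.
    Heuristic: second-largest .dmg for cryptex-era versions, largest before.
    """
    dmgs = sorted(
        (entry for entry in entries if entry[0].lower().endswith(".dmg")),
        key=lambda entry: entry[1],
        reverse=True,
    )
    rank = 1 if cryptex_era else 0
    if len(dmgs) <= rank:
        raise ValueError("not enough .dmg entries in the IPSW")
    return dmgs[rank][0]


def pick_os_dmg(ipsw_path, cryptex_era=True):
    """Read the IPSW (a zip archive) and apply the size heuristic."""
    with zipfile.ZipFile(ipsw_path) as ipsw:
        entries = [(info.filename, info.file_size) for info in ipsw.infolist()]
    return pick_os_dmg_name(entries, cryptex_era)
```

The chosen image still has to be extracted and mounted as described above. This sketch does not handle the encrypted images found in iOS 18 IPSW files.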
Inside the DSC
Before opening the DSC with radare2, let's keep a few guiding principles in mind:
- We want to open the whole cache, not extract single libraries into separate files.
- There are tools out there (including Xcode) that produce single dylib files from the cache, but the result is incomplete, because the libraries get transformed while being linked together into a single executable. In particular, each library can call the others simply via direct (or indirect) branches, instead of relying on the usual mechanism of dynamically importing public symbols. This makes bundling libraries into the DSC quite a one-way, "lossy" process.
- The cache contains a lot of code, so we don't want to analyze all of it blindly and wait for hours; instead, we'll focus on just what's needed.
- The files extracted by Xcode, located at "~/Library/Developer/Xcode/iOS DeviceSupport/", can still be useful for some preliminary grepping to narrow down the libraries we're interested in.
- As always with radare2, the more a priori knowledge we bring to the problem we're trying to solve, the higher the reward we get from the tool in terms of performance.
Warning: keep calm and use radare2 from GitHub!
This is an old mantra (there are real t-shirts out there with this slogan), but it's still true!
Since radare2 is constantly evolving and bleeding edge by nature, we always recommend using the latest code straight from the Git repository. The full installation instructions are available there, but if you're on a Unix-based machine (including macOS), it's simply a matter of opening a terminal and typing:
Opening the Whole Dyld Cache
To open the DSC, we need to specify its path using the dsc:// URL scheme, which tells r2 to use the DSC-specific I/O plugin. This plugin takes care of rebasing pointers under the hood and hides the fact that the cache may consist of multiple files. If it does, just point r2 to the first one (the one without the numerical suffix).
If we open the cache directly:
a warning message is printed immediately:
where r2 suggests setting the R_DYLDCACHE_FILTER environment variable to restrict the set of libraries for which metadata (symbols, strings, and so on) will be loaded:
- It's a colon-separated list of library names
- The names are matched against the actual list of libraries present in the cache (case-sensitive, partial match on full paths)
- The matching libraries are loaded, along with their direct dependencies
- The whole cache is still mapped and visible, though
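To make those matching rules concrete, here is a small Python model of them. It is not r2's implementation, only an illustration of the rules listed above. The dependency map it takes is hypothetical input; in a real cache, that information comes from each library's load commands.

```python
def select_libraries(filter_value, cache_paths, dependencies):
    """Model which libraries get their metadata loaded for a given filter.

    filter_value: an R_DYLDCACHE_FILTER-style string, e.g. "Foundation:libdispatch".
    cache_paths: full install paths of all libraries in the cache.
    dependencies: hypothetical map from a path to the paths it links directly.
    """
    needles = [needle for needle in filter_value.split(":") if needle]
    # Case-sensitive partial match against full paths.
    matched = [path for path in cache_paths if any(n in path for n in needles)]
    selected = set(matched)
    for path in matched:
        # Direct dependencies only; their own dependencies are not followed.
        selected.update(dependencies.get(path, ()))
    return sorted(selected)
```

Because the match is partial on full paths, an entry like Foundation also selects CoreFoundation. Dependencies are followed only one level deep.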
This cuts down the initial loading time and memory overhead, but it requires knowing ahead of time (at least vaguely) which libraries we're interested in. For example, if we want the Foundation framework, the libSystem sub-libraries and libdispatch, we can run it like this:
This filter applies to all the examples below, unless specified otherwise.
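A filtered session can also be launched from a script by setting the variable in the child process's environment. This is a minimal sketch, assuming r2 is on the PATH. `r2_dsc_command` and `open_dsc` are hypothetical helpers, and any cache path and library names you pass are placeholders, not the exact filter used in this post.

```python
import os
import subprocess


def r2_dsc_command(cache_path, libraries, r2_binary="r2"):
    """Build argv and environment to open a (possibly split) DSC in r2.

    cache_path must be the main cache file (the one without a numeric
    suffix); libraries are joined into the colon-separated filter.
    """
    env = dict(os.environ)
    env["R_DYLDCACHE_FILTER"] = ":".join(libraries)
    argv = [r2_binary, "dsc://" + cache_path]
    return argv, env


def open_dsc(cache_path, libraries):
    """Start an interactive r2 session on the cache in the current terminal."""
    argv, env = r2_dsc_command(cache_path, libraries)
    return subprocess.run(argv, env=env).returncode
```

Building argv and env separately keeps the command easy to inspect before starting r2, for example with open_dsc("/path/to/main/cache/file", ["Foundation", "libdispatch"]).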
Navigating the Libraries
Once the cache file is open, every resident library contributes its own Mach-O sections, and r2 treats the result as a single executable in which each section name is prefixed with the path of the library it comes from.
For example, let's list all sections from all the currently loaded libraries using the iS command. Here the output is truncated after the first 100 lines (for brevity):
Strings, symbols and classes are all loaded for the libraries matching the filter (and their dependencies), and they can be accessed with the same r2 commands used on regular executables.
The most convenient way to get to named entities in r2 is to list flags (f command) and then grep that list (~) for partial names. For example:
To visualize long lists and grep them interactively, r2 provides the very useful ~... command. There is also the visual browse mode, reachable via the Vb command, which lets users visually navigate items like flags and classes. These commands are quite hard to show in a blog post, though, so I encourage readers to try them firsthand.
We can then seek to any virtual address (s command):
Then, for example, let's disassemble a few instructions using pd:
In essence, any r2 command can be used just as it would be on a single executable or library.
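When the same lookups have to be repeated, they can be driven through r2pipe, which is introduced later in this post. The sketch below only assumes an object with r2pipe's cmd() method, so it also works with a stand-in during testing. `find_flags` and `disassemble` are hypothetical helpers, and the parsing assumes each line printed by f has the form "address size name".

```python
def find_flags(r2, needle):
    """Return (address, name) pairs for flags whose name contains needle.

    Uses r2's own grep (f~needle) to keep the transferred output small, then
    re-checks the name column, because ~ also matches addresses and sizes.
    Assumption: each line of f output is "<address> <size> <name>".
    """
    results = []
    for line in r2.cmd(f"f~{needle}").splitlines():
        parts = line.split()
        if len(parts) >= 3 and needle in parts[2]:
            results.append((int(parts[0], 16), parts[2]))
    return results


def disassemble(r2, address, count=10):
    """Disassemble count instructions at address; @ seeks only temporarily."""
    return r2.cmd(f"pd {count} @ {address:#x}")
```

Using the temporary seek (@) instead of s leaves the session's current address untouched, which matters when the script runs inside an interactive session.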
Be careful to avoid commands that require analyzing all the code or mapped memory. Later in this series we'll share examples of finding cross-references, where, for performance reasons, we'll need to restrict analysis to only the portions of code we're interested in.
Getting Information About the Cache Itself
Each of the files that make up the DSC defines a set of maps, which dyld uses to load specific portions of code and data at the corresponding virtual memory addresses.
The DSC memory maps can be shown using the iSS (info, list segments) command:
Here, the paddr column shows the offset in the DSC file, and the vaddr column shows the corresponding address in memory (with no ASLR slide applied). The Mach-O segments of the individual libraries are all clumped together in the cache maps with the corresponding memory access permissions.
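Translating between the two columns is simple arithmetic within a map. The Python sketch below models the maps of a single cache file as (paddr, vaddr, size) triples, which a real session would read from iSS. `CacheMap`, `offset_to_vaddr` and `vaddr_to_offset` are hypothetical names. With split caches, each file has its own maps, so a file offset is only meaningful together with the file it belongs to.

```python
from typing import Iterable, NamedTuple, Optional


class CacheMap(NamedTuple):
    paddr: int  # offset of the map in the cache file
    vaddr: int  # unslid virtual address the map is loaded at
    size: int   # map size in bytes


def offset_to_vaddr(maps: Iterable[CacheMap], offset: int) -> Optional[int]:
    """Return the unslid virtual address for a file offset, if it is mapped."""
    for m in maps:
        if m.paddr <= offset < m.paddr + m.size:
            return m.vaddr + (offset - m.paddr)
    return None


def vaddr_to_offset(maps: Iterable[CacheMap], vaddr: int) -> Optional[int]:
    """Return the file offset backing an unslid virtual address, if any."""
    for m in maps:
        if m.vaddr <= vaddr < m.vaddr + m.size:
            return m.paddr + (vaddr - m.vaddr)
    return None
```

At runtime, the whole cache is shifted by a single ASLR slide, so a live address corresponds to the vaddr computed here plus that slide.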
Note that the default address r2 takes us to when the DSC is opened (0x180000000) is the virtual address of the first map defined in the first file, which is also where the "main" DSC header is located.
To get information about the DSC header itself and the executable images contained in the cache, we can use the iH command (info, header). The output is truncated here for brevity:
The output is JSON by default, which makes it useful for automation tasks (like r2pipe scripting, as we'll see shortly).
Exported Symbols vs. Debug Symbols
The full list of symbols defined by the libraries matching the loading filter is shown by the is command (and its variants). It includes both internal and external (exported) symbols.
To get only the exported symbols, use the iE command (and its variants) instead.
For example, let's look up the swift_retainCount symbol using the isq command (faster and less verbose than plain is):
To check whether it's exported, we can grep for its name (or its address) again in the output of iEq (info, exported, quiet):
Since it shows up again, it is an exported symbol.
If we do the same with an internal symbol, we get this instead:
Here the second command yields no results, meaning that _swift_xpc_retain is not exported, and therefore it (usually) can't be called from outside the library that defines it.
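The same comparison can be scripted to classify many symbols at once. This sketch assumes an r2pipe-like object, and it assumes the symbol name is the last whitespace-separated field of each isq and iEq line. `classify_symbols` is a hypothetical helper, and its results are only as good as that parsing assumption.

```python
def _names(output):
    """Collect symbol names, assuming the name is the last field of each line."""
    return {line.split()[-1] for line in output.splitlines() if line.strip()}


def classify_symbols(r2, needle):
    """Split symbols matching needle into (exported, internal) name lists."""
    defined = _names(r2.cmd(f"isq~{needle}"))
    exported = _names(r2.cmd(f"iEq~{needle}"))
    return sorted(defined & exported), sorted(defined - exported)
```

Grepping on the r2 side first keeps both outputs small, and the set difference is the scripted equivalent of noticing that the second command returns nothing.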
Keep in mind that all symbols are also available as flags, prefixed with the sym. "flag space" identifier. For example:
Exploring Objective-C Classes
Even though most native app development on Apple platforms has moved to Swift, many system libraries are still implemented in Objective-C, which makes the DSC's Objective-C classes an important reversing target.
All the Objective-C classes in the libraries that match the filter are available in r2 through the ic command and its subcommands, just as when opening single executables.
However, over time Apple has added various DSC-specific optimizations to how Objective-C metadata is encoded and retrieved, and reverse engineering tools need constant updates to support them.
For example, on recent caches that encode Objective-C class metadata in the "list of lists" format, enumerating the methods of a class also returns all methods from categories on that class, regardless of which library actually defines them. On one hand, this can produce quite long lists of methods; on the other, it makes it easier to find arcane methods defined by categories.
Fortunately, as we saw in the examples above, trimming long lists in r2 is just a matter of applying the ~ grep to the output of other commands.
A good way to visualize this abundance of category methods is to list all methods of the NSObject class that contain the word "perform":
Another recent optimization Apple added to Objective-C is the ability to exclude some methods from the Objective-C runtime, effectively turning them into regular C functions. When that happens in the DSC, we can still see the debug symbols (if present) for these methods, but they won't appear in the ic output.
You can see this by comparing symbols or flags with the ic output. For example, the _NSPredicateUtilities class has a _predicateSecurityAction class method that is called internally from within the Foundation framework, but it's not part of the Objective-C runtime, so the ic command doesn't list it:
However, it's still present as a debug symbol, visible to the isq command (and also listed as a flag):
This makes it possible to look at its disassembly and inspect its logic.
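Finding such methods in bulk comes down to a set difference between a class's debug symbols and the methods the runtime metadata lists for it. This is a minimal Python sketch. It assumes symbol names in the usual "+[Class selector]" / "-[Class selector]" form, and a set of "+selector"/"-selector" strings already collected from the ic output. `methods_missing_from_runtime` is a hypothetical helper, and it deliberately does not match category methods (named like "Class(Category)").

```python
import re

# "+[Class selector]" for class methods, "-[Class selector]" for instance ones.
OBJC_METHOD_RE = re.compile(r"^([+-])\[(\S+) ([^\]]+)\]$")


def methods_missing_from_runtime(symbol_names, class_name, runtime_methods):
    """Return "+sel"/"-sel" entries that have a symbol but no runtime metadata.

    runtime_methods: "+sel"/"-sel" strings collected from the ic output.
    Category methods ("Class(Category)") do not match class_name here.
    """
    missing = []
    for name in symbol_names:
        match = OBJC_METHOD_RE.match(name)
        if not match or match.group(2) != class_name:
            continue
        method = match.group(1) + match.group(3)
        if method not in runtime_methods:
            missing.append(method)
    return missing
```

The comparison keeps the + or - prefix so that a class method and an instance method with the same selector are not confused.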
Another useful feature when exploring Objective-C classes is listing their ivars. An easy way to do it in r2 is to query flags again and grep for the class name, "field.class" and "var":
Gaining visibility into ivars matters because they are offsets into the internal state of the class, which may or may not be exposed through getters and setters. Even when accessors are present, though, the class code usually refers to its own fields directly, so seeing their names greatly helps in understanding what the code does.
Automating Tasks with r2pipe
Some repetitive tasks can easily be automated using r2's scripting features. The one I usually go for is r2pipe, which lets you execute r2 commands from scripts written in one of the (many) supported languages.
For this blog post series I'm going to use Python, but r2pipe works the same way in the other supported languages.
Now I'm going to introduce an r2pipe script that will come in handy several times throughout the series, and that is simple enough to walk through chunk by chunk.
The "dyld_what" Script
This script prints the path of the library (if any) that a given DSC address belongs to. It comes in handy when we need to refine the loading filter, as we'll see in the example below.
The address to look up (or the corresponding flag name) can be passed as an argument. If no argument is passed, the script looks up the current address.
This works regardless of the initial filter, because the script uses the metadata from the DSC header to binary-search all the executable images for the given address.
It's designed to run from within an r2 session, and can be invoked from the r2 prompt using the #!pipe command:
The full code of the script is available in this GitHub gist: https://gist.github.com/mrmacete/e061f0f0d38a96c75f8177747c26ea01.
It starts with the import statements where, among other things needed by this particular script, we import the r2pipe Python package (make sure to install it first using pip, as described in the docs). After that, we open the pipe to the current r2 session by calling r2pipe.open() without arguments:
Now we can use the r2 variable to execute commands and get their output either as a string, with r2.cmd(), or as a parsed JSON object, with r2.cmdj(). That's pretty much all there is to know about r2pipe!
So let's see it in action. The script proceeds to get the array of images from the header using the iH command:
From that array we build the heart of this script: a binary search over the images, which assumes they can't overlap. To do that, we sort the array we just got and extract a helper array containing just the addresses, so that we can use Python's bisect module:
The get_path_if_contains function double-checks whether the address actually falls inside the candidate image. This is needed because images are interleaved with stub islands that don't belong to any particular image but are shared among several of them (we'll see this in a later post about finding references across libraries).
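The lookup described above can be sketched as a self-contained program using the bisect module. This is not the code from the gist. `Image` and `ImageIndex` are hypothetical names, and the conversion from the iH JSON is left out because its field names are not covered here. Each image is also simplified to a single contiguous [start, start + size) range.

```python
import bisect
from typing import List, NamedTuple, Optional


class Image(NamedTuple):
    start: int  # first address of the image
    size: int   # simplified: one contiguous range per image
    path: str


class ImageIndex:
    """Binary-search index over non-overlapping cache images."""

    def __init__(self, images: List[Image]):
        self.images = sorted(images, key=lambda img: img.start)
        self.starts = [img.start for img in self.images]  # helper array for bisect

    def lookup(self, address: int) -> Optional[str]:
        # Rightmost image whose start is <= address.
        i = bisect.bisect_right(self.starts, address) - 1
        if i < 0:
            return None
        return self._path_if_contains(self.images[i], address)

    @staticmethod
    def _path_if_contains(image: Image, address: int) -> Optional[str]:
        # Double check: the address may fall in a gap such as a stub island.
        if image.start <= address < image.start + image.size:
            return image.path
        return None
```

Sorting once and bisecting keeps each lookup logarithmic in the number of images. The containment check turns "closest image below" into "image that actually contains the address".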
The rest of the script deals with the input argument, which can be a numeric address, a flag name or nothing at all.
If no argument is passed, we resolve the value of the $$ variable, which holds the current address:
If instead we got an argument, it could be a flag name, so we resolve its address by converting whatever the input is to a numeric value:
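Both cases can go through r2's expression evaluator, which accepts numeric literals, flag names and $$ alike. This is a minimal sketch that assumes an r2pipe-like object. `resolve_address` is a hypothetical name, and it relies on ?vi printing the value of an expression in decimal.

```python
def resolve_address(r2, arg=None):
    """Turn an optional argument (number, flag name, or nothing) into an int.

    With no argument, evaluate $$ (the current address); otherwise let r2's
    expression evaluator handle hex/decimal literals and flag names alike.
    """
    expr = arg if arg else "$$"
    return int(r2.cmd(f"?vi {expr}").strip())
```

Delegating the conversion to r2 avoids re-implementing number and flag parsing in Python.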
Finally, the main entry point puts it all together:
To use this script, we can define an alias called $what, either from the r2 prompt or in ~/.radare2rc (the quotes are important):
This gives us a $what command that takes the argument directly, instead of having to type the long pipe command.
Example: Using $what to Refine the Filter
As we already know, on modern DSCs, Objective-C classes contain methods combined from all categories across all frameworks, even frameworks that aren't in the filter. To narrow down the earlier example, on NSObject we have:
But if we look at the categories on NSObject loaded with the current filter, none of them is obviously responsible for it:
Let's pass the method's address to the $what alias, so our r2pipe script can tell us which library its implementation belongs to:
Now, if we reopen the DSC with /IMSharedUtilities.framework added to the filter:
And list the categories on NSObject again, this time we get:
The last one looks promising:
Here we can see the full set of features this category adds. And since the responsible framework is now part of the filter, we can also see all the symbols and classes related to it, in case we need to dig deeper into how these features are implemented.
Conclusion
If you've reached this point, you've made it through the first blog post about reversing iOS system libraries with radare2. This was only the beginning, but you're now equipped to start experimenting on your own. The next posts will go in depth into finding cross-references, first within single libraries and then across different libraries. Until then, hang tight and feel free to reach out with any questions. Issues and pull requests are also welcome on radare2's GitHub.