Reversing iOS System Libraries Using Radare2: A Deep Dive into Dyld Cache (Part 1)



Editor’s Note: Keeping up with ever-changing low-level operating system internals is no small task, especially when it comes to the iOS dyld cache, a key element that underpins how iOS apps interact with the OS layer. This post, by Francesco Tamagni, focuses on leveraging foundational pieces of radare2 to read and interpret the formats necessary for performing mobile security investigations.

The often-overlooked backbone of our industry is the people working to build and maintain the tools, along with the security researchers and pen testers who uncover critical vulnerabilities and privacy issues. Remember, their work keeps us all safe, and Francesco’s effort to keep educating newcomers and experienced practitioners alike allows the broader community to do their jobs effectively. We’re proud that the NowSecure team is among those leading the charge, tackling these challenges, creating educational material, and working to ensure that the tools we rely on remain functional and robust to maximize the effectiveness of all pen testers and researchers, even as the ground shifts beneath the entire community’s feet.

For the last eight years, my day-to-day job as a research engineer for NowSecure has been to help create tools to automate dynamic mobile application security testing of iOS apps, and its orchestration on real devices. A big part of that happens through dynamic binary instrumentation, thanks to Frida. This requires a deep understanding of how the system works, how apps interact with it, and how data flows. That understanding can only be achieved by reverse engineering mobile apps and system components.

When I first began reversing iOS apps, I soon discovered that the system library files don’t reside on the file system, certainly not in the path pointed to by the dynamic linking information. That baffled me at first, as it probably does most people who embark on reverse engineering Apple platforms.

It turns out that all these libraries are instead prelinked together in a single big executable file called the dyld shared library cache. This file is then mapped into the address space of all executables running on the system by the dynamic loader and linker (dyld).

In order to look inside these libraries, you need to get familiar with the dyld shared library cache (DSC) and the tools available to navigate these huge binary blobs. My tool of choice for this task is radare2, which I love, and to which I’ve been contributing a lot of code to support the constantly evolving structure of Apple’s DSC.

This first installment of a three-part series of blog posts covers the basics: how to obtain the DSC and use radare2 to open and navigate dyld shared caches, their metadata, and the code they contain.

That lays the foundation for the following posts, which will cover finding cross-references, a fundamental aspect of reverse engineering, and guide you with examples.

DSC from Above

The DSC resides on the device’s file system. Its path and the number of files into which it is split depend on the OS, its version and the hardware type:

  • macOS, one of:
    • /System/Library/dyld/dyld_shared_cache_
    • /System/Volumes/Preboot/Cryptexes/OS/System/Library/dyld/dyld_shared_cache_
  • iOS, one of:
    • /System/Library/Caches/com.apple.dyld/dyld_shared_cache_
    • /private/preboot/Cryptexes/OS/System/Library/Caches/com.apple.dyld/dyld_shared_cache_
  • Simulators: they’re under ~/Library/Developer/CoreSimulator/Caches/dyld for each installed simulator runtime.

Starting around the time of iOS 15.x, Apple began to split the DSC into multiple files, with the same naming convention as above, plus a progressive-number suffix. The number of splits varies wildly across systems and versions, between one and a few tens.

The DSC contains most of the system libraries and the bulk of the whole executable code shipped with the OS, so it is large by nature, totalling between 1 and 4 GB per OS version.
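To make the split layout concrete, here is a minimal Python sketch that collects the main cache file and its numbered companions from a directory and sums their sizes. The helper name find_cache_files is hypothetical, and the suffix format (a dot followed by digits) is an assumption based on the "progressive-number suffix" described above; other companion files that may sit next to the cache are deliberately ignored.

```python
import re
from pathlib import Path

# Assumption: split files are named "<main name>.<digits>".
_NUMERIC_SUFFIX = re.compile(r"\.(\d+)$")


def find_cache_files(directory, main_name):
    """Return (main_file, numbered_files, total_bytes) for one cache."""
    directory = Path(directory)
    main_file = directory / main_name
    if not main_file.is_file():
        raise FileNotFoundError(main_file)

    numbered = []
    for candidate in directory.iterdir():
        if not candidate.name.startswith(main_name):
            continue
        # Only keep files whose remainder is exactly ".<digits>".
        match = _NUMERIC_SUFFIX.fullmatch(candidate.name[len(main_name):])
        if match:
            numbered.append((int(match.group(1)), candidate))

    # Order the splits by their numeric suffix, not alphabetically.
    numbered.sort(key=lambda pair: pair[0])
    ordered = [path for _, path in numbered]
    total = sum(p.stat().st_size for p in [main_file, *ordered])
    return main_file, ordered, total
```

The first element of the returned tuple is the file to hand to radare2; the rest is only informative, for instance to check how much disk space a given OS version will take.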

How to Get the DSC Files

There are a few different ways to obtain the DSC file(s):

  • The easiest and most reliable way is to use the fine ipsw open source tool by Blacktop which (among other excellent things) automates the download of the right IPSW file and the extraction of the dyld cache from its file system:
    • Example download: ipsw download appledb --os iOS --version '17.2.1' --device iPhone15,4
    • Example extract: ipsw extract iPhone15,4_17.2.1_21C66_Restore.ipsw -d
  • Alternatively, it’s possible to retrieve them manually from the IPSW for the target OS (by downloading it e.g. from ipsw.me).
    • extract the second-largest dmg from the ipsw zip (it was the largest one for pre-cryptex era versions)
      • file system images in iOS 18 ipsw files are also encrypted
    • mount it on macOS and fetch the files according to the paths above
  • You can even extract them from a running device, but this requires extra care and effort to succeed and probably a separate blog post, so I won’t go into details about it now.
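When several OS versions or devices have to be fetched, the two ipsw invocations from the first bullet can be generated from a script. The following is only a sketch: the helper names are hypothetical, the flags are exactly the ones shown in the examples above, and nothing is executed here; the functions just build argument lists that could be passed to subprocess.run.

```python
import shlex


def ipsw_download_args(os_name, version, device):
    """Argument list mirroring the 'Example download' command above."""
    return [
        "ipsw", "download", "appledb",
        "--os", os_name,
        "--version", version,
        "--device", device,
    ]


def ipsw_extract_dyld_args(ipsw_path):
    """Argument list mirroring the 'Example extract' command above."""
    return ["ipsw", "extract", ipsw_path, "-d"]


def as_shell(args):
    """Render an argument list as a copy-pasteable shell command."""
    return shlex.join(args)
```

For example, as_shell(ipsw_download_args("iOS", "17.2.1", "iPhone15,4")) renders the same download command shown in the list, and the returned lists can be handed to subprocess.run(..., check=True) once ipsw is installed.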

Inside the DSC

Before opening the DSC using radare2, let’s consider a few guiding principles:

  • We want to open the whole cache, not extract single libraries into separate files.
    • There are tools out there (including Xcode) which produce single dylib files from the cache, but the result is incomplete because the libraries get transformed while being linked together in a single executable. In particular, each library can call others just by direct (or indirect) branches, instead of relying on the usual mechanism of dynamic importing of public symbols. This makes the bundling of libraries in the DSC quite a one-way and “lossy” process.
  • The cache contains a lot of code, so we don’t want to analyze all of it blindly and wait hours; instead we’ll want to focus on just what’s necessary.
    • The files extracted by Xcode, located at “~/Library/Developer/Xcode/iOS DeviceSupport/”, can still be useful for some preliminary grepping in order to narrow down the libraries we’re interested in.
  • As always with radare2, the more a priori knowledge we bring to the problem we’re trying to solve, the higher the reward in terms of performance we get from the tool.

Warning: keep calm and use radare2 from GitHub!

This is an old mantra (there are real t-shirts out there with this slogan) but it’s still true!

Since radare2 is constantly evolving and bleeding edge by nature, we always recommend using the latest code straight from the Git repository. The project provides full installation instructions, but if you’re on a Unix-based machine (including macOS), it’s simply a matter of opening a terminal, cloning the repository and running its sys/install.sh script.


Opening the Whole Dyld Cache

To open the DSC, we need to specify the path using the dsc:// URL scheme, which tells r2 to use the DSC-specific I/O plugin. This takes care of rebasing pointers under the hood, and abstracts the presence of multiple files in the cache. If the cache consists of multiple files, just point it to the first one (the one without the numerical suffix).

If we open the cache directly:

a warning message is immediately printed:

where r2 suggests setting the R_DYLDCACHE_FILTER environment variable to restrict the set of libraries for which metadata (like symbols, strings, etc.) will be loaded:

  • It’s a colon-separated list of library names
  • They will be matched against the actual list of libraries present in the cache (case sensitive, partial match on full paths)
  • The matching libraries are loaded, along with their direct dependencies
    • But the whole cache will still be mapped and visible

That’s a way to cut down the initial loading time and memory overhead, and it requires knowing ahead of time which libraries we’re interested in (at least vaguely). For example, if we want the Foundation framework, the libSystem sub-libraries and libdispatch, we can run it like this:

This filter will be the one applied to all examples below, unless specified otherwise.
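As a rough illustration of the filter rules listed above, the sketch below builds the colon-separated value, approximates the matching rule (case-sensitive substring match on the full library path) and prepares an r2 invocation using the dsc:// scheme. The function names are hypothetical, the entries "Foundation", "libsystem" and "libdispatch" are only my guess at suitable partial names, and the matcher does not model the automatic loading of direct dependencies.

```python
import os


def build_filter(library_names):
    """Join partial library names into an R_DYLDCACHE_FILTER value."""
    names = [name for name in library_names if name]
    if any(":" in name for name in names):
        raise ValueError("library names cannot contain ':'")
    return ":".join(names)


def matches_filter(library_path, filter_value):
    """Approximation: case-sensitive partial match on the full path."""
    return any(part in library_path
               for part in filter_value.split(":") if part)


def r2_invocation(cache_path, library_names, base_env=None):
    """Return (argv, env), usable as subprocess.run(argv, env=env)."""
    env = dict(os.environ if base_env is None else base_env)
    env["R_DYLDCACHE_FILTER"] = build_filter(library_names)
    # Point r2 at the first cache file through the DSC I/O plugin.
    return ["r2", "dsc://" + cache_path], env
```

Because the filter is read from the environment, the same approach should work when opening the cache from a script: set os.environ["R_DYLDCACHE_FILTER"] before the session is spawned.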

Navigating the Libraries

Once the cache file is open, all the resident libraries contribute their own Mach-O sections, and r2 treats the result as a single executable, where each section name is prefixed with the path of the library it originates from.

For example, let’s list all sections from all the libraries which are currently loaded using the iS command. Here, the output is truncated after the first 100 lines (for brevity):

Strings, symbols and classes are all loaded for the libraries matching the filter (and their dependencies), and can be accessed through the same r2 commands used on regular executables.

The most convenient way to get to named entities in r2 is to list flags (f command) and then grep that list (~) for partial names. For example:

To visualize long lists and grep them interactively, r2 provides the ~... command, which is very handy. It also has the visual browse mode, reachable through the Vb command, which allows users to visually navigate items like flags and classes. These commands are quite hard to show in a blog post, though, so I encourage readers to try them firsthand.
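The same "list flags, then grep" workflow can be reproduced from Python when the result needs further processing. This is a minimal sketch under an assumption: that each line of the f command output consists of an address, a size and a name separated by whitespace. The helper names are hypothetical, and lines that don’t fit that shape are skipped.

```python
def parse_flag_lines(text):
    """Parse 'address size name' lines into (address, size, name) tuples."""
    flags = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        address, size, name = parts
        try:
            flags.append((int(address, 16), int(size), name.strip()))
        except ValueError:
            # Not a flag line (e.g. a warning or a header): ignore it.
            continue
    return flags


def grep_flags(flags, needle):
    """Keep flags whose name contains needle, like a plain ~ grep."""
    return [flag for flag in flags if needle in flag[2]]
```

With an r2pipe session named r2, something like grep_flags(parse_flag_lines(r2.cmd("f")), "retainCount") would give the matching flags as tuples that are ready to be sorted or passed to other commands.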

We can then seek to any virtual address (s command):

Then, for example, let’s disassemble a few instructions using pd:

In essence, any r2 command can be used just the same as it would be on single executables / libraries.

Be careful to avoid commands that require analyzing the whole code or mapped memory. We’ll later share examples of finding cross-references, where for performance reasons we’ll need to restrict analysis to only the portions of code we’re interested in.

Get Information About the Cache Itself

Each of the files composing the DSC defines a set of maps which dyld then uses to load specific portions of code and data into the corresponding virtual memory addresses.

The DSC memory maps can be shown using the iSS (info, list segments) command:

Here the paddr column shows the offset in the DSC file, and the vaddr column shows the corresponding address in memory (with no ASLR slide applied). The Mach-O segments of the single libraries are all clumped together in the cache maps with the corresponding memory access permissions.

Note that the default address that r2 takes us to when the DSC is opened (0x180000000) is the virtual address of the first map defined in the first file, which is also where the “main” DSC header is located.
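The relationship between the paddr and vaddr columns boils down to a per-map linear translation. The sketch below shows it with a hypothetical CacheMap tuple. It assumes the maps describe a single cache file, and the numbers in the usage note are made up, apart from 0x180000000 which is the base address mentioned above.

```python
from typing import NamedTuple


class CacheMap(NamedTuple):
    paddr: int  # offset of the map inside the cache file
    vaddr: int  # unslid virtual address of the map
    size: int   # length of the map in bytes


def vaddr_to_paddr(maps, vaddr):
    """Translate a virtual address to a file offset, or return None."""
    for cache_map in maps:
        if cache_map.vaddr <= vaddr < cache_map.vaddr + cache_map.size:
            return cache_map.paddr + (vaddr - cache_map.vaddr)
    # The address falls in a hole between maps (or outside the cache).
    return None
```

For instance, with maps = [CacheMap(0x0, 0x180000000, 0x1000), CacheMap(0x1000, 0x180004000, 0x2000)], an address inside the second map is translated relative to that map’s own file offset, while an address in the gap between the two maps yields None.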

In order to get information about the DSC header itself and the executable images contained in the cache, the iH command (info, header) can be used (here truncated for brevity):

The output is in JSON by default, which is handy for automation tasks (like r2pipe scripting, as we’ll see shortly).

Exported Symbols vs. Debug Symbols

The complete list of symbols defined by the libraries which match the loading filter is visible with the is command (and its variants). This includes both internal and external (exported) ones.

To get only the list of exported symbols, the iE command (and its variants) can be used instead.

For example, let’s look up the swift_retainCount symbol using the isq command (faster and less verbose than plain is):

To check whether it’s exported, we can then grep for its name (or its address) again in the output of iEq (info, exported, quiet):

Because it’s returned again, that means it’s an exported symbol.

If we do the same with a symbol which is internal, we get instead:

Here the second command doesn’t yield results, meaning that _swift_xpc_retain is not exported, and therefore it (normally) can’t be called from outside the library which defines it.
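The manual check above (present in isq, absent from iEq) is a set difference, which becomes convenient to script when many symbols need classifying. This is a hedged sketch: the helper names are hypothetical, and it assumes both quiet listings print one symbol per line as address, size and name.

```python
def symbol_names(quiet_listing):
    """Collect names from 'address size name' lines of isq/iEq-style text."""
    names = set()
    for line in quiet_listing.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            names.add(parts[2].strip())
    return names


def classify(all_symbols_text, exported_text):
    """Split the symbols seen by `isq` into exported and internal ones."""
    all_names = symbol_names(all_symbols_text)
    exported = symbol_names(exported_text)
    return {
        "exported": sorted(all_names & exported),
        "internal": sorted(all_names - exported),
    }
```

Feeding it the output of r2.cmd("isq") and r2.cmd("iEq") from an r2pipe session should place swift_retainCount in the exported list and _swift_xpc_retain in the internal one, matching the manual check.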

Keep in mind that all symbols are available as flags too, prefixed with the sym. “flag space” identifier. For example:

Exploring Objective-C Classes

Even if most native app development on Apple platforms has nowadays transitioned to Swift, many system libraries are still implemented in Objective-C, making DSC Objective-C classes an important reversing target.

All the Objective-C classes present in the libraries which match the filter are available in r2, just as when opening single executables, using the ic command and its subcommands.


However, over time, Apple has been adding various optimizations specific to the DSC for how Objective-C metadata is encoded and retrieved, and that’s something reverse engineering tools need to be constantly updated to support.

For example, on recent caches which encode Objective-C class metadata in the “list of lists” format, when you enumerate the methods of a class, all methods from categories on that class are present, regardless of the actual library which defines them. On one hand, this can result in quite long lists of methods; on the other hand, it makes it easier to find arcane methods defined by categories.

Fortunately, as we already saw in the examples above, restricting long lists by grepping in r2 is just a matter of using the ~ command on the output of other commands.

A good way to visualize this abundance of methods from categories is to list all methods of the NSObject class containing the word “perform”:

Another recent optimization Apple added to Objective-C is the ability to exclude some methods from the Objective-C runtime, in fact transforming them into normal C functions. When that happens in the DSC, we can still see the debug symbol (if present) for these methods, but they won’t appear in the ic output.

This can be seen by comparing symbols or flags with the ic output. For example, the _NSPredicateUtilities class has a _predicateSecurityAction class method which is called internally from within the Foundation framework, but it is not part of the Objective-C runtime, so the ic command doesn’t list it:

However, it’s still present as a debug symbol, visible to the isq command (and also listed as a flag):

This makes it possible to go look at the disassembly and inspect its logic.

Another handy feature when exploring Objective-C classes is listing the ivars too. An easy way to do it in r2 is to query flags again and grep for the class name, “field.class” and “var”:

Gaining visibility into ivars is important because they’re offsets into the internal state of the class, which in turn may or may not be exposed through getters and setters. Even if accessor functions are present, though, the class code usually refers to its own fields directly, and being able to see their names greatly helps in understanding what the code does.

Automating Tasks with r2pipe

There are repetitive tasks which can easily be automated using r2 scripting functionalities. The one I usually go for is r2pipe, which is a way to execute r2 commands from scripts in one of the (many) supported languages.

For the sake of this blog post series I’m going to use Python, but r2pipe can be used in the same way from many languages.

Now I’m going to introduce an r2pipe script which will be useful multiple times during the series, and is simple enough to talk through chunk-by-chunk.

The “dyld_what” Script

This script’s purpose is to print the path of the library (if any) to which any address in the DSC belongs. It comes in handy when it’s necessary to refine the loading filter, as we’ll see in the example below.

The address to look up (or the corresponding flag name) can be passed as an argument. If no argument is passed, it looks up the current address.

This works regardless of the initial filter we set, by using the metadata from the DSC header to binary-search all the executable images for the address we provide.

It’s designed to work from within an r2 session, and can be invoked from the r2 prompt using the #!pipe command:

The full code of the script can be found in this GitHub gist: https://gist.github.com/mrmacete/e061f0f0d38a96c75f8177747c26ea01.

It starts with the import statements where, among other things needed for this particular script, we import the r2pipe Python package (make sure you install it first using pip, as stated in the docs). After that we can open the pipe to the existing r2 session, by calling r2pipe.open() without arguments:

Now we can use the r2 variable to execute commands and get their output as a string with r2.cmd() or as a parsed JSON object using r2.cmdj(), and that’s pretty much all there is to know about r2pipe!

So let’s see it in action, as the script proceeds to get the array of images from the header using the iH command:

From this we can create the heart of the script, which performs a binary search on the images, assuming they can’t overlap. To do that we need to sort the array we just obtained, and extract a helper array with just the addresses, so we can leverage the bisect Python module:

The get_path_if_contains function double-checks that the address falls into the candidate image, because images are actually interleaved with stub islands which don’t belong to any particular image but are shared among different ones (we’ll see that in a later post about finding references across libraries).
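The lookup described above can be sketched as follows. This is not the code from the gist, only an illustration of the technique: the dictionary keys "address", "size" and "path" are assumptions about how each image is represented, and how the real script determines the extent of an image is not shown here.

```python
import bisect


def build_index(images):
    """Sort images by start address; return (sorted_images, starts)."""
    ordered = sorted(images, key=lambda image: image["address"])
    starts = [image["address"] for image in ordered]
    return ordered, starts


def get_path_if_contains(image, address):
    """Return the image path only if address really lies inside it."""
    start = image["address"]
    if start <= address < start + image["size"]:
        return image["path"]
    return None


def find_image_path(ordered, starts, address):
    """Binary-search the image whose range may contain address."""
    # Index of the last image starting at or before the address.
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    # The candidate may end before the address (e.g. a stub island
    # sits in between), hence the explicit containment check.
    return get_path_if_contains(ordered[index], address)
```

The binary search only finds the closest image starting at or before the address. The containment check is what turns "closest" into "owning", returning None for addresses that fall between images.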

What remains of the script deals with the input argument, which could be a numeric address, a flag name or nothing at all.

In case no argument is passed, we have to resolve the value of the $$ variable, which holds the current address:

If instead we got an argument, it might be a named flag, so we have to resolve its address by converting whatever the input is to a numeric value:
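Both cases can be handled by letting r2 evaluate the expression. The sketch below is one possible way to do it, not necessarily what the gist does: it assumes the ?v command, which prints the value of an expression in hexadecimal, and it accepts any object with a cmd() method, such as the one returned by r2pipe.open(). The function name is hypothetical.

```python
def resolve_address(r2, argument=None):
    """Resolve a numeric address, a flag name, or the current address."""
    # With no argument, fall back to $$ (the current address).
    expression = argument if argument else "$$"
    output = r2.cmd("?v " + expression).strip()
    # Assumption: the output is a hexadecimal number such as "0x180000000".
    return int(output, 16)
```

Delegating to r2 means numeric literals, flag names and arithmetic expressions are all handled by the same code path, at the cost of one extra command per lookup.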

Finally, the main entry point puts it all together:

To use this script, we can define an alias called $what from the r2 prompt, or from ~/.radare2rc (quotes are important):

In this way we’ll have the $what command, to which we can provide the argument directly instead of writing the long pipe command.

Example: using $what to refine the filter

As we already know, on modern DSC, Objective-C classes contain methods combined from all categories on all frameworks, even if they’re not present in the filter. To narrow down the above example, on NSObject we have:

But if we look at the categories on NSObject loaded with the current filter, there’s nothing obviously responsible for that:

Let’s use the $what alias to run our r2pipe script and discover which library the implementation of that method belongs to, by passing it the address:

Now, if we reopen the DSC with /IMSharedUtilities.framework added to the filter:

And list the categories on NSObject again, this time we get:

Where the last one looks promising:

And we can see the full set of features this category is adding. Since the responsible framework now belongs to the filter, we can also see all symbols and classes related to it, in case we have to dig deeper into how these functionalities are implemented.

Conclusion

If you reached this point, hopefully you made it through the first blog post about reversing iOS system libraries using radare2. That was just the beginning, but now you are equipped to jump into experimenting with it on your own. The next posts will go in depth about finding cross-references, first within single libraries, then across different libraries. Until then, hang tight and feel free to reach out with any questions. Issues and pull requests are also appreciated on radare2’s GitHub.


