Daneel: Type inference for Dalvik bytecode
In the last blog post about Daneel I mentioned one particular caveat of Dalvik bytecode, namely the existence of untyped instructions, which has a huge impact on how we transform bytecode. I want to take a similar approach as last time and look at one specific example to illustrate those implications. So let us take a look at the following Java method.
public float untyped(float[] array, boolean flag) {
    if (flag) {
        float delta = 0.5f;
        return array[7] + delta;
    } else {
        return 0.2f;
    }
}
The above is a straightforward snippet and most of you probably know what the generated Java bytecode will look like. So let’s jump right to the Dalvik bytecode and discuss that in detail.
UntypedSample.untyped:([FZ)F: [regs=5, ins=3, outs=0]
0000: if-eqz v4, 0009
0002: const/high16 v0, #0x3f000000
0004: const/4 v1, #0x7
0005: aget v1, v3, v1
0007: add-float/2addr v0, v1
0008: return v0
0009: const v0, #0x3e4ccccd
000c: goto 0008
Keep in mind that Daneel doesn’t like to remember things, so he wants to look through the code just once from top to bottom and emit Java bytecode while doing so. He gets really puzzled at certain points in the code.
- Label 2: What is the type of register v0?
- Label 4: What is the type of register v1?
- Label 9: Register v0 again? What’s the type at this point?
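To see why these questions have no local answer, it helps to decode the constant payloads from the listing both ways. The following is only a small sketch; the class and method names are made up for illustration and are not part of Daneel.

```java
public class UntypedConstants {
    // Reinterprets the raw 32-bit payload of a Dalvik const instruction as a float.
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int atLabel2 = 0x3f000000; // payload of const/high16 v0
        int atLabel4 = 0x7;        // payload of const/4 v1
        int atLabel9 = 0x3e4ccccd; // payload of const v0

        // Both readings are legal; only a later use decides which one was meant.
        System.out.println("label 2: int " + atLabel2 + " or float " + asFloat(atLabel2));
        System.out.println("label 4: int " + atLabel4 + " or float " + asFloat(atLabel4));
        System.out.println("label 9: int " + atLabel9 + " or float " + asFloat(atLabel9));
    }
}
```

The same bit pattern 0x3f000000 is the float 0.5f and also a perfectly valid integer, so the const instruction alone cannot tell Daneel which Java constant to emit.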
You, as a reader, do have the answer because you know and understand the semantics of the underlying Java code, but Daneel doesn’t, so he tries to infer the types. Let’s look through the code in the same way Daneel does.
At method entry he knows about the types of method parameters. Dalvik passes parameters in the last registers (in this case in v3 and v4). Also we have a register (in this case v2) holding a this reference. So we start out with the following register types at method entry.
UntypedSample.untyped:([FZ)F: [regs=5, ins=3, outs=0]    uninit uninit object [float bool
The array to the right represents the inferred register types at each point in the instruction stream as determined by the abstract interpreter. Note that we also have to keep track of the dimension count and the element type for array references. Now let’s look at the first block of instructions.
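The entry state can be computed from the register count and the method signature alone. Here is a minimal sketch of that step, assuming types are represented as plain strings and ignoring long and double parameters, which occupy two registers each. The name entryTypes and its parameters are hypothetical, not Daneel's actual API.

```java
import java.util.Arrays;

public class EntryState {
    // Hypothetical helper: builds the register types at method entry.
    // paramTypes holds one entry per declared parameter (no wide types in this sketch).
    static String[] entryTypes(int regs, boolean isStatic, String... paramTypes) {
        String[] types = new String[regs];
        Arrays.fill(types, "uninit");
        int ins = paramTypes.length + (isStatic ? 0 : 1);
        int next = regs - ins;        // arguments occupy the last `ins` registers
        if (!isStatic) {
            types[next++] = "object"; // the this reference comes before the parameters
        }
        for (String p : paramTypes) {
            types[next++] = p;
        }
        return types;
    }

    public static void main(String[] args) {
        // regs=5, ins=3 for the instance method untyped(float[], boolean)
        System.out.println(String.join(" ", entryTypes(5, false, "[float", "bool")));
        // prints: uninit uninit object [float bool
    }
}
```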
0002: const/high16 v0, #0x3f000000    u32   uninit object [float bool
0004: const/4 v1, #0x7                u32   u32    object [float bool
0005: aget v1, v3, v1                 u32   float  object [float bool
0007: add-float/2addr v0, v1          float float  object [float bool
Each line shows the register types after the instruction has been processed. At each line Daneel learns something new about the register types.
- Label 2: I don’t know the type of v0, only that it holds an untyped 32-bit value.
- Label 4: Same applies for v1 here, it’s an untyped 32-bit value as well.
- Label 5: Now I know v1 is used as an array index, it must have been an integer value. Also the array reference in register v3 is accessed, so I know the result is a float value. The result is stored in v1, overwriting its previous content.
- Label 7: Now I know v0 is used in a floating-point addition, it must have been a float value.
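The four steps above can be replayed with a tiny abstract interpreter. This is only a sketch under simplifying assumptions: it knows just the three instructions of this block, handles float arrays only, and all class, enum, and method names are invented for illustration rather than taken from Daneel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstBlock {
    enum Type { UNINIT, U32, INT, FLOAT, OBJECT, FLOAT_ARRAY, BOOL }

    final Type[] regs;
    final List<String> learned = new ArrayList<>(); // facts discovered about earlier values

    FirstBlock(Type... entry) { regs = entry.clone(); }

    // const*: the destination now holds an untyped 32-bit value.
    void constant(int dst) { regs[dst] = Type.U32; }

    // A use that demands a concrete type resolves an untyped register.
    void use(int reg, Type required) {
        if (regs[reg] == Type.U32) {
            learned.add("v" + reg + " was " + required);
            regs[reg] = required;
        } else if (regs[reg] != required) {
            throw new IllegalStateException("v" + reg + ": expected " + required + ", found " + regs[reg]);
        }
    }

    // aget dst, array, index (float[] only in this sketch)
    void agetFloat(int dst, int array, int index) {
        use(index, Type.INT);
        use(array, Type.FLOAT_ARRAY);
        regs[dst] = Type.FLOAT; // overwrites whatever dst held before
    }

    // add-float/2addr a, b
    void addFloat2addr(int a, int b) {
        use(a, Type.FLOAT);
        use(b, Type.FLOAT);
        regs[a] = Type.FLOAT;
    }

    public static void main(String[] args) {
        FirstBlock s = new FirstBlock(Type.UNINIT, Type.UNINIT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL);
        s.constant(0);          // 0002: const/high16 v0
        s.constant(1);          // 0004: const/4 v1
        s.agetFloat(1, 3, 1);   // 0005: aget v1, v3, v1
        s.addFloat2addr(0, 1);  // 0007: add-float/2addr v0, v1
        System.out.println(Arrays.toString(s.regs)); // [FLOAT, FLOAT, OBJECT, FLOAT_ARRAY, BOOL]
        System.out.println(s.learned);               // [v1 was INT, v0 was FLOAT]
    }
}
```

The learned list is the interesting part: each entry is a fact about a value that was written earlier, which is exactly the information needed to go back and fix what was already emitted for it.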
Keep in mind that at each line, Daneel emits appropriate Java bytecode. So whenever he learns the concrete type of a register, he might need to retroactively patch previously emitted instructions, because some of his assumptions about the type were broken.
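One way to picture this patching is an emitter that guesses int for every untyped constant and remembers where it put the guess. The sketch below is an assumption about how such bookkeeping could look, not a description of Daneel's rewriter: it stores instructions as plain mnemonic strings, maps Dalvik register n directly to Java local n, and the names emitConst and resolve are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PatchingEmitter {
    final List<String> code = new ArrayList<>();
    // register -> { index of the provisional load, raw constant bits }
    final Map<Integer, int[]> pending = new HashMap<>();

    // Emits a provisional int load and store for an untyped constant.
    void emitConst(int reg, int bits) {
        pending.put(reg, new int[] { code.size(), bits });
        code.add("ldc " + bits);
        code.add("istore " + reg);
    }

    // Called once a later instruction reveals the register's real type.
    void resolve(int reg, boolean isFloat) {
        int[] p = pending.remove(reg);
        if (p == null || !isFloat) {
            return; // nothing provisional, or the int guess was right
        }
        // The int guess was wrong: patch both previously emitted instructions.
        code.set(p[0], "ldc " + Float.intBitsToFloat(p[1]) + "f");
        code.set(p[0] + 1, "fstore " + reg);
    }

    public static void main(String[] args) {
        PatchingEmitter e = new PatchingEmitter();
        e.emitConst(0, 0x3f000000); // label 2
        e.emitConst(1, 0x7);        // label 4
        e.resolve(1, false);        // label 5: v1 is an array index
        e.resolve(0, true);         // label 7: v0 is a float operand
        System.out.println(String.join("; ", e.code));
        // prints: ldc 0.5f; fstore 0; ldc 7; istore 1
    }
}
```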
Finally we look at the second block of instructions reached through the conditional branch as part of the if-statement.
0009: const v0, #0x3e4ccccd    u32   uninit object [float bool
000c: goto 0008                float uninit object [float bool
When reaching this block we basically have the same information as at method entry. Again Daneel learns in the process.
- Label 9: I don’t know the type of v0, only that it holds an untyped 32-bit value.
- Label 12 (000c): Now I know that v0 has to be a float value because the unconditional branch targets the join-point at label 8. And I already looked at that code and know that we expect a float value in that register at that point.
This illustrates why our abstract interpreter also has to remember and merge register type information at each join-point. It’s important to keep in mind that Daneel follows the instruction stream from top to bottom, as opposed to the control-flow of the code.
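The refinement that happens at the goto can be sketched as a per-register comparison between the branch's current state and the state recorded for the join-point. This sketch covers only that one direction (sharpening an untyped value from what the target expects) and says nothing about how conflicting types would be reconciled. The names refine and refineAll are made up for illustration.

```java
import java.util.Arrays;

public class JoinPoint {
    enum Type { UNINIT, U32, INT, FLOAT, OBJECT, FLOAT_ARRAY, BOOL }

    // An untyped 32-bit value takes the concrete type recorded at the join target.
    // Everything else is left as it is in this sketch.
    static Type refine(Type incoming, Type recorded) {
        boolean concrete32 = recorded == Type.INT || recorded == Type.FLOAT;
        return (incoming == Type.U32 && concrete32) ? recorded : incoming;
    }

    static Type[] refineAll(Type[] incoming, Type[] recorded) {
        Type[] out = new Type[incoming.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = refine(incoming[i], recorded[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // State remembered on entry to label 8 (after add-float/2addr at label 7).
        Type[] atLabel8 = { Type.FLOAT, Type.FLOAT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL };
        // State of the second block when it reaches goto 0008.
        Type[] atGoto = { Type.U32, Type.UNINIT, Type.OBJECT, Type.FLOAT_ARRAY, Type.BOOL };
        System.out.println(Arrays.toString(refineAll(atGoto, atLabel8)));
        // prints: [FLOAT, UNINIT, OBJECT, FLOAT_ARRAY, BOOL]
    }
}
```

Note that v1 stays uninit on this path. That is fine here because the return at label 8 only reads v0.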
Now imagine scrambling up the code so that instruction stream and control-flow are vastly different from each other, together with a few exception handlers and optimal register reuse as produced by some SSA representation. That’s where Daneel still keeps choking at the moment. But we can handle most of the code produced by the dx tool already and will hunt down all those nasty bugs triggered by obfuscated code as well.
Disclaimer: The abstract interpreter and the method rewriter were mostly written by Rémi Forax. With this post I take no credit for its implementation whatsoever; I just want to explain how it works.