TO BEGIN WITH
After reading this blog, you will know:
- How your computer memory is managed and used in any game or application.
- What the registers are and some basic but useful assembly instructions.
- What reverse engineering is. I can’t teach you any further on this topic, the only way to be a master is to practise.
- How to make a stable (I’ll explain later what stable means) trainer for Stardew Valley, and other .NET games if you are smart. (Why .NET? Because Stardew Valley uses .NET. The benefit for a hacker is that it doesn’t require too much reverse engineering knowledge, so it’s easy to start with. BECAUSE WE’LL HAVE ALL THE SOURCE CODE ! ! ! ! !).
PREREQUISITES
Software:
- x86 environment, which means your game should run on a machine with an Intel or AMD CPU
- Cheat Engine (of course…)
- IDA Pro (optional, because it’s a .NET application)
- JetBrains dotPeek (for .NET)
- ReClass.NET (you’ll find it necessary in ANY game you’re trying to hack. Seriously, keep it in your disk)
Programming Basics:
- The best programming language ever: C++
- What a pointer is
- What a stack is (last in, first out)
- An IDE you like with a theme you enjoy (I recommend Monokai Pro personally)
One Sad Truth:
- Making one trainer usually takes weeks, months, or even longer. It’s a path filled with failures. I guarantee you will fail on your first try, you will lose confidence, you will cry, you will curse, you will keep your computer and your game on for a week, try everyday and still find nothing, you will want to smash your computer, you will want to pick up drinking or smoking.
Checked, checked, checked? GOOD! You are ready to go! But before we start, please allow me to say something from my heart.
When I was learning something, I hated hearing concepts I never knew. Not knowing is stupid, I admit, but I used to be scared to ask questions because admitting not knowing seemed even more stupid. Although it’s NOT.
So in this blog, I’ll try my best to watch out for any new concepts. If there’s something I think you might not know, I’ll explain or tell you not to worry. In other words, I’ll make it super clear. If I missed something, remember Google and YouTube are your sincere friends.
NOTE: many concepts or theories here are for reference. It doesn’t mean they are not useful. On the contrary, they are important. But I think there are some things you can only understand after you fail. So don’t worry!
Here goes nothing.
Part 1: Dive into the 1/0 world
I. What’s an application?
An application is composed of executable files that tell your CPU exactly what to do, step by step, more patiently than any human being can.
Cutting Cake Application
Grab a cutting board from the shelf with your right hand
Now you have it in your hand,
put it on the table
Grab a cake from the fridge with your right hand
Now you have it in your hand,
put it on the cutting board
Grab a knife from the drawer with your right hand
Now you have it in your hand,
cut the cake with the knife
Every application is similar to the Cutting Cake Application. Instead of hands, the computer uses registers. Instead of shelf, fridge, drawer, the computer uses spaces like stack/heap. Instead of grab/put/cut, the computer does add/mov/xor operations. The idea I want to emphasize here is: CPU instructions (assembly language) are too damn detailed.
Now imagine the genius Chris Sawyer. This guy wrote the game Rollercoaster Tycoon in assembly. How the hell did he do that?
II. How’s an application produced?
If you have programming experience, I’m sure you are familiar with concepts such as variable, function, class, etc. They are not the exact words we’ll use in the 0/1 world, but they are connected. Section II and III will show you how they are connected IN THEORY!!! I think I’ll say this a couple of times: don’t try so hard to understand them if you can’t. Learning requires experience, not theories. Some time from today, maybe one week, maybe one month, maybe two years, when you have experience, especially after you hit a few walls, come back here and you will have your “Aha moment”.
Let’s take C++ programs as example. A successful execution of C++ code has two parts: compilation and running. If you are familiar with the basic OOP features (google if you don’t), you’ll know that polymorphism has a run-time one and a compile-time one (I’ll mention the run-time one, virtual functions later). During the compilation, compiler will do a list of things:
- Preprocessing, to handle the lines of code starting with #, such as #define and #include. It will generate the final C++ code for the next step.
- Compilation (weird, huh? compilation inside compilation, the compilation we generally talk about is the procedure including the entire list) to convert C++ code into assembly instructions. Note: different compilers, or the same compiler with different settings, can produce very different assembly instructions from the same code. That’s the topic of compiler optimization.
Compiler Optimization (Optional):
Look at the code snippet below:
int main() {
    int i = 0;
    while (i < 10000) {
        i++;
    }
    return 0;
}
In this piece of code, do we really need to increment the variable i by 1 for 10000 times? No, because it’s not doing anything inside the while loop and nobody reads i afterwards. So some compilers MIGHT just skip it. Depending on your compiler (g++, Clang or others), you can choose to turn the optimization off (for example with -O0). I suggest you try that and see the differences. Another excellent optimization example would be RVO (return value optimization):
#include <iostream>

class MyClass {
public:
    int value;
    MyClass(int v) : value(v) {
        std::cout << "MyClass constructor!" << std::endl;
    }
    MyClass(const MyClass& otherObject) {
        value = otherObject.value;
        std::cout << "MyClass copy constructor!" << std::endl;
    }
    ~MyClass() {
        std::cout << "MyClass destructor!" << std::endl;
    }
};

MyClass createMyObject() {
    return MyClass(5);
}

int main() {
    MyClass myObject = createMyObject();
    return 0;
}
In the first line of main, how many objects are involved? It depends on whether your compiler applies RVO. Let’s analyse what it’s doing here:
without any RVO:
On the left side of = we declared a MyClass instance: myObject. So probably 4 bytes of memory (size of an int) would be allocated for this instance, waiting to be filled. I said probably because there’s only one integer inside of it. One integer usually takes up 4 bytes. However, the compiler might add some padding for alignment.
On the right side, we called a function, createMyObject(). Inside it, we defined a temporary instance: MyClass(5). In other words, it’s an rvalue (what an rvalue is is not required for this post, you can google it if you are curious).
Now combine the two parts together: we have an empty space waiting to be filled and a returned instance to copy from. Here comes the =, which copies the right side instance into the left side instance. Note that in a declaration this = is initialization, not assignment, which is why the copy constructor runs and not operator=.
Output could be:
MyClass constructor!       # printed by MyClass(5) inside createMyObject()
MyClass copy constructor!  # printed by the = in the first line of main
MyClass destructor!        # the temporary is destroyed after being copied
MyClass destructor!        # myObject is destroyed after the main function is done
(With every elision switched off, an old compiler may even print one more copy/destructor pair for the return value itself. Since C++17 the copy in this exact example is always elided, so you need an older standard and a flag such as g++’s -fno-elide-constructors to see it.)
with RVO:
Why bother? If we have an empty space, why not just build that instance in the empty space directly? Yes, that’s the gist of RVO: the program will construct the instance right in the space allocated. So long, copy constructor.
MyClass constructor!
MyClass destructor!
These examples here are just to show that it’s totally fine if the assembly instructions seem different from the original code.
- Assembling, which converts assembly code to machine code. It’s rather simple. For example:
# machine code -- assembly code
# put the number 1/0xEFAB3412 in eax
B8 01 00 00 00 -- mov eax, 0x01
B8 12 34 AB EF -- mov eax, 0xEFAB3412
# put the number 1/0xEFAB3412 in ebx
BB 01 00 00 00 -- mov ebx, 0x01
BB 12 34 AB EF -- mov ebx, 0xEFAB3412
# copy the value from ebx to eax
89 D8 -- mov eax, ebx
# copy the value from eax to ebx
89 C3 -- mov ebx, eax
As the example shows, B8 means putting some number in register eax, BB means putting a number in ebx, and 89 means copying from one register to another. But which ones specifically? That depends on the byte right after the opcode (D8 or C3 here).
You may notice the numbers are kind of reversed: 0xEFAB3412 is stored as 12 34 AB EF. That’s because x86 is little endian, so the lowest byte comes first. You can google “big endian and little endian” to understand.
- Linking, to connect different source files and header files. If you defined a function in a.cpp but used it in b.cpp, the linker will know, not the compiler. (Not important for our hacking. As for how the linker would know, it’s another huge topic, it just knows.)
Congratulations! You’ve made it this far! I’m sorry it’s not quite done yet. But we are halfway there. It’s totally okay if you don’t understand any of this in Section II or Section III, yes, I’m serious. I promise, these two sections will be very useful once you have failed hacking one or two games. Before then, let them stand here, you can make a tea and try to understand them, just try.
III. What’s going on in the memory?
To answer this question, we need to know what memory is.
Now, let’s think about hard drive, the hard drive is used to store files. You can have 1TB SSD, 2TB SSD… Memory has the same purpose: storage. The benefit of memory over SSD is that reading from and writing to memory is way faster.
The connection between hard drive and memory is similar to that between a warehouse and a store. You have a lot of things, but you don’t need them all at once. The hard drive is where you keep them permanently. What you need for now or the near future is stored in memory.
A question arises here, why do I need the warehouse? Why not put everything in my store so I can save the travel? Well, in real life, you don’t have such a big store. Then why don’t we use 1TB RAM in computers? Two reasons: the data in the memory will disappear after power-off. And memory is more expensive than SSD.
Unfortunately, hacking is playing with memory, which means if you have your game fucked up, or your computer fucked up, everything will be lost (of course we can try saving some of it locally) you have to start all over again. I can’t remember how many times BSOD (blue screen of death) has gotten me. Like I said in the beginning, it’s a path full of failures.
Anyway, let’s get into memory. There are usually 4 sections in the memory:
- Stack, which stores local variables inside a scope. The scope is usually a function, and all the variables you define inside the function are in the stack. When the function is finished, these variables will be gone.
- Heap, which is a huge storage room. Objects allocated with new or malloc will go there. Usually, the heap has more space than the stack, so it’s perfect to put some complicated/huge objects in the heap. It’s dynamic, which means when you quit and reopen the game, locations of the same object (your character, your backpack, your weapons) won’t be the same; your operating system decides which instance goes to which address. The infamous term “memory leak” happens in the heap, so always remember to delete after new.
- Global/Static, which has the static objects, usually a pointer pointing to the objects in the heap. Static memory is very important to make a stable game trainer, because what we download from the game publisher is the executable files after compilation. Some pointers won’t change even if you quit and reopen the game.
About static, there’s an important but confusing thing to know. It’s easy to understand that a static member of a class can have a relatively fixed address. However, a static variable defined in ANY function does, too. That has something to do with lifetime in C++. When you define a static variable inside any function, unlike other local variables which will disappear as the function finishes, its lifetime will be extended to the running time of the whole program. That’s why the static part is also called the global part.
- Code, which is the machine code, the binaries representing the assembly instructions. Think of it as your application’s manual for the CPU.
As said earlier in this part, computer is just 1s and 0s. So how to represent everything with 1s and 0s?
Look at the code snippet below:
class MyClass {
public:
int firstInt;
int secondInt;
MyClass(int i, int j) : firstInt(i), secondInt(j) {}
};
int main() {
MyClass myInstance = MyClass(1, 2);
return 0;
}
While the main function is running, myInstance is in the memory. We know the size of an integer is usually 4 bytes, or a DWORD (a WORD is 2 bytes, a DWORD, double word, is 4 bytes, a QWORD is 8 bytes), so myInstance can be simplified as a data structure of 8 bytes, the first half containing firstInt, the second half containing secondInt.
Hence, somewhere in the memory, we can see:
# The addresses are just my assumption
0x0C - 0x0F: 01 00 00 00 # firstInt: 1
0x10 - 0x13: 02 00 00 00 # secondInt: 2
Usually, members of an instance stick together, so when you have found one important variable, you have basically found everything you need! If that’s not the case, there are two things we need to consider:
- Some compiler rules I’ll talk about, so don’t worry
- The game author is weird, they did this on purpose. They want to mess with you by messing with themselves. It could be that they really really love their game or they really really hate hackers
Now back to the memory. We have the address of firstInt and secondInt, what about myInstance’s address? Note: when we talk about address, we mean starting address, 0x0C for firstInt, 0x10 for secondInt.
In this ideal scenario, the address of myInstance is the same as the address of its first member, firstInt. The address of myInstance is 0x0C, as well.
Now, let’s make it a little more challenging by thinking about pointers.
class MyClass {
public:
int firstInt;
int secondInt;
MyClass(int i, int j) : firstInt(i), secondInt(j) {}
};
int main() {
MyClass *myInstancePtr = new MyClass(1, 2);
return 0;
}
Let’s say the address of *myInstancePtr (the object) is still 0x0C (although impossible in real life), and the address of myInstancePtr (the pointer itself) is some random number: 0x123456. What’s the connection?
The value stored at address 0x123456 is 0x0C. A pointer is just a variable whose value is another address, which means when you read the memory at the pointer’s address, you’ll find the real instance’s address.
Here’s an example structure that a real game could use:
/*
In the World class,
I used a fancy paradigm called singleton.
The benefit of singleton class is to make sure there is
ONLY one World instance in the whole program.
Because, well, there is only one world in one game.
Also, the existence of static variables allows us to do the hacking in the easiest way: pointer scan (for games 10 or 20 years ago)
*/
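class Character; // forward declaration, the class is defined below
class World {
private:
inline static World *singletonPtr = nullptr; // an in-class initializer for a static member needs "inline" (C++17)
...
const char *weather;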
int area;
...
Character *mainCharacter;
World() : weather("sunny"), area(10*10),... {}
public:
World(const World&) = delete;
World& operator=(const World&) = delete;
static World* generateTheWorld() {
// a static function has no "this", so the static member is accessed directly
if (singletonPtr == nullptr) {
singletonPtr = new World();
}
return singletonPtr;
}
~World() {}
};
class Character {
char *name;
float xCoordinate;
float yCoordinate;
...
Weapon *firstWeapon;
};
class Weapon {
...
int attack;
};
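int main() {
// the constructor is private, so the only way to get the World is through the singleton
World *gameWorld = World::generateTheWorld();
delete gameWorld;
return 0;
}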
In this example, if we want to access firstWeapon’s attack, here’s what we need to do:
- Find the address of gameWorld (the pointer, not the object it’s pointing to), read from it, and get the address of *gameWorld (the object the pointer is pointing to)
- Access *gameWorld. Somewhere near *gameWorld’s address lies mainCharacter’s address (the pointer, not the object it’s pointing to), read from it, and get the address of *mainCharacter (the object the pointer is pointing to)
- Access *mainCharacter. Somewhere near *mainCharacter’s address lies firstWeapon’s address (the pointer, not the object it’s pointing to), read from it, and get the address of *firstWeapon (the object the pointer is pointing to)
- Access *firstWeapon. Somewhere near *firstWeapon’s address lies its attack’s value
It’s better known as a multi-level pointer structure. Why did I use the word “somewhere”? Because usually, members of an instance stick together. Note: instance doesn’t mean the same thing as class.
Let’s think about the 4 parts of memory again. What’s in the heap? What’s in the static/global part? (We usually don’t consider the stack, because there are so many functions, it’s confusing to know what functions are in the stack at one given moment.)
The pointer to the world is static (in the class above that is singletonPtr; gameWorld in the steps stands for it), and everything else is in the heap. When we download the game distribution, the relative address (offset) of that static pointer never changes within one game version, so getting the weapon through gameWorld->mainCharacter->firstWeapon will always succeed.
IV. Hang on, what are we doing here?
Calm down, the knowledge could be overwhelming. In section I and II, we learnt how code is executed. In section III, we learnt one approach to track down a variable in memory. It’s totally fine if you don’t understand or remember everything. I guarantee it’ll be much clearer in the next part.
If you need to remember anything, this is it: in offline game hacking, the hardest part is searching for something (usually a variable) in your computer’s memory. Every variable has an address. Most of the addresses will change every time you quit and reopen the game, or even when your character dies. But some variables have addresses that never change relative to the game’s executable, and they are what we need to make a stable trainer.
Now, please check again if you have downloaded all the required software and if it can run without any issues. If everything is ready, let’s go play some Stardew Valley!
Part 2: Gaming Time
In this part, I’ll show you some data structures of this game and some basic “hacking” technique.
Note: I suppose you found out that I used the word “usually” a lot, and I will continue using it in this part. Reverse engineering is hard. The first reason is that assembly language is too detailed to show you the overall workflow, I’ll show you an example later. The second reason is that it requires a lot of assumptions. For instance, what data type would you use for health? Usually a 4-byte int. Is it possible to use an 8-byte long? Of course! Although this would cause us too much trouble. We won’t even be able to locate the value, to start with. (A more cruel way is to use float…) The only way to do reverse engineering is to make an assumption, then validate or rebut it. Then make another assumption… Nothing is for sure, be bold!
What is hacking?
Find a value, change it, simple as that.
So what’s all the fuss about?
Well, finding and changing a value is easy, ANYONE can do that with Cheat Engine: search for a value, change the value in game, search again… You are guaranteed to find the correct one.
If you are reading this, however, you’re not satisfied with the approach above. And I’m sure there are two reasons: first, it’s usually not that efficient to track down a value by searching again and again. In some games, some attributes are not shown as numbers so you cannot even start searching (I know there are “bigger than” and “less than” options in Cheat Engine, but they take much more time and many more attempts). Second, it’s not FUN!
To have something to show off, let’s dive into the memory and make a stable trainer!
Now, we can talk about the word, stable.
Remember in part 1, when I said every piece of data (not functions) will be stored in three memory sections, global/static, stack and heap? The value we want to modify is usually not in the stack, because the stack stores temporary variables inside a function, and they’ll be deleted after execution of the function. Meanwhile your character, your money, your health are always there. So the heap is where our target values usually sit, but it’s not the section we are looking for in hacking.
The problem with the heap is that it’s dynamic, which means if you found the address with your character’s health, when you quit the game and reopen it, that address won’t hold health again. What we are actually doing is finding some “anchor” that will never change (in one specific game version). The solution lies in the other two memory sections: static/global and code, which stay at the same position relative to the game’s entry point.
Will update after I get a job. To hell with it. Now, Stardew Valley just went through a major update to version 1.6. Seems like they forgot about the new chocolate game??? I’ve been waiting for tooooooo long.
Anyway, with every update comes a whole new set of compiled files, which means I have to do my hacking again since the old static addresses just went invalid.
I’ll do it ASAP and post a new page.
TODO:
- Comparison of assembly instruction with source code, then,
- A gentle touch of calling conventions, then,
- Different registers, then,
- Virtual table, then,
- Find the Farmer instance, then,
- Injection!