[Tutorial] Text handling


Jun 23, 2014
Hi, questions related to text editing/parsing seem to appear quite often, so I thought a thread like this would be useful. It doesn't contain much that wasn't posted here previously, but there are useful links/resources included throughout the post.

What is text in cleo?
Text (a string) is an array of numbers in which each number represents a character. The American Standard Code for Information Interchange (ASCII) defines which number represents which character.

Take a look at the "Dec" (decimal) and "Char" columns:

So text can be written or modified simply by writing specific numbers to memory. Write 97 (1 byte) to a memory address and it will contain the letter 'a'. Write 98 to the next address and your text will be 'ab'.

How does PC/cleo know where text ends?
Strings in the C language (and in cleo) are null-terminated, which means that to have a fully functional 'ab' text we have to write:
- 97 to one address
- 98 to the next address
- 0 to the next address

To what memory can we write?
First we have to ask the PC for memory that won't be modified by any other process/activity, which looks like this in cleo:
0AC8: [email protected] = allocate_memory_size 100 // "alloc [email protected] 100" using sampfuncs keyword
...which is equivalent to:
"-pls give me 100 bytes of memory and tell me where it is by storing its address in [email protected] variable"

So after '0AC8' is used, [email protected] contains a number: the location where the newly allocated memory resides.

Even after using opcodes like:
0AD3: [email protected] = format "%d + %d = %d" 2 2 4// 'format [email protected] "%d" 123' using sampfuncs keyword
... [email protected] is still nothing more than a number and can be treated as such.

So let's say that we have the 'ab' text and want to overwrite only the second letter 'b'. Here's how it could be done:
alloc [email protected] 100 // [email protected] becomes pointer to free memory
0AD3: [email protected] = format "ab" // memory is overwritten with: 97 98 0
[email protected] += 1 // pointer to the free memory points to the next address which holds 98 ('b' letter)
0A8C: write_memory [email protected] size 1 value 99 virtual_protect 1 // address that holds 98 ('b') is overwritten with 99 ('c' letter)

// it is good to keep reference to the start of the memory (so memory can be "handed back" to PC using 0AC9: free_allocated_memory [email protected])
// e.g. by using "0085: [email protected] = [email protected]" opcode before increasing [email protected] value
Why do people keep saying not to use the "alloc" opcode (0AC8) in a loop without the "free" opcode (0AC9)?
Because the PC does exactly what it is told to do. Using alloc in a loop is like asking it:
"-Give more memory" x 99999 times, so after some time there is no space left.
It's a good idea to allocate memory once at the top of the code. If the script is large, uses most of its variables, and the writer wants to reuse a variable that was used for text modification, then "alloc" can be used in a loop as long as the following opcode is always executed before that variable is modified:
0AC9: free_allocated_memory [email protected] // free [email protected]
// [email protected] must point to the beginning of the allocated memory
Which is equivalent to saying:
"-Thanks PC but I don't need that memory anymore, the one stored at [email protected], do whatever you want with that memory."
If this opcode is used, it's important to keep a reference to the beginning of the allocated memory. Otherwise the wrong address is passed to it and the script may crash.

Here's a graphical representation of memory being allocated, plus some more info.

Using sscanf function to get specific parts of text
The function is present in the gta_sa process memory at address 0x8220AD.
(Idk who and how exactly found that function there, using this method with "Search for->Name (label) in current module" doesn't show it)

There are many different data types that can be parsed:
(source page 246)

And printed:
(source page 244)
All these could probably be used with:
0AD0: show_formatted_text_lowpriority "This is %.4X opcode" time 2000 0x0AD0
0AD1: show_formatted_text_highpriority "This is %.4X opcode" time 2000 0x0AD1

Problem to solve
Three different values have to be extracted from the "123 Word 987" string: an integer, a string and an integer.

0AC8: [email protected] = allocate_memory_size 100
0AD3: [email protected] = format "123 Word 987" // initialisation of string that contains some important data that we need to parse

0AC7: [email protected] = var [email protected] offset // must be a pointer to itself because pointer is expected by sscanf (doesn't need memory allocation because it will hold an integer)
0AC8: [email protected] = allocate_memory_size 100
0AC7: [email protected] = var [email protected] offset 

0AA5: call 0x8220AD num_params 5 pop 5 [email protected] [email protected] [email protected] "%d %s %d" [email protected] // sscanf %s stops at whitespace, so a single word is received. In the scan_string opcode (0AD4) a scanset such as %[^ ] would have to be used instead

// [email protected] becomes 123
// [email protected] becomes pointer to "Word"
// [email protected] becomes 987
Notice that the parameters are passed in the reverse of the order normally used in C/C++, and that their count is passed as both 'num_params' and 'pop'.

I have a command with 3 parameters:
1st is number
2nd is word
3rd is a whole sentence

e.g. /test 1 Mr.Bean Some long sentence with many words

The point is to extract all 3 parameters

Solution (full code)
It's a somewhat hacky method, because %s in sscanf stops at whitespace.
Thanks to the %n specifier of sscanf it is possible to get the offset of its position within the text, which can be used to locate where the 3rd parameter (the sentence) begins. In short:
offset of %n within the text + pointer to the beginning of the text = pointer to the %n position (the beginning of the 3rd parameter)
Then the 3rd parameter can be copied or not. The code below does not copy it and just passes the pointer (if the code were going to modify it, a copy would be made).
/test <number> <single word> <sentence>
/test 1 Mr.Bean Some long sentence with many words


repeat
    wait 0
until SAMP.Available()

0b34: "test" @activation

alloc [email protected] 128 // command 
alloc [email protected] 200 // veh name

while true
wait 0
    wait 1000

SAMP.IsCommandTyped([email protected])
0AC7: [email protected] = var [email protected] offset // must be a pointer to itself because that's the parameter expected by sscanf
0AC7: [email protected] = var [email protected] offset
0AA5: call 0x8220AD num_params 5 pop 5 [email protected] [email protected] [email protected] "%d %s %n" [email protected] // sscanf %s stops at whitespaces so no expression like %[^ ] is required 
// now [email protected] holds the index of description within the string, it can be added to string pointer and will become pointer to description itself
// that's thanks to the magic of "%n" within sscanf, which writes the number of characters consumed so far to the given pointer
005A: [email protected] += [email protected]  // (int)
chatmsg "number=%d word=%s sentence=%s" -1 [email protected] [email protected] [email protected]
Code that you can play with using an online C compile/execute tool:

#include <stdio.h>
#include <string.h> // needed for strcpy() function

int main()
{
    char text[100] = "1 Mr.Bean Some long sentence with many words";
    // the goal is to get the whole remaining text after "Mr.Bean" which appears to be not that simple using sscanf

    int number=0;
    char name[10];
    int buffer_to_get_char_count=0;
    char sentence[50];
    sscanf(text, "%d %9s %n", // %9s keeps the word within the 10-byte name buffer
                            &number, // needs to be a pointer 
                            name,    // is a pointer by default
                            &buffer_to_get_char_count); // %n stores the number of characters consumed so far
    strcpy(sentence, text + buffer_to_get_char_count); // "text" variable is a pointer, "text[0]" would be the first char
    printf("number=%d word=%s sentence=%s", number, name, sentence);

    return 0;
}


Using %[] for matching/not-matching specific characters

source of explanation

Some of the code in this post was not tested