[Tutorial] Text handling

Hi, it seems that questions related to text editing/parsing appear quite often, so I thought a thread like this would be useful. It doesn't contain much that wasn't posted here before, but there are useful links/resources included throughout the post.

What is text in cleo?
Text (a string) is an array of numbers where each number represents a character. The American Standard Code for Information Interchange (ASCII) defines which number represents which character.

Take a look at the "Dec" (decimal) and "Char" columns:
[img=500x300]https://i.imgur.com/bHJ3S2k.gif[/img]

So by writing specific numbers to memory it is possible to write/modify text. It's that simple. Write 97 (1 byte) to a memory address and it will contain the letter 'a'. Write 98 to the next address and your text will be 'ab'.

How does PC/cleo know where text ends?
Strings in the C language (and cleo) are null terminated, which means that in order to have a fully functional 'ab' text we have to write:
- 97 to one address
- 98 to the next address
- 0 to the next address
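For comparison, here is the same idea as a small C sketch. The helper name write_ab is made up for this example (it isn't part of cleo or any library); it only shows that storing 97, 98 and 0 one after another gives a valid "ab" string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the text "ab" into buffer, one number at a time. */
void write_ab(char *buffer, size_t size)
{
    assert(buffer != NULL && size >= 3); /* 2 letters + the terminating 0 */
    buffer[0] = 97; /* 'a' */
    buffer[1] = 98; /* 'b' */
    buffer[2] = 0;  /* null terminator, tells everyone where the text ends */
}
```

Without the last write, anything that reads the text would keep going until it happens to find a 0 somewhere later in memory.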

To what memory can we write?
First we have to ask the PC for memory that won't be modified by any other process/activity. In cleo it looks like this:
Code:
0AC8: 0@ = allocate_memory_size 100 // "alloc 0@ 100" using sampfuncs keyword
...which is equivalent to:
"-pls give me 100 bytes of memory and tell me where it is by storing its address in the 0@ variable"

So after using '0AC8', 0@ contains a number which is the location (address) of the newly requested/allocated memory.

Even after using opcodes like:
Code:
0AD3: 0@ = format "%d + %d = %d" 2 2 4 // 'format 0@ "%d + %d = %d" 2 2 4' using sampfuncs keyword
... 0@ is still nothing more than a number and can be treated as such.

So let's say that we have the 'ab' text and want to overwrite only the second letter 'b'. Here's how it could be done:
Code:
alloc 0@ 100 // 0@ becomes pointer to free memory
0AD3: 0@ = format "ab" // memory is overwritten with: 97 98 0
0@ += 1 // pointer to the free memory points to the next address which holds 98 ('b' letter)
0A8C: write_memory 0@ size 1 value 99 virtual_protect 1 // address that holds 98 ('b') is overwritten with 99 ('c' letter)

// it is good to keep reference to the start of the memory (so memory can be "handed back" to PC using 0AC9: free_allocated_memory 0@)
// e.g. by using "0085: 2@ = 0@" opcode before increasing 0@ value
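A rough C equivalent of the snippet above, in case it helps to see it outside of cleo. The function name is invented for this sketch, and malloc/strcpy stand in for the alloc/format opcodes (an assumption made for illustration, not a claim about how cleo implements them).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "ab", overwrites the 2nd letter and returns
   the pointer to the BEGINNING of the memory (the caller must free it). */
char *make_ab_then_patch(void)
{
    char *start = malloc(100); /* like: alloc 0@ 100 */
    assert(start != NULL);     /* sketch only: stop if no memory was given */
    strcpy(start, "ab");       /* like: 0AD3 format "ab" -> 97 98 0 */

    char *cursor = start;      /* like: 0085: 2@ = 0@ (work on a copy) */
    cursor += 1;               /* now points to the address holding 98 ('b') */
    *cursor = 99;              /* like: 0A8C write_memory ... value 99 ('c') */

    return start;              /* still the beginning, so it can be freed later */
}
```

The point of the copy is that only the copy moves, so the original address is still available when the memory has to be handed back.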

Why do people keep saying not to use the "alloc" opcode (0AC8) in a loop without the "free" opcode (0AC9)?
Because the PC does exactly what it is told to do. Using alloc in a loop is like asking it:
"-Give more memory" x 99999 times, so after some time there is no space left.
It's a good idea to allocate memory once at the top of the code. If the script is large, uses most of its variables, and the writer wants to reuse a variable that held a text pointer, then "alloc" can be used in a loop as long as the following opcode is always executed before that variable is overwritten:
Code:
0AC9: free_allocated_memory 0@ // free 0@
// 0@ must point to the beginning of the memory
Which is equivalent to saying:
"-Thanks PC but I don't need that memory anymore, the one stored at 0@, do whatever you want with that memory."
If it is going to be used, then it's important to keep a reference to the beginning of the allocated memory. Otherwise a wrong address will be provided and the code may crash.
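Here is how the "alloc in a loop, but always free" pattern could look in C. It's only a sketch: format_in_loop is a made-up name, and malloc/free play the role of 0AC8/0AC9.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates, uses and frees memory once per pass.
   Returns how many allocations were handed back. */
int format_in_loop(int rounds)
{
    int freed = 0;
    for (int i = 0; i < rounds; i++) {
        char *start = malloc(100);      /* "give me memory" */
        assert(start != NULL);          /* sketch only: stop if no memory was given */
        snprintf(start, 100, "pass %d", i);

        char *cursor = start;           /* copy used for walking through the text */
        while (*cursor != 0) cursor++;  /* cursor no longer points to the beginning */

        free(start);                    /* hand back the ORIGINAL address, not cursor */
        freed++;
    }
    return freed;
}
```

Passing cursor to free() instead of start would be the "wrong address" case described above.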


Here's graphical representation of memory being allocated + some more info.

Using sscanf function to get specific parts of text
The function is present in gta_sa process memory at 0x8220AD offset.
(Idk who and how exactly found that function there, using this method with "Search for->Name (label) in current module" doesn't show it)

There are many different data types that can be parsed:
[img=430x350]https://i.imgur.com/ssYiT4G.png[/img]
(source page 246)

And printed:
[img=430x350]https://i.imgur.com/a4bsbw2.png[/img]
(source page 244)
All these could probably be used with:
0AD0: show_formatted_text_lowpriority "This is %.4X opcode" time 2000 0x0AD0
0AD1: show_formatted_text_highpriority "This is %.4X opcode" time 2000 0x0AD1
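To see what a specifier like %.4X produces, it can be tried in plain C first. This sketch uses the standard snprintf; describe_opcode is a name invented for the example, and whether the cleo opcodes accept every C specifier is not something I have confirmed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints the opcode number as at least 4 uppercase hex digits. */
void describe_opcode(char *out, size_t size, unsigned int opcode)
{
    assert(out != NULL && size > 0);
    /* %.4X = hexadecimal, uppercase, padded with zeros to 4 digits */
    snprintf(out, size, "This is %.4X opcode", opcode);
}
```

Called with 0x0AD0 this gives "This is 0AD0 opcode". With plain %X the leading zero would be dropped.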


Problem to solve
3 different values have to be extracted from "123 Word 987" string. Integer, string and integer.

Solution
Code:
0AC8: 0@ = allocate_memory_size 100
0AD3: 0@ = format "123 Word 987" // initialisation of string that contains some important data that we need to parse

0AC7: 1@ = var 1@ offset // must be a pointer to itself because pointer is expected by sscanf (doesn't need memory allocation because it will hold an integer)
0AC8: 2@ = allocate_memory_size 100
0AC7: 3@ = var 3@ offset 

0AA5: call 0x8220AD num_params 5 pop 5 3@ 2@ 1@ "%d %s %d" 0@ // sscanf %s stops at whitespace so a single word is received. In the scan_string opcode (0AD4) a scanset would have to be used instead, e.g. %[^ ]

// 1@ becomes 123
// 2@ becomes pointer to "Word"
// 3@ becomes 987

Notice that the parameters are passed in reverse order compared to the one normally used in C/C++, and that their count is passed as 'num_params' and 'pop'.
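The same call written in C, with the parameters in the normal order, for anyone who wants to try it in an online compiler. parse_three is a hypothetical wrapper made up for this sketch. The %99s width is my addition so the word can't overflow a 100 byte buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns how many values sscanf managed to fill (3 on success). */
int parse_three(const char *text, int *first, char word[100], int *last)
{
    assert(text != NULL);
    /* C order: text, format, then the pointers in the same order as in the format */
    return sscanf(text, "%d %99s %d", first, word, last);
}
```

Checking the returned count is a cheap way to detect text that doesn't match the expected "number word number" layout.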


Problem
I have a command with 3 parameters:
1st is number
2nd is word
3rd is a whole sentence

e.g. /test 1 Mr.Bean Some long sentence with many words

The point is to extract all 3 parameters

Solution (full code)
It's kind of a hacky method, because sscanf stops at whitespace when %s is used and so can't read the whole sentence that way.
Thanks to the %n expression of sscanf it is possible to get the number of characters read up to its position within the text, which can be used to locate where the 3rd parameter (sentence) begins. In short:
offset of %n within the text + pointer to the beginning of the text = pointer to the beginning of the 3rd parameter
Then the 3rd parameter can be copied or not. The code below doesn't copy it, it just passes the pointer (if the code was going to modify it somehow then a copy would be made).
Code:
/*
USAGE:
/test <number> <single word> <sentence>
/test 1 Mr.Bean Some long sentence with many words
*/

{$CLEO}
0000:

repeat
wait 0
until SAMP.Available()

0b34: "test" @activation

alloc 0@ 128 // command 
alloc 2@ 200 // word (2nd parameter)

while true
wait 0
    wait 1000
END

:activation
SAMP.IsCommandTyped(0@)
0AC7: 1@ = var 1@ offset // must be a pointer to itself because that's the parameter expected by sscanf
0AC7: 3@ = var 3@ offset
0AA5: call 0x8220AD num_params 5 pop 5 1@ 2@ 3@ "%d %s %n" 0@ // sscanf %s stops at whitespaces so no expression like %[^ ] is required 
// now 1@ holds the index of the sentence within the string, it can be added to the string pointer and will become a pointer to the sentence itself
// that's thanks to the magic of "%n" within sscanf, which writes the number of characters read before it to the given pointer
005A: 1@ += 0@  // (int)
chatmsg "number=%d word=%s sentence=%s" -1 3@ 2@ 1@
samp.CmdRet()

Code that you can play with using online C compile/execute tool:

Code:
#include <stdio.h>
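#include <string.h> // needed for strncpy() function

int main()
{
    char text[100] = "1 Mr.Bean Some long sentence with many words";
    // the goal is to get the whole remaining text after "Mr.Bean", which is not that simple with sscanf alone

    int number = 0;
    char name[10]; // room for 9 characters + the terminating 0
    int buffer_to_get_char_count = 0; // %n stores here how many characters were read so far
    char sentence[50];

    sscanf(text, "%d %9s %n", // %9s reads at most 9 characters so name[10] can't overflow
                            &number, // needs to be a pointer
                            name,    // an array is passed as a pointer by default
                            &buffer_to_get_char_count);

    // "text" is a pointer to the first char, so adding the count gives a pointer to where the sentence begins
    strncpy(sentence, text + buffer_to_get_char_count, sizeof(sentence) - 1);
    sentence[sizeof(sentence) - 1] = 0; // strncpy doesn't add the terminating 0 when the text gets cut

    printf("number=%d word=%s sentence=%s\n", number, name, sentence);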

    return 0;

}


Using %[] for matching/not-matching specific characters

IuvgrVB.png

source of explanation
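A small C sketch of %[] in action, using the same "/test" parameters as before. parse_command and the buffer sizes are assumptions made for this example only. %[^ ] means "take characters as long as they are NOT a space" and %[^\n] means "take characters as long as they are NOT a newline", so the second one keeps the spaces of the sentence.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns how many of the 3 values were filled. */
int parse_command(const char *params, int *number, char word[32], char sentence[128])
{
    assert(params != NULL);
    /* %31[^ ]   = up to 31 characters that are not a space (a single word)
       %127[^\n] = up to 127 characters that are not a newline (rest of the line) */
    return sscanf(params, "%d %31[^ ] %127[^\n]", number, word, sentence);
}
```

With "1 Mr.Bean Some long sentence with many words" all 3 values are filled. If the sentence is missing, the function returns 2 and the sentence buffer is left untouched.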


Some of the code in this post was not tested
 

monday

Expert
Joined
Jun 23, 2014
Messages
1,127
Solutions
1
Reaction score
158
are you asking about the following opcode?
0AC8: 0@ = allocate_memory_size 100

How to know how much memory to allocate?
 

monday

It has to be enough bytes to contain what you'd like to store. If you'd like to store text that has 5 characters, then you should allocate minimum 6 bytes (to include additional null-terminating byte). If you'd like to store text from game memory then you should check what is the maximum limit that text could have, check this out:
https://san-andreas-multiplayer-samp.fandom.com/wiki/Limits

According to that page you'd need to allocate 1025 bytes for a single textdraw, 65 bytes for caption of dialog, 4097 bytes for the "info" section of dialog (I assume it is the main section).

You can allocate less memory if you're 100% sure the game text is always shorter, but then you're taking a risk: if you write outside of the allocated memory the game may crash.
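The "characters + 1" rule as a C sketch. Both function names are made up for the example, and malloc stands in for the alloc opcode.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store text, including the terminating 0. */
size_t bytes_needed(const char *text)
{
    assert(text != NULL);
    return strlen(text) + 1; /* 5 characters -> 6 bytes */
}

/* Hypothetical helper: copies text into memory of exactly the needed size.
   Returns NULL if no memory was given. The caller must free the result. */
char *copy_text(const char *text)
{
    size_t size = bytes_needed(text);
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size); /* copies the characters and the terminating 0 */
    return copy;
}
```

When the length isn't known in advance (e.g. text read from the game), the maximum from the limits page plus 1 is the safe size.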
 

KamikazeSripterul

Well-known member
Joined
Jun 30, 2019
Messages
353
Reaction score
23
Ohh okay, thank you.
 