Hi, it seems that questions related to text editing/parsing appear quite often, so I thought a thread like this would be useful. It doesn't contain a lot of stuff that wasn't posted here previously, but there are useful links/resources included throughout the post.
Some of the code in this post was not tested
What is text in cleo?
Text (string) is an array of numbers, where each number represents some character. The American Standard Code for Information Interchange (ASCII) defines which number represents which character.
Take a look at the "Dec" (decimal) and "Char" columns:
[img=500x300]https://i.imgur.com/bHJ3S2k.gif[/img].
So by writing specific numbers to memory it is possible to write or modify text. It's that simple: write 97 (1 byte) to a memory address and it will contain the letter 'a'. Write 98 to the next address and your text will be 'ab'.
How does PC/cleo know where text ends?
Strings in the C language (and in cleo) are null-terminated, which means that in order to have a fully functional 'ab' text we would have to write:
- 97 to one address
- 98 to the next address
- 0 to the next address
To what memory can we write?
First we have to ask the PC for memory that won't be modified by any other process/activity, which looks like this in cleo:
Code:
0AC8: 0@ = allocate_memory_size 100 // "alloc 0@ 100" using sampfuncs keyword
...which is equivalent to:
"-pls give me 100 bytes of memory and tell me where it is by storing its address in the 0@ variable"
So after using '0AC8', 0@ contains a number, which is the location where the newly requested/allocated memory resides.
Even after using opcodes like:
Code:
0AD3: 0@ = format "%d + %d = %d" 2 2 4 // 'format 0@ "%d" 123' using sampfuncs keyword
...0@ is still nothing more than a number and can be treated as such.
So let's say that we have the 'ab' text and want to overwrite only the second letter, 'b'. Here's how it could be done:
Code:
alloc 0@ 100 // 0@ becomes a pointer to free memory
0AD3: 0@ = format "ab" // memory is overwritten with: 97 98 0
0085: 2@ = 0@ // keep a reference to the start of the memory before 0@ is increased
0@ += 1 // 0@ now points to the next address, which holds 98 (letter 'b')
0A8C: write_memory 0@ size 1 value 99 virtual_protect 1 // the address that holds 98 ('b') is overwritten with 99 (letter 'c')
// thanks to the reference kept in 2@ the memory can later be "handed back" to the PC using 0AC9: free_allocated_memory 2@
Why do people keep saying not to use the "alloc" opcode (0AC8) in a loop without the "free" opcode (0AC9)?
Because the PC does exactly what it is told to do. Using alloc in a loop is like asking it:
"-Give more memory" x 99999 times, so after some time there is no space left.
It's a good idea to allocate memory once, at the top of the code. If the script is large and uses most of the variables, the writer may want to reuse a variable that was used for text modification. In that case "alloc" can be used in a loop, as long as the following opcode is always executed before that variable is modified:
Code:
0AC9: free_allocated_memory 0@ // free 0@
// 0@ must point to the beginning of the memory
Which is equivalent to saying:
"-Thanks PC but I don't need that memory anymore, the one stored at 0@, do whatever you want with that memory."
If "free" is going to be used, it's important to keep a reference to the beginning of the allocated memory. Otherwise the wrong address will be provided and the code may crash.
Here's a graphical representation of memory being allocated + some more info.
Using sscanf function to get specific parts of text
The function is present in the gta_sa process memory at address 0x8220AD.
(Idk who found that function there or how exactly; using this method with "Search for->Name (label) in current module" doesn't show it)
There are many different data types that can be parsed:
[img=430x350]https://i.imgur.com/ssYiT4G.png[/img]
(source page 246)
And printed:
[img=430x350]https://i.imgur.com/a4bsbw2.png[/img]
(source page 244)
All these could probably be used with:
0AD0: show_formatted_text_lowpriority "This is %.4X opcode" time 2000 0x0AD0
0AD1: show_formatted_text_highpriority "This is %.4X opcode" time 2000 0x0AD1
Problem to solve
Three values have to be extracted from the string "123 Word 987": an integer, a word and another integer.
Solution
Code:
0AC8: 0@ = allocate_memory_size 100
0AD3: 0@ = format "123 Word 987" // initialisation of string that contains some important data that we need to parse
0AC7: 1@ = var 1@ offset // must be a pointer to itself because pointer is expected by sscanf (doesn't need memory allocation because it will hold an integer)
0AC8: 2@ = allocate_memory_size 100
0AC7: 3@ = var 3@ offset
0AA5: call 0x8220AD num_params 5 pop 5 3@ 2@ 1@ "%d %s %d" 0@ // sscanf %s stops at whitespace so a single word is received. In the scan_string opcode (0AD4) a scanset would have to be used, e.g. %[^ ]
// 1@ becomes 123
// 2@ becomes pointer to "Word"
// 3@ becomes 987
Notice that the parameters are passed in reverse order compared to the order normally used in C/C++ (the last sscanf argument comes first), and their quantity is passed as both 'num_params' and 'pop'.
Problem
I have a command with 3 parameters:
1st is number
2nd is word
3rd is a whole sentence
e.g. /test 1 Mr.Bean Some long sentence with many words
The point is to extract all 3 parameters
Solution (full code)
It's a kind of hacky method, needed because sscanf stops at whitespace when %s is used, so %s alone cannot capture a whole sentence.
Thanks to the %n specifier, sscanf can report how many characters it has read so far, which can be used to locate where the 3rd parameter (the sentence) begins. In short:
offset reported by %n + pointer to the beginning of the text = pointer to the beginning of the 3rd parameter
Then the 3rd parameter can be copied or not. The code below does not copy it, it just passes the pointer (if the code was going to modify it somehow, then a copy would be made).
Code:
/*
USAGE:
/test <number> <single word> <sentence>
/test 1 Mr.Bean Some long sentence with many words
*/
{$CLEO}
0000:
repeat
wait 0
until SAMP.Available()
0b34: "test" @activation
alloc 0@ 128 // command
alloc 2@ 200 // buffer for the single word (2nd parameter)
while true
wait 0
wait 1000
END
:activation
SAMP.IsCommandTyped(0@)
0AC7: 1@ = var 1@ offset // must be a pointer to itself because that's the parameter expected by sscanf
0AC7: 3@ = var 3@ offset
0AA5: call 0x8220AD num_params 5 pop 5 1@ 2@ 3@ "%d %s %n" 0@ // sscanf %s stops at whitespace so no expression like %[^ ] is required
// now 1@ holds the index of the sentence within the string; it can be added to the string pointer to become a pointer to the sentence itself
// that's thanks to the magic of "%n" within sscanf, which writes the number of characters read so far to the given pointer
005A: 1@ += 0@ // (int)
chatmsg "number=%d word=%s sentence=%s" -1 3@ 2@ 1@
samp.CmdRet()
Code that you can play with using an online C compile/execute tool:
Code:
#include <stdio.h>

int main(void)
{
    char text[100] = "1 Mr.Bean Some long sentence with many words";
    // the goal is to get the whole remaining text after "Mr.Bean", which plain %s cannot do because it stops at whitespace
    int number = 0;
    char name[10] = "";
    int buffer_to_get_char_count = 0; // filled in by %n
    char sentence[50];
    // %9s reads at most 9 characters + the terminating 0, so name[10] cannot overflow
    int found = sscanf(text, "%d %9s %n",
                       &number,                    // needs to be a pointer
                       name,                       // an array is passed as a pointer by default
                       &buffer_to_get_char_count);
    if (found < 2) // %n is not counted in the return value
        return 1;
    // "text" is a pointer to the first char, so text + count points to the sentence; snprintf never writes past sentence[50]
    snprintf(sentence, sizeof sentence, "%s", text + buffer_to_get_char_count);
    printf("number=%d word=%s sentence=%s\n", number, name, sentence);
    return 0;
}
Using %[] for matching/not-matching specific characters
source of explanation