Hey everyone, I have been working on protocol reverse engineering and on netzob, a tool that assists with RE (I cloned it and will make my fork publicly available soon).
Because the user often works with raw bytes, at some point I needed the user to type bytes in. However, Python has no built-in function similar to input() that returns raw bytes: input() always returns a str.
So what should I do?
I wrote a small script that solves two problems:
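- It converts a typed string of escaped bytes (such as \xca\xfe\xba\xbe) into a bytes object without changing the byte values.
- It interprets the escape sequences the user typed. input() does not process escapes, so typing \xca gives four literal characters (a backslash, 'x', 'c', 'a'), which repr() displays as '\\xca'. The original input string is left unchanged, so it can still be printed exactly as the user typed it.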
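To make the second point concrete, here is a small sketch of what input() hands back. It assumes the user typed \xca\xfe; a raw-string literal stands in for the keyboard input so the snippet runs without a prompt. The backslashes are not doubled in the string itself, only in its repr().

```python
# Simulate what input() returns when the user types \xca\xfe.
# (Assumption: a raw-string literal stands in for real keyboard input.)
typed = r'\xca\xfe'

print(len(typed))   # 8: backslash, 'x', 'c', 'a', backslash, 'x', 'f', 'e'
print(typed)        # \xca\xfe     print() shows the characters as typed
print(repr(typed))  # '\\xca\\xfe' repr() escapes each backslash
```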
To do so, I use the codecs module: codecs.decode(string, "unicode_escape") interprets each escape sequence, so the four typed characters \xca become the single character U+00CA.
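A minimal sketch of this step, again assuming the typed text is simulated with a raw string. Each \xNN sequence collapses into one character whose code point is NN.

```python
import codecs

typed = r'\xca\xfe\xba\xbe'  # 16 literal characters, as returned by input()

# unicode_escape turns each \xNN sequence into the single character U+00NN.
decoded = codecs.decode(typed, 'unicode_escape')

print(len(decoded))                    # 4
print([hex(ord(c)) for c in decoded])  # ['0xca', '0xfe', '0xba', '0xbe']
```

Note that this step assumes the typed text is plain ASCII: a str passed to the unicode_escape codec is first encoded as UTF-8, so a typed non-ASCII character such as é comes out as two mangled characters.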
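Then I encode the resulting string to bytes using ISO-8859-1 (Latin-1), and decode it back the same way. ISO-8859-1 maps the code points U+0000 to U+00FF one-to-one onto the byte values 0x00 to 0xFF, so every character keeps its original value. Other encodings such as UTF-8 turn each character above U+007F into a multi-byte sequence, which changes the bytes.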
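The difference between the two encodings is easy to see side by side. This sketch starts from the already-decoded string of the previous step and compares ISO-8859-1 with UTF-8; the final check confirms that Latin-1 round-trips all 256 byte values.

```python
decoded = '\xca\xfe\xba\xbe'  # result of the unicode_escape step

latin1 = decoded.encode('ISO-8859-1')
utf8 = decoded.encode('utf-8')

print(latin1)  # b'\xca\xfe\xba\xbe'  4 bytes, original values kept
print(utf8)    # b'\xc3\x8a\xc3\xbe\xc2\xba\xc2\xbe'  8 bytes, values changed

# Latin-1 round-trips every byte value 0x00-0xFF in both directions.
all_bytes = bytes(range(256))
assert all_bytes.decode('ISO-8859-1').encode('ISO-8859-1') == all_bytes
assert latin1.decode('ISO-8859-1') == decoded
```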
I hope this code is useful to people who run into the same issue I did!
"Give us the damn code!"
Here it is ;-) :
    #!/usr/bin/python3.5
    # coding=utf-8
    r"""
    This code highlights two sometimes annoying behaviors in Python 3 and
    explains a workaround. Encoding can be confusing in Python, especially
    because input() and print() work with text (str), not raw bytes.
    Hence the following sometimes undesired behaviors:

    1. input() does not interpret escape sequences. If you type raw bytes
       such as \xca\xfe\xba\xbe, you get the literal characters backslash,
       'x', 'c', 'a', ..., which repr() displays as '\\xca\\xfe\\xba\\xbe'.
       This is solved by codecs.decode(string, 'unicode_escape').
       http://stackoverflow.com/questions/5186839/python-replace-with

    2. If you convert this string to bytes with an encoding such as UTF-8,
       characters above 0x7F become multi-byte sequences, so the original
       byte values are lost. ISO-8859-1 maps code points 0-255 one-to-one
       onto bytes 0x00-0xFF, so it keeps the original bytes in both
       directions.
    """
    import codecs


    def to_string(bytes_string):
        """Decode bytes to str, mapping each byte 0xNN to code point U+00NN."""
        return bytes_string.decode('ISO-8859-1')


    def to_bytes(string):
        """Encode str to bytes; only valid for code points below 256."""
        return string.encode('ISO-8859-1')


    def input_fix(string):
        """Interpret escape sequences such as \\xca typed by the user."""
        return codecs.decode(string, 'unicode_escape')


    input_string = input('Please input bytes string such as \\xca\\xfe\\xba\\xbe >>> ')

    print(input_string)
    print(" <---- Interpreting the escape sequences typed by the user --->\n")
    string_of_bytes = input_fix(input_string)
    print(string_of_bytes)
    print('\n')

    print(" <---- String of bytes to bytes string --->\n")
    bytes_string = to_bytes(string_of_bytes)
    print(bytes_string)
    print('\n')

    print("**************************\n")
    print(" <---- Bytes string back to string --->\n")
    print(to_string(bytes_string))
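If you only need the final bytes object, the two steps of the script can be folded into one function. parse_escaped_bytes is a hypothetical helper name, not part of the script or of netzob; it is a sketch that assumes ASCII input and turns characters that do not fit in one byte into a ValueError instead of a UnicodeEncodeError.

```python
import codecs


def parse_escaped_bytes(text):
    """Turn user-typed text such as r'\\xca\\xfe' into b'\\xca\\xfe'.

    Hypothetical helper combining the two steps of the script. Assumes the
    input is ASCII. Malformed escapes (e.g. a lone \\x) raise
    UnicodeDecodeError, which is a subclass of ValueError.
    """
    decoded = codecs.decode(text, 'unicode_escape')
    try:
        # ISO-8859-1 keeps U+0000-U+00FF as the byte values 0x00-0xFF.
        return decoded.encode('ISO-8859-1')
    except UnicodeEncodeError as exc:
        # Escapes such as \u20ac produce code points above 0xFF.
        raise ValueError('input contains a character above 0xFF') from exc


print(parse_escaped_bytes(r'\xca\xfe\xba\xbe'))  # b'\xca\xfe\xba\xbe'
print(parse_escaped_bytes('GET\x20/'))           # b'GET /'
```

In the script above, this would replace the input_fix() plus to_bytes() pair with a single call on the value returned by input().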