miniupnp Site Admin
Joined: 14 Apr 2007 Posts: 1593
Posted: Mon Jan 11, 2021 8:20 am Post subject:
superkoning wrote:
> miniupnp wrote:
>> https://github.com/miniupnp/miniupnp/blob/master/minissdpd/submit_to_minissdpd.py
> Wow, that is fast!
Well, you see, that's about 20 lines of Python code.
The biggest part is the special length encoding for strings longer than 127 bytes, which could be removed completely if all strings were at most 127 bytes long.
_________________
Main miniUPnP author.
https://miniupnp.tuxfamily.org/
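To illustrate that simplification: if every string is at most 127 bytes, the prefix is just one byte holding the length. A minimal sketch, assuming ASCII input; `codelength_short()` is a hypothetical name, not a function in submit_to_minissdpd.py:

```python
def codelength_short(s):
    """Return the ASCII bytes of s prefixed with a single length byte.

    Hypothetical simplification: only valid for strings of at most 127 bytes,
    so the length fits in one byte with the high bit clear.
    """
    data = s.encode("ascii")
    if len(data) > 127:
        raise ValueError("strings over 127 bytes need the multi-byte length encoding")
    return bytes([len(data)]) + data


print(codelength_short("aaaa"))  # b'\x04aaaa'
```

Longer strings are rejected with ValueError rather than encoded, since their length would no longer fit in one byte with the high bit clear.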
superkoning
Joined: 01 Jan 2021 Posts: 14 Location: ::1
Posted: Mon Jan 11, 2021 5:22 pm Post subject:
miniupnp wrote:
> Well you see that's about 20 lines of code in python.
> The biggest part is the special length encoding for the string length > 127 bytes that could be completely removed if all strings are <= 127 bytes in length.
Let's check lengths below 127:
Code:
>>> import submit_to_minissdpd
>>> submit_to_minissdpd.codelength("aaaa")
b'\x04aaaa'
>>> submit_to_minissdpd.codelength("a"*126)[:20]
b'~aaaaaaaaaaaaaaaaaaa'
Yes, 1 length byte (126 happens to be printable ASCII, hence the ~).
Let's go above 127:
Code:
>>> submit_to_minissdpd.codelength("a"*130)[:20]
b'\x01\x82aaaaaaaaaaaaaaaaaa'
Yes, 2 bytes
Code:
>>> 127*127+1
16130
>>> submit_to_minissdpd.codelength("a"*16000)[:20]
b'}\x80aaaaaaaaaaaaaaaaaa'
Still 2 bytes, as expected
And above that limit:
Code:
>>> submit_to_minissdpd.codelength("a"*17000)[:20]
b'\x01\x84\xe8aaaaaaaaaaaaaaaaa'
3 bytes
Final check: an empty string:
Code:
>>> submit_to_minissdpd.codelength("")[:20]
b'\x00'
So IMHO the Python code could be a bit less cryptic (less C-style) if we stay within the two-byte limit (at most 128*128-1 = 16383 characters).
EDIT: on re-reading, I now understand your Python code. Wow ... impressive.
Last edited by superkoning on Mon Jan 11, 2021 8:48 pm; edited 1 time in total
superkoning
Joined: 01 Jan 2021 Posts: 14 Location: ::1
Posted: Mon Jan 11, 2021 6:44 pm Post subject: testing with codelength
I did some testing with various string lengths:
Code:
import submit_to_minissdpd

for i in (0, 1, 20, 126, 127, 128, 129, 250, 256, 127*128, 128*128-1, 128*128):
    print("Results of", i, ": ", end='')
    mystring = "a" * i
    encoded = submit_to_minissdpd.codelength(mystring)
    lengthbytes = len(encoded) - len(mystring)  # number of length-prefix bytes
    for j in encoded[:lengthbytes]:
        print(j, " ", end='')
    print()
with this result:
Code:
Results of 0 : 0
Results of 1 : 1
Results of 20 : 20
Results of 126 : 126
Results of 127 : 127
Results of 128 : 1 128
Results of 129 : 1 129
Results of 250 : 1 250
Results of 256 : 2 128
Results of 16256 : 127 128
Results of 16383 : 127 255
Results of 16384 : 1 128 128
Is that correct? It's different from what I expected: not plain base 128. Is this a standard format, or did you design it?
How does minissdpd interpret this? Let me try:
The first byte is always part of the length (and always < 128).
The second (and each following) byte:
- if < 128, it's part of the ASCII string, and thus not part of the length
- if >= 128 (and thus not US-ASCII), it's part of the length: subtract 128 and you have the next, less significant, base-128 digit.
Right?
If so, I wrote something more Pythonic:
Code:
import submit_to_minissdpd

def easyeasy(n):
    if n <= 127:
        return n
    elif n <= 16383:
        msb = n // 128        # divide, and round down (floor)
        lsb = n % 128 + 128   # modulo, plus 128 as indicator
        return msb, lsb
    else:
        return "Nooooooooooooo"

for i in (0, 1, 20, 126, 127, 128, 129, 250, 256, 127*128, 128*128-1, 128*128):
    print("Results of", i, ": ", end='')
    mystring = "a" * i
    encoded = submit_to_minissdpd.codelength(mystring)
    lengthbytes = len(encoded) - len(mystring)
    for j in encoded[:lengthbytes]:
        print(j, " ", end='')
    print(" Via easyeasy():", easyeasy(i), end='')
    print()
The results are the same, so good:
Code:
Results of 0 : 0 Via easyeasy(): 0
Results of 1 : 1 Via easyeasy(): 1
Results of 20 : 20 Via easyeasy(): 20
Results of 126 : 126 Via easyeasy(): 126
Results of 127 : 127 Via easyeasy(): 127
Results of 128 : 1 128 Via easyeasy(): (1, 128)
Results of 129 : 1 129 Via easyeasy(): (1, 129)
Results of 250 : 1 250 Via easyeasy(): (1, 250)
Results of 256 : 2 128 Via easyeasy(): (2, 128)
Results of 16256 : 127 128 Via easyeasy(): (127, 128)
Results of 16383 : 127 255 Via easyeasy(): (127, 255)
Results of 16384 : 1 128 128 Via easyeasy(): Nooooooooooooo
This Python code is clearer to me. Is this what you want, or do you prefer to keep your more formal and correct code?
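The reading in the list above can be checked against the prefixes observed earlier in the thread. A minimal sketch of a decoder for that reading (it is superkoning's guess, not code taken from minissdpd; the name `decode_old_length()` is hypothetical):

```python
def decode_old_length(buf):
    """Decode a length prefix according to the reading described above.

    The first byte is the most significant base-128 digit; each following
    byte >= 128 adds one more digit (its value minus 128). The first byte
    < 128 after that is the start of the string. Returns (length, offset).
    Only unambiguous for ASCII strings: a string byte >= 128 would be
    taken as part of the length.
    """
    n = buf[0]
    i = 1
    while i < len(buf) and buf[i] >= 128:
        n = n * 128 + (buf[i] - 128)
        i += 1
    return n, i


print(decode_old_length(b'\x01\x84\xe8' + b'a' * 17000))  # (17000, 3)
```

The same function returns (130, 2) for b'\x01\x82' followed by 130 a's, and (16000, 2) for b'}\x80' followed by 16000 a's, matching the REPL output above.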
miniupnp Site Admin
Joined: 14 Apr 2007 Posts: 1593
Posted: Mon Jan 11, 2021 11:58 pm Post subject:
I may have made a mistake; the multi-byte length should be encoded this way:
Code:
#define CODELENGTH(n, p) if(n>=268435456) *(p++) = (n >> 28) | 0x80; \
    if(n>=2097152) *(p++) = (n >> 21) | 0x80; \
    if(n>=16384) *(p++) = (n >> 14) | 0x80; \
    if(n>=128) *(p++) = (n >> 7) | 0x80; \
    *(p++) = n & 0x7f;
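For reference, a line-by-line Python port of the macro above, including the two upper branches. This is a sketch: the `& 0x7f` masks assume `p` points to `unsigned char`, so that the C assignment keeps only the low 8 bits of `(n >> k) | 0x80`; `codelength_macro()` is a hypothetical name:

```python
def codelength_macro(n):
    """Return the length prefix that CODELENGTH(n, p) writes, as bytes.

    Each branch mirrors one line of the macro: every byte except the last
    has the high bit set, the last byte has it clear.
    """
    if not 0 <= n < 1 << 35:
        raise ValueError("n must fit in 35 bits (at most 5 prefix bytes)")
    out = bytearray()
    if n >= 268435456:  # 2**28
        out.append((n >> 28) & 0x7f | 0x80)
    if n >= 2097152:    # 2**21
        out.append((n >> 21) & 0x7f | 0x80)
    if n >= 16384:      # 2**14
        out.append((n >> 14) & 0x7f | 0x80)
    if n >= 128:        # 2**7
        out.append((n >> 7) & 0x7f | 0x80)
    out.append(n & 0x7f)  # final byte, high bit clear
    return bytes(out)


print(list(codelength_macro(20000)))  # [129, 156, 32]
```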
_________________
Main miniUPnP author.
https://miniupnp.tuxfamily.org/
superkoning
Joined: 01 Jan 2021 Posts: 14 Location: ::1
Posted: Tue Jan 12, 2021 10:22 pm Post subject: codelength: c-code in python-code
miniupnp wrote:
> I may have made a mistake, the multi byte length should encode this way :
> Code:
> #define CODELENGTH(n, p) if(n>=268435456) *(p++) = (n >> 28) | 0x80; \
>     if(n>=2097152) *(p++) = (n >> 21) | 0x80; \
>     if(n>=16384) *(p++) = (n >> 14) | 0x80; \
>     if(n>=128) *(p++) = (n >> 7) | 0x80; \
>     *(p++) = n & 0x7f;
OK ... I implemented that C code in Python (only up to 3 bytes, so for n < 2097152):
Code:
'''
#define CODELENGTH(n, p) ...
if(n>=16384) *(p++) = (n >> 14) | 0x80; \
if(n>=128) *(p++) = (n >> 7) | 0x80; \
*(p++) = n & 0x7f;
'''
def codelength(n):
    p = []
    if n >= 16384:
        p.append((n >> 14) & 0x7f | 0x80)  # shift out lower 14 bits, take 7 bits, set high bit to 1
    if n >= 128:
        p.append((n >> 7) & 0x7f | 0x80)
    p.append(n & 0x7f)  # 7 lower bits, high bit clear
    return p

for n in (0, 1, 100, 127, 128, 256, 2000, 20000, 20128, 1000111, 2097000):
    print(n, codelength(n))
With this result:
Code: | 0 [0]
1 [1]
100 [100]
127 [127]
128 [129, 0]
256 [130, 0]
2000 [143, 80]
20000 [129, 156, 32]
20128 [129, 157, 32]
1000111 [189, 133, 47]
2097000 [255, 254, 104] |
So that means the length field runs up to and including the first byte that has the high bit NOT set.
Correct?
If so, shall I put it into my GitHub PR? With the proper type casting, of course.
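That rule is also what a decoder needs: accumulate 7 bits per byte and stop after the first byte whose high bit is clear. A minimal sketch, assuming the big-endian layout produced by CODELENGTH; `decode_length()` and its `max_bytes` guard are hypothetical, not minissdpd code:

```python
def decode_length(buf, max_bytes=5):
    """Decode a big-endian base-128 length prefix as written by CODELENGTH.

    Each byte contributes its low 7 bits; the prefix ends at (and includes)
    the first byte whose high bit is clear. Returns (length, prefix_size).
    max_bytes is a hypothetical guard against malformed input.
    """
    n = 0
    for i, b in enumerate(buf[:max_bytes]):
        n = (n << 7) | (b & 0x7f)
        if not b & 0x80:
            return n, i + 1
    raise ValueError("unterminated or too long length prefix")


print(decode_length([129, 156, 32]))  # (20000, 3)
```

It accepts either bytes or the list of ints returned by codelength() above, so the two can be checked against each other.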
superkoning
Joined: 01 Jan 2021 Posts: 14 Location: ::1
Posted: Tue Jan 12, 2021 11:22 pm Post subject:
See next post.
Last edited by superkoning on Tue Jan 12, 2021 11:26 pm; edited 1 time in total
superkoning
Joined: 01 Jan 2021 Posts: 14 Location: ::1
Posted: Tue Jan 12, 2021 11:23 pm Post subject:
Other style, same result:
Code:
# bit style:
def codelength2(n):
    p = []
    p.append(n & 0x7f)  # 7 lower bits, high bit clear (last byte)
    n = n >> 7
    while n > 0:
        p.append(n & 0x7f | 0x80)  # next 7 bits, with high bit set
        n = n >> 7
    p.reverse()  # most significant group first
    return p

# calculus style
def codelength3(n):
    p = []
    p.append(n % 128)  # modulo, so remainder / low 7 bits
    n = n // 128       # divide, round down
    while n > 0:
        p.append(n % 128 + 128)  # set high bit
        n = n // 128
    p.reverse()
    return p
Better?
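Either style gives the length prefix as a list of ints. To use it the way submit_to_minissdpd.codelength() was used in the REPL sessions above (string in, prefixed bytes out), the list can be converted with bytes() and prepended to the string. A minimal sketch, assuming ASCII strings; `codelength_bytes()` is a hypothetical name:

```python
def codelength_bytes(s):
    """Return s as ASCII bytes, prefixed with its length encoded as in
    codelength2(): big-endian 7-bit groups, high bit set on all but the last."""
    data = s.encode("ascii")
    n = len(data)
    prefix = [n & 0x7f]  # last length byte, high bit clear
    n >>= 7
    while n > 0:
        prefix.append(n & 0x7f | 0x80)  # preceding bytes, high bit set
        n >>= 7
    prefix.reverse()
    return bytes(prefix) + data


print(codelength_bytes("aaaa"))         # b'\x04aaaa'
print(codelength_bytes("a" * 130)[:4])  # b'\x81\x02aa'
```

For 130 characters this yields b'\x81\x02', where the original REPL output showed b'\x01\x82': the high bit now marks the leading length byte instead of the trailing one.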
© 2007 Thomas Bernard, author of MiniUPNP.