Hunt the Null Byte (\0). Goose Eggs?
Pi Shack BASIC is a Capable Tool in Odd Situations
A client of mine bought a Python module and asked me to install it in his Odoo installation. I grabbed the files off of the shared file server and proceeded to install it. In the process of running "py3compile" I received several complaints of "null bytes" in the sources. But "py3compile" was not kind enough to tell me which files! What was I to do?
This is not the first time a Mac-wielding client has brought me a corrupted download. The Macs seem to have a nasty habit of making assumptions about content and performing actions that simply shouldn't be done. Why would we want users to be educated and understand the content they're handling?
Since I wasn't riding shotgun with the user who did the download, I don't know exactly what happened. He simply told me where on the server he put the extracted Python files and I grabbed them from there (scp). Then I went through the normal process of setting permissions on files and compiling the Python code. That is when I received about a half dozen complaints of null bytes, but nothing to indicate which files were corrupted. I couldn't imagine the originating developers needing null bytes anywhere in these modules. So I needed a quick way to assess what I figured must be damaged files.
The writing process
I'm going to break down what I did into logical blocks and show them independently. I'll finish with all the pieces stitched together in a final listing.
First I wanted to crawl a subdirectory for Python files
'Crawl working subdirectory for all Python sources.
OPEN "i",#1,"|find -type f -name '*.py'"
DO UNTIL eof(1)
  LINE INPUT #1,f$
  ' ...
LOOP
END
I could write a crawler with FIND$(), FINFO$() and a recursive SUB easily enough. But why? Linux has "find", and it's going to be faster since it's compiled C. It's going to be better tested. And it has a whole bunch of filters built in. So I simply OPEN "find" and read each result line into f$. Besides, I wanted a quick solution and I'm not likely to need this again anytime soon. So I skipped all the extra effort.
Next I want a Q&D way to detect null bytes
' Load each file read-only
OPEN f$ FOR BINARY ACCESS READ AS #2
d$=input$(lof(2),2)
CLOSE #2
' And check for \0 (null byte)
IF instr(d$,chr$(0)) THEN PRINT f$
I used the "binary" file mode to prevent "text file" changes. On Linux there isn't any "text file" filtering happening, but let's just be safe. I also used the expanded OPEN syntax to force the file to be opened read-only (JIC). I used input$(lof(...)) to read the entire file content into a string, d$, in a single read. These are relatively small files, KBs, not even MBs. So no worries here.
If instr() finds a null byte (chr$(0)) then I print the file name. This will do for starters. The code up to this point will find and report the problem files.
Well I needed something to show me where the offending bytes are
FOR x=1 TO len(d$)
  y=instr(d$,chr$(0),x)
  IF y THEN
    ' Dump content up to this null byte
    PRINT mid$(d$,x,y-x);
    ' Turn the null byte into a red highlighted period.
    COLOR ,4 : PRINT "."; : COLOR ,0
  ELSE
    ' No more nulls found - print the rest
    y=len(d$)
    PRINT mid$(d$,x,y-x+1);
  END IF
  x=y
NEXT
LINE INPUT "[ENTER] to continue ...";s$
This loop scans the string for null bytes and replaces them with periods on a red background. Rather than testing every character myself, I let instr() jump straight to the next null, which short-circuits the loop. This works because in psBASIC we can alter the counter variable of a FOR loop.
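For readers who don't have psBASIC handy, the same skip-ahead idea can be sketched in Python. This is only an illustration under a few assumptions: the function name `highlight_nulls` is made up for this example, the terminal is assumed to understand ANSI color escapes (standing in for COLOR ,4), and the bytes are decoded as Latin-1 purely so that every byte value can be displayed.

```python
RED_BG = "\x1b[41m"   # ANSI escape: red background (assumed terminal support)
RESET = "\x1b[0m"     # ANSI escape: reset all attributes


def highlight_nulls(data: bytes) -> str:
    """Return data as text, with each null byte shown as a red-background period."""
    out = []
    pos = 0
    while pos < len(data):
        hit = data.find(b"\0", pos)   # plays the role of instr(d$,chr$(0),x)
        if hit == -1:
            # No more nulls found: emit the rest and stop.
            out.append(data[pos:].decode("latin-1"))
            break
        # Emit everything up to the null, then the highlighted marker.
        out.append(data[pos:hit].decode("latin-1"))
        out.append(f"{RED_BG}.{RESET}")
        pos = hit + 1                 # jump past the null, like x=y followed by NEXT
    return "".join(out)


if __name__ == "__main__":
    print(highlight_nulls(b"import os\0\0\nprint('hi')\n"))
```

As in the BASIC loop, the search call does the scanning, so the loop body runs once per null byte rather than once per character.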
That's pretty much the order in which I wrote it. The whole thing took about a half-hour, including debugging. Although I ended up tripping over the results. More on that...
The finished code
DEFINT a-z
'Crawl working subdirectory for all Python sources.
OPEN "i",#1,"|find -type f -name '*.py'"
DO UNTIL eof(1)
  LINE INPUT #1,f$
  ' Load each file read-only
  OPEN f$ FOR BINARY ACCESS READ AS #2
  d$=input$(lof(2),2)
  CLOSE #2
  ' And check for \0 (null byte)
  IF instr(d$,chr$(0)) THEN PRINT f$ : GOSUB 1000
LOOP
END
'*** DUMP with \0 highlighted ***
1000 FOR x=1 TO len(d$)
  y=instr(d$,chr$(0),x)
  IF y THEN
    ' Dump content up to this null byte
    PRINT mid$(d$,x,y-x);
    ' Turn the null byte into a red highlighted period.
    COLOR ,4 : PRINT "."; : COLOR ,0
  ELSE
    ' No more nulls found - dump the rest
    y=len(d$)
    PRINT mid$(d$,x,y-x+1);
  END IF
  x=y
NEXT
LINE INPUT "[ENTER] to continue ...";s$
RETURN
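For comparison, here is a rough Python sketch of the crawl-and-check part of the listing above. It is not a translation of the psBASIC code: it swaps the piped "find" for `os.walk`, and the function name `find_null_files` is invented for this example. Symlinks are skipped to approximate what `-type f` does.

```python
import os


def find_null_files(root="."):
    """Walk root and return the paths of *.py files that contain a null byte."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Roughly mimic find's -type f: regular files only, no symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:   # binary mode, read-only
                data = fh.read()           # small files, so one read is fine
            if b"\0" in data:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for path in find_null_files("."):
        print(path)
```

Like the BASIC version, it reads each file whole, which is reasonable only because the sources are a few KB each.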
What I found
The actual source files were not corrupted. In fact I was going nuts because it seemed the files my script found didn't have null bytes, even when viewed with "hexdump". Well... I failed to notice the little dot prefixes on the file names that psBASIC was pumping out. ::DUH!:: It turns out the Mac had added files shadowing all of the original files. These files were 4 KB of mostly null bytes, and "py3compile" was picking them up because they also had the ".py" file extension. Apparently "py3compile" does not skip hidden files.
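A report that calls out hidden names explicitly makes this kind of mix-up harder to miss. The following is a small Python sketch along those lines; the helper names `describe` and `report` are hypothetical, and "hidden" here simply means the base name starts with a dot.

```python
import os


def describe(path):
    """Return (is_hidden, size_in_bytes, null_fraction) for one file."""
    with open(path, "rb") as fh:
        data = fh.read()
    hidden = os.path.basename(path).startswith(".")
    size = len(data)
    # Fraction of the file made up of null bytes; 0.0 for an empty file.
    null_fraction = data.count(b"\0") / size if size else 0.0
    return hidden, size, null_fraction


def report(paths):
    """Print one line per file, flagging hidden (dot-prefixed) names."""
    for path in paths:
        hidden, size, frac = describe(path)
        tag = "HIDDEN" if hidden else "      "
        print(f"{tag} {size:>8} bytes {frac:6.1%} null  {path}")
```

Feeding it the paths the crawler flagged would show at a glance whether the hits are real sources or dot-prefixed shadow files that are mostly nulls.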
psBASIC is exceptionally handy for quick sys-admin tasks that BASH is too clumsy for. I will say the display could be better, but this fit my needs.