Stripping comments and whitespace from JavaScript



Condor
8 Jul 2009, 1:21 AM
I'm searching for a javascript compressor that can remove all comments, trailing whitespace and multiple newlines from a javascript file.

You can disable obfuscation in most compressors, but I also don't want to remove newlines (only multiple newlines).

Does anyone know such a compressor (a modified version of JSMin perhaps)?

Background:
The new JSBuilder2 tool creates an ext-all-debug.js file that includes all comments. This not only makes it load slowly, but also makes it almost impossible to search, because my searches get too many hits in the API docs.
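
To make the whitespace half of the request concrete, here is a minimal Ruby sketch. It assumes the goal is to drop trailing whitespace and collapse runs of newlines while leaving leading indentation and single newlines alone. The helper name tidy_whitespace is made up for illustration and does not come from any existing tool; comment removal is not covered here.

```ruby
# Hypothetical helper (not part of any existing compressor): tidies whitespace only.
def tidy_whitespace(source)
  source
    .gsub(/\r\n?/, "\n")   # normalise DOS / old Mac line endings to "\n"
    .gsub(/[ \t]+$/, "")   # drop trailing spaces and tabs on every line
    .gsub(/\n{2,}/, "\n")  # collapse runs of newlines into a single newline
end

sample = "function f() {  \n\n\n    return 1;\t\n}\n"
puts tidy_whitespace(sample)
```

Leading indentation survives because only whitespace directly before a line end is matched.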

Animal
8 Jul 2009, 1:22 AM
Doesn't yuicompressor do all this?

Animal
8 Jul 2009, 1:27 AM
No, looking at its documentation, there is no -preserve-newline switch.

I just requested one: http://yuilibrary.com/projects/yuicompressor/ticket/2527982

Condor
8 Jul 2009, 1:28 AM
No, YUI compressor with --nomunge and --disable-optimizations still removes newlines and leading whitespace.

JSMin in minimal mode keeps newlines, but still removes leading whitespace.

mystix
8 Jul 2009, 8:55 AM
you can disable yuicompressor newline mangling with the --line-break 0 option (that's a zero, not the letter O).

which means leading whitespace is the only remaining thing that will need to be taken care of.
(we should probably put up a request for tab-to-space conversion in yuicompressor too).

vmorale4
8 Jul 2009, 9:17 AM
You could try a two-step approach:

1. Run the file through a minifier to remove comments, whitespace, etc.
2. Run the resulting file through a code formatter to make it readable again.

I like to use a code formatter called Polystyle (http://www.polystyle.com/) (Windows only), as it has a ton of config options.

Condor
8 Jul 2009, 11:27 AM
Running the code through a formatter would probably change the code style significantly.

I can't use that, since I often create overrides based on the content of ext-all-debug.js (which need to resemble the original code closely).

dj
9 Jul 2009, 2:53 AM
Just do some RegEx magic in your favorite scripting language. E.g., here is how it could look in Ruby:



#!/usr/bin/env ruby
#
# strips comments from JavaScript files while preserving
# the overall structure of the file.
#
# usage:
# strip-debug-version.rb inputfile1.js inputfile2.js
# or
# strip-debug-version.rb < inputfile.js

def strip_extjs_debug_version(data)
  non_empty_strings = []
  # Park every non-empty string literal behind a placeholder so that comment
  # markers inside strings survive the stripping below.
  stripped_data = data.gsub(/('|").*?[^\\]\1/) { |m|
    non_empty_strings.push m
    "!temp-string-replacement-#{non_empty_strings.size - 1}!" # 0-based index
  }
  stripped_data
    .gsub(/\/\*.*?\*\//m, "\n")  # block comments
    .gsub(/\/\/.*$/, "")         # line comments
    .gsub(/( |\t)+$/, "")        # trailing whitespace
    .gsub(/\n+/, "\n")           # runs of newlines
    .gsub(/!temp-string-replacement-([0-9]+)!/) { non_empty_strings[$1.to_i] } # restore strings
end

if __FILE__ == $0
  puts strip_extjs_debug_version(ARGF.read) # reads stdin or arguments
  #puts strip_extjs_debug_version(DATA.read) # for testing
end

# test input
__END__
/**
* doc comment
*/
function test(arg) {
'// comment or /* comment */ are still there'
// comment
alert('arg is \''+arg+"'");



}

Condor
12 Jul 2009, 10:18 PM
I finally ended up writing it myself (in Java):

return Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL).matcher(script).replaceAll("") // Remove /*comments*/
    .replaceAll("//.*", "") // Remove //comments
    .replaceAll("\r\n", "\n") // DOS -> Unix linefeeds
    .replaceAll("[ \t]+\n", "\n") // Trim trailing whitespace
    .replaceAll("\t", " ") // Tabs to spaces
    .replaceAll("\n{2,}", "\n") // Collapse runs of linefeeds
    .replaceAll("^\n", ""); // Remove leading linefeeds

dj
13 Jul 2009, 2:08 AM
test your code with the test from my script above:


/**
* doc comment
*/
function test(arg) {
'// comment or /* comment */ are still there'
// comment
alert('arg is \''+arg+"'");
}


Comments in strings should be preserved. One fairly common case that will break if they are not preserved is


url = 'http://www.example.com';

Because regular expressions have no awareness of context (apart from assertions like \b), a plain search-and-replace for the comment pattern cannot tell a comment marker inside a string from a real comment.
In the script above I replaced all non-empty strings with temporary values, did the regular expression magic, and then reinserted the strings. That's a common way to handle cases where a replacement needs context awareness.
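
For comparison, here is a hedged Ruby sketch of a different way to get the same context awareness in a single pass. One pattern matches string literals before it tries comments, and the replacement block puts strings back untouched. The names JS_STRING_OR_COMMENT and strip_js_comments are invented for this illustration. It assumes plain single- and double-quoted strings only; JavaScript regex literals such as /'/ are not recognised and could still confuse it.

```ruby
# Hypothetical single-pass variant: strings are matched first, so comment
# markers inside them are never treated as comments.
JS_STRING_OR_COMMENT = %r{
  ( "(?:\\.|[^"\\\n])*"     # group 1: double-quoted string with escapes
  | '(?:\\.|[^'\\\n])*' )   # group 1: single-quoted string with escapes
  | /\*.*?\*/               # block comment, may span lines
  | //[^\n]*                # line comment, up to the end of the line
}mx

def strip_js_comments(source)
  # Strings (group 1) are kept verbatim; comments are replaced with nothing.
  source.gsub(JS_STRING_OR_COMMENT) { Regexp.last_match(1) || "" }
end

sample = <<~'JS'
  /**
   * doc comment
   */
  function test(arg) {
    '// comment or /* comment */ are still there'
    // comment
    alert('arg is \''+arg+"'");
  }
JS

puts strip_js_comments(sample)
```

Because the scan runs left to right, an apostrophe inside a comment is consumed along with the comment and is never mistaken for the start of a string. This sketch leaves whitespace cleanup to a separate step.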

Condor
13 Jul 2009, 2:44 AM
Fortunately this is not something you have to worry about in the Ext SDK, because all URLs are written as:

'http:/' + '/extjs.com/s.gif'
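
A small Ruby check illustrates why the split form is safe. It applies the same naive line-comment pattern as the Java snippet above ("//.*") to both spellings of a URL. The constant and variable names are made up for this example.

```ruby
# The same line-comment pattern as the Java snippet uses ("//.*").
NAIVE_LINE_COMMENT = %r{//.*}

plain_url = "url = 'http://www.example.com';"
split_url = "url = 'http:/' + '/extjs.com/s.gif';"

# The plain URL contains "//", so everything from there on is cut off.
puts plain_url.gsub(NAIVE_LINE_COMMENT, "")

# The split URL never has two adjacent slashes, so it passes through unchanged.
puts split_url.gsub(NAIVE_LINE_COMMENT, "")
```

This only holds for code that sticks to that convention; any other string containing "//" or "/*" would still be mangled.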

mystix
20 Jul 2009, 9:56 AM
alritey... i finally got down to writing a bash script to do this.

2 bash scripts are required.
the first uses sed to collapse multi-line c-style comments into a single line, then removes them completely -- ripped this from off the net:
[ remccoms.sh ]


# Strip C comments
# by Stewart Ravenhall <[email protected]> -- 4 October 2000
# Un-Korn-ized by Paolo Bonzini <[email protected]> -- 24 November 2000

# Strip everything between /* and */ inclusive

# Copes with multi-line comments,
# disassociated end comment symbols,
# disassociated start comment symbols,
# multiple comments per line

# Check given file exists
program=`echo $0|sed -e 's:.*/::'`
if [ "$#" = 1 ] && [ "$1" != "-" ] && [ ! -f "$1" ]; then
    # "print" is a ksh builtin; echo also works under bash/sh
    echo "$program: $1 does not exist" >&2
    exit 2
fi

# Create shell variables for ASCII 1 (control-a)
# and ASCII 2 (control-b)
a="`echo | tr '\012' '\001'`"
b="`echo | tr '\012' '\002'`"

sed '
# If no start comment then go to end of script
/\/\*/!b
:a
s:/\*:'"$a"':g
s:\*/:'"$b"':g

# If no end comment
/'"$b"'/!{
:b

# If not last line then read in next one
$!{
N
ba
}

# If last line then remove from start
# comment to end of line
# then go to end of script
s:'"$a[^$b]"'*$::
bc
}

# Remove comments
'"s:$a[^$b]*$b"'::g
/'"$a"'/ bb

:c
s:'"$a"':/*:g
s:'"$b"':*/:g
' $1
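
The trick in the sed script is to swap the two-character comment delimiters for single control characters, so that "anything up to the end marker" can be written as a simple negated character class. Below is a rough Ruby rendering of that idea, as a sketch only. The names START_MARK, END_MARK and strip_block_comments are invented here, and it works on the whole input at once rather than line by line. Like the sed version, it does not protect string literals.

```ruby
# Hypothetical Ruby rendering of the sentinel idea used by the sed script.
START_MARK = "\x01" # stands in for "/*"
END_MARK   = "\x02" # stands in for "*/"

def strip_block_comments(source)
  marked = source.gsub("/*", START_MARK).gsub("*/", END_MARK)
  # A comment is now: start marker, any run of non-end-marker characters, end marker.
  marked = marked.gsub(/\x01[^\x02]*\x02/, "")
  # An unterminated comment runs to the end of the input; drop it as well.
  marked = marked.sub(/\x01[^\x02]*\z/, "")
  # Restore markers that were not part of a complete comment (e.g. a stray "*/").
  marked.gsub(START_MARK, "/*").gsub(END_MARK, "*/")
end

puts strip_block_comments("a /* x\ny */ b /* z */ c")
```

In Ruby a non-greedy /\/\*.*?\*\//m does the same job more directly; the sentinel form is mainly useful where non-greedy matching is not available, as in sed.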



the second does all the work (note: requires dos2unix commandline tool):
[ build.sh ]


#!/bin/bash

#extremely naive argument check. i'm lazy. bite me.
if [ $# -ne 2 ]; then
    echo "Usage: $0 JSB2_SOURCE_DIR BUILD_DIR"
    exit 1
fi

SOURCE=$1
TARGET=$2

java -jar JSBuilder2.jar -v -p "$SOURCE/ext.jsb2" -d "$TARGET"

echo Cleaning up files...
# note: -E is the BSD/macOS find flag for extended regexes;
# GNU find needs -regextype posix-extended instead
find -E "$TARGET" -type f -regex '.*\.(js|css|html|json|php)$' | while IFS= read -r FILE; do
    echo - - "$FILE"

    # 1) convert all js/css/html/json/php files to unix line endings
    # 2) expand all tab characters to spaces (tab stops every 4 columns)
    dos2unix < "$FILE" | expand -t4 > "$FILE.tmp"
    # tabs are already expanded at this point, so only trailing spaces can remain
    # (this also avoids relying on \t inside brackets, which not every sed understands)
    if [[ $FILE == *-debug.js ]]; then
        # strip all multi-line c-style comments + trailing whitespace from *-debug.js files
        ./remccoms.sh "$FILE.tmp" | sed 's/ *$//' > "$FILE"
    else
        # remove trailing whitespace
        sed 's/ *$//' "$FILE.tmp" > "$FILE"
    fi

    rm "$FILE.tmp"
done
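
One detail worth spelling out: expand -t4 pads each tab to the next tab stop, which is not always four spaces. A hedged Ruby sketch of the per-file cleanup (line endings, tab expansion, trailing whitespace) shows the difference. The helper names expand_tabs and clean_text are hypothetical, and the column counting assumes plain single-width characters.

```ruby
# Hypothetical helper: pads each tab to the next tab stop, as expand -t4 does
# for plain single-width text.
def expand_tabs(line, width = 4)
  column = 0
  line.each_char.map do |char|
    if char == "\t"
      padding = width - (column % width) # distance to the next tab stop
      column += padding
      " " * padding
    else
      column += 1
      char
    end
  end.join
end

# Hypothetical helper: unix line endings, expanded tabs, no trailing whitespace.
def clean_text(text, width = 4)
  text.gsub("\r\n", "\n").each_line.map do |line|
    ending = line.end_with?("\n") ? "\n" : "" # keep a missing final newline missing
    expand_tabs(line.chomp, width).sub(/[ \t]+\z/, "") + ending
  end.join
end

puts expand_tabs("ab\tx").inspect
puts clean_text("if (a) {\t\r\n\treturn 1;  \r\n}\r\n")
```

A tab at column 0 becomes four spaces, while a tab after two characters becomes only two.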


place both scripts into the same directory as JSBuilder2.jar, and run ./build.sh to start the build process.
(be sure to run


chmod +x build.sh
chmod +x remccoms.sh

on both scripts first)


enjoy ;)