4chan download script
Sunday, 31. August 2008 19:44 - daniel - Other - 20 Comments
Downloads all the images from a 4chan image thread. I will probably regret downloading anything from 4chan, but that's not my problem.
Usage: 4chandl <4chan thread url>
Download the script
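#!/bin/sh
# Download every image in a 4chan thread, polling until canceled or the thread 404s.
if [ -z "$1" ]; then
    echo "Usage: $(basename "$0") <4chan thread url>"
    exit 1
fi
echo "4chan downloader"
echo "Downloading until canceled or 404'd"
# The thread number (the digits before .html) becomes the download directory.
LOC=$(echo "$1" | egrep -o '[0-9]+\.html' | sed 's/\.html//')
echo "Downloading to $LOC"
if [ ! -d "$LOC" ]; then
    mkdir "$LOC"
fi
cd "$LOC" || exit 1
while true; do
    TMP=$(mktemp)
    TMP2=$(mktemp)
    # Fetch the thread page; stop once wget fails (e.g. the thread has 404'd).
    wget -O "$TMP" "$1"
    if [ "$?" != "0" ]; then
        rm "$TMP" "$TMP2"
        exit 1
    fi
    # Extract the image URLs, then strip the cb-nws path component.
    egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
    sed 's!/cb-nws!!g' "$TMP2" > "$TMP"
    # -nc skips files that already exist locally.
    wget -nc -i "$TMP"
    rm "$TMP" "$TMP2"
    echo "Waiting 30 seconds before next run"
    sleep 30
done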
Comments
Anon - Friday, 30. January 2009 12:11
FINALLY inb4 404!
Thanks!
Anon - Tuesday, 10. March 2009 13:32
This is useful, thanks
Anon - Monday, 10. August 2009 13:48
Nice work. Thanks
sam2332 - Sunday, 20. September 2009 15:17
Can you tell me what the -o parameter of "grep -o" does?
Also, egrep and sed are regular expression commands, right? And in LOC=$( the $( means "execute", right?
The reason I'm asking is that I'm converting this program to AutoIt (a Windows scripting language) and I've never learned or used bash.
Oh, and this: if [ "$?" != "0" ]; then
I don't get it, lmao.
sam2332 - Sunday, 20. September 2009 15:33
OK, I think the -o stands for output... but -d still eludes me.
Daniel - Sunday, 20. September 2009 19:49
The -o switch means "only matching": only the part of the line that matches the regex is printed, instead of the full line.
$() executes the command inside it and substitutes its output, which here is stored in the variable. It is the same as the backtick operator.
$? is the exit code of the last command that was executed. An exit code of 0 means everything went OK.
-d in a test statement checks that the path exists and is a directory.
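As a rough illustration of the four constructs explained above, here is a minimal sketch that should run in a POSIX shell with a grep that supports -o (GNU and BSD grep both do). The sample line and the temporary directory are made up for the example and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical sample input, not taken from a real thread.
line='see http://example.org/res/12345.html for details'

# grep -o prints only the matched part instead of the whole line.
# $( ) captures that output in a variable, like backticks would.
match=$(echo "$line" | grep -Eo '[0-9]+\.html')
echo "match: $match"

# $? holds the exit status of the last command; 0 means success.
# grep -q prints nothing and only reports whether it found a match.
echo "$line" | grep -q 'no-such-text'
status=$?
echo "grep exit status: $status"

# -d is true only if the path exists and is a directory.
dir=$(mktemp -d)
if [ -d "$dir" ]; then
    is_dir=yes
else
    is_dir=no
fi
echo "is directory: $is_dir"
rmdir "$dir"
```

The non-zero status from the failed grep is the same signal the script checks after wget to decide whether to stop.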
sam2332 - Monday, 21. September 2009 13:59
Thank you very much :D
sam2332 - Tuesday, 22. September 2009 20:24
I just have a few more questions. In egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2", what exactly is egrep? I'm pretty sure it is a regexp tool, but I'm not 100% sure about the syntax for using it.
Oh, and where in the code are you getting the collection of links?
sam2332 - Tuesday, 22. September 2009 20:27
Oh, and this: | What does that do?
Daniel - Tuesday, 22. September 2009 20:58
egrep is the same as grep -E, it will match extended regular expressions.
The links are in the first file downloaded by wget.
| is the pipe character; it pipes the output of one program into the next one.
If you don't understand a program, you should just read its man page. It's faster than me explaining every switch I used here.
http://unixhelp.ed.ac.uk/CGI/man-cgi?sh http://unixhelp.ed.ac.uk/CGI/man-cgi?test http://unixhelp.ed.ac.uk/CGI/man-cgi?grep http://unixhelp.ed.ac.uk/CGI/man-cgi?wget
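To make the egrep and pipe explanation concrete, here is a small sketch of the extraction step run against a made-up page fragment instead of a downloaded thread. The HTML and file numbers are hypothetical, and the pattern is a slightly stricter variant of the script's (dots escaped, at least one digit required), so treat it as an assumption, not the original line.

```shell
#!/bin/sh
# Hypothetical page fragment in the link format the script expects.
page='<a href="http://img.4chan.org/b/src/cb-nws/1220212345678.jpg">a</a>
<a href="http://img.4chan.org/b/src/1220212345999.png">b</a>
<a href="http://img.4chan.org/b/res/12345.html">thread</a>'

# grep -E (which is what egrep runs) enables extended regexes: |, ?, + and ( ).
# The pipe hands grep's output to sed, which strips the cb-nws component.
links=$(printf '%s\n' "$page" \
    | grep -Eo 'http://(img|cgi)\.4chan\.org/[a-z0-9]+/src/(cb-nws/)?[0-9]+\.(jpg|png|gif)' \
    | sed 's!/cb-nws!!g')
printf '%s\n' "$links"

# Count the extracted links; tr removes the padding some wc versions add.
count=$(printf '%s\n' "$links" | wc -l | tr -d ' ')
echo "found $count image link(s)"
```

In the real script the same list is written to a temporary file and passed to wget -i, which downloads one URL per line.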
sam2332 - Thursday, 24. September 2009 3:11
Well, I finished my program. It's in a language called AutoIt. It's kind of slow, but I'll post the source so maybe someone can improve on it:
http://dl.getdropbox.com/u/226498/script/4chan_img_downloader.au3
Marvlarv - Wednesday, 21. October 2009 17:09
egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
If you change it to this: egrep 'http://(img|cgi|www).*chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
it would add support for other chans like 99chan, 7chan, 711chan, etc. You get the picture. Thanks for the script.
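One caveat with the pattern suggested above: ".*" is greedy, so when a single line of HTML holds several links, one match can stretch from the first link to the last. The sketch below compares it with a tighter variant. The tighter pattern and the sample line are assumptions made for illustration, and neither has been tested against the sites mentioned.

```shell
#!/bin/sh
# Hypothetical line holding two image links from two different sites.
line='<a href="http://www.7chan.org/b/src/1.jpg">x</a> <a href="http://img.4chan.org/b/src/2.png">y</a>'

# Pattern as suggested in the comment above: ".*" matches any characters.
loose='http://(img|cgi|www).*chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)'
# Assumed tighter variant: only letters and digits may precede "chan".
tight='http://(img|cgi|www)\.[a-z0-9]*chan\.org/[a-z0-9]+/src/(cb-nws/)?[0-9]+\.(jpg|png|gif)'

# Count how many separate matches each pattern yields on the same line.
loose_count=$(printf '%s\n' "$line" | grep -Eo "$loose" | wc -l | tr -d ' ')
tight_count=$(printf '%s\n' "$line" | grep -Eo "$tight" | wc -l | tr -d ' ')
echo "loose pattern: $loose_count match(es)"
echo "tight pattern: $tight_count match(es)"
```

If a page puts every link on its own line, both patterns behave the same, so the looser one may well be good enough in practice.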
Dani's Blog - Thursday, 24. December 2009 15:00
A few days ago 4chan changed their links and my old download script stopped working. Here is the updated version.
efre - Sunday, 28. February 2010 3:29
I love this script. Thanks :)
Anton Eliasson - Monday, 2. August 2010 15:53
Greetings! I've modified your script to save all images in a thread, preserving the original filenames. This is great if you're downloading whole sets of images or other OC. Is it okay with you if I publish it under the terms of GNU GPL v3?
Daniel - Monday, 2. August 2010 16:47
@Anton Eliasson: Sure, go ahead
Anton Eliasson - Tuesday, 3. August 2010 19:01
Great, it's up now. You'll find it here: http://antoneliasson.wordpress.com/2010/08/03/4chan-download-script/ I also added a few comments in the script to make it easier to understand.
bob - Thursday, 21. July 2011 18:17
Nice job. Any chance you can modify it to download from Reddit as well?
Random Saint - Monday, 7. May 2012 14:27
I noticed a minor change lately that prevents this script from working. However, it is easily fixed by changing the line
egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
to
egrep '//images.4chan.org/[a-z0-9]+/src/([0-9]*).(jpg|png|gif)' "$TMP" -o | sed 's/\//http:\//' > "$TMP2"
as the page source no longer has the http: in there. So it is left out of the search pattern and added back afterwards. Works for me.
You're welcome
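The fix above can be tried without touching the network. This is a minimal sketch using a made-up fragment with protocol-relative links; the file numbers are hypothetical and the dots in the pattern are escaped, which the original comment does not do.

```shell
#!/bin/sh
# Hypothetical fragment using protocol-relative links (no "http:" prefix).
page='<a href="//images.4chan.org/b/src/1336400000001.jpg">a</a>
<a href="//images.4chan.org/b/src/1336400000002.gif">b</a>'

# Match the scheme-less URL, then let sed replace the first "/" with
# "http:/" so that "//host/..." becomes "http://host/...".
urls=$(printf '%s\n' "$page" \
    | grep -Eo '//images\.4chan\.org/[a-z0-9]+/src/[0-9]+\.(jpg|png|gif)' \
    | sed 's/\//http:\//')
printf '%s\n' "$urls"
```

Because sed without the g flag only replaces the first match on each line, the second slash is left in place and the result is a complete URL that wget -i can use.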
Kevin - Thursday, 8. May 2014 7:20
I created this one : http://zector.net/blog/?page_id=259
Try it out