4chan download script
Sun, 31 Aug 2008 22:44 - Daniel - Other - Comments (17)
Downloads all the images from a 4chan image thread. I will probably regret downloading anything from 4chan, but that's not my problem.
Usage: 4chandl <4chan thread url>
Download the script
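#!/bin/sh
# Downloads all images from a 4chan thread, re-checking every 30 seconds.
if [ "$1" = "" ]; then
    echo "Usage: $(basename "$0") <4chan thread url>"
    exit 1
fi
echo "4chan downloader"
echo "Downloading until canceled or 404'd"
# The thread number (the digits before .html) is used as the download directory.
LOC=$(echo "$1" | egrep -o '([0-9]*).html' | sed 's/\.html//g' )
echo "Downloading to $LOC"
if [ ! -d "$LOC" ]; then
    mkdir "$LOC"
fi
cd "$LOC" || exit 1
while true; do
    TMP=$(mktemp)
    TMP2=$(mktemp)
    # Fetch the thread page; wget fails once the thread has 404'd.
    wget -O "$TMP" "$1"
    if [ "$?" != "0" ]; then
        rm -f "$TMP" "$TMP2"
        exit 1
    fi
    # Collect the image URLs. The unescaped dots match any character, so the
    # pattern is slightly looser than it looks.
    egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
    # Strip the cb-nws path component before downloading.
    sed 's!/cb-nws!!g' "$TMP2" > "$TMP"
    # -nc skips files that already exist, so each run only fetches new images.
    wget -nc -i "$TMP"
    rm -f "$TMP" "$TMP2"
    echo "Waiting 30 seconds before next run"
    sleep 30
done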
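To illustrate how the script picks its download directory, here is a minimal sketch that runs a slightly stricter form of the LOC pipeline on a made-up thread URL. The helper name thread_id and the URL are assumptions for this example and are not part of the script.

```shell
#!/bin/sh
# thread_id is a hypothetical helper wrapping the LOC pipeline from the script.
thread_id() {
    # grep -o keeps only the "<digits>.html" part; sed then drops the suffix.
    echo "$1" | grep -Eo '[0-9]+\.html' | sed 's/\.html$//'
}

# Made-up thread URL, used only for illustration.
thread_id "http://img.4chan.org/b/res/12345678.html"
```

With this input the helper should print 12345678, which is the directory name the script would create. A URL without a thread number yields an empty string.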
Tags: download 4chan shell bash wget grep
Trackbacks
Dani's Blog - Thu, 24 Dec 2009 17:00
4chan download script
A few days ago 4chan changed their links and my old download script stopped working. Here is the updated version.
Comments
Anon, Serbia And Montenegro - Fri, 30 Jan 2009 14:11
FINALLY inb4 404!
Thanks!
Anon, Unknown - Tue, 10 Mar 2009 15:32
This is useful, thanks
Anon, France - Mon, 10 Aug 2009 16:48
Nice work. Thanks
sam2332, Unknown - Sun, 20 Sep 2009 18:17
can u tell me what the -o parameter of "grep -o" does
and also egrep and sed they are regular expression commands right
LOC=$(
the $( means execute query right?
the reason im asking these questions is because im converting this program to autoit (windows scripting language) and ive never learned/used bash
OHHH and this
if [ "$?" != "0" ]; then
i dont get it lmao
sam2332, Unknown - Sun, 20 Sep 2009 18:33
ok i think the -o stands for output ......
but -d still eludes me
Daniel, Unknown - Sun, 20 Sep 2009 22:49
The -o switch means "only matching": only the part that matches the regex is printed instead of the full line.
$() executes the command inside and substitutes its output, which here is stored in the variable. It does the same as the backtick operator.
$? is the return code of the last program that has been executed. An exit code of 0 means everything went ok.
-d in a test statement checks whether the path exists and is a directory.
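The four constructs explained above can be tried out without touching the network. This is a small sketch: the HTML line, the example.org URLs and the variable names are made up for illustration.

```shell
#!/bin/sh
# grep -o prints only the matching part of each line, one match per line.
line='<a href="http://example.org/a.jpg">a</a> <a href="http://example.org/b.png">b</a>'
# $() runs the pipeline and captures its output in the variable.
matches=$(echo "$line" | grep -Eo 'http://[a-z./]+\.(jpg|png)')
echo "$matches"

# $? holds the exit status of the last command; 0 means success.
# grep exits with a non-zero status when nothing matched.
echo "no match here" | grep -q 'http://'
status=$?
echo "grep exit status: $status"

# [ -d path ] is true only when path exists and is a directory.
dir=$(mktemp -d)
if [ -d "$dir" ]; then echo "directory exists"; fi
rmdir "$dir"
if [ ! -d "$dir" ]; then echo "directory is gone"; fi
```

The first echo should print the two URLs on separate lines, which is why the download script can hand the output of grep -o straight to wget -i.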
sam2332, Unknown - Mon, 21 Sep 2009 16:59
thank you very much :D
sam2332, Unknown - Tue, 22 Sep 2009 23:24
i just have a few more questions
egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
what exactly is egrep? im pretty sure it is a regexp tool but as for the syntax to use it im not 100%
oh and where in the code are you getting the collection of links?
sam2332, Unknown - Tue, 22 Sep 2009 23:27
oh and | << this what does that do?
Daniel, Unknown - Tue, 22 Sep 2009 23:58
egrep is the same as grep -E: it matches extended regular expressions.
The links are in the first file downloaded by wget.
| is the pipe character. It pipes the output of one program into the next one.
If you don't understand a program, you should just read the man page. It's faster than me explaining every switch I used here.
http://unixhelp.ed.ac.uk/CGI/man-cgi?sh
http://unixhelp.ed.ac.uk/CGI/man-cgi?test
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
http://unixhelp.ed.ac.uk/CGI/man-cgi?wget
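As a quick illustration of extended regular expressions and pipes, the sketch below filters a made-up list of file names. The names and the count variable are assumptions for the example.

```shell
#!/bin/sh
# Extended syntax (grep -E) allows alternation with | and grouping with ()
# without backslashes.
printf '%s\n' cat.jpg notes.txt dog.png | grep -E '\.(jpg|png)$'

# A pipe feeds one program's output into the next program's input:
# printf produces three lines, grep filters them, wc counts what is left,
# and tr removes the padding some wc implementations add.
count=$(printf '%s\n' cat.jpg notes.txt dog.png | grep -E '\.(jpg|png)$' | wc -l | tr -d ' ')
echo "image files: $count"
```

The first command should print cat.jpg and dog.png. The second chains four programs and should report two image files.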
sam2332, Unknown - Thu, 24 Sep 2009 06:11
well i finished my program
its in a language called autoit
its kinda slow but ill post the source so maybe someone can improve on it
http://dl.getdropbox.com/u/226498/script/4chan_img_downloader.au3
Marvlarv, United States - Wed, 21 Oct 2009 20:09
egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
if u change it to this egrep 'http://(img|cgi|www).*chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
it would add support to the other chans like 99chan, or 7chan, or 711chan, etc. You get the picture. Thanks for the script
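One thing to watch with the pattern above: .* is greedy, so when two links sit on the same line of HTML, grep -o can merge them into one long match. The sketch below uses a tighter variant, [a-z0-9]*chan\.org, which is an assumption of this example and not the commenter's exact pattern. The helper name find_images and the sample URLs are also made up.

```shell
#!/bin/sh
# find_images is a hypothetical helper: it reads HTML on stdin and prints image URLs.
find_images() {
    grep -Eo 'http://(img|cgi|www)\.[a-z0-9]*chan\.org/[a-z0-9]+/src/(cb-nws/)?[0-9]+\.(jpg|png|gif)'
}

# Made-up sample input: two image links on one line, plus an unrelated link.
printf '%s\n' \
    '<a href="http://img.4chan.org/b/src/123.jpg">x</a><a href="http://www.7chan.org/gif/src/456.gif">y</a>' \
    '<a href="http://www.example.org/b/src/789.png">z</a>' | find_images
```

This should print the 4chan and 7chan links as two separate lines and skip the example.org one, because the host has to end in chan.org.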
efre, Unknown - Sun, 28 Feb 2010 05:29
I love this script. Thanks :)
Anton Eliasson, Sweden - Mon, 02 Aug 2010 18:53
Greetings!
I've modified your script to save all images in a thread, preserving the original filenames. This is great if you're downloading whole sets of images or other OC. Is it okay with you if I publish it under the terms of GNU GPL v3?
Daniel, Austria - Mon, 02 Aug 2010 19:47
@Anton Eliasson: Sure, go ahead
Anton Eliasson, Sweden - Tue, 03 Aug 2010 22:01
Great, it's up now. You'll find it here: http://antoneliasson.wordpress.com/2010/08/03/4chan-download-script/
I also added a few comments in the script to make it easier to understand.