4chan download script

Sun, 31 Aug 2008 22:44 - Daniel - Other - Comments (14)


Downloads all the images from a 4chan image thread. I will probably regret downloading anything from 4chan, but that's not my problem.

Usage: 4chandl <4chan thread url>

Download the script

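#!/bin/sh
# Downloads every image from a 4chan thread, polling every 30 seconds.

if [ "$1" = "" ]; then
    echo "Usage: $(basename "$0") <4chan thread url>"
    exit 1
fi

echo "4chan downloader"
echo "Downloading until canceled or 404'd"

# The thread number from the URL becomes the download directory.
LOC=$(echo "$1" | egrep -o '([0-9]*).html' | sed 's/\.html//g' )
if [ "$LOC" = "" ]; then
    echo "No thread number found in $1" >&2
    exit 1
fi
echo "Downloading to $LOC"

if [ ! -d "$LOC" ]; then
    mkdir "$LOC"
fi

cd "$LOC" || exit 1

while [ "1" = "1" ]; do
    TMP=`mktemp`
    TMP2=`mktemp`

    # Fetch the thread page; give up once wget fails (e.g. on a 404).
    wget -O "$TMP" "$1"
    if [ "$?" != "0" ]; then
        rm "$TMP" "$TMP2"
        exit 1
    fi

    # Extract the image URLs, then strip the cb-nws path component.
    egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
    cat "$TMP2" | sed 's!/cb-nws!!g' > "$TMP"

    # -nc skips files that already exist locally.
    wget -nc -i "$TMP"

    rm "$TMP" "$TMP2"

    echo "Waiting 30 seconds before next run"
    sleep 30
done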
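
To see what the two extraction steps of the script produce without touching the network, they can be run against canned input. This is only a sketch: the thread URL and the page fragment are made-up samples, the variable names are mine, and the dots in the patterns are escaped here, which is slightly stricter than the script itself.

```shell
#!/bin/sh
# Hypothetical thread URL, used only to show the directory-name step.
url='http://img.4chan.org/b/res/12345678.html'
loc=$(printf '%s\n' "$url" | grep -E -o '[0-9]+\.html' | sed 's/\.html$//')
echo "directory: $loc"

# Hypothetical page fragment with one plain and one cb-nws image link.
page='<a href="http://img.4chan.org/b/src/1220222222222.jpg">x</a> <a href="http://cgi.4chan.org/b/src/cb-nws/1220333333333.png">y</a>'

# grep -o prints each match on its own line; sed then drops /cb-nws.
links=$(printf '%s\n' "$page" \
    | grep -E -o 'http://(img|cgi)\.4chan\.org/[a-z0-9]+/src/(cb-nws/)?[0-9]+\.(jpg|png|gif)' \
    | sed 's!/cb-nws!!g')
echo "$links"
```

The resulting list, one URL per line, is the kind of input that wget -i expects.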



Tags: download 4chan shell bash wget grep



Trackbacks

Dani's Blog - Thu, 24 Dec 2009 17:00

4chan download script

A few days ago 4chan changed their links and my old download script stopped working. Here is the updated version.

Comments

Anon, Serbia And Montenegro - Fri, 30 Jan 2009 14:11

FINALLY inb4 404!

Thanks!

Anon, Unknown - Tue, 10 Mar 2009 15:32

This is useful, thanks

Anon, France - Mon, 10 Aug 2009 16:48

Nice work. Thanks

sam2332, Unknown - Sun, 20 Sep 2009 18:17

can u tell me what the -o parameter of "grep -o" does

and also egrep and sed they are regular expression commands right
LOC=$(
the $( means execute query right?

the reason im asking these questions is because im converting this program to autoit (a windows scripting language) and ive never learned/used bash



OHHH and this
if [ "$?" != "0" ]; then

i dont get it lmao

sam2332, Unknown - Sun, 20 Sep 2009 18:33

ok i think the -o stands for output ......
but -d still eludes me

Daniel, Unknown - Sun, 20 Sep 2009 22:49

The -o switch means "only matching": only the part of the line that matches the regex is returned, instead of the full line.
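
A small sketch of the difference, using a made-up input line and example.org URLs (nothing here comes from a real thread page):

```shell
#!/bin/sh
# Made-up input line containing two URLs.
line='see http://example.org/a.jpg and http://example.org/b.png here'

# Without -o, grep prints the whole line that contains a match.
whole=$(printf '%s\n' "$line" | grep -E 'http://[^ ]+')

# With -o, grep prints only the matched parts, one per line.
only=$(printf '%s\n' "$line" | grep -E -o 'http://[^ ]+')

printf 'without -o:\n%s\n' "$whole"
printf 'with -o:\n%s\n' "$only"
```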

A $() in bash executes the command inside and substitutes its output, which here gets stored in the variable. It's the same as the backtick operator.

$? is the return code of the last program that has been executed. An exit code of 0 means everything went ok.

-d in a test statement checks whether the given path exists and is a directory.
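
The three constructs above can be tried in isolation. This is a minimal sketch with made-up paths and variable names; the `sh -c 'exit 3'` call just stands in for any command that fails.

```shell
#!/bin/sh
# $( ) and backticks both capture a command's standard output.
a=$(basename /tmp/example/thread.html)
b=`basename /tmp/example/thread.html`
echo "$a $b"

# $? holds the exit status of the most recent command; 0 means success.
true
ok=$?
# A stand-in for a failing command; its status is picked up via $?.
sh -c 'exit 3' || failed=$?
echo "success: $ok, failure: $failed"

# -d is true only if the path exists and is a directory.
dir=$(mktemp -d)
file=$(mktemp)
[ -d "$dir" ] && dir_is_dir=yes || dir_is_dir=no
[ -d "$file" ] && file_is_dir=yes || file_is_dir=no
echo "directory: $dir_is_dir, regular file: $file_is_dir"
rm -r "$dir"
rm "$file"
```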

sam2332, Unknown - Mon, 21 Sep 2009 16:59

thank you very much :D

sam2332, Unknown - Tue, 22 Sep 2009 23:24

i just have a few more questions
egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"

what exactly is egrep? im pretty sure it is a regexp tool but as for the syntax to use it im not 100%


oh and where in the code are you getting the collection of links?

sam2332, Unknown - Tue, 22 Sep 2009 23:27

oh and | << this what does that do?

Daniel, Unknown - Tue, 22 Sep 2009 23:58

egrep is the same as grep -E, it will match extended regular expressions.

The links are in the first file downloaded by wget.

| is the pipe character, it pipes the output of one program into the next one.
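
A quick sketch of both points, using made-up file names as input. The `|| true` only keeps grep's "no match" exit status from being treated as an error.

```shell
#!/bin/sh
# Pipe: printf's output feeds grep, and grep's output feeds sed.
names=$(printf 'a.jpg\nb.txt\nc.png\n' | grep -E '\.(jpg|png)$' | sed 's/\..*$//')
echo "$names"

# Extended vs. basic regex: in plain grep, ( | ) are literal characters,
# so the alternation only works with -E (which is what egrep uses).
basic=$(printf 'a.jpg\n' | grep -c '\.(jpg|png)$' || true)
extended=$(printf 'a.jpg\n' | grep -E -c '\.(jpg|png)$' || true)
echo "basic matches: $basic, extended matches: $extended"
```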

If you don't understand a program, you should just read the man page. It's faster than me explaining every switch I used here.

http://unixhelp.ed.ac.uk/CGI/man-cgi?sh
http://unixhelp.ed.ac.uk/CGI/man-cgi?test
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
http://unixhelp.ed.ac.uk/CGI/man-cgi?wget

sam2332, Unknown - Thu, 24 Sep 2009 06:11

well i finished my program
its in a language called autoit
its kinda slow but ill post the source so maybe someone can improve on it

http://dl.getdropbox.com/u/226498/script/4chan_img_downloader.au3



Marvlarv, United States - Wed, 21 Oct 2009 20:09

egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"

if u change it to this egrep 'http://(img|cgi|www).*chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"

it would add support to the other chans like 99chan, or 7chan, or 711chan, etc. You get the picture. Thanks for the script
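
One thing to watch with the `.*` in that pattern: it is greedy, so if two links happen to sit on the same line of the page, the match can run from the first link to the end of the last one. A possible tighter variant is sketched below. The input line and its URL layout are made up, the `tight` pattern is my own suggestion rather than anything from the script, and whether real pages put several links on one line is an assumption.

```shell
#!/bin/sh
# Made-up page line holding two image links from different sites.
line='<a href="http://img.7chan.org/b/src/111.jpg">a</a><a href="http://www.99chan.org/x/src/222.png">b</a>'

# Pattern as suggested in the comment above.
loose='http://(img|cgi|www).*chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)'
# Hypothetical tighter variant: the host prefix may only contain letters and digits.
tight='http://(img|cgi|www)\.[a-z0-9]*chan\.org/[a-z0-9]+/src/(cb-nws/)?[0-9]+\.(jpg|png|gif)'

loose_count=$(printf '%s\n' "$line" | grep -E -o "$loose" | wc -l | tr -d ' ')
tight_count=$(printf '%s\n' "$line" | grep -E -o "$tight" | wc -l | tr -d ' ')
echo "loose: $loose_count match(es), tight: $tight_count match(es)"
```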

efre, Unknown - Sun, 28 Feb 2010 05:29

I love this script. Thanks :)

 
