4chan download script

Sunday, 31. August 2008 19:44 - daniel - Other - 20 Comments


Downloads all the images from a 4chan image thread. I will probably regret downloading anything from 4chan, but that's not my problem.

Usage: 4chandl <4chan thread url>

Download the script

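#!/bin/sh
# Downloads all images from a 4chan thread, re-checking every 30 seconds
# until the thread 404s or the script is canceled.

if [ "$1" = "" ]; then
    echo "Usage: $(basename "$0") <4chan thread url>"
    exit 1
fi

echo "4chan downloader"
echo "Downloading until canceled or 404'd"

# The thread number from the URL becomes the download directory.
LOC=$(echo "$1" | egrep -o '[0-9]+\.html' | sed 's/\.html$//')
if [ "$LOC" = "" ]; then
    echo "Could not find a thread number in $1"
    exit 1
fi
echo "Downloading to $LOC"

if [ ! -d "$LOC" ]; then
    mkdir "$LOC"
fi

cd "$LOC" || exit 1

while [ "1" = "1" ]; do
    TMP=`mktemp`
    TMP2=`mktemp`

    # Fetch the thread page; wget exits non-zero once the thread is gone.
    wget -O "$TMP" "$1"
    if [ "$?" != "0" ]; then
        rm "$TMP" "$TMP2"
        exit 1
    fi

    # Pull out the image URLs, then strip the cb-nws path component.
    egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"
    sed 's!/cb-nws!!g' "$TMP2" > "$TMP"

    # -nc skips files that have already been downloaded.
    wget -nc -i "$TMP"

    rm "$TMP" "$TMP2"

    echo "Waiting 30 seconds before next run"
    sleep 30
done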
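
As a rough illustration of how the script picks its download directory, the sketch below runs the same kind of extraction on a made-up thread URL. The URL and the function name thread_dir are placeholders for this example, not anything the script or 4chan defines.

```shell
#!/bin/sh
# Hypothetical helper: derive the directory name from a thread URL,
# using the same grep/sed approach as the script.
thread_dir() {
    echo "$1" | grep -Eo '[0-9]+\.html' | sed 's/\.html$//'
}

# Made-up example URL, for illustration only.
dir=$(thread_dir "http://example.invalid/b/res/123456.html")
echo "$dir"
```

A URL without a trailing number and .html yields an empty string, so it is worth checking the result before using it as a directory name.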



Comments

Anon - Friday, 30. January 2009 12:11

FINALLY inb4 404!

Thanks!

Anon - Tuesday, 10. March 2009 13:32

This is useful, thanks

Anon - Monday, 10. August 2009 13:48

Nice work. Thanks

sam2332 - Sunday, 20. September 2009 15:17

can u tell me what the -o parameter of "grep -o" does

and also egrep and sed, they are regular expression commands right? and in LOC=$( the $( means execute query right?

the reason im asking these questions is because im converting this program to autoit (windows scripting language) and ive never learned/used bash

OHHH and this if [ "$?" != "0" ]; then

i dont get it lmao

sam2332 - Sunday, 20. September 2009 15:33

ok i think the -o stands for output ...... but -d still eludes me

Daniel - Sunday, 20. September 2009 19:49

The o switch means only matching, only the part matching the regex will be returned instead of the full line.
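
A small sketch of the difference, run on a made-up HTML line (the URL is a placeholder):

```shell
#!/bin/sh
# Made-up input line for illustration.
line='<a href="http://example.invalid/pic/1234.jpg">thumbnail</a>'

# Without -o, grep prints the entire matching line.
full=$(printf '%s\n' "$line" | grep -E '[0-9]+\.jpg')

# With -o, grep prints only the text the regex matched.
only=$(printf '%s\n' "$line" | grep -Eo '[0-9]+\.jpg')

echo "whole line: $full"
echo "match only: $only"
```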

A $() in bash executes the command inside and substitutes its output, which here gets stored in the variable. It's the same as the backtick operator.
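
For example, both forms below capture the output of basename. The path is made up and does not need to exist:

```shell
#!/bin/sh
# Both forms run the command and capture its standard output.
new_style=$(basename /tmp/example/thread.html)
old_style=`basename /tmp/example/thread.html`

echo "$new_style"
echo "$old_style"
```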

$? is the return code of the last program that has been executed. An exit code of 0 means everything went ok.
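
A minimal sketch, with sh -c 'exit 3' standing in for a program that fails (in the script that program would be wget):

```shell
#!/bin/sh
# "true" always succeeds, so $? is 0 right after it.
true
ok_status=$?

# "sh -c 'exit 3'" stands in for a failing program. The assignment after ||
# runs only on failure, and $? still holds the failed command's status.
sh -c 'exit 3' || fail_status=$?

if [ "$fail_status" != "0" ]; then
    echo "command failed with exit code $fail_status"
fi
echo "true returned $ok_status"
```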

-d in a test statement checks if it exists and if it is a directory.
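
A quick sketch of the three cases, using a scratch directory from mktemp -d (assumed to be available, as it is on most Linux and BSD systems):

```shell
#!/bin/sh
# Create a scratch directory and a regular file inside it.
scratch=$(mktemp -d)
touch "$scratch/file.txt"

# -d is true only for a path that exists and is a directory.
if [ -d "$scratch" ]; then is_dir=yes; else is_dir=no; fi
if [ -d "$scratch/file.txt" ]; then file_is_dir=yes; else file_is_dir=no; fi
if [ -d "$scratch/missing" ]; then missing_is_dir=yes; else missing_is_dir=no; fi

echo "$is_dir $file_is_dir $missing_is_dir"

rm -r "$scratch"
```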

sam2332 - Monday, 21. September 2009 13:59

thank you very much :D

sam2332 - Tuesday, 22. September 2009 20:24

i just have a few more questions. in

egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"

what exactly is egrep? im pretty sure it is a regexp tool but as for the syntax to use it im not 100%

oh and where in the code are you getting the collection of links?

sam2332 - Tuesday, 22. September 2009 20:27

oh and | << this what does that do?

Daniel - Tuesday, 22. September 2009 20:58

egrep is the same as grep -E, it will match extended regular expressions.

The links are in the first file downloaded by wget.

| is the pipe character, it passes the output of one program to the next one as its input.
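
A small sketch of a two-stage pipeline, using made-up file names:

```shell
#!/bin/sh
# printf writes three lines; the first pipe feeds them to grep, which keeps
# the image names; the second pipe feeds those to sed, which strips the
# extension.
names=$(printf '%s\n' 1.jpg notes.txt 2.png | grep -E '\.(jpg|png)$' | sed 's/\.[a-z]*$//')
echo "$names"
```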

If you don't understand a program, you should just read the man page. It's faster than me explaining every switch I used here.

http://unixhelp.ed.ac.uk/CGI/man-cgi?sh
http://unixhelp.ed.ac.uk/CGI/man-cgi?test
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
http://unixhelp.ed.ac.uk/CGI/man-cgi?wget

sam2332 - Thursday, 24. September 2009 3:11

well i finished my program. its in a language called autoit. its kinda slow but ill post the source so maybe someone can improve on it

http://dl.getdropbox.com/u/226498/script/4chan_img_downloader.au3

Marvlarv - Wednesday, 21. October 2009 17:09

egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"

if u change it to this egrep 'http://(img|cgi|www).*chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"

it would add support to the other chans like 99chan, or 7chan, or 711chan, etc. You get the picture. Thanks for the script
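
A rough check of the widened pattern against made-up lines, one URL per line. The hostnames and paths are placeholders chosen for the example, and grep -Eo is used as the equivalent of egrep -o:

```shell
#!/bin/sh
# Made-up page content: two *chan hosts and one unrelated host.
page='<a href="http://img.7chan.org/b/src/111.jpg">
<a href="http://www.99chan.org/g/src/222.png">
<a href="http://img.example.org/b/src/333.gif">'

# The pattern from the comment above, unchanged.
links=$(printf '%s\n' "$page" | grep -Eo 'http://(img|cgi|www).*chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)')
echo "$links"
```

Only the two chan.org links should come through; the example.org one does not contain "chan.org" and is skipped.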

Dani's Blog - Thursday, 24. December 2009 15:00

A few days ago 4chan changed their links and my old download script stopped working. Here is the updated version.

efre - Sunday, 28. February 2010 3:29

I love this script. Thanks :)

Anton Eliasson - Monday, 2. August 2010 15:53

Greetings! I've modified your script to save all images in a thread, preserving the original filenames. This is great if you're downloading whole sets of images or other OC. Is it okay with you if I publish it under the terms of GNU GPL v3?

Daniel - Monday, 2. August 2010 16:47

@Anton Eliasson: Sure, go ahead

Anton Eliasson - Tuesday, 3. August 2010 19:01

Great, it's up now. You'll find it here: http://antoneliasson.wordpress.com/2010/08/03/4chan-download-script/ I also added a few comments in the script to make it easier to understand.

bob - Thursday, 21. July 2011 18:17

Nice job. Any chance you can modify it to download from Reddit as well?

Random Saint - Monday, 7. May 2012 14:27

I noticed a minor change lately that prevents this script from working. However, it is easily fixed by changing the line

egrep 'http://(img|cgi).4chan.org/[a-z0-9]+/src/(cb-nws/)?([0-9]*).(jpg|png|gif)' "$TMP" -o > "$TMP2"

to

egrep '//images.4chan.org/[a-z0-9]+/src/([0-9]*).(jpg|png|gif)' "$TMP" -o | sed 's/\//http:\//' > "$TMP2"

as the source code no longer has the http: in there. Thus it is excluded from the search and added afterwards. Works for me.
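
A sketch of the same idea on a made-up line in the scheme-less link style. The file name is a placeholder, and the sed expression here is an equivalent variant that anchors on the leading // instead of replacing the first slash:

```shell
#!/bin/sh
# Made-up line in the newer, scheme-less link style.
line='<a href="//images.4chan.org/b/src/1336400000000.jpg">'

# Match without the scheme, then put "http:" in front of the leading "//".
url=$(printf '%s\n' "$line" \
    | grep -Eo '//images\.4chan\.org/[a-z0-9]+/src/[0-9]+\.(jpg|png|gif)' \
    | sed 's!^//!http://!')
echo "$url"
```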

You're welcome

Kevin - Thursday, 8. May 2014 7:20

I created this one : http://zector.net/blog/?page_id=259

Try it out