
Now Viewing: gelbooru dump?
Keep it civil, do not flame or bait other users. If you notice anything illegal or inappropriate being discussed, contact an administrator or moderator.

usrnammeee - Group: Member - Total Posts: 1
gelbooru dump?
Posted on: 08/19/25 03:04PM

Is there a way I can download the entire library? I could probably only download a fraction of it, but I'm not sure how long websites like these will continue to exist, so I want to do all I can to preserve what is there right now.



VIZARD_ - Group: Member - Total Posts: 2055
Posted on: 08/19/25 03:28PM

Someone asked this exact same question not too long ago. I forgot the answer, though; hopefully someone who remembers it will answer you.



LittleLoliPanties - Group: Member - Total Posts: 290
Posted on: 08/19/25 03:50PM

Not an answer to the downloading question, but someone recently asked about total data size. They never got an exact answer, but a similar question had been answered about 5 years ago and it was 7 TB then. Given growth, it's probably at least 10 TB now. So that at least gives you an idea of the storage you'd need to download everything.



VIZARD_ - Group: Member - Total Posts: 2055
Posted on: 08/19/25 04:11PM

Yeah, there we go. Thank you for answering! I was trying to remember but wasn't sure, and didn't want to give OP the wrong answer. Thanks again.



smutgoblin - Group: Member - Total Posts: 257
Posted on: 08/19/25 04:42PM

You'd probably have to use the API. You could write a Python script (other languages are available) to get all post IDs, extract the image URL from each response, and download the file. You'd also have to factor in rate limiting and error handling. Plus, if you want to preserve all the metadata, you'd need to store it alongside every image.

And if you're going to be grabbing the more extreme content, you're probably gonna want to do it over a VPN tunnel for privacy, though the VPN provider might flag it anyway? Or if you're going to use a server somewhere in the cloud, the host might flag it. Personally, I wouldn't want to risk it.



uniform-lover - Group: Member - Total Posts: 132
Posted on: 08/20/25 01:37AM

Given the amount of content, I'd say it would be safer and faster to do it locally, by literally plugging storage devices into the server and copying all the content. That would still take days, though, unless it had a read/write speed of over 1500 MB/s.
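To put rough numbers on the local-copy idea, here is a small sketch of copy time versus sustained speed. It assumes the ~10 TB estimate from earlier in the thread, decimal units, and a constant throughput with no per-file overhead; the speeds are illustrative, and treating "1500mb" as MB/s is my reading of the post.

```python
# Rough wall-clock time to copy a dataset at a constant sustained throughput.
# Assumptions (illustrative only): ~10 TB total, decimal units
# (1 TB = 10**12 bytes, 1 MB = 10**6 bytes), no filesystem or per-file overhead.

def copy_hours(total_tb: float, speed_mb_per_s: float) -> float:
    """Hours needed to move total_tb terabytes at speed_mb_per_s megabytes per second."""
    total_bytes = total_tb * 10**12
    bytes_per_second = speed_mb_per_s * 10**6
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # Example speeds, roughly: one hard drive, a SATA SSD, a fast NVMe drive.
    for speed in (150, 500, 1500):
        print(f"{speed:>5} MB/s: {copy_hours(10, speed):6.1f} hours")
```

This is the idealized best case. With millions of individual files, per-file overhead and slower real-world speeds can stretch it considerably.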

As far as I know, it's Cloudflare that's the problem, as they have cut Gelbooru off. The site is still operational and works fine, but it's lost a layer of protection against DDoS attacks. If it gets DDoS'ed, that could be a serious setback, and anyone who hates these kinds of sites could do it if they wanted to. So this site is now at risk.

We can only pray some action has been taken before that happens.

I'm really not sure what I can do. All I've done thus far is email my local MP (it has always been a Labour area, go figure) and explain to him that this censorship and these age restrictions put my hobby and interests at risk, that I like Japanese art and manga/anime, and that I feel the government is now stripping that hobby away from me. I said that I was born in '89 and that, through school and my upbringing, I was raised to believe this was a free country with freedom of expression, and that diversity should be respected, including diversity of interests and hobbies.

That's roughly what I've put. I do believe something good will happen soon, though. I think they are quite close to awakening a beast they don't know about. Apparently this will also affect the next GTA game. GTA is about the biggest piece of media in the world, bigger than any movie or book. If GTA has to change itself to comply with this bill, expect a huge backlash, because that means other games are affected too. Modern and past gamers will unite, and that will include coders and hackers.

Something IS changing in this country. It's just a shame it's had to happen now, when it should have happened at least two years ago. But I think it's still frustration left over from the COVID/lockdown era, when people lost their livelihoods.

Something is going to happen, and it's coming.



burner_identification - Group: Member - Total Posts: 761
Posted on: 08/20/25 02:02AM

Or you could ask lozer or somebody to mail you the disks. If gb shuts down, they won't need them anyway. >_<

Jokes aside, I believe an archival effort like this needs to start from the site itself, which would make its content available to download in such a way that:
- the infrastructure can take it in a cost-efficient manner
- it can be incrementally updated: assume the first step is done and you archive everything, but gb doesn't go down, and one year later you want to archive again. What do you do? Download the entire thing again? I think not.
- and, hopefully, it includes the metadata from the db, since that is also very valuable
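The incremental-update point could work roughly like this: keep a manifest from the last archive run and compare it against a fresh one, so only new or changed files get fetched. This is a sketch under my own assumptions; the manifest format (post ID mapped to a file checksum) is hypothetical and not something the site is known to provide.

```python
# Incremental archiving sketch: diff the manifest from the last archive run
# against a fresh one. The manifest format (post ID -> file checksum) is a
# hypothetical choice for illustration.
from typing import Dict, NamedTuple, Set


class ManifestDiff(NamedTuple):
    added: Set[int]    # posts that did not exist at the last run
    changed: Set[int]  # posts whose file checksum differs since the last run
    removed: Set[int]  # posts that no longer exist on the site


def diff_manifests(old: Dict[int, str], new: Dict[int, str]) -> ManifestDiff:
    old_ids, new_ids = set(old), set(new)
    added = new_ids - old_ids
    removed = old_ids - new_ids
    changed = {pid for pid in old_ids & new_ids if old[pid] != new[pid]}
    return ManifestDiff(added, changed, removed)


if __name__ == "__main__":
    last_year = {1: "a1", 2: "b2", 3: "c3"}
    today = {1: "a1", 3: "c9", 4: "d4"}
    d = diff_manifests(last_year, today)
    print(sorted(d.added), sorted(d.changed), sorted(d.removed))  # [4] [3] [2]
```

Only `added` and `changed` need downloading; `removed` posts would simply stay in the existing archive. Tag metadata could be diffed the same way.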

This looks simple to start, but it is not simple to do well. Please do not start hitting the API and scanning all IDs, as others have suggested, without talking to the staff first (and it is good that you made this thread).



cipactli - Group: Member - Total Posts: 8
Posted on: 08/20/25 05:16PM

VIZARD_ said:
Someone asked this exact same question not too long ago. I forgot the answer, though; hopefully someone who remembers it will answer you.


That was my post. There was no concrete answer except that 7 TB figure from years ago. But I've done some research and I think the total size is, very roughly calculated, around 33 TB.

I'm using a program called Imgbrd-Grabber (www.bionus.org/imgbrd-grabber/docs/) to scrape images from Gelbooru (it works with multiple other sites as well). According to the program, the total number of files as of right now is 11,874,704, and the ~84,000 files I have downloaded are about 233 GB.
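The ~33 TB figure is a straight linear extrapolation from those two numbers. Here is a minimal sketch of that arithmetic, assuming decimal units and that the sample's average file size holds for every file on the site (the assumption questioned later in the thread).

```python
# Naive linear extrapolation of total archive size from a downloaded sample.
# Assumes the sample's average file size applies to every file on the site.
# Decimal units: 1 TB = 1000 GB, 1 GB = 1000 MB.

def extrapolate_total_tb(sample_files: int, sample_gb: float, total_files: int) -> float:
    avg_gb_per_file = sample_gb / sample_files
    return avg_gb_per_file * total_files / 1000


if __name__ == "__main__":
    avg_mb = 233 / 84_000 * 1000
    total_tb = extrapolate_total_tb(84_000, 233, 11_874_704)
    print(f"average file: ~{avg_mb:.2f} MB, estimated total: ~{total_tb:.1f} TB")
```

That works out to an average of roughly 2.8 MB per file and just under 33 TB in total.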

Downloading all of that isn't feasible (or desirable) in my case, so I have trimmed it with extensive tag filtering, excluding rating:general and score:<10, down to about 3,000,000 files, which would (hopefully) be a bit less than 8 TB and take ~70 days to download.
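For a sense of what that plan means in practice, here is a quick sketch of the average rate it implies. The inputs are the rough estimates above (~3,000,000 files, ~8 TB, ~70 days), with decimal units and the load assumed to be spread evenly over the whole period.

```python
# Average rates implied by downloading `files` files totalling `total_tb`
# terabytes over `days` days, assuming an even spread. Decimal units.

SECONDS_PER_DAY = 86_400


def required_rates(files: int, total_tb: float, days: float) -> tuple:
    seconds = days * SECONDS_PER_DAY
    files_per_second = files / seconds
    mb_per_second = total_tb * 10**12 / seconds / 10**6
    return files_per_second, mb_per_second


if __name__ == "__main__":
    fps, mbps = required_rates(3_000_000, 8, 70)
    print(f"~{fps:.2f} files/s, ~{mbps:.2f} MB/s on average")
```

That comes to roughly one file every two seconds at around 1.3 MB/s, which is the number to weigh when asking whether it burdens the API.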

I don't think (?) it's any more of a burden on the API than browsing the site normally, considering how slow it is.

It's a bit tricky to set up the downloader. If anyone wants to try it themselves, feel free to ask about my setup.



cipactli - Group: Member - Total Posts: 8
Posted on: 08/20/25 05:30PM

smutgoblin said:

And if you're going to be grabbing the more extreme content, you're probably gonna want to do it over a VPN tunnel for privacy, though the VPN provider might flag it anyway? Or if you're going to use a server somewhere in the cloud, the host might flag it. Personally, I wouldn't want to risk it.


As for VPNs, one of the few that privacy experts generally recommend is Mullvad VPN, especially if you pay with cash or cryptocurrency. And they can't "flag" you, as they don't keep any logs of what you connect to through the tunnel. Of course, you are trusting that they do what they promise; that goes for any VPN or any company you rely on when browsing the web. But Mullvad, and possibly IVPN and Proton (though I don't know much about them), are generally recommended by experts. They (at least Mullvad) are also very involved in privacy advocacy.

On the other hand, in terms of privacy, a VPN may be of limited use if one is using a closed-source operating system that is actively developing new and creative ways to spy on its users (arstechnica.com/gadgets/2...year-after-announcing-it/) ¯\_(ツ)_/¯



LittleLoliPanties - Group: Member - Total Posts: 290
Posted on: 08/20/25 06:49PM

cipactli said:
That was my post. There was no concrete answer except that 7 TB figure from years ago. But I've done some research and I think the total size is, very roughly calculated, around 33 TB.

That's probably a huge overestimate, because the oldest stuff has much smaller resolutions and file sizes than more recent stuff (post #1 is 400 x 600 px and a whopping 104 KB). That image grabber is probably starting from the most recent images and working backward, so as you go further back, the average file size will steadily decrease.
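A toy model shows the direction of that bias. Purely as a hypothetical assumption, let the average file size grow linearly with post ID (0.1 MB at post 1 up to 2.8 MB at the newest post), then compare the model's true total against extrapolating from only the newest 84,000 posts. Real growth isn't linear, so only the direction of the error matters here, not the exact numbers.

```python
# Toy model of newest-first sampling bias. Hypothetical assumption: average
# file size grows linearly with post ID from first_mb (post 1) to last_mb
# (newest post). Sizes in MB; 1 TB = 10**6 MB.

def size_mb(post_id: int, n_posts: int, first_mb: float, last_mb: float) -> float:
    """Hypothetical average size of post `post_id` under the linear model."""
    return first_mb + (last_mb - first_mb) * (post_id - 1) / (n_posts - 1)


def compare(n_posts: int, sample: int, first_mb: float, last_mb: float) -> tuple:
    # Exact total under the model: an arithmetic series over all posts.
    true_total = n_posts * (first_mb + last_mb) / 2
    # Average over the newest `sample` posts (also an arithmetic series).
    oldest_in_sample = size_mb(n_posts - sample + 1, n_posts, first_mb, last_mb)
    sample_avg = (oldest_in_sample + last_mb) / 2
    # Naive extrapolation: pretend every post looks like the recent sample.
    naive_total = n_posts * sample_avg
    return true_total, naive_total


if __name__ == "__main__":
    true_mb, naive_mb = compare(11_874_704, 84_000, first_mb=0.1, last_mb=2.8)
    print(f"model total: {true_mb / 1e6:.1f} TB, naive estimate: {naive_mb / 1e6:.1f} TB")
```

Under these made-up growth numbers, the naive estimate comes out roughly double the model's true total, which matches the intuition in the post.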

I'd say doubling to 14 TB is probably the top end of the possible range for the current size, and it's likely less than that.
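One way to sanity-check the competing estimates is the yearly growth rate each one would imply. This sketch starts from the 7 TB figure from about 5 years ago; the 10, 14 and 33 TB endpoints are the guesses made in this thread, and steady compound growth is an assumption.

```python
# Implied compound annual growth rate between two size estimates,
# assuming steady compound growth over the period.

def implied_annual_growth(old_tb: float, new_tb: float, years: float) -> float:
    """Yearly growth rate that turns old_tb into new_tb over `years` years."""
    return (new_tb / old_tb) ** (1 / years) - 1


if __name__ == "__main__":
    for now_tb in (10, 14, 33):
        rate = implied_annual_growth(7, now_tb, 5)
        print(f"7 TB -> {now_tb} TB in 5 years: ~{rate:.1%} per year")
```

Those come out to roughly 7%, 15% and 36% per year. The 33 TB figure would need much faster growth than the other two, which fits with it being an overestimate.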


