Stellar coin – calculate size of buckets/ directory

The Stellar digital currency fascinates me a lot. I do not consider it a true blockchain like Bitcoin, but rather a distributed database for transactions. I think it is a very smart project that is easy to get into.

In order to work with it, I need to do a full sync of the project's blockchain, but beforehand I need to know the requirements in terms of disk space.

I searched Google tediously for any article delving into the actual disk space numbers, but I only found figures from previous years. So, as of this week, 5 August 2019, I have the following number: 450 GB (for the buckets/ directory), spread across 11,394,736 files. I hope to write another article covering the size of the PostgreSQL database.

Now, after getting these numbers, I have a direct message to the stellar.org community:

Please give us (the users) an rsync server to sync the transaction history.

Yuli, cloudinvet.com

So, here is a way for you to reproduce my calculations:

Download AWS file-list info

I used some AWS ninja tricks. For example, the "aws s3 --no-sign-request ls s3://url --recursive" command dumps the file listing of the Stellar history repository hosted on AWS S3.

The blockchain is split into 3 directories. I used the following commands for all 3:

aws s3 --no-sign-request ls s3://history.stellar.org/prd/core-live/core_live_001/ --recursive > dir1.txt

aws s3 --no-sign-request ls s3://history.stellar.org/prd/core-live/core_live_002/ --recursive > dir2.txt

aws s3 --no-sign-request ls s3://history.stellar.org/prd/core-live/core_live_003/ --recursive > dir3.txt

It takes quite a while for the results to be ready. These commands generate the dir1.txt, dir2.txt and dir3.txt files with the listing information, which includes each file's size (the third column in the sample below).

2017-04-20 00:19:23        798 prd/core-live/core_live_001/bucket/00/00/01/bucket-0000019fee1c9fc8d806146c887a1785bccb7c284d70bad47e3dbf4174ed2ff3.xdr.gz
2018-03-04 02:52:05        943 prd/core-live/core_live_001/bucket/00/00/0b/bucket-00000b3ca7b69fba27b590c9578308e86710c99e409fd396d46898e6aa489b95.xdr.gz
2017-08-19 15:03:23        876 prd/core-live/core_live_001/bucket/00/00/10/bucket-000010814de8a4af2468d2f701f3603e3921757ed23910a5762c7f2c94fc1f2c.xdr.gz
2017-10-24 05:28:43        710 prd/core-live/core_live_001/bucket/00/00/12/bucket-000012631c3fd6d57eb5a19ceb2e5c63503e67cdeb5dc1c436d4a440a5dc6804.xdr.gz
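
As an aside, aws s3 ls --recursive prints one object per line, so counting the lines in these listings is a quick way to see how many files are involved:

# Count the listed objects per listing and in total
wc -l dir1.txt dir2.txt dir3.txt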

Next trick – calculate real file size

The file size is returned in bytes. When a file is stored by the operating system, however, each file, even a small one, occupies a whole number of file system I/O blocks. In my case, the block size is 4096 bytes. So, I need to convert each file size to its real size on disk.
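
On Linux with GNU coreutils, one way to check the block size of your own file system is (run it against whatever path the data will live on; here I use the current directory):

# Print the fundamental block size of the file system holding the current directory
stat -f -c %S .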

To convert a file size into blocks, I do the following calculation:

file_size_in_blocks = int((file_size_in_bytes+4095)/4096)

Every 256 blocks add up to 1 Megabyte, since 256 × 4096 bytes = 1,048,576 bytes.
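
As a quick sanity check, here is the same formula applied (just a sketch) to the sample file sizes from the listing above; each of those small files still takes up one full 4096-byte block on disk:

# Apply the rounding formula to the sample sizes shown earlier
echo "798 943 876 710" | gawk '{ for (i = 1; i <= NF; i++) print $i, "bytes ->", int(($i+4095)/4096), "block(s)" }'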

Final calculations

I ran the following commands to extract the file size in bytes, convert it to a size in blocks and pipe everything to "bc", a shell calculator.

cat dir1.txt | gawk '{print int(($3+4095)/4096)"+"} END {print "0X"}' | tr -d "\n" | tr -s "X" "\n" | bc

cat dir2.txt | gawk '{print int(($3+4095)/4096)"+"} END {print "0X"}' | tr -d "\n" | tr -s "X" "\n" | bc

cat dir3.txt | gawk '{print int(($3+4095)/4096)"+"} END {print "0X"}' | tr -d "\n" | tr -s "X" "\n" | bc
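
An equivalent single-pass version (a sketch under the same 4096-byte block assumption, summing the blocks directly in gawk instead of piping everything to bc) would be:

# Sum the on-disk blocks across all three listings and report the total in Megabytes
gawk '{ blocks += int(($3+4095)/4096) } END { printf "%d Megabytes\n", blocks/256 }' dir1.txt dir2.txt dir3.txt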

Finally, I got the following numbers:

(43869274+29818324+43827230)/256 = 459042 Megabytes
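
Dividing by 1024 once more ties this back to the rough 450 GB figure from the beginning of the article:

# Convert the total to Gigabytes (prints 448.2)
echo "scale=1; (43869274+29818324+43827230)/256/1024" | bc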

I hope this solution helps you. Leave your comments!

I LOVE FEEDBACK 🙂