Stellar coin - calculate size of buckets/ directory

The Stellar digital currency fascinates me. I do not consider it a true blockchain in the way Bitcoin is, but rather a distributed database for transactions. I think it is a very smart project that is easy to get into.

To work with it, I need to do a full sync of the project’s blockchain, but first I need to know the disk space requirements.

I searched Google at length for an article that gives the disk space numbers, but I only found figures from previous years. So, as of this week (5 August 2019), I have the following numbers: about 450 GB for the buckets/ directory, spread across 11,394,736 files. I hope to write another article covering the size of the PostgreSQL database.

Now, after getting these numbers, I have a direct message to the stellar.org community:

Please give us (the users) an rsync server for syncing the transaction history.

Yuli, cloudinvet.com

So, here is a way for you to reproduce my calculations:

Download AWS file-list info

I used some AWS ninja tricks. For example, the command “aws s3 --no-sign-request ls s3://url --recursive” dumps file information from the Stellar history repository hosted on AWS S3.

The blockchain is split into 3 directories. I used the following commands for all 3:

aws s3 --no-sign-request ls s3://history.stellar.org/prd/core-live/core_live_001/ --recursive > dir1.txt

aws s3 --no-sign-request ls s3://history.stellar.org/prd/core-live/core_live_002/ --recursive > dir2.txt

aws s3 --no-sign-request ls s3://history.stellar.org/prd/core-live/core_live_003/ --recursive > dir3.txt

These commands take a long time to finish. They generate the files dir1.txt, dir2.txt and dir3.txt, which contain the file listing, including each file's size in bytes (the third column below).

2017-04-20 00:19:23 798 prd/core-live/core_live_001/bucket/00/00/01/bucket-0000019fee1c9fc8d806146c887a1785bccb7c284d70bad47e3dbf4174ed2ff3.xdr.gz
2018-03-04 02:52:05 943 prd/core-live/core_live_001/bucket/00/00/0b/bucket-00000b3ca7b69fba27b590c9578308e86710c99e409fd396d46898e6aa489b95.xdr.gz
2017-08-19 15:03:23 876 prd/core-live/core_live_001/bucket/00/00/10/bucket-000010814de8a4af2468d2f701f3603e3921757ed23910a5762c7f2c94fc1f2c.xdr.gz
2017-10-24 05:28:43 710 prd/core-live/core_live_001/bucket/00/00/12/bucket-000012631c3fd6d57eb5a19ceb2e5c63503e67cdeb5dc1c436d4a440a5dc6804.xdr.gz
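
Before converting anything to blocks, it can be useful to sanity-check a listing by counting its entries and adding up the raw byte sizes. The following is a minimal sketch, assuming the listing format shown above (date, time, size in bytes, object key); the function name listing_stats is made up for illustration.

```shell
#!/bin/sh
# Print the number of objects and their total raw size in bytes for a
# listing produced by "aws s3 ls --recursive".
# Assumed columns: date, time, size in bytes, object key.
# Reads the files given as arguments, or standard input if there are none.
listing_stats() {
    awk '{ files++; bytes += $3 }
         END { printf "%d files, %.0f bytes\n", files, bytes }' "$@"
}

# Example: the four sample lines from the listing, fed on standard input.
listing_stats <<'EOF'
2017-04-20 00:19:23 798 prd/core-live/core_live_001/bucket/00/00/01/bucket-0000019fee1c9fc8d806146c887a1785bccb7c284d70bad47e3dbf4174ed2ff3.xdr.gz
2018-03-04 02:52:05 943 prd/core-live/core_live_001/bucket/00/00/0b/bucket-00000b3ca7b69fba27b590c9578308e86710c99e409fd396d46898e6aa489b95.xdr.gz
2017-08-19 15:03:23 876 prd/core-live/core_live_001/bucket/00/00/10/bucket-000010814de8a4af2468d2f701f3603e3921757ed23910a5762c7f2c94fc1f2c.xdr.gz
2017-10-24 05:28:43 710 prd/core-live/core_live_001/bucket/00/00/12/bucket-000012631c3fd6d57eb5a19ceb2e5c63503e67cdeb5dc1c436d4a440a5dc6804.xdr.gz
EOF
# prints: 4 files, 3327 bytes
```

On a real listing the same function would be called as "listing_stats dir1.txt". The byte total is the raw data size only; it does not yet account for file system blocks.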

Next trick – calculate real file size

The file size is reported in bytes. When a file is saved to disk, it occupies a whole number of file system blocks, so even a tiny file takes up at least one full block. In my case, the block size is 4096 bytes. So, I need to convert each file size to its real size on disk.

To convert a file size in bytes to a size in blocks, I use the following calculation:

file_size_in_blocks = int((file_size_in_bytes+4095)/4096)

With 4096-byte blocks, every 256 blocks make 1 megabyte.
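
The rounding can be tried out on a few values with plain shell arithmetic. This is a small sketch, assuming a 4096-byte block size as in my case; the names BLOCK_SIZE, blocks_for and blocks_to_mb are made up for illustration.

```shell
#!/bin/sh
BLOCK_SIZE=4096   # assumption: 4096-byte file system blocks

# Round a size in bytes up to a whole number of blocks.
blocks_for() {
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# Convert a block count to whole megabytes (256 blocks of 4096 bytes each).
blocks_to_mb() {
    echo $(( $1 * BLOCK_SIZE / 1048576 ))
}

blocks_for 798      # prints 1: a 798-byte file still occupies one full block
blocks_for 4096     # prints 1
blocks_for 4097     # prints 2
blocks_to_mb 256    # prints 1
```

Adding 4095 before the integer division is what rounds up: any size from 1 to 4096 bytes lands on one block, and 4097 bytes spills into a second one.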

Final calculations

I run the following commands to extract each file size in bytes, convert it to a size in blocks, and pipe the resulting sum to “bc”, a shell calculator.

cat dir1.txt | gawk '{print int(($3+4095)/4096)"+"} END {print "0X"}' | tr -d "\n" | tr -s "X" "\n" | bc

cat dir2.txt | gawk '{print int(($3+4095)/4096)"+"} END {print "0X"}' | tr -d "\n" | tr -s "X" "\n" | bc

cat dir3.txt | gawk '{print int(($3+4095)/4096)"+"} END {print "0X"}' | tr -d "\n" | tr -s "X" "\n" | bc
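
As an alternative sketch (not the commands I actually ran), the same per-file rounding can be accumulated inside awk itself, which avoids building a long expression for bc. It assumes the same listing format and 4096-byte blocks; the function name sum_blocks is made up for illustration.

```shell
#!/bin/sh
# Sum the on-disk size, in 4096-byte blocks, of every object listed in
# one or more "aws s3 ls --recursive" listings (or on standard input).
sum_blocks() {
    awk '{ total += int(($3 + 4095) / 4096) }
         END { printf "%.0f\n", total }' "$@"
}

# Example with three made-up sizes: 798, 4097 and 8192 bytes.
printf '%s\n' \
    '2017-04-20 00:19:23 798 a' \
    '2017-04-20 00:19:24 4097 b' \
    '2017-04-20 00:19:25 8192 c' | sum_blocks
# prints 5 (1 + 2 + 2 blocks)
```

On the real listings it would be called once per file, for example "sum_blocks dir1.txt", or with all three files at once to get the combined block count.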

Finally, I got the following numbers:

(43869274+29818324+43827230)/256 = 459042 Megabytes
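
The arithmetic can be re-checked with shell integer arithmetic. This sketch only uses the three block counts above; the variable names are made up, and the gigabyte figure assumes 1024 megabytes per gigabyte.

```shell
#!/bin/sh
# Block counts for the three directories, as reported above.
total_blocks=$(( 43869274 + 29818324 + 43827230 ))
total_mb=$(( total_blocks / 256 ))   # 256 blocks of 4096 bytes per megabyte
total_gb=$(( total_mb / 1024 ))      # whole gigabytes, rounded down

echo "$total_blocks blocks = $total_mb MB = about $total_gb GB"
# prints: 117514828 blocks = 459042 MB = about 448 GB
```

That is in line with the rough 450 GB figure quoted at the start of the article.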

I hope this solution helps you. Leave your comments.

I LOVE FEEDBACK 🙂

About the author

Yuli Stremovsky, Paranoid Security Guy
For the past 15 years I’ve been leading the evolution of startups and enterprises to achieve the highest level of security and compliance. Throughout my career I’ve been a cyber security expert and advanced solutions architect with many years of hands-on experience on both the offensive and defensive sides. I am knowledgeable at the highest level in application development, networking, data and databases, web applications, large-scale Software as a Service solutions, cloud security and blockchain technologies.

I’ve been working with CISOs of international enterprises, helping them set their information security strategy and overseeing the implementation of these recommendations. As part of these projects, I’ve been helping companies achieve compliance with GDPR, PCI, HIPAA and SOX.

Among my credits, I was a founder of the database security company GreenSQL/Hexatier, which was acquired by Huawei, and I co-founded Kesem.io, a secure multi-signature crypto wallet.

Specialties: software and cloud architecture, compliance (GDPR, HIPAA, PCI, SOX), blockchain technologies, software development, secure architectures, project management and low-level research.