I stored files inside of Minecraft, and here’s how it works

Files are a funny thing; they’re essentially just a collection of data all inside of one container, and that data is organized into a single-dimensional array of bytes. Many modern operating systems will use a file extension to determine what the file is, and this, in turn, specifies rules of how the data is organized so that it can be interpreted in a meaningful way. However, when it comes to a “file” being a collection of data, it isn’t anything too special. You don’t need a file type on any file. You can save a JPG as a .zip file if you want, and if you force your photo editor to open it, chances are it will just… open anyway.
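To make that point concrete, here is a minimal sketch of checking a file by its content rather than its name. It relies on the fact that JPEG data begins with the bytes FF D8 FF; the file name `photo.zip`, the helper `looks_like_jpeg`, and the dummy bytes written to disk are all made up for illustration.

```python
import tempfile
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG data starts with these three bytes


def looks_like_jpeg(path):
    """Check the content, not the name: compare the first three bytes."""
    with Path(path).open("rb") as fh:
        return fh.read(3) == JPEG_MAGIC


# Demo: a stand-in for a photo saved with a misleading ".zip" extension.
# The bytes below only imitate the start of a JPEG; this is not a real image.
with tempfile.TemporaryDirectory() as tmp:
    disguised = Path(tmp) / "photo.zip"
    disguised.write_bytes(JPEG_MAGIC + b"\xe0" + b"\x00" * 16)
    is_jpeg = looks_like_jpeg(disguised)

print(is_jpeg)
```

The extension never enters into the check, which is roughly why an image editor that inspects the data itself can still open a renamed file.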

With that knowledge in mind, data isn’t anything you can’t represent in other forms. We’ve already demonstrated how files can be saved inside Pokémon Emerald, and I decided to take it a step further. What if we could save files inside Minecraft instead? The world is practically unlimited; theoretically, you could save any file you wanted inside the game, just so long as you know how to interpret it afterward. That’s exactly what I did, and while it was painstaking, it’s also a great way to explain how data is saved and referenced.

I’ll have a GitHub link at the bottom of this article, which you can take a look at to run this yourself!

Setting the stage

Understanding data storage

First and foremost, I wanted a way to represent data in Minecraft that was easy to obtain through legitimate means and could still store a decent amount of data per block. Some of the more complex ideas I had involved stripping wooden logs and using their directions as well, while another idea used item frames with items inside of them. However, I realized that wool comes in 16 colors, which is perfect. Not only is wool easy to get, but 16 colors means that each wool block can store four bits of data (four bits have exactly 16 possible values), so we get one whole byte every two blocks.
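The "one byte every two blocks" idea boils down to splitting each byte into two four-bit halves. A minimal sketch, with function names chosen here purely for illustration:

```python
def split_byte(value):
    """Split one byte (0-255) into its high and low four-bit halves."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value between 0 and 255")
    return value >> 4, value & 0x0F


def join_nibbles(high, low):
    """Rebuild the byte from its two four-bit halves."""
    return (high << 4) | low


high, low = split_byte(0x48)    # 0x48 is the letter "H" in ASCII
print(high, low)                # 4 8
print(join_nibbles(high, low))  # 72, which is 0x48 again
```

Each half is a number from 0 to 15, so each one can be stood in for by one of 16 distinct blocks.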

At its core, a file is a sequence of bytes, and when it’s split, this sequence is divided into smaller, manageable segments. This division is done in such a way that each segment is an exact, contiguous subset of the original file’s byte sequence. The process is inherently lossless, meaning it does not alter the content of the bytes themselves. As long as these segments are reassembled in the correct order, the original file can be perfectly recreated. Armed with this knowledge, I created a mapping of hex digits and four-bit sequences to a wool color, which we can use to read and write data. For small files, it’s quite practical to actually build these structures yourself; as I’ll demonstrate later on, a 67-byte file uses 144 blocks of wool: 134 of them carry the data (two blocks per byte), and the remaining ten are simply padding to ensure an even height and width. I also do not play Bedrock Edition, so this is aimed at the Java Edition of Minecraft.
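The block count for that 67-byte example can be checked with a little arithmetic. This sketch assumes the padding simply fills out the last row of a grid with a fixed number of columns (12 here, matching the 144-block total); how the actual encoder decides on padding may differ.

```python
import math


def grid_size(num_bytes, cols):
    """Return (rows, total_blocks, padding_blocks) for a file of num_bytes."""
    data_blocks = num_bytes * 2             # two four-bit blocks per byte
    rows = math.ceil(data_blocks / cols)    # enough rows to hold every block
    total_blocks = rows * cols
    return rows, total_blocks, total_blocks - data_blocks


print(grid_size(67, 12))  # (12, 144, 10)
```

With 12 columns, 134 data blocks need 12 rows, which gives a 12 by 12 square with ten spare positions.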

Here’s the table I created with my mappings:

| Hex digit | Binary | Wool colour | Block ID (Java) |
|---|---|---|---|
| 0 | 0000 | White | minecraft:white_wool |
| 1 | 0001 | Light Gray | minecraft:light_gray_wool |
| 2 | 0010 | Gray | minecraft:gray_wool |
| 3 | 0011 | Black | minecraft:black_wool |
| 4 | 0100 | Brown | minecraft:brown_wool |
| 5 | 0101 | Red | minecraft:red_wool |
| 6 | 0110 | Orange | minecraft:orange_wool |
| 7 | 0111 | Yellow | minecraft:yellow_wool |
| 8 | 1000 | Lime | minecraft:lime_wool |
| 9 | 1001 | Green | minecraft:green_wool |
| A | 1010 | Cyan | minecraft:cyan_wool |
| B | 1011 | Light Blue | minecraft:light_blue_wool |
| C | 1100 | Blue | minecraft:blue_wool |
| D | 1101 | Purple | minecraft:purple_wool |
| E | 1110 | Magenta | minecraft:magenta_wool |
| F | 1111 | Pink | minecraft:pink_wool |

So, for example, if you wanted to write the sequence 1111 0000 1010 0001, it would be:

  • Pink wool
  • White wool
  • Cyan wool
  • Light Gray wool
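The same lookup can be written as a few lines of Python. This is only a sketch of the mapping described above, not the encoder from the repository; `WOOL_COLOURS` and `bits_to_wool` are names invented for this example.

```python
# Wool colours in mapping order: the list index is the four-bit value.
WOOL_COLOURS = [
    "white", "light_gray", "gray", "black",
    "brown", "red", "orange", "yellow",
    "lime", "green", "cyan", "light_blue",
    "blue", "purple", "magenta", "pink",
]


def bits_to_wool(bits):
    """Translate a string of 0s and 1s (spaces allowed) into wool block IDs."""
    cleaned = bits.replace(" ", "")
    if len(cleaned) % 4:
        raise ValueError("the bit string must be a multiple of four bits long")
    return [
        f"minecraft:{WOOL_COLOURS[int(cleaned[i:i + 4], 2)]}_wool"
        for i in range(0, len(cleaned), 4)
    ]


for block in bits_to_wool("1111 0000 1010 0001"):
    print(block)
# minecraft:pink_wool
# minecraft:white_wool
# minecraft:cyan_wool
# minecraft:light_gray_wool
```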

Thankfully, while there’s a lot of manual block placement involved for someone who is doing this by hand, it’s not too difficult overall to encode data this way. I built an encoder that will create an image you can reference to construct your data format as well.

Encoding data

Creating an mcfunction file

Running our Minecraft File Encoder

Encoding the data is fairly easy, and didn’t take up much time out of the admittedly far too long I spent on storing files in Minecraft in the first place. A hint as to what took far too much time can be seen in the image above, specifically at the number of decoders I tried to implement. We’ll get to that in a bit. However, you can see the encoder ran in the terminal at the bottom of the screen, an image was created, and an “mcfunction” file was generated. An “mcfunction” file is basically a script that can run all of the commands entered into it, so we can instantly place all of the blocks without needing to manually do it ourselves. The image is generated for reference so that you can manually place them, though, if you’d prefer.
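To give a feel for what such a file contains, here is a rough sketch that turns bytes into `setblock` commands, one per line, which is the form an mcfunction file takes. It is not the repository's encoder: the function name, the origin at X=0 and Z=0, the row-by-row layout along X and Z, and the white-wool padding are all assumptions made for illustration.

```python
# Wool colours in mapping order: the list index is the four-bit value.
WOOL_COLOURS = [
    "white", "light_gray", "gray", "black",
    "brown", "red", "orange", "yellow",
    "lime", "green", "cyan", "light_blue",
    "blue", "purple", "magenta", "pink",
]


def to_setblock_commands(data, cols=64, y=-60, origin_x=0, origin_z=0):
    """Build one setblock command per four-bit value, laid out row by row."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # high half first
        nibbles.append(byte & 0x0F)  # then the low half
    while len(nibbles) % cols:
        nibbles.append(0)            # assumed padding: white wool fills the last row
    commands = []
    for index, nibble in enumerate(nibbles):
        row, col = divmod(index, cols)
        block = f"minecraft:{WOOL_COLOURS[nibble]}_wool"
        commands.append(f"setblock {origin_x + col} {y} {origin_z + row} {block}")
    return commands


print("\n".join(to_setblock_commands(b"Hi", cols=4)))
# setblock 0 -60 0 minecraft:brown_wool
# setblock 1 -60 0 minecraft:lime_wool
# setblock 2 -60 0 minecraft:orange_wool
# setblock 3 -60 0 minecraft:green_wool
```

Writing the joined string to a file ending in `.mcfunction` inside a data pack would let the game place every block in one go.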

To invoke our encoder, we run the following command, which requires the Pillow module to be installed:

python3 encoder.py hello.txt --cols 12 --y -60

This tells the encoder to only use 12 columns at a time (it defaults to 64), and to use a Y level of -60, as I’m testing this in a superflat world. This is what the above looks like in-game:

Minecraft File Encoding

I added the blocks around the edge for testing purposes when it came to decoding, so really, what you’ll end up with is just the matrix of wool blocks. Depending on what your “cols” value is, it could be a lot wider. We’re finished with encoding now, so it’s time to try and decode our file.

Decoding files from Minecraft

A failed attempt at OCR, though reading world files works fine

Failed decode from a screenshot using OCR in Minecraft

This is where I ran into massive issues, and the solution I settled on is, sadly, not the one I originally wanted. I planned to use image recognition to identify the blocks placed in a screenshot, which is why I placed those different blocks around the edge: to mark the boundary of the wool matrix. It kind of worked once I used sklearn, but perspective meant that blocks appeared with slightly different lengths depending on their distance, so it wasn’t consistent. It would decode some of it, sometimes, and then other times, not be able to decode it at all. I spent far too much time on different image-based approaches, but I eventually ended up using Amulet, a Python library that can read directly from a world file.

This worked perfectly, though it has a few downsides. It’s not as simple as just screenshotting what’s in front of you and converting it back to a file, and it requires a lot more manual reconstruction if you want to share a file with a friend via Minecraft using a server, for example. Essentially, you’d need to screenshot it, rebuild it locally in your own world, and then reconstruct it with the decoder. Obviously, nobody would actually like to do that, but I’d also wager nobody is really jumping with joy at the thought of sharing files via Minecraft, even if it were possible to screenshot the wool matrix to pull the file. I just wanted to do it “right”, in an accessible way, and with no requirements to access actual world files.

As you can see below, though, pulling from the world file works perfectly, as you’d expect given the deterministic nature of being able to read individual blocks.

Minecraft decoder running

There are a couple of limitations when it comes to reading world files: you’ll need to define the X, Y, and Z coordinates of the top left of the wool matrix, choose the direction in which columns and rows advance (by default, X increases as you move across and Z increases as you move down), and define the height and width of the matrix. It’s quite a manual process, but it does work. When you first run the program, you’ll be asked for these details:

  • Top left X
  • Top left Y
  • Top left Z
  • Dimension [overworld/nether/end] (default = overworld)
  • Width (cols)
  • Height (rows)
  • col step dX dZ [1 0]
  • row step dX dZ [0 1]
  • Padding (trailing white-wool blocks to ignore, 0 for none)
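The prompts above describe a simple grid walk. The sketch below shows how a top-left corner, a width and height, and the two step vectors could be turned into the list of coordinates to read, in reading order. It deliberately leaves out the world access itself (the article uses Amulet for that), and `iter_positions` is a hypothetical helper, not a function from the repository.

```python
def iter_positions(top_left, width, height, col_step=(1, 0), row_step=(0, 1)):
    """Yield (x, y, z) for every cell, left to right and then top to bottom.

    col_step and row_step are (dX, dZ) pairs, mirroring the prompts above.
    """
    x0, y, z0 = top_left
    for row in range(height):
        for col in range(width):
            x = x0 + col * col_step[0] + row * row_step[0]
            z = z0 + col * col_step[1] + row * row_step[1]
            yield (x, y, z)


for position in iter_positions((10, -60, 20), width=3, height=2):
    print(position)
# (10, -60, 20)
# (11, -60, 20)
# (12, -60, 20)
# (10, -60, 21)
# (11, -60, 21)
# (12, -60, 21)
```

Swapping the step vectors, for example a column step of (0, 1) and a row step of (1, 0), would read a matrix that was built rotated by a quarter turn.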

You also need to point it at your world with the --world flag, so you run the script like this:

python3 .\decode_from_world.py --world '.\New World'

If the decoder comes across an unexpected block, it will raise an error showing which block it found, so that you can get a rough idea of what you need to tweak. You’ll also need to rename “decoded.bin” so that it has the original file’s extension. As previously mentioned, a file type is an external indicator to applications looking to interact with the file, and nothing more. The data stays the same no matter what the file type is. This is also why “containers”, when it comes to video formats, are so important, as they actually define a data structure, compression, and much more.
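Once the block IDs have been read in order, turning them back into bytes is the encoding step in reverse. The sketch below assumes the blocks arrive as a plain list of Java block IDs and that padding is a count of trailing blocks to drop; `decode_blocks` and its error messages are illustrative, not the repository's actual code.

```python
# Wool colours in mapping order: the list index is the four-bit value.
WOOL_COLOURS = [
    "white", "light_gray", "gray", "black",
    "brown", "red", "orange", "yellow",
    "lime", "green", "cyan", "light_blue",
    "blue", "purple", "magenta", "pink",
]
NIBBLE_BY_BLOCK = {
    f"minecraft:{colour}_wool": value for value, colour in enumerate(WOOL_COLOURS)
}


def decode_blocks(block_ids, padding=0):
    """Turn an ordered list of wool block IDs back into bytes."""
    nibbles = []
    for index, block_id in enumerate(block_ids):
        if block_id not in NIBBLE_BY_BLOCK:
            # Name the offending block so the grid settings can be adjusted.
            raise ValueError(f"unexpected block {block_id!r} at index {index}")
        nibbles.append(NIBBLE_BY_BLOCK[block_id])
    if padding:
        nibbles = nibbles[:-padding]  # drop the trailing padding blocks
    if len(nibbles) % 2:
        raise ValueError("odd number of blocks left after removing padding")
    return bytes(
        (nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2)
    )


blocks = [
    "minecraft:pink_wool", "minecraft:white_wool",
    "minecraft:cyan_wool", "minecraft:light_gray_wool",
    "minecraft:white_wool", "minecraft:white_wool",  # two padding blocks
]
print(decode_blocks(blocks, padding=2).hex())  # f0a1
```

The padding count has to be supplied by hand here because white wool is also a valid data value (0000), so trailing white blocks cannot be told apart from real zeros automatically.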

Minecraft decoded.bin in hexeditor

Once we run our decoder, we can see our output, calculated from mapping each wool block to a hex value and then writing that to a file called decoded.bin:

Hi there, this is a test file to show encoding a file in Minecraft!

While we know it decoded, so it worked, we can even see the hex values and compare them to our wool map. Our file starts off as “48 69 20 74” in hex, which corresponds to:

  • Brown wool
  • Lime wool
  • Orange wool
  • Green wool
  • Gray wool
  • White wool
  • Yellow wool
  • Brown wool

Which matches the blocks that we placed in the game.

Files can be represented by anything

It’s all about knowing how to retrieve it

As we’ve seen previously, files can be represented by anything. If you can define your own structure for reading those files, you can store anything in any format. A string of LEDs can represent 0s and 1s based on their state, or a water bottle could represent two bits of data based on whether it’s empty, a quarter full, half full, or completely full. So long as you know what it means, you can tell others too, and they can interpret the represented data the same way that you can.
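The water bottle idea works the same way as the wool, just with four symbols instead of 16. In this sketch each byte becomes four bottles; which fill level stands for which two-bit value is an arbitrary choice made for the example.

```python
# Assumed assignment: the list index is the two-bit value each level represents.
LEVELS = ["empty", "quarter full", "half full", "full"]


def bytes_to_bottles(data):
    """Encode each byte as four bottles, most significant bits first."""
    bottles = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bottles.append(LEVELS[(byte >> shift) & 0b11])
    return bottles


def bottles_to_bytes(bottles):
    """Decode a list of fill levels back into the original bytes."""
    if len(bottles) % 4:
        raise ValueError("need four bottles per byte")
    values = [LEVELS.index(level) for level in bottles]
    return bytes(
        (values[i] << 6) | (values[i + 1] << 4) | (values[i + 2] << 2) | values[i + 3]
        for i in range(0, len(values), 4)
    )


shelf = bytes_to_bottles(b"H")
print(shelf)                    # ['quarter full', 'empty', 'half full', 'empty']
print(bottles_to_bytes(shelf))  # b'H'
```

Fewer distinct symbols just means more of them per byte: four bottles here, against two wool blocks in the Minecraft version.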

This project isn’t meant to be used in its current form. In fact, I’d go so far as to say that you should never use a game to send files to people, especially not in such a tedious manner. Instead, this serves to demonstrate how files can be uniquely stored. If you’re interested in checking out the code I wrote for this project, it’s available on GitHub.
