What is the Library of Babel?
Read the home page first.
The Library of Babel is a fictional library that contains every possible unique book of 1,312,000 characters. Each book has 410 pages of 40 lines of 80 characters each. 410 × 40 × 80 = 1,312,000. As each book consists of a combination of the same 32 characters, the total number of unique books is 321,312,000. For comparison, it is estimated that there are around 1080 atoms in the observable universe.
It contains every book that has ever been written, and every book that ever will be written.
How does it work?
Most importantly, the library, and it’s books, are not stored anywhere — more on that below. Instead, a book is generated by a mathematical function when you view it. Once you view it, it is still not stored anywhere — the same way that by computing 2 × 5, the resulting 10 is not being stored anywhere, it is simply the output of the function. The next person to visit that same book will compute the same function again, and get the same answer.
In the most simple terms:
- Each book is given a numerical index. The first book on the first shelf is book 1, the first book on the second shelf is 33 and so on, until the last book in the entire library — which has the index 321,312,000.
- This (usually very large) number is run through a mathematical function to produce another unique number of 1,312,000 digits, which we represent in base-32 (we generally count in base-10, e.g. 0 through 9. Base-32 just means we count with more numerals, in this case 0-9 and then A-V).
- Each digit of the base-32 result is mapped to the character at that position in the limited 32 character alphabet (0 → a, 1 → b, ...) to produce the content of that book.
Crucially, the mathematical function is reversible meaning that we can instead give it the contents of a book and work backwards determine the numerical index of the book that it appears in.
Once a book is generated from it’s index, it is split up and displayed page by page. Each page is given an identifier derived from the numerical index, in the format room.wall.shelf.book.page e.g. 1.2.3.4.5. This is used in the URL of that page.
For a more detailed explanation of the mathematical functions used and to view the source code, please see the GitHub repository.
How do I know it’s not fake? Is it just creating pages with my search terms and saving them so I can see them again later?
It is easy to assume that this website just takes whatever text you search for, inserts it into a random page, and saves that page on disk so that you can find it again at a later date. Similarly with pages of gibberish, what’s to say they aren’t just generated randomly when you request them and saved for later?
Building a Library of Babel this way would require unobtainable amounts of storage. A single book’s content takes up 1,312,000 bytes (1.31 MB), with a single byte per ASCII character. 1,312,000 bytes multiplied by 321,312,000 unique books gives us a total size of 1.753×101,974,750 TB — an infeasible amount of data to store, to say the very least.
Thus to build a virtual Library of Babel, we have to use a method that doesn’t require any storage. Books are generated on the fly based on their numerical index, and the same index will continue to generate the same book forever, unless the algorithm is changed.
The page URLs seem quite short. How can these possibly map to so many unique books?
Well spotted. On this website, the room identifiers are represented as SHA-256 hashes. The actual room identifiers get incredibly long, up to roughly a million characters. This makes them much too long to be used in URLs, so instead a ‘bookmark’ is created and stored for the room in the form of a hash when it is first visited. This hash is used in place of the room identifier in the page URL.
However, there are orders of magnitude more possible room identifiers than there are possible hashes. In theory, eventually when enough rooms are discovered there will be collisions (2 inputs map to the same output hash) and they will start to be overwritten as new, undiscovered rooms are accessed for the first time. In reality, this limit will never be reached.
What makes this version of the Library different?
As far as I am aware, this is the only implementation of the Library that contains every unique book, rather than just every unique page. This results in a much larger library, and means that you can search for content longer than a single page.
This comes as a result of re-writing the calculations to use GMP, a powerful multiple precision arithmetic library for C/C++. Porting the original pure-JS core logic to use GMP allows for much faster and more efficient calculations, making working with the very large numbers required possible.
Where can I learn more about the Library of Babel?
If you haven’t already read the story then start there! The website libraryofbabel.info also contains lots of supplemental information and theory, as well as another (smaller) implementation of the library itself.