# `DumbDBM_File` – Portable DBM implementation

With `dumbdbm` / `dbm.dumb`, the Python programming language provides a very simple DBM-style database (i.e. a key-value database) written entirely in Python and requiring no external library. Being slow and rather *dumb*, it is intended as a last-resort fallback if no other (more robust) database module such as GDBM, NDBM or Berkeley DB is available.

In 2011, when I was bored, I translated the Python module to the Perl programming language. The result was a module named `DumbDBM_File`, providing a `tie()`-compatible interface for DumbDBM files. This Perl implementation is fully compatible with the original Python one (and has the same problems, see *Bugs and problems*).

Be aware that this is a fun project. I wrote it because I wanted to see whether I could do it, and I published it on GitHub in 2019 because I thought it could be interesting for learning purposes. If possible, please consider using a proper database system.

## Synopsis of `DumbDBM_File`

```
use DumbDBM_File;

# Opening a database file called "homer.db"
# Creating it if necessary

my %db;
tie(%db,'DumbDBM_File','homer.db');

# Assigning some values

$db{'name'} = 'Homer';
$db{'wife'} = 'Marge';
$db{'child'} = 'Bart';
$db{'neighbor'} = 'Flanders';

# Print value of "name": Homer

print $db{'name'};

# Overwriting a value

$db{'child'} = 'Lisa';

# Remove a value
# The value remains in the database file, just the index entry gets removed,
# meaning you can't retrieve the value from the database file any more

delete($db{'neighbor'});

# Close the database file

untie %db;
```

## Bugs and problems

This module is a direct port of the Python module and has the same bugs and problems:

* It seems to contain a bug when updating (I don't know what the bug actually is; I took this information directly from a comment in `dumbdbm`'s source code)
* Free space is not reclaimed
* Concurrent access is not supported (if two processes access the database, they may mess up the index)
* The module always reads the whole index file, and some updates rewrite the whole index
* There is no read-only mode

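Because concurrent access is not supported, callers have to serialize access themselves. The following is a minimal sketch of one possible workaround, not something `DumbDBM_File` provides: it wraps the database access in an advisory lock using Perl's core `flock`. The helper name `with_exclusive_lock` and the lock file name `homer.db.lock` are made up for illustration, and the lock only helps if every process accessing the database uses the same convention.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Run a piece of code while holding an exclusive advisory lock.
# The lock file is a hypothetical convention and not part of the
# DumbDBM file format.
sub with_exclusive_lock {
    my ($lockfile, $code) = @_;

    open(my $lock, '>>', $lockfile) or die "Cannot open $lockfile: $!";
    flock($lock, LOCK_EX)           or die "Cannot lock $lockfile: $!";

    my @result = eval { $code->() };
    my $error  = $@;

    close($lock);    # closing the handle releases the lock
    die $error if $error;

    return @result;
}

# Possible usage (assumes DumbDBM_File is installed):
#
# with_exclusive_lock('homer.db.lock', sub {
#     my %db;
#     tie(%db, 'DumbDBM_File', 'homer.db');
#     $db{'child'} = 'Maggie';
#     untie %db;
# });
```
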
## Format description

### Files

A database called `example` consists of up to three files:

#### `example.dir`

This is the index file containing the information needed to retrieve the values from the database. It is a text file containing the key, the file offset and the size of each value.

#### `example.dir.bak`

This file **may** contain a backup of the index file.

#### `example.dat`

This is the database file containing the values separated by zero-bytes (meaning `\0`).

### Index file

The index file is a text file. It contains only the keys, not the values.

Each line describes a key and where to find its value in the database file:

`'key', (pos, siz)`

* `key`: Key of the data tuple
* `pos`: Byte offset in the database file where the value is located
* `siz`: Size of the value in bytes

When searching for a value in the database, only the index file is considered. If a key does not exist in the index file, the corresponding value cannot be retrieved from the database file anymore.

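To illustrate the lookup, here is a minimal sketch of how one index line could be parsed and used to read a value. It is not taken from the module's source. The function names `parse_index_line` and `read_value` are made up, and the pattern assumes a simple single-quoted key without escaped characters. An in-memory string stands in for the database file.

```perl
use strict;
use warnings;

# Parse one index line of the form: 'key', (pos, siz)
# Assumption: the key is single-quoted and contains no escaped characters.
sub parse_index_line {
    my ($line) = @_;
    if ($line =~ /^'([^']*)',\s*\((\d+),\s*(\d+)\)\s*$/) {
        return ($1, $2, $3);    # key, byte offset, size
    }
    return;                     # line not understood
}

# Read $siz bytes at offset $pos from an already opened database file
sub read_value {
    my ($fh, $pos, $siz) = @_;
    seek($fh, $pos, 0) or die "seek failed: $!";
    my $value = '';
    my $got = read($fh, $value, $siz);
    die "short read" unless defined $got && $got == $siz;
    return $value;
}

# Demo: an in-memory "database file" holding one value in its first block
my $dat = 'Homer' . ("\0" x 507);
open(my $fh, '<', \$dat) or die "Cannot open in-memory file: $!";
binmode($fh);

my ($key, $pos, $siz) = parse_index_line("'name', (0, 5)");
print "$key => ", read_value($fh, $pos, $siz), "\n";    # name => Homer
close($fh);
```
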
### Database file

The database file is a binary file consisting of blocks with a size of 512 bytes by default. It contains only the values, not the keys.

A value is written into a block. If the value is too big for one block, more than one block is used: a value of 511 bytes uses one block, a value of 512 bytes also uses one block, but a value of 513 bytes uses two blocks. If the last block of a value is not completely used, it is filled with zero-bytes.

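The block arithmetic can be sketched as a rounded-up division. This is an illustration of the rule described above, not code from the module; the names `BLOCKSIZE`, `blocks_needed` and `padding_bytes` are made up.

```perl
use strict;
use warnings;

use constant BLOCKSIZE => 512;    # default block size

# Number of blocks a value of $size bytes occupies (rounded up)
sub blocks_needed {
    my ($size) = @_;
    return int(($size + BLOCKSIZE - 1) / BLOCKSIZE);
}

# Number of zero-bytes needed to fill the last block of the value
sub padding_bytes {
    my ($size) = @_;
    return blocks_needed($size) * BLOCKSIZE - $size;
}

for my $size (511, 512, 513) {
    printf "%d bytes: %d block(s), %d padding byte(s)\n",
        $size, blocks_needed($size), padding_bytes($size);
}
# 511 bytes: 1 block(s), 1 padding byte(s)
# 512 bytes: 1 block(s), 0 padding byte(s)
# 513 bytes: 2 block(s), 511 padding byte(s)
```
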
When a value is modified and the new value fits into the old set of blocks, the old blocks are reused. Otherwise, a new set of blocks is placed at the end of the file.

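A minimal sketch of this decision follows, assuming that "fits" means the new value needs no more blocks than the old one and that appended values start at the next block boundary. The function name `placement_for_update` is made up, and the code is not taken from the module's source.

```perl
use strict;
use warnings;

use constant BLOCKSIZE => 512;    # default block size

# Number of blocks a value of $size bytes occupies (rounded up)
sub blocks_needed {
    my ($size) = @_;
    return int(($size + BLOCKSIZE - 1) / BLOCKSIZE);
}

# Return the byte offset at which an updated value would be written.
# $old_pos, $old_siz: current index entry of the key
# $new_siz:           size of the new value
# $file_size:         current size of the database file
sub placement_for_update {
    my ($old_pos, $old_siz, $new_siz, $file_size) = @_;

    # The new value fits into the old set of blocks: reuse them
    return $old_pos if blocks_needed($new_siz) <= blocks_needed($old_siz);

    # Otherwise append at the end of the file
    # (assumption: rounded up to the next block boundary)
    return blocks_needed($file_size) * BLOCKSIZE;
}

print placement_for_update(0, 5, 400, 1024), "\n";    # 0: old block is reused
print placement_for_update(0, 5, 600, 1024), "\n";    # 1024: appended at the end
```
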
Currently, when a value is removed from the database, only its entry in the index file is removed, meaning that the value is still in the database file. It becomes inaccessible, and the corresponding blocks are lost. A similar thing happens when a value is moved to different blocks: the index file points to the value in the new blocks, while the old blocks remain in the database file but are no longer accessible.

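To get a feeling for how much space is lost this way, the index entries can be compared with the size of the database file. The following is only a sketch under the assumptions of this format description; the function name `lost_blocks` is made up and nothing like it exists in the module.

```perl
use strict;
use warnings;

use constant BLOCKSIZE => 512;    # default block size

# Count the blocks of the database file that no index entry points to.
# $entries:   array reference of [pos, siz] pairs taken from the index file
# $file_size: size of the database file in bytes
sub lost_blocks {
    my ($entries, $file_size) = @_;

    my %used;
    for my $entry (@$entries) {
        my ($pos, $siz) = @$entry;
        next unless $siz > 0;    # empty values occupy no block in this sketch

        my $first = int($pos / BLOCKSIZE);
        my $last  = int(($pos + $siz - 1) / BLOCKSIZE);
        $used{$_} = 1 for $first .. $last;
    }

    my $total = int(($file_size + BLOCKSIZE - 1) / BLOCKSIZE);
    return $total - scalar(keys %used);
}

# Four blocks in the file; block 1 is not referenced by any entry
my @entries = ([0, 5], [1024, 600]);
print lost_blocks(\@entries, 2048), "\n";    # 1
```
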
## License

The original Python module is licensed under the terms of the Python Software License: https://www.python.org/psf/license/

The Perl implementation `DumbDBM_File` is licensed under the terms of the 2-Clause BSD License (see file *LICENSE*).

## Credits

* `DumbDBM_File`: Patrick Canterino, https://www.patrick-canterino.de/
* `dumbdbm` / `dbm.dumb` (original Python implementation): Python Software Foundation
