Some basics about ZFS
A couple of days ago my backup program, restic, ran into errors, something like an unexpected end of file while reading. It turned out that the problem wasn't in the backup itself but in the filesystem on the server where the backup files are stored. On that server I back up to a small ZFS mirror of two 2TB disks.
The first thing to do was to check the status of the pool, which in my case is called dataone:
zpool status dataone
The response looks something like this when the pool is healthy:
        NAME            STATE     READ WRITE CKSUM
        dataone         ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            gpt/data_a  ONLINE       0     0     0
            gpt/data_b  ONLINE       0     0     0
In my case there were CKSUM errors on both disks. A CKSUM error means that the data read back has a different checksum than the one it was stored with. Had the errors occurred on only one disk, ZFS would have corrected them automatically from the other half of the mirror. Unfortunately the problem was somewhere else this time, and the damage was done.
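To spot non-zero CKSUM counters without reading the whole table, something like the following could be used. It is only a sketch: the function name cksum_errors is made up for illustration, and it assumes the column layout shown above (device name first, CKSUM fifth).

```sh
#!/bin/sh
# Hypothetical helper, not part of ZFS: reads `zpool status` output on stdin
# and prints "<device> <count>" for every row whose CKSUM column is not 0.
# Assumes the five-column layout NAME STATE READ WRITE CKSUM.
cksum_errors() {
    awk '$2 ~ /^(ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ && NF >= 5 && $5 != "0" { print $1, $5 }'
}
```

It would be used as `zpool status dataone | cksum_errors`; no output means no checksum errors were counted.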
With the -v switch:
zpool status -v dataone
the system also reports the files in which problems occurred. I deleted all the files reported there. The command still lists them afterwards, but under pseudo-names enclosed in <>. I then played with the backup, but there were still errors, so I decided to move all the files to my portable disk and recreate the pool.
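If the list of damaged files is long, it can be pulled out of the -v output with a small filter. This is a sketch under assumptions: the function name damaged_files is invented here, and it relies on the list following a line containing "Permanent errors have been detected", which is how the zpool versions I know print it.

```sh
#!/bin/sh
# Hypothetical helper: reads `zpool status -v` output on stdin and prints one
# damaged file (or <...> pseudo-name) per line, with the indentation removed.
# Assumes the list starts after the "Permanent errors have been detected" line.
damaged_files() {
    awk '
        /Permanent errors have been detected/ { listing = 1; next }
        listing && NF { sub(/^[ \t]+/, ""); print }
    '
}
```

It would be used as `zpool status -v dataone | damaged_files`.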
I followed the advice from a cheat sheet about labeling partitions, so basically I destroyed all datasets on the pool, destroyed the pool, relabeled my two disks as data_a and data_b, and created a new pool as a mirror:
zfs destroy -r dataone
zpool destroy dataone
gpart create -s gpt ada0
gpart create -s gpt ada1
gpart add -t freebsd-zfs -l data_a ada0
gpart add -t freebsd-zfs -l data_b ada1
zpool create dataone mirror gpt/data_a gpt/data_b
Then I copied the content back from the portable disk to the pool. By the way, the portable disk was formatted with btrfs, which uses CRC checksums too.
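Since zpool create refers to the disks by their GPT labels, it may be worth checking that both labels really exist before creating the pool. On FreeBSD they show up as entries under /dev/gpt. The helper below is a sketch; the name labels_present and its argument order are my own invention.

```sh
#!/bin/sh
# Hypothetical helper: checks that every given label exists in a directory.
# Usage: labels_present <dir> <label>...
# Prints the first missing entry and returns 1, otherwise returns 0.
labels_present() {
    dir=$1
    shift
    for label in "$@"; do
        if [ ! -e "$dir/$label" ]; then
            echo "missing: $dir/$label"
            return 1
        fi
    done
    return 0
}
```

For this pool the call would be `labels_present /dev/gpt data_a data_b`; a mistyped label would be reported here instead of making zpool create fail.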
Now it was time to check the state of the disks, which in ZFS is the job of the scrub routine. So I ran
zpool scrub dataone
and waited. Progress and the estimated time remaining can be checked with zpool status. A scrub basically reads all the data on the pool, and any problem it finds gets reported. I checked later and there were still errors. I then found out that there was a problem with the wires from the power supply: moving them slightly was enough to kill the motherboard or bring it back to life. I fixed the wires to one side as a temporary measure (I will replace the power supply later), deleted all the files from the pool, and copied them over once again. This time the scrub didn't report any problems, and there have been none since. It is recommended to run a scrub from time to time so that possible problems are found early.
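On FreeBSD the regular scrub does not have to be started by hand; the periodic(8) daily scripts can do it. The fragment below is a sketch of what could go into /etc/periodic.conf, assuming the stock periodic scripts are in place. The 35-day threshold is simply the shipped default, not a recommendation of mine.

```sh
# /etc/periodic.conf (FreeBSD): let the daily periodic run start scrubs.
daily_scrub_zfs_enable="YES"            # turn the scrub job on (default is NO)
daily_scrub_zfs_pools="dataone"         # pools to scrub; empty means all pools
daily_scrub_zfs_default_threshold="35"  # minimum days between two scrubs
```

The result of each scrub still has to be looked at with zpool status, or in the daily periodic mail if that is set up.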
Links: