SDFS Usage & Modification Guide
Author: Tinoryj
Documentation List of SDFS
Official Guide & Docs
- opendedup website on GitHub
- opendedup official site
- Google Discussion Group
Related Documentation
- GitLab: SDFS System Research Report
System Environment
- Ubuntu 14.04.5
- Running on VMware Fusion 11.0 for macOS
- Standalone installation mode
- IntelliJ IDEA 2018.1
Modification Aims
- Output the chunks' processing order after chunking and before in-file deduplication
- Output the chunks' processing order after in-file deduplication and before cross-file deduplication
- Output the chunks' processing order when a chunk is about to be stored on disk
Install SDFS
In Standalone Mode
This installation can be overwritten later by installing the deb package again.
Step 1: Change to the directory where the SDFS deb package is stored.
Step 2: Install SDFS and its dependencies:
sudo apt-get install fuse libfuse2 ssh openssh-server jsvc libxml2-utils
sudo dpkg -i sdfs-version.deb
Step 3: Change the maximum number of open files allowed
echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
Build SDFS
Basic Environment for Packaging
To package this system, the following dependencies need to be installed.
Maven – Java Package Management Tool
To install Maven, a JDK is needed.
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
Use the following command to check the Java environment:
java -version
If the installation succeeded, output similar to the following is shown:
openjdk version "1.8.0_01-internal"
OpenJDK Runtime Environment (build 1.8.0_01-internal-b04)
OpenJDK 64-Bit Server VM (build 25.40-b08, mixed mode)
Then install Maven and verify the installation:
sudo apt-get install maven
mvn -version
FPM – Package Creator
Install it via apt-get and gem (Ruby).
sudo apt-get install ruby-dev build-essential
sudo gem install fpm
Build System deb Package
Modify build.sh
To create the deb package, use the modified build.sh shell script in /install-packages/:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
VERSION=3.7.8
DEBFILE="sdfs_${VERSION}_amd64.deb"
echo $DEBFILE
sudo rm -rf deb/usr/share/sdfs/lib/*
cd ../
mvn package
cd install-packages
sudo cp ../target/lib/b2-2.0.3.jar deb/usr/share/sdfs/lib/
sudo cp ../target/sdfs-${VERSION}-jar-with-dependencies.jar deb/usr/share/sdfs/lib/sdfs.jar
echo
sudo rm *.deb
sudo rm deb/usr/share/sdfs/bin/libfuse.so.2
sudo rm deb/usr/share/sdfs/bin/libulockmgr.so.1
sudo rm deb/usr/share/sdfs/bin/libjavafs.so
sudo cp DEBIAN/libfuse.so.2 deb/usr/share/sdfs/bin/
sudo cp DEBIAN/libulockmgr.so.1 deb/usr/share/sdfs/bin/
sudo cp DEBIAN/libjavafs.so deb/usr/share/sdfs/bin/
sudo cp ../src/readme.txt deb/usr/share/sdfs/
sudo fpm -s dir -t deb -n sdfs -v $VERSION -C deb/ -d fuse -d libxml2 -d libxml2-utils --vendor datishsystems --deb-no-default-config-files
The key point when building the deb package is adding --deb-no-default-config-files to the fpm invocation on line 22 of the script.
Add the jre Package
The original build package does not contain the jre package in /install-packages/deb/usr/share/sdfs/bin/ (because of .gitignore rules). So you may need to install the official release first and copy /usr/share/sdfs/bin/jre into your repository's /install-packages/deb/usr/share/sdfs/bin/ (the uploaded jre package may be broken or unsuitable for your environment).
Modify pom.xml for the Maven Project
In lines 268 & 269, the two paths may cause mvn package errors:
- change the scriptSourceDirectory path to ./scripts
- create a ./test/java directory in the SDFS project
After the modification, the two lines look like this:
<scriptSourceDirectory>./scripts</scriptSourceDirectory>
<testSourceDirectory>./test/java</testSourceDirectory>
Build and Install
Use the following commands to build your SDFS deb package and install it:
cd ./install-packages
sudo ./build.sh
sudo dpkg -i sdfs_3.7.8_amd64.deb
Step 3 (line 3) can be run any number of times; it simply overwrites the old installation.
The first build of the deb package may take a long time because Maven has to download the required packages; all of them are stored in ~/.m2/. After that, the package builds quickly since nothing needs to be downloaded again.
Modify SDFS Files
Important Data Structures
Finger
This data structure is almost the same as the normal chunk data structure in CD-Store and REED. It is implemented in Finger.java; the most important fields of the structure are shown below:
private static final byte[] k = new byte[HashFunctionPool.hashLength];
public byte[] chunk; // chunk's logic data
public byte[] hash; // chunk's hash fingerprint
public InsertRecord hl;
public int start; // start position in single file
public int len; // chunk logic size
public int ap;
public boolean noPersist;
public AsyncChunkWriteActionListener l;
public int claims = -1; // Times the chunk appears in the file
public String lookupFilter = null;
public String uuid = null; // The file containing the chunk
HashLocPair
This data structure is an intermediate structure used during chunk processing. It is implemented in HashLocPair.java; the most important fields are shown below:
public class HashLocPair implements Comparable<HashLocPair>, Externalizable {
public static final int BAL = HashFunctionPool.hashLength + 8 + 4 + 4 + 4
+ 4;
public byte[] hash; // chunk's hash fingerprint
public byte[] hashloc; // hashed position
public byte[] data; // chunk's logic data
public int len; // chunk's logic data size
public int pos;
public int offset;
public int nlen;
private boolean dup = false;
public boolean inserted = false;
The way to get chunks
The default way to obtain chunks is based on Rabin fingerprinting; it can also be switched to fixed-size chunking by editing the XML configuration files. In VariableHashEngine.java (line 61):
public List<Finger> getChunks(byte [] b,String lookupFilter,String uuid) throws IOException {
final ArrayList<Finger> al = new ArrayList<Finger>();
ff.getChunkFingerprints(b, new EnhancedChunkVisitor() {
public void visit(long fingerprint, long chunkStart, long chunkEnd, byte[] chunk) {
byte[] hash = getHash(chunk);
Finger f = new Finger(lookupFilter,uuid);
f.chunk = chunk;
f.hash = hash;
f.len = (int) (chunkEnd - chunkStart);
f.start = (int) chunkStart;
al.add(f);
}
});
return al;
}
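For illustration, a hypothetical call site is sketched below. The engine instance, the input file path, and the uuid string are placeholders and are not taken from the SDFS sources; only the getChunks signature and the Finger fields come from the listing above.
// Hypothetical usage of getChunks; engine, the input path and "example-uuid" are placeholders.
byte[] buf = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("/tmp/sample.bin"));
java.util.List<Finger> fingers = engine.getChunks(buf, null, "example-uuid");
for (Finger f : fingers) {
// start offset and logical length of each variable-sized chunk
System.out.println(f.start + "\t" + f.len);
}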
Chunk Order Output After Chunking
In-File Deduplication
SDFS implements the in-file deduplication process in SparseDedupFile.java. In the public void writeCache(WritableCacheBuffer writeBuffer) function, the in-file dedup is implemented as follows:
HashMap<ByteArrayWrapper, Finger> mp = new HashMap<ByteArrayWrapper, Finger>();
for (Finger f : fs) {
ByteArrayWrapper ba = new ByteArrayWrapper(f.hash);
Finger _f = mp.get(ba);
if (_f == null) {
f.claims = 1;
mp.put(ba, f);
} else {
_f.claims++;
mp.put(ba, _f);
}
}
ByteArrayWrapper is a wrapper around a byte array; by itself it does not deduplicate chunks. In this code, ba wraps the hash of the currently processed chunk. mp.get() checks whether that hash (the key) already exists in the HashMap; if it does not, the hash (key) and the Finger (value) are inserted. The claims field of the Finger records how many times the chunk appears in the current HashMap, i.e. within the current write buffer.
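For clarity, a minimal sketch of what such a wrapper typically looks like is shown below. This is a hypothetical reduction, not the actual SDFS class: it just stores the byte array and implements equals/hashCode by content so that the array can serve as a HashMap key.
import java.util.Arrays;
// Minimal hypothetical sketch of a byte-array wrapper usable as a HashMap key.
// The real SDFS ByteArrayWrapper may differ; this only illustrates the idea.
public final class ByteArrayWrapper {
private final byte[] data;
public ByteArrayWrapper(byte[] data) {
this.data = data;
}
@Override
public boolean equals(Object o) {
return o instanceof ByteArrayWrapper && Arrays.equals(data, ((ByteArrayWrapper) o).data);
}
@Override
public int hashCode() {
return Arrays.hashCode(data);
}
}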
Output Position
In SparseDedupFile.java, add a byte-array-to-hex-string helper function bytesToHex after line 399:
private final static char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Add the following output code after line 414:
String metaDataPath = "/sdfsTemp/metaData/" + this.GUID;
for (Finger f : fs) {
try {
FileWriter fw = new FileWriter(metaDataPath, true);
fw.write(Integer.toString(f.start));
fw.write("\t");
fw.write(Integer.toString(f.len));
fw.write("\t");
fw.write(Integer.toString(f.chunk.length));
fw.write("\t");
fw.write(bytesToHex(f.hash));
fw.write("\n");
fw.close();
} catch (IOException e) {
e.printStackTrace();
}
}
// ··· (the rest of writeCache() continues unchanged)
This outputs the post-chunking chunk order to /sdfsTemp/metaData/GUID for each single file; every line records the chunk's start offset, logical length, chunk buffer size, and hex-encoded hash, separated by tabs.
Chunk Order Output Before Cross-File Deduplication (HCServiceProxy Stage)
Cross-File Deduplication
In the writeCache function of SparseDedupFile.java, after the in-file deduplication via the HashMap, the chunks left in the HashMap (only unique chunks) are submitted to a thread pool for the next deduplication step (DSE cross-file deduplication) through the FingerPersister runnable, which is defined in the Finger class. In that runnable, every chunk is handed to HCServiceProxy to look up duplicates and write unique chunks to the chunk store. Because this work is performed by multiple threads and every task calls the HCServiceProxy.writeChunk function, we add the output wherever that function is called.
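As a rough illustration of this hand-off, a hypothetical sketch is shown below. It is not the actual FingerPersister code: the executor, the method name persistFingers, and the mapping of the Finger fields onto the writeChunk parameters are assumptions for illustration only.
// Hypothetical sketch of the hand-off described above; not the actual FingerPersister code.
private void persistFingers(java.util.Collection<Finger> uniqueFingers, java.util.concurrent.ExecutorService executor) {
for (Finger f : uniqueFingers) {
executor.execute(() -> {
try {
// writeChunk signature as quoted in the next code listing; the argument mapping is assumed
f.hl = HCServiceProxy.writeChunk(f.hash, f.chunk, f.claims, f.lookupFilter, f.uuid);
} catch (Exception e) {
e.printStackTrace(); // the real code logs through SDFSLogger
}
});
}
}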
Output Position
In HCServiceProxy.java, add the same byte-array-to-hex-string helper function bytesToHex after line 49:
private final static char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
After line 245, add two kinds of output for HCServiceProxy:
- all chunks (at /sdfsTemp/dedup/ComeChunks-HC-All-Chunks)
- the chunks of a single file (at /sdfsTemp/dedup/ComeChunks-HC-uuid)
public static InsertRecord writeChunk(byte[] hash, byte[] aContents, int ct, String guid,String uuid)
throws IOException, HashtableFullException {
String metaDataPath = "/sdfsTemp/dedup/ComeChunks-HC-" + uuid;
String metaDataPath2 = "/sdfsTemp/dedup/ComeChunks-HC-All-Chunks";
try {
FileWriter fw = new FileWriter(metaDataPath, true);
fw.write(bytesToHex(hash));
fw.write("\n");
fw.close();
} catch (IOException en) {
en.printStackTrace();
}
try {
FileWriter fw = new FileWriter(metaDataPath2, true);
fw.write(bytesToHex(hash));
fw.write("\n");
fw.close();
} catch (IOException en) {
en.printStackTrace();
}
// doop = HCServiceProxy.hcService.hashExists(hash);
if (guid != null && Main.enableLookupFilter) {
InsertRecord ir = LocalLookupFilter.getLocalLookupFilter(guid).put(hash, aContents, ct,uuid);
return ir;
} else
return HCServiceProxy.hcService.writeChunk(hash, aContents, false, ct,uuid);
}
Chunk Order Output After Deduplication (Write-to-Disk Stage)
The chunk store on the server side is built on top of the AbstractChunkStore template class.
For the standalone mode, SDFS uses the BatchFileChunkStore implementation.
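For reference, a rough sketch of the interface shape is given below. Only writeChunk mirrors the signature quoted later in this section; the other members (getChunk, deleteChunk, close) and their signatures are assumptions about what a chunk-store backend typically provides.
import java.io.IOException;
// Illustrative shape of a chunk-store backend; only writeChunk mirrors the quoted code,
// the remaining methods and their signatures are assumptions.
public interface ChunkStoreSketch {
// store a unique chunk and return the id of the archive/block it was written to
long writeChunk(byte[] hash, byte[] chunk, int len, String uuid) throws IOException;
// read a chunk back by its hash and stored location (assumed signature)
byte[] getChunk(byte[] hash, long start, int len) throws IOException;
// drop a chunk that is no longer referenced (assumed signature)
void deleteChunk(byte[] hash, long start, int len) throws IOException;
// shut the backend down cleanly (assumed)
void close();
}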
In BatchFileChunkStore.java, SDFS implements the functions that store the unique chunks of all files in the mounted volume. The encryption and compression methods can also be configured here (the default is no encryption).
public class BatchFileChunkStore implements AbstractChunkStore, AbstractBatchStore, Runnable {
private String name;
boolean compress = false;
boolean encrypt = false;
private HashMap<Long, Integer> deletes = new HashMap<Long, Integer>();
boolean closed = false;
boolean deleteUnclaimed = true;
File staged_sync_location = new File(Main.chunkStore + File.separator + "syncstaged");
File container_location = new File(Main.chunkStore);
int checkInterval = 15000;
public boolean clustered;
private int mdVersion = 0;
Whenever a unique chunk is about to be stored on the logical disk, SDFS calls the writeChunk function in BatchFileChunkStore.java (line 150).
@Override
public long writeChunk(byte[] hash, byte[] chunk, int len, String uuid) throws IOException {
try {
return HashBlobArchive.writeBlock(hash, chunk, uuid);
} catch (HashExistsException e) {
throw e;
} catch (Exception e) {
SDFSLogger.getLog().warn("error writing hash", e);
throw new IOException(e);
}
}
The write function is in HashBlobArchive.java (line 734):
public static long writeBlock(byte[] hash, byte[] chunk, String uuid) throws IOException, ArchiveFullException, ReadOnlyArchiveException {
if (closed)
throw new IOException("Closed");
Lock l = slock.readLock();
l.lock();
if (uuid == null || uuid.trim() == "") {
uuid = "default";
}
try {
for (;;) {
try {
HashBlobArchive ar = writableArchives.get(uuid);
ar.putChunk(hash, chunk);
return ar.id;
} catch (HashExistsException e) {
throw e;
} catch (ArchiveFullException | NullPointerException | ReadOnlyArchiveException e) {
if (l != null)
l.unlock();
l = slock.writeLock();
l.lock();
try {
HashBlobArchive ar = writableArchives.get(uuid);
if (ar != null && ar.writeable)
ar.putChunk(hash, chunk);
else {
ar = new HashBlobArchive(hash, chunk);
ar.uuid = uuid;
writableArchives.put(uuid, ar);
}
return ar.id;
} catch (Exception e1) {
l.unlock();
l = null;
} finally {
if (l != null)
l.unlock();
l = null;
}
} catch (Throwable t) {
SDFSLogger.getLog().error("unable to write", t);
throw new IOException(t);
}
}
} catch (NullPointerException e) {
SDFSLogger.getLog().error("unable to write data", e);
throw new IOException(e);
} finally {
if (l != null)
l.unlock();
}
}
Output Position
In BatchFileChunkStore.java, add the same byte-array-to-hex-string helper function bytesToHex after line 147:
private final static char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Add the following output code after line 152 (inside the writeChunk function):
@Override
public long writeChunk(byte[] hash, byte[] chunk, int len, String uuid) throws IOException {
try {
String metaDataPath = "/sdfsTemp/dedup/" + uuid;
try {
FileWriter fw = new FileWriter(metaDataPath, true);
fw.write(Integer.toString(len));
fw.write("\t");
fw.write(Integer.toString(chunk.length));
fw.write("\t");
fw.write(bytesToHex(hash));
fw.write("\t");
fw.write("\n");
fw.close();
} catch (IOException e) {
e.printStackTrace();
}
return HashBlobArchive.writeBlock(hash, chunk, uuid);
} catch (HashExistsException e) {
throw e;
} catch (Exception e) {
SDFSLogger.getLog().warn("error writing hash", e);
throw new IOException(e);
}
}