SDFS Usage & Modification Guide
Author: Tinoryj
Documentation List of SDFS
Official Guide & Docs
- opendedup website on GitHub
- opendedup official site
- Google Discussion Group
Related Documentation
- GitLab: SDFS System Research Report
System Environment
- Ubuntu 14.04.5
- Running on VMware Fusion 11.0 for macOS
- Standalone installation mode
- IntelliJ IDEA 2018.1
Modification Aims
- Output the chunks' processing order after chunking and before in-file deduplication
- Output the chunks' processing order after in-file deduplication and before cross-file deduplication
- Output the chunks' processing order when a chunk is about to be stored on disk
Install SDFS
In Standalone Mode
This installation can be overwritten later by installing the deb package again.
Step 1: Change to the directory where the SDFS deb package is stored.
Step 2: Install SDFS and its dependencies:
sudo apt-get install fuse libfuse2 ssh openssh-server jsvc libxml2-utils
sudo dpkg -i sdfs-version.deb
Step 3: Change the maximum number of open files allowed
echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "* soft nofile 65535" >> /etc/security/limits.conf
Build SDFS
Basic Environment for Packaging
To package this system, the following dependencies need to be installed.
Maven – Java Package Management Tool
To install Maven, a JDK is needed.
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk
Use the following command to check the Java environment:
java -version
If the installation succeeded, output similar to the following is shown:
openjdk version "1.8.0_01-internal"
OpenJDK Runtime Environment (build 1.8.0_01-internal-b04)
OpenJDK 64-Bit Server VM (build 25.40-b08, mixed mode)
Then install Maven and verify the installation:
sudo apt-get install maven
mvn -version
FPM – Package Creator
Install it via apt-get and gem (Ruby).
sudo apt-get install ruby-dev build-essential
sudo gem install fpm
Build System deb Package
Modify build.sh
To create the deb package, use the modified build.sh shell script in /install-packages/:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
VERSION=3.7.8
DEBFILE="sdfs_${VERSION}_amd64.deb"
echo $DEBFILE
sudo rm -rf deb/usr/share/sdfs/lib/*
cd ../
mvn package
cd install-packages
sudo cp ../target/lib/b2-2.0.3.jar deb/usr/share/sdfs/lib/
sudo cp ../target/sdfs-${VERSION}-jar-with-dependencies.jar deb/usr/share/sdfs/lib/sdfs.jar
echo
sudo rm *.deb
sudo rm deb/usr/share/sdfs/bin/libfuse.so.2
sudo rm deb/usr/share/sdfs/bin/libulockmgr.so.1
sudo rm deb/usr/share/sdfs/bin/libjavafs.so
sudo cp DEBIAN/libfuse.so.2 deb/usr/share/sdfs/bin/
sudo cp DEBIAN/libulockmgr.so.1 deb/usr/share/sdfs/bin/
sudo cp DEBIAN/libjavafs.so deb/usr/share/sdfs/bin/
sudo cp ../src/readme.txt deb/usr/share/sdfs/
sudo fpm -s dir -t deb -n sdfs -v $VERSION -C deb/ -d fuse -d libxml2 -d libxml2-utils --vendor datishsystems --deb-no-default-config-files
The key point when building the deb package is adding --deb-no-default-config-files to the fpm invocation on line 22 of the script.
Add the jre Package
The original build package does not contain the jre package in /install-packages/deb/usr/share/sdfs/bin/ (because of .gitignore rules). So you may need to install the official release first and copy /usr/share/sdfs/bin/jre into your repository's /install-packages/deb/usr/share/sdfs/bin/ (the uploaded jre package may be broken or unsuitable for your environment).
Modify pom.xml for the Maven Project
In lines 268 & 269, the two paths may cause mvn package errors:
- change the scriptSourceDirectory path to ./scripts
- create a ./test/java directory in the SDFS project
After the modification, the two lines look like this:
<scriptSourceDirectory>./scripts</scriptSourceDirectory>
<testSourceDirectory>./test/java</testSourceDirectory>
Build and Install
Use the following commands to build your SDFS deb package and install it:
cd ./install-packages
sudo ./build.sh
sudo dpkg -i sdfs_3.7.8_amd64.deb
Step 3 (line 3) can be run any number of times; it simply overwrites the old installation.
The first build of the deb package may take a long time because Maven has to download the required packages; all of them are stored in ~/.m2/. After that, the package builds quickly since nothing needs to be downloaded again.
Modify SDFS Files
Important Data Structures
Finger
This data structure is almost the same as the normal chunk data structure in CD-Store and REED. It is implemented in Finger.java; the most important fields of the structure are shown below:
private static final byte[] k = new byte[HashFunctionPool.hashLength];
public byte[] chunk; // chunk's logic data
public byte[] hash; // chunk's hash fingerprint
public InsertRecord hl;
public int start; // start position in single file
public int len; // chunk logic size
public int ap;
public boolean noPersist;
public AsyncChunkWriteActionListener l;
public int claims = -1; // Times the chunk appears in the file
public String lookupFilter = null;
public String uuid = null; // The file containing the chunk
HashLocPair
This data structure is an intermediate structure used during chunk processing. It is implemented in HashLocPair.java; the most important fields are shown below:
public class HashLocPair implements Comparable<HashLocPair>, Externalizable {
public static final int BAL = HashFunctionPool.hashLength + 8 + 4 + 4 + 4
+ 4;
public byte[] hash; // chunk's hash fingerprint
public byte[] hashloc; // hashed position
public byte[] data; // chunk's logic data
public int len; // chunk's logic data size
public int pos;
public int offset;
public int nlen;
private boolean dup = false;
public boolean inserted = false;
The way to get chunks
The default way to obtain chunks is based on Rabin fingerprinting; it can also be switched to fixed-size chunking by editing the XML configuration files. In VariableHashEngine.java (line 61):
public List<Finger> getChunks(byte [] b,String lookupFilter,String uuid) throws IOException {
final ArrayList<Finger> al = new ArrayList<Finger>();
ff.getChunkFingerprints(b, new EnhancedChunkVisitor() {
public void visit(long fingerprint, long chunkStart, long chunkEnd, byte[] chunk) {
byte[] hash = getHash(chunk);
Finger f = new Finger(lookupFilter,uuid);
f.chunk = chunk;
f.hash = hash;
f.len = (int) (chunkEnd - chunkStart);
f.start = (int) chunkStart;
al.add(f);
}
});
return al;
}
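For illustration, a hypothetical call site is sketched below. The engine instance, the input file path, and the uuid string are placeholders and are not taken from the SDFS sources; only the getChunks signature and the Finger fields come from the listing above.
// Hypothetical usage of getChunks; engine, the input path and "example-uuid" are placeholders.
byte[] buf = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("/tmp/sample.bin"));
java.util.List<Finger> fingers = engine.getChunks(buf, null, "example-uuid");
for (Finger f : fingers) {
// start offset and logical length of each variable-sized chunk
System.out.println(f.start + "\t" + f.len);
}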
Chunk Order Output After Chunking
In-File Deduplication
SDFS implements the in-file deduplication process in SparseDedupFile.java. In the public void writeCache(WritableCacheBuffer writeBuffer) function, the in-file dedup is implemented as follows:
HashMap<ByteArrayWrapper, Finger> mp = new HashMap<ByteArrayWrapper, Finger>();
for (Finger f : fs) {
ByteArrayWrapper ba = new ByteArrayWrapper(f.hash);
Finger _f = mp.get(ba);
if (_f == null) {
f.claims = 1;
mp.put(ba, f);
} else {
_f.claims++;
mp.put(ba, _f);
}
}
ByteArrayWrapper is a wrapper around a byte array; by itself it does not deduplicate chunks. In this code, ba wraps the hash of the currently processed chunk. mp.get() checks whether that hash (the key) already exists in the HashMap; if it does not, the hash (key) and the Finger (value) are inserted. The claims field of the Finger records how many times the chunk appears in the current HashMap, i.e. within the current write buffer.
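For clarity, a minimal sketch of what such a wrapper typically looks like is shown below. This is a hypothetical reduction, not the actual SDFS class: it just stores the byte array and implements equals/hashCode by content so that the array can serve as a HashMap key.
import java.util.Arrays;
// Minimal hypothetical sketch of a byte-array wrapper usable as a HashMap key.
// The real SDFS ByteArrayWrapper may differ; this only illustrates the idea.
public final class ByteArrayWrapper {
private final byte[] data;
public ByteArrayWrapper(byte[] data) {
this.data = data;
}
@Override
public boolean equals(Object o) {
return o instanceof ByteArrayWrapper && Arrays.equals(data, ((ByteArrayWrapper) o).data);
}
@Override
public int hashCode() {
return Arrays.hashCode(data);
}
}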
Output Position
In SparseDedupFile.java, add a byte-array-to-hex-string helper function bytesToHex after line 399:
private final static char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Add the following output code after line 414:
String metaDataPath = "/sdfsTemp/metaData/" + this.GUID;
for (Finger f : fs) {
try {
FileWriter fw = new FileWriter(metaDataPath, true);
fw.write(Integer.toString(f.start));
fw.write("\t");
fw.write(Integer.toString(f.len));
fw.write("\t");
fw.write(Integer.toString(f.chunk.length));
fw.write("\t");
fw.write(bytesToHex(f.hash));
fw.write("\n");
fw.close();
} catch (IOException e) {
e.printStackTrace();
}
}
// ··· (the rest of writeCache() continues unchanged)
This outputs the post-chunking chunk order to /sdfsTemp/metaData/GUID for each single file; every line records the chunk's start offset, logical length, chunk buffer size, and hex-encoded hash, separated by tabs.
Chunk Order Output Before Cross-File Deduplication (HCServiceProxy Stage)
Cross-File Deduplication
In the writeCache function of SparseDedupFile.java, after the in-file deduplication via the HashMap, the chunks left in the HashMap (only unique chunks) are submitted to a thread pool for the next deduplication step (DSE cross-file deduplication) through the FingerPersister runnable, which is defined in the Finger class. In that runnable, every chunk is handed to HCServiceProxy to look up duplicates and write unique chunks to the chunk store. Because this work is performed by multiple threads and every task calls the HCServiceProxy.writeChunk function, we add the output wherever that function is called.
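As a rough illustration of this hand-off, a hypothetical sketch is shown below. It is not the actual FingerPersister code: the executor, the method name persistFingers, and the mapping of the Finger fields onto the writeChunk parameters are assumptions for illustration only.
// Hypothetical sketch of the hand-off described above; not the actual FingerPersister code.
private void persistFingers(java.util.Collection<Finger> uniqueFingers, java.util.concurrent.ExecutorService executor) {
for (Finger f : uniqueFingers) {
executor.execute(() -> {
try {
// writeChunk signature as quoted in the next code listing; the argument mapping is assumed
f.hl = HCServiceProxy.writeChunk(f.hash, f.chunk, f.claims, f.lookupFilter, f.uuid);
} catch (Exception e) {
e.printStackTrace(); // the real code logs through SDFSLogger
}
});
}
}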
Output Position
In HCServiceProxy.java, add the same byte-array-to-hex-string helper function bytesToHex after line 49:
private final static char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
After line 245, add two kinds of output for HCServiceProxy:
- all chunks (at /sdfsTemp/dedup/ComeChunks-HC-All-Chunks)
- the chunks of a single file (at /sdfsTemp/dedup/ComeChunks-HC-uuid)
public static InsertRecord writeChunk(byte[] hash, byte[] aContents, int ct, String guid,String uuid)
throws IOException, HashtableFullException {
String metaDataPath = "/sdfsTemp/dedup/ComeChunks-HC-" + uuid;
String metaDataPath2 = "/sdfsTemp/dedup/ComeChunks-HC-All-Chunks";
try {
FileWriter fw = new FileWriter(metaDataPath, true);
fw.write(bytesToHex(hash));
fw.write("\n");
fw.close();
} catch (IOException en) {
en.printStackTrace();
}
try {
FileWriter fw = new FileWriter(metaDataPath2, true);
fw.write(bytesToHex(hash));
fw.write("\n");
fw.close();
} catch (IOException en) {
en.printStackTrace();
}
// doop = HCServiceProxy.hcService.hashExists(hash);
if (guid != null && Main.enableLookupFilter) {
InsertRecord ir = LocalLookupFilter.getLocalLookupFilter(guid).put(hash, aContents, ct,uuid);
return ir;
} else
return HCServiceProxy.hcService.writeChunk(hash, aContents, false, ct,uuid);
}
Chunk Order Output After Deduplication (Write-to-Disk Stage)
The chunk store on the server side is built on top of the AbstractChunkStore template class.
For the standalone mode, SDFS uses the BatchFileChunkStore implementation.
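For reference, a rough sketch of the interface shape is given below. Only writeChunk mirrors the signature quoted later in this section; the other members (getChunk, deleteChunk, close) and their signatures are assumptions about what a chunk-store backend typically provides.
import java.io.IOException;
// Illustrative shape of a chunk-store backend; only writeChunk mirrors the quoted code,
// the remaining methods and their signatures are assumptions.
public interface ChunkStoreSketch {
// store a unique chunk and return the id of the archive/block it was written to
long writeChunk(byte[] hash, byte[] chunk, int len, String uuid) throws IOException;
// read a chunk back by its hash and stored location (assumed signature)
byte[] getChunk(byte[] hash, long start, int len) throws IOException;
// drop a chunk that is no longer referenced (assumed signature)
void deleteChunk(byte[] hash, long start, int len) throws IOException;
// shut the backend down cleanly (assumed)
void close();
}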
In BatchFileChunkStore.java, SDFS implements the functions that store the unique chunks of all files in the mounted volume. The encryption and compression methods can also be configured here (the default is no encryption).
public class BatchFileChunkStore implements AbstractChunkStore, AbstractBatchStore, Runnable {
private String name;
boolean compress = false;
boolean encrypt = false;
private HashMap<Long, Integer> deletes = new HashMap<Long, Integer>();
boolean closed = false;
boolean deleteUnclaimed = true;
File staged_sync_location = new File(Main.chunkStore + File.separator + "syncstaged");
File container_location = new File(Main.chunkStore);
int checkInterval = 15000;
public boolean clustered;
private int mdVersion = 0;
Whenever a unique chunk is about to be stored on the logical disk, SDFS calls the writeChunk function in BatchFileChunkStore.java (line 150).
@Override
public long writeChunk(byte[] hash, byte[] chunk, int len, String uuid) throws IOException {
try {
return HashBlobArchive.writeBlock(hash, chunk, uuid);
} catch (HashExistsException e) {
throw e;
} catch (Exception e) {
SDFSLogger.getLog().warn("error writing hash", e);
throw new IOException(e);
}
}
The write function is in HashBlobArchive.java (line 734):
public static long writeBlock(byte[] hash, byte[] chunk, String uuid) throws IOException, ArchiveFullException, ReadOnlyArchiveException {
if (closed)
throw new IOException("Closed");
Lock l = slock.readLock();
l.lock();
if (uuid == null || uuid.trim() == "") {
uuid = "default";
}
try {
for (;;) {
try {
HashBlobArchive ar = writableArchives.get(uuid);
ar.putChunk(hash, chunk);
return ar.id;
} catch (HashExistsException e) {
throw e;
} catch (ArchiveFullException | NullPointerException | ReadOnlyArchiveException e) {
if (l != null)
l.unlock();
l = slock.writeLock();
l.lock();
try {
HashBlobArchive ar = writableArchives.get(uuid);
if (ar != null && ar.writeable)
ar.putChunk(hash, chunk);
else {
ar = new HashBlobArchive(hash, chunk);
ar.uuid = uuid;
writableArchives.put(uuid, ar);
}
return ar.id;
} catch (Exception e1) {
l.unlock();
l = null;
} finally {
if (l != null)
l.unlock();
l = null;
}
} catch (Throwable t) {
SDFSLogger.getLog().error("unable to write", t);
throw new IOException(t);
}
}
} catch (NullPointerException e) {
SDFSLogger.getLog().error("unable to write data", e);
throw new IOException(e);
} finally {
if (l != null)
l.unlock();
}
}
Output Position
In BatchFileChunkStore.java, add the same byte-array-to-hex-string helper function bytesToHex after line 147:
private final static char[] hexArray = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Add the following output code after line 152 (inside the writeChunk function):
@Override
public long writeChunk(byte[] hash, byte[] chunk, int len, String uuid) throws IOException {
try {
String metaDataPath = "/sdfsTemp/dedup/" + uuid;
try {
FileWriter fw = new FileWriter(metaDataPath, true);
fw.write(Integer.toString(len));
fw.write("\t");
fw.write(Integer.toString(chunk.length));
fw.write("\t");
fw.write(bytesToHex(hash));
fw.write("\t");
fw.write("\n");
fw.close();
} catch (IOException e) {
e.printStackTrace();
}
return HashBlobArchive.writeBlock(hash, chunk, uuid);
} catch (HashExistsException e) {
throw e;
} catch (Exception e) {
SDFSLogger.getLog().warn("error writing hash", e);
throw new IOException(e);
}
}