**************
HDFS Streaming
**************

DeepGNN on Linux supports streaming graph data directly from HDFS or ADL into memory. To use this feature, you must have Hadoop installed, set a few environment variables, and pass the relevant Graph Engine (GE) options.

Hadoop Download
===============

Pip Install
-----------

Follow the Hadoop install guide. Verify that the CLI works with the command the guide gives before continuing.

Build from source
-----------------

If you build DeepGNN from source with bazel, you can use the following target to download HDFS:

.. code-block:: bash

    bazel test //src/cc/tests:hdfs_tests --config=linux

Environment Variables
=====================

.. code-block:: bash

    export HADOOP_HOME=/path/to/hadoop

If you build from source using bazel, leave ``HADOOP_HOME`` empty and set this value in code instead.

.. code-block:: bash

    export JAVA_HOME=/path/to/java

Set ``JAVA_HOME`` only if you build from source or downloaded the Java JDK manually.

.. code-block:: bash

    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server/"

If ``CLASSPATH`` is not already set, it is set automatically with ``config_path`` at the top.

core-site.xml
=============

``core-site.xml`` is the main configuration file for Hadoop. Below are some quick examples that can be copied and pasted. You can test a ``core-site.xml`` file with:

.. code-block:: bash

    echo 'export HADOOP_CLASSPATH=$HADOOP_HOME/share/hadoop/tools/lib/*' >> etc/hadoop/hadoop-env.sh
    sudo bin/hdfs dfs --conf core-site.xml -ls

ADL Example

.. code-block:: XML

    <configuration>
      <property>
        <name>fs.adl.impl</name>
        <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
      </property>
      <property>
        <name>fs.adl.oauth2.refresh.url</name>
        <value>https://login.microsoftonline.com/TODO_TENANT_ID/oauth2/token</value>
      </property>
      <property>
        <name>fs.adl.oauth2.access.token.provider.type</name>
        <value>ClientCredential</value>
      </property>
      <property>
        <name>fs.adl.oauth2.client.id</name>
        <value>TODO_CLIENT_ID</value>
      </property>
      <property>
        <name>fs.adl.oauth2.credential</name>
        <value>TODO_PASSWORD</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>4194304</value>
      </property>
      <property>
        <name>fs.parallel-copy.use</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.parallel-copy.detect.text</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.parallel-copy.text-file.scope-compatible</name>
        <value>true</value>
      </property>
      <property>
        <name>fs.permissions.umask-mode</name>
        <value>002</value>
      </property>
    </configuration>

HDFS Localhost Example

.. code-block:: XML

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

Graph Engine Usage
==================

Use this feature by setting ``--data_dir`` to an HDFS or ADL link and adding ``--stream`` and ``--config_path path/to/core-site.xml``.
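
As a rough illustration of how these pieces fit together, the sketch below writes a minimal HDFS localhost ``core-site.xml`` and assembles the three options named above into an argument list. The helper names ``write_core_site`` and ``streaming_args`` and the path ``hdfs://localhost:9000/path/to/graph`` are hypothetical and are not part of DeepGNN; only the flag names and the ``fs.defaultFS`` value come from this page.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_core_site(path, properties):
    """Write a Hadoop-style core-site.xml with the given name/value pairs.

    Hypothetical helper for illustration; not a DeepGNN API.
    """
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(configuration).write(path, encoding="utf-8", xml_declaration=True)
    return path


def streaming_args(data_dir, config_path):
    """Assemble the streaming options as a command-line argument list.

    Hypothetical helper; the flag names are the ones listed in this section.
    """
    return ["--data_dir", data_dir, "--stream", "--config_path", config_path]


# Write the HDFS localhost example into a temporary directory.
config_path = write_core_site(
    os.path.join(tempfile.mkdtemp(), "core-site.xml"),
    {"fs.defaultFS": "hdfs://localhost:9000"},
)

# The data directory below is a placeholder, not a real graph location.
args = streaming_args("hdfs://localhost:9000/path/to/graph", config_path)
print(" ".join(args))
```

The printed string can be appended to whichever DeepGNN entry point accepts these options. Generating the file from a dictionary keeps credentials such as the ADL client secret out of checked-in XML, assuming they are supplied at runtime.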