dgsh: directed graph shell

Introduction

dgsh (pronounced "dagsh") is a Unix-style shell that extends bash. It allows the construction of complex big data and stream processing pipelines, and it supports non-linear, non-uniform operations that form directed acyclic graphs (DAGs). Pipelines typically execute across multiple processor cores to improve throughput, and they enable sophisticated use of existing Unix tools and custom components.

Reference: Spinellis & Fragkoulis, "Extending Unix Pipelines to DAGs," IEEE Trans. on Computers, 2017.

---

Inter-process Communication (IPC) in dgsh

Multipipes: Unix pipelines that handle multiple input/output channels. Example: comm, with two inputs and three outputs representing line differences and intersections.

Multipipe blocks: Groups of asynchronous processes inside {{ ... }} that can receive multiple inputs and produce multiple outputs. Example: running md5sum and wc -c in parallel on the same input, producing two outputs.

Stored values: Named buffers that hold the last record of a stream for asynchronous reading and writing by different graph nodes. They are implemented using Unix domain sockets via dgsh-writeval and dgsh-readval.

---

dgsh Syntax

dgsh scripts are bash scripts with added multipipe blocks, written with the {{ ... }} syntax. Multipipe blocks allow parallel, asynchronous execution of multiple commands or pipelines. Pipelines can connect inside and outside multipipe blocks, forming DAGs. Example: a multipipe block runs two echo commands concurrently and feeds paste to produce "hello world". Multipipe blocks can be nested recursively.

---

Adapted Tools for dgsh

Many Unix commands have been adapted to support multiple input/output channels (0..N). Examples:

cat (dgsh-tee): 0..N inputs and outputs.
comm: 2 inputs, 3 outputs (only-in-first, only-in-second, both).
cut: 1 input, N outputs with --multistream.
sort -m: merges multiple sorted streams.
dgsh-readval, dgsh-writeval: for stored values.
dgsh-wrap: wraps non-dgsh commands for pipeline participation.
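
The three output channels of comm listed above can be illustrated without dgsh. The following is a minimal sketch in plain bash, not dgsh syntax: it runs the standard comm three times with column-suppression flags to approximate what the adapted comm would deliver as three separate streams. The helper name comm_channels and the sample data are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Plain-shell approximation of comm's three output channels.
# NOTE: this is not dgsh; it calls the standard comm once per channel.

# comm_channels FILE1 FILE2 (hypothetical helper)
# Prints each of the three channels under a label.
comm_channels() {
  echo "only-in-first:"
  comm -23 "$1" "$2"    # suppress columns 2 and 3: lines unique to FILE1
  echo "only-in-second:"
  comm -13 "$1" "$2"    # suppress columns 1 and 3: lines unique to FILE2
  echo "both:"
  comm -12 "$1" "$2"    # suppress columns 1 and 2: lines common to both
}

# Illustrative sample data; comm requires sorted input.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a"
printf '%s\n' banana cherry date > "$tmpdir/b"

result=$(comm_channels "$tmpdir/a" "$tmpdir/b")
echo "$result"

rm -rf "$tmpdir"
```

With the sample files, "apple" appears under only-in-first, "date" under only-in-second, and "banana" and "cherry" under both. This sketch reads each input three times; per the description above, the adapted comm exposes the same three results as separate output channels of a single command.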

dgsh automatically wraps most commands as filters, assigning appropriate input/output capabilities. POSIX commands that do not write output, do not read input, or do neither are treated as input-only, output-only, or neither, respectively.

---

Downloading and Installation

Supported Platforms

Debian, Ubuntu, FreeBSD, Mac OS X. A Cygwin port is in progress. GraphViz is recommended for visualization.

Debian/Ubuntu

Prerequisites: make, automake, gcc, libtool, pkg-config, texinfo, help2man, bison, check, gperf, git, xz-utils, gettext.
Test tools: wbritish, wamerican, libfftw3-dev, csh, curl, bzip2.

Installation:
Clone repo: git clone --recursive https://github.com/dspinellis/dgsh.git
Configure: make config
Build: make
Install: sudo make install

A custom PREFIX can be set before configuring. Test with make test.

FreeBSD

The steps are similar; the main difference is the use of gmake instead of make.
Required packages: automake, bison, check, git, gmake, gperf, help2man, texinfo, bash.
Additional ports for testing: bzip2, curl.
Use the same cloning and build commands, substituting gmake for make. Test with gmake test.

---

Reference Documentation

Manual pages are available in HTML and PDF form for:

Main command: dgsh
Helper commands: dgsh-tee, dgsh-wrap, dgsh-writeval, dgsh-readval, dgsh-monitor, dgsh-parallel, perm, dgsh-httpval, dgsh-merge-sum, dgsh-conc, dgsh-enumerate, dgsh_negotiate (API).

---

Example Scripts and Use Cases

Compression Benchmark

Compares compressors (xz, bzip2, gzip) on standard input without touching the disk. Outputs the file type, its size, and the compressed sizes. Demonstrates multipipe blocks and dgsh-tee.

Git Commit Statistics

Lists authors and days ordered by commit counts from git log. Uses functions and streams in multipipe blocks.

C Code Metrics

Summarizes metrics like number of