Input & Output Files
The divnet-rs
GitHub page has some example R scripts to help you get data out of R and in to divnet-rs
as well as scripts to help you get data from divnet-rs
back into R for further analysis. Of course, you can also generate the input files by hand, but it is not as easy! They aren't the nicest R scripts you've ever seen, but they should help you to get started!
Input files
The divnet-rs
program takes two input files: a count table and a file with
sample data. Both should be CSV files.
Count table
The count table (aka OTU table, aka ASV table) is a taxa-by-sample representation of your experiment.
Here is an example:
taxa,s1,s2,s3
t1,200,1,2
t2,210,2,1
t3,180,2,1
t4,1,230,235
t5,2,220,215
This data set has five taxa (t1
, t2
, t3
, t4
, t5
) and three samples
(s1
, s2
, s3
). Note that these can be named whatever you want.
The taxa
specifier is ignored and so you can write whatever you want there.
E.g., if you have amplicon sequence variants, you could put asv
there instead
of taxa
.
The values are counts, so they should be positive integers only.
Sample data
The sample data file is a little weird looking, I will admit. It is basically
the output of the R function model.matrix
. It converts your dummy variables
to 0 and 1.
Important note: the order of the samples in the sample data file has to match with the order of the samples in the count table file. This is not ideal, but currently that's how it works :/
A simple example
In this case, there is only one covariate: snazzy
. Here, I have labeled it as
snazzyyes
indicating that samples with a 1
are snazzy (i.e., positive for
condition snazzy
) and samples with a 0
are not snazzy (negative for
condition snazzy
). So s1
is snazzy, but s2
and s3
are NOT snazzy.
sample,snazzyyes
s1,1
s2,0
s3,0
Another example
Here are a couple of lines from the sample data file for the Lee dataset that's included in DivNet.
sample,charbiofilm,charcarbonate,charglassy,charwater
BW1,0,0,0,1
BW2,0,0,0,1
R10,0,0,1,0
R11,0,0,1,0
As you can see, the variable of interest is char
. It has the following
columns:
charbiofilm
(1
for yes, it's a biofilm sample,0
for no it is not)charcarbonate
(1
for yes, it's a carbonate sample,0
for no it is not)charglassy
(1
for yes, it's a glassy sample,0
for no it is not)charwater
(1
for yes, it's a water sample,0
for no it is not)
Now the Lee data has a fifth category, alered
. It is not listed here as that
is the way the model.matrix
dummy encoding works. You don't need a column for
it, (and if you do include it in your dummy encoding things can get wonky) any
sample with a 0
in all the colunms is an altered
sample.
Output files
Here is the output file you get if you run the example files in <source root>/test_files/small
. (The little ones you see above!)
# this is replicate 0
replicate,sample,t1,t2,t3,t4,t5
0,s1,0.33855749713477606,0.34823818972504217,0.30865008153567924,0.0018295805613595625,0.0027246510431431564
0,s2,0.0030379849794267364,0.0028353087370401467,0.0033011806108468353,0.490330204063212,0.5004953216094743
0,s3,0.0030379849794267364,0.0028353087370401467,0.0033011806108468353,0.490330204063212,0.5004953216094743
# this is replicate 1
1,s1,0.3627204546824228,0.36468668498369233,0.26626637816414933,0.00008013857010885896,0.006246343599626745
1,s2,0.0006637323562423841,0.0027111558898110805,0.003702116838977756,0.5432176368099867,0.44970535810498213
1,s3,0.0006637323562423841,0.0027111558898110805,0.003702116838977756,0.5432176368099867,0.44970535810498213
# this is replicate 2
2,s1,0.5294663181856507,0.2428125790405528,0.22595059438495904,0.0016861984655594231,0.00008430992327797039
2,s2,0.003824488773323916,0.003824488773323916,0.005408643892378329,0.5034205045920187,0.48352187396895513
2,s3,0.003824488773323916,0.003824488773323916,0.005408643892378329,0.5034205045920187,0.48352187396895513
# this is replicate 3
3,s1,0.5901081082784421,0.22421302607955013,0.17869297960014005,0.0015961885594974962,0.005389697482370069
3,s2,0.0030619926513927457,0.0006994389587677033,0.004747225069582607,0.5271925248691112,0.4642988184511458
3,s3,0.0030619926513927457,0.0006994389587677033,0.004747225069582607,0.5271925248691112,0.4642988184511458
# this is replicate 4
4,s1,0.42087439504861085,0.3408217058601948,0.22533144275449316,0.0035274481524021776,0.009445008184299079
4,s2,0.004061119432165891,0.0007170056373956505,0.0007467384699807625,0.5003428103976993,0.4941323260627584
4,s3,0.004061119432165891,0.0007170056373956505,0.0007467384699807625,0.5003428103976993,0.4941323260627584
# this is replicate 5
5,s1,0.20558302906768744,0.49136400424687354,0.29390176182754923,0.005463507248674078,0.0036876976092156933
5,s2,0.000702351053334611,0.0024864420846098466,0.0032837333411371906,0.43124409424160054,0.5622833792793178
5,s3,0.000702351053334611,0.0024864420846098466,0.0032837333411371906,0.43124409424160054,0.5622833792793178
Again, not all that nice for human consumption, but it will be nice and easy to parse in R. Check out the scripts I mentioned above for an example of this!