Reading Data

6 replies [Last post]
pdeboeck's picture
Offline
Joined: 08/04/2009

A forum regarding reading in data

pdeboeck's picture
Offline
Joined: 08/04/2009
A quick solution to a problem

A quick solution to a problem we had on my end. Several people want to be able to read a lower diagonal matrix.

ReadLowerDiagonal <- function(file) {
n <- max(count.fields(file))
X <- read.table(file, fill = TRUE, col.names = 1:n)
for(col in 1:dim(X)[1]) {
for(row in 1:dim(X)[2]) {
if(col>=row) next
X[col,row] <- X[row,col]
}
}
return(X)
}

Use:

ReadLowerDiagonal("myFile.dat")

tbates's picture
Offline
Joined: 07/31/2009
I see there is also

I see there is also

read.moments() Input a Covariance, Correlation, or Raw Moment Matrix

Description

This functions makes it simpler to input covariance, correlation, and raw-moment matrices to be analyzed by the sem function. The matrix is input in lower-triangular form on as many lines as is convenient, omitting the above-diagonal elements. The elements on the diagonal may also option-ally be omitted, in which case they are taken to be 1.
Usage
read.moments(file = "", diag = TRUE,
names = as.character(paste("X", 1:n, sep = "")))

http://cran.r-project.org/web/packages/sem/sem.pdf

mspiegel's picture
Offline
Joined: 07/31/2009
Here's a slightly simpler

Here's a slightly simpler solution. Nice trick with the max(count.fields()), I didn't think of that. Don't worry too much about the wacky "x[upper.tri(x)] <- t(x)[upper.tri(x)]". It's how we fill the upper triangle, when we need to populate the entries by column because R is craaazy. I needed to pull this trick in OpenMx, and it took me several tries until I implemented it correctly.

readLowerDiagonal <- function(file) {
   n <- max(count.fields(file))
   x <- read.table(file, fill = TRUE, col.names = 1:n)
   x[upper.tri(x)] <- t(x)[upper.tri(x)]
   return(x)
}

neale's picture
Offline
Joined: 07/31/2009
It seems to me that this

It seems to me that this approach is very easily broken, if the file in question has any long lines wrapped. A lot of programs will write wrapped files of this sort, as they have a finite number of fields specified in their format statement (e.g. in FORTRAN 12(F8.5,1X)). In this example, the approach described above would work fine for lower triangular matrices up to order 12 but would break rather badly with larger matrices. I'd rather see a function count the total number of fields and make sure that it's a triangular number before agreeing to read it in. The easiest way to do this is to have the user declare how big the matrix should be before reading it in.

To do something more automatic, we'd need to check that the total number of elements in the file (nels<-sum(count.fields)) was a triangular number, i.e., following the series Tn = n(n+1)/2. Solving the quadratic equation in n
n^2 + n - 2nels = 0
and checking whether either of the solutions is a positive integer should do the trick. But perhaps there's a simpler way; there usually is in R :).

Steve's picture
Offline
Joined: 07/30/2009
Here's a wee bit more robust

Here's a wee bit more robust version with an option for filling in the upper triangle.

ReadLowerTriangle <- function(file, nrows, fill=TRUE) {
    xvector <- scan(file)
    X <- matrix(NA, nrows, nrows)
    i <- 1
    for(row in 1:nrows) {
        for(col in 1:nrows) {
            if(col>row) next
            X[row,col] <- xvector[i]
            i <- i + 1
            if (fill)
                X[col,row] <- X[row,col]
        }
    }
    return(X)
}

Given a supremely ugly data "test.dat"

1 2 3
4
5 6

We get output that looks like this:

> ReadLowerTriangle("test.dat", 3)
Read 6 items
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    2    3    5
[3,]    4    5    6
 
> ReadLowerTriangle("test.dat", 3, fill=FALSE)
Read 6 items
     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    2    3   NA
[3,]    4    5    6

Sorry, no quadratic factorization before noon, so you need to supply the number of rows

neale's picture
Offline
Joined: 07/31/2009
:) It's now 12:17 pm so I can

:) It's now 12:17 pm so I can report that if we check

.5*(-1+sqrt(1+8*length(xvector)))%%1 == 0

Then we can complain if the number of elements in xvector is not triangular. Seems to check out with my lousy code below...

b<-1; while (b < 50) {print (b); print((.5 * (-1 + sqrt(1 + 8 * b)))%%1 == 0); b<-b+1 }