You could adhere to R conventions more closely

pjohnson
Joined: 09/19/2009

Hi, I'm new here. I'm a political scientist/programmer. I was drawn in by conversation about SEM among my psych colleagues at the University of Kansas. Now that you have the gcc-4.3 version of lpsolve.la in your package, I can build OpenMx without any trouble on Ubuntu 9.04 with R 2.9.2.

I've been reading through your OpenMxUserGuide and testing the examples, and I have some reservations about the style of your R interface and code. I'm pretty sure that R users who come to try OpenMx will have difficulty digesting the way you have designed your interface.

I've been reading your source code a bit and playing with examples. I believe these ideas are feasible, but you do have a lot of code :)

Idea 1. Set the "name" slot of your objects implicitly. Running

myThing <- mxAlgebra( Z )

should create an R object named "myThing" and the mxAlgebra name attribute should be automatically set as "myThing". (Why you need an mx name slot I don't know quite yet, but I know you have one.)

To an R user, it would be second nature to create a matrix

x <- matrix( c(1,2,3,4), ncol=2, dimnames=list(NULL, c("A","B")))

This thing "x" is an object whose name is x. It will be quite foreign to R users that your syntax allows a person to give an object an internal name that differs from its R name. I can't imagine any reason why you should allow:

oneName <- mxAlgebra(A + B, name = "C")

The mxAlgebra object's name is "C", but on the outside its R name is "oneName"??

It seems to me huge confusion will follow from this.

It would be much better if you allowed a user to write

myMxMatrix <- mxAlgebra(A+B)

and if it is necessary to have a "name" slot in mxAlgebra, the interpreter should detect the name and create that slot. And, preferably, it would be impossible for a user to name something differently from the "name" slot inside the object.

Idea 2. If you think of things in that way, then the R commands to run Mx models would be simpler. As the mxModel interface is described now, a single command can go on for several pages.

It is much more usual in R to have declarations first, as in

A <- mxWhatever
B <- mxWhatever
C <- mxWhatever

mymodel <- mxModel( A, B, C )

Take a look at the examples for "systemfit", for example.

Or the "sem" package for R that John Fox distributes.

Idea 3. Create separate functions mxModel() and update(). mxModel can only create new objects. update can revise mxModel so that it creates an object with a given name. This is the way other R functions work.

myNamedThing <- lm (y~x)

mxModel is now allowing a person to put in a name of an existing object, or a new name. In other R functions, we usually have a function "update" that performs the revision on the existing result. We don't have

myLM <- lm("someNameHere", function="y~x")

and then

lm(myLM, function="y~x+z")

as you seem to want to do in OpenMx.

Instead, an output object with a name exists, and the update function can modify it.

I guess I should say also that mxModel should not have a name="" option, but rather the output object's name should be noted, as is the case in other R functions.
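For reference, a minimal sketch of the lm/update pattern referred to above, using only base R. The data frame d and the variable names x, z, and y are made up for illustration.

```r
set.seed(1)
d <- data.frame(x = rnorm(50), z = rnorm(50))
d$y <- 1 + 2 * d$x - d$z + rnorm(50)

# lm() only creates a new fitted object; its name is whatever we assign it to
myLM <- lm(y ~ x, data = d)

# update() revises an existing fit and returns a new object;
# ". ~ . + z" means "keep the old formula and add z"
myLM2 <- update(myLM, . ~ . + z)

names(coef(myLM))    # "(Intercept)" "x"
names(coef(myLM2))   # "(Intercept)" "x" "z"
```

Note that update() here both revises and refits the model, and myLM itself is left unchanged.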

Idea 4. When you introduce R code, you should use the most succinct way.

On page 12, you have this:

#Simulate Data

x <- rnorm (1000, 0, 1)
testData <- as.matrix(x)
selVars <- c("X")
dimnames(testData) <- list(NULL, selVars)

Many of us would not use as.matrix, but it does work. But there is no real need to create the object "selVars" or run the dimnames after. In 2 steps

x <- rnorm(1000) ### don't need the 0 and 1 because they are default
testData <- matrix(x, ncol=1, dimnames=list(NULL,c("X")))

or in one step

testData <- matrix(rnorm(1000), ncol=1, dimnames=list(NULL,c("X")))

If one did want to create the matrix and then assign column names later, as in your example, I suggest

colnames(testData) <- c("X")

rather than dimnames.
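A quick check, in base R only, that the variants discussed above produce the same object. The names testData1, testData2, and testData3 are introduced here just to keep the three results apart.

```r
set.seed(42)                      # fixed seed so every variant uses the same draws
x <- rnorm(1000)

# Variant 1: the manual's approach (as.matrix, then dimnames)
testData1 <- as.matrix(x)
selVars <- c("X")
dimnames(testData1) <- list(NULL, selVars)

# Variant 2: a single call to matrix()
testData2 <- matrix(x, ncol = 1, dimnames = list(NULL, c("X")))

# Variant 3: build the matrix first, then set only the column names
testData3 <- matrix(x, ncol = 1)
colnames(testData3) <- c("X")

identical(testData1, testData2)   # TRUE
identical(testData2, testData3)   # TRUE
```

So the choice between them is about readability, not about the result.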

I don't understand yet (because I'm only halfway through your manual) why it is not sufficient in OpenMx to have a vector X created thus:

X <- rnorm(1000)

That is the same thing as the Nx1 matrix. It seems to me you are creating some extra work by wrapping a single column inside a matrix object. But I bet I'm going to find a reason for that later :)
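A small base R sketch of how the two differ. Whether this is the reason OpenMx wants a matrix is an assumption here, not something stated in the manual: the point is only that a bare vector has no place to store a column name.

```r
X <- rnorm(1000)                                   # plain numeric vector
testData <- matrix(X, ncol = 1, dimnames = list(NULL, "X"))

dim(X)               # NULL: a vector has no rows or columns
dim(testData)        # 1000 1
colnames(X)          # NULL: nowhere to keep the variable name
colnames(testData)   # "X"
```

The numbers are the same in both, but only the matrix carries the label "X" along with the data.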

Oh, well, I have run out of energy. I'll be glad to hear your feedback to my feedback!

Paul Johnson
Professor, Political Science
University of Kansas
http://pj.freefaculty.org

Steve
Joined: 07/30/2009

Hi Paul,

Welcome to the wonderful world of SEM.

I'll structure my post in a slightly different way. Rather than addressing your specific suggestions (some of which are eerily familiar from many long discussions in our programmer meetings), I would like to discuss a basic philosophical issue at the heart of the design of OpenMx. Essentially, I'd like to talk about the "why" rather than the "what".

Coming from R, you would likely think that SEM is a way of specifying matrix equations and setting up linear algebra problems for numerical optimization. And, yes, that is true.

But it turns out that many people think of SEM in terms of acyclic digraphs. And this is also true.

OpenMx attempts to unify these two philosophies for how linear algebraic problems can be specified. Named entities (or even elements within named entities) can point to other named entities in an OpenMx model. We enforce rules on this pointing behavior so that the whole structure can be flattened into a system of equations with (possibly nonlinear) constraints.

This "pointing to named entities behavior" allows very powerful, flexible, and robust specification. However, the biggest downside is, as you point out, that the named entities inside an OpenMx model are in a different namespace than values out in the R namespace. If we force the two namespaces to be the same, the paradigm can break in important ways. By forcing users to give us an "inside OpenMx" name rather than allowing a default of the R value name, we are pushing the difference between the namespaces to the forefront.

I believe that there should be a documentation section early in the tutorial that talks about this dichotomy between namespaces. But it does not exist yet. Hopefully soon.

The other question that may not be obvious from an R user's standpoint is "what is this dang mxMatrix and why does it have so many slots?" The short answer is that an mxMatrix is a matrix plus a bunch of meta information. The meta information allows us to be very flexible about how elements of a matrix are bounded, or constrained, or are functions of other named entities. The end result is a system whereby many types of statistical dependencies and candidate models for comparison can be represented.
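A toy illustration of "a matrix plus meta information", written as a plain S4 class. ToyMatrix and its slots are hypothetical stand-ins chosen for this sketch; this is not the actual definition of the OpenMx matrix class.

```r
library(methods)

# Hypothetical class: the numbers themselves plus per-element metadata
setClass("ToyMatrix", representation(
  name   = "character",   # the "inside the model" name
  values = "matrix",      # starting values
  free   = "matrix",      # TRUE where an element is a free parameter
  lbound = "matrix"       # lower bounds, NA where unbounded
))

S <- new("ToyMatrix",
         name   = "S",
         values = diag(2),
         free   = matrix(c(TRUE, FALSE, FALSE, TRUE), 2, 2),
         lbound = matrix(c(0, NA, NA, 0), 2, 2))

sum(S@free)   # 2 free parameters, both on the diagonal
```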

mkeller
Joined: 08/04/2009

Hi all,

I think this namespace issue was also my biggest confusion in starting OpenMx. Being a long-time R user, it felt very odd and non-intuitive. It also doesn't help me that I have very little experience with the S4 dialect. Not many packages seem to use it. As Steve said, it might be helpful to make this namespace issue - and the reasons for it - explicit early on for R users. It might also be helpful to give a very brief intro into S4 somewhere, or a link to such. Best,
Matt

Steve
Joined: 07/30/2009

The core programming team argued long and hard about whether to move to S4.

Our thinking is that this new specification, which is supposedly the "future of R", according to the R base team, will eventually be widely adopted. There are some benefits to S4, but IMO not so persuasive that everyone is changing over en masse. Time will tell whether our decision was correct.

mspiegel
Joined: 07/31/2009

I have been trying to come up with a clear example that shows why the slot "name" has been separated from the variable name. An example has come up in one of the forum discussions. Let's assume for the moment that we have overloaded the "<-" operator for the MxModel class, such that when a variable is assigned an instance of the MxModel class, the slot 'name' stores the name of the variable. The following is a short function that can be used to make assignments to nested submodels within a model. It doesn't matter for this discussion why we would want to do this; people do want to do it (at least the one person who posted the forum question).

updateSubmodel <- function(model, submodelname, ...) {
   if (model@name == submodelname) {
       model <- mxModel(model, ...)
   }
   model@submodels <- lapply(model@submodels, updateSubmodel, submodelname, ...)
   return(model)
}

If we had overloaded the "<-" operator, then the function updateSubmodel would have the side effect of renaming the input model to "model". This example happens to have a solution: we could replace model <- mxModel() with return(mxModel()), since we can terminate the recursion at that point. But it's likely there are other examples where several consecutive operations need to be applied to a model. Even that problem could be solved, probably, with clever use of anonymous inner functions and/or some form of currying. But all these clever tricks would be to avoid the side effect associated with the "<-" operator. A simpler design yields no side effects for the assignment operator.
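A self-contained sketch of the same recursive shape, with no OpenMx required. ToyModel, its note slot, and annotateSubmodel are hypothetical stand-ins: the function sets a note instead of calling mxModel(). Because the name is an ordinary slot, passing the object through a local variable called "model" leaves it alone.

```r
library(methods)

# Hypothetical stand-in for a model: a name, a note, and nested submodels
setClass("ToyModel", representation(
  name      = "character",
  note      = "character",
  submodels = "list"
))

# Same recursion as updateSubmodel, applied to the toy class
annotateSubmodel <- function(model, submodelname, note) {
  if (model@name == submodelname) {
    model@note <- note            # the local variable happens to be called 'model'
  }
  model@submodels <- lapply(model@submodels, annotateSubmodel, submodelname, note)
  return(model)
}

twin   <- new("ToyModel", name = "twin",   note = "", submodels = list())
family <- new("ToyModel", name = "family", note = "", submodels = list(twin))

result <- annotateSubmodel(family, "twin", "updated")

result@name                   # "family": unchanged by the local name 'model'
result@submodels[[1]]@name    # "twin"
result@submodels[[1]]@note    # "updated"
```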

mspiegel
Joined: 07/31/2009

Thank you for your comments. There are many of them to address, so I will focus on a subset of them for this post. There seems to be some confusion about terms from the R language specification. Let's define these terms so that we all use the same words to mean the same things.

x <- matrix( c(1,2,3,4), ncol=2, dimnames=list(NULL, c("A","B")))
This thing "x" is an object whose name is x.

The expression matrix(...) yields a value. That value is of type double. Type double is one of the built-in types in R. Other built-in types include integer, logical, character, list, etc. 'x' is a variable. The variable 'x' stores the value yielded by the expression matrix(...).

In addition to the built-in types, it is possible to create your own types. These are known as S4 object types. mxAlgebra(), mxMatrix(), and mxModel() all return values that are instances of S4 object types. Object types are defined by the slots that store values. In the OpenMx object types, a 'name' slot has been defined.

In R, all values are passed by value as the arguments to a function. This includes built-in values and S4 object values. A pass-by-value strategy for objects is different from the strategy employed in other popular languages, such as Java or Python, where object references are passed by value.

The pass-by-reference style you have suggested is not the recommended practice in R. It is possible to accomplish, using some combination of closures and/or the global assignment operator (<<-). We tried this in an earlier implementation, using the R.oo package. Our experience is that faking a pass-by-reference style in an R language library is misleading to novice R users, because they must re-learn that all the functions in the R language base are pass-by-value, and it is frustrating to advanced R users, who expect all functions to be pass-by-value.
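A minimal illustration of pass-by-value with an S4 object, in base R plus the methods package. The class Named and the function rename are made up for this sketch.

```r
library(methods)

setClass("Named", representation(name = "character"))

# Changes the name slot of its own copy and returns that copy
rename <- function(obj, newName) {
  obj@name <- newName
  obj
}

a <- new("Named", name = "A")
b <- rename(a, "B")

a@name   # "A": the caller's object is untouched
b@name   # "B": the change exists only in the returned value
```

To keep a change, the caller has to assign the result, which is why OpenMx functions return the modified object.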

pjohnson
Joined: 09/19/2009

Thanks. This is thought provoking for me. I can see from the R help lists that you have been discussing these same problems with them for some time. So I am not sneaking up on you :)

I completely understand you are allowed to create slots in S4 and call them whatever you want. However, you are creating trouble by allowing objects to have different "R names" from "OpenMx names".

In your documentation, there's an example of a giant mxModel usage on pages 5-6, and mostly I'm objecting to that style. A giant monolithic function call is, well, bad. It's hard to understand and hard to teach. Compare it to the way the sem package help pages make calls.

I say it is preferable to style this so that we would have

A <- whatever
B <- whatever
myMXmodel <- mxModel(A,B)

When people try that "one piece at a time" approach, they will run into the problem that "R names" are different from "OpenMx names".

I expect big trouble.

whatever <- mxMatrix (..., name="somethingElse")

anotherThing <- mxMatrix(..., name="somethingElse")

To the R user, there are 2 different objects, whatever and anotherThing. However, if the person is unlucky enough to put them both into an mxModel, and mx is reading the name slots, well, bad things will happen.
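A toy sketch of the kind of collision described above. It assumes a container that stores its parts keyed by an internal name; assemble() is hypothetical and is not a claim about what mxModel actually does with duplicate names.

```r
# Hypothetical assembler: stores each part under its internal name
assemble <- function(...) {
  model <- list()
  for (part in list(...)) {
    model[[part$name]] <- part   # a later part with the same name replaces the earlier one
  }
  model
}

whatever     <- list(name = "somethingElse", values = 1)
anotherThing <- list(name = "somethingElse", values = 2)

m <- assemble(whatever, anotherThing)

length(m)                # 1: two R objects went in, one entry came out
m$somethingElse$values   # 2: the first object was silently replaced
```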

But I think I can learn to stand your name convention if you would make the documentation avoid the giant monolithic mxModel usages, as on p. 5-6-7.

As a parting shot, I just have to say mspiegel is wrong about one thing. He says:

> The expression matrix(...) yields a value. That value is of type double

Not! The output of matrix is an R object that has several attributes, which includes slots for the values, the number of columns, rows, dimnames, and so forth.

Run
example(matrix)
attributes(mdat)

So there. I should be able to win at least one argument when it doesn't require you to change any code.

mspiegel
Joined: 07/31/2009

It looks like I have confused the usual programming language terminology with the R language specification terminology. In most programming language specifications, an object is an instance of a class (see http://en.wikipedia.org/wiki/Object-oriented_programming). In the R language specification, any value is referred to as an object (see http://cran.r-project.org/doc/manuals/R-lang.html#Objects). All R objects have attributes. Only S4 objects have slots. Attributes: http://cran.r-project.org/doc/manuals/R-lang.html#Attributes. Slots: http://cran.r-project.org/doc/manuals/R-lang.html#Object_002doriented-pr....
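Both descriptions of the matrix can be checked directly in base R: its storage type is double, it carries attributes, and it is not an S4 object. The class Toy below is made up only to show what a slot looks like by comparison.

```r
library(methods)

x <- matrix(c(1, 2, 3, 4), ncol = 2, dimnames = list(NULL, c("A", "B")))

typeof(x)              # "double": the storage type of the values
names(attributes(x))   # the attributes are dim and dimnames
isS4(x)                # FALSE: a matrix has attributes, not slots

# A hypothetical S4 class with one slot
setClass("Toy", representation(name = "character"))
obj <- new("Toy", name = "C")

isS4(obj)              # TRUE
slotNames(obj)         # "name"
obj@name               # "C"
```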

tbates
Joined: 07/31/2009

Thank you for your comments. Michael Spiegel and others will be better able to comment on most elements of this, but ...

I've created a Coming to OpenMx from R page. Feel free to contribute anything that will help R people get on top of OpenMx.

	Idea 1. Set the "name" slot of your objects implicitly.
	I can't imagine any reason why you should allow:
	oneName <- mxAlgebra(A + B, name = "C")
	the mxAlgebra object's name is "C" but on the outside its R name is "oneName"??

Very often OpenMx objects are not assigned to a variable at all, so the internal name would have to be set manually many (most?) times.

Also not sure that people would appreciate having the name slot changed if they assigned the object to a new R variable?
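A base R sketch of why an implicit name is hard to pin down. argName is a hypothetical helper; it uses substitute(), one common way for a function to see the expression it was called with.

```r
# Returns the text of whatever expression the caller passed in
argName <- function(x) deparse(substitute(x))

A <- 1:3
B <- A          # the same value now sits under two variable names

argName(A)      # "A"
argName(B)      # "B": same value, different guess
argName(1:3)    # "1:3": created inline, so there is no variable name at all
```

An object built inline inside a larger call is in the third situation, so a name would still have to be supplied by hand.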

	Idea 2. It is much more usual in R to have declarations first, as in
 
	A <- mxWhatever
	B <- mxWhatever
	C <- mxWhatever
 
	mymodel <- mxModel( A, B, C )

That is exactly what we can do in OpenMx, so I'm not quite sure what this point is. If you mean that not all mx objects are defined outside of a model, then that is correct: OpenMx models have a series of slots into which objects can be assembled inline, or as R objects. Which one is chosen is a matter of user convenience.

That said, the documentation is moving to a declare-and-assemble style of programming where possible and convenient.

	Idea 3. Create separate functions mxModel() and update().
	mxModel is now allowing a person to put in a name of an existing object, or a new name. In other R functions, we usually have a function "update" that performs the revision on the existing result.

lm-type functions both build and run a model, and update by default updates and runs a model.

In OpenMx, by contrast, we have separate build (mxModel()) and run (mxRun) functions.

In addition, lm-type models have formulae interfaces, but OpenMx doesn't.

So update would just be a helper function that worked something like

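	update <- function(model, newThing, bRemove, bEvaluate) {
		# mxModel() returns a modified copy, so keep the result
		model <- mxModel(model, newThing, remove = bRemove)
		if (bEvaluate) {
			# optionally run the revised model as well
			model <- mxRun(model)
		}
		return(model)
	}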

	Idea 4. When you introduce R code, you should use the most succinct way.... You create objects like "selVars" which aren't needed and run things like dimnames after, not during. In 2 steps

Using colnames where only columns are being set makes sense and may be more readable. Good idea.

selVars is used repeatedly in most scripts, so it usually makes sense to create it. The demo examples (a work in progress) also try to be expandable, so naming things rather than hard coding them helps with that.

Including the default values for functions is designed to help beginners in R (which you are not) understand what parameters are available, even if they are being set to their defaults. Hence also naming parameters even when their order is sufficient.