Where should example data live?

9 replies
tbates
Joined: 07/31/2009

Hi all,
Currently, the example on the front page doesn't work out of the box, because:

demoData = read.csv("demoOneFactor.csv", header=TRUE)
# cannot open file 'demoOneFactor.csv': No such file or directory

Just because it's so obvious on the home page, I think a lot of people will copy and paste it in, so it would be nice if it ran...

I can't see a datafile in the package called "demoOneFactor.csv"

Using the ability of R to access the web, we could do:
read.csv("http://openmx.psyc.virginia.edu/data/demoOneFactor.csv", header = TRUE)

PS: Are we going to use "T" and "F" for TRUE and FALSE in the documentation?

They are just variables (like n and o), and as such can be set to things other than TRUE/FALSE. To minimize hard-to-track errors, it might be safer to stick to TRUE and FALSE, though this is longer. Otherwise, add "T=TRUE; F=FALSE;" to the top of the scripts.
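To illustrate the point about T and F, here is a minimal sketch in base R. The function name demo_flags is made up for this example, and the reassignment happens inside the function so that it does not clobber T in anyone's workspace.

```r
# T and F are ordinary variables that merely start out bound to TRUE and FALSE.
# demo_flags() is a hypothetical name used only for this illustration.
demo_flags <- function() {
  T <- 0   # legal: T can be reassigned like any other variable
  c(shorthand = isTRUE(T), keyword = isTRUE(TRUE))
}

result <- demo_flags()
print(result)   # 'shorthand' is FALSE because T no longer means TRUE here

# By contrast, TRUE and FALSE are reserved words: `TRUE <- 0` is an error.
```

A script that writes header=T silently changes behaviour if something earlier assigned to T, whereas header=TRUE cannot be affected that way.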

Steve
Joined: 07/30/2009

Not sure if those demo scripts and the data file made it into the demo directory for the first binary release. If they aren't in the demo directory, they should be.

I think I'd rather put a word about running the demo on the front page into the installation directions. That way people will find the demo directory right away. As the installation instructions stand, that wasn't clear to a couple of our beta testers.

mspiegel
Joined: 07/31/2009

Just a reminder that any data used by a demo should be read with the data() command. The files read by data() live in /trunk/data; see ?data for the required file formats. read.csv() and the similar read.*() commands, when given a bare file name, only find files that reside in the current working directory.
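A rough sketch of the difference, using the USArrests data set that ships with R rather than any OpenMx file. The scratch directory and file name below are assumptions made up for the example.

```r
# data() finds data sets shipped with attached packages, whatever the working directory.
data(USArrests)   # USArrests ships with R's 'datasets' package
stopifnot(is.data.frame(USArrests))

# read.csv() needs a path to a file. Here we write one to a scratch location
# (hypothetical names) and read it back using its full path.
csv_dir <- tempfile("demo_dir_")
dir.create(csv_dir)
csv_path <- file.path(csv_dir, "example.csv")
write.csv(head(USArrests), csv_path, row.names = FALSE)

from_full_path <- read.csv(csv_path, header = TRUE)
print(dim(from_full_path))

# A bare name such as read.csv("example.csv") would be resolved against getwd(),
# so it only works when the working directory is csv_dir.
```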

Steve
Joined: 07/30/2009

While that does mean that people don't have to think about directories, I do not find it to be good R practice when one is teaching students about how to organize their analyses. I find it much preferable to suggest that they learn to create a project directory and place their data in the project directory and then run all their analyses from that project directory.
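As a sketch of the project-directory habit described here (the directory name, file name, and toy data are all hypothetical; a temporary directory stands in for a real project folder):

```r
# Hypothetical layout: one directory per project, with the data next to the scripts.
project_dir <- file.path(tempdir(), "myProject")   # stand-in for a real project folder
dir.create(project_dir, showWarnings = FALSE)

# Stand-in for copying a real data file into the project directory.
write.csv(data.frame(x1 = c(0.1, 0.4), x2 = c(0.3, 0.2)),
          file.path(project_dir, "myData.csv"), row.names = FALSE)

old_wd <- setwd(project_dir)                       # run the analysis from inside the project
myData <- read.csv("myData.csv", header = TRUE)    # the bare file name now resolves
setwd(old_wd)                                      # restore the previous working directory

print(myData)
```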

\begin{soapbox}

I have found that this saves an enormous amount of confusion later on. Otherwise students tend to dump all of their data files into one directory and become confused as to which file belongs to which project.

Now, I know that this doesn't do the automagical switching between read.csv() and read.table() and source() and load(). But do you really want to hide that from people?

\end{soapbox}

If it is the will of the project, I will move demoOneFactor.csv over into trunk/data and change the front page to data(). I am not convinced that is best practice, though. I'd rather talk them through making a project directory at the beginning of each tutorial section in order to reinforce good organizational behavior.

Steve
Joined: 07/30/2009

OK. The change to data() and the moved file are in rev 701.

I don't find repetition at the beginning of a tutorial chapter to be a bad thing. It is boring for us who know what we are doing, but it gives a sense of safety and comfort for a learner who is struggling. As long as it is well marked, the fast movers will skip it.

tbates
Joined: 07/31/2009

I have a feeling (tm) that

demoData <- data("demoOneFactor.csv", header=T)

Won't work?
Shouldn't it be:

data(demoOneFactor) # reads in the data (and knows about suffixes etc.)
demoData <- demoOneFactor # copy the data into the frame of our choosing

alternatively replace all instances of demoData with demoOneFactor and then just say
data(demoOneFactor) # reads "demoOneFactor.csv" into demoOneFactor as frame

****************
The second two (‘.tab’, ‘.txt’, or ‘.TXT’, and ‘.csv’ or ‘.CSV’ files) will always result in the creation of a single variable with the same name as the data set.

data(USArrests, "VADeaths") # load the data sets 'USArrests' and 'VADeaths'
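This can be checked with a data set that ships with R, since demoOneFactor may not be installed everywhere. A small sketch; the variable name loaded is just for illustration:

```r
# data() loads the data set as a side effect; what it returns is only the name.
loaded <- data(USArrests)
print(loaded)            # a character vector, not a data frame
print(class(loaded))

# The data itself arrives in a variable named after the data set.
demoData <- USArrests    # copy it under the name we want to use
print(head(demoData, 3))
```

So assigning the result of data() to demoData would store a name rather than the data, which is why the two-step form is needed.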

t

Steve
Joined: 07/30/2009

So true.

Also, one finds that "csv", which stands for "comma-separated values", is forced by data() to mean "semicolon-separated values". One more reason I'm not a fan of the whole data directory system that R implements.
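A sketch of what that means in practice. The ?data help page says .csv files are read with read.table(..., header = TRUE, sep = ";"), so the example below calls read.table() that way on a comma-separated file instead of going through data() itself. The file contents are made up.

```r
# A small comma-separated file in a temporary location (toy values).
comma_file <- tempfile(fileext = ".csv")
writeLines(c("x1,x2", "0.1,0.3", "0.4,0.2"), comma_file)

# Read it the way ?data says data() reads .csv files: semicolon as separator.
semicolon_read <- read.table(comma_file, header = TRUE, sep = ";")

# Read it the way read.csv() does: comma as separator.
comma_read <- read.csv(comma_file, header = TRUE)

print(ncol(semicolon_read))   # each whole line lands in a single column
print(ncol(comma_read))       # the two columns are split as intended
```

In other words, a file moved into the data directory needs its commas changed to semicolons before data() will split the columns.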

But I have made the required changes and they are committed in rev 702.

Jeff
Joined: 07/31/2009

I think for demos it should be data(...), especially for the front page or embedded examples in help files. If we want users to download an archive of data files, sorted however you like, for the user guide, that would be fine, but the R-ish convention here is to use "data" for whatever we consider demos.

If you want to write a wiki page explaining the project directory and put the link in a comment next to the data line ("to use non-demo data, see http://..."), and/or add a longer reference section to the User Guide, that would be great, and of benefit to the R community. It's an important step and one that needs to be explained, but only once and not in lieu of an R convention.

And while I absolutely agree with your fine-use-of-a-LaTeX-macro soapbox, I don't think we should be teaching data-handling best practices at the beginning of each tutorial; we should only point users to the wiki. Otherwise, where do we draw the line? And are we going to explain both Windows and *nix each time? We've said many times throughout the course of the project that we'll leave certain R issues to R (and to the user to learn/deal with), and this seems like a time where we should do just that: OpenMx starts at the data object, and before that it's R.

EDIT: To make myself clear, I'm not saying we should just leave users to learn R by themselves, but explaining something at the start of every tutorial seems a bit much. Write once, link everywhere....

tbates
Joined: 07/31/2009

I concur: demo = data()

mspiegel
Joined: 07/31/2009

I understand the desire to teach students best practices so that they will create a project directory and properly organize their files. But OTOH we have already seen instances of beta testers who just want everything to work "out of the box". I see only two options for Happy Meal OpenMx: (i) use the data() function, which means we know where the data resides, or (ii) follow Tim Bates's suggestion and place a URL in the script, which assumes the user has a working internet connection. Actually, for tutorials that are placed online, option (ii) is not a bad idea. But for demos that we intend users to run in stand-alone mode, the preferred approach is (i).
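For online tutorials, one way to soften option (ii)'s dependence on a connection would be to look for a local copy first. This is only a sketch: read_demo_csv is a hypothetical helper, not part of OpenMx or R, and the URL in the comment is the one suggested earlier in this thread.

```r
# Hypothetical helper: prefer a local copy of the file, fall back to a URL.
read_demo_csv <- function(local_path, url) {
  source_path <- if (file.exists(local_path)) local_path else url
  read.csv(source_path, header = TRUE)
}

# Intended use in a tutorial script (not run here, as it may need the network):
# demoData <- read_demo_csv("demoOneFactor.csv",
#                           "http://openmx.psyc.virginia.edu/data/demoOneFactor.csv")

# Offline check with a toy local file, so the URL is never contacted.
local_copy <- tempfile(fileext = ".csv")
writeLines(c("x1,x2", "0.1,0.3"), local_copy)
offline_result <- read_demo_csv(local_copy, "http://example.invalid/unused.csv")
print(offline_result)
```

For stand-alone demos, data() is still simpler, since the data then travels with the package.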