Getting Data from a Breve Simulation efficiently

Hey again. I'm currently trying to write a Java program that will act as a front end to a breve application. My current problem is that I need to get a lot of data from the breve program into my Java program, and I need to do it quickly.

My Java program runs the Windows command line, tells it to run a breve simulation, and collects data via the breve simulation's print statements to the command prompt. The problem is that I will need to print approximately 1 million lines (which takes about 10 seconds on this computer). I made the program print 30 messages per 'print' command, which reduces the time from 10 seconds to just under 8 seconds.

This still isn't fast enough. Because of this, I have two questions:

1) How can I run Breve directly using the 'Process' class in Java? Specifically, I want to run the command line version of Breve and specify a breve program to run, possibly with arguments. What should the string to do that on Windows look like? (I guess it's 'pathtoBreve/breve.exe args Breve-Program-to-Run', but I'm not sure.)

2) Is there a faster way to get data from a breve program? Would it be more effective to look at creating a TCP connection to the Breve program, or would file IO be faster? Even better, is there a way I can use shared data structures in memory to access information directly? I understand ~1 million is a large number, but surely there has to be some way to read all that data faster than simply using 'print' statements.

Thanks a lot for your help.

That sounds like wayyyyyy

That sounds like wayyyyyy too roundabout, although in Python it would be very easy (just open breve through a pipe as provided by Python).

I believe Java still has a way of doing what you want; check this out:
http://www.macs.hw.ac.uk/cs/online/9nm1/1/4.htm

As for going the file IO route, there is a very handy (although unsupported) ramdisk driver for XP which I've used for this exact purpose, although I can't find the exact link for it now. Some digging on Google might turn it up.

You might run into problems with synchronization if you are reading/writing to files for interprocess communication; my past experience was that this is avoidable with some scripting using AutoHotkey. There are some very handy scripts on their forum if you look, for things like named pipes (this is what you really want), blocking execution until a process returns, or message passing. These are all pretty useful because, as far as I'm aware, you don't have that convenience with Java on Windows unless you use sockets.

Fixed some of it...

Tripwire - I was using the Process class in Java to run the simulation and then tying the output stream of the process to the input stream of the java program. I did this through the command prompt originally because I couldn't figure how to run Breve directly. Since posting that, I figured out how to run Breve directly from Java.

I also adjusted the way I was printing information to populate a list and return that, which resulted in a marked performance increase (1.5 seconds, down from 15 seconds). However, I am having problems because the list appears to be truncated (and I think Java is truncating the list, not breve, as the size of the list is correct). I'll look up the ramdisk driver thing as well.

I don't have that much experience with python, and especially not python with Steve (although I do have some), so I'm not exactly sure what you're talking about with the pipe thing. Could you please explain that a little bit more?

Thanks a lot.

Its ok, python is actually

It's ok, Python is actually really easy to move into if you already know Java. The syntax and terminology might have slight differences but you shouldn't have any problem getting familiar with it. It's got a lot of nice idioms that come in handy and can make your code a lot more compact while maintaining readability.

As for the pipe thing, it's pretty much exactly what you are doing with the buffered input reader in your Java program. It routes the output from one program as a data stream into another program. Python has many built-in functions which allow you to strip or format/filter the text. You can pass information between processes in this way by sending command line arguments to spawned processes and having the parent process read back information from the child's stdout. I'm not sure how the performance of Java's input stream class compares to Python's pipe class, or whether there would be a noticeable difference.
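For reference, the Java side of such a pipe could look roughly like the sketch below. It is only an illustration: the class and method names are made up for this example, and the command passed in (the breve executable path and the simulation file) has to be supplied by the caller, since the exact breve command line is not confirmed in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of breve or of the poster's program.
class ChildOutputReader {

    // Reads every line from the given source through a large buffer.
    public static List<String> readAllLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source, 1 << 16)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Starts a child process and collects everything it prints.
    public static List<String> runAndCollect(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so an unread stderr buffer cannot stall the child.
        builder.redirectErrorStream(true);
        Process child = builder.start();
        List<String> lines = readAllLines(
                new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8));
        child.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java ChildOutputReader <executable> [arguments...]");
            return;
        }
        // The caller supplies the executable and its arguments.
        List<String> lines = runAndCollect(Arrays.asList(args));
        System.out.println("Read " + lines.size() + " lines from the child process");
    }
}
```

Passing the command as a list of separate arguments (rather than one string handed to the command prompt) avoids quoting problems with paths that contain spaces.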
Allocating shared memory is a pain in Windows, but using the named pipe AutoHotkey script should be very effective! http://www.autohotkey.com/forum/topic25867.html You'll have to adapt the code from there, but it will give you a good understanding of how to access Windows' own internal named pipes (which Windows itself uses for IPC). You can download "junction" at microsoft.com, which will allow you to retarget a directory on your HD to your ramdisk, if you like. That way the named pipe will in effect be acting as a superficial kind of shared memory; Windows allows full duplex operation on named pipes.

You might also consider the AHK script which allows messaging; this is a pretty solid means of sending data because you use Windows' internal message loop, which is fairly decent; it's what Windows uses to communicate your mouse clicks to programs like mspaint.

I think the problem is in Breve

Thanks a lot, that makes sense to me, I'll check out the link. I've done some more testing, however, and the problem I have is as follows.

I need to print out a large amount of data. Each print operation takes a 'long' time, in the sense that if I print out each one of a million lines individually, it takes 15 seconds.

I can print out all of the data at once as a list, but Breve is truncating my print statement. It isn't truncating the list itself as far as I can tell - only the print statement is affected (that is, printing the size of the list returns 1 million, printing the list itself prints up to ~230 and then messes up).

I know that some of that is re-stated, but I'm trying to narrow down the causes of the problems I'm experiencing.

Not knowing that much about the capabilities of Breve and Python, would there be a feasible workaround using a stream or stream equivalent in Python to output to my java program?

Thanks a whole lot for your help.

Breve contains both xml

Breve contains both XML archive capabilities and file IO capabilities through the File class. You can either archive as XML and use some free XML parser on the other end to parse it, or just encode your output as a single file. Likely there is some limitation in breve's print function which prevents a gigantic wad of data from being printed at once. If you want to rewrite your steve file as Python you can use Python's built-in file commands, some of which could be very useful, like pickle, which allows you to convert any object into a stream and dump it into a file. I'm not sure if there's some way in Java to unpickle a Python pickle, but you could always use Jython (which handily combines the best of both worlds).
Seriously though, the easiest option would just be using the File class. The documentation is here: http://www.spiderland.org/documentation/steveclasses/File.html
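On the Java end, reading such a file back could be sketched as follows. This assumes the breve program writes plain text with whitespace-separated integers, which is an assumption for illustration and not something the File class documentation is quoted as guaranteeing; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical reader for a results file written by the simulation.
class ResultFileReader {

    // Parses every whitespace-separated integer in the file, in order.
    public static List<Long> readValues(Path file) throws IOException {
        List<Long> values = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // StringTokenizer splits on whitespace without regex overhead.
                StringTokenizer tokens = new StringTokenizer(line);
                while (tokens.hasMoreTokens()) {
                    values.add(Long.parseLong(tokens.nextToken()));
                }
            }
        }
        return values;
    }
}
```

It does not matter to this code how many values the writer puts on each line, so the breve side is free to write in whatever chunk size turns out to be fastest.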

Wouldn't using the file

Wouldn't using the file class actually make the project run more slowly than the pipelining approach, though? The problem with the current approach is that it's not fast enough, I'm not sure how adding the extra step of reading and writing to the hard drive can speed things up. What I need to find out is a way to pipe data to Java from Breve quickly.

1,000,000 items takes about 15 seconds for Breve to print out 1 at a time, but about 1.5 seconds if it prints out a list. Breve can't print out a list that size in one print statement. It also takes 14 seconds to read the data from a file. Of that 14 seconds, Java spends 3 seconds reading the file, and Breve spends 11 seconds writing the list to the hard drive. 14 seconds vs 15 seconds is not the kind of improvement I need to get.

Edit - now it's taking around 5 seconds using file IO. That's not really good enough, but it just strikes me as odd that writing files to the hard drive should be better than piping the data out.

When things are "written

When things are "written out" by using a print statement, they are still using file IO, although they might not be written to the HD. The output stream from your breve program is a file called "standard output", and when you read that with Java by opening a buffered input reader you are essentially reading it as you would a file.

If you are using the ramdisk for storage, you don't have to touch a HDD at any point. HOWEVER, a modern OS will often cache IO operations so that they don't even get written to the disk if they are accessed and erased in a very small amount of time, so you won't necessarily be penalized for just checking/writing/reading/removing with the File class.

This isn't really what you want to do anyway though.
A pipe will let you keep an open stream of data between two processes, and it's certainly fast enough for what you have in mind, considering a majority of IPC on Windows uses named pipes anyway.

All the checking/writing/loading/removing of files is generally going to slow your program down. If you used a Unix-derived OS it would be quite easy to make a named pipe for communication between breve and your program. A named pipe in Unix is generally one-directional, and a process opening it for writing will block until another process opens it for reading at the other end; the same applies in reverse. As you insist on using Windows there is no equally easy way to use named pipes, but you can fiddle with them by using the AutoHotkey script I linked earlier, although it might be more work. The key advantage of using a named pipe is that it will be fast and it provides synchronization.

Thanks

That makes a lot of sense to me. The project I'm working on needs to run on Windows, I don't really have a choice on that. The autohotkey script will probably work, all I need to do now is get python to work with the command line version of Breve, something I am not looking forward to doing.

I might have misunderstood

I might have misunderstood you, but you needn't use python or even ahk at all.
Can you go into more detail on what your program does?
Are you running breve multiple times in a row and collecting data from each instance?
Why not stop using print statements and just accumulate all that information into one long buffer and save it once at the end? On thinking about it further, I don't think there would be any other way to get your info out of breve faster.

Well, for right now I'm

Well, for right now I'm trying to setup the interface between Java and the Breve program.

I'm writing a piece of software that will be performing image analysis. The idea is that I'll have a pretty front end Java GUI, and that I'll use Breve to analyze an image and send data back to Java when it's done. Currently, the image sizes I'm working with are 1024x768, which comes out to ~750,000 pixels. I've written a program in Breve that will take an image, divide it up into smaller images, and get and return the values of the sum of the red, green, and blue pixels for each box in the image. Most of that stuff works fine. A lot of the image analysis I'm going to be doing will involve comparisons between other similar images, so each analysis phase needs to be done quickly enough that comparing a block of, say, 5-10 images doesn't take 5 minutes.
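To make the per-box summing concrete, here is a rough Java sketch of the same idea. It is not the breve code described above, which isn't shown in this thread; it assumes pixels arrive as packed 0xRRGGBB integers in row-major order, and the class and method names are made up for illustration.

```java
// Hypothetical illustration of summing colour channels per box.
class BoxSums {

    // Returns sums[box][channel], where channel 0 = red, 1 = green, 2 = blue
    // and boxes are numbered row by row across a boxesX-by-boxesY grid.
    public static long[][] sumBoxes(int[] rgbPixels, int width, int height,
                                    int boxesX, int boxesY) {
        long[][] sums = new long[boxesX * boxesY][3];
        for (int y = 0; y < height; y++) {
            // Map the pixel row to a box row; long math avoids int overflow.
            int boxRow = (int) ((long) y * boxesY / height);
            for (int x = 0; x < width; x++) {
                int boxCol = (int) ((long) x * boxesX / width);
                int pixel = rgbPixels[y * width + x];
                long[] box = sums[boxRow * boxesX + boxCol];
                box[0] += (pixel >> 16) & 0xFF; // red
                box[1] += (pixel >> 8) & 0xFF;  // green
                box[2] += pixel & 0xFF;         // blue
            }
        }
        return sums;
    }
}
```

With this layout the amount of data that has to cross the process boundary is three numbers per box rather than three per pixel, so for a 1024x768 image it only grows with the number of boxes.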

The problems I have currently are mainly due to my confusion about how I can get data to the Java program. I'm a much better programmer in Java than I am in either Steve or Python, so I do the bulk of my work there.

Initially, I was returning the data from each pixel and having Java keep a count internally. This isn't a good solution, as I could simply add up all of the counts in Breve and return those. This sped things up initially, but it still runs into problems because increasing the number of boxes increases the number of print statements. For numbers like 4 boxes per image, this is fine. For numbers like 1000 boxes per image, problems arise. For larger numbers of boxes (which will happen if this program is expanded to look at larger images, or groups of images), the program will eventually grind to a halt. I don't know how many statements I will need printed, but after going through this I've noticed a few things.

1) the Steve language has a limitation on the size of data you can print at once. I cannot, for instance, populate a list with all of the integers between 0 and 250 and print the entire list out.

2) Python does not have this limitation, and Java can read the data from the python program if I write it in Python.

3) For some reason, the command line version of Breve will not run python programs - I get an error about 'Import Site Failed' and then a bunch of stuff about how it can't find the module 'os' and python can't be used.

Eventually, Breve will either be run multiple times, or once with a lot of arguments passed to it and for each argument perform the same analysis and compare and return results.

Because of problem 1) above, I can't create a long buffer and output it into Java. Small buffers can work, but it's tricky to truncate a list based on the length of the elements in the list as opposed to the number of elements themselves.
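Splitting by character length rather than element count can be done with a simple greedy pass. The sketch below shows the logic in Java only because that is the language used most in this thread; it would have to be ported to Steve or Python on the breve side. The class name, the space separator, and whatever limit is passed in are all assumptions, since the real limit has only been observed roughly here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of packing values into length-limited lines.
class LineChunker {

    // Greedily packs values into space-separated lines of at most maxChars characters.
    public static List<String> pack(List<String> values, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String value : values) {
            if (value.length() > maxChars) {
                throw new IllegalArgumentException("value longer than the line limit: " + value);
            }
            // Length the current line would have after appending this value.
            int needed = current.length() == 0
                    ? value.length()
                    : current.length() + 1 + value.length();
            if (needed > maxChars) {
                // The value does not fit, so finish the current line first.
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(value);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Each returned line can then go out in one print statement, so the number of print calls depends on the total text length rather than on the number of elements.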

Suggestion

Don't print; think about it. This is a parser, not a compiled application: you are running a script. The course work has a server example; or write to a file, since that at least is passed to a low-level system command. It's not steve that has the 250 limit; that is an OS configuration. Think client/server rather than document/object.
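If the client/server route is taken, the Java front end could act as the receiving end along these lines. This is a sketch under assumptions: it supposes the breve side connects over TCP and sends newline-terminated text, which this thread does not confirm, and the class and method names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiving end for results sent over a local TCP connection.
class ResultServer {

    // Accepts a single client and reads lines until the client closes the connection.
    public static List<String> receiveOnce(ServerSocket server) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Socket client = server.accept();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Creating the ServerSocket before launching the simulation means the port is already listening by the time the simulation tries to connect.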
