This is a simple example of synthetic data, generated using the cocktail party simulator.
All of these data files come from the same network: a 12-person party with 1 host. All guests know the host and 2 other people (so D knows A, the host, and C and E, its two neighbors).
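To make the network concrete, here is a minimal sketch of its ground-truth "who knows whom" matrix. It assumes the guests B through L are arranged in a ring (so B's neighbors are C and L); the description above only confirms D's neighbors, so the wrap-around is a guess. The function name `single_host_network` is made up for illustration and is not part of the simulator.

```python
import string

def single_host_network(n_people=12):
    """Return (labels, adjacency) for the single-host party.

    Assumption: person A is the host, and the guests B, C, ... form a ring,
    so each guest knows the host plus the guests on either side.
    """
    labels = list(string.ascii_uppercase[:n_people])
    adj = [[0] * n_people for _ in range(n_people)]
    n_guests = n_people - 1
    for g in range(n_guests):
        i = 1 + g                    # index of this guest (0 is the host)
        j = 1 + (g + 1) % n_guests   # next guest around the ring
        adj[0][i] = adj[i][0] = 1    # every guest knows the host
        adj[i][j] = adj[j][i] = 1    # ring neighbors know each other
    return labels, adj

labels, adj = single_host_network()
d = labels.index("D")
print([labels[k] for k, known in enumerate(adj[d]) if known])  # ['A', 'C', 'E']
```

A matrix like this is what the sampled data files are trying to approximate, so it can serve as a reference when judging how well a technique recovers the structure.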
In the simulation, we add two factors:
- sampling (how many observations we take to build the matrix). In many cases we are undersampling: not taking enough samples to really capture the phenomenon, which leads to noisy measurements.
- measurement noise (random chance added to the numbers). Basically, when we make an observation, there’s a chance it is a random event: two people who do not know each other may still talk to each other, or two people are talking to each other but we missed it.
This example should allow you to see how well your techniques deal with these two factors. The underlying phenomenon is the same (so we would hope to have very similar representations), but the errors might make that harder to discover.
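To get a feel for how the two factors interact, here is a toy model in Python. It is not the cocktail party simulator and does not reproduce its noise setting; it simply assumes that each sample records one conversation, and that with probability `p_noise` the pair is a chance encounter between any two people. It also ignores missed conversations. The function `observe` and its parameters are hypothetical.

```python
import random

def observe(adj, n_samples, p_noise=0.0, seed=None):
    """Count observed conversations in a toy model (not the actual simulator).

    Each sample records one conversation between two people. With probability
    p_noise the pair is drawn uniformly from all pairs (a chance encounter);
    otherwise it is drawn uniformly from pairs who know each other.
    """
    rng = random.Random(seed)
    n = len(adj)
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    known_pairs = [(i, j) for i, j in all_pairs if adj[i][j]]
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_samples):
        pool = all_pairs if rng.random() < p_noise else known_pairs
        i, j = rng.choice(pool)
        counts[i][j] += 1
        counts[j][i] += 1
    return counts

# Small 5-person example: host 0 knows everyone, guests 1-4 form a ring.
adj = [
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
]
clean = observe(adj, n_samples=1000, p_noise=0.0, seed=1)
sparse = observe(adj, n_samples=5, p_noise=0.0, seed=1)
noisy = observe(adj, n_samples=1000, p_noise=0.3, seed=1)

# Guests 1 and 3 do not know each other, so the clean count is always 0.
print("clean:", clean[1][3], "noisy:", noisy[1][3])
```

With only 5 samples, at least three of the eight acquaintance pairs are never observed at all, which is the undersampling problem in miniature.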
The data files are named like this:
P 12 x 100 – 0 – 1
which means:
- 12 person party (all these are the same)
- x means that it’s the single-host party (we’ll see other networks in future data)
- 100 means 100 samples
- 0 means no noise (a 6 would mean noise of +/- 3 added to each conversation selection)
- 1 is the trial (there are two trials of each condition given)
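If you want to group the files by condition, the naming scheme above can be split into its fields with a small parser. This is a sketch under assumptions: the exact punctuation of the real file names is not confirmed here, so the pattern tolerates spaces, hyphens, en dashes, and an optional `.csv` extension. The function name `parse_name` and the dictionary keys are made up for illustration.

```python
import re

# Assumed pattern: "P", party size, network code, sample count, noise, trial,
# e.g. "P12x100-0-1". Spaces and en dashes are tolerated because the exact
# punctuation of the real file names is not confirmed here.
NAME_RE = re.compile(
    r"^\s*P\s*(?P<size>\d+)\s*(?P<network>[A-Za-z])\s*(?P<samples>\d+)"
    r"\s*[-\u2013]\s*(?P<noise>\d+)\s*[-\u2013]\s*(?P<trial>\d+)\s*(?:\.csv)?\s*$",
    re.IGNORECASE,
)

def parse_name(name):
    """Split a data file name into its five fields (hypothetical helper)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized data file name: {name!r}")
    fields = m.groupdict()
    return {
        "size": int(fields["size"]),            # number of people at the party
        "network": fields["network"].lower(),   # "x" is the single-host party
        "samples": int(fields["samples"]),      # number of observations
        "noise": int(fields["noise"]),          # 0 means no noise
        "trial": int(fields["trial"]),          # trial number within a condition
    }

print(parse_name("P 12 x 100 – 0 – 1"))
# {'size': 12, 'network': 'x', 'samples': 100, 'noise': 0, 'trial': 1}
```

Parsing the names this way makes it easy to pair up the two trials of each condition and compare results across sample counts and noise levels.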
Here is a ZIP of 16 of these files: p12x.zip
(right now, I can’t upload individual CSV files – but we’re working on fixing that)