Blindspin 2: How to do science by dumpster-diving

When a project has a zero budget, everything has to be hacked and improvised, typically with duct tape. The data collection system for project Blindspin is a good example. (For a project description, see “Does it make sense to ride a bike with your eyes shut”)

If we had a budget, we would be looking for a high-end mobile data logger with millisecond accuracy. Since we don’t, we would very much prefer to use a smartphone that someone has thrown away. And it seems we can.

The basic requirement is not very complex. Our system will consist of a pair of electronic goggles which are normally opaque, but which the cyclist can turn transparent by pressing a switch. We need to record the time of the keypress, hopefully with millisecond accuracy. We also need to collect GPS location information, so that we can determine the path the cyclist rode while blinded.

There are GPS data logger apps galore for Android. We found that the AndroSensor software is almost perfect. It collects GPS location, accelerometer information, and all other sensor data at resolutions of up to 50 milliseconds. The only problem is how to input information about the key presses. There is no sensor channel for that.

However, we realized that AndroSensor can record the ambient sound level in dB. So we decided to use the audio channel to store button data. In the simplest case, button down (vision occluded) is a loud noise, button up is a quiet noise.

A major problem is that AndroSensor (and most other software we looked at) always uses the phone’s own microphone, even when a line-in is connected. Thus, the noise has to be fed acoustically, straight into the phone’s own microphone.

For a pre-test, we came up with a somewhat Rube Goldbergish approach, but one that works. To generate the noise, we used an aviation scanner that has a reasonably large tangent button. The scanner’s autogain means that if there are no aviation transmissions, there is no sound output. However, if the tangent button is pressed, the noise is heard. The gain can be set so that the difference in noise levels is tens of dB.

To eliminate outside noise, the speaker was attached to the phone’s microphone with Blu-Tack (sinitarra), and the whole thing covered with more Blu-Tack. Thus, the microphone hears almost no external sounds at all. When the tangent is pressed, it hears the noise from the scanner, at tens of decibels.
Blog_medium
The speaker is buried within the mass of Blu-Tack and pressed directly onto the phone’s microphone.

The whole system was attached to the bike handlebars with zip ties. Several different setups were used; the one in the picture below is operated by the index finger. A simpler way was to mount the scanner facing the other way and below the handlebar, so that it could be operated by the thumb.

 

Blog_Bikepic

 

The ergonomics of the system are horrible. However, at this stage it is simply needed to demonstrate the data collection method. Subject A tested it on a straight road and a curved one. Whenever A closed his eyes, he pressed down on the tangent. Whenever he opened them, he let go of the tangent.

The image below is the first data ever produced in this project. The blue line is the speed given by AndroSensor. The red line is the dB level. When the red line is above 60 dB, the eyes were closed and the tangent was pressed, and thus the scanner output noise directly into the phone’s microphone.
Plot1
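
As a rough illustration of how the dB channel can be turned back into button data, here is a minimal sketch. It assumes the log has already been read into (time in milliseconds, level in dB) pairs; the function name and the sample log are made up for the example and are not AndroSensor's format or our real data. Only the 60 dB threshold comes from the plot described above.

```python
from typing import List, Tuple

THRESHOLD_DB = 60.0  # from the text: above 60 dB means the tangent was pressed


def find_occlusions(samples: List[Tuple[int, float]],
                    threshold_db: float = THRESHOLD_DB) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for every run of samples above the threshold.

    start_ms is the time of the first loud sample; end_ms is the time of the
    first quiet sample after it. A run that is still loud when the log ends
    is closed at the last timestamp.
    """
    occlusions = []
    start = None
    for t_ms, level_db in samples:
        loud = level_db > threshold_db
        if loud and start is None:
            start = t_ms
        elif not loud and start is not None:
            occlusions.append((start, t_ms))
            start = None
    if start is not None:
        occlusions.append((start, samples[-1][0]))
    return occlusions


# Hypothetical log at 50 ms spacing: quiet, loud, quiet, loud until the end.
log = [(t, 35.0) for t in range(0, 200, 50)]
log += [(t, 72.0) for t in range(200, 1200, 50)]
log += [(t, 35.0) for t in range(1200, 1400, 50)]
log += [(t, 72.0) for t in range(1400, 1600, 50)]

print(find_occlusions(log))  # [(200, 1200), (1400, 1550)]
```

Counting the returned intervals and subtracting start from end gives the number and length of the occlusions.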

 

We have collected more data from subject A, but will not release data yet. Why? It is not ready yet, but even more importantly we don’t want to skew the results that the volunteers might get. Such ignorance is almost always desired in human research (with the exception of self-testing). The volunteers should have no idea whether it is possible to keep the eyes shut for one second, or for twenty — and they really should not know what we are even looking for at this stage.

For subject A in this particular case, he had a total of 9 occlusions within a 20-second period, with approximately 500-ms eyes-open periods. The occlusion times are between 1 and 2 seconds for this subject for this point on this track for this setup. The occlusion times may be longer in other circumstances… or they may be shorter.

In the final application, we will use a somewhat more complex keypress arrangement, since we will be using an Arduino to control the system. Most likely, we will use a buzzer to create an audible signal about the state of the goggles (loud buzz when goggles opaque, silent when goggles transparent).

The specs of this system are not ideal, but they are actually good enough even for real science. Data are stored at 50-millisecond intervals, but even in simulators, typical intervals are 100 ms. The biggest problem is timing the key press; even if we can get a perfectly sharp rising edge, we will have a 50-millisecond uncertainty in the timing. In practice, it may be even larger if the edges are not completely sharp. We can thus reasonably expect to get a 100-ms time resolution, but not much better. That will be sufficient, as long as we are careful to note it in the analysis.
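
The 100-ms figure can be checked with a small sketch. It assumes ideal, perfectly sharp edges and no software delay, which is the best case described above; the function names are made up for the example. Each edge can only be pinned down to the 50-ms gap between two consecutive samples, so a duration computed from two edges carries both uncertainties.

```python
SAMPLE_PERIOD_MS = 50  # logging interval stated in the text


def edge_bounds(first_sample_after_ms, period_ms=SAMPLE_PERIOD_MS):
    """The true edge happened after the previous sample and no later than this one."""
    return (first_sample_after_ms - period_ms, first_sample_after_ms)


def duration_bounds(rise_ms, fall_ms, period_ms=SAMPLE_PERIOD_MS):
    """Shortest and longest true press duration consistent with the samples.

    rise_ms is the time of the first loud sample, fall_ms the time of the
    first quiet sample after the press.
    """
    rise_lo, rise_hi = edge_bounds(rise_ms, period_ms)
    fall_lo, fall_hi = edge_bounds(fall_ms, period_ms)
    return (fall_lo - rise_hi, fall_hi - rise_lo)


# A press first seen at 200 ms and first gone at 1200 ms (made-up numbers).
print(duration_bounds(200, 1200))  # (950, 1050)
```

The two bounds are 100 ms apart, which is where the expected resolution comes from. Blurred edges or phone delays would only widen the window.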

Of course, this system does have major disadvantages, such as unknown delays in the phone software. We will design a better system if at all possible. But this is a fallback solution, which in the very worst case we can use as the actual solution. Costing zero euros.

See also Blindspin project page.

 

 

The cruel mathematics of the grim life of data points

In honor of Christmas, I have been pondering how disheartening it would be to be a data point. Even a simple study can produce millions of data points. In the end they are mashed into a formula of the form Y = A + B*X1. Every point would like to become the A, in A’s place; only one makes it, and the rest are condemned to eternal damnation. What kind of life is that?

The thought came to mind while I was crunching the numbers of my current project. The goal and details of the project are not essential here (whether they are essential anywhere else is a matter of taste). About a hundred people were run through a driving simulator. Each drive lasted nearly an hour. Data were recorded ten times per second.

Because simulator time is expensive, everything possible was recorded from the drives. About thirty parameters were recorded for the driver. In addition, we kept track of where the other objects in the simulation were. There are about fifty objects, and eight parameters were recorded for each. Each row thus had over 500 numbers. Ten times per second for an hour means that nearly 20 million numbers were recorded for each driver.

In total, nearly 2 billion data points were thus collected during the project.
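
The order of magnitude is easy to check. The sketch below simply multiplies the rounded figures quoted above (ten samples per second, roughly an hour per drive, roughly 500 numbers per row, roughly a hundred drivers); the exact values are assumptions, since the text only gives approximations.

```python
SAMPLE_RATE_HZ = 10        # "ten times per second"
DRIVE_SECONDS = 60 * 60    # "nearly an hour", rounded up to a full hour
NUMBERS_PER_ROW = 500      # "over 500 numbers", rounded down
DRIVERS = 100              # "about a hundred people"

rows_per_driver = SAMPLE_RATE_HZ * DRIVE_SECONDS
numbers_per_driver = rows_per_driver * NUMBERS_PER_ROW
total_numbers = numbers_per_driver * DRIVERS

print(rows_per_driver)     # 36000
print(numbers_per_driver)  # 18000000
print(total_numbers)       # 1800000000
```

That is 18 million numbers per driver and 1.8 billion in total, in line with "nearly 20 million" and "nearly 2 billion".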

The tragedy of these billions is that almost all of them were killed before they ever saw the light of day. The 30 parameters measuring the driver’s actions were at least considered worth keeping. For the other objects, however, only the distance was stored: 50 numbers per row in total. Only 10% of the numbers thus survived the first slaughter, and 90% went straight to data point heaven. 200 million data points remaining.

In the end, about a quarter of the road sections were usable: 50 million data points. At this stage it began to become clear which parameters matter in the analysis at all. The fifty stored parameters could be condensed into slightly more than ten. Ten million data points remaining. Their sampling interval could be thinned out further by rounding the positions to the nearest full meter. Only about four million data points made it into the actual data press (400,000 measurements, each with 10 parameters).

In the press, various methods were tried, including linear multivariate models. In the end, however, the simplest was the best: the drivers were averaged, so that the hundred test subjects were mashed into one “average” driver. About 99% of the data points thus suffered a grotesque death by averaging, losing all the individuality that makes a number a number.

At this stage, 4000 measurements remained, each with ten parameters. Crunching showed that only one of these was ultimately important (the dependent variable Y), and that it was best explained by two independent variables (X1 and X2).
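
The whole funnel can be tabulated in a few lines. This is only a bookkeeping sketch: the stage sizes are the rounded figures quoted above, and the stage labels are made up for the example.

```python
# Stage sizes as quoted in the text; all of them are rounded figures.
stages = [
    ("collected", 2_000_000_000),
    ("distances only for other objects", 200_000_000),
    ("usable road sections", 50_000_000),
    ("about ten parameters kept", 10_000_000),
    ("positions rounded to full meters", 4_000_000),
    ("drivers averaged into one", 40_000),  # 4000 measurements x 10 parameters
]


def survival_fractions(stages):
    """Fraction of data points surviving each step, relative to the previous stage."""
    return [
        (name, count / previous)
        for (_, previous), (name, count) in zip(stages, stages[1:])
    ]


for name, fraction in survival_fractions(stages):
    print(f"{name}: {fraction:.0%} survive")

# Overall survival, as an integer ratio of the first and last stage.
one_in = stages[0][1] // stages[-1][1]
print(f"overall: 1 in {one_in:,}")  # overall: 1 in 50,000
```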

The entire research result thus boiled down to the formula

  Y = A + B*X1 + C*X2.

In other words, all that remained was to determine three constants (A, B, C). So this was the end result of the whole process: three numbers. Out of the original two billion. And all this just so that a couple of academic nerds could get yet another publication.
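
For the curious, here is a minimal sketch of how three such constants can be determined by ordinary least squares. It is not the analysis used in the project: the data are synthetic, generated from made-up constants (1.5, 2.0, -0.5) so that the fit can be checked, and the function names are hypothetical. It uses plain Python and the normal equations so that it runs without extra libraries.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_plane(x1: Sequence[float], x2: Sequence[float],
              y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares estimate of (A, B, C) in Y = A + B*X1 + C*X2."""
    rows = [(1.0, u, v) for u, v in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    a, b, c = solve_linear(xtx, xty)
    return a, b, c


# Synthetic, noise-free data from made-up constants A=1.5, B=2.0, C=-0.5.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
y = [1.5 + 2.0 * u - 0.5 * v for u, v in zip(x1, x2)]

a, b, c = fit_plane(x1, x2, y)
print(round(a, 2), round(b, 2), round(c, 2))  # 1.5 2.0 -0.5
```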

To make the humiliation complete, these constants have only two significant digits, whereas the original data were collected with at least six significant digits of precision. A number is happy when it is precise; losing each digit is like losing a limb.

I can compare this directly to my own life. According to the statistics, there are about 3.5 billion working-age people in the world, which means just under two billion men. In this competition we would be put through a giant blender over and over. The losers would be drained into the sewer, and the winners would be mashed up yet again. In the end, a gold star sticker would go to the three who still have something left of them.

(In theory one could of course think that the prize would be the nearly two billion working-age women who would now be on the free market. But for a paralyzed multi-limb amputee, that is mostly an academic pleasure.)

My own life no longer feels quite as miserable when I think about this. Just as insignificant, of course. But it is still better to be one sometimes partly healthy Ö among many than the only fully crippled A among no one.

More weird mathematics: WeirdMath.

 

The mathematics of equality

[T]he data suggest that all humans are equal, as long as one is measuring the wrong thing with the wrong instrument. The same methodology also points to the unity of all beingness.

In Finnish / suomeksi: click here. More postings in a similar vein: WeirdMath.  

Derawi et al. (2010) showed that it is possible to identify a person with nearly 80% accuracy by using just a simple accelerometer on a mobile phone. People’s walking styles are different and consistent, and analysis of the gait can identify the person.

In this follow-up study we determined whether a test person can be identified from accelerometer data when the person only stares at the mobile phone. There were 34 test subjects. Part of the test (14 subjects) was a classical double-blind study, in which the participants had no idea what they were supposed to be doing. Part of the test (20 subjects) was a postmodern triple-blind study, in which the subjects did not know that they were participating (the phone was simply placed close to them without the subjects noticing anything).

The experiment was done by placing a pink Samsung Galaxy S2 on the table, and recording its accelerometer data with the AndroSensor software. Data were stored at 50-millisecond intervals. The participants were asked to stare at the phone for 20 seconds. In the triple-blind test, logging was done for 20 seconds without telling the test subject. Eight-second samples of all measurements were taken for further analysis.

As a pilot, the test was extended to some animals, as well as other forms of sentient existence. It is unclear whether the pilot was double-blind or triple-blind, as the test subjects did not seem to understand the instructions they were given.

Figure 1: All data. The x-component of the accelerometer was used. To account for tilts, the data were normalized to zero by subtracting the average. An ANOVA test was run.

All
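
For readers who want to try this at home, here is a minimal sketch of the two steps named in the caption: subtracting each trace's average, and a one-way ANOVA F statistic. The traces are made up for the example and are not the measured data, and the function names are hypothetical. One thing the sketch makes visible: if the ANOVA is run on the centered traces, every subject has a mean of zero by construction, so the between-subject variance vanishes.

```python
from statistics import mean
from typing import List


def center(trace: List[float]) -> List[float]:
    """Normalize a trace to zero by subtracting its average."""
    avg = mean(trace)
    return [v - avg for v in trace]


def one_way_anova_f(groups: List[List[float]]) -> float:
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up x-axis readings for three subjects (different tilts, similar jitter).
raw = {
    "engineer": [0.10, 0.12, 0.11, 0.13],
    "cow": [0.30, 0.31, 0.29, 0.32],
    "navel fluff": [-0.20, -0.19, -0.21, -0.18],
}

f_raw = one_way_anova_f(list(raw.values()))
f_centered = one_way_anova_f([center(t) for t in raw.values()])

print(f"F on raw traces:      {f_raw:.1f}")
print(f"F on centered traces: {f_centered:.3g}")
```

With these invented numbers the raw traces differ enormously, simply because the phone is tilted differently, while the centered traces give an F of essentially zero.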

 

Figure 2: Four typical human profiles. It is not possible to statistically identify the test subjects. Age and gender do not affect the results.

People

 

Figure 3: Four animal profiles (dog, cat, cow, bug). The animals cannot be distinguished from one another, nor indeed from humans.

Animals

 

Figure 4: Other organic subjects: an apple, a tree, a woolen sock, and some navel fluff. Since the sock was dirty and the fluff was fresh, it can be reasonably assumed that all subjects were sentient. The profiles cannot be distinguished from the other test subjects.

Other

 

Figure 5: The ANOVA test shows that the null hypothesis cannot be rejected for any subjects. There are no statistically significant differences between any of the subjects.

Anova2

This means that a mobile phone’s accelerometer cannot determine who is staring at the phone. More fundamentally, the triple-blind test shows that the accelerometer cannot even determine whether the subject knows he is supposed to be staring at the phone. The experiment thus shows no differences whatsoever between people.

Extending the study to non-human subjects suggests that there is no statistically observable difference between, for example, an engineer, a cow, and navel fluff.

In summary, the data suggest that all humans are equal, as long as one is measuring the wrong thing with the wrong instrument. The same methodology also points to the unity of all beingness.
