Tag Archives: Python

On errors in reference numbers

31.12.2013 Niko Porjo

If you make few typos when typing numbers, it is not worth double-checking the reference number. If you make many, it is worth checking the long ones.

As a pedant, I have long been bothered by the check digit system of invoice reference numbers. Not so much by how the digit is determined as by the fact that the associated risks are never mentioned alongside it. There are only ten possible check digits, so a random reference has a ten percent chance of getting the right one. In other words, if you draw every digit of a reference number at random, it passes the check one time in ten. The scheme of course relies on people trying to type the reference number correctly, and single errors are always caught.

Forming the reference number itself is simple, as can be seen for example here. The digits of the reference number are multiplied one by one by the weights 7, 3, 1, 7, 3, 1, … The products are summed and the sum is subtracted from the next full ten (10 => 0). As a result, the check digits are evenly distributed (in the listing, Ensimmäinen and Viimeinen are the first and last number of each range):

Ensimmäinen: 0
Viimeinen: 9
[1 1 1 1 1 1 1 1 1 1]
Ensimmäinen: 10
Viimeinen: 99
[9 9 9 9 9 9 9 9 9 9]
Ensimmäinen: 100
Viimeinen: 999
[90 90 90 90 90 90 90 90 90 90]
Ensimmäinen: 1000
Viimeinen: 9999
[900 900 900 900 900 900 900 900 900 900]
Ensimmäinen: 10000
Viimeinen: 99999
[9000 9000 9000 9000 9000 9000 9000 9000 9000 9000]
Ensimmäinen: 100000
Viimeinen: 999999
[90000 90000 90000 90000 90000 90000 90000 90000 90000 90000]
Ensimmäinen: 1000000
Viimeinen: 9999999
[900000 900000 900000 900000 900000 900000 900000 900000 900000 900000]

Above, I had the computer calculate the checksum for the first 1e7 numbers and recorded which check digit resulted.
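The calculation can be sketched in a few lines. This is a minimal sketch, not the script that produced the listing: the names `check_digit` and `check_digit_counts` are made up here, and the weights are applied starting from the rightmost digit, as the Finnish reference number standard does, whereas the scripts at the end of the post apply them from the left. For counting how often each check digit occurs in a full range, both directions should give the same even counts.

```python
from collections import Counter
from itertools import cycle


def check_digit(base: str) -> int:
    """Check digit for `base`, the reference number without its last digit."""
    # Weights 7, 3, 1 repeat, starting from the rightmost digit
    # (an assumption of this sketch, see the text above).
    total = sum(int(d) * w for d, w in zip(reversed(base), cycle((7, 3, 1))))
    # Distance to the next full ten; a distance of 10 becomes 0.
    return (10 - total % 10) % 10


def check_digit_counts(first: int, last: int) -> list:
    """How many numbers in first..last (inclusive) get each check digit 0..9."""
    counts = Counter(check_digit(str(n)) for n in range(first, last + 1))
    return [counts[d] for d in range(10)]


print(check_digit("123456"))       # 1, so the full reference number is 1234561
print(check_digit_counts(10, 99))  # nine of each check digit
```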

The maximum length of a reference number is 19+1 digits. If the probability of a typo is the same for each of the 19 characters, the probability of the check failing grows with the length of the reference number. Using myself as a guinea pig, I drew 200 random five-digit sequences and typed them on the computer the way I usually do when paying bills.

Figure 0. View of the typing error test program. The “xx” rows push the number being read and the number being typed so far apart that I could not see both at once.

If I felt I had made an error, I corrected it, looking at the typed number if necessary. Otherwise I typed the numbers without looking at the result. Afterwards I was fairly sure the test had been pointless, since I did not believe I had made a single error. I had nevertheless made eight.

If we assume I fumbled at most once per number, the error probability is at least Pv=0.008 per typed digit. I used the numeric keypad, and error rates with other input methods may differ, but this probably gives a reasonable estimate of the order of magnitude. A very brief googling did not turn up any good source, so below I use this self-measured value.
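The estimate is simple arithmetic, spelled out here as a small sketch. The variable names are made up; the figures are the ones from the test described above.

```python
numbers_typed = 200      # random sequences typed in the test
digits_per_number = 5    # each sequence was five digits long
wrong_numbers = 8        # sequences that did not match

# Each wrong sequence contains at least one wrong digit, so this is a lower bound.
pv_lower_bound = wrong_numbers / (numbers_typed * digits_per_number)
print(pv_lower_bound)
```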

The probability of an error is not necessarily fully independent of earlier errors. For example, a case where two errors occur in a row because adjacent characters swap places may be fairly likely.

With a single error, the error is always detected, but with several errors about ten percent of the erroneous reference numbers pass the check, see the figure below. For some reason the pass rate computed by the script is even slightly higher for two errors. The probability that the script is faulty is not negligible. For many errors the result seems reasonable: the check digit drawn for a random digit sequence is the right one with probability 0.1.
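The slightly higher two-error rate has a plausible explanation, sketched here under the same assumptions as the script: errors hit distinct positions, each wrong digit is a uniformly random different digit, and the check digit itself is typed correctly. Each of the weights 7, 3 and 1 is coprime to 10, so a single error shifts the weighted sum by a non-zero amount modulo 10, with all nine non-zero shifts equally likely. A number with k errors passes when the shifts cancel, which gives the recurrence p(k) = (1 - p(k-1)) / 9. The function name `pass_probability` is made up for this sketch.

```python
from fractions import Fraction


def pass_probability(errors: int) -> Fraction:
    """Probability that a number with `errors` wrong digits keeps its check digit."""
    p = Fraction(1)  # zero errors: the weighted sum is unchanged
    for _ in range(errors):
        # The newest shift can cancel the accumulated shift only if that one
        # is non-zero, and then exactly one of the nine possible shifts does it.
        p = (1 - p) / 9
    return p


for k in range(1, 6):
    print(k, float(pass_probability(k)))
```

Under these assumptions two errors pass with probability 1/9, about 11.1 %, three errors with 8/81, about 9.9 %, and the value settles at 10 % as the number of errors grows.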

Figure 1. Errors per hundred draws for reference numbers of different lengths. The lines dropping to zero are due to the length of the reference number: a reference four characters long cannot contain five errors.

A pinch of salt is in order here, since probability theory is not my strongest suit. Simplified, the probability of a reference number with more than one error is one minus the cases with no errors and the cases with exactly one error,

$P_k = 1 - \left( (1-P_v)^n + n\,(1-P_v)^{n-1} P_v \right)$ (1)

where $P_v$ is the probability of typing a character wrong, $1-P_v$ is the probability of typing it right, $(1-P_v)^n$ is the probability of typing n characters right, $(1-P_v)^{n-1} P_v$ is the probability of typing one given character wrong in a reference number n digits long, and multiplying the latter by n accounts for the possibility of fumbling once at each character position.
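Formula (1) is easy to evaluate directly. The following is a minimal sketch with a made-up function name, evaluated for the longest reference number body of 19 digits and for two per-digit error probabilities.

```python
def p_more_than_one_error(n: int, pv: float) -> float:
    """Formula (1): probability of at least two errors among n typed digits."""
    p_none = (1 - pv) ** n
    p_exactly_one = n * (1 - pv) ** (n - 1) * pv
    return 1 - (p_none + p_exactly_one)


for pv in (0.008, 0.015):
    print(pv, round(p_more_than_one_error(19, pv), 4))
```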

The simplifications include the aforementioned independence of errors, an error-free check digit, and the assumption that one can type several errors into a reference without noticing.

Figure 2. Probability of making more than one error as a function of reference number length, when making 8 errors per thousand digits.

I also checked the result by computing the same probability as formula 1 with a script. With a million repetitions per reference number length the agreement was good, as the figure shows.
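A comparison of this kind can be sketched as follows. It is a scaled-down illustration, not the script used for the figure: it uses a seeded `random.Random` so that the run is repeatable, only 100 000 repetitions, and function names invented here.

```python
import random


def formula(n, pv):
    """Formula (1): probability of more than one error in n digits."""
    return 1 - ((1 - pv) ** n + n * (1 - pv) ** (n - 1) * pv)


def simulate(n, pv, repeats, seed=1):
    """Share of simulated n-digit numbers that contain more than one error."""
    rng = random.Random(seed)  # seeded so that the run can be repeated
    hits = 0
    for _ in range(repeats):
        errors = sum(rng.random() < pv for _ in range(n))
        if errors > 1:
            hits += 1
    return hits / repeats


estimate = simulate(19, 0.015, 100_000)
print(estimate, formula(19, 0.015))
```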

For those weighing the merits of the mathematical and the computational approach: the formula took about 133 ms to compute while the long script took 864.17 ms, which is reportedly about the same amount of time my wife has lost from her life to the grief caused by the merriment surrounding the writing of this blog.

People's ability to type correctly varies, so if we assume I am an average typist as far as numbers go (!), some people make more errors, and for them the probability of a reference number with multiple errors grows. Whereas at Pv=0.008 about one in a hundred 19-character reference numbers contains more than one error, at Pv=0.015 it is already more than three in a hundred.

Figure 3. As Figure 2, but with typo probability Pv=0.015.

Since only one in ten of the multi-error reference numbers nevertheless passes the check, it can be said that for me, and perhaps for the average typist, one long reference number in a thousand gets through. That is, assuming the typist corrects the errors perfectly after being prompted.

If the bill payer earns 15 € per hour (net) and spends 10 seconds checking a 19-digit reference, checking a thousand references costs about 40 €. If sorting things out after an error slips through costs 50 € in time and in late fees from the reminder letter, it seems to make almost no difference which one you do. If you suspect your Pv is larger, checking quickly starts to pay off.
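The comparison can be written out as a short calculation. The variable names are invented for this sketch, and the 1 % share of multi-error references is the rounded value for Pv=0.008 and 19 digits.

```python
hourly_wage_eur = 15.0
seconds_per_check = 10
references = 1000
cost_of_checking = hourly_wage_eur * seconds_per_check * references / 3600

p_multi_error = 0.01      # share of 19-digit references with more than one error
p_pass_check = 0.1        # share of those that still pass the check
cost_per_miss_eur = 50.0
expected_loss = references * p_multi_error * p_pass_check * cost_per_miss_eur

print(round(cost_of_checking, 2), round(expected_loss, 2))
```

With these figures checking costs a little under 42 € per thousand references and skipping the check costs about 50 €, so the two are of the same order.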

Scripts used:

[sourcecode language="python"]

import ViiteVirhe as VV
import numpy as np
import matplotlib.pyplot as plt
import random

# Test the distribution of the check digit for long reference numbers
a = VV.TarkNumJak(19, 10000000)
hist, bins = np.histogram(a)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
k = np.mean(hist)
# Plot how far each bin is from the mean count, in millions
ax.bar(center, (hist - k) / 1e6, align='center', width=width)
ax.set_title('Viitenumeron tarkastusnumeron (jakauma-minimi)/1e6, 10 M 19 numeroista')
plt.show()

# Go through the shorter ones in order
for i in range(7):
    k = 10**i
    if k == 1:
        k = 0
    print('Ensimmäinen: ' + str(k))
    print('Viimeinen: ' + str(int('9' * (i + 1))))
    a = VV.TarkNumJak2(k, int('9' * (i + 1)) + 1)
    hist, bins = np.histogram(a)
    print(hist)

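# Typing test: show a random five-digit string, then read it back from the keyboard
r = random.SystemRandom()
S = ''
oikein = 0  # typed correctly
vaarin = 0  # typed wrong
for i in range(200):
    for k in range(5):
        S = S + str(r.randint(0, 9))
    print(S)
    # Push the shown number out of view before the input prompt
    for k in range(20):
        print('xx')
    s = input('>>>> ')
    if s == S:
        oikein += 1
    else:
        vaarin += 1
    S = ''
print('Oikein ' + str(oikein))
print('Vaarin ' + str(vaarin))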

# Erroneous numbers that pass, check digit always typed correctly
TT = np.zeros((19, 19))
for i in range(1, 20):
    print(i)
    for k in range(1, i + 1):
        for l in range(1000):
            TT[i - 1, k - 1] += VV.nErrorsCSC(i - 1, k)
print(TT / 1000)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# x axis: number of errors, one line per reference number length
ax.plot([i + 1 for i in range(19)], TT.T / 1000)
ax.set_title('')
plt.show()

# Probability of an erroneous reference getting through:
# first the share of references with more than one error, from formula (1)
import time
Pv = 0.015
start_time = time.time()
M = np.zeros(18)
for i in range(2, 20):
    M[i - 2] = VV.POfWrong(i, Pv)
    print('Numeroita: ' + str(i) + ' Todennäköisyys n virheitä >=2 : ' +
          str(M[i - 2]))
print(M)
print('Aikaa meni: ' + str(time.time() - start_time))

# The same share estimated by simulation
start_time = time.time()
MM = np.zeros(18)
N = 1000000
for i in range(2, 20):
    MM[i - 2] = VV.POfWrong2(i, Pv, N)
    print(i)
print(MM)
print('Aikaa meni: ' + str(time.time() - start_time))
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(list(range(2, 20)), MM / N, 'b+-',
        label='skripti', markersize=10)
ax.plot(list(range(2, 20)), M, 'r-', label='kaava (1)')
legend = ax.legend(loc='center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
for label in legend.get_texts():
    label.set_fontsize('large')
for label in legend.get_lines():
    label.set_linewidth(1.5)
ax.set_title('Ennemmän kuin yksi virhe viitenumeron rungossa (Pv=' +
             str(Pv) + ')')
ax.set_xlabel('Viitenumeron pituus')
ax.set_ylabel('Todennäköisyys')
ax.set_ylim(0.0001, .1)
ax.set_xlim(0, 21)
ax.set_yscale('log')
ax.grid(axis='both', which='both')
plt.show()

[/sourcecode]

And the functions called:

[sourcecode language="python"]
""" Draws random reference numbers, computes the checksum and collects
statistics on the correctability of errors
"""

def TarkNumJak(pituus, kertoja):
    """Check digits of `kertoja` random reference numbers of length `pituus`."""
    import numpy as np
    import random

    r = random.SystemRandom()
    kerroin = np.array([7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1])
    TarkNum = np.zeros(kertoja)
    for k in range(kertoja):
        TarkSum = 0
        for i in range(pituus):
            apu = r.randint(0, 9)
            TarkSum += apu * kerroin[i]
        # Distance from the weighted sum to the next full ten (10 => 0)
        TarkNum[k] = (10 - TarkSum % 10) % 10
    return TarkNum

def TarkNumJak2(Smallest, Largest):
    """ Largest-1 is the largest number considered
    """
    import numpy as np
    kerroin = np.array([7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1])
    TarkNum = np.zeros((Largest, 1))
    for i in range(Smallest, Largest):
        TarkSum = 0
        I = str(i)
        for k in range(len(I)):
            TarkSum += int(I[k]) * kerroin[k]
        # Distance from the weighted sum to the next full ten (10 => 0)
        TarkNum[i] = (10 - TarkSum % 10) % 10
    return TarkNum[Smallest:]

def nErrorsCSC(pituus, virheita):
    """ n errors Check sum Always correct"""
    import numpy as np
    import random

    r = random.SystemRandom()
    S = ''
    # Random order of digit positions; the first `virheita` of them get an error
    V = list(range(pituus + 1))
    random.shuffle(V)
    for k in range(pituus + 1):
        S = S + str(r.randint(0, 9))
    Se = list(S)
    # There cannot be more errors than there are digits
    if virheita > pituus + 1:
        virheita = pituus + 1
    for k in range(virheita):
        # Replace the digit with a different random digit
        apu2 = r.randint(0, 9)
        while int(Se[V[k]]) == apu2:
            apu2 = r.randint(0, 9)
        Se[V[k]] = str(apu2)
    Se = "".join(Se)
    TarkSum = 0
    TarkSume = 0
    kerroin = np.array([7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1,7,3,1])
    for k in range(pituus + 1):
        TarkSum += int(S[k]) * kerroin[k]
        TarkSume += int(Se[k]) * kerroin[k]
    # Check digits of the original and the erroneous number (10 => 0)
    TarkNum = (10 - TarkSum % 10) % 10
    TarkNume = (10 - TarkSume % 10) % 10

    # 1 if the erroneous number passes the check, 0 if it is caught
    if TarkNum == TarkNume:
        return 1
    else:
        return 0

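def POfWrong(n, Pv):
    """Formula (1): probability of more than one error in n digits."""
    a = 1 - ((1 - Pv)**n + n * (1 - Pv)**(n - 1) * Pv)
    return a

def POfWrong2(n, Pv, N):
    """Simulate N numbers of n digits, count those with more than one error."""
    import random
    r = random.SystemRandom()
    v = 0
    for i in range(N):
        V = 0
        for k in range(n):
            if r.random() <= Pv:
                V += 1
        if V > 1:
            v += 1
    return v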

[/sourcecode]

Fast1, Maps

Number of TV stations

05.02.2013 Niko Porjo

Map 1. Number of TV stations per country. Colors are set by calculating log(N+1), so the scale is not linear.

The map is based on Wikipedia’s list of countries by number of television broadcast stations (Feb 2013). I made it using my own Python script, starting from this map. As the original map is in SVG format, it is fairly easy to add more text definitions to it in order to change colors and add the color bar.

The Executions map was made using Google’s tools, but I found the experience a bit cramped. I did look around for a suitable tool set on the web, but there seemed to be something wrong with every one of them. Since I have wanted to do this for a long time, I decided to go for it.
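The log(N+1) coloring mentioned in the caption could be done roughly as follows. This is only a sketch under assumptions: the function names, the white-to-dark-blue palette and the station counts are all invented for illustration and are not taken from the script that produced the map.

```python
import math


def log_scale(n: int, n_max: int) -> float:
    """Position of n on a 0..1 color scale computed from log(N+1)."""
    return math.log(n + 1) / math.log(n_max + 1)


def to_hex(level: float) -> str:
    """Blend from white (0.0) to dark blue (1.0); the palette is made up here."""
    red_green = round(255 * (1 - level))
    blue = round(255 - 116 * level)
    return f"#{red_green:02x}{red_green:02x}{blue:02x}"


stations = {"A": 0, "B": 9, "C": 99, "D": 999}  # hypothetical counts
n_max = max(stations.values())
for country, n in stations.items():
    print(country, to_hex(log_scale(n, n_max)))
```

A color string of this form can be written into the fill style of the matching country element in the SVG file.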

Why TV stations? It was the first list I found in Wikipedia that had many countries in it.

Edit: Map was updated 7 Feb. I found a bug in the code, it may have had an effect on the map.

Data Analysis

Analysis of the 2012 municipal elections in Finland II

30.01.2013 Niko Porjo 1 Comment

I have continued my study of Python programming; see part I for earlier results. The code might not be very pythonic, despite some effort in that direction. With a little sugar coating one might say that I’m being pragmatic; a cynic might say that I lose my bad habits slowly. At least I have tried to comment a little, which makes it easier for me to remember what I was attempting to do.

In the analysis I looked into the stability of the election result. This was done with a Monte Carlo analysis. Despite the fancy name, in practice I just manipulated the result by introducing random changes and then calculated what the result would have been with the new vote counts.

My motive was the eternal discussion about people not voting and how an individual has little impact on the result.

The analysis went roughly like this:

1. Query the database for the number of votes for each candidate and sum these to get the number of votes given to each party.
2. Manipulate the result and calculate a new result.
3. Repeat step 2 many times.
4. Calculate the average number of seats for each party and make a note of the largest and smallest number of councilmen.
5. Continue from step 2 using a larger deviation in the manipulation.

It would have been possible to query the database directly for the elected candidates, but I wanted to do the calculation myself and compare it with the actual confirmed result. This gave the opportunity to check how well the algorithm works. My calculated result did not match for all municipalities, because when distribution figures are equal, or when candidates within a party list have the same number of votes, the result is decided by a lottery. Due to the random nature of the lottery, its result cannot be reproduced in the code. Instead I allocated the seats according to the order of an internal list.

It is good to note that my results here are in some sense suggestive only, as the effort put into confirming that the code works correctly was not at the level that would be required, for example, for scientific publication.
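Finnish municipal elections allocate seats with the d'Hondt method: each party's vote total is divided by 1, 2, 3, … to form the comparison figures, and the seats go to the largest figures. The sketch below works on party totals only and settles ties by list order instead of a lottery, in the spirit of the approach described above. The function name and the vote counts are invented for illustration; this is not the code used for the analysis.

```python
def dhondt(votes: dict, seats: int) -> dict:
    """Seats per party from vote totals using d'Hondt comparison figures."""
    figures = []
    for order, (party, total) in enumerate(votes.items()):
        for divisor in range(1, seats + 1):
            # The negated order makes parties listed earlier win ties
            # when the figures are sorted in descending order.
            figures.append((total / divisor, -order, party))
    figures.sort(reverse=True)
    result = {party: 0 for party in votes}
    for _, _, party in figures[:seats]:
        result[party] += 1
    return result


print(dhondt({"A": 100, "B": 80, "C": 30}, 5))  # {'A': 3, 'B': 2, 'C': 0}
```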

The manipulation itself was done like this:

tot=EA[k][2]*(random.uniform(-1,1)*B[m]+1)

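Or put in another way: $N' = N\,(R B + 1)$, where $N$ is the number of votes the party received in the election, $N'$ is the manipulated number of votes, $R$ is a random number drawn uniformly between -1 and 1, and $B$ is the largest relative change allowed (the delta).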

The number of votes a party accumulated in the election was multiplied by a number between 0.999 and 1.001 when the delta was smallest and between 0 and 2 when the delta was largest. Each time the manipulation was done, a separate random number was drawn for each party. The draw and the manipulation were repeated 10 000 times for each municipality to see how the result varies for each value of B. It is good to note that the number of iterations was chosen with the “I feel like it” method, which has been criticized, sometimes harshly.
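The whole loop can be sketched as follows. This is a simplified illustration under assumptions, not the analysis code: it works on party totals only, uses a compact d'Hondt allocation with ties going to the party listed first, a seeded generator, made-up vote totals and far fewer iterations than the 10 000 used in the analysis.

```python
import random


def dhondt(votes, seats):
    """Seats per party with d'Hondt; ties go to the party listed first."""
    figures = sorted(
        ((total / d, -i, party)
         for i, (party, total) in enumerate(votes.items())
         for d in range(1, seats + 1)),
        reverse=True,
    )
    result = dict.fromkeys(votes, 0)
    for _, _, party in figures[:seats]:
        result[party] += 1
    return result


def perturbed_results(votes, seats, b, iterations, seed=0):
    """Mean, smallest and largest seat count per party under random vote changes."""
    rng = random.Random(seed)
    history = {party: [] for party in votes}
    for _ in range(iterations):
        # Same manipulation as in the text: votes * (R * B + 1), R uniform in [-1, 1]
        shaken = {p: v * (rng.uniform(-1, 1) * b + 1) for p, v in votes.items()}
        for party, n in dhondt(shaken, seats).items():
            history[party].append(n)
    return {p: (sum(h) / len(h), min(h), max(h)) for p, h in history.items()}


votes = {"A": 1000, "B": 800, "C": 300}  # hypothetical vote totals
for b in (0.0, 0.05, 0.5):
    print(b, perturbed_results(votes, seats=7, b=b, iterations=1000))
```

When the smallest and largest seat counts equal the mean for every party, the result was the same in every iteration at that value of B.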

The selected manipulation method doesn’t directly match any real situation, although it is similar, for example, to cases where the active members of a local election team catch the plague at a critical moment or a rich benefactor enables a particularly well funded campaign. In these cases the changes in the result might be similar to what is seen in the images below. The mean number of seats hints at whether the number of seats gained was mostly close to the maximum or to the minimum.

The clearest result is how much the number of votes must change to change the result. In the figures parameter B is shown as “delta=x” where x is the value of B used. If the minimum and maximum number of seats gained (red bar in the figures) do not differ from the average (blue in the figures), the result has been the same for all iterations, which can be interpreted as a stable result at this level of variation in the voting behaviour.

It should be noted that because the number of votes a party got in the voting is used as the basis for the calculations, it is not possible to get candidates elected even with large deltas if the party got, for example, three votes in the actual election. On the other hand, even a large number of votes quickly evaporates when the value of R is very close to minus one. When the changes in the number of votes are large, it is possible that the overall number of votes given is larger than the number of eligible voters in the municipality.

I chose five municipalities at random and looked at their result more closely: Janakkala, Kalajoki, Karkkila, Liperi and Savukoski.

I use the commonly used abbreviations for the larger parties in the text below.

In Janakkala the number of PS councilmen could have changed with a 5 % change in the number of votes they got, in absolute terms this means 54 votes. Similarly KESK could have gotten one seat more with the same relative change, corresponding to 84 votes. With a 20 % change KD could have lost their only seat.

In Savukoski, if one more person had made their way to the polling station and voted for PS, the result would have been different. PS would have gotten as many councilmen as VAS. In the real elections VAS got three times as many seats as PS. Whether this would have changed any decisions is of course a different question.

Figure 1. With a maximum change of 2 % the Janakkala result is unchanged.

Figure 2. With a maximum change of 5 % the number of councilmen can change. SDP and PS could lose two seats. With these changes KESK would always gain if there was a change, although most of the time it gets the same result.

Figure 3. With a maximum change of 20 % any part of the result can be different from the real one.

In figures 1 through 3 the increase in parameter B starts to show as larger variation in the result. In Janakkala the result is most stable for KD and then for VIHR.

Figure 4. In Kalajoki Pro, SDP and VIHR have fairly stable results since a 10 % change wouldn’t have an effect in their number of councilmen.

Figure 5. It turns out the one VIHR seat is the most stable. KD would rather lose their seat than gain more.

Figure 6. In Karkkila the result would first start to change between SDP and VAS at the 2 % level, SDP would lose one seat.

Figure 7. At the 5 % level only KESK has a stable result.

Figure 8. In Karkkila the parties had fairly similar results, at the 10 % level changes can be seen in all the results.

Figure 9. In Liperi the first changes can be seen between PS and YL LS at 1 % level.

Figure 10. In the Savukoski council half a percent change in the number of votes can change the result. However, it would most likely not change any decisions.

Figure 11. At 5 % level KESK could lose its majority.

Table 1.


Zygomatica.com: Ratkaisuihin ongelmia
