small uniq trick
Christos Ricudis
ricudis at itc.auth.gr
Mon Jan 28 11:34:37 EET 2008
Φρίξος wrote:
> O/H Christos Ricudis έγραψε:
>
>> Otan exete ena pelwrio arxeio ths morfhs
>>
>> a
>> a
>> a
>> b
>> c
>> c
>> c
>> b
>> b
>> b
>> a
>> c
>> b
>> c
>> a
>>
>>
>> kai 8elete na kanete kati san
>>
>> cat lala.koko | sort | uniq
>>
>> sas symferei perissotero na kanete
>>
>> cat lala.koko | uniq | sort | uniq
>>
>> H prwth uniq 8a kopsei ta repetitions sto arxiko arxeio, elafrynontas
>> kata poly to baros ths sort prin to teliko uniq.
>>
>>
> OK, I'm stupid. Giati 2 uniq? Meta thn prwth den feygoun OLA ta repetitions?
>
Oxi, kai akribws ekei brisketai to kolpo. H uniq diwxnei oles tis
SYNEXOMENES epanalhpseis.
Aploikh analysh :
Gia na diwksei h uniq oles tis epanalhpseis genika, xreiazetai na
diabasei oloklhro to arxeio N grammwn, kai na sygkrinei ka8e grammh me
oles tis ypoloipes gia na dei an exoun ksanaemfanistei. Ayto apaitei
N^2 operations.
Gi ayto akribws kai h uniq apaitei to input arxeio ths na einai sorted.
S'ayth thn periptwsh, to arxeio ginetai scan mono mia fora, ara exoume N
operations.
An ypo8esoume oti h sort xrhsimopoiei ton quicksort algori8mo, poy
apaitei N*logN operations sth mesh periptwsh, to running time ths sort |
unique ginetai O(N)+O(N*logN) = O(N+N*logN) = O(N*logN) < O(N^2).
Sth sygkekrimenh periptwsh twra, ayto poy kanoyme einai oti to prwto
perasma apo thn uniq afairei tis epanalhpseis poy tyxainei na einai hdh
synexomenes prin thn taksinomhsh. An eimaste tyxeroi, ayto meiwnei kata
POLY to N sthn sort - pou einai kai to dominant factor sto running time.
Ta pragmatika dedomena sthn periptwsh mou exoun ws ekshs :
bash-3.2$ wc -l data1
2523466 data1
bash-3.2$ uniq data1 | wc -l
525483
opote sthn periptwsh sort | uniq exoume 2523466 * log(2523466) +
2523466 = 39722241 operations
kai sthn periptwsh uniq | sort | uniq exoume 2523466 + 525483 *
log(525483) + 525483 = 9970649 operations.
More information about the Linux-greek-users
mailing list