small uniq trick

Christos Ricudis ricudis at itc.auth.gr
Mon Jan 28 11:34:37 EET 2008


Φρίξος wrote:
> O/H Christos Ricudis έγραψε:
>   
>> Otan exete ena pelwrio arxeio ths morfhs
>>
>> a
>> a
>> a
>> b
>> c
>> c
>> c
>> b
>> b
>> b
>> a
>> c
>> b
>> c
>> a
>>
>>
>> kai 8elete na kanete kati san
>>
>> cat lala.koko | sort | uniq
>>
>> sas symferei perissotero na kanete
>>
>> cat lala.koko | uniq | sort | uniq
>>
>> H prwth uniq 8a kopsei ta repetitions sto arxiko arxeio, elafrynontas 
>> kata poly to baros ths sort prin to teliko uniq.
>>
>>     
> OK, I'm stupid. Giati 2 uniq? Meta thn prwth den feygoun OLA ta repetitions?
>   

Oxi, kai akribws ekei brisketai to kolpo. H uniq diwxnei oles tis 
SYNEXOMENES epanalhpseis.

Aploikh analysh :

Gia na diwksei h uniq oles tis epanalhpseis genika, xreiazetai na 
diabasei oloklhro to arxeio N grammwn, kai na sygkrinei ka8e grammh me 
oles tis ypoloipes gia na dei an exoun ksanaemfanistei. Ayto apaitei 
N^2  operations.

Gi ayto akribws kai h uniq apaitei to input arxeio ths na einai sorted. 
S'ayth thn periptwsh, to arxeio ginetai scan mono mia fora, ara exoume N 
operations.

An ypo8esoume oti h sort xrhsimopoiei ton quicksort algori8mo, poy 
apaitei N*logN operations sth mesh periptwsh, to running time ths sort | 
unique ginetai O(N)+O(N*logN) = O(N+N*logN) = O(N*logN) < O(N^2).

Sth sygkekrimenh periptwsh twra, ayto poy kanoyme einai oti to prwto 
perasma apo thn uniq afairei tis epanalhpseis poy tyxainei na einai hdh 
synexomenes prin thn taksinomhsh. An eimaste tyxeroi, ayto meiwnei kata 
POLY to N sthn sort - pou einai kai to dominant factor sto running time.

Ta pragmatika dedomena sthn periptwsh mou exoun ws ekshs :

bash-3.2$ wc -l data1
 2523466 data1

bash-3.2$ uniq data1 | wc -l
  525483

opote sthn periptwsh sort | uniq exoume  2523466 * log(2523466) + 
2523466 = 39722241 operations

kai sthn periptwsh uniq | sort | uniq exoume 2523466 + 525483 * 
log(525483) + 525483 = 9970649 operations.





More information about the Linux-greek-users mailing list