Saturday, 28 October 2006 |
|
|
How To Break Anonymity of the Netflix Prize Dataset. At the beginning of the month, Netflix launched a contest, with a one-million-dollar prize, to improve its movie recommendation system. To help contestants develop the new method, it published a test dataset containing the movies rated by a group of its users and, in principle, no data of any kind that could be used to identify those people.
The article I link to, from the University of Texas, documents an analysis of that dataset and shows how, with very little prior knowledge, the users in it can be identified.
As part of the Netflix Prize contest, Netflix recently released a dataset containing movie ratings of a significant fraction of their subscribers. The dataset is intended to be anonymous, and all customer identifying information has been removed.
We demonstrate that an attacker who knows only a little bit about an individual subscriber can easily identify this subscriber’s record if it is present in the dataset, or, at the very least, identify a small set of records which include the subscriber’s record. For example, an attacker who knows the subscriber’s ratings on 6 movies that are not among the top 100 most rated movies and approximate dates when these ratings were entered into the system has an 80% chance of successfully identifying the subscriber’s record. This knowledge need not be precise, e.g., the dates may only be known to the attacker with a 14-day error, the ratings may be known only approximately, and some of the ratings may even be completely wrong. Even without any knowledge of rating dates, knowing the subscriber’s ratings on 6 movies outside the top 100 most rated movies reduces the set of plausible candidate records to 8 (out of almost 500,000) for 70% of the subscribers whose records have been released. With the candidate set so small, deanonymization can then be completed with additional human analysis.
A successful deanonymization of the Netflix dataset has possible implications for the Netflix Prize, which promises $1 million for a 10% improvement in the quality of Netflix movie recommendations. Given movie ratings of Netflix users who have made their ratings public, or perhaps obtained from public sources such as the Internet Movie Database (IMDb), a contestant may be able to identify their records in the Netflix dataset (if they are among the Netflix subscribers whose anonymized records have been released). With the complete knowledge of a subscriber’s IMDb ratings, it becomes much easier to “predict” how he or she rated any given movie on Netflix.
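To make the idea in the quoted abstract more concrete, here is a minimal Python sketch of how a handful of approximately known ratings can shrink the set of candidate records. It is not the algorithm from the paper: the toy `DATASET`, the `AUX` knowledge, the function names and the simple count-the-matches scoring are all assumptions of mine, chosen only to illustrate tolerance on ratings and on dates (the 14-day error mentioned in the abstract).

```python
from datetime import date

# Hypothetical released data: record id -> {movie: (rating, rating date)}.
DATASET = {
    "r1": {"A": (5, date(2005, 3, 1)), "B": (2, date(2005, 3, 9)), "C": (4, date(2005, 6, 20))},
    "r2": {"A": (1, date(2005, 3, 2)), "B": (2, date(2004, 1, 1)), "D": (3, date(2005, 5, 5))},
    "r3": {"A": (4, date(2005, 3, 10)), "B": (3, date(2005, 3, 15)), "C": (1, date(2005, 6, 25))},
}

# Hypothetical attacker knowledge about one subscriber (approximate on purpose).
AUX = {"A": (5, date(2005, 3, 5)), "B": (2, date(2005, 3, 12)), "C": (4, date(2005, 6, 18))}


def matches(known, released, rating_tol=1, day_tol=14):
    """True if a known (rating, date) pair is close enough to a released one."""
    known_rating, known_date = known
    released_rating, released_date = released
    return (abs(known_rating - released_rating) <= rating_tol
            and abs((known_date - released_date).days) <= day_tol)


def candidate_records(dataset, aux, min_matches, rating_tol=1, day_tol=14):
    """Return the ids of records that agree with at least min_matches known ratings."""
    candidates = []
    for record_id, ratings in dataset.items():
        score = sum(
            1
            for movie, known in aux.items()
            if movie in ratings and matches(known, ratings[movie], rating_tol, day_tol)
        )
        if score >= min_matches:
            candidates.append(record_id)
    return sorted(candidates)


print(candidate_records(DATASET, AUX, min_matches=3))  # all three known ratings must agree
print(candidate_records(DATASET, AUX, min_matches=2))  # tolerate one wrong rating
```

Lowering `min_matches` below the number of known ratings is a crude way to model the point that some of the attacker's ratings may be completely wrong; the price is a larger candidate set.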
|
20:55 (# Permalink) | Comments: | Trackback:
|
|
A Firefox 2 trick that I find brilliant: quite often I close a tab I did not mean to close (through bad aim or by mistake).
Well, Firefox 2 includes an "undo close tab" feature. The shortcut is Control-Shift-T... just press it right after closing the tab and the tab comes back exactly where it was before you closed it.
|
19:15 (# Permalink) | Comments: | Trackback:
|
|
I have noticed that Unicode support in CentOS, when the system is used exclusively in text mode (at the console), is quite poor. It has performance problems (it takes *minutes* to expand a name containing wildcards), and in certain operations that use wildcards it can even hang the machine because it can no longer open files.
The problem, from what I have seen on my machine, is that under certain circumstances it gets stuck in a loop while running /bin/unicode_start. I have seen more than 100 instances running at the same time. This only happens when working at the computer's console; it does not happen over a remote connection.
For now, the only solution I know of is to disable Unicode entirely on the machine:
- Edit the .bash_profile file and change the locale to a non-Unicode one:
  export LANG=ca_AD
- Edit the /etc/sysconfig/i18n file and change the locale and the language support:
  # LANG="en_US.UTF-8"
  # SUPPORTED="ca_AD.UTF-8:ca_AD:ca:en_US.UTF-8:en_US:en"
  LANG="ca_AD:en_US"
  SUPPORTED="ca_AD:ca:en_US:en"
Functionally, I have not found any difference between having Unicode enabled or disabled at the system console.
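As a quick sanity check after a change like the one above, a small Python sketch can parse i18n-style settings and report whether a locale name still asks for UTF-8. The helper names `parse_i18n` and `is_utf8_locale` and the `SAMPLE` text are my own assumptions; the sketch only handles simple KEY="value" lines and is not a full parser for the file.

```python
def parse_i18n(text):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def is_utf8_locale(name):
    """True if the locale name carries a UTF-8 codeset, e.g. en_US.UTF-8."""
    _, _, codeset = name.partition(".")
    codeset = codeset.split("@")[0]  # drop a modifier such as @euro
    return codeset.lower().replace("-", "") == "utf8"


# Hypothetical file contents mirroring the edit described above.
SAMPLE = '''
# LANG="en_US.UTF-8"
LANG="ca_AD:en_US"
SUPPORTED="ca_AD:ca:en_US:en"
'''

settings = parse_i18n(SAMPLE)
print(settings["LANG"], is_utf8_locale(settings["LANG"]))
```

In real use the text would come from reading /etc/sysconfig/i18n instead of the inline sample.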
|
17:01 (# Permalink) | Comments: | Trackback:
|
|
[Tim Berners-Lee] Reinventing HTML
Making standards is hard work. It's hard because it involves listening to other people and figuring out what they mean, which means figuring out where they are coming from, how they are using words, and so on.
A particular case is HTML... The plan is to charter a completely new HTML group...
I'll be asking these groups to be very accountable, to have powerful issue tracking systems on the w3.org web site, and to be responsive in spirit as well as in letter to public comments.
|
15:56 (# Permalink) | Comments: | Trackback:
|
|
[El Pais] Un antiguo directivo de Microsoft será el próximo turista espacial (A former Microsoft executive will be the next space tourist). That former executive is Charles Simonyi... the father of Hungarian notation:
In Systems Hungarian notation, the most common form, the prefix encodes the actual data type of the variable. For example:
- ulAccountNum : variable is an unsigned long integer
- szName : variable is a zero-terminated string; this was one of Simonyi's original suggested prefixes
Apps Hungarian notation doesn't encode the actual data type but rather, it gives a hint as to what the variable's purpose is, or what it represents.
- rwPosition : variable represents a row
- usName : variable represents an unsafe string, which needs to be translated by some function to make it safe
- strName : Variable represents a string containing the name, but does not specify how that string is implemented.
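The Apps Hungarian idea quoted above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `us`/`s` prefixes follow the usName example, the function name `s_from_us` is hypothetical, and I picked HTML escaping as the "some function" that makes the string safe. Names are written in Python's snake_case rather than the camelCase of the quoted examples.

```python
import html


def s_from_us(us_text):
    """Translate an unsafe string (us) into a safe one (s) by HTML-escaping it."""
    return html.escape(us_text)


# us_ prefix: raw text that came from outside and must not be shown as-is.
us_name = "<b>Ann</b>"

# s_ prefix: text that has been made safe; the prefix records that fact.
s_name = s_from_us(us_name)

print(s_name)
```

The benefit is visual: an assignment such as `s_name = us_name` looks wrong at a glance because the prefixes disagree, even though both variables have the same data type.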
|
13:39 (# Permalink) | Comments: | Trackback:
|
|
[NetworkWorld] The importance of wireless security. The spread of wireless networks affects how network security has to be planned.
Wireless networks are forcing organizations to completely rethink how they secure their networks and devices to prevent attacks and misuse that expose critical assets and confidential data. By their very nature, wireless networks are difficult to roll out, secure and manage, even for the most savvy network administrators.
(...)
To ensure effective, automated wireless threat protection, companies and government organizations should implement a complete wireless security solution covering assets across the enterprise that enables them to discover vulnerabilities, assess threats, prevent attacks, and ensure ongoing compliance - in the most secure, easy-to-use and cost-effective manner available.
IT departments must have a pre-emptive plan of action to prevent malicious attacks and employee misuse which compromise an organization's data privacy and enforce security policies for wireless use - both inside and outside their facilities. Whether or not a company has authorized the use of wireless or has a 'no wireless' policy, their networks, data, devices and users are exposed and at risk.
|
13:35 (# Permalink) | Comments: | Trackback:
|
|
[An Information Security Place] IE7 breaks Juniper SSL VPN. Internet Explorer 7 is not compatible with Juniper's SSL VPNs (and, as far as I understand, not with any SSL VPN). This would not be especially serious were it not that on 1 November Microsoft will ship IE7 as an automatic update. To prevent this, there is a tool that blocks the upgrade to IE7, although only temporarily.
|
13:17 (# Permalink) | Comments: | Trackback:
|
|
© Copyright 2003-2006 Xavier Caballe. Unless expressly stated otherwise, the material published on this weblog is distributed under the Creative Commons license. The content is the sole and exclusive responsibility of its author and has no connection with his professional activities.