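Part 1: The index of coincidence and its unbiased estimate
• Index of coincidence: suppose a language has n letters and letter i occurs with probability p_i (1 ≤ i ≤ n). The index of coincidence, denoted IC, is the probability that two randomly chosen letters are the same:
$$IC = \sum_{i=1}^{n} p_i^2$$
• In practice, IC is approximated by its unbiased estimate IC':
$$IC' = \frac{\sum_{i=1}^{n} x_i (x_i - 1)}{L (L - 1)}$$
where x_i is the number of occurrences of letter i, L is the length of the text, and n is the number of letters in the language.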
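The estimate IC' can be computed directly from letter counts. The following is a minimal sketch, assuming a 26-letter lowercase English alphabet and ignoring every other character; the function name `index_of_coincidence` is illustrative and does not come from the slides.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Unbiased estimate IC' = sum_i x_i (x_i - 1) / (L (L - 1)).

    Assumption: only the letters a-z count; case is folded and all
    other characters (spaces, digits, punctuation) are ignored.
    """
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    length = len(letters)
    if length < 2:
        raise ValueError("need at least two letters to estimate IC'")
    counts = Counter(letters)  # x_i for every letter that occurs
    numerator = sum(x * (x - 1) for x in counts.values())
    return numerator / (length * (length - 1))


if __name__ == "__main__":
    # Two letters, each appearing twice: (2*1 + 2*1) / (4*3) = 1/3
    print(index_of_coincidence("aabb"))
```

Letters that never occur contribute zero to the sum, so iterating over the observed counts is equivalent to summing over all n letters.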
Properties of IC'
• For random English text, IC' is approximately 0.038.
• For a meaningful English text, IC' is approximately 0.065.
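The value for random text follows from the definition of IC. As a short worked derivation, assuming "random" means that each of the 26 letters is drawn independently with equal probability:

```latex
IC_{\text{uniform}} = \sum_{i=1}^{26} \left(\frac{1}{26}\right)^{2}
                    = 26 \cdot \frac{1}{26^{2}}
                    = \frac{1}{26}
                    \approx 0.0385
```

Meaningful English has a skewed letter distribution (a few letters such as e and t are much more likely than others), which makes the sum of squared probabilities larger and gives the higher value of about 0.065.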
Example 1: Random English text (plaintext) and its IC'
• The random English text is: mooiybyvfnkrvxapqzaeo jvoygudguaymoejvshxwhdkowboea nocqpuuebguddjzankbwqaojiqsamryvduqcynqogosfrmusuu ogiidivjpzdjqtatohyqoukuhukqzfqkvssvnbotuxijieyvaz nrrutuwbnleciqbhtglvluytpqrigyxy jaxuo jzansmstkhdja qkqrcywlrgulsfauilgmmffqkljddogluwgkirkgvzbitxxwtt exjxunxketyazopqmfztsxckdaygexdexouyrcjstsucycpqre pps jwiqmzrxhhmzjevsgtihakmhqkbfmzhqzzjteetzgyydfcs afhdochcbmmqniamahucidpcatxccbkibjwwwzpxdth jdxxqte azrvbqpluzrwbeplqfdfeceggrskzrasw juhagbwgxtohhumir vuqxrptkyavgihalecqcpfxrjdnjbscrahfuaebsobfppnpkyg vkdohloqjteqmnjtyijdmhlsbszheftjcoppxtlbqlhaplbtmf rkfbmipdztutxnudpyeohjjbtxxoykfivqmltddalkwwfixsqx neirtxjrivkifqlhjfhlifpwkfcwcfbniizagvgv
• The unbiased IC estimate of this random English text is IC' = 0.0380.
Example 1: Random English text (ciphertext) and its IC' (shift cipher, key = 17)
• The random English text after encryption: dffzpspmwebimorghgrvfamfpxluxlrpdfvamjyonyubfnsfvr efthgllvsxluuagrebsnhrfazhjrdipmulhtpehfxfjwidljll fxzzuzmagquahkrkfyphflblylbhqwhbmjjmesfklozazvpmrq eiilklnsecvtzhsykxcmclpkghizxpoparolfaqrejdjkbyuar hbhitpncixlc jwrlzcxddwwhbcauufxclnxbzibxmqszkoonkk voaoleobvkprqfghdwqk jotburpxvouvoflpita jkjltptghiv ggjanzhdqioyydqavmjxkzyrbdyhbswdqyhqqakvvkqxppuwtj rwyuftytsddhezrdryltzugtrkottsbzsannnqgoukyauoohkv rqimshgclqinsvgchwuwvtvxxijbqirjnalyrxsnxokfyyldzi mlhoigkbprmxzyrcvthtgwoiaueasjtirywlrvsjfswggegbpx mbufycfhakvhdeakpzaudycjsjqyvwkatfggokeshcyrgcskdw ibwsdzguqklkoelugpvfyaaskoofpbwzmhdckuurcbnnwzo jho evzikoaizmbzwhcyawyczwgnbwtntwsezzqrxmxm
• The unbiased IC estimate of the encrypted random English text is IC' = 0.0380, the same as for the plaintext.
• Note: the letters of the English text are encoded as a to z = 0 to 25.
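Example 1 reports the same IC' before and after the shift because a shift cipher only relabels the letters, so the multiset of counts x_i does not change. The sketch below illustrates this, assuming the a to z = 0 to 25 encoding noted on the slide; the helper names `shift_encrypt` and `ic_estimate` are hypothetical.

```python
from collections import Counter


def shift_encrypt(text: str, key: int) -> str:
    """Shift cipher on a-z (a=0 ... z=25): c = (m + key) mod 26.

    Assumption: characters outside a-z are dropped.
    """
    return "".join(
        chr((ord(ch) - ord("a") + key) % 26 + ord("a"))
        for ch in text
        if "a" <= ch <= "z"
    )


def ic_estimate(text: str) -> float:
    """Unbiased estimate IC' over the letters a-z of `text`."""
    letters = [ch for ch in text if "a" <= ch <= "z"]
    length = len(letters)
    counts = Counter(letters)
    return sum(x * (x - 1) for x in counts.values()) / (length * (length - 1))


if __name__ == "__main__":
    plain = "mooiybyvfnkrvxapqzaeo"  # first letters of the Example 1 text
    cipher = shift_encrypt(plain, 17)
    print(cipher)
    # The shift permutes the alphabet, so both estimates are identical.
    print(ic_estimate(plain), ic_estimate(cipher))
```

With key = 17 the first eight letters `mooiybyv` map to `dffzpspm`, which agrees with the start of the ciphertext in Example 1.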
Example 2: A meaningful English text
• Differential Privacy is the state-of-the-art goal for the problem of privacy-preserving data release and privacy-preserving data mining. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods incur higher computing complexity and lower information to noise ratio, which renders the published data next to useless. This proposal aims to reduce computing complexity and signal to noise ratio. The starting point is to approximate the full distribution of high-dimensional dataset with a set of low-dimensional marginal distributions via optimizing score function and reducing sensitivity, in which generation of noisy conditional distributions with differential privacy is computed in a set of low-dimensional subspaces, and then, the sample tuples from the noisy approximation distribution are used to generate and release the synthetic dataset. Some crucial science problems would be investigated below: (i) constructing a low k-degree Bayesian network over the high-dimensional dataset via exponential mechanism in differential privacy, where the score function is optimized to reduce the sensitivity using mutual information, equivalence classes in maximum joint distribution and dynamic programming; (ii) studying the algorithm to compute a set of noisy conditional distributions from joint distributions in the subspace of Bayesian network, via the Laplace mechanism of differential privacy. (iii) exploring how to generate synthetic data from the differentially private Bayesian network and conditional distributions, without explicitly materializing the noisy global distribution. The proposed solution may have theoretical and technical significance for synthetic data generation with differential privacy on business prospects.
• Its IC' is 0.0659.
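Part 2: Vigenère encryption
• A polyalphabetic cipher is a cryptosystem built from several monoalphabetic substitution ciphers. While encrypting the plaintext, it applies these substitution ciphers in rotation, as directed by the key.
• Plaintext M = (m_1, m_2, …, m_n), key K = (k_1, k_2, …, k_d), ciphertext C = (c_1, c_2, …, c_n)
• Encryption: $c_{i+td} = E_{k_i}(m_{i+td}) = (m_{i+td} + k_i) \bmod n$
• Decryption: $m_{i+td} = D_{k_i}(c_{i+td}) = (c_{i+td} - k_i) \bmod n$
• Here 1 ≤ i ≤ d and t = 0, 1, 2, …; the modulus n is the alphabet size as in Part 1 (n = 26 for English), whereas the subscript n in M and C is the message length.
• The key space has size $26^d$.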
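The encryption and decryption transforms above can be sketched in a few lines. This is an illustrative implementation, not the one used to produce the slides; it assumes that both text and key consist only of lowercase letters a to z (encoded 0 to 25) and that the modulus is 26. The function name `vigenere` is hypothetical.

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Vigenère cipher on lowercase a-z.

    Encryption: c_j = (m_j + k_(j mod d)) mod 26
    Decryption: m_j = (c_j - k_(j mod d)) mod 26
    Assumption: `text` and `key` contain only the letters a-z.
    """
    if not key:
        raise ValueError("key must not be empty")
    sign = -1 if decrypt else 1
    base = ord("a")
    out = []
    for pos, ch in enumerate(text):
        k = ord(key[pos % len(key)]) - base  # key letter for this position
        out.append(chr((ord(ch) - base + sign * k) % 26 + base))
    return "".join(out)


if __name__ == "__main__":
    cipher = vigenere("differential", "infosec")
    print(cipher)
    print(vigenere(cipher, "infosec", decrypt=True))
```

With key `infosec`, the word `differential` encrypts to `lvktwvgvgnod`, which agrees with the first twelve letters of the ciphertext in Example 3.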
Example 3: plaintext.txt
• differentialprivacyisthestateoftheartgoalfortheproblemofprivacypreservingdatareleaseandprivacypr eservingdataminingexistingtechniquesusingdifferentialprivacyhowevercannoteffectivelyhandlethep ublicationofhighdimensionaldatainparticularwhentheinputdatasetcontainsalargenumberofattribute sexistingmethodsincurhighercomputingcomplexityandlowerinformationtonoiseratiowhichrendersth epublisheddatanexttouselessthisproposalaimstoreducecomputingcomplexityandsignaltonoiseratiot hestartingpointistoapproximatethefulldistributionofhighdimensionaldatasetwithasetoflowdimensio nalmarginaldistributionsviaoptimizingscorefunctionandreducingsensitivityinwhichgenerationofnois yconditionaldistributionswithdifferentialprivacyiscomputedinasetoflowdimensionalsubspacesandth enthesampletuplesfromthenoisyapproximationdistributionareusedtogenerateandreleasethesynthet icdatasetsomecrucialscienceproblemswouldbeinvestigatedbelowiconstructingalowkdegreebayesian networkoverthehighdimensionaldatasetviaexponentialmechanismindifferentialprivacywherethesco refunctionisoptimizedtoreducethesensitivityusingmutualinformationequivalenceclassesinmaximum jointdistributionanddynamicprogrammingiistudyingthealgorithmtocomputeasetofnoisyconditionald istributionsfromjointdistributionsinthesubspaceofbayesiannetworkviathelaplacemechanismofdiffer entialprivacyiiiexploringhowtogeneratesyntheticdatafromthedifferentiallyprivatebayesiannetworka ndconditionaldistributionswithoutexplicitlymaterializingthenoisyglobaldistributiontheproposedsolu tionmayhavetheoreticalandtechnicalsignificanceforsyntheticdatagenerationwithdifferentialprivacyo nbusinessprospects
• This plaintext is encrypted with the Vigenère cipher, key = infosec.
Example 3: ciphertext
• lvktwvgvgnodttqifqqmubujglevmbkhziczglcsphweyvwttwoqseshxenjsgaxejgwvxqalrsxczrqsswgiaidj mxipddjiumeawfkfigfaarkvtjlawvqalhwgjvvviwwwavsuvmhnrwsfxkiyufazcklmcoixmehofrqbrktwgvqi jzqlcvqqsllgxhgzagcbvtbgjjqtmraqgvfncfenlnyoarrieywuyniebvwrvprnbhyvlnyokivkbshsmpanqojkgv hrpwvqnnyhjmdcgjgwbkagnbyqgbutrkmpkhwvakjmehcetwbvsuusoxyjlaxaiaizgagzvstgvoigncfxqvbn gwvcbvtkzmepejbvitagmshydtvxvwhfigfbwbvbbzgwpgafyvawrzbuckenivrglstmqzqwgquczharikbrddi zqgdofhuqtsodxqvbngwvcbvthziubnwharixbnblmubbfdhvqfvrolivprkidpfqfyfafwbvtbgjjqtmraqgvfnc fenlnyokivevyvswgbbkzgafqzjbkmqvnqasviqafzvmubenpmxkwaxjaeqxgnaadkvtxqgvgnhsqlmqvnsrjif cpnbywgvfnhazkblnbolkkulsfitigncfshvbngqgqvqnhaspiyiwkxtqozhaspajnhzhknsjfwrvqnqdjmxipdwk gquczhwhkvnxslshtbbraqgvfncfenahggheemffbvxjmayvwwcucqslyrtrxtjsobujbgmugnudjszqzfhasplv xhjmdcgncfetmhxsvxqorssjevmnsrjinmnxsllgalshzivqpioleumgxceiezhhwspukvjbuirzbgzwquebzzvfgq aaskxkonysvfgtbbwuspagwiuxkvtfzgamlrlfwidiljgaepvrykgvmwijfllgpvlvvmomaxwgrctqfhswgbinowb rwajblmctzjqzepqfrwfhknsjfwrvqnqdjmxipdkzitmgmskgqzrkifgvqbswksrbvrwrifbbwsvyemgmskipav ywnmvghxwfkocgzodmpnbwasxkwajemmxiyjbuietnxgwwkvzflaqwuwtwfxfqfyfafwbvtbsrfllsoemexe tujeouvsuamubhimaribujodkqzvyvexqkbrdmxgifjhgjpwvxmusplvywgrctqnglvkjhywgrunetabskvgiwk xtqozhaspavshziucoxdsggwsgoqiuqnsbwxywepjaevprqohpckrrsulcvvxagjfqsksjipbvfzhvkdnhmamkm kuzgvkvtmcoxqorssjevmfdbllgbvhrsxcnetallglvktwvgvgnodpaxenjsxgjndskmcvajhostsnsrusplvywgrct qnglvkjhywgruevyvgyvmkuzagkbydasxgzvfzadkvtyvwrqqfdudsdiyiwkxtqozhaspbujdjsrwfjrksncgncfq cgufjwxjmbwslmeiyfbvxgkuswuenavlbajkknsqwjqzfdbllgbvhrsxcorssjevqbskaxjlvktwvgvgnodttqifqqs pjhxwfiuacwcktgkgxvzlj
• Computing the index of coincidence of this text as a single string (length = 1609) gives the unbiased estimate IC' = 0.0418.
Example 3: Splitting the ciphertext into 2 substrings
• Substring 1 (every second letter starting from the first, beginning lkwggotqfq…): length 805, IC' = 0.0427
• Substring 2 (every second letter starting from the second, beginning vtvvndtiqm…): length 804, IC' = 0.0411
• The average of the unbiased IC estimates of the 2 substrings is IC' = 0.0419.
Example 3: Splitting the ciphertext into 3 substrings
• The ciphertext is split by taking every third letter; the three substrings begin ltgnti…, vwvotf… and kvgdqq….
• Per-substring values shown on the slide: length 537, IC' = 0.0417; length 536, IC' = 0.0424
• The average of the unbiased IC estimates of the 3 substrings is IC' = 0.0419.
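The splitting used in Example 3 can be sketched as follows: for a candidate key length d, letter j of the ciphertext goes to substring j mod d, IC' is computed for each substring, and the values are averaged. The helper names (`split_substrings`, `average_ic`) are hypothetical, and the text is assumed to contain only lowercase letters.

```python
from collections import Counter


def ic_estimate(text: str) -> float:
    """Unbiased estimate IC' = sum_i x_i (x_i - 1) / (L (L - 1))."""
    length = len(text)
    counts = Counter(text)
    return sum(x * (x - 1) for x in counts.values()) / (length * (length - 1))


def split_substrings(text: str, d: int) -> list[str]:
    """Substring r holds the letters at positions r, r + d, r + 2d, ..."""
    return [text[r::d] for r in range(d)]


def average_ic(text: str, d: int) -> float:
    """Mean of IC' over the d substrings obtained for candidate length d."""
    parts = split_substrings(text, d)
    return sum(ic_estimate(part) for part in parts) / d


if __name__ == "__main__":
    # A string of length 1609 splits into the lengths listed in Example 3.
    dummy = "a" * 1609
    print([len(p) for p in split_substrings(dummy, 2)])  # [805, 804]
    print([len(p) for p in split_substrings(dummy, 3)])  # [537, 536, 536]
```

For d = 2 and d = 3 the averages in Example 3 stay near 0.042, well below the 0.065 typical of meaningful English. When d equals the true key length, each substring has been encrypted with a single shift, so its IC' would typically be expected to move toward the value for meaningful English; this is what makes the average a useful test for candidate key lengths.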