Efficient algorithm to find all "character-equal" strings?

How can we write an efficient function that outputs "[homoglyph equivalents][1]" of an input string? **Example 1** (pseudo-code): homoglyphs_list = [ ["o", "0"], // "o" and "0" are homoglyphs ["i", "l", "1"] // "i" and "l" and "1" are homoglyphs ] input_string = "someinput" output = [ "someinput", "s0meinput", "somelnput", "s0melnput", "some1nput", "s0me1nput" ] **Example 2**: homoglyphs_list = [ ["rn", "m", "nn"], ] input_string = "rnn" output = ["rnn", "rm", "mn", "rrn", "nnn", "nm", "nrn"] **Example 3**: homoglyphs_list = [ ["d", "ci", "a"], // "o" and "0" are homoglyphs ["i", "l", "1"] // "i" and "l" and "1" are homoglyphs ] /* notice that with the two rules above, we can infer "d" = "ci" = "a" = "cl" = "c1" */ input_string = "pacerier" output = [ "pacerier", "pacerler", "pacer1er", "pcicerier", "pcicerler", "pcicer1er", "pclcerier", "pc1cerier", "pclcerler", "pc1cerler", "pclcer1er", "pc1cer1er", "pdcerier", "pdcerler", "pdcer1er" ] Note: The order of the members in the output array isn't important, and we can assume that the given homoglyph mappings are assumed to be *proper* (inputs wouldn't give us an "infinite loop"). My current algorithm works, but it's using raw bruteforcing and performance is awful. E.g. an input of `"mmmmm"` with homoglyphs `["rn", "m", "nn"]` takes 38 seconds to run: // This is php code (non-pseudo just so we could test the running time), // but the question remains language-agnostic public function Func($in, Array $mappings){ $out_table = array(); $out_table[$in] = null; while(true){ $number_of_entries_so_far = count($out_table); foreach(array_keys($out_table) as $key){ foreach($mappings as $mapping){ foreach($mapping as $value){ for($a=0, $aa=strlen($key); $a<$aa; ++$a){ $pos = strpos($key, $value, $a); if($pos === false){ continue; } foreach($mapping as $equal_value){ if($value === $equal_value){ continue; } $out_table[substr($key, 0, $pos) . $equal_value . substr($key, $pos + strlen($value))] = null; } } } } } if($number_of_entries_so_far === count($out_table)){ // it means we have tried bruteforcing but added no additional entries, // so we can quit now break; } } return array_keys($out_table); } How can we implement an efficient (fast) homoglyph expansion algorithm? [1]: https://en.wikipedia.org/wiki/Homoglyph#Zero_and_O.3B_one.2C_l_and_I
And what it will do if I write like $homoglyph_mappings[0] = array("n", "nn" , "nnn"); ??

以上就是Efficient algorithm to find all "character-equal" strings?的详细内容,更多请关注web前端其它相关文章!

赞(0) 打赏
未经允许不得转载:web前端首页 » JavaScript 答疑

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址

前端开发相关广告投放 更专业 更精准

联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏