Fastest way to do a case-insensitive substring search in C/C++?

Note

The question below was asked in 2008 about some code from 2003. As the OP's **update** shows, this entire post has been obsoleted by vintage 2008 algorithms and persists here only as a historical curiosity.

----------

I need to do a fast case-insensitive substring search in C/C++. My requirements are as follows:

* Should behave like strstr() (i.e. return a pointer to the match point).
* Must be case-insensitive (doh).
* Must support the current locale.
* Must be available on Windows (MSVC++ 8.0) or easily portable to Windows (i.e. from an open source library).

Here is the current implementation I am using (taken from the GNU C Library):
/* Return the offset of one string within another.
   Copyright (C) 1994,1996,1997,1998,1999,2000 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, write to the Free
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307 USA.  */

/*
 * My personal strstr() implementation that beats most other algorithms.
 * Until someone tells me otherwise, I assume that this is the
 * fastest implementation of strstr() in C.
 * I deliberately chose not to comment it.  You should have at least
 * as much fun trying to understand it, as I had to write it :-).
 *
 * Stephen R. van den Berg, berg@pool.informatik.rwth-aachen.de	*/

/*
 * Modified to use table lookup instead of tolower(), since tolower() isn't
 * worth s*** on Windows.
 *
 * -- Anders Sandvig (anders@wincue.org)
 */

#if HAVE_CONFIG_H
# include <config.h>
#endif

#include <ctype.h>
#include <string.h>

typedef unsigned chartype;

char char_table[256];

void init_stristr(void)
{
  int i;
  char string[2];

  string[1] = '\0';
  for (i = 0; i < 256; i++)
  {
    string[0] = i;
    _strlwr(string);
    char_table[i] = string[0];
  }
}

#define my_tolower(a) ((chartype) char_table[a])

char *
my_stristr (phaystack, pneedle)
     const char *phaystack;
     const char *pneedle;
{
  register const unsigned char *haystack, *needle;
  register chartype b, c;

  haystack = (const unsigned char *) phaystack;
  needle = (const unsigned char *) pneedle;

  b = my_tolower (*needle); 
  if (b != '\0')
  {
    haystack--;				/* possible ANSI violation */
    do
	  {
	    c = *++haystack;
	    if (c == '\0')
	      goto ret0;
	  }
    while (my_tolower (c) != (int) b);

    c = my_tolower (*++needle);
    if (c == '\0')
	    goto foundneedle;

    ++needle;
    goto jin;

    for (;;)
    {
      register chartype a;
	    register const unsigned char *rhaystack, *rneedle;

	    do
	    {
	      a = *++haystack;
	      if (a == '\0')
		      goto ret0;
	      if (my_tolower (a) == (int) b)
		      break;
	      a = *++haystack;
	      if (a == '\0')
		      goto ret0;
        shloop:
	      ;
	    }
      while (my_tolower (a) != (int) b);

jin:	  
      a = *++haystack;
  	  if (a == '\0')
	      goto ret0;

	    if (my_tolower (a) != (int) c)
	      goto shloop;

	    rhaystack = haystack-- + 1;
	    rneedle = needle;

	    a = my_tolower (*rneedle);

	    if (my_tolower (*rhaystack) == (int) a)
	      do
	      {
		      if (a == '\0')
		        goto foundneedle;

		      ++rhaystack;
          a = my_tolower (*++needle);
		      if (my_tolower (*rhaystack) != (int) a)
		        break;
		      
          if (a == '\0')
		        goto foundneedle;
		      
          ++rhaystack;
		      a = my_tolower (*++needle);
	      }
	      while (my_tolower (*rhaystack) == (int) a);

	    needle = rneedle;		/* took the register-poor approach */

  	  if (a == '\0')
	      break;
    }
  }
foundneedle:
  return (char*) haystack;
ret0:
  return 0;
}
Can you make this code faster, or do you know of a better implementation?

Note: I noticed that the GNU C Library now has a new implementation of strstr(), but I am not sure how easily it can be modified to be case-insensitive, or if it is in fact faster than the old one (in my case). I also noticed that the old implementation is still used for wide character strings, so if anyone knows why, please share.
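For readers who are not on MSVC: the table in init_stristr() above is filled with `_strlwr`, which is a Microsoft-specific function. A portable way to build the same kind of table might look like the sketch below. It assumes a single-byte character set, and the names `lower_table`, `init_lower_table` and `to_lower_fast` are made up for illustration; they are not part of the code in the question.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* One lowercase mapping per possible unsigned char value. */
static unsigned char lower_table[UCHAR_MAX + 1];

/* Fill the table once using the standard tolower().
   tolower() follows the current LC_CTYPE locale, so call
   setlocale(LC_CTYPE, "") first if the user's locale is wanted;
   a C program starts in the "C" locale by default. */
void init_lower_table(void)
{
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        lower_table[i] = (unsigned char) tolower(i);
}

/* Table lookup that replaces a tolower() call in the search loop.
   The argument must already be in unsigned char range. */
unsigned char to_lower_fast(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    return lower_table[c];
}
```

As with the original, the table has to be rebuilt if the locale changes after initialization.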

Update

Just to make things clear, in case it wasn't already: I didn't write this function; it's part of the GNU C Library. I only modified it to be case-insensitive.

Also, thanks for the tip about strcasestr() and about checking out implementations from other sources (like OpenBSD, FreeBSD, etc.). It seems to be the way to go. The code above is from 2003, which is why I posted it here in the hope that a better version was available, which apparently it is. :)
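Since strcasestr() is a GNU/BSD extension and not part of standard C, a build that lacks it still needs something to fall back on. The following is a minimal, unoptimized sketch of a function with the same contract as strstr() but case-insensitive. It is not the glibc or BSD implementation; the name `stristr_simple` is hypothetical, and it assumes single-byte characters. It can also serve as a correctness baseline when testing a faster version.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Naive case-insensitive strstr(): returns a pointer to the first
   match of needle in haystack, or NULL if there is none.
   Worst case is O(len(haystack) * len(needle)). */
char *stristr_simple(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);

    /* Like strstr(), an empty needle matches at the start. */
    if (*needle == '\0')
        return (char *) haystack;

    for (; *haystack != '\0'; ++haystack) {
        /* Compare as unsigned char so tolower() never sees a
           negative value on platforms where char is signed. */
        const unsigned char *h = (const unsigned char *) haystack;
        const unsigned char *n = (const unsigned char *) needle;

        while (*n != '\0' && tolower(*h) == tolower(*n)) {
            ++h;
            ++n;
        }
        if (*n == '\0')
            return (char *) haystack;   /* whole needle matched */
    }
    return NULL;
}
```

Because it calls tolower() directly, it respects the current locale, at the cost of the per-character call overhead that the table lookup in the question was meant to avoid.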


This tip won't help but you should at least clean out all unnecessary code, like the code you skip with the 'goto jin' statement.
