OpenMP: theoretical vs. in-practice efficiency?

As I increase the number of cores for an embarrassingly parallel problem (a for loop where each iteration does a lot of computation and is independent of all the other iterations), the efficiency decreases roughly linearly with the number of cores. Here efficiency is Ts/(p*Tp), where Ts is the serial run time, p the number of cores and Tp the parallel run time. I know that in practice thread scheduling, the OS, and cache effects can slow an implementation down a lot. I should add that I do get a speedup, and in theory the problem has linear speedup, which means an efficiency of 1 as p increases. So my question is: how do the OS, thread scheduling, memory accesses, and other technical limitations affect the efficiency of the algorithm as the number of processors increases? Should they affect it at all?

It's impossible to answer, because it depends on the problem and the implementation. And "acceptable" seems subjective to me.

