<!--more-->, 'pid' URLs and duplicate content

Started by on Feb 02, 2007 – Contents updated: Feb 02, 2007

Feb 02, 2007 00:33    

In file inc/MODEL/items/_item.class.php, in function get_content, at about line 1195, we can see:
Code

      else
      { // We are offering to read more
        $output = $content_parts[0];
        $output .= $before_more .
                    '<a href="'.$this->get_permanent_url( 'pid', $more_file ).'#more'.$this->ID.'">'.
                    $more_link_text.'</a>' .
                    $after_more;
      }

where one might expect the following version:

Code

      else
      { // We are offering to read more
        $output = $content_parts[0];
        $output .= $before_more .
                    '<a href="'.$this->get_permanent_url( '', $more_file ).'#more'.$this->ID.'">'.
                    $more_link_text.'</a>' .
                    $after_more;
      }

I might be wrong, but I see no reason to force the 'pid' URL format on the [teaserbreak] pseudo-tags found in posts.

If the blog uses another URL format, this creates duplicate content, which search engines dislike because they treat it as web spam. [url=http://www.google.com/support/webmasters/bin/answer.py?answer=35769]Google's guidelines to webmasters[/url] include recommendations about avoiding duplicate content pages:
Quality guidelines - specific guidelines

  • Avoid hidden text or hidden links.
  • Don't employ cloaking or sneaky redirects.
  • Don't send automated queries to Google.
  • Don't load pages with irrelevant words.
  • Don't create multiple pages, subdomains, or domains with substantially duplicate content.
  • Don't create pages that install viruses, trojans, or other badware.
  • Avoid "doorway" pages created just for search engines, or other "cookie cutter" approaches such as affiliate programs with little or no original content.
  • If your site participates in an affiliate program, make sure that your site adds value. Provide unique and relevant content that gives users a reason to visit your site first.

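To illustrate why a forced format gives the same post a second address, here is a minimal sketch. It is not b2evolution's real get_permanent_url(): the function sketch_permalink(), the 'urltitle' setting name, and both URL shapes are assumptions invented for the example.

```php
<?php
// Hypothetical sketch: NOT b2evolution's real get_permanent_url().
// The URL shapes below are illustrative assumptions only.

function sketch_permalink(string $base, int $id, string $slug, string $mode, string $default_mode): string
{
    // An empty mode means "use whatever format the blog is configured with".
    if ($mode === '') {
        $mode = $default_mode;
    }
    if ($mode === 'pid') {
        // Assumed ID-based shape.
        return $base . '?p=' . $id;
    }
    // Assumed title-based shape.
    return $base . '/' . $slug;
}

$base = 'http://example.com/blog';
$default_mode = 'urltitle'; // assumed blog-wide setting

// Link used on the post title (follows the blog setting).
$title_link = sketch_permalink($base, 42, 'my-post', '', $default_mode);
// "Read more" link with the format forced to 'pid'.
$more_link_forced = sketch_permalink($base, 42, 'my-post', 'pid', $default_mode);
// "Read more" link that follows the blog setting.
$more_link_default = sketch_permalink($base, 42, 'my-post', '', $default_mode);

// Forcing 'pid' exposes a second URL for the same post...
var_dump($title_link === $more_link_forced);
// ...while passing '' keeps both links on one URL.
var_dump($title_link === $more_link_default);
```

In this sketch, only the forced call produces an address that differs from the title link, which is the situation a crawler would see as two pages with the same content.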
There are other duplicated URLs in b2evolution, and I am going to investigate them in the next few weeks... The attached comments claim that using the 'pid' URL format helps to prevent these URLs from being split across several lines. However, what happens when the server doesn't support the 'pid' format (i.e. Apache's mod_negotiation is not activated/installed)?

    Please note that I am very sensitive about respecting search engine guidelines, especially on the duplicate content issue, since I make extensive use of the [teaserbreak] pseudo-tag (it prevents a post's whole content from appearing at several URLs on the site) and 80% of [url=http://blog.lesperlesduchat.com]my site[/url]'s traffic comes from search engines; about 70% of visitors come from Google itself.

    (Since I'm currently moving from b2evolution 0.9.x.x to b2evolution 1.9.2, I'm inspecting the code here and there...)

    Feb 02, 2007 03:29

    I just removed it from my 1.9.2 installations without issue. Thanks for the good catch, but I'd hardly call it a bug. More like an improvement given that it works exactly as it's supposed to.

    Feb 02, 2007 03:52

    EdB wrote:

    I just removed it from my 1.9.2 installations without issue. Thanks for the good catch, but I'd hardly call it a bug. More like an improvement given that it works exactly as it's supposed to.

    Does it? :roll: Say it's a kind of minor, class D bug. Deal? ;)

    I don't remember having encountered that issue on my b2evolution 0.9.x version and I don't understand why the original behavior has changed.

    Fortunately, it wasn't difficult to fix! B)

    Feb 02, 2007 04:04

    Class D bug - me likes! Those would be like "this really bugs me because it's not cool". Yeah I'd say it's doing its job even if its job is something that shouldn't be done. I will have to implement this on an installation without using clean urls to see what happens then. Should be no problem because defaulting to default is, well, default!

    I'm thinking you're right about it being a change. I seem to remember though an old hack that tweaked this bit, but (if memory happens to serve) I changed something from whatever TO 'pid' to get the desired "pretty url" effect. I don't recall clearly though so you're probably correct.

    So what are the other classes of bugs?
    A = breaks the public or admin side of the blog with no known answer.
    B = breaks the public or admin side of the blog with a difficult or complex workaround or hack available.
    C = breaks the public or admin side of the blog with a reasonably easy workaround or hack available.
    E = not the best way to do things with a difficult or complex workaround or hack available.
    D = not the best way to do things with a reasonably easy workaround or hack available.

    That'd mean the alphabet would have to be re-written, but so what: it's long overdue for Alphabet2.0 eh?

    Feb 02, 2007 05:19

    Oh and by the way: welcome back!

    Feb 02, 2007 15:19

    EdB wrote:

    So what are the other classes of bugs?
    A = breaks the public or admin side of the blog with no known answer.
    B = breaks the public or admin side of the blog with a difficult or complex workaround or hack available.
    C = breaks the public or admin side of the blog with a reasonably easy workaround or hack available.
    E = not the best way to do things with a difficult or complex workaround or hack available.
    D = not the best way to do things with a reasonably easy workaround or hack available.

    I worked for a while in the video game industry, where the following definitions are often applied. There is no "official" definition of these terms, but they are shared by most development teams as well as by video game console manufacturers, whose Quality Assurance teams test games during the submission process:

    • class A bugs (must be fixed before release):
      • crashes (random or not, whatever their frequency),
      • miscellaneous issues making it impossible to use the application for what it is intended for (including loss of data, random or not),
      • security holes,
      • any other issue making it impossible to release the application (copyright violations, etc.);
    • class B bugs (should be fixed before release):
      • major issues making it difficult to use the application for what it is intended for,
      • major user-interface issues (including those related to (G)UI design when they are painful),
      • minor security issues;
    • class C bugs (it would be good to fix them as well):
      • strange/uncommon behaviors,
      • minor GUI issues,
      • minor graphical issues,
      • other minor issues;
    • class D bugs (the application can ship with these bugs, even if it would be better without them):
      • all other minor issues,
      • suggestions.

    When releasing a video game on consoles, each game is submitted to the Quality Assurance team of every console manufacturer, separately for each console territory (Japan, Europe, America). The game is approved (and can then be sent for duplication and shipped to the market) after a 6-week submission period when:

      • there is no known class A bug;
      • there are no or few known class B bugs;
      • the number of class C bugs is still reasonable;
      • class D bugs never affect a software release.

    As an example of a class A bug, I remember an N64 video game submission to Nintendo of America (NOA). They discovered that the game might lose saved data in some rare circumstances: while the game was being saved to the cartridge, the player might remove the cartridge from the console. On restarting the game, all the saved data slots might be lost. We were asked to fix that behavior so that no more than the slot currently being saved could be affected. We fixed the issue by duplicating the saved data, so the player might lose his last saved slot at most, and in most cases the previously saved game was successfully retrieved.
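The idea of duplicating saved data can be sketched as follows. This is only a guess at the general technique, not the actual N64 code: storage is simulated with a PHP array, and the names pack_save(), is_valid_copy(), save_game() and load_game(), the sequence counter and the CRC check are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of redundant save data; not the actual N64 code.
// Storage is simulated with a PHP array holding two copies of one slot.

function pack_save(int $sequence, string $payload): array
{
    // The checksum lets the loader detect a copy left half-written.
    return ['seq' => $sequence, 'data' => $payload, 'crc' => crc32($sequence . ':' . $payload)];
}

function is_valid_copy(?array $copy): bool
{
    return $copy !== null
        && $copy['crc'] === crc32($copy['seq'] . ':' . $copy['data']);
}

function save_game(array &$copies, string $payload): void
{
    // Overwrite the older (or invalid) copy so the newest good one survives.
    $seq0 = is_valid_copy($copies[0]) ? $copies[0]['seq'] : -1;
    $seq1 = is_valid_copy($copies[1]) ? $copies[1]['seq'] : -1;
    $target = ($seq0 <= $seq1) ? 0 : 1;
    $copies[$target] = pack_save(max($seq0, $seq1) + 1, $payload);
}

function load_game(array $copies): ?string
{
    // Pick the valid copy with the highest sequence number, if any.
    $best = null;
    foreach ($copies as $copy) {
        if (is_valid_copy($copy) && ($best === null || $copy['seq'] > $best['seq'])) {
            $best = $copy;
        }
    }
    return $best === null ? null : $best['data'];
}

$copies = [null, null];
save_game($copies, 'level 1');
save_game($copies, 'level 2');

// Simulate the cartridge being pulled while a third save is written:
// the targeted copy ends up with data that no longer matches its checksum.
$copies[0]['data'] = 'lev';

echo load_game($copies), "\n"; // prints "level 2"
```

Because a new save always replaces the older copy, an interrupted write can damage only the save in progress, and the previous one remains loadable.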

    In most cases, after two to three months of internal QA tests and a four-to-eight-week manufacturer submission, a console video game is released with a total of about 1,000 known bugs, most of them class C and D bugs.

    EdB wrote:

    That'd mean the alphabet would have to be re-written, but so what: it's long overdue for Alphabet2.0 eh?

    ;-)

    Feb 02, 2007 15:20

    EdB wrote:

    Oh and by the way: welcome back!

    It's a pleasure! :)

